Senior Technical Program Manager, Cloud Infrastructure

Nvidia

Quick summary

Work type: On-site
Location: Santa Clara, CA; Seattle, WA
Salary: $168,000–$258,750 / yr
Posted: 1 day ago

Market check

Salary context

Above market

How this pay compares to similar roles

Salary comparison chart: similar roles $182k, this role $213k (chart scale $123k–$273k)

This role pays more than 73% of similar roles. Most similar roles pay $148,875–$215,000. At the midpoint, this role pays about $213k versus about $182k for comparable roles.

Based on 232 similar postings.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 980 open roles on FindRole.

Listed pay typically runs $168,000–$270,250 across 966 roles with salary data.

At a glance

TL;DR · Senior Technical Program Manager, Cloud Infrastructure

NVIDIA’s DGX Cloud team is seeking a seasoned Technical Program Manager (TPM) to drive critical infrastructure programs, including on-prem bring-up and hardware automation. This TPM will collaborate closely with internal teams like Software Engineering and Data Center Operations to develop foundational capabilities for hardware validation, remote fleet bootstrapping, and operational workflows. Key responsibilities include coordinating the end-to-end setup of large server fleets, integrating NVIDIA software stacks in on-prem environments, and managing capacity operations. The role also involves cross-team coordination to align technology roadmaps with data center readiness timelines, program governance using Jira, and establishing clear metrics for infrastructure health and cluster availability. Ideal candidates have over 10 years of experience in technical program management, expertise in cloud or enterprise infrastructure, proficiency with Jira and similar tools, strategic leadership skills, and a deep understanding of NVIDIA GPU products and HPC infrastructure bring-up.

What you'll do

  • Coordinate end-to-end remote bootstrapping of large server fleets for customer readiness.
  • Lead day-to-day capacity operations focusing on availability and multi-tiered infrastructure workflows.
  • Work across diverse teams to map technology roadmaps against data center readiness timelines.
  • Manage risks and build comprehensive roadmaps under the Product Lifecycle framework.
  • Establish critical metrics to track program velocity, infrastructure availability, and cluster health.
  • Develop robust communication strategies for weekly operations reviews and collaborative sessions.

What we're looking for

  • 10+ years of technical program management in cloud infrastructure programs.
  • Extensive hands-on experience in on-prem enterprise infrastructure and data center bring-up.
  • Expert-level proficiency with Jira or similar program management tools.
  • Strategic leadership skills to build consensus and drive program success.
  • In-depth knowledge of NVIDIA GPU products and HPC infrastructure bring-up.
  • Experience managing dependencies between AI software stacks and physical infrastructure.

More like this

Similar roles

Technical Program Manager, Cloud Infrastructure

Nvidia

Santa Clara, CA +1 · 8 days ago · $168,000–$258,750
Jira, Kubernetes, Terraform, API integration, CI/CD, AWS, Azure, GCP, PostgreSQL, Docker, Prometheus, Grafana, GitLab, Python, NVIDIA GPU products, Cloud Service Providers, DevOps methodologies, Scrum, Agile
Hybrid

Senior Technical Program Manager, Cloud Infrastructure NPI

Nvidia

Santa Clara, CA +1 · 15 days ago · $168,000–$258,750
Kubernetes, CI/CD, Jira, Confluence, GPU, AWS, Azure, Grafana, Prometheus, Terraform, Python, PostgreSQL, Docker, Ansible, GitLab, New Product Introduction (NPI), AI infrastructure, Process automation, Observability, Health check frameworks

Senior Technical Program Manager, DGX Cloud Software Products and Services

Nvidia

Santa Clara, CA · 43 days ago · $168,000–$258,750
Jira, Aha!, Confluence, Git, Distributed version control systems, Reliability engineering, Resilience development, Service performance metrics, Goodput, Efficiency, Utilization, Distributed training frameworks, Checkpointing, NCCL, Slurm, AI infrastructure, Large-scale compute platforms, CI/CD