Senior Datacenter Technical Program Manager, At-Scale AI Clusters

Nvidia

Remote

Quick summary

Work type: Remote
Location: Santa Clara, CA
Salary: $168,000–$258,750 / yr
Posted: 7 days ago
Nearby: 99+ roles within 25 mi

Market check

Salary context

Competitive pay

How this pay compares to similar roles


This role pays more than 60% of similar roles. Most similar roles pay $152,753–$236,187. At the midpoint of its listed range, this role pays about $213k, versus about $194k for comparable roles.

Based on 240 similar postings.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and systems on a chip (SoCs), powering gaming, professional visualization, data center, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 994 open roles on FindRole.

Listed pay typically runs $168,000–$270,250 across 977 roles with salary data.


At a glance

TL;DR · Senior Datacenter Technical Program Manager, At-Scale AI Clusters


NVIDIA seeks a Senior Technical Program Manager to join its Applied Systems Engineering team and drive datacenter integration for next-generation AI supercomputing systems. This TPM will collaborate across hardware and software teams to build large-scale GPU computing systems, lead the integration of new AI clusters into datacenters with stringent power and cooling requirements, coordinate facility design and fit-out, produce detailed documentation, and work with engineering leadership to resolve critical issues. The ideal candidate has a BS in Applied Science or Engineering (or equivalent), 8+ years of experience, expertise in high-performance computing (HPC) systems, and familiarity with datacenter design, system monitoring tools such as Prometheus and Grafana, and building-automation protocols such as BACnet. Strong teamwork skills are essential for coordinating multiple teams in this fast-paced environment.

Skills

Prometheus Grafana Splunk Modbus BACnet Kubernetes Terraform AWS PostgreSQL CI/CD Python Docker High-Performance Computing GPU Clusters Datacenter Design Power and Cooling Technologies

What you'll do

Lead integration of new AI clusters into datacenter environments with stringent power and cooling requirements.
Coordinate design and construction of new datacenter facilities for GPU computing systems.
Produce comprehensive documentation for datacenter fit-out and system integration processes.
Collaborate with engineering leaders to prioritize and resolve critical issues for large-scale deployments.
Drive the development of reference architectures for AI supercomputing systems.

What we're looking for

8+ years of experience in technical roles related to hardware/software systems.
BS in Applied Science or Engineering (or equivalent).
Extensive experience with GPU clusters and HPC systems in datacenters.
Strong problem-solving skills for complex technical challenges.
Proven ability to collaborate effectively across multiple engineering teams.
Deep understanding of datacenter design, including power and cooling technologies.
Expertise in system monitoring tools such as Prometheus, Grafana, and Splunk.

Similar roles

Datacenter AI Systems and Solutions Engineer, Sr Staff

Qualcomm

San Diego, CA 5 days ago $162,600–$244,000

Python Docker Kubernetes MLOps GitOps CI/CD Prometheus Grafana PostgreSQL Redis Slurm Apache Kafka OpenAPI Swagger Terraform Ansible Jenkins GitHub GitLab Bitbucket Travis CI CircleCI


Senior Deep Learning Systems Engineer, Datacenters

Nvidia

Santa Clara, CA +1 39 days ago $184,000–$287,500

Python C/C++ CUDA PyTorch TensorFlow Linux Docker Slurm perf gprof nvidia-smi dcgm

Hybrid


Technical Program Manager, Data Center Platform

Qualcomm

San Diego, CA 17 days ago $171,200–$256,800

Python C++ Java Linux Git JIRA Confluence Agile CI/CD Docker Kubernetes AWS PostgreSQL MariaDB RESTful APIs JSON YAML Selenium JUnit


Senior Solutions Architect, AI Cluster Performance and Telemetry

Nvidia

Santa Clara, CA +1 11 days ago $184,000–$287,500

perf eBPF Prometheus Grafana Docker Kubernetes Slurm Ansible NCCL NVIDIA Nsight Python C++ CUDA TensorFlow PyTorch CI/CD


Senior Manager, Data Center Facilities Development

Oracle

US 60 days ago

CI/CD Kubernetes Docker AWS Python PostgreSQL Git Jenkins Ansible Linux Terraform Prometheus Grafana


Senior Manager, Data Center Facilities Development

Oracle

Abilene, TX 59 days ago $120,100–$251,600

Oracle Cloud Infrastructure Data Center Construction Project Management CI/CD Budget Management Risk Management Vendor Management Regulatory Compliance MEP Infrastructure High Density Liquid Cooling Base Building Problem Solving Strategic Planning
