Principal Product Manager

Nvidia

Hybrid · Actively hiring
Santa Clara, CA · New York, NY · Seattle, WA · Posted 15 days ago · $240,000–$379,500 / year

At a glance

AI generated

TL;DR

As Principal Product Manager for resilient automation at NVIDIA’s AI Factory, you will set the strategic direction and roadmap for the break-fix automation system used in DGX Cloud. You will define automation confidence thresholds, integrate failure attribution with automated repair actions, and build an operator experience that makes repair workflows transparent and auditable. You will work closely with NCP operators, SRE teams, and hardware vendor partners to optimize repair workflows at scale while maintaining high reliability and operational safety. The role requires 15+ years of product management experience in infrastructure, platform, or MLOps; expertise in distributed systems and workflow orchestration; and a strong background in GPU infrastructure and datacenter operations. The goal is AI factories that self-heal efficiently at scale.

Skills

AWS Kubernetes Terraform Docker CI/CD Prometheus Grafana Python PostgreSQL Git GitHub Slack Jira Confluence MLOps GPU Datacenter Operations Agentic AI Workflow Software RMA Logistics Vendor SLA Oversight Chaos Engineering Fault Injection Testing

What you'll do

  • Define strategic direction and roadmap for break-fix automation system across multiple vendors and CSPs.
  • Set automation confidence thresholds to balance speed with operational safety in AI factories.
  • Develop operator UX for repair queues, ensuring on-call engineers have necessary context for quick actions.
  • Drive integration between failure detection and automated repair processes for seamless resolution.
  • Define service level objectives (SLOs) and metrics for fleet availability and time-to-drain.

What we're looking for

  • 15+ years of product management experience in infrastructure, platform, or MLOps areas.
  • BS or MS in Computer Science, Engineering, or related technical field.
  • Expertise with distributed systems and workflow orchestration.
  • Proven ability to build products with real-world operational consequences.
  • Strong operator UX instincts for complex system states under pressure.
  • Experience with GPU infrastructure, datacenter operations, and AI factory environments.

Market check

Salary context

This $240,000–$379,500 range sits above 96% of similar postings on FindRole.

Peer median band

$163,000–$229,100

Median floor and ceiling across peers.

Typical midpoint (25–75%)

$162,000–$224,150

Middle half of comparable postings.

Based on 239 comparable postings.

* 240 is the maximum number of comparable postings sampled.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads. Industry: Semiconductors & AI Computing

Nvidia currently has 801 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 797 roles with salary data.

More like this

Similar roles

Principal Product Manager

Circle

San Francisco, California, US · 47 days ago · $200,000–$260,000
AI ML KYB KYC AML APIs Fintech RegTech Cloud Services CI/CD Python PostgreSQL AWS Kubernetes

Principal Product Manager

Adobe

San Jose, US · 10 days ago · $194,800–$282,100
AI ML Cloud Services Search Technologies Information Retrieval Recommendation Systems RAG Agentic Search Product Management Technical Leadership Data-Driven Products Compliance Privacy Legal Security Analytics Interpersonal Skills Problem Solving Project Coordination

Senior Principal Product Manager

Circle

San Francisco, California, US · 47 days ago · $230,000–$295,000
AWS Kubernetes Terraform Python PostgreSQL CI/CD Docker Prometheus Grafana GitLab GitHub Jenkins Ansible JSON REST GraphQL React Node.js MongoDB Elasticsearch Redis Linux Windows Server

Senior Principal Product Manager

Genentech

US · 79 days ago · $155,400–$288,600
Agile Scrum Kanban Jira Python R SQL PostgreSQL AWS Azure Google Cloud Platform Docker Kubernetes CI/CD Git GitHub Swagger RESTful APIs JSON XML GraphQL MLOps TensorFlow PyTorch Snowflake Tableau PowerBI

Product Manager

Broadcom

Usa-Tx Plano Legacy Drive Suite 700, US · 135 days ago · $104,100–$166,500
mainframe Product Management Agile Methodology CI/CD Artificial Intelligence Machine Learning UX Design Sales Initiatives Customer Relationship Management Software Licensing B2B Business Model Go-To-Market Plan Project Management Stakeholder Management

Product Manager

Q2

Austin, Texas, US · 123 days ago
Excel Pendo CI/CD Kubernetes AWS Terraform Python PostgreSQL Docker Prometheus Grafana GitLab Jira Confluence