Engineering Manager, DGX Cloud Production Engineering
At a glance
AI-generated TL;DR
As an Engineering Manager at NVIDIA DGX Cloud, you will lead a team of software and production engineers responsible for building and operating GPU infrastructure across various environments. Your day-to-day responsibilities include driving execution in areas such as Kubernetes operability, automation, observability, and incident response while partnering with other teams to enhance production readiness. You will define priorities, roadmaps, and staffing needs, coach engineers, and foster a culture of learning and ownership. Ideal candidates have 8+ years of industry experience, including 2+ years in leadership roles, with expertise in reliability engineering, Kubernetes environments, and distributed systems. Strong communication skills and the ability to work effectively across teams are essential, as is hands-on experience with GPU infrastructure and multi-cloud environments.
What you'll do
- Lead a team building and operating DGX Cloud infrastructure in various environments.
- Drive execution for cluster operations, Kubernetes operability, automation, and observability.
- Define team priorities, roadmap, staffing, and operational ownership.
- Partner with cross-functional teams to enhance production readiness and reliability.
- Build an on-call culture focused on learning, ownership, and durable fixes.
What we're looking for
- 8+ years of industry experience including 2+ years in engineering leadership roles.
- Proven track record leading teams focused on production infrastructure, Kubernetes operations, or distributed systems.
- Deep understanding of reliability engineering, automation, observability, and incident response practices.
- Strong ability to collaborate across multiple teams and influence without direct authority.
- Clear communication skills, sound prioritization, and the ability to make decisions under pressure.
- BS/MS in Computer Science or equivalent practical experience required.
Employer
About Nvidia
Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.
Industry: Semiconductors & AI Computing
Nvidia currently has 825 open roles on FindRole.
Listed pay typically runs $184,000–$287,500 across 813 roles with salary data.
Most-posted roles
- Senior Solutions Architect, AI Infrastructure (4)
- Senior System Software Engineer - AV Platform (4)
- Senior Circuit Design Engineer (3)
- Senior Circuit Methodology Engineer (3)
- Senior Deep Learning Performance Architect (3)