Senior Director, System Software Engineering - DGX Cloud

Nvidia

Actively hiring
Santa Clara, CA Posted 6 days ago $384,000–$575,000 / year

At a glance

AI generated

TL;DR

NVIDIA seeks a Senior Director of System Software Engineering to lead capacity management for DGX Cloud, focusing on scalable system software that automates GPU management. This role involves defining and driving the strategy for core platform capabilities such as runtime software, host and cluster management, provisioning, observability, reliability, security, and performance optimization. The candidate will build a strong execution model across planning, architecture reviews, release readiness, quality, and operational excellence while partnering closely with security, DevOps, research, and product teams to deliver reliable high-performance software. Essential skills include deep technical expertise in operating systems, distributed systems, platform architecture, cloud infrastructure, and large-scale systems software, along with proven leadership in delivering complex software platforms and building high-performing engineering teams. Experience with AI infrastructure, accelerated computing, GPU-optimized software stacks, and hybrid-cloud deployments is highly valued.

Skills

Kubernetes, Docker, AWS, Azure, CI/CD, GitLab, Python, Go, PostgreSQL, Prometheus, Grafana, Terraform, Ansible, OpenStack, Linux, NVIDIA GPU, AI tools, Cloud infrastructure, DevOps, Scalable system software

What you'll do

  • Define and drive the system software strategy for capacity management in DGX Cloud's GPU cloud platforms.
  • Lead engineering teams responsible for core platform capabilities such as runtime software and cluster management.
  • Build an execution model for planning, architecture reviews, release readiness, quality, and operational excellence.
  • Partner with security, DevOps, research, and product organizations to translate platform requirements into roadmaps.
  • Establish measurable goals for engineering efficiency, service reliability, and customer impact using data-driven methods.

What we're looking for

  • Over 16 years of management experience in system software or distributed systems engineering, including significant leadership roles.
  • Deep technical expertise in operating systems, distributed systems, and large-scale systems software.
  • Proven track record leading the delivery of complex software platforms with reliability, performance, scalability, security, and observability.
  • Strong leadership skills to influence across engineering, product, program management, and executive teams.
  • Experience building and scaling high-performing engineering teams through growth and change.
  • Demonstrated success in working with AI infrastructure, accelerated computing, and GPU-optimized software stacks.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 825 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 813 roles with salary data.
