Distinguished Engineer, GPU Fleet Operations Automation

Nvidia

Remote

Quick summary

Work type: Remote
Location: Santa Clara, CA
Salary: $320,000–$488,750 / yr
Posted: 146 days ago

Market check

Salary context

Above market

How this pay compares to similar roles

Chart: similar roles about $202k at the midpoint; this role about $404k; scale runs from $115k to $529k, with a shaded band marking where most similar roles pay.

This role pays more than 99% of similar roles. Most pay $162,000–$241,600 — the shaded band above. At the midpoint, this role pays about $404k versus about $202k for comparable roles.

Based on 240 similar postings.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 855 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 843 roles with salary data.

At a glance

TL;DR · Distinguished Engineer, GPU Fleet Operations Automation

As a senior technology leader at NVIDIA, you will lead the development of DGX Cloud strategy for GPU fleet lifecycle management, health monitoring, and utilization tracking across various environments including bare metal, cloud service providers, and neoclouds. Your role involves defining auto-remediation strategies to ensure system stability and operational excellence while collaborating with cross-functional teams to deliver high-availability infrastructure. You will guide technical delivery into DGX Cloud across diverse deployment scenarios and engage stakeholders to set industry standards for operational practices. The ideal candidate has 15-18 years of experience in cloud infrastructure operations, a strong background in multi-tenant data center architectures, proficiency with Kubernetes and AI/ML platforms, and proven success in delivering complex technical solutions. Additionally, you should possess robust communication skills and the ability to influence open-source project governance.

What you'll do

  • Define and drive the technical strategy for the DGX Cloud operations practice across the GPU fleet lifecycle.
  • Develop auto-remediation strategies to detect, fix, validate, and restore critical systems.
  • Guide technical delivery of DGX Cloud across enterprise, public cloud, and high-security environments.
  • Collaborate with customers, infrastructure providers, and partners to ensure industry-standard operational excellence.
  • Lead full software and system lifecycle management across a large technical scope, from planning through continuous evolution.

What we're looking for

  • 15+ years of experience in cloud infrastructure operations and automation.
  • Proven track record of delivering complex solutions for resource utilization and performance insights.
  • Technical proficiency in multi-tenant data center and cloud-native architectures including Kubernetes and Slurm.
  • Demonstrated ability to lead technical strategy across multiple environments (bare metal, public cloud).
  • Experience applying AI to issue identification and remediation at the component and system levels.

More like this

Similar roles

Senior Engineer, GPU RTL Power

Samsung Electronics

Remote (3900 N Capital of Texas Hwy, Austin, TX, USA) 7 days ago $124,000–$186,000
SystemVerilog PowerArtist PTPX Empower Python Perl ASIC RTL Synthesis Timing Analysis Clock Gating Power Gating Physical Design STA High-Performance Digital Design GPU Architecture CPU Architecture Scripting Languages Data-Driven Debugging

GPU Systems Driver Engineer

Qualcomm

San Diego, CA 42 days ago $195,200–$292,800
C/C++ Vulkan D3D12 OpenGL GLSL HLSL GPU programming Device driver development Large scale system software Real-time computing graphics technology

Principal Engineer, GPU Architect & Modeling

Samsung Electronics

Remote (3655 N 1st St, San Jose, CA, USA) 16 days ago $221,700–$364,800
GPU Graphics Architecture PPA Optimization GPU Modeling Methodologies Performance Simulation Microarchitectural Analysis Ray Tracing AI/ML Acceleration Shader Architecture Texture Architecture Cross-Functional Collaboration Technical Leadership GPU Programming Models

GPU Implementation Engineer (Austin & San Diego)

Qualcomm

Austin, TX 56 days ago $161,800–$242,600
Design Compiler Fusion Compiler Genus Innovus Conformal LEC Formality PrimeTime Tcl Python UPF GPU microarchitecture EDA tools Power vector generation Power analysis Synthesis and place-and-route tools Advanced process nodes