Senior Systems Software Engineer, Kubernetes Scale - DGX Cloud

Nvidia

Hybrid

Quick summary

Work type: Hybrid
Location: Santa Clara, CA; Seattle, WA
Salary: $184,000–$287,500 / yr
Posted: 7 days ago

Market check

Salary context

Above market

How this pay compares to similar roles

Similar roles: about $195k · This role: about $236k · Chart range: $146k–$303k

This role pays more than 83% of similar roles, most of which pay $162,000–$228,950. At the midpoint, this role pays about $236k versus about $195k for comparable roles.

Based on 240 similar postings.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 980 open roles on FindRole.

Listed pay typically runs $168,000–$270,250 across 966 roles with salary data.

At a glance

TL;DR · Senior Systems Software Engineer, Kubernetes Scale - DGX Cloud

As a Senior Systems Software Engineer in NVIDIA’s DGX Cloud organization, you will join a team of engineers focused on advancing accelerated computing for cutting-edge AI workloads. You will drive performance and scale characterization across the entire software stack, from Kubernetes to NVIDIA components such as GPU Operator and DCGM, and work with researchers and developers to create automated tests that simulate real-world user scenarios. You will also develop monitoring tools, debug complex distributed-systems issues, and engage with upstream communities such as Kubernetes and CNCF to validate AI workload performance early. The role requires expertise in Kubernetes, Golang/Python, and the NVIDIA software ecosystem, along with experience with large-scale parallel systems and a background in optimizing AI workloads at scale.

What you'll do

  • Drive end-to-end performance and scale characterization for the NVIDIA DGX Cloud software stack.
  • Develop automated tests simulating real user workloads using custom-built and open-source tools.
  • Investigate and resolve root causes of performance and scalability issues in distributed systems.
  • Design monitoring, reporting, and analysis tools for performance testing across resources.
  • Triage, debug, and address issues related to operating Kubernetes clusters at ultra-large scale.
  • Build a high-velocity framework for continuous performance and scale testing via CI/CD.

What we're looking for

  • 8+ years of experience in computer architecture, networking, storage systems, accelerators.
  • Expertise in Kubernetes and related CNCF projects.
  • Background in large-scale parallel and distributed accelerator-based systems.
  • Experience optimizing performance and AI workloads on large-scale systems.
  • Proficiency in Golang/Python and familiarity with the NVIDIA software ecosystem.
  • Strong operational experience with Kubernetes distributions at ultra-large scale.
  • Demonstrated history of contributing to the open-source community.

More like this

Similar roles

Senior Systems Software Engineer, Kubernetes Node Lifecycle - DGX Cloud

Nvidia

Santa Clara, CA (+1 location) · 7 days ago · $184,000–$287,500
Kubernetes CAPI Golang Python AWS Azure OCI Cluster_API kubelet OS_image_build_pipelines node_image_packaging cloud-init packer bring-your-own-node image-builder containerd CI/CD vulnerability_scanning patch_automation CIS_benchmarks SBOM_generation Karpenter Flatcar Bottlerocket immutable_OS_images supply_chain_security node_image_signing provenance_attestation automated_CVE_remediation

Senior Software Engineer - Cloud and Kubernetes

Nvidia

Remote (Santa Clara, CA) · 47 days ago · $184,000–$287,500
Kubernetes Go C++ CI/CD Jenkins GitLab GitHub Python Rust Docker Prometheus Grafana NVIDIA GPUs ConnectX BlueField NICs HPC AIInfrastructure Networking

Senior System Software Engineer, Kubernetes and KubeVirt

Nvidia

Remote (Santa Clara, CA) · 132 days ago · $184,000–$287,500
Kubernetes KubeVirt Go CI/CD REST gRPC Docker APIs Cloud Infrastructure Virtualization Container Orchestration Load Balancing Security Multi-Tenant Cloud Platforms AI-Assisted Development Tools CNCF/Open Source Projects Device Plugins

Senior Kubernetes Software Engineer

Broadcom

Palo Alto, CA · 70 days ago · $120,000–$192,000
Kubernetes Go CNCF CI/CD vSphere Docker Terraform AWS GCP Azure PostgreSQL Prometheus GitLab GitHub Maven Jenkins Ansible Python Shell_scripting

Senior Software Engineer - Accelerated Kubernetes Runtime Team

Nvidia

Remote (US, WA) · 71 days ago · $184,000–$287,500
Kubernetes Go Helm Kustomize CustomResourceDefinitions Controllers Operators OCI registries Artifact signing SBOM generation Supply chain security API design Versioning Backward compatibility Admission controllers NVIDIA GPU operator Device plugins Multi-tenant platform services