Senior Systems Software Engineer, Accelerated Kubernetes Performance and Scale

Nvidia

Hybrid

Quick summary

Work type
Hybrid
Location
Santa Clara, CA; Seattle, WA
Salary
$184,000–$287,500 / yr
Posted
3 days ago

Market check

Salary context

Above market

How this pay compares to similar roles


This role pays more than 85% of similar roles, most of which pay $162,000–$228,950. At the midpoint of its range, this role pays about $236k, versus about $195k for comparable roles.

Based on 240 similar postings.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and systems on a chip (SoCs) for gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 950 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 939 roles with salary data.


At a glance

TL;DR · Senior Systems Software Engineer, Accelerated Kubernetes Performance and Scale

As a Senior Systems Software Engineer in NVIDIA’s DGX Cloud organization, you will join a team dedicated to advancing AI infrastructure by solving complex problems in distributed systems. Your primary responsibilities include leading performance and scalability analysis across Kubernetes-based accelerated runtime stacks, designing architectural changes for reliable operation at hyperscale, reducing container startup latency, and improving the efficiency of confidential containers on Kubernetes. You will also collaborate with researchers, developers, and customers to build automated workload tests and integrate continuous performance testing into CI/CD workflows. The role requires expertise in Kubernetes, distributed systems, and GPU operators, proficiency in Golang or Python, and familiarity with major cloud platforms such as AWS, Azure, or GCP. It also calls for a deep understanding of the NVIDIA software stack, experience scaling Kubernetes clusters to ultra-large node counts, and a record of significant contributions to open-source projects and industry standards.

What you'll do

  • Lead performance and scalability analysis across the Kubernetes-based accelerated runtime stack, including NVIDIA components.
  • Design and contribute upstream architectural changes to the Kubernetes control plane for reliable operation at hyperscale cluster sizes.
  • Improve container startup latency on Kubernetes to enable low-latency inference scaling across thousands of GPU nodes.
  • Assess and enhance open-source projects such as Grove and gateway-api-inference-extension to improve scalability and resilience for AI workloads.
  • Use DSX and large-scale simulation infrastructure to model full AI-factory deployments, validating scalability across simulated GPUs.

What we're looking for

  • 8+ years of experience in computer architecture, networking, storage systems, and accelerator-based platforms.
  • Expertise in Kubernetes and familiarity with CNCF ecosystem tools and practices.
  • Deep experience with large-scale, parallel, distributed accelerator systems and AI workload performance optimization.
  • Proficiency in Golang or Python for system development and automation.
  • Strong operational experience with major public cloud providers such as AWS, Azure, GCP, or OCI.
  • Demonstrated history of contributing to open-source projects and working collaboratively within communities.

More like this

Similar roles

Senior Systems Software Engineer, Kubernetes Node Lifecycle - DGX Cloud

Nvidia

Santa Clara, CA +1 · 18 days ago · $184,000–$287,500
Kubernetes, Cluster API, Golang, Python, AWS, Azure, OCI, CI/CD, OS image build pipelines, cloud-init, Packer, kubelet, containerd, Flatcar, Bottlerocket, immutable OS images, CIS benchmarks, CVE remediation, SBOM generation, Karpenter, supply chain security, image signing, provenance attestation

Senior Software Engineer - Accelerated Kubernetes Runtime Team

Nvidia

Remote · 82 days ago · $184,000–$287,500
Kubernetes, Go, Helm, Kustomize, CustomResourceDefinitions, Controllers, Operators, OCI registries, SBOM generation, API design, Versioning, Backward compatibility, Admission controllers, Artifact signing, Multi-tenant platform services, Supply chain security

Senior DevOps Engineer - Kubernetes

FICO

Remote · 26 days ago · $115,500–$181,500
AWS, Kubernetes, Terraform, Python, Bash, Prometheus, Grafana, ArgoCD, Tekton, Helm, CI/CD, GitHub Workflow, EKS, EC2, S3, IAM, Route 53, ECR, CrossPlane, AWS ACK

Senior Cloud Platform Kubernetes Specialist

Boeing

Remote (Seattle, WA) +4 · 5 days ago · $164,900–$239,200
Kubernetes, AWS, Azure, Terraform, Docker, CI/CD, Prometheus, Grafana, Python, Bash, Istio, Elasticsearch, Kibana, Puppet, Ansible, Chef