Senior Software Engineer, Cloud-Native Stack – CSP Engagements

Nvidia

Remote Actively hiring
Remote, USA · Santa Clara, CA · Austin, TX · Redmond, WA · Seattle, WA · Posted 14 days ago · $184,000–$287,500 / year

At a glance

AI generated

TL;DR

As a Senior Software Engineer on the CSP Engagements team at NVIDIA, you will focus on developing and enhancing cloud-native stacks for advanced AI/ML datacenters equipped with GB200 GPUs. Your responsibilities include deep-dive debugging of multi-rack, multi-tenant clusters, gathering customer requirements, and prototyping feature extensions for Kubernetes operators and Slurm plugins. You will also drive architecture reviews, create reproducible testbeds using Helm, Ansible, and Terraform, and deliver technical documentation and presentations at industry events. The ideal candidate has extensive experience with Kubernetes internals, Slurm federation, and integrating next-gen GPUs into containerized clusters, along with a strong background in distributed systems development (Go, Rust, C/C++, Python) and familiarity with CI/CD pipelines, observability tools, and infrastructure-as-code practices.

Skills

Kubernetes Slurm Terraform Helm Ansible CI/CD GitHub Actions Tekton Prometheus OpenTelemetry Go Rust C C++ Python NVIDIA GB200 NVIDIA GB300 RDMA RoCE CUDA

What you'll do

  • Perform in-depth debugging of multi-rack, multi-tenant clusters for scheduler and container runtime issues.
  • Prototype feature extensions for Kubernetes operators and Slurm plugins based on customer requirements.
  • Conduct joint architecture reviews with CSP teams and write RFCs based on the findings.
  • Develop automated testbeds and validation suites using Helm/Ansible/Terraform to mirror customer environments.
  • Deliver technical documentation and design guides, and present at industry conferences and customer meetings.

What we're looking for

  • Strong expertise in Kubernetes internals and Slurm for multi-rack, multi-tenant clusters.
  • Hands-on experience integrating next-gen GPUs into containerized clusters.
  • Proven ability to debug complex cloud-native stacks across networking, storage, and control planes.
  • Experience gathering customer requirements and prototyping feature extensions for Kubernetes operators.
  • Familiarity with CI/CD tools, observability systems, and infrastructure-as-code practices.
  • 10+ years of professional software development on distributed systems, in languages such as Go, Rust, C/C++, or Python.
  • Upstream contributions to open-source projects like Kubernetes or Slurm.

Market check

Salary context

This $184,000–$287,500 range sits above 93% of similar postings on FindRole.

Peer median band

$117,000–$212,080

Median floor and ceiling across peers.

Typical midpoint (25–75%)

$142,400–$200,000

Middle half of comparable postings.

Based on 240 comparable postings.

* 240 is the maximum number of comparable postings sampled.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 801 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 797 roles with salary data.

More like this

Similar roles

Senior Software Engineer (Java + Cloud-Native)

Motorola Solutions

Chicago, IL, US · 46 days ago · $150,000–$175,000
Java Spring Boot AWS Docker Kubernetes MySQL CI/CD Vue React Angular Terraform Microservices Event-driven architectures SQL Agile Monitoring Logging Tracing Alerting Incident response readiness CloudFormation

Senior Software Engineer, Developer Tools for Cloud

Nvidia

Remote (Redmond, WA, US) · 9 days ago · $152,000–$241,500
Python JavaScript C++ Kubernetes GraphQL Go Rust Datadog ClickHouse Grafana CUDA HPC Networking Performance Optimization Microservices Web APIs Distributed Environments Algorithms Computer Architecture

Senior Software Engineer, Cloud

Abbott

Remote (United States) · 30 days ago · $86,700–$173,300
Go SQL Server Postgres RESTful APIs microservices Kubernetes Docker Linux CI/CD JIRA Confluence Python AWS Git Terraform Prometheus Grafana

Senior Software Engineer, Cloud

Abbott

US · 30 days ago · $99,300–$198,700
Go SQL Server PostgreSQL RESTful APIs microservices Kubernetes Docker TDD CI/CD Linux OpenTelemetry pprof

Senior Software Engineer, Cloud

Abbott

US · 30 days ago · $86,700–$173,300
Go SQL Server PostgreSQL Kubernetes Docker RESTful APIs microservices Linux CI/CD Agile Confluence JIRA