Senior System Software Engineer - DevOps and Infrastructure Automation

Nvidia

Remote · Actively hiring
Remote · Santa Clara, CA · Seattle, WA · Posted 17 days ago · $184,000–$287,500 / year

At a glance

AI generated

TL;DR

Join NVIDIA's AI Inference Operations Team as a Senior System Software Engineer, where you will work with passionate engineers to build and manage the infrastructure backbone for AI inference products. Your responsibilities include designing and operating Kubernetes deployments across cloud and on-prem environments, architecting CI/CD pipelines, and ensuring observability through dashboards and automated checks. You will also manage security posture and collaborate closely with deep learning framework teams. Ideal candidates have a BS/MS in CS/CE or equivalent experience, along with 7+ years of production distributed systems operation, deep Kubernetes expertise, strong CI/CD skills, IaC fluency, and containerization depth. Experience with GPU software stacks, MLOps, and open-source development workflows is a plus.

Skills

Kubernetes, CI/CD, Terraform, Python, Bash, Docker, Prometheus, Grafana, Ansible, Helm, Crossplane, GitHub Actions, GitLab CI, Linux, Observability, SLOs/SLIs, PostgreSQL, MySQL, MLOps, CUDA, cuDNN, TensorRT

What you'll do

  • Design and build the infrastructure backbone for AI inference products.
  • Own Kubernetes deployments end-to-end across cloud and on-prem environments.
  • Architect CI/CD pipelines for automated deployment of inference libraries.
  • Build observability tools to monitor platform health and lead incident triage.
  • Manage cloud and on-prem environments using infrastructure-as-code practices.
  • Maintain security posture by running vulnerability scans and remediating findings.

What we're looking for

  • 7+ years of experience operating production distributed systems as an SRE, DevOps engineer, or Platform Ops specialist.
  • Deep expertise with Kubernetes, including hands-on debugging of telemetry-heavy microservices across multiple cloud platforms and on-premises environments.
  • Strong proficiency in CI/CD tools (GitLab CI, GitHub Actions), Git-based workflows, Linux systems programming, and scripting in Python and Bash.
  • Fluency in Infrastructure as Code (IaC) with Terraform, Ansible, Helm, Crossplane, and containerization technologies like Docker and containerd.
  • Proven reliability ownership through experience with SLOs/SLIs, on-call responsibilities, incident response, and observability stacks such as Prometheus, Grafana, and Loki.
  • Clear communication skills to write effective runbooks and collaborate closely with cross-functional teams including deep learning framework engineers and compiler teams.

Market check

Salary context

This $184,000–$287,500 range sits above 90% of similar postings on FindRole.

Peer median band

$117,000–$201,250

Median floor and ceiling across peers.

Typical midpoint (25–75%)

$135,300–$197,562

Middle half of comparable postings.

Based on 240 comparable postings.

* 240 is the maximum number of comparable postings sampled.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 801 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 797 roles with salary data.

More like this

Similar roles

Senior Software Development Engineer (DevOps)

CVS Health

Remote (Richardson-909 E Collins Blvd, US) · 15 days ago · $92,700–$203,940
GCP, Azure, GitHub Actions, Kubernetes, Helm, CI/CD, Java, Python, Node.js, Git, Docker, Terraform, Microservices, Agile, Observability, Telemetry

Software Engineer Senior DevOps

PNC

Two Pnc Plaza (Pa374), US · 44 days ago · $97,500–$152,375
OpenShift, IIS, GitHub, Bitbucket, CI/CD, Docker, Kubernetes, AWS, Azure, Terraform, Ansible, Jenkins, Python, Shell scripting, PostgreSQL, MySQL, Nginx, Prometheus, Grafana

DevOps Engineer, Senior

Booz Allen Hamilton

US · 23 days ago · $77,600–$176,000
AWS, Kubernetes, Infrastructure-as-Code, Python, Java, Prometheus, Grafana, ElasticSearch, Kibana, FluentD, Jenkins, Git, TeamCity, Bash, PowerShell, JSON, YAML, Linux, CI/CD, ArgoCD, Flux

DevOps Engineer, Senior

Booz Allen Hamilton

Chantilly, Virginia, US · 63 days ago · $77,600–$176,000
Kubernetes, Docker, AWS, Azure, Google Cloud Platform, Oracle Cloud Infrastructure, Infrastructure as Code (IaC), Configuration as Code (CaC), CI/CD

DevOps Engineer, Senior

Booz Allen Hamilton

Moorestown, New Jersey, US · 42 days ago · $77,600–$176,000
Kubernetes, Gitlab, RESTful services, gRPC, Docker, CI/CD, Terraform, Ansible, Java, C++, TCP/IP, HTTP, AWS, Azure, Google Cloud, Prometheus, Grafana, ELK stack

DevOps Engineer, Senior

Booz Allen Hamilton

Chantilly, Virginia, US · 63 days ago · $77,600–$176,000
AWS, Kubernetes, CI/CD, Helm, Ansible, Puppet, Chef, YAML, DevSecOps, Maven, Prometheus, Grafana