Senior System Software Engineer - DevOps and Infrastructure Automation

Nvidia

Remote

Quick summary

Work type: Remote
Location: Santa Clara, CA; Seattle, WA
Salary: $184,000–$287,500 / yr
Posted: 8 days ago

Market check

Salary context

Above market

How this pay compares to similar roles

[Chart: salary scale from $121k to $305k, marking similar roles at about $180k and this role at about $236k, with a shaded band showing where most similar roles pay.]

This role pays more than 88% of similar roles. Most pay $142,400–$217,725 — the shaded band above. At the midpoint, this role pays about $236k versus about $180k for comparable roles.
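The midpoint comparison can be checked with a few lines of arithmetic. This is a minimal sketch, not the site's actual method: it assumes the "about $180k" figure for comparable roles is the midpoint of the $142,400–$217,725 band, which the posting does not state, and the helper name `midpoint` is purely illustrative.

```python
def midpoint(low: float, high: float) -> float:
    """Return the midpoint of a salary range."""
    return (low + high) / 2


# Listed range for this role, taken from the posting.
role_mid = midpoint(184_000, 287_500)

# Assumption: the comparable-roles figure is the midpoint of the typical band.
similar_mid = midpoint(142_400, 217_725)

# Relative premium of this role's midpoint over the comparable midpoint.
premium = (role_mid - similar_mid) / similar_mid

print(f"This role midpoint:     ${role_mid:,.0f}")
print(f"Similar roles midpoint: ${similar_mid:,.0f}")
print(f"Premium over similar:   {premium:.0%}")
```

Under that assumption, the two midpoints round to the $236k and $180k quoted above, and the gap works out to roughly 31 percent.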

Based on 240 similar postings.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 980 open roles on FindRole.

Listed pay typically runs $168,000–$270,250 across 966 roles with salary data.


At a glance

TL;DR · Senior System Software Engineer - DevOps and Infrastructure Automation

Join NVIDIA's AI Inference Operations Team as a Senior System Software Engineer, working closely with passionate engineers to build and manage infrastructure for AI inference products. You will design and operate reliable, performant, and scalable systems using Kubernetes across cloud and on-prem environments, architect CI/CD pipelines, and ensure observability through dashboards and metrics. Key responsibilities include managing IaC with tools like Terraform and Ansible, securing infrastructure components, and collaborating with deep learning framework engineers to streamline deployments. Ideal candidates have a BS/MS in CS/CE or equivalent experience, 7+ years of production distributed systems operation, deep Kubernetes expertise, strong CI/CD skills, and proficiency in Python and Bash scripting. Experience with MLOps, GPU software stacks, and custom test automation frameworks is a plus.

What you'll do

  • Design and build the infrastructure backbone for AI inference products.
  • Own Kubernetes deployments end-to-end in cloud and on-prem environments.
  • Architect CI/CD pipelines for automated deployment of inference libraries.
  • Build observability tools to monitor platform health and lead incident triage.
  • Manage cloud and on-prem environments using infrastructure-as-code practices.
  • Ensure security posture by conducting vulnerability scans and remediation.

What we're looking for

  • 7+ years of experience in operating production distributed systems as an SRE, DevOps engineer, or Platform Ops specialist.
  • Deep expertise in Kubernetes, including hands-on debugging of telemetry-heavy microservices across multiple cloud platforms and on-premises environments.
  • Strong proficiency in CI/CD tools (GitLab CI, GitHub Actions), Git-based workflows, Linux systems programming, and scripting with Python and Bash.
  • Fluency in infrastructure-as-code practices using Terraform, Ansible, Helm, Crossplane, and containerization technologies like Docker and containerd.
  • Proven reliability ownership experience, including SLO/SLI management, on-call responsibilities, incident response, and post-incident reviews to drive improvements.
  • Clear communication skills for writing effective runbooks and contributing to observability stacks such as Prometheus, Grafana, and Loki.

More like this

Similar roles

Senior DevOps Engineer

Nvidia

Santa Clara, CA · 8 days ago · $184,000–$287,500
Python, Kubernetes, Docker, GitLab, AWS, Azure, CI/CD, Linux, PostgreSQL, MySQL, HDFS, Ceph, Terraform, Ansible, Prometheus, Grafana, Jenkins, Windows Server

Senior Software Engineer, DevOps

Anduril Industries

Costa Mesa, CA · 12 days ago · $166,000–$220,000
GitHub Actions, JFrog Artifactory, Terraform, Ansible, Azure, AWS, GCP, Docker, Kubernetes, MLflow, Kubeflow, ELK Stack, Prometheus, Grafana, CUDA, OpenCL

Senior Software Engineer (Devops)

Electronic Arts

Vancouver, British Columbia, Canada · 12 days ago · $122,300–$170,700
CI/CD, Terraform, Jenkins, Azure DevOps, GitLab, C#, PowerShell, Bash, AWS, GCP, Azure, Packer, Ansible, Chef, Perforce, Prometheus, Grafana
Hybrid

Senior Software Development Engineer (DevOps)

CVS Health

Remote (Richardson, TX) · 18 days ago · $92,700–$203,940
GCP, Azure, GitHub Actions, Kubernetes, Helm, CI/CD, Java, Python, Node.js, Git, Docker, Terraform, Jenkins, CircleCI, Microservices, Agile, Observability, Telemetry
Remote

Software Engineer Senior DevOps

PNC

Pittsburgh, PA +4 · 8 days ago · $97,500–$152,375
OpenShift, IIS, GitHub, Bitbucket, CI/CD, Docker, Kubernetes, Terraform, AWS, Azure, Python, Shell, SQL, PostgreSQL, Nginx, Prometheus, Grafana, Ansible, Jenkins