Principal Software Engineer - Compute Infrastructure

Nvidia

Remote · Actively hiring
Remote, USA · Santa Clara, CA · Posted 16 days ago · $248,000–$391,000 / year

At a glance

AI generated

TL;DR

NVIDIA is seeking a Principal Software Engineer to join its innovative team, focusing on defining platform architecture for a global enterprise compute platform running thousands of nodes and tens of thousands of VMs and containers via OpenShift and KubeVirt. The role involves operationalizing internal AI inference systems by developing automated remediation pipelines and hardware watchdogs, driving strategic capacity planning, and leading complex migrations to Kubernetes orchestration. Ideal candidates have over 15 years of experience in compute platform engineering with expertise in Kubernetes architecture, virtualization technologies like KubeVirt, and infrastructure-as-code tools such as Terraform. Proficiency in Go or Python, deep knowledge of hardware technologies including GPUs, and strong leadership skills are essential for this role that demands hands-on management of pre-release hardware and advanced storage migrations across multi-cloud environments.

Skills

Kubernetes · OpenShift · Terraform · Go · Python · GitOps · ArgoCD · AWS · GCP · NFSv4 · NVMe/TCP · Hyperconverged storage · CI/CD · Microservices · Self-service architecture · SLAs

What you'll do

  • Define service tiers, SLAs, and automated cluster lifecycles for a global enterprise compute platform.
  • Develop automated remediation pipelines and hardware watchdogs for pre-release AI inference systems.
  • Drive capacity planning strategies to manage extreme hardware supply constraints and scale infrastructure.
  • Design self-service architectures and APIs that autonomous teams want to use for standard platforms.
  • Lead the migration of large legacy workloads into modern Kubernetes orchestration environments.

What we're looking for

  • 15+ years experience in compute platform engineering, site reliability, or systems architecture with a focus on automation at massive scale.
  • Deep expertise in Kubernetes architecture and deploying virtualization architectures like KubeVirt and OpenShift.
  • In-depth knowledge of hardware technologies including GPUs and high-speed networking to mitigate large-scale failures.
  • Experience managing bleeding-edge pre-release hardware in production environments.
  • Proficiency in programming languages such as Go or Python, with expert-level infrastructure-as-code development skills (Terraform).
  • Strong leadership and influence over technical direction across autonomous teams without top-down mandates.
  • Solid understanding of microservices architecture and multi-cloud deployment strategies.

Market check

Salary context

This $248,000–$391,000 range sits above 98% of similar postings on FindRole.

Peer median band

$143,000–$244,000

Median floor and ceiling across peers.

Typical midpoint (25–75%)

$165,000–$214,500

Middle half of comparable postings.

Based on 240 comparable postings.

* 240 is the maximum number of comparable postings sampled.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 801 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 797 roles with salary data.

More like this

Similar roles

Principal Software Engineer

Intuit

New York, New York, US · 42 days ago · $261,000–$353,000
Python · Java · JavaScript · React · Node.js · Docker · Kubernetes · AWS · Azure · CI/CD · Git · PostgreSQL · MongoDB · Agile · Scrum

Principal Software Engineer

Intuit

Mountain View, California, US · 42 days ago · $261,500–$353,500
Python · Java · JavaScript · Docker · Kubernetes · AWS · CI/CD · PostgreSQL · MongoDB · Redis · Git · Jenkins · Swagger · RESTful APIs

Principal Software Engineer

The Federal Reserve

Boston, MA, US · 81 days ago · $173,400–$216,700
Java · Spring · Terraform · AWS · NoSQL · RDS · CI/CD · Serverless · Agile · Linux · Unix · Windows · Oracle · DB2 · Kafka · DevOps · Scrum · Python · Infrastructure-as-Code

Principal Software Engineer

Microsoft

Redmond, WA, US · 37 days ago · $139,900–$274,800
.NET Aspire · .NET Core · Azure · CI/CD · Terraform · Kubernetes · Docker · Prometheus · Grafana · PostgreSQL · Python · Go · MCP servers · structured APIs

Principal Software Engineer

Oracle

US · 21 days ago · $99,600–$223,400
Python · Java · Go · JavaScript · TypeScript · Kubernetes · Docker · Terraform · CI/CD · APIs · LLMs · Cursor · Copilot · Claude · Codex · Observability · Telemetry · Vector databases · Infrastructure as Code

Principal Software Engineer

Cisco

Remote (USA-San Jose, US) · 85 days ago · $231,400–$331,800
Python · C++ · ASIC development · Networking function implementation · CI/CD · PostgreSQL · Kubernetes · AWS · Docker · Prometheus · Grafana · P4 programming · SDK development · Linux operating system · Git · Jira · Confluence