Senior Staff AI Platform Engineer

Nvidia

Actively hiring
Santa Clara, CA, US Posted 66 days ago $168,000–$270,250 / year

At a glance

AI generated

TL;DR

NVIDIA seeks a Senior Staff AI Platform Engineer to join its Cloud and AI/ML teams and build scalable, secure enterprise products that improve engineering efficiency. The role involves defining infrastructure roadmaps, architecting LLM/ML systems across cloud-native clusters and on-premises hardware, designing observability for infrastructure health and model performance, and developing automation tools that ensure reliability and scalability. The ideal candidate has more than 10 years of experience in cloud, platform engineering, or SRE roles, with expertise in Python and at least one systems language (C++, Go, or Rust). They should have deep knowledge of Kubernetes, observability practices, AI/ML platforms such as Hugging Face and Weights & Biases, and security and compliance standards including FedRAMP and SOC 2. The position also requires hands-on experience with AI-assisted development tools and a solid foundation in data structures and algorithms to drive an AI-first culture within the organization.

Skills

Python, Kubernetes, C++, Go, Rust, MLOps, Hugging Face, Weights & Biases, NVIDIA NIM, Prometheus, Grafana, Docker, CI/CD, AWS, Azure, Google Cloud Platform, PostgreSQL, MySQL, Redis, Git, GitHub, Jenkins, Terraform, Ansible, Knative, OpenTelemetry, FedRAMP, SOC 2

What you'll do

  • Define and lead AI-native infrastructure roadmaps and cross-organizational initiatives.
  • Architect and scale LLM/ML infrastructure across cloud-native clusters and on-premises hardware.
  • Design and implement observability for infrastructure health and AI model performance.
  • Build monitoring systems that use AI to improve incident response and reduce maintenance work.
  • Develop automation and tooling to ensure reliability, scalability, and developer self-service.

What we're looking for

  • Over 10 years of experience in cloud, platform engineering, or SRE roles with a strong background in distributed systems.
  • Expertise in Python and at least one systems language (C++, Go, Rust) with proven skills in debugging complex distributed systems.
  • Extensive hands-on experience building and scaling Kubernetes and bare-metal infrastructure for AI/ML workloads.
  • Deep knowledge of observability design including metrics, logging, tracing, and AI quality signals across infrastructure and AI workloads.
  • Practical experience operating AI/ML platforms, including MLOps, model serving, and GPU-accelerated environments.

Market check

Salary context

This $168,000–$270,250 range sits above 53% of similar postings on FindRole.

Peer median band

$178,400–$259,375

Median floor and ceiling across peers.

Typical midpoint (25–75%)

$177,425–$246,150

Middle half of comparable postings.

Based on 240 comparable postings.

* 240 is the maximum number of comparable postings sampled.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 802 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 798 roles with salary data.

More like this

Similar roles

Senior AI Platform Engineer

Adobe

San Jose, US 73 days ago $211,800–$306,625
TypeScript, Python, Java, Go, C++, LLMs, Terraform, Kubernetes, Docker, CI/CD, Prometheus, Grafana, PostgreSQL, Redis, Elasticsearch, AWS, Azure, Google Cloud Platform, Git, Jenkins, GitHub, Slack, Confluence, Jira, Swagger, OpenAPI, GraphQL, RESTful APIs, Microservices, MVP

AI Platform Engineer, Senior

Booz Allen Hamilton

Laurel, Maryland, US 41 days ago $86,800–$198,000
AWS, Python, Kubernetes, Prometheus, Grafana, OpenTelemetry, CI/CD

Senior AI Engineer

Morgan Stanley

750 Seventh Ave, NY, US 22 days ago $120,000–$165,000
Python, FastAPI, Flask, SQL, Kubernetes, Docker, OpenTelemetry, Grafana, Prometheus, CI/CD, Jenkins, GitHub Actions, Redis, Kafka

Senior AI Engineer

Adobe

San Jose, US 56 days ago $187,100–$270,950
Python, Azure, AWS, GCP, Spark, Databricks, LLMs, RAG, AI-assisted coding tools, CI/CD, GitHub Copilot, Claude Code, Cursor, Prometheus, Grafana

Senior AI Engineer

Intapp

Remote (Palo Alto, CA, US) 43 days ago
Python, LangGraph, LangChain, Azure, AWS, Google Cloud, Docker, PostgreSQL, Vector, CI/CD, GitHub Actions, Azure Pipelines

Senior AI Engineer

Allstate

Remote (IL, US) 14 days ago $100,000–$170,500
Python, RDF, OWL, SPARQL, LLM, Google ADK, Microsoft Fabric, Azure, CI/CD, MLOps, Docker