Senior ML Platform Engineer

Nvidia

Remote

Quick summary

Work type: Remote
Location: Santa Clara, CA; Westford, MA; Durham, NC; Boulder, CO
Salary: $152,000–$241,500 / yr
Posted: 7 days ago

Market check

Salary context

Competitive pay

How this pay compares to similar roles

[Pay comparison chart: this role $197k, similar roles $202k; axis range $141k to $256k.]

This role pays less than 56% of similar roles. Most similar roles pay between $166,100 and $236,900. At the midpoint, this role pays about $197k, versus about $202k for comparable roles.

Based on 240 similar postings.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 985 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 971 roles with salary data.

At a glance

TL;DR · Senior ML Platform Engineer

As a Senior ML Platform Engineer at NVIDIA, you will architect and scale high-performance machine learning infrastructure using Infrastructure-as-Code practices. Day to day, you will design and maintain core ML platform infrastructure with Ansible and Terraform, apply SRE principles to keep systems reliable, and develop automation tools for workflow orchestration across multi-cloud environments. You will collaborate closely with researchers to build scalable solutions and participate in on-call rotations to support critical ML jobs. Essential skills include proficiency in Python or Go, experience with Kubernetes and Docker, and a deep understanding of distributed training tools such as Horovod and NCCL. The role also calls for expertise in modern CI/CD methodologies and GitOps practices, and a commitment to building robust, user-friendly platforms for advanced GPU systems.

What you'll do

  • Design and maintain ML platform infrastructure using Ansible and Terraform.
  • Diagnose and resolve complex system issues across the entire stack to ensure high availability.
  • Develop internal automation for ML workflow orchestration and resource scheduling.
  • Collaborate with researchers to build solutions that streamline end-to-end experimentation.
  • Evolve multi-cloud environments, implementing monitoring and incident response protocols.
  • Write maintainable code in Python or Go to automate manual processes and contribute to core platforms.

What we're looking for

  • 5+ years of experience in software/platform engineering or SRE roles, including ML infrastructure.
  • Strong proficiency in Ansible and Terraform for production infrastructure management.
  • Extensive experience in diagnosing system-level issues and ensuring platform reliability.
  • Solid understanding of ML workflows from data preprocessing to deployment.
  • Proficiency in Kubernetes and Docker for operating containerized workloads.
  • Software engineering skills in Python or Go, focusing on automation and tooling.
  • Experience with Linux systems internals, networking, and performance tuning at scale.

More like this

Similar roles

ML Platform Engineer

Apple Inc

Sunnyvale, CA · 57 days ago · $147,400–$272,100
Python, PyTorch, TensorFlow, JAX, Docker, Kubernetes, CI/CD, AWS, GCP, Azure, Spark, CoreML, Metal, CUDA, OpenCL, Swift, C++, Terraform, Prometheus

Senior ML Infrastructure Engineer, Inference Platform

General Motors (GM)

Sunnyvale, CA +3 · 15 days ago · $155,420–$205,900
Python, Triton, RayServe, vLLM, C++, Kubernetes, Docker, CI/CD, Prometheus, Grafana, PostgreSQL, Redis, AWS, Azure, Google Cloud Platform, Git, Jenkins, GitHub, Slack, Confluence, Jira
Hybrid

Senior ML Engineer, ML compute

General Motors (GM)

Mountain View, California · 113 days ago · $155,420–$395,900
Python, Kubernetes, Go, C++, GCP, Azure, AWS, PyTorch, TorchX, Ray, Docker, CI/CD
Hybrid

Senior ML/AI Engineer

Genworth Financial

Richmond, VA +23 · 42 days ago · $114,900
Python, Databricks, MLflow, Spark, Delta Lake, Feature Store, CI/CD, MLOps, A/B Testing, Kubernetes, AWS, Azure, SQL, LLM, RAG, Prometheus, Grafana
Hybrid

Senior Principal Engineer, AI/ML Platform

Autodesk

San Francisco, CA · 17 days ago · $165,000–$296,450
AWS, Azure, Google Cloud Platform, Kubernetes, Docker, MLOps, CI/CD, Scrum, Python, PostgreSQL, Terraform, SageMaker, Bedrock, Machine Learning, Deep Learning, Statistical Modeling, Neural Networks, API Ecosystems, Guardrails, Context Management

Senior AI/ML Engineer

General Motors (GM)

Remote (Mountain View, CA) · 10 days ago · $170,600–$261,300
Python, Transformers, Generative AI, Multimodal Systems, AutoML, Quantization, Model Distillation, Architecture Search, CVPR, ICML, NeurIPS, IJCAI, KDD, Robotics Conference Papers, AV/ADAS Experience
Remote, Hybrid