Senior Research Engineer, Foundation Model Training Infrastructure

Nvidia

Actively hiring
Santa Clara, CA, US · Posted 139 days ago · $224,000–$356,500 / year

At a glance

AI-generated

TL;DR

NVIDIA seeks a senior or principal engineer to join its Generalist Embodied Agent Research (GEAR) team for Project GR00T, focusing on building advanced infrastructure for training foundation models in humanoid robotics. This role involves designing and maintaining distributed training systems for multi-modal datasets, optimizing GPU utilization, implementing scalable data loaders, developing monitoring tools, and integrating cutting-edge model architectures into scalable pipelines. Ideal candidates have over 10 years of experience in large-scale MLOps and AI infrastructure, with expertise in frameworks like PyTorch, JAX, or TensorFlow, CUDA programming, Kubernetes, Python, and C++. Additional qualifications include a master’s or PhD degree, tech lead experience, contributions to open-source AI frameworks, and publications in top-tier conferences.

Skills

PyTorch TensorFlow JAX Kubernetes Python C++ CUDA SLURM HPC GPU Distributed Systems Multimodal Data Processing Monitoring Tools Debugging Tools Large Scale Clusters CI/CD

What you'll do

  • Design and maintain distributed training systems for multi-modal foundation models.
  • Optimize GPU and cluster utilization for efficient model training on massive datasets.
  • Implement scalable data loaders and preprocessors for multimodal datasets in robotics.
  • Develop monitoring and debugging tools for reliable performance of training workflows.
  • Integrate cutting-edge model architectures into scalable training pipelines for research.

What we're looking for

  • 10+ years of industry experience in large-scale MLOps and AI infrastructure.
  • Proven expertise in designing and optimizing distributed training systems with PyTorch, JAX, or TensorFlow.
  • Deep knowledge of GPU acceleration, CUDA programming, and cluster management tools like Kubernetes.
  • Strong programming skills in Python and a high-performance language such as C++.
  • Experience with large-scale GPU clusters, HPC environments, and job scheduling tools (e.g., SLURM, Kubernetes).
  • Master’s or PhD degree in Computer Science, Robotics, Engineering, or related field.

Market check

Salary context

This $224,000–$356,500 range sits above 94% of similar postings on FindRole.

Peer median band

$128,800–$208,080

Median floor and ceiling across peers.

Typical midpoint (25–75%)

$138,500–$220,450

Middle half of comparable postings.

Based on 239 comparable postings.

* 240 is the maximum number of comparable postings sampled.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 802 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 798 roles with salary data.


More like this

Similar roles

Senior Research Engineer, Simulation

Nvidia

Santa Clara, CA, US · 139 days ago · $224,000–$356,500
MuJoCo Isaac Sim PyBullet Drake Gazebo USD Python CUDA ROS Reinforcement Learning Neural Network Training CI/CD Git Linux GPU Photorealistic Rendering Procedural Generation Sim2Real Mujoco Physics Engine

Senior Core Infrastructure Engineer

Highnote

US · 83 days ago · $170,000–$230,000
GCP AWS Kubernetes Istio Python Java CI/CD Prometheus Grafana Spanner BigQuery Dataflow Pub/Sub

Research Engineer

Adobe

Seattle, US · 70 days ago · $146,300–$211,850
Python PyTorch TensorFlow JAX C++ TensorRT AITemplate CoreML WinML TensorFlow Lite ONNXRuntime Diffusion models Neural network pruning Knowledge distillation Quantization Architecture search Sub-quadratic attention optimization Sparse mixture of experts Cloud deployment Mobile deployment

Senior Engineer

GEICO

Remote (Bethesda, MD office, US) · 92 days ago · $105,000–$230,000
ReactJS Redux Hooks .NET Java Azure AWS GCP RESTful APIs Docker Kubernetes CI/CD SQL NoSQL Python Go JSON YAML
Remote

Senior Engineer

GEICO

Remote (Palo Alto, CA office, US) · 111 days ago · $105,000–$215,000
JavaScript TypeScript React Node.js Python SQL NoSQL Docker Kubernetes Azure DevOps PowerShell Active Directory SAML OAuth DevOps CI/CD REST APIs Microservices Event-driven Architecture Queue Management Analytics Problem Solving AI Tools
Remote