Senior AI Performance and Efficiency Engineer

Nvidia

Remote

Quick summary

Work type: Remote
Location: Santa Clara, CA · New York, NY · Seattle, WA
Salary: $152,000–$241,500 / yr
Posted: 79 days ago

Market check

Salary context

Below market

How this pay compares to similar roles

Chart (scale $139k–$276k): similar roles about $220k; this role about $197k.

This role pays less than about 70% of similar roles. Most similar roles pay $194,594–$246,150. At the midpoint of its listed range ($196,750), this role pays about $197k versus about $220k for comparable roles.

Based on 240 similar postings.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and systems on a chip (SoCs), powering gaming, professional visualization, data center, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 563 open roles on FindRole.

Listed pay typically runs $168,000–$264,500 across 556 roles with salary data.

At a glance

TL;DR · Senior AI Performance and Efficiency Engineer

Join NVIDIA as a Senior AI/ML Performance and Efficiency Engineer, GPU Clusters, working closely with researchers to improve the efficiency of ML models and deliver significant productivity gains and cost savings. Day to day, you will build tools and frameworks to detect and analyze efficiency bottlenecks, work on ML workloads spanning robotics, autonomous vehicles, LLMs, video, and more, and monitor fleet-wide utilization patterns to deliver scalable solutions. The role requires a strong computer science background, at least 5 years of experience with large-scale compute infrastructure, expertise in modern ML techniques, proficiency in Python, Go, and Bash, and experience with cloud platforms such as AWS, GCP, and Azure. Familiarity with NVIDIA GPUs, CUDA programming, Nsight Systems, NCCL, PyTorch, TensorFlow, and distributed storage systems is highly desirable.

Skills

Python Go Bash AWS GCP Azure CUDA NCCL MLPerf PyTorch TensorFlow NSight_Systems NSight_Compute InfiniBand IBOP RDMA Lustre GPFS Kubernetes Docker

What you'll do

Identify and address efficiency bottlenecks in ML models used by researchers.
Develop tools and frameworks to optimize GPU cluster performance and scalability.
Analyze fleet-wide utilization patterns to enhance hardware and software efficiency.
Monitor and improve the performance of large-scale distributed training systems.
Keep up with advances in AI/ML technology and advocate for their adoption.

What we're looking for

5+ years of experience designing and operating large-scale compute infrastructure
Strong understanding of modern ML techniques and tools, including debugging with Nsight Systems and Nsight Compute
Experience in optimizing training and inference performance across distributed systems using NCCL
Proficiency in Python, Go, and Bash, and familiarity with cloud platforms such as AWS, GCP, or Azure
Dedication to ongoing learning and staying updated on new AI/ML technologies and methods
Experience with NVIDIA GPUs, CUDA programming, and MLPerf benchmarking
Excellent communication and collaboration skills for working effectively with colleagues from diverse backgrounds

Similar roles

Senior High-Performance AI Training Engineer

Nvidia

Santa Clara, CA 114 days ago $184,000–$287,500

Python C++ CUDA MLPerf NVIDIA_Deep_Learning_Platform GPU Computer_Architecture Performance_Modeling CI/CD Docker Kubernetes Terraform AWS Prometheus Grafana

Senior Staff AI Platform Engineer

Nvidia

Santa Clara, CA 75 days ago $168,000–$270,250

Python Kubernetes C++ Go Rust MLOps Hugging Face Weights & Biases NVIDIA NIM Prometheus Grafana Docker CI/CD AWS Azure Google Cloud Platform PostgreSQL MySQL Redis Git GitHub Jenkins Terraform Ansible Knative OpenTelemetry FedRAMP SOC 2

Senior AI Solutions Engineer

Elevance Health

Chicago, IL 19 days ago $132,088–$198,132

Python SQL AWS Bedrock LLMs OpenSearch RAG NLP APIs MLOps CI/CD Healthcare data privacy Responsible AI principles Cloud-based ML platforms Microservices Evaluation frameworks for ML/LLM systems

Hybrid

Senior AI Engineer

Allstate

Remote (IL, US) 23 days ago $100,000–$170,500

Python RDF OWL SPARQL LLM Google ADK Microsoft Fabric Azure CI/CD MLOps Docker

Remote

Senior Distinguished AI Engineer

Capital One Financial

San Francisco, CA 73 days ago $314,800–$359,300

Python Go Scala Java CI/CD AWS Kubernetes Terraform Docker Prometheus Grafana

Senior Quality Engineer, Applied AI

Anduril Industries

Seattle, WA 2 days ago $146,000–$194,000

CI/CD Python Docker Kubernetes AWS Terraform PostgreSQL Prometheus Grafana LLM-enabled applications Agentic systems Infrastructure as code Observability stacks Release engineering Production operations
