Senior AI Performance and Efficiency Engineer

Nvidia

Remote

Quick summary

Work type: Remote
Location: Santa Clara, CA · New York, NY · Seattle, WA
Salary: $152,000–$241,500 / yr
Posted: 79 days ago

Market check

Salary context

Below market

How this pay compares to similar roles

Chart: pay scale from $139k to $276k, with markers for similar roles ($220k) and this role ($197k); a shaded band marks where most similar roles pay.

This role's pay is lower than that of 70% of similar roles. Most similar roles pay $194,594–$246,150 (the shaded band in the chart). At the midpoint, this role pays about $197k versus about $220k for comparable roles.

Based on 240 similar postings.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 563 open roles on FindRole.

Listed pay typically runs $168,000–$264,500 across 556 roles with salary data.

At a glance

TL;DR · Senior AI Performance and Efficiency Engineer

Join NVIDIA as a Senior AI/ML Performance and Efficiency Engineer, GPU Clusters, and work closely with researchers to make their ML models more efficient, yielding significant productivity gains and cost savings. Day to day, you will build tools and frameworks that detect and analyze efficiency bottlenecks, support ML workloads spanning robotics, autonomous vehicles, LLMs, video, and more, monitor fleet-wide utilization patterns, and deliver scalable solutions. The role requires a strong computer science background, at least 5 years of experience with large-scale compute infrastructure, expertise in modern ML techniques, and proficiency in Python, Go, and Bash, plus familiarity with cloud platforms such as AWS, GCP, and Azure. Familiarity with NVIDIA GPUs, CUDA programming, Nsight Systems, NCCL, PyTorch, TensorFlow, and distributed storage systems is highly desirable.

What you'll do

  • Identify and address efficiency bottlenecks in ML models used by researchers.
  • Develop tools and frameworks to optimize GPU cluster performance and scalability.
  • Analyze fleet-wide utilization patterns to enhance hardware and software efficiency.
  • Monitor and improve the performance of large-scale distributed training systems.
  • Stay updated on AI/ML technology advancements and advocate for their adoption.

What we're looking for

  • 5+ years of experience designing and operating large-scale compute infrastructure
  • Strong understanding of modern ML techniques and tools, including debugging with Nsight Systems/Compute
  • Experience in optimizing training and inference performance across distributed systems using NCCL
  • Proficiency in Python, Go, and Bash, and familiarity with cloud computing platforms such as AWS, GCP, and Azure
  • Dedication to ongoing learning and staying updated on new AI/ML technologies and methods
  • Background with NVIDIA GPUs, CUDA programming, and MLPerf benchmarking
  • Excellent communication and collaboration skills for effective teamwork across diverse backgrounds

More like this

Similar roles

Senior High-Performance AI Training Engineer

Nvidia

Santa Clara, CA · 114 days ago · $184,000–$287,500
Python, C++, CUDA, MLPerf, NVIDIA Deep Learning Platform, GPU, Computer Architecture, Performance Modeling, CI/CD, Docker, Kubernetes, Terraform, AWS, Prometheus, Grafana

Senior Staff AI Platform Engineer

Nvidia

Santa Clara, CA · 75 days ago · $168,000–$270,250
Python, Kubernetes, C++, Go, Rust, MLOps, Hugging Face, Weights & Biases, NVIDIA NIM, Prometheus, Grafana, Docker, CI/CD, AWS, Azure, Google Cloud Platform, PostgreSQL, MySQL, Redis, Git, GitHub, Jenkins, Terraform, Ansible, Knative, OpenTelemetry, FedRAMP, SOC 2

Senior AI Solutions Engineer

Elevance Health

Chicago, IL · 19 days ago · $132,088–$198,132
Python, SQL, AWS Bedrock, LLMs, OpenSearch, RAG, NLP, APIs, MLOps, CI/CD, Healthcare data privacy, Responsible AI principles, Cloud-based ML platforms, Microservices, Evaluation frameworks for ML/LLM systems
Hybrid

Senior AI Engineer

Allstate

Remote (USA - IL) · 23 days ago · $100,000–$170,500
Python, RDF, OWL, SPARQL, LLM, Google ADK, Microsoft Fabric, Azure, CI/CD, MLOps, Docker
Remote

Senior Distinguished AI Engineer

Capital One Financial

San Francisco, CA · 73 days ago · $314,800–$359,300
Python, Go, Scala, Java, CI/CD, AWS, Kubernetes, Terraform, Docker, Prometheus, Grafana

Senior Quality Engineer, Applied AI

Anduril Industries

Seattle, WA · 2 days ago · $146,000–$194,000
CI/CD, Python, Docker, Kubernetes, AWS, Terraform, PostgreSQL, Prometheus, Grafana, LLM-enabled applications, Agentic systems, Infrastructure as code, Observability stacks, Release engineering, Production operations