Senior High-Performance LLM Training Engineer

Nvidia

Actively hiring
Santa Clara, CA, US · Posted 50 days ago · $184,000–$287,500 / year

At a glance

AI generated

TL;DR

Join NVIDIA as a Senior High-Performance LLM Training Engineer, where you will optimize AI training workloads on thousands of GPUs using frameworks like PyTorch and JAX, contributing to the next generation of GPU hardware. Your day-to-day involves analyzing and profiling complex neural networks, implementing software across NVIDIA’s deep learning platform stack from drivers to DL frameworks, and building tools for automated workload analysis and optimization. Ideal candidates have a PhD or MS in Computer Science, Electrical Engineering, or related fields with extensive experience in deep learning, computer architecture, and GPU performance tuning. Proficiency in C++, Python, and CUDA is essential as you work on cutting-edge AI projects that drive innovation across data centers, cloud services, and edge devices.

Skills

Python, C++, CUDA, PyTorch, JAX, GPU, MLPerf, NVIDIA, Deep Learning, Computer Architecture, Performance Modelling, Automation Tools, System Simulators, Cloud Services, Data Centers

What you'll do

  • Analyze and optimize AI training workloads on GPUs for high efficiency.
  • Solve complex performance issues in state-of-the-art neural networks.
  • Implement production-quality software across NVIDIA’s deep learning platform stack.
  • Support NVIDIA submissions to the MLPerf Training benchmark suite.
  • Develop tools for automating workload analysis and optimization workflows.

What we're looking for

  • PhD in Computer Science, Electrical Engineering, or an equivalent field with 5+ years of experience, or an MS with 8+ years.
  • Deep expertise in deep learning and neural network training.
  • Strong background in computer architecture, especially GPU architecture.
  • Proven ability to analyze and tune application performance on GPUs.
  • Proficiency in C++, Python, and CUDA programming.
  • Experience implementing production-quality software across NVIDIA's DL platform stack.

Market check

Salary context

This $184,000–$287,500 range sits above 90% of similar postings on FindRole.

Peer median band

$119,800–$212,750

Median floor and ceiling across peers.

Typical midpoint (25–75%)

$139,075–$197,062

Middle half of comparable postings.

Based on 240 comparable postings.

* 240 is the maximum number of comparable postings sampled.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 802 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 798 roles with salary data.

More like this

Similar roles

Principal High-Performance LLM Training Engineer

Nvidia

Santa Clara, CA, US · 30 days ago · $272,000–$431,250
PyTorch, JAX, NeMo, CUDA, Distributed Systems, High-Performance Computing, Mixed Precision Training, Activation Checkpointing, Profiling Tools, Tracing Tools, Benchmarking Tools, GPU Architecture, TensorFlow, Kubernetes, AWS, Azure, Google Cloud Platform, PostgreSQL, CI/CD

Senior Deep Learning Software Engineer, LLM Performance

Nvidia

Santa Clara, CA, US · 42 days ago · $184,000–$287,500
Python, C++, CUDA, TensorRT, Triton, PyTorch, JAX, TensorFlow, vLLM, SGLang, DL compiler, Performance modeling, Profiling, Debugging, Code optimization, GPU programming, Deep learning framework, CI/CD

Senior Certification Engineer

GE Aerospace

Remote (Grand Rapids, US) 20 days ago
RTCA/DO-254, DO-178C, ARP4754A, ASIC, FPGA, CPLD, CI/CD, Python, PostgreSQL, Kubernetes, AWS, Git, Jira, Confluence

Senior ML Ops Engineer

Prudential Financial

Wash, 213 Washington St., Newark, NJ, US · 78 days ago · $104,000–$171,600
Python, Java, Node, Groovy, Shell, Terraform, Ansible, CloudFormation, Docker, Kubernetes, CI/CD, AWS, Azure, GCP, PostgreSQL, Maven, Gradle, NuGet, NPM, Atlassian, Sonar, Artifactory, CheckMarx

Senior Machine Learning Engineer, ML Systems and Infrastructure

Autodesk

Remote (Amer - United States - Massachusetts - Boston - Drydock, US) 22 days ago
Python, AWS, Azure, GCP, Docker, CI/CD, Prometheus, Kubernetes, Terraform, Airflow, Spark, PyTorch Lightning, DeepSpeed, FSDP, Megatron, PostgreSQL, Redis