Principal High-Performance LLM Training Engineer

Nvidia

Quick summary

Work type: On-site
Location: Santa Clara, CA
Salary: $272,000–$431,250 / yr
Posted: 47 days ago

Market check

Salary context

Above market

How this pay compares to similar roles

[Chart: pay scale from $136k to $463k, marking similar roles at about $209k and this role at about $352k, with a shaded band where most similar roles pay.]

This role pays more than 99% of similar roles. Most pay $171,500–$246,150 — the shaded band above. At the midpoint, this role pays about $352k versus about $209k for comparable roles.

Based on 239 similar postings.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 967 open roles on FindRole.

Listed pay typically runs $168,000–$270,250 across 950 roles with salary data.

At a glance

TL;DR · Principal High-Performance LLM Training Engineer

NVIDIA is hiring a Principal Engineer to lead the optimization of large-scale AI training and post-training workloads on its advanced hardware and software platforms. The role involves analyzing and improving frontier-scale LLM workloads running on thousands of GPUs, driving improvements in frameworks such as PyTorch, JAX, NeMo, and NeMo RL, and shaping future NVIDIA GPU, system, and software roadmaps based on real-world insights. The ideal candidate has extensive experience in large-scale AI training systems, GPU performance optimization, distributed systems, and high-performance computing, with a deep understanding of GPU architecture from individual accelerators to datacenter-scale systems. They should be proficient with profiling, tracing, and benchmarking tools and have strong technical leadership skills to influence cross-functional decisions across NVIDIA’s teams. The role offers the chance to collaborate on cutting-edge AI projects that impact the future of computing and social progress.

What you'll do

  • Lead end-to-end performance analysis and optimization of large-scale LLM training on NVIDIA platforms.
  • Identify and eliminate bottlenecks in compute, memory, communication, and scheduling for AI workloads.
  • Develop software tools and benchmarks to enhance efficiency and developer productivity across AI stacks.
  • Guide future GPU and system architecture decisions with insights from workload characterizations and simulations.
  • Serve as a technical expert for AI training performance, collaborating with cross-functional teams at NVIDIA.

What we're looking for

  • MS or PhD in Computer Science, Electrical Engineering, or a related field, with 12+ years of relevant experience.
  • Proven technical impact in large-scale AI training systems, GPU optimization, distributed systems, HPC, ML frameworks, compilers/runtimes, or hardware/software co-design.
  • Deep hands-on expertise in analyzing and optimizing performance of large-scale deep learning workloads, especially transformer-based models.
  • Strong understanding of GPU and AI accelerator architecture from individual accelerators to datacenter-scale systems.
  • Experience with distributed training techniques including various parallelism strategies and mixed precision training.
  • Extensive experience using profiling, tracing, benchmarking, and modeling tools to diagnose complex bottlenecks and drive performance improvements.
  • Excellent communication and technical leadership skills to influence cross-functional decisions across teams.

More like this

Similar roles

Senior High-Performance LLM Training Engineer

Nvidia

Santa Clara, CA · 67 days ago · $184,000–$287,500
Python, C++, CUDA, PyTorch, JAX, GPU, MLPerf, NVIDIA, Deep Learning, Computer Architecture, Performance Modelling, Automation Tools, System Simulators, Cloud Services, Data Centers
Hybrid

Principal Engineer, LLM

Upstart

Remote (Canada) · 53 days ago · $238,400–$330,200
LLM, ONNX, Vector databases, LangChain, LlamaIndex, OpenAI APIs, Kubernetes, Docker, Terraform, Python, FastAPI, React, TypeScript, CI/CD, Cloud-native architectures, PostgreSQL, Redis, Git, GitHub, Jenkins, Prometheus, Grafana
Remote

AI LLM Engineer

Siemens Healthineers

Atlanta, GA · 11 days ago · $93,680–$128,810
Python, LLM Frameworks, Azure OpenAI, Databricks, RAG Pipelines, Prompt Engineering, Snowflake, Power BI, LangChain, Semantic Kernel, Microsoft Copilot Studio, SQL, Vector Databases, Data Modeling, ELT Pipelines, Model Deployment, CI/CD, Power Automate, Multi-Agent Orchestration
Hybrid

Applied LLM Research Engineer, Input Experience

Apple Inc

Cupertino, CA · 58 days ago · $147,400–$272,100
Python, PyTorch, JAX, TensorFlow, SFT, RLHF, Data Synthesis, Parameter-Efficient Fine-Tuning, RLVR, Reward Modeling, Environment Design, Speculative Decoding, CI/CD

Principal SW Engineer - LLM Serving (Cloud AI)

Qualcomm

San Diego, CA · 116 days ago · $200,800–$301,200
PyTorch, Python, C++, LLMs, Multi-modal models, Reasoning models, Neural networks, High performance software, Multicore systems, Performance analysis, Multi-core architecture, SoC architectures, Performance modeling, Machine learning accelerators, Neural network operators, Linear algebra, Math libraries