Senior Performance Engineer - Deep Learning

Nvidia

Actively hiring
Santa Clara, US Posted 86 days ago $152,000–$241,500 / year

At a glance

AI generated

TL;DR

As a software engineer on NVIDIA’s Deep Learning models performance engineering team, you will work across all levels of expertise to build and optimize libraries and tools that enhance AI application efficiency. Your daily tasks include developing Transformer Engine, an open-source library for accelerating Large Language Model training; conducting systems research to improve model performance through low-precision training and parallelism methods; implementing new Deep Learning models from cutting-edge research; and contributing to community benchmarks like MLPerf. You will also engage with the open-source community and enterprise customers, influence hardware design, and optimize software components for NVIDIA’s AI platform. The ideal candidate has a strong background in C++ and Python programming, experience with parallel systems on GPUs, knowledge of computer architecture and optimization techniques, and familiarity with Deep Learning frameworks like PyTorch and JAX as well as low-level libraries such as cuBLAS and cuDNN.

Skills

Python, C++, PyTorch, JAX, CUDA, cuBLAS, cuDNN, cuSOLVER, GPU, MLPerf, OpenAI Triton, Pallas, CI/CD

What you'll do

  • Build and support Transformer Engine to accelerate training of Large Language Models.
  • Implement new Deep Learning models from research to scale efficiently on NVIDIA GPUs.
  • Contribute to NVIDIA submissions on community benchmarks like MLPerf.
  • Engage with the open-source community and enterprise customers to deliver software innovations.
  • Optimize Deep Learning model performance using low precision and parallelism methods.
  • Influence design of new hardware generations and core platform software components.

What we're looking for

  • 3+ years of C++ and Python programming experience in software development.
  • Strong background in parallel systems programming on GPUs and computer architecture.
  • Experience developing large-scale software projects and optimizing code performance.
  • Proficiency in PyTorch, JAX, or other deep learning frameworks.
  • Knowledge of modern LLM architectures and low-level DL libraries like cuBLAS.
  • Ability to write GPU kernels using CUDA, OpenAI Triton, or similar libraries.
  • Active participation in the open-source community and multidisciplinary team collaboration.

Market check

Salary context

This $152,000–$241,500 range sits above 38% of similar postings on FindRole.

Peer median band

$171,700–$262,400

Median floor and ceiling across peers.

Typical midpoint (25–75%)

$183,287–$246,150

Middle half of comparable postings.

Based on 240 comparable postings.

* 240 is the maximum number of comparable postings sampled.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 801 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 797 roles with salary data.

More like this

Similar roles

Senior Deep Learning Software Engineer, LLM Performance

Nvidia

Santa Clara, CA, US 43 days ago $184,000–$287,500
Python, C++, CUDA, TensorRT, Triton, PyTorch, JAX, TensorFlow, vLLM, SGLang, DL compiler, Performance modeling, Profiling, Debugging, Code optimization, GPU programming, Deep learning framework, CI/CD

Senior Deep Learning Software Engineer, Inference

Nvidia

Remote (Santa Clara, CA, US) 24 days ago $184,000–$287,500
C++, Python, CUDA, NCCL, NVSHMEM, OpenAI Triton, CUTLASS, PyTorch, vLLM, SGLang, FlashInfer, Multi-GPU Communications, Deep Learning Frameworks, Performance Optimization, GPU Acceleration

Senior Deep Learning Framework Communications Engineer

Nvidia

Remote (Santa Clara, CA, US) 11 days ago $152,000–$241,500
PyTorch, C++, CUDA, Python, NCCL, NVSHMEM, JAX, TRT-LLM, vLLM, SGLang, HPC, AI, MPI, TensorRT, NVIDIA Nsight Systems, Performance Profiling, Parallel Programming, Compiler Technologies, Memory Hierarchy, Tensor Layout, Distributed Inference, Mixture of Experts, Reinforcement Learning

Deep Learning Software Engineer, TensorRT Performance - New College Grad 2026

Nvidia

Santa Clara, CA, US 56 days ago $124,000–$195,500
C++, Python, TensorRT, PyTorch, CUDA, ONNX, JAX, TensorFlow, performance analysis, GPU architecture, Transformers, Recommenders, ASR, TTS, Visual Understanding, graph compilers, Jetson systems, deep learning inference, low-latency systems, resource-constrained systems

Senior Deep Learning Performance Architect

Nvidia

Santa Clara, CA, US 140 days ago $184,000–$287,500
Python, C++, GPU, Deep Learning, ASIC, Transformer Models, Computer Architecture, Interconnect Fabrics, Parallel Computing, AI Algorithms