Senior Deep Learning Performance Architect

Nvidia

Actively hiring
Santa Clara, CA · Austin, TX · Posted 24 days ago · $184,000–$287,500 / year

At a glance

AI generated

TL;DR

As a Senior Deep Learning Performance Architect at NVIDIA, you will join the Deep Learning Architecture team to design and evaluate hardware architectures that improve the performance, efficiency, and scalability of AI workloads. Day to day, you will analyze and optimize large-scale deep learning models, particularly LLM inference and training in real-world settings, and build performance and power models in Python and C++. You will identify system bottlenecks across compute, memory, and interconnect, evaluate power-performance-area (PPA) trade-offs, and collaborate with software, systems, and product teams to align hardware capabilities with workload requirements. Ideal candidates have a strong background in GPU/ASIC architecture, parallel computing, and deep learning workloads, along with experience in debugging, profiling, and performance tuning on real systems.

Skills

Python, C++, GPU, ASIC, Deep Learning, LLM, Batching, KV-cache, Latency/Tuning, Multi-node Scaling, Memory Hierarchy, Scalability, System Architecture, Performance Tuning, Profiling, Debugging

What you'll do

  • Design and evaluate hardware architectures to enhance AI workload performance, efficiency, and scalability.
  • Analyze and optimize large-scale deep learning workloads for real-world deployment.
  • Build and utilize performance models in Python/C++ to inform architecture decisions.
  • Identify and resolve system bottlenecks across compute, memory, and interconnect components.
  • Evaluate power-performance-area trade-offs to guide feature prioritization for new GPU designs.

What we're looking for

  • MS or PhD in Computer Science, Electrical Engineering, or equivalent experience.
  • 5+ years of hands-on GPU/ASIC architecture and parallel computing experience.
  • Deep understanding of system architecture and performance optimization techniques.
  • Proficiency in Python and C++ for building performance models and analysis tools.
  • Experience optimizing large-scale deep learning workloads in production environments.

Market check

Salary context

This $184,000–$287,500 range sits above 75% of similar postings on FindRole.

Peer median band

$181,087–$262,400

Median floor and ceiling across peers.

Typical midpoint (25–75%)

$185,162–$240,225

Middle half of comparable postings.

Based on 240 comparable postings.

* 240 is the maximum number of comparable postings sampled.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 801 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 797 roles with salary data.

More like this

Similar roles

Senior Deep Learning Performance Architect

Nvidia

Santa Clara, CA, US · 140 days ago · $184,000–$287,500
Python, C++, GPU, Deep Learning, ASIC, Transformer Models, Computer Architecture, Interconnect Fabrics, Parallel Computing, AI Algorithms

Senior Deep Learning Performance Architect

Nvidia

Santa Clara, CA, US · 140 days ago · $184,000–$287,500
Python, C, C++, PyTorch, JAX, TensorRT, cuDNN, cuBLAS, CUTLASS, MLIR, Triton, CUDA, OpenCL, GPU, Deep Learning, ASIC, Performance Modeling, Architecture Simulation, Profiling, Analysis

Senior Deep Learning Performance Architect - LPU

Nvidia

Remote (CA, US) · 11 days ago · $152,000–$241,500
Python, C, C++, CUDA, MPI, OpenMP, HPC, GPU, Deep Learning, Machine Learning, Performance Modeling, Systems Performance Analysis, AI Inference Workloads, CUDA Kernels, Custom ASIC Hardware

Senior Deep Learning Computer Architect

Nvidia

Santa Clara, CA, US · 140 days ago · $184,000–$287,500
C++, Python, CUDA, PyTorch, GPU, Computer Architecture, Deep Learning Kernels, LLM Workloads, Performance Analysis, Parallelization Strategies, Fusion Strategies

Senior Deep Learning Communication Architect

Nvidia

Santa Clara, CA, US · 9 days ago · $184,000–$287,500
PyTorch, TensorRT-LLM, vLLM, SGLang, C++, Python, CUDA, OpenCL, InfiniBand, RoCE, MPI, NCCL, UCX, UCC, NVSHMEM, Data Parallelism, Pipeline Parallelism, Tensor Parallelism, Expert Parallelism, FSDP, Disaggregated Serving, Dynamo, Triton