Senior Deep Learning Kernel Software Performance Architect

Nvidia

Actively hiring
Santa Clara, CA, US. Posted 135 days ago. $152,000–$218,500 / year

At a glance

AI generated

TL;DR

NVIDIA is seeking a Senior Deep Learning Kernel Software Performance Architect to join its Deep Learning Architecture team, where you will design GPU-accelerated system architectures that improve machine learning and data analytics performance. Day-to-day work includes prototyping high-performance software, analyzing and optimizing software performance with models, simulators, and test suites, and collaborating with the CUDA Compiler, AI/ML training and inference, and hardware architecture teams to identify and resolve critical deep learning bottlenecks. Ideal candidates hold a Master’s or PhD in Computer Science or Electrical Engineering (or have equivalent experience), along with 5+ years of industry or research experience, a background in high-performance kernel work, and familiarity with GPU computing and parallel programming models. Fluency in Python, C, and C++ is essential, as is hands-on experience with analytical performance modeling and profiling. This role offers the opportunity to contribute significantly to NVIDIA’s mission of advancing real-time, cost-effective AI computing solutions at scale.

Skills

Python, C, C++, GPU, CUDA, Parallel Programming, Performance Modeling, Profiling, Kernel Development, Math Library Optimization, Deep Learning, Machine Learning

What you'll do

  • Craft GPU-accelerated system architectures to enhance deep learning performance.
  • Prototype high-performance software for deep learning and data analytics workloads.
  • Analyze and optimize software performance using models, simulators, and test suites.
  • Identify and resolve critical performance issues with CUDA Compiler teams.
  • Optimize deep learning layers in collaboration with AI/ML training and inference teams.
  • Define expectations for emerging deep learning hardware features with hardware architecture teams.

What we're looking for

  • Master's or PhD in Computer Science, Electrical Engineering, or equivalent experience.
  • 5+ years of industry or research experience in relevant fields.
  • Strong foundation in machine learning and deep learning fundamentals.
  • Experience with high-performance kernel work and math library performance analysis.
  • Fluency in Python, C, and C++ programming languages.
  • Familiarity with GPU computing and parallel programming models.
  • Hands-on experience with analytical performance modeling and profiling.

Market check

Salary context

This $152,000–$218,500 range sits above 23% of similar postings on FindRole.

Peer median band

$170,300–$262,800

Median floor and ceiling across peers.

Typical midpoint (25–75%)

$196,750–$235,750

Middle half of comparable postings.

Based on 240 comparable postings.

* 240 is the maximum number of comparable postings sampled.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 802 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 798 roles with salary data.

More like this

Similar roles

Senior Deep Learning Performance Architect

Nvidia

Santa Clara, CA, US. 139 days ago. $184,000–$287,500
Python, C, C++, PyTorch, JAX, TensorRT, cuDNN, cuBLAS, CUTLASS, MLIR, Triton, CUDA, OpenCL, GPU, Deep Learning, ASIC, Performance Modeling, Architecture Simulation, Profiling, Analysis

Senior Deep Learning Performance Architect

Nvidia

Santa Clara, CA, US. 139 days ago. $184,000–$287,500
Python, C++, GPU, Deep Learning, ASIC, Transformer Models, Computer Architecture, Interconnect Fabrics, Parallel Computing, AI Algorithms

Senior Deep Learning Performance Architect

Nvidia

Santa Clara, CA, US. 23 days ago. $184,000–$287,500
Python, C++, GPU, ASIC, Deep Learning, LLM, Batching, KV-cache, Latency/Tuning, Multi-node Scaling, Memory Hierarchy, Scalability, System Architecture, Performance Tuning, Profiling, Debugging

Senior Deep Learning Computer Architect

Nvidia

Santa Clara, CA, US. 139 days ago. $184,000–$287,500
C++, Python, CUDA, PyTorch, GPU, Computer Architecture, Deep Learning Kernels, LLM Workloads, Performance Analysis, Parallelization Strategies, Fusion Strategies

Senior Deep Learning Performance Architect - LPU

Nvidia

Remote (CA, US). 10 days ago. $152,000–$241,500
Python, C, C++, CUDA, MPI, OpenMP, HPC, GPU, Deep Learning, Machine Learning, Performance Modeling, Systems Performance Analysis, AI Inference Workloads, CUDA Kernels, Custom ASIC Hardware