Deep Learning Kernel Software Performance Architect - New College Grad 2026

Nvidia

Actively hiring
Santa Clara, CA Posted 46 days ago $124,000–$195,500 / year

At a glance

AI generated

TL;DR

NVIDIA is seeking a new-college-graduate Performance Architect for Deep Learning Software to join its cutting-edge Deep Learning Architecture team. This role involves validating and analyzing the performance of GPU-accelerated systems and software architectures, debugging deep learning and data analytics applications to identify performance bottlenecks, and developing scripts and tools for analysis and visualization using analytical models and simulators. The ideal candidate will collaborate with various NVIDIA teams, including CUDA and AI Compiler teams, AI/ML training and inference performance teams, and hardware architecture performance teams, to enhance system throughput and optimize critical deep learning layers. Candidates should have a Master’s or PhD in Computer Science, Electrical Engineering, or related fields, along with expertise in software design, parallel programming, computer architecture, and machine learning fundamentals. Proficiency in Python, C, and C++ is required, as well as experience with GPU computing and analytical performance modeling.

Skills

Python, C, C++, GPU, CUDA, Parallel Programming, Performance Analysis, Profiling, Machine Learning, Deep Learning, Computer Architecture, High Performance Computing, Energy Efficient Designs, Analytical Modeling, NVIDIA CUDA, AI Compiler

What you'll do

  • Validate and analyze performance of GPU-accelerated system architectures for deep learning.
  • Debug deep learning software to resolve performance bottlenecks and improve efficiency.
  • Develop scripts and tools for analyzing, visualizing, and debugging software using models and simulators.
  • Work with CUDA and AI Compiler teams to identify and fix performance issues in software.
  • Collaborate with hardware architecture teams to define expectations for new deep learning hardware features.

What we're looking for

  • Master's or PhD in Computer Science, Electrical Engineering, or related field.
  • Expertise in software design, debugging, and performance analysis.
  • Hands-on experience with parallel programming and GPU computing.
  • Strong understanding of computer architecture and performance optimization.
  • Fluency in Python, C, and C++ for developing and analyzing software.
  • Experience in machine learning fundamentals and high-performance computing.

Market check

Salary context

Below market

How this pay compares to similar roles

[Chart: this role at $160k on a scale from $110k to $256k; a shaded band marks where most similar roles pay.]

This role pays less than 87% of similar roles. Most pay $181,758–$235,750 — the shaded band above.

Based on 240 similar postings.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 825 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 813 roles with salary data.

More like this

Similar roles

Senior Deep Learning Performance Architect

Nvidia

Santa Clara, CA 143 days ago $184,000–$287,500
Python C C++ Pytorch JAX TensorRT CUDNN CUBLAS CUTLASS MLIR Triton CUDA OpenCL GPU Deep Learning ASIC Performance Modeling Architecture Simulation Profiling Analysis

Senior Deep Learning Performance Architect

Nvidia

Santa Clara, CA 27 days ago $184,000–$287,500
Python C++ GPU ASIC Deep Learning LLM Batching KV-cache Latency/Tuning Multi-node Scaling Memory Hierarchy Scalability System Architecture Performance Tuning Profiling Debugging

Senior Deep Learning Performance Architect

Nvidia

Santa Clara, CA 143 days ago $184,000–$287,500
Python, C++, GPU, Deep Learning, ASIC, Transformer Models, Computer Architecture, Interconnect Fabrics, Parallel Computing, AI Algorithms

Senior Deep Learning Computer Architect

Nvidia

Santa Clara, CA 143 days ago $184,000–$287,500
C++, Python, CUDA, PyTorch, GPU, Computer Architecture, Deep Learning Kernels, LLM Workloads, Performance Analysis, Parallelization Strategies, Fusion Strategies
Hybrid

Senior Deep Learning Performance Architect - LPU

Nvidia

Remote (CA) 14 days ago $152,000–$241,500
Python C C++ CUDA MPI OpenMP HPC GPU Deep Learning Machine Learning Performance Modeling Systems Performance Analysis AI Inference Workloads CUDA Kernels Custom ASIC Hardware
Remote