Senior Deep Learning Hardware Modeling Architect - LPU

Nvidia

Remote

Quick summary

Work type: Remote
Location: San Francisco, CA · Austin, TX · Boston, MA
Salary: $152,000–$241,500 / yr
Posted: 2 days ago

Market check

Salary context

Competitive pay

How this pay compares to similar roles

Pay scale: $139k to $276k (shaded band: where most similar roles pay)
Similar roles: $216k
This role: $197k

This role pays less than 66% of similar roles. Most pay $185,962–$246,150 — the shaded band above. At the midpoint, this role pays about $197k versus about $216k for comparable roles.

Based on 240 similar postings.
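
The midpoint figures quoted above can be reproduced with a short calculation. This is a minimal sketch that assumes "midpoint" means the simple average of the low and high ends of each range; the listing does not say how the site computes it, and the function and variable names are illustrative only.

```python
def midpoint(low: float, high: float) -> float:
    """Simple average of a pay range (assumed definition of 'midpoint')."""
    return (low + high) / 2


# Ranges taken from the listing text.
role_mid = midpoint(152_000, 241_500)     # this role
similar_mid = midpoint(185_962, 246_150)  # band where most similar roles pay

# How far this role's midpoint sits below the comparable midpoint.
gap = similar_mid - role_mid

print(f"This role:     ${role_mid:,.0f}")
print(f"Similar roles: ${similar_mid:,.0f}")
print(f"Gap:           ${gap:,.0f}")
```

Under that assumption, the results round to the roughly $197k and $216k shown in the comparison.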

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 985 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 971 roles with salary data.

At a glance

TL;DR · Senior Deep Learning Hardware Modeling Architect - LPU

NVIDIA is seeking a Senior Deep Learning Hardware Modeling Architect to join its pioneering team focused on optimizing AI inference performance through hardware-software co-design. In this role, you will drive architectural specifications across multiple stakeholders, develop detailed component-level and system-level designs, and create executable models used by NVIDIA’s customers worldwide. Your responsibilities include ensuring high performance using robust C++ practices, algorithms, and parallelism, as well as resolving complex issues across chip and hardware subsystems in collaboration with various teams. Ideal candidates possess a strong background in C++, experience in RTL design, and an automation-centered mindset to enhance work efficiency. This role offers the chance to contribute significantly to cutting-edge LLM inference solutions at NVIDIA.

What you'll do

  • Drive architectural specifications to closure across multiple stakeholders.
  • Develop detailed written specifications for component and system designs.
  • Create executable models used by customers for AI inference solutions.
  • Ensure high performance through effective use of C++, algorithms, and parallelism.
  • Resolve complex performance and correctness issues across hardware subsystems.

What we're looking for

  • 5+ years of relevant experience in a technical field such as CS, EE, or Math.
  • Expert programming skills in C++ for developing high-performance software.
  • Experience in RTL design with understanding of chip design concepts.
  • Ability to develop and embody architectural specifications in executable models.
  • Strong automation mindset and desire to improve work efficiency using AI.

More like this

Similar roles

Senior Deep Learning Performance Architect - LPU

Nvidia

Remote (CA) · 19 days ago · $152,000–$241,500
Python C C++ CUDA MPI OpenMP HPC GPU Deep Learning Machine Learning Performance Modeling Systems Performance Analysis AI Inference Workloads CUDA Kernels Custom ASIC Hardware
Remote

Senior Deep Learning Performance Architect

Nvidia

Santa Clara, CA · 32 days ago · $184,000–$287,500
Python C++ GPU ASIC Deep Learning LLM Batching KV-cache Latency/Tuning Multi-node Scaling Memory Hierarchy Scalability System Architecture Performance Tuning Profiling Debugging

Senior Deep Learning Computer Architect

Nvidia

Santa Clara, CA · 148 days ago · $184,000–$287,500
C++ · Python · CUDA · PyTorch · GPU · Computer Architecture · Deep Learning Kernels · LLM Workloads · Performance Analysis · Parallelization Strategies · Fusion Strategies
Hybrid

Senior Deep Learning Performance Architect

Nvidia

Santa Clara, CA · 148 days ago · $184,000–$287,500
Python · C++ · GPU · Deep Learning · ASIC · Transformer Models · Computer Architecture · Interconnect Fabrics · Parallel Computing · AI Algorithms

Senior Deep Learning Performance Architect

Nvidia

Santa Clara, CA · 148 days ago · $184,000–$287,500
Python C C++ Pytorch JAX TensorRT CUDNN CUBLAS CUTLASS MLIR Triton CUDA OpenCL GPU Deep Learning ASIC Performance Modeling Architecture Simulation Profiling Analysis