Senior Systems Software Engineer, AI Stack and Performance - DGX Station

Nvidia

Remote

Quick summary

Work type: Remote
Location: Santa Clara, CA
Salary: $224,000–$356,500 / yr
Posted: 3 days ago

Market check

Salary context

Above market

How this pay compares to similar roles

This role pays more than 99% of similar roles. Most similar roles pay $157,500–$235,750. At the midpoint, this role pays about $290k versus about $197k for comparable roles.

Based on 240 similar postings.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads. Industry: Semiconductors & AI Computing

Nvidia currently has 855 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 843 roles with salary data.


At a glance

TL;DR · Senior Systems Software Engineer, AI Stack and Performance - DGX Station

As a senior systems software engineer on NVIDIA’s DGX Station team, you will ensure production readiness of AI applications like NemoClaw, LLM inference via NIM, and deep learning frameworks by profiling workloads, identifying bottlenecks across GPU compute, NVLink, memory, and host interconnects, and driving optimizations from kernel tuning to application-level improvements. You’ll collaborate with framework, compiler, and GPU architecture teams to enhance performance on the GB300 Blackwell multi-GPU platform, validate multi-user scenarios, and ensure version compatibility of NVIDIA’s AI software stack. Proficiency in deep learning frameworks (PyTorch, TensorFlow, JAX), GPU profiling tools, C/C++, CUDA, and Python is essential, along with experience optimizing LLM training or inference on multi-GPU systems and contributing to open-source projects.

What you'll do

  • Own production readiness of AI applications on DGX Station across single-GPU and multi-GPU configurations.
  • Profile and optimize LLM and deep learning workloads for the GB300 Blackwell multi-GPU architecture.
  • Identify bottlenecks in GPU compute, NVLink bandwidth, host memory, PCIe, and CPU–GPU communication.
  • Work with framework, compiler, and GPU architecture teams to improve kernel fusion and graph execution.
  • Validate multi-user and concurrent workload scenarios on DGX Station for reliable performance.
  • Ensure version compatibility and functional correctness of the NVIDIA AI software stack on DGX Station.

What we're looking for

  • 12+ years of experience in systems software engineering with a focus on AI/ML workload optimization.
  • Deep expertise in profiling and optimizing GPU workloads using tools like Nsight Systems and Nsight Compute.
  • Strong proficiency with deep learning frameworks (PyTorch, TensorFlow, JAX) including their internals.
  • Experience in multi-GPU communication optimization and NCCL tuning for performance improvements.
  • Proficiency in C/C++, CUDA, and Python, and the ability to read and modify GPU kernels.
  • Track record of collaborating with compiler and hardware architecture teams on kernel fusion and graph optimization.
  • Contributions to open-source AI frameworks or CUDA libraries demonstrating technical depth.

More like this

Similar roles

Senior AI Platform Engineer - Data and Systems

Adobe

San Jose · 37 days ago · $208,300–$301,600
Apache Spark, Databricks, Delta Lake, Kafka, Kinesis, Flink, Python, Scala, SQL, AWS, Azure, Docker, Kubernetes, CI/CD, MCP, LangChain, LLMs, Feature Stores, RAG, Unity Catalog, FAISS, Pinecone, Weaviate, Semantic layers, DataHub, OpenMetadata, AI-powered developer tools