Senior Systems Software Engineer, AI Stack and Performance - DGX Station

Nvidia

Remote

Quick summary

Work type: Remote
Location: Santa Clara, CA
Salary: $224,000–$356,500 / yr
Posted: 3 days ago
Nearby: 99+ roles within 25 mi

Market check

Salary context

Above market

How this pay compares to similar roles

Chart: pay scale from $127k to $381k, with similar roles at about $197k, this role at about $290k, and a shaded band marking where most similar roles pay.

This role pays more than 99% of similar roles. Most pay $157,500–$235,750 — the shaded band above. At the midpoint, this role pays about $290k versus about $197k for comparable roles.

Based on 240 similar postings.
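The midpoint figures quoted above can be reproduced with simple arithmetic. The sketch below assumes that "midpoint" means the plain average of a range's low and high ends, and that the "about $197k" figure for comparable roles is the midpoint of the $157,500–$235,750 band; the listing does not say how the site actually computes either number. The `midpoint` helper and the variable names are illustrative only.

```python
def midpoint(low: float, high: float) -> float:
    """Return the arithmetic midpoint of a pay range (assumed definition)."""
    return (low + high) / 2


# Listed salary range for this role.
this_role = midpoint(224_000, 356_500)   # 290250.0

# Band where most similar roles pay, per the salary context above.
similar = midpoint(157_500, 235_750)     # 196625.0

# Gap between the two midpoints, in dollars and as a share of the similar-role midpoint.
premium = this_role - similar
premium_pct = premium / similar * 100

print(f"This role midpoint:     ${this_role:,.0f}")
print(f"Similar roles midpoint: ${similar:,.0f}")
print(f"Difference:             ${premium:,.0f} ({premium_pct:.1f}%)")
```

Under these assumptions the two midpoints round to the $290k and $197k shown in the listing, which suggests (but does not confirm) that the site uses a simple average.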

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 855 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 843 roles with salary data.

At a glance

TL;DR · Senior Systems Software Engineer, AI Stack and Performance - DGX Station

As a senior systems software engineer on NVIDIA’s DGX Station team, you will ensure production readiness of AI applications like NemoClaw, LLM inference via NIM, and deep learning frameworks by profiling workloads, identifying bottlenecks across GPU compute, NVLink, memory, and host interconnects, and driving optimizations from kernel tuning to application-level improvements. You’ll collaborate with framework, compiler, and GPU architecture teams to enhance performance on the GB300 Blackwell multi-GPU platform, validate multi-user scenarios, and ensure version compatibility of NVIDIA’s AI software stack. Proficiency in deep learning frameworks (PyTorch, TensorFlow, JAX), GPU profiling tools, C/C++, CUDA, and Python is essential, along with experience optimizing LLM training or inference on multi-GPU systems and contributing to open-source projects.

Skills

PyTorch TensorFlow JAX CUDA C/C++ Python Nsight Systems Nsight Compute CUPTI NCCL TensorRT NVIDIA GPU Architecture Multi-GPU Scaling GPU Memory Management Inference Optimization LLM Training Open-Source Contributions CI/CD

What you'll do

Own production readiness of AI applications on DGX Station across single-GPU and multi-GPU configurations.
Profile and optimize LLM and deep learning workloads for the GB300 Blackwell multi-GPU architecture.
Identify bottlenecks in GPU compute, NVLink bandwidth, host memory, PCIe, and CPU–GPU communication.
Work with framework, compiler, and GPU architecture teams to improve kernel fusion and graph execution.
Validate multi-user and concurrent workload scenarios on DGX Station for reliable performance.
Ensure version compatibility and functional correctness of the NVIDIA AI software stack on DGX Station.

What we're looking for

12+ years of experience in systems software engineering with a focus on AI/ML workload optimization.
Deep expertise in profiling and optimizing GPU workloads using tools like Nsight Systems and Nsight Compute.
Strong proficiency with deep learning frameworks (PyTorch, TensorFlow, JAX) including their internals.
Experience in multi-GPU communication optimization and NCCL tuning for performance improvements.
Proficiency in C/C++, CUDA, and Python, and the ability to read and modify GPU kernels.
Track record of collaborating with compiler and hardware architecture teams on kernel fusion and graph optimization.
Contributions to open-source AI frameworks or CUDA libraries demonstrating technical depth.

Similar roles

Senior AI Infrastructure Software Engineer - DGX Cloud

Nvidia

Remote (Santa Clara, CA) 23 days ago $184,000–$287,500

Kubernetes Python C Prometheus Loki ELK TensorFlow PyTorch JAX Ray NCCL RDMA IB NVIDIA GPUs CI/CD

Remote

Senior System Software Engineer - AI Performance and Efficiency Tools

Nvidia

Santa Clara, CA 146 days ago $184,000–$287,500

C++ Python PyTorch TensorFlow Kubernetes Slurm CUDA NCCL Linux NVIDIA_GPUs GPU_Cluster_Job_Scheduling Distributed_Training_Inference CI/CD Prometheus Grafana

Hybrid

Senior System Software Engineer - AI Performance and Efficiency Tools

Nvidia

Santa Clara, CA 29 days ago $184,000–$287,500

Python C++ PyTorch TensorFlow Kubernetes Slurm CUDA NCCL NVIDIA_GPUs Linux_device_drivers Compiler_implementation GPU_architecture CPU_architecture Computer_architecture_principles CI/CD

Hybrid

Senior AI Platform Engineer - Data and Systems

Adobe

San Jose 37 days ago $208,300–$301,600

Apache_Spark Databricks Delta_Lake Kafka Kinesis Flink Python Scala SQL AWS Azure Docker Kubernetes CI/CD MCP LangChain LLMs Feature_Stores RAG Unity_Catalog FAISS Pinecone Weaviate Semantic_layers DataHub OpenMetadata AI-powered_developer_tools

Senior DGX Cloud AI Infrastructure Software Engineer

Nvidia

Remote (Santa Clara, CA) 63 days ago $184,000–$287,500

Python C/C++ Prometheus Loki ELK CI/CD Git PyTorch TensorFlow JAX Ray NCCL IB_verbs ucx libfabrics Docker Kubernetes AWS GCP Azure

Remote

Senior Software Engineer (AI Platform)

Smartly

Helsinki, Finland 48 days ago

Python TypeScript PostgreSQL Node.js Docker Kubernetes React AWS GCP CI/CD MLOps PyTorch TensorFlow MLflow Kubeflow

Hybrid
