Senior AI Infrastructure Software Engineer - DGX Cloud
Nvidia
At a glance
AI-generated summary: As a senior AI infrastructure software engineer on NVIDIA's DGX Cloud AI Efficiency Team, you will play a pivotal role in developing and maintaining the tools that optimize efficiency and resiliency for large-scale AI workloads. Your responsibilities include implementing robust software solutions to ensure high availability of AI systems, co-designing APIs with NVIDIA's resiliency stacks, and enhancing the infrastructure that underpins AI platforms. You will also define reliability metrics to track system performance. Ideal candidates have 8+ years of experience building scalable distributed systems for AI, proficiency in Python, C/C++, and scripting languages, and expertise in observability tools such as ELK, Prometheus, and Loki. Familiarity with RDMA software stacks such as NCCL and UCX is a plus. The role offers the chance to work on cutting-edge technologies that drive advances in AI and data science, within a collaborative environment focused on iterative improvement and risk-taking.
Market check
How this pay compares to similar roles
This role pays more than 77% of similar roles. Most similar roles pay $162,000–$235,750. At the midpoint, this role pays about $236k versus about $199k for comparable roles.
Based on 239 similar postings.
Employer
Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.
Industry: Semiconductors & AI Computing
Nvidia currently has 824 open roles on FindRole.
Listed pay typically runs $184,000–$287,500 across 812 roles with salary data.
More like this
Nvidia
Nvidia
Allstate
Allstate
Adobe
Nvidia