Senior Software Engineer - Storage

Nvidia

Remote

Quick summary

Work type
Remote
Location
Santa Clara, CA; Redmond, WA
Salary
$152,000–$241,500 / yr
Posted
4 days ago

Market check

Salary context

Competitive pay

How this pay compares to similar roles

Chart: pay scale from $133k to $253k, with similar roles at $184k and this role at $197k.

This role pays more than 61% of similar roles, most of which pay $145,000–$222,000. At the midpoint, this role pays about $197k versus about $184k for comparable roles.

Based on 240 similar postings.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 980 open roles on FindRole.

Listed pay typically runs $168,000–$270,250 across 966 roles with salary data.

At a glance

TL;DR · Senior Software Engineer - Storage

As a Software Engineer on NVIDIA’s Managed AI Research Superclusters (MARS) team, you will play a pivotal role in designing, building, and operating exascale infrastructure that powers cutting-edge AI research. Your responsibilities include developing distributed systems for managing data, compute, and networking across thousands of GPUs and petabytes of storage, as well as collaborating with cross-functional teams to ensure the reliability, performance, and observability of these systems. You will use your expertise in C++, Python, or Go to create production-quality software and automation tools that let AI researchers focus on innovation rather than infrastructure challenges. This role requires a strong background in distributed systems, large-scale storage solutions like Lustre or GPFS, and orchestration frameworks such as Kubernetes or Slurm, along with experience in cloud environments and infrastructure automation tools.

What you'll do

  • Design, develop, and operate exascale distributed systems for large-scale AI workloads.
  • Build automation to orchestrate thousands of GPUs and petabytes of storage across multi-region clusters.
  • Translate AI/ML research requirements into scalable, high-performance solutions.
  • Enhance system reliability, performance, and observability to meet exascale standards.
  • Work with security teams to ensure MARS infrastructure meets robustness and compliance requirements.

What we're looking for

  • 5+ years of experience developing and operating large-scale distributed systems.
  • Strong programming skills in C++, Python, or Go for production-quality software.
  • Solid understanding of distributed systems principles and orchestration frameworks.
  • Hands-on experience with high-performance storage and compute scheduling.
  • Familiarity with cloud environments and infrastructure automation tools.
  • Excellent communication and cross-functional collaboration skills.

More like this

Similar roles

Senior Software Engineer, Storage

SpaceX

Remote (US) · 17 days ago · $199,000–$210,000
Go Python Rust Valkey Redis Memcached EC2 Datadog Cloudwatch Sentry Snowflake CI/CD Distributed Systems AI Monitoring Tiered Database Storage Observability Analytics

Senior Software Engineer II, Storage

SpaceX

Remote (US) · 17 days ago · $230,000–$242,000
Go Python Rust PostgreSQL Yugabyte RDS EC2 GCP Datadog Cloudwatch Sentry Snowflake Distributed Databases CI/CD

Uber

Sunnyvale, CA · 11 days ago
Python Rust Go Java Pytorch Ray Iceberg Lance Gravitino Polaris S3 GCS Azure OCI Docker Kubernetes CI/CD PostgreSQL AWS Google Cloud Azure Cloud

Senior Software Developer (Storage)

Oracle

Santa Clara, CA +1 · 15 days ago · $79,200–$178,100
C++ Java OCI Distributed Systems AWS Kubernetes CI/CD Python PostgreSQL Docker Git Linux REST APIs Scalability Performance Optimization SLAs Documentation On-call Support

Uber

Seattle, WA +2 · 50 days ago
HDFS Cloud Object Storage S3 GCS OCI Blobstore metadata management Apache Hudi Apache Iceberg Docstore Google Spanner TiDB Cassandra Redis Spark Flink Ray Presto Trino Hive Java Go Scala C++ Distributed MySQL Vitess GCP RAG systems GPU data loading Observability CI/CD