Senior Software Engineer, Distributed Systems - NIM Factory

Nvidia

Remote Actively hiring
Santa Clara, CA Posted 49 days ago $168,000–$270,250 / year

At a glance

AI generated

TL;DR

NVIDIA seeks a senior engineer to join its team designing and building factory infrastructure and automation for NVIDIA Inference Microservices (NIMs), aiming to optimize AI model performance across diverse environments. This role involves developing an efficient pipeline that transforms AI models into deployable services validated in cloud, on-premises, and Kubernetes settings. Key responsibilities include defining technical strategies, expanding observability over the factory pipeline, and collaborating with multi-functional teams to enhance productivity. The ideal candidate possesses deep expertise in distributed containerized applications using Docker, K8s, Helm, and Prometheus, alongside experience in building rich microservice applications and CI/CD pipelines. They should also have a track record of mentoring team members and delivering event-driven applications, making significant contributions to large-scale full-stack development projects.

Skills

Docker Kubernetes CI/CD Prometheus Helm Temporal Kafka Redis Python Go Rust AWS GCP Azure PostgreSQL MongoDB GitLab Jenkins GitHub Slack Confluence Swagger GraphQL REST JSON YAML

What you'll do

  • Design and build an efficient factory pipeline to validate AI models across various deployment environments.
  • Develop scalable and reliable factory components in collaboration with technical leaders and multiple AI model teams.
  • Define metrics and drive continuous improvements based on user feedback and performance analysis.
  • Mentor team members and foster growth within the organization through knowledge sharing and leadership.
  • Build observability over the factory pipeline and its compute infrastructure to enhance monitoring and debugging.

What we're looking for

  • Extensive experience building distributed and compute systems using microservices and cloud technologies.
  • Deep technical expertise in Docker, Kubernetes, Helm, Prometheus, and other containerization tools.
  • Proven ability to design scalable and reliable factory components for AI model deployment.
  • Strong background in developing performant microservice applications with CI/CD pipelines.
  • Experience debugging and analyzing performance issues in distributed systems and cloud environments.
  • BS or MS in Computer Science, Engineering, or related field with 8+ years of relevant experience.
  • Effective collaboration skills across multi-functional teams and organizational boundaries.

Market check

Salary context

Above market

How this pay compares to similar roles

Similar roles: $192k
This role: $219k
Chart range: $123k to $286k (the shaded band marks where most similar roles pay)

This role pays more than 74% of similar roles. Most pay $161,125–$223,750 — the shaded band above. At the midpoint, this role pays about $219k versus about $192k for comparable roles.

Based on 240 similar postings.
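
The midpoint figures above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming "midpoint" means the plain average of the low and high ends of a range; the helper names `midpoint` and `round_k` are hypothetical, and the listing does not say how the $192k figure for similar roles was actually derived (it happens to match the midpoint of the $161,125–$223,750 band, but it could equally be a median).

```python
def midpoint(low: float, high: float) -> float:
    """Average of the low and high ends of a pay range (assumed definition)."""
    return (low + high) / 2


def round_k(amount: float) -> str:
    """Format a dollar amount to the nearest thousand, e.g. 219125 -> '$219k'."""
    return f"${round(amount / 1000)}k"


# Posted range for this role: $168,000 to $270,250.
this_role = midpoint(168_000, 270_250)

# Band where most similar roles pay: $161,125 to $223,750.
# Treating its midpoint as the "similar roles" figure is an assumption.
similar_band = midpoint(161_125, 223_750)

print(round_k(this_role), round_k(similar_band))  # $219k $192k
```

Under that assumption, both rounded values agree with the figures quoted in the comparison.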

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads. Industry: Semiconductors & AI Computing

Nvidia currently has 824 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 812 roles with salary data.

More like this

Similar roles

Senior Software Engineer – TensorRT Edge-LLM

Nvidia

Remote (Santa Clara, CA) 78 days ago $152,000–$241,500
C++ TensorRT CUDA vLLM SGLang MLC-LLM FlashInfer Transformer models Quantization Tensor parallelism Memory-efficient scheduling Speculative decoding KV cache management Compiler infrastructure Robotics Embedded AI pipelines Performance profiling GPU architecture

Senior Software Development Engineer

Adobe

San Jose 79 days ago $177,900–$257,550
Java Scala Agile CI/CD Databases Compilers Query Optimization Distributed Systems Python PostgreSQL Kafka Redis Elasticsearch GraphQL

Senior Software Development Engineer

Adobe

San Jose 51 days ago $208,300–$301,600
Java Scala Apache Spark Agile CI/CD Python Docker Kubernetes AWS PostgreSQL Redis Git Jenkins Prometheus Grafana

Senior Software Development Engineer

Adobe

San Jose 49 days ago $208,300–$301,600
Adobe Experience Platform Spark Hadoop Kafka Java Scala Apache Parquet Databricks Delta Apache Iceberg Apache Hudi Jenkins Agile Continuous Learning Big Data OOP Principles