Senior Systems Software Engineer, Observability and Telemetry Platform

Nvidia

Remote

Quick summary

Work type: Remote
Location: Santa Clara, CA
Salary: $184,000–$287,500 / yr
Posted: 4 days ago

Market check

Salary context

Above market

How this pay compares to similar roles

[Chart: salary scale from $126k to $305k, marking similar roles at $192k and this role at $236k, with a shaded band where most similar roles pay.]

This role pays more than 91% of similar roles. Most pay $163,416–$221,000 — the shaded band above. At the midpoint, this role pays about $236k versus about $192k for comparable roles.

Based on 240 similar postings.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 942 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 931 roles with salary data.

At a glance

TL;DR · Senior Systems Software Engineer, Observability and Telemetry Platform

As a Senior Systems Software Engineer (SRE) at NVIDIA, you will join a specialized team responsible for the reliability and uptime of GPU cloud services while enabling developers to make changes efficiently. Day to day, you will design and implement large-scale Observability & Telemetry platforms with real-time monitoring, logging, and alerting capabilities. You will engage in every stage of the service lifecycle, from system design consulting and capacity planning to post-launch support. Key responsibilities include maintaining system health through automation, scaling systems sustainably, and participating in on-call rotations for production support. The role demands expertise in Linux, networking, and containers, plus experience with Kubernetes, OpenStack, Docker, Grafana, Prometheus, and OpenTelemetry. Ideal candidates have a BS degree in Computer Science or a related field, 8+ years of experience in infrastructure automation and distributed systems design, and strong problem-solving and communication skills.

What you'll do

  • Design and implement operational aspects of large-scale Observability & Telemetry platforms.
  • Engage in the full lifecycle of services from inception to refinement.
  • Support services pre-launch through system design consulting and capacity management.
  • Maintain live services by monitoring availability, latency, and overall health.
  • Scale systems sustainably using automation and improve reliability and velocity.
  • Participate in on-call rotations for production support and incident response.

What we're looking for

  • 8+ years of experience with infrastructure automation and distributed systems design.
  • 5+ years delivering foundational infrastructure and observability platforms.
  • Expertise in Linux, networking, containers, and one or more languages: Python, Go, Perl, Ruby.
  • Experience running large private/public cloud systems using Kubernetes, OpenStack, Docker.
  • Ability to debug, optimize code, automate tasks, and practice sustainable incident response.

More like this

Similar roles

Senior System Software Engineer, Data Platform Observability

Nvidia

Remote (Santa Clara, CA) +1 123 days ago $184,000–$287,500
Python JavaScript React Grafana Prometheus Kubernetes Terraform Apache Spark Elasticsearch OpenSearch Helm Ansible Go Rust Docker CI/CD OpenTelemetry Data Governance Policy-as-Code

Senior Software Engineer, Sensing & Connectivity

Apple Inc

Cupertino, CA 72 days ago $147,400–$272,100
C++ Embedded Systems Real-Time Performance Algorithm Design Data Structures Object-Oriented Design API Development Sensor Fusion iOS WatchOS macOS CI/CD

Senior Platform Telemetry Engineer

Nvidia

Remote (Santa Clara, CA) 5 days ago $152,000–$241,500
Python C/C++ Git Jira Prometheus InfluxDB Grafana Redfish PagerDuty CI/CD REST APIs Telemetry Firmware architecture Time series databases x86 or ARM system architecture Confidential Compute

Senior Software Systems Engineer

Boeing

Oklahoma City, OK 3 days ago $164,900–$223,100
Python C++ SQL Ada JavaScript CI/CD PostgreSQL Kubernetes AWS Git Docker Terraform JUnit Selenium Swagger Jenkins SonarQube GitHub Bitbucket