Site Reliability Engineer - Hardware Infrastructure
Nvidia
At a glance
AI generated
Join NVIDIA’s Site Reliability Engineering (SRE) team as a senior specialist responsible for the reliability and uptime of GPU cloud services. You will design, implement, and support large-scale observability and telemetry platforms, focusing on real-time monitoring, logging, and alerting. Day to day, you will take part in every stage of the service lifecycle, from initial design through deployment and ongoing maintenance, while practicing sustainable incident response and conducting blameless postmortems. Key skills include extensive experience with infrastructure automation, distributed systems, and cloud platforms such as Kubernetes and OpenStack; proficiency in Python, Go, Perl, or Ruby; and deep knowledge of Linux, networking, and containers. The role demands a systematic approach to problem solving and the ability to automate routine tasks, contributing to the continuous improvement of production systems at scale.
Market check
This $248,000–$396,750 range sits above 98% of similar postings on FindRole.
Peer median band
$127,666–$199,750
Median floor and ceiling across peers.
Typical midpoint (25–75%)
$137,000–$196,750
Middle half of comparable postings.
Based on 240 comparable postings.
* 240 is the maximum number of comparable postings sampled.
Employer
Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.
Industry: Semiconductors & AI Computing
Nvidia currently has 801 open roles on FindRole.
Listed pay typically runs $184,000–$287,500 across 797 roles with salary data.