Senior Site Reliability Engineer - HPC
Nvidia
Quick summary
Market check
How this pay compares to similar roles
This role pays more than 74% of similar roles. Most pay $146,125–$210,112 — the shaded band above. At the midpoint, this role pays about $209k versus about $178k for comparable roles.
Based on 240 similar postings.
Employer
Microsoft Corporation is a global technology leader producing software, hardware, and cloud services including Windows, Office 365, Azure cloud platform, Xbox gaming, and Surface devices. Industry: Software & Cloud Computing
Microsoft currently has 728 open roles on FindRole.
Listed pay typically runs $119,800–$234,700 across 664 roles with salary data.
Most-posted roles
At a glance
Join our High Performance Computing (HPC) infrastructure team as an experienced Site Reliability Engineer (SRE), where you will blend software engineering with systems engineering to maintain the reliability and efficiency of large-scale distributed AI infrastructure. Your daily tasks include ensuring high uptime for HPC clusters, designing monitoring and alerting systems, building automation tools for deployments and incident response, managing security compliance, and collaborating with ML engineers to enhance developer experience. Ideal candidates have strong proficiency in Kubernetes, Docker, CI/CD pipelines, public cloud platforms like Azure or AWS, and monitoring tools such as Grafana and Datadog, along with expertise in Python, Go, Bash, distributed systems, and large-scale GPU clusters for AI workloads.
Skills
What you'll do
What we're looking for
More like this
Nvidia
Microsoft
Blackstone Inc
Microsoft
Microsoft
Microsoft