Senior Site Reliability Engineer, HPC
Microsoft
Market check
How this pay compares to similar roles
This role pays more than 73% of similar roles, most of which pay $142,450–$214,000. At the midpoint, this role pays about $209k versus about $178k for comparable roles.
Based on 240 similar postings.
Employer
Microsoft Corporation is a global technology leader producing software, hardware, and cloud services including Windows, Office 365, Azure cloud platform, Xbox gaming, and Surface devices. Industry: Software & Cloud Computing
Microsoft currently has 377 open roles on FindRole.
Listed pay typically runs $119,800–$234,700 across 355 roles with salary data.
At a glance
As a Senior Site Reliability Engineer (SRE) in the HPC team, you will ensure the reliability and availability of high-performance computing clusters used for training and inference of MAI models. Your daily tasks include designing monitoring systems, automating deployments, leading incident management, and collaborating with ML engineers to enhance developer experience. You must be proficient in Kubernetes, Docker, CI/CD pipelines, public cloud platforms like Azure or AWS, and observability tools such as Grafana and Datadog. Strong programming skills in Python, Go, or Bash are essential, along with expertise in distributed systems and networking. This role offers the opportunity to work on cutting-edge infrastructure that impacts millions of users through reliable AI deployments.