Member of Technical Staff, High Performance Computing Engineer - MAI SuperIntelligence Team | Microsoft Careers
Microsoft
At a glance
AI generatedJoin our High Performance Computing (HPC) infrastructure team as an experienced Site Reliability Engineer (SRE), where you will blend software engineering with systems engineering to maintain the reliability and efficiency of large-scale distributed AI infrastructure. Your daily tasks include ensuring high uptime for HPC clusters, designing monitoring and alerting systems, building automation tools for deployments and incident response, managing security compliance, and collaborating with ML engineers to enhance developer experience. Ideal candidates have strong proficiency in Kubernetes, Docker, CI/CD pipelines, public cloud platforms like Azure or AWS, and monitoring tools such as Grafana and Datadog, along with expertise in Python, Go, Bash, distributed systems, and large-scale GPU clusters for AI workloads.
Skills
What you'll do
What we're looking for
Market check
This $139,900–$274,800 range sits above 71% of similar postings on FindRole.
Peer median band
$139,900–$234,700
Median floor and ceiling across peers.
Typical midpoint (25–75%)
$167,900–$213,656
Middle half of comparable postings.
Based on 240 comparable postings.
* 240 is the maximum number of comparable postings sampled.
Employer
Microsoft Corporation is a global technology leader producing software, hardware, and cloud services including Windows, Office 365, Azure cloud platform, Xbox gaming, and Surface devices. Industry: Software & Cloud Computing
Microsoft currently has 534 open roles on FindRole.
Listed pay typically runs $119,800–$234,700 across 488 roles with salary data.
Most-posted roles
More like this
Microsoft
Microsoft
Microsoft
Microsoft
Microsoft
Microsoft