Principal Site Reliability Engineering Manager

Microsoft

Quick summary

Work type: On-site
Location: —
Salary: $142,800–$274,800 / yr
Posted: 71 days ago

Market check

Salary context

Above market

How this pay compares to similar roles

Chart: similar roles $188k; this role $209k; axis from $127k to $291k, with a shaded band where most similar roles pay.

This role pays more than 67% of similar roles, most of which pay $159,375–$216,423. At the midpoint, this role pays about $209k versus about $188k for comparable roles.

Based on 239 similar postings.

Employer

About Microsoft

Microsoft Corporation is a global technology leader producing software, hardware, and cloud services, including Windows, Office 365, the Azure cloud platform, Xbox gaming, and Surface devices.

Industry: Software & Cloud Computing

Microsoft currently has 622 open roles on FindRole.

Listed pay typically runs $119,800–$234,700 across 571 roles with salary data.

At a glance

TL;DR · Principal Site Reliability Engineering Manager

As a Principal Site Reliability Engineering Manager in Microsoft’s ES365 organization, you will lead a diverse team of SREs to improve the reliability of large-scale engineering systems used by multiple divisions. Day to day, you will partner with engineers and product managers to design and maintain reliable services, drive cross-organizational alignment through shared standards, and implement service level objectives (SLOs) and service level indicators (SLIs). You will also foster a culture of continuous improvement by leading incident response and conducting Engineering Service Reviews. The role requires expertise in cloud services, particularly Azure, in containerization and orchestration, and in observability practices such as metrics, logs, and tracing. You also need experience reducing toil through automation and improving operational efficiency across build, validation, and deployment systems. This position suits someone passionate about coaching and people leadership within a high-functioning team focused on customer impact and reliability at scale.

Skills

Azure, Kubernetes, Docker, CI/CD, Prometheus, Grafana, Python, Go, PostgreSQL, Terraform, AWS, GitOps, SLOs, SLIs, Observability, Metrics/Logs/Tracing, Blameless Post-Incident Reviews, Self-Healing Systems, Safe Rollouts, Automated Remediation

What you'll do

Partner with engineers to design and maintain reliable and resilient services.
Drive cross-organizational alignment through partnerships and shared reliability standards.
Build and retain a team of Site Reliability Engineers, providing mentorship and coaching.
Define and implement SLOs/SLIs for critical engineering systems to guide continuous improvement.
Lead incident management, including blameless post-incident reviews and corrective actions.
Drive automation to reduce operational toil and improve efficiency in build and deployment systems.

What we're looking for

5+ years of experience leading large-scale initiatives involving multiple engineers.
Proven track record in reliability engineering for developer or platform services.
Experience in cross-disciplinary collaboration to align reliability priorities.
Expertise in architecting and operating enterprise-scale distributed cloud services.
Strong background in managing engineering systems processes with reliability practices.
Leadership in incident response, automation, and observability (metrics/logs/traces).
Deep understanding of containerization and orchestration technologies.
