Site Reliability Engineer II

Microsoft

Quick summary

Work type: On-site
Location: (not listed)
Salary: $102,100–$202,200 / yr
Posted: 71 days ago

Market check

Salary context

Competitive pay

How this pay compares to similar roles

[Chart: pay scale from $89k to $227k, marking this role's midpoint ($152k), the midpoint for similar roles ($173k), and a shaded band where most similar roles pay.]

About 66% of similar roles pay more than this one. Most pay $142,400–$202,812, the shaded band in the chart above. At the midpoint, this role pays about $152k, versus about $173k for comparable roles.

Based on 240 similar postings.

Employer

About Microsoft

Microsoft Corporation is a global technology company that produces software, hardware, and cloud services, including Windows, Office 365, the Azure cloud platform, Xbox gaming, and Surface devices.

Industry: Software & Cloud Computing

Microsoft currently has 622 open roles on FindRole.

Listed pay typically runs $119,800–$234,700 across 571 roles with salary data.


At a glance

TL;DR · Site Reliability Engineer II

Join the Security & Compliance team within Microsoft’s M365 Sovereign Clouds organization as a senior engineer. You will work on distributed systems at massive scale, automate operations, and build disaster recovery capabilities for highly regulated sovereign cloud environments. Day-to-day responsibilities include responding to incidents during on-call rotations, writing scripts to automate scalable processes, designing telemetry pipelines, troubleshooting issues using AI/ML, deploying changes through a safe deployment process, and sharing insights with product engineering teams. You will need proficiency in a language such as C#, Java, JavaScript, or Python, experience with large-scale cloud systems, and an active TS/SCI clearance. The role involves complex security challenges: the services protect millions of messages and documents daily and must stay reliable and performant for critical customers.

What you'll do

  • Respond to incidents during on-call rotations, troubleshooting issues and deploying fixes.
  • Write code or scripts to automate scalable operations processes across product components.
  • Design and maintain telemetry pipelines for monitoring product component metrics.
  • Troubleshoot problems affecting the availability, security, reliability, and performance of features.
  • Create, test, and deploy changes through a safe deployment process to enhance observability.
  • Share insights and best practices via documented artifacts to improve system development.

What we're looking for

  • Master's in Computer Science or related field with 3+ years of coding experience, or
  • Bachelor's in Computer Science or related field with 5+ years of coding experience.
  • At least 2 years of technical experience working with large-scale cloud or distributed systems.
  • Proficiency in programming languages such as C, C++, C#, Java, JavaScript, or Python.
  • Active TS/SCI clearance and willingness to upgrade to TS/SCI with polygraph.
  • Strong problem-solving skills for troubleshooting complex issues.

More like this

Similar roles

Site Reliability Engineer

Microsoft

US · 31 days ago · $102,100–$202,200
Python, JavaScript, Docker, Kubernetes, Terraform, Azure, CI/CD, PostgreSQL, SQL, Prometheus, Grafana, Git, RESTful APIs, OAuth, SAML, Zero-Touch Deployment, M365 Services, Exchange Online Protection, Microsoft Defender for Office

Site Reliability Engineer

Microsoft

Redmond, WA (+1 more) · 3 days ago · $119,800–$234,700
Azure, Terraform, Kubernetes, Docker, PowerShell, Python, Bash, ARM templates, Azure Bicep, Spark, Hadoop, CI/CD, PostgreSQL, Git, Azure Container Apps, AKS, ACI, Event Hubs, Synapse

Site Reliability Engineer II

CME Group

Chicago, IL · 60 days ago · $93,900–$156,500
Google Cloud Platform, Kubernetes, Python, Bash, OpenTelemetry, Splunk, Prometheus, Grafana, Linux, Distributed systems, Networking, HTTP, TCP, UDP, IP, Agile, CI/CD
Hybrid

Site Reliability Engineer II

CME Group

New York, NY (+1 more) · 48 days ago · $93,900–$156,500
Python, Bash, Google Cloud Platform (GCP), Kubernetes, Prometheus, Grafana, Dynatrace, New Relic, Moogsoft, BigPanda, LLMs, LangChain, LlamaIndex, PagerDuty, AIOps, OpenTelemetry, Splunk, Linux, AI, ML
Hybrid

Site Reliability Engineer II

The Walt Disney Company

Remote (New York, NY) · 40 days ago · $123,000–$165,000
AWS, Kubernetes, Terraform, Python, Go, Docker, CI/CD, Prometheus, Grafana, Bash, Jenkins, Infrastructure-as-Code, GitOps, SLO/SLI, Service mesh, Performance testing, Message queues, AI-assisted development tools
Remote

Site Reliability Engineer II

Microsoft

Redmond, WA (+1 more) · 25 days ago · $102,100–$202,200
Python, Java, Go, C#, CI/CD, Terraform, AWS, Kubernetes, Docker, Prometheus, Grafana, PostgreSQL, Linux, Git, Ansible, Nginx, SSL/TLS, OAuth, RESTful APIs, JSON