Site Reliability Engineer - CTJ - Secret

Microsoft

Quick summary

Work type: On-site
Location: Redmond, WA
Salary: $102,100–$202,200 / yr
Posted: 8 days ago
Closes: Dec 5, 2026

Market check

Salary context

Below market

How this pay compares to similar roles

[Chart: pay scale from $88k to $237k, marking similar roles at $176k and this role at $152k, with a shaded band where most similar roles pay]

This role pays less than 70% of similar roles do. Most pay $142,837–$209,750 (the shaded band above). At the midpoint, this role pays about $152k, versus about $176k for comparable roles.

Based on 240 similar postings.

Employer

About Microsoft

Microsoft Corporation is a global technology company that produces software, hardware, and cloud services, including Windows, Office 365, the Azure cloud platform, Xbox gaming, and Surface devices.

Industry: Software & Cloud Computing

Microsoft currently has 1,578 open roles on FindRole.

Listed pay typically runs $119,800–$234,700 across 1,406 roles with salary data.

At a glance

TL;DR · Site Reliability Engineer - CTJ - Secret

Join the team as a Senior Site Reliability Engineer responsible for the reliability and operational health of critical Substrate components in highly regulated environments. You will serve as an on-call engineer, responding to incidents independently and leading post-incident reviews that improve service stability through better automation and monitoring. Day-to-day work includes developing production-quality code, collaborating with software engineering teams to build operability into designs, and maintaining alerting aligned with SLOs. Proficiency in cloud or distributed systems, along with experience in software engineering, network engineering, or systems administration, is essential. The role also requires an understanding of security screenings and the ability to work within highly regulated domains, so that you can contribute to the reliability and scalability of large-scale systems.

What you'll do

  • Own and maintain the reliability and operational health of Substrate components or services.
  • Independently respond to production incidents as part of an on-call rotation.
  • Design and implement automation to reduce operational workload and improve service stability.
  • Develop monitoring, alerting, and telemetry that support SLOs and operational metrics.
  • Lead post-incident reviews focused on root cause analysis and on implementing durable fixes.
  • Collaborate with software engineering teams to embed reliability into service design.

What we're looking for

  • At least 4 years of technical experience in software engineering, network engineering, or systems administration.
  • Ability to independently respond to and resolve production incidents.
  • Ability to design and implement automation for operational efficiency and service stability.
  • Ability to develop and maintain monitoring, alerting, and telemetry for SLOs and metrics.
  • Ability to lead post-incident reviews focused on root cause analysis and durable fixes.
  • Ability to collaborate with software engineering teams to embed reliability into service design.

More like this

Similar roles

| Microsoft Careers

Microsoft

US · 85 days ago
Azure, Kubernetes, Docker, CI/CD, Python, Go, Terraform, Prometheus, Grafana, AI, ML, Telemetry, SDP, PostgreSQL, SQL, Git, Linux, Windows Server, DevOps, SRE, Cloud Security, Capacity Planning
Hybrid

Site Reliability Engineer |||

CME Group

Chicago, IL · 133 days ago · $100,700–$167,800
GCP, Docker, Kubernetes, Python, Java, Oracle, Postgres, BigQuery, SLO, SLI, SLA, OpenTelemetry, Splunk, Prometheus, Grafana, CI/CD, Bamboo, JIRA, Git

Site Reliability Engineer

Equifax

St. Louis, Missouri +1 · 62 days ago
AWS, GCP, Terraform, Jenkins, Python, Bash, Docker, Kubernetes, CI/CD, Prometheus, PostgreSQL, Linux, Windows, Ansible, Chef
Hybrid

Site Reliability Engineer

Shopify

Europe · 46 days ago
Kubernetes, Docker, CI/CD, Python, Go, PostgreSQL, AWS, GCP, Prometheus, Grafana, Terraform, GitOps