Site Reliability Engineer - CTJ - Secret

Microsoft

Quick summary

Work type: On-site
Location: Redmond, WA
Salary: $102,100–$202,200 / yr
Posted: 8 days ago
Closes: Dec 5, 2026
Nearby: 99+ roles within 25 mi

Market check

Salary context

Below market

How this pay compares to similar roles

Chart: midpoint pay for this role ($152k) versus similar roles ($176k), on a scale from $88k to $237k, with a shaded band marking where most similar roles pay.

About 70% of similar roles pay more than this one. Most pay $142,837–$209,750. At the midpoint, this role pays about $152k, compared with about $176k for comparable roles.

Based on 240 similar postings.

Employer

About Microsoft

Microsoft Corporation is a global technology leader producing software, hardware, and cloud services including Windows, Office 365, Azure cloud platform, Xbox gaming, and Surface devices. Industry: Software & Cloud Computing

Microsoft currently has 1578 open roles on FindRole.

Listed pay typically runs $119,800–$234,700 across 1406 roles with salary data.


At a glance

TL;DR · Site Reliability Engineer - CTJ - Secret


Join our team as a Senior Site Reliability Engineer responsible for the reliability and operational health of critical Substrate components in highly regulated environments. As an on-call engineer, you will respond to incidents independently and lead post-incident reviews that improve service stability through better automation and monitoring. Day to day, you will write production-quality code, work with software engineering teams to build operability into designs, and maintain alerting aligned with SLOs. Proficiency in cloud or distributed systems, along with experience in software engineering, network engineering, or systems administration, is essential. The role requires meeting security screening requirements and working within highly regulated domains while contributing to the reliability and scalability of a large-scale system.

Skills

Python Kubernetes Terraform Docker CI/CD Prometheus Grafana PostgreSQL AWS Linux Git Ansible Nginx SRE DevOps

What you'll do

Own and maintain the reliability and operational health of Substrate components or services.
Independently respond to production incidents as part of an on-call rotation.
Design and implement automation to reduce operational workload and improve service stability.
Develop monitoring, alerting, and telemetry to support SLOs and operational metrics.
Lead post-incident reviews focused on root cause analysis and implementing durable fixes.
Collaborate with software engineering teams to embed reliability into service design.

What we're looking for

At least 4 years of technical experience in software engineering, network engineering, or systems administration.
Ability to independently respond to and resolve production incidents.
Ability to design and implement automation for operational efficiency and service stability.
Ability to develop and maintain monitoring, alerting, and telemetry for SLOs and metrics.
Ability to lead post-incident reviews focused on root cause analysis and durable fixes.
Ability to collaborate with software engineering teams to embed reliability into service design.

Similar roles

| Microsoft Careers

Microsoft

US 85 days ago

Azure Kubernetes Docker CI/CD Python Go Terraform Prometheus Grafana AI ML Telemetry SDP PostgreSQL SQL Git Linux Windows Server DevOps SRE Cloud Security Capacity Planning

Hybrid
