Senior Site Reliability Engineer | Microsoft Careers

Microsoft

Actively hiring
US Posted 106 days ago $119,800–$234,700 / year

At a glance

AI generated

TL;DR

As a Senior Site Reliability Engineer on Microsoft Teams’ Core Services Infrastructure and Security team, you will play a pivotal role in maintaining the reliability, security, and performance of mission-critical distributed systems at global scale. Your responsibilities include designing, implementing, and operating secure network and infrastructure services, enhancing monitoring and alerting mechanisms, and troubleshooting complex issues related to traffic routing, gateway behavior, and network security policies. You will also serve as a Designated Responsible Individual (DRI) on a rotational basis, participating in root cause analyses and driving continuous improvements through data-driven analysis. This role requires expertise in core networking concepts, cloud infrastructure, microservices patterns, and automation frameworks, with a focus on large-scale systems and active-active architectures. You will collaborate closely with security, networking, and compliance teams to deliver integrated solutions that enhance the reliability and efficiency of Microsoft Teams’ global communication experiences.

Skills

Azure, Kubernetes, Terraform, Python, Go, Docker, CI/CD, Prometheus, Grafana, GitOps, Infrastructure-as-Code, DNS, CDN, TLS, Certificate Lifecycle Management, Network Security, Cloud Security Controls, Identity-Driven Security Policies, Microservices Patterns, API Gateways, Global Routing Architectures, Automation Frameworks, Scripting, Distributed Tracing, Metric Analysis, Log Analysis

What you'll do

  • Design, implement, and operate secure network and infrastructure services supporting Microsoft Teams’ microservices environment.
  • Develop monitoring, alerting, and automated recovery mechanisms to enhance system reliability.
  • Troubleshoot complex issues related to traffic routing, gateway behavior, DNS, CDN interactions, and network security policies.
  • Serve as a Designated Responsible Individual (DRI) for incident management and root cause analysis.
  • Optimize service performance and availability through data-driven analysis using metrics, logs, and distributed tracing.
  • Identify automation opportunities to reduce manual tasks and increase engineering productivity.

What we're looking for

  • Master's Degree in Computer Science or 2+ years of technical experience in software engineering, network engineering, or systems administration.
  • Experience with core networking concepts including TCP/IP fundamentals, routing, load balancing, CDN, and firewalling.
  • Ability to diagnose and remediate performance or availability issues using logs, metrics, traces, and standard network troubleshooting tools.
  • Hands-on experience operating services in a cloud environment like Azure.
  • 3+ years of technical experience working with large-scale cloud or distributed systems.
  • Experience with network security, cloud security controls, identity-driven security policies, and certificate management at scale.
  • Familiarity with large-scale cloud infrastructure (IaaS), microservices patterns, API gateways, and global routing architectures.

Market check

Salary context

This $119,800–$234,700 range sits above 72% of similar postings on FindRole.

Peer median band

$119,800–$204,470

Median floor and ceiling across peers.

Typical midpoint (25–75%)

$142,400–$183,000

Middle half of comparable postings.

Based on 240 comparable postings.

* 240 is the maximum number of comparable postings sampled.

Employer

About Microsoft

Microsoft Corporation is a global technology leader producing software, hardware, and cloud services, including Windows, Office 365, the Azure cloud platform, Xbox gaming, and Surface devices.

Industry: Software & Cloud Computing

Microsoft currently has 451 open roles on FindRole.

Listed pay typically runs $119,800–$234,700 across 417 roles with salary data.

More like this

Similar roles

Site Reliability Engineer II | Microsoft Careers

Microsoft

US 161 days ago $100,600–$199,000
Python, Docker, Kubernetes, Terraform, AWS, CI/CD, Git, Linux, Azure, PostgreSQL, Ansible, Jenkins, Prometheus, Grafana, JSON, YAML, REST, OAuth, PCI DSS

Careers - Senior Site Reliability Engineer

Block

New York, New York, US 47 days ago $189,000–$283,600
AWS, Terraform, Kubernetes, Istio, Event driven architectures, CI/CD, DataDog, LaunchDarkly, Java, Kotlin, gRPC, Protocol Buffers, MySQL, Vitess, DynamoDB, HTTP, JSON

Careers - Senior Site Reliability Engineer

Block

Bay Area, California, US 47 days ago $189,000–$283,600
AWS, Terraform, Kubernetes, Istio, Event driven architectures, CI/CD, DataDog, LaunchDarkly, Java, Kotlin, gRPC, Protocol Buffers, MySQL, Vitess, DynamoDB, HTTP, JSON