Lead Director, Site Reliability Engineering - Client Experience

CVS Health

Remote · Actively hiring
Remote, USA · Richardson, US · Posted 15 days ago · $144,200–$288,400 / year

At a glance

AI generated

TL;DR

The Lead Director of Site Reliability Engineering for Client Experience is a senior leadership role responsible for building and scaling hands-on SRE teams supporting critical Adjudication and Client Experience platforms across Azure and GCP environments. This technical leader will define reliability engineering practices, including SLIs, SLOs, and error budgets, while coaching teams to design resilient cloud architectures and automate operations. Key responsibilities include driving cloud-native reliability patterns, leading incident management and post-mortem analysis, and embedding reliability into the software development lifecycle. The ideal candidate has extensive experience in designing, deploying, and operating distributed systems in cloud environments, managing senior engineers, and implementing SRE best practices at enterprise scale. Proficiency with Kubernetes-based platforms (AKS, GKE), AI-Ops solutions, and observability tools is preferred.

Skills

Azure, GCP, Kubernetes, CI/CD, SLOs, SLIs, Terraform, Docker, Prometheus, Grafana, PostgreSQL, Python, Go, AWS, OpenShift, AI-Ops, observability, microservices, APIs

What you'll do

  • Lead and grow hands-on SRE teams responsible for the reliability of Tier-1 services across Azure and GCP.
  • Establish and enforce SRE best practices including SLIs, SLOs, error budgets, and toil reduction.
  • Review architecture and reliability designs for critical platforms and influence how they handle failure modes.
  • Drive cloud-native reliability patterns such as autoscaling, graceful degradation, and disaster recovery.
  • Own incident management, lead blameless post-mortems, and champion systemic fixes.

What we're looking for

  • 10+ years of progressive experience in engineering or SRE organizations
  • 5+ years managing senior engineers and leaders
  • Hands-on experience designing, deploying, and operating systems in Azure and GCP
  • Proven experience building or scaling SRE practices, including SLOs, SLIs, and incident response
  • Strong background in distributed systems, microservices, APIs, and cloud-native architectures
  • Experience leading platform modernization or reliability transformation initiatives

Market check

Salary context

This $144,200–$288,400 range sits above 72% of similar postings on FindRole.

Peer median band

$138,000–$222,651

Median floor and ceiling across peers.

Typical midpoint (25–75%)

$148,750–$230,062

Middle half of comparable postings.

Based on 239 comparable postings.

* 240 is the maximum number of comparable postings sampled.

Employer

About CVS Health

CVS Health is a leading American healthcare company operating retail pharmacies, pharmacy benefit management services, and a health insurance segment through Aetna, one of the nation's largest health insurers.

Industry: Healthcare & Pharmacy

CVS Health currently has 89 open roles on FindRole.

Listed pay typically runs $118,450–$260,590 across 86 roles with salary data.


More like this

Similar roles

Site Reliability Engineer Lead - Senior Vice President

Citi

Remote (388 Greenwich Street - Tower, US) · 50 days ago · $176,720–$265,080
Kubernetes, OpenShift, Prometheus, Grafana, Terraform, Ansible, Helm, Python, Java, Go, AWS, Google Cloud, Azure, CI/CD, Disaster Recovery, Infrastructure as Code, Observability, SLOs, SLIs, Error Budgets, Chaos Engineering
Remote

Director, Site Reliability Engineering

McDonald’s Corporation

Chicago, Illinois, US · 29 days ago · $178,121–$222,651
AWS, Azure, GCP, Site Reliability Engineering, Agile Methodologies, CI/CD, Vendor Management, Cloud Infrastructure, PaaS, IaaS, Data Analytics, Financial Forecasting, Chargeback Management, Global Vendor Relationships, High-Performance Team Building

Senior Manager, Site Reliability Engineering

Oracle

Reston, Virginia, US · 21 days ago
Kubernetes, Docker, CI/CD, AWS, Terraform, Python, PostgreSQL, Prometheus, Grafana, Ansible, Git, Jenkins, Linux, DevOps, Nginx, SSL/TLS, RESTful APIs, JSON, YAML, Scalability

Principal Site Reliability Engineer

The Walt Disney Company

Remote (USA - FL - Disney's Hollywood Studios - Feature Animation Building, US) · 49 days ago
AWS, Azure, GCP, Terraform, CloudFormation, Ansible, Chef, CI/CD, Docker, Kubernetes, Prometheus, Grafana, Python, Linux, Windows, AI, LLM, PCI, DevOps, SRE, SLI, SLO, SLA
Remote

Principal Site Reliability Engineer

The Walt Disney Company

Remote (USA - FL - Disney's Hollywood Studios - Feature Animation Building, US) · 42 days ago
Akamai Kona Site Defender, WAF, Bot Manager, DevOps, CI/CD, Python, Go, Docker, Terraform, AWS, Azure, Google Cloud, PostgreSQL, MongoDB, Redis, Prometheus, Grafana, Kubernetes, Ansible, Jenkins, GitLab, GitHub
Remote