Site Reliability Engineer Lead

Goldman Sachs

Quick summary

Work type
On-site
Location
Dallas, TX
Posted
1 day ago

Market check

Salary context

How this pay compares to similar roles

[Chart: salary scale from $134k to $228k; similar roles marked at $181k]

This listing doesn't post a salary. Most similar roles pay $148,830–$212,750.

Based on 240 similar postings.

Employer

About Goldman Sachs

Goldman Sachs is a leading global investment banking, securities, and investment management firm providing financial services to corporations, financial institutions, governments, and individuals.

Goldman Sachs currently has 187 open roles on FindRole.

Listed pay typically runs $130,000–$250,000 across 60 roles with salary data.

At a glance

TL;DR · Site Reliability Engineer Lead

The Endpoint Compute SRE Lead role within the Workplace Engineering organization focuses on reliability engineering and operational excellence across endpoint compute platforms and foundational services. This senior position involves defining service-level objectives, observability strategies, and failure models to maintain high availability at enterprise scale. Key responsibilities include establishing observability standards for endpoint services, setting up error budget frameworks, leading incident response, and driving automation to reduce manual remediation. The ideal candidate has 8+ years of experience in SRE or platform operations, with a strong background in operating endpoint compute platforms and their core supporting services. They should have deep knowledge of device lifecycle management, identity and access dependencies, and profile orchestration, and be able to influence architecture with data-driven insights. The role requires close collaboration with multiple teams to maintain robust reliability metrics and to communicate operational priorities clearly to leadership.

What you'll do

  • Own end-to-end reliability of endpoint compute platforms and supporting services.
  • Define observability standards for enrollment success rates, access health, policy delivery latency, and application availability.
  • Establish SLOs and SLIs for key endpoint services to guide operational priorities.
  • Lead incident response and drive post-incident reviews focused on systemic corrections.
  • Drive automation to reduce manual remediation and repeat incidents in endpoint services.
  • Partner with Technology Risk and Security teams to support operational resilience assessments.

What we're looking for

  • 8+ years of experience in SRE, platform operations, or workplace infrastructure roles.
  • Proven ability to define and implement observability frameworks, SLOs/SLIs, and incident management models.
  • Strong systems thinking across endpoint lifecycle, access, and service dependencies.
  • Experience operating endpoint compute platforms at enterprise scale.
  • Excellent documentation and communication skills for conveying reliability posture.
  • Preferred: Deep understanding of device lifecycle, identity/access dependencies, and profile/policy orchestration.
  • Preferred: Experience in regulated or high-assurance environments.

More like this

Similar roles

Vice President, Engineering - SRE Platforms

Goldman Sachs

Dallas, TX · 1 day ago
Python, Java, Go, AWS, GCP, Docker, Kubernetes, Terraform, Puppet, Chef, Ansible, Prometheus, Grafana, ELK stack, Datadog, PagerDuty, Jenkins, GitLab, Maven, CI/CD, Linux, Networking, Distributed systems, Elasticsearch, BigQuery, Kafka

Principal Site Reliability Engineering Manager

Microsoft

73 days ago · $142,800–$274,800
Azure, Kubernetes, Docker, CI/CD, Prometheus, Grafana, Python, Go, PostgreSQL, Terraform, AWS, GitOps, SLOs, SLIs, Observability, Metrics-to-logs tracing, Blameless post-incident reviews, Self-healing systems, Safe rollouts, Automated remediation

Site Reliability Engineer

Comcast

Downingtown, PA (+3 locations) · 13 days ago · $114,997–$180,337
AWS, Kubernetes, Terraform, Docker, CI/CD, Prometheus, Grafana, Git, Concourse, GoCD, Bash, Python, Ansible, ECS, ECR