Site Reliability Engineer Lead

Goldman Sachs

Quick summary

Work type: On-site
Location: Dallas, TX
Posted: 1 day ago
Nearby: 99+ roles within 25 mi

Market check

Salary context

How this pay compares to similar roles

[Pay scale chart: range $134k to $228k; similar roles marked at $181k]

This listing doesn't post a salary. Most similar roles pay $148,830–$212,750.

Based on 240 similar postings.

Employer

About Goldman Sachs

Goldman Sachs is a leading global investment banking, securities, and investment management firm providing financial services to corporations, financial institutions, governments, and individuals.

Goldman Sachs currently has 187 open roles on FindRole.

Listed pay typically runs $130,000–$250,000 across 60 roles with salary data.

At a glance

TL;DR · Site Reliability Engineer Lead

The Endpoint Compute SRE Lead role within the Workplace Engineering organization focuses on reliability engineering and operational excellence across endpoint compute platforms and foundational services. This senior position defines service-level objectives, observability strategies, and failure models to maintain high availability at enterprise scale. Key responsibilities include establishing observability standards for enrollment success rates, access health, policy delivery latency, and application availability; setting up error budget frameworks; leading incident response; and driving automation to reduce manual remediation.

The ideal candidate has extensive experience in SRE or platform operations, with a strong background in operating endpoint compute platforms and core supporting services. They should have deep knowledge of device lifecycle management, identity and access dependencies, and profile orchestration, and be able to influence architecture with data-driven insights. The role requires collaboration with multiple teams to maintain robust reliability metrics and to communicate operational priorities clearly to leadership.

Skills

Site Reliability Engineering, Observability, SLOs, SLIs, Error Budgets, Incident Management, Automation, Cloud Services, Terraform, AWS, Kubernetes, PostgreSQL, Python, Go, CI/CD, Grafana, Prometheus, GitOps

What you'll do

Own end-to-end reliability of endpoint compute platforms and supporting services.
Define observability standards for enrollment success rates, access health, policy delivery latency, and application availability.
Establish SLOs and SLIs for key endpoint services to guide operational priorities.
Lead incident response and drive post-incident reviews focused on systemic corrections.
Drive automation to reduce manual remediation and repeat incidents in endpoint services.
Partner with Technology Risk and Security teams to support operational resilience assessments.

What we're looking for

8+ years of experience in SRE, platform operations, or workplace infrastructure roles.
Proven ability to define and implement observability frameworks, SLOs/SLIs, and incident management models.
Strong systems thinking across endpoint lifecycle, access, and service dependencies.
Experience operating endpoint compute platforms at enterprise scale.
Excellent documentation and communication skills for conveying reliability posture.
Preferred: Deep understanding of device lifecycle, identity/access dependencies, and profile/policy orchestration.
Preferred: Experience in regulated or high-assurance environments.

Similar roles

Vice President, Engineering - SRE Platforms

Goldman Sachs

Dallas, TX · 1 day ago

Python Java Go AWS GCP Docker Kubernetes Terraform Puppet Chef Ansible Prometheus Grafana ELK_stack Datadog PagerDuty Jenkins GitLab Maven CI/CD Linux Networking Distributed_systems Elastic_Search Big_Query Kafka

Principal Site Reliability Engineering Manager

Microsoft

73 days ago · $142,800–$274,800

Azure Kubernetes Docker CI/CD Prometheus Grafana Python Go PostgreSQL Terraform AWS GitOps SLOs SLIs Observability MetricstoLogsTracing BlamelessPostIncidentReviews SelfHealingSystems SafeRollouts AutomatedRemediation

Lead Director, Site Reliability Engineering, Client Experience

CVS Health

Remote · 32 days ago · $144,200–$288,400

Azure GCP Kubernetes CI/CD SLOs SLIs Terraform Docker Prometheus Grafana PostgreSQL Python Go AWS OpenShift AI‑Ops observability microservices APIs

Compliance Engineering, Site Reliability Engineer SRE, Associate

Goldman Sachs

Dallas, TX · 1 day ago

Python Java Perl Prometheus Grafana ELK OpenTelemetry AWS Azure GCP CI/CD Distributed Tracing Logging MetricstoOLS Relational Databases Hadoop Big Data Technologies SRE Error Budgeting

Senior Platform Engineer, Site Reliability Engineer (SRE)

AT&T

Remote (Norway) · 40 days ago

Kubernetes AWS Terraform GitLab Jenkins Ansible Python Grafana Zabbix OpenTelemetry HAProxy MariaDB Kafka OpenShift CI/CD Prometheus Shell scripting Linux administration

Site Reliability Engineer

Comcast

Downingtown, PA +3 · 13 days ago · $114,997–$180,337

AWS Kubernetes Terraform Docker CI/CD Prometheus Grafana Git Concourse GoCD Bash Python Ansible ECS ECR
