About Goldman Sachs

Goldman Sachs is a leading global investment banking, securities, and investment management firm providing financial services to corporations, financial institutions, governments, and individuals.

Goldman Sachs currently has 187 open roles on FindRole.

Listed pay typically runs $130,000–$250,000 across 60 roles with salary data.

At a glance

TL;DR · Vice President, Engineering - SRE Platforms

As a Vice President on the SRE Platforms team at Goldman Sachs in Dallas, you will set the strategic direction for the availability, scalability, and performance of critical platform services. The role involves architecting highly available systems, developing advanced automation tools, managing complex incidents, and conducting post-mortem analyses to improve system resilience. You will collaborate with development teams on capacity planning and observability strategies using technologies like Prometheus, Grafana, and Kubernetes. You should have a strong background in Java, Python, or Go and be proficient in cloud platforms (AWS, GCP), containerization, and CI/CD practices. Experience with distributed databases, messaging systems like Kafka, and prompt engineering is preferred. The role demands exceptional technical vision, mentorship skills, and the ability to influence global teams and executive stakeholders.

Skills

Python, Java, Go, AWS, GCP, Docker, Kubernetes, Terraform, Puppet, Chef, Ansible, Prometheus, Grafana, ELK stack, Datadog, PagerDuty, Jenkins, GitLab, Maven, CI/CD, Linux, Networking, Distributed systems, Elasticsearch, BigQuery, Kafka

What you'll do

Drive strategic direction for availability, scalability, and performance of critical applications.
Lead design and implementation of resilient infrastructure and application architectures.
Develop advanced automation solutions to optimize operational workflows across the enterprise.
Conduct root cause analysis for systemic issues and implement preventative measures.
Embed reliability into application design from inception and lead capacity planning initiatives.
Define monitoring strategies with multi-user query capabilities for deep insights.
Provide technical vision, conduct code reviews, and mentor senior engineers.

What we're looking for

At least 6 years of hands-on experience in Site Reliability Engineering.
Expertise in cloud platforms (AWS, GCP), containerization, and orchestration technologies.
Mastery of Infrastructure as Code tools and configuration management systems.
Profound understanding of Linux internals, networking, distributed systems, and performance tuning.
Advanced proficiency in monitoring, alerting, logging, and tracing solutions.
Strong foundation in databases and CI/CD practices, along with complex problem-solving skills.