Site Reliability Engineer, Asset & Wealth Management
Quick summary
- Work type: On-site
- Location: Richardson, TX
- Posted: today
Employer
About Goldman Sachs
Goldman Sachs is a leading global investment banking, securities, and investment management firm providing financial services to corporations, financial institutions, governments, and individuals.
Listed pay typically runs $130,000–$250,000 across 60 roles with salary data.
At a glance
TL;DR · Site Reliability Engineer, Asset & Wealth Management
As a Vice President in Site Reliability Engineering at Goldman Sachs, you will lead the strategic direction for ensuring the availability, scalability, and performance of critical platform services. Your role involves architecting highly available systems, developing advanced automation tools, managing complex incidents, and conducting post-mortem analyses to enhance system resilience. You will collaborate with development teams on capacity planning and observability strategies, providing technical vision and mentorship while evaluating cutting-edge technologies for integration. The position requires extensive experience in SRE, proficiency in languages like Java, Python, or Go, expertise in cloud platforms (AWS, GCP), containerization tools (Docker, Kubernetes), and IaC solutions (Terraform). You will work on large-scale distributed systems, ensuring reliability across Goldman Sachs’ global operations.
What you'll do
- Drive strategic reliability and performance for mission-critical applications and services.
- Lead the design and implementation of resilient infrastructure and application architectures.
- Develop advanced automation solutions to optimize operational workflows across the enterprise.
- Conduct root cause analysis and implement preventative measures for system stability.
- Embed reliability into application design from inception, leading comprehensive capacity planning.
- Define and implement monitoring strategies to provide deep insights into system performance.
What we're looking for
- At least 6 years of hands-on experience in Site Reliability Engineering at an enterprise level.
- Expertise in cloud platforms (AWS, GCP) and in containerization and orchestration technologies (Docker, Kubernetes).
- Mastery of Infrastructure as Code and configuration management tools (Terraform, Puppet, Ansible).
- Advanced proficiency in monitoring, alerting, logging, and tracing solutions (Prometheus, Grafana, ELK stack).
- Strong foundation in databases, distributed systems, and CI/CD practices.
- Exceptional problem-solving abilities with a track record of resolving complex technical challenges.
- Advanced degree in Computer Science or a related technical field involving coding or systems engineering.