Site Reliability Engineer Lead - Senior Vice President

Citi

Remote

Quick summary

Work type: Remote
Location: Remote
Salary: $176,720–$265,080 / yr
Posted: 5 days ago

Market check

Salary context

Above market

How this pay compares to similar roles

[Chart: pay scale from $125k to $280k, marking similar roles at $177k and this role at $221k, with a shaded band where most similar roles pay]

This role pays more than 83% of similar roles. Most similar roles pay $149,731–$205,000. At the midpoint, this role pays about $221k versus about $177k for comparable roles.

Based on 240 similar postings.
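
As a rough check on the midpoint figures above, the arithmetic can be sketched in Python. This assumes the $177k comparison figure is the midpoint of the $149,731–$205,000 band, which the posting does not state outright; the function name `midpoint` is illustrative only.

```python
def midpoint(low: float, high: float) -> float:
    """Return the midpoint of a salary range."""
    return (low + high) / 2


# Listed salary range for this role.
role_mid = midpoint(176_720, 265_080)

# Band where most similar roles pay (assumed basis for the $177k figure).
similar_mid = midpoint(149_731, 205_000)

# Relative premium of this role's midpoint over the comparison midpoint.
premium = role_mid / similar_mid - 1

print(f"This role midpoint:     ${role_mid:,.0f}")
print(f"Similar roles midpoint: ${similar_mid:,.0f}")
print(f"Premium at midpoint:    {premium:.1%}")
```

Under that assumption, both midpoints round to the figures quoted above, and the role's midpoint sits a little under 25% above the comparison midpoint.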

Employer

About Citi

Citi is one of the world’s most trusted financial institutions, proudly serving millions of customers across the United States.

Citi currently has 391 open roles on FindRole.

Listed pay typically runs $125,760–$188,640 across 361 roles with salary data.

At a glance

TL;DR · Site Reliability Engineer Lead - Senior Vice President

The Site Reliability Engineer (SRE) is a strategic professional who joins the Production Management team to improve the reliability and efficiency of Citi's applications and services. The role drives end-to-end observability and resiliency strategies and collaborates across departments to keep services stable and scalable. Key responsibilities include fostering a culture of transparency and accountability, ensuring compliance with regulatory requirements, and overseeing advanced recovery testing practices. The SRE works closely with development teams to apply cloud-native services and automation tools that improve application reliability.

Essential skills include a deep understanding of SRE concepts, proficiency in OpenShift/Kubernetes, and hands-on experience with modern observability tools such as Prometheus and Grafana. Desired skills include expertise in major public clouds, Agile frameworks, and coding languages such as Java or Python. The role requires 10+ years of relevant experience in production management or software development, along with strong analytical and communication skills to influence strategic decisions in a complex, large-scale environment.

What you'll do

  • Ensure critical business applications meet stringent operational resilience requirements.
  • Oversee advanced recovery testing and drive adoption of automation for minimal recovery time.
  • Develop and scale observability solutions using modern tools across the organization.
  • Instrument applications to provide deep insights into system health and performance.
  • Collaborate with development teams to enhance application reliability through cloud native services.

What we're looking for

  • Over 10 years of professional experience in production management or software development with a focus on Site Reliability Engineering.
  • Deep understanding and practical application of SRE concepts including SLOs, SLIs, error budgets, and toil reduction.
  • Expertise in deploying, managing, and troubleshooting applications on OpenShift/Kubernetes and proficiency with Infrastructure as Code tools like Ansible and Terraform.
  • Hands-on experience with modern observability tools such as Prometheus, Grafana, Loki, Mimir, Tempo, and AppDynamics for metrics, logging, and tracing.
  • Demonstrable experience in disaster recovery planning, resiliency testing, and designing fault-tolerant distributed systems.
  • Strong communication skills to effectively collaborate across multiple business and technical teams and present technical strategies to senior executives.
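
For readers less familiar with the SLO, SLI, and error budget terms in the list above, here is a minimal sketch of how they relate. The 99.9% target, the 30-day window, the request counts, and both function names are hypothetical illustrations, not figures or tooling from Citi.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime, in minutes, for an availability SLO over a window."""
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return (1 - slo_target) * window_days * 24 * 60


def budget_remaining(slo_target: float, good_events: int, total_events: int) -> float:
    """Fraction of the error budget still unspent, based on event counts."""
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    allowed_bad = (1 - slo_target) * total_events  # failures the SLO tolerates
    actual_bad = total_events - good_events        # failures actually observed
    return 1 - actual_bad / allowed_bad


# Hypothetical service: 99.9% availability SLO, one million requests this window.
slo = 0.999
sli = 999_500 / 1_000_000  # SLI: measured fraction of good requests

print(f"Downtime budget: {error_budget_minutes(slo):.1f} minutes per 30 days")
print(f"Measured SLI: {sli:.4%}")
print(f"Budget remaining: {budget_remaining(slo, 999_500, 1_000_000):.0%}")
```

In this sketch the SLI is the measured fraction of good requests, the SLO is the target for that fraction, and the error budget is the gap the target leaves for failures. A negative remaining budget would mean the SLO has been missed for the window.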

More like this

Similar roles

Senior Manager, Site Reliability Engineering

Oracle

Reston, Virginia · 30 days ago
Kubernetes, Docker, CI/CD, AWS, Terraform, Python, PostgreSQL, Prometheus, Grafana, Ansible, Git, Jenkins, Linux, DevOps, Nginx, SSL/TLS, RESTful APIs, JSON, YAML, Scalability

VPII, Head of Site Reliability Engineering

LPL Financial

Charlotte, NC · 5 days ago · $132,767–$221,347
Kubernetes, Docker, Terraform, AWS, CI/CD, MELT, Prometheus, Grafana, Python, PostgreSQL, GitLab, Ansible, Nagios, Zabbix, Jenkins, GitHub, Slack, Confluence, Jira, Scrum