Senior Lead Site Reliability Engineer - Manager-AI/ML and Data Platforms

JPMorgan Chase

Quick summary

Work type
On-site
Location
Jersey City, NJ; Dallas, TX
Salary
$171,000–$260,000 / yr
Posted
today

Market check

Salary context

Competitive pay

How this pay compares to similar roles

Pay scale shown: $160k to $271k. Similar roles: $219k. This role: $216k.

This role pays more than 51% of similar roles. Most pay $192,050–$246,150 — the shaded band above. At the midpoint, this role pays about $216k versus about $219k for comparable roles.

Based on 240 similar postings.

Employer

About JPMorgan Chase

JPMorgan Chase & Co. is a global financial services firm and one of the largest banks in the world, offering investment banking, commercial banking, asset management, and consumer financial services.

JPMorgan Chase currently has 436 open roles on FindRole.

Listed pay typically runs $152,000–$215,000 across 230 roles with salary data.

At a glance

TL;DR · Senior Lead Site Reliability Engineer - Manager-AI/ML and Data Platforms

Join JPMorgan Chase’s Chief Data & Analytics Office (CDAO) AI/ML & Data Platforms team as a Senior Lead Site Reliability Engineer, where you will collaborate with stakeholders to establish non-functional requirements and availability targets for large-scale data platforms. Your daily tasks include designing robust observability solutions, mentoring technologists on reliability practices, and leveraging enterprise-authorized AI capabilities to enhance operational decision-making while ensuring compliance with security standards. You will drive the adoption of AI-assisted workflows across the software development lifecycle, focusing on traceability and auditability in application and data platform environments. The role demands expertise in tools like Grafana, Prometheus, and Splunk, along with proficiency in AWS platforms, managed data services such as Databricks, and containerization technologies like Docker and Kubernetes.

What you'll do

  • Define and embed non-functional requirements and availability targets for large-scale data platforms.
  • Design and implement observability and reliability solutions for complex systems without incurring technical debt.
  • Lead the adoption of AI-assisted workflows across software development lifecycle practices, ensuring security and auditability.
  • Drive debugging and evolution of critical components by understanding application and infrastructure interdependencies.
  • Provide guidance on scalable data platform infrastructure and engineering best practices to support firm growth.
  • Validate outputs from enterprise-authorized AI capabilities used in reliability engineering workflows.
  • Set team practices for safe AI usage in operations, maintaining compliance with risk controls.

What we're looking for

  • Formal training or certification in site reliability engineering and 5+ years of applied experience
  • Advanced understanding of SLI/SLO/SLA, error budgets, and observability tools like Grafana, Prometheus, and Splunk
  • Experience using enterprise-authorized AI capabilities to enhance reliability workflows, with a habit of validating their outputs
  • Ability to set team practices for safe AI usage in operations while ensuring compliance with risk controls
  • Strong knowledge of distributed systems, resiliency, testing, operational stability, and disaster recovery techniques

More like this

Similar roles

Senior AI Site Reliability Engineer

Oracle

30 days ago
AWS, Azure, OCI, Kubernetes, Terraform, Python, Java, Go, Docker, Prometheus, Grafana, CI/CD, Vertica, Snowflake, Tableau, Power BI, Oracle Analytics, LangChain, AutoGPT, Jenkins

Site Reliability Engineer III

JPMorgan Chase

Jersey City, NJ +1 · 14 days ago · $133,000–$185,000
AWS, Kubernetes, Python, Databricks, Snowflake, CI/CD, Grafana, Prometheus, PySpark, Java, Spring Boot, .Net, AI/ML, SLO, SLI, SLA, Error Budgets, Observability, Terraform, Dynatrace, Datadog, Splunk

Lead AI Engineer, Data Solutions

Salesforce

Remote (San Francisco, CA) +3 · 29 days ago · $172,500–$260,100
Python, ML models, LLMs, APIs, Spark, Airflow, Dagster, Snowflake, BigQuery, A/B testing, CI/CD, Prometheus, Grafana, Kubernetes, AWS, Terraform