Principal Site Reliability Engineer

Upstart

Remote

Quick summary

Work type: Remote
Location: Canada
Salary: $195,300–$270,400 / yr
Posted: 111 days ago
Closes: Sep 1, 2026

Market check

Salary context

Above market

How this pay compares to similar roles

Chart: pay scale from $118k to $287k, with a shaded band where most similar roles pay; similar roles' midpoint is about $181k and this role's midpoint is about $233k.

This role pays more than 80% of similar roles. Most similar roles pay $142,175–$220,450 (the shaded band above). At the midpoint, this role pays about $233k, compared with about $181k for comparable roles.

Based on 239 similar postings.

Employer

About Upstart

Upstart is an AI lending platform that partners with banks and credit unions to expand access to affordable credit using non-traditional variables.

Upstart currently has 40 open roles on FindRole.

Listed pay typically runs $177,200–$245,400 across 40 roles with salary data.

At a glance

TL;DR · Principal Site Reliability Engineer

As a Principal Site Reliability Engineer on Upstart’s SRE team, you will lead the adoption of best practices and mentor engineers across disciplines to improve operational excellence. Your responsibilities include defining long-term reliability strategies, championing distributed tracing and real user monitoring (RUM), and building self-healing systems to minimize downtime. You will collaborate closely with cross-functional teams such as Product Engineering and DevOps to drive enterprise-wide improvements in incident response processes and engineering velocity. The ideal candidate has a balanced background in Software Engineering and SRE, proficiency in Python, Go, and JavaScript/TypeScript, and expertise with observability tools such as Datadog and Prometheus. Experience with Infrastructure as Code (Terraform, CDK) and hands-on work on large-scale or ML-related incidents are also preferred, as is a strong track record of influencing technical and operational roadmaps through data-driven insights.

Skills

Python, Go, JavaScript, TypeScript, Terraform, Datadog, Prometheus, RUM, LLM, GenAI, CI/CD, Kubernetes, Docker, AWS, GCP, Service Mesh, Infrastructure as Code, Self-healing systems, On-call management, Program management

What you'll do

Lead the definition and adoption of SRE principles across engineering teams.
Partner with leadership to shape long-term reliability and observability strategies.
Build self-healing systems to minimize manual intervention and reduce downtime.
Drive improvements in incident response processes, including for ML systems.
Influence technical roadmaps through data-driven insights and hands-on contributions.
Own and deliver cross-functional initiatives from concept to execution.

What we're looking for

Combined experience in Software Engineering and Site Reliability Engineering.
Proven track record as an SRE thought leader and evangelist.
Strong communication and mentoring skills to influence engineers across disciplines.
Proficiency in Python, Go, JavaScript/TypeScript, and Infrastructure as Code tools.
Expertise with observability, distributed tracing, real user monitoring (RUM), frontend performance metrics such as Largest Contentful Paint (LCP), and performance monitoring tools.
Experience with on-call and incident management for large-scale or ML-related incidents.
Hands-on experience using LLM/GenAI to improve SRE efficiency and processes.

Similar roles

Staff Site Reliability Engineer

CME Group

Chicago, IL · 32 days ago · $132,100–$220,100

GCP, Kubernetes, Python, Terraform, ArgoCD, Go, Node.js, CI/CD, Distributed Systems, Generative AI, Agile, PostgreSQL, GitOps, CICD, SLI, SLO, Error Budgets

Hybrid

Principal Site Reliability Engineer

The Walt Disney Company

Remote (USA - FL - Disney's Hollywood Studios - Feature Animation Building, US) · 55 days ago

AWS, Azure, GCP, Terraform, CloudFormation, Ansible, Chef, CI/CD, Docker, Kubernetes, Prometheus, Grafana, Python, Linux, Windows, AI, LLM, PCI, DevOps, SRE, SLI, SLO, SLA

Remote

Principal Site Reliability Engineer

The Walt Disney Company

Remote (Bay Lake, FL) · 48 days ago

Akamai, Kona Site Defender, WAF, Bot Manager, DevOps, CI/CD, Python, Go, Docker, Terraform, AWS, Azure, Google Cloud, PostgreSQL, MongoDB, Redis, Prometheus, Grafana, Kubernetes, Ansible, Jenkins, GitLab, GitHub

Remote

Sr Principal Site Reliability Engineer

The Walt Disney Company

Remote (USA - CA - Market St, US) · 58 days ago · $250,500–$335,900

Kubernetes, AWS, CI/CD, Docker, Prometheus, Grafana, Python, PostgreSQL, Terraform, Ansible, GitOps, CDN integration, media streaming technologies, content delivery strategies

Remote

Site Reliability Engineer

The Walt Disney Company

Remote (Bay Lake, FL) · 56 days ago

Akamai, Splunk, AppDynamics, GitHub, Ansible, Chef, AWS, Azure, GCP, CI/CD, RESTful APIs, Microservices, Cloud computing, Python, JavaScript, Kubernetes, Terraform, Prometheus, Grafana

Remote

Site Reliability Engineer

Equifax

St. Louis, Missouri · 50 days ago

AWS, GCP, Terraform, Jenkins, Python, Bash, Docker, Kubernetes, CI/CD, Prometheus, PostgreSQL, Linux, Windows, Ansible, Chef

Hybrid
