Principal Site Reliability Engineer

Upstart

Remote

Quick summary

Work type
Remote
Location
Canada
Salary
$195,300–$270,400 / yr
Posted
111 days ago
Closes
Sep 1, 2026

Market check

Salary context

Above market

How this pay compares to similar roles


This role pays more than 80% of similar roles. Most similar roles pay $142,175–$220,450. At the midpoint, this role pays about $233k, versus about $181k for comparable roles.

Based on 239 similar postings.

Employer

About Upstart

Upstart is an AI lending platform that partners with banks and credit unions to expand access to affordable credit using non-traditional variables.

Upstart currently has 40 open roles on FindRole.

Listed pay typically runs $177,200–$245,400 across 40 roles with salary data.


At a glance

TL;DR · Principal Site Reliability Engineer

As a Principal Site Reliability Engineer on Upstart’s SRE team, you will lead the adoption of SRE best practices and mentor engineers across disciplines to raise operational excellence. Day to day, you will define long-term reliability strategies, champion distributed tracing and real user monitoring, and build self-healing systems that minimize downtime. You will work closely with cross-functional teams such as Product Engineering and DevOps to drive enterprise-wide improvements in incident response and engineering velocity. The ideal candidate has a balanced background in software engineering and SRE, proficiency in Python, Go, and JavaScript/TypeScript, and expertise with observability tools such as Datadog and Prometheus. Experience with Infrastructure as Code (Terraform, CDK) and hands-on work on large-scale or ML-related incidents are preferred, as is a strong track record of influencing technical and operational roadmaps through data-driven insights.

What you'll do

  • Lead the definition and adoption of SRE principles across engineering teams.
  • Partner with leadership to shape long-term reliability and observability strategies.
  • Build self-healing systems to minimize manual intervention and reduce downtime.
  • Drive improvements in incident response processes, including for ML systems.
  • Influence technical roadmaps through data-driven insights and hands-on contributions.
  • Own and deliver cross-functional initiatives from concept to execution.

What we're looking for

  • Combined experience in Software Engineering and Site Reliability Engineering.
  • Proven track record as an SRE thought leader and evangelist.
  • Strong communication and mentoring skills to influence engineers across disciplines.
  • Proficiency in Python, Go, JavaScript/TypeScript, and Infrastructure as Code tools.
  • Expertise with observability, distributed tracing, real user monitoring (RUM), web performance metrics such as Largest Contentful Paint (LCP), and performance monitoring tools.
  • Experience with on-call and incident management for large-scale or ML-related incidents.
  • Hands-on experience using LLM/GenAI to improve SRE efficiency and processes.

More like this

Similar roles

Staff Site Reliability Engineer

CME Group

Chicago, IL 32 days ago $132,100–$220,100
GCP Kubernetes Python Terraform ArgoCD Go Node.js CI/CD Distributed Systems Generative AI Agile PostgreSQL GitOps CICD SLI SLO Error Budgets
Hybrid

Principal Site Reliability Engineer

The Walt Disney Company

Remote (USA - FL - Disney's Hollywood Studios - Feature Animation Building, US) 55 days ago
AWS Azure GCP Terraform CloudFormation Ansible Chef CI/CD Docker Kubernetes Prometheus Grafana Python Linux Windows AI LLM PCI DevOps SRE SLI SLO SLA
Remote

Principal Site Reliability Engineer

The Walt Disney Company

Remote (Bay Lake, FL) 48 days ago
Akamai Kona Site Defender WAF Bot Manager DevOps CI/CD Python Go Docker Terraform AWS Azure Google Cloud PostgreSQL MongoDB Redis Prometheus Grafana Kubernetes Ansible Jenkins GitLab GitHub
Remote

Sr Principal Site Reliability Engineer

The Walt Disney Company

Remote (USA - CA - Market St, US) 58 days ago $250,500–$335,900
Kubernetes AWS CI/CD Docker Prometheus Grafana Python PostgreSQL Terraform Ansible GitOps CDN integration media streaming technologies content delivery strategies
Remote

Site Reliability Engineer

The Walt Disney Company

Remote (Bay Lake, FL) 56 days ago
Akamai Splunk AppDynamics GitHub Ansible Chef AWS Azure GCP CI/CD RESTful APIs Microservices Cloud computing Python JavaScript Kubernetes Terraform Prometheus Grafana
Remote

Site Reliability Engineer

Equifax

St. Louis, Missouri 50 days ago
AWS GCP Terraform Jenkins Python Bash Docker Kubernetes CI/CD Prometheus PostgreSQL Linux Windows Ansible Chef
Hybrid