Principal Site Reliability Engineer, Infrastructure Observability

T. Rowe Price

Hybrid · Actively hiring
Owings Mills, MD · Colorado · Washington · Posted 76 days ago · $159,000–$272,000 / year

At a glance

AI generated

TL;DR

As a Principal Site Reliability Engineer at T. Rowe Price, you will lead a team focused on enhancing observability and reliability for the company’s cloud and on-prem solutions. Your daily tasks include designing technology solutions to prevent service disruptions, fostering a blameless post-mortem culture, and driving SRE methodologies across operations teams. You will leverage automation and best-of-breed tools like New Relic, Prometheus, and Terraform to ensure system stability and scalability. The role requires extensive experience in cloud environments (AWS preferred), DevOps practices, CI/CD toolchains, and incident response management. Ideal candidates possess deep expertise in programming languages such as Python or Java, database development skills, and the ability to define and track Service Level Objectives. This position demands strategic thinking, independent problem-solving, and strong communication skills to engage with diverse stakeholders across a complex, distributed technology environment.

Skills

AWS, Python, PostgreSQL, CI/CD, Prometheus, Grafana, Terraform, Ansible, New Relic, SolarWinds DPA, Elastic Stack, Splunk, DevOps, SRE, Chaos Engineering, SQL Server, Node.js, .NET Core, Java, Go

What you'll do

  • Design technology solutions to prevent or minimize service disruptions in cloud environments.
  • Lead internal change initiatives to adopt SRE methodologies across operations teams.
  • Analyze incidents for high-level trends and drive strategic growth within Global Technology.
  • Implement chaos engineering models at scale to improve system resilience and reliability.
  • Standardize dashboards and tools for observability, APM, and infrastructure monitoring.
  • Define Service Level Objectives (SLOs) and manage error budgets to track system availability.

What we're looking for

  • 10+ years of experience designing and operating cloud infrastructure with senior-level impact.
  • Extensive experience building and supporting solutions in AWS and running DevOps/SRE functions.
  • Demonstrable experience implementing new technology, tools, and platforms, including automation for incident prevention/remediation.
  • Proficiency with multiple programming languages (Python, Java, Go, Node.js, .NET Core) and database development (SQL Server, PostgreSQL, MySQL).
  • Knowledge of observability and cloud management tools (New Relic, SolarWinds DPA, Elastic Stack, Prometheus, Grafana, Splunk, Ansible, Terraform, Vault, Vagrant).

Market check

Salary context

Above market

How this pay compares to similar roles

Similar roles: $175k · This role: $216k · Scale shown: $113k–$289k

This role pays more than 76% of similar roles. Most similar roles pay $138,446–$212,125. At the midpoint, this role pays about $216k versus about $175k for comparable roles.

Based on 239 similar postings.

Employer

About T. Rowe Price

T. Rowe Price is an asset management firm focused on delivering global investment management excellence and retirement services.

T. Rowe Price currently has 20 open roles on FindRole.

Listed pay typically runs $133,000–$226,500 across 20 roles with salary data.

More like this

Similar roles

Site Reliability Engineer

Autodesk

Atlanta, GA · 13 days ago · $117,000–$209,330
AWS, Kubernetes, Terraform, Python, Linux, Bash, Docker, CI/CD, Jenkins, Git, CloudWatch, Splunk, Dynatrace, New Relic, Grafana, PostgreSQL, MySQL, MSSQL, EC2, ECS, EKS, Lambda, ELB, S3, IAM, VPC, DynamoDB, RDS

Site Reliability Engineer

Booz Allen Hamilton

Herndon, VA · 37 days ago · $86,800–$198,000
Java, Spring Boot, CI/CD, Agile, Bitbucket, GitLab, Kubernetes, NiFi, Kafka, MongoDB, Elasticsearch, ArgoCD

Principal Site Reliability Engineer

The Walt Disney Company

Remote (USA - FL - Disney's Hollywood Studios - Feature Animation Building, US) · 54 days ago
AWS, Azure, GCP, Terraform, CloudFormation, Ansible, Chef, CI/CD, Docker, Kubernetes, Prometheus, Grafana, Python, Linux, Windows, AI, LLM, PCI, DevOps, SRE, SLI, SLO, SLA

Principal Site Reliability Engineer

The Walt Disney Company

Remote (Bay Lake, FL) · 47 days ago
Akamai, Kona Site Defender, WAF, Bot Manager, DevOps, CI/CD, Python, Go, Docker, Terraform, AWS, Azure, Google Cloud, PostgreSQL, MongoDB, Redis, Prometheus, Grafana, Kubernetes, Ansible, Jenkins, GitLab, GitHub

Sr Principal Site Reliability Engineer

The Walt Disney Company

Remote (USA - CA - Market St, US) · 57 days ago · $250,500–$335,900
Kubernetes, AWS, CI/CD, Docker, Prometheus, Grafana, Python, PostgreSQL, Terraform, Ansible, GitOps, CDN integration, media streaming technologies, content delivery strategies