Staff Site Reliability Engineer, Observability GCP

Okta Inc

Quick summary

Work type: On-site
Location: Bellevue, WA; Chicago, IL; New York, NY; San Francisco, CA; Washington, DC
Salary: $194,000–$267,000 / yr
Posted: 16 days ago
Nearby: 99+ roles within 25 mi

Market check

Salary context

Above market

How this pay compares to similar roles

[Chart: pay scale from $128k to $282k, with a shaded band marking where most similar roles pay. Similar roles: about $185k. This role: about $230k.]

This role pays more than 82% of similar roles. Most pay $149,750–$220,062 — the shaded band above. At the midpoint, this role pays about $230k versus about $185k for comparable roles.

Based on 240 similar postings.

Employer

About Okta Inc

Okta, Inc. is an American identity and access management company based in San Francisco. It provides cloud software that helps companies manage and secure user authentication into applications, and that lets developers build identity controls into applications, websites, web services, and devices.

Okta Inc currently has 155 open roles on FindRole.

Listed pay typically runs $188,000–$253,000 across 153 roles with salary data.

At a glance

TL;DR · Staff Site Reliability Engineer, Observability GCP

As a Site Reliability Engineer specializing in Observability for Google Cloud, you will join our dedicated team to enhance and scale our Observability ecosystem on GCP. Your daily tasks include designing automated infrastructure with Terraform, optimizing data collection and processing for Splunk and Grafana, participating in incident response rotations, and automating observability agent deployments. You must have at least five years of experience managing observability in Google Cloud environments, expertise in creating actionable dashboards using Splunk or Grafana, and a strong background in SRE with a focus on high-availability systems. Proficiency in Python or Go for building internal tools is essential, along with deep knowledge of Linux internals, networking, and Kubernetes/GKE orchestration. Additional skills such as experience with OpenTelemetry and Grafana Loki are highly valued.

Skills

Google Cloud, Terraform, Go, Python, Ruby, Splunk, Grafana, Kubernetes, Linux, TCP/IP, DNS, Load Balancing, OpenTelemetry, Grafana Loki, AWS

What you'll do

Design, build, and maintain scalable observability infrastructure using Terraform.
Optimize collection, processing, and storage of observability data in GCP.
Participate in on-call rotations and lead post-incident reviews for continuous improvement.
Automate deployment and scaling of observability agents and collectors to reduce manual effort.
Create intuitive Splunk or Grafana dashboards correlating data from multiple sources.

What we're looking for

5+ years of experience scaling and managing observability in Google Cloud Platform.
Expertise in creating intuitive, actionable Splunk or Grafana dashboards correlating data from multiple sources.
At least 3 years of SRE, DevOps, or Systems Engineering experience focusing on high-availability systems.
Strong coding skills in Python or Go for building internal tools and automating workflows.
Deep understanding of Linux internals, networking (TCP/IP, DNS, Load Balancing), and Kubernetes/GKE orchestration.
Experience with Terraform to design, build, and maintain scalable observability infrastructure.

Similar roles

Staff Site Reliability Engineer, Observability

Okta Inc

Bellevue, WA +3 · 127 days ago · $194,000–$267,000

Splunk, Terraform, Go, Python, Ruby, SPL, Kubernetes, AWS, GCP, Linux, TCP/IP, DNS, OpenTelemetry, Docker, CI/CD

Site Reliability Engineer

Autodesk

Atlanta, GA · 43 days ago · $117,000–$209,330

AWS, Kubernetes, Terraform, Python, Linux, Bash, Docker, CI/CD, Jenkins, Git, CloudWatch, Splunk, Dynatrace, New Relic, Grafana, PostgreSQL, MySQL, MSSQL, EC2, ECS, EKS, Lambda, ELB, S3, IAM, VPC, DynamoDB, RDS

Principal Site Reliability Engineer, Observability and Telemetry Platform

Nvidia

Remote (Santa Clara, CA) · 41 days ago · $248,000–$396,750

Kubernetes, OpenStack, Docker, Python, Go, Prometheus, Grafana, OpenTelemetry, Linux, Networking, Containers, CI/CD, Terraform, AWS, Azure, PostgreSQL, MySQL, Ansible, Saltstack, Bash

Remote

Principal Site Reliability Engineer, Infrastructure Observability

T. Rowe Price

Owings Mills, MD +5 · 106 days ago · $159,000–$272,000

AWS, Python, Terraform, Prometheus, Grafana, Ansible, New Relic, SolarWinds DPA, Elastic Stack, CI/CD, SQL Server, PostgreSQL, MySQL, DevOps, SRE, Chaos Engineering, Kubernetes, Docker, Git, Fluentd, ELK Stack, .Net Core, Java, Go, Node.js, Infrastructure as Code, Service Level Objectives, Error Budgets

Hybrid

Staff Site Reliability Engineer

TransUnion

Chicago, IL +4 · 70 days ago · $112,500–$187,500

GCP, Kubernetes, CI/CD, Prometheus, Grafana, PostgreSQL, MySQL, Bigtable, Firestore, Redis, Terraform, Python, Bash, Go, VPC, DNS, Load Balancing, Firewall Rules, VPN, Private Service Connect, Linux, Networking, Database Architecture, Infrastructure-as-Code, Scripting, Automation

Hybrid

Staff Site Reliability Engineer

CME Group

Chicago, IL · 61 days ago · $132,100–$220,100

GCP, Kubernetes, Python, Terraform, ArgoCD, Go, Node.js, CI/CD, Distributed Systems, Generative AI, Agile Methodologies, PostgreSQL, GitOps, CICD, SLI, SLO, Error Budgets, IAM, Networking, High-Concurrency Architectures

Hybrid
