Staff Site Reliability Engineer, Observability

Okta Inc

Quick summary

Work type: On-site
Location: Bellevue, WA; New York, NY; San Francisco, CA; Washington, DC
Salary: $194,000–$267,000 / yr
Posted: 127 days ago
Nearby: 99+ roles within 25 mi

Market check

Salary context

Above market

How this pay compares to similar roles

Chart: similar roles $184k; this role $230k; scale $131k to $282k, with a shaded band marking where most similar roles pay.

This role pays more than 83% of similar roles. Most pay $151,826–$216,000 (the shaded band above). At the midpoint, this role pays about $230k versus about $184k for comparable roles.

Based on 240 similar postings.

Employer

About Okta Inc

Okta, Inc. is an American identity and access management company based in San Francisco. It provides cloud software that helps companies manage and secure user authentication into applications, and helps developers build identity controls into applications, websites, web services, and devices.

Okta Inc currently has 155 open roles on FindRole.

Listed pay typically runs $188,000–$253,000 across 153 roles with salary data.

At a glance

TL;DR · Staff Site Reliability Engineer, Observability

As a Staff Observability Site Reliability Engineer with expertise in Splunk, you will join our dedicated team to enhance and scale our observability platform, ensuring it meets the needs of both SRE teams and business partners. You will use Terraform and coding skills in Go, Python, or Ruby to automate infrastructure as code for deploying agents and collectors across distributed systems. Key responsibilities include optimizing Splunk Cloud at large scale, creating actionable dashboards, participating in on-call rotations, and leading post-incident reviews. Essential qualifications are 5+ years of experience in SRE roles with a focus on high availability, strong SPL coding skills, and deep knowledge of Linux internals and container orchestration. Bonus points for familiarity with OpenTelemetry, experience implementing Splunk charge-back apps, or experience managing observability tools within AWS or GCP.

Skills

Splunk, Terraform, Go, Python, Ruby, SPL, Kubernetes, AWS, GCP, Linux, TCP/IP, DNS, OpenTelemetry, Docker, CI/CD

What you'll do

Design, build, and maintain scalable observability infrastructure using Terraform.
Optimize Splunk services for high reliability and low latency in log data processing.
Lead post-incident reviews to drive systemic improvements and "observability-driven development."
Automate deployment and scaling of observability agents and collectors across distributed systems.
Create intuitive, actionable Splunk dashboards correlating data from multiple sources.

What we're looking for

5+ years of experience managing and scaling Splunk Cloud in large deployments (1000+ SVCs).
Expertise in creating intuitive, actionable Splunk dashboards correlating data from multiple sources.
Strong coding skills in SPL and Go for building internal tools and automating workflows.
Deep understanding of Linux internals, networking, and container orchestration (Kubernetes/EKS).
Experience in an SRE, DevOps, or Systems Engineering role with a focus on high-availability systems.
Data-driven approach to debugging complex, cross-service performance bottlenecks.

Similar roles

Staff Site Reliability Engineer, Observability GCP

Okta Inc

Bellevue, WA +4 · 16 days ago · $194,000–$267,000

Google Cloud, Terraform, Go, Python, Ruby, Splunk, Grafana, Kubernetes, Linux, TCP/IP, DNS, Load Balancing, OpenTelemetry, Grafana Loki, AWS

Site Reliability Engineer

Autodesk

Atlanta, GA · 43 days ago · $117,000–$209,330

AWS, Kubernetes, Terraform, Python, Linux, Bash, Docker, CI/CD, Jenkins, Git, CloudWatch, Splunk, Dynatrace, New Relic, Grafana, PostgreSQL, MySQL, MSSQL, EC2, ECS, EKS, Lambda, ELB, S3, IAM, VPC, DynamoDB, RDS

Splunk Content Developer

Leidos

Arlington, VA · 60 days ago · $131,300–$237,350

Splunk, Linux, Windows, Python, PowerShell, Bash, SQL, Docker, CI/CD, Kubernetes, AWS, Azure, Grafana, Prometheus, Terraform, FISMA, NIST, NSA, CIM, DB Connect, Modular Inputs, TCP/UDP, Indexer Clustering, Search Head Clustering

Principal Site Reliability Engineer, Observability and Telemetry Platform

Nvidia

Remote (Santa Clara, CA) · 41 days ago · $248,000–$396,750

Kubernetes, OpenStack, Docker, Python, Go, Prometheus, Grafana, OpenTelemetry, Linux, Networking, Containers, CI/CD, Terraform, AWS, Azure, PostgreSQL, MySQL, Ansible, Saltstack, Bash

Remote

Principal Site Reliability Engineer, Infrastructure Observability

T. Rowe Price

Owings Mills, MD +5 · 106 days ago · $159,000–$272,000

AWS, Python, Terraform, Prometheus, Grafana, Ansible, New Relic, SolarWinds DPA, Elastic Stack, CI/CD, SQL Server, PostgreSQL, MySQL, DevOps, SRE, Chaos Engineering, Kubernetes, Docker, Git, Fluentd, ELK Stack, .Net Core, Java, Go, Node.js, Infrastructure as Code, Service Level Objectives, Error Budgets

Hybrid

Staff Site Reliability Engineer

TransUnion

Chicago, IL +4 · 70 days ago · $112,500–$187,500

GCP, Kubernetes, CI/CD, Prometheus, Grafana, PostgreSQL, MySQL, Bigtable, Firestore, Redis, Terraform, Python, Bash, Go, VPC, DNS, Load Balancing, Firewall Rules, VPN, Private Service Connect, Linux, Networking, Database Architecture, Infrastructure-as-Code, Scripting, Automation

Hybrid
