Staff Site Reliability Engineer, Observability

Okta Inc

Quick summary

Work type
On-site
Location
Bellevue, WA · New York, NY · San Francisco, CA · Washington, DC
Salary
$194,000–$267,000 / yr
Posted
127 days ago

Market check

Salary context

Above market

How this pay compares to similar roles

Pay chart: similar roles $184k · this role $230k · axis range $131k–$282k, with a shaded band where most similar roles pay

This role pays more than 83% of similar roles. Most pay $151,826–$216,000 (the shaded band in the pay chart). At the midpoint, this role pays about $230k versus about $184k for comparable roles.

Based on 240 similar postings.

Employer

About Okta Inc

Okta, Inc. is an American identity and access management company based in San Francisco. It provides cloud software that helps companies manage and secure user authentication into applications, and helps developers build identity controls into applications, websites, web services, and devices.

Okta Inc currently has 155 open roles on FindRole.

Listed pay typically runs $188,000–$253,000 across 153 roles with salary data.

At a glance

TL;DR · Staff Site Reliability Engineer, Observability

As a Staff Observability Site Reliability Engineer with expertise in Splunk, you will join our dedicated team to enhance and scale our observability platform, ensuring it meets the needs of both SRE teams and business partners. You will use Terraform and coding skills in Go, Python, or Ruby to automate infrastructure as code for deploying agents and collectors across distributed systems. Key responsibilities include optimizing Splunk Cloud at large scale, creating actionable dashboards, participating in on-call rotations, and leading post-incident reviews. Essential qualifications are 5+ years of experience in SRE roles with a focus on high availability, strong SPL coding skills, and deep knowledge of Linux internals and container orchestration. Bonus points for familiarity with OpenTelemetry, experience implementing Splunk charge-back apps, or experience managing observability tools within AWS or GCP.

What you'll do

  • Design, build, and maintain scalable observability infrastructure using Terraform.
  • Optimize Splunk services for high reliability and low latency in log data processing.
  • Lead post-incident reviews to drive systemic improvements and "observability-driven development."
  • Automate deployment and scaling of observability agents and collectors across distributed systems.
  • Create intuitive, actionable Splunk dashboards correlating data from multiple sources.

What we're looking for

  • 5+ years of experience scaling and managing Splunk Cloud (1000+ SVCs).
  • Expertise in creating intuitive, actionable Splunk dashboards correlating data from multiple sources.
  • Strong coding skills in SPL and Go for building internal tools and automating workflows.
  • Deep understanding of Linux internals, networking, and container orchestration (Kubernetes/EKS).
  • Experience in an SRE, DevOps, or Systems Engineering role with a focus on high-availability systems.
  • Data-driven approach to debugging complex, cross-service performance bottlenecks.

More like this

Similar roles

Site Reliability Engineer

Autodesk

Atlanta, GA · 43 days ago · $117,000–$209,330
AWS, Kubernetes, Terraform, Python, Linux, Bash, Docker, CI/CD, Jenkins, Git, CloudWatch, Splunk, Dynatrace, New Relic, Grafana, PostgreSQL, MySQL, MSSQL, EC2, ECS, EKS, Lambda, ELB, S3, IAM, VPC, DynamoDB, RDS

Splunk Content Developer

Leidos

Arlington, VA · 60 days ago · $131,300–$237,350
Splunk, Linux, Windows, Python, PowerShell, Bash, SQL, Docker, CI/CD, Kubernetes, AWS, Azure, Grafana, Prometheus, Terraform, FISMA, NIST, NSA, CIM, DB Connect, Modular Inputs, TCP/UDP, Indexer Clustering, Search Head Clustering

Principal Site Reliability Engineer, Infrastructure Observability

T. Rowe Price

Owings Mills, MD +5 · 106 days ago · $159,000–$272,000
AWS, Python, Terraform, Prometheus, Grafana, Ansible, New Relic, SolarWinds DPA, Elastic Stack, CI/CD, SQL Server, PostgreSQL, MySQL, DevOps, SRE, Chaos Engineering, Kubernetes, Docker, Git, Fluentd, ELK Stack, .Net Core, Java, Go, Node.js, Infrastructure as Code, Service Level Objectives, Error Budgets
Hybrid

Staff Site Reliability Engineer

TransUnion

Chicago, IL +4 · 70 days ago · $112,500–$187,500
GCP, Kubernetes, CI/CD, Prometheus, Grafana, PostgreSQL, MySQL, Bigtable, Firestore, Redis, Terraform, Python, Bash, Go, VPC, DNS, Load Balancing, Firewall Rules, VPN, Private Service Connect, Linux, Networking, Database Architecture, Infrastructure-as-Code, Scripting, Automation
Hybrid