Staff Site Reliability Engineer - Observability | Okta

Okta Inc

Bellevue, WA · New York, NY · San Francisco, CA · Washington, DC · Posted 97 days ago · $194,000–$267,000 / year

At a glance

AI generated

TL;DR

As a Staff Observability Site Reliability Engineer with expertise in Splunk, you will join our dedicated team to enhance and scale our observability platform, ensuring it meets the needs of both SRE teams and business partners. You will leverage Terraform and coding skills in Go, Python, or Ruby to automate infrastructure as code for deploying agents and collectors across distributed systems. Key responsibilities include optimizing Splunk Cloud at large scale, creating actionable dashboards, participating in on-call rotations, and leading post-incident reviews. Essential qualifications are 5+ years of experience in SRE roles with a focus on high availability, strong SPL coding skills, and deep knowledge of Linux internals and container orchestration. Bonus points for familiarity with OpenTelemetry or experience implementing Splunk charge-back apps, as well as managing observability tools within AWS or GCP.

Skills

Splunk · Terraform · Go · Python · Ruby · SPL · Kubernetes · AWS · GCP · Linux · TCP/IP · DNS · OpenTelemetry · Docker · CI/CD

What you'll do

  • Design, build, and maintain scalable observability infrastructure using Terraform.
  • Optimize Splunk services for high reliability and low latency in log data processing.
  • Lead post-incident reviews to drive systemic improvements and "observability-driven development."
  • Automate deployment and scaling of observability agents and collectors across distributed systems.
  • Create intuitive, actionable Splunk dashboards correlating data from multiple sources.

What we're looking for

  • 5+ years of experience scaling and managing large Splunk Cloud deployments (1000+ SVCs).
  • Expertise in creating intuitive, actionable Splunk dashboards correlating data from multiple sources.
  • Strong coding skills in SPL and Go for building internal tools and automating workflows.
  • Deep understanding of Linux internals, networking, and container orchestration (Kubernetes/EKS).
  • Experience in an SRE, DevOps, or Systems Engineering role with a focus on high-availability systems.
  • Data-driven approach to debugging complex, cross-service performance bottlenecks.

Market check

Salary context

Above market

How this pay compares to similar roles

[Chart: pay scale from $120k to $283k; similar roles at about $184k, this role at about $230k]

This role pays more than 85% of similar roles. Most pay $156,000–$212,076 — the shaded band above. At the midpoint, this role pays about $230k versus about $184k for comparable roles.

Based on 239 similar postings.

Employer

About Okta Inc

Okta, Inc. is an American identity and access management company based in San Francisco. It provides cloud software that helps companies manage and secure user authentication into applications, and for developers to build identity controls into applications, websites, web services, and devices.

Okta Inc currently has 145 open roles on FindRole.

Listed pay typically runs $194,000–$267,000 across 145 roles with salary data.

More like this

Similar roles

Site Reliability Engineer

Autodesk

Atlanta, GA · 13 days ago · $117,000–$209,330
AWS · Kubernetes · Terraform · Python · Linux · Bash · Docker · CI/CD · Jenkins · Git · CloudWatch · Splunk · Dynatrace · New Relic · Grafana · PostgreSQL · MySQL · MSSQL · EC2 · ECS · EKS · Lambda · ELB · S3 · IAM · VPC · DynamoDB · RDS