Senior Site Reliability Engineer, Production Engineering

Anduril Industries

Quick summary

Work type: On-site
Location: Costa Mesa, CA
Salary: $166,000–$220,000 / yr
Posted: today

Market check

Salary context

Above market

How this pay compares to similar roles

[Chart: pay scale from $122k to $230k, with a shaded band marking where most similar roles pay. Similar roles: $166k. This role: $193k.]

This role pays more than 80% of similar roles do. Most similar roles pay $139,100–$193,000, the shaded band in the chart. At the midpoint, this role pays about $193k versus about $166k for comparable roles.

Based on 239 similar postings.

Employer

About Anduril Industries

Anduril Industries is a defense technology company that builds advanced hardware and software systems for national security, including autonomous drones, surveillance systems, and the Lattice AI command platform.

Anduril Industries currently has 1882 open roles on FindRole.

Listed pay typically runs $146,000–$194,000 across 1696 roles with salary data.

At a glance

TL;DR · Senior Site Reliability Engineer, Production Engineering

As a Senior Site Reliability Engineer on Anduril's Production Engineering team, you will play a pivotal role in ensuring the reliability and scalability of Lattice, the company’s autonomous command and control platform. Your responsibilities include designing comprehensive monitoring systems, driving incident response, and implementing defensive strategies to prevent production issues. You will also build automation tools using Terraform and Kubernetes operators, establish service level objectives, and collaborate with software engineering teams to make the system architecture more reliable. Additionally, you will develop capacity planning models, create documentation, and lead cross-functional initiatives to improve deployment safety. This role requires deep expertise in Kubernetes, cloud platforms, and observability stacks like Prometheus and Grafana, along with a proven track record of improving system reliability through architectural changes. Ideal candidates have experience with mission-critical systems and hold a U.S. Secret security clearance.

Skills

Kubernetes, Terraform, Go, Python, Rust, Java, Prometheus, Grafana, AWS, Azure, GCP, CI/CD, PostgreSQL, Istio, Linkerd, Vault, Sealed Secrets, SOPS

What you'll do

Design and implement comprehensive monitoring, observability, and alerting systems.
Drive incident response and conduct blameless postmortems to prevent recurrence of issues.
Build infrastructure automation using Terraform, Kubernetes operators, and custom tooling.
Establish Service Level Objectives (SLOs) and error budgets for system reliability.
Partner with software teams to improve system architecture for better reliability.
Develop capacity planning models and performance testing frameworks for scalability.
Create runbooks and training materials to enable effective operation of production systems.

What we're looking for

7+ years of engineering experience, including at least 3 years in SRE or production operations.
Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.
Deep expertise with Kubernetes and cloud platforms (AWS, Azure, GCP).
Strong programming skills in Go, Python, Rust, or Java for building production tooling.
Proven ability to design observability stacks using Prometheus, Grafana, and ELK/EFK.
Demonstrated track record of improving system reliability through architectural changes.
Must hold a U.S. Secret security clearance.

Similar roles

Senior Site Reliability Engineer, Production Engineering

Anduril Industries

Seattle, WA · Posted today · $166,000–$220,000

Kubernetes, Terraform, Go, Python, Rust, Java, Prometheus, Grafana, AWS, Azure, GCP, CI/CD, PostgreSQL, Istio, Linkerd, Vault, Sealed Secrets, SOPS, Jenkins, ArgoCD, FluxCD, Spinnaker
