Senior Site Reliability Engineer, Production Engineering

Anduril Industries

Quick summary

Work type: On-site
Location: Costa Mesa, CA
Salary: $166,000–$220,000 / yr
Posted: today

Market check

Salary context

Above market

How this pay compares to similar roles

Pay scale from $122k to $230k: similar roles at $166k, this role at $193k.

This role pays more than 80% of similar roles, most of which pay $139,100–$193,000. At the midpoint, this role pays about $193k versus about $166k for comparable roles.

Based on 239 similar postings.

Employer

About Anduril Industries

Anduril Industries is a defense technology company that builds advanced hardware and software systems for national security, including autonomous drones, surveillance systems, and the Lattice AI command platform.

Anduril Industries currently has 1,882 open roles on FindRole.

Listed pay typically runs $146,000–$194,000 across 1,696 roles with salary data.


At a glance

TL;DR · Senior Site Reliability Engineer, Production Engineering

As a Senior Site Reliability Engineer on Anduril's Production Engineering team, you will play a pivotal role in ensuring the reliability and scalability of Lattice, the company’s autonomous command and control platform. Your responsibilities include designing comprehensive monitoring systems, driving incident response, and implementing defensive strategies to prevent production issues. You will also build automation with Terraform and Kubernetes operators, establish service level objectives, and work with software engineering teams to make the system architecture more reliable. In addition, you will develop capacity planning models, write documentation, and lead cross-functional initiatives to improve deployment safety. The role requires deep expertise in Kubernetes, cloud platforms, and observability stacks such as Prometheus and Grafana, plus a proven track record of improving system reliability through architectural changes. Ideal candidates have experience with mission-critical systems and hold a U.S. Secret security clearance.

What you'll do

  • Design and implement comprehensive monitoring, observability, and alerting systems.
  • Drive incident response and conduct blameless postmortems to prevent recurrence of issues.
  • Build infrastructure automation using Terraform, Kubernetes operators, and custom tooling.
  • Establish Service Level Objectives (SLOs) and error budgets for system reliability.
  • Partner with software teams to improve system architecture for better reliability.
  • Develop capacity planning models and performance testing frameworks for scalability.
  • Create runbooks and training materials to enable effective operation of production systems.

What we're looking for

  • 7+ years of engineering experience, including at least 3 years in SRE or production operations.
  • Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.
  • Deep expertise with Kubernetes and cloud platforms (AWS, Azure, GCP).
  • Strong programming skills in Go, Python, Rust, or Java for building production tooling.
  • Proven ability to design observability stacks using Prometheus, Grafana, and ELK/EFK.
  • Demonstrated track record of improving system reliability through architectural changes.
  • Must hold a U.S. Secret security clearance.

More like this

Similar roles

Senior Site Reliability Engineer

Anduril Industries

Costa Mesa, CA · today · $166,000–$220,000
Linux, Python, Terraform, Kubernetes, Docker, Ansible, Networking, Security, CI/CD, Monitoring, Splunk, AWS, Azure, GCP

Senior Site Reliability Engineer

Anduril Industries

Costa Mesa, CA · today · $191,000–$287,000
Kubernetes, AWS, Azure, Terraform, Python, Go, Rust, C++, Docker, Helm, ArgoCD, CI/CD, Prometheus, Grafana