Senior Site Reliability Engineer, Production Engineering

Anduril Industries

Quick summary

Work type: On-site
Location: Seattle, WA
Salary: $166,000–$220,000 / yr
Posted: today

Market check

Salary context

Above market

How this pay compares to similar roles

[Chart: pay comparison on a scale of $122k to $230k. Similar roles: $166k. This role: $193k.]

This role pays more than 80% of similar roles. Most similar roles pay $139,100–$193,000, the shaded band in the pay comparison chart. At the midpoint, this role pays about $193k versus about $166k for comparable roles.

Based on 239 similar postings.

Employer

About Anduril Industries

Anduril Industries is a defense technology company that builds advanced hardware and software systems for national security, including autonomous drones, surveillance systems, and the Lattice AI command platform.

Anduril Industries currently has 1,882 open roles on FindRole.

Listed pay typically runs $146,000–$194,000 across 1,696 roles with salary data.


At a glance

TL;DR · Senior Site Reliability Engineer, Production Engineering

As a Senior Site Reliability Engineer on Anduril's Production Engineering team, you will be responsible for the reliability and scalability of Lattice, the company's autonomous command and control platform. Your responsibilities include designing comprehensive monitoring systems, driving incident response, building automation with tools like Terraform and Kubernetes operators, establishing SLOs, and improving system architecture for reliability. You will also develop capacity planning models, create runbooks, and lead cross-functional efforts to enhance deployment safety. The ideal candidate has deep expertise in Kubernetes, strong programming skills in languages such as Go or Python, and experience with observability stacks like Prometheus and Grafana. This role requires a U.S. Secret security clearance and offers the opportunity to work on mission-critical, large-scale systems that directly affect national security.

What you'll do

  • Design and implement comprehensive monitoring, observability, and alerting systems.
  • Drive incident response and conduct blameless postmortems to prevent recurrence of issues.
  • Build infrastructure automation using Terraform, Kubernetes operators, and custom tooling.
  • Establish Service Level Objectives (SLOs) and Error Budgets for system reliability.
  • Partner with software engineering teams to improve system architecture for reliability.
  • Develop capacity planning models and performance testing frameworks for peak demand.
  • Create runbooks and documentation to enable effective operation of production systems.

What we're looking for

  • 7+ years of engineering experience, including at least 3 years in SRE or production operations.
  • Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.
  • Deep expertise with Kubernetes and cloud platforms (AWS, Azure, GCP).
  • Strong programming skills in Go, Python, Rust, or Java for building production tooling.
  • Proven ability to design observability stacks using Prometheus, Grafana, and ELK/EFK.
  • Demonstrated track record of improving system reliability through architectural changes.
  • Must hold a U.S. Secret security clearance.

More like this

Similar roles

Senior Site Reliability Engineer

Anduril Industries

Costa Mesa, CA · today · $166,000–$220,000
Linux, Python, Terraform, Kubernetes, Docker, Ansible, Networking, Security, CI/CD, Monitoring, Splunk, AWS, Azure, GCP

Senior Site Reliability Engineer

Anduril Industries

Costa Mesa, CA · today · $191,000–$287,000
Kubernetes, AWS, Azure, Terraform, Python, Go, Rust, C++, Docker, Helm, ArgoCD, CI/CD, Prometheus, Grafana