Site Reliability Engineer, HPC & Automation

SpaceX

Quick summary

Work type: On-site
Location: Redmond, WA
Salary: $125,000–$150,000 / yr
Posted: today

Market check

Salary context

Below market

How this pay compares to similar roles

[Chart: pay scale from $113k to $237k, with a shaded band labeled "most similar roles pay here"; markers at $187k (similar roles) and $138k (this role).]

This role pays less than 85% of similar roles do. Most pay $152,150–$222,146 (the shaded band in the chart above). At the midpoint, this role pays about $138k, versus about $187k for comparable roles.

Based on 240 similar postings.

Employer

About SpaceX

SpaceX designs, manufactures, and launches advanced rockets and spacecraft with the mission of enabling humans to become a multi-planetary species. It operates the Falcon 9, Falcon Heavy, and Starship launch vehicles, as well as the Starlink satellite internet constellation.

SpaceX currently has 657 open roles on FindRole.

Listed pay typically runs $130,000–$160,000 across 483 roles with salary data.

At a glance

TL;DR · Site Reliability Engineer, HPC & Automation

Join the Silicon Engineering team as a Site Reliability Engineer, where you will design and operate high-performance computing infrastructure for developing chips that power global satellite constellations. Your daily tasks include deploying and scaling clusters, automating silicon simulation workflows to speed up project timelines, managing infrastructure as code, operating CI/CD pipelines, and identifying performance bottlenecks. You’ll work with technologies such as Bash, Python, Linux, Docker, Kubernetes, Terraform, Ansible, Grafana, Prometheus, Jenkins, and various ASIC design tools. This role demands strong experience in system administration, high-performance computing, and a passion for improving infrastructure efficiency at scale.

What you'll do

  • Deploy, upgrade, and maintain high-performance computing clusters and services.
  • Develop automated solutions for silicon simulation workflows to speed up project timelines.
  • Use modern observability tools to manage infrastructure health and provide comprehensive monitoring.
  • Operate continuous integration pipelines, build systems, and version control across the environment.
  • Identify and eliminate performance bottlenecks in the system through measurement and engineering.

What we're looking for

  • 1+ years of development experience with Bash, Python, and other programming languages.
  • Bachelor’s degree in computer science or equivalent professional experience.
  • Experience with Linux operating systems and containerization technologies (Docker, Kubernetes).
  • Knowledge of high-performance computing, workload managers (Slurm, LSF), and automation frameworks (Terraform, Ansible).
  • Ability to manage infrastructure as code and build monitoring and alerting solutions.

More like this

Similar roles

Site Reliability Engineer

CME Group

Chicago, IL · 149 days ago · $100,700–$167,800
GCP, Docker, Kubernetes, Python, Java, Oracle, Postgres, BigQuery, SLO, SLI, SLA, OpenTelemetry, Splunk, Prometheus, Grafana, CI/CD, Bamboo, JIRA, Git, UC4 Automic

Software Engineer, Hardware-in-the-Loop

SpaceX

Redmond, WA · 19 days ago · $125,000–$150,000
Python, C, CI/CD, Linux, Bash, Docker, Kubernetes, Networking, Unit testing, Data analysis, Simulation software, Git, PostgreSQL, Maven, Jenkins, Ansible, Terraform, AWS, Grafana, Prometheus

Site Reliability Engineer, HPC

Microsoft

US · 134 days ago · $142,800–$274,800
Kubernetes, Docker, CI/CD, Azure, AWS, GCP, Terraform, Python, Go, Bash, Grafana, Datadog, OpenTelemetry, Networking, Storage, GPU, HPC, Capacity Planning, Cost Optimization

Site Reliability Engineer, Hardware Infrastructure

Nvidia

Santa Clara, CA · 17 days ago · $184,000–$287,500
SRE, DevOps, Python, Go, Prometheus, Grafana, CI/CD, Kubernetes, Terraform, AWS, LLM, Generative AI, Agentic solutions, Docker, Git, Jenkins, Ansible, PostgreSQL, Redis, Nginx

Senior Site Reliability Engineer, HPC

Nvidia

Santa Clara, CA +2 · 17 days ago · $152,000–$241,500
AWS, GCP, OCI, Kubernetes, Slurm, LSF, CI/CD, Terraform, Python, Go, Perl, Ruby, Prometheus, Grafana, Docker, Ansible, GitOps, AIOps, PostgreSQL, MySQL

Site Reliability Engineer, Human Engineering

Apple Inc

Austin, TX · 42 days ago
Kubernetes, AWS, Terraform, Python, Django, PostgreSQL, Redis, Kafka, Elasticsearch, CI/CD, GitOps, Helm, TLS, DNS, ServiceMesh, Istio, Envoy, Grafana, Prometheus, SLO, DistributedTracing, OIDC, APIGateway