Principal, Site Reliability Engineer

Walmart

Quick summary

Work type: On-site
Location: Bentonville, AR · Sunnyvale, CA
Salary: $110,000–$220,000 / yr
Posted: 11 days ago

Market check

Salary context

Competitive pay

How this pay compares to similar roles

Pay chart: scale runs from $97k to $233k, with a shaded band where most similar roles pay. Similar roles: $179k. This role: $165k.

This role pays less than 64% of similar roles. Most pay $142,950–$215,881. At the midpoint, this role pays about $165k versus about $179k for comparable roles.

Based on 239 similar postings.

Employer

About Walmart

Walmart Inc. is the world's largest retailer by revenue, operating a chain of hypermarkets, discount department stores, and grocery stores, as well as a growing e-commerce presence through Walmart.com.

Industry: General Merchandise & Grocery Retail

Walmart currently has 529 open roles on FindRole.

Listed pay typically runs $117,000–$234,000 across 523 roles with salary data.

At a glance

TL;DR · Principal, Site Reliability Engineer

The Principal Site Reliability Engineer at Walmart Global Tech’s CES team leads the design and implementation of reliability programs for complex site environments, ensuring system performance, scalability, and disaster recovery through advanced monitoring, root cause analysis, and infrastructure automation. This role collaborates with software engineers, data scientists, and machine learning experts to drive continuous improvement and establish reliability standards that support business objectives. The Principal Engineer will develop and implement monitoring strategies, guide chaos engineering experiments, mentor team members on best practices, and ensure robust, scalable solutions aligned with organizational goals. Proficiency in cloud computing platforms and containerization technologies like Docker, strong coding skills in JavaScript and Python, and experience with CI/CD pipelines are essential, as is a deep understanding of disaster recovery planning and system architecture optimization for large-scale enterprise applications.

What you'll do

  • Design and develop reliability programs for complex site environments.
  • Lead reliability testing and chaos experiments to validate system resiliency.
  • Analyze system architecture to optimize scalability and disaster recovery.
  • Develop monitoring strategies with metrics and alerts for system availability.
  • Guide root cause analysis efforts to resolve defects and enhance stability.
  • Drive infrastructure automation and telemetry integration for operational excellence.
  • Mentor team members on reliability best practices and coding standards.

What we're looking for

  • Extensive experience in site reliability engineering and system administration.
  • Proficiency in designing scalable software architectures for complex environments.
  • Expertise in disaster recovery planning and execution for large-scale systems.
  • Skilled in cloud computing platforms and containerization technologies like Docker.
  • Strong coding skills in JavaScript and Python, plus experience automating CI/CD pipelines.
  • Ability to lead reliability testing and chaos engineering experiments.
  • Proven capability in system performance analysis and telemetry implementation.

More like this

Similar roles

Principal Site Reliability Engineer

The Walt Disney Company

Remote (USA - FL - Disney's Hollywood Studios - Feature Animation Building, US) 57 days ago
AWS · Azure · GCP · Terraform · CloudFormation · Ansible · Chef · CI/CD · Docker · Kubernetes · Prometheus · Grafana · Python · Linux · Windows · AI · LLM · PCI · DevOps · SRE · SLI · SLO · SLA
Remote

Principal Site Reliability Engineer

The Walt Disney Company

Remote (Bay Lake, FL) 50 days ago
Akamai · Kona Site Defender · WAF · Bot Manager · DevOps · CI/CD · Python · Go · Docker · Terraform · AWS · Azure · Google Cloud · PostgreSQL · MongoDB · Redis · Prometheus · Grafana · Kubernetes · Ansible · Jenkins · GitLab · GitHub
Remote

Principal Site Reliability Engineer

Upstart

Remote (Canada) 113 days ago $195,300–$270,400
Python · Go · JavaScript · TypeScript · Terraform · Datadog · Prometheus · RUM · LLM · GenAI · CI/CD · Kubernetes · Docker · AWS · GCP · Service Mesh · Infrastructure as Code · Self-healing systems · On-call management · Program management
Remote

Sr Principal Site Reliability Engineer

The Walt Disney Company

Remote (USA - CA - Market St, US) 60 days ago $250,500–$335,900
Kubernetes · AWS · CI/CD · Docker · Prometheus · Grafana · Python · PostgreSQL · Terraform · Ansible · GitOps · CDN integration · media streaming technologies · content delivery strategies
Remote

Director, Site Reliability Engineering

McDonald’s Corporation

Chicago, IL 36 days ago $178,121–$222,651
AWS · Azure · GCP · Site Reliability Engineering · Agile Methodologies · CI/CD · Vendor Management · Cloud Infrastructure · PaaS · IaaS · Data Analytics · Financial Forecasting · Chargeback Management · Global Vendor Relationships · High-Performance Team Building