Principal, Site Reliability Engineer

Walmart

Quick summary

Work type: On-site
Location: Bentonville, AR · Sunnyvale, CA
Salary: $110,000–$220,000 / yr
Posted: 11 days ago

Market check

Salary context

Competitive pay

How this pay compares to similar roles

Chart: this role $165k, similar roles $179k, on a scale from $97k to $233k. The shaded band marks where most similar roles pay.

About 64% of similar roles pay more than this one. Most pay $142,950–$215,881 (the shaded band above). At the midpoint, this role pays about $165k versus about $179k for comparable roles.

Based on 239 similar postings.

Employer

About Walmart

Walmart Inc. is the world's largest retailer by revenue, operating a chain of hypermarkets, discount department stores, and grocery stores, as well as a growing e-commerce presence through Walmart.com.

Industry: General Merchandise & Grocery Retail

Walmart currently has 529 open roles on FindRole.

Listed pay typically runs $117,000–$234,000 across 523 roles with salary data.

At a glance

TL;DR · Principal, Site Reliability Engineer

The Principal Site Reliability Engineer on Walmart Global Tech’s CES team leads the design and implementation of reliability programs for complex site environments, ensuring system performance, scalability, and disaster recovery through advanced monitoring, root cause analysis, and infrastructure automation. The role works with software engineers, data scientists, and machine learning experts to drive continuous improvement and establish reliability standards that support business objectives. The Principal Engineer will develop and implement monitoring strategies, guide chaos engineering experiments, mentor team members on best practices, and deliver robust, scalable solutions aligned with organizational goals. Essential qualifications include proficiency in cloud computing platforms and containerization technologies like Docker, strong coding skills in JavaScript and Python, experience with CI/CD pipelines, and a deep understanding of disaster recovery planning and system architecture optimization for large-scale enterprise applications.

Skills

Python, JavaScript, Docker, CI/CD, Kubernetes, AWS, Prometheus, Grafana, Terraform, PostgreSQL, Git, Jenkins, Ansible, SRE certification, Disaster Recovery Planning

What you'll do

Design and develop reliability programs for complex site environments.
Lead reliability testing and chaos experiments to validate system resiliency.
Analyze system architecture to optimize scalability and disaster recovery.
Develop monitoring strategies with metrics and alerts for system availability.
Guide root cause analysis efforts to resolve defects and enhance stability.
Drive infrastructure automation and telemetry integration for operational excellence.
Mentor team members on reliability best practices and coding standards.

What we're looking for

Extensive experience in site reliability engineering and system administration.
Proficiency in designing scalable software architectures for complex environments.
Expertise in disaster recovery planning and execution for large-scale systems.
Skilled in cloud computing platforms and containerization technologies like Docker.
Strong coding skills in JavaScript and Python, and experience automating CI/CD pipelines.
Ability to lead reliability testing and chaos engineering experiments.
Proven capability in system performance analysis and telemetry implementation.
