(USA) Principal, Software Engineer
$143,000 - $286,000/year
Role Details
Position Summary...
Walmart processes more transactions in a day than most companies handle in a year. When performance degrades or systems fail, the impact is immediate — measured in millions of dollars and hundreds of millions of customers. We're building the team that prevents that using agentic AI.
As a Principal Engineer in Performance and Resiliency Engineering, you'll architect and lead the development of intelligent, self-healing systems: LLM-based agents that detect anomalies, reason across observability data, and trigger automated remediation — without waiting for a human in the loop. You'll operate at a scale most AI engineers never encounter: 10,500 stores, 240M weekly customers, and infrastructure that powers one of the world's largest retail ecosystems.
This isn't a research role or a proof-of-concept environment. You'll own the technical strategy, set architectural direction, and ship to production — building agentic systems that directly impact Walmart's global reliability and business continuity.
About the Team
Building the right technology foundation for Infrastructure & Platforms is vital to success at Walmart's scale. Our team builds and maintains the foundational technologies that power the entire tech organization — data platforms, enterprise architecture, DevOps, cloud computing, and infrastructure. We ship to production weekly, run blameless postmortems, and treat chaos experiments as first-class engineering work. If you thrive in high-ownership environments where your architectural decisions have immediate, measurable impact, this is where you belong.
What you'll do...
What You'll Own
You'll set the technical direction — not just execute it. From initial architecture through production deployment, you'll own the roadmap for Walmart's agentic AI platform for performance and resiliency. You'll have the autonomy to make architectural tradeoffs, drive experimentation, and shape how intelligent systems operate at enterprise scale.
Key Responsibilities
Build & Lead Agentic AI Systems
- Architect production multi-agent pipelines — from RAG-based knowledge grounding to LLM-driven decision-making and autonomous remediation — operating across 10,500 stores and 240M weekly customers
- Own LLM evaluation standards for production: factuality, consistency, safety guardrails, and failure modes; set the bar that other teams adopt
- Optimize LLM inference at scale through prompt caching, quantization, and retrieval filtering — measurable latency and cost impact, not theoretical gains
- Integrate vector databases and observability stacks to build context-aware systems that act on live signals without human intervention
Drive Performance & Resiliency
- Build the AI/ML layer that moves Walmart from reactive incident response to predictive, self-correcting infrastructure — cutting mean time to recovery across critical systems
- Design and run chaos experiments that expose real failure modes and change architecture decisions — not checkbox exercises
- Define SLOs that reflect real business impact, integrate performance gates into CI/CD, and make observability (Grafana, Prometheus, ELK, Splunk) actionable across the org
- Write and maintain runbooks that teams actually use: tested, updated after every incident, and clear enough to act on under pressure
Lead & Elevate Engineering
- Set the architectural direction for the org's agentic AI platform — from initial design through production deployment — and own the decisions that follow
- Close the gap between experimentation and production: move ML models from notebooks into reliable, monitored systems that hold up under Black Friday-scale traffic
- Raise the technical floor through design reviews and mentoring that produces engineers who make better decisions independently
- Shape the multi-year roadmap for AI-powered performance and resiliency, influencing infrastructure investment decisions across the org
What You'll Bring
Core Requirements
- 10+ years of experience building and operating distributed systems at scale
- Proven, hands-on production experience with LLMs, agentic frameworks, or RAG-based systems
- Deep background in performance engineering, chaos engineering, or SRE — with real ownership of SLOs and incident response
- Strong programming skills in Python and/or Java; comfort working across the full ML stack
Additional Experience (Valued)
- Familiarity with ML frameworks: PyTorch, TensorFlow, Hugging Face Transformers
- Hands-on with cloud-native infrastructure: GCP, Azure, Kubernetes, Docker
- MLOps experience: CI/CD for ML, drift detection, model monitoring
- Experimentation background: A/B testing, causal inference, multi-armed bandits
- Excellent communication skills, with the ability to align technical and non-technical stakeholders on complex architectural decisions
At Walmart, we offer competitive pay as well as performance-based bonus awards and other great benefits for a happier mind, body, and wallet. Health benefits include medical, vision and dental coverage. Financial benefits include 401(k), stock purchase and company-paid life insurance. Paid time off benefits include PTO (including sick leave), parental leave, family care leave, bereavement, jury duty, and voting. Other benefits include short-term and long-term disability, company discounts, Military Leave Pay, adoption and surrogacy expense reimbursement, and more.
You will also receive PTO and/or PPTO that can be used for vacation, sick leave, holidays, or other purposes. The amount you receive depends on your job classification and length of employment. It will meet or exceed the requirements of paid sick leave laws, where applicable.
For information about PTO, see https://one.walmart.com/notices.
Live Better U is a Walmart-paid education benefit program for full-time and part-time associates in Walmart and Sam's Club facilities. Programs range from high school completion to bachelor's degrees, including English Language Learning and short-form certificates. Tuition, books, and fees are completely paid for by Walmart.
Eligibility requirements apply to some benefits and may depend on your job classification and length of employment. Benefits are subject to change and may be subject to a specific plan or program terms.
For information about benefits and eligibility, see One.Walmart.
The annual salary range for this position is $143,000.00 - $286,000.00.
Additional compensation includes annual or quarterly performance bonuses.
Additional compensation for certain positions may also include:
- Stock
Minimum Qualifications...
Outlined below are the required minimum qualifications for this position. If none are listed, there are no minimum qualifications.
Option 1: Bachelor's degree in computer science, computer engineering, computer information systems, software engineering, or related area and 5 years’ experience in software engineering or related area.
Option 2: 7 years’ experience in software engineering or related area.
Preferred Qualifications...
Outlined below are the optional preferred qualifications for this position. If none are listed, there are no preferred qualifications.
Master’s degree in computer science, computer engineering, computer information systems, software engineering, or related area and 3 years’ experience in software engineering or related area.
We value candidates with a background in creating inclusive digital experiences who demonstrate knowledge of implementing Web Content Accessibility Guidelines (WCAG) 2.2 AA standards, working with assistive technologies, and integrating digital accessibility seamlessly. The ideal candidate would have knowledge of accessibility best practices and join us as we continue to create accessible products and services following Walmart’s accessibility standards and guidelines for supporting an inclusive culture.
Primary Location...
1345 Crossman Ave, Sunnyvale, CA 94089-1114, United States of America
Walmart and its subsidiaries are committed to maintaining a drug-free workplace and have a no-tolerance policy regarding the use of illegal drugs and alcohol on the job. This policy applies to all employees and aims to create a safe and productive work environment.
About Walmart
Walmart Inc. is the world's largest retailer by revenue, operating a chain of hypermarkets, discount department stores, and grocery stores, as well as a growing e-commerce presence through Walmart.com.
Industry: General Merchandise & Grocery Retail