Distinguished Software Engineer, AI/ML Engineer, Site Reliability Engineering

Walmart

Quick summary

Work type: On-site
Location: Sunnyvale, CA
Salary: $169,000–$338,000 / yr
Posted: 3 days ago

Market check

Salary context

Above market

How this pay compares to similar roles

Similar roles: $198k · This role: $254k (chart scale: $125k to $361k)

This role pays more than 88% of similar roles, most of which pay $165,000–$230,400. At the midpoint, this role pays about $254k versus about $198k for comparable roles.

Based on 240 similar postings.

Employer

About Walmart

Walmart Inc. is the world's largest retailer by revenue, operating a chain of hypermarkets, discount department stores, and grocery stores, as well as a growing e-commerce presence through Walmart.com.

Industry: General Merchandise & Grocery Retail

Walmart currently has 311 open roles on FindRole.

Listed pay typically runs $110,000–$220,000 across 303 roles with salary data.

At a glance

TL;DR · Distinguished Software Engineer, AI/ML Engineer, Site Reliability Engineering

As a Distinguished AI/ML Engineer on Walmart Global Tech's Site Reliability Engineering team, you will lead the development of advanced agentic AI systems and intelligent automation solutions to ensure mission-critical reliability across Walmart's global technology ecosystem. Your daily tasks include architecting cutting-edge ML platforms, designing multi-agent orchestration frameworks for automated incident response, and building observability tools with ML-driven anomaly detection. You must have extensive experience in cloud engineering and deep learning frameworks such as TensorFlow or PyTorch, plus expertise in distributed tracing, metrics collection, log aggregation, and APM tools. This role requires a background in large-scale retail systems and collaboration across diverse engineering teams to deliver enterprise-wide reliability solutions.

What you'll do

  • Architect advanced agentic AI systems for autonomous reliability engineering workflows.
  • Design multi-agent orchestration platforms for automated incident response and capacity planning.
  • Build intelligent observability and monitoring systems using ML-driven anomaly detection.
  • Develop self-healing infrastructure platforms that predict, prevent, and resolve issues autonomously.
  • Drive innovation in large language models (LLMs) for automated incident response and in reinforcement learning agents.

What we're looking for

  • Expert-level AI/ML engineering experience with deep knowledge of machine learning algorithms and production ML system deployment.
  • Advanced experience in agentic AI systems, including multi-agent frameworks and autonomous decision-making systems.
  • Comprehensive Site Reliability Engineering expertise with hands-on experience in performance and capacity engineering for AI/ML systems.
  • Expert-level cloud engineering skills with extensive knowledge of cloud-native AI/ML services and serverless architectures.
  • Deep observability and monitoring expertise using distributed tracing, metrics collection, log aggregation, APM tools, and AI-driven anomaly detection.

More like this

Similar roles

Distinguished Software Engineer

Walmart

Bentonville, AR · 10 days ago · $130,000–$260,000
CI/CD Kubernetes AWS Python Docker DevOps Terraform PostgreSQL AI observability Git Jenkins Prometheus Grafana

Distinguished Software Engineer

Walmart

Bentonville, AR · 6 days ago · $130,000–$260,000
Java Golang SPARK Kafka RabbitMQ Docker Kubernetes Azure Google Cloud Platform CI/CD Agile SpringBoot Dropwizard DevOps AI Open-source RESTful Microservices Messaging systems Scalability Security

Distinguished Software Engineer

Walmart

Sunnyvale, CA · 9 days ago · $169,000–$338,000
Python PyTorch TensorFlow Hugging Face Transformers Agentic Frameworks RAG frameworks GCP Azure Kubernetes Docker CI/CD Generative AI LLMs Vector search technologies Java TypeScript REST Microservices

Distinguished Software Engineer

Walmart

Bentonville, AR +1 · 12 days ago · $130,000–$260,000
Python Kubernetes AWS CI/CD PostgreSQL Docker Prometheus Grafana JSON Schema OpenAPI Pydantic protobuf GenAI LLM tool-calling responsible AI practices schema frameworks observability cloud-native design code quality system reliability

Distinguished Software Engineer

Walmart

Sunnyvale, CA · 20 days ago · $169,000–$338,000
Python Java Go Docker Kubernetes AWS CI/CD PostgreSQL Redis GraphQL Microservices Cloud-Native Secure Software Development Operational Due Diligence Mentorship

Distinguished Software Engineer

Walmart

Sunnyvale, CA · 3 days ago · $169,000–$338,000
Python Java Cloud-Native AI DevOps CI/CD Kubernetes Docker AWS Azure GCP Terraform PostgreSQL Reliability Engineering Database Abstraction Layers Automation Tools API Design Microservices Architecture Security Best Practices