Distinguished Software Engineer, AI/ML Engineer – Agentic Systems

Walmart

Actively hiring
(USA) Crossman Excellence Building CA Sunnyvale Home Office, US; posted 70 days ago; $169,000–$338,000 / year

At a glance

AI generated

TL;DR

As a Distinguished AI/ML Engineer within Walmart Global Tech’s Reliability Engineering Organization, you will lead the technical development of advanced agentic AI systems and intelligent automation solutions to ensure mission-critical reliability across Walmart’s technology ecosystem. Your daily tasks include architecting multi-agent orchestration platforms for change management and performance optimization, building ML-driven observability and monitoring tools, and developing self-healing infrastructure that predicts and resolves issues autonomously. You will work with modern tech stacks like TensorFlow, PyTorch, Kubernetes, and cloud-native AI services to drive innovation in autonomous reliability solutions and MLOps platforms. This role requires expertise in mission-critical systems, deep observability, and cloud engineering across Walmart’s e-commerce, supply chain, and store technology domains.

Skills

AI/ML, TensorFlow, PyTorch, MLOps, Kubernetes, Docker, AWS, Azure, GCP, CI/CD, Prometheus, Grafana, Python, Agentic AI systems, Multi-agent frameworks, LLM-based agents, Reliability Engineering, Cloud Native AI services, Observability, Distributed tracing, Metrics, Logs, APM, AI-driven anomaly detection, Infrastructure as code, Service mesh architectures, API gateways

What you'll do

  • Lead the technical development of agentic AI systems ensuring mission-critical reliability and scalability across Walmart’s technology ecosystem.
  • Design and implement multi-agent orchestration platforms for change management, capacity planning, and performance optimization in hybrid cloud environments.
  • Build intelligent observability and monitoring platforms using ML-driven anomaly detection and autonomous resolution capabilities.
  • Develop self-healing infrastructure platforms that predict, prevent, and resolve issues before impacting customers or business operations.
  • Innovate in agentic AI technologies, including large language models for automated incident response and reinforcement learning agents for capacity optimization.

What we're looking for

  • 12+ years of hands-on experience in Reliability Engineering, AI/ML Engineering, or Platform Engineering.
  • Expert-level AI/ML engineering experience with deep learning frameworks and large-scale production ML deployments.
  • Advanced experience with agentic AI systems including multi-agent frameworks and autonomous decision-making systems.
  • Comprehensive reliability engineering expertise covering service management and performance/capacity engineering for AI/ML systems.
  • Strong cloud engineering background with containerization, serverless architectures, and cloud-native AI services.

Market check

Salary context

This $169,000–$338,000 range sits above 82% of similar postings on FindRole.

Peer median band

$169,000–$247,000

Median floor and ceiling across peers.

Typical midpoint (25–75%)

$167,150–$246,150

Middle half of comparable postings.

Based on 240 comparable postings.

* 240 is the maximum number of comparable postings sampled.

Employer

About Walmart

Walmart Inc. is the world's largest retailer by revenue, operating a chain of hypermarkets, discount department stores, and grocery stores, as well as a growing e-commerce presence through Walmart.com.

Industry: General Merchandise & Grocery Retail

Walmart currently has 495 open roles on FindRole.

Listed pay typically runs $117,000–$234,000 across 487 roles with salary data.

More like this

Similar roles

Software Development Engineer (AI & ML)

Fiserv

Alpharetta, Georgia, US; 12 days ago
Python, TensorFlow, Keras, Scikit-learn, Apache Kafka, Airflow, Spark, Docker, Kubernetes, PostgreSQL, MongoDB, Cassandra, SQL, NoSQL

Sr. Software Engineer - Applied AI

GEICO

Remote (CA Palo Alto Office, US); 42 days ago; $80,000–$215,000
Python, LangChain, HuggingFace, OpenAI, Kubernetes, CI/CD, Docker, Prometheus, Grafana, PostgreSQL, Redis, Apache Kafka, Spring AI, LangGraph, LangSmith, LlamaIndex, Anthropic APIs, Vector databases, Knowledge graphs, Java, Spring ecosystem

Software Engineer I, AI Specialist

Warner Bros. Discovery

Remote (GA Atlanta 1050 Techwood Drive NW, US); 13 days ago
Python, LLMs, Prompt Engineering, Evaluation Frameworks, Human-in-the-Loop Workflows, AI Evaluation Practices, Content Classification, Taxonomy Management, Information Retrieval Concepts, Basic Scripting Skills, Cross-Functional Collaboration, CI/CD

Senior Software Engineer - Applied AI/ML

Motorola Solutions

Chicago, IL, US; 15 days ago; $135,000–$155,000
Python, SQL, Docker, Kubernetes, AWS, Azure, GCP, MLOps, CI/CD, PyTorch, TensorFlow, Databricks, MLflow, AWS SageMaker, Hugging Face, Apache Airflow, Temporal, RF, Ray

Senior ML/AI Engineer - Agentic Developer Experience

General Motors (GM)

Remote (Mountain View Technical Center, US); 23 days ago; $178,420–$230,500
Go, GCP, gRPC, Docker, Kubernetes, AWS, CI/CD, Python, PostgreSQL, Terraform, GitLab, Jenkins, GitHub, Open-Source, Unix/Linux, SSH, Networking, Prometheus, Grafana

Principal Software Development Engineer (AI/ML)

Abbott

US; 42 days ago; $130,700–$261,300
AWS, Python, TensorFlow, LangChain, Hugging Face, CI/CD, MLOps, DevOps, Docker, Kubernetes, PostgreSQL, NoSQL, Mobile App Development, Relational Databases, AI, ML, Cloud Platforms, DevSecOps