Kubernetes Platform Engineer – AI Infrastructure

Cisco

Hybrid · Actively hiring
San Jose, CA · Remote, USA · Posted 14 days ago · $152,500–$219,200 / year

At a glance

AI generated

TL;DR

As a Kubernetes Platform Engineer on Cisco’s AI Infrastructure team, you will design and build large-scale on-prem Kubernetes platforms to support next-generation AI/ML workloads, including GPU-enabled environments for training and inference. Your day-to-day responsibilities include architecting scalable multi-tenant infrastructure, implementing Infrastructure as Code with Golang, and building platform capabilities using custom controllers and operators. You’ll partner closely with data scientists and ML engineers to optimize AI/ML pipelines and workflows while ensuring reliability through performance tuning and participation in on-call support. This role requires hands-on experience with Kubernetes control plane management, etcd operations, and AIOps-driven automation, making it ideal for those who want to influence platform strategy and mentor junior engineers in a hybrid work environment.

Skills

Kubernetes · OpenShift · Anthos · etcd · Golang · Python · Infrastructure as Code · AIOps · CRDs · Controllers · Operators · Webhooks · GPU-based workloads · AI/ML pipelines · Observability · Telemetry · CI/CD

What you'll do

  • Design and build large-scale on-prem Kubernetes platforms for AI/ML workloads.
  • Optimize GPU-based environments for training, inference, and model deployment.
  • Extend platform capabilities using custom controllers, operators, and CRDs in Golang.
  • Implement Infrastructure as Code with automation and AIOps-driven self-healing.
  • Ensure reliability through performance tuning and participation in on-call support.

What we're looking for

  • 5+ years of software engineering experience with AI/ML or GPU-based workloads on Kubernetes.
  • 3+ years operating Kubernetes in production, managing control plane and cluster lifecycle.
  • Expertise in Go for building Kubernetes controllers/operators, CRDs, and webhooks.
  • Deep knowledge of Kubernetes internals including API server, scheduler, and reconciliation patterns.
  • Proven ability to debug and operate large-scale distributed systems in production environments.
  • Experience with etcd management, including backup, restore, and recovery processes.
  • Familiarity with AI/ML platforms, pipelines, and tooling for model training and inference.

Market check

Salary context

This $152,500–$219,200 range sits above 47% of similar postings on FindRole.

Peer median band

$151,000–$225,550

Median floor and ceiling across peers.

Typical midpoint (25–75%)

$147,715–$235,750

Middle half of comparable postings.

Based on 240 comparable postings.

* 240 is the maximum number of comparable postings sampled.

Employer

About Cisco

Cisco Systems is the world's leading networking technology company, designing and manufacturing networking hardware, telecommunications equipment, and cybersecurity solutions for businesses and governments.

Industry: Networking Technology & Cybersecurity

Cisco currently has 103 open roles on FindRole.

Listed pay typically runs $165,000–$241,400 across 103 roles with salary data.

More like this

Similar roles

Kubernetes Platform Engineer - AI Infrastructure

Cisco

Remote (USA-Research Triangle Park, US) · 14 days ago · $126,500–$182,000
Kubernetes · OpenShift · Anthos · Golang · Python · etcd · Infrastructure as Code · AIOps · Prometheus · Grafana · CI/CD · GPU · ML pipelines · CRDs · Webhooks · Observability · On-call support

Senior Kubernetes Platform Engineer - AI Infrastructure

Cisco

Remote (USA-Research Triangle Park, US) · 14 days ago · $137,000–$200,500
Kubernetes · OpenShift · Anthos · etcd · Go · Infrastructure as Code · AIOps · telemetry · Prometheus · Grafana · Kubeflow · MLflow · CI/CD · Docker · GitOps · Terraform · Ansible · Python · PostgreSQL

Senior Kubernetes Platform Engineer - AI/ML Infrastructure

Cisco

Remote (USA-Research Triangle Park, US) · 14 days ago · $137,000–$200,500
Kubernetes · Go · etcd · Infrastructure as Code · AIOps · Observability · Metrics · Logs · Traces · Kubeflow · MLflow · Distributed systems · On-call rotations · Bare-metal infrastructure · OpenShift · Anthos · Prometheus · Grafana · CI/CD

Kubernetes Platform Engineer (IT Engineer Senior)

Qualcomm

San Diego, CA, US · 30 days ago
Kubernetes · Rancher · RKE2 · GKE · EKS · AKS · Cilium · Docker · ContainerD · git · Github · Python · Go · bash · JIRA · CKAD · CKA · CKS · Portworx · MetalLB · Github Actions · CI/CD