Senior Kubernetes Platform Engineer - AI Infrastructure

Cisco

Remote · Hybrid · Actively hiring
Remote · Research Triangle Park, NC · Dallas, TX · Allen, TX · Posted 14 days ago · $137,000–$200,500 / year

At a glance

AI generated

TL;DR

As a Senior Kubernetes Platform Engineer on the AI Infrastructure team, you will design and operate large-scale Kubernetes platforms for next-generation AI/ML workloads, including GPU-enabled environments for both traditional ML and state-of-the-art LLMs. Day to day, you will design scalable multi-tenant platform architectures, build platform extensions as Go services and Kubernetes controllers, and apply Infrastructure as Code practices to improve operational efficiency. You will also drive AIOps capabilities, using telemetry and automation for reliability, and partner with data scientists and ML engineers to optimize resource utilization and improve observability. Essential skills include extensive experience running Kubernetes in production, deep knowledge of etcd management, proficiency in Go, and expertise in Kubernetes internals such as the API server and scheduler. Experience with AI/ML platforms such as Kubeflow and with distributed training systems is preferred.

Skills

Kubernetes · OpenShift · Anthos · etcd · Go · Infrastructure as Code · AIOps · Telemetry · Prometheus · Grafana · Kubeflow · MLflow · CI/CD · Docker · GitOps · Terraform · Ansible · Python · PostgreSQL

What you'll do

  • Architect and build large-scale on-prem Kubernetes platforms for AI/ML workloads.
  • Define and evolve scalable multi-tenant platform architecture supporting GPU-based workloads.
  • Enable ML workloads by optimizing training, inference, and LLM deployment pipelines.
  • Implement Infrastructure as Code to enhance scalability and operational efficiency.
  • Drive AIOps capabilities using telemetry, automation, and self-healing systems.
  • Improve observability through metrics, logs, traces, and resource optimization.

What we're looking for

  • 8+ years of software engineering experience with a focus on Kubernetes production environments.
  • At least 4 years of hands-on Kubernetes control plane ownership and management.
  • Expertise in etcd lifecycle management including backup, restore, recovery, and upgrades.
  • Proficiency in Go for building Kubernetes controllers, operators, CRDs, and webhooks.
  • Deep understanding of Kubernetes internals such as API server, scheduler, and controller loops.
  • Experience supporting AI/ML or GPU-based workloads on Kubernetes platforms.
  • Proven ability to operate and debug large-scale distributed systems.

Market check

Salary context

This $137,000–$200,500 range sits above 32% of similar postings on FindRole.

Peer median band

$155,210–$241,500

Median floor and ceiling across peers.

Typical midpoint (25–75%)

$156,000–$235,750

Middle half of comparable postings.

Based on 240 comparable postings.

* 240 is the maximum number of comparable postings sampled.

Employer

About Cisco

Cisco Systems is the world's leading networking technology company, designing and manufacturing networking hardware, telecommunications equipment, and cybersecurity solutions for businesses and governments. Industry: Networking Technology & Cybersecurity

Cisco currently has 103 open roles on FindRole.

Listed pay typically runs $165,000–$241,400 across 103 roles with salary data.


More like this

Similar roles

Senior Kubernetes Platform Engineer - AI/ML Infrastructure

Cisco

Remote (USA-Research Triangle Park, US) · 14 days ago · $137,000–$200,500
Kubernetes · Go · etcd · Infrastructure as Code · AIOps · Observability · Metrics · Logs · Traces · Kubeflow · MLflow · Distributed systems · On-call rotations · Bare-metal infrastructure · OpenShift · Anthos · Prometheus · Grafana · CI/CD
Remote

Kubernetes Platform Engineer - AI Infrastructure

Cisco

Remote (USA-Research Triangle Park, US) · 14 days ago · $126,500–$182,000
Kubernetes · OpenShift · Anthos · Golang · Python · etcd · Infrastructure as Code · AIOps · Prometheus · Grafana · CI/CD · GPU · ML pipelines · CRDs · Webhooks · Observability · On-call support
Remote

Kubernetes Platform Engineer – AI Infrastructure

Cisco

Remote (USA-San Jose, US) · 14 days ago · $152,500–$219,200
Kubernetes · OpenShift · Anthos · etcd · Golang · Python · Infrastructure as Code · AIOps · CRDs · Controllers · Operators · Webhooks · GPU-based workloads · AI/ML pipelines · Observability · Telemetry · CI/CD
Remote

Kubernetes Platform Engineer (IT Engineer Senior)

Qualcomm

San Diego, CA, US · 30 days ago
Kubernetes · Rancher · RKE2 · GKE · EKS · AKS · Cilium · Docker · containerd · Git · GitHub · Python · Go · Bash · Jira · CKAD · CKA · CKS · Portworx · MetalLB · GitHub Actions · CI/CD

Senior Kubernetes Software Engineer

Broadcom

USA-CA - Promontory B, US · 51 days ago · $120,000–$192,000
Kubernetes · Go · CNCF · CI/CD · vSphere · Docker · Terraform · AWS · GCP · Azure · PostgreSQL · Prometheus · GitLab · GitHub · Maven · Jenkins · Ansible · Python · Shell scripting