AI/ML Technical Leader - Language Model Inference & AI Ops

Cisco

Hybrid

Quick summary

Work type: Hybrid
Location: San Jose, CA
Salary: $212,300–$275,800 / yr
Posted: 3 days ago
Closes: Sep 30, 2026
Nearby: 99+ roles within 25 mi

Market check

Salary context

Above market

How this pay compares to similar roles

Chart: pay scale from $167k to $288k, marking similar roles at $218k and this role at $244k.

This role pays more than 66% of similar roles. Most similar roles pay $189,750–$246,150 (the shaded band in the chart). At the midpoint, this role pays about $244k versus about $218k for comparable roles.

Based on 240 similar postings.

Employer

About Cisco

Cisco Systems is the world's leading networking technology company, designing and manufacturing networking hardware, telecommunications equipment, and cybersecurity solutions for businesses and governments.

Industry: Networking Technology & Cybersecurity

Cisco currently has 134 open roles on FindRole.

Listed pay typically runs $168,800–$241,400 across 134 roles with salary data.

At a glance

TL;DR · AI/ML Technical Leader - Language Model Inference & AI Ops

As an AI Operations Technical Leader on Cisco’s CX AI Incubation Team in San Jose, CA, you will lead the productionization of large language and semantic models for intelligent customer experiences across cloud and on-prem environments. Day to day, you will optimize inference performance with techniques such as speculative decoding and quantization, deploy robust model-serving pipelines with clear SLAs, and maintain observability through latency metrics and quality-drift signals. The stack includes PyTorch, TensorFlow, vLLM, TensorRT-LLM, and Triton. The role requires expertise in Python, Java, or C++, hands-on experience deploying NLP/Generative AI systems, strong software engineering skills, GPU inference knowledge, and the ability to collaborate effectively in a fast-paced environment.

Skills

Python, PyTorch, TensorFlow, Kubernetes, CI/CD, vLLM, TensorRT-LLM, Triton, SGLang, llama.cpp, NVIDIA Nsight, Prometheus, Grafana, PostgreSQL, Java, C++, ML lifecycle tooling, Model registry, Experiment tracking, Observability

What you'll do

Build and deploy robust model-serving pipelines for LLM/SLM features in cloud and on-prem environments.
Optimize inference performance across various hardware configurations using advanced techniques like speculative decoding and quantization.
Design scalable serving architectures for multi-tenant, secure, cost-aware generative AI systems.
Implement automated CI/CD processes for models and prompts to ensure reproducible releases and regression testing.
Support training and fine-tuning workflows for LLMs/SLMs, including data curation and experiment tracking.

What we're looking for

9+ years of experience in software engineering with a focus on ML/AI workloads.
Strong background in Python, Java, or C++ for building production services.
Experience deploying and operating NLP/Generative AI systems in production.
Proficiency in PyTorch/TensorFlow and tooling across the ML lifecycle.
Hands-on experience with inference engines and GPU profiling tools.
Expertise in on-prem deployment patterns and edge/resource-constrained environments.
Strong communication skills for technical documentation and design reviews.