AI/ML Technical Leader - Language Model Inference & AI Ops

Cisco

Hybrid

Quick summary

Work type: Hybrid
Location: San Jose, CA
Salary: $212,300–$275,800 / yr
Posted: 3 days ago
Closes: Sep 30, 2026

Market check

Salary context

Above market

How this pay compares to similar roles

Chart: similar roles about $218k, this role about $244k (scale $167k to $288k).

This role pays more than 66% of similar roles; most similar roles pay $189,750–$246,150. At the midpoint, this role pays about $244k versus about $218k for comparable roles.

Based on 240 similar postings.

Employer

About Cisco

Cisco Systems is the world's leading networking technology company, designing and manufacturing networking hardware, telecommunications equipment, and cybersecurity solutions for businesses and governments.

Industry: Networking Technology & Cybersecurity

Cisco currently has 134 open roles on FindRole.

Listed pay typically runs $168,800–$241,400 across 134 roles with salary data.


At a glance

TL;DR · AI/ML Technical Leader - Language Model Inference & AI Ops

As an AI Operations Technical Leader on Cisco’s CX AI Incubation Team in San Jose, CA, you will lead the productionization of large language and semantic models for intelligent customer experiences across cloud and on-prem environments. Day to day, you will optimize inference performance with techniques such as speculative decoding and quantization, deploy robust model-serving pipelines with clear SLAs, and maintain observability through latency metrics and quality-drift signals. You’ll work with PyTorch, TensorFlow, vLLM, TensorRT-LLM, and Triton. The role requires expertise in Python, Java, or C++; hands-on experience deploying NLP/generative AI systems; strong software engineering skills; knowledge of GPU inference; and the ability to collaborate effectively in a fast-paced environment.

What you'll do

  • Build and deploy robust model-serving pipelines for LLM/SLM features in cloud and on-prem environments.
  • Optimize inference performance across various hardware configurations using advanced techniques like speculative decoding and quantization.
  • Design scalable serving architectures for multi-tenant, secure, cost-aware generative AI systems.
  • Implement automated CI/CD processes for models and prompts to ensure reproducible releases and regression testing.
  • Support training and fine-tuning workflows for LLMs/SLMs, including data curation and experiment tracking.

What we're looking for

  • 9+ years of experience in software engineering with a focus on ML/AI workloads.
  • Strong background in Python, Java or C++ for building production services.
  • Experience deploying and operating NLP/Generative AI systems in production.
  • Proficiency in PyTorch/TensorFlow and tooling across the ML lifecycle.
  • Hands-on experience with inference engines and GPU profiling tools.
  • Expertise in on-prem deployment patterns and edge/resource-constrained environments.
  • Strong communication skills for technical documentation and design reviews.

More like this

Similar roles

AI and ML Engineer

Booz Allen Hamilton

Annapolis Junction, MD 30 days ago $99,000–$225,000
Python PyTorch TensorFlow Keras NLP Docker Kubernetes Git AWS SageMaker LangChain LlamaIndex Prometheus Grafana

AI/ML Engineer

Lam Research

Fremont, CA 62 days ago $119,000–$261,000
Python C++ PostgreSQL SQLite MySQL Git Domain-Driven Design Test-Driven Development CI/CD
Hybrid

AI/ML Engineer

Booz Allen Hamilton

McLean, VA 69 days ago
Python PyTorch Keras LLMs LangGraph MCP A2A AWS CI/CD

AI/ML Engineer

Booz Allen Hamilton

Norfolk, VA 58 days ago $77,500–$176,000
Python Spark Hadoop Databricks C# Java LLMs MCP LangChain LangGraph