LLM Serving Engineer (Cloud AI Engineering), Senior / Staff Engineer

Qualcomm

Actively hiring
San Diego, CA · Markham, ON · Posted 51 days ago · $158,400–$237,600 / year

At a glance

AI generated

TL;DR

The LLM Serving Engineer (Cloud AI Engineering) role at Qualcomm Technologies is a senior-level position on the Cloud AI team, which develops hardware and software solutions for accelerating large language model inference. The role involves building a scalable LLM inference platform using techniques such as disaggregated serving and KV-Cache management, and contributing to serving packages such as vLLM, SGLang, and Triton Inference Server. Engineers collaborate with internal teams and customers, engage with open-source communities, and optimize deep learning workloads through efficient autoscaling and load balancing. Candidates should have hands-on experience with LLM serving tools, a strong background in PyTorch development, expertise in computer architecture and distributed systems, and excellent communication skills.

Skills

Triton Inference Server, vLLM, SGLang, PyTorch, Python, Kubernetes, Docker, CI/CD, Prometheus, Grafana, PostgreSQL, Redis, OpenAI, Hugging Face, AWS, Google Cloud Platform, Azure, Git, Jenkins, GitHub, Slack

What you'll do

  • Build a scalable LLM inference platform using advanced techniques like disaggregated serving and KV-Cache management.
  • Contribute to the development of LLM serving packages such as vLLM, SGLang, TGI, Triton Inference Server, Dynamo, and llm-d.
  • Identify new optimization opportunities by understanding advanced algorithms and numerics in GenAI.
  • Drive efficient model serving through smart autoscaling, load balancing, and routing strategies.
  • Engage with open-source communities to evolve and improve inference frameworks.
  • Collaborate closely with internal teams on compiler, firmware, and platform aspects for LLM deployment.

What we're looking for

  • Hands-on experience with LLM serving packages such as Triton Inference Server and vLLM.
  • Deep understanding of transformer-based architectures and foundational language models.
  • Strong Python development and software engineering skills for large-scale projects.
  • Experience in analyzing, profiling, and optimizing deep learning workloads.
  • Up-to-date knowledge of the latest inference optimization techniques, kept current proactively.

Market check

Salary context

This $158,400–$237,600 range sits above 49% of similar postings on FindRole.

Peer median band

$144,850–$243,250

Median floor and ceiling across peers.

Typical midpoint (25–75%)

$162,000–$235,750

Middle half of comparable postings.

Based on 240 comparable postings.

* 240 is the maximum number of comparable postings sampled.

Employer

About Qualcomm

Qualcomm is a leading American semiconductor and telecommunications company based in San Diego, CA.

Qualcomm currently has 567 open roles on FindRole.

Listed pay typically runs $148,300–$226,100 across 534 roles with salary data.

More like this

Similar roles

Principal SW Engineer - LLM Serving (Cloud AI)

Qualcomm

San Diego, CA, US · 101 days ago · $200,800–$301,200
PyTorch, Python, C++, LLMs, Multi-modal models, Reasoning models, Neural networks, High performance software, Multicore systems, Performance analysis, Multi-core architecture, SoC architectures, Performance modeling, Machine learning accelerators, Neural network operators, Linear algebra, Math libraries

AI Performance Engineer (Cloud AI Engineering), Sr | Staff | Sr. Staff

Qualcomm

San Diego, CA, US · 51 days ago · $178,400–$267,600
PyTorch, ONNX, Python, Transformer architectures, Attention mechanisms, Sharding strategies, Parallelism techniques, Computer architecture, ML accelerators, Distributed systems, Linear algebra, Math libraries, Machine learning compilers, torch.compile, torchDynamo

Senior Lead AI Engineer (LLM Gateway, FM Hosting)

Capital One Financial

McLean, VA, US · 17 days ago · $229,900–$262,400
Python, TensorFlow, PyTorch, Docker, Kubernetes, AWS, CI/CD, Git, PostgreSQL, Redis, Scikit-learn, Flask, RESTful APIs, Nginx, Prometheus, Grafana

Senior Lead AI Engineer (FM Hosting, LLM Inference)

Capital One Financial

McLean, VA, US · 124 days ago · $229,900–$262,400
Python, TensorFlow, PyTorch, Kubernetes, Docker, AWS, CI/CD, PostgreSQL, Redis, Prometheus, Grafana, GitLab, Jupyter Notebook, Scikit-learn, Pandas, NumPy, Hugging Face Transformers

Staff Software Development Engineer (Cloud & AI)

CVS Health

Remote (Buffalo Grove-2100 E Lake Cook, US) · 45 days ago · $130,295–$260,590
Azure, GCP, CI/CD, Python, Java, Go, FastAPI, PostgreSQL, Azure SQL, Cosmos DB, Apache Kafka, Jenkins, GitHub Copilot, Agile methodologies