Principal SW Engineer - LLM Serving (Cloud AI)
Qualcomm
At a glance
AI generated
The LLM Serving Engineer (Cloud AI Engineering) role at Qualcomm Technologies is a senior-level position within the Cloud AI team, focused on developing hardware and software solutions that accelerate inference for large language models. The role involves building a scalable LLM inference platform using techniques such as disaggregated serving and KV-cache management, and contributing to packages such as vLLM, SGLang, and Triton Inference Server. Engineers collaborate with internal teams and customers to drive solutions, engage with open-source communities, and optimize deep learning workloads for efficient autoscaling and load balancing. Candidates should have hands-on experience with LLM serving tools, a strong background in PyTorch development, expertise in computer architecture and distributed systems, and excellent communication skills.
Market check
This $158,400–$237,600 range sits above 49% of similar postings on FindRole.
Peer median band
$144,850–$243,250
Median floor and ceiling across peers.
Typical midpoint (25–75%)
$162,000–$235,750
Middle half of comparable postings.
Based on 240 comparable postings.
* 240 is the maximum number of comparable postings sampled.
Employer
Qualcomm is a leading American semiconductor and telecommunications company based in San Diego, CA.
Qualcomm currently has 567 open roles on FindRole.
Listed pay typically runs $148,300–$226,100 across 534 roles with salary data.