AI Model Optimization Architect

Qualcomm

Actively hiring
San Diego, CA Posted 78 days ago Apply by Sep 12, 2026 $158,400–$237,600 / year

At a glance

AI generated

TL;DR

Qualcomm Technologies is hiring a Staff Engineer – AI Model Optimization Architect to join its Cloud AI team, focusing on developing hardware and software platforms for efficient inference of large-scale foundation models. This senior role involves architecting model optimization strategies that transform PyTorch models into accelerator-efficient execution, working closely with compiler, performance, and accuracy teams to ensure optimal throughput, latency, memory usage, and quality across various batch sizes and sequence lengths. Key responsibilities include designing fusion kernels using DSL-based approaches like Triton, profiling and optimizing large language and vision models for inference, enabling continuous batching strategies, and scaling distributed inference across multi-core systems. The ideal candidate has expert-level proficiency in PyTorch, experience with torch.compile, deep knowledge of transformer architectures, and a strong foundation in computer architecture and ML accelerators.

Skills

PyTorch, Python, ONNX, torch.compile, TorchDynamo, Triton, Transformer architectures, KV cache, Continuous batching, Distributed systems, Computer architecture, ML accelerators, Debugging, Performance optimization, Memory management

What you'll do

  • Architect and deliver model optimization strategies for PyTorch models on Qualcomm accelerators.
  • Drive graph capture and deployment using PyTorch, ONNX, and torch.compile for efficient execution.
  • Design and implement fusion kernels to enable performance-critical algorithmic rewrites.
  • Profile and optimize LLM/VLM/diffusion inference for throughput and latency across batch sizes and sequence lengths.
  • Own transformer-specific optimizations, including KV cache management and long-context performance.
  • Enable and optimize continuous batching for memory, scheduling, and tail latency improvements.
  • Architect distributed inference strategies to scale model optimizations across multi-core systems.

What we're looking for

  • Expert-level proficiency in PyTorch and model optimization for inference.
  • Hands-on experience with torch.compile/TorchDynamo or similar graph capture workflows.
  • Deep understanding of transformer architectures, attention mechanisms, and performance trade-offs.
  • Practical experience with KV cache behavior and memory/performance optimizations.
  • Strong foundation in computer architecture, ML accelerators, and distributed systems.
  • Proven ability to lead cross-functional technical efforts and influence design decisions.
  • MS in Computer Science, Machine Learning, or related field.

Market check

Salary context

Competitive pay

How this pay compares to similar roles

Chart: this role at $198k, similar roles at $204k, on a scale from $147k to $261k; the shaded band marks where most similar roles pay.

About 57% of similar roles pay more than this one. Most pay $162,000–$246,150, the shaded band above. At the midpoint, this role pays about $198k versus about $204k for comparable roles.

Based on 240 similar postings.

Employer

About Qualcomm

Qualcomm is a leading American semiconductor and telecommunications company based in San Diego, CA.

Qualcomm currently has 595 open roles on FindRole.

Listed pay typically runs $148,300–$222,500 across 540 roles with salary data.

More like this

Similar roles

AI Accuracy Architect

Qualcomm

San Diego, CA 78 days ago $158,400–$237,600
Python, PyTorch, ONNX, LLMs, VLMs, Quantization, Transformer architectures, Attention mechanisms, Precision tradeoffs, Numerical stability, Accuracy evaluation metrics, ML compilers, torch.compile, Computer architecture, ML accelerators

Solutions Architect, AI Models

Nvidia

Remote (Santa Clara, CA) 42 days ago $152,000–$241,500
Python, PyTorch, TensorFlow, Hugging Face Transformers, Kubernetes, SLURM, Docker, CI/CD, Prometheus, Grafana, PostgreSQL, Git, Jupyter Notebook, NVIDIA NeMo, NVIDIA Nemotron, Linux, AWS, Azure, Google Cloud Platform

AI Solution Architect

Booz Allen Hamilton

Nellis AFB, NV 19 days ago $112,800–$257,000
Palantir Foundry, Palantir Gotham, Kubernetes, DevSecOps, CI/CD, Docker, LLM, AI/ML, DevOps, Secret clearance, Top Secret clearance, AWS

Solutions Architect, AI and ML

Nvidia

Redmond, WA 85 days ago $124,000–$195,500
AWS, GCP, Azure, TensorFlow, PyTorch, CUDA, RAPIDS, Kubernetes, Docker, Python, DevOps, CI/CD, NVIDIA GPUs, GPU-based systems, Deep Learning, Parallel programming, Distributed computing platforms

Solutions Architect, AI and ML

Nvidia

Redmond, WA 90 days ago $124,000–$195,500
AWS, GCP, Azure, TensorFlow, PyTorch, CUDA, RAPIDS, Kubernetes, Docker, Python, DevOps, CI/CD, NVIDIA GPUs, GPU-based systems, Deep Learning, Parallel programming, Distributed computing platforms