Principal GenAI Inference Optimization Engineer in San Jose, California | Advanced Micro Devices, Inc

AMD

Quick summary

Work type: Hybrid
Location: San Jose, CA
Salary: $240,000 / yr
Posted: 78 days ago
Closes: Mar 27, 2027
Nearby: 99+ roles within 25 mi

Market check

Salary context

Above market

How this pay compares to similar roles

[Chart: pay scale from $168k to $270k, with a shaded band where most similar roles pay; similar roles at about $212k, this role at $240k]

This role pays more than 67% of similar roles. Most similar roles pay $177,737–$246,150 (the shaded band above). At the midpoint, this role pays about $240k, versus about $212k for comparable roles.

Based on 240 similar postings.

Employer

About AMD

AMD (Advanced Micro Devices) is a semiconductor company that develops high-performance processors, graphics cards, and adaptive computing solutions for gaming, data centers, and embedded markets.

Industry: Semiconductors

AMD currently has 65 open roles on FindRole.

Listed pay is typically about $188,000 across 65 roles with salary data.

At a glance

As a Principal GenAI Inference Optimization Engineer on the Models and Applications team, you will improve the performance, efficiency, and scalability of generative AI inference workloads on AMD GPU platforms. Day to day, you will optimize latency, throughput, and cost for large-scale model deployments in production; analyze bottlenecks across the compute, memory, and communication layers; and implement advanced optimization techniques such as batching strategies and quantization. You will collaborate with hardware, compiler, and framework teams to drive cross-stack optimizations and contribute to scalable serving systems built with vLLM, SGLang, Triton, or similar frameworks. Proficiency in Python, C++, and CUDA/HIP and experience with ML frameworks such as PyTorch are essential, as is a deep understanding of GPU architecture and performance fundamentals.

Skills

Python, C++, CUDA, HIP, vLLM, SGLang, Triton, TensorRT-LLM, PyTorch, JAX, TensorFlow, AMD GPUs, PCIe, RDMA, Distributed systems, Profiling tools, Benchmarking tools, Performance analysis tools, CI/CD

What you'll do

Optimize performance of GenAI inference workloads on AMD GPUs.
Improve latency and throughput for large-scale model serving in production.
Analyze and resolve bottlenecks across compute, memory, and communication systems.
Implement and evaluate advanced inference optimization techniques like quantization.
Develop profiling and benchmarking tools for inference workload analysis.
Support the development of scalable serving systems and efficient resource utilization.
Contribute to internal and open-source projects for AMD platform optimizations.

What we're looking for

Extensive expertise in optimizing GenAI inference workloads on AMD GPUs.
Deep understanding of GPU architecture and performance optimization techniques.
Hands-on experience with inference/serving frameworks like vLLM, SGLang, Triton.
Proficiency in Python and systems programming languages (C++, CUDA, HIP).
Ability to analyze and resolve bottlenecks across compute, memory, and communication.
Experience optimizing large-scale LLM and multimodal model serving systems.
Strong skills in collaborating with cross-functional teams to improve system performance.

Sr. Lead AI Engineer (GenAI Platform)

Capital One Financial

Location: San Jose, CA +4
Posted: 34 days ago
Salary: $229,900–$262,400

Skills: Python, TensorFlow, PyTorch, Kubernetes, Docker, AWS, CI/CD, Git, PostgreSQL, MongoDB, Scikit-learn, Pandas, NumPy, Jupyter, Swagger, RESTful APIs, GraphQL
