Staff Compiler Engineer - PyTorch + Kernel DSLPLATE
Quick summary
- Work type
- On-site
- Location
- San Jose, CA
- Salary
- $163,000–$253,000 / yr
- Posted
- today
Market check
Salary context
How this pay compares to similar roles
This role pays less than 51% of similar roles. Most pay $183,726–$237,237. At the midpoint, this role pays about $208k versus about $210k for comparable roles.
Based on 240 similar postings.
Employer
About Samsung Semiconductor
Samsung Semiconductor is the global semiconductor business unit of Samsung Electronics, designing and manufacturing memory chips, logic semiconductors, and foundry solutions for a broad range of applications.
Samsung Semiconductor currently has 54 open roles on FindRole.
Listed pay typically runs $163,000–$253,000 across 54 roles with salary data.
Most-posted roles
- Principal Engineer, AI System Architect (Hardware): 2
- Staff Engineer, SSD Qualification: 2
- Account Director, Sales: 1
- Compensation Partner: 1
- Director, Design Verification: 1
At a glance
TL;DR · Staff Compiler Engineer - PyTorch + Kernel DSLPLATE
Join our team as a Staff Compiler Engineer specializing in PyTorch and Kernel DSL development, where you will adapt torch.compile to fit our backend by lowering Inductor's IR to our hardware and defining fusion strategies. You’ll build or extend kernel DSLs for our unique hardware, design placement and scheduling passes, implement parallelism-aware lowering, and engage with upstream review processes for open-source projects like PyTorch and Triton. Ideal candidates have 3-5 years of experience in technologies such as MLIR, XLA, TVM, Inductor, or similar, along with a background in HPC, distributed systems, and non-flat memory hierarchies. Experience with kernel autotuning and open-source contributions is highly valued.
Skills
What you'll do
- Adapt torch.compile to the backend by lowering Inductor's IR to the target hardware and defining fusion strategies.
- Build or extend kernel DSLs for custom hardware, deciding changes needed in frontend/backend.
- Design placement and scheduling passes that optimize for a distributed memory model.
- Implement parallelism-aware lowering for tensor, pipeline, expert, and sequence parallelism.
- Contribute upstream to open-source projects like PyTorch, Triton, Helion, and MLIR.
What we're looking for
- 10+ years of industry experience in relevant fields or equivalent education and experience.
- Experience designing a kernel DSL or making significant changes to an existing one.
- Proficiency in MLIR, including writing dialects, passes, and backend integration.
- Expertise in building PyTorch backends for non-CUDA targets such as XPU, ROCm, and TPU.
- Knowledge of kernel autotuning, performance modeling, and cost-based compilation techniques.
- Background in HPC, distributed systems, or NUMA-aware programming to understand non-flat memory hierarchies.
- Open-source contributions to PyTorch, Triton, Helion, LLVM/MLIR, or similar projects.
More like this
Similar roles
Senior Deep Learning Compiler Verification Engineer
Nvidia
Senior Deep Learning Compiler Engineer - XLA
Nvidia
Senior Machine Learning Applications and Compiler Engineer, LPX
Nvidia
Machine Learning Compiler Engineer
Qualcomm
Senior Deep Learning Compiler Engineer
Nvidia