Staff Compiler Engineer - PyTorch + Kernel DSLPLATE

Samsung Semiconductor

Quick summary

Work type: On-site
Location: San Jose, CA
Salary: $163,000–$253,000 / yr
Posted: today

Market check

Salary context

Competitive pay

How this pay compares to similar roles

[Chart: pay scale from $150k to $264k, with a shaded band where most similar roles pay; similar roles midpoint $210k, this role midpoint $208k]

This role pays less than 51% of similar roles. Most similar roles pay $183,726–$237,237. At the midpoint, this role pays about $208k, versus about $210k for comparable roles.

Based on 240 similar postings.

Employer

About Samsung Semiconductor

Samsung Semiconductor is the global semiconductor business unit of Samsung Electronics, designing and manufacturing memory chips, logic semiconductors, and foundry solutions for a broad range of applications.

Samsung Semiconductor currently has 54 open roles on FindRole.

Listed pay typically runs $163,000–$253,000 across 54 roles with salary data.

At a glance

TL;DR · Staff Compiler Engineer - PyTorch + Kernel DSLPLATE

Join our team as a Staff Compiler Engineer specializing in PyTorch and Kernel DSL development, where you will adapt torch.compile to fit our backend by lowering Inductor's IR to our hardware and defining fusion strategies. You’ll build or extend kernel DSLs for our unique hardware, design placement and scheduling passes, implement parallelism-aware lowering, and engage with upstream review processes for open-source projects like PyTorch and Triton. Ideal candidates have 3-5 years of experience in technologies such as MLIR, XLA, TVM, Inductor, or similar, along with a background in HPC, distributed systems, and non-flat memory hierarchies. Experience with kernel autotuning and open-source contributions is highly valued.

Skills

PyTorch MLIR Triton Helion Inductor XLA TVM IREE CUTLASS CUDA XPU ROCm MPS TPU kernel DSL HPC distributed systems NUMA-aware programming autotuning performance modeling cost-based compilation LLVM open-source contributions

What you'll do

Adapt torch.compile to our backend by lowering Inductor's IR to our hardware and defining fusion strategies.
Build or extend kernel DSLs for custom hardware, deciding what changes are needed in the frontend and backend.
Design placement and scheduling passes for distributed memory model optimization.
Implement parallelism-aware lowering for tensor, pipeline, expert, and sequence parallelism.
Contribute upstream to open-source projects like PyTorch, Triton, Helion, and MLIR.

What we're looking for

10+ years of industry experience in relevant fields or equivalent education and experience.
Experience designing a kernel DSL or making significant changes to an existing one.
Proficiency in MLIR, including writing dialects, passes, and backend integration.
Expertise in building PyTorch backends for non-CUDA accelerators such as XPU, ROCm, and TPU.
Knowledge of kernel autotuning, performance modeling, and cost-based compilation techniques.
Background in HPC, distributed systems, or NUMA-aware programming to understand non-flat memory hierarchies.
Open-source contributions to PyTorch, Triton, Helion, LLVM/MLIR, or similar projects.

Similar roles

Careers

Qualcomm

Santa Clara, CA 59 days ago

MLIR LLVM PyTorch 2.0 TVM Triton SYCL C++ Python CUDA OpenCL Polyhedral Compiler Optimization Loop Transformation Vectorization GPU Programming CI/CD Git Linux Docker Kubernetes

Senior Deep Learning Compiler Verification Engineer

Nvidia

Remote (Santa Clara, CA) 36 days ago $140,000–$224,250

Python C++ PyTorch JAX TensorRT LLVM MLIR TVM XLA Type Systems Program Semantics Proof-Based Verification Quantization Operator Fusion Mixed-Precision Graph-Level Optimization

Senior Deep Learning Compiler Engineer - XLA

Nvidia

Remote (Santa Clara, CA) 99 days ago $152,000–$241,500

C/C++ CUDA JAX PyTorch TensorFlow XLA MLIR LLVM OpenAI_Triton GPU distributed_programming performance_analysis compiler_optimizations clean_software_engineering_practices high_performance_computing

Senior Machine Learning Applications and Compiler Engineer, LPX

Nvidia

Remote (Santa Clara, CA) 77 days ago $152,000–$241,500

C/C++ Rust LLVM MLIR TensorFlow PyTorch ONNX GPU Profiling tools Tracing tools Benchmarking tools CI/CD Parallel computing Heterogeneous computing Spatial architectures Dataflow architectures Large-scale AI systems

Machine Learning Compiler Engineer

Qualcomm

New York, NY 30 days ago $200,800–$301,200

MLIR LLVM PyTorch 2.0 TVM Triton SYCL Python C++ CUDA OpenCL Polyhedral Compiler Optimization Loop Transformation Vectorization GPU Programming High Performance Computing CI/CD Git Linux Docker

Senior Deep Learning Compiler Engineer

Nvidia

Remote (Santa Clara, CA) 35 days ago $152,000–$241,500

MLIR XLA TVM LLVM PyTorch CUDA C++ Python GPU CPU Embedded_Systems Cross_Compilation CI/CD
