Senior Staff Engineer - Machine Learning Inference

Shopify

Posted 1 day, 9 hours ago

Job Description

Step into the engine room of Agentic Commerce! Imagine owning the bleeding edge of machine learning at Shopify, where your work accelerating, optimizing, and scaling ML inference will shape the experience of millions of merchants and influence how commerce AI is built worldwide. We’re seeking a Senior Staff Engineer to architect, optimize, and own the high-performance runtime that turns innovative models into production breakthroughs. Your work will be the engine behind our real-time AI systems, driving game-changing cost and latency reductions and enabling rapid launches of intelligent features that keep Shopify (and our merchants) years ahead. Join a remote-first team of world-class experts, experiment fearlessly, and see your code move the needle on some of the largest-scale ML workloads in commerce.

Responsibilities

  • Architect, optimize, and own Shopify’s production ML inference, designing for high throughput, ultra-low latency, and global reliability.
  • Leverage and extend technologies like CUDA, TensorRT, Triton, TVM, and custom GPU kernels to deliver state-of-the-art performance and efficiency at scale.
  • Partner with ML, infrastructure, and product teams to seamlessly deploy, benchmark, and scale cutting-edge models powering our platform.
  • Drive cost optimization and system efficiency, reducing cloud spend and carbon footprint by orders of magnitude without sacrificing model quality.
  • Lead deep performance investigations, apply advanced techniques (pruning, quantization, distillation, batching), and implement robust solutions for serving models in production.
  • Set technical strategy and culture for ML inference across Shopify, mentoring others and collaborating with global AI pioneers.

Qualifications

  • Proven, hands-on expertise in building and optimizing large-scale ML inference systems, with measurable performance and cost wins.
  • Deep experience in production model serving, runtime optimization, and acceleration, especially leveraging GPUs (CUDA, TensorRT) and high-performance deep learning infrastructure.
  • Strong software engineering skills (Python, C++, and/or other relevant languages) with a robust systems and distributed computing mindset.
  • Demonstrated leadership in architecting and scaling reliable, real-time inference systems handling millions of queries per day.
  • Track record of cross-functional impact: working closely with ML research/engineering, infra, and product teams to deliver production results.
  • Advanced understanding of model compression, quantization, efficient deployment, and tradeoffs between speed, cost, and accuracy.

Nice to Haves

  • Open source contributions to inference frameworks (TensorRT, TVM, Triton, DeepSpeed, ONNX, etc.) or technical talks/publications at leading AI conferences.
  • Experience optimizing inference across a variety of hardware (NVIDIA, AMD, ARM, cloud TPUs).
  • Familiarity with building or integrating robust monitoring, observability, and auto-scaling for inference platforms.
  • Experience with modern MLOps pipelines and methodologies.
  • Prior experience in e-commerce, large-scale product infra, or globally distributed inference workloads.

At Shopify, we pride ourselves on moving quickly, not just in shipping but in our hiring process as well. If you're ready to apply, please be prepared to interview with us within the week. Our goal is to complete the entire interview loop within 30 days. You will be expected to complete a live pair programming session; come prepared with your own IDE.


About Shopify

Shopify is a global commerce company providing a leading e-commerce platform and ecosystem of tools that allows businesses of all sizes to build, manage, and grow their online and physical retail operations.

Industry: E-Commerce Technology & Payments