Site Reliability Engineer (Senior or Staff), Storage Layer Services (SLS)

MongoDB

Quick summary

Work type: On-site
Location: Toronto, Ontario, Canada; Montreal, Quebec, Canada
Salary: $144,000–$200,000 / yr
Posted: 13 days ago

Market check

Salary context

Competitive pay

How this pay compares to similar roles

Pay comparison chart: similar roles $179k; this role $172k; chart range $127k–$232k.

This role pays less than 57% of similar roles. Most similar roles pay $144,262–$214,487. At the midpoint, this role pays about $172k versus about $179k for comparable roles.

Based on 240 similar postings.

Employer

About MongoDB

MongoDB is a leading American software company that develops and provides commercial support for a popular, source-available document database. Designed to handle unstructured and structured data natively, its platform is purpose-built for modern cloud applications, analytics, and AI experiences.

MongoDB currently has 287 open roles on FindRole.

Listed pay typically runs $126,500–$209,000 across 104 roles with salary data.

At a glance

TL;DR · Site Reliability Engineer (Senior or Staff), Storage Layer Services (SLS)

As a Senior Site Reliability Engineer (SRE) joining MongoDB’s small, senior team in Toronto or Montreal, you will play a pivotal role in defining Service Level Objectives and shaping capacity plans for Atlas's storage services. Your day-to-day responsibilities include building reliable and fault-tolerant distributed storage systems, identifying key metrics to detect incidents, and participating in 24/7 on-call rotations. You will leverage your expertise in Python or Go, Kubernetes, cloud infrastructure platforms like AWS, GCP, or Azure, and Linux operating system internals to optimize performance across the application stack. This role demands experience with stateful storage systems at scale, a customer-focused mindset, and a preference for automation over manual processes, as you help execute on MongoDB’s multi-year roadmap for cloud storage architecture.

What you'll do

  • Define and implement Service Level Objectives (SLOs) for storage services.
  • Shape capacity plans to ensure reliability and durability of the storage layer.
  • Build self-healing infrastructure to maintain service availability and resilience.
  • Configure key metrics to detect incidents and measure service health effectively.
  • Participate in 24/7 on-call rotations to resolve critical issues promptly.

What we're looking for

  • At least 6 years of experience in software development and operating distributed systems.
  • Proficient in Python, Go, or similar programming languages.
  • Experience with stateful storage or database systems at scale, understanding durability and consistency trade-offs.
  • Strong customer focus and efficiency in processes and operations.
  • Expertise in cloud infrastructure platforms like AWS, GCP, or Azure.
  • Knowledge of Linux operating system internals and networking concepts.
