Site Reliability Engineer (Senior or Staff), Storage Layer Services (SLS)

MongoDB

Quick summary

Work type
On-site
Location
Dublin, Ireland; Cork, Ireland
Posted
13 days ago

Market check

Salary context

How this pay compares to similar roles

Similar roles: $179k (chart scale: $130k to $231k)

This listing doesn't post a salary. Most similar roles pay $144,262–$214,487.

Based on 240 similar postings.

Employer

About MongoDB

MongoDB is a leading American software company that develops and provides commercial support for a popular, source-available document database. Designed to handle unstructured and structured data natively, its platform is purpose-built for modern cloud applications, analytics, and AI experiences.

MongoDB currently has 287 open roles on FindRole.

Listed pay typically runs $126,500–$209,000 across 104 roles with salary data.

At a glance

TL;DR · Site Reliability Engineer (Senior or Staff), Storage Layer Services (SLS)

Join our small, senior team of Site Reliability Engineers (SREs) as a founding member responsible for defining Service Level Objectives (SLOs), shaping capacity plans, and ensuring the reliability and durability of MongoDB's Atlas storage layer. You will work on multi-tenant distributed storage systems, balancing strategic infrastructure goals with immediate engineering needs while building services that are fault-tolerant and self-healing. Key responsibilities include identifying the critical metrics that detect incidents and quantify service health, and taking part in a 24/7 on-call rotation to resolve issues promptly. Ideal candidates have more than six years of experience building and operating distributed systems, proficiency in Python or Go, and expertise in Kubernetes, cloud infrastructure platforms such as AWS, GCP, or Azure, and Linux internals and networking.

What you'll do

  • Define and implement Service Level Objectives (SLOs) for storage services.
  • Shape capacity plans to ensure reliability, durability, and operational safety.
  • Build self-healing infrastructure to maintain service availability and resilience.
  • Configure key metrics to detect incidents and measure service performance.
  • Participate in 24/7 on-call rotations to resolve critical issues promptly.

What we're looking for

  • More than six years of experience in software development and operating distributed systems.
  • Proficiency in Python, Go, or similar programming languages.
  • Experience with stateful storage or database systems at scale.
  • Strong preference for automation over manual processes.
  • Expertise in Kubernetes and related container technologies.
  • Knowledge of cloud infrastructure platforms like AWS, GCP, or Azure.
  • Deep understanding of Linux OS internals and networking concepts.
