Senior Reliability Engineer

Uber

Quick summary

Work type: On-site
Location: Sunnyvale, CA
Posted: 6 days ago

Market check

Salary context

How this pay compares to similar roles

Similar roles: $171k (scale from $132k to $213k)

This listing doesn't post a salary. Most similar roles pay $144,339–$198,200.

Based on 240 similar postings.

Employer

About Uber

Uber Technologies, Inc., based in San Francisco, is the world’s largest mobile technology platform for on-demand ride-hailing, food delivery (Uber Eats), and freight transportation, operating in approximately 70 countries.

Uber currently has 45 open roles on FindRole.

At a glance

TL;DR · Senior Reliability Engineer

As a Senior Reliability Engineer at AV Labs, you will join a dedicated team focused on ensuring the reliable operation of Uber’s in-vehicle sensor data collection systems. Your primary responsibilities include architecting observability platforms that ingest and analyze real-time health telemetry from thousands of distributed vehicle nodes, developing edge-constrained systems for diverse hardware environments, and defining criticality models to distinguish transient anomalies from systemic issues impacting sensor uptime and data yield. You will also design automated detection mechanisms to eliminate manual intervention as the fleet scales, collaborate with Operations and Engineering teams to build safe, automated responses to recurring failures, and drive reliability-focused technical strategy through design reviews and roadmaps. This role requires expertise in distributed systems, observability platforms like Prometheus and Grafana, proficiency in languages such as Go or Python, and deep knowledge of Linux internals and networking protocols.

What you'll do

  • Architect observability platforms to ingest and analyze real-time health data from distributed vehicle nodes.
  • Develop systems that maintain performance across diverse hardware with intermittent connectivity challenges.
  • Define alerting strategies to distinguish transient anomalies from systemic issues affecting sensor uptime.
  • Design detection logic for silent failures like sensor degradation, compute saturation, or recording pipeline stalls.
  • Create automated mechanisms to detect, triage, and mitigate issues as the fleet scales without manual intervention.

What we're looking for

  • 5+ years of experience in software engineering, site reliability, or systems engineering.
  • Expertise in modern observability platforms like Prometheus, Grafana, and ELK for edge/IoT environments.
  • Proficiency in Go, Python, or C++ with production system development experience.
  • Deep knowledge of Linux internals and shell scripting for debugging hardware-related issues.
  • Proven reliability ownership of large-scale production systems, including SLI/SLO implementation.
  • Leadership in driving complex technical projects across multiple teams from design to production.

More like this

Similar roles

Senior Hardware R&D Technician

Uber

Sunnyvale, CA · 2 days ago
PCBA, Rework, SMT, Wire Harness Fabrication, Crimping, Multimeters, Oscilloscopes, Power Supplies, DFM, CAN, LIN, Automotive Ethernet, TE Connectivity, Deutsch, Mechanical Prototyping, Drill Presses, Band Saws, Welding

Senior Engineering Manager, AV Labs

Uber

San Francisco, CA +1 · 6 days ago
Python, PyTorch, TensorFlow, Kubernetes, AWS, CI/CD, Docker, PostgreSQL, Git, Jenkins, Prometheus, Grafana, Linux, Scikit-learn, Pandas, NumPy, Apache Spark, MLOps

Software Engineer II

Uber

Sunnyvale, CA · 9 days ago
C++, Linux, Python, Machine Learning, Computer Vision, Robotics, Game Theory, Autonomous Driving, Data Mining, Deep Learning, CI/CD

Director, Engineering

Uber

Sunnyvale, CA · 23 days ago
Python, TensorFlow, PyTorch, Kubernetes, Docker, CI/CD, AWS, Google Cloud, Azure, PostgreSQL, MongoDB, Git, Jenkins, Prometheus, Grafana, Scikit-learn, Pandas, NumPy, Apache Spark