Senior Manager, Site Reliability Engineering

Oracle

Actively hiring · Verified listing
Reston, VA · Seattle, WA · Austin, TX · Posted 21 days ago

At a glance

AI-generated

TL;DR

As a Senior Manager in Site Reliability Engineering at Oracle, you will lead and mentor a team responsible for the reliability and scalability of critical infrastructure. Day to day, you will oversee capacity planning, incident management, and automation initiatives that improve system performance and efficiency. You will draw on your expertise in technologies such as Kubernetes, Docker, and cloud platforms like AWS or Azure to guide your team in implementing robust solutions. You will also foster a culture of continuous improvement, encouraging experimentation with new tools and methods while maintaining strict adherence to security standards. The role requires at least 5 years of experience in software engineering, infrastructure management, or a related field, ideally in large-scale enterprise environments where reliability is paramount.

Skills

Kubernetes · Docker · CI/CD · AWS · Terraform · Python · PostgreSQL · Prometheus · Grafana · Ansible · Git · Jenkins · Linux · DevOps · Nginx · SSL/TLS · RESTful APIs · JSON · YAML · Scalability

What you'll do

  • Supervise team members to ensure accurate forecasting of infrastructure demand.
  • Implement and manage prototyping initiatives that explore new approaches to infrastructure development.
  • Monitor comprehensive health and performance reporting, and use data trends to guide appropriate action.
  • Review and provide feedback on automation tools and scripts, and lead their implementation.
  • Serve as a senior escalation point for incidents and complex issues within Oracle services.

What we're looking for

  • At least 5 years of experience in software engineering, infrastructure management, or a related field.
  • Proven leadership skills in supervising and directing team members.
  • Expertise in monitoring, analyzing, and optimizing system performance and reliability.
  • Strong technical communication and documentation skills for diverse audiences.
  • Experience implementing automation standards and testing automations.
  • Knowledge of incident response, root cause analysis, and post-mortem procedures.
  • Commitment to continuous learning and innovation in site reliability engineering.

Market check

Salary context

This listing doesn't show a salary. Similar roles on FindRole typically pay $135,000–$218,850.

Peer median band

$135,000–$218,850

Median floor and ceiling across peers.

Typical midpoint (25th–75th percentile)

$143,248–$216,300

Middle half of comparable postings.

Based on 238 comparable postings (the sample is capped at 240).

Employer

About Oracle

Oracle Corporation is a leading multinational technology company specializing in database software, cloud computing, and enterprise software.

Oracle currently has 251 open roles on FindRole.

Listed pay typically runs $97,500–$199,500 across 193 roles with salary data.


More like this

Similar roles

Senior Site Reliability Engineer

Oracle

Reston, Virginia, US · Posted 21 days ago
Oracle Linux · Ansible · Terraform · Python · Bash · Prometheus · Grafana · Kubernetes · CI/CD · Git · Active Directory · LDAP · Kerberos · GlusterFS · PostgreSQL · Docker · AWS · Azure · Google Cloud Platform · Nginx · Apache HTTP Server

Senior Site Reliability Engineer

Oracle

US · Posted 14 days ago · $79,100–$158,200
Oracle Cloud Infrastructure · Kubernetes · Python · Go · Bash · CI/CD · Terraform · Prometheus · Grafana · Linux · Networking · Docker · SRE · Incident Response · SLIs/SLOs · Resilience Engineering · FedRAMP 3PAO

Senior Site Reliability Engineer

CoStar Group

US · Posted 11 days ago
AWS · Kubernetes · Docker · Terraform · CloudFormation · Python · Java · C# · NodeJS · Bash · PCI compliance · REST API · Microservices · CDN · PostgreSQL · MySQL · Azure · Google Cloud · CI/CD

Site Reliability Engineer Lead - Senior Vice President

Citi

Remote (388 Greenwich Street - Tower, US) · Posted 50 days ago · $176,720–$265,080
Kubernetes · OpenShift · Prometheus · Grafana · Terraform · Ansible · Helm · Python · Java · Go · AWS · Google Cloud · Azure · CI/CD · Disaster Recovery · Infrastructure as Code · Observability · SLOs · SLIs · Error Budgets · Chaos Engineering