Senior Manager, AI Infrastructure Network Operations

Oracle

Quick summary

Work type
On-site
Location
Seattle, WA; Santa Clara, CA; Austin, TX
Salary
$118,300–$251,600 / yr
Posted
4 days ago

Market check

Salary context

Below market

How this pay compares to similar roles

Similar roles (midpoint): $216k
This role (midpoint): $185k
Chart range: $101k–$280k, with a shaded band marking where most similar roles pay

This role pays less than 74% of similar roles. Most pay $184,925–$246,150 — the shaded band above. At the midpoint, this role pays about $185k versus about $216k for comparable roles.

Based on 239 similar postings.

Employer

About Oracle

Oracle Corporation is a leading multinational technology company specializing in database software, cloud computing, and enterprise software.

Oracle currently has 755 open roles on FindRole.

Listed pay typically runs $97,500–$211,506 across 570 roles with salary data.

At a glance

TL;DR · Senior Manager, AI Infrastructure Network Operations

As a Senior Manager in the OCI AI Infrastructure Network Operations team, you will lead a group of engineers responsible for developing, operating, and enhancing large-scale RDMA/RoCE network fabrics that support Oracle Cloud Infrastructure’s most demanding AI, GPU, and HPC workloads. Your day-to-day involves deep networking expertise, software engineering skills, and collaboration with various teams to resolve complex issues, improve operational readiness, and enhance performance. Key responsibilities include managing a team of engineers, driving improvements in reliability and observability through tool development and automation, and supporting customer escalations by coordinating technical investigations across multiple disciplines. You will also define roadmaps for engineering efficiency and service availability, build data-driven metrics to monitor fabric health and performance trends, and ensure operational planning meets corporate expectations. The ideal candidate has extensive experience with RDMA/RoCE, Clos fabrics, congestion control, telemetry, and large-scale troubleshooting in a cloud environment.

What you'll do

  • Manage and develop a team responsible for RDMA/RoCE fabric operations and troubleshooting.
  • Lead efforts to improve the reliability, availability, observability, and performance of OCI AI/HPC networking fabrics.
  • Apply deep knowledge in RDMA, RoCE, Ethernet fabrics, congestion control, QoS, telemetry, and large-scale troubleshooting.
  • Guide design and enhancement of operational tools, automation platforms, monitoring systems, and infrastructure services.
  • Define and execute team roadmaps focused on engineering efficiency, network performance, and service availability.

What we're looking for

  • 10+ years of experience in large-scale cloud network operations or development
  • Deep expertise in RDMA/RoCE, Clos fabrics, congestion control, telemetry, and performance troubleshooting
  • Strong software engineering background for operational tools, automation, monitoring systems, and infrastructure services
  • Proven leadership in managing engineers who operate and build software for distributed infrastructure
  • Experience driving improvements in network architectures and scaling operational workflows
  • Ability to coordinate technical investigations across networking, software, hardware, and operations teams
  • Track record of defining and executing team roadmaps focused on engineering efficiency and service availability

More like this

Similar roles

Principal Network Developer, AI Infrastructure

Oracle

Austin, TX +1 · 6 days ago · $109,200–$223,400
Oracle Cloud Infrastructure, Python, Networking Protocols, Automation Scripts, CI/CD, Monitoring Systems, Docker, Kubernetes, Terraform, PostgreSQL, MySQL, Cisco, RFP Development, Vendor Management, Technical Coaching

Senior Principal Engineer - AI Networking

Oracle

Austin, TX +1 · 4 days ago · $96,800–$306,400
RDMA, InfiniBand, C/C++, Linux, NCCL, RCCL, MPI, UCX, XCCL, PyTorch, DeepSpeed, Megatron-LM, TensorFlow, JAX, Kubernetes, GPU networking, GPUDirect RDMA, RoCE, congestion management, adaptive routing, traffic shaping, network resiliency, Docker, CI/CD

Infrastructure Engineering Senior Manager - AI

Wells Fargo

Minneapolis, MN +1 · 4 days ago · $153,000–$239,000
Microsoft Purview, PowerShell, Microsoft Graph, DSPM for AI, Microsoft Priva, Entra ID, Defender for Cloud Apps, Data Loss Prevention (DLP), Information Protection, Insider Risk Management, Communication Compliance, Audit Manager, Compliance Manager, eDiscovery, Retention and Records Management, SharePoint Advanced Management, Microsoft 365 Copilot, Copilot Studio, AI governance, data classification, retention policies, legal hold, GDPR, CCPA, NIST AI RMF, EU AI Act

Senior Manager, AI Operations & Platform Engineering

Autodesk

San Francisco, CA · 13 days ago · $151,300–$271,150
AWS, Azure, GCP, Kubernetes, CI/CD, Terraform, Python, Go, Docker, Prometheus, Grafana, Ansible, Jenkins, PostgreSQL, Redis, FedRAMP, DevOps, SRE, AI, Machine Learning, Observability

Senior Software Manager, AI Networking

Nvidia

Remote (Santa Clara, CA) · 31 days ago · $272,000–$431,250
BlueField, ConnectX, Spectrum-X, DOCA, RDMA, RoCE, InfiniBand, DPDK, NCCL, CUDA-aware networking, congestion control, telemetry, CI/CD, Kubernetes, AWS, GCP, Azure, Python, Shell scripting, Prometheus, Grafana