Senior Manager, AI Infrastructure Network Operations

Oracle

Quick summary

Work type: On-site
Location: Austin, TX
Salary: $133,100–$306,400 / yr
Posted: 16 days ago

Market check

Salary context

Competitive pay

How this pay compares to similar roles

[Chart: pay range for similar roles, $112k to $327k. Similar roles: $215k. This role: $220k.]

This role pays more than 51% of similar roles. Most similar roles pay $184,800–$246,150. At the midpoint, this role pays about $220k versus about $215k for comparable roles.

Based on 239 similar postings.

Employer

About Oracle

Oracle Corporation is a leading multinational technology company specializing in database software, cloud computing, and enterprise software.

Oracle currently has 774 open roles on FindRole.

Listed pay typically runs $97,500–$209,500 across 584 roles with salary data.

At a glance

TL;DR · Senior Manager, AI Infrastructure Network Operations

As Senior Manager on the AI Infrastructure Network Operations team, you will lead and develop a group of engineers focused on RDMA/RoCE fabric operations, performance optimization, automation, and troubleshooting. The role combines deep networking expertise with software engineering skills to improve reliability, observability, and efficiency at global cloud scale. You will collaborate with teams such as Network Availability, Automation, Monitoring, and GNOC to resolve complex customer escalations and improve operational readiness. Key responsibilities include driving engineering programs that increase performance and availability, managing NOC events, and building data-driven metrics for fabric health and service status. The ideal candidate has extensive experience in large-scale cloud network operations, RDMA/RoCE, GPU/HPC networking, and leading teams that build automation and monitoring systems.

What you'll do

  • Manage and develop a team responsible for RDMA/RoCE fabric operations, performance, automation, and troubleshooting.
  • Lead efforts to improve reliability, availability, observability, and performance of OCI AI/HPC networking fabrics.
  • Apply deep networking knowledge to guide the design and enhancement of operational tools and monitoring systems.
  • Drive improvements in software and network architectures to simplify and scale operational workflows.
  • Support customer escalations by coordinating technical investigations across multiple teams.

What we're looking for

  • Deep expertise in RDMA/RoCE, Clos fabrics, congestion control, and telemetry.
  • Extensive experience managing teams that build automation, monitoring, and operational systems.
  • Strong background in large-scale cloud network operations or building networks for cloud environments.
  • Hands-on technical skills in software architecture, debugging, and enhancing operational tools.
  • Proven ability to lead engineering programs improving performance and availability of distributed infrastructure.
  • Experience resolving complex customer escalations and coordinating across multiple technical teams.

More like this

Similar roles

Senior Manager, AI Infrastructure Network Operations

Oracle

Seattle, WA +2 · 7 days ago · $118,300–$251,600
RDMA, RoCE, Clos fabrics, congestion control, telemetry, Oracle Cloud Infrastructure, Kubernetes, Docker, CI/CD, Python, PostgreSQL, AWS, Azure, Grafana, Prometheus, Ansible, Terraform

Principal Network Developer, AI Infrastructure

Oracle

Austin, TX +1 · 9 days ago · $109,200–$223,400
Oracle Cloud Infrastructure, Python, Networking Protocols, Automation Scripts, CI/CD, Monitoring Systems, Docker, Kubernetes, Terraform, PostgreSQL, MySQL, Cisco, RFP Development, Vendor Management, Technical Coaching

Senior Principal Engineer - AI Networking

Oracle

Seattle, WA · 7 days ago · $96,800–$306,400
RDMA, InfiniBand, C/C++, Linux, NCCL, RCCL, MPI, UCX, XCCL, PyTorch, DeepSpeed, Megatron-LM, TensorFlow, JAX, Kubernetes, GPU, GPUDirect, RHEL, Networking, Distributed Systems

Senior Manager, AI Operations & Platform Engineering

Autodesk

San Francisco, CA · 16 days ago · $151,300–$271,150
AWS, Azure, GCP, Kubernetes, CI/CD, Terraform, Python, Go, Docker, Prometheus, Grafana, Ansible, Jenkins, PostgreSQL, Redis, FedRAMP, DevOps, SRE, AI, Machine Learning, Observability

Senior Principal Engineer - AI Networking

Oracle

Austin, TX +1 · 7 days ago · $96,800–$306,400
RDMA, InfiniBand, C/C++, Linux, NCCL, RCCL, MPI, UCX, XCCL, PyTorch, DeepSpeed, Megatron-LM, TensorFlow, JAX, Kubernetes, GPU networking, GPUDirect RDMA, RoCE, congestion management, adaptive routing, traffic shaping, network resiliency, Docker, CI/CD