Site Reliability Engineering Technical Leader (Remote)

Cisco

Remote

Quick summary

Work type
Remote
Location
Research Triangle Park, NC
Salary
$149,100–$218,900 / yr
Posted
5 days ago

Market check

Salary context

Competitive pay

How this pay compares to similar roles

This role pays more than 54% of similar roles. Most similar roles pay $143,248–$211,625. At the midpoint, this role pays about $184k versus about $177k for comparable roles.

Based on 240 similar postings.

Employer

About Cisco

Cisco Systems is the world's leading networking technology company, designing and manufacturing networking hardware, telecommunications equipment, and cybersecurity solutions for businesses and governments.

Industry: Networking Technology & Cybersecurity

Cisco currently has 167 open roles on FindRole.

Listed pay typically runs $168,800–$241,400 across 167 roles with salary data.


At a glance

TL;DR · Site Reliability Engineering Technical Leader (Remote)

As a Senior Network Engineer on the Data Center Network Services team at Cisco IT, you will design and develop advanced AI-driven software features for global data center networks, with a focus on scalability and reliability in AI/ML infrastructure and high-performance computing environments. Day to day, you will collaborate with cross-functional teams to implement new solutions, manage GPU cluster networking, and deploy automation with tools such as Terraform and Ansible. You must have expertise in Cisco Data Center Networking technologies such as ACI, routing, switching, Nexus, vPC, VLAN, VXLAN, and BGP, along with a solid understanding of AI Fabric and high-performance networking for AI/ML workloads. Proficiency in Unix/Linux environments, DevOps principles, and tools such as JIRA, Git, and Jenkins is also essential. You will play a key role in improving service reliability and operational efficiency across Cisco’s global network infrastructure.

What you'll do

  • Design and develop advanced AI-driven software features for data center networks.
  • Implement innovative capabilities to enhance service reliability and operational efficiency.
  • Manage networking for GPU cluster environments and scale AI workloads effectively.
  • Create documentation and training materials for internal clients and cross-functional teams.
  • Utilize Terraform and Ansible for Infrastructure as Code (IaC) in network deployments.
  • Deploy and use AI-based observability tools for monitoring.

What we're looking for

  • 10+ years of experience designing and building scalable, reliable networking solutions for AI/ML infrastructure.
  • Expertise in Cisco Data Center Networking technologies including ACI networks and related protocols.
  • Proven leadership in driving strategic automation initiatives and enhancing service reliability.
  • Skilled in managing GPU cluster environments and implementing AI-based observability tools.
  • Proficiency in Terraform and Ansible for Infrastructure as Code (IaC) practices.
  • Strong programming skills with a solid grasp of software engineering concepts and cloud computing paradigms.

More like this

Similar roles

Site Reliability Engineer Lead - Senior Vice President

Citi

Remote (388 Greenwich Street - Tower, US) · 12 days ago · $176,720–$265,080
Kubernetes OpenShift Prometheus Grafana Terraform Ansible Helm Python Java Go AWS Google Cloud Azure CI/CD Disaster Recovery Infrastructure as Code Observability SLOs SLIs Error Budgets Chaos Engineering

Director, Site Reliability Engineering

McDonald’s Corporation

Chicago, IL · 45 days ago · $178,121–$222,651
AWS Azure GCP Site Reliability Engineering Agile Methodologies CI/CD Vendor Management Cloud Infrastructure PaaS IaaS Data Analytics Financial Forecasting Chargeback Management Global Vendor Relationships High-Performance Team Building

Site Reliability Engineer III

CME Group

Chicago, IL · 132 days ago · $100,700–$167,800
GCP Docker Kubernetes Python Java Oracle Postgres BigQuery SLO SLI SLA OpenTelemetry Splunk Prometheus Grafana CI/CD Bamboo JIRA Git