Capacity Operations Manager

Nvidia

Actively hiring
Remote (Santa Clara, CA, US) Posted 69 days ago $136,000–$218,500 / year

At a glance

AI generated

TL;DR

Join NVIDIA’s team as a Capacity Operations Manager, where you’ll coordinate the development of High Performance Computing (HPC) clusters and manage GPU capacity across cloud platforms. Day to day, you will assess technical requirements from internal and external teams, identify performance bottlenecks, and collaborate with finance and engineering to optimize resource usage. You will also develop tooling for automation and AI-driven insights to improve operational efficiency and support strategic capacity decisions. Ideal candidates have a Bachelor’s or Master’s degree in Computer Science or a related field, 8+ years of experience in cloud computing with a focus on GPU capacity management, and practical experience with AWS, Azure, GCP, and OCI. Strong skills in statistical modeling, machine learning, and cloud architecture are essential, as is the ability to work effectively across departments.

Skills

AWS, Azure, GCP, OCI, Python, Shell scripting, Kubernetes, Docker, CI/CD, Prometheus, Grafana, AI, Machine learning, Statistical modeling, HPC, GPU, Cloud architecture, Data automation, Dashboards, Infrastructure efficiency, Analytics platform, Command line interfaces

What you'll do

  • Coordinate the development of HPC clusters with internal and external teams.
  • Improve GPU capacity and compute resources across cloud platforms.
  • Design and manage data models and reporting platforms for infrastructure governance.
  • Assess technical requirements for GPU capacity from various groups.
  • Identify and resolve performance bottlenecks in daily usage of compute resources.
  • Drive resource efficiency initiatives with engineering, finance, and product teams.
  • Develop tooling for cloud infrastructure to optimize resource usage and performance.

What we're looking for

  • Bachelor's or Master's degree in Computer Science, Software Engineering, or related field.
  • 8+ years of experience in cloud computing with a focus on GPU capacity management.
  • Strong technical proficiency in cloud architecture and large-scale data set management.
  • Practical experience with major Cloud Service Providers (AWS, Azure, GCP, OCI).
  • Experience using AI tools to extract insights from data for resource optimization.
  • Deep knowledge of statistical modeling and machine learning for operational efficiency.
  • Excellent communication skills and ability to work across different departments.

Market check

Salary context

This $136,000–$218,500 range sits above 61% of similar postings on FindRole.

Peer median band

$131,000–$199,500

Median floor and ceiling across peers.

Typical midpoint (25–75%)

$142,487–$199,500

Middle half of comparable postings.

Based on 236 comparable postings.

* 240 is the maximum number of comparable postings sampled.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 802 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 798 roles with salary data.

Similar roles

Engineering Operations Manager

Qualcomm

San Diego, CA, US 49 days ago $125,600–$188,400
Python SQL Docker Jenkins Git AWS Google Cloud Platform Kubernetes Terraform CI/CD Prometheus Grafana PostgreSQL MSSQLSERVER

Mission Critical Operations Manager

JLL (Jones Lang LaSalle)

Remote (USA-Client Mission KS-AT&T, US) 10 days ago
MS Office CMMS BAS DDC KPI HVAC UPS CRAC Energy Management SOPs MOPs EOPs Change Management CI/CD Python PostgreSQL AWS Azure Google Cloud Terraform Docker
Remote

Operations Enablement Manager (Disputes Management)

Affirm

Remote (US) 30 days ago $160,000–$210,000
Lean Six Sigma Project Management Design Thinking Salesforce Confluence LMS platforms AI-driven knowledge tools Process Mapping Workflow Analysis Six Sigma
Remote

Manager, Technical Operations

Warner Bros. Discovery

Remote (DC Washington 820 1st Street NE, US) 78 days ago $98,000–$182,000
Windows Linux Project Management CI/CD Python SQL PostgreSQL AWS Kubernetes Docker Prometheus Grafana Git Jira Confluence
Remote

Sr Specialist Operations Management

CIBC

IL-70 W Madison St, 9th Fl, US 13 days ago $86,600–$95,000
ACBS Metavante Aspire Loan Manager Private Link Microsoft Python SQL PostgreSQL Excel CI/CD Git Jira Confluence AWS Azure Google Workspace Slack Zoom Tableau Power BI

Facilities Operations Manager – Electrical

Oracle

US 29 days ago $97,500–$97,500
OSHA Electrical Systems Maintenance Execution Vendor Management Incident Response Data Center Operations People Management Technical Leadership Safety Compliance