Senior AI Tools Engineer, SRE Operations - GeForce NOW

Nvidia

Remote Actively hiring Posted this week
Canada Posted 6 days ago $144,000–$230,000 / year

At a glance

AI generated

TL;DR

Join our Site Reliability Engineering (SRE) Data Team as an AI Tools Engineer and help build AI-powered tools to optimize the global GeForce NOW service. You will develop robust ML systems for root cause analysis and predictive maintenance, lead the creation of LLM-based solutions, and manage large-scale data pipelines for model development. Essential skills include Python proficiency, experience with Kubernetes and AWS, and a deep understanding of AI frameworks and current developments in LLMs. Ideal candidates have 5+ years of relevant experience, strong automation expertise, and hands-on knowledge of monitoring tools like Grafana. The role calls for an expert who can apply SRE principles and cloud technologies to ensure long-term technical sustainability and operational excellence at scale.

Skills

Python, Kubernetes, AWS, LLMs, Grafana, Go, Terraform, CI/CD, PostgreSQL, Docker, Prometheus, SRE, Data Pipelines, Monitoring Tools

What you'll do

  • Build robust AI/ML tools to analyze production data and identify root causes of complex incidents.
  • Lead development of LLM- and Agent-based systems to enhance operational efficiency.
  • Establish best practices for managing large-scale data sources critical for model development.
  • Enhance LLM-based pipelines by applying a deep understanding of current LLM developments.
  • Serve as an expert on AI frameworks, recommending optimal platforms and toolsets for long-term sustainability.

What we're looking for

  • B.S. in Computer Science, Statistics, or Engineering and 5+ years of AI/ML experience.
  • Proficiency in Python; familiarity with Go or other systems languages preferred.
  • Strong knowledge of AI frameworks and LLM-based platforms.
  • Experience with Kubernetes and cloud environments like AWS.
  • Expertise in building and optimizing large-scale data pipelines.
  • Hands-on experience with monitoring and visualization tools such as Grafana.
  • Understanding of SRE principles and production environment management.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 825 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 813 roles with salary data.
