Principal Software Developer, AI Infrastructure

Oracle

Actively hiring
Austin, TX
Posted 12 days ago
$99,600–$223,400 / year

At a glance

AI generated

TL;DR

As a Principal Software Developer on the Oracle Cloud Infrastructure (OCI) AI Infrastructure team in Austin, TX, you will build high-performance GPU platforms for AI, ML, and HPC workloads. The role involves designing and implementing fundamental architectural changes to GPU delivery, health monitoring, and diagnostic services across thousands of GPUs connected over RoCE and InfiniBand. You will work closely with product teams to debug and resolve customer issues while keeping the system scalable and reliable. Ideal candidates have a deep understanding of distributed systems, proficiency in a language such as Java, Python, or C++, and experience with Linux systems and cloud service data planes.

Skills

Oracle Cloud Infrastructure, Linux, Python, Java, C++, Go, Shell scripting, InfiniBand, RoCE, Docker, CI/CD, MySQL, Redis, Memcached, Kubernetes, Terraform

What you'll do

  • Design and implement software for managing GPU-based AI servers.
  • Collaborate on delivering high-quality software to manage, triage, and repair GPU systems.
  • Work closely with product teams to debug and resolve customer issues.
  • Develop fundamental architectural changes for GPU delivery and health monitoring.
  • Automate triage processes and enhance diagnostic services for distributed workloads.

What we're looking for

  • 6+ years of experience in delivering and operating large-scale production systems (1000+ server instances)
  • Deep understanding of operating systems, computer networks, and high-performance applications
  • Proficiency in at least one programming language (Java/Python/C/C++/GoLang/shell scripting)
  • Strong background in Linux systems and familiarity with system-level architecture
  • Experience with InfiniBand or RoCE networking technologies
  • Hands-on experience designing, developing, and operating public cloud service data planes

Employer

About Oracle

Oracle Corporation is a leading multinational technology company specializing in database software, cloud computing, and enterprise software.

Oracle currently has 343 open roles on FindRole.

Listed pay typically runs $97,500–$199,500 across 253 roles with salary data.
