Software Engineer, Distributed Systems

IBM

Quick summary

Work type: On-site
Location: Austin, TX
Posted: 7 days ago

Market check

Salary context

How this pay compares to similar roles

Chart: similar roles marked at $166k on a scale from $116k to $213k.

This listing doesn't post a salary. Most similar roles pay $128,188–$203,850.

Based on 240 similar postings.

Employer

About IBM

IBM is a US-based global technology company providing hybrid cloud, AI, consulting, enterprise software, and IT infrastructure products and services.

IBM currently has 727 open roles on FindRole.

At a glance

TL;DR · Software Engineer, Distributed Systems

As a Senior Software Engineer on the distributed systems team in Austin, TX, you will design and develop critical infrastructure components for watsonx.data, including metadata services, coordination layers, and state management systems, built for high-throughput, fault-tolerant operation at petabyte scale. Day to day, you will implement replication, automatic failover, and exactly-once processing to keep data consistent under failure. You will also contribute to the CI/CD pipeline by integrating structured logging, distributed tracing, and metrics to improve observability. You will work with cross-functional teams to debug complex issues, design diagnostic tools, and integrate GPU acceleration and AI/ML capabilities. The ideal candidate has extensive experience in Java, Go, or C++, deep knowledge of consistency models and consensus protocols such as Raft or Paxos, and proficiency running stateful workloads on Kubernetes.

What you'll do

  • Design and implement metadata services and distributed schedulers for petabyte-scale systems.
  • Build fault-tolerant infrastructure with automatic failover and exactly-once processing guarantees.
  • Develop diagnostic tooling to debug and resolve customer-reported production issues.
  • Contribute to the CI/CD pipeline by instrumenting components with logging, tracing, and metrics.
  • Collaborate on reliability and scalability decisions in an Agile environment.

What we're looking for

  • 6+ years of professional software engineering experience with at least 2 years in large-scale distributed systems.
  • Proficiency in Java, Go, C++, or a comparable systems language for production-level distributed-system code.
  • Deep understanding of, and hands-on experience with, consistency models, replication, quorum systems, leader election, and consensus protocols (Raft/Paxos).
  • Experience designing fault-tolerant systems with automatic failover, idempotent operations, durable recovery, and distributed observability tools.
  • Ability to communicate clearly through design documents, post-mortems, and capacity analyses.
  • Bachelor's degree in Computer Science or equivalent experience.

More like this

Similar roles

Software Engineer, Distributed Systems

IBM

San Jose, CA · 7 days ago
Java Go C++ Kubernetes Raft Paxos OpenTelemetry Jaeger Prometheus Grafana CI/CD Docker Kafka Flink Iceberg Delta Hudi Python PostgreSQL

Senior Software Engineer, Distributed Systems

Apple Inc

Cupertino, CA · 9 days ago · $150,400–$277,600
Go Rust Scala Kubernetes Docker CI/CD Prometheus Grafana PostgreSQL Redis AWS Azure Google Cloud Platform Git Jenkins Python JavaScript React Node.js REST GraphQL

System Software Engineer, Distributed Systems

Nvidia

Santa Clara, CA · 10 days ago · $152,000–$241,500
Go Python Linux NFS IBM LSF Docker Kubernetes Perl CI/CD Prometheus Grafana Git Bash SQL Redis Zookeeper Consul ETCD Apache Kafka Flask Django

Senior Software Engineer, Distributed Systems

Microsoft

Redmond, WA · 10 days ago · $119,800–$234,700
Azure C# .NET Java Go Docker Kubernetes CI/CD Terraform PostgreSQL Redis Elasticsearch Prometheus Grafana Distributed Systems Service Reliability Performance Optimization