Software Engineer, Distributed Systems

IBM

Quick summary

Work type
On-site
Location
San Jose, CA
Posted
7 days ago

Market check

Salary context

How this pay compares to similar roles

Similar roles: about $168k

This listing doesn't post a salary. Most similar roles pay $128,188–$207,000.

Based on 240 similar postings.

Employer

About IBM

IBM is a US-based global technology company providing hybrid cloud, AI, consulting, enterprise software, and IT infrastructure products and services.

IBM currently has 727 open roles on FindRole.


At a glance

TL;DR · Software Engineer, Distributed Systems

As a Senior Software Engineer on the Distributed Systems team, you will design and develop high-throughput, fault-tolerant infrastructure components, such as metadata services, coordination layers, and state management systems, for petabyte-scale data platforms. Day to day, you will implement replication, automatic failover, and exactly-once processing to keep these systems reliable and consistent. You will also contribute to the CI/CD pipeline, instrument components with structured logging and distributed tracing, and use diagnostic tooling to debug complex distributed failures. The role requires expertise in Java, Go, or C++, along with deep knowledge of consistency models, consensus protocols such as Raft, and fault-tolerant system design. Experience with Kubernetes is essential, as is experience with observability tools such as OpenTelemetry, Jaeger, Prometheus, and Grafana. Familiarity with petabyte-scale data movement, stateful streaming systems (Kafka, Flink), and open table formats (Iceberg, Delta) is a plus.

What you'll do

  • Design and implement metadata services and distributed schedulers for petabyte-scale systems.
  • Build fault-tolerant infrastructure with replication, automatic failover, and consensus protocols.
  • Develop diagnostic tooling to debug and resolve customer-reported production issues.
  • Contribute to the CI/CD pipeline by instrumenting components with logging, tracing, and metrics.
  • Collaborate on reliability and scalability decisions for distributed data systems.

What we're looking for

  • 6+ years of professional software engineering experience with at least 2 years in large-scale distributed systems.
  • Proficiency in Java, Go, C++, or a comparable systems language, with experience writing production-grade distributed-systems code.
  • Deep understanding of, and hands-on experience with, consistency models, replication, quorum systems, leader election, and consensus protocols (Raft/Paxos).
  • Experience designing fault-tolerant systems with automatic failover, idempotent operations, and durable recovery, and working with distributed observability tools.
  • Ability to communicate clearly through design documents, post-mortems, and capacity analyses.
  • Bachelor's degree in Computer Science or equivalent experience.

More like this

Similar roles

Senior Software Engineer, Distributed Systems

Apple Inc

Cupertino, CA · 9 days ago · $150,400–$277,600
Go Rust Scala Kubernetes Docker CI/CD Prometheus Grafana PostgreSQL Redis AWS Azure Google Cloud Platform Git Jenkins Python JavaScript React Node.js REST GraphQL

System Software Engineer, Distributed Systems

Nvidia

Santa Clara, CA · 10 days ago · $152,000–$241,500
Go Python Linux NFS IBM LSF Docker Kubernetes Perl CI/CD Prometheus Grafana Git Bash SQL Redis Zookeeper Consul ETCD Apache Kafka Flask Django

Senior Software Engineer, Distributed Systems

Microsoft

Redmond, WA · 10 days ago · $119,800–$234,700
Azure C# .NET Java Go Docker Kubernetes CI/CD Terraform PostgreSQL Redis Elasticsearch Prometheus Grafana Distributed Systems Service Reliability Performance Optimization