Distributed systems programming has stalled

Burnout, culture, and observability

  • Several commenters echo the opening anecdote: modern distributed work often means chasing missing requests across many components, with poor observability and little organizational appetite to invest in better tools.
  • Some teams resist automation or tooling because manual debugging is seen as the “real work,” or as job security; others are simply burned out and defensive.
  • Where organizations do invest in structured logging, tracing, and APM, people report dramatic uptime improvements and lower stress, but also very high SaaS costs and frequent misconfiguration.
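A minimal sketch of the structured-logging approach those commenters describe: generate (or accept) a trace ID at the edge and attach it to every log line so a request can be followed across components. The field names and logger setup here are illustrative assumptions, not any specific APM product's API.

```python
import json
import logging
import uuid
from contextvars import ContextVar

# Trace ID for the current request; ContextVar survives async boundaries.
trace_id: ContextVar[str] = ContextVar("trace_id", default="-")

class TraceFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        # One JSON object per line so log aggregators can index the fields.
        return json.dumps({
            "trace_id": trace_id.get(),
            "level": record.levelname,
            "message": record.getMessage(),
        })

handler = logging.StreamHandler()
handler.setFormatter(TraceFormatter())
log = logging.getLogger("svc")
log.addHandler(handler)
log.setLevel(logging.INFO)

def handle_request() -> None:
    # At the edge: mint a trace ID (or take one from an inbound header).
    trace_id.set(uuid.uuid4().hex)
    log.info("request received")            # carries the same trace_id...
    log.info("calling downstream service")  # ...as every later line

handle_request()
```

The point of the exercise is that every log line for one request shares a `trace_id`, so "chasing a missing request" becomes a single indexed query rather than manual correlation.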

Embedded vs distributed work

  • Multiple people who switched from cloud/distributed back to embedded (often in Rust or C/C++) report higher satisfaction and a sense of control.
  • Others counter that embedded is also ugly: weak tooling, poor datasheets, non-existent remote observability, low pay, and heavy domain-specific math.
  • Several point out that modern embedded systems (cars, IoT platforms, power and battery controllers) are themselves complex distributed systems, just with different failure modes and buses.

Overuse of distributed systems & cloud-native

  • Strong sentiment that many companies adopt microservices, serverless, and Kubernetes without needing them, trading simple monoliths on powerful hardware for slower, costlier, more fragile systems.
  • Some argue distributed architectures are justified mainly by availability and organizational boundaries, not throughput, but admit that teams often underestimate the complexity and operational burden.

What “distributed system” really means

  • One camp says almost everything with a network connection is already a distributed system; the “rush” is just people finally recognizing that.
  • Another camp uses “distributed” to mean “multi-node, strongly coordinated architecture” and insists most businesses will never truly need that level of sophistication.

Difficulty, theory, and formal methods

  • Consensus that distributed systems are inherently hard, often compared to cryptography or judged even harder, because of the explosion of state, timing, and failure modes.
  • Some say the deep theory (Lamport, Paxos, clocks, Byzantine faults) exists and “solved” the fundamentals decades ago; the real gap is practical programming models and verification tools that ordinary engineers can apply.
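The gap those commenters describe is visible even in the simplest piece of that theory: Lamport's logical clocks fit in a few lines, yet threading them correctly through a real codebase is where teams struggle. A textbook sketch:

```python
class LamportClock:
    """Lamport logical clock: orders events without synchronized wall time."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        # Local event: advance the clock.
        self.time += 1
        return self.time

    def send(self) -> int:
        # Stamp an outgoing message with the sender's current time.
        return self.tick()

    def receive(self, msg_time: int) -> int:
        # On receive: jump past the sender's timestamp, then tick,
        # so the receive event is ordered after the send event.
        self.time = max(self.time, msg_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
stamp = a.send()    # a.time is now 1
b.receive(stamp)    # b.time is now 2: receive > send in logical order
```

This guarantees only a partial order (causally related events are ordered); it says nothing about concurrent events, which is exactly the kind of subtlety the "practical programming models" camp wants tooling to carry for ordinary engineers.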

Existing models and stalled innovation

  • Erlang/Elixir, actor models, Unison, X10 “places,” Bloom, choreographic programming, and projects like Hydro are cited as promising or existing answers, but none have gone mainstream.
  • Commenters debate “static-location” (actors/microservices) vs “external-distribution” (databases, queues) vs “arbitrary-location/durable execution” (workflows, Temporal). Each trades control, performance, and cognitive load differently.
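A rough illustration of the "static-location" end of that spectrum: an actor owns its state at a fixed place and processes mailbox messages one at a time, so no locks are needed. This is a queue-and-thread sketch of the idea, not the API of Erlang, Elixir, or any real actor framework.

```python
import queue
import threading

class Actor:
    """Minimal actor: private state, a mailbox, one message at a time."""

    def __init__(self) -> None:
        self.mailbox: queue.Queue = queue.Queue()
        self.count = 0  # example of state only this actor may touch
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        while True:
            msg = self.mailbox.get()
            if msg is None:      # poison pill: shut the actor down
                break
            self._handle(msg)

    def _handle(self, msg: int) -> None:
        # Messages are handled sequentially, so this needs no locking.
        self.count += msg

    def send(self, msg: int) -> None:
        self.mailbox.put(msg)

    def stop(self) -> None:
        self.mailbox.put(None)
        self._thread.join()

counter = Actor()
for _ in range(5):
    counter.send(1)
counter.stop()
# counter.count == 5
```

The trade-off the thread debates is visible here: the caller gains a simple mental model (send and forget), but the actor's location is fixed and its mailbox becomes the unit you must monitor, back-pressure, and make durable; "external-distribution" and durable-execution systems move those concerns elsewhere at the cost of control and performance.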

Education and skills gap

  • Many engineers never had a distributed systems course; several argue it should replace less broadly useful topics (like compilers) in standard curricula.
  • Others say most real expertise comes from on-the-job learning anyway; working through classic papers, books, and courses (e.g., DDIA, MIT's distributed systems course) is still rare.

LLMs and rising complexity

  • Some see distributed systems complexity about to spike further as LLMs become central components and even generate dynamic, non-repeatable code.
  • A few speculate LLMs might eventually help reason about or verify such systems, but for now they struggle even more with non-local, cross-component behavior.