2025-08-22

Why was Apache Kafka created?

Perception of the article & naming

Some initially suspected the post was AI-generated or “blogspam” because of Substack’s look and the heavy bullet/bold formatting; the author clarified that only the thumbnail was AI-generated and the content was hand-written.
Several jokes and side-comments about the Kafka name (writing-optimized system, “Kafkaesque” configuration, dark literary references).

When Kafka fits vs when it doesn’t

Strong theme: Kafka is often misused as “just a queue” or generic pub/sub because of hype, resume-building, or sales pressure, and that misuse can be costly in both money and operational complexity.
Multiple commenters: if you only need a simple queue, better options include RabbitMQ, Redis, SQS, MQTT, or even named pipes/ZMQ; Kafka becomes needless overhead for small/internal apps.
Others argue Kafka is appropriate once you have many producers, large event volumes, and multiple independent consumers that need durable, ordered, replayable streams.

Kafka’s core value: distributed log & replay

Defenders stress that Kafka is not a traditional queue but a distributed log:
- Very high write throughput.
- Configurable retention, so consumers can replay anywhere from minutes to days of history (or longer).
- Decoupled producers/consumers; new consumers can re-read from any offset.
- At-least-once delivery with ordering within partitions.
Replayability is cited as a key original motivation at LinkedIn: enabling new consumers on historical data without rebuilding pipelines.
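
To make the log-versus-queue distinction concrete, here is a minimal in-memory sketch of the idea. It is a toy model, not Kafka's API: `MiniLog`, `poll`, `commit`, and `seek` are hypothetical names chosen for illustration, and the CRC32 key hashing is an assumption rather than Kafka's actual partitioner. It shows that reads do not remove records, that each consumer group tracks its own offset, and that rewinding an offset is all replay requires.

```python
from collections import defaultdict
from zlib import crc32


class MiniLog:
    """Toy append-only log split into partitions (illustrative only)."""

    def __init__(self, num_partitions=2):
        self.partitions = [[] for _ in range(num_partitions)]
        # Committed offsets, tracked per (consumer group, partition).
        self.committed = defaultdict(int)

    def append(self, key, value):
        # Same key -> same partition, so order is preserved per key.
        p = crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)

    def poll(self, group, partition, max_records=10):
        # Reading deletes nothing; it only looks past the group's offset.
        start = self.committed[(group, partition)]
        return self.partitions[partition][start:start + max_records]

    def commit(self, group, partition, offset):
        self.committed[(group, partition)] = offset

    def seek(self, group, partition, offset):
        # Replay: move the group's position back to an earlier offset.
        self.committed[(group, partition)] = offset


log = MiniLog(num_partitions=1)
for i in range(3):
    log.append("user-1", f"event-{i}")

billing = log.poll("billing", 0)
log.commit("billing", 0, len(billing))

# A consumer group added later still sees the full history.
analytics = log.poll("analytics", 0)

# Rewinding lets an existing group re-read old records.
log.seek("billing", 0, 1)
replayed = log.poll("billing", 0)
```

In a traditional queue, the first consumer's read would have removed the messages; here `analytics` gets the same three records as `billing`, and `replayed` starts again from offset 1.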

Alternatives and comparisons

MQTT: good for lightweight pub/sub and IoT, but weak on persistence and backpressure; often paired with Kafka/RabbitMQ rather than replacing them.
NATS / JetStream: praised for simplicity and flexibility, but a more nuanced analysis notes tradeoffs: per-stream/consumer overhead, scaling limits, a 1 MB message cap, and more complex topologies and semantics.
Redis Streams and cloud services (Kinesis, Pub/Sub) suggested as simpler, cheaper Kafka alternatives at non–Fortune-100 scale.
Redpanda highlighted as Kafka-compatible with less operational/JVM overhead.

Operational & complexity concerns

Recurrent complaints: Kafka is “Kafkaesque” to run; HA clusters are hard, especially on unreliable cloud infra. Managed services (AWS MSK) help but migrations (e.g., to KRaft) are non-trivial.
Stream-processing systems and replay tooling (offset management, separate replay consumers, DLQs) are described as much harder than batch/S3+SQL approaches, especially for small teams.
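
As a rough illustration of the bookkeeping commenters find burdensome, the sketch below processes records in order, retries failures, parks records that keep failing in a dead-letter list, and only then advances the offset. Everything here is an assumption for illustration: `consume`, `max_attempts`, and the in-memory list standing in for a DLQ are hypothetical, and a real consumer would also have to handle rebalances, crashes between processing and commit, and duplicate deliveries.

```python
def consume(records, handler, start_offset=0, max_attempts=3):
    """Process records in order; return (next_offset, dead_letters)."""
    dead_letters = []
    offset = start_offset
    while offset < len(records):
        record = records[offset]
        for attempt in range(1, max_attempts + 1):
            try:
                handler(record)
                break
            except Exception as exc:
                if attempt == max_attempts:
                    # Park the record so one bad message does not
                    # block everything behind it in the partition.
                    dead_letters.append((offset, record, repr(exc)))
        # Advance only after the record was handled or parked.
        offset += 1
    return offset, dead_letters


seen = []


def handler(record):
    # Hypothetical handler: rejects one malformed record.
    if record == "bad":
        raise ValueError("cannot parse")
    seen.append(record)


next_offset, dlq = consume(["a", "bad", "b"], handler)
```

Because the offset moves only after a record is handled or parked, a crash mid-batch means some records are processed again on restart, which is the at-least-once behavior described earlier.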

LinkedIn scale and evolution

Discussion of LinkedIn’s reported Kafka scale (tens of trillions of records/day, ~17 PB/day) and how that drives very different design choices.
LinkedIn is now moving from Kafka to an internal system (Northguard) to address metadata scalability and partitioning limitations; seen as a “clean-sheet” redesign for extreme scale.

Java / JVM side-thread

Long subthread debates whether Kafka’s Java/JVM basis makes it “bloated” or just widely adopted and well-tooled; disagreement centers on memory use, GC tradeoffs, and real-world performance vs theory.
