Why was Apache Kafka created?
Perception of the article & naming
- Some initially suspected the post was AI-generated or “blogspam” due to Substack’s look and heavy bullet/bold formatting; author clarified only the thumbnail was AI, content was hand-written.
- Several jokes and side comments about the Kafka name (a write-optimized system named after a writer, “Kafkaesque” configuration, dark literary references).
When Kafka fits vs when it doesn’t
- Strong theme: Kafka is often misused as “just a queue” or generic pub/sub because of hype, resume-building, or sales pressure, and that misuse can be expensive in both money and operational complexity.
- Multiple commenters: if you only need a simple queue, better options include RabbitMQ, Redis, SQS, MQTT, or even named pipes/ZMQ; Kafka becomes needless overhead for small/internal apps.
- Others argue Kafka is appropriate once you have many producers, large event volumes, and multiple independent consumers that need durable, ordered, replayable streams.
Kafka’s core value: distributed log & replay
- Defenders stress that Kafka is not a traditional queue but a distributed log:
  - Very high write throughput.
  - Retention and replay for minutes to days.
  - Decoupled producers/consumers; new consumers can re-read from any offset.
  - At-least-once delivery with ordering within partitions.
- Replayability is cited as a key original motivation at LinkedIn: enabling new consumers on historical data without rebuilding pipelines.
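
The log properties listed above can be illustrated with a toy, in-memory sketch. This is not Kafka's API or implementation; `PartitionedLog`, `Consumer`, and the CRC32 key hashing are hypothetical names and choices made only for illustration. It shows that reads do not delete records, that each consumer keeps its own offsets, and that records with the same key stay in order because they land in the same partition.

```python
from collections import defaultdict
from zlib import crc32


class PartitionedLog:
    """Toy stand-in for a topic: a fixed set of append-only lists (hypothetical)."""

    def __init__(self, num_partitions=3):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        # Same key always maps to the same partition, so per-key order holds.
        partition = crc32(key.encode()) % len(self.partitions)
        self.partitions[partition].append((key, value))
        return partition, len(self.partitions[partition]) - 1  # (partition, offset)

    def read(self, partition, offset, max_records=100):
        # Reading never removes records; a real system drops them by retention.
        return self.partitions[partition][offset:offset + max_records]


class Consumer:
    """Tracks its own per-partition offsets, independent of other consumers."""

    def __init__(self, log, start_offset=0):
        self.log = log
        self.offsets = defaultdict(lambda: start_offset)

    def poll(self):
        records = []
        for partition in range(len(self.log.partitions)):
            batch = self.log.read(partition, self.offsets[partition])
            self.offsets[partition] += len(batch)  # advance only this consumer
            records.extend(batch)
        return records
```

A consumer created long after the producers wrote their events still starts at offset 0 and sees the full retained history, which is the replay behavior commenters describe; a plain queue would already have handed those messages to someone else and discarded them.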
Alternatives and comparisons
- MQTT: good for lightweight pub/sub and IoT, but weak on persistence and backpressure; often paired with Kafka/RabbitMQ rather than replacing them.
- NATS / JetStream:
  - Praised for simplicity and flexibility, but a more detailed analysis notes tradeoffs: per-stream/consumer overhead, scaling limits, a 1 MB message cap, and more complex topologies and semantics.
- Redis Streams and cloud services (Kinesis, Pub/Sub) suggested as simpler, cheaper Kafka alternatives at non–Fortune-100 scale.
- Redpanda highlighted as Kafka-compatible with less operational/JVM overhead.
Operational & complexity concerns
- Recurrent complaints: Kafka is “Kafkaesque” to run; HA clusters are hard, especially on unreliable cloud infra. Managed services (AWS MSK) help but migrations (e.g., to KRaft) are non-trivial.
- Stream-processing systems and replay tooling (offset management, separate replay consumers, DLQs) are described as much harder than batch/S3+SQL approaches, especially for small teams.
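
To make the offset-management and DLQ concerns above concrete, here is a minimal sketch of an at-least-once consumer loop over an in-memory list. It is an assumption-laden illustration, not any client library's API: `run_consumer`, `handler`, `commit`, and `dead_letter` are hypothetical names. The offset is committed only after a record has either been handled or parked in the dead-letter queue, so a crash before the commit means the record is processed again on restart.

```python
def run_consumer(records, start_offset, handler, commit, dead_letter, max_attempts=3):
    """Process records[start_offset:] in order and return the final offset.

    handler(record)            -- business logic, may raise
    commit(next_offset)        -- persists progress (hypothetical callback)
    dead_letter(off, rec, exc) -- parks a record that keeps failing
    """
    offset = start_offset
    while offset < len(records):
        record = records[offset]
        for attempt in range(1, max_attempts + 1):
            try:
                handler(record)
                break  # handled successfully, stop retrying
            except Exception as exc:
                if attempt == max_attempts:
                    # Give up on this record so it does not block the partition.
                    dead_letter(offset, record, exc)
        offset += 1
        commit(offset)  # commit after the work, never before
    return offset
```

Replaying is then a matter of calling `run_consumer` again with an earlier `start_offset`, which is also why handlers in this style need to tolerate seeing the same record twice.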
LinkedIn scale and evolution
- Discussion of LinkedIn’s reported Kafka scale (tens of trillions of records/day, ~17 PB/day) and how that drives very different design choices.
- LinkedIn is now moving from Kafka to an internal system (Northguard) to address metadata scalability and partitioning limitations; seen as a “clean-sheet” redesign for extreme scale.
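
For a sense of what the reported daily volume means per second, a back-of-the-envelope calculation helps. The sketch below assumes decimal petabytes (1 PB = 10^15 bytes) and a perfectly even load across the day, neither of which the thread confirms; `average_rates` is a hypothetical helper, and the record count passed in the usage note is an illustrative placeholder for "tens of trillions", not a reported figure.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rates(bytes_per_day, records_per_day):
    """Return (bytes/second, records/second, average bytes/record)."""
    return (
        bytes_per_day / SECONDS_PER_DAY,
        records_per_day / SECONDS_PER_DAY,
        bytes_per_day / records_per_day,
    )


# ~17 PB/day as quoted in the discussion, in decimal petabytes (assumption).
REPORTED_BYTES_PER_DAY = 17e15
```

Calling `average_rates(REPORTED_BYTES_PER_DAY, 20e12)` gives an average of roughly 197 GB/s. With the illustrative 20 trillion records per day, that works out to a few hundred million records per second at under 1 KB each. Real traffic is bursty, so peak rates would be higher than these averages.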
Java / JVM side-thread
- A long subthread debates whether Kafka’s Java/JVM basis makes it “bloated” or just widely adopted and well-tooled; the disagreement centers on memory use, GC tradeoffs, and real-world performance versus theory.