Why was Apache Kafka created?

Perception of the article & naming

  • Some readers initially suspected the post was AI-generated or “blogspam” because of Substack’s look and the heavy bullet/bold formatting; the author clarified that only the thumbnail was AI-generated and the content was hand-written.
  • Several jokes and side-comments about the Kafka name (writing-optimized system, “Kafkaesque” configuration, dark literary references).

When Kafka fits vs when it doesn’t

  • Strong theme: Kafka is often misused as “just a queue” or generic pub/sub because of hype, resume-building, or sales pressure, which can be costly in both money and complexity.
  • Multiple commenters: if you only need a simple queue, better options include RabbitMQ, Redis, SQS, MQTT, or even named pipes/ZMQ; Kafka becomes needless overhead for small/internal apps.
  • Others argue Kafka is appropriate once you have many producers, large event volumes, and multiple independent consumers that need durable, ordered, replayable streams.

Kafka’s core value: distributed log & replay

  • Defenders stress that Kafka is not a traditional queue but a distributed log:
    • Very high write throughput.
    • Configurable retention and replay, from minutes to days (or longer if so configured).
    • Decoupled producers/consumers; new consumers can re-read from any offset.
    • At-least-once delivery with ordering within partitions.
  • Replayability is cited as a key original motivation at LinkedIn: enabling new consumers on historical data without rebuilding pipelines.
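
The log-and-replay idea above can be illustrated with a toy in-memory model. This is a minimal sketch, not Kafka's API: `TinyLog`, `Consumer`, `append`, `read`, and `poll` are hypothetical names, and the `crc32` key hash is an arbitrary choice for illustration. It only aims to show that reads do not delete records, that each consumer tracks its own offset, and that records with the same key stay ordered within one partition.

```python
from collections import defaultdict
import zlib


class TinyLog:
    """Toy in-memory stand-in for a partitioned, append-only log (not Kafka's API)."""

    def __init__(self, num_partitions=2):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        # Records with the same key land in the same partition, so their
        # relative order is preserved. crc32 is an arbitrary stable hash here.
        p = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)

    def read(self, partition, offset):
        # Reading never removes records; it only slices from an offset.
        return self.partitions[partition][offset:]


class Consumer:
    """Tracks its own offset per partition, independent of other consumers."""

    def __init__(self, log, start_offset=0):
        self.log = log
        self.offsets = defaultdict(lambda: start_offset)

    def poll(self, partition):
        records = self.log.read(partition, self.offsets[partition])
        self.offsets[partition] += len(records)
        return records


# Usage: a consumer added later still sees the full retained history.
log = TinyLog(num_partitions=1)
for i in range(3):
    log.append("user-1", f"event-{i}")

billing = Consumer(log)
billing.poll(0)            # reads all three records
late_joiner = Consumer(log)
late_joiner.poll(0)        # also reads all three records
```

A traditional queue would typically hand each message to one consumer and then drop it; here the second consumer can start from offset 0 (or any other offset) without the producer doing anything extra.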

Alternatives and comparisons

  • MQTT: good for lightweight pub/sub and IoT, but weak on persistence and backpressure; often paired with Kafka/RabbitMQ rather than replacing them.
  • NATS / JetStream:
    • Praised for simplicity and flexibility, but a more detailed analysis notes tradeoffs: per-stream/consumer overhead, scaling limits, a 1 MB message cap, and more complex topologies and semantics.
  • Redis Streams and cloud services (Kinesis, Pub/Sub) suggested as simpler, cheaper Kafka alternatives at non–Fortune-100 scale.
  • Redpanda highlighted as Kafka-compatible with less operational/JVM overhead.

Operational & complexity concerns

  • Recurrent complaints: Kafka is “Kafkaesque” to run; highly available (HA) clusters are hard to operate, especially on unreliable cloud infrastructure. Managed services (e.g., AWS MSK) help, but migrations (e.g., from ZooKeeper to KRaft) are non-trivial.
  • Stream-processing systems and replay tooling (offset management, separate replay consumers, DLQs) are described as much harder than batch/S3+SQL approaches, especially for small teams.
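
To give a feel for the offset-management and DLQ bookkeeping mentioned above, here is a minimal sketch in plain Python, with a list standing in for one partition. `process_batch`, `handler`, and `max_attempts` are hypothetical names chosen for illustration, not part of any Kafka client. The assumed policy is: retry a failing record a few times, park it in a dead-letter list if it keeps failing, and advance the offset only after the record is handled or parked.

```python
def process_batch(records, handler, committed_offset, max_attempts=3):
    """Process records starting at committed_offset.

    Returns (new_offset, dead_letters). The offset moves forward only after
    a record is handled or dead-lettered, so restarting from an older offset
    reprocesses records (at-least-once) rather than losing them.
    """
    dead_letters = []
    offset = committed_offset
    for record in records[committed_offset:]:
        for attempt in range(1, max_attempts + 1):
            try:
                handler(record)
                break  # handled successfully, stop retrying
            except Exception as exc:
                if attempt == max_attempts:
                    # Give up on this record and park it for later inspection.
                    dead_letters.append((offset, record, repr(exc)))
        offset += 1  # "commit" after success or after dead-lettering
    return offset, dead_letters
```

Because a replay calls `handler` again for records it has already seen, the handler needs to be idempotent or deduplicate on its own, which is part of why commenters find this harder than rerunning a batch query.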

LinkedIn scale and evolution

  • Discussion of LinkedIn’s reported Kafka scale (tens of trillions of records/day, ~17 PB/day) and how that drives very different design choices.
  • LinkedIn is now moving from Kafka to an internal system (Northguard) to address metadata scalability and partitioning limitations; seen as a “clean-sheet” redesign for extreme scale.
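
As a rough back-of-the-envelope check on the ~17 PB/day figure, the snippet below converts it to an average sustained byte rate. It assumes decimal petabytes (10^15 bytes) and a perfectly even load across the day, neither of which the discussion specifies; real traffic would have peaks well above the average.

```python
PB = 10**15  # decimal petabyte; an assumption, the source does not say PB vs PiB
SECONDS_PER_DAY = 24 * 60 * 60

bytes_per_day = 17 * PB
bytes_per_second = bytes_per_day / SECONDS_PER_DAY
gb_per_second = bytes_per_second / 10**9

print(f"{gb_per_second:.0f} GB/s average")  # prints "197 GB/s average"
```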

Java / JVM side-thread

  • Long subthread debates whether Kafka’s Java/JVM basis makes it “bloated” or just widely adopted and well-tooled; disagreement centers on memory use, GC tradeoffs, and real-world performance vs theory.