Kafka at the low end: how bad can it get?

KIP-932 and Kafka-as-Queue Direction

  • Multiple comments highlight KIP‑932 (“Queues for Kafka”) as a major change: it should make at-least-once workers and HTTP push gateways much easier to build, addressing many of the article’s concerns.
  • Some view it as Kafka trying to compete more directly with traditional job/message queues, but warn that the consumer model will become more complex, with more “foot-guns” for newcomers.
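
Part of what makes at-least-once workers awkward with classic consumer groups is that progress is recorded as a single committed offset per partition, so a worker that finishes records out of order can only commit up to the first record still in flight. The hypothetical helper below (not part of any Kafka client) computes that safe commit point. The per-record acknowledgement in KIP-932's share groups is intended to remove the need for this bookkeeping.

```python
def committable_offset(committed: int, completed: set[int]) -> int:
    """Return the offset a consumer could safely commit for one partition.

    `committed` is the next offset the group would resume from (Kafka's
    convention: the committed offset is the *next* record to read).
    `completed` holds offsets whose processing has finished, possibly out
    of order. Only a contiguous run starting at `committed` can be committed;
    the first gap blocks everything after it.
    """
    next_offset = committed
    while next_offset in completed:
        next_offset += 1
    return next_offset


# Records 10, 11, 13 and 14 are done, but 12 is still in flight.
print(committable_offset(10, {10, 11, 13, 14}))  # 12
# Once 12 finishes, the commit point jumps past the whole run.
print(committable_offset(10, {10, 11, 12, 13, 14}))  # 15
```

If the worker crashes while 12 is in flight, records 13 and 14 are redelivered even though they were already processed, which is one source of the duplicates that at-least-once handlers must tolerate.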

Partitioning, Fairness, and Low-Volume Pathologies

  • Core article issue: with few messages and multiple workers, round‑robin or random partitioning can still leave workers idle while others are overloaded.
  • Some argue that better partitioning (key-based, random UUID keys, more partitions than workers) and multi-threaded consumers can largely mitigate this, especially at higher volumes.
  • Others (including the article’s author) maintain that with genuinely low volumes, you can’t rely on probabilistic smoothing; unfair distribution still occurs in practice.
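
A little arithmetic makes the low-volume argument concrete. Assume each message goes to one of `workers` partitions uniformly at random, with one worker per partition. This is a simplifying assumption, not a description of any particular Kafka partitioner. Under it, a small batch rarely lands evenly: for 4 messages and 4 workers the probability is 4!/4^4 = 3/32 ≈ 9%, so in about 91% of such batches at least one worker sits idle while another has a backlog. The hypothetical helper below computes this exactly by enumerating every assignment.

```python
from fractions import Fraction
from itertools import product


def prob_perfectly_balanced(messages: int, workers: int) -> Fraction:
    """Exact probability that uniformly random partitioning gives every
    worker the same number of messages (one partition per worker assumed).

    Enumerates all workers**messages assignments, so keep inputs small.
    """
    if messages % workers:
        return Fraction(0)  # an even split is impossible
    target = messages // workers
    balanced = 0
    total = 0
    for assignment in product(range(workers), repeat=messages):
        total += 1
        if all(assignment.count(w) == target for w in range(workers)):
            balanced += 1
    return Fraction(balanced, total)


# 4 messages, 4 workers: only 24 of the 256 assignments give each worker one.
p = prob_perfectly_balanced(4, 4)
print(p, float(p))  # 3/32 0.09375
```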

Alternative Technologies Recommended

  • Strong preference in the thread for simpler tools at low throughput:
    • Traditional DB with SELECT … FOR UPDATE SKIP LOCKED (especially Postgres).
    • RabbitMQ, AWS SQS, Azure Service Bus, Google Cloud Tasks, NATS/JetStream, Redis (lists/streams/pubsub), Beanstalkd, ActiveMQ, Pulsar, Temporal.
  • Advice: use the database you already have until you truly hit scale; pay for managed brokers where possible to avoid operational pain.

Database-as-Queue Debate

  • Some warn “DB as queue” is an antipattern; others counter that modern databases explicitly support this (e.g., SKIP LOCKED) and that simplicity and transactional enqueueing outweigh inefficiency at small scale.
  • Real‑world examples show Postgres queues handling tens of thousands to hundreds of millions of items per hour, with partitioning and batching used to control SKIP LOCKED overhead.
  • Consensus: acceptable at low–medium volume; might require careful design at high throughput.
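
The SKIP LOCKED pattern works because each worker locks the row it claims, and every other worker running the same query skips locked rows instead of waiting on them. A common Postgres shape of the query appears in the `CLAIM_SQL` string below; the `jobs` table and its columns are hypothetical. Running that query needs a live database, so the rest of the block is only an in-memory emulation of the same claim-and-skip behaviour, using non-blocking per-row locks to illustrate the semantics.

```python
import threading
from dataclasses import dataclass, field

# Common Postgres shape of the pattern (hypothetical `jobs` table). The row
# lock taken by FOR UPDATE lasts until COMMIT/ROLLBACK; concurrent workers
# running the same SELECT skip the locked row instead of blocking on it.
CLAIM_SQL = """
BEGIN;
SELECT id, payload FROM jobs
WHERE status = 'queued'
ORDER BY id
LIMIT 1
FOR UPDATE SKIP LOCKED;
-- process the job, then (binding the claimed id as $1):
UPDATE jobs SET status = 'done' WHERE id = $1;
COMMIT;
"""


@dataclass
class Job:
    id: int
    payload: str
    status: str = "queued"
    lock: threading.Lock = field(default_factory=threading.Lock, repr=False)


class InMemoryJobTable:
    """Toy emulation: one non-blocking lock per row stands in for row locks."""

    def __init__(self, payloads):
        self.jobs = [Job(i, p) for i, p in enumerate(payloads, start=1)]

    def claim(self):
        """Lock and return the oldest queued job no one else holds, else None."""
        for job in self.jobs:  # ORDER BY id
            if job.status != "queued":
                continue
            # SKIP LOCKED: try the lock without waiting; skip the row if held.
            if job.lock.acquire(blocking=False):
                if job.status == "queued":  # re-check now that we hold the lock
                    return job
                job.lock.release()
        return None

    def complete(self, job):
        """Equivalent of UPDATE ... SET status = 'done' followed by COMMIT."""
        job.status = "done"
        job.lock.release()

    def abort(self, job):
        """Equivalent of ROLLBACK: release the lock; the job is claimable again."""
        job.lock.release()


table = InMemoryJobTable(["a", "b", "c"])
w1 = table.claim()       # worker 1 locks job 1
w2 = table.claim()       # worker 2 skips job 1 and locks job 2
print(w1.id, w2.id)      # 1 2
table.abort(w1)          # worker 1 fails; job 1 is visible again
print(table.claim().id)  # 1
```

Because the enqueue, the claim, and the completion are ordinary transactions, a crashed worker's job simply reappears on rollback. This is the transactional simplicity the thread's defenders of the approach point to.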

Durability, Semantics, and Operational Complexity

  • Kafka is repeatedly described as a distributed write‑ahead log rather than a job queue; getting delivery semantics right (at-least-once versus exactly-once) and handling retries are both tricky.
  • Handling failures, retries, and “competing consumers” in Kafka often needs extra topics, DB tables, or custom logic.
  • Experience reports: Kafka is operationally heavy (JVM, cluster management, unclear durability defaults); RabbitMQ, NATS, Pulsar, or Redpanda are often perceived as simpler.
  • A subthread debates Kafka vs Redpanda durability (fsync, replication factors), with disagreement over how risky Kafka’s default settings are.
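
One common form of the "extra topics and custom logic" has two parts. First, a failed message is routed to a retry topic, and eventually to a dead-letter topic. Second, handlers are made idempotent, because at-least-once delivery can redeliver. The sketch below shows both in plain Python with no Kafka client. The `.retry`/`.dlq` topic names, the `MAX_ATTEMPTS` budget, and the `msg_id` field are hypothetical conventions, not anything Kafka prescribes.

```python
from dataclasses import dataclass, replace

MAX_ATTEMPTS = 3  # hypothetical retry budget


@dataclass(frozen=True)
class Message:
    key: str
    msg_id: str  # hypothetical producer-assigned unique id, used for de-duplication
    payload: str
    attempts: int = 0


def route_failure(msg: Message, base_topic: str) -> tuple[str, Message]:
    """Pick where a failed message goes: a retry topic, or a dead-letter topic
    once the retry budget is spent. Topic names are a hypothetical convention."""
    bumped = replace(msg, attempts=msg.attempts + 1)
    if bumped.attempts >= MAX_ATTEMPTS:
        return f"{base_topic}.dlq", bumped
    return f"{base_topic}.retry", bumped


class IdempotentHandler:
    """At-least-once delivery means redeliveries, so skip ids already processed.
    In practice `seen` would live in a durable store (e.g. a DB table), ideally
    updated in the same transaction as the work's side effects."""

    def __init__(self, work):
        self.work = work
        self.seen: set[str] = set()

    def handle(self, msg: Message) -> bool:
        if msg.msg_id in self.seen:
            return False  # duplicate delivery: already processed
        self.work(msg)  # if this raises, the id is not recorded and a retry reprocesses it
        self.seen.add(msg.msg_id)
        return True


m = Message("user-1", "id-42", "charge card")
for _ in range(3):
    topic, m = route_failure(m, "orders")
    print(topic, m.attempts)  # orders.retry 1, orders.retry 2, then orders.dlq 3
```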

Ecosystem, Popularity, and Overuse of Kafka

  • Many see Kafka adoption in low-volume scenarios as “resume-driven development” or a red-flag dependency chosen for buzz rather than fit.
  • Counterpoint: even at low volume, Kafka can be justified for multi-consumer replayability, ordered per-key processing, and chaining async workflows.
  • Pulsar and Redpanda are discussed as technically strong Kafka alternatives, but Kafka’s ecosystem and commercial backing give it momentum.
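
The two properties cited in Kafka's favour, ordered per-key processing and multi-consumer replayability, come from the log model itself. The toy class below is illustrative only and is not Kafka's implementation. It hashes each key to a fixed partition, so one key's records stay in order. Each consumer group keeps its own offset, so one group can rewind and replay without affecting another. Kafka's default partitioner hashes keys with murmur2; `zlib.crc32` is used here only as a stand-in stable hash, and the class and method names are hypothetical.

```python
import zlib
from collections import defaultdict


class TinyLog:
    """Toy partitioned log: per-key ordering plus independent, rewindable
    consumer-group offsets. Illustrative only; not Kafka's implementation."""

    def __init__(self, partitions: int):
        self.partitions = [[] for _ in range(partitions)]
        # offsets[group][partition] = next offset that group will read
        self.offsets = defaultdict(lambda: [0] * partitions)

    def partition_for(self, key: str) -> int:
        # Stable hash so the same key always lands in the same partition.
        return zlib.crc32(key.encode()) % len(self.partitions)

    def append(self, key: str, value: str) -> None:
        self.partitions[self.partition_for(key)].append((key, value))

    def poll(self, group: str, partition: int) -> list:
        """Return unread records for this group and advance its offset."""
        start = self.offsets[group][partition]
        records = self.partitions[partition][start:]
        self.offsets[group][partition] = len(self.partitions[partition])
        return records

    def seek(self, group: str, partition: int, offset: int) -> None:
        """Rewind (or skip ahead) one group without affecting the others."""
        self.offsets[group][partition] = offset


log = TinyLog(partitions=3)
for i in range(3):
    log.append("user-1", f"event-{i}")
p = log.partition_for("user-1")
print(log.poll("billing", p))       # the three user-1 events, in append order
print(log.poll("analytics", p))     # same three records: groups are independent
log.seek("billing", p, 0)           # replay for billing only
print(len(log.poll("billing", p)))  # 3
```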

Miscellaneous

  • Some teams report abandoning Kafka for RabbitMQ after hitting the fairness and complexity issues described in the article.
  • Side discussions cover SCADA integrations, .NET messaging libraries, and the origin/irony of the “Kafka” name.