No such thing as exactly-once delivery
Core distinction: “delivery” vs “processing”
- Major thread theme: people conflate “message delivery” with “message processing / committing side effects.”
- One camp insists “exactly-once delivery” is impossible in failure-prone distributed systems.
- Another camp says you can get “exactly-once processing” via idempotency, deduplication, counters, and transactions, as long as you acknowledge that this is different from transport-level delivery.
- Side effects (emails, database writes, external APIs) are where guarantees usually break down.
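
The delivery-versus-processing distinction can be sketched in a few lines. This is a minimal illustration, not anything posted in the thread: the class name `IdempotentHandler`, the message IDs, and the in-memory `set` are all assumptions made for the example. The transport is allowed to deliver a message twice, and the handler suppresses the repeated side effect.

```python
class IdempotentHandler:
    """Apply a side effect at most once per message ID, even if delivery repeats."""

    def __init__(self, side_effect):
        self._side_effect = side_effect  # hypothetical callable, e.g. "send an email"
        self._seen = set()               # in-memory only; a real system would persist this

    def handle(self, message_id, payload):
        if message_id in self._seen:
            return False                 # duplicate delivery: skip the side effect
        self._side_effect(payload)
        # A crash between the side effect and the next line would still cause a repeat.
        self._seen.add(message_id)
        return True


sent = []
handler = IdempotentHandler(sent.append)

# Simulated at-least-once delivery: message "m1" arrives twice.
for message_id, payload in [("m1", "welcome"), ("m2", "receipt"), ("m1", "welcome")]:
    handler.handle(message_id, payload)
```

After the loop, `sent` holds each payload once. The comment inside `handle` marks the gap the thread keeps returning to: the side effect and the record of it are not committed atomically, so the guarantee holds only if nothing crashes in between.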
Limits, failures, and probabilities
- Several comments stress that even “at-least-once” cannot be guaranteed in finite time when nodes, networks, or power can fail arbitrarily or partitions persist.
- Systems can only drive the probability of loss/duplication arbitrarily low, not to zero.
- Commenters cite the Byzantine Generals problem and the CAP theorem to argue that global, time-bounded exactly-once delivery is provably impossible under realistic failure assumptions.
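
The "arbitrarily low, not zero" point can be made concrete with a small calculation. It rests on a simplifying assumption that is not from the thread: each send attempt fails independently with the same probability `p_fail`. Real failures, such as a persistent partition, are often correlated, which makes the picture worse rather than better.

```python
def loss_probability(p_fail: float, attempts: int) -> float:
    """Probability that all `attempts` independent sends are lost (assumed model)."""
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be within [0, 1]")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    return p_fail ** attempts


# With a 10% per-attempt failure rate, each retry shrinks the loss probability
# by a factor of ten, but no finite number of retries makes it exactly zero.
for attempts in (1, 3, 5):
    print(attempts, loss_probability(0.1, attempts))
```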
Examples: TCP, queues, email, HFT
- TCP is described as:
  - Within one connection: data is never delivered twice, by definition.
  - From the app’s perspective: at-most-once, because data can be lost on failures.
- Streaming systems (Kafka, Kinesis, Flink, Beam, Kafka Streams) use offsets/checkpoints to approximate exactly-once processing over at-least-once delivery.
- Email’s Message-Id is cited as an idempotency key for deduplication.
- High-frequency trading example: under strict latency budgets there may be no time to retry, so even at-least-once is impossible to guarantee.
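
The offset/checkpoint idea can be shown with a toy model. This does not use the API of Kafka, Flink, or any other real framework: the `consume` function, the `checkpoint` dict, and the `crash_at_offset` parameter are hypothetical. The sketch assumes the offset and the derived state can be saved together in one atomic step, so a record that is re-read after a crash is not counted twice.

```python
def consume(log, checkpoint, batch_size=2, crash_at_offset=None):
    """Sum records from `log`, checkpointing (offset, total) together after each batch.

    `checkpoint` is a dict standing in for durable storage; updating both fields
    in one call stands in for an atomic snapshot.
    """
    offset, total = checkpoint["offset"], checkpoint["total"]
    while offset < len(log):
        batch_end = min(offset + batch_size, len(log))
        for i in range(offset, batch_end):
            if i == crash_at_offset:
                raise RuntimeError("simulated crash")  # in-flight work is lost
            total += log[i]
        offset = batch_end
        checkpoint.update(offset=offset, total=total)  # state and offset commit together
    return checkpoint["total"]


log = [1, 2, 3, 4, 5]
checkpoint = {"offset": 0, "total": 0}
try:
    consume(log, checkpoint, crash_at_offset=3)  # crashes partway through the second batch
except RuntimeError:
    pass
result = consume(log, checkpoint)  # restart: the record at offset 2 is read a second time
```

The record at offset 2 is delivered twice, yet `result` equals `sum(log)` because the uncommitted partial batch was discarded on restart. The guarantee covers only state held inside the checkpoint; a side effect emitted before the crash, such as an email, would not be rolled back.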
Idempotency, transactions, and system boundaries
- Repeated point: you can build reliable, transactional behavior on unreliable components, but you pay with complexity and cross-layer logic.
- Exactly-once processing is achievable inside a transactional boundary; crossing boundaries requires idempotency keys and careful coordination.
- Chaining two “exactly-once” subsystems via a stateless middle still requires end-to-end idempotency.
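
Exactly-once processing inside a transactional boundary can be sketched with Python's standard `sqlite3` module. The table names, the `credit_once` function, and the key `payment-42` are illustrative assumptions. The idempotency key and the balance update commit in the same transaction, so a redelivered message hits the primary-key constraint and changes nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed (idempotency_key TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO accounts VALUES ('alice', 0)")
conn.commit()


def credit_once(conn, idempotency_key, name, amount):
    """Credit `name` at most once per key; return True if the credit was applied."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("INSERT INTO processed VALUES (?)", (idempotency_key,))
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, name),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # key already recorded: this is a redelivery


credit_once(conn, "payment-42", "alice", 100)
credit_once(conn, "payment-42", "alice", 100)  # duplicate delivery, ignored
balance = conn.execute(
    "SELECT balance FROM accounts WHERE name = 'alice'"
).fetchone()[0]
```

This works only because both writes live in one database. Once the side effect sits outside that boundary, for example a call to an external API, the key has to travel with the request and the remote side has to deduplicate on it.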
Filesystem and low-level guarantees
- Debate over whether file renames across directories are truly atomic and durable in crashes.
- Distinction between POSIX-level atomicity from a process’s view and on-disk reality under crashes or in distributed filesystems.
- Conclusion: even with “atomic” primitives, crash timing can still reintroduce duplicates or ambiguity.
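
A common write-then-rename pattern shows where the gap between atomicity and durability sits. This sketch assumes a POSIX system and a rename within a single directory; it does not cover the cross-directory case debated in the thread. The function name `durable_replace` is made up for the example, and whether the directory `fsync` actually makes the rename durable depends on the filesystem.

```python
import os
import tempfile


def durable_replace(path, data: bytes):
    """Replace `path` with `data` via a temp file in the same directory (POSIX assumed)."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())   # push the file contents to stable storage
        os.replace(tmp_path, path)   # readers see either the old or the new file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Without this step the rename itself may not survive a power loss.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Even with every step in place, a caller that crashes right after `os.replace` cannot tell on restart whether the replacement happened, so the retry must be safe to repeat.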
Semantics, marketing, and practice
- Several comments criticize vendors who advertise “exactly-once delivery,” arguing it’s really “exactly-once for practical purposes” or “inside our processing model.”
- Some argue that if a higher layer only ever sees each message once, that’s effectively exactly-once; others insist terminology must reflect theoretical limits.
- Anecdotes show real systems often have much higher duplicate rates than expected, and many apps assume exactly-once without monitoring or checks.