Ask HN: We just had an actual UUID v4 collision...

Collision probability & randomness

  • Commenters stress that UUIDv4 collisions are astronomically unlikely but not impossible (a v4 UUID carries 122 random bits); odds like “1 in ~10²⁸” are cited.
  • Many argue a bug or bad RNG is vastly more likely than a true random collision at ~15k records.
  • Several commenters untangle probability from certainty and correct gambler’s-fallacy reasoning: experiencing a collision doesn’t change anyone’s lottery odds, because the events are independent.
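
The scale of these odds can be checked with the standard birthday approximation. The sketch below assumes an ideal generator that fills all 122 random bits of a v4 UUID uniformly; the function name and its parameters are illustrative and do not come from the thread.

```python
import math


def collision_probability(n, random_bits=122):
    """Approximate chance of at least one duplicate among n random IDs.

    Uses the birthday approximation P ~= 1 - exp(-n(n-1) / (2N)),
    where N = 2**random_bits is the number of possible IDs.
    """
    space = 2 ** random_bits
    exponent = n * (n - 1) / (2 * space)
    # -expm1(-x) equals 1 - exp(-x) but keeps precision when x is tiny.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    p = collision_probability(15_000)
    print(f"P(collision among 15,000 v4 UUIDs) ~= {p:.2e}")
```

For 15,000 records the result is on the order of 10⁻²⁹, which is why a bug or a degraded RNG is the far more plausible explanation.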

Suspected root causes

  • Strong consensus that the most likely cause is a broken or weak entropy source:
    • Poorly seeded PRNGs on cheap/embedded devices or VMs.
    • Browser shims/polyfills for node:crypto, especially on mobile or constrained environments.
    • Deterministic “random” in bots (e.g., Googlebot) leading to duplicate UUIDs.
    • Virtualization “virtualizing entropy away” or snapshotting RNG state.
  • A specific uuid npm package change is flagged: rng() reuses a single Uint8Array instead of returning a copy, a foot-gun for any caller that holds on to the returned buffer across calls.
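
Two of the failure modes above are easy to reproduce in miniature. The following is a Python analogy, not the uuid npm package's code: `uuid4_from`, `rng_shared`, and `rng_copy` are hypothetical names, and the fixed seed stands in for a device or VM snapshot that starts with identical RNG state.

```python
import os
import random
import uuid


def uuid4_from(rng):
    # version=4 tells uuid.UUID to overwrite the version and variant bits.
    return uuid.UUID(int=rng.getrandbits(128), version=4)


# Failure mode 1: two "devices" whose PRNGs start from the same state.
device_a = uuid4_from(random.Random(1234))
device_b = uuid4_from(random.Random(1234))
print("same seed, same UUID:", device_a == device_b)

# Failure mode 2: a generator that hands out one reused buffer.
_shared = bytearray(16)


def rng_shared():
    _shared[:] = os.urandom(16)  # refill in place
    return _shared               # every caller gets the same object


def rng_copy():
    return os.urandom(16)        # fresh, immutable bytes on each call


first = rng_shared()
second = rng_shared()            # overwrites what `first` points at
print("shared buffer aliased:", first is second)
print("UUIDs built afterwards match:",
      uuid.UUID(bytes=bytes(first), version=4)
      == uuid.UUID(bytes=bytes(second), version=4))
```

In both cases the underlying randomness may be fine in isolation; the duplicates come from how the state is seeded or shared.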

Client vs server UUID generation

  • Many criticize generating UUIDs on client devices or letting users supply IDs, citing the risk of manipulated IDs and weak client-side RNGs.
  • Others argue client-side UUIDs are fine if validated and backed by uniqueness constraints.
  • There are concrete anecdotes of analytics systems based on browser-generated UUIDs suffering widespread collisions.
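
A minimal sketch of the "validate before trusting" position, assuming the server receives the client's ID as a string. `is_valid_uuid4` is a hypothetical helper. It only checks format, so it cannot detect a weak client RNG; a uniqueness constraint is still needed for that.

```python
import uuid


def is_valid_uuid4(text):
    """Return True only for a canonical, lowercase-or-uppercase v4 UUID string."""
    if not isinstance(text, str):
        return False
    try:
        parsed = uuid.UUID(text)
    except ValueError:
        return False
    # uuid.UUID also accepts braces, URN prefixes, and missing hyphens;
    # comparing against the canonical form rejects those variants.
    if str(parsed) != text.lower():
        return False
    return parsed.version == 4 and parsed.variant == uuid.RFC_4122


if __name__ == "__main__":
    print(is_valid_uuid4(str(uuid.uuid4())))   # a well-formed v4 ID
    print(is_valid_uuid4("not-a-uuid"))        # rejected
```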

Alternatives & UUID versions

  • Timestamp-based or structured IDs are discussed:
    • UUIDv1/v7, ULID, Snowflake-like schemes, database sequences, and AES-encrypted counters.
    • Pros: sortability, reduced dependence on entropy, easier collision reasoning.
    • Cons: time leakage (privacy/side-channel), clock drift, fewer random bits per ID.
  • Some claim v7 would make a “collision like this” impossible; others counter it still has non-zero collision probability, especially with high volume per millisecond.
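
To make the v7 trade-off concrete, here is a sketch of the RFC 9562 UUIDv7 bit layout: a 48-bit Unix millisecond timestamp, the version and variant fields, and 74 random bits. `uuid7_sketch` is an illustrative name, and the code omits the optional monotonic counter the RFC describes, so two IDs minted in the same millisecond are ordered only by their random bits.

```python
import os
import time
import uuid


def uuid7_sketch(unix_ms=None):
    """Build a UUIDv7-shaped value: timestamp first, randomness after."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")   # 80 bits drawn, 74 used
    rand_a = rand >> 68                            # top 12 bits
    rand_b = rand & ((1 << 62) - 1)                # low 62 bits

    value = (unix_ms & ((1 << 48) - 1)) << 80      # bits 127..80: timestamp
    value |= 0x7 << 76                             # bits 79..76: version 7
    value |= rand_a << 64                          # bits 75..64: rand_a
    value |= 0b10 << 62                            # bits 63..62: variant
    value |= rand_b                                # bits 61..0: rand_b
    return uuid.UUID(int=value)


if __name__ == "__main__":
    earlier = uuid7_sketch(1_700_000_000_000)
    later = uuid7_sketch(1_700_000_000_001)
    print(earlier, later, earlier < later)
```

The timestamp prefix gives sortability and narrows collisions to IDs created in the same millisecond, but within that millisecond only 74 bits of entropy separate them, and a broken RNG undermines those bits just as it does for v4.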

Handling collisions in practice

  • Several recommend always planning for collisions:
    • Unique indexes in the DB, retry-on-conflict loops, or generator-side checks against a cache.
  • Others note that the original appeal of UUIDv4 was precisely to avoid centralized checks, and that checking at scale can be costly.
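
The unique-index plus retry approach can be sketched with SQLite from the Python standard library. The table name `records`, the helper names, and the retry limit are assumptions made for illustration; a production system would use its own schema and database driver.

```python
import sqlite3
import uuid


def open_demo_db():
    conn = sqlite3.connect(":memory:")
    # PRIMARY KEY gives the uniqueness guarantee the retry loop relies on.
    conn.execute("CREATE TABLE records (id TEXT PRIMARY KEY, payload TEXT NOT NULL)")
    return conn


def insert_with_retry(conn, payload, new_id=lambda: str(uuid.uuid4()), max_attempts=3):
    """Insert payload under a fresh ID, regenerating the ID on a conflict."""
    for _ in range(max_attempts):
        candidate = new_id()
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO records (id, payload) VALUES (?, ?)",
                    (candidate, payload),
                )
            return candidate
        except sqlite3.IntegrityError:
            # In this schema a NULL payload also raises IntegrityError;
            # real code should distinguish the two before retrying.
            continue
    raise RuntimeError(f"no unique ID after {max_attempts} attempts")


if __name__ == "__main__":
    db = open_demo_db()
    print(insert_with_retry(db, "hello"))
```

Repeated conflicts are themselves a signal: with a healthy generator the loop should essentially never retry, so hitting the attempt limit is worth logging as a likely RNG fault.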

Entropy quality & high-reliability views

  • High-reliability systems often avoid pure entropy-based IDs because detecting RNG failure is hard.
  • Entropy sources (hardware noise, radiation, “lava lamp walls”) are discussed; more entropy is seen as good, but hard to verify in production.
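
One inexpensive partial mitigation is a startup smoke test on the ID generator. This sketch, with the hypothetical name `smoke_test_generator`, only catches gross failures such as a constant or short-cycle generator; it cannot certify entropy quality, and it would not catch two machines that share the same seed.

```python
import uuid


def smoke_test_generator(generate=uuid.uuid4, samples=10_000):
    """Return False if the generator repeats itself within a small sample."""
    seen = set()
    for _ in range(samples):
        value = generate()
        if value in seen:
            return False
        seen.add(value)
    return True


if __name__ == "__main__":
    print("generator passed smoke test:", smoke_test_generator())
```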

Cultural / architectural critiques

  • Multiple anecdotes mock over-engineered UUID microservices and KPI-driven team growth, using this incident to highlight misplaced complexity and risk-blind design.