Avoid UUID Version 4 Primary Keys in Postgres

Scope and database specifics

  • Most arguments are explicitly about Postgres with B-tree indexes and single-node OLTP workloads.
  • Several commenters stress this is not universal: distributed databases (Spanner, Cockroach, Dynamo-like systems) often prefer randomized keys to avoid hot shards.

Performance, indexes, and fragmentation

  • Core concern: UUIDv4’s randomness destroys locality in B-tree indexes.
    • Inserts land all over the index, causing frequent page splits, higher write amplification, WAL bloat, and very large, cache-unfriendly indexes.
    • This can force indexes out of RAM and lead to more disk I/O and sequential scans.
  • Sequential or mostly-monotonic keys (bigint sequences, Snowflake-style, UUIDv7/ULID) keep recent rows clustered, improving insert cost and range scans.
  • Some report real wins migrating large UUIDv4 PK tables to bigint; others running 60M+ to billions of UUIDv4 rows say it’s a non-issue relative to other bottlenecks.

UUIDv4 vs UUIDv7 and other schemes

  • Many agree: if you need UUIDs in Postgres, v7 (or ULID/KSUID) is better than v4 because of temporal ordering.
  • Counterpoint: UUIDv7 embeds a timestamp, which can leak creation time and enable timing or statistical inferences; some prefer v4 for privacy.
  • Alternatives mentioned:
    • Bigint sequences as default PKs, sometimes with a single global sequence.
    • Snowflake/sonyflake IDs, Firebase-style push IDs.
    • ULID / CUID2 / custom time+random hybrids.
    • Composite keys like (parent_id, local_int) for locality.
  • Some think the article’s integer “obfuscation” (simple XOR) is weak; recommend proper ciphers or format-preserving encryption if you go that route.

Security, privacy, and enumeration

  • Sequential ints leak counts and relative age; can reveal business volume, enable IDOR-style enumeration, and support “German tank problem” estimates.
  • UUIDs (or obfuscated IDs) mitigate this, but:
    • RFC warns not all UUIDs are security capabilities; debate over whether well-generated v4s are nonetheless “unguessable enough” for capability URLs.
    • UUIDv7/ULID timestamp bits can leak user or activity timing, admin status, early adopter status, etc., in some domains (voting, sensitive accounts, business metrics).

Public vs internal IDs

  • Common compromise: bigint PK for internal relations + separate UUID or hashed “public_id” in APIs/URLs.
  • This retains performance for joins while avoiding predictable external IDs, at the cost of another index and more complexity.
  • Others argue PKs should simply never be trusted for authz; “unguessable IDs” are defense-in-depth, not a primary security mechanism.

Distributed systems and sharding

  • In sharded/distributed DBs, monotonically increasing global keys can create hot partitions; randomized keys (v4, reversed ints, hashed sequences) distribute load better.
  • Commenters note you can also encode shard IDs into keys or shard by other attributes, but that adds design complexity. Preemptively using UUIDs is seen by some as a “get out of jail free” for future sharding.

Premature optimization and trade-offs

  • One camp: PK choice is foundational and hard to change; start with integers (or UUIDv7) in Postgres to avoid predictable performance problems.
  • Other camp: for many apps, UUIDv4 performance cost is negligible; data volumes rarely reach problematic scale, and simplicity/operational benefits (client-generated IDs, easy merging, idempotency) outweigh the overhead.
  • Overall sentiment: “avoid blanket rules”; understand workload (write-heavy vs read-heavy, range scans vs point lookups, single-node vs distributed) and privacy requirements before standardizing on UUIDv4 PKs in Postgres.