Avoid UUID Version 4 Primary Keys in Postgres
Scope and database specifics
- Most arguments are explicitly about Postgres with B-tree indexes and single-node OLTP workloads.
- Several commenters stress this is not universal: distributed databases (Spanner, Cockroach, Dynamo-like systems) often prefer randomized keys to avoid hot shards.
Performance, indexes, and fragmentation
- Core concern: UUIDv4’s randomness destroys locality in B-tree indexes.
- Inserts land all over the index, causing frequent page splits, higher write amplification, WAL bloat, and very large, cache-unfriendly indexes.
- This can force indexes out of RAM and lead to more disk I/O and sequential scans.
- Sequential or mostly-monotonic keys (bigint sequences, Snowflake-style, UUIDv7/ULID) keep recent rows clustered, improving insert cost and range scans.
- Some report real wins migrating large UUIDv4 PK tables to bigint; others running 60M+ to billions of UUIDv4 rows say it’s a non-issue relative to other bottlenecks.
UUIDv4 vs UUIDv7 and other schemes
- Many agree: if you need UUIDs in Postgres, v7 (or ULID/KSUID) is better than v4 because of temporal ordering.
- Counterpoint: UUIDv7 embeds a timestamp, which can leak creation time and enable timing or statistical inferences; some prefer v4 for privacy.
- Alternatives mentioned:
- Bigint sequences as default PKs, sometimes with a single global sequence.
- Snowflake/sonyflake IDs, Firebase-style push IDs.
- ULID / CUID2 / custom time+random hybrids.
- Composite keys like (parent_id, local_int) for locality.
- Some think the article’s integer “obfuscation” (simple XOR) is weak; recommend proper ciphers or format-preserving encryption if you go that route.
Security, privacy, and enumeration
- Sequential ints leak counts and relative age; can reveal business volume, enable IDOR-style enumeration, and support “German tank problem” estimates.
- UUIDs (or obfuscated IDs) mitigate this, but:
- RFC warns not all UUIDs are security capabilities; debate over whether well-generated v4s are nonetheless “unguessable enough” for capability URLs.
- UUIDv7/ULID timestamp bits can leak user or activity timing, admin status, early adopter status, etc., in some domains (voting, sensitive accounts, business metrics).
Public vs internal IDs
- Common compromise: bigint PK for internal relations + separate UUID or hashed “public_id” in APIs/URLs.
- This retains performance for joins while avoiding predictable external IDs, at the cost of another index and more complexity.
- Others argue PKs should simply never be trusted for authz; “unguessable IDs” are defense-in-depth, not a primary security mechanism.
Distributed systems and sharding
- In sharded/distributed DBs, monotonically increasing global keys can create hot partitions; randomized keys (v4, reversed ints, hashed sequences) distribute load better.
- Commenters note you can also encode shard IDs into keys or shard by other attributes, but that adds design complexity. Preemptively using UUIDs is seen by some as a “get out of jail free” for future sharding.
Premature optimization and trade-offs
- One camp: PK choice is foundational and hard to change; start with integers (or UUIDv7) in Postgres to avoid predictable performance problems.
- Other camp: for many apps, UUIDv4 performance cost is negligible; data volumes rarely reach problematic scale, and simplicity/operational benefits (client-generated IDs, easy merging, idempotency) outweigh the overhead.
- Overall sentiment: “avoid blanket rules”; understand workload (write-heavy vs read-heavy, range scans vs point lookups, single-node vs distributed) and privacy requirements before standardizing on UUIDv4 PKs in Postgres.