2025-12-15

Avoid UUID Version 4 Primary Keys in Postgres

Scope and database specifics

Most arguments are explicitly about Postgres with B-tree indexes and single-node OLTP workloads.
Several commenters stress this is not universal: distributed databases (Spanner, Cockroach, Dynamo-like systems) often prefer randomized keys to avoid hot shards.

Performance, indexes, and fragmentation

Core concern: UUIDv4’s randomness destroys locality in B-tree indexes.
- Inserts land all over the index, causing frequent page splits, higher write amplification, WAL bloat, and very large, cache-unfriendly indexes.
- This can force indexes out of RAM and lead to more disk I/O and sequential scans.
Sequential or mostly-monotonic keys (bigint sequences, Snowflake-style, UUIDv7/ULID) keep recent rows clustered, improving insert cost and range scans.
Some report real wins migrating large UUIDv4 PK tables to bigint; others running 60M+ to billions of UUIDv4 rows say it’s a non-issue relative to other bottlenecks.

UUIDv4 vs UUIDv7 and other schemes

Many agree: if you need UUIDs in Postgres, v7 (or ULID/KSUID) is better than v4 because of temporal ordering.
Counterpoint: UUIDv7 embeds a timestamp, which can leak creation time and enable timing or statistical inferences; some prefer v4 for privacy.
Alternatives mentioned:
- Bigint sequences as default PKs, sometimes with a single global sequence.
- Snowflake/sonyflake IDs, Firebase-style push IDs.
- ULID / CUID2 / custom time+random hybrids.
- Composite keys like (parent_id, local_int) for locality.
Some think the article’s integer “obfuscation” (simple XOR) is weak; recommend proper ciphers or format-preserving encryption if you go that route.

Security, privacy, and enumeration

Sequential ints leak counts and relative age; can reveal business volume, enable IDOR-style enumeration, and support “German tank problem” estimates.
UUIDs (or obfuscated IDs) mitigate this, but:
- RFC warns not all UUIDs are security capabilities; debate over whether well-generated v4s are nonetheless “unguessable enough” for capability URLs.
- UUIDv7/ULID timestamp bits can leak user or activity timing, admin status, early adopter status, etc., in some domains (voting, sensitive accounts, business metrics).

Public vs internal IDs

Common compromise: bigint PK for internal relations + separate UUID or hashed “public_id” in APIs/URLs.
This retains performance for joins while avoiding predictable external IDs, at the cost of another index and more complexity.
Others argue PKs should simply never be trusted for authz; “unguessable IDs” are defense-in-depth, not a primary security mechanism.

Distributed systems and sharding

In sharded/distributed DBs, monotonically increasing global keys can create hot partitions; randomized keys (v4, reversed ints, hashed sequences) distribute load better.
Commenters note you can also encode shard IDs into keys or shard by other attributes, but that adds design complexity. Preemptively using UUIDs is seen by some as a “get out of jail free” for future sharding.

Premature optimization and trade-offs

One camp: PK choice is foundational and hard to change; start with integers (or UUIDv7) in Postgres to avoid predictable performance problems.
Other camp: for many apps, UUIDv4 performance cost is negligible; data volumes rarely reach problematic scale, and simplicity/operational benefits (client-generated IDs, easy merging, idempotency) outweigh the overhead.
Overall sentiment: “avoid blanket rules”; understand workload (write-heavy vs read-heavy, range scans vs point lookups, single-node vs distributed) and privacy requirements before standardizing on UUIDv4 PKs in Postgres.

Related topics