PostgreSQL and UUID as Primary Key

Performance and storage implications

  • Random UUIDs (especially v4) hurt B‑tree locality: more page fragmentation, cache misses, index and WAL bloat, and higher IOPS, particularly on network/EBS storage.
  • Time-ordered IDs (bigserial, UUIDv7, ULID, TSID, Snowflake-like) insert more sequentially and tend to be faster for inserts and friendlier to caches.
  • UUIDs are 16 bytes vs 4/8 for int/bigint; this multiplies across indexes and foreign keys. For very large tables, people report tens–hundreds of GB of extra storage.
  • Using Postgres’ native uuid type is consistently preferred over char(36)/text for UUID storage.

UUIDv4 vs UUIDv7 and other schemes

  • UUIDv4: fully random, great for distributed generation and security-sensitive tokens; worst for index locality.
  • UUIDv7: time-ordered, still random-suffixed; widely suggested as a good default for DB PKs, but has fewer random bits and embeds timestamps.
  • Alternatives mentioned: Snowflake IDs, ULID, TSID, TypeID, Stripe-style IDs; all aim for time-ordering + distribution + human/URL friendliness.

Security, information leakage, and “guessable” IDs

  • Concerns about auto-increment IDs: easy enumeration, German tank problem (inferring volume/growth), and some attack/abuse scenarios.
  • Concerns about UUIDv7/ULID: timestamp leakage can reveal creation times, growth patterns, even sensitive events (e.g., acquisitions); RFCs recommend v4 for security-critical uses.
  • Some argue these leaks are overblown for most apps; others treat “secure by default” as a strong requirement and avoid exposing ordered IDs at all.

Schema design: PK choice, foreign keys, and migration

  • Common pattern advocated: internal bigint/bigserial PK for joins and batching; separate external random ID (UUID/ULID/hash) for APIs and URLs.
  • “Bigint vs int vs smallint” debate:
    • One side: always use bigint to avoid painful 32-bit exhaustion and future growth surprises.
    • Other side: choose the smallest type that fits lifetime cardinality; storage and cache footprint matter, especially with many FKs.
  • Changing PK type later is possible but painful, especially with many foreign keys and external references.

Distributed systems and ID generation location

  • UUIDs shine when IDs must be generated client-side or across shards/tenants without coordination.
  • Some note you can still use monotonic integers in distributed settings with sharded sequences or central allocators, but that adds complexity.
  • Debate on whether IDs should be generated in the DB (simpler semantics) or app/clients (offload DB, offline-first, idempotency).

Collision risk and correctness

  • UUID collisions are treated as astronomically unlikely; most consider extra collision checking unnecessary beyond the DB’s unique index.
  • If a collision ever occurs, many would treat it as an indicator of a serious RNG or security problem rather than something to silently retry.

Tooling, ergonomics, and practicality

  • Debugging and ad-hoc querying with UUIDs is seen as more painful; numeric IDs are easier to read, compare, and batch (“high water mark” semantics).
  • Some engineers report billions-of-row systems on UUIDv4 without issues; others (especially DB-focused roles) see UUIDv4 as a consistent performance and cost footgun.
  • General sentiment: context matters; for many apps, UUID choice won’t be the first bottleneck, but making a sensible default (e.g., UUIDv7 + native type, or bigint PK + external ID) is cheap and avoids future regret.