PostgreSQL and UUID as Primary Key
Performance and storage implications
- Random UUIDs (especially v4) hurt B‑tree locality: more page fragmentation, cache misses, index and WAL bloat, and higher IOPS, particularly on network/EBS storage.
- Time-ordered IDs (bigserial, UUIDv7, ULID, TSID, Snowflake-like) insert more sequentially and tend to be faster for inserts and friendlier to caches.
- UUIDs are 16 bytes vs 4/8 for int/bigint; this multiplies across indexes and foreign keys. For very large tables, people report tens–hundreds of GB of extra storage.
- Using Postgres’ native
uuidtype is consistently preferred overchar(36)/textfor UUID storage.
UUIDv4 vs UUIDv7 and other schemes
- UUIDv4: fully random, great for distributed generation and security-sensitive tokens; worst for index locality.
- UUIDv7: time-ordered, still random-suffixed; widely suggested as a good default for DB PKs, but has fewer random bits and embeds timestamps.
- Alternatives mentioned: Snowflake IDs, ULID, TSID, TypeID, Stripe-style IDs; all aim for time-ordering + distribution + human/URL friendliness.
Security, information leakage, and “guessable” IDs
- Concerns about auto-increment IDs: easy enumeration, German tank problem (inferring volume/growth), and some attack/abuse scenarios.
- Concerns about UUIDv7/ULID: timestamp leakage can reveal creation times, growth patterns, even sensitive events (e.g., acquisitions); RFCs recommend v4 for security-critical uses.
- Some argue these leaks are overblown for most apps; others treat “secure by default” as a strong requirement and avoid exposing ordered IDs at all.
Schema design: PK choice, foreign keys, and migration
- Common pattern advocated: internal bigint/bigserial PK for joins and batching; separate external random ID (UUID/ULID/hash) for APIs and URLs.
- “Bigint vs int vs smallint” debate:
- One side: always use bigint to avoid painful 32-bit exhaustion and future growth surprises.
- Other side: choose the smallest type that fits lifetime cardinality; storage and cache footprint matter, especially with many FKs.
- Changing PK type later is possible but painful, especially with many foreign keys and external references.
Distributed systems and ID generation location
- UUIDs shine when IDs must be generated client-side or across shards/tenants without coordination.
- Some note you can still use monotonic integers in distributed settings with sharded sequences or central allocators, but that adds complexity.
- Debate on whether IDs should be generated in the DB (simpler semantics) or app/clients (offload DB, offline-first, idempotency).
Collision risk and correctness
- UUID collisions are treated as astronomically unlikely; most consider extra collision checking unnecessary beyond the DB’s unique index.
- If a collision ever occurs, many would treat it as an indicator of a serious RNG or security problem rather than something to silently retry.
Tooling, ergonomics, and practicality
- Debugging and ad-hoc querying with UUIDs is seen as more painful; numeric IDs are easier to read, compare, and batch (“high water mark” semantics).
- Some engineers report billions-of-row systems on UUIDv4 without issues; others (especially DB-focused roles) see UUIDv4 as a consistent performance and cost footgun.
- General sentiment: context matters; for many apps, UUID choice won’t be the first bottleneck, but making a sensible default (e.g., UUIDv7 + native type, or bigint PK + external ID) is cheap and avoids future regret.