Postgres UUIDv7 and per-backend monotonicity
UUID Versions and Use Cases
- UUIDv4: random (122 random bits); widely used, but collisions become plausible when entropy sources are weak.
- UUIDv3/v5: namespace+hash based; useful for deterministic IDs and idempotent imports, but controversial because v3 uses MD5 and v5 uses SHA‑1.
- Some participants claim certain high‑assurance/regulated contexts ban SHA‑1 (and even v4 with poor entropy) for critical IDs where collisions must be “impossible”.
- Others argue MD5/SHA‑1 remain fine as non‑cryptographic hashes and that UUIDs are not integrity or confidentiality mechanisms.
- UUIDv7: time‑ordered (48‑bit Unix millisecond timestamp in the high bits, remaining bits mostly random) and intended as the “modern” sortable UUID.
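To make the v7 layout concrete, here is a minimal Python sketch that packs a millisecond timestamp and random bits into the UUIDv7 field positions (48‑bit timestamp, 4‑bit version, 12‑bit rand_a, 2‑bit variant, 62‑bit rand_b). The helper names `make_uuid7` and `timestamp_ms` are made up for this illustration and are not part of any library.

```python
import os
import time
import uuid


def make_uuid7(unix_ms: int, rand_bytes: bytes) -> uuid.UUID:
    """Pack a 48-bit millisecond timestamp and 74 random bits into a UUIDv7."""
    if len(rand_bytes) != 10:
        raise ValueError("need exactly 10 random bytes (80 bits, 74 are used)")
    rand = int.from_bytes(rand_bytes, "big")
    rand_a = (rand >> 68) & 0xFFF        # 12 bits placed after the version nibble
    rand_b = rand & ((1 << 62) - 1)      # 62 bits placed after the variant bits
    value = (unix_ms & ((1 << 48) - 1)) << 80
    value |= 0x7 << 76                   # version 7
    value |= rand_a << 64
    value |= 0b10 << 62                  # RFC variant
    value |= rand_b
    return uuid.UUID(int=value)


def timestamp_ms(u: uuid.UUID) -> int:
    """Recover the embedded millisecond timestamp from the top 48 bits."""
    return u.int >> 80


now_ms = time.time_ns() // 1_000_000
u = make_uuid7(now_ms, os.urandom(10))
print(u, u.version, timestamp_ms(u) == now_ms)
```

Because the timestamp occupies the most significant bits, IDs from different milliseconds sort by time both as integers and as hex strings.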
UUIDv7 Monotonicity and RFC Interpretation
- The RFC states that time‑based UUIDs “will be monotonic” due to embedded timestamps and describes optional methods to provide additional monotonicity within a millisecond (using extra time precision and/or counters in rand_a/rand_b).
- Disagreement exists:
- One side: monotonicity is a core guarantee; non‑monotonic behavior is an implementation bug.
- Other side: only partial monotonicity is guaranteed; sub‑millisecond ordering is explicitly optional, so depending on it is risky.
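The "partial monotonicity" position can be illustrated with a small Python sketch. It assumes a generator that uses only the millisecond timestamp plus random bits, with none of the optional methods; `uuid7_ms_only` is a hypothetical helper, and the seeded `random.Random` is there only to make the demonstration repeatable.

```python
import random
import uuid


def uuid7_ms_only(unix_ms: int, rng: random.Random) -> uuid.UUID:
    """UUIDv7 with a millisecond timestamp and 74 random bits (no counter)."""
    value = (unix_ms & ((1 << 48) - 1)) << 80
    value |= 0x7 << 76                    # version 7
    value |= rng.getrandbits(12) << 64    # rand_a left fully random
    value |= 0b10 << 62                   # RFC variant
    value |= rng.getrandbits(62)          # rand_b
    return uuid.UUID(int=value)


def is_strictly_increasing(ids) -> bool:
    return all(a < b for a, b in zip(ids, ids[1:]))


rng = random.Random(0)  # seeded only for repeatability; not suitable for real IDs
same_ms = [uuid7_ms_only(1_700_000_000_000, rng) for _ in range(100)]
across_ms = [uuid7_ms_only(1_700_000_000_000 + i, rng) for i in range(100)]

print(is_strictly_increasing(same_ms))    # order within one millisecond is arbitrary
print(is_strictly_increasing(across_ms))  # distinct milliseconds always sort by time
```

IDs from different milliseconds are ordered by the timestamp alone; within one millisecond the order depends entirely on which optional method, if any, the generator implements.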
Postgres Implementation and Reliance on Behavior
- Postgres’s v7 implementation uses RFC “Method 3”: it replaces the 12 bits of rand_a with sub‑millisecond time (4096 steps per ms, roughly 244 ns each), yielding per‑backend monotonicity even at high generation rates.
- Critics argue:
- This is not (yet) a documented API guarantee; relying on it couples application code to a particular DB version/engine.
- Tests should explicitly sort or use sets rather than depend on implicit ID ordering.
- Others counter that:
- Using well‑documented implementation details is acceptable, especially in tests.
- In practice, widely used behaviors often become de facto specs (Hyrum’s Law).
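As a rough illustration of the Method 3 idea discussed above, the following Python sketch puts a 12‑bit sub‑millisecond fraction into rand_a and bumps the value by one step whenever the clock stalls or goes backwards. This is a hypothetical per‑process generator written for this summary, not PostgreSQL's actual code; the class name `Uuid7Method3` and its `clock_ns` parameter are assumptions.

```python
import secrets
import threading
import time
import uuid

SUB_MS_STEPS = 1 << 12      # 12 bits: 4096 steps per millisecond (~244 ns each)
NS_PER_MS = 1_000_000


class Uuid7Method3:
    """Hypothetical per-process generator; NOT PostgreSQL's implementation."""

    def __init__(self, clock_ns=time.time_ns):
        self._clock_ns = clock_ns
        self._last_tick = -1            # last (ms, sub-ms step) issued, as one integer
        self._lock = threading.Lock()

    def next(self) -> uuid.UUID:
        with self._lock:
            ms, frac_ns = divmod(self._clock_ns(), NS_PER_MS)
            tick = ms * SUB_MS_STEPS + frac_ns * SUB_MS_STEPS // NS_PER_MS
            # If the clock stalled or stepped back, advance by one step instead.
            if tick <= self._last_tick:
                tick = self._last_tick + 1
            self._last_tick = tick
        ms, step = divmod(tick, SUB_MS_STEPS)
        value = (ms & ((1 << 48) - 1)) << 80
        value |= 0x7 << 76              # version 7
        value |= step << 64             # sub-millisecond fraction replaces rand_a
        value |= 0b10 << 62             # RFC variant
        value |= secrets.randbits(62)   # rand_b stays random
        return uuid.UUID(int=value)


gen = Uuid7Method3()
ids = [gen.next() for _ in range(1000)]
print(ids == sorted(ids))  # True: strictly increasing within this process
```

The ordering guarantee here holds only for a single generator instance, which mirrors the per‑backend scope described above; two processes (or two backends) can still interleave.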
Entropy, Collisions, and Time Bias
- Spending 12 bits on sub‑millisecond time leaves 62 random bits instead of 74, but it also shrinks the window in which two IDs share the same time bits from 1 ms to 1/4096 of a millisecond; the consensus is that collisions remain extremely unlikely for realistic workloads.
- Some note that real clock readings are not uniformly distributed at nanosecond scale, so time‑derived bits are a weaker source of randomness than random‑generator output, though still good enough for most databases.
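The collision argument can be checked with a back‑of‑the‑envelope birthday bound. The Python sketch below assumes that n IDs land in the same time bucket (identical timestamp and sub‑millisecond bits) and that the remaining bits are uniformly random; `collision_probability` is a hypothetical helper.

```python
import math


def collision_probability(n: int, random_bits: int) -> float:
    """Birthday-bound approximation: chance that any two of n IDs sharing
    identical time bits also share all of their random bits."""
    space = 2 ** random_bits
    return -math.expm1(-n * (n - 1) / (2 * space))


for n in (2, 1_000, 1_000_000):
    p62 = collision_probability(n, 62)   # 12 of the 74 bits spent on sub-ms time
    p74 = collision_probability(n, 74)   # all 74 bits random, 1 ms bucket
    print(f"n={n:>9}  62 bits: {p62:.3e}  74 bits: {p74:.3e}")
```

For two IDs in the same bucket the 62‑bit figure is 2^-62, about 2.2e-19. Fewer random bits raise the per‑bucket probability, while the narrower bucket lowers how many IDs realistically share one, which is the trade‑off described above.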
Comparisons: ULID, Snowflake, and Alternatives
- ULID:
- Guarantees per‑process monotonicity by incrementing low bits; spec mandates monotonic generation.
- Uses Crockford base32 encoding, which is harder to manipulate in vanilla SQL than a hex‑encoded UUIDv7.
- Monotonicity requires serialized generation; only per process, not global.
- Snowflake:
- 64‑bit: millisecond timestamp + machine ID + counter; good for distributed, time‑ordered integer IDs (strictly monotonic only per machine).
- UUIDv7’s advantage is compatibility with existing UUID ecosystems and DB types while giving Snowflake‑like ordering.
- Some suggest separating concerns instead of using v7 for everything: keep an autoincrement PK, an opaque random ID (v4), and a high‑precision timestamp, rather than conflating all three into one value.
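For comparison, a Snowflake‑style ID can be sketched as plain bit packing. The 41/10/12 split below follows the commonly cited original Twitter layout and is an assumption here, since the discussion only mentions "timestamp + machine ID + counter"; the function names are hypothetical, and the timestamp is taken as milliseconds since some custom epoch chosen by the deployer.

```python
TIMESTAMP_BITS, MACHINE_BITS, SEQUENCE_BITS = 41, 10, 12  # assumed layout


def pack_snowflake(ms_since_epoch: int, machine_id: int, sequence: int) -> int:
    """Pack the three fields into one 63-bit non-negative integer."""
    if not 0 <= ms_since_epoch < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp out of range")
    if not 0 <= machine_id < (1 << MACHINE_BITS):
        raise ValueError("machine_id out of range")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence out of range")
    return (
        (ms_since_epoch << (MACHINE_BITS + SEQUENCE_BITS))
        | (machine_id << SEQUENCE_BITS)
        | sequence
    )


def unpack_snowflake(value: int) -> tuple[int, int, int]:
    """Split an ID back into (ms_since_epoch, machine_id, sequence)."""
    sequence = value & ((1 << SEQUENCE_BITS) - 1)
    machine_id = (value >> SEQUENCE_BITS) & ((1 << MACHINE_BITS) - 1)
    ms_since_epoch = value >> (MACHINE_BITS + SEQUENCE_BITS)
    return ms_since_epoch, machine_id, sequence


first = pack_snowflake(1_000, machine_id=7, sequence=0)
second = pack_snowflake(1_000, machine_id=7, sequence=1)
print(first < second, unpack_snowflake(second))
```

Within one millisecond, IDs from different machines sort by machine ID rather than by generation time, so the ordering is strict only per machine. The result fits a signed 64‑bit integer column, whereas UUIDv7 needs a 128‑bit UUID type.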
Practical Concerns and Code Longevity
- One camp emphasizes long‑lived systems: avoid depending on undocumented/optional behavior to “save a line of code”.
- Another camp notes much business code is short‑lived; absolute robustness may be overkill, but even then, simple explicit sorting is a low‑cost safeguard.
- Several commenters point out that ID order does not equal commit order; cursor‑based APIs and background jobs need stronger invariants (e.g., a “frozen” watermark) than monotonic UUIDs alone.
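The gap between ID order and commit order can be shown with a toy simulation in Python. Small integers stand in for time‑ordered IDs, sets stand in for committed and in‑flight transactions, and the watermark rule (never read past the smallest ID still in flight) is one possible interpretation of the "frozen watermark" idea, not a description of any particular system. How a real database would learn which IDs are in flight is left open.

```python
def poll(committed, cursor, watermark=None):
    """Return committed IDs above the cursor (and at or below the watermark,
    if one is given), together with the advanced cursor."""
    rows = sorted(
        i for i in committed
        if i > cursor and (watermark is None or i <= watermark)
    )
    return rows, (rows[-1] if rows else cursor)


def run(use_watermark: bool):
    committed, in_flight = set(), {1, 2}  # both transactions already hold their IDs
    delivered, cursor = [], 0

    # Transaction B (id 2) commits first, although its ID was generated second.
    in_flight.discard(2)
    committed.add(2)
    watermark = min(in_flight) - 1 if (use_watermark and in_flight) else None
    rows, cursor = poll(committed, cursor, watermark)
    delivered += rows

    # Transaction A (id 1) commits only after the reader has polled once.
    in_flight.discard(1)
    committed.add(1)
    rows, cursor = poll(committed, cursor, None)  # nothing is in flight any more
    delivered += rows
    return delivered


print(run(use_watermark=False))  # [2]: row 1 is skipped for good
print(run(use_watermark=True))   # [1, 2]: the reader waited for the in-flight row
```

Without the watermark the cursor moves past ID 1 before that row becomes visible, so a consumer that filters on "id > cursor" silently loses it, regardless of how monotonic the ID generator is.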