2025-01-02

Postgres UUIDv7 and per-backend monotonicity

UUID Versions and Use Cases

UUIDv4: purely random; widely used but can fail when entropy sources are weak.
UUIDv3/v5: namespace+hash based (v3 uses MD5, v5 uses SHA‑1); useful for deterministic IDs and idempotent imports, but controversial because of those hash functions.
- Some participants claim certain high‑assurance/regulated contexts ban SHA‑1 (and even v4 with poor entropy) for critical IDs where collisions must be “impossible”.
- Others argue MD5/SHA‑1 remain fine as non‑cryptographic hashes and that UUIDs are not integrity or confidentiality mechanisms.
UUIDv7: time‑ordered (timestamp in high bits, remaining bits mostly random) and intended as the “modern” sortable UUID.
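The difference between random and name-based versions can be seen with Python's standard `uuid` module. This is only an illustrative sketch; the record key `"orders/2024/000123"` is a made-up example, not something from the discussion.

```python
import uuid

# v4: 122 random bits, so two calls should essentially never match.
random_id = uuid.uuid4()

# v5: SHA-1 over (namespace, name). The same inputs always give the same UUID,
# which is what makes re-running an import idempotent.
first = uuid.uuid5(uuid.NAMESPACE_URL, "orders/2024/000123")
second = uuid.uuid5(uuid.NAMESPACE_URL, "orders/2024/000123")

# v3: the same idea with MD5 instead of SHA-1, so the result differs from v5.
legacy = uuid.uuid3(uuid.NAMESPACE_URL, "orders/2024/000123")

print(random_id.version, first.version, legacy.version)
print(first == second, first == legacy)
```

The hash here only maps names to IDs deterministically; it is not being used to protect integrity or confidentiality, which is the point the second group of commenters makes.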

UUIDv7 Monotonicity and RFC Interpretation

RFC 9562 states that time‑based UUIDs “will be monotonic” due to embedded timestamps, and describes optional methods to provide additional monotonicity within a millisecond (using extra time precision and/or counters in rand_a/rand_b).
Disagreement exists:
- One side: monotonicity is a core guarantee; non‑monotonic behavior is an implementation bug.
- Other side: only partial monotonicity is guaranteed; sub‑millisecond ordering is explicitly optional, so depending on it is risky.
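To make the "partial monotonicity" position concrete, the sketch below builds v7-shaped UUIDs by hand and compares them. `make_uuid7_like` is a hypothetical helper written for this illustration; the tail values are chosen by hand to stand in for random bits.

```python
import uuid

def uuid7_unix_ms(u: uuid.UUID) -> int:
    """Return the 48-bit Unix millisecond timestamp stored in the top bits."""
    return u.int >> 80

def make_uuid7_like(unix_ms: int, tail: int) -> uuid.UUID:
    """Assemble a v7-shaped UUID from a timestamp and a 74-bit tail (hypothetical helper)."""
    rand_a = (tail >> 62) & 0xFFF            # 12 bits below the version nibble
    rand_b = tail & ((1 << 62) - 1)          # 62 bits below the variant
    value = (unix_ms & ((1 << 48) - 1)) << 80
    value |= 0x7 << 76                       # version 7
    value |= rand_a << 64
    value |= 0b10 << 62                      # RFC variant
    value |= rand_b
    return uuid.UUID(int=value)

# 'a' is generated first and 'b' second, both inside the same millisecond.
a = make_uuid7_like(1_000, tail=5)
b = make_uuid7_like(1_000, tail=3)
# 'c' is generated one millisecond later.
c = make_uuid7_like(1_001, tail=0)

print(a < c and b < c)   # ordering across milliseconds follows the timestamp
print(b < a)             # ordering within a millisecond follows the tail bits
```

With purely random tails, the order of `a` and `b` is a coin flip. Only an implementation that applies one of the RFC's optional methods pins it down.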

Postgres Implementation and Reliance on Behavior

Postgres’s v7 implementation uses RFC “Method 3”: the 12 rand_a bits carry a sub‑millisecond timestamp fraction instead of random data (4096 steps per ms, roughly 244 ns each), yielding per‑backend monotonicity at high rates.
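A rough Python sketch of a Method 3 style generator follows. It is not the Postgres source. The class name, the injectable clock, and the rule of stepping forward by one tick when the clock has not advanced are all assumptions made for illustration.

```python
import os
import time
import uuid

STEPS_PER_MS = 1 << 12      # 12 bits of sub-millisecond precision
NS_PER_MS = 1_000_000

class Uuid7Method3:
    """Single-generator monotonic UUIDv7 sketch (hypothetical, not thread-safe)."""

    def __init__(self, clock_ns=time.time_ns):
        self._clock_ns = clock_ns
        self._last_tick = -1    # (unix_ms << 12) | fraction of the previous UUID

    def next(self) -> uuid.UUID:
        unix_ms, rem_ns = divmod(self._clock_ns(), NS_PER_MS)
        fraction = rem_ns * STEPS_PER_MS // NS_PER_MS     # 0..4095
        tick = (unix_ms << 12) | fraction
        # Assumption: if the clock stalled or went backwards, advance by one tick
        # so that every UUID from this generator sorts after the previous one.
        if tick <= self._last_tick:
            tick = self._last_tick + 1
        self._last_tick = tick

        unix_ms, fraction = tick >> 12, tick & 0xFFF
        rand_b = int.from_bytes(os.urandom(8), "big") >> 2   # 62 random bits
        value = (unix_ms & ((1 << 48) - 1)) << 80
        value |= 0x7 << 76          # version 7
        value |= fraction << 64     # time fraction where rand_a would be
        value |= 0b10 << 62         # RFC variant
        value |= rand_b
        return uuid.UUID(int=value)

generator = Uuid7Method3()
batch = [generator.next() for _ in range(5)]
print(batch == sorted(batch))
```

The ordering guarantee holds only within one generator instance, which mirrors the "per-backend" qualifier: two independent generators can still interleave.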
Critics argue:
- This is not (yet) a documented API guarantee; relying on it couples application code to a particular DB version/engine.
- Tests should explicitly sort or use sets rather than depend on implicit ID ordering.
Others counter that:
- Using well‑documented implementation details is acceptable, especially in tests.
- In practice, widely used behaviors often become de facto specs (Hyrum’s Law).
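The "explicitly sort or use sets" advice could look like the following in a test. The `Row` type and its `position` column are hypothetical stand-ins for whatever the application actually stores.

```python
import uuid
from dataclasses import dataclass

@dataclass(frozen=True)
class Row:
    id: uuid.UUID
    name: str
    position: int   # hypothetical column that defines the intended order

# Rows arrive in arbitrary order, as they may from a query with no ORDER BY.
rows = [
    Row(uuid.uuid4(), "b", 2),
    Row(uuid.uuid4(), "a", 1),
    Row(uuid.uuid4(), "c", 3),
]

# Order-insensitive check: compare as a set.
assert {r.name for r in rows} == {"a", "b", "c"}

# Order-sensitive check: sort by the column that defines the order,
# not by whatever order the IDs happen to have.
ordered = sorted(rows, key=lambda r: r.position)
assert [r.name for r in ordered] == ["a", "b", "c"]
```

Either form keeps passing if the ID generator changes, which is the cost-benefit argument the critics are making.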

Entropy, Collisions, and Time Bias

Spending 12 bits on sub‑millisecond time leaves 62 random bits instead of 74, but it also shrinks the window that shares one timestamp value from 1 ms to roughly 244 ns; the consensus is that collisions remain extremely unlikely for realistic workloads.
Some note real clocks aren’t uniformly random at nanosecond scale, so time‑derived bits are a weaker randomness source than true PRNG output, but still good enough for most databases.
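A back-of-the-envelope check of the collision argument, using the standard birthday approximation. It assumes the random bits are uniform and that all `n` values land in the same timestamp bucket. The workload figures are illustrative, not measurements.

```python
import math

def collision_probability(n: int, random_bits: int) -> float:
    """Approximate P(at least one collision) among n values sharing one timestamp bucket."""
    pairs = n * (n - 1) / 2
    # 1 - exp(-pairs / 2**bits), written with expm1 to stay accurate for tiny values.
    return -math.expm1(-pairs / 2**random_bits)

# Plain v7: 74 random bits, bucket = 1 ms. Suppose 1000 IDs share one millisecond.
plain = collision_probability(1_000, 74)

# Method 3: 62 random bits, bucket = ~244 ns. Suppose 2 IDs share one tick.
method3 = collision_probability(2, 62)

print(f"plain v7, 1000 per ms:   {plain:.3e}")
print(f"method 3, 2 per tick:    {method3:.3e}")
```

Both figures are far below anything a realistic workload would notice. The caveat about clock bias concerns how evenly generators spread across ticks, which this approximation does not model.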

Comparisons: ULID, Snowflake, and Alternatives

ULID:
- Guarantees per‑process monotonicity by incrementing low bits; spec mandates monotonic generation.
- Uses base32 encoding; harder to manipulate in vanilla SQL compared to hex UUIDv7.
- Monotonicity requires serialized generation; only per process, not global.
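The increment-the-low-bits approach can be sketched as below. This is a simplified illustration, not a reference ULID implementation. The class name is hypothetical, and treating a backwards clock like a repeated millisecond is an assumption of this sketch.

```python
import os

# Crockford base32 alphabet (no I, L, O, U); ASCII order matches numeric order.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

class MonotonicUlid:
    """Per-process monotonic ULID sketch (hypothetical, not thread-safe)."""

    def __init__(self):
        self._last_ms = -1
        self._last_rand = 0

    def next(self, unix_ms: int) -> str:
        if unix_ms <= self._last_ms:
            # Same (or earlier) millisecond: reuse the timestamp, bump the random part.
            self._last_rand += 1
            if self._last_rand >= 1 << 80:
                raise OverflowError("random component exhausted for this millisecond")
        else:
            self._last_ms = unix_ms
            self._last_rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
        value = (self._last_ms << 80) | self._last_rand
        # 128 bits rendered as 26 base32 characters, most significant first.
        return "".join(CROCKFORD[(value >> shift) & 31] for shift in range(125, -1, -5))

gen = MonotonicUlid()
ulids = [gen.next(1_700_000_000_000) for _ in range(3)]
print(ulids == sorted(ulids))
```

The shared `_last_ms`/`_last_rand` state is why generation has to be serialized, and why the guarantee stops at the process boundary.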
Snowflake:
- 64‑bit, millisecond timestamp + machine ID + counter; good for distributed monotonic integers.
- UUIDv7’s advantage is compatibility with existing UUID ecosystems and DB types while giving Snowflake‑like ordering.
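A Snowflake-style ID can be sketched as a packed 64-bit integer. The 41/10/12 bit split and the custom epoch below are assumptions for illustration, since the discussion only names the three components.

```python
class SnowflakeSketch:
    """Packs timestamp, machine ID, and counter into one integer (hypothetical layout)."""

    TIMESTAMP_BITS, MACHINE_BITS, SEQUENCE_BITS = 41, 10, 12

    def __init__(self, machine_id: int, epoch_ms: int = 1_700_000_000_000):
        if not 0 <= machine_id < (1 << self.MACHINE_BITS):
            raise ValueError("machine_id out of range")
        self._machine_id = machine_id
        self._epoch_ms = epoch_ms      # arbitrary custom epoch for this sketch
        self._last_ms = -1
        self._sequence = 0

    def next(self, unix_ms: int) -> int:
        if unix_ms < self._last_ms:
            raise RuntimeError("clock moved backwards")
        if unix_ms == self._last_ms:
            self._sequence += 1
            if self._sequence >= (1 << self.SEQUENCE_BITS):
                raise RuntimeError("sequence exhausted for this millisecond")
        else:
            self._last_ms = unix_ms
            self._sequence = 0
        elapsed = unix_ms - self._epoch_ms
        return (
            (elapsed << (self.MACHINE_BITS + self.SEQUENCE_BITS))
            | (self._machine_id << self.SEQUENCE_BITS)
            | self._sequence
        )

node = SnowflakeSketch(machine_id=5)
first = node.next(1_700_000_000_001)
second = node.next(1_700_000_000_001)
third = node.next(1_700_000_000_002)
print(first < second < third)
```

Unlike UUIDv7, uniqueness here depends on machine IDs being assigned without overlap, which is extra coordination that a random tail avoids.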
Some suggest separating concerns instead of using v7 for everything: keep an autoincrement PK, an opaque random ID (v4), and a high‑precision timestamp, rather than conflating all three into one value.

Practical Concerns and Code Longevity

One camp emphasizes long‑lived systems: avoid depending on undocumented/optional behavior to “save a line of code”.
Another camp notes much business code is short‑lived; absolute robustness may be overkill, but even then, simple explicit sorting is a low‑cost safeguard.
Several commenters point out that ID order does not equal commit order; cursor‑based APIs and background jobs need stronger invariants (e.g., a “frozen” watermark) beyond monotonic UUIDs.
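The commit-order problem can be reproduced with a toy simulation. Small integers stand in for time-ordered IDs. `poll` and `frozen_watermark` are hypothetical helpers, and computing the watermark from the set of in-flight IDs is just one possible design.

```python
def poll(committed, cursor, watermark=None):
    """Return (rows, new_cursor): committed IDs above the cursor, capped at the watermark."""
    visible = sorted(
        i for i in committed
        if i > cursor and (watermark is None or i <= watermark)
    )
    return visible, (visible[-1] if visible else cursor)

def frozen_watermark(in_flight, highest_allocated):
    """Highest ID below which no transaction is still uncommitted."""
    return min(in_flight) - 1 if in_flight else highest_allocated

# IDs 1 and 2 are allocated in order, but the transaction holding 2 commits first.

# Naive cursor: reads 2, moves the cursor past 1, and never sees 1.
committed, cursor = {2}, 0
seen_naive, cursor = poll(committed, cursor)
committed = {1, 2}                       # transaction 1 finally commits
late_rows, cursor = poll(committed, cursor)
seen_naive += late_rows

# Watermark cursor: refuses to read past the oldest in-flight ID.
committed, cursor = {2}, 0
seen_safe, cursor = poll(committed, cursor, frozen_watermark({1}, 2))
committed = {1, 2}
more_rows, cursor = poll(committed, cursor, frozen_watermark(set(), 2))
seen_safe += more_rows

print(seen_naive)
print(seen_safe)
```

The naive reader permanently skips the late-committing row even though the IDs themselves were perfectly monotonic, which is why ordering of IDs alone is not a sufficient invariant.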
