Postgres UUIDv7 and per-backend monotonicity

UUID Versions and Use Cases

  • UUIDv4: purely random; widely used, but its uniqueness degrades when the entropy source is weak.
  • UUIDv3/v5: namespace+hash based; useful for deterministic IDs and idempotent imports but controversial because they use MD5/SHA‑1.
    • Some participants claim certain high‑assurance/regulated contexts ban SHA‑1 (and even v4 with poor entropy) for critical IDs where collisions must be “impossible”.
    • Others argue MD5/SHA‑1 remain fine as non‑cryptographic hashes and that UUIDs are not integrity or confidentiality mechanisms.
  • UUIDv7: time‑ordered (timestamp in high bits, remaining bits mostly random) and intended as the “modern” sortable UUID.
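
The three families above can be compared with a few lines of Python. This is a minimal sketch, not a reference implementation: `uuid7_basic` is a hypothetical helper written here because the standard `uuid` module only gained a v7 generator in recent Python releases, and the example URL is made up.

```python
import os
import time
import uuid
from typing import Optional

# v4: 122 random bits drawn from the OS entropy source.
random_id = uuid.uuid4()

# v5: SHA-1 over namespace + name. Same inputs give the same UUID, so
# re-running an import maps each record to the ID it got the first time.
first = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/orders/42")
again = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/orders/42")

def uuid7_basic(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Hypothetical v7 builder: 48-bit ms timestamp, then random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand_a = int.from_bytes(os.urandom(2), "big") & 0x0FFF            # 12 bits
    rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)   # 62 bits
    value = (unix_ms & ((1 << 48) - 1)) << 80   # timestamp in the high bits
    value |= 0x7 << 76                          # version nibble
    value |= rand_a << 64
    value |= 0b10 << 62                         # RFC variant bits
    value |= rand_b
    return uuid.UUID(int=value)
```

IDs built this way sort by millisecond, but two IDs from the same millisecond are ordered only by their random bits.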

UUIDv7 Monotonicity and RFC Interpretation

  • RFC 9562 states that time‑based UUIDs “will be monotonic” due to embedded timestamps, and describes optional methods for additional monotonicity within a millisecond (extra time precision and/or counters in rand_a/rand_b).
  • Disagreement exists:
    • One side: monotonicity is a core guarantee; non‑monotonic behavior is an implementation bug.
    • Other side: only partial monotonicity is guaranteed; sub‑millisecond ordering is explicitly optional, so depending on it is risky.
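
To make the "optional" part concrete, here is a rough sketch of the counter approach, one of the methods the RFC allows: rand_a is used as a 12‑bit counter that resets each millisecond. The class name `CounterV7`, the injectable `clock_ms` callable, and the choice to borrow the next millisecond on overflow are assumptions for illustration; the sketch is also not thread‑safe.

```python
import os
import uuid

class CounterV7:
    """Hypothetical v7 generator using rand_a as a per-millisecond counter."""

    def __init__(self, clock_ms):
        self._clock_ms = clock_ms   # callable returning Unix time in ms
        self._last_ms = -1
        self._counter = 0

    def next(self) -> uuid.UUID:
        now = self._clock_ms()
        if now > self._last_ms:
            # New millisecond: adopt it and restart the counter.
            self._last_ms, self._counter = now, 0
        else:
            # Same (or rewound) millisecond: bump the counter instead.
            self._counter += 1
            if self._counter > 0xFFF:
                # Counter exhausted: run ahead into the next millisecond.
                self._last_ms += 1
                self._counter = 0
        rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)
        value = (
            (self._last_ms << 80) | (0x7 << 76) | (self._counter << 64)
            | (0b10 << 62) | rand_b
        )
        return uuid.UUID(int=value)
```

A generator without such a counter (or extra clock precision) is still RFC‑compliant, which is why the second camp treats sub‑millisecond ordering as something callers cannot assume.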

Postgres Implementation and Reliance on Behavior

  • Postgres’s v7 implementation uses RFC “Method 3”: it replaces 12 random bits with higher‑precision time (4096 steps per millisecond, roughly 244 ns each), yielding per‑backend monotonicity even at high generation rates.
  • Critics argue:
    • This is not (yet) a documented API guarantee; relying on it couples application code to a particular DB version/engine.
    • Tests should explicitly sort or use sets rather than depend on implicit ID ordering.
  • Others counter that:
    • Using well‑documented implementation details is acceptable, especially in tests.
    • In practice, widely used behaviors often become de facto specs (Hyrum’s Law).
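
The Method 3 layout discussed in this section can be approximated as follows. This is a sketch under stated assumptions, not Postgres's actual code: the names `v7_from_ns` and `BackendLocalV7` are invented here, and the rule of advancing the clock by at least one tick per call is one plausible way to get ordering within a single generator.

```python
import os
import time
import uuid

NS_PER_MS = 1_000_000
SUBMS_STEPS = 1 << 12                       # 4096 ticks per millisecond
MIN_STEP_NS = NS_PER_MS // SUBMS_STEPS + 1  # 245 ns, just over one tick

def v7_from_ns(unix_ns: int) -> uuid.UUID:
    """Build a v7 UUID whose rand_a field holds the sub-millisecond fraction."""
    ms, sub_ns = divmod(unix_ns, NS_PER_MS)
    frac = sub_ns * SUBMS_STEPS // NS_PER_MS   # 0..4095
    rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)
    value = (ms << 80) | (0x7 << 76) | (frac << 64) | (0b10 << 62) | rand_b
    return uuid.UUID(int=value)

class BackendLocalV7:
    """Hypothetical single-process generator; ordering holds only within it."""

    def __init__(self, clock_ns=time.time_ns):
        self._clock_ns = clock_ns
        self._last_ns = 0

    def next(self) -> uuid.UUID:
        # Never reuse a tick: if the clock has not moved far enough,
        # step the previous timestamp forward by one tick instead.
        now = max(self._clock_ns(), self._last_ns + MIN_STEP_NS)
        self._last_ns = now
        return v7_from_ns(now)
```

Two independent `BackendLocalV7` instances give no ordering guarantee relative to each other, which mirrors the "per‑backend" qualifier. A test that cares about order can still call `sorted(ids)` explicitly and stay valid if the generator changes.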

Entropy, Collisions, and Time Bias

  • Spending 12 bits on sub‑millisecond time leaves fewer random bits (62 instead of 74), but each ID then only competes with IDs generated in the same sub‑millisecond tick rather than the same millisecond; consensus is that collisions remain extremely unlikely for realistic workloads.
  • Some note that real clock readings aren’t uniformly distributed at nanosecond scale, so time‑derived bits are a weaker randomness source than CSPRNG output, but still good enough for most databases.
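
The trade‑off can be put into numbers with the usual birthday‑bound approximation. The function below is an illustrative helper, not something from the thread; it assumes the random bits are uniform and that `n` IDs share identical timestamp bits, which for a ~244 ns tick is already a generous assumption.

```python
import math

def collision_probability(n: int, random_bits: int) -> float:
    """Approximate chance that n IDs with identical timestamp bits collide.

    Uses the birthday bound p ~= 1 - exp(-n(n-1) / (2 * 2**random_bits)).
    """
    space = 2.0 ** random_bits
    return -math.expm1(-n * (n - 1) / (2.0 * space))

# Plain v7: up to 74 random bits shared by everything in one millisecond.
p_plain = collision_probability(1000, 74)

# Method 3: 62 random bits, shared only within one sub-millisecond tick.
p_method3 = collision_probability(1000, 62)
```

Even with an unrealistic 1000 IDs landing in a single tick, the 62‑bit case stays far below one in a trillion.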

Comparisons: ULID, Snowflake, and Alternatives

  • ULID:
    • Guarantees per‑process monotonicity by incrementing low bits; spec mandates monotonic generation.
    • Uses Crockford base32 encoding; harder to manipulate in vanilla SQL than hex‑encoded UUIDv7.
    • Monotonicity requires serialized generation; only per process, not global.
  • Snowflake:
    • 64‑bit, millisecond timestamp + machine ID + counter; good for distributed monotonic integers.
    • UUIDv7’s advantage is compatibility with existing UUID ecosystems and DB types while giving Snowflake‑like ordering.
  • Some suggest separating concerns instead of using v7 for everything: keep an autoincrement PK, an opaque random ID (v4), and a high‑precision timestamp, rather than conflating all three into one value.
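
For comparison with the Snowflake layout mentioned above, a 64‑bit ID can be packed like this. The 41/10/12 bit split is the commonly cited one and is an assumption here, as is the `EPOCH_MS` value, which is an arbitrary illustrative epoch.

```python
from typing import Tuple

EPOCH_MS = 1_600_000_000_000   # hypothetical custom epoch, not a real system's
TS_BITS, MACHINE_BITS, SEQ_BITS = 41, 10, 12

def snowflake_pack(unix_ms: int, machine_id: int, seq: int) -> int:
    """Pack timestamp, machine ID and per-ms counter into one 63-bit integer."""
    ts = unix_ms - EPOCH_MS
    if not 0 <= ts < (1 << TS_BITS):
        raise ValueError("timestamp outside the representable range")
    if not 0 <= machine_id < (1 << MACHINE_BITS):
        raise ValueError("machine_id out of range")
    if not 0 <= seq < (1 << SEQ_BITS):
        raise ValueError("sequence out of range")
    return (ts << (MACHINE_BITS + SEQ_BITS)) | (machine_id << SEQ_BITS) | seq

def snowflake_unpack(value: int) -> Tuple[int, int, int]:
    """Recover (unix_ms, machine_id, seq) from a packed ID."""
    seq = value & ((1 << SEQ_BITS) - 1)
    machine_id = (value >> SEQ_BITS) & ((1 << MACHINE_BITS) - 1)
    ts = value >> (MACHINE_BITS + SEQ_BITS)
    return ts + EPOCH_MS, machine_id, seq
```

Like UUIDv7, the timestamp sits in the high bits, so integer comparison orders IDs by time first; the machine ID replaces most of the randomness that a UUID would carry.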

Practical Concerns and Code Longevity

  • One camp emphasizes long‑lived systems: avoid depending on undocumented/optional behavior to “save a line of code”.
  • Another camp notes much business code is short‑lived; absolute robustness may be overkill, but even then, simple explicit sorting is a low‑cost safeguard.
  • Several commenters point out that ID order does not equal commit order; cursor‑based APIs and background jobs need stronger invariants (e.g., a “frozen” watermark) beyond monotonic UUIDs.
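
The gap between ID order and commit order can be shown with a toy model. Everything here is hypothetical: `Row`, `naive_page`, and `watermark_page` are invented names, integers stand in for time‑ordered IDs, and the sketch assumes the reader can see which IDs are still in flight, which a real database does not expose this directly.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Row:
    id: int          # stand-in for a time-ordered ID such as a UUIDv7
    committed: bool  # False while the writing transaction is still open

def naive_page(rows: List[Row], cursor: int) -> Tuple[List[int], int]:
    """Return committed IDs above the cursor and move the cursor to the largest."""
    seen = sorted(r.id for r in rows if r.committed and r.id > cursor)
    return seen, (seen[-1] if seen else cursor)

def watermark_page(rows: List[Row], cursor: int) -> Tuple[List[int], int]:
    """Like naive_page, but never read past the oldest in-flight ID."""
    in_flight = [r.id for r in rows if not r.committed]
    # Everything below the watermark is settled and safe to hand out.
    watermark = min(in_flight) if in_flight else float("inf")
    seen = sorted(
        r.id for r in rows if r.committed and cursor < r.id < watermark
    )
    return seen, (seen[-1] if seen else cursor)
```

With rows 1 and 3 committed and row 2 still open, `naive_page` moves the cursor to 3 and never returns row 2 once it commits, while `watermark_page` stops at 1 and picks up 2 and 3 on the next call.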