TIL: Versions of UUID and when to use them

UUIDv4 vs UUIDv7 vs ULID

  • A common recommendation: use v7 by default; use v4 when the creation time could be sensitive or when IDs must be hard to guess.
  • v7 starts with a 48‑bit Unix timestamp in milliseconds, so IDs are k‑sortable by creation time, which helps database/index locality and time‑based querying.
  • ULID (a 48‑bit millisecond timestamp plus 80 random bits, written as 26 Crockford Base32 characters) previously filled this niche; now that v7 is standardized, some prefer generating v7 for its ecosystem support while keeping ULID’s text presentation.
  • The third‑party Python package uuid7 is called out as unmaintained and non‑compliant: it uses nanosecond rather than the standard’s millisecond timestamp precision, potentially breaking v7 monotonicity.
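A minimal sketch of the v7 bit layout, assuming the RFC 9562 field order (48‑bit big‑endian Unix milliseconds, 4‑bit version, 12 random bits, 2‑bit variant, 62 random bits). uuid7_sketch is a hypothetical helper, not the uuid7 package’s API. Python 3.14 added uuid.uuid7() to the standard library; this builds the bits by hand so it also runs on older versions.

```python
import os
import time
import uuid
from typing import Optional


def uuid7_sketch(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Hypothetical helper: build a UUIDv7 from a millisecond timestamp plus random bits."""
    if unix_ms is None:
        # Millisecond precision, as the standard specifies (not nanoseconds).
        unix_ms = time.time_ns() // 1_000_000
    b = bytearray(unix_ms.to_bytes(6, "big") + os.urandom(10))
    b[6] = (b[6] & 0x0F) | 0x70  # version 7 in the high nibble of byte 6
    b[8] = (b[8] & 0x3F) | 0x80  # RFC variant bits 10xx in byte 8
    return uuid.UUID(bytes=bytes(b))


earlier = uuid7_sketch(1_700_000_000_000)
later = uuid7_sketch(1_700_000_000_001)
# The timestamp occupies the most significant bits, so a later millisecond
# always compares (and sorts) after an earlier one, whatever the random tail.
print(earlier < later)  # True
```

Within a single millisecond the random tail decides the order, so this sketch is not monotonic per process; RFC 9562 describes counter‑based methods for callers that need that.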

Deterministic / hash-based IDs

  • Some want deterministic IDs that can be regenerated when reprocessing data.
  • UUIDv5 (SHA‑1) and UUIDv3 (MD5) are highlighted as hash‑based, deterministic options: the same namespace and name always yield the same UUID.
  • There’s debate on how much standardization is possible, since what to hash is inherently application-specific.
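A short sketch of deterministic IDs with the standard library’s uuid.uuid5. The orders namespace, the example.com URL and the "customer:number" key format are hypothetical; choosing what to hash is exactly the application‑specific part. manual_uuid5 spells out the SHA‑1 steps that uuid5 performs.

```python
import hashlib
import uuid

# Hypothetical namespace for one application's records; any fixed UUID works
# as long as every producer uses the same one.
ORDERS_NS = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/orders")


def order_id(customer: str, order_number: int) -> uuid.UUID:
    # Application decision: hash a canonical "customer:number" string.
    return uuid.uuid5(ORDERS_NS, f"{customer}:{order_number}")


def manual_uuid5(namespace: uuid.UUID, name: str) -> uuid.UUID:
    # SHA-1 over namespace bytes + UTF-8 name, keep the first 16 bytes,
    # then overwrite the version and variant bits.
    digest = hashlib.sha1(namespace.bytes + name.encode("utf-8")).digest()
    b = bytearray(digest[:16])
    b[6] = (b[6] & 0x0F) | 0x50  # version 5
    b[8] = (b[8] & 0x3F) | 0x80  # RFC variant 10xx
    return uuid.UUID(bytes=bytes(b))


# Reprocessing the same record regenerates the same ID.
print(order_id("alice", 42) == order_id("alice", 42))  # True
```

The canonical key needs care: with a naive "customer:number" format, a customer name containing ":" could collide with another key, so real schemes usually escape fields or use a fixed encoding.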

Security and privacy concerns

  • Timestamps in IDs may leak creation time; v4 avoids this, v7/ULID do not.
  • Some see parts of the security industry as overemphasizing unlikely threats; others argue it’s still valuable to at least consider security trade‑offs.
  • Relying on non‑guessability as a security control is debated; if you truly need it, fully random (v4‑style) IDs are preferred, since v4 carries 122 random bits versus at most 74 in v7.
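To make the leak concrete: anyone holding a v7 ID can read its creation time from the first 48 bits. A minimal sketch, using what should be the UUIDv7 example value from RFC 9562’s test vectors; v7_created_at is a hypothetical helper name.

```python
import uuid
from datetime import datetime, timezone

# Bits an outsider cannot predict (everything except fixed/known fields).
V4_RANDOM_BITS = 128 - 4 - 2       # 122: only version and variant are fixed
V7_RANDOM_BITS = 128 - 48 - 4 - 2  # 74 at most: the 48-bit timestamp is guessable


def v7_created_at(u: uuid.UUID) -> datetime:
    """Hypothetical helper: recover the Unix-millisecond timestamp embedded in a UUIDv7."""
    if u.version != 7:
        raise ValueError("not a version 7 UUID")
    unix_ms = int.from_bytes(u.bytes[:6], "big")
    return datetime.fromtimestamp(unix_ms / 1000, tz=timezone.utc)


example = uuid.UUID("017f22e2-79b0-7cc3-98c4-dc0c0c07398f")
print(v7_created_at(example))  # 2022-02-22 19:22:22+00:00
```

A v4 ID has no such field, which is why it is the suggested choice when creation time is sensitive.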

MAC-based and hash-based versions

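  • Advice: avoid MAC‑based versions, especially v1; they can embed the generating machine’s MAC address (hardware info) alongside a timestamp.
  • MD5 (the hash behind v3) is criticized as broken for cryptography, but noted as still usable as a non‑cryptographic hash, where its speed and ubiquity are weighed against stronger hashes.
  • RFC 9562’s v8 example shows how to plug in stronger hashes like SHA‑256, though the digest must be truncated to fit 128 bits (only 122 hash bits survive after the version and variant fields), which is a limitation.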
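A sketch of a name‑based v8 ID in the spirit of RFC 9562’s SHA‑256 example; it is not claimed to reproduce that RFC’s test vector byte for byte. uuid8_sha256 is a hypothetical helper, and it sets the version and variant bits by hand rather than relying on the uuid.UUID constructor’s version= argument.

```python
import hashlib
import uuid


def uuid8_sha256(namespace: uuid.UUID, name: str) -> uuid.UUID:
    """Hypothetical helper: deterministic v8 UUID from a SHA-256 digest."""
    digest = hashlib.sha256(namespace.bytes + name.encode("utf-8")).digest()
    # Truncation: 256 digest bits -> first 128, of which 6 are then overwritten.
    b = bytearray(digest[:16])
    b[6] = (b[6] & 0x0F) | 0x80  # version 8 ("custom")
    b[8] = (b[8] & 0x3F) | 0x80  # RFC variant 10xx
    return uuid.UUID(bytes=bytes(b))


u = uuid8_sha256(uuid.NAMESPACE_DNS, "example.com")
print(u.version)  # 8
```

Like v5 it is deterministic; the gain is a stronger hash, the cost is that only 122 of its 256 bits are kept.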

Shorter and alternative ID schemes

  • Strong interest in “short UUIDs” or URL‑friendly IDs: base64/base58 encoded UUIDs, ULID, Nanoid, Sqids, custom hashed integers, YouTube-style IDs, etc.
  • Trade‑offs: length vs collision risk, human readability, URL safety, lexicographic sortability, and standardization.
  • Some work is underway to standardize shorter encodings for 128‑bit UUIDs.
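Two ways to shorten the text form of the same 128 bits, as a sketch: base64url (22 characters once padding is dropped, URL‑safe but not sort‑preserving) and Crockford Base32 as used by ULID (26 characters, and lexicographic order matches numeric order). Function names are hypothetical; the Base32 decoder here is strict and skips Crockford’s I/L/O aliases.

```python
import base64
import uuid

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # Crockford Base32 alphabet (no I, L, O, U)


def to_base64url(u: uuid.UUID) -> str:
    # 16 bytes -> 24 base64 chars ending in "==", so 22 chars after stripping padding.
    return base64.urlsafe_b64encode(u.bytes).rstrip(b"=").decode("ascii")


def from_base64url(s: str) -> uuid.UUID:
    return uuid.UUID(bytes=base64.urlsafe_b64decode(s + "=="))


def to_crockford(u: uuid.UUID) -> str:
    # 26 chars x 5 bits = 130 bits, so the first char carries only the top 3 bits (0-7).
    n = u.int
    return "".join(CROCKFORD[(n >> (5 * i)) & 31] for i in reversed(range(26)))


def from_crockford(s: str) -> uuid.UUID:
    n = 0
    for ch in s.upper():
        n = n * 32 + CROCKFORD.index(ch)  # raises ValueError on characters outside the alphabet
    return uuid.UUID(int=n)


u = uuid.UUID("017f22e2-79b0-7cc3-98c4-dc0c0c07398f")
print(len(str(u)), len(to_base64url(u)), len(to_crockford(u)))  # 36 22 26
```

Fixed width plus an alphabet in ascending ASCII order is what keeps the Base32 form sortable; base64url’s alphabet (A‑Z, a‑z, 0‑9, -, _) is not in ASCII order, so it loses that property.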

Semantics, history, and standards

  • UUIDs are fundamentally 128‑bit numbers; the hyphenated string is just one encoding.
  • Several argue programs should treat UUIDs as opaque binary values and not infer semantics.
  • Commenters clarify that v2 is specified (in the DCE standard), contrary to the article’s “no known details” phrasing.
  • Historical context: early uses were for ephemeral message IDs keyed by time and hardware ID; later usage shifted to “canned” identifiers for objects and resources.
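Treating a UUID as a 128‑bit value rather than a string, sketched with Python’s uuid module: several text spellings parse to the same value, and storage can use 16 raw bytes (e.g. a BINARY(16) column) instead of a 36‑character string.

```python
import uuid

value = uuid.UUID("017f22e2-79b0-7cc3-98c4-dc0c0c07398f")

# Different text spellings of the same 128-bit number; uuid.UUID normalizes them.
spellings = [
    "017f22e2-79b0-7cc3-98c4-dc0c0c07398f",
    "017F22E2-79B0-7CC3-98C4-DC0C0C07398F",
    "{017f22e2-79b0-7cc3-98c4-dc0c0c07398f}",
    "urn:uuid:017f22e2-79b0-7cc3-98c4-dc0c0c07398f",
    "017f22e279b07cc398c4dc0c0c07398f",
]

raw = value.bytes    # 16 bytes, suitable for a binary column
as_int = value.int   # 0 <= as_int < 2**128

print(all(uuid.UUID(s) == value for s in spellings))  # True
print(len(str(value)), len(raw))  # 36 16
```

Comparing the parsed values (or bytes) rather than raw strings avoids bugs from case or formatting differences.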

Database and system design considerations

  • For non‑distributed systems, many recommend simple auto‑increment integers; where IDs must be exposed externally, k‑sortable IDs or encrypted integers can be used instead of the raw sequence.
  • For distributed or client‑generated IDs, UUIDv7, snowflake-like schemes, or hash‑based IDs are preferred over central counters.
  • v7 is praised for improving performance in systems like S3 metadata stores and key‑value databases (e.g., DynamoDB) due to its timestamp ordering.
  • Reminder: even UUIDs don’t guarantee zero collisions, only an extremely low probability of them, so requirements should match the actual risk and scale.
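A rough way to match requirements to scale is the birthday‑bound approximation P ≈ 1 − exp(−n(n−1) / 2^(b+1)) for n IDs with b random bits, assuming a good random source. The sketch below applies it to v4 (122 bits) and, per millisecond, to v7 (at most 74 bits, since only IDs from the same millisecond can collide). collision_probability is a hypothetical helper.

```python
import math


def collision_probability(n: int, random_bits: int) -> float:
    """Birthday-bound approximation of P(at least one collision among n random IDs)."""
    space = 2 ** random_bits
    # -expm1(-x) == 1 - exp(-x), but stays accurate when x is tiny.
    return -math.expm1(-n * (n - 1) / (2 * space))


print(f"{collision_probability(10**9, 122):.1e}")  # 9.4e-20  (v4, one billion IDs)
print(f"{collision_probability(10**6, 74):.1e}")   # 2.6e-11  (v7, one million IDs in one millisecond)
print(f"{collision_probability(2**61, 122):.2f}")  # 0.39     (v4, about 2.3e18 IDs)
```

The numbers are only as good as the randomness assumption; a weak or duplicated RNG state (for example after a process fork) dominates the real risk.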

Overall practical guidance

  • Common pragmatic stance:
    • Non‑distributed / simple app: use integers.
    • Need distributed, opaque, query‑friendly IDs: use UUIDv7.
    • Need maximum unpredictability or to hide time: use UUIDv4.
    • Need deterministic IDs: use v5 (hash‑based) or an explicit hashing scheme.