TIL: Versions of UUID and when to use them
UUIDv4 vs UUIDv7 vs ULID
- Many recommend: use v7 by default; use v4 when creation time could be sensitive or when IDs must be hard to guess.
- v7 is k‑sortable by timestamp, good for database/index locality and time-based querying.
- ULID previously filled this niche; now that v7 is standardized, some prefer v7 for ecosystem support while keeping ULID’s presentation format.
- Python’s current third‑party uuid7 package is called out as unmaintained and non‑compliant (it uses nanosecond rather than millisecond precision), which could break v7 monotonicity.
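
To make the "sortable by timestamp" point concrete, here is a minimal sketch of the v7 bit layout: a 48-bit Unix millisecond timestamp, then the version, then random bits around the variant field. The function name uuid7_sketch and its unix_ms parameter are made up for illustration; this is not any particular library's implementation.

```python
import os
import time
import uuid


def uuid7_sketch(unix_ms=None):
    """Build a v7-style UUID: 48-bit ms timestamp, version, 12 random bits,
    variant, 62 random bits. Illustrative only."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000  # millisecond precision
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    rand_a = rand >> 68                # top 12 random bits
    rand_b = rand & ((1 << 62) - 1)    # low 62 random bits

    value = (unix_ms & ((1 << 48) - 1)) << 80  # timestamp in the top 48 bits
    value |= 0x7 << 76                         # version 7
    value |= rand_a << 64
    value |= 0b10 << 62                        # RFC variant
    value |= rand_b
    return uuid.UUID(int=value)


earlier = uuid7_sketch(1_000)
later = uuid7_sketch(2_000)
print(earlier < later, str(earlier) < str(later))
```

Because the timestamp occupies the most significant bits, both the integer and the hyphenated string sort by creation time. In this sketch, IDs created within the same millisecond are ordered only by their random bits, so strict monotonicity would need something extra such as a counter.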
Deterministic / hash-based IDs
- Some want deterministic IDs that can be regenerated when reprocessing data.
- UUIDv5 (and v3) are highlighted as hash‑based, deterministic options (v5 uses SHA‑1).
- There’s debate on how much standardization is possible, since what to hash is inherently application-specific.
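
As a small illustration of deterministic IDs, the sketch below uses Python's standard uuid.uuid5. The namespace (derived from "example.com") and the "source:key" naming convention are assumptions; as the discussion notes, what to hash is an application decision.

```python
import uuid

# Hypothetical application namespace, derived once from a name you control.
APP_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")


def record_id(source: str, key: str) -> uuid.UUID:
    """Return the same v5 UUID every time for the same (source, key) pair."""
    # The "source:key" format is an arbitrary choice made for this example.
    return uuid.uuid5(APP_NAMESPACE, f"{source}:{key}")


first_run = record_id("orders", "42")
reprocessed = record_id("orders", "42")
print(first_run == reprocessed)
```

Reprocessing the same input yields the same ID, which is the property people want here; changing the namespace or the naming convention changes every ID.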
Security and privacy concerns
- Timestamps in IDs may leak creation time; v4 avoids this, v7/ULID do not.
- Some see parts of the security industry as overemphasizing unlikely threats; others argue it’s still valuable to at least consider security trade‑offs.
- Using non‑guessability as a security control is discussed; if you truly need that, random (v4‑style) IDs are preferred.
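
The timestamp leak is easy to demonstrate: anyone holding a v7 UUID can read its creation time from the top 48 bits. The helper name v7_created_at is made up for this sketch, and the sample value is just an example v7 UUID.

```python
import datetime
import uuid

EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def v7_created_at(u: uuid.UUID) -> datetime.datetime:
    """Read the 48-bit Unix millisecond timestamp from a v7 UUID."""
    if u.version != 7:
        raise ValueError("not a version 7 UUID")
    unix_ms = u.int >> 80  # the timestamp is the most significant 48 bits
    return EPOCH + datetime.timedelta(milliseconds=unix_ms)


sample = uuid.UUID("017f22e2-79b0-7cc3-98c4-dc0c0c07398f")
print(v7_created_at(sample))  # 2022-02-22 19:22:22+00:00
```

A v4 UUID has no such field, which is why it is suggested when creation time is sensitive.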
MAC-based and hash-based versions
- Advice: avoid MAC‑based versions, especially v1; they can leak hardware info.
- MD5 is criticized for cryptography, but noted as still usable as a non‑crypto hash, with performance and ubiquity trade‑offs.
- RFC 9562’s v8 example shows how to plug in stronger hashes like SHA‑256, though the digest must be truncated to fit in 128 bits, which is a limitation.
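
A rough sketch of the v8 idea follows: hash the namespace and name with SHA-256, keep the first 128 bits, then overwrite the version and variant fields. It is loosely modeled on the approach described above and is not a copy of the RFC's example; the function name uuid8_sha256_sketch is hypothetical.

```python
import hashlib
import uuid


def uuid8_sha256_sketch(namespace: uuid.UUID, name: str) -> uuid.UUID:
    """Name-based v8-style UUID using a truncated SHA-256 digest."""
    digest = hashlib.sha256(namespace.bytes + name.encode("utf-8")).digest()
    value = int.from_bytes(digest[:16], "big")  # truncate 256 bits to 128

    value &= ~(0xF << 76)   # clear the 4 version bits
    value |= 0x8 << 76      # version 8
    value &= ~(0x3 << 62)   # clear the 2 variant bits
    value |= 0x2 << 62      # RFC variant
    return uuid.UUID(int=value)


print(uuid8_sha256_sketch(uuid.NAMESPACE_DNS, "www.example.com"))
```

After truncation and the six fixed bits, only 122 bits of the hash survive, which is the limitation mentioned above.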
Shorter and alternative ID schemes
- Strong interest in “short UUIDs” or URL‑friendly IDs: base64/base58 encoded UUIDs, ULID, Nanoid, Sqids, custom hashed integers, YouTube-style IDs, etc.
- Trade‑offs: length vs collision risk, human readability, URL safety, lexicographic sortability, and standardization.
- Some work is underway to standardize shorter encodings for 128‑bit UUIDs.
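
One of the simplest "short UUID" options is URL-safe base64 over the 16 raw bytes, which gives 22 characters instead of 36. The helper names to_short and from_short are made up for this sketch.

```python
import base64
import uuid


def to_short(u: uuid.UUID) -> str:
    """Encode the 16 raw bytes as URL-safe base64 without padding."""
    return base64.urlsafe_b64encode(u.bytes).rstrip(b"=").decode("ascii")


def from_short(s: str) -> uuid.UUID:
    """Restore the padding and decode back to a UUID."""
    return uuid.UUID(bytes=base64.urlsafe_b64decode(s + "=="))


original = uuid.uuid4()
short = to_short(original)
print(len(str(original)), len(short))   # 36 22
print(from_short(short) == original)    # True
```

This shows one of the trade-offs listed above: the base64 alphabet is not in ASCII order, so the short form does not sort the same way as the underlying bytes, unlike ULID's encoding.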
Semantics, history, and standards
- UUIDs are fundamentally 128‑bit numbers; the hyphenated string is just one encoding.
- Several argue programs should treat UUIDs as opaque binary values and not infer semantics.
- Commenters clarify that v2 is in fact specified (via DCE), contrary to the article’s “no known details” phrasing.
- Historical context: early uses were for ephemeral message IDs keyed by time and hardware ID; later usage shifted to “canned” identifiers for objects and resources.
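
The "just a 128‑bit number" point can be checked with Python's standard uuid module, which exposes the same value as an integer, raw bytes, hex, and the hyphenated string. This is a small sketch using the built-in uuid.NAMESPACE_DNS constant as the sample value.

```python
import uuid

u = uuid.NAMESPACE_DNS  # any UUID works; this one ships with the module

# The same 128-bit value reconstructed from three different encodings.
from_int = uuid.UUID(int=u.int)
from_bytes = uuid.UUID(bytes=u.bytes)
from_hex = uuid.UUID(hex=u.hex)

print(str(u))                              # hyphenated form
print(len(u.bytes), u.int.bit_length())    # 16 bytes, at most 128 bits
print(from_int == from_bytes == from_hex == u)
```

Storing the 16-byte form and treating it as opaque matches the advice above; the string form is only needed at the edges.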
Database and system design considerations
- For non‑distributed systems, many recommend simple auto‑increment integers; k‑sortable or encrypted integers can be exposed externally.
- For distributed or client‑generated IDs, UUIDv7, snowflake-like schemes, or hash‑based IDs are preferred over central counters.
- v7 is praised for improving performance in systems like S3 metadata stores and key‑value databases (e.g., DynamoDB) due to its timestamp ordering.
- Reminder that even UUIDs don’t guarantee zero collisions—only extremely low probability—so requirements should match the actual risk and scale.
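
To put "extremely low probability" into numbers, the sketch below uses the usual birthday-bound approximation for n randomly generated IDs. The default of 122 random bits matches v4 (128 bits minus 6 fixed version and variant bits); the function name is made up for this example.

```python
import math


def collision_probability(n_ids: int, random_bits: int = 122) -> float:
    """Birthday approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = -n_ids * (n_ids - 1) / 2 ** (random_bits + 1)
    return -math.expm1(exponent)  # 1 - e^x, accurate for tiny probabilities


for n in (10**9, 10**12, 10**15):
    print(f"{n:.0e} IDs -> p ~= {collision_probability(n):.3e}")
```

With a billion v4 IDs the estimate is on the order of 1e-19. The result grows roughly with the square of the ID count, so the "actual risk and scale" caveat matters mainly for schemes with far fewer random bits.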
Overall practical guidance
- Common pragmatic stance:
  - Non‑distributed / simple app: use integers.
  - Need distributed, opaque, query‑friendly IDs: use UUIDv7.
  - Need maximum unpredictability or to hide time: use UUIDv4.
  - Need deterministic IDs: use v5 (hash‑based) or an explicit hashing scheme.