2025-05-19

The principles of database design, or, the Truth is out there

Normalization, performance, and usability

Many commenters argue the article over-emphasizes full normalization (3NF/5NF/6NF).
Common practice described: aim for ~3NF (sometimes BCNF) and selectively denormalize for performance, especially for reporting and analytics.
Some find highly normalized schemas painful for ad‑hoc queries (too many joins, especially in meetings or exploratory analysis).
Others counter that joins are intrinsic to the relational model; if you dislike joins, you may want a different data store or a dedicated reporting schema/views.
Skeptics of “Principle of Full Normalization” see it as ideological purity that ignores trade‑offs, hardware realities, and domain ambiguity.

Natural vs surrogate keys (PED principle)

The proposed “Principle of Essential Denotation” (use natural keys as identifiers) is heavily contested.
Multiple commenters report long, painful experiences where natural keys (e.g., national IDs, ISBNs, job codes, user-defined names) turned out to be:
- Non-unique, reused, or duplicated.
- Mutable due to policy changes, errors, or real‑world events (adoption, fraud, schema changes).
Many strongly prefer surrogate keys (typically integers) as primary keys, with natural keys enforced via unique constraints where appropriate.
Others note that natural keys are still important conceptually; danger is when they are ignored entirely and no constraints are enforced.

Security, exposure of identifiers, and UUIDs

Debate over having separate internal IDs (e.g., bigint) and external IDs (UUIDv4/v7, encrypted IDs).
One side: don’t expose internal sequential IDs; they reveal scale and enable inference attacks (German tank problem–style reasoning). Random or encrypted external IDs are seen as a useful extra layer.
Opposing view: relying on obscurity is fragile; proper authorization should make exposed IDs harmless, and adding multiple IDs increases complexity and query cost.
Some suggest encrypting internal IDs into 128‑bit opaque tokens, UUID-like but not standard, as a cheap way to avoid leaking internal state.

Messy reality and “natural” keys

Many anecdotes show supposedly “natural” identifiers are messy: duplicate or invalid ISBNs, conflicting government job codes, non-unique SSNs/national IDs, twins with near-identical attributes, people lacking documents, changing ID formats.
Takeaway: any externally assigned ID is risky as a primary key; treat such values as domain data, not core identity.

ORMs, theory vs practice, and domain complexity

Some principles and examples are said to map poorly to mainstream ORMs; developers relying solely on ORMs may miss important SQL features and optimizations.
Commenters highlight domains where:
- Identity is probabilistic (entity resolution), so no clear natural key exists.
- Multiple representations are inherently required (e.g., spatial vs graph structures in mapping).
Several criticize the article’s philosophical framing (“truth”, “representation of reality”) as overly theoretical; they emphasize pragmatic schemas that are efficient, evolvable, and reflect messy business rules rather than an idealized model.

Related topics