2025-01-09

SQL nulls are weird

What NULL Represents

Strong debate over whether SQL NULL means “unknown value,” “no value,” “missing data,” or an overloaded mix of all three.
Some argue the original relational notion is “unknown,” and that calling it NULL was an unfortunate naming choice; others insist “no value” or “absence” is a more practical mental model.
Several point out that conflating “unknown” and “known absent” is what makes NULL so confusing in real systems.

Three‑Valued Logic and Query Semantics

Many comments explain SQL’s three-valued logic (TRUE/FALSE/UNKNOWN) and show how comparisons with NULL yield UNKNOWN, which WHERE treats as “not selected.”
This leads to surprising behavior: = NULL / <> NULL never match; x = value silently excludes NULL rows unless explicitly guarded.
Some see this as mathematically consistent (e.g., Kleene logic); others call it ergonomically “clownish,” especially since X = X can be UNKNOWN.

Uniqueness, DISTINCT, GROUP BY

Key confusion: UNIQUE constraints treat multiple NULLs as distinct, while SELECT DISTINCT / GROUP BY group all NULLs together.
Some justify this as “set/group semantics”: all unknowns form one group, but are not equal as values.
Others find this inconsistent and argue it forces awkward patterns and mental overhead.

Database-Specific Behavior

Postgres 15+ allows NULLS NOT DISTINCT in unique indexes, giving control over NULL uniqueness.
IS [NOT] DISTINCT FROM (or dialect equivalents) is highlighted as the “correct” equality operator when NULLs must compare as equal.
Oracle’s empty‑string‑equals‑NULL behavior is widely criticized as especially bizarre.
Different engines (Postgres, MySQL/MariaDB, SQL Server, SQLite, Oracle) diverge on indexing, uniqueness, and NULL treatment, adding portability pain.

Schema Design and Modeling Choices

Academic relational purists argue NULL should rarely exist; optional data should be modeled as separate tables (1:0..1 relations).
Practitioners push back: this level of normalization is often impractical and verbose, so nullable columns are the de facto solution.
Soft deletes using deleted_at IS NULL are debated: some call it overloading and a design smell; others say it’s fine with proper indexes/views.

Alternatives, Workarounds, and Ergonomics

Suggested mitigations: COALESCE/IFNULL, IS [NOT] DISTINCT FROM, partial/filtered indexes, views for “active” rows.
Some wish for richer type systems (sum types, Option/Maybe, multiple null kinds) or even SQL replacements (e.g., systems that avoid NULL entirely).
Several note that NULL’s semantics are logically defensible but ergonomically hostile, especially for developers coming from “simple” language nulls and ORMs.

Related topics