Making memcpy(NULL, NULL, 0) well-defined
Intuition vs. C Standard for memcpy(NULL, NULL, 0)
- Many expect copying 0 bytes to be a no-op regardless of the pointer values.
- The C standard historically made passing null pointers to memcpy undefined behavior (UB), even when the length is 0.
- In practice, major libc implementations already treat length‑0 memcpy as a no-op, but the compiler is allowed to assume it never happens.
- The C2y change is to make memcpy(NULL, NULL, 0) well‑defined to align spec with reality and remove surprising compiler behavior.
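Code that still has to build under pre-C2y rules can sidestep the question by guarding the call. The following is a minimal sketch, assuming a hypothetical helper named copy_bytes (it is not part of any standard library):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: same contract as memcpy, except that a zero-length
 * copy never reaches memcpy, so copy_bytes(NULL, NULL, 0) is well-defined
 * even where the standard library call would not be. */
void *copy_bytes(void *dest, const void *src, size_t n)
{
    if (n == 0) {
        return dest; /* nothing to copy; the pointers are never passed on */
    }
    return memcpy(dest, src, n); /* caller must still pass valid pointers */
}
```

With the C2y change, the guard becomes redundant for the zero-length case, but it remains harmless.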
Undefined Behavior and Compiler Optimizations
- UB lets compilers assume “this never happens” and thus remove branches, hoist or delete checks, and do aggressive alias and bounds reasoning.
- Examples:
  - GCC removes dest == NULL branches after a memcpy(dest, ..., len) call, even if len is 0 or provably 0.
  - Dead code elimination and register allocation rely on assuming no out‑of‑bounds or invalid pointer access.
- Some see this as necessary for performance; others see it as a major source of fragile miscompilations and hard‑to‑reason-about behavior.
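The null-check removal described above can be sketched as follows. Both function names are hypothetical, and whether a particular compiler actually deletes the branch depends on its version and optimization flags; the point is only that pre-C2y rules permit it:

```c
#include <stddef.h>
#include <string.h>

/* Risky ordering: the check comes after the call. Under pre-C2y rules the
 * compiler may infer dest != NULL from the memcpy call and drop the branch,
 * even when len turns out to be 0. */
int append_risky(char *dest, const char *src, size_t len)
{
    memcpy(dest, src, len);
    if (dest == NULL) {
        return -1; /* may be treated as unreachable by the optimizer */
    }
    return 0;
}

/* Safer ordering: validate first, so no inference can be drawn from the
 * call. A zero-length request with null pointers is accepted as a no-op. */
int append_checked(char *dest, const char *src, size_t len)
{
    if (dest == NULL || src == NULL) {
        return len == 0 ? 0 : -1;
    }
    memcpy(dest, src, len);
    return 0;
}
```

Placing the check before the call keeps the behavior the same across optimization levels, because the compiler has nothing to infer from at the point of the check.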
Abstract Machine, Pointers, and Memory Model
- The discussion stresses that C is defined on an abstract machine of “objects,” not a flat address space.
- Treating pointers as mere integers would break many optimizations (e.g., assuming local variables are not modified via forged pointers).
- Pointer provenance, null arithmetic, and cross‑object pointer comparisons/subtractions are all subtle UB areas that optimizers exploit.
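As one illustration of the cross-object comparison issue: relational operators such as < on pointers into different objects are undefined, so a range check is often written on integers instead. This is a minimal sketch with a hypothetical helper name; it assumes uintptr_t is available and that the implementation-defined pointer-to-integer mapping reflects a flat address space, which holds on mainstream platforms but is not guaranteed by the standard:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reports whether p points into the size-byte buffer
 * starting at base. The comparison is done on integers, so it stays defined
 * even when p and base belong to different objects. */
bool ptr_in_buffer(const void *p, const void *base, size_t size)
{
    uintptr_t addr = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)base;
    return addr >= lo && addr - lo < size;
}
```

A true result says only that the addresses overlap numerically; it does not by itself make it valid to access the buffer through p.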
Static Analysis and the Proposed Change
- Static analyzers previously could unconditionally flag passing NULL to memcpy as a bug.
- With length‑0 calls now defined, analyzers must reason about the size argument, increasing complexity and risk of false positives/negatives.
- Some argue this cost is acceptable to remove a surprising, widely ignored UB; others worry about weakening simple checks.
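The extra reasoning can be pictured with a toy model. Everything here (the type names, the fields, both rule functions) is hypothetical and does not describe any real analyzer; it only contrasts a size-blind rule with one that must consult the possible range of the size argument:

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { DIAG_NONE, DIAG_MAYBE, DIAG_BUG } diag;

/* Hypothetical facts a checker might have derived at a memcpy call site. */
typedef struct {
    bool ptr_may_be_null;  /* pointer is null on some path */
    bool ptr_must_be_null; /* pointer is null on every path */
    size_t size_min;       /* smallest possible size argument */
    size_t size_max;       /* largest possible size argument */
} call_facts;

/* Old rule: a null pointer is a problem regardless of size. */
diag old_rule(call_facts f)
{
    if (f.ptr_must_be_null) return DIAG_BUG;
    if (f.ptr_may_be_null) return DIAG_MAYBE;
    return DIAG_NONE;
}

/* New rule: a null pointer matters only if the size can be nonzero. */
diag new_rule(call_facts f)
{
    bool may_be_null = f.ptr_may_be_null || f.ptr_must_be_null;
    if (!may_be_null || f.size_max == 0) return DIAG_NONE;
    if (f.ptr_must_be_null && f.size_min > 0) return DIAG_BUG;
    return DIAG_MAYBE; /* depends on which size occurs at run time */
}
```

The DIAG_MAYBE case is where the added complexity shows up: a definite report under the old rule becomes a judgment call that hinges on how precisely the analyzer tracks the size.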
C Safety vs. Newer Languages and Tooling
- One side argues C can be used safely with discipline, abstractions, sanitizers, and (often expensive) sound static analysis.
- Another side claims humans consistently get C’s UB wrong, and memory‑safe languages (Rust, managed runtimes) are a better default, even if unsafe blocks and bugs still occur.
- There is debate over whether tightening C’s semantics (less UB) would make it closer to a “high‑level assembler” but significantly slower, undermining its niche.