Making memcpy(NULL, NULL, 0) well-defined

Intuition vs. C Standard for memcpy(NULL, NULL, 0)

  • Many expect copying 0 bytes to be a no-op regardless of pointers.
  • The C standard has historically made passing a null pointer to memcpy undefined behavior (UB), even when the length is 0.
  • In practice, major libc implementations already treat length‑0 memcpy as a no-op, but compilers are still allowed to assume that the pointer arguments are never null.
  • C2y makes memcpy(NULL, NULL, 0) well‑defined, aligning the spec with reality and removing surprising compiler behavior.
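
For code that must also build under pre-C2y rules, one common workaround is to skip the call when there is nothing to copy. The following is a minimal sketch; the helper name copy_bytes is hypothetical and does not come from the discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n bytes, tolerating null pointers when n == 0.
 * Before C2y, memcpy(NULL, NULL, 0) is undefined behavior, so the call
 * is skipped entirely when there is nothing to copy. */
static void *copy_bytes(void *dst, const void *src, size_t n)
{
    if (n == 0) {
        return dst;                      /* no-op; memcpy is never reached */
    }
    assert(dst != NULL && src != NULL);  /* still required whenever n > 0 */
    return memcpy(dst, src, n);
}
```

With this wrapper, copy_bytes(NULL, NULL, 0) simply returns NULL. Under C2y semantics the guard becomes redundant, but it stays harmless.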

Undefined Behavior and Compiler Optimizations

  • UB lets compilers assume “this never happens” and thus remove branches, hoist or delete checks, and do aggressive alias and bounds reasoning.
  • Examples:
    • GCC may remove a dest == NULL check that follows a memcpy(dest, ..., len) call, even when len could be 0 at run time or is provably 0.
    • Dead code elimination and register allocation rely on assuming no out‑of‑bounds or invalid pointer access.
  • Some see this as necessary for performance; others see it as a major source of fragile miscompilations and hard‑to‑reason-about behavior.
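
The null-check pattern mentioned above can be sketched roughly as follows. Both function names are hypothetical, and whether a given compiler actually deletes the check depends on its version and optimization flags.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical example of the fragile pattern. Under pre-C2y rules the
 * memcpy call lets the compiler infer that dst is non-null, so an
 * optimizer may treat the check below as dead code and delete it. */
static int copy_then_check(char *dst, const char *src, size_t len)
{
    memcpy(dst, src, len);  /* pre-C2y: UB if dst or src is null, even for len == 0 */
    if (dst == NULL) {      /* may be considered unreachable and removed */
        return -1;
    }
    return 0;
}

/* Hypothetical rewrite: testing before the call means no inference
 * drawn from memcpy can invalidate the check. */
static int check_then_copy(char *dst, const char *src, size_t len)
{
    if (dst == NULL || src == NULL) {
        return -1;
    }
    memcpy(dst, src, len);
    return 0;
}
```

Calling copy_then_check(NULL, NULL, 0) is exactly the case the C2y change makes well-defined. Under older rules, only the second form is safe to call with null pointers.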

Abstract Machine, Pointers, and Memory Model

  • The discussion stresses that C is defined on an abstract machine of “objects,” not a flat address space.
  • Treating pointers as mere integers would break many optimizations (e.g., assuming local variables are not modified via forged pointers).
  • Pointer provenance, arithmetic on null pointers, and cross‑object pointer comparisons/subtractions are all subtle areas where UB arises and which optimizers exploit.
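
A few of these object-model rules can be illustrated with a short sketch. All names here are hypothetical, and the uintptr_t comparison assumes a platform that provides that optional type. Its result is implementation-defined rather than undefined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static void noop(void) { }

/* The address of `secret` never escapes this function, so no valid
 * pointer held by `callback` can refer to it. A compiler may therefore
 * assume the value is unchanged after the call. */
static int local_is_private(void (*callback)(void))
{
    int secret = 42;
    callback();
    return secret;
}

/* Defined only when both pointers point into the same array object
 * (or one past its end); subtracting across objects is UB. */
static ptrdiff_t index_of(const int *base, const int *elem)
{
    return elem - base;
}

/* Using `<` directly on pointers to different objects is UB. Comparing
 * the converted integer values is defined, though the ordering itself
 * is implementation-specific. */
static int address_before(const void *a, const void *b)
{
    return (uintptr_t)a < (uintptr_t)b;
}
```

For example, index_of(arr, &arr[3]) yields 3 for an int arr[4], while the same subtraction between two unrelated arrays would have no defined result.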

Static Analysis and the Proposed Change

  • Static analyzers previously could unconditionally flag passing NULL to memcpy as a bug.
  • With length‑0 calls now defined, analyzers must reason about the size argument, increasing complexity and risk of false positives/negatives.
  • Some argue this cost is acceptable to remove a surprising, widely ignored UB; others worry about weakening simple checks.
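
To see why the analyzer's job gets harder, consider a sketch in which an empty buffer is represented as a null pointer with length 0. The struct slice type and slice_copy function are hypothetical, introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical byte-slice type: an empty slice is { NULL, 0 }. */
struct slice {
    const unsigned char *data;
    size_t len;
};

/* Copies src into out (capacity cap) and returns the bytes written.
 * An analyzer can no longer reject the memcpy just because src.data
 * may be null; it has to establish that data == NULL implies len == 0. */
static size_t slice_copy(unsigned char *out, size_t cap, struct slice src)
{
    assert(src.data != NULL || src.len == 0);  /* invariant to be proven */
    if (src.len > cap) {
        return 0;                  /* does not fit; copy nothing */
    }
    if (src.len != 0) {            /* guard keeps the call valid pre-C2y */
        memcpy(out, src.data, src.len);
    }
    return src.len;
}
```

Under the old rule, a checker could flag any path where src.data might be null. Under the new rule it must track the relationship between the pointer and the length before reporting anything.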

C Safety vs. Newer Languages and Tooling

  • One side argues C can be used safely with discipline, abstractions, sanitizers, and (often expensive) sound static analysis.
  • Another side claims that humans consistently get C’s UB wrong and that memory‑safe languages (Rust, managed runtimes) are a better default, even though unsafe blocks and bugs still occur there.
  • There is debate over whether tightening C’s semantics (reducing UB) would bring it closer to a “high‑level assembler” at the cost of making it significantly slower, undermining its niche.