Pointers Are Complicated II, or: We need better language specs (2020)

Rust’s evolving provenance model

  • Commenters note that since the post was written, Rust has explicitly adopted provenance in its memory model and stabilized “strict provenance” pointer APIs.
  • Commenters emphasize that unsafe Rust authors should use these APIs instead of smuggling pointers through usize, to reduce unsound code.
  • Some see Rust’s model (exposed provenance, APIs to recover it from integers) as a concrete, pragmatic answer to the issues described in the article.
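
As a rough illustration of the point above, here is a minimal sketch of pointer tagging written against the strict provenance methods `addr` and `map_addr` on raw pointers (stable in recent Rust releases, 1.84 or later). The helper names `tag`, `untag`, and `tagged_roundtrip` are hypothetical and chosen only for this example; the sketch assumes the low bit of a `u32` pointer is free because `u32` is 4-byte aligned.

```rust
/// Hypothetical helper: set the low bit of an aligned pointer as a tag.
/// `map_addr` changes only the address; the provenance of `ptr` is kept,
/// so no pointer -> usize -> pointer round trip is needed.
fn tag(ptr: *mut u32) -> *mut u32 {
    ptr.map_addr(|a| a | 1)
}

/// Hypothetical helper: clear the tag bit and report whether it was set.
fn untag(ptr: *mut u32) -> (*mut u32, bool) {
    // `addr` yields the address as a plain integer without exposing provenance.
    let was_tagged = ptr.addr() & 1 == 1;
    (ptr.map_addr(|a| a & !1), was_tagged)
}

/// Tags a pointer, untags it, and writes through the result.
fn tagged_roundtrip() -> u32 {
    let mut value: u32 = 41;
    let raw: *mut u32 = &mut value;

    let tagged = tag(raw);
    let (clean, was_tagged) = untag(tagged);
    assert!(was_tagged);

    // SAFETY: `clean` has the same address and provenance as `raw`,
    // which points to the live local `value`.
    unsafe {
        *clean += 1;
    }
    value
}
```

Calling `tagged_roundtrip()` exercises the whole path. The design choice is that the integer is only ever a view of the address, while the pointer value itself carries the permission to access `value`.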

The contested C example and which optimization is wrong

  • Long debate over the article’s toy program where three optimizations are applied; all agree a behavior change occurred, but not on which pass is semantically wrong.
  • One camp: the last optimization (“q is never written; replace q[0] by 0”) is clearly incorrect because the compiler has seen a store via an address that may alias q, especially after pointer→integer→pointer shenanigans.
  • Another camp: the second optimization, which silently turns an integer-derived store into an out-of-bounds pointer store, is the real violation when viewed at the C level.
  • There is confusion and back‑and‑forth about: C vs LLVM IR semantics, when a pointer is considered “exposed,” whether integers carry provenance, and how data/aliasing information must be preserved between passes.
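
To make the debate easier to follow, here is a loose Rust transliteration of the kind of program being argued about. It is a sketch under assumptions, not the article's C code: the function name `toy_program` is hypothetical, and it uses Rust's exposed provenance functions (`expose_provenance`, `std::ptr::with_exposed_provenance_mut`, Rust 1.84 or later) in place of C casts. Whether the two locals end up adjacent in memory is up to the compiler, so the branch may or may not be taken.

```rust
use std::ptr;

/// Hypothetical stand-in for the toy program: returns `Some(q[0])` if `q`
/// happens to sit directly after `p` in memory, and `None` otherwise.
fn toy_program() -> Option<u8> {
    let mut p = [0u8; 1];
    let mut q = [0u8; 1];
    let p_ptr: *mut u8 = ptr::addr_of_mut!(p).cast();
    let q_ptr: *mut u8 = ptr::addr_of_mut!(q).cast();

    // Integer views of "one past the end of p" and "start of q".
    // `expose_provenance` records that these allocations may later be
    // reached through pointers rebuilt from integers.
    let ip: usize = p_ptr.wrapping_add(1).expose_provenance();
    let iq: usize = q_ptr.expose_provenance();

    if iq == ip {
        // SAFETY: `iq` is the address of `q`, whose provenance was exposed
        // above, so the rebuilt pointer may be used to write to `q`.
        unsafe {
            *ptr::with_exposed_provenance_mut::<u8>(iq) = 10;
        }
        // The contested rewrites are: (1) use `ip` instead of `iq` in the
        // store, (2) drop the integer round trip so the store goes through
        // `p + 1`, (3) conclude `q` is never written and return 0 here.
        Some(q[0])
    } else {
        None
    }
}
```

Under the source semantics the only permitted results are `None` or `Some(10)`; a result of `Some(0)` would be the behavior change the thread is arguing about.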

Pointer provenance, exposure, and optimization trade‑offs

  • Central tension: defining provenance strictly enough to justify optimizations vs making it so permissive that many optimizations (including register allocation and reordering) become impossible.
  • Some argue for a model where any pointer-to-int cast “exposes” the pointed-to object, so that any later integer-derived pointer may alias every exposed object; others say this would be too pessimistic and slow.
  • PNVI‑ae‑udi (“provenance not via integers, address-exposed, user-disambiguation”, from a C Technical Specification) is cited as a compromise model; defenders call it essentially the only workable way to combine abstract pointers with integer addresses.
  • CHERI is mentioned as a hardware example where provenance is literal capability metadata and arbitrary int→ptr casts simply don’t exist.

C, undefined behavior, and “portable assembly” expectations

  • Several participants argue that many C programmers expect pointers to be raw addresses and C to be “portable assembler,” and see provenance-based rewrites as a betrayal of that mental model.
  • Others counter that out‑of‑bounds and alias‑based “tricks” cause silent corruption and exploits; UB and strict aliasing rules are what make aggressive optimization possible.
  • The culture around UB is criticized: real-world kernels and libraries routinely rely on non‑strictly‑conforming behavior, and major projects often compile with flags that weaken aliasing assumptions.

Ranges, overflow, and related semantics

  • One‑past‑the‑end pointers are debated: some call them a design mistake; others defend them as essential for half‑open ranges and idiomatic iteration.
  • A separate thread covers signed integer overflow: C treats it as UB, while Rust defines it (panic in debug builds, two’s‑complement wrapping in release builds by default); commenters dispute whether UB is really necessary for loop optimizations.
  • Disagreement over using smaller signed types and UB as a “bug-catching” tool vs writing semantics that match the math (e.g., explicit wrapping types).
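
The points in this list can be sketched in Rust. The function names `sum_half_open` and `overflow_variants` are hypothetical and exist only for this illustration. The first walks a slice as a half-open range whose end is a one-past-the-end pointer (compared against, never dereferenced); the second shows overflow behavior chosen explicitly at the call site rather than left undefined.

```rust
use std::num::Wrapping;

/// Sums a slice by walking the half-open pointer range [start, end).
/// `end` points one past the last element and is only used for comparison.
fn sum_half_open(data: &[i32]) -> i32 {
    let range = data.as_ptr_range();
    let mut cur = range.start;
    // `Wrapping` makes overflow wrap in two's complement in every build mode.
    let mut total = Wrapping(0i32);
    while cur != range.end {
        // SAFETY: `cur` lies in [start, end), so it points at a live element,
        // and advancing by one stays within or one past the end of the slice.
        unsafe {
            total += Wrapping(*cur);
            cur = cur.add(1);
        }
    }
    total.0
}

/// Returns the wrapping, checked, and saturating results of `a + b`.
fn overflow_variants(a: i32, b: i32) -> (i32, Option<i32>, i32) {
    (a.wrapping_add(b), a.checked_add(b), a.saturating_add(b))
}
```

For example, `overflow_variants(i32::MAX, 1)` yields `(i32::MIN, None, i32::MAX)`: the caller picks which arithmetic is meant instead of relying on the compiler's treatment of overflow.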