Pointers Are Complicated II, or: We need better language specs (2020)
Rust’s evolving provenance model
- Commenters note that since the post was written, Rust has explicitly adopted provenance in its memory model and stabilized “strict provenance” pointer APIs.
- Commenters emphasize that unsafe Rust authors should use these APIs instead of smuggling pointers through `usize`, to reduce unsound code.
- Some see Rust’s model (exposed provenance, plus APIs to recover provenance from integers) as a concrete, pragmatic answer to the issues described in the article.
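As a minimal sketch of the difference, assuming Rust 1.84 or later (where `addr` and `map_addr` are stable), here is pointer tagging done with the strict-provenance API. The older pattern `(ptr as usize | 1) as *const u32` round-trips through a bare integer and loses provenance; `map_addr` changes only the address and keeps the original pointer's provenance. `tag` and `untag` are hypothetical helper names, and the sketch assumes `align_of::<u32>() >= 2` so the low bit is free.

```rust
/// Hypothetical helper: store a one-bit flag in the low bit of a pointer.
/// Assumes align_of::<u32>() >= 2, so that bit is zero in a valid pointer.
fn tag(ptr: *const u32, flag: bool) -> *const u32 {
    debug_assert_eq!(ptr.addr() & 1, 0, "pointer must be at least 2-aligned");
    // map_addr rewrites only the address; `ptr`'s provenance is kept.
    ptr.map_addr(|a| a | flag as usize)
}

/// Hypothetical helper: split a tagged pointer back into (pointer, flag).
fn untag(ptr: *const u32) -> (*const u32, bool) {
    (ptr.map_addr(|a| a & !1), (ptr.addr() & 1) == 1)
}

fn main() {
    let value: u32 = 42;
    let (clean, flag) = untag(tag(&value, true));
    // SAFETY: `clean` has `value`'s address and provenance, and `value` is live.
    let read = unsafe { *clean };
    println!("{read} {flag}");
}
```

No integer-to-pointer cast appears anywhere, so no provenance has to be "guessed" when the pointer is used again.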
The contested C example and which optimization is wrong
- Long debate over the article’s toy program where three optimizations are applied; all agree a behavior change occurred, but not on which pass is semantically wrong.
- One camp: the last optimization (“q is never written; replace `q[0]` by 0”) is clearly incorrect because the compiler has seen a store via an address that may alias `q`, especially after pointer→integer→pointer shenanigans.
- Another camp: the second optimization, which silently turns an integer-derived store into an out-of-bounds pointer store, is the real violation when viewed at the C level.
- There is confusion and back‑and‑forth about: C vs LLVM IR semantics, when a pointer is considered “exposed,” whether integers carry provenance, and how data/aliasing information must be preserved between passes.
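The shape of the contested pattern can be sketched in Rust (1.84 or later, for `expose_provenance` and `with_exposed_provenance_mut`). This is a hypothetical transliteration of the kind of program under debate, not the article's C code: `contested`, `p`, and `q` are illustrative names. Two separate one-byte arrays are used, and an integer equal to both `p + 1` and `q` is turned back into a pointer and written through. Whether the addresses actually coincide depends on stack layout, so the branch may never run. The comments mark where the second and third rewrites from the debate would apply, and what Rust's exposed-provenance model says about them.

```rust
/// Hypothetical sketch: `p + 1` (one past the end of `p`) and `q` may share
/// an address while belonging to different allocations.
fn contested() -> Option<u8> {
    let p = [0u8; 1];
    let mut q = [0u8; 1];
    // Integer casts that *expose* each allocation's provenance.
    let ip = p.as_ptr().wrapping_add(1).expose_provenance();
    let iq = q.as_mut_ptr().expose_provenance();
    if iq == ip {
        // Rebuild a pointer from the integer. Both p and q were exposed, and
        // only q's provenance makes this store in bounds, so in Rust's model
        // the cast may take q's provenance.
        // Replacing this cast with `p.as_ptr().wrapping_add(1)` (the second
        // rewrite) would turn it into an out-of-bounds store to p; after that,
        // "q is never written" (the third rewrite) would look justified.
        let r: *mut u8 = std::ptr::with_exposed_provenance_mut(iq);
        // SAFETY: under exposed provenance, `r` may use q's provenance.
        unsafe { r.write(10) };
        return Some(q[0]);
    }
    None
}

fn main() {
    // Layout-dependent: prints None unless q happens to sit right after p.
    println!("{:?}", contested());
}
```

In this model the only outcomes are `None` or `Some(10)`; a compiler that applies both the cast-elimination and the "q is never written" rewrites could instead produce `Some(0)`, which is the behavior change the thread argues about.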
Pointer provenance, exposure, and optimization trade‑offs
- Central tension: defining provenance strictly enough to justify optimizations vs making it so permissive that many optimizations (including register allocation and reordering) become impossible.
- Some argue for a model where any pointer-to-int cast is “exposing” and thus potentially aliases exposed objects; others say this would be too pessimistic and slow.
- PNVI‑ae‑udi (“provenance not via integers, address‑exposed, user‑disambiguation”, from a C Technical Specification) is cited as a compromise model; defenders call it essentially the only workable way to combine abstract pointers with integer addresses.
- CHERI is mentioned as a hardware example where provenance is literal capability metadata and arbitrary int→ptr casts simply don’t exist.
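Rust's exposed-provenance API is roughly analogous to PNVI‑ae‑udi, and a minimal sketch (Rust 1.84 or later) shows the two halves: casting a pointer to an integer "exposes" its allocation, and turning an integer back into a pointer may pick up any previously exposed provenance that makes later accesses valid. `stash` and `unstash` are hypothetical helper names, standing in for, say, passing a pointer through an API that only stores an integer.

```rust
use std::mem::size_of;
use std::ptr;

/// Hypothetical helper: hand a pointer out as a plain integer.
/// `expose_provenance` marks the allocation's provenance as exposed.
fn stash(p: *const u32) -> usize {
    p.expose_provenance()
}

/// Hypothetical helper: turn the integer back into a pointer. The result may
/// use any previously exposed provenance that makes later accesses valid.
fn unstash(addr: usize) -> *const u32 {
    ptr::with_exposed_provenance(addr)
}

fn main() {
    let data = [5u32, 6, 7];
    let addr = stash(data.as_ptr());
    // Plain integer arithmetic on the address, then back to a pointer.
    let second = unstash(addr + size_of::<u32>());
    // SAFETY: `data` was exposed, is still live, and `second` is in bounds.
    let v = unsafe { *second };
    println!("{v}");
}
```

The cost the thread worries about is visible here: once `data` is exposed, the compiler must assume that any pointer later rebuilt from an integer could refer to it.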
C, undefined behavior, and “portable assembly” expectations
- Several participants argue that many C programmers expect pointers to be raw addresses and C to be “portable assembler,” and see provenance-based rewrites as a betrayal of that mental model.
- Others counter that out‑of‑bounds and alias‑based “tricks” cause silent corruption and exploits; UB and strict aliasing rules are what make aggressive optimization possible.
- The culture around UB is criticized: real-world kernels and libraries routinely rely on non‑strictly‑conforming behavior, and major projects often compile with flags (such as `-fno-strict-aliasing`) that weaken aliasing assumptions.
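Rust has no type-based strict aliasing, but its reference rules play a similar enabling role, which makes the "aliasing rules buy optimizations" argument easy to see. In this minimal sketch (`add_twice` is a hypothetical name), `a: &mut i32` and `b: &i32` are guaranteed not to overlap, so the optimizer is permitted to load `*b` once. The equivalent C function taking two `int *` parameters must reload `*b` after the first store, because both pointers have the same type and may alias unless `restrict` is used.

```rust
/// `a` and `b` cannot overlap, so the store through `a` cannot change `*b`;
/// the optimizer may therefore keep `*b` in a register across both lines.
fn add_twice(a: &mut i32, b: &i32) {
    *a += *b;
    *a += *b;
}

fn main() {
    let mut x = 1;
    let y = 2;
    add_twice(&mut x, &y);
    println!("{x}");
}
```

A call such as `add_twice(&mut x, &x)` is rejected by the borrow checker, which is what lets the compiler rely on the no-overlap guarantee without a runtime check.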
Ranges, overflow, and related semantics
- One‑past‑the‑end pointers are debated: some call them a design mistake; others defend them as essential for half‑open ranges and idiomatic iteration.
- A separate thread covers signed integer overflow: C’s UB vs Rust’s defined semantics (a panic in debug builds, two’s‑complement wrapping in release builds by default), and whether UB is really necessary for loop optimizations.
- Disagreement over whether smaller signed types plus UB serve as a “bug-catching” tool, versus writing semantics that match the intended math (e.g., explicit wrapping types).
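Both topics can be made concrete in a short Rust sketch; the function names are hypothetical. `sum_half_open` walks a slice over the half-open range `[start, end)` with raw pointers, where `end` is one past the last element: it may be computed and compared but never dereferenced. `hash` and `checked_len` show the "semantics that match the math" approach to overflow: `Wrapping<u32>` states that wraparound is intended, while `checked_add` reports overflow as a value instead of relying on UB (as in C) or on the build profile (as with Rust's plain `+`).

```rust
use std::num::Wrapping;

/// Walks the half-open range [start, end) like C's
/// `for (p = a; p != a + n; ++p)`; `end` is compared, never read.
fn sum_half_open(xs: &[u32]) -> u64 {
    let range = xs.as_ptr_range(); // `range.end` is one past the last element
    let mut cur = range.start;
    let mut total = 0u64;
    while cur != range.end {
        // SAFETY: `cur` lies in [start, end), so it points at a live element.
        total += u64::from(unsafe { *cur });
        // SAFETY: stepping at most up to one past the end is allowed by `add`.
        cur = unsafe { cur.add(1) };
    }
    total
}

/// Multiplicative hash where wraparound is the intended math.
fn hash(bytes: &[u8]) -> u32 {
    let mut h = Wrapping(0u32);
    for &b in bytes {
        h = h * Wrapping(31) + Wrapping(u32::from(b));
    }
    h.0
}

/// Where overflow would be a bug, report it explicitly.
fn checked_len(a: usize, b: usize) -> Option<usize> {
    a.checked_add(b)
}

fn main() {
    println!("{}", sum_half_open(&[1, 2, 3]));
    println!("{}", hash(b"ab"));
    println!("{:?}", checked_len(usize::MAX, 1));
}
```

Here the one-past-the-end pointer is what lets an empty slice be handled with no special case: `start == end`, so the loop body never runs.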