The provenance memory model for C
Article formatting and accessibility
- Several readers report broken HTML/Markdown conversion: unescaped `&`, Unicode mangling, and an unclosed code block that swallows later paragraphs.
- Code blocks are described as hard to read; one commenter re-renders the article via ChatGPT for better accessibility.
- The author acknowledges WordPress editing quirks, says the post will be regenerated, and later notes that translation errors have been fixed, though some minor grammar issues remain.
Unicode identifiers and non-ASCII code
- Large subthread on whether C allows Unicode identifiers: specification details (C99 UCNs, C23 XID classes) vs. implementation-defined source character sets.
- Some argue anything non-ASCII in identifiers should be a syntax error for security/readability; others counter that many human languages are non-Latin and deserve first-class support.
- Concerns raised about visually confusable Unicode, “Zalgo” text, and homograph-style vulnerabilities; proposals include rejecting confusable mixes and normalizing identifiers.
- Others defend Unicode for matching mathematical notation or native language domain terms, but there’s resistance to obscure single-character symbols that are hard to type or distinguish.
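To make the specification point and the homograph concern concrete, here is a minimal sketch in C that spells identifiers with universal character names (UCNs). All identifier and function names are hypothetical, and the sketch assumes a compiler that accepts UCNs in identifiers, as recent GCC and Clang do in their default modes.

```c
#include <assert.h>

/* \u00e9 is LATIN SMALL LETTER E WITH ACUTE.
   \u0430 is CYRILLIC SMALL LETTER A.            */

/* A non-ASCII identifier spelled with a UCN, so the source file
   itself stays pure ASCII (hypothetical name). */
static int caf\u00e9_count = 3;

/* Two identifiers that render almost identically in many fonts but
   are distinct names to the compiler (hypothetical names). */
static int pay = 1;         /* all Latin letters */
static int p\u0430y = 2;    /* middle letter is Cyrillic */

int latin_pay(void)    { return pay; }
int cyrillic_pay(void) { return p\u0430y; }

/* Runtime check that the declarations above behave as described. */
void check_identifiers(void) {
    assert(caf\u00e9_count == 3);
    assert(&pay != &p\u0430y);   /* two separate objects */
}
```

A policy of rejecting mixed-script or confusable identifiers, as some commenters propose, would flag the second `pay` declaration even though it is valid as far as the compiler is concerned.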
Modern C, “bloated C”, and new features
- Mixed reactions to the latest “Modern C” edition: some praise the book; others dislike pervasive attributes and newer C features seen as C++-style “bloat”.
- Examples of contentious features include `_BitInt`, `guard`, `defer`, `auto`, `constexpr`, `nullptr`, `_Generic`, `typeof`, `restrict`, and syntax-based TLS.
Provenance memory model and optimizer behavior
- Many see the provenance model as a formalization of what compilers already assume: you can’t conjure valid pointers from integers or thin air.
- It’s framed as standardizing the contract between programmers and compilers to reduce miscompilations and make more existing code “officially” well-defined.
- Some worry about “more nasal demons”: unclear if the model mainly forbids “sane” low-level tricks or legitimizes previously-UB idioms.
- Technical debate over pointer-to-integer conversions gaining side effects (exposure), implications for dead-load elimination, and interactions with strict aliasing and `char`-based type punning.
- Discussion of ambiguous provenance at object boundaries, one-past-the-end pointers, and how the model distinguishes storage instances (`malloc` vs struct fields).
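The exposure and one-past-the-end points above can be illustrated with a short C sketch. It is not taken from the article; the function names are hypothetical, and it assumes an implementation that provides `uintptr_t` (optional in the standard, present on mainstream platforms).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round trip through an integer. The cast to uintptr_t is what the
   discussion calls exposure: it marks x's storage as reachable from
   integers, so the cast back can refer to x again. Because exposure
   acts like a side effect, a compiler cannot treat the first cast as
   free to delete in every case. */
int roundtrip_write(void) {
    int x = 1;
    uintptr_t addr = (uintptr_t)&x;  /* pointer-to-integer: exposes x */
    int *p = (int *)addr;            /* integer-to-pointer: refers to x */
    assert(p == &x);
    *p = 42;
    return x;
}

/* A one-past-the-end pointer may be formed and compared, but not
   dereferenced, even if another object happens to start there. */
int sum_to_end(const int *first, size_t n) {
    const int *end = first + n;      /* one past the last element */
    int sum = 0;
    for (const int *p = first; p != end; ++p)
        sum += *p;                   /* p is always inside the array here */
    return sum;
}
```

What the sketch does not show is the contested case: an integer whose value equals both the end of one object and the start of the next, where the model has to decide which storage instance the resulting pointer belongs to.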
Alias analysis, sanitizers, and allocators
- TySan (LLVM’s type-based alias sanitizer) is mentioned as related work; it currently misses some cases (e.g., unions) and reflects LLVM’s imperfect TBAA.
- Some criticize Clang’s type-based aliasing as non-conforming to the C standard.
- Questions about custom allocators layered on `malloc` and whether the model supports nested storage abstractions; suggestion to mark custom allocators via attributes or builtins so compilers know they return fresh storage.
- Interest in Rust-like primitives such as a `with_addr` function to explicitly combine provenance and integer addresses; others argue the model prioritizes not breaking existing C over adding such intrinsics.
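The custom-allocator question can be made concrete with a minimal bump allocator layered on `malloc`. This is a sketch under stated assumptions, not a design from the thread: the `arena` type, its functions, and the `ARENA_FRESH` macro are hypothetical, and the macro expands to the GNU `malloc` function attribute, which is a compiler extension and a promise made by the programmer rather than something the provenance model defines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* GNU extension: tells the optimizer the returned pointer does not
   alias any other live pointer. Expands to nothing elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#  define ARENA_FRESH __attribute__((malloc))
#else
#  define ARENA_FRESH
#endif

typedef struct {
    unsigned char *base;  /* one storage instance obtained from malloc */
    size_t cap;           /* total bytes in the block */
    size_t used;          /* bytes handed out so far */
} arena;

/* Returns 1 on success, 0 if malloc failed. */
static int arena_init(arena *a, size_t cap) {
    assert(a != NULL);
    a->base = malloc(cap);
    a->cap = a->base ? cap : 0;
    a->used = 0;
    return a->base != NULL;
}

/* Every pointer returned here is carved out of the same malloc block,
   so all of them carry that block's provenance. The attribute is the
   only thing telling the compiler to treat each result as fresh. */
ARENA_FRESH static void *arena_alloc(arena *a, size_t n) {
    const size_t align = _Alignof(max_align_t);
    size_t start = (a->used + align - 1) / align * align;
    if (start > a->cap || n > a->cap - start)
        return NULL;      /* out of space */
    a->used = start + n;
    return a->base + start;
}

static void arena_free_all(arena *a) {
    free(a->base);
    a->base = NULL;
    a->cap = a->used = 0;
}
```

Two calls to `arena_alloc` return non-overlapping regions, yet both pointers derive from the single `malloc` result. That gap between what the programmer means and what the model tracks is what the commenters asking about nested storage abstractions are pointing at.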
C vs other languages and memory safety
- Several commenters express affection for C but note rising “social unacceptability” of using memory-unsafe languages; others dismiss social pressure as a decision factor.
- Alternatives proposed: Pascal, Ada, D, Zig, Rust, and Fil-C (a modified Clang aiming for memory-safe C/C++).
- Zig is seen by some as a “middle ground” between C and Rust, with checked builds and fewer footguns; critics argue it still lacks robust guarantees against use-after-free, data races, and aliasing issues compared to Rust.
- Fil-C is cited as a working memory-safe toolchain for existing C, but its requirement that all code be compiled with it is seen as a major adoption barrier.
Miscellaneous C language discussions
- Debate over longstanding C warts: case sensitivity, `=` vs `==` bugs, truthiness/coercions, macros, null-terminated strings.
- Clarification that `register` now primarily forbids taking an address rather than hinting about CPU registers; some question its practical value relative to `const`.
- Brief technical clarifications on struct alignment, representation of struct pointers, and how touching objects interact with provenance.
- Side commentaries: XOR linked list example is appreciated; jokes about mathematicians’ terse variable names and about Unicode-heavy pseudocode signaling “academic” style.
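For readers who have not seen the XOR linked list mentioned above, a minimal sketch in C follows. It is not the article's code: the `node` type and function names are hypothetical, and it assumes `uintptr_t` exists and that a null pointer converts to and from the integer 0, which holds on mainstream platforms but is implementation-defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct node {
    int value;
    uintptr_t link;   /* (uintptr_t)prev ^ (uintptr_t)next */
} node;

/* Store the XOR of both neighbours' addresses in n. Each cast to
   uintptr_t exposes the neighbour, which is what later allows a
   pointer to be rebuilt from the integer. */
static void xor_set(node *n, node *prev, node *next) {
    assert(n != NULL);
    n->link = (uintptr_t)prev ^ (uintptr_t)next;
}

/* Walk the list from either end and sum the values. */
static int xor_sum(node *start) {
    uintptr_t prev = 0;
    node *cur = start;
    int sum = 0;
    while (cur != NULL) {
        sum += cur->value;
        uintptr_t next = prev ^ cur->link;  /* recover the other neighbour */
        prev = (uintptr_t)cur;
        cur = (node *)next;                 /* integer-to-pointer conversion */
    }
    return sum;
}
```

The list makes a good test case because no pointer to a neighbour is ever stored: each one is reconstructed from integers, so the traversal depends entirely on how the model treats integer-to-pointer conversions.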