Debugging memory corruption: who the hell writes "2" into my stack? (2016)
Nature of the Bug
- The thread agrees on the core issue: the kernel writes asynchronously to a user-provided stack buffer after an exception has already unwound the owning stack frame, producing a use-after-return.
- Several commenters stress this is primarily undefined behavior (throwing through C frames), not a classic in-process buffer overrun.
- Confusion over whether this is “memory corruption” vs “UB”; consensus: UB at the language/ABI boundary, manifesting as stack corruption.
Debugging Approaches
- Hardware breakpoints suggested, but others note they don’t trigger when the kernel writes to user-space memory.
- Time-travel / reverse debuggers (rr, Windows TTD) discussed:
  - Could help if they record kernel side effects or interpose async writes.
  - But handling async syscalls is hard and often not implemented.
- perf_event on Linux mentioned as a way to set global hardware breakpoints.
- Valgrind, ASAN/MSAN/UBSAN praised, but multiple people note they wouldn’t catch this specific bug.
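The perf_event suggestion can be sketched roughly as follows. This is a minimal, Linux-only illustration that assumes the standard perf_event_open syscall with a PERF_TYPE_BREAKPOINT event; the helper names make_write_watch and open_watch are hypothetical, and whether a given kernel and CPU actually report kernel-mode writes to the watched address is an assumption that would need checking on the target system.

```cpp
#include <linux/hw_breakpoint.h>
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#include <cstdint>
#include <cstring>

// Hypothetical helper: builds attributes for a 4-byte write watchpoint.
// addr must be 4-byte aligned.
perf_event_attr make_write_watch(const void* addr) {
    perf_event_attr attr;
    std::memset(&attr, 0, sizeof attr);
    attr.type = PERF_TYPE_BREAKPOINT;
    attr.size = sizeof attr;
    attr.bp_type = HW_BREAKPOINT_W;                         // trigger on writes
    attr.bp_addr = reinterpret_cast<std::uintptr_t>(addr);  // watched address
    attr.bp_len = HW_BREAKPOINT_LEN_4;                      // watched width
    attr.sample_period = 1;                                 // report every hit
    return attr;
}

// Hypothetical helper: glibc has no perf_event_open() wrapper, so the raw
// syscall is used. Returns a file descriptor, or -1 with errno set.
//   pid == 0,  cpu == -1 : the calling thread, on any CPU
//   pid == -1, cpu >= 0  : every task on that CPU (needs extra privileges)
int open_watch(const perf_event_attr& attr, pid_t pid, int cpu) {
    perf_event_attr copy = attr;
    return static_cast<int>(
        syscall(SYS_perf_event_open, &copy, pid, cpu, -1, 0UL));
}
```

Reading a 64-bit counter from the returned descriptor gives the number of hits so far. The call may fail in sandboxes or under a restrictive perf_event_paranoid setting, so callers should expect -1.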
Exceptions, C ABI, and APCs
- Strong skepticism about C++ exceptions in systems code; multiple comments advocate avoiding them entirely, or tightly scoping them.
- Key rule repeated: never throw exceptions across C frames, callbacks, or OS callbacks (APCs, signals, qsort, etc.).
- Discussion around noexcept and C ABI:
  - Idea: treat C and extern "C" functions as implicitly noexcept.
  - Proposals for compilers to warn when non-noexcept function pointers are passed to C/noexcept APIs.
  - Rust 1.81 change (aborting on unwinding through extern "C") cited as a mitigation.
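One way to follow the "never throw across C frames" rule is a noexcept trampoline. The trampoline catches everything at the boundary and hands the exception back once the C call has returned. The sketch below is illustrative only: c_for_each stands in for an arbitrary C API that takes a callback plus a context pointer, and VisitState, visit_trampoline and sum_non_negative are hypothetical names.

```cpp
#include <cstddef>
#include <exception>
#include <stdexcept>

extern "C" {
// Stand-in for a C library function (hypothetical): calls cb for each
// element and stops early when cb returns non-zero.
typedef int (*visit_fn)(int value, void* ctx);
int c_for_each(const int* data, std::size_t n, visit_fn cb, void* ctx) {
    for (std::size_t i = 0; i < n; ++i) {
        int rc = cb(data[i], ctx);
        if (rc != 0) return rc;
    }
    return 0;
}
}

struct VisitState {
    long sum = 0;
    std::exception_ptr error;  // set if the C++ logic threw
};

// Ordinary C++ logic that is allowed to throw.
static void accumulate(VisitState& state, int value) {
    if (value < 0) throw std::invalid_argument("negative value");
    state.sum += value;
}

// The only function the C side ever sees. It is noexcept and catches
// everything, so no exception can unwind through c_for_each.
extern "C" int visit_trampoline(int value, void* ctx) noexcept {
    auto* state = static_cast<VisitState*>(ctx);
    try {
        accumulate(*state, value);
        return 0;
    } catch (...) {
        state->error = std::current_exception();
        return 1;  // stop via the C API's own return-code protocol
    }
}

long sum_non_negative(const int* data, std::size_t n) {
    VisitState state;
    c_for_each(data, n, visit_trampoline, &state);
    // Only C++ frames are on the stack here, so rethrowing is safe.
    if (state.error) std::rethrow_exception(state.error);
    return state.sum;
}
```

The same shape applies to OS callbacks such as APCs or signal handlers, provided the API offers some way to pass a context pointer or to report failure without unwinding.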
Memory-Safe Languages and FFI
- Disagreement on whether Rust/memory-safe languages “would have prevented” this:
  - One side: safe Rust can encode lifetimes and forbid passing stack buffers with too-short lifetimes.
  - Other side: once you cross into syscalls/FFI, the language can’t fully enforce kernel contracts; unsafe FFI remains a risk.
- General consensus: safe wrappers help, but correctness hinges on accurately modeling OS API requirements.
Win32 / OS API Design & Patterns
- APC / alertable waits described as a powerful but dangerous mechanism, analogous to, but safer than, Unix signals.
- Criticism that documentation under-emphasizes “don’t throw / don’t unwind” from APCs.
- Contrast drawn with Unix I/O:
  - Windows completion-style APIs (OVERLAPPED, IOCP) can hold user pointers across time.
  - Traditional Unix syscalls rarely keep user pointers asynchronously; newer AIO/io_uring do.
- Self-pipe / loopback-socket trick recognized as a standard, safer pattern for interrupting select().
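A minimal POSIX sketch of the self-pipe pattern mentioned above: the waiting thread always includes the read end of a pipe in its select() set, and anyone who wants to interrupt the wait writes one byte to the other end. WakePipe, WaitResult and wait_for_data are hypothetical names. Error handling is kept deliberately thin, so EINTR is reported as an error rather than retried.

```cpp
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical wrapper around the wake-up pipe.
struct WakePipe {
    int read_fd = -1;
    int write_fd = -1;

    bool open() {
        int fds[2];
        if (pipe(fds) != 0) return false;
        read_fd = fds[0];
        write_fd = fds[1];
        return true;
    }

    // Called from another thread to interrupt a pending wait.
    bool wake() const {
        char byte = 1;
        return write(write_fd, &byte, 1) == 1;
    }
};

enum class WaitResult { DataReady, Woken, TimedOut, Error };

// Waits until data_fd is readable, the wake pipe is signalled, or the
// timeout expires. The wake pipe takes priority when both are ready.
WaitResult wait_for_data(int data_fd, const WakePipe& wake, int timeout_ms) {
    fd_set readable;
    FD_ZERO(&readable);
    FD_SET(data_fd, &readable);
    FD_SET(wake.read_fd, &readable);
    int max_fd = data_fd > wake.read_fd ? data_fd : wake.read_fd;

    timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int rc = select(max_fd + 1, &readable, nullptr, nullptr, &tv);
    if (rc < 0) return WaitResult::Error;
    if (rc == 0) return WaitResult::TimedOut;

    if (FD_ISSET(wake.read_fd, &readable)) {
        char drain[64];
        ssize_t drained = read(wake.read_fd, drain, sizeof drain);
        (void)drained;  // only the wake-up matters, not the bytes
        return WaitResult::Woken;
    }
    return WaitResult::DataReady;
}
```

Because the wake-up arrives as an ordinary return value, the waiting code decides synchronously how to cancel. No callback runs and nothing unwinds while the thread is inside the syscall.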
Lessons and Broader Takeaways
- Don’t throw exceptions out of async callbacks or while inside syscalls.
- Avoid stack-backed buffers for operations where the kernel may complete later or re-enter your code unexpectedly.
- Design cancellation with explicit, synchronous semantics where possible.
- Several anecdotes reinforce how small “clever” ideas can cause hugely expensive debugging sessions.
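The advice about stack-backed buffers can be illustrated with a small simulation. In this sketch FakeKernel is a stand-in for the real kernel, and AsyncOp and start_operation are hypothetical names. The state the "kernel" will write to lives on the heap and is kept alive by the pending list, so unwinding the initiating frame cannot leave a dangling pointer behind. A real Win32 or io_uring wrapper would need an equivalent registry that releases the state only after completion or confirmed cancellation.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Heap-allocated state for one asynchronous operation.
struct AsyncOp {
    int status = -1;         // written by the "kernel" on completion
    bool completed = false;
};

// Stand-in for the kernel: remembers operations and completes them later.
class FakeKernel {
public:
    // Shares ownership, so the state stays alive until completion runs.
    void submit(std::shared_ptr<AsyncOp> op) {
        pending_.push_back(std::move(op));
    }

    // Completes every pending operation and returns them for inspection.
    std::vector<std::shared_ptr<AsyncOp>> complete_all(int status) {
        for (auto& op : pending_) {
            op->status = status;
            op->completed = true;
        }
        std::vector<std::shared_ptr<AsyncOp>> done;
        done.swap(pending_);
        return done;
    }

    std::size_t pending() const { return pending_.size(); }

private:
    std::vector<std::shared_ptr<AsyncOp>> pending_;
};

// Starts an operation; may throw after submission. Unwinding this frame
// does not free the state, because the kernel's list still owns it.
std::shared_ptr<AsyncOp> start_operation(FakeKernel& kernel,
                                         bool fail_after_submit) {
    auto op = std::make_shared<AsyncOp>();
    kernel.submit(op);
    if (fail_after_submit) throw std::runtime_error("caller gave up waiting");
    return op;
}
```

With a stack-allocated AsyncOp, the late write in complete_all would land in whatever frame happened to reuse that memory, which is the failure mode the thread describes.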