Debugging memory corruption: who the hell writes "2" into my stack? (2016)

Nature of the Bug

  • The thread agrees on the core issue: the kernel writes asynchronously into a user-provided buffer that lived on the stack, after an exception has already unwound that stack frame, i.e. a use-after-return.
  • Several commenters stress that this is primarily undefined behavior (throwing exceptions through C frames), not a classic in-process buffer overrun.
  • Confusion over whether this is “memory corruption” vs “UB”; consensus: UB at the language/ABI boundary, manifesting as stack corruption.

Debugging Approaches

  • Hardware breakpoints were suggested, but others note they don’t trigger when the kernel itself writes to user-space memory.
  • Time-travel / reverse debuggers (rr, Windows TTD) discussed:
    • Could help if they record kernel side effects or interpose async writes.
    • But handling async syscalls is hard and often not implemented.
  • Linux perf_event_open (PERF_TYPE_BREAKPOINT events) mentioned as a way to set system-wide hardware breakpoints.
  • Valgrind, ASAN/MSAN/UBSAN praised, but multiple people note they wouldn’t catch this specific bug.
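
For reference, a minimal Linux sketch of the perf_event_open route, in C++ to match the thread's focus. It assumes an architecture with hardware watchpoints (e.g. x86-64) and a kernel and sandbox that permit perf_event_open; count_user_writes and open_perf_event are illustrative names, not part of any library. It counts only user-mode writes by the calling process, so, in line with the caveat about kernel writes, it should not be expected to catch the kernel filling in the buffer.

```cpp
#include <linux/hw_breakpoint.h>
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#include <cstdint>
#include <cstring>

// glibc ships no perf_event_open() wrapper, so call it through syscall(2).
static int open_perf_event(perf_event_attr* attr, pid_t pid, int cpu) {
    return static_cast<int>(syscall(SYS_perf_event_open, attr, pid, cpu, -1, 0UL));
}

// Counts user-mode writes to *addr by this process while body() runs.
// Returns -1 if the kernel refuses the event (perf_event_paranoid, seccomp,
// containers, or missing watchpoint support are common reasons).
// pid = 0, cpu = -1 means "this process, any CPU"; a system-wide watchpoint
// would use pid = -1 with a specific CPU and needs extra privileges.
template <typename Body>
long long count_user_writes(volatile int* addr, Body body) {
    perf_event_attr attr;
    std::memset(&attr, 0, sizeof attr);
    attr.type = PERF_TYPE_BREAKPOINT;
    attr.size = sizeof attr;
    attr.bp_type = HW_BREAKPOINT_W;          // trigger on writes only
    attr.bp_addr = reinterpret_cast<std::uintptr_t>(addr);
    attr.bp_len = HW_BREAKPOINT_LEN_4;       // watch 4 bytes (one int)
    attr.disabled = 1;                       // start disabled, enable below
    attr.exclude_kernel = 1;                 // user mode only: kernel writes are NOT counted
    attr.exclude_hv = 1;

    int fd = open_perf_event(&attr, 0, -1);
    if (fd < 0) return -1;
    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    body();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    long long count = -1;                    // default read format: one u64 counter
    if (read(fd, &count, sizeof count) != static_cast<ssize_t>(sizeof count)) count = -1;
    close(fd);
    return count;
}
```

Called as count_user_writes(&x, [&] { x = 1; }), it returns the number of watchpoint hits, or -1 when the event cannot be opened.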

Exceptions, C ABI, and APCs

  • Strong skepticism about C++ exceptions in systems code; multiple comments advocate avoiding them entirely, or tightly scoping them.
  • Key rule repeated: never let an exception propagate through C frames, i.e. out of callbacks invoked by C libraries (qsort comparators) or by the OS (APCs, signal handlers).
  • Discussion around noexcept and C ABI:
    • Idea: treat C and extern "C" functions as implicitly noexcept.
    • Proposals for compilers to warn when non-noexcept function pointers are passed to C/noexcept APIs.
    • Rust 1.81 change (aborting when a panic would unwind out of an extern "C" function) cited as a mitigation.
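
The "never throw across C frames" rule can be enforced with a catch-all firewall at the boundary. A minimal C++17 sketch, assuming a C-style iterator API with a void* context; c_for_each, for_each_checked, SumState, and the other names are hypothetical stand-ins, not a real library. The callback is noexcept, parks any exception in a std::exception_ptr, asks the C code to stop by returning nonzero, and the exception is rethrown only after the C frames have returned. Because noexcept is part of the function type since C++17, the wrapper's parameter type also rejects potentially-throwing function pointers at compile time, in the spirit of the proposed compiler warnings.

```cpp
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <type_traits>

// Hypothetical stand-in for a C library: calls cb for each item and stops
// early when cb returns nonzero. Exceptions must never unwind through it.
extern "C" int c_for_each(const int* items, std::size_t n,
                          int (*cb)(int item, void* ctx), void* ctx) {
    for (std::size_t i = 0; i < n; ++i) {
        if (int rc = cb(items[i], ctx)) return rc;
    }
    return 0;
}

// C++17: noexcept is part of the type, so only non-throwing callbacks fit.
using ItemCallback = int (*)(int item, void* ctx) noexcept;
static_assert(!std::is_convertible_v<int (*)(int, void*), ItemCallback>,
              "a potentially-throwing callback must not be accepted");

inline int for_each_checked(const int* items, std::size_t n, ItemCallback cb, void* ctx) {
    return c_for_each(items, n, cb, ctx);
}

// Ordinary C++ logic that may throw (hypothetical example).
long long checked_add(long long total, int item) {
    if (item < 0) throw std::invalid_argument("negative item");
    return total + item;
}

struct SumState {
    long long total = 0;
    std::exception_ptr error;  // exception captured inside the callback
};

// Firewall: catches everything so nothing unwinds into c_for_each.
int sum_callback(int item, void* ctx) noexcept {
    auto* st = static_cast<SumState*>(ctx);
    try {
        st->total = checked_add(st->total, item);
        return 0;                          // keep iterating
    } catch (...) {
        st->error = std::current_exception();
        return 1;                          // ask the C code to stop normally
    }
}

long long sum_items(const int* items, std::size_t n) {
    SumState st;
    for_each_checked(items, n, sum_callback, &st);
    if (st.error) std::rethrow_exception(st.error);  // C frames are gone now
    return st.total;
}
```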

Memory-Safe Languages and FFI

  • Disagreement on whether Rust/memory-safe languages “would have prevented” this:
    • One side: safe Rust can encode lifetimes and forbid passing stack buffers with too-short lifetimes.
    • Other side: once you cross into syscalls/FFI, the language can’t fully enforce kernel contracts; unsafe FFI remains a risk.
  • General consensus: safe wrappers help, but correctness hinges on accurately modeling OS API requirements.
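
C++ cannot check lifetimes the way safe Rust does, but a wrapper can at least make "the buffer must outlive the operation" part of the API. A minimal sketch under stated assumptions: start_fake_async_write is a hypothetical stand-in for an asynchronous OS call, simulated with a detached std::thread and a delay. Because it takes the buffer as std::shared_ptr, a pointer to a stack array cannot be passed at all, and the late write lands in memory the operation co-owns even if the caller has already returned or thrown.

```cpp
#include <array>
#include <chrono>
#include <future>
#include <memory>
#include <thread>
#include <utility>

using Buffer = std::array<int, 4>;

// Hypothetical async operation that finishes writing after it has returned.
// Taking shared_ptr<Buffer> instead of int* means the operation co-owns the
// buffer, so it stays alive until the write is done.
std::future<void> start_fake_async_write(std::shared_ptr<Buffer> buf, int value) {
    std::promise<void> done;
    std::future<void> completion = done.get_future();
    std::thread([buf = std::move(buf), value, done = std::move(done)]() mutable {
        std::this_thread::sleep_for(std::chrono::milliseconds(20));
        (*buf)[0] = value;   // the late "completion" write
        done.set_value();    // lets the caller observe completion
    }).detach();
    return completion;
}
```

The cost is one heap allocation per operation and the loss of the option to hand over stack memory, which is exactly the restriction the bug calls for.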

Win32 / OS API Design & Patterns

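  • APCs and alertable waits described as a powerful but dangerous mechanism, analogous to Unix signals but safer, since a user-mode APC only runs when the thread enters an alertable wait rather than at an arbitrary point.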
  • Criticism that documentation under-emphasizes “don’t throw / don’t unwind” from APCs.
  • Contrast drawn with Unix I/O:
    • Windows completion-style APIs (OVERLAPPED, IOCP) can keep using user pointers after the call returns, until the operation completes.
    • Traditional Unix syscalls rarely retain user pointers after returning; AIO and io_uring do.
  • Self-pipe / loopback-socket trick recognized as a standard, safer pattern for interrupting select().
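
A minimal POSIX sketch of the self-pipe trick in C++; the SelfPipe class is illustrative, not code from the thread. Another thread (or a signal handler, since write() is async-signal-safe) calls wake(), which writes one byte to a non-blocking pipe; select() then returns normally, so nothing has to unwind through the blocking call.

```cpp
#include <fcntl.h>
#include <sys/select.h>
#include <unistd.h>

#include <cerrno>
#include <stdexcept>

class SelfPipe {
public:
    SelfPipe() {
        if (pipe(fds_) != 0) throw std::runtime_error("pipe() failed");
        for (int fd : fds_) {  // non-blocking: wake() and drain() never stall
            int flags = fcntl(fd, F_GETFL);
            fcntl(fd, F_SETFL, flags | O_NONBLOCK);
        }
    }
    ~SelfPipe() { close(fds_[0]); close(fds_[1]); }
    SelfPipe(const SelfPipe&) = delete;
    SelfPipe& operator=(const SelfPipe&) = delete;

    // Callable from any thread. If the pipe is full, a wakeup is already
    // pending, so a failed (EAGAIN) write is harmless.
    void wake() noexcept {
        const char byte = 1;
        ssize_t n = write(fds_[1], &byte, 1);
        (void)n;
    }

    // Blocks until `fd` is readable or wake() is called.
    // Returns true if woken, false if `fd` became readable. Pass -1 to wait only for wake().
    bool wait_readable(int fd) {
        for (;;) {
            fd_set rd;
            FD_ZERO(&rd);
            FD_SET(fds_[0], &rd);
            int maxfd = fds_[0];
            if (fd >= 0) {
                FD_SET(fd, &rd);
                if (fd > maxfd) maxfd = fd;
            }
            int r = select(maxfd + 1, &rd, nullptr, nullptr, nullptr);
            if (r < 0) {
                if (errno == EINTR) continue;  // interrupted by a signal: retry
                throw std::runtime_error("select() failed");
            }
            if (FD_ISSET(fds_[0], &rd)) { drain(); return true; }
            return false;
        }
    }

private:
    // Empty the pipe so the next wait blocks again.
    void drain() noexcept {
        char buf[64];
        while (read(fds_[0], buf, sizeof buf) > 0) {}
    }
    int fds_[2];
};
```

On Windows the same idea needs a connected loopback socket pair in place of pipe(), since Winsock select() only accepts sockets.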

Lessons and Broader Takeaways

  • Don’t throw exceptions out of async callbacks, especially ones that run while a syscall is still in progress (such as APCs during an alertable wait).
  • Avoid stack-backed buffers for operations where the kernel may complete later or re-enter your code unexpectedly.
  • Design cancellation with explicit, synchronous semantics where possible.
  • Several anecdotes reinforce how small “clever” ideas can cause hugely expensive debugging sessions.
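
When a stack buffer is unavoidable, "explicit, synchronous cancellation" can be expressed as an RAII guard whose destructor requests cancellation and then blocks until the operation has stopped, so the frame cannot disappear while a write is pending, even during exception unwinding. A minimal C++ sketch; the asynchronous operation is simulated with std::thread and an atomic flag, and ScopedAsyncWrite and stack_buffer_then_throw are hypothetical names.

```cpp
#include <atomic>
#include <chrono>
#include <stdexcept>
#include <thread>

// Simulated async operation writing into a caller-provided buffer.
class ScopedAsyncWrite {
public:
    ScopedAsyncWrite(int* target, int value)
        : worker_([this, target, value] {
              // Roughly 50 ms of "work" unless cancelled.
              for (int i = 0; i < 50 && !cancel_.load(); ++i)
                  std::this_thread::sleep_for(std::chrono::milliseconds(1));
              // Even if cancel races with this check, the destructor joins
              // first, so the write still hits live memory.
              if (!cancel_.load()) *target = value;
          }) {}

    ~ScopedAsyncWrite() {
        cancel_.store(true);                     // 1. request cancellation
        if (worker_.joinable()) worker_.join();  // 2. wait until it has really stopped
    }

    ScopedAsyncWrite(const ScopedAsyncWrite&) = delete;
    ScopedAsyncWrite& operator=(const ScopedAsyncWrite&) = delete;

    // Wait for normal completion without cancelling.
    void wait() { if (worker_.joinable()) worker_.join(); }

private:
    std::atomic<bool> cancel_{false};  // declared before worker_ so it exists when the thread starts
    std::thread worker_;
};

// Hypothetical caller: stack buffer, then an exception while the write is pending.
void stack_buffer_then_throw() {
    int slot = 0;                          // stack-backed buffer
    ScopedAsyncWrite op(&slot, 2);
    throw std::runtime_error("bail out");  // ~ScopedAsyncWrite runs during unwinding and waits
}
```

The key design choice is that the destructor waits rather than merely signalling: a cancel request without a wait would reproduce the original bug, because the write could still arrive after the frame is gone.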