RIP pthread_cancel

pthread_cancel and async cancellation

  • Many commenters argue that pthread_cancel, especially in asynchronous mode, is fundamentally unsafe: a thread can be killed while holding locks or halfway through updating internal data structures, causing leaks, corruption, or deadlocks.
  • Asynchronous cancellation is acknowledged to be realistically safe only for pure, compute-bound loops that do not allocate, lock, or touch shared state, which is a very narrow use case.
  • Comparisons are made to other “kill a thread” primitives (Windows TerminateThread, old Java Thread.stop / destroy / suspend), which are widely regarded as design mistakes.
  • Some see pthread_cancel as useful in theory but too hard to use correctly in long‑running, resource‑managing code; others say it’s “never the answer” outside process shutdown.
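
To make the locking hazard concrete, here is a minimal sketch of what even the tamer deferred mode demands: a thread that can be cancelled while holding a mutex must register a cleanup handler, or the mutex stays locked forever. All names (demo_lock, locked_worker, cancel_demo) are hypothetical and exist only for illustration; this is not taken from the discussion.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t demo_cond = PTHREAD_COND_INITIALIZER;
static bool started = false;

/* Cleanup handler: runs if the thread is cancelled between push and pop. */
static void unlock_cb(void *m)
{
    int rc = pthread_mutex_unlock(m);
    assert(rc == 0);
    (void)rc;
}

static void *locked_worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&demo_lock);
    pthread_cleanup_push(unlock_cb, &demo_lock);
    started = true;
    pthread_cond_broadcast(&demo_cond);
    for (;;) {
        /* pthread_cond_wait is a cancellation point; on cancellation the
         * mutex is reacquired before the cleanup handlers run. */
        pthread_cond_wait(&demo_cond, &demo_lock);
    }
    pthread_cleanup_pop(1); /* never reached; pairs with the push above */
    return NULL;
}

/* Returns 0 if the worker was cancelled and the mutex was left unlocked. */
int cancel_demo(void)
{
    pthread_t t;
    void *res = NULL;

    if (pthread_create(&t, NULL, locked_worker, NULL) != 0)
        return -1;

    /* Wait until the worker holds the cleanup handler and is blocked. */
    pthread_mutex_lock(&demo_lock);
    while (!started)
        pthread_cond_wait(&demo_cond, &demo_lock);
    pthread_mutex_unlock(&demo_lock);

    if (pthread_cancel(t) != 0 || pthread_join(t, &res) != 0)
        return -1;
    if (res != PTHREAD_CANCELED)
        return -1;

    /* Without the cleanup handler this would fail with EBUSY. */
    if (pthread_mutex_trylock(&demo_lock) != 0)
        return -1;
    pthread_mutex_unlock(&demo_lock);
    return 0;
}
```

Every lock, allocation, and file descriptor held across a cancellation point needs a handler like this, and any library called in between has to do the same, which is roughly why commenters consider the mechanism too fragile for resource-managing code.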

Cooperative cancellation and alternative designs

  • Several participants prefer cooperative cancellation: inexpensive periodic checks of a shared “done” flag, with normal cleanup paths rather than abrupt termination.
  • There’s debate over the performance cost of inserting branches in hot loops; some claim it’s negligible when placed outside the innermost loop, others argue it disrupts tight compute kernels.
  • One commenter contrasts pthread_cancel with kernel‑style interruption: set a flag and have blocking operations return an error (e.g., EINTR/ECANCELED), letting existing error‑handling unwind state cleanly.
  • Coroutine libraries are cited as examples where cancellation simply makes blocking calls return immediately with a specific error code.
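
The cooperative approach described above can be sketched as follows. This is an illustrative example, not code from the thread: struct job, poll_worker, and run_and_stop are hypothetical names, and the chunk size is arbitrary. The flag is polled once per chunk in the outer loop, so the innermost loop carries no extra branch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical job descriptor shared by controller and worker. */
struct job {
    atomic_bool done;      /* set by the controller to request a stop */
    unsigned long chunks;  /* chunks completed; read only after join */
    bool cleaned_up;       /* set on the worker's normal exit path */
};

static void *poll_worker(void *arg)
{
    struct job *j = arg;
    volatile unsigned long sink = 0;

    /* The stop flag is checked once per chunk, outside the hot loop. */
    while (!atomic_load_explicit(&j->done, memory_order_acquire)) {
        for (unsigned long i = 0; i < 100000; i++)
            sink += i;     /* stand-in for real compute */
        j->chunks++;
    }

    /* Ordinary cleanup path: free buffers, release locks, and so on. */
    j->cleaned_up = true;
    return NULL;
}

/* Starts the worker, requests a stop, and joins. Returns 0 on success. */
int run_and_stop(struct job *j)
{
    pthread_t t;

    assert(j != NULL);
    atomic_init(&j->done, false);
    j->chunks = 0;
    j->cleaned_up = false;

    if (pthread_create(&t, NULL, poll_worker, j) != 0)
        return -1;

    atomic_store_explicit(&j->done, true, memory_order_release);
    return pthread_join(t, NULL);
}
```

The cost is latency: the worker only notices the request at the next chunk boundary, so the chunk size trades responsiveness against the branch overhead debated above. A worker blocked in a system call will not see the flag at all, which is where the "make blocking calls return an error" designs come in.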

Blocking DNS, getaddrinfo, and portability

  • Much discussion centers on getaddrinfo being blocking, uncancellable in practice, and entangled with system configuration (NSS, /etc/resolv.conf, gai.conf).
  • A variety of async DNS APIs exist (getaddrinfo_a on glibc, OpenBSD getaddrinfo_async, Windows GetAddrInfoEx*, platform‑specific mobile APIs), but they’re all non‑portable and inconsistently available.
  • This fragmentation explains why a cross‑platform library like libcurl struggled: relying on pthread_cancel around blocking getaddrinfo is brittle, but using every platform’s async DNS API or rolling a custom resolver is complex and may bypass system policies.
  • c‑ares is suggested as a dedicated async resolver, though commenters note its platform quirks (e.g., iOS prompts, Android VPN issues).
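
One portable pattern that avoids both pthread_cancel and platform-specific async APIs is to run getaddrinfo in a detached thread and simply stop waiting for it after a deadline. The sketch below is an assumption-laden illustration, not libcurl's implementation: struct lookup, resolver, and resolve_timeout are hypothetical names, and the choice of EAI_AGAIN to signal a timeout is arbitrary. It bounds the caller's wait only; the abandoned thread keeps running until getaddrinfo returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <netdb.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>

/* Hypothetical state shared by the caller and the resolver thread. */
struct lookup {
    pthread_mutex_t mu;
    pthread_cond_t cv;
    char *host;
    struct addrinfo hints;
    struct addrinfo *res;  /* result list, set when done and rc == 0 */
    int rc;                /* getaddrinfo return value */
    int done;              /* resolver has finished */
    int refs;              /* starts at 2: caller + resolver */
};

/* Drops one reference; whoever drops the last one frees everything. */
static void lookup_unref(struct lookup *l)
{
    pthread_mutex_lock(&l->mu);
    int left = --l->refs;
    pthread_mutex_unlock(&l->mu);
    assert(left >= 0);
    if (left == 0) {
        if (l->res)
            freeaddrinfo(l->res);
        pthread_cond_destroy(&l->cv);
        pthread_mutex_destroy(&l->mu);
        free(l->host);
        free(l);
    }
}

static void *resolver(void *arg)
{
    struct lookup *l = arg;
    struct addrinfo *res = NULL;
    int rc = getaddrinfo(l->host, NULL, &l->hints, &res); /* may block */

    pthread_mutex_lock(&l->mu);
    l->rc = rc;
    l->res = res;
    l->done = 1;
    pthread_cond_signal(&l->cv);
    pthread_mutex_unlock(&l->mu);
    lookup_unref(l);
    return NULL;
}

/* Waits at most timeout_ms. Returns 0 and sets *out (caller frees with
 * freeaddrinfo), a getaddrinfo error code, or EAI_AGAIN on timeout. */
int resolve_timeout(const char *host, int flags, int timeout_ms,
                    struct addrinfo **out)
{
    struct lookup *l = calloc(1, sizeof *l);
    if (!l)
        return EAI_MEMORY;
    l->host = strdup(host);
    if (!l->host) { free(l); return EAI_MEMORY; }
    pthread_mutex_init(&l->mu, NULL);
    pthread_cond_init(&l->cv, NULL);
    l->hints.ai_socktype = SOCK_STREAM;
    l->hints.ai_flags = flags;
    l->refs = 2;

    pthread_t t;
    if (pthread_create(&t, NULL, resolver, l) != 0) {
        l->refs = 1;
        lookup_unref(l);
        return EAI_SYSTEM;
    }
    pthread_detach(t);

    struct timespec dl;
    clock_gettime(CLOCK_REALTIME, &dl);
    dl.tv_sec += timeout_ms / 1000;
    dl.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (dl.tv_nsec >= 1000000000L) { dl.tv_sec++; dl.tv_nsec -= 1000000000L; }

    int rc = EAI_AGAIN, w = 0;
    pthread_mutex_lock(&l->mu);
    while (!l->done && w == 0)
        w = pthread_cond_timedwait(&l->cv, &l->mu, &dl);
    if (l->done) {
        rc = l->rc;
        *out = l->res;
        l->res = NULL;     /* ownership moves to the caller */
    }
    pthread_mutex_unlock(&l->mu);
    lookup_unref(l);
    return rc;
}
```

A call such as resolve_timeout("127.0.0.1", AI_NUMERICHOST, 2000, &ai) returns without touching the network. The trade-off is that stuck lookups accumulate as idle threads instead of being killed, and a process that forks or exits while they are still running inherits the thread and fork complications raised elsewhere in the discussion.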

Libc, POSIX, and system design questions

  • There’s disagreement over libc’s responsibility: should it spawn background threads, cache configuration, or provide async DNS and timeouts, or is that beyond its remit?
  • POSIX’s conservatism and lack of a standardized non‑blocking DNS API are seen as root causes; some call for deprecating “broken” blocking APIs or standardizing async lookups.
  • The interaction of threads, fork, and background DNS threads is highlighted as another source of complexity.