RIP pthread_cancel
pthread_cancel and async cancellation
- Many commenters argue that pthread_cancel, especially asynchronous cancellation, is fundamentally unsafe: a thread can be killed while holding locks or manipulating internal data structures, causing leaks, corruption, or deadlocks.
- Asynchronous cancellation is acknowledged as only realistically safe for pure, compute‑bound loops that don’t allocate, lock, or touch shared state, which is a very narrow use case.
- Comparisons are made to other “kill a thread” primitives (Windows TerminateThread, the old Java Thread.stop/destroy/suspend), which are widely regarded as design mistakes.
- Some see pthread_cancel as useful in theory but too hard to use correctly in long‑running, resource‑managing code; others say it’s “never the answer” outside process shutdown.
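To make the "killed while holding locks" concern concrete, here is a minimal sketch of what deferred cancellation asks of a thread that holds a mutex. The names state_lock, unlock_on_cancel, locked_worker, and run_cancel_demo are hypothetical and only illustrate the pattern; the sketch assumes the default deferred cancellation type and a POSIX threads implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical shared lock, standing in for any resource a worker holds. */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;

/* Cleanup handler: runs if the thread is cancelled while the lock is held. */
static void unlock_on_cancel(void *arg)
{
    pthread_mutex_unlock(arg);
}

static void *locked_worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&state_lock);
    /* Every resource acquired before a cancellation point needs a handler. */
    pthread_cleanup_push(unlock_on_cancel, &state_lock);
    for (;;) {
        /* ... work on shared state would go here ... */
        pthread_testcancel();   /* explicit cancellation point */
    }
    pthread_cleanup_pop(1);     /* never reached; required to pair with push */
    return NULL;
}

/* Returns 0 if the worker was cancelled and the lock was released. */
static int run_cancel_demo(void)
{
    pthread_t t;
    void *res = NULL;

    if (pthread_create(&t, NULL, locked_worker, NULL) != 0)
        return -1;
    if (pthread_cancel(t) != 0)
        return -1;
    if (pthread_join(t, &res) != 0)
        return -1;
    assert(res == PTHREAD_CANCELED);

    /* The handler released the lock, so it can be taken again here. */
    if (pthread_mutex_trylock(&state_lock) != 0)
        return -2;
    pthread_mutex_unlock(&state_lock);
    return 0;
}
```

Without the pthread_cleanup_push call, the mutex would stay locked after the worker is cancelled and the trylock would be expected to fail. With asynchronous cancellation, the thread could also be stopped at points where no handler has been registered yet, which is the narrow-use-case problem the commenters describe.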
Cooperative cancellation and alternative designs
- Several participants prefer cooperative cancellation: inexpensive periodic checks of a shared “done” flag, with normal cleanup paths rather than abrupt termination.
- There’s debate over the performance cost of inserting branches in hot loops; some claim it’s negligible when placed outside the innermost loop, others argue it disrupts tight compute kernels.
- One commenter contrasts pthread_cancel with kernel‑style interruption: set a flag and have blocking operations return an error (e.g., EINTR/ECANCELED), letting existing error‑handling unwind state cleanly.
- Coroutine libraries are cited as examples where cancellation simply makes blocking calls return immediately with a specific error code.
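The cooperative approach described above can be sketched roughly as follows. This is an illustration, not code from the discussion: struct job, worker, run_cooperative_demo, and the CHUNK size are all hypothetical. The "done" flag is tested once per chunk, outside the innermost loop, which is the placement some commenters consider cheap enough.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for nanosleep under strict -std modes */
#endif
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical job descriptor; the names are illustrative only. */
struct job {
    atomic_bool done;     /* set by the controller to request a stop */
    atomic_ulong chunks;  /* chunks of work completed so far */
    bool cleaned_up;      /* written on the worker's normal exit path */
};

enum { CHUNK = 4096 };    /* assumed chunk size between flag checks */

static void *worker(void *arg)
{
    struct job *j = arg;
    volatile unsigned long sink = 0;

    assert(j != NULL);
    /* One flag check per chunk; the inner loop stays branch-free. */
    while (!atomic_load_explicit(&j->done, memory_order_acquire)) {
        for (int i = 0; i < CHUNK; i++)
            sink += (unsigned long)i;
        atomic_fetch_add_explicit(&j->chunks, 1, memory_order_relaxed);
    }
    /* Normal cleanup path: unlock, free, and close resources here. */
    j->cleaned_up = true;
    return NULL;
}

/* Starts the worker, requests a stop after about 10 ms, and joins it. */
static int run_cooperative_demo(struct job *j)
{
    pthread_t t;
    struct timespec pause = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };

    atomic_init(&j->done, false);
    atomic_init(&j->chunks, 0);
    j->cleaned_up = false;

    if (pthread_create(&t, NULL, worker, j) != 0)
        return -1;
    nanosleep(&pause, NULL);
    atomic_store_explicit(&j->done, true, memory_order_release);
    return pthread_join(t, NULL) == 0 ? 0 : -1;
}
```

The worker always leaves through its ordinary return path, so whatever cleanup the function already performs runs unchanged. The trade-off is latency: the stop request is only noticed at the next chunk boundary, and it is never noticed while the thread sits inside a blocking call.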
Blocking DNS, getaddrinfo, and portability
- Much discussion centers on getaddrinfo being blocking, uncancellable in practice, and entangled with system configuration (NSS, /etc/resolv.conf, gai.conf).
- A variety of async DNS APIs exist (getaddrinfo_a on glibc, OpenBSD getaddrinfo_async, Windows GetAddrInfoEx*, platform‑specific mobile APIs), but they’re all non‑portable and inconsistently available.
- This fragmentation explains why a cross‑platform library like libcurl struggled: relying on pthread_cancel around blocking getaddrinfo is brittle, but using every platform’s async DNS API or rolling a custom resolver is complex and may bypass system policies.
- c‑ares is suggested as a dedicated async resolver, but its platform quirks (e.g., iOS prompts, Android VPN issues) are noted.
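As an illustration of one of the non-portable options listed above, here is a rough sketch of a lookup with a deadline using glibc's getaddrinfo_a. It assumes glibc (older versions need linking with -lanl), and the helper name resolve_with_timeout and its parameters are hypothetical. It is not how libcurl or any other library does it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE               /* getaddrinfo_a is a GNU extension */
#endif
#include <assert.h>
#include <netdb.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>

/* Hypothetical helper: returns 0 and sets *out on success, else an EAI_* code. */
static int resolve_with_timeout(const char *host, const struct addrinfo *hints,
                                int timeout_ms, struct addrinfo **out)
{
    struct gaicb req;
    struct gaicb *list[1] = { &req };
    struct timespec ts = {
        .tv_sec = timeout_ms / 1000,
        .tv_nsec = (timeout_ms % 1000) * 1000000L,
    };
    int rc;

    assert(out != NULL);
    memset(&req, 0, sizeof req);
    req.ar_name = host;
    req.ar_request = hints;

    /* Queue the lookup; glibc runs it on an internal helper thread. */
    rc = getaddrinfo_a(GAI_NOWAIT, list, 1, NULL);
    if (rc != 0)
        return rc;

    /* Wait up to the deadline; the request state is checked below. */
    gai_suspend((const struct gaicb *const *)list, 1, &ts);

    if (gai_error(&req) == EAI_INPROGRESS &&
        gai_cancel(&req) == EAI_NOTCANCELED) {
        /* The lookup is already running and cannot be stopped. Because req
         * lives on this stack frame, the only safe option is to wait. */
        while (gai_error(&req) == EAI_INPROGRESS)
            gai_suspend((const struct gaicb *const *)list, 1, NULL);
    }

    rc = gai_error(&req);
    if (rc == 0)
        *out = req.ar_result;     /* caller releases with freeaddrinfo() */
    return rc;
}
```

The fallback branch shows why the deadline is only best-effort: once the underlying lookup has started, gai_cancel may refuse to cancel it, so the caller either waits anyway or must keep the request alive on the heap until it finishes. That matches the "uncancellable in practice" complaint in the thread.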
Libc, POSIX, and system design questions
- There’s disagreement over libc’s responsibility: should it spawn background threads, cache configuration, or provide async DNS and timeouts, or is that beyond its remit?
- POSIX’s conservatism and lack of a standardized non‑blocking DNS API are seen as root causes; some call for deprecating “broken” blocking APIs or standardizing async lookups.
- The interaction of threads, fork, and background DNS threads is highlighted as another source of complexity.