Make your program slower with threads (2014)
Threads and Performance Pitfalls
- Many commenters echoed the “rite of passage” of assuming more threads ⇒ more speed, then seeing slowdowns instead.
- Common reasons: lock contention, hidden shared state in libraries (e.g., random number generators, allocators, stdio), and oversubscription (more threads than cores).
- Anecdotes: early multithreading on single‑CPU systems (e.g., OS/2) led to crashes and mysterious bugs until code was simplified back to a single loop.
- Another case: futex and user‑space spinlock congestion (e.g., in a custom allocator) caused an orders‑of‑magnitude slowdown when many threads pushed to vectors.
Libc, Global State, and RNG Contention
- A central theme: libc functions often rely on global or shared state and were not originally designed for heavy multithreading.
random()/rand()use global state and locks, causing severe contention in multithreaded code. Thread‑safe variants (random_r, per‑thread RNGs) are recommended.- Some argue this is a design flaw; others counter it was reasonable in the original single‑threaded context but is problematic today.
- Similar global-state issues exist with stdio, allocators, and
errno(though some implementations makeerrnothread‑local).
PRNG Design and Parallelism
- Consensus: parallel code should avoid multiple threads sharing a single PRNG state.
- Suggested patterns: per‑thread PRNGs with distinct, uncorrelated seeds, sometimes derived from a CSPRNG or OS entropy; or counter‑based RNGs for deterministic splitting.
- Debate over whether “PRNGs need global state”: one side says any single PRNG must track shared position in its sequence; others emphasize that sharding state (per thread / per object) is usually better.
- C’s requirement that omitting
srandbe equivalent tosrand(1)is seen as blocking some optimizations that other languages (e.g., Go) use with thread‑local RNGs.
Alternatives, Languages, and Libraries
- Examples praised: Go’s dual RNG behavior (global vs thread‑local), Zig’s thread‑local RNG, Java’s
ThreadLocalRandom, Rust’s emphasis on data‑race freedom. - Some suggest using alternative libcs (e.g., musl) or custom libraries rather than default libc.
Broader Reflections: Optimization, Types, and Abstractions
- The classic “premature optimization” advice is debated: some say modern code is so inefficient everywhere that hotspots barely exist; others insist profiling should still drive optimization.
- Several comments lament “leaky abstractions”: a call like “give me a random number” hides global state, locks, and determinism trade‑offs.
- There is interest in richer type systems and effects/capability systems (monads, algebraic effects, purity) to track side effects (IO, allocation, global state) and make such issues more visible.