2024-12-12

What every systems programmer should know about concurrency (2020) [pdf]

Difficulty of Concurrency (Especially in C++)

Several commenters report that multithreading in C/C++ quickly leads to crashes, segfaults, and elusive bugs.
Others push back, saying it’s “challenging but manageable” if you stay at higher-level primitives (mutexes, condition variables, semaphores) and follow well-known patterns.
There’s broad agreement that concurrency is inherently hard due to CPU/memory behavior; C/C++ expose that complexity with few safety nets.
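As a rough illustration of the "stay at higher-level primitives" position, here is a minimal sketch of a blocking queue built from only a mutex and a condition variable. The names `BlockingQueue` and `sum_through_queue` are hypothetical and do not come from the thread or the paper; the sketch assumes C++17 for `std::optional`.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical example type: an unbounded queue guarded by a single mutex.
template <typename T>
class BlockingQueue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(value));
        }
        ready_.notify_one();  // wake one waiting consumer
    }

    // Marks the queue closed; waiting consumers wake up and drain what is left.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until an item is available; returns nullopt once closed and empty.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        // The predicate form of wait() re-checks the condition after every
        // wakeup, which handles spurious wakeups.
        ready_.wait(lock, [this] { return !items_.empty() || closed_; });
        if (items_.empty()) return std::nullopt;
        T value = std::move(items_.front());
        items_.pop();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
    bool closed_ = false;
};

// Sums 1..n by sending each value from a producer thread to a consumer thread.
long sum_through_queue(int n) {
    BlockingQueue<int> queue;
    long total = 0;  // written only by the consumer, read after join()
    std::thread consumer([&] {
        while (auto item = queue.pop()) total += *item;
    });
    std::thread producer([&] {
        for (int i = 1; i <= n; ++i) queue.push(i);
        queue.close();
    });
    producer.join();
    consumer.join();
    return total;
}
```

All shared state is touched only while the mutex is held, so no explicit memory orderings appear anywhere. That is the main appeal of this style, at the cost of some contention under heavy load.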

Language Choices: C++, Rust, and Others

One side argues C++ is “just a thin layer” over hardware, while Rust/VM languages make deliberate tradeoffs to enforce safer sharing.
Critics of Rust claim its ownership model pushes designs toward shared ownership (e.g., shared pointers), hurting performance in multicore-heavy code.
Others counter that Rust forces you to model actual ownership correctly and that poor designs are a user problem, not a language problem.
There’s debate over how common C++ really is in “systems programming”: some see it as central; others claim C and shell/Python/Java are more prevalent, calling C++ niche in that space.

Concurrency Primitives, Memory Models, and “Lock-Free”

There is strong interest in low-level primitives: atomics, mutexes, condition variables, memory orderings, and barriers.
“Lock-free” is called an overloaded and often misleading term; commenters distinguish three approaches:
- Mutual exclusion (locks)
- Optimistic/lock-free algorithms with retries
- Partitioned structures (e.g., ring buffers/message queues) that avoid interference
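To make the "optimistic with retries" category concrete, the following is a small sketch of a compare-and-swap retry loop that maintains a shared maximum. The helper names `atomic_store_max` and `max_across_threads` are hypothetical. Relaxed ordering is assumed to be sufficient here only because no other data is published through the atomic.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Raises `target` to at least `candidate` using an optimistic retry loop.
void atomic_store_max(std::atomic<int>& target, int candidate) {
    int observed = target.load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak writes the current value back into
    // `observed`, so each retry compares against the latest competing write.
    while (observed < candidate &&
           !target.compare_exchange_weak(observed, candidate,
                                         std::memory_order_relaxed)) {
        // Retry: another thread changed the value (or the weak CAS failed spuriously).
    }
}

// Each worker t offers the values t*per_thread .. t*per_thread + per_thread - 1.
int max_across_threads(int thread_count, int per_thread) {
    std::atomic<int> maximum{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < thread_count; ++t) {
        workers.emplace_back([&maximum, t, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                atomic_store_max(maximum, t * per_thread + i);
            }
        });
    }
    for (auto& worker : workers) worker.join();
    return maximum.load();
}
```

No thread ever blocks another, but a thread can lose the race and repeat its work, which is the tradeoff the second bullet describes.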
Many recommend high-level designs: message-passing queues, actor-like models, and process-level parallelism before custom lock-free structures.
The C/C++ memory model (acquire/release, sequential consistency, etc.) is described as complex and arguably designed more for compiler writers than algorithm designers.
Some advise using only sequentially consistent atomics via well-tested data structures; others argue that making SC the default hides deeper ordering issues and can be needlessly slow.
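For readers unfamiliar with the acquire/release vocabulary, here is a minimal sketch of the usual "publish a value through a flag" pattern. The function name `publish_and_read` is hypothetical; the example only illustrates the orderings and is not a recommendation from the thread.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Passes `value` from one thread to another through a plain int,
// using an atomic flag to order the accesses.
int publish_and_read(int value) {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready{false};  // flag that publishes the payload
    int seen = -1;

    std::thread producer([&] {
        payload = value;
        // Release: the write to payload above cannot be reordered after this store.
        ready.store(true, std::memory_order_release);
    });
    std::thread consumer([&] {
        // Acquire: once this load observes true, the payload write is visible.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Replacing both orderings with the default `std::memory_order_seq_cst` would also be correct here. Using `std::memory_order_relaxed` on both sides would not, because the read of `payload` would then be a data race.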

Learning, Tooling, and Education

Formal CS education is seen by some as very helpful for understanding concurrency, but others emphasize that disciplined self-study can suffice.
Suggested learning path: understand loads/stores, pipelines, consistency models, cache coherence, then atomics, locks, and barriers, and only then language-level constructs.
Debugging atomics and memory-ordering bugs is reported as extremely time-consuming; suggested aids include model-checking tools, exhaustive or randomized interleaving testers, and “rubber duck” explanations.
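The idea behind an exhaustive interleaving tester can be sketched with a toy model: two simulated threads each perform a non-atomic increment (a load followed by a store), and a recursive search visits every possible schedule. This is a hypothetical illustration, not a real tool; it assumes a sequentially consistent model and uses invented names (`State`, `explore`, `possible_final_counts`).

```cpp
#include <cassert>
#include <set>

// Model of two threads that each run: local = shared; shared = local + 1;
struct State {
    int shared = 0;
    int local[2] = {0, 0};
    int pc[2] = {0, 0};  // next step per thread: 0 = load, 1 = store, 2 = finished
};

// Tries every runnable thread at each point, so all interleavings are visited.
void explore(State state, std::set<int>& finals) {
    bool all_finished = true;
    for (int t = 0; t < 2; ++t) {
        if (state.pc[t] == 2) continue;
        all_finished = false;
        State next = state;
        if (next.pc[t] == 0) {
            next.local[t] = next.shared;      // load step
        } else {
            next.shared = next.local[t] + 1;  // store step
        }
        ++next.pc[t];
        explore(next, finals);
    }
    if (all_finished) finals.insert(state.shared);
}

// Returns every final value of `shared` reachable under some schedule.
std::set<int> possible_final_counts() {
    std::set<int> finals;
    explore(State{}, finals);
    return finals;
}
```

The result contains both 2 (the increments ran one after the other) and 1 (both threads loaded before either stored), so the search exposes the lost update without relying on timing. Real tools apply the same idea to actual code and to weaker memory models.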

Formats and Miscellaneous

For reading the paper, people share a LaTeX source repo and show how to convert it to EPUB for Kindle; multi-column PDFs are considered painful on small devices.
A brief side thread revisits the Parallella “supercomputer” board and notes its effective end-of-life and the team’s shift to ASIC work.
