What every systems programmer should know about concurrency (2020) [pdf]

Difficulty of Concurrency (Especially in C++)

  • Several commenters report that multithreaded C/C++ code quickly leads to crashes (segfaults) and to elusive, hard-to-reproduce bugs.
  • Others push back, saying it’s “challenging but manageable” if you stick to higher-level primitives (mutexes, condition variables, semaphores) and follow well-known patterns.
  • There’s broad agreement that concurrency is inherently hard because of how CPUs and memory systems reorder and cache operations; C/C++ expose that complexity with few safety nets.
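
As an illustration of the “stick to higher-level primitives” approach, here is a minimal blocking queue built only from std::mutex and std::condition_variable. The names BlockingQueue and sum_via_queue are hypothetical, and the 0-sentinel shutdown is an assumption chosen to keep the sketch short. A production queue would more likely offer an explicit close operation.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>

// Blocking FIFO queue using only a mutex and a condition variable.
template <typename T>
class BlockingQueue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mu_);  // all access to items_ happens under mu_
            items_.push_back(std::move(value));
        }
        cv_.notify_one();  // wake one waiting consumer, after releasing the lock
    }

    // Blocks until an item is available, then removes and returns it.
    T pop() {
        std::unique_lock<std::mutex> lock(mu_);
        // The predicate is re-checked after every wakeup, so spurious wakeups are harmless.
        cv_.wait(lock, [this] { return !items_.empty(); });
        T value = std::move(items_.front());
        items_.pop_front();
        return value;
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::deque<T> items_;
};

// A producer thread sends 1..n and then a 0 sentinel; the calling thread consumes and sums.
long sum_via_queue(int n) {
    assert(n >= 0);  // payload values 1..n never collide with the 0 sentinel
    BlockingQueue<int> q;
    std::thread producer([&q, n] {
        for (int i = 1; i <= n; ++i) q.push(i);
        q.push(0);  // sentinel: no more items
    });
    long total = 0;
    for (int v = q.pop(); v != 0; v = q.pop()) total += v;
    producer.join();
    return total;
}
```

Calling sum_via_queue(100) should return 5050. The pattern stays manageable because shared state is only touched under the lock, and the consumer waits on a predicate rather than assuming a wakeup means data is ready.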

Language Choices: C++, Rust, and Others

  • One side argues C++ is “just a thin layer” over hardware, while Rust/VM languages make deliberate tradeoffs to enforce safer sharing.
  • Critics of Rust claim its ownership model pushes designs toward shared ownership (e.g., shared pointers), hurting performance in multicore-heavy code.
  • Others counter that Rust forces you to model actual ownership correctly and that poor designs are a user problem, not a language problem.
  • There’s debate over how common C++ really is in “systems programming”: some see it as central; others claim C and shell/Python/Java are more prevalent, calling C++ niche in that space.

Concurrency Primitives, Memory Models, and “Lock-Free”

  • Strong interest in low-level primitives: atomics, mutexes, condition variables, memory orderings, and barriers.
  • “Lock-free” is called an overloaded and often misleading term; commenters distinguish:
    • Mutual exclusion (locks)
    • Optimistic/lock-free algorithms that retry on conflict
    • Partitioned structures (e.g., ring buffers/message queues) that keep threads from writing the same data, so they don’t interfere
  • Many recommend reaching first for high-level designs (message-passing queues, actor-like models, process-level parallelism) and only then for custom lock-free structures.
  • The C/C++ memory model (acquire/release, sequential consistency, etc.) is described as complex and arguably designed more for compiler writers than algorithm designers.
  • Some advise using only sequentially consistent atomics via well-tested data structures; others argue that making SC the default hides deeper ordering issues and can be needlessly slow.
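
To make the retry-based and partitioned categories concrete, the sketch below shows both with standard C++ atomics. atomic_max is an optimistic compare-and-swap retry loop. SpscRing is a single-producer/single-consumer ring buffer in which each index has exactly one writer, synchronized with explicit acquire/release orderings instead of the sequentially consistent default. All names are hypothetical. The ring is only correct with exactly one producer thread and one consumer thread, and it is not tuned (no cache-line padding; one slot is left empty to tell “full” from “empty”).

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Optimistic (lock-free) update: read, compute, try to install, retry on conflict.
// On failure compare_exchange_weak reloads `current`, so the loop re-checks it.
void atomic_max(std::atomic<int>& target, int value) {
    int current = target.load(std::memory_order_relaxed);
    while (current < value &&
           !target.compare_exchange_weak(current, value, std::memory_order_relaxed)) {
        // another thread changed target (or a spurious failure occurred); retry
    }
}

template <typename T, std::size_t N>
class SpscRing {
    static_assert(N >= 2, "one slot is always left empty");
public:
    // Producer only. Returns false if the ring is full.
    bool try_push(const T& v) {
        const std::size_t t = tail_.load(std::memory_order_relaxed);  // only we write tail_
        const std::size_t next = (t + 1) % N;
        if (next == head_.load(std::memory_order_acquire)) return false;  // full
        buf_[t] = v;
        tail_.store(next, std::memory_order_release);  // publishes the write to buf_[t]
        return true;
    }
    // Consumer only. Returns false if the ring is empty.
    bool try_pop(T& out) {
        const std::size_t h = head_.load(std::memory_order_relaxed);  // only we write head_
        if (h == tail_.load(std::memory_order_acquire)) return false;  // empty
        out = buf_[h];
        head_.store((h + 1) % N, std::memory_order_release);  // slot may now be reused
        return true;
    }
private:
    std::array<T, N> buf_{};
    std::atomic<std::size_t> head_{0};  // next slot to read, written only by the consumer
    std::atomic<std::size_t> tail_{0};  // next slot to write, written only by the producer
};

// Moves 1..n from a producer thread to the calling (consumer) thread and sums them.
long ring_transfer_sum(int n) {
    assert(n >= 0);
    SpscRing<int, 8> ring;
    std::thread producer([&ring, n] {
        for (int i = 1; i <= n; ++i)
            while (!ring.try_push(i)) std::this_thread::yield();  // full: wait for consumer
    });
    long total = 0;
    for (int received = 0; received < n;) {
        int v = 0;
        if (ring.try_pop(v)) { total += v; ++received; }
        else std::this_thread::yield();  // empty: wait for producer
    }
    producer.join();
    return total;
}

// Several threads race to raise a shared maximum; the CAS loop never loses an update.
int parallel_max(int threads) {
    assert(threads > 0);
    std::atomic<int> best{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&best, t] {
            for (int i = 0; i < 1000; ++i) atomic_max(best, t * 1000 + i);
        });
    for (auto& th : pool) th.join();
    return best.load();
}
```

parallel_max(4) should return 3999 regardless of scheduling, and ring_transfer_sum(n) should return n(n+1)/2. Spelling out each release/acquire pair, rather than relying on seq_cst, is the tradeoff debated above: it documents which store publishes data to which load, but a single wrong ordering silently reintroduces a race.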

Learning, Tooling, and Education

  • Formal CS education is seen by some as very helpful for understanding concurrency, but others emphasize that disciplined self-study can suffice.
  • Suggested learning path: understand loads/stores, pipelines, consistency models, cache coherence, then atomics, locks, and barriers, and only then language-level constructs.
  • Debugging atomicity and memory-ordering bugs is reported as extremely time-consuming; suggested aids include model-checking tools, exhaustive or randomized interleaving testers, and “rubber duck” explanation.
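
The idea behind an exhaustive interleaving tester can be shown with a toy. Model two threads that each increment a shared counter, enumerate every order in which their steps can run, and collect the possible final values. This is a hypothetical, hand-rolled explorer (State, explore, and increment_outcomes are illustrative names). It only considers sequentially consistent interleavings, whereas real model checkers for C/C++ also explore the reorderings that weaker memory orderings permit.

```cpp
#include <cassert>
#include <set>

// Toy model of two threads, each incrementing a shared int x.
// Split increment:  step 0: r = x;   step 1: x = r + 1;
// Atomic increment: one indivisible step, x = x + 1.
struct State {
    int x = 0;
    int reg[2] = {0, 0};  // each thread's private register r
    int pc[2] = {0, 0};   // index of each thread's next step
};

// Depth-first search over every interleaving; records x once both threads finish.
void explore(const State& s, bool atomic_rmw, std::set<int>& finals) {
    const int steps = atomic_rmw ? 1 : 2;
    bool any_runnable = false;
    for (int t = 0; t < 2; ++t) {
        if (s.pc[t] >= steps) continue;  // thread t has finished
        any_runnable = true;
        State next = s;  // branch: copy the state, then run one step of thread t
        if (atomic_rmw)           next.x += 1;
        else if (next.pc[t] == 0) next.reg[t] = next.x;
        else                      next.x = next.reg[t] + 1;
        ++next.pc[t];
        explore(next, atomic_rmw, finals);
    }
    if (!any_runnable) finals.insert(s.x);
}

std::set<int> increment_outcomes(bool atomic_rmw) {
    std::set<int> finals;
    explore(State{}, atomic_rmw, finals);
    assert(!finals.empty());  // at least one complete execution was explored
    return finals;
}
```

With the split load/store increment, the explorer finds the outcomes {1, 2}, where 1 is the lost update; with the increment modeled as one indivisible read-modify-write, only 2 remains.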

Formats and Miscellaneous

  • For reading the paper, people share a LaTeX source repo and show how to convert it to EPUB for Kindle; multi-column PDFs are considered painful on small devices.
  • A brief side thread revisits the Parallella “supercomputer” board and notes its effective end-of-life and the team’s shift to ASIC work.