How much memory do you need in 2024 to run 1M concurrent tasks?
Overall reactions
- Many are surprised that Node.js ranks very well on memory and that C# / .NET NativeAOT looks exceptionally efficient.
- Rust’s async state-machine approach is widely praised for very low per-task overhead.
- Go’s relatively high memory usage contradicts some expectations of it being “lightweight”.
Go’s goroutines and memory
- Go gives each goroutine an initial stack (commonly ~2 KB). For 1M goroutines that is about 2 GB of stack alone, which largely explains the ~2.5–3 GB usage.
- Some argue this is “unfair” because that stack is intended for real work the program would normally do, not empty sleeps.
- Others counter that, regardless of intent, this memory is not available to other services while allocated.
- There is debate over virtual vs physical memory: some note that unused pages may be swapped out and not actually consume RAM, while others focus on what the OS reserves.
Benchmark design & fairness
- Core criticism: 1M tasks that only sleep is an extreme, synthetic case; real tasks would have their own data and work that dominate overhead.
- Several commenters say the comparison mixes different abstractions:
  - Node, async Rust, and C# use timer-based, stackless state machines.
  - Go, Java virtual threads, and BEAM-like systems use stackful coroutines or processes.
- For Go, more comparable code using timers (time.AfterFunc or time.NewTimer) dramatically reduces memory, closer to Rust’s numbers.
- Similar concerns arise for Java (ArrayList resizing, choice of structures) and Elixir (using Task adds supervision overhead).
Rust/Tokio semantics confusion
- There is a subthread about why the Rust appendix example appears non-concurrent but still finishes quickly.
- Clarification: tokio::time::sleep records its deadline when called, not when first awaited, so many tasks effectively share the same wake-up time.
Node.js, concurrency, and parallelism
- Some note Node’s example is mostly a single-threaded event loop scheduling timers, not true parallel work.
- Discussion distinguishes concurrency (many in-flight tasks) from parallelism (running on multiple cores); Node scores on memory but would differ on CPU-bound work.
Real-world relevance and alternatives
- Multiple commenters stress that microbenchmarks like this are educational but not directly predictive for real systems.
- Suggestions:
  - Add minimal real work (I/O, JSON parsing, CPU loops).
  - Measure both memory and time.
  - Include more runtimes (Erlang/Elixir, C with pthreads, processes, Deno/Bun, different Python/JS runtimes).