2024-11-29

How much memory do you need in 2024 to run 1M concurrent tasks?

Overall reactions

Many are surprised that Node.js ranks very well on memory and that C# / .NET NativeAOT looks exceptionally efficient.
Rust’s async state-machine approach is widely praised for very low per-task overhead.
Go’s relatively high memory usage contradicts some commenters’ expectation that goroutines are “lightweight”.

Go’s goroutines and memory

Go gives each goroutine its own initial stack (commonly ~2 KB), so 1M goroutines need roughly 2 GB for stacks alone, which largely explains the ~2.5–3 GB usage for 1M tasks.
Some argue the comparison is “unfair” because that stack is intended for the real work a program would normally do, not for empty sleeps.
Others counter that, regardless of intent, this memory is not available to other services while allocated.
There is debate over virtual vs physical memory: some note that unused pages may be swapped out and not actually consume RAM, while others focus on what the OS reserves.
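The per-goroutine stack cost discussed above can be checked on a small scale. The following is a minimal sketch, not the benchmark's code: the helper name stackBytesPerGoroutine and the task count are made up for illustration, and it only looks at the runtime's StackInuse counter, so goroutine descriptors, heap data and OS-level virtual vs physical accounting are not covered.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// stackBytesPerGoroutine (hypothetical helper) parks n goroutines and reports
// the average growth of stack memory in use per goroutine, as seen by the
// Go runtime. It does not measure what the OS has actually committed.
func stackBytesPerGoroutine(n int) uint64 {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)

	release := make(chan struct{})
	var started, done sync.WaitGroup
	started.Add(n)
	done.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer done.Done()
			started.Done()
			<-release // stay parked, like a task that only sleeps
		}()
	}
	started.Wait()
	runtime.ReadMemStats(&after)

	// Let all goroutines finish before returning.
	close(release)
	done.Wait()

	if after.StackInuse <= before.StackInuse {
		return 0
	}
	return (after.StackInuse - before.StackInuse) / uint64(n)
}

func main() {
	const n = 100_000 // assumed sample size, smaller than the 1M in the benchmark
	perTask := stackBytesPerGoroutine(n)
	fmt.Printf("stack bytes per parked goroutine: %d\n", perTask)
	fmt.Printf("extrapolated to 1M goroutines: ~%d MB of stack\n",
		perTask*1_000_000/(1<<20))
}
```

The printed figure depends on the Go version and platform, so it is best read as an order-of-magnitude check of the ~2 KB claim rather than a reproduction of the benchmark.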

Benchmark design & fairness

Core criticism: 1M tasks that only sleep is an extreme, synthetic case; real tasks would carry their own data and do work whose cost dominates the per-task runtime overhead.
Several commenters say the comparison mixes different abstractions:
- Node, async Rust, and C# use timer-based, stackless state machines.
- Go, Java virtual threads, and BEAM-like systems use stackful coroutines or processes.
For Go, more comparable code that uses timers (time.AfterFunc or time.NewTimer) instead of one sleeping goroutine per task dramatically reduces memory, bringing it closer to Rust’s numbers.
Similar concerns arise for Java (ArrayList resizing, choice of data structures) and Elixir (using Task adds supervision overhead).
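To make the Go point concrete, here is a minimal sketch of the two styles side by side. The function names runWithGoroutines and runWithTimers are invented for this illustration, and the code does not measure memory itself; the claim that the timer variant uses far less comes from the discussion, not from this sketch.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runWithGoroutines starts one goroutine per task. Each goroutine keeps its
// own stack alive for the whole duration of the sleep.
func runWithGoroutines(n int, d time.Duration) int64 {
	var completed int64
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			time.Sleep(d)
			atomic.AddInt64(&completed, 1)
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&completed)
}

// runWithTimers registers one timer per task. time.AfterFunc runs the
// callback in its own goroutine only once the timer fires, so no goroutine
// exists per task while waiting.
func runWithTimers(n int, d time.Duration) int64 {
	var completed int64
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		time.AfterFunc(d, func() {
			defer wg.Done()
			atomic.AddInt64(&completed, 1)
		})
	}
	wg.Wait()
	return atomic.LoadInt64(&completed)
}

func main() {
	const n = 10_000 // assumed task count for a quick local run
	d := 100 * time.Millisecond
	fmt.Println("goroutines completed:", runWithGoroutines(n, d))
	fmt.Println("timers completed:   ", runWithTimers(n, d))
}
```

Both variants finish the same number of tasks; the difference the commenters point to is what has to stay resident while the tasks are waiting.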

Rust/Tokio semantics confusion

There is a subthread about why the Rust appendix example appears non-concurrent but still finishes quickly.
Clarification: tokio::time::sleep records its deadline when called, not when first awaited, so many tasks effectively share the same wake-up time.
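The effect can be modelled outside Tokio. The sketch below is a Go analogue, not Tokio code: it relies on time.NewTimer also fixing its deadline at creation, and the helper name awaitSequentially is made up. Waiting on the timers one after another still takes about one sleep interval in total, not n intervals.

```go
package main

import (
	"fmt"
	"time"
)

// awaitSequentially (hypothetical helper) creates n timers up front and then
// waits for them one at a time, returning the total elapsed time.
func awaitSequentially(n int, d time.Duration) time.Duration {
	start := time.Now()
	timers := make([]*time.Timer, n)
	for i := range timers {
		timers[i] = time.NewTimer(d) // the deadline is fixed here, at creation
	}
	for _, t := range timers {
		<-t.C // by the time the first timer fires, the rest are due as well
	}
	return time.Since(start)
}

func main() {
	elapsed := awaitSequentially(10, 100*time.Millisecond)
	fmt.Println("total wait for 10 sequential timers:", elapsed)
}
```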

Node.js, concurrency, and parallelism

Some note Node’s example is mostly a single-threaded event loop scheduling timers, not true parallel work.
Discussion distinguishes concurrency (many in-flight tasks) from parallelism (running on multiple cores); Node scores well on memory here, but the picture would differ for CPU-bound work.
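The distinction can be illustrated with a small Go sketch that is only loosely analogous to Node's event loop. It assumes that limiting GOMAXPROCS to 1 is an acceptable stand-in for "no parallelism", and the helper name sleepAllOnOneCore is invented. Many sleeping tasks still finish in about one sleep interval, because waiting needs concurrency, not multiple cores.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

// sleepAllOnOneCore (hypothetical helper) allows only one OS thread to
// execute Go code at a time, starts n goroutines that each sleep for d, and
// returns the wall-clock time until all of them have finished.
func sleepAllOnOneCore(n int, d time.Duration) time.Duration {
	prev := runtime.GOMAXPROCS(1) // concurrency without parallelism
	defer runtime.GOMAXPROCS(prev)

	start := time.Now()
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			time.Sleep(d) // waiting does not occupy the single core
		}()
	}
	wg.Wait()
	return time.Since(start)
}

func main() {
	elapsed := sleepAllOnOneCore(1000, 100*time.Millisecond)
	fmt.Println("1000 sleeping tasks on one core took:", elapsed)
}
```

If the tasks did CPU-bound work instead of sleeping, the single-core run would take roughly the sum of their run times, which is the case the commenters say this benchmark does not cover.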

Real-world relevance and alternatives

Multiple commenters stress that microbenchmarks like this are educational but do not directly predict the behavior of real systems.
Suggestions:
- Add minimal real work (I/O, JSON parsing, CPU loops).
- Measure both memory and time.
- Include more runtimes (Erlang/Elixir, C with pthreads, processes, Deno/Bun, different Python/JS runtimes).

Related topics