How much memory do you need in 2024 to run 1M concurrent tasks?

Overall reactions

  • Many are surprised that Node.js ranks very well on memory and that C# / .NET NativeAOT looks exceptionally efficient.
  • Rust’s async state-machine approach is widely praised for very low per-task overhead.
  • Go’s relatively high memory usage contradicts some expectations of it being “lightweight”.

Go’s goroutines and memory

  • Go gives each goroutine an initial stack (commonly ~2 KB), so 1M goroutines need roughly 2 GB for stacks alone; this largely explains the ~2.5–3 GB usage for 1M tasks.
  • Some argue this is “unfair” because that stack is intended for real work the program would normally do, not empty sleeps.
  • Others counter that, regardless of intent, this memory is not available to other services while allocated.
  • There is debate over virtual vs physical memory: some note that unused pages may be swapped out and so not actually consume RAM, while others focus on what the process reserves from the OS.

Benchmark design & fairness

  • Core criticism: 1M tasks that only sleep is an extreme, synthetic case; real tasks would carry their own data and work, whose cost would dominate the per-task runtime overhead.
  • Several commenters say the comparison mixes different abstractions:
    • Node.js, async Rust, and C# use timer-based, stackless state machines.
    • Go goroutines, Java virtual threads, and BEAM-like systems use stackful coroutines or processes.
  • For Go, more comparable code using timers (time.AfterFunc or time.NewTimer) dramatically reduces memory, closer to Rust’s numbers.
  • Similar concerns arise for Java (ArrayList resizing, choice of structures) and Elixir (using Task adds supervision overhead).

Rust/Tokio semantics confusion

  • There is a subthread about why the Rust appendix example appears non-concurrent but still finishes quickly.
  • Clarification: tokio::time::sleep records its deadline when called, not when first awaited, so many tasks effectively share the same wake-up time.

Node.js, concurrency, and parallelism

  • Some note Node’s example is mostly a single-threaded event loop scheduling timers, not true parallel work.
  • Discussion distinguishes concurrency (many in-flight tasks) from parallelism (running on multiple cores); Node scores well on memory but would fare differently on CPU-bound work.

Real-world relevance and alternatives

  • Multiple commenters stress that microbenchmarks like this are educational but not directly predictive for real systems.
  • Suggestions:
    • Add minimal real work (I/O, JSON parsing, CPU loops).
    • Measure both memory and time.
    • Include more runtimes (Erlang/Elixir, C with pthreads, processes, Deno/Bun, different Python/JS runtimes).