2024-11-26

1B nested loop iterations

Benchmark design and realism

Many commenters view “1B nested loop iterations” as a highly contrived microbenchmark.
It mainly measures how compilers optimize tight integer loops with modulo, not real-world workloads with allocations, branches, indirections, or objects.
Some argue this is still representative of “average bad code” heavy on loops and arithmetic; others say such hot loops are a tiny fraction of real execution.
Several commenters stress that div/mod is unusually slow, so the benchmark understates what C and Rust can do on more typical arithmetic.
Concerns are raised that the benchmark encourages misleading language comparisons without clear methodology or caveats.
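For readers who have not seen the benchmark, the kind of kernel under discussion can be sketched as follows. This is a minimal illustration, not the benchmark's actual source: the function name `nested_loop`, its parameters, and the array update are assumptions inferred from the title and from the mention of tight integer loops with modulo. The sizes are scaled far down so that it finishes quickly in CPython.

```python
def nested_loop(outer: int, inner: int, u: int) -> list[int]:
    """Hypothetical stand-in for the benchmark kernel (assumed shape).

    For each of `outer` slots, accumulate j % u over `inner` iterations.
    Total work is outer * inner modulo operations.
    """
    a = [0] * outer
    for i in range(outer):
        for j in range(inner):
            # The modulo in the innermost loop is the operation
            # commenters single out as unusually slow.
            a[i] += j % u
    return a


# Scaled-down run: 100 * 1_000 = 100_000 iterations instead of a billion.
result = nested_loop(outer=100, inner=1_000, u=7)
print(result[0])
```

Nothing in this loop allocates, branches unpredictably, or chases pointers, which is the basis for the objection that it says little about real workloads.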

Garbage collection and performance consistency

The discussion emphasizes that, beyond raw speed, garbage-collected languages face issues with startup time and pause-time consistency.
Modern JVM collectors (e.g., Shenandoah, G1) reportedly achieve sub-millisecond pauses, but GC remains a concern for latency-sensitive domains (games, VR).
Game dev anecdotes: hitches often stem from excessive short-lived allocations in render loops; object pools and careful allocation patterns help.
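The object-pool idea from the game dev anecdotes can be sketched roughly as below. The `Particle` and `ParticlePool` names are hypothetical and chosen only for illustration; a real engine would size and reset its pool to fit its own frame budget. The point is that objects are created once up front and recycled, so the per-frame loop produces little or no new garbage.

```python
class Particle:
    """Hypothetical short-lived object that a render loop might churn through."""
    __slots__ = ("x", "y", "alive")

    def __init__(self) -> None:
        self.x = 0.0
        self.y = 0.0
        self.alive = False


class ParticlePool:
    """Minimal object pool: preallocate, hand out, take back."""

    def __init__(self, size: int) -> None:
        # All allocation happens here, outside the hot loop.
        self._free = [Particle() for _ in range(size)]

    def acquire(self, x: float, y: float) -> Particle:
        # Reuse a free object if one exists; allocate only when the pool is empty.
        p = self._free.pop() if self._free else Particle()
        p.x, p.y, p.alive = x, y, True
        return p

    def release(self, p: Particle) -> None:
        # Mark the object dead and return it to the free list for reuse.
        p.alive = False
        self._free.append(p)


pool = ParticlePool(size=2)
first = pool.acquire(1.0, 2.0)
pool.release(first)
second = pool.acquire(3.0, 4.0)
print(second is first)
```

The trade-off is that callers must remember to release objects and must not hold references to released ones, which brings back some of the manual bookkeeping that a GC normally removes.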

Language-specific observations

Go appears slower than C/C++/Rust; commenters attribute this largely to the Go version using 64-bit ints where the others use 32-bit, since 64-bit modulo is significantly slower and worse for cache. With int32 and GC tweaks, Go reportedly gets closer to Java/C++.
PyPy is vastly faster than CPython on this benchmark thanks to its JIT and the arithmetic-friendly workload, though commenters note this exaggerates typical speedups.
R and Python are said to benefit enormously from vectorized operations (e.g., using seq_len/sum or NumPy) rather than explicit loops.
JavaScript’s performance in both Deno and browsers surprises some, though differences between engines (Chrome vs. Firefox) are noted.
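The point about vectorized operations in R and Python can be illustrated with a small standard-library sketch. The function names `loop_sum` and `builtin_sum` are hypothetical. Here `sum(range(...))` plays the role that `sum(seq_len(n))` plays in R: the iteration moves out of interpreted bytecode and into the runtime's compiled implementation. With NumPy installed, `np.arange(1, n + 1).sum()` would be the array-based analog.

```python
def loop_sum(n: int) -> int:
    """Explicit interpreted loop: one bytecode-level iteration per element."""
    total = 0
    for k in range(1, n + 1):
        total += k
    return total


def builtin_sum(n: int) -> int:
    """Same result, with the loop delegated to the built-in sum()."""
    return sum(range(1, n + 1))


n = 1_000
# Both must agree with the closed form n * (n + 1) / 2.
print(loop_sum(n) == builtin_sum(n) == n * (n + 1) // 2)
```

Timing the two with `timeit` on a larger `n` is a quick way to see the gap on a given machine; the size of that gap depends on the interpreter and is not claimed here.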

Visualization and communication

The moving-circle visualization is praised as intuitive by some and criticized as confusing or no better than a bar chart by others.
Several commenters stress that microbenchmarks must be interpreted cautiously and ideally complemented with broader, more realistic benchmark suites.

Related topics