Data Processing Benchmark Featuring Rust, Go, Swift, Zig, Julia etc.
Java, JIT, and C++ Performance Debate
- Several commenters argue the Java sample is misconfigured (SerialGC, no heap tuning, explicit System.gc()), so its poor showing vs. C++ is not meaningful.
- Others claim Java’s “abstraction penalty” should always leave it slower than C++, while multiple replies counter that modern JVM JITs can match or beat C++ on many workloads once warmup and heap sizing are handled correctly.
- Deep dives into Java internals mention escape analysis, object flattening (Valhalla), speculation plus deoptimization, and inlining of virtual calls as reasons the JIT can eliminate many overheads, though cache-unfriendly object layouts remain a real cost until inline/value types land.
Benchmark Methodology Criticisms
- Many see the benchmark as “sloppy”: odd compiler/VM flags, minimal or inconsistent warmup, use of Stopwatch/wall time, GitHub Actions as a noisy environment, unclear IO/disk/cache conditions.
- Code quality varies widely between languages; some implementations are obviously unoptimized or written by non-experts, undermining cross-language comparisons.
- The multicore results (e.g., C# beating Go, Zig “concurrent” being slower) are widely suspected to reflect implementation details (channels, contention, SIMD usage) rather than language fundamentals.
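The warmup and timing complaints above can be made concrete with a small harness. This is only a sketch, not the benchmark's actual runner: `bench` and `workload` are hypothetical names, and the warmup and repeat counts are arbitrary assumptions. It discards warmup runs, times with a monotonic clock, and reports the median rather than a single wall-clock reading.

```python
import statistics
import time


def bench(fn, *, warmup=3, repeats=10):
    """Return (median, minimum) per-call time of fn() in seconds."""
    # Warmup runs are discarded so caches, allocators, and any JIT settle first.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples), min(samples)


def workload():
    # Hypothetical stand-in for the code under test.
    return sum(i * i for i in range(100_000))


if __name__ == "__main__":
    median_s, best_s = bench(workload)
    print(f"median {median_s * 1e3:.2f} ms, best {best_s * 1e3:.2f} ms")
```

Reporting both the median and the minimum gives a rough sense of noise: a large gap between them suggests the environment, not the code, is driving the numbers.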
Language-Specific Notes (Julia, Python, R, Lisp, Ruby)
- Julia impresses compared to plain Python; users report 10–100× speedups when porting NumPy-heavy pipelines.
- Only one Python variant is charted despite plain/numpy/numba versions existing.
- R is missing; some argue it would be very slow, others say the included R code is old and uses a notoriously slow JSON package, so it’s not representative.
- Common Lisp appears surprisingly slow; light tuning (types, better data structures, fewer allocations) can easily double its speed, suggesting similar easy gains likely exist in other languages.
- Ruby’s multi-minute times, against sub-second times for other languages, prompt questions about how representative they are.
Systems, GC, and “Ignored” Languages (D, Zig, Nim, C#, Go)
- D’s strong performance sparks “D gets no respect” comments; others point to ecosystem weakness and GC reliance, arguing Rust/Go/Java/C# are more compelling choices.
- Zig and Odin’s weak results are blamed on poor implementations; some suspect LLM-generated code.
- C# is praised for modern low-level features (SIMD, spans, stackalloc, source generators) and a strong ecosystem; its good multicore showing is attributed to explicit SIMD and contention-free parallelism.
- Nim is cited as “Python-like but fast,” with LLMs making library development easier, though others are skeptical that LLMs truly lower the expertise bar.
Rules, “HO” Variants, and Broader Takeaways
- Rules like “no SIMD” but “production-ready” and “must represent tags as strings” are called arbitrary and even exploitable (e.g., degenerate string encodings, interning).
- Highly optimized (“HO”) versions using better data structures/algorithms can be 10–100× faster, underscoring that algorithm and design dominate language choice.
- Many conclude this benchmark is fun but not authoritative; for real decisions, one should build problem-specific benchmarks or consult more rigorous suites (the Benchmarks Game, TechEmpower, etc.).
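To illustrate the point that data structures can matter more than language, here is a minimal sketch. It assumes a hypothetical task, counting shared tags between every pair of posts, which may not match the benchmark's exact workload. The function names and sample data are invented for illustration. Both functions return the same counts, but the second uses an inverted index so it only visits posts that actually share a tag.

```python
from collections import defaultdict


def related_naive(posts):
    """Pairwise comparison: intersect tag sets for every pair of posts."""
    tag_sets = [set(tags) for tags in posts]
    shared = []
    for i, a in enumerate(tag_sets):
        counts = [0] * len(posts)
        for j, b in enumerate(tag_sets):
            if i != j:
                counts[j] = len(a & b)
        shared.append(counts)
    return shared


def related_indexed(posts):
    """Inverted index: visit only the posts that share at least one tag."""
    index = defaultdict(list)  # tag -> indices of posts carrying that tag
    for i, tags in enumerate(posts):
        for tag in set(tags):
            index[tag].append(i)
    shared = []
    for i, tags in enumerate(posts):
        counts = [0] * len(posts)
        for tag in set(tags):
            for j in index[tag]:
                if j != i:
                    counts[j] += 1
        shared.append(counts)
    return shared


if __name__ == "__main__":
    # Hypothetical sample data: each post is a list of string tags.
    posts = [["rust", "go"], ["go", "zig"], ["julia"], ["rust", "go", "zig"]]
    assert related_naive(posts) == related_indexed(posts)
    print(related_indexed(posts)[0])
```

When most posts share few tags, the indexed version does far less work per post. How much faster it is in practice depends on the data, so any speedup figure should come from measurement rather than from this sketch.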