Data Processing Benchmark Featuring Rust, Go, Swift, Zig, Julia etc.

Java, JIT, and C++ Performance Debate

  • Several commenters argue the Java sample is misconfigured (SerialGC, no heap tuning, explicit System.gc()), so its poor showing vs C++ is not meaningful.
  • Others claim Java’s “abstraction penalty” should always leave it slower than C++; multiple replies counter that modern JVM JITs can match or beat C++ on many workloads once warmup and heap sizing are handled correctly (see the sketch after this list).
  • Deep dives into Java internals cite escape analysis, object flattening (Valhalla), speculation + deoptimization, and inlining of virtual (vtable) calls as reasons the JIT can eliminate many of these overheads, though cache-unfriendly object layouts remain a real cost until inline/value types land.
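
A minimal sketch of what the “configured correctly” side is describing, with illustrative flag choices and a hypothetical Bench/process() harness rather than the benchmark’s actual code: swap SerialGC for a throughput collector, pin the heap so it never resizes, neutralize the explicit System.gc() call, and warm the JIT before timing.

    // Illustrative launch flags (not the benchmark's actual ones):
    //   java -Xms4g -Xmx4g -XX:+UseParallelGC -XX:+DisableExplicitGC Bench
    // -Xms == -Xmx avoids heap resizing during the run;
    // -XX:+DisableExplicitGC turns System.gc() calls into no-ops.
    public final class Bench {
        public static void main(String[] args) {
            byte[] input = loadInput();           // hypothetical input loader
            for (int i = 0; i < 5; i++) {         // warmup: let the JIT compile the hot paths
                process(input);
            }
            long best = Long.MAX_VALUE;
            for (int i = 0; i < 10; i++) {        // timed runs after warmup
                long t0 = System.nanoTime();
                process(input);
                best = Math.min(best, System.nanoTime() - t0);
            }
            System.out.printf("best run: %.1f ms%n", best / 1e6);
        }

        static byte[] loadInput() { return new byte[0]; }       // placeholder
        static void process(byte[] input) { /* workload under test */ }
    }

Reporting the minimum (or a percentile) of several timed runs also reduces sensitivity to the wall-clock noise criticized in the next section.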

Benchmark Methodology Criticisms

  • Many see the benchmark as “sloppy”: odd compiler/VM flags, minimal or inconsistent warmup, use of Stopwatch/wall time, GitHub Actions as a noisy environment, unclear IO/disk/cache conditions.
  • Code quality varies widely between languages; some implementations are obviously unoptimized or written by non-experts, undermining cross-language comparisons.
  • The multicore results (e.g., C# beating Go, Zig “concurrent” being slower) are widely suspected to reflect implementation details (channels, contention, SIMD usage) rather than language fundamentals.
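
To make the “contention, not the language” point concrete, here is a hypothetical Java sketch (not code from the benchmark) contrasting a parallel sum that funnels every update through one shared atomic counter with one that reduces per-worker partials and merges them once:

    import java.util.concurrent.atomic.AtomicLong;
    import java.util.stream.IntStream;

    public final class ParallelStyles {
        // Contended: every element bumps one shared counter; cache-line
        // ping-pong between cores can dominate the runtime.
        static long contended(int[] data) {
            AtomicLong total = new AtomicLong();
            IntStream.range(0, data.length).parallel()
                     .forEach(i -> total.addAndGet(data[i]));
            return total.get();
        }

        // Contention-free: each worker reduces its own chunk of the stream;
        // the partial sums are combined once at the end.
        static long partials(int[] data) {
            return IntStream.range(0, data.length).parallel()
                            .mapToLong(i -> data[i])
                            .sum();
        }

        public static void main(String[] args) {
            int[] data = new int[1_000_000];
            java.util.Arrays.fill(data, 1);
            System.out.println(contended(data) + " " + partials(data));
        }
    }

Both return the same result, but the shared-counter version serializes on a single cache line, so its scaling behaviour says more about the implementation than about the language or runtime.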

Language-Specific Notes (Julia, Python, R, Lisp, Ruby)

  • Julia impresses compared to plain Python; users report 10–100x speedups when porting NumPy-heavy pipelines.
  • Only one Python variant is charted even though plain, NumPy, and Numba versions exist.
  • R is missing from the charts; some argue it would be very slow, while others say the included R code is old and uses a notoriously slow JSON package, so it isn’t representative.
  • Common Lisp appears surprisingly slow; light tuning (type declarations, better data structures, fewer allocations) can easily make it 2× faster, suggesting similar easy gains likely exist in other languages.
  • Ruby’s multi-minute times, against sub-second results elsewhere, prompt questions about how representative the Ruby implementation is.

Systems, GC, and “Ignored” Languages (D, Zig, Nim, C#, Go)

  • D’s strong performance sparks “D gets no respect” comments; others point to ecosystem weakness and GC reliance, arguing Rust/Go/Java/C# are more compelling choices.
  • The weak Zig and Odin results are blamed on poor implementations; some commenters suspect LLM-generated code.
  • C# is praised for modern low-level features (SIMD, spans, stackalloc, source generators) and a strong ecosystem; its good multicore showing is attributed to explicit SIMD and contention-free parallelism (see the sketch after this list).
  • Nim is cited as “Python-like but fast,” with LLMs making library development easier, though others are skeptical that LLMs truly lower the expertise bar.
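
For readers unfamiliar with what “explicit SIMD” buys: C# exposes it through System.Numerics.Vector<T> and hardware intrinsics. Keeping with Java for the examples in this summary, the sketch below uses the incubating jdk.incubator.vector API (JDK 16+, run with --add-modules jdk.incubator.vector) as a rough analogue; it is an illustration, not code from any benchmark entry.

    import jdk.incubator.vector.FloatVector;
    import jdk.incubator.vector.VectorOperators;
    import jdk.incubator.vector.VectorSpecies;

    public final class SimdSum {
        private static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

        // Sums the array several lanes at a time instead of one element per iteration.
        static float sum(float[] a) {
            FloatVector acc = FloatVector.zero(SPECIES);
            int i = 0;
            int upper = SPECIES.loopBound(a.length);
            for (; i < upper; i += SPECIES.length()) {
                acc = acc.add(FloatVector.fromArray(SPECIES, a, i));
            }
            float total = acc.reduceLanes(VectorOperators.ADD);
            for (; i < a.length; i++) {   // scalar tail for the leftover elements
                total += a[i];
            }
            return total;
        }

        public static void main(String[] args) {
            float[] data = new float[1003];
            java.util.Arrays.fill(data, 1.0f);
            System.out.println(sum(data));   // expects 1003.0
        }
    }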

Rules, “HO” Variants, and Broader Takeaways

  • Rules like “no SIMD” but “production-ready” and “must represent tags as strings” are called arbitrary and even exploitable (e.g., degenerate string encodings, interning; see the sketch after this list).
  • Highly optimized (“HO”) versions using better data structures/algorithms can be 10–100× faster, underscoring that algorithm and design dominate language choice.
  • Many conclude this benchmark is fun but not authoritative; for real decisions, one should build problem-specific benchmarks or consult more rigorous suites (the Benchmarks Game, TechEmpower, etc.).
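
As an illustration of the “interning” loophole mentioned above (the benchmark’s exact rule wording is not quoted here, so treat this as a hypothetical): if every tag is canonicalized to one shared String instance, the data still “is” a string, yet later comparisons and grouping become reference-cheap.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public final class TagPool {
        private static final Map<String, String> POOL = new ConcurrentHashMap<>();

        // Returns a canonical instance for the tag, so equal tags share one object.
        static String canon(String tag) {
            return POOL.computeIfAbsent(tag, t -> t);
        }

        public static void main(String[] args) {
            String a = canon(new String("user"));
            String b = canon(new String("user"));
            // Reference equality now suffices for canonicalized tags
            // (String.equals also short-circuits on ==).
            System.out.println(a == b);           // true
            System.out.println(a.equals("user")); // still an ordinary String
        }
    }

A variant could go further and count tags in a java.util.IdentityHashMap, staying within the letter of a “tags as strings” rule while dodging most of its intended cost.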