Speeding up Ruby by rewriting C in Ruby

Ruby, YJIT, and alternative implementations

  • Discussion notes the idea of a “Ruby stdlib in Ruby” predates YJIT (e.g., Rubinius, TruffleRuby), with mixed past results (Rubinius slower than MRI).
  • TruffleRuby is highlighted as extremely fast; because it can execute C extensions through the same GraalVM JIT infrastructure as Ruby code, it can optimize across the Ruby/C boundary rather than treating native calls as opaque.
  • YJIT’s implementation history (from C to Rust) is mentioned; Rust is seen as a good trade-off despite build-toolchain friction.
  • Some report mixed real-world speedups from TruffleRuby vs MRI and stress careful benchmarking due to startup and warmup behavior.
  • TruffleRuby is open source and based on Graal; seen as “forkable” if Oracle ever changes direction.

Rails on TruffleRuby

  • One view: Rails “doesn’t work” on TruffleRuby and won’t soon, especially with Rails 8 requiring Ruby 3.2.
  • Counterpoint: TruffleRuby claims to run Rails and many gems; not being “100% MRI 3.2 compatible” doesn’t necessarily mean Rails is broken.
  • Overall status of full Rails compatibility is unclear from the thread.

Benchmarks, microbenchmarks, and interpretation

  • Some argue microbenchmarks are often dismissed too quickly: they do expose real issues (e.g., high function-call overhead in dynamic languages).
  • Others stress they are narrow: you can’t responsibly claim “X is N× slower than Y in general” from a tiny benchmark.
  • Links to larger benchmark suites (e.g., Benchmarks Game, other repos) are cited to show wide variance across implementations and tasks.
  • Methodological criticisms appear: too few runs, reliance on single wall-clock samples, no JMH harness for the JVM tests, and ignored startup/warmup costs.
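The methodology complaints above can be made concrete with a small sketch. This is a hypothetical microbenchmark (not from the article) using only Ruby's stdlib `Benchmark` module: it times several runs instead of one, since on a JIT implementation (YJIT, TruffleRuby) the first runs include compilation warmup, and a single wall-clock sample conflates that with steady-state speed.

```ruby
require "benchmark"

# Hypothetical workload: method-call overhead in a tight loop,
# the kind of cost microbenchmarks legitimately expose.
def add(a, b)
  a + b
end

def run(n)
  total = 0
  n.times { |i| total = add(total, i) }
  total
end

N = 500_000

# Take several timed runs; report the minimum (or discard the early
# runs) so JIT warmup does not dominate the measurement.
times = 5.times.map { Benchmark.realtime { run(N) } }

puts "runs (s): #{times.map { |t| t.round(4) }.join(', ')}"
puts "best:     #{times.min.round(4)}s"
```

The `benchmark-ips` gem automates this pattern (explicit warmup phase, iterations-per-second with error bars) and is closer to what JMH provides on the JVM.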

Python performance, C libraries, and mission-critical use

  • Several comments note that many Python workloads push heavy computation into C/Fortran libraries; Python acts as glue.
  • Others respond that any language with FFI can do this; the baseline slowness of pure Python still matters.
  • Debate over acceptability in constrained or mission-critical systems:
    • Some describe successful use of Python even on a satellite where extra milliseconds and milliwatts are acceptable.
    • Others argue that for highly power- or latency-sensitive systems (e.g., long-endurance drones), interpreter overhead and GC are prohibitive.
  • Concerns raised about dynamic languages for mission-critical software, even with optional static typing tools.
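The "glue language" pattern debated above has a direct Ruby analogue: a reduction written in pure Ruby pays interpreter dispatch on every iteration, while the equivalent builtin (`Array#sum`, implemented in C in CRuby) does the same work in native code. A minimal sketch, for illustration only:

```ruby
require "benchmark"

data = Array.new(1_000_000) { |i| i }

# Pure-Ruby reduction: each iteration pays block-call and
# method-dispatch overhead in the interpreter.
def ruby_sum(arr)
  total = 0
  arr.each { |x| total += x }
  total
end

Benchmark.bm(12) do |bm|
  bm.report("pure Ruby") { ruby_sum(data) }
  bm.report("C builtin") { data.sum } # Array#sum runs in C in CRuby
end
```

Both produce the same result; the gap between the two rows is exactly the "baseline slowness" the pure-Python side of the argument is pointing at, and the thing a JIT (or rewriting the hot path natively) tries to close.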

Other language comparisons (Dart, Crystal, LuaJIT, JVM languages)

  • Dart’s strong showing surprises some, especially versus C# and LuaJIT; others point out that tiny benchmarks may be dominated by specific optimizations.
  • Background on Dart’s VM lineage (from teams behind Self, HotSpot, V8) and its AOT+JIT design is mentioned.
  • Crystal is brought up as a Ruby-like compiled language with Rails-esque frameworks and static binaries; some think omitting it from Ruby-speed discussions is odd.
  • Others counter that Crystal is not Ruby and doesn’t help existing Ruby codebases.
  • Node vs Deno and Java vs Kotlin differences are attributed to JVM optimization focus and extra bytecode generated by “guest” languages.

Benchmark design and visualization critiques

  • The core Ruby benchmark (nested loops with array updates) is called “weird” and easy to algebraically collapse, suggesting it mostly measures a trivial hot loop.
  • Some note compilers generally don’t track liveness of individual array elements because the analysis is too costly, even when it would enable dramatic simplifications.
  • The article’s animated visualization of language speeds is criticized as distracting and hard to read quantitatively; static tables or bar charts are preferred by some, while others find the animation intuitive enough.
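The "algebraically collapsible" criticism is easiest to see with a sketch. The exact benchmark shape isn't reproduced in the notes, so this is a hypothetical reconstruction of nested loops with array updates whose result has a closed form, meaning a sufficiently clever compiler could in principle delete the hot loop entirely:

```ruby
# Hypothetical shape of the criticized benchmark: nested loops
# repeatedly incrementing array slots.
def nested_loop_bench(n)
  a = Array.new(n, 0)
  n.times do |i|
    n.times do |j|
      a[j] += i
    end
  end
  a
end

# Closed form: every slot accumulates 0 + 1 + ... + (n - 1),
# i.e. n * (n - 1) / 2, independent of the loop structure.
def collapsed(n)
  Array.new(n, n * (n - 1) / 2)
end

n = 100
puts nested_loop_bench(n) == collapsed(n) # same values, no hot loop
```

A benchmark like this mostly measures how fast an implementation grinds through a trivial loop body, which is why a win here doesn't generalize; per-element liveness analysis that would discover the closed form is, as noted above, usually too expensive for compilers to attempt.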