Faster asin() was hiding in plain sight
Overall reaction
- Many readers enjoyed the deep dive into a “small” function and saw it as archetypal HN content.
- Some felt the ~4% gain over existing fast implementations is modest; others argued even small wins matter, especially at scale or when accumulated.
Lookup tables (LUTs) vs arithmetic
- Several comments explore whether an asin LUT in L1/L2 cache could beat polynomial/rational computation.
- Points against LUTs: cache pollution, sensitivity to access patterns, performance cliffs under real workloads, and limited benefit unless asin dominates runtime.
- Some experiments found LUT + interpolation roughly on par with the best polynomial methods, not clearly faster; dropping interpolation didn’t help much because asin wasn’t the main bottleneck.
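The LUT-plus-interpolation approach discussed above can be sketched as follows. This is an illustrative toy (table size, names, and range restriction to [0, 1] are my assumptions, not from the thread), but it shows the structure being debated: a memory lookup plus a little arithmetic, versus a pure polynomial.

```python
import math

# Toy sketch: uniform lookup table for asin on [0, 1] with linear
# interpolation. Table size N = 256 is an arbitrary choice that would
# fit comfortably in L1 cache (257 doubles ~ 2 KB).
N = 256
TABLE = [math.asin(i / N) for i in range(N + 1)]

def asin_lut(x: float) -> float:
    """Approximate asin(x) for x in [0, 1] via table + linear interpolation."""
    t = x * N
    i = int(t)
    if i >= N:                  # clamp at the right edge
        return TABLE[N]
    frac = t - i
    return TABLE[i] * (1.0 - frac) + TABLE[i + 1] * frac

# Error stays small in the middle of the range but grows toward x = 1,
# where asin's derivative blows up -- one reason uniform tables need
# either more entries or argument reduction near the endpoints.
max_err = max(abs(asin_lut(i / 1000) - math.asin(i / 1000))
              for i in range(900))   # x in [0, 0.9)
```

In real code the debate is less about this arithmetic and more about what the table does to the cache under a mixed workload, which microbenchmarks like this one cannot show.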
SIMD, GPUs, and data layout
- Some argue bigger wins likely come from SIMD/GPU rather than micro-optimizing scalar asin.
- Discussion emphasizes data-oriented design (SoA over AoS) in ray tracing to enable SIMD and better cache behavior.
- Others note global illumination and incoherent rays make batching/SIMD harder without substantial restructuring.
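The SoA-over-AoS point can be made concrete with a small sketch (the ray fields and batch size here are assumptions for illustration): with struct-of-arrays, one vectorized expression processes the whole batch, which is what enables SIMD and streaming memory access.

```python
import numpy as np

n = 8  # arbitrary batch size for illustration

# Array-of-structs: one record per ray; math proceeds one ray at a
# time, with scattered field accesses and no SIMD.
rays_aos = [{"dx": float(i), "dy": 1.0, "dz": 2.0} for i in range(n)]
len_aos = [(r["dx"] ** 2 + r["dy"] ** 2 + r["dz"] ** 2) ** 0.5
           for r in rays_aos]

# Struct-of-arrays: one contiguous array per field; a single vector
# expression computes all ray lengths at once.
dx = np.arange(n, dtype=np.float64)
dy = np.ones(n)
dz = np.full(n, 2.0)
len_soa = np.sqrt(dx * dx + dy * dy + dz * dz)
```

The catch raised in the thread applies directly: this only pays off when the rays in a batch actually follow the same code path, which incoherent secondary rays in global illumination tend not to do.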
Approximation methods and theory
- Multiple comments stress that Taylor/Maclaurin series and naïve Padé approximants are usually inferior to minimax/Chebyshev-based approximations over an interval.
- Remez algorithm and Chebyshev polynomials are highlighted as standard tools; “equioscillation” of the error is cited as the hallmark of optimal minimax approximations.
- References are made to classic function-approximation handbooks and to libraries that tweak coefficients beyond textbook values to exploit floating‑point specifics and lower polynomial degree.
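To illustrate the Taylor-vs-Chebyshev point, here is a sketch (interval, degree, and setup are my assumptions) comparing a truncated Maclaurin series for asin against a Chebyshev interpolant of the same degree on [-0.5, 0.5]. Chebyshev interpolation is near-minimax; a true Remez fit would do slightly better still, and production libraries then hand-tune the coefficients for floating point.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def asin_taylor7(x):
    # Maclaurin series of asin truncated at degree 7.
    return x + x**3 / 6 + 3 * x**5 / 40 + 15 * x**7 / 336

# Degree-7 Chebyshev interpolant of t -> asin(t / 2) on t in [-1, 1],
# so that x = t / 2 covers [-0.5, 0.5].
coeffs = C.chebinterpolate(lambda t: np.arcsin(t / 2), 7)

x = np.linspace(-0.5, 0.5, 10001)
taylor_err = np.max(np.abs(asin_taylor7(x) - np.arcsin(x)))
cheb_err = np.max(np.abs(C.chebval(2 * x, coeffs) - np.arcsin(x)))

# taylor_err is concentrated at the interval endpoints; cheb_err is
# markedly smaller and its error curve oscillates with nearly equal
# peaks -- the equioscillation signature of a near-minimax fit.
```

Plotting `asin_taylor7(x) - np.arcsin(x)` against the Chebyshev residual makes the contrast visible: the Taylor error explodes at the edges, while the near-minimax error is spread evenly across the interval.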
Hardware and algorithms
- Discussion touches on historical and current hardware support: old x87 trig, Xeon Phi exp/log, and CORDIC-style methods.
- Modern CPUs generally implement trig via sequences (e.g., polynomial + LUT), not single instructions.
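The CORDIC-style methods mentioned above can be sketched in a few lines. This toy computes atan in vectoring mode (my own illustration, not a description of any specific hardware): each iteration uses only additions and halvings, which become shifts in fixed-point hardware, with no multiplier in the loop.

```python
import math

# Precomputed rotation angles atan(2^-i), the only table CORDIC needs.
ANGLES = [math.atan(2.0 ** -i) for i in range(32)]

def cordic_atan(y: float, x: float, iters: int = 32) -> float:
    """Approximate atan2(y, x) for x > 0 by rotating (x, y) onto the x-axis,
    accumulating the applied rotation angles in z."""
    z = 0.0
    for i in range(iters):
        if y > 0:   # rotate clockwise by atan(2^-i)
            x, y, z = x + y * 2.0 ** -i, y - x * 2.0 ** -i, z + ANGLES[i]
        else:       # rotate counterclockwise
            x, y, z = x - y * 2.0 ** -i, y + x * 2.0 ** -i, z - ANGLES[i]
    return z
```

The `2.0 ** -i` multiplies stand in for the right-shifts a hardware implementation would use; the technique trades multipliers for one small angle table and an adder, which is why it suited older and embedded designs more than modern superscalar CPUs.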
LLMs and prior art
- The blog’s use of an LLM to “discover” the fast asin is noted. Commenters point out that essentially the same technique already existed in older library code and Stack Overflow answers, illustrating both the value and the limitations of LLM-assisted discovery.