Faster asin() was hiding in plain sight

Overall reaction

  • Many readers enjoyed the deep dive into a “small” function and saw it as archetypal HN content.
  • Some felt the ~4% gain over existing fast implementations is modest; others argued even small wins matter, especially at scale or when accumulated.

Lookup tables (LUTs) vs arithmetic

  • Several comments explore whether an asin LUT in L1/L2 cache could beat polynomial/rational computation.
  • Points against LUTs: cache pollution, sensitivity to access patterns, performance cliffs under real workloads, and limited benefit unless asin dominates runtime.
  • Some experiments found LUT + interpolation roughly on par with the best polynomial methods, not clearly faster; dropping interpolation didn’t help much because asin wasn’t the main bottleneck.

SIMD, GPUs, and data layout

  • Some argue bigger wins likely come from SIMD/GPU rather than micro-optimizing scalar asin.
  • Discussion emphasizes data-oriented design (SoA over AoS) in ray tracing to enable SIMD and better cache behavior.
  • Others note global illumination and incoherent rays make batching/SIMD harder without substantial restructuring.
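The SoA-over-AoS point can be made concrete with a minimal sketch (the field names and array sizes are illustrative, not from the discussion):

```c
// AoS: each ray's fields are adjacent in memory, so a loop over one
// field (say, ox) performs strided loads that resist vectorization.
struct RayAoS { float ox, oy, oz, dx, dy, dz; };

// SoA: each field is a contiguous array, giving unit-stride,
// SIMD-friendly access patterns.
struct RaysSoA {
    float ox[1024], oy[1024], oz[1024];
    float dx[1024], dy[1024], dz[1024];
};

// Advancing n ray origins by t along their directions: with SoA the
// compiler can auto-vectorize this loop directly.
void advance(struct RaysSoA *r, int n, float t) {
    for (int i = 0; i < n; i++) {
        r->ox[i] += t * r->dx[i];
        r->oy[i] += t * r->dy[i];
        r->oz[i] += t * r->dz[i];
    }
}
```

The incoherent-ray objection above is precisely that rays in one batch diverge into different parts of the scene, so simply swapping the layout is not enough without restructuring the traversal.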

Approximation methods and theory

  • Multiple comments stress that Taylor/Maclaurin series and naïve Padé approximants are usually inferior to minimax/Chebyshev-based approximations: a Taylor series is accurate only near its expansion point, whereas a minimax fit minimizes the worst-case error over the whole interval.
  • Remez algorithm and Chebyshev polynomials are highlighted as standard tools; “equioscillation” of the error is cited as the hallmark of optimal minimax approximations.
  • References are made to classic function-approximation handbooks and to libraries that tweak coefficients beyond textbook values to exploit floating‑point specifics and lower polynomial degree.

Hardware and algorithms

  • Discussion touches on historical and current hardware support: legacy x87 trig instructions (e.g., FSIN, FPATAN), Xeon Phi's exp/log instructions, and CORDIC-style shift-and-add methods.
  • Modern CPUs generally implement trig via sequences (e.g., polynomial + LUT), not single instructions.

LLMs and prior art

  • The blog’s use of an LLM to “discover” the fast asin is noted. Commenters point out that essentially the same technique already existed in older library code and Stack Overflow answers, which illustrates both the value and the limits of LLM-assisted discovery.