Own Constant Folder in C/C++

Inline assembly and “own constant folder” idea

  • Core trick: wrap the SIMD sqrt instruction in inline assembly, which the optimizer treats as opaque, and special‑case compile‑time constants with __builtin_constant_p, so constants are still folded while non‑constant inputs emit exactly the desired instruction.
  • Several commenters see this as a narrow workaround for a specific Clang codegen quirk rather than a general “constant folder”.
  • Commenters note that similar patterns are common in performance‑critical domains (e.g., high‑frequency trading (HFT), codecs) where saving a single instruction matters.
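A minimal scalar sketch of the pattern, assuming x86-64 with SSE and a GCC-compatible compiler (GCC or Clang). The helper name forced_sqrtf is hypothetical, and a single float stands in for the SIMD vector to keep it short. When the argument is a compile-time constant, the call goes through __builtin_sqrtf, which the compiler can fold; otherwise a literal sqrtss instruction is emitted inside an asm statement the optimizer cannot rewrite.

```c
/* Assumes x86-64 with SSE and GCC or Clang (GNU inline asm, AT&T syntax). */

/* Hypothetical helper: constants are left to the compiler's folder,
 * everything else is pinned to a literal sqrtss instruction. */
static inline float forced_sqrtf(float x) {
    if (__builtin_constant_p(x)) {
        /* Known at compile time: the compiler can fold this to a constant. */
        return __builtin_sqrtf(x);
    }
    float r;
    /* sqrtss src, dst; "x" means any SSE register. The asm is deliberately
     * not volatile, so identical calls can still be merged by the optimizer. */
    __asm__("sqrtss %1, %0" : "=x"(r) : "x"(x));
    return r;
}
```

Because __builtin_constant_p is generally only true after inlining with optimization enabled, at -O0 every call takes the asm path; the result is the same, only the folding is lost.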

Clang vs GCC and sqrt optimization

  • Clang under -ffast-math turns sqrtps(x) into x * rsqrtps(x) with Newton–Raphson refinement.
  • Some call this a miscompile or performance bug; others argue it’s “working as intended” on older CPUs where this is faster.
  • GCC is reported not to do this transformation in the shown example.
  • The performance advantage is architecture‑dependent: newer Intel CPUs have much faster sqrt instructions, and -mtune/-march can change Clang’s choice.
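To make the rewrite concrete, here is a sketch of the kind of sequence -ffast-math permits in place of sqrtps, written with SSE intrinsics from <immintrin.h>. It is an illustration, not Clang's exact instruction sequence: rsqrtps gives an estimate of 1/sqrt(x) with a relative error of roughly 2^-12, one Newton-Raphson step refines it, and multiplying by x yields sqrt(x). The zero mask is needed because rsqrtps(0) is +inf and 0 * inf is NaN. The function name sqrt_ps_via_rsqrt is hypothetical.

```c
#include <immintrin.h>

/* Hypothetical: approximates sqrtps via rsqrtps + one Newton-Raphson step.
 * Assumes x86-64 with SSE; inputs are assumed finite and non-negative. */
static inline __m128 sqrt_ps_via_rsqrt(__m128 x) {
    const __m128 half = _mm_set1_ps(0.5f);
    const __m128 three_halves = _mm_set1_ps(1.5f);

    /* r ~= 1/sqrt(x), roughly 12 bits of precision */
    __m128 r = _mm_rsqrt_ps(x);

    /* Newton-Raphson step for 1/sqrt(x): r' = r * (1.5 - 0.5 * x * r * r) */
    __m128 xrr = _mm_mul_ps(_mm_mul_ps(x, r), r);
    r = _mm_mul_ps(r, _mm_sub_ps(three_halves, _mm_mul_ps(half, xrr)));

    /* sqrt(x) = x * (1/sqrt(x)) */
    __m128 s = _mm_mul_ps(x, r);

    /* rsqrt(0) = +inf and 0 * inf = NaN, so force lanes where x == 0 to 0. */
    __m128 nonzero = _mm_cmpneq_ps(x, _mm_setzero_ps());
    return _mm_and_ps(s, nonzero);
}
```

The result is close to, but not guaranteed to match, the correctly rounded sqrtps result, which is why this substitution is only allowed under fast-math.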

Fast-math flags: scope, hazards, and naming

  • Strong pushback on global -ffast-math: it relaxes IEEE guarantees across the whole program and can even affect other translation units or shared libraries through global floating‑point state, such as the flush‑to‑zero mode.
  • Some argue it’s fine when used only for specific numerical TUs; others cite bugs and surprising cross‑module effects.
  • Commenters prefer more granular controls (per function, per expression, or via pragmas), as in Rust’s per‑operation unchecked_add (an integer analogue) or Julia’s @fastmath macro.
  • Several commenters criticize the name “fast‑math”, since the resulting code can be slower as well as less accurate; suggested alternatives range from “unsafe math” to tongue‑in‑cheek renamings.
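The "global FP state" point is easiest to see with the flush-to-zero (FTZ) bit in x86's MXCSR register. GCC's documentation notes that -ffast-math used at link time may pull in startup files that change the default FPU control word; on x86 that typically means setting FTZ/DAZ for the whole process. The sketch below flips FTZ by hand instead, assuming x86-64 (where float math uses SSE and therefore honors MXCSR); the function names are hypothetical.

```c
#include <float.h>
#include <xmmintrin.h>

/* Returns FLT_MIN / 2, a subnormal, computed at run time. With default
 * MXCSR settings the result is exact; with FTZ set it is flushed to 0. */
static float half_of_flt_min(void) {
    volatile float tiny = FLT_MIN;    /* volatile keeps the multiply at run time */
    volatile float r = tiny * 0.5f;
    return r;
}

/* Runs the same computation with the thread's FTZ bit set, then restores
 * the previous mode. Any other code running in between would see FTZ too. */
static float half_of_flt_min_with_ftz(void) {
    unsigned int saved = _MM_GET_FLUSH_ZERO_MODE();
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    float r = half_of_flt_min();
    _MM_SET_FLUSH_ZERO_MODE(saved);
    return r;
}
```

Nothing in half_of_flt_min changed between the two calls; only the thread-wide control register did, which is why a flag set by one component can silently change results in another.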

Determinism, correctness, and other languages

  • Debate over whether floating‑point determinism matters, especially for games and multiplayer simulations, where (in lockstep designs) every client must compute bit‑identical results.
  • Examples from Rust, Julia, and Zig show similar low‑level tuning issues and different mechanisms for localizing unsafe or fast math.
  • General consensus: all optimizing compilers can produce surprising code; you must check assembly for hot paths.
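One concrete source of non-determinism is summation order. Floating-point addition is not associative, so once fast-math (specifically -fassociative-math, which -ffast-math enables) lets the compiler vectorize a reduction into several partial sums, the result can depend on vector width or compiler version. The sketch below writes both orders out by hand; the four-lane version illustrates a vectorized reduction order and is not the exact code any particular compiler emits. Function names are hypothetical.

```c
#include <stddef.h>

/* Left-to-right sum: the order strict IEEE semantics require the compiler to keep. */
static float sum_sequential(const float *v, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Four interleaved partial sums, roughly the shape of a vectorized
 * reduction once reassociation is allowed. */
static float sum_four_lanes(const float *v, size_t n) {
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        for (int k = 0; k < 4; k++)
            lane[k] += v[i + k];
    float s = (lane[0] + lane[1]) + (lane[2] + lane[3]);
    for (; i < n; i++)   /* leftover elements */
        s += v[i];
    return s;
}
```

With the input {1e8f, 1.0f, -1e8f, 1.0f}, the sequential sum is 1.0f while the four-lane sum is 0.0f: in the four-lane order both 1.0f values are absorbed when added to plus or minus 1e8f, where the spacing between adjacent floats is 8.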

C/C++ complexity and when to dive this deep

  • Many note this level of trickery is unnecessary for most projects.
  • Guidance: focus first on architecture and hotspots; only then tweak compiler flags or write intrinsics/asm.
  • Some see this as emblematic of C/C++’s power and complexity; others find it off‑putting and a motivation for safer languages.