Own Constant Folder in C/C++
Inline assembly and “own constant folder” idea
- Core trick: wrap SIMD sqrt in inline assembly but special‑case compile‑time constants using __builtin_constant_p, so constants are folded while non‑constants emit the desired instruction.
- Several commenters see this as a narrow workaround for a specific Clang codegen quirk rather than a general “constant folder”.
- It’s noted that similar patterns are common in performance‑critical domains (e.g., HFT, codecs) where single‑instruction savings matter.
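
The core trick can be sketched roughly as follows. This is a simplified scalar version (sqrtss instead of the SIMD sqrtps under discussion), and the name fast_sqrtf is hypothetical. It assumes GCC or Clang extensions and an x86-64 target, and falls back to the compiler builtin elsewhere.

```c
#include <assert.h>

/* Hypothetical helper (not from the original post): forces a sqrtss
 * instruction for runtime values, but lets the compiler fold the call
 * when the argument is a compile-time constant. */
static inline float fast_sqrtf(float x)
{
    /* True only when the compiler can prove x is constant, typically
     * after inlining with optimization enabled. */
    if (__builtin_constant_p(x))
        return __builtin_sqrtf(x);
#if defined(__x86_64__)
    float r;
    /* AT&T syntax: source operand first, destination second. */
    __asm__("sqrtss %1, %0" : "=x"(r) : "x"(x));
    return r;
#else
    /* Assumption: on other targets, just defer to the builtin. */
    return __builtin_sqrtf(x);
#endif
}

/* Usage sketch: both paths should agree on exactly representable roots. */
static void fast_sqrtf_example(void)
{
    volatile float runtime_value = 16.0f; /* volatile defeats folding */
    assert(fast_sqrtf(9.0f) == 3.0f);
    assert(fast_sqrtf(runtime_value) == 4.0f);
}
```

Without the __builtin_constant_p branch, the asm statement is opaque to the optimizer, so even fast_sqrtf(9.0f) would execute a square root at run time.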
Clang vs GCC and sqrt optimization
- Clang under -ffast-math turns sqrtps(x) into x * rsqrtps(x) with Newton–Raphson refinement.
- Some call this a miscompile or performance bug; others argue it’s “working as intended” on older CPUs where this is faster.
- GCC is reported not to do this transformation in the shown example.
- Performance advantage is architecture‑dependent; newer Intel CPUs have much faster sqrt instructions, and using -mtune/-march can change Clang’s choice.
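
For readers unfamiliar with the transformation, the following is a hand-written approximation of what x * rsqrtps(x) with one Newton–Raphson step computes. It is not Clang's actual output: the function name approx_sqrt4 is hypothetical, the x86-64 target is an assumption, and the zero-input handling a compiler would add is omitted.

```c
#include <assert.h>
#include <math.h>
#if defined(__x86_64__)
#include <xmmintrin.h>
#endif

/* Hypothetical helper: approximates sqrt for four floats as
 * x * (1/sqrt(x)), refining the hardware estimate once.
 * Inputs must be positive: x == 0 would give 0 * inf = NaN here. */
static void approx_sqrt4(const float in[4], float out[4])
{
#if defined(__x86_64__)
    __m128 x  = _mm_loadu_ps(in);
    __m128 y0 = _mm_rsqrt_ps(x); /* rough (about 12-bit) 1/sqrt(x) */

    /* Newton-Raphson step: y1 = y0 * (1.5 - 0.5 * x * y0 * y0) */
    __m128 half_x = _mm_mul_ps(_mm_set1_ps(0.5f), x);
    __m128 t      = _mm_mul_ps(_mm_mul_ps(half_x, y0), y0);
    __m128 y1     = _mm_mul_ps(y0, _mm_sub_ps(_mm_set1_ps(1.5f), t));

    _mm_storeu_ps(out, _mm_mul_ps(x, y1)); /* sqrt(x) ~= x * y1 */
#else
    /* Assumption: no SSE available, so use the exact library sqrt. */
    for (int i = 0; i < 4; ++i)
        out[i] = sqrtf(in[i]);
#endif
}

/* Usage sketch: the result is close to, but not necessarily bit-identical
 * with, a true sqrt, so compare with a tolerance. */
static void approx_sqrt4_example(void)
{
    const float in[4] = {1.0f, 2.0f, 9.0f, 100.0f};
    float out[4];
    approx_sqrt4(in, out);
    for (int i = 0; i < 4; ++i) {
        float err = out[i] * out[i] - in[i];
        if (err < 0.0f)
            err = -err;
        assert(err <= 1e-5f * in[i]);
    }
}
```

The sequence trades one sqrtps for an estimate plus several multiplies and a subtract, which is why it only pays off on CPUs where sqrtps is comparatively slow.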
Fast-math flags: scope, hazards, and naming
- Strong pushback on global -ffast-math: it relaxes IEEE guarantees across the whole program and can even affect other translation units or shared libraries via global FP state.
- Some argue it’s fine when used only for specific numerical TUs; others cite bugs and surprising cross‑module effects.
- Preference expressed for more granular controls (per function, per expression, or pragmas), as in Rust’s unchecked_add or Julia’s @fastmath.
- Several commenters criticize the name “fast‑math” since it can be slower and less accurate; suggestions range from “unsafe math” to tongue‑in‑cheek renamings.
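
One commonly cited form of the "global FP state" hazard on x86 is the flush-to-zero (FTZ) and denormals-are-zero (DAZ) bits in the MXCSR register, which startup code linked in under fast-math can set for the whole thread. The sketch below shows one way a program might check for that. The helper names are hypothetical, and the x86-64 target is an assumption (other targets simply report "not set").

```c
#include <assert.h>
#include <float.h>
#if defined(__x86_64__)
#include <xmmintrin.h>
#endif

#define MXCSR_DAZ 0x0040u /* denormals-are-zero, bit 6  */
#define MXCSR_FTZ 0x8000u /* flush-to-zero,      bit 15 */

/* Hypothetical helper: nonzero if FTZ or DAZ is set on this thread. */
static int flush_modes_enabled(void)
{
#if defined(__x86_64__)
    unsigned int csr = _mm_getcsr();
    return (csr & (MXCSR_FTZ | MXCSR_DAZ)) != 0;
#else
    return 0; /* Assumption: not checked on non-x86 targets. */
#endif
}

/* Hypothetical helper: nonzero if halving the smallest normal float
 * still yields a nonzero subnormal, as IEEE 754 requires. */
static int subnormals_survive(void)
{
    volatile float tiny = FLT_MIN;     /* volatile forces runtime math */
    volatile float half = tiny * 0.5f; /* subnormal unless flushed */
    return half != 0.0f;
}

/* Usage sketch: in a default (non-fast-math) build, both checks pass. */
static void fp_state_example(void)
{
    assert(!flush_modes_enabled());
    assert(subnormals_survive());
}
```

A library that depends on subnormals could run a check like this at initialization to detect that some other component has changed the floating-point environment.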
Determinism, correctness, and other languages
- Debate over whether floating‑point determinism matters, especially for games and multiplayer simulations.
- Examples from Rust, Julia, and Zig show similar low‑level tuning issues and different mechanisms for localizing unsafe or fast math.
- General consensus: all optimizing compilers can produce surprising code; you must check assembly for hot paths.
C/C++ complexity and when to dive this deep
- Many note this level of trickery is unnecessary for most projects.
- Guidance: focus first on architecture and hotspots; only then tweak compiler flags or write intrinsics/asm.
- Some see this as emblematic of C/C++’s power and complexity; others find it off‑putting and a motivation for safer languages.