You can't fool the optimizer
Trusting “the compiler is smarter than me”
- Many agree this is a good default for low‑level micro‑optimizations: write clear code and let the optimizer handle strength reduction, loop transforms, inlining, etc.
- Others stress it’s compiler‑ and language‑dependent: LLVM/Clang and GCC are impressive; CPython or some vendor compilers (e.g. MSVC ARM, some GPU toolchains) are notably weaker or quirky.
- Several argue a better framing is: the compiler is more diligent and consistent than humans, not inherently smarter.
What compilers can’t fix
- They rarely change algorithms, data structures, or memory layout. N+1 queries, poor data locality, pointer‑chasing graphs, or excessive malloc/free in loops remain the programmer’s problem.
- Compilers can’t invent hash tables, turn AoS into SoA, or redesign cache‑friendly layouts. These often deliver orders‑of‑magnitude wins.
- HPC, CUDA, games, and real‑time systems still demand hardware‑aware design, profiling, and careful data layout.
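
To make the AoS‑to‑SoA point concrete, here is a minimal sketch (the `Particle` types and field names are illustrative, not from the discussion). Summing one field streams through dense memory in the SoA layout, while the AoS layout drags every other field through the cache too; no compiler will make this layout change for you:

```c
#include <stddef.h>

/* Array of Structures: each particle's fields are interleaved,
   so summing only the x coordinates also pulls y, z, and mass
   through the cache. */
typedef struct { float x, y, z, mass; } ParticleAoS;

float sum_x_aos(const ParticleAoS *p, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++) s += p[i].x;
    return s;
}

/* Structure of Arrays: each field is contiguous, so the same
   loop reads a dense array of x values and nothing else. */
typedef struct { float *x, *y, *z, *mass; } ParticlesSoA;

float sum_x_soa(const ParticlesSoA *p, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++) s += p->x[i];
    return s;
}
```

Both functions compute the same sum; only the memory-traffic pattern differs, which is exactly the kind of win profiling tends to surface.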
Examples of strong optimization
- LLVM optimizes various “weird” add implementations (loops, bit tricks, recursive patterns) back to a single add; scalar evolution and induction‑variable simplification are highlighted.
- Julia’s tooling and Compiler Explorer demos show loops over arithmetic series becoming closed‑form formulas, and popcount/multiplication tricks collapsing to single instructions.
- Modern passes like SROA can break structs into scalars and keep them in registers, contradicting older folklore that “structs are always slower than locals.”
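
The "weird add" and closed‑form claims are easy to reproduce on Compiler Explorer; a small sketch of both patterns (function names are mine):

```c
/* A deliberately roundabout add: increment a, b times.
   Induction-variable simplification in Clang/GCC at -O2
   typically reduces this loop to a single add. */
unsigned slow_add(unsigned a, unsigned b) {
    for (unsigned i = 0; i < b; i++)
        a += 1;
    return a;
}

/* Summing 0..n-1: scalar evolution rewrites the loop as the
   closed-form n*(n-1)/2, so the emitted code has no loop. */
unsigned triangle(unsigned n) {
    unsigned s = 0;
    for (unsigned i = 0; i < n; i++)
        s += i;
    return s;
}
```

Either way the semantics are unchanged; the point is that the optimizer sees through the loop structure entirely.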
Examples of missed or constrained optimization
- Nontrivial patterns often don’t fold: e.g. `if (x==y) return 2*x; else return x+y;` stays as compare+select instead of a single add, even though the two branches agree whenever the condition holds, so `x+y` alone would suffice.
- Math/logical equivalences such as `x%2==0 && x%3==0` vs `x%6==0`, or redundant `strlen`/`strcmp` combinations, typically aren’t recognized, due to heuristics, phase‑ordering, or short‑circuit semantics.
- Safety and language rules prevent some obvious transforms (e.g. combining character checks into one load when that might read past a buffer; preserving short‑circuiting; UB around nulls).
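
The missed folds above can be written out directly (hypothetical function names); each pair is semantically equivalent, yet compilers generally emit different code for the two sides:

```c
/* When x == y, 2*x equals x + y, so this entire function is
   equivalent to plain x + y — but it usually compiles to a
   compare plus select/branch rather than one add. */
int branchy(int x, int y) {
    if (x == y) return 2 * x;
    return x + y;
}

/* Divisible by 2 and by 3 means divisible by 6, but the
   short-circuit form typically keeps two remainder checks. */
int div6_twice(int x) { return x % 2 == 0 && x % 3 == 0; }
int div6_once(int x)  { return x % 6 == 0; }
```

Checking the generated assembly for these on Compiler Explorer is a quick way to see where the optimizer's pattern matching stops.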
Linkage, visibility, and code merging
- External functions generally must have distinct addresses, limiting merging; static functions and link‑time optimization enable more aggressive inlining/elision.
- Some toolchains and linkers do identical code folding, but this can break assumptions like function‑pointer identity.
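
A small sketch of the linkage distinction (illustrative names). The two external functions below have identical bodies, but the language guarantees they have distinct addresses, so a default build cannot merge them; the `static` function has no externally visible address, so the compiler may inline or fold it freely:

```c
/* External linkage: distinct functions must compare unequal
   by address, which blocks merging in a default build.
   Linker ICF (e.g. gold's --icf) can break that guarantee. */
int f(int x) { return x + 1; }
int g(int x) { return x + 1; }

/* Internal linkage: no outside code can observe h's address,
   so the compiler is free to inline or eliminate it. */
static int h(int x) { return x + 1; }

int call_h(int x) { return h(x); }
```

This is why the thread recommends `static` (or visibility attributes plus LTO) as a cheap way to hand the optimizer more freedom.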
Practical guidance
- Commonly recommended workflow: write clear code → benchmark → profile → inspect hot spots (and sometimes assembly) → adjust data structures/algorithms → only then micro‑optimize.
- Use `static`, visibility attributes, non‑short‑circuit `&`/`|`, and library conventions to unlock more optimization; use `volatile` only when you want to inhibit it.
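
As a sketch of the non‑short‑circuit suggestion (function names are mine): `&&` imposes ordered, conditional evaluation, while bitwise `&` evaluates both operands unconditionally, which can let the compiler merge two tests into one masked compare. This is only safe when both operands are cheap and side‑effect free:

```c
#include <stdbool.h>

/* Short-circuit form: the second test is conditional on the
   first, which constrains how the checks can be combined. */
bool both_set_sc(unsigned flags) {
    return (flags & 1u) != 0 && (flags & 2u) != 0;
}

/* Bitwise form: both tests always run, so the compiler may
   fuse them into the single check (flags & 3u) == 3u. */
bool both_set_bw(unsigned flags) {
    return ((flags & 1u) != 0) & ((flags & 2u) != 0);
}
```

The two functions return identical results for every input; the rewrite only removes an evaluation-order constraint the optimizer must otherwise respect.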