You can't fool the optimizer
Trusting “the compiler is smarter than me”
- Many agree this is a good default for low‑level micro‑optimizations: write clear code and let the optimizer handle strength reduction, loop transforms, inlining, etc.
- Others stress it’s compiler‑ and language‑dependent: LLVM/Clang and GCC are impressive; CPython or some vendor compilers (e.g. MSVC ARM, some GPU toolchains) are notably weaker or quirky.
- Several argue a better framing is: the compiler is more diligent and consistent than humans, not inherently smarter.
What compilers can’t fix
- They rarely change algorithms, data structures, or memory layout. N+1 queries, poor data locality, pointer‑chasing graphs, or excessive malloc/free in loops remain the programmer’s problem.
- Compilers can’t invent hash tables, turn AoS into SoA, or redesign cache‑friendly layouts. These often deliver orders‑of‑magnitude wins.
- HPC, CUDA, games, and real‑time systems still demand hardware‑aware design, profiling, and careful data layout.
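
To make the AoS‑to‑SoA point concrete, here is a minimal sketch (the `Particle` types and field names are illustrative, not from the discussion). Summing one field streams through dense memory in the SoA layout, while the AoS layout drags every other field through the cache too; no compiler will make this layout change for you:

```c
#include <stddef.h>

/* Array of Structures: each particle's fields are interleaved,
   so summing only the x coordinates also pulls y, z, and mass
   through the cache. */
typedef struct { float x, y, z, mass; } ParticleAoS;

float sum_x_aos(const ParticleAoS *p, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++) s += p[i].x;
    return s;
}

/* Structure of Arrays: each field is contiguous, so the same
   loop reads a dense array of x values and nothing else. */
typedef struct { float *x, *y, *z, *mass; } ParticlesSoA;

float sum_x_soa(const ParticlesSoA *p, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++) s += p->x[i];
    return s;
}
```

Both functions compute the same sum; only the memory-traffic pattern differs, which is exactly the kind of win profiling tends to surface.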
Examples of strong optimization
- LLVM optimizes various “weird” add implementations (loops, bit tricks, recursive patterns) back to a single add; scalar evolution and induction‑variable simplification are highlighted.
- Julia’s tooling and Compiler Explorer demos show loops over arithmetic series becoming closed‑form formulas, and popcount/multiplication tricks collapsing to single instructions.
- Modern passes like SROA can break structs into scalars and keep them in registers, contradicting older folklore that “structs are always slower than locals.”
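
The "weird add" and closed‑form claims are easy to reproduce on Compiler Explorer; a small sketch of both patterns (function names are mine):

```c
/* A deliberately roundabout add: increment a, b times.
   Induction-variable simplification in Clang/GCC at -O2
   typically reduces this loop to a single add. */
unsigned slow_add(unsigned a, unsigned b) {
    for (unsigned i = 0; i < b; i++)
        a += 1;
    return a;
}

/* Summing 0..n-1: scalar evolution rewrites the loop as the
   closed-form n*(n-1)/2, so the emitted code has no loop. */
unsigned triangle(unsigned n) {
    unsigned s = 0;
    for (unsigned i = 0; i < n; i++)
        s += i;
    return s;
}
```

Either way the semantics are unchanged; the point is that the optimizer sees through the loop structure entirely.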
Examples of missed or constrained optimization
- Nontrivial patterns often don’t fold: e.g. `if (x==y) return 2*x; else return x+y;` stays as compare+select instead of a single add, even though the two branches agree whenever the condition holds, so `x+y` alone would suffice.
- Math/logical equivalences such as `x%2==0 && x%3==0` vs `x%6==0`, or redundant `strlen`/`strcmp` combinations, typically aren’t recognized, due to heuristics, phase‑ordering, or short‑circuit semantics.
- Safety and language rules prevent some obvious transforms (e.g. combining character checks into one load when that might read past a buffer; preserving short‑circuiting; UB around nulls).
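
The missed folds above can be written out directly (hypothetical function names); each pair is semantically equivalent, yet compilers generally emit different code for the two sides:

```c
/* When x == y, 2*x equals x + y, so this entire function is
   equivalent to plain x + y — but it usually compiles to a
   compare plus select/branch rather than one add. */
int branchy(int x, int y) {
    if (x == y) return 2 * x;
    return x + y;
}

/* Divisible by 2 and by 3 means divisible by 6, but the
   short-circuit form typically keeps two remainder checks. */
int div6_twice(int x) { return x % 2 == 0 && x % 3 == 0; }
int div6_once(int x)  { return x % 6 == 0; }
```

Checking the generated assembly for these on Compiler Explorer is a quick way to see where the optimizer's pattern matching stops.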
Linkage, visibility, and code merging
- External functions generally must have distinct addresses, limiting merging; static functions and link‑time optimization enable more aggressive inlining/elision.
- Some toolchains and linkers do identical code folding, but this can break assumptions like function‑pointer identity.
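
A small sketch of the linkage distinction (illustrative names). The two external functions below have identical bodies, but the language guarantees they have distinct addresses, so a default build cannot merge them; the `static` function has no externally visible address, so the compiler may inline or fold it freely:

```c
/* External linkage: distinct functions must compare unequal
   by address, which blocks merging in a default build.
   Linker ICF (e.g. gold's --icf) can break that guarantee. */
int f(int x) { return x + 1; }
int g(int x) { return x + 1; }

/* Internal linkage: no outside code can observe h's address,
   so the compiler is free to inline or eliminate it. */
static int h(int x) { return x + 1; }

int call_h(int x) { return h(x); }
```

This is why the thread recommends `static` (or visibility attributes plus LTO) as a cheap way to hand the optimizer more freedom.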
Practical guidance
- Commonly recommended workflow: write clear code → benchmark → profile → inspect hot spots (and sometimes assembly) → adjust data structures/algorithms → only then micro‑optimize.
- Use `static`, visibility attributes, non‑short‑circuit `&`/`|`, and library conventions to unlock more optimization; use `volatile` only when you want to inhibit it.
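
As a sketch of the non‑short‑circuit suggestion (function names are mine): `&&` imposes ordered, conditional evaluation, while bitwise `&` evaluates both operands unconditionally, which can let the compiler merge two tests into one masked compare. This is only safe when both operands are cheap and side‑effect free:

```c
#include <stdbool.h>

/* Short-circuit form: the second test is conditional on the
   first, which constrains how the checks can be combined. */
bool both_set_sc(unsigned flags) {
    return (flags & 1u) != 0 && (flags & 2u) != 0;
}

/* Bitwise form: both tests always run, so the compiler may
   fuse them into the single check (flags & 3u) == 3u. */
bool both_set_bw(unsigned flags) {
    return ((flags & 1u) != 0) & ((flags & 2u) != 0);
}
```

The two functions return identical results for every input; the rewrite only removes an evaluation-order constraint the optimizer must otherwise respect.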