You can't fool the optimizer

Trusting “the compiler is smarter than me”

  • Many agree this is a good default for low‑level micro‑optimizations: write clear code and let the optimizer handle strength reduction, loop transforms, inlining, etc.
  • Others stress it’s compiler‑ and language‑dependent: LLVM/Clang and GCC are impressive; CPython or some vendor compilers (e.g. MSVC ARM, some GPU toolchains) are notably weaker or quirky.
  • Several argue a better framing is: the compiler is more diligent and consistent than humans, not inherently smarter.
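A minimal C sketch of that default (function names are illustrative): write the clear multiply and let the compiler strength-reduce it; any modern compiler at -O1 or above emits the same shift for both versions, so the "hand-optimized" one buys nothing but obscurity.

```c
#include <stdint.h>

/* Clear version: the compiler strength-reduces the multiply to a shift. */
uint32_t scale(uint32_t x) {
    return x * 16;
}

/* "Hand-optimized" version: same generated code, harder to read. */
uint32_t scale_shift(uint32_t x) {
    return x << 4;
}
```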

What compilers can’t fix

  • They rarely change algorithms, data structures, or memory layout. N+1 queries, poor data locality, pointer‑chasing graphs, or excessive malloc/free in loops remain the programmer’s problem.
  • Compilers can’t invent hash tables, turn AoS into SoA, or redesign cache‑friendly layouts. These often deliver orders‑of‑magnitude wins.
  • HPC, CUDA, games, and real‑time systems still demand hardware‑aware design, profiling, and careful data layout.
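A small C sketch of one layout decision the compiler will not make for you: array-of-structs versus struct-of-arrays (names are illustrative; the layouts, not the loop bodies, carry the win).

```c
#include <stddef.h>

/* Array-of-structs: each particle's fields are interleaved, so a loop
 * that only reads x also drags y and z through the cache. */
struct ParticleAoS { float x, y, z; };

/* Struct-of-arrays: all x values are contiguous, so a loop over x
 * touches only the cache lines it actually needs. */
struct ParticlesSoA {
    float *x;
    float *y;
    float *z;
};

float sum_x_aos(const struct ParticleAoS *p, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++) s += p[i].x;   /* strided access */
    return s;
}

float sum_x_soa(const struct ParticlesSoA *p, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++) s += p->x[i];  /* contiguous access */
    return s;
}
```

No compiler pass converts one layout into the other; it is a source-level design choice.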

Examples of strong optimization

  • LLVM optimizes various “weird” add implementations (loops, bit tricks, recursive patterns) back to a single add; scalar evolution and induction‑variable simplification are highlighted.
  • Julia’s tooling and compiler explorer demos show loops over arithmetic series becoming closed‑form formulas, and popcount/multiplication tricks collapsing to single instructions.
  • Modern passes like SROA can break structs into scalars and keep them in registers, contradicting older folklore that “structs are always slower than locals.”
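A hedged C sketch of the first two points: with a recent Clang or GCC at -O2, the loop in slow_add typically folds to a single add (induction-variable simplification) and sum_below to the closed-form n*(n-1)/2 (scalar evolution), though the exact output depends on the compiler and version.

```c
#include <stdint.h>

/* A deliberately roundabout add: step a up one at a time.
 * The optimizer recognizes the loop computes a + b. */
uint32_t slow_add(uint32_t a, uint32_t b) {
    for (uint32_t i = 0; i < b; i++)
        a++;
    return a;
}

/* Arithmetic series: scalar evolution replaces the loop with a
 * closed-form expression in the generated code. */
uint64_t sum_below(uint64_t n) {
    uint64_t s = 0;
    for (uint64_t i = 0; i < n; i++)
        s += i;
    return s;
}
```

Checking either function on a compiler explorer is an easy way to watch these passes at work.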

Examples of missed or constrained optimization

  • Nontrivial patterns often don’t fold: e.g. if (x==y) return 2*x; else return x+y; compiles to compare+select even though both branches are algebraically x+y and the whole function could be a single add.
  • Math/logical equivalences such as x%2==0 && x%3==0 vs x%6==0, or redundant strlen / strcmp combinations, typically aren’t recognized, due to heuristics, phase‑ordering, or short‑circuit semantics.
  • Safety and language rules prevent some obvious transforms (e.g. combining character checks into one load when that might read past a buffer; preserving short‑circuiting; UB around nulls).
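The first two missed-optimization examples above, written out as compilable C. Behavior is identical within each pair; the point is only what the optimizer emits for each.

```c
#include <stdbool.h>

/* Both branches are algebraically x + y (when x == y, 2*x == x + y),
 * yet compilers typically keep the compare and select instead of
 * folding the whole function to one add. */
int add_or_double(int x, int y) {
    if (x == y) return 2 * x;
    else        return x + y;
}

/* Equivalent predicates that optimizers usually don't unify:
 * divisible by 2 and by 3 is the same as divisible by 6. */
bool div6_a(int x) { return x % 2 == 0 && x % 3 == 0; }
bool div6_b(int x) { return x % 6 == 0; }
```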

Linkage, visibility, and code merging

  • External functions generally must have distinct addresses, limiting merging; static functions and link‑time optimization enable more aggressive inlining/elision.
  • Some toolchains and linkers do identical code folding, but this can break assumptions like function‑pointer identity.
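A small illustration of the address-identity constraint. The two functions below are byte-identical, but because they have external linkage the language requires their addresses to compare unequal, which blocks naive merging; linker identical-code folding (e.g. lld's `--icf=all`) may merge them anyway and silently change that comparison.

```c
/* Two byte-identical external functions. */
int id_a(int x) { return x; }
int id_b(int x) { return x; }

/* 1 under standard semantics; aggressive ICF can make this 0. */
int pointers_differ(void) {
    return id_a != id_b;
}
```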

Practical guidance

  • The commonly recommended workflow: write clear code → benchmark → profile → inspect hot spots (and sometimes the assembly) → adjust data structures and algorithms → only then micro‑optimize.
  • Use static, visibility attributes, non‑short‑circuit & / |, and library calling conventions to unlock more optimization; reach for volatile only when you actually want to inhibit it.
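A sketch of the short-circuit point (function names are illustrative): bitwise & on comparison results evaluates both sides unconditionally, which permits branchless code, whereas && imposes an evaluation order that can force a branch. This is only a drop-in replacement when both operands are cheap and side-effect-free.

```c
#include <stdbool.h>

/* Short-circuit version: && guarantees the right side is skipped
 * when the left is false, which may cost a branch. */
bool in_box_sc(int x, int y) {
    return (x >= 0 && x < 100) && (y >= 0 && y < 100);
}

/* Bitwise version: all four comparisons evaluate unconditionally,
 * so the compiler is free to emit straight-line branchless code.
 * Safe here because every operand is side-effect-free. */
bool in_box_branchless(int x, int y) {
    return (x >= 0) & (x < 100) & (y >= 0) & (y < 100);
}
```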