AMD Disables Zen 4's Loop Buffer

Role and size of the loop buffer

  • Described as a small front-end structure: 144 micro-op entries per core, likely tiny compared with the ≈1 MB per-core L2, so removing it would save negligible die area.
  • Some commenters note that modern CPUs are often constrained by routing rather than area; the extra logic is mainly control and loop detection, not large arrays.
  • The feature was primarily intended as a power optimization: while a tight loop is replayed from the buffer, parts of the front end can be shut down. Performance gains appear only in niche cases.
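To make the "tight loops only" point concrete, here is a toy model of how much of a loop's micro-op stream a 144-entry buffer could serve. It is a minimal sketch under loudly stated assumptions, not AMD's documented behaviour: the loop is assumed to be detected after its first iteration, it qualifies only if the whole body fits, and every later iteration is assumed to be replayed from the buffer. The function name `buffer_served_fraction` is hypothetical.

```python
# Capacity figure cited in the discussion; everything else below is a toy assumption.
LOOP_BUFFER_ENTRIES = 144


def buffer_served_fraction(body_uops: int, iterations: int,
                           capacity: int = LOOP_BUFFER_ENTRIES) -> float:
    """Fraction of micro-ops delivered from the loop buffer in a toy model.

    Assumptions (hypothetical, not vendor-documented): the loop is detected
    after its first iteration, it qualifies only if the whole body fits in
    the buffer, and every later iteration is replayed from the buffer.
    """
    if body_uops <= 0 or iterations <= 0:
        raise ValueError("body_uops and iterations must be positive")
    if body_uops > capacity:
        # Body too large: every iteration goes through the normal front end.
        return 0.0
    total = body_uops * iterations
    # The first iteration fills the buffer; the remaining ones replay from it.
    replayed = body_uops * (iterations - 1)
    return replayed / total


if __name__ == "__main__":
    print(buffer_served_fraction(40, 1000))   # small, long-running loop
    print(buffer_served_fraction(200, 1000))  # body exceeds 144 entries
```

Even in this generous model, only loops with small bodies and many iterations benefit; larger bodies or short trip counts see no front-end savings at all, which is consistent with the "niche cases" description.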

Observed performance and power effects

  • The article’s benchmarks show little to no clear performance benefit overall; some workloads show small regressions when disabled, others are unchanged or noisy.
  • One game benchmark shows an unexplained ≈5% loss on a non-V-Cache core with the buffer disabled; commenters question the test methodology and whether the BIOS versions being compared were equivalent.
  • Power measurement is acknowledged as especially hard; tests using internal energy counters produced confusing results.
  • Some argue that energy per instruction, not just watts, is the right metric, but measuring it cleanly on a live system is difficult.
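The energy-per-instruction argument reduces to simple arithmetic: divide the energy consumed over an interval by the instructions retired in that interval. The sketch below assumes the caller already has two energy-counter readings in microjoules (with an optional wraparound range) and two instruction-count readings; how those are obtained on real hardware, and the function names, are assumptions for illustration. It also shows why watts alone can mislead: a run that draws less power but takes longer can spend more energy per instruction.

```python
from typing import Optional


def energy_per_instruction_nj(energy_start_uj: int, energy_end_uj: int,
                              instr_start: int, instr_end: int,
                              energy_wrap_uj: Optional[int] = None) -> float:
    """Energy per instruction in nanojoules from two counter snapshots.

    Hypothetical inputs: energy readings in microjoules and retired-instruction
    counts. If the energy counter wrapped, pass its range as energy_wrap_uj.
    """
    delta_e = energy_end_uj - energy_start_uj
    if delta_e < 0:
        if energy_wrap_uj is None:
            raise ValueError("energy counter went backwards and no wrap range was given")
        delta_e += energy_wrap_uj  # undo a single counter wraparound
    delta_i = instr_end - instr_start
    if delta_i <= 0:
        raise ValueError("instruction count must increase")
    return delta_e * 1000.0 / delta_i  # microjoules -> nanojoules


def epi_from_power_nj(avg_watts: float, seconds: float, instructions: int) -> float:
    """Energy per instruction (nJ) from average power, runtime, and instruction count."""
    return avg_watts * seconds * 1e9 / instructions


if __name__ == "__main__":
    # 10 J over 2e10 instructions -> 0.5 nJ per instruction.
    print(energy_per_instruction_nj(1_000_000, 11_000_000, 0, 20_000_000_000))
    # Same work: run A draws more watts but finishes sooner than run B.
    print(epi_from_power_nj(50.0, 2.0, 10_000_000_000))  # run A
    print(epi_from_power_nj(40.0, 3.0, 10_000_000_000))  # run B
```

Run B draws fewer watts yet costs more energy per instruction, which is the point those commenters make. In practice the counters usually cover the whole package and everything else running on the system, which is part of why clean measurements are hard.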

Why it was disabled

  • Zen 5 dropped the loop buffer entirely; on Zen 4 it appears to be turned off via a hidden firmware flag (“chicken bit”).
  • Several commenters suspect an internal functional bug or an undisclosed security issue; others suggest it may simply not have justified the ongoing engineering cost.
  • The lack of a user-visible BIOS toggle leads some to speculate about a serious erratum or security mitigation, though the actual reason remains unknown.

Engineering, validation, and “shipping anyway”

  • Multiple comments emphasize that removing hardware late in the design cycle is riskier than shipping and later disabling it in firmware.
  • Validation for CPUs is described as extremely time- and cost-intensive; features often remain physically present but turned off if they underperform or misbehave.
  • Discussion broadens to how hardware and software teams sometimes pursue speculative optimizations with marginal real-world benefit, driven by schedule and expectations.

Broader security and architecture context

  • The thread digresses into speculative-execution vulnerabilities, trade-offs between performance and mitigations, and the idea of “secure” versus “fast” cores.
  • Historical loop-buffer and loop-mode features (e.g., older 68k and RISC designs) are mentioned as precedents, often with modest real-world gains.