2024-11-30

AMD Disables Zen 4's Loop Buffer

Role and size of the loop buffer

Described as a small front-end structure: 144 micro-op entries per core, likely tiny compared with the per-core L2 (≈1 MB), so removing it would save negligible die area.
Some commenters note that modern CPUs are often constrained by routing rather than by area; the extra logic is mainly control and loop-detection circuitry, not large arrays.
The feature was primarily intended as a power optimization, letting parts of the front end shut down while a tight loop runs from the buffer, with performance gains only in niche cases.

Observed performance and power effects

The article’s benchmarks show little to no clear performance benefit overall; some workloads show small regressions when disabled, others are unchanged or noisy.
One game benchmark shows an unexplained ≈5% loss on a non-V-Cache core with the buffer disabled; commenters question the test methodology and whether the BIOS versions being compared were otherwise equivalent.
Power measurement is acknowledged as especially hard; tests using internal energy counters produced confusing results.
Some argue that energy per instruction, not just watts, is the right metric, but measuring it cleanly on a live system is difficult.
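To illustrate why energy per instruction can tell a different story than watts, here is a minimal Python sketch using invented numbers for two equal-length sampling windows. It assumes you already have an energy-counter delta (in joules) and a retired-instruction count for each window; obtaining those values, and handling counter wraparound, is out of scope. The `Window` class and the `buffer_on`/`buffer_off` labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """One fixed-length sampling window (hypothetical data, not from the article)."""
    energy_j: float      # energy-counter delta over the window, in joules
    seconds: float       # wall-clock length of the window
    instructions: int    # retired instructions over the same window

    @property
    def avg_power_w(self) -> float:
        # Average power: energy divided by elapsed time.
        return self.energy_j / self.seconds

    @property
    def nj_per_instruction(self) -> float:
        # Energy per instruction, scaled to nanojoules.
        return self.energy_j * 1e9 / self.instructions


# Invented numbers, chosen only to show that the two metrics can disagree.
buffer_on = Window(energy_j=60.0, seconds=10.0, instructions=50_000_000_000)
buffer_off = Window(energy_j=50.0, seconds=10.0, instructions=40_000_000_000)

for name, w in (("buffer_on", buffer_on), ("buffer_off", buffer_off)):
    print(f"{name}: {w.avg_power_w:.2f} W, {w.nj_per_instruction:.2f} nJ/instr")
# buffer_on: 6.00 W, 1.20 nJ/instr
# buffer_off: 5.00 W, 1.25 nJ/instr
```

In this made-up case the buffer-off window draws less power yet spends more energy on each instruction because it retired fewer of them, which is the situation a watts-only comparison would misread.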

Why it was disabled

Zen 5 dropped the loop buffer entirely; on Zen 4 it appears to be turned off via a hidden firmware flag (“chicken bit”).
Several commenters suspect an internal functional bug or an undisclosed security issue; others suggest it may simply not have justified ongoing engineering cost.
The lack of a user-visible BIOS toggle leads some to speculate about a serious erratum or security mitigation, though the actual reason remains unconfirmed.

Engineering, validation, and “shipping anyway”

Multiple comments emphasize that removing hardware late in the design cycle is riskier than shipping and later disabling it in firmware.
Validation for CPUs is described as extremely time- and cost-intensive; features often remain physically present but turned off if they underperform or misbehave.
Discussion broadens to how hardware and software teams sometimes pursue speculative optimizations with marginal real-world benefit, driven by schedule and expectations.

Broader security and architecture context

The thread digresses into speculative-execution vulnerabilities, trade-offs between performance and mitigations, and the idea of “secure” versus “fast” cores.
Historical loop-buffer and loop-mode features (e.g., older 68k and RISC designs) are mentioned as precedents, often with modest real-world gains.
