XOR'ing a register with itself is the idiom for zeroing it out. Why not sub?
Performance and Microarchitecture
- Many comments note that on x86, `xor r,r` and `sub r,r` have the same encoding size and nominal cycle count, at least since early 8086/8088-era chips.
- Several participants explain that ALUs typically implement add/sub/xor with the same hardware; overall ALU speed is constrained by the slowest operation, so XOR is not inherently faster in most real CPUs.
- Modern out-of-order (OoO) x86 cores special-case zeroing idioms: `xor r,r` and `sub r,r` (and some other patterns) are detected in the front end and turned into “rename this reg to the internal zero register,” generating no execution uop and effectively zero latency.
- Some measurements on recent Intel cores show lower apparent latency for `xor r,r` than `sub r,r`; others point out this reflects the zero-idiom optimization, not a faster ALU path.
- Outside x86, there are examples where XOR truly is faster (e.g., some bit-slice and Cray-style designs), and one note that on some vector units certain `sub` forms have different scheduling than `xor`.
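The “rename to the zero register” behavior described in this section can be sketched as a toy model. This is only an illustration under loud assumptions: the `Insn` type, the `rename` function, and the reserved physical register `ZERO_PREG` are hypothetical names, source operands are not renamed, and no real core is claimed to work exactly this way.

```python
from dataclasses import dataclass

ZERO_PREG = 0  # hypothetical physical register that always reads as zero


@dataclass(frozen=True)
class Insn:
    op: str
    dst: str
    src: str


def is_zero_idiom(insn):
    """Toy rule: xor r,r or sub r,r with identical operands."""
    return insn.op in {"xor", "sub"} and insn.dst == insn.src


def rename(program):
    """Return (uops sent to execution, architectural -> physical register map)."""
    rat = {}        # register alias table
    next_preg = 1   # physical register 0 is reserved for the zero value
    uops = []
    for insn in program:
        if is_zero_idiom(insn):
            # Resolved at rename time: remap the destination, issue no uop.
            rat[insn.dst] = ZERO_PREG
            continue
        rat[insn.dst] = next_preg
        next_preg += 1
        uops.append(insn)
    return uops, rat


program = [
    Insn("mov", "eax", "ebx"),
    Insn("xor", "eax", "eax"),  # zero idiom
    Insn("sub", "ecx", "ecx"),  # zero idiom
    Insn("sub", "edx", "ebx"),  # ordinary subtract, still executes
]
uops, rat = rename(program)
print(len(uops), rat)
```

In this model both idioms vanish before execution, which is why a latency measurement would see them as free regardless of how fast the ALU computes XOR or SUB.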
Historical Reasons and Idiom Propagation
- Multiple people argue the XOR-zero idiom predates x86, coming from 8080/Z80 and similar 8‑bit CPUs where `XOR A` was 1 byte and faster than loading an immediate zero.
- Even on x86, early practice emphasized code size and cycle counting; `xor reg,reg` was shorter and faster than `mov reg,0`, which helped cement it as “the” zeroing idiom.
- Once a pattern gained even a slight real or perceived advantage, network effects, teaching materials, and ROM/BIOS sources made it dominant. Later compilers and microarchitectures then optimized specifically for it.
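The code-size argument can be checked by counting bytes. The table below is a small sketch using the standard encodings as I understand them; the dictionary keys and the `ENCODINGS` name are just labels for this example, and assemblers may emit the equivalent alternative opcodes (`33 C0` for XOR, `2B C0` for SUB), which have the same length.

```python
# Machine-code bytes for several ways to zero a register.
ENCODINGS = {
    "8080 XRA A":          bytes([0xAF]),              # XOR A on the Z80
    "8080 MVI A,0":        bytes([0x3E, 0x00]),        # LD A,0 on the Z80
    "8086 xor ax,ax":      bytes([0x31, 0xC0]),
    "8086 sub ax,ax":      bytes([0x29, 0xC0]),
    "8086 mov ax,0":       bytes([0xB8, 0x00, 0x00]),
    "32-bit xor eax,eax":  bytes([0x31, 0xC0]),
    "32-bit sub eax,eax":  bytes([0x29, 0xC0]),
    "32-bit mov eax,0":    bytes([0xB8, 0x00, 0x00, 0x00, 0x00]),
}

for name, code in ENCODINGS.items():
    print(f"{name:20s} {len(code)} byte(s)  {code.hex(' ')}")
```

XOR and SUB tie at every width, so size alone explains why either beats `mov reg,0` but not why XOR won over SUB.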
Flags and Semantics
- Discussion highlights subtle differences in flags: `sub r,r` sets flags as a true subtract, while `xor r,r` always clears carry/overflow and is logically “bitwise.” With identical operands the two agree on x86 for ZF, SF, PF, CF, and OF; the documented difference is AF, which SUB clears and XOR leaves undefined.
- On some CPUs, programmers preferred XOR because it leaves carry unchanged or behaves more predictably; others point out that on x86 specifically, both idioms are recognized as zeroing and effectively clear the dependency on prior flags.
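To make the flag comparison concrete, here is a minimal software model of the x86 status flags for SUB and XOR. It is a sketch, not an emulator: `flags_after` is a hypothetical helper, and AF after XOR is represented as `None` because the architecture documents it as undefined.

```python
def flags_after(op, a, b, bits=32):
    """Model the result and x86 status flags of `op a, b` (toy model)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    if op == "sub":
        result = (a - b) & mask
        cf = a < b                                 # unsigned borrow
        of = bool((a ^ b) & (a ^ result) & sign)   # signed overflow
        af = (a & 0xF) < (b & 0xF)                 # borrow out of the low nibble
    elif op == "xor":
        result = a ^ b
        cf = False                                 # XOR always clears CF
        of = False                                 # XOR always clears OF
        af = None                                  # undefined after XOR
    else:
        raise ValueError(f"unsupported op: {op}")
    flags = {
        "ZF": result == 0,
        "SF": bool(result & sign),
        "PF": bin(result & 0xFF).count("1") % 2 == 0,  # even parity of low byte
        "CF": cf,
        "OF": of,
        "AF": af,
    }
    return result, flags


# Compare the two zeroing idioms for an arbitrary register value.
r = 0xDEADBEEF
print(flags_after("sub", r, r))
print(flags_after("xor", r, r))
```

For any value of `r`, both calls yield a zero result with ZF and PF set and SF, CF, and OF clear; only the AF entry differs.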
Power, ISA Design, and Miscellany
- There is debate whether XOR saves measurable power versus SUB; consensus leans toward “difference is tiny compared to rest of the core,” though embedded/DSP anecdotes show power-aware instruction choices do exist.
- Several comments discuss zero registers on RISC-like ISAs, x86 opcode-space tradeoffs that discouraged a 1‑byte “clear reg” instruction, and tangential ideas like steganographically encoding bits via choosing XOR vs SUB.