Why xor eax, eax?
Behavior and purpose of xor eax, eax
- Widely used idiom for clearing a register to zero.
- On x86-64, writing to a 32-bit register like eax automatically zero-extends into the upper 32 bits of rax, so xor eax, eax clears all 64 bits.
- Compared with mov eax, 0, it uses fewer bytes (2 versus 5) and thus reduces instruction cache pressure.
- Several other 32-bit ALU ops have the same zero-extend property; xchg eax, eax is a notable exception because it is defined as a true no-op and doesn’t change upper bits.
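The write-width rules above can be modelled in a few lines of Python. This is only a sketch of the architectural behaviour described in the bullets, not an emulator; the helper names (`write32`, `write16`, `xor_eax_eax`, `xchg_eax_eax`) are made up for illustration.

```python
MASK64 = (1 << 64) - 1


def write32(rax: int, value: int) -> int:
    """Model a write to a 32-bit register such as eax: the upper 32 bits become zero."""
    return value & 0xFFFFFFFF


def write16(rax: int, value: int) -> int:
    """Model a write to a 16-bit register such as ax: the upper 48 bits are preserved."""
    return (rax & ~0xFFFF & MASK64) | (value & 0xFFFF)


def xor_eax_eax(rax: int) -> int:
    """xor eax, eax: compute eax ^ eax and write it back as a 32-bit result."""
    eax = rax & 0xFFFFFFFF
    return write32(rax, eax ^ eax)


def xchg_eax_eax(rax: int) -> int:
    """xchg eax, eax: modelled as a no-op, so rax is returned unchanged."""
    return rax


rax = 0xDEADBEEFCAFEBABE
print(hex(xor_eax_eax(rax)))   # all 64 bits cleared
print(hex(write16(rax, 0)))    # only the low 16 bits cleared
print(hex(xchg_eax_eax(rax)))  # unchanged
```

The contrast between `write32` and `write16` is the point: only the 32-bit write fully defines the 64-bit register.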
Encoding, APX, and register file details
- xor rax, rax needs a REX prefix to mark 64-bit operation; xor eax, eax avoids this, saving a byte.
- REX prefixes both select 64-bit operand size and extend register addressing; APX introduces additional registers r16–r31 using a REX2 prefix, reusing opcode-map bits.
- Partial-register writes that don’t fully define a 64-bit register create extra dependencies, hurting out‑of‑order execution; zero-extend semantics avoid this.
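To make the size comparison concrete, the sketch below lists the usual encodings of the three instructions discussed and checks for a REX.W prefix. The byte strings are the standard encodings as commonly emitted by assemblers; the `has_rex_w` helper is hypothetical and only inspects the first byte, so it ignores any legacy prefixes that could precede REX.

```python
# Common machine-code encodings (hex) for the instructions under discussion.
ENCODINGS = {
    "xor eax, eax": bytes.fromhex("31c0"),
    "xor rax, rax": bytes.fromhex("4831c0"),
    "mov eax, 0": bytes.fromhex("b800000000"),
}


def has_rex_w(code: bytes) -> bool:
    """True if the first byte is a REX prefix (0x40-0x4F) with the W bit (0x08) set."""
    return 0x40 <= code[0] <= 0x4F and bool(code[0] & 0x08)


for asm, code in ENCODINGS.items():
    print(f"{asm:14} {code.hex(' '):16} {len(code)} bytes  REX.W={has_rex_w(code)}")
```

Only the 64-bit form carries the extra 0x48 prefix byte, which is where the one-byte saving comes from.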
Zeroing idioms and microarchitecture
- Modern out‑of‑order CPUs recognize patterns like xor reg, reg (and sometimes sub reg, reg) as “zeroing idioms”.
- The renamer can map the architectural register to a microarchitectural zero register instead of executing an ALU op, effectively giving zero-cycle execution and breaking dependency chains.
- Some debate whether CPUs also special‑case mov reg, 0; it’s possible, but less compelling since the instruction is longer and compilers already favor xor.
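The dependency-breaking effect can be illustrated with a deliberately crude toy model. Everything here is an assumption made for illustration: each instruction has a latency of one cycle, only register data dependencies are tracked, and a recognized idiom is treated as ready at cycle 0. Real renamers, schedulers, and latencies are far more involved, and the names `Insn`, `is_zeroing_idiom`, and `ready_times` are invented.

```python
from typing import NamedTuple


class Insn(NamedTuple):
    op: str
    dst: str
    srcs: tuple


def is_zeroing_idiom(insn: Insn) -> bool:
    """xor r, r or sub r, r: the result is zero whatever r held before."""
    return insn.op in ("xor", "sub") and insn.srcs == (insn.dst, insn.dst)


def ready_times(program, recognize_idioms: bool, latency: int = 1):
    """Cycle at which each result is available, counting data dependencies only."""
    ready = {}  # register name -> cycle at which its latest value is ready
    out = []
    for insn in program:
        if recognize_idioms and is_zeroing_idiom(insn):
            done = 0  # toy assumption: handled at rename, no inputs, no latency
        else:
            start = max((ready.get(r, 0) for r in insn.srcs), default=0)
            done = start + latency
        ready[insn.dst] = done
        out.append(done)
    return out


program = [
    Insn("imul", "eax", ("eax", "ebx")),
    Insn("imul", "eax", ("eax", "ebx")),
    Insn("xor", "eax", ("eax", "eax")),   # zeroing idiom
    Insn("add", "eax", ("eax", "ecx")),
]

print(ready_times(program, recognize_idioms=False))  # xor waits on the imul chain
print(ready_times(program, recognize_idioms=True))   # xor and add no longer wait
```

In the second run the `add` no longer waits for the two multiplies, which is what "breaking the dependency chain" means in this model.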
Zero registers vs x86 approach (RISC‑V, MIPS, ARM, etc.)
- RISC‑V, MIPS, Alpha and others have a dedicated architectural zero register; that shrinks the ISA by reusing normal ALU ops for moves, clears, and many compares.
- There’s disagreement over whether a hardwired zero register is “better”:
  - Pro: simplifies instruction set, reduces number of distinct instructions, enables tricks like add dst, src, x0 as move or jalr x0, ... as non‑clobbering jump.
  - Con: on very register‑poor ISAs (e.g., original 8‑register x86) dedicating one hurts architectural register count.
- ARM64 has a register that is sometimes a zero register and sometimes the stack pointer, depending on context.
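A hardwired zero register is easy to model. The sketch below assumes a RISC‑V-style file of 32 64-bit registers in which register 0 always reads as zero and discards writes; the `RegFile` class and `add` helper are hypothetical names used only to show how one ordinary ALU op covers move, clear, and "discard result".

```python
class RegFile:
    """32 integer registers; register 0 reads as zero and ignores writes."""

    def __init__(self):
        self._regs = [0] * 32

    def read(self, n: int) -> int:
        return 0 if n == 0 else self._regs[n]

    def write(self, n: int, value: int) -> None:
        if n != 0:  # writes to x0 are silently dropped
            self._regs[n] = value & 0xFFFFFFFFFFFFFFFF


def add(rf: RegFile, rd: int, rs1: int, rs2: int) -> None:
    """add rd, rs1, rs2"""
    rf.write(rd, rf.read(rs1) + rf.read(rs2))


rf = RegFile()
rf.write(5, 42)
rf.write(7, 99)

add(rf, 6, 5, 0)  # add x6, x5, x0 acts as a move: x6 = x5
add(rf, 7, 0, 0)  # add x7, x0, x0 acts as a clear: x7 = 0
add(rf, 0, 5, 5)  # destination x0: the result is discarded

print(rf.read(6), rf.read(7), rf.read(0))
```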
Historical and cross‑architecture context
- Similar tricks existed on Z80 (xor a), 6502-like systems (fastest/smallest way to clear accumulator), IBM 370 BAL (XR r,r), Game Boy/SM83, and others, often to save bytes in very tight ROMs.
- Shellcode writers use xor reg, reg because its opcode contains no zero bytes, avoiding C‑string termination issues.
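The shellcode point can be checked directly on the encodings. This sketch assumes the payload is copied by a C-style string routine that stops at the first NUL byte; `copied_prefix` is a hypothetical stand-in for that behaviour, not a real API.

```python
XOR_EAX_EAX = bytes.fromhex("31c0")        # xor eax, eax
MOV_EAX_0 = bytes.fromhex("b800000000")    # mov eax, 0


def copied_prefix(code: bytes) -> bytes:
    """Bytes that survive a copy which stops at the first NUL, as strcpy would."""
    return code.split(b"\x00", 1)[0]


for name, code in (("xor eax, eax", XOR_EAX_EAX), ("mov eax, 0", MOV_EAX_0)):
    kept = copied_prefix(code)
    print(f"{name:14} contains NUL: {0 in code}  bytes copied: {len(kept)} of {len(code)}")
```

The `mov` form is truncated after its first byte because its immediate is four zero bytes, while the `xor` form is copied intact.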
Meta, culture, and nostalgia
- Many comments reminisce about recognizing 31 C0 / 31 C9 by eye, hand-assembling machine code, keygen music, and old mainframe or 8‑bit practices.
- Several note that even with large caches and RAM, code size and common idioms still matter for performance and are baked into CPU optimizations.