Why xor eax, eax?

Behavior and purpose of xor eax, eax

  • Widely used idiom for clearing a register to zero.
  • On x86-64, writing to a 32-bit register like eax automatically zero-extends into the upper 32 bits of rax, so xor eax, eax clears all 64 bits.
  • Compared with mov eax, 0 (5 bytes: B8 00 00 00 00), it takes only 2 bytes (31 C0) and thus reduces instruction cache pressure.
  • Every write to a 32-bit general-purpose register zero-extends this way, not only xor. The notable exception is the one-byte opcode 90, historically xchg eax, eax: it is defined as a true no-op (nop) and doesn’t change the upper bits.
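
The zero-extension rule above can be modeled in a few lines of Python. This is a toy sketch rather than an emulator: the helper names write32, write16 and xor_eax_eax are made up for illustration, and only the register-width behavior is modeled.

```python
MASK64 = (1 << 64) - 1

def write32(old_rax, value):
    """Model a write to eax: the upper 32 bits of rax become zero."""
    return value & 0xFFFFFFFF

def write16(old_rax, value):
    """Model a write to ax: bits 16-63 of rax are preserved."""
    return (old_rax & ~0xFFFF & MASK64) | (value & 0xFFFF)

def xor_eax_eax(rax):
    """Model xor eax, eax: a 32-bit result written through eax."""
    eax = rax & 0xFFFFFFFF
    return write32(rax, eax ^ eax)

rax = 0xDEADBEEF_CAFEBABE
print(hex(xor_eax_eax(rax)))   # whole 64-bit register cleared
print(hex(write16(rax, 0)))    # only the low 16 bits cleared
```

In this model the 32-bit write never reads old_rax, while the 16-bit write has to merge with it; that difference is the dependency discussed under partial-register writes.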

Encoding, APX, and register file details

  • xor rax, rax needs a REX.W prefix to select 64-bit operand size (48 31 C0); xor eax, eax (31 C0) avoids this, saving a byte while clearing the same 64 bits.
  • REX prefixes both select 64-bit operand size and extend register addressing to r8–r15; APX introduces additional registers r16–r31 using a REX2 prefix, reusing opcode-map bits.
  • Partial-register writes (8- or 16-bit destinations such as al or ax) preserve the rest of the 64-bit register, so the result depends on the old value; that extra dependency hurts out‑of‑order execution. Zero-extend semantics for 32-bit writes avoid this.
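
The byte counts can be checked with a small encoder for the register-to-register form of xor (opcode 31 /r). A minimal sketch: the function name encode_xor_same is hypothetical, register numbers follow the usual x86-64 numbering (0 = rax/eax, 1 = rcx/ecx, 8 = r8), and the APX registers r16–r31 with their REX2 prefix are deliberately left out.

```python
def encode_xor_same(reg, wide=False):
    """Encode `xor r, r` for register number 0-15.

    wide=False gives the 32-bit form, wide=True the 64-bit form (REX.W).
    """
    if not 0 <= reg <= 15:
        raise ValueError("register number must be 0-15")
    low3 = reg & 7                      # low three bits go into ModRM
    ext = reg >> 3                      # 1 for r8-r15, carried by REX.R/REX.B
    modrm = 0xC0 | (low3 << 3) | low3   # mod=11 (register), reg = rm = low3
    rex = 0x40 | (int(wide) << 3) | (ext << 2) | ext
    prefix = bytes([rex]) if rex != 0x40 else b""  # plain 0x40 is not needed
    return prefix + bytes([0x31, modrm])

print(encode_xor_same(0).hex(" "))             # xor eax, eax
print(encode_xor_same(0, wide=True).hex(" "))  # xor rax, rax
print(encode_xor_same(8).hex(" "))             # xor r8d, r8d
```

Note that the 32-bit form only saves the byte for the eight legacy registers: r8d–r15d need a REX prefix either way, because the prefix is also what carries the extra register-number bit.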

Zeroing idioms and microarchitecture

  • Modern out‑of‑order CPUs recognize patterns like xor reg, reg (and sometimes sub reg, reg) as “zeroing idioms”.
  • The renamer can map the architectural register to a microarchitectural zero register instead of executing an ALU op, effectively giving zero-cycle execution and breaking dependency chains.
  • Some debate whether CPUs also special‑case mov reg, 0; it’s possible, but less compelling since the instruction is longer and compilers already favor xor.
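
A toy timing model shows why breaking the dependency chain matters. Everything here is an assumption for illustration: the latencies are invented, the function name ready_times is hypothetical, and real renamers are far more involved. Each instruction is (op, dst, src), meaning dst = dst op src.

```python
# Illustrative latencies only; not taken from any real CPU.
LATENCY = {"imul": 3, "add": 1, "xor": 1, "sub": 1}

def ready_times(program, recognize_idiom=True):
    """Return the cycle at which each register's latest value is ready."""
    ready = {}
    for op, dst, src in program:
        if recognize_idiom and op in ("xor", "sub") and dst == src:
            # Zeroing idiom: the result is known to be 0, so the
            # instruction waits on nothing and needs no ALU op.
            ready[dst] = 0
            continue
        start = max(ready.get(dst, 0), ready.get(src, 0))
        ready[dst] = start + LATENCY[op]
    return ready

program = [
    ("imul", "eax", "ebx"),
    ("imul", "eax", "ebx"),
    ("xor", "eax", "eax"),   # clears eax, old value is irrelevant
    ("add", "eax", "ecx"),
]
print(ready_times(program)["eax"])                         # 1
print(ready_times(program, recognize_idiom=False)["eax"])  # 8
```

Without idiom recognition the xor waits for both multiplies even though their result is about to be thrown away; with it, the add after the xor can start immediately.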

Zero registers vs x86 approach (RISC‑V, MIPS, ARM, etc.)

  • RISC‑V, MIPS, Alpha and others have a dedicated architectural zero register; that shrinks the ISA by reusing normal ALU ops for moves, clears, and many compares.
  • There’s disagreement over whether a hardwired zero register is “better”:
    • Pro: simplifies instruction set, reduces number of distinct instructions, enables tricks like add dst, src, x0 as move or jalr x0, ... as non‑clobbering jump.
    • Con: on very register‑poor ISAs (e.g., original 8‑register x86) dedicating one hurts architectural register count.
  • ARM64 encodes register number 31 as either the zero register (xzr/wzr) or the stack pointer (sp), depending on the instruction.
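
The hardwired-zero approach can be sketched as a register file whose register 0 ignores writes. This is a simplified model in the style of RISC‑V's x0; the class RegFile and the helper add are hypothetical names, and only a single add instruction is modeled.

```python
class RegFile:
    """32 integer registers; index 0 always reads 0 and discards writes."""

    def __init__(self):
        self._regs = [0] * 32

    def read(self, i):
        return 0 if i == 0 else self._regs[i]

    def write(self, i, value):
        if i != 0:
            self._regs[i] = value & 0xFFFFFFFFFFFFFFFF

def add(rf, rd, rs1, rs2):
    """Model add rd, rs1, rs2."""
    rf.write(rd, rf.read(rs1) + rf.read(rs2))

rf = RegFile()
rf.write(6, 42)
add(rf, 5, 6, 0)   # add x5, x6, x0  acts as a move
print(rf.read(5))  # 42
add(rf, 5, 0, 0)   # add x5, x0, x0  acts as a clear
print(rf.read(5))  # 0
add(rf, 0, 6, 6)   # result written to x0 is discarded
print(rf.read(0))  # 0
```

One add instruction covers move, clear and discard here, which is the ISA-shrinking effect the pro side of the argument points to.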

Historical and cross‑architecture context

  • Similar tricks existed on Z80 (xor a), 6502-like systems (fastest/smallest way to clear accumulator), IBM 370 BAL (XR r,r), Game Boy/SM83, and others, often to save bytes in very tight ROMs.
  • Shellcode writers use xor reg, reg because its encoding (e.g. 31 C0) contains no zero bytes, unlike mov reg, 0 with its all-zero immediate, avoiding C‑string termination issues.
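
The zero-byte point is easy to see from the raw encodings. A small sketch, with c_string_view as a made-up helper that mimics what a NUL-terminated copy would keep:

```python
XOR_EAX_EAX = bytes.fromhex("31c0")        # xor eax, eax
MOV_EAX_0 = bytes.fromhex("b800000000")    # mov eax, 0

def c_string_view(code):
    """Return the bytes a NUL-terminated string copy would preserve."""
    return code.split(b"\x00", 1)[0]

print(c_string_view(XOR_EAX_EAX).hex(" "))  # 31 c0
print(c_string_view(MOV_EAX_0).hex(" "))    # b8
```

The mov encoding is cut off after its opcode byte because the immediate consists entirely of zero bytes, while the xor encoding passes through unchanged.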

Meta, culture, and nostalgia

  • Many comments reminisce about recognizing 31 C0/31 C9 by eye, hand-assembling machine code, keygen music, and old mainframe or 8‑bit practices.
  • Several note that even with large caches and RAM, code size and common idioms still matter for performance and are baked into CPU optimizations.