Things I learned while writing an x86 emulator (2023)

x86 Complexity and Quirks

  • Many commenters describe x86 as one of the messiest mainstream ISAs, especially compared with RISC-V and modern ARM.
  • Numerous quirks are highlighted:
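    • BSF/BSR vs TZCNT/LZCNT zero-input behavior: Intel documents the BSF/BSR result as undefined for a zero source (in practice the destination is left unchanged), while TZCNT/LZCNT return the operand width. Real-world code (e.g., libc) depends on de facto rather than documented semantics.
    • Legacy oddities like 0x90 being special-cased as a true NOP rather than XCHG EAX, EAX (which in 64-bit mode would zero the upper half of RAX), AH/CH/DH/BH being unencodable alongside a REX prefix (the same register numbers then select SPL/BPL/SIL/DIL), and segment overrides being ignored in 64-bit mode except for FS and GS.
    • Prefix behavior (66/67, segment overrides, REX/VEX/EVEX, APX REX2), including cases where the same bits are ignored in one context but cause a fault in another.
    • Historical curiosities like BSWAP with an operand-size prefix, whose 16-bit result is documented as undefined and differs across CPUs.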
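To make the zero-input difference concrete, here is a minimal Python sketch of how an emulator might model the 32-bit forms of these instructions. The function names are hypothetical, only ZF and CF are modeled (other flags are omitted), and `bsf32`/`bsr32` assume the de facto "destination unchanged" behavior rather than anything Intel guarantees. The last helper reflects a documented wrinkle: TZCNT is encoded as F3 0F BC, which CPUs without BMI1 execute as plain BSF.

```python
from __future__ import annotations

MASK32 = 0xFFFFFFFF


def bsf32(dest: int, src: int) -> tuple[int, dict]:
    """BSF r32: index of the lowest set bit.

    For a zero source Intel documents the destination as undefined; this
    sketch assumes the de facto behavior of leaving it unchanged (which AMD
    documents) and sets ZF.
    """
    src &= MASK32
    if src == 0:
        return dest, {"ZF": 1}
    return (src & -src).bit_length() - 1, {"ZF": 0}


def bsr32(dest: int, src: int) -> tuple[int, dict]:
    """BSR r32: index of the highest set bit; same zero-input caveat as BSF."""
    src &= MASK32
    if src == 0:
        return dest, {"ZF": 1}
    return src.bit_length() - 1, {"ZF": 0}


def tzcnt32(src: int) -> tuple[int, dict]:
    """TZCNT r32: trailing-zero count; 32 for a zero source, with CF=1."""
    src &= MASK32
    if src == 0:
        return 32, {"CF": 1, "ZF": 0}
    count = (src & -src).bit_length() - 1
    return count, {"CF": 0, "ZF": int(count == 0)}


def lzcnt32(src: int) -> tuple[int, dict]:
    """LZCNT r32: leading-zero count; 32 for a zero source, with CF=1."""
    src &= MASK32
    count = 32 - src.bit_length()
    return count, {"CF": int(src == 0), "ZF": int(count == 0)}


def exec_f3_0f_bc(has_bmi1: bool, dest: int, src: int) -> tuple[int, dict]:
    """F3 0F BC is TZCNT on BMI1 CPUs; older CPUs ignore the F3 prefix and
    run BSF, so the destination values only disagree on a zero source."""
    return tzcnt32(src) if has_bmi1 else bsf32(dest, src)
```

Because BSF and TZCNT agree on every nonzero input, code that accidentally relies on one of them usually breaks only on zero, which is exactly the case the discussion flags.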

Instruction Encoding and Decoding

  • Several participants are actively implementing or rewriting x86 decoders and disassemblers, including for QEMU and custom tools.
  • Approaches include hand-written C tables versus tables autogenerated from higher-level specs; both camps find Intel's documentation incomplete or subtly wrong in edge cases.
  • EVEX and APX are seen as pushing complexity further, requiring more context-dependent decoding.
  • Comparison with AArch64 finds ARM more regular at the mnemonic/semantic level but sometimes more complex in operand encoding and constraints (e.g., logical immediates must be rotated, replicated runs of ones).
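The REX rules from the quirks list are easy to get wrong in a decoder. Below is a minimal sketch, assuming 64-bit mode and covering only prefix bytes (no opcode tables): a REX byte counts only if it immediately precedes the opcode, so a legacy prefix after it cancels it, and any REX changes what 8-bit register numbers 4-7 mean. `scan_prefixes_64`, `reg8_name` and `Prefixes` are hypothetical names. When several segment overrides appear, the sketch keeps the last FS/GS one, which is an assumption rather than documented behavior.

```python
from __future__ import annotations

from dataclasses import dataclass, field

LEGACY_PREFIXES = {0xF0, 0xF2, 0xF3, 0x2E, 0x36, 0x3E, 0x26, 0x64, 0x65, 0x66, 0x67}
SEGMENT_OVERRIDES = {0x2E: "cs", 0x36: "ss", 0x3E: "ds", 0x26: "es", 0x64: "fs", 0x65: "gs"}

# 8-bit register names indexed by register number.
REG8_NO_REX = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]
REG8_WITH_REX = ["al", "cl", "dl", "bl", "spl", "bpl", "sil", "dil"] + [
    f"r{n}b" for n in range(8, 16)  # Intel's manuals spell these r8l..r15l
]


@dataclass
class Prefixes:
    legacy: list[int] = field(default_factory=list)
    rex: int | None = None      # the REX byte that actually applies, if any
    segment: str | None = None  # effective override; only FS/GS matter here
    length: int = 0             # prefix bytes consumed


def scan_prefixes_64(code: bytes) -> Prefixes:
    """Scan the legacy and REX prefixes of one 64-bit-mode instruction."""
    p = Prefixes()
    i = 0
    while i < len(code):
        b = code[i]
        if b in LEGACY_PREFIXES:
            p.legacy.append(b)
            p.rex = None  # a REX followed by a legacy prefix is ignored
            seg = SEGMENT_OVERRIDES.get(b)
            if seg in ("fs", "gs"):
                p.segment = seg  # CS/DS/ES/SS overrides have no effect in 64-bit mode
        elif 0x40 <= b <= 0x4F:
            p.rex = b  # only a REX directly before the opcode takes effect
        else:
            break
        i += 1
        if i >= 15:
            raise ValueError("x86 instructions are limited to 15 bytes")
    p.length = i
    return p


def reg8_name(number: int, rex_present: bool) -> str:
    """Any REX prefix, even a bare 0x40, turns AH/CH/DH/BH into SPL/BPL/SIL/DIL."""
    if rex_present:
        return REG8_WITH_REX[number]
    if number >= 8:
        raise ValueError("registers 8-15 need a REX prefix")
    return REG8_NO_REX[number]
```

For example, in `48 66 89 C0` the REX.W byte is dropped because 66 follows it, while in `66 48 89 C0` it applies.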
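The AArch64 operand-encoding point is easiest to see with logical immediates. AND/ORR/EOR (immediate) cannot take an arbitrary constant, only a run of ones, rotated and replicated across 2-, 4-, 8-, 16-, 32- or 64-bit elements, packed into 13 bits (N, immr, imms). A minimal sketch of the expansion step, following the DecodeBitMasks pseudocode in Arm's architecture reference manual; `decode_bit_masks` is a hypothetical name, its arguments are in encoding order, and only the `wmask` output is computed.

```python
def decode_bit_masks(n: int, immr: int, imms: int, datasize: int) -> int:
    """Expand an AArch64 logical immediate (N:immr:imms) to its value.

    datasize is 32 or 64. Raises ValueError for reserved encodings.
    """
    if datasize == 32 and n:
        raise ValueError("N must be 0 for 32-bit operations")
    # Element size comes from the highest set bit of N:NOT(imms).
    length = ((n << 6) | (~imms & 0x3F)).bit_length() - 1
    if length < 1:
        raise ValueError("reserved encoding")
    levels = (1 << length) - 1
    s, r = imms & levels, immr & levels
    if s == levels:
        raise ValueError("an all-ones element is not encodable")
    esize = 1 << length
    welem = (1 << (s + 1)) - 1  # s+1 consecutive ones
    elem = ((welem >> r) | (welem << (esize - r))) & ((1 << esize) - 1)  # ROR by r
    value = 0
    for shift in range(0, datasize, esize):  # replicate across the register
        value |= elem << shift
    return value
```

A brute-force loop over all `immr`/`imms` pairs is a cheap way to test encodability: it finds `0xFF` but never `0x12345678`, which has to be built with MOVZ/MOVK or loaded another way.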

Emulation and Understanding CPUs

  • Writing emulators (from 6800/68k to x86 and consoles) is widely praised as a powerful way to understand ISAs; some call it a “missing link” between high-level code and hardware.
  • Others argue that understanding modern superscalar CPUs requires designing at the gate or RTL level, since emulators mostly expose architectural behavior, not pipelines, caches, or speculation.
  • A compromise view: emulators and simple CPU-design projects each teach a different layer of the stack.
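What an emulator exposes can be sketched in a few dozen lines: a fetch-decode-execute loop over architectural state. This is a minimal sketch assuming 32-bit mode with a flat byte buffer and only a handful of opcodes; `TinyCPU` and its methods are hypothetical. It tracks registers, EIP and ZF, the architectural layer both sides of the debate agree emulators teach, and nothing in it models pipelines, caches, or speculation.

```python
import struct

MASK32 = 0xFFFFFFFF


class TinyCPU:
    """Hypothetical toy interpreter for a few 32-bit x86 opcodes.

    Models only architectural state (eight GPRs, EIP, ZF) over a flat
    byte buffer; other flags, memory operands and faults are omitted.
    """

    def __init__(self, code: bytes) -> None:
        self.mem = code
        self.regs = [0] * 8  # EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI
        self.eip = 0
        self.zf = 0
        self.halted = False

    def step(self) -> None:
        op = self.mem[self.eip]  # fetch
        if 0xB8 <= op <= 0xBF:  # MOV r32, imm32
            (imm,) = struct.unpack_from("<I", self.mem, self.eip + 1)
            self.regs[op - 0xB8] = imm
            self.eip += 5
        elif 0x40 <= op <= 0x4F:  # INC/DEC r32 (these bytes are REX in 64-bit mode)
            r = op & 7
            delta = 1 if op < 0x48 else -1
            self.regs[r] = (self.regs[r] + delta) & MASK32
            self.zf = int(self.regs[r] == 0)
            self.eip += 1
        elif op == 0x01:  # ADD r/m32, r32, register form only
            modrm = self.mem[self.eip + 1]
            if modrm >> 6 != 0b11:
                raise NotImplementedError("memory operands are not modeled")
            src, dst = (modrm >> 3) & 7, modrm & 7
            self.regs[dst] = (self.regs[dst] + self.regs[src]) & MASK32
            self.zf = int(self.regs[dst] == 0)
            self.eip += 2
        elif op == 0x90:  # NOP
            self.eip += 1
        elif op == 0xF4:  # HLT: stop the run loop
            self.halted = True
            self.eip += 1
        else:
            raise NotImplementedError(f"opcode {op:#04x} is not modeled")

    def run(self, max_steps: int = 1000) -> None:
        for _ in range(max_steps):
            if self.halted:
                return
            self.step()
```

Running the bytes `B8 05 00 00 00 B9 07 00 00 00 01 C8 49 90 F4` (mov eax, 5; mov ecx, 7; add eax, ecx; dec ecx; nop; hlt) leaves EAX = 12 and ECX = 6.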

CPU Implementation, EDA, and Microcode

  • Discussion of how real CPUs are built: HDLs like Verilog, large EDA toolchains, standard cells, and compiler-generated SRAM macros.
  • Modern cores rely mostly on hardwired uop generation; microcode is reserved for complex or privileged instructions and for controlling internal knobs via MSRs.
  • Participants emphasize the gulf between digital logic design and typical software work, and how hidden microarchitectural details (uop caches, length-changing-prefix (LCP) stalls, decoder limits) complicate performance tuning.
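LCP stalls show how an encoding detail becomes a hidden performance cost. On many Intel cores a 66 operand-size prefix that shrinks an instruction's 32-bit immediate to 16 bits changes the instruction's length and forces the length pre-decoder onto a slower path. A minimal sketch that flags such instructions, assuming a tiny hand-picked opcode table and register operands only; `insn_length` and `OPCODES` are hypothetical and far from a real length decoder.

```python
from __future__ import annotations

# Hypothetical subset of the one-byte opcode map: opcode -> (has ModRM, immediate kind).
# "z": 32-bit immediate, shrunk to 16 bits by a 66 prefix; "b": always 8 bits.
OPCODES = {
    0x05: (False, "z"),  # ADD eAX, imm
    0x3D: (False, "z"),  # CMP eAX, imm
    0x69: (True, "z"),   # IMUL r, r/m, imm
    0x6B: (True, "b"),   # IMUL r, r/m, imm8
    0x81: (True, "z"),   # group 1 (ADD/OR/.../CMP) r/m, imm
    0x83: (True, "b"),   # group 1 r/m, sign-extended imm8
    0xC7: (True, "z"),   # MOV r/m, imm
}


def insn_length(code: bytes) -> tuple[int, bool]:
    """Return (length in bytes, has_lcp) for one instruction from OPCODES.

    Assumes 32-bit or 64-bit mode without REX.W and register-form ModRM
    (mod == 0b11), so no SIB byte or displacement is needed.
    """
    i = 0
    opsize16 = False
    while code[i] == 0x66:  # operand-size override prefix
        opsize16 = True
        i += 1
    has_modrm, imm_kind = OPCODES[code[i]]
    i += 1
    if has_modrm:
        if code[i] >> 6 != 0b11:
            raise NotImplementedError("memory operands are not modeled")
        i += 1
    imm_len = 1 if imm_kind == "b" else (2 if opsize16 else 4)
    # A 66 prefix that changes the immediate's size is a length-changing prefix.
    return i + imm_len, opsize16 and imm_kind == "z"
```

Encoding `add ax, 1` as `66 83 C0 01` instead of `66 05 01 00` is one way around the stall: the 8-bit immediate of opcode 83 has the same size with or without the prefix.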

Assembly, ISAs, and Pedagogy

  • Some blame x86’s historical baggage for turning people off assembly; others say x86-64 is pleasant if legacy features are ignored.
  • RISC-V is cited as more regular and easier to teach, and as offering higher code density and simpler compilers, though critics note that any high-performance implementation, whatever the ISA, involves deep microarchitectural concerns.
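The regularity claim is concrete at the encoding level. In the base RISC-V ISA every 32-bit instruction keeps its opcode in bits 6:0 and its register fields at fixed positions, so a decoder can extract them before it knows which instruction it has. A minimal sketch of R-type decoding; `decode_rtype`, `disasm_rtype` and the small name table are hypothetical helpers covering only a few RV32I register-register operations (the 16-bit compressed instructions are not handled).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RType:
    opcode: int
    rd: int
    funct3: int
    rs1: int
    rs2: int
    funct7: int


def decode_rtype(word: int) -> RType:
    """Split a 32-bit RISC-V R-type instruction into its fixed bit fields."""
    return RType(
        opcode=word & 0x7F,
        rd=(word >> 7) & 0x1F,
        funct3=(word >> 12) & 0x7,
        rs1=(word >> 15) & 0x1F,
        rs2=(word >> 20) & 0x1F,
        funct7=(word >> 25) & 0x7F,
    )


# A few RV32I OP instructions (opcode 0x33), keyed by (funct7, funct3).
OP_NAMES = {
    (0x00, 0x0): "add",
    (0x20, 0x0): "sub",
    (0x00, 0x4): "xor",
    (0x00, 0x6): "or",
    (0x00, 0x7): "and",
}


def disasm_rtype(word: int) -> str:
    f = decode_rtype(word)
    if f.opcode != 0x33:
        raise ValueError("not a register-register OP instruction")
    name = OP_NAMES[(f.funct7, f.funct3)]
    return f"{name} x{f.rd}, x{f.rs1}, x{f.rs2}"
```

For instance, `0x002081B3` decodes to `add x3, x1, x2`; compare that with the prefix scanning an x86 decoder must finish before it even finds the opcode.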