Things I learned while writing an x86 emulator (2023)
x86 Complexity and Quirks
- Many commenters describe x86 as one of the messiest mainstream ISAs, especially compared with RISC-V and modern ARM.
- Numerous quirks are highlighted:
- BSF/BSR vs TZCNT/LZCNT zero-input behavior, and real-world code (e.g., libc) depending on de‑facto, not documented, semantics.
- Legacy oddities like 0x90 being a special NOP, high 8-bit registers interacting badly with REX, and segment overrides being mostly ignored in 64‑bit mode.
- Prefix behavior (66/67, segment overrides, REX/VEX/EVEX, APX REX2) and cases where bits are inconsistently ignored or cause faults.
- Historical curiosities like BSWAP with operand-size prefix and inconsistent behavior across CPUs.
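Two of the quirks above are easy to illustrate in a few lines. The following is a minimal sketch, not taken from the discussion or from any particular emulator: the function names (`bsf`, `tzcnt`, `reg8_name`) and the dictionary-based flag representation are hypothetical. It assumes the de-facto BSF behavior (destination left unchanged on a zero source) rather than the "undefined" result some documentation gives, and it models only the flags that matter for the comparison.

```python
# Sketch of two x86 quirks. All names here are illustrative assumptions.

def _lowest_set_bit(value: int) -> int:
    """Index of the least significant set bit of a non-zero value."""
    return (value & -value).bit_length() - 1

def bsf(dest: int, src: int, width: int = 32):
    """BSF with de-facto semantics: on src == 0, ZF=1 and dest is unchanged."""
    src &= (1 << width) - 1
    if src == 0:
        return dest, {"ZF": 1}
    return _lowest_set_bit(src), {"ZF": 0}

def tzcnt(src: int, width: int = 32):
    """TZCNT: on src == 0, the result is the operand width and CF=1."""
    src &= (1 << width) - 1
    if src == 0:
        return width, {"CF": 1, "ZF": 0}
    count = _lowest_set_bit(src)
    # ZF reflects a zero *result*, not a zero source as it does for BSF.
    return count, {"CF": 0, "ZF": int(count == 0)}

# 8-bit register names depend on whether any REX prefix is present.
LEGACY8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]
REX8 = (["al", "cl", "dl", "bl", "spl", "bpl", "sil", "dil"]
        + [f"r{i}b" for i in range(8, 16)])

def reg8_name(index: int, rex_present: bool, ext_bit: int = 0) -> str:
    """Map a 3-bit register field (plus an optional REX extension bit) to a name."""
    if not rex_present:
        return LEGACY8[index & 7]  # encodings 4..7 select AH/CH/DH/BH
    return REX8[((ext_bit & 1) << 3) | (index & 7)]  # 4..7 become SPL/BPL/SIL/DIL
```

With these definitions, `bsf(0xDEAD, 0)` leaves the destination at `0xDEAD` while `tzcnt(0)` yields 32, and register field 4 decodes as `ah` without REX but as `spl` once any REX prefix is present, which is why AH, CH, DH and BH cannot be encoded alongside the extended registers.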
Instruction Encoding and Decoding
- Several participants are actively implementing or rewriting x86 decoders and disassemblers, including for QEMU and custom tools.
- Approaches include hand-written C tables vs autogenerated tables from higher-level specs; both find Intel documentation incomplete or subtly wrong in edge cases.
- EVEX and APX are seen as pushing complexity further, requiring more context-dependent decoding.
- Comparison with AArch64 finds ARM more regular at the mnemonic/semantic level but sometimes more complex in operand encoding and constraints.
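To give a flavor of the context dependence decoders have to handle, here is a minimal sketch of a 64-bit-mode prefix scanner. It is an illustration under stated assumptions, not the approach of QEMU or of any tool mentioned in the discussion: `scan_prefixes` and the prefix labels are hypothetical names, VEX/EVEX/REX2 are left out entirely, and real decoders also enforce the 15-byte instruction length limit. The one rule it does model is that a REX byte only takes effect when it immediately precedes the opcode.

```python
# Sketch of legacy/REX prefix scanning in 64-bit mode (illustrative names).

LEGACY_PREFIXES = {
    0x66: "opsize", 0x67: "addrsize",
    0xF0: "lock", 0xF2: "repne", 0xF3: "rep",
    0x26: "es", 0x2E: "cs", 0x36: "ss", 0x3E: "ds", 0x64: "fs", 0x65: "gs",
}

def scan_prefixes(code: bytes):
    """Return (legacy prefix labels, effective REX byte or None, opcode offset)."""
    seen = []
    rex = None
    offset = 0
    while offset < len(code):
        byte = code[offset]
        if byte in LEGACY_PREFIXES:
            seen.append(LEGACY_PREFIXES[byte])
            rex = None  # a REX followed by a legacy prefix is ignored
        elif 0x40 <= byte <= 0x4F:
            rex = byte  # only the REX right before the opcode counts
        else:
            break  # first non-prefix byte: the opcode starts here
        offset += 1
    return seen, rex, offset
```

For example, the bytes `66 48 89 C3` scan as an operand-size prefix plus an effective REX of 0x48 with the opcode at offset 2, whereas in `48 66 90` the REX byte is dropped because a legacy prefix follows it.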
Emulation and Understanding CPUs
- Writing emulators (from 6800/68k to x86 and consoles) is widely praised as a powerful way to understand ISAs, with some calling it a “missing link” between high-level code and hardware.
- Others argue that to understand modern superscalar CPUs, you must design at gate/RTL level; emulators mostly expose architectural behavior, not pipelines, caches, or speculation.
- A compromise view: both emulator and simple CPU design projects teach different layers of the stack.
CPU Implementation, EDA, and Microcode
- Discussion of how real CPUs are built: HDLs like Verilog, large EDA toolchains, standard cells, and macro-generated SRAMs.
- Modern cores use mostly hardwired uop generation; microcode is reserved for complex or privileged instructions and for controlling internal knobs via MSRs.
- Participants emphasize the gulf between digital logic design and typical software work, and how hidden microarchitectural optimizations (uop caches, LCP stalls, decoder limits) complicate performance tuning.
Assembly, ISAs, and Pedagogy
- Some blame x86’s historical baggage for turning people off assembly; others say x86-64 is pleasant if legacy features are ignored.
- RISC-V is cited as more regular, easier to teach, and offering higher code density and simpler compilers, though critics note that any high-performance ISA involves deep microarchitectural concerns.