Show HN: I built a hardware processor that runs Python

Architecture and Execution Model

  • Custom stack-based CPU (PyXL) implemented in Verilog on a Zynq‑7000 FPGA at 100 MHz.
  • Pipeline executes a custom Python-oriented ISA (PySM), derived from CPython bytecode but simplified for hardware pipelining.
  • In‑order core, no speculative execution; focus on determinism and predictable timing for embedded/real‑time work.
  • Toolchain: Python source → CPython bytecode → PySM assembly → hardware binary. ARM side only orchestrates setup/IO; Python logic runs on the custom core.

Performance and Benchmarks

  • Headline demo: ~480 ns GPIO “round trip”, reported as ~30× faster than MicroPython on a Pyboard, even though PyXL runs at a lower clock speed.
  • Some commenters emphasize this is latency, not overall throughput; a 100 MHz FPGA won’t match a modern OoO CPU on bulk workloads.
  • Others note the 480 ns implies ~48 cycles per GPIO toggle and ask where the cycles go and how close it is to hand‑written C/ASM; author hints at future write‑up and room for optimization.
  • A critical thread argues the comparison should include MicroPython’s “viper” native emitter, not just interpreted GPIO, suggesting current benchmarks may overstate relative gains.
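The cycle count commenters cite follows directly from the quoted figures; a quick back-of-the-envelope check (using only the numbers reported in the thread):

```python
# Sanity-check the numbers quoted in the discussion.
clock_hz = 100e6          # PyXL core clock (100 MHz)
round_trip_s = 480e-9     # reported GPIO round-trip latency

cycles = round_trip_s * clock_hz
print(f"{cycles:.0f} cycles per GPIO round trip")

speedup = 30              # claimed gain vs. interpreted MicroPython
micropython_s = round_trip_s * speedup
print(f"~{micropython_s * 1e6:.1f} us implied for MicroPython")
```

At 100 MHz, 480 ns is indeed 48 cycles, and a 30× gap implies roughly 14.4 µs on the interpreted MicroPython side.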

Python Subset, Semantics, and Toolchain Choices

  • Currently supports only a subset of Python: no threads, no heavy reflection, no dynamic loading; many stdlib features and C extensions are not yet available.
  • Targeting CPython bytecode is defended as easier for early iteration and more insulated from syntax changes; others counter that bytecode is unstable across CPython versions and poorly documented, recommending the AST or an RPython‑style subset as a compilation target instead.
  • ISA is strongly tuned to Python’s stack-based, dynamically‑typed model; mapping directly to ARM/x86 or RISC‑V was rejected as inefficient for these semantics.
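The instability commenters warn about is easy to demonstrate: the opcode CPython emits for a plain `a + b` changed from `BINARY_ADD` (3.10 and earlier) to the generic `BINARY_OP` (3.11 onward), so any toolchain keyed to exact opcodes must track interpreter releases.

```python
import dis
import sys

# The bytecode for "a + b" differs across CPython versions
# (BINARY_ADD before 3.11, BINARY_OP since) -- one concrete reason
# commenters call bytecode an unstable compilation target.
ops = [i.opname for i in dis.Bytecode(lambda a, b: a + b)]
print(sys.version_info[:2], ops)
```

On any CPython 3.x the list contains some `BINARY_*` opcode, but which one depends on the version being run.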

Garbage Collection and Memory Management

  • Memory management identified as one of the hardest problems.
  • GC design is intended to be asynchronous/background to avoid halting real‑time execution, but details and implementation are still work in progress.
  • Questions remain on how variable‑length operations (e.g., string concat), malloc/free, and mixed‑type arithmetic dispatch are handled in hardware.
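To make the dispatch question concrete, here is a hypothetical software sketch (not PyXL's actual design, which is undisclosed) of what a hardware `BINARY_ADD` must do: examine runtime type tags on both operands and route to either a fixed-width ALU path or an allocating variable-length path.

```python
# Hypothetical sketch of runtime type dispatch for an "add" opcode.
# Nothing here is from PyXL's implementation; it only illustrates
# the two paths commenters ask about.
def binary_add(a, b):
    if isinstance(a, int) and isinstance(b, int):
        # Fast path: fits a fixed-width integer ALU operation.
        return a + b
    if isinstance(a, str) and isinstance(b, str):
        # Slow path: string concat needs a fresh variable-length
        # allocation, which is where GC/malloc questions arise.
        return a + b
    raise TypeError(f"unsupported operand types: {type(a)}, {type(b)}")

print(binary_add(2, 3))
print(binary_add("py", "xl"))
```

In hardware, the open question is how much of this branching happens per instruction versus being specialized away ahead of time.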

Use Cases and Ecosystem Integration

  • Primary goal: C‑like or near‑C performance with Python ergonomics for embedded/real‑time systems, not general server CPUs.
  • Envisioned future roles: soft core in SoCs, ASIC eventually, possibly as a coprocessor alongside ARM/RISC‑V handling C libraries and peripherals.
  • Some imagine “accelerated Python” cloud offerings or ML feature‑generation accelerators, but author stresses focusing first on concrete embedded use cases.

Comparisons and Historical Context

  • Related to prior language‑specific hardware: Lisp machines, Java processors (e.g., Jazelle, PicoJava), Forth and Haskell CPUs, and Python‑on‑FPGA experiments.
  • Multiple comments note such language‑tuned CPUs have historically been eclipsed by JITs and optimizing compilers on commodity hardware, raising the question of whether this design can avoid the same fate.

Reception, Naming, and Claims

  • Strong enthusiasm for the technical achievement and for lowering the barrier to “real‑time Python.”
  • Some push back on marketing phrases like “executes Python directly” given there is still an ahead‑of‑time compilation step to a custom ISA, arguing for more precise wording.
  • Source is not yet public; future licensing as an IP core and broader documentation are left open.