Show HN: Z80-μLM, a 'Conversational AI' That Fits in 40KB
Retro hardware & emulation
- Many commenters love the “LLM on a Z80” angle and want:
  - A Z80 simulator bundled with the demos.
  - Ports to Game Boy, MSX, ZX Spectrum (including the 48K model), Amstrad CPC, and CP/M.
- Existing CP/M/Z80 emulators were used to run the demos; they generally work, though one commenter struggled with the GUESS.COM game.
- Discussion of Game Boy constraints:
  - 32KB of ROM address space + 8KB RAM on original hardware.
  - ROM is banked in 16KB banks; suggestion to keep each LM layer in a single bank to minimize switching (see the sketch after this list).
  - Main expected bottleneck is user text input, not bank-switch overhead.
- Some worry performance on 8‑bit systems with bank switching will be “gnarly,” but see it as a fun challenge.
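For concreteness, here is a minimal sketch of the “one layer per bank” idea, assuming an MBC1-style mapper where writing a bank number into 0x2000–0x3FFF selects which 16KB bank appears at 0x4000–0x7FFF. The names (FIRST_LAYER_BANK, layer_forward, NUM_LAYERS) are illustrative, not taken from the project:

```c
/* Hypothetical sketch: one MLP layer per 16KB ROM bank, Game Boy / MBC1 style.
 * Writing a bank number into the 0x2000-0x3FFF range maps that bank at
 * 0x4000-0x7FFF, so a full forward pass needs only one write per layer. */

#include <stdint.h>

#define BANK_SELECT      ((volatile uint8_t *)0x2000) /* MBC1 ROM bank register */
#define BANKED_ROM       ((const int8_t *)0x4000)     /* 16KB switchable window */
#define FIRST_LAYER_BANK 2                            /* earlier banks: code + tables */
#define NUM_LAYERS       4                            /* illustrative layer count */

/* Runs one layer; weights are read straight out of the banked window. */
extern void layer_forward(const int8_t *weights, int8_t *activations);

void model_forward(int8_t *activations)
{
    for (uint8_t layer = 0; layer < NUM_LAYERS; layer++) {
        *BANK_SELECT = FIRST_LAYER_BANK + layer;  /* one bank switch per layer */
        layer_forward(BANKED_ROM, activations);
    }
}
```

Since each switch is a single byte write, the per-token cost is dominated by the multiply-accumulate work itself, consistent with the thread’s view that typing speed, not banking, would be the practical bottleneck.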
Model design & technical limits
- The model is ~150k parameters, heavily quantized, and more “micro-LM” than a typical “small” model.
- Commenters clarify it’s essentially an MLP without attention: the whole input is embedded and the effective “context” is a short trigram-based window (a rough sketch follows this section’s list).
- Questions raised:
  - Sensitivity of different layers/components to quantization; one reply reports the first and last layers, and certain MLP blocks, degrade most under aggressive quantization.
  - Whether sparse weights were considered.
  - Tokens/sec throughput (no clear answer in the thread).
- Related exploration:
  - A “minimally viable LLM” that can have simple conversations.
  - Tiny models specialized for narrow tasks (e.g., regex generation).
  - Ideas like a “cognitive core” with minimal knowledge but good tool use.
  - RWKV and RNN-like architectures for efficient CPU inference.
- Interest in what similar techniques could do on ESP32/RP2040 and smartphones.
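To make the “MLP without attention, short trigram context” description concrete, here is a rough, self-contained sketch of what a heavily quantized forward pass of that general shape can look like. Every dimension, name, and the int8 scheme below is an assumption for illustration; the thread does not give the project’s actual layer widths or bit depths:

```c
/* Illustrative sketch only: a tiny int8-quantized MLP that predicts the next
 * token from the last three tokens (a trigram-style context), with no
 * attention. All dimensions are invented, not the project's real ones. */

#include <stdint.h>

#define VOCAB 256   /* byte-level tokens */
#define EMB    16   /* embedding width per token */
#define CTX     3   /* trigram context */
#define HID    64   /* hidden units */

/* Quantized parameters, e.g. baked into ROM. */
extern const int8_t embed[VOCAB][EMB];
extern const int8_t w1[HID][CTX * EMB];
extern const int8_t w2[VOCAB][HID];

/* Crude requantization: scale the int32 accumulator back into int8, then ReLU. */
static int8_t relu_q(int32_t acc)
{
    int32_t v = acc >> 7;
    if (v < 0)   v = 0;
    if (v > 127) v = 127;
    return (int8_t)v;
}

/* Greedy next-token prediction for a 3-token context. */
uint8_t predict_next(const uint8_t ctx[CTX])
{
    int8_t x[CTX * EMB];
    int8_t h[HID];

    /* Concatenate the three token embeddings. */
    for (int t = 0; t < CTX; t++)
        for (int i = 0; i < EMB; i++)
            x[t * EMB + i] = embed[ctx[t]][i];

    /* Hidden layer: int8 x int8 products accumulated in int32. */
    for (int j = 0; j < HID; j++) {
        int32_t acc = 0;
        for (int i = 0; i < CTX * EMB; i++)
            acc += (int32_t)w1[j][i] * x[i];
        h[j] = relu_q(acc);
    }

    /* Output layer: return the highest-scoring token id. */
    int32_t best = INT32_MIN;
    uint8_t best_id = 0;
    for (int k = 0; k < VOCAB; k++) {
        int32_t acc = 0;
        for (int j = 0; j < HID; j++)
            acc += (int32_t)w2[k][j] * h[j];
        if (acc > best) { best = acc; best_id = (uint8_t)k; }
    }
    return best_id;
}
```

For scale: 150k weights at 8 bits is about 150KB, so fitting a 40KB binary implies something tighter, for example roughly 2 bits per weight (150k × 2 bits ≈ 37KB) or weight sharing; the thread does not spell out the scheme. The same fixed-window structure also hints at why RWKV/RNN-style designs come up: with no attention there is no KV cache to grow, so per-token cost stays flat on a plain CPU.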
Security and hidden information
- One commenter asks if a secret (e.g., passphrase) baked into the weights would be recoverable from the model.
- Responses:
  - With a network this small, reverse engineering is likely feasible (a brute-force sketch follows this list).
  - More generally, this ties into model interpretability and “backdoor” research; a cited paper claims some backdoors can be made undetectable to computationally bounded adversaries.
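As a hedged illustration of why a ~150k-parameter network offers little hiding room: even without touching the weights, a model this small and deterministic can simply be swept black-box. The sketch below reuses the hypothetical predict_next from the earlier example (it is not the project’s code), enumerating every 3-byte context and logging the greedy continuation; a passphrase memorized by the weights tends to surface as a recurring literal string:

```c
/* Hedged sketch: sweep every 3-byte context through the hypothetical
 * predict_next() from the earlier example and greedily decode a few tokens.
 * 2^24 contexts times a ~150k-parameter forward pass is a modest job on a
 * desktop, and a memorized secret shows up as a repeated literal completion. */

#include <stdint.h>
#include <stdio.h>
#include <ctype.h>

extern uint8_t predict_next(const uint8_t ctx[3]);

void dump_completions(void)
{
    for (uint32_t c = 0; c < (1UL << 24); c++) {
        uint8_t ctx[3] = { (uint8_t)(c >> 16), (uint8_t)(c >> 8), (uint8_t)c };
        char out[9] = {0};
        for (int i = 0; i < 8; i++) {
            uint8_t next = predict_next(ctx);
            out[i] = isprint(next) ? (char)next : '.';
            ctx[0] = ctx[1]; ctx[1] = ctx[2]; ctx[2] = next;  /* slide the window */
        }
        printf("%06lx: %s\n", (unsigned long)c, out);
    }
}
```

Direct weight inspection would be even more informative at this scale; the backdoor research cited above concerns constructions specifically designed to survive that kind of scrutiny.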
Historical what-ifs & human perception
- Strong comparisons to ELIZA, PARRY, and simple bots:
  - Some think this would have felt magical on 80s/90s hardware; others argue ELIZA-style scripting might feel more impressive given the terseness of replies.
- Commenters note:
  - Similar techniques might have been technically possible on 60s–90s machines, potentially changing the trajectory of AI in games and interfaces.
  - Constraints of specific old hardware (e.g., the IBM 7094’s word-addressed memory vs. a 40KB Z80 binary).
- One thread emphasizes that part of the “magic” is human: people work hard to interpret sparse, noisy output as meaningful, so even crude bots can feel conversational.
Implications for devices and software bloat
- Some see this as a “stress test” proving that:
  - Very limited hardware can host non-trivial conversational behavior.
  - Embedded and IoT devices will soon ship with onboard LLMs.
- Others speculate we’re at a “home computer era” for LLMs: with enough RAM, local open models plus custom agents can rival proprietary systems.
- A long subthread contrasts this tiny model with modern desktop apps:
  - One side argues it exposes waste in chat apps needing gigabytes of RAM.
  - The other side counters that apps like Slack/Teams provide far more features (integrations, app ecosystems, rich video/screen-share, etc.), and that hardware and resource budgets have grown, changing the tradeoffs.
  - Ongoing disagreement about whether modern “bloat” is justified or just developer convenience.
General reactions & use cases
- Overall tone is enthusiastic: lots of stars, “super cool,” “magical,” and “WarGames” vibes.
- People imagine:
  - NPCs in games each backed by a tiny model.
  - Fuzzy-input retro RPGs/adventures that accept natural-ish language.
  - A tiny on-device assistant with huge context via external lookup (sketched after this list).
- Plenty of humor: jokes about AGI “just around the corner,” Z80 shortages, RAM prices, and SCP-style stories about haunted 8‑bit AIs.
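The “tiny model plus external lookup” idea is essentially tool use: the model only needs to learn to emit a query and to phrase whatever the host fetches, so the knowledge lives outside the weights. A minimal hypothetical shape of that loop (model_generate, lookup_fact, and the query trigger are all invented names, not anything from the project):

```c
/* Hypothetical sketch of "tiny model + external lookup": the model's job is
 * only to emit a query and to phrase the fetched result; the facts live in an
 * external store rather than in the 40KB of weights. Every name is invented. */

#include <stdio.h>
#include <stddef.h>

#define QUERY_TRIGGER '?'   /* model starts its output with '?' to request a lookup */

/* Tiny-model generation (hypothetical): writes a reply or a query into out. */
extern void model_generate(const char *prompt, char *out, size_t out_len);
/* Host-side lookup against a file, database, or network service. */
extern const char *lookup_fact(const char *query);

void answer(const char *user_input, char *reply, size_t reply_len)
{
    char draft[128];
    model_generate(user_input, draft, sizeof draft);

    if (draft[0] == QUERY_TRIGGER) {
        /* The model asked for outside help: fetch, then let it phrase the answer. */
        char prompt[256];
        snprintf(prompt, sizeof prompt, "%s\nFACT: %s\n",
                 user_input, lookup_fact(draft + 1));
        model_generate(prompt, reply, reply_len);
    } else {
        /* The model answered directly from its own weights. */
        snprintf(reply, reply_len, "%s", draft);
    }
}
```

This is in the same spirit as the “cognitive core” comment earlier in the thread: minimal baked-in knowledge, with the heavy lifting delegated to external tools.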