Show HN: Z80-μLM, a 'Conversational AI' That Fits in 40KB

Retro hardware & emulation

  • Many commenters love the “LLM on a Z80” angle and want:
    • A Z80 simulator bundled with the demos.
    • Ports to the Game Boy, MSX, ZX Spectrum (including the 48K model), Amstrad CPC, and CP/M.
  • Existing CP/M/Z80 emulators were used to run the demos; they generally work, though one commenter struggled with the GUESS.COM game.
  • Discussion of Game Boy constraints:
    • 32KB of directly addressable ROM plus 8KB of work RAM on the original hardware.
    • Larger cartridges switch ROM in 16KB banks; one suggestion is to keep each LM layer within a single bank to minimize switching (see the sketch after this list).
    • Main expected bottleneck is user text input, not bank-switch overhead.
  • Some worry performance on 8‑bit systems with bank switching will be “gnarly,” but see it as a fun challenge.
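  • A minimal sketch of the bank-per-layer idea in C (the MBC1 register addresses below are the standard ones, but the one-layer-per-bank layout and int8 weight format are assumptions, not the project's actual code):

      /* Hypothetical Game Boy (MBC1) layout: each quantized layer's weights sit in
       * their own 16KB ROM bank, so the bank register is written once per layer
       * rather than once per weight. */
      #include <stdint.h>

      #define MBC1_ROM_BANK  ((volatile uint8_t *)0x2000u)  /* writes to 0x2000-0x3FFF pick the ROM bank */
      #define BANKED_WINDOW  ((const int8_t *)0x4000u)      /* switchable 16KB window at 0x4000-0x7FFF */

      /* Assumed layout: layer i's weights start at the beginning of bank (1 + i). */
      static const int8_t *map_layer_weights(uint8_t layer)
      {
          *MBC1_ROM_BANK = (uint8_t)(1u + layer);  /* one bank switch per layer */
          return BANKED_WINDOW;
      }

    With a whole layer resident in one bank, the inner dot-product loop never touches the bank register, which is consistent with the expectation above that typing speed, not bank switching, would dominate.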

Model design & technical limits

  • The model is ~150k parameters, heavily quantized, and more “micro-LM” than a typical “small” model.
  • Commenters clarify it’s essentially an MLP without attention, embedding the entire input and using a short trigram-based “context” (a rough sketch follows this list).
  • Questions raised:
    • How sensitive different layers/components are to quantization; one reply reports that the first and last layers, along with certain MLP blocks, degrade the most under aggressive quantization.
    • Whether sparse weights were considered.
    • Token/s performance (no clear answer in thread).
  • Related exploration:
    • “Minimally viable LLM” that can have simple conversations.
    • Tiny models specialized for narrow tasks (e.g., regex generation).
    • Ideas like a “cognitive core” with minimal knowledge but good tool use.
    • RWKV and RNN-like architectures for efficient CPU inference.
    • Interest in what similar techniques could do on ESP32/RP2040 and smartphones.
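  • A rough sketch in C of the kind of forward pass described above (assumed details: hashed character-trigram features, a single hidden layer, int8 weights, and integer-only math; the real model's sizes, feature scheme, and quantization may differ):

      /* Tiny quantized-MLP sketch: hashed trigram features -> hidden layer -> argmax token. */
      #include <stdint.h>
      #include <string.h>

      #define N_FEATURES 256   /* assumed trigram hash buckets */
      #define N_HIDDEN    64   /* assumed hidden width */
      #define N_TOKENS   128   /* assumed output vocabulary size */

      /* Placeholder int8 weights; a real build would bake trained values into ROM. */
      static const int8_t w1[N_HIDDEN][N_FEATURES] = {{0}};
      static const int8_t w2[N_TOKENS][N_HIDDEN]   = {{0}};

      /* Hash each character trigram of the input into a small binary feature vector. */
      static void trigram_features(const char *text, uint8_t feat[N_FEATURES])
      {
          memset(feat, 0, N_FEATURES);
          for (size_t i = 0; text[i] && text[i + 1] && text[i + 2]; i++) {
              uint16_t h = (uint16_t)((text[i] * 31u + text[i + 1]) * 31u + text[i + 2]);
              feat[h % N_FEATURES] = 1;
          }
      }

      /* Integer-only forward pass; returns the id of the highest-scoring token. */
      static uint8_t predict_next_token(const char *text)
      {
          uint8_t feat[N_FEATURES];
          int16_t hidden[N_HIDDEN];
          trigram_features(text, feat);

          for (uint8_t j = 0; j < N_HIDDEN; j++) {          /* hidden layer with ReLU */
              int16_t acc = 0;
              for (uint16_t i = 0; i < N_FEATURES; i++)
                  if (feat[i]) acc += w1[j][i];
              hidden[j] = acc > 0 ? acc : 0;
          }

          int32_t best = INT32_MIN;
          uint8_t best_id = 0;
          for (uint8_t t = 0; t < N_TOKENS; t++) {          /* output logits, keep the argmax */
              int32_t acc = 0;
              for (uint8_t j = 0; j < N_HIDDEN; j++)
                  acc += (int32_t)w2[t][j] * hidden[j];
              if (acc > best) { best = acc; best_id = t; }
          }
          return best_id;
      }

    In a plain MLP like this, each generated token costs roughly one multiply-accumulate per parameter, i.e. on the order of 150k integer operations at the size discussed above.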

Security and hidden information

  • One commenter asks if a secret (e.g., passphrase) baked into the weights would be recoverable from the model.
  • Responses:
    • With a network this small, reverse engineering is likely feasible (a toy illustration follows this list).
    • More generally, this ties into model interpretability and “backdoor” research; a cited paper claims some backdoors can be undetectable to bounded adversaries.
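  • A toy illustration of that point in C (names are hypothetical; since the weights ship inside the binary, an attacker can query the model offline as fast as they like, and inspecting the weights directly is also on the table for a net this small):

      /* Toy dictionary search against an offline "secret-knowing" model.
       * model_accepts() is a stub standing in for the real network; the point is
       * that nothing rate-limits or hides queries once you hold the weights. */
      #include <stdio.h>
      #include <string.h>

      static int model_accepts(const char *candidate)   /* hypothetical stand-in */
      {
          return strcmp(candidate, "joshua") == 0;       /* the "baked-in" secret */
      }

      int main(void)
      {
          static const char *wordlist[] = { "xyzzy", "swordfish", "joshua" };  /* toy dictionary */
          for (unsigned i = 0; i < sizeof wordlist / sizeof wordlist[0]; i++) {
              if (model_accepts(wordlist[i])) {
                  printf("recovered secret: %s\n", wordlist[i]);
                  return 0;
              }
          }
          return 1;
      }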

Historical what-ifs & human perception

  • Strong comparisons to ELIZA, PARRY, and simple bots:
    • Some think this would have felt magical on 80s/90s hardware; others argue ELIZA-style scripting might actually have felt more impressive, given how terse the micro-LM's replies are.
  • Commenters note:
    • Similar techniques might have been technically possible on 60s–90s machines, potentially changing the trajectory of AI in games and interfaces.
    • Constraints of specific old hardware (e.g., the IBM 7094's 32K words of core memory vs. a 40KB Z80 binary).
  • One thread emphasizes that part of the “magic” is human: people work hard to interpret sparse, noisy output as meaningful, so even crude bots can feel conversational.

Implications for devices and software bloat

  • Some see this as a “stress test” proving that:
    • Very limited hardware can host non-trivial conversational behavior.
    • Embedded and IoT devices will soon ship with onboard LLMs.
  • Others speculate we’re at a “home computer era” for LLMs: with enough RAM, local open models plus custom agents can rival proprietary systems.
  • A long subthread contrasts this tiny model with modern desktop apps:
    • One side argues it exposes waste in chat apps needing gigabytes of RAM.
    • The other side counters that apps like Slack/Teams provide far more features (integrations, app ecosystems, rich video/screen-share, etc.), and that hardware and resource budgets have grown, changing tradeoffs.
    • Ongoing disagreement about whether modern “bloat” is justified or just developer convenience.

General reactions & use cases

  • Overall tone is enthusiastic: lots of stars, “super cool,” “magical,” and “WarGames” vibes.
  • People imagine:
    • NPCs in games each backed by a tiny model.
    • Fuzzy-input retro RPGs/adventures that accept natural-ish language.
    • A tiny on-device assistant with huge context via external lookup.
  • Plenty of humor: jokes about AGI “just around the corner,” Z80 shortages, RAM prices, and SCP-style stories about haunted 8‑bit AIs.