Show HN: I built a tiny LLM to demystify how language models work
Educational value and goals
- Many see the project as a great, approachable end‑to‑end example of how to train and run a small language model, useful for newcomers and as a teaching tool.
- The constrained “fish” persona is praised as a clever way to make the model’s limits intuitive: a tiny model, tiny world model, and tiny personality.
- Some commenters compare it to other educational resources (spreadsheet‑based demos, “LLMs from scratch”, tiny visual LLMs) and see it as complementary rather than a replacement.
Model capabilities and limitations
- With ~9M parameters, the model mostly parrots patterns from its synthetic training data; several examples look like direct memorization.
- It struggles with out‑of‑distribution or “unknown” queries; the author confirms this is expected and that the goal is demonstration, not robustness.
- Uppercase text is effectively unsupported because the tokenizer/training data used only lowercase; this produces quirky but still in‑character responses.
- Commenters doubt that such a small model can reliably follow conditional instructions; one suggests ~20–25M parameters as a rough threshold for basic instruction following in narrow domains.
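For scale, figures like ~9M are easy to back out from a standard GPT-style parameter formula. The sketch below is a generic decoder-only count; the configuration chosen is invented purely to land near 9M and is not the project's actual architecture:

```python
# Approximate parameter count for a small GPT-style decoder.
def gpt_params(n_layer, d_model, vocab_size, ctx_len):
    # token + learned position embeddings
    embed = vocab_size * d_model + ctx_len * d_model
    # per block: attention (4 * d^2) + 4x MLP (8 * d^2), plus biases
    # and two layer norms (roughly 13 * d in total)
    block = 12 * d_model**2 + 13 * d_model
    # final layer norm; the output head is assumed tied to the embedding
    return embed + n_layer * block + 2 * d_model

# One illustrative configuration that lands near 9M parameters
print(gpt_params(n_layer=6, d_model=320, vocab_size=4096, ctx_len=256))
# → 8791040 (≈ 8.8M)
```

Plugging in the suggested ~20–25M threshold instead would mean roughly doubling the width or depth of a model this size.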
Data generation and training
- The personality and dialogue are built from synthetic, templatized “mad‑libs” style data.
- Some ask how much data is needed to make the persona coherent, and how the binary‑compressed dataset is created and used.
- Questions arise about whether an LLM could be trained purely through conversational interaction rather than large offline datasets; commenters cite context-window limits and the design of current architectures as obstacles.
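The mad-libs generation and binary token dump described above can be sketched as follows. Everything here is an assumption for illustration, not the project's code: the templates and slot values are invented, the lowercase-only character vocabulary mirrors the tokenizer limitation noted earlier, and the flat little-endian uint16 file layout is one common convention for small training corpora:

```python
import random
import struct

# Hypothetical "mad-libs" templates and slot fillers (invented).
TEMPLATES = [
    "user: what do you like?\nfish: i love {food}. {food} is the best.",
    "user: where do you live?\nfish: i live in the {place}. it is {mood} here.",
]
SLOTS = {
    "food": ["worms", "flakes", "shrimp"],
    "place": ["tank", "reef", "pond"],
    "mood": ["calm", "cozy", "wet"],
}

# Lowercase-only character vocabulary: uppercase input has no id of its
# own, which is why such text falls outside the training distribution.
VOCAB = "abcdefghijklmnopqrstuvwxyz .,?!:\n"
UNK = len(VOCAB)  # any character not in VOCAB maps to this id

def tokenize(text):
    return [VOCAB.index(c) if c in VOCAB else UNK for c in text]

def make_example(rng):
    # pick a template and fill every slot with a random value
    template = rng.choice(TEMPLATES)
    fills = {slot: rng.choice(values) for slot, values in SLOTS.items()}
    return template.format(**fills)

def write_binary(examples, path):
    # Concatenate all token ids into one flat stream and dump them as
    # little-endian uint16 values (an assumed, nanoGPT-like layout).
    ids = [tid for ex in examples for tid in tokenize(ex)]
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(ids)}H", *ids))

rng = random.Random(0)
examples = [make_example(rng) for _ in range(100)]
```

With only a handful of templates, most "knowledge" in the resulting model is memorized surface form, which matches the memorization behavior commenters observed.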
Philosophical and conceptual debates
- A discussion branches into Nagel’s “What Is It Like to Be a Bat?”, debating whether we can genuinely map between different minds’ experiences or only construct fictional personas.
- Another thread debates the “meaning of life” as food vs reproduction vs gene survival, and whether such goals are meaningful descriptions at all.
Tooling, documentation, and comparisons
- Some appreciate the minimal, vanilla PyTorch implementation; others criticize a lack of deeper explanation and documentation, calling the project oversold.
- There are installation and checkpoint/tokenizer path issues, plus questions about exporting to formats like GGUF.
- Comparisons with other “mini GPT” projects are requested; some argue such comparisons help learners, others say they’re not the author’s responsibility.
Meta and community reactions
- Many express delight at the humor and honesty of a fish that cares only about food.
- Several note how remarkable it is that such conversational models now run as hobby projects on laptops.
- A side discussion laments AI‑generated “slop” comments and the rising use of LLMs both to write code and to understand code, with debate over whether this reduces the need for traditional documentation.