Ask HN: Is anyone doing anything cool with tiny language models?

Overview

  • Thread surveys many real-world uses of “tiny” and small LMs (tens of millions to a few billion params), mostly running locally on CPUs or modest GPUs.
  • Strong interest in privacy, low latency, and narrow, specialized tasks rather than general chat.

Training & Deployment of Tiny Models

  • Models of ~0.1B–0.125B parameters (GPT‑2 scale; JetBrains’ local autocomplete) are reported as trainable on consumer hardware; rules of thumb such as needing roughly 8× the parameter count in VRAM are mentioned.
  • Examples include training ~0.125B models on ~1B tokens in minutes on rented H100s using NanoGPT variants.
  • Zipped sizes of roughly 70 MB per model are seen as small enough for static web apps and browser-based inference (WebGPU, transformers.js, web-llm).
  • Several projects use llama.cpp, Ollama, or custom C++ microservices to self-host small models with low latency.
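The sizing heuristics above can be sketched as a quick back-of-the-envelope calculator. The ~8×-params VRAM figure is the thread's rule of thumb, not an exact requirement, and the 4-bit disk estimate is an assumption chosen to land near the ~70 MB zipped figure:

```python
def training_vram_gb(params: float, bytes_per_param: float = 8.0) -> float:
    """Rough training VRAM: the thread's ~8x-params rule of thumb
    (weights plus gradients plus optimizer state)."""
    return params * bytes_per_param / 1e9

def disk_size_mb(params: float, bits_per_param: float = 16.0) -> float:
    """Approximate on-disk size at a given precision, before compression."""
    return params * bits_per_param / 8 / 1e6

# A 0.125B-parameter model (GPT-2 small scale):
print(training_vram_gb(0.125e9))                 # 1.0 GB by the rule of thumb
print(disk_size_mb(0.125e9, bits_per_param=4))   # 62.5 MB at 4-bit quantization
```

At 4-bit quantization the estimate comes out close to the ~70 MB zipped sizes reported in the thread, which is what makes browser delivery plausible.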

Practical Use Cases

  • Developer tooling: code completion, bash/sed/awk one-liners, git commit messages, variable naming, address and job parsing, cookie-banner detection, Excel-like formula suggestion/repair, email or SMS agents, and structured extraction (e.g., nutrition labels, job attributes).
  • Productivity and research: Excel add‑ins and small classifiers to filter or prioritize scientific abstracts, detect urgent maternal-health messages, or categorize playlist songs/titles.
  • System integration: wake-word detection on microcontrollers, Android text firewalls, on-device “activity analysis” assistants, robot/voice command interfaces, and Raspberry Pi personal RAG assistants.

Creative & Playful Uses

  • Story generators on tiny displays, endless “office gossip” audio streams, Magic card generators, Tic‑Tac‑Toe opponents, NPC dialogue and bargaining in games, and persona-based rewriting (e.g., hiding personal style or imitating fictional characters).
  • Local LMs are used to anonymize code/questions before sending them to large remote models, with the real identifiers reinserted afterward.
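The anonymize-then-reinsert workflow can be sketched with a reversible placeholder map. In the thread a local model proposes which identifiers are sensitive; here the list is passed in by hand, and the placeholder format is an assumption:

```python
def anonymize(text: str, identifiers: list[str]) -> tuple[str, dict[str, str]]:
    """Replace sensitive identifiers with stable placeholders before the
    text leaves the machine; returns scrubbed text and the reverse map."""
    mapping: dict[str, str] = {}
    for i, name in enumerate(identifiers):
        placeholder = f"__ID{i}__"
        text = text.replace(name, placeholder)
        mapping[placeholder] = name
    return text, mapping

def deanonymize(text: str, mapping: dict[str, str]) -> str:
    """Reinsert the real identifiers into the remote model's reply."""
    for placeholder, name in mapping.items():
        text = text.replace(placeholder, name)
    return text

scrubbed, mapping = anonymize("acme_client pays acme monthly", ["acme"])
print(scrubbed)                          # placeholders in place of "acme"
print(deanonymize(scrubbed, mapping))    # original text restored
```

Plain string replacement is the naive version; a real pipeline would match on token boundaries so that substrings of longer identifiers are not scrubbed by accident.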

Capabilities, Limits & Skepticism

  • Consensus that small, fine‑tuned models excel at narrow text-classification and transformation tasks (summarization, paraphrasing, filtering, and in some cases translation).
  • Reported weaknesses: complex logic, math, temporal expressions, truly diverse creativity, and safety-critical reasoning.
  • Some argue specialized small models are more useful than giant generalists; others caution that favorable comparisons against older >100B-parameter models can be misleading.
  • Noted gaps: fine‑tuning expertise, good training data, and a way to package and share small, task-specific models (plus prompts and parsers) cleanly.

Privacy, Security & Ethics

  • Strong focus on keeping data on-device (HIPAA-sensitive EHR queries, SMS and call filtering, private editors).
  • Applications to detect prompt injection, rewrite toxic or abusive text, or classify unethical stock suggestions.
  • Some concern that playful uses (e.g., engaging spam) may inadvertently strengthen adversaries or raise new privacy risks.