Show HN: Kitten TTS – 25MB CPU-Only, Open-Source TTS Model
Vision: tiny, offline, everywhere
- Many see this as a step toward small, offline ML models that run on cheap, ubiquitous hardware without GPUs or cloud calls.
- Use cases discussed: toys, home assistants, medical devices, language learning tools, navigation, robots, “smart toasters,” and local voice interfaces layered on local LLMs.
- Some contrast this “pay once, runs anywhere” model with subscription/cloud approaches from big tech.
Language support and scope
- Current model is English-only; multilingual models are said to be “in the works.”
- Several commenters dislike that the README doesn’t explicitly state the language.
- Non‑English inputs (Japanese, Thai, etc.) either fail or produce nonsense. The expectation is separate models per language, similar to other TTS projects.
Quality and voice characteristics
- Opinions diverge sharply: some call the quality “amazing for 25MB CPU-only,” others find it metallic, mechanical, “anime/overacted,” or tiring for long listening.
- Web and Reddit demos are generally rated higher quality than many users’ local runs; some suspect different settings or voices (a minimal local-run sketch follows this list).
- The release is described as an early “preview checkpoint,” around 10% trained, with improved 15M- and 80M-parameter models promised soon.
- Issues reported: weak punctuation/pauses, occasional mispronunciation, problems with very short phrases, but notably good handling of numbers compared to some LLM-based TTS.
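A minimal local-run sketch for comparing voices against the web demo. The package and class names (kittentts.KittenTTS), the model id, the voice ids, and the 24 kHz sample rate are assumptions modeled on the project’s README-style examples, not details confirmed in this thread.

```python
# Hedged sketch: generate a few samples locally with different voices,
# since commenters suspect the web demo uses different settings/voices
# than the local defaults. All identifiers below are assumptions.
import soundfile as sf
from kittentts import KittenTTS  # assumed package/class name

model = KittenTTS("KittenML/kitten-tts-nano-0.1")  # assumed model id

text = "Kitten TTS is a tiny text to speech model that runs on the CPU."

for voice in ("expr-voice-2-f", "expr-voice-3-m"):  # assumed voice ids
    audio = model.generate(text, voice=voice)
    sf.write(f"sample-{voice}.wav", audio, 24000)  # assumed 24 kHz output
```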
Performance and latency
- Benchmarks on a high-end laptop show generation roughly 5× faster than realtime once the model is loaded; low‑end CPUs can be around realtime or slower.
- Some compare unfavorably to Piper on a Raspberry Pi, which feels “almost instant.”
- The current demo has no chunking, so long texts can fail; chunking is planned (a chunking sketch follows this list).
- The browser demo uses ONNX Runtime; it works well in Chrome, but some report Safari/WebGPU issues.
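A sketch of the kind of sentence-level chunking the thread asks for, plus a realtime-factor measurement matching the benchmarks above. This is not the project’s planned implementation; `generate_audio` is a stand-in for whatever TTS call you use, assumed to return a NumPy float array at 24 kHz.

```python
# Hypothetical chunking helper + realtime-factor measurement.
import re
import time
import numpy as np

SAMPLE_RATE = 24_000  # assumed output rate

def chunk_text(text: str, max_chars: int = 300) -> list[str]:
    """Split on sentence boundaries, then pack sentences into chunks."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

def synthesize_long(text: str, generate_audio) -> np.ndarray:
    """Generate each chunk separately and report the realtime factor."""
    pieces = []
    start = time.perf_counter()
    for chunk in chunk_text(text):
        pieces.append(generate_audio(chunk))
    elapsed = time.perf_counter() - start
    audio = np.concatenate(pieces)
    rtf = (len(audio) / SAMPLE_RATE) / elapsed  # >1 means faster than realtime
    print(f"{len(audio)/SAMPLE_RATE:.1f}s of audio in {elapsed:.1f}s "
          f"({rtf:.1f}x realtime)")
    return audio
```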
Dependencies, packaging, and licensing
- Despite the ~25MB model, Python environments often balloon to multiple gigabytes and are fragile across Python versions; many commenters complain about “dependency hell.”
- ONNX and phonemizer/espeak-ng preprocessing are the main heavy dependencies; maintainers say they’ll try to reduce this and offer a cleaner SDK (a phonemization sketch follows this list).
- While the model is advertised as Apache‑2.0, reliance on a GPL‑3 phonemizer (itself using GPL espeak‑ng) effectively makes the combined project GPL‑3 in practice; there’s a long subthread on GPL compatibility, exceptions, and dual licensing.
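A minimal sketch of the text-to-phoneme step the thread identifies as the heavy, GPL-licensed dependency. Whether Kitten TTS calls phonemizer exactly this way is an assumption; the point is only to show which piece carries the licensing weight.

```python
# phonemizer is GPL-3 and relies on the espeak-ng system library (also GPL).
from phonemizer import phonemize

text = "Kitten TTS runs on the CPU."
phonemes = phonemize(text, language="en-us", backend="espeak", strip=True)
print(phonemes)  # IPA string the acoustic model consumes instead of raw text
```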
Comparisons and alternatives
- Frequently mentioned alternatives: Piper, KokoroTTS, Dia, Chatterbox, SherpaTTS, Coqui XTTS, Fish-Speech, F5‑TTS, Picovoice Orca, plus classic Festival, eSpeak, DECtalk, and SAM.
- Consensus: this model is not yet SOTA in naturalness, but it is notable for combining tiny size, CPU‑only inference, and a permissive license (subject to the GPL issue above).