2026-03-19

Show HN: Three new Kitten TTS models – smallest less than 25MB

Model capabilities & quality

New KittenTTS models range from ~15–80M params; smallest under 25MB.
Many users are impressed by the quality relative to size, especially that the 15M model outperforms the project’s previous 80M version.
Prosody and expressiveness are praised for a tiny model, but issues remain with numbers, abbreviations, and some technical terms.
Small models are said to struggle more with prosody; the author says upcoming releases will further improve rhythm and intonation.
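
One common workaround for the number and abbreviation issues is to normalize text before it reaches the model. The following is a minimal sketch of that idea, not part of KittenTTS: `normalize_for_tts`, `number_to_words`, and the abbreviation table are hypothetical names chosen for illustration, and the rules only cover whole integers below one million.

```python
import re

# Hypothetical abbreviation table; a real one would be much larger.
ABBREVIATIONS = {"Dr.": "Doctor", "vs.": "versus", "approx.": "approximately"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999999."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        tail = " " + number_to_words(rest) if rest else ""
        return ONES[hundreds] + " hundred" + tail
    thousands, rest = divmod(n, 1000)
    tail = " " + number_to_words(rest) if rest else ""
    return number_to_words(thousands) + " thousand" + tail


def normalize_for_tts(text: str) -> str:
    """Expand known abbreviations, then spell out runs of digits."""
    for abbr, spoken in ABBREVIATIONS.items():
        # Only match when the abbreviation is not glued to a preceding word.
        text = re.sub(r"(?<!\w)" + re.escape(abbr), spoken, text)

    def spell(match: re.Match) -> str:
        value = int(match.group())
        # Leave very large numbers untouched rather than guess a reading.
        return number_to_words(value) if value < 1_000_000 else match.group()

    return re.sub(r"\d+", spell, text)


print(normalize_for_tts("Dr. Lee compared 15 vs. 80 samples."))
```

This sketch treats every digit run as a separate integer, so decimals, dates, and phone numbers would need their own rules.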

Voices, accents & customization

Current voices are described as somewhat “cartoon/anime/helium”; several people want more neutral, professional voices for business and audiobooks.
Requests for custom voice cloning, realistic accents, and more “serious” voices. A DIY custom-voice path and better professional voices are promised soon, plus a cloning model (15M–500M).
Interest in specific languages (Spanish, French, German, Japanese, Bengali, Norwegian) and in Irish, British, and Welsh accents. Multilingual and Japanese support are said to be in progress.

Control, tags & expressivity

Strong demand for finer control: pitch/speed/volume, emotional tags ([sarcastically], [happily]), non-speech sounds ([gasp], [laughter], [clapping]), and intonation control.
The current version does not support expressive tags, but the developer is considering a small, well-defined tag set and possibly streaming/chunked generation.
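
To make the idea of a small, well-defined tag set concrete, here is a rough sketch of how bracketed tags could be separated from the text to be spoken. It is an illustration only: the tag names come from the examples requested in the thread, `tokenize_tags` is a hypothetical helper, and nothing here reflects how KittenTTS does or will handle tags.

```python
import re
from typing import List, Tuple

# Hypothetical tag set, taken from the examples requested in the thread.
KNOWN_TAGS = {"sarcastically", "happily", "gasp", "laughter", "clapping"}
TAG_PATTERN = re.compile(r"\[([a-z]+)\]")


def tokenize_tags(text: str) -> List[Tuple[str, str]]:
    """Split text into ("tag", name) and ("text", content) events, in order."""
    events: List[Tuple[str, str]] = []
    pos = 0
    for match in TAG_PATTERN.finditer(text):
        name = match.group(1)
        if name not in KNOWN_TAGS:
            continue  # unknown bracketed words stay part of the spoken text
        before = text[pos:match.start()].strip()
        if before:
            events.append(("text", before))
        events.append(("tag", name))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        events.append(("text", tail))
    return events


for event in tokenize_tags("[happily] Hi there! [gasp] What [todo] now?"):
    print(event)
```

Keeping the tag set closed means unknown brackets such as `[todo]` pass through as ordinary text instead of being silently dropped.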

Platforms, deployment & APIs

High interest in on-device/mobile: iOS, Android (as system TTS for ebook readers, screen readers), Raspberry Pi, MCUs, browser/WebAssembly, C++/ONNX, and JS packages.
Mobile SDK, custom inference engine, streaming support, browser/edge SDK, and text–audio alignment output are all on the roadmap.
Edge/Next.js and Arduino-style chip deployment are seen as compelling due to the small model size.
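
Until streaming support ships, chunked generation can be approximated on the caller's side by splitting input into sentence groups and synthesizing them one at a time. The sketch below shows only the splitting step; `chunk_sentences` and its `max_chars` parameter are hypothetical, and the sentence splitter is a simple punctuation heuristic rather than anything the project provides.

```python
import re
from typing import Iterator


def chunk_sentences(text: str, max_chars: int = 200) -> Iterator[str]:
    """Yield groups of whole sentences, each at most max_chars long.

    A single sentence longer than max_chars is yielded on its own
    rather than being cut mid-sentence.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            yield current
            current = sentence
    if current:
        yield current


for chunk in chunk_sentences("One. Two. Three.", max_chars=9):
    print(chunk)
```

Each yielded chunk would then be handed to whatever synthesis call is in use, so playback of the first chunk can begin while later chunks are still being generated.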

Installation, dependencies & tooling

Many complaints about Python environment issues: a huge Torch/CUDA dependency chain (multiple GB), version incompatibilities (Python 3.14, spaCy, misaki), and packaging bloat.
Community members share workarounds (CPU-only Torch, trimming unused imports, CLI wrappers) and ask for simpler, self-contained binaries or CLIs.
The maintainer acknowledges environment problems and promises to slim dependencies, add better env tooling (uv/conda), and fix packaging bugs.

Use cases & related wishes

Mentioned use cases: article/news/audiobook reading, screen readers, epub readers, automated business calls, home assistants, virtual pets, and offline accessibility.
Some ask about STT models; the team says it is working on a small STT model that prioritizes output formatting (line breaks, quotes) rather than only a low word error rate (WER).
