Show HN: Three new Kitten TTS models – smallest less than 25MB
Model capabilities & quality
- New KittenTTS models range from ~15–80M params; smallest under 25MB.
- Many users impressed by quality vs size, especially the 15M model outperforming the project’s previous 80M version.
- Prosody and expressiveness are praised for a tiny model, but issues remain with numbers, abbreviations, and some technical terms.
- Small models are said to struggle more with prosody; author claims next releases will further improve rhythm and intonation.
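The thread reports trouble with numbers and abbreviations. A common mitigation in TTS pipelines is to normalize text before synthesis. The following is a minimal sketch of that idea. It is not part of KittenTTS. The helpers `spell_number` and `normalize` and the tiny `ABBREVIATIONS` table are hypothetical, and the sketch only handles plain integers below one million.

```python
import re

# Hypothetical abbreviation table; a real front end needs a much larger, context-aware list.
ABBREVIATIONS = {"Dr.": "Doctor", "e.g.": "for example", "etc.": "et cetera"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

# Standalone integers only: skip digits that are part of decimals or "1,000"-style groups.
INTEGER_RE = re.compile(r"(?<![\d.,])\d+(?![.,]?\d)")


def spell_number(n: int) -> str:
    """Spell out an integer in the range 0..999999 as English words."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + ("" if rest == 0 else "-" + ONES[rest])
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        head = ONES[hundreds] + " hundred"
        return head if rest == 0 else head + " " + spell_number(rest)
    thousands, rest = divmod(n, 1000)
    head = spell_number(thousands) + " thousand"
    return head if rest == 0 else head + " " + spell_number(rest)


def normalize(text: str) -> str:
    """Expand known abbreviations, then spell out standalone integers below one million."""
    for abbr, expansion in ABBREVIATIONS.items():
        text = text.replace(abbr, expansion)

    def replace(match: re.Match) -> str:
        value = int(match.group())
        # Leave very large numbers untouched rather than produce odd wording.
        return spell_number(value) if value < 1_000_000 else match.group()

    return INTEGER_RE.sub(replace, text)


print(normalize("Dr. Smith has 3 cats and 42 dogs."))
# → Doctor Smith has three cats and forty-two dogs.
```

Decimals, grouped numbers such as "1,000", years, and units are deliberately left alone here. A production normalizer would need rules for each of those cases.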
Voices, accents & customization
- Current voices are described as somewhat “cartoon/anime/helium”; several people want more neutral, professional voices for business and audiobooks.
- Requests for custom voice cloning, realistic accents, and more “serious” voices. A DIY custom-voice path and better professional voices are promised soon, along with a cloning model (15M–500M).
- Interest in specific languages (Spanish, French, German, Japanese, Bengali, Norwegian) and in Irish, British, and Welsh accents. Multilingual and Japanese support are said to be in progress.
Control, tags & expressivity
- Strong demand for finer control: pitch/speed/volume, emotional tags ([sarcastically], [happily]), non-speech sounds ([gasp], [laughter], [clapping]), and intonation control.
- Current version does not support expressive tags, but developer is considering a small, well-defined tag set and possibly streaming/chunked generation.
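To illustrate what "a small, well-defined tag set" combined with chunked generation could look like, here is a hedged sketch. The current release accepts no tags. The tag names, the `Chunk` type, and `parse` are all hypothetical. The sketch splits a tagged script into sentence-sized chunks that a synthesizer could consume one at a time.

```python
import re
from dataclasses import dataclass
from typing import Iterator, Optional

# Hypothetical tag vocabulary; KittenTTS does not currently accept any tags.
STYLE_TAGS = {"happily", "sarcastically"}      # style applies to the text that follows
EVENT_TAGS = {"gasp", "laughter", "clapping"}  # stand-alone non-speech sounds

TAG_RE = re.compile(r"\[([a-z]+)\]")
SENTENCE_RE = re.compile(r"(?<=[.!?])\s+")


@dataclass
class Chunk:
    style: Optional[str]  # active style tag, or None for the default voice
    event: Optional[str]  # non-speech event to insert, or None
    text: str             # text to synthesize (empty for events)


def _sentences(text: str, style: Optional[str]) -> Iterator[Chunk]:
    """Split plain text into sentence chunks that share the current style."""
    for sentence in SENTENCE_RE.split(text.strip()):
        if sentence:
            yield Chunk(style=style, event=None, text=sentence)


def parse(script: str) -> Iterator[Chunk]:
    """Yield chunks in order; a style tag stays active until another style tag."""
    style: Optional[str] = None
    pos = 0
    for match in TAG_RE.finditer(script):
        yield from _sentences(script[pos:match.start()], style)
        tag = match.group(1)
        if tag in STYLE_TAGS:
            style = tag
        elif tag in EVENT_TAGS:
            yield Chunk(style=None, event=tag, text="")
        # Tags outside the vocabulary are silently dropped.
        pos = match.end()
    yield from _sentences(script[pos:], style)


for chunk in parse("Hello there. [happily] Nice to see you! [laughter] Bye."):
    print(chunk)
```

Because `parse` is a generator, each chunk can be handed to a synthesizer as soon as it is produced. That is the basic shape of chunked or streaming playback. No synthesizer call is shown, since no tag-aware API exists to call.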
Platforms, deployment & APIs
- High interest in on-device/mobile: iOS, Android (as system TTS for ebook readers, screen readers), Raspberry Pi, MCUs, browser/WebAssembly, C++/ONNX, and JS packages.
- Mobile SDK, custom inference engine, streaming support, browser/edge SDK, and text–audio alignment output are all on the roadmap.
- Edge/Next.js and Arduino-class microcontroller deployment are seen as compelling because of the small model size.
Installation, dependencies & tooling
- Many complaints about Python environment issues: huge Torch/CUDA dependency chain (multiple GB), version incompatibilities (Python 3.14, spaCy, misaki), and packaging bloat.
- Community members share workarounds (CPU-only Torch, trimming unused imports, CLI wrappers) and ask for simpler, self-contained binaries or CLIs.
- The maintainer acknowledges environment problems and promises to slim dependencies, add better env tooling (uv/conda), and fix packaging bugs.
Use cases & related wishes
- Mentioned use cases: article/news/audiobook reading, screen readers, epub readers, automated business calls, home assistants, virtual pets, and offline accessibility.
- Some ask about STT models; the team says it is working on small STT models that prioritize output formatting (line breaks, quotes) rather than optimizing only for low word error rate (WER).