Generate audiobooks from E-books with Kokoro-82M
Overall reception of Kokoro-82M and the audiblez tool
- Many commenters are impressed by Kokoro’s quality “for its size” and speed, especially compared to older TTS.
- Others find it only slightly better than standard TTS: still flat, occasionally “robotic,” and not acceptable for fiction or long listening sessions.
- Some users report bugs or limitations in the wrapper tool (chapter detection, section naming, lack of progress feedback, long processing times, issues on Windows).
Use cases: where it shines vs falls short
- Strong interest in using it for:
  - Converting ebooks, articles, blog posts, and emails into audio for commutes, chores, and exercise.
  - Accessibility: blind/low‑vision users, people who can’t sit and read, or can’t afford commercial audiobooks.
  - Public‑domain or obscure works without existing audiobooks, and cross‑language access when translations or recordings don’t exist.
- Skepticism for:
  - High‑quality fiction, where human narrators add pacing, emotion, character voices, songs, and subtle interpretation.
  - Technical/non‑linear material (e.g., music theory) where TTS can mispronounce symbols and ignore diagrams/tables.
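
One common workaround for mispronounced symbols is to spell them out before the text reaches the TTS engine. The Python sketch below illustrates that idea only; the `SPOKEN_FORMS` table and the `spell_out_symbols` helper are hypothetical names chosen for this example, not part of Kokoro or audiblez, and a real table would need to be tuned to the material being read.

```python
import re

# Hypothetical replacement table; the entries are illustrative assumptions,
# not taken from Kokoro, audiblez, or the discussion.
SPOKEN_FORMS = {
    "♯": " sharp",
    "♭": " flat",
    "%": " percent",
    "&": " and ",
    "=": " equals ",
}


def spell_out_symbols(text: str, table: dict[str, str] = SPOKEN_FORMS) -> str:
    """Replace each symbol in `table` with its spoken form and tidy whitespace."""
    for symbol, spoken in table.items():
        text = text.replace(symbol, spoken)
    # Collapse the extra whitespace introduced by the padded replacements.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(spell_out_symbols("F♯ & B♭"))
```

This kind of preprocessing only addresses symbols; it does nothing for diagrams or tables, which would still be skipped or read out as raw text.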
Alternatives & ecosystem
- Numerous alternatives discussed: Voice Dream, ElevenLabs Reader, Kybook, Read Aloud browser extensions, Edge/Kindle/iOS/macOS built‑in TTS, Piper, Coqui forks, Fish Speech, F5‑TTS, and others.
- Complaints about subscription pricing (e.g., $80/year tiers) and concern that some currently free services will become expensive.
- Desire for:
  - Calibre plugins, EPUB3 audio+text sync, and general “read+listen” workflows that keep position across devices.
  - Per‑character voices and richer productions (multiple voices, music, sound effects).
Technical notes & limitations
- Kokoro is praised for being small and for having been trained on under 100 hours of mainly English audio. Some commenters doubt the claimed multilingual quality, however; Japanese and Chinese are called out as weak or lacking proper cadence.
- It may not be easily fine‑tunable, and details of its training process are seen as under‑documented.
Ethics, copyright, and cultural impact
- Voice cloning of favorite narrators raises copyright and consent concerns; some argue personal/private use is acceptable, others warn of legal gray areas.
- Broader worries:
  - AI replacing low‑ to mid‑tier human narrators and shrinking career ladders.
  - Market “flooded with mediocre AI content,” reducing incentives to fund top‑tier human work.
- Counterpoints stress accessibility gains and the historical pattern of new media tools displacing some roles while creating others.