Generate audiobooks from E-books with Kokoro-82M

Overall reception of Kokoro-82M and the audiblez tool

  • Many commenters are impressed by Kokoro’s quality “for its size” and speed, especially compared to older TTS.
  • Others find it only slightly better than conventional TTS: still flat, occasionally “robotic,” and not acceptable for fiction or long listening sessions.
  • Some users report bugs or limitations in the wrapper tool (chapter detection, section naming, lack of progress feedback, long processing times, issues on Windows).
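
The wrapper complaints above (chapter detection, no progress feedback) concern the glue code around the model, not the model itself. As a rough illustration of that glue, the sketch below splits plain text on "Chapter ..." headings and reports progress after each chapter. It is not how audiblez works: the heading pattern, `split_chapters`, `convert`, and the `synthesize` callback are all hypothetical names chosen for this example, and real ebooks often need format-specific parsing instead of a regex.

```python
import re
from typing import Callable

# Assumed heading pattern for illustration; real books vary widely.
CHAPTER_RE = re.compile(r"^(chapter[ \t]+\w+.*)$", re.IGNORECASE | re.MULTILINE)


def split_chapters(text: str) -> list[tuple[str, str]]:
    """Return (title, body) pairs, split on lines that look like chapter headings."""
    matches = list(CHAPTER_RE.finditer(text))
    if not matches:
        # No headings found: treat the whole text as one section.
        return [("Untitled", text.strip())]
    chapters = []
    front = text[: matches[0].start()].strip()
    if front:
        chapters.append(("Front matter", front))
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chapters.append((match.group(1).strip(), text[match.end():end].strip()))
    return chapters


def convert(text: str, synthesize: Callable[[str], object],
            report: Callable[[str], None] = print) -> list:
    """Run a caller-supplied (hypothetical) TTS callback per chapter, reporting progress."""
    chapters = split_chapters(text)
    total = sum(len(body) for _, body in chapters) or 1
    done = 0
    outputs = []
    for title, body in chapters:
        outputs.append(synthesize(body))
        done += len(body)
        report(f"{title}: {100 * done // total}% of characters processed")
    return outputs
```

Progress is measured in characters because synthesis time tends to grow with text length; a per-chapter counter alone would be misleading for books with uneven chapters.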

Use cases: where it shines vs falls short

  • Strong interest in using it for:
    • Converting ebooks, articles, blog posts, and emails into audio for commutes, chores, and exercise.
    • Accessibility: blind/low‑vision users, people who can’t sit and read, or can’t afford commercial audiobooks.
    • Public‑domain or obscure works without existing audiobooks, and cross‑language access when translations or recordings don’t exist.
  • Skepticism about:
    • High‑quality fiction, where human narrators add pacing, emotion, character voices, songs, and subtle interpretation.
    • Technical or non‑linear material (e.g., music theory), where TTS can mispronounce symbols and skip over diagrams and tables.
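
One common mitigation for the symbol problem is to normalize text before it reaches the TTS engine. The sketch below is only an illustration of that idea, not something proposed in the discussion: the `SPOKEN` table and `normalize_for_tts` are hypothetical, the mapping is deliberately tiny, and it does nothing for diagrams or tables.

```python
import re

# Illustrative mapping only; a real table would be much larger and context-aware.
SPOKEN = {
    "♯": " sharp",
    "♭": " flat",
    "%": " percent",
    "&": " and ",
    "=": " equals ",
}


def normalize_for_tts(text: str) -> str:
    """Replace symbols a TTS engine may misread with spoken-word equivalents."""
    for symbol, spoken in SPOKEN.items():
        text = text.replace(symbol, spoken)
    # Collapse the extra whitespace introduced by the replacements.
    return re.sub(r"\s+", " ", text).strip()


print(normalize_for_tts("F♯ and B♭ = 50%"))
```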

Alternatives & ecosystem

  • Numerous alternatives discussed: Voice Dream, ElevenLabs Reader, Kybook, Read Aloud browser extensions, Edge/Kindle/iOS/macOS built‑in TTS, Piper, Coqui forks, Fish Speech, F5‑TTS, and others.
  • Complaints about subscription pricing (e.g., $80/year tiers) and concern that some currently free services will become expensive.
  • Desire for:
    • Calibre plugins, EPUB3 audio+text sync, and general “read+listen” workflows that keep position across devices.
    • Per‑character voices and richer productions (multiple voices, music, sound effects).

Technical notes & limitations

  • Kokoro is praised for being small and for having been trained on fewer than 100 hours of mainly English audio, but some doubt the claimed multilingual quality; Japanese and Chinese output is called out as weak or lacking proper cadence.
  • It may not be easy to fine‑tune, and commenters see its training process as under‑documented.

Ethics, copyright, and cultural impact

  • Voice cloning of favorite narrators raises copyright and consent concerns; some argue personal/private use is acceptable, others warn of legal gray areas.
  • Broader worries:
    • AI replacing low‑ to mid‑tier human narrators and shrinking career ladders.
    • Market “flooded with mediocre AI content,” reducing incentives to fund top‑tier human work.
  • Counterpoints stress accessibility gains and the historical pattern of new media tools displacing some roles while creating others.