2024-07-14

Show HN: I generated 70k audiobooks with OpenAI Text-to-Speech

Implementation & Open Sourcing

Code is currently closed source; some ask for open-sourcing to self-host or contribute.
Author describes it as a straightforward wrapper around OpenAI TTS + Google OAuth + a payment provider.
Others note there’s limited upside to open-sourcing beyond portfolio value or visibility.

User Experience & Feature Requests

Requests: search across the large catalog, more login options beyond Google, light theme, and mobile apps.
Multiple voices and voice selection (by author or protagonist gender, or per-character) are highly requested but deprioritized so far.
Other ideas: multi-narrator “audio play” style, 1.5x generation speed (not just playback), Apple Pay support.

Audio Quality, Languages & Use Cases

Several listeners praise the natural cadence; some say it’s the best TTS they’ve heard, especially for essays and non-fiction.
Others find it still slightly unnatural, with odd pauses/emphasis, and say it fails badly on poetry/dramatic works (e.g., Shakespeare’s meter).
Consensus in the thread: TTS is currently much better for history/philosophy/science/non-fiction than for fiction and dialogue-heavy texts.
OpenAI TTS is reported as weak for non‑English text; some note that other models convey emotion better but have worse voice quality or hallucinate more.

Generation Strategy & Scalability

The system splits books into ~4k-character chunks due to API limits and generates audio on demand.
It pre-generates the next chunk near the end of the current one to keep playback seamless.
Full-book pre-generation and chapter MP3s are planned but not finished.
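
The chunk-and-prefetch approach described above can be sketched roughly as follows. This is an illustration only, not the author's code, which is closed source: the function names, the 4000-character default, the sentence-boundary splitting, and the 30-second prefetch lead are all assumptions made for the example.

```python
import re


def split_into_chunks(text: str, max_chars: int = 4000) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries.

    max_chars=4000 is an assumed stand-in for the "~4k" limit mentioned in the thread.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-split.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The current chunk is full; start a new one with this sentence.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def should_prefetch(position_s: float, duration_s: float, lead_s: float = 30.0) -> bool:
    """Return True once playback is within lead_s seconds of the chunk's end.

    lead_s is a hypothetical threshold; the thread only says "near the end".
    """
    return duration_s - position_s <= lead_s
```

A player built on this sketch would request audio for chunk i + 1 as soon as should_prefetch returns True for chunk i, so the next segment is ready before the current one finishes.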

Monetization, Pricing & Caching

Current model: one-time purchase of listening “hours,” with pricing set around 50% of raw API costs; profit appears only after multiple purchases of the same book.
Revenue so far is very low; author hopes to reach modest MRR.
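
To make the break-even arithmetic concrete, here is a small sketch. It assumes that generated audio is cached and replayed at negligible cost, and that each buyer pays for the full length of the book; the function names and the $10 generation cost are hypothetical.

```python
import math


def net_after_purchases(api_cost: float, price_fraction: float, purchases: int) -> float:
    """Revenue minus the one-time generation cost after a number of purchases.

    Assumes cached audio costs nothing to serve again (a simplification).
    """
    return purchases * api_cost * price_fraction - api_cost


def purchases_to_break_even(price_fraction: float) -> int:
    """Smallest number of purchases whose revenue covers the generation cost."""
    if price_fraction <= 0:
        raise ValueError("price_fraction must be positive")
    return math.ceil(1 / price_fraction)


# Hypothetical book costing $10 in API fees, sold at 50% of cost.
for n in (1, 2, 3):
    print(n, net_after_purchases(10.0, 0.5, n))
```

Under these assumptions the first purchase leaves a loss of half the generation cost, the second breaks even, and profit starts with the third purchase of the same book.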
Ideas from commenters:
- Monthly subscription and mobile app for recurring revenue.
- Crowd-funding per book (many small contributors unlock a free public audio version).
- First buyer funds generation; others pay less, or listen free.
- Using the project as a free/donation-based “lead magnet” for other products.

Ethics & Value of Charging for Public Domain

Some see charging for public-domain audiobooks as unethical or “gross”; others reply that API/storage costs must be covered and point out that many businesses charge for public-domain content.
A compromise suggested: charge only at cost, or let users “donate” generated audio to the public.

Comparisons to Existing Projects & Tools

Mention of Microsoft’s prior Gutenberg TTS effort; some say its voices are worse than OpenAI’s.
Librivox is cited as a human-read alternative; some prefer human narration, others find many Librivox readings lower quality than the AI.
Various TTS engines are discussed (ElevenLabs, Piper, Bark, xTTS, Voicebox); consensus is that OpenAI TTS is currently among the most pleasant but not perfect.

Marketing & Title Controversy

A subthread argues over the HN post title claiming “generated 70k audiobooks” since books are generated on demand, not precomputed.
Critics call this misleading or a “lie”; supporters say it’s a reasonable shorthand since all 70k are playable via the system and the on-demand detail is disclosed in the post.
