Eleven v3
Voice quality vs human performance
- Many commenters find the English voices strikingly realistic, “almost indistinguishable” from real voice actors for short clips.
- A professional voice actor strongly disagrees: says it’s still far from professional work, with missing or forced emotion, flat/predictable delivery, odd timing, and a quality that becomes fatiguing for long-form listening.
- Several note it sounds like polished radio ads rather than natural conversation; tone feels exaggerated in a uniform, “monotonous” way.
- Some see it as great for quick/low-effort content (TikTok, simple narration), but not yet acceptable for audiobooks or high-end acting.
Languages, accents & localization
- Consensus: American English is excellent; many other languages are inconsistent or bad.
- Reports of strong English accents, mid-sentence accent switches, or outright nonsense in: Russian, Romanian, Bulgarian, Italian, Greek, French, Portuguese, Swedish, Norwegian, Japanese, Kazakh, Spanish variants, Tagalog, etc.
- Some languages/voices fare better: Polish is praised, some German and Tamil samples are “okay to good,” but often still sound like an announcer or phone assistant.
- Quality is highly dependent on matching a native-language voice from the voice library; homepage demos are often worse.
- Accent handling (e.g., British, French-accented English) is hit-or-miss and sometimes comical.
- Site UI localization into non-English languages is described as clumsy, literal, and clearly non-native.
Pricing, business model & competition
- Pricing for the v3 API is unclear; the public API is “coming soon.” There’s an 80% discount via the UI until mid‑2025, plus startup grants for high tiers.
- Several complain about subscription + credit “funny money” models and “voice slots,” preferring pure pay‑as‑you‑go.
- Comparisons suggest Eleven is several times more expensive than OpenAI’s TTS at small scale, though may become competitive at very high tiers.
- Many say Eleven remains quality leader, but high prices create space for rivals and open source: Chatterbox, Kokoro, NVIDIA NeMo + XTTS, PlayHT, Hume, Mirage, etc.
Features, quirks & API
- v3 includes inline expressive tags (e.g., [laughs]), but laughter often sounds like a separately inserted segment rather than being integrated into the surrounding words.
- Some users observe limited but surprising singing behavior triggered by song lyrics or [verse]/[chorus] tags; quality is roughly “like a human who can’t sing.”
- Reports of number misreads, language-accent glitches, and breaking changes to existing voices when moving from v2 to v3.
- Echo issues reported in voice agents are attributed by other commenters to missing client-side echo cancellation rather than to the model itself.
- v3 is currently a research preview and not fully available via API yet.
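Since the v3 API is still “coming soon,” any request shape is speculative. The sketch below builds a payload following ElevenLabs’ existing v1 REST endpoint (POST /v1/text-to-speech/{voice_id} with a JSON body of `text` and `model_id`); the `eleven_v3` model id, the placeholder voice id, and the exact audio-tag vocabulary are assumptions, shown only to illustrate how inline tags like [laughs] would ride along inside the text field:

```python
import json

# Base URL of ElevenLabs' public v1 REST API.
API_BASE = "https://api.elevenlabs.io/v1"

def build_tts_request(voice_id: str, text: str, model_id: str = "eleven_v3"):
    """Return (url, json_body) for a hypothetical v3 TTS call.

    The caller is expected to send this with an `xi-api-key` header;
    "eleven_v3" is an assumed model id, since v3 is a research preview
    and not fully available via API at the time of writing.
    """
    url = f"{API_BASE}/text-to-speech/{voice_id}"
    payload = {
        "text": text,          # inline audio tags such as [laughs] go here
        "model_id": model_id,  # assumption: v3 selected via model_id
    }
    return url, json.dumps(payload)

# Example: a laugh tag embedded mid-sentence. Commenters note the
# resulting laugh often sounds spliced in rather than blended into
# the neighboring words.
url, body = build_tts_request(
    "example-voice-id",  # placeholder, not a real voice id
    "Well, that went better than expected. [laughs] Let's try again.",
)
```

The same pattern would apply to the [verse]/[chorus] tags some users report triggering limited singing behavior.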
User experience, ethics & aesthetics
- Strong unease about replacing human voice actors and narrators; some call it anti-human and depressing, especially when real voices are cloned.
- Audiobook users value human narrators as scarce curators; fear platforms will cut costs with AI and degrade the experience.
- Several dislike the “patronizing,” emotionally validating style in support scripts, expecting it to age into an obvious negative trope.
- Others simply find the demos insincere and would rather have minimal, task-focused machine voices.