2024-11-28

Show HN: Voice-Pro – AI Voice Cloning

Overview

The project is a Gradio-based WebUI that wraps existing audio/ML tools (Whisper variants, F5-TTS/E2 voice cloning, UVR5 vocal isolation, Edge-TTS, yt-dlp).
It is targeted at “content creators and developers” for cloning voices, dubbing, transcription, and YouTube processing.
Many commenters see it as mainly an easy front-end; others note that making things easy and integrated is non-trivial and valuable.

Use Cases & Desired Features

Interest in speech-to-speech: act a line with specific emotion/prosody and re-render it in another voice, preserving delivery.
Creative uses: audiobooks/audioplays, tutorials where the voice owner can’t talk long, character voices for games/D&D, satire/parody, custom Home Assistant voices.
Accessibility/identity uses: restoring or preserving voices for people losing speech; letting people uncomfortable with their natural voice (e.g., transgender users) sound closer to how they wish; privacy by masking real voice.
Dubbing/translation: cross-language voice transfer while keeping emotion and speaker identity; auto-dubbing tools and “babelfish”-style real-time use are discussed.

Ethical Concerns & Misuse

Commenters express strong worry about:
- Voice scams (especially targeting elderly relatives).
- Impersonation in spear-phishing and social engineering.
- Revenge porn and general identity co‑option.
- Undermining voice actors’ livelihoods and “stealing” their distinctive performance.
Some argue cloning celebrities or public figures for entertainment is satire; others see it as clearly over a line.
Several note that voice is a biometric and core part of personal identity.

Regulation, Responsibility & Social Adaptation

Debate over whether technology creators are morally culpable given known misuse patterns.
Some argue harms from “rogue actors” justify regulation of tools or compute; others say regulating open-source tools is practically impossible.
Counter-arguments point to past “impossible to regulate” claims (internet, sales tax, GDPR) that proved false.
Ideas floated:
- Strengthening right-of-publicity / likeness laws with private rights of action.
- Mandatory licenses for cloning third-party voices.
- Robust caller-authentication mechanisms (ID verification, digital signatures/watermarks on media).
- Family passphrases or improved caller-ID as practical mitigations.
Commenters disagree over whether concern about scams is “doomerism” or necessary risk analysis.
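
To make the "digital signatures on media" and "family passphrase" ideas concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not something proposed in the thread: an HMAC tag derived from a shared secret stands in for a real public-key signature, and the names `sign_media` and `verify_media` are hypothetical. It shows only that a recipient who knows the secret can detect altered or unauthenticated audio bytes; it does nothing to detect cloning itself.

```python
import hashlib
import hmac


def sign_media(media: bytes, secret: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over the raw media bytes."""
    return hmac.new(secret, media, hashlib.sha256).hexdigest()


def verify_media(media: bytes, tag: str, secret: bytes) -> bool:
    """Recompute the tag and compare in constant time.

    Returns False if the media, the tag, or the secret does not match.
    """
    expected = sign_media(media, secret)
    return hmac.compare_digest(expected, tag)


if __name__ == "__main__":
    secret = b"family-passphrase"       # hypothetical shared secret
    clip = b"\x00\x01fake-audio-bytes"  # placeholder for real audio data
    tag = sign_media(clip, secret)
    print(verify_media(clip, tag, secret))                # genuine clip
    print(verify_media(clip + b"tampered", tag, secret))  # altered clip
```

A shared secret only works among people who already trust each other, which is why it resembles the family-passphrase idea more than a public media-authentication scheme; the latter would need asymmetric signatures and key distribution.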

Security, Installation & Openness

Commenters note multiple red flags:
- Windows-only batch installer that asks users to bypass SmartScreen and possibly antivirus warnings.
- Directory of precompiled .pyd/.dll files; some see this as incompatible with an MIT-licensed “open source” claim.
- Hidden logic (e.g., one-click installer functions) that can’t easily be inspected.
Defenders counter that:
- Similar patterns exist in other popular local ML UIs.
- Code runs in a conda/venv and mainly installs models and packages.
Skeptical users emphasize that a venv is not a security boundary and treat the project as untrusted/malware-adjacent until proven otherwise.
Some resort to running such tools on isolated machines/VLANs.
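
For readers who want to see what precompiled files a download contains before running anything, the following Python sketch inventories binaries and records their SHA-256 hashes. It is an assumption-laden illustration, not a procedure from the thread: the function name `inventory_binaries` and the suffix list are hypothetical, and a hash list does not show that a file is safe. It only makes the opaque files visible and lets them be compared across releases or checked against a scanner.

```python
import hashlib
from pathlib import Path

# Assumed set of "opaque binary" suffixes; adjust for the project at hand.
BINARY_SUFFIXES = {".pyd", ".dll", ".so", ".exe"}


def inventory_binaries(root: str) -> dict[str, str]:
    """Map each binary's path (relative to root) to its SHA-256 hex digest."""
    result: dict[str, str] = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in BINARY_SUFFIXES:
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            result[str(path.relative_to(root))] = digest
    return result


if __name__ == "__main__":
    # Scans the current directory; point it at an unpacked download instead.
    for name, digest in inventory_binaries(".").items():
        print(digest, name)
```

This reads files without executing them, so it is reasonable to run before the installer. It complements, and does not replace, the isolation measures mentioned above.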

Licensing, Trial Limits & Business Model

Despite the MIT license, the app reportedly enforces a 30‑minute usage limit and then requires payment, with pricing hard to find (especially in English).
Some see this as misleading for something promoted as open source; questions are raised about patching or removing the limit.

Technical & Platform Notes

The packaged app has no official Mac/Linux support; others note the underlying stack (Python + CUDA) is portable and “one Dockerfile away” from cross‑platform.
Some ask about low-RAM, CPU-only TTS; there is interest in alternatives like Coqui TTS, StyleTTS 2, Tortoise TTS, ElevenLabs, and other open dubbing tools.
Some suggest this project adds mainly integration and UX on top of existing libraries (“wrappers all the way down”).
