Show HN: Voice-Pro – AI Voice Cloning
Overview
- The project is a Gradio-based WebUI that wraps existing audio/ML tools (Whisper variants, F5-TTS/E2-TTS voice cloning, UVR5 vocal isolation, Edge-TTS, yt-dlp).
- Targeted at “content creators and developers” for cloning voices, dubbing, transcription, and YouTube processing.
- Many commenters see it as mainly an easy front-end; others note that making things easy and integrated is non-trivial and valuable.
Use Cases & Desired Features
- Interest in speech-to-speech: act a line with specific emotion/prosody and re-render it in another voice, preserving delivery.
- Creative uses: audiobooks/audioplays, tutorials where the voice owner can’t talk long, character voices for games/D&D, satire/parody, custom Home Assistant voices.
- Accessibility/identity uses: restoring or preserving voices for people losing speech; letting people uncomfortable with their natural voice (e.g., transgender users) sound closer to how they wish; privacy by masking real voice.
- Dubbing/translation: cross-language voice transfer while keeping emotion and speaker identity; auto-dubbing tools and “babelfish”-style real-time use are discussed.
Ethical Concerns & Misuse
- Strong worry about:
- Voice scams (especially targeting elderly relatives).
- Impersonation in spear-phishing and social engineering.
- Revenge porn and general identity co‑option.
- Undermining voice actors’ livelihoods and “stealing” their distinctive performance.
- Some argue cloning celebrities or public figures for entertainment is satire; others see it as clearly over a line.
- Several note that voice is a biometric and core part of personal identity.
Regulation, Responsibility & Social Adaptation
- Debate over whether technology creators are morally culpable given known misuse patterns.
- Some argue harms from “rogue actors” justify regulation of tools or compute; others say regulating open-source tools is practically impossible.
- Counter-arguments point to past “impossible to regulate” claims (internet, sales tax, GDPR) that proved false.
- Ideas floated:
- Strengthening right-of-publicity / likeness laws with private rights of action.
- Mandatory licenses for cloning third-party voices.
- Robust caller/authentication mechanisms (ID verification, digital signatures/watermarks on media).
- Family passphrases or improved caller-ID as practical mitigation.
- Disagreement over whether concern about scams is “doomerism” or necessary risk analysis.
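
Two of the mitigations floated above, authenticating media and agreeing on a family passphrase, can be illustrated with Python's standard library. This is a minimal sketch, not something proposed in the thread: it uses an HMAC-SHA256 tag with a shared secret as a stand-in for a real public-key signature, it does not cover audio watermarking, and the function names (`sign_media`, `verify_media`, `passphrase_matches`) are hypothetical.

```python
import hashlib
import hmac


def sign_media(media_bytes: bytes, secret_key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over the raw media bytes.

    Assumption: sender and receiver share secret_key in advance.
    A production design would more likely use a public-key signature.
    """
    return hmac.new(secret_key, media_bytes, hashlib.sha256).hexdigest()


def verify_media(media_bytes: bytes, secret_key: bytes, tag: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = sign_media(media_bytes, secret_key)
    return hmac.compare_digest(expected.encode(), tag.encode())


def _normalise(phrase: str) -> bytes:
    """Lower-case and collapse whitespace so small slips do not cause a mismatch."""
    return " ".join(phrase.lower().split()).encode("utf-8")


def passphrase_matches(spoken: str, agreed: str) -> bool:
    """Check a caller's passphrase against the one agreed on beforehand."""
    return hmac.compare_digest(_normalise(spoken), _normalise(agreed))


if __name__ == "__main__":
    key = b"shared-secret-for-illustration-only"
    clip = b"raw bytes of an audio clip"
    tag = sign_media(clip, key)
    print("untouched clip verifies:", verify_media(clip, key, tag))
    print("altered clip verifies:", verify_media(clip + b"!", key, tag))
    print("passphrase ok:", passphrase_matches("  Purple  Giraffe ", "purple giraffe"))
```

The tag only proves that whoever produced the clip held the key; it says nothing about whether the voice in the clip is genuine, which is why the thread pairs this idea with out-of-band checks such as passphrases.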
Security, Installation & Openness
- Multiple red flags noted:
- Windows-only batch installer that asks users to bypass SmartScreen and possibly antivirus warnings.
- Directory of precompiled .pyd/.dll files; some see this as incompatible with an MIT-licensed “open source” claim.
- Hidden logic (e.g., one-click installer functions) that can’t easily be inspected.
- Defenders counter that:
- Similar patterns exist in other popular local ML UIs.
- Code runs in a conda/venv and mainly installs models and packages.
- Skeptical users emphasize that a venv is not a security boundary and treat the project as untrusted/malware-adjacent until proven otherwise.
- Some resort to running such tools on isolated machines/VLANs.
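
Two of the points above can be checked by hand before running an untrusted tool: which compiled binaries ship in the tree, and whether a venv actually confines the code. The sketch below is illustrative only and was not posted in the thread. The helper names and the list of suffixes are assumptions, and it inspects whatever directory you pass it rather than anything specific to Voice-Pro.

```python
import hashlib
import sys
import tempfile
from pathlib import Path

# Suffixes treated as opaque compiled code (an assumption; extend as needed).
BINARY_SUFFIXES = {".pyd", ".dll", ".so", ".exe"}


def find_compiled_binaries(root):
    """List (relative path, SHA-256) for every compiled file under root."""
    root = Path(root)
    results = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in BINARY_SUFFIXES:
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            results.append((path.relative_to(root).as_posix(), digest))
    return results


def in_virtualenv() -> bool:
    """True when the interpreter's prefix differs from its base installation."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def can_write_outside_env() -> bool:
    """Write a probe file in the system temp area, outside sys.prefix.

    Success shows that the environment isolates packages, not file access.
    """
    env_root = Path(sys.prefix).resolve()
    with tempfile.TemporaryDirectory() as tmp:
        probe = Path(tmp).resolve() / "probe.txt"
        probe.write_text("written from inside the interpreter")
        return probe.exists() and env_root not in probe.parents


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, digest in find_compiled_binaries(target):
        print(digest[:16], name)
    print("running in a venv:", in_virtualenv())
    print("can write outside the env:", can_write_outside_env())
```

The hashes make it possible to compare shipped binaries against upstream releases or a later download. Neither check proves a tool is safe, which is why some commenters fall back on isolated machines.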
Licensing, Trial Limits & Business Model
- Despite the MIT license, the app reportedly enforces a 30‑minute usage limit and then requires payment, with pricing hard to find (especially in English).
- Some see this as misleading for something promoted as open source; questions raised about patching/removing the limit.
Technical & Platform Notes
- No official Mac/Linux support in the packaged app; others note the underlying stack (Python + CUDA) is portable and “one Dockerfile away” from cross‑platform.
- Questions about low-RAM, CPU-only TTS; interest in alternatives like Coqui TTS, StyleTTS 2, Tortoise TTS, ElevenLabs, and other open dubbing tools.
- Some suggest this project adds mainly integration and UX on top of existing libraries (“wrappers all the way down”).