Show HN: Voice-Pro – AI Voice Cloning

Overview

  • The project is a Gradio-based WebUI that wraps existing audio/ML tools (Whisper variants, F5-TTS/E2 voice cloning, UVR5 vocal isolation, Edge-TTS, yt-dlp).
  • Targeted at “content creators and developers” for cloning voices, dubbing, transcription, and YouTube processing.
  • Many commenters see it as mainly an easy front-end; others note that making things easy and integrated is non-trivial and valuable.

Use Cases & Desired Features

  • Interest in speech-to-speech: act a line with specific emotion/prosody and re-render it in another voice, preserving delivery.
  • Creative uses: audiobooks/audioplays, tutorials where the voice owner can’t talk long, character voices for games/D&D, satire/parody, custom Home Assistant voices.
  • Accessibility/identity uses: restoring or preserving voices for people losing speech; letting people uncomfortable with their natural voice (e.g., transgender users) sound closer to how they wish; privacy by masking real voice.
  • Dubbing/translation: cross-language voice transfer while keeping emotion and speaker identity; auto-dubbing tools and “babelfish”-style real-time use are discussed.

Ethical Concerns & Misuse

  • Strong worry about:
    • Voice scams (especially targeting elderly relatives).
    • Impersonation in spear-phishing and social engineering.
    • Revenge porn and general identity co‑option.
    • Undermining voice actors’ livelihoods and “stealing” their distinctive performance.
  • Some argue cloning celebrities or public figures for entertainment is satire; others see it as clearly over a line.
  • Several note that voice is a biometric and core part of personal identity.

Regulation, Responsibility & Social Adaptation

  • Debate over whether technology creators are morally culpable given known misuse patterns.
  • Some argue harms from “rogue actors” justify regulation of tools or compute; others say regulating open-source tools is practically impossible.
  • Counter-arguments cite earlier claims that something was “impossible to regulate” (the internet, sales tax, GDPR) which later turned out to be wrong.
  • Ideas floated:
    • Strengthening right-of-publicity / likeness laws with private rights of action.
    • Mandatory licenses for cloning third-party voices.
    • Robust caller and media authentication (ID verification, digital signatures or watermarks on media).
    • Family passphrases or improved caller-ID as practical mitigation.
  • Disagreement over whether concern about scams is “doomerism” or necessary risk analysis.
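
The "digital signatures on media" and "family passphrase" ideas above can be illustrated with a minimal sketch. It is not taken from the thread or from Voice-Pro: it assumes a shared secret (for example, a key derived from a family passphrase) and uses an HMAC from the Python standard library as a stand-in for a true public-key signature. The names `sign_media` and `verify_media` are hypothetical.

```python
import hashlib
import hmac


def sign_media(media: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag binding the media bytes to a shared key."""
    return hmac.new(key, media, hashlib.sha256).hexdigest()


def verify_media(media: bytes, tag: str, key: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = sign_media(media, key)
    return hmac.compare_digest(expected, tag)


if __name__ == "__main__":
    # Assumption: both parties already share this secret out of band.
    shared_key = b"example family passphrase"
    clip = b"raw bytes of an audio clip"

    tag = sign_media(clip, shared_key)
    print("original verifies:", verify_media(clip, tag, shared_key))
    print("tampered verifies:", verify_media(clip + b"!", tag, shared_key))
```

A tag like this only shows that the sender knew the shared secret and that the bytes were not altered afterward. It says nothing about whether the voice in the clip is synthetic, and a real deployment would more likely use public-key signatures so that verifiers do not hold the signing secret.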

Security, Installation & Openness

  • Multiple red flags noted:
    • Windows-only batch installer that asks users to bypass SmartScreen and possibly antivirus warnings.
    • A directory of precompiled .pyd/.dll binaries; some see shipping these as incompatible with an MIT-licensed “open source” claim.
    • Hidden logic (e.g., one-click installer functions) that can’t easily be inspected.
  • Defenders counter that:
    • Similar patterns exist in other popular local ML UIs.
    • Code runs in a conda/venv and mainly installs models and packages.
  • Skeptical users emphasize that a venv is not a security boundary and treat the project as untrusted/malware-adjacent until proven otherwise.
  • Some resort to running such tools on isolated machines/VLANs.
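
The point that a venv is not a security boundary can be shown with a small sketch. It is an illustration only, not code from the project: the function names are hypothetical, and the system temp directory stands in for "somewhere outside the environment". A venv changes where Python looks for packages; it does not restrict which files the process can read or write.

```python
import os
import sys
import tempfile


def in_virtualenv() -> bool:
    """True inside a venv, where sys.prefix differs from sys.base_prefix.

    Note: conda environments are not detected by this check.
    """
    return sys.prefix != sys.base_prefix


def can_touch_outside_env() -> bool:
    """Write and read back a file in the system temp dir, which normally
    lives outside the environment's directory."""
    fd, path = tempfile.mkstemp(prefix="venv_demo_")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write("written from inside the interpreter")
        with open(path) as handle:
            return handle.read().startswith("written")
    finally:
        os.remove(path)


if __name__ == "__main__":
    print("inside a venv:", in_virtualenv())
    print("can write outside the environment:", can_touch_outside_env())
```

Whether or not the interpreter is in a venv, the second check succeeds with ordinary user permissions, which is why skeptical commenters prefer real isolation such as a separate machine or VLAN.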

Licensing, Trial Limits & Business Model

  • Despite the MIT license, the app reportedly enforces a 30‑minute usage limit and then requires payment, with pricing hard to find (especially in English).
  • Some see this as misleading for something promoted as open source; questions raised about patching/removing the limit.

Technical & Platform Notes

  • No official Mac/Linux support in the packaged app; others note the underlying stack (Python + CUDA) is portable and “one Dockerfile away” from cross‑platform.
  • Questions about low-RAM, CPU-only TTS; interest in alternatives like Coqui TTS, StyleTTS 2, Tortoise TTS, ElevenLabs, and other open dubbing tools.
  • Some suggest this project mainly adds integration and UX on top of existing libraries (“wrappers all the way down”).