Show HN: Ghost Pepper – Local hold-to-talk speech-to-text for macOS
Overall impressions & use cases
- Many commenters like the idea of a fast, fully local macOS hold‑to‑talk STT tool, especially for coding, prompting agents, and general dictation.
- Several report immediately using the app successfully; others hit early bugs (e.g., missing microphone permission prompt, cleanup prompt hallucinating, Chinese speech producing English output).
Comparisons to other STT tools
- Frequent comparisons to Handy, Superwhisper, MacWhisper, Hex, WisprFlow, openwhispr, and various Linux tools (e.g., hyprwhspr, FluidVoice, localvoxtral).
- Handy is repeatedly praised as “fantastic,” with strong macOS/Linux integration and LLM post‑processing; some ask explicitly how Ghost Pepper differentiates.
- Some prefer WisprFlow and other cloud STT for raw speed and accuracy, though note privacy concerns and cost.
- Several say macOS built‑in dictation is “good enough” for simple needs, but weaker on technical terms and formatting than Whisper‑based tools.
Models & accuracy
- Ghost Pepper uses Whisper and supports Parakeet v3; discussion compares:
  - Parakeet: often reported as faster and more accurate (if the language is supported), with good language auto‑detection and a small footprint.
  - Whisper: praised as robust, multilingual, widely optimized, and less “hallucination‑prone” for some users.
  - Cohere Transcribe and Mistral Voxtral: mentioned as state‑of‑the‑art cloud options.
- Disagreement over whether Whisper still justifies its popularity versus newer models; some still prefer it after trying Parakeet.
UX, workflow, and feature requests
- Core appeal: quick push‑to‑talk / hold‑to‑talk that types into any text field, often triggered from a Stream Deck or custom hotkeys.
- Desired features:
  - Automatic paste after transcription.
  - True streaming / live text display, with retroactive corrections.
  - Better endpoint detection and long‑form dictation ergonomics.
  - Custom vocabulary, corrections, and potentially user‑specific finetuning.
  - Non‑keyboard triggers (e.g., foot pedals) and action commands while speaking.
  - Support for transcribing system audio / videos more reliably than via mic.
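To make the custom vocabulary request concrete, here is a minimal sketch of a post-processing pass over a finished transcript. It is not how Ghost Pepper or any tool named in the thread works; the `CORRECTIONS` table, its entries, and the `apply_vocabulary` helper are hypothetical names invented for illustration. It assumes the STT model has already returned plain text.

```python
import re

# Hypothetical correction table: misheard phrase -> intended term.
CORRECTIONS = {
    "pie torch": "PyTorch",
    "cube control": "kubectl",
    "post gress": "Postgres",
}


def apply_vocabulary(text: str, corrections: dict[str, str]) -> str:
    """Replace whole-word, case-insensitive matches of each misheard phrase."""
    if not corrections:
        return text
    # Try longer phrases first so they win over shorter overlapping entries.
    phrases = sorted(corrections, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(p) for p in phrases) + r")\b",
        re.IGNORECASE,
    )
    # Lowercased keys let us map any casing of a match back to its replacement.
    lookup = {phrase.lower(): fixed for phrase, fixed in corrections.items()}
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)


if __name__ == "__main__":
    print(apply_vocabulary("Install pie torch then run Cube Control", CORRECTIONS))
```

A static table like this only fixes errors that recur in the same form; the finetuning some commenters ask for would instead change what the model outputs in the first place.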
Local vs platform / cloud ecosystems
- Strong interest in fully local STT for privacy and reliability; some explicitly contrast this with cloud‑based WisprFlow and platform dictation.
- Debate over Apple’s and Google’s built‑in models: some find them surprisingly good and fully offline; others report worse accuracy and odd behavior.
- Meta‑theme: there is an explosion of near‑identical local STT apps; several note this “Hello World for LLMs” effect and wish for more consolidation and differentiation.