2026-04-06

Show HN: Ghost Pepper – Local hold-to-talk speech-to-text for macOS

Overall impressions & use cases

Many commenters like the idea of a fast, fully local macOS hold‑to‑talk STT tool, especially for coding, prompting agents, and general dictation.
Several report using the app successfully right away; others hit early bugs (e.g., a missing microphone permission prompt, the cleanup prompt hallucinating, Chinese speech producing English output).

Comparisons to other STT tools

Frequent comparisons to Handy, Superwhisper, MacWhisper, Hex, WisprFlow, openwhispr, and various Linux tools (e.g., hyprwhspr, FluidVoice, localvoxtral).
Handy is repeatedly praised as “fantastic,” with strong macOS/Linux integration and LLM post‑processing; some ask explicitly how Ghost Pepper differentiates itself.
Some prefer WisprFlow and other cloud STT for raw speed and accuracy, though they note privacy concerns and cost.
Several say macOS built‑in dictation is “good enough” for simple needs, but weaker on technical terms and formatting than Whisper‑based tools.

Models & accuracy

Ghost Pepper uses Whisper and supports Parakeet v3; discussion compares:
- Parakeet: often reported as faster and more accurate (if the language is supported), with good language auto‑detection and a small footprint.
- Whisper: praised as robust, multilingual, widely optimized, and less “hallucination‑prone” for some users.
- Cohere Transcribe and Mistral Voxtral mentioned as state‑of‑the‑art cloud options.
Disagreement over whether Whisper still justifies its popularity versus newer models; some still prefer it after trying Parakeet.

UX, workflow, and feature requests

Core appeal: quick push‑to‑talk / hold‑to‑talk that types into any field, often combined with things like Stream Deck or hotkeys.
Desired features:
- Automatic paste after transcription.
- True streaming / live text display, with retroactive corrections.
- Better endpoint detection and long‑form dictation ergonomics.
- Custom vocabulary, corrections, and potentially user‑specific finetuning.
- Non‑keyboard triggers (e.g., foot pedals) and action commands while speaking.
- Support for transcribing system audio / videos more reliably than via mic.

Local vs platform / cloud ecosystems

Strong interest in fully local STT for privacy and reliability; some explicitly contrast this with cloud‑based WisprFlow and platform dictation.
Debate over Apple’s and Google’s built‑in models: some find them surprisingly good and fully offline; others report worse accuracy and odd behavior.
Meta‑theme: there is an explosion of near‑identical local STT apps; several note this “Hello World for LLMs” effect and wish for more consolidation and differentiation.
