Show HN: Ghost Pepper – Local hold-to-talk speech-to-text for macOS

Overall impressions & use cases

  • Many commenters like the idea of a fast, fully local macOS hold‑to‑talk STT tool, especially for coding, prompting agents, and general dictation.
  • Several report immediately using the app successfully; others hit early bugs (e.g., missing microphone permission prompt, cleanup prompt hallucinating, Chinese speech producing English output).

Comparisons to other STT tools

  • Frequent comparisons to Handy, Superwhisper, MacWhisper, Hex, WisprFlow, openwhispr, and various Linux tools (e.g., hyprwhspr, FluidVoice, localvoxtral).
  • Handy is repeatedly praised as “fantastic,” with strong macOS/Linux integration and LLM post‑processing; some ask explicitly how Ghost Pepper differentiates.
  • Some prefer WisprFlow and other cloud STT services for raw speed and accuracy, though they note privacy concerns and cost.
  • Several say macOS built‑in dictation is “good enough” for simple needs, but weaker on technical terms and formatting than Whisper‑based tools.

Models & accuracy

  • Ghost Pepper uses Whisper and supports Parakeet v3; the discussion compares:
    • Parakeet: often reported as faster and more accurate (when the language is supported), with good language auto‑detection and a small footprint.
    • Whisper: praised as robust, multilingual, widely optimized, and less “hallucination‑prone” for some users.
    • Cohere Transcribe and Mistral Voxtral: mentioned as state‑of‑the‑art cloud options.
  • Disagreement over whether Whisper still justifies its popularity versus newer models; some still prefer it after trying Parakeet.

UX, workflow, and feature requests

  • Core appeal: quick push‑to‑talk / hold‑to‑talk that types into any field, often triggered from a Stream Deck or a custom hotkey.
  • Desired features:
    • Automatic paste after transcription.
    • True streaming / live text display, with retroactive corrections.
    • Better endpoint detection and long‑form dictation ergonomics.
    • Custom vocabulary, corrections, and potentially user‑specific finetuning.
    • Non‑keyboard triggers (e.g., foot pedals) and action commands while speaking.
    • Support for transcribing system audio / videos more reliably than via mic.
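
As an illustration of the "custom vocabulary, corrections" request, here is a minimal Python sketch of a post-processing pass that rewrites commonly misheard phrases after transcription. It does not describe how Ghost Pepper or any tool named above works; the `CORRECTIONS` table and the `apply_corrections` helper are hypothetical names invented for this example.

```python
import re

# Hypothetical user-defined table: misheard phrase -> intended term.
CORRECTIONS = {
    "pie torch": "PyTorch",
    "cube control": "kubectl",
    "post gres": "Postgres",
}


def apply_corrections(text, corrections):
    """Replace whole-phrase matches case-insensitively, longest phrase first."""
    if not corrections:
        return text
    # Longest phrases first, so a longer phrase wins over a shorter one
    # that it happens to contain.
    phrases = sorted(corrections, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in phrases) + r")\b",
        re.IGNORECASE,
    )
    # Lower-cased lookup so "Pie Torch" and "pie torch" map to the same term.
    lookup = {phrase.lower(): term for phrase, term in corrections.items()}
    return pattern.sub(lambda m: lookup[m.group(1).lower()], text)


if __name__ == "__main__":
    print(apply_corrections("install Pie Torch and run cube control get pods", CORRECTIONS))
```

A fixed table like this is cheap and predictable, but it cannot use context; that is one reason commenters also ask for LLM post-processing or user-specific finetuning.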

Local vs platform / cloud ecosystems

  • Strong interest in fully local STT for privacy and reliability; some explicitly contrast this with cloud‑based WisprFlow and platform dictation.
  • Debate over Apple’s and Google’s built‑in models: some find them surprisingly good and fully offline; others report worse accuracy and odd behavior.
  • Meta‑theme: there is an explosion of near‑identical local STT apps; several note this “Hello World for LLMs” effect and wish for more consolidation and differentiation.