Handy – Free open source speech-to-text app

UI, accessibility, and CLI vs GUI

  • Some questioned why a GUI is needed; responses stressed accessibility to non-technical users and ease of installation (especially on macOS/Linux).
  • A separate CLI version exists and is used for automation / shell workflows.
  • Users praise the minimal, “obvious” UI and history view; one finds another app’s UI “too complicated” by comparison.

Models, speed, and local processing

  • Parakeet V3 is repeatedly praised as “incredibly fast” and highly accurate, often beating built‑in macOS dictation and other tools.
  • Handy runs fully locally, leveraging the GPU where available; users value this both for privacy and for avoiding ongoing costs.
  • “Discharging the model” simply means unloading it from RAM: memory is freed, at the cost of a slower cold start the next time transcription is triggered.
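The memory-versus-cold-start trade-off behind unloading a model can be sketched as follows. This is a minimal illustration, not Handy's actual implementation: the `LazyModel` class, its `idle_seconds` parameter, and the `loader` callback are all hypothetical names chosen for this example.

```python
import time
from typing import Callable, Optional


class LazyModel:
    """Hypothetical helper: keeps a model in RAM only while it is being used."""

    def __init__(self, loader: Callable[[], object], idle_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader            # expensive call that loads the model
        self._idle_seconds = idle_seconds
        self._clock = clock              # injectable so the logic can be tested
        self._model: Optional[object] = None
        self._last_used = 0.0
        self.load_count = 0              # how many (slow) cold starts happened

    @property
    def loaded(self) -> bool:
        return self._model is not None

    def get(self) -> object:
        """Return the model, paying the cold-start cost if it was unloaded."""
        if self._model is None:
            self._model = self._loader()
            self.load_count += 1
        self._last_used = self._clock()
        return self._model

    def unload_if_idle(self) -> bool:
        """Drop the model once it has been idle long enough; True if unloaded."""
        if self._model is not None and self._clock() - self._last_used >= self._idle_seconds:
            self._model = None           # memory is reclaimed once nothing else references it
            return True
        return False
```

A caller would invoke `unload_if_idle()` periodically (for example from a timer). A short `idle_seconds` keeps memory use low but makes the next dictation wait for a reload; a long one does the opposite.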

Features, post‑processing, and limitations

  • Desired features: custom dictionary / replacements for domain terms, confidence indicators on words, ability to edit or correct already typed text, direct piping to tools like Claude Code, meeting transcription, API access, iOS/mobile apps, and an option to keep no history (currently in a debug menu).
  • Handy already supports custom words, a built‑in dictionary, and experimental LLM post‑processing (hidden in a debug menu).
  • Bluetooth mics (e.g., AirPods) introduce a 1–2 s lag at the start of recording; internal laptop mics work better. This start‑up latency is a common complaint.
  • There’s a hotkey pitfall: the default Ctrl+Space shortcut can emit control characters if key‑up timing is unlucky (e.g., in Emacs).
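The requested "replacements for domain terms" feature could work roughly like the sketch below: a post-processing pass that rewrites commonly misheard phrases in the transcript. This is an assumption-laden illustration and not how Handy handles custom words; the function name `apply_replacements` and the example mapping are hypothetical.

```python
import re
from typing import Mapping


def apply_replacements(text: str, replacements: Mapping[str, str]) -> str:
    """Replace whole-word phrases in a transcript, ignoring case (hypothetical helper)."""
    if not replacements:
        return text
    # Try longer phrases first so multi-word entries win over shorter ones.
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(k) for k in keys) + r")(?!\w)",
        re.IGNORECASE,
    )
    lookup = {k.lower(): v for k, v in replacements.items()}
    return pattern.sub(lambda m: lookup[m.group(1).lower()], text)


# Example mapping of misheard phrases to the intended terms (made up for illustration).
REPLACEMENTS = {"cloud code": "Claude Code", "cube control": "kubectl"}

print(apply_replacements("ask cloud code to run cube control get pods", REPLACEMENTS))
# prints: ask Claude Code to run kubectl get pods
```

The word-boundary lookarounds keep the pass from rewriting substrings of longer words (for example, "cloud" inside "cloudy"). A fixed table like this is only one option; fuzzy or phonetic matching would catch more variants at the cost of more false positives.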

Use cases and impact on workflows

  • Users employ Handy for talking to coding agents/LLMs, writing Word comments/feedback, general dictation, and as an accessibility‑driven replacement for Superwhisper/MacWhisper (e.g., for users with dystonia).
  • Some find speech faster than typing, especially when multitasking; others say they think/type faster and struggle to dictate fluently.
  • Discussion extends to “next‑level” workflows: feeding STT into LLM agents to execute commands, manipulate GUIs, or perform “coding by voice,” with references to prior and ongoing work and a related tool that records multimodal context for agents.

Comparisons and ecosystem

  • Handy is compared with Superwhisper, Wispr Flow, open‑whispr, WhisperTux, MacWhisper, FluidVoice, Hex, VoiceInk, and several mobile apps (Spokenly, Futo keyboard, Android Parakeet apps).
  • Many report Handy as at least competitive in accuracy and speed, with the main differentiators being UI, pricing (Handy is free and open source), and real‑time vs batch transcription.
  • macOS Dictation is widely described as unreliable for accents, noisy environments, and technical terms.