2026-01-15

Handy – Free open source speech-to-text app

UI, accessibility, and CLI vs GUI

Some questioned why a GUI is needed; responses stressed accessibility for non-technical users and ease of installation (especially on macOS/Linux).
A separate CLI version exists and is used for automation / shell workflows.
Users praise the minimal, “obvious” UI and history view; one finds another app’s UI “too complicated” by comparison.

Models, speed, and local processing

Parakeet V3 is repeatedly praised as “incredibly fast” and highly accurate, often beating built‑in macOS dictation and other tools.
Handy runs fully locally, leveraging GPU where available; users value this both for privacy and avoiding ongoing costs.
“Discharging the model” simply unloads it from RAM: memory is freed, at the cost of a slower cold start the next time the model is needed.
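
The unload trade-off can be illustrated with a small sketch. This is not Handy's implementation; `LazyModel`, `loader`, and `idle_seconds` are hypothetical names chosen for illustration, and the "model" is any object returned by the loader. The idea is only that a model dropped after an idle period has to be loaded again on the next use.

```python
import time
from typing import Callable, Optional


class LazyModel:
    """Hypothetical holder that keeps a model in memory only while it is in use."""

    def __init__(
        self,
        loader: Callable[[], object],
        idle_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader            # assumed: an expensive function that loads the model
        self._idle_seconds = idle_seconds
        self._clock = clock              # injectable so the behaviour can be tested
        self._model: Optional[object] = None
        self._last_used = 0.0
        self.load_count = 0              # how many (slow) cold starts have happened

    def get(self) -> object:
        """Return the model, loading it first if it is not in memory."""
        if self._model is None:
            self._model = self._loader()  # cold start: the slow path
            self.load_count += 1
        self._last_used = self._clock()
        return self._model

    def unload_if_idle(self) -> bool:
        """Drop the model if it has been unused for idle_seconds; report whether it was dropped."""
        idle_for = self._clock() - self._last_used
        if self._model is not None and idle_for >= self._idle_seconds:
            self._model = None            # memory can now be reclaimed
            return True
        return False
```

A short idle timeout keeps memory use low but makes cold starts more frequent; a long one does the opposite.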

Features, post‑processing, and limitations

Desired features: custom dictionary / replacements for domain terms, confidence indicators on words, ability to edit or correct already typed text, direct piping to tools like Claude Code, meeting transcription, API access, iOS/mobile apps, and an option to keep no history (currently in a debug menu).
Handy already supports custom words, a built‑in dictionary, and experimental LLM post‑processing (hidden in a debug menu).
Bluetooth mics (e.g., AirPods) introduce a 1–2 s start lag; internal laptop mics work better. This latency is a common complaint.
There’s a hotkey pitfall: default Ctrl+Space can emit control characters if key‑up timing is unlucky (e.g., in Emacs).
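
The custom dictionary / replacements idea mentioned above can be sketched as a post-processing pass over the transcript. This is an illustration under assumptions, not how Handy handles custom words; `apply_replacements` and the example mapping are hypothetical. It swaps whole words or phrases, case-insensitively, and prefers the longest matching phrase.

```python
import re
from typing import Mapping


def apply_replacements(text: str, replacements: Mapping[str, str]) -> str:
    """Replace misheard words or phrases with the user's preferred spelling."""
    if not replacements:
        return text
    # Try longer phrases first so "cloud code" wins over a shorter "cloud" entry.
    keys = sorted(replacements, key=len, reverse=True)
    alternatives = "|".join(re.escape(key) for key in keys)
    # Lookarounds keep matches on word boundaries, so "cloudy" is left alone.
    pattern = re.compile(r"(?<!\w)(" + alternatives + r")(?!\w)", re.IGNORECASE)
    lookup = {key.lower(): value for key, value in replacements.items()}
    return pattern.sub(lambda match: lookup[match.group(1).lower()], text)


# Hypothetical user dictionary for domain terms.
domain_terms = {"cloud code": "Claude Code", "pie torch": "PyTorch"}
print(apply_replacements("ask cloud code about pie torch", domain_terms))
```

A plain lookup like this only fixes known mishearings; it does not address the requests for confidence indicators or for editing text that has already been typed.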

Use cases and impact on workflows

Users employ Handy for: talking to coding agents/LLMs, writing Word comments/feedback, general dictation, and replacing Superwhisper/MacWhisper for accessibility (e.g., dystonia).
Some find speech faster than typing, especially when multitasking; others say they think/type faster and struggle to dictate fluently.
Discussion extends to “next‑level” workflows: feeding STT into LLM agents to execute commands, manipulate GUIs, or perform “coding by voice,” with references to prior and ongoing work and a related tool that records multimodal context for agents.

Comparisons and ecosystem

Handy is compared with Superwhisper, Wispr Flow, open‑whispr, WhisperTux, MacWhisper, FluidVoice, Hex, VoiceInk, and several mobile apps (Spokenly, Futo keyboard, Android Parakeet apps).
Many report Handy as at least competitive in accuracy/speed, with the main differentiators being UI, pricing (Handy is free/open), and real‑time vs batch transcription.
macOS Dictation is widely described as unreliable for accents, noisy environments, and technical terms.
