2024-12-20

The era of open voice assistants

Perceived decline of big-tech assistants

Many commenters report Alexa/Google Home getting slower and less reliable (timers, music, basic commands), especially compared to current LLMs.
Others say their devices still work fine, highlighting inconsistent experiences.
Several note that these products generate little profit. Some argue this explains the lack of investment and the layoffs; others say “rich companies” could afford to fix them but choose not to.

Cloud vs local economics and architecture

One view: using LLMs for all Alexa requests would be financially impractical at Amazon scale, because a “free” product cannot pay for GPU-heavy cloud workloads.
Counterview: putting an expensive GPU in every home is wasteful (idle most of the time); centralized GPUs plus subscription make more sense.
A middle position: scaling AI infra to millions is hard either way; dedicated efficient SoCs (Apple Silicon–style) may eventually make local AI practical.
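The idle-GPU argument above comes down to amortized cost per request. A minimal sketch of that arithmetic follows. Every number in it is a hypothetical placeholder, not a figure from the discussion, and the model deliberately ignores electricity, networking, and staffing.

```python
from dataclasses import dataclass


@dataclass
class LocalDevice:
    hardware_cost: float     # up-front price of the in-home accelerator (hypothetical)
    lifetime_years: float    # years over which that price is amortized
    requests_per_day: float  # voice requests one household makes per day


@dataclass
class SharedGpu:
    cost_per_hour: float      # hourly cost of one centralized GPU (hypothetical)
    requests_per_hour: float  # requests it could serve at full load
    utilization: float        # fraction of each hour spent on useful work (0..1)


def local_cost_per_request(device: LocalDevice) -> float:
    """Spread the hardware price over every request the household ever makes."""
    total_requests = device.requests_per_day * 365 * device.lifetime_years
    return device.hardware_cost / total_requests


def shared_cost_per_request(gpu: SharedGpu) -> float:
    """Spread the hourly GPU cost over the requests actually served in that hour."""
    return gpu.cost_per_hour / (gpu.requests_per_hour * gpu.utilization)


# Placeholder inputs chosen only to make the comparison concrete.
home = LocalDevice(hardware_cost=300.0, lifetime_years=5.0, requests_per_day=20.0)
cloud = SharedGpu(cost_per_hour=2.0, requests_per_hour=3600.0, utilization=0.5)

print(f"local:  {local_cost_per_request(home):.4f} per request")
print(f"shared: {shared_cost_per_request(cloud):.4f} per request")
```

With inputs like these the shared GPU is cheaper per request because it is rarely idle. The comparison flips if the local hardware is cheap enough or the household makes enough requests, which is the case the efficient-SoC position is betting on.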

Home Assistant Voice device: role and hardware

The device is positioned as an open, privacy-preserving “satellite”: mic, speaker, and wake-word detection on an ESP32-S3 with XMOS audio processing, connecting to a separate Home Assistant server.
XMOS is valued for beamforming/noise reduction so wake-word and STT work even with music or distance.
Users like that it has an audio output and a Grove connector; some expect to pair it with better speakers.
It sold out quickly in many regions; some commenters have already ordered multiple units to replace their Echos.

Software stack, LLMs, and extensibility

The voice pipeline is modular: wake word, STT, intent/LLM, and TTS can each run locally or in the cloud (Whisper/faster-whisper, Piper, Coqui, Ollama, OpenAI, etc.).
Assist can first try structured “home control” intents, then fall back to a general LLM for arbitrary questions.
ESPHome is the main SDK; the firmware and case design are open, with the expectation of forks and custom hardware variants.
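The modular pipeline and the intent-first, LLM-fallback flow described above can be sketched as a set of swappable stages. This is an illustrative model only, not Home Assistant's actual API. The class name `VoicePipeline` and every `fake_*` backend are hypothetical stand-ins, and the wake-word stage is left out because it runs before any audio reaches the pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Each stage is a plain callable, so a local or cloud backend can be swapped in.
SpeechToText = Callable[[bytes], str]
IntentMatcher = Callable[[str], Optional[str]]  # reply text, or None if nothing matched
LanguageModel = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]


@dataclass
class VoicePipeline:
    stt: SpeechToText
    match_intent: IntentMatcher
    llm: LanguageModel
    tts: TextToSpeech

    def handle(self, audio: bytes) -> bytes:
        text = self.stt(audio)
        # Try the structured home-control intents first.
        reply = self.match_intent(text)
        if reply is None:
            # No intent matched: fall back to the general-purpose LLM.
            reply = self.llm(text)
        return self.tts(reply)


# Hypothetical stand-in backends; real ones would wrap an STT, LLM, or TTS engine.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_intents(text: str) -> Optional[str]:
    if text.lower() == "turn on the kitchen light":
        return "Turned on the kitchen light."
    return None


def fake_llm(text: str) -> str:
    return f"(LLM answer to: {text})"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(fake_stt, fake_intents, fake_llm, fake_tts)
print(pipeline.handle(b"turn on the kitchen light"))
print(pipeline.handle(b"why is the sky blue"))
```

Because each stage is only a function signature, replacing a cloud backend with a local one changes one constructor argument and leaves the control flow untouched.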

Privacy, openness, and ecosystem

Strong enthusiasm for a fully local, open, auditable alternative to Amazon/Google; many explicitly cite distrust of corporate data practices.
Some want to avoid any cloud use; others are fine with Nabu Casa cloud to support development and offload heavy workloads.
Comparisons: Home Assistant is seen as more capable and community-rich than openHAB; Mycroft is cited as an earlier, ill-fated attempt whose ideas and people partially live on here.

Concerns, trade-offs, and open questions

Some report past HA voice pipelines as unreliable; others find them powerful but complex to set up.
Worries include HA’s weak fine-grained security model, the lack of standard authentication (OIDC), a UI-over-YAML trend seen as “anti-engineer,” and unclear docs on running advanced models on user GPUs.
Multilingual quality and music streaming remain pain points; HA is actively crowdsourcing language support, while music depends on external provider integrations.
The debate over “no wake word” assistants surfaces technical and UX challenges (false triggers, constant compute).
