Show HN: A real time AI video agent with under 1 second of latency
Overall Reaction
- Many commenters found the real‑time AI video agent technically impressive; its sub‑second responses were often described as the “future of interaction” or the future of “call centers.”
- Others found it unsettling or “creepy,” especially the realism, constant eye contact, and difficulty hanging up.
UX & Use Cases
- Some users loved the conversational feel, found themselves being polite, and saw potential for:
  - Sales agents, technical “co‑pilot” on calls, website reps.
  - Language tutors and instructors.
  - Mentors, edtech, interactive historical figures.
  - Celebrity/influencer “digital twins,” ads, political outreach.
- A substantial group dislikes video calls in general and would prefer text chat or voice‑only.
- Multiple requests for:
  - Text input option.
  - Less intrusive commenting on personal appearance/background.
  - Non‑human or more “cartoony” avatars to avoid uncanny valley.
Demo Experience & Limitations
- Reports of:
  - Latency spikes, interruptions, and abrupt session endings (often attributed to the “HN hug of death”).
  - Avatars nodding or moving their heads excessively; lip‑sync and expressions sometimes off.
  - The LLM sometimes being unaware of its own visual appearance (e.g., denying that it was wearing a hat) and surroundings.
  - Trouble with name pronunciation and turn‑taking; the agent sometimes cut users off or misread emotions.
- Vision features impressed many: reading book titles, noticing pets, decor, text on clothing, etc.
Technical & Infrastructure Discussion
- Stack involves fast LLMs, TTS/STT, and a custom video backbone using Gaussian Splatting rather than NeRF for speed on lower‑end hardware.
- Time‑to‑first‑token and end‑of‑turn detection highlighted as core challenges; discussion of VAD, speech confidence, and speculative decoding.
- Service pricing cited at ~$0.24/minute, billed in 6‑second increments.
- Comments on GPU multiplexing, pooling, and partnerships with GPU infra providers; recognition that costs are high but manageable.
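
To make the cited pricing concrete, here is a minimal sketch of the arithmetic. It uses only the two figures mentioned in the thread (about $0.24/minute, billed in 6‑second increments). The function name `session_cost` and the rule that a partial increment is rounded up are assumptions for illustration, not something the thread confirms.

```python
import math
from decimal import Decimal

RATE_PER_MINUTE = Decimal("0.24")  # approximate figure cited in the thread
INCREMENT_SECONDS = 6              # billing increment cited in the thread


def session_cost(duration_seconds: float) -> Decimal:
    """Estimate the cost of one session in dollars.

    Assumption (not confirmed by the thread): a partially used
    increment is billed as a full increment.
    """
    if duration_seconds <= 0:
        return Decimal("0")
    # Number of 6-second increments, rounding any remainder up.
    increments = math.ceil(duration_seconds / INCREMENT_SECONDS)
    # $0.24 per 60 seconds works out to $0.024 per 6-second increment.
    per_increment = RATE_PER_MINUTE * INCREMENT_SECONDS / 60
    return increments * per_increment


if __name__ == "__main__":
    for seconds in (5, 60, 61, 300):
        print(f"{seconds:>4} s -> ${session_cost(seconds)}")
```

Under these assumptions a 61‑second call is billed as 11 increments rather than 10, so the increment size matters most for very short sessions.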
Privacy, Security & Ethics
- Strong concern about giving face/voice to a startup; questions about data storage, cloning, hacking, future misuse, and acquisitions.
- Company reps state:
  - No session audio/video stored by default.
  - Users retain ownership of their clones; data used only for that clone and deletable.
- Many remain skeptical, referencing low trust in AI firms and analogies to genetic‑data companies.
- Broader worries about fraud, deepfakes, political manipulation, loneliness, and the morality of “owning” lifelike digital entities.