Show HN: A real time AI video agent with under 1 second of latency

Overall Reaction

  • Many commenters found the real‑time AI video agent technically impressive; the sub‑second response time led several to describe it as the “future of interaction” or the future of call centers.
  • Others found it unsettling or “creepy,” especially the realism, constant eye contact, and difficulty hanging up.

UX & Use Cases

  • Some users loved the conversational feel, found themselves being polite to the agent, and saw potential for:
    • Sales agents, technical “co‑pilot” on calls, website reps.
    • Language tutors and instructors.
    • Mentors, edtech, interactive historical figures.
    • Celebrity/influencer “digital twins,” ads, political outreach.
  • A substantial group dislikes video calls in general and would prefer text chat or voice‑only.
  • Multiple requests for:
    • Text input option.
    • Less intrusive commenting on personal appearance/background.
    • Non‑human or more “cartoony” avatars to avoid uncanny valley.

Demo Experience & Limitations

  • Reports of:
    • Latency spikes, interruptions, and abrupt session endings (often attributed to the “HN hug of death”).
    • Avatars nodding or moving heads excessively; lip‑sync and expressions sometimes off.
    • The LLM was sometimes unaware of its avatar's visual appearance (e.g., denying that it was wearing a hat) and of its surroundings.
    • Trouble with name pronunciation and turn‑taking; sometimes cutting users off or misreading emotions.
  • Vision features impressed many: reading book titles, noticing pets, decor, text on clothing, etc.

Technical & Infrastructure Discussion

  • Stack involves fast LLMs, TTS/STT, and a custom video backbone using Gaussian Splatting rather than NeRF for speed on lower‑end hardware.
  • Time‑to‑first‑token and end‑of‑turn detection were highlighted as core challenges, with discussion of voice activity detection (VAD), speech confidence, and speculative decoding.
  • Service pricing cited at ~$0.24/minute, billed in 6‑second increments.
  • Comments on GPU multiplexing, pooling, and partnerships with GPU infra providers; recognition that costs are high but manageable.
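
To make the cited pricing concrete, here is a minimal sketch of how a session cost might be computed from the two figures mentioned in the thread (~$0.24/minute, 6‑second increments). The function name `billed_cost` is hypothetical, and the sketch assumes that a partial increment is rounded up to a full one; the thread does not state the rounding rule.

```python
import math

RATE_PER_MINUTE = 0.24   # USD; approximate figure cited in the thread
INCREMENT_SECONDS = 6    # billing granularity cited in the thread


def billed_cost(duration_seconds: float) -> float:
    """Return the estimated cost in USD for a session of the given length.

    Assumption (not confirmed by the thread): any partial 6-second
    increment is billed as a full increment.
    """
    if duration_seconds <= 0:
        return 0.0
    # Number of 6-second increments, rounding partial increments up.
    increments = math.ceil(duration_seconds / INCREMENT_SECONDS)
    # Each increment costs the per-minute rate scaled to 6 seconds.
    cost_per_increment = RATE_PER_MINUTE * INCREMENT_SECONDS / 60
    return round(increments * cost_per_increment, 4)


if __name__ == "__main__":
    for seconds in (60, 61, 300):
        print(f"{seconds:>4} s -> ${billed_cost(seconds):.3f}")
```

Under these assumptions each increment costs about $0.024, so a 61‑second session is billed as 11 increments rather than 10.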

Privacy, Security & Ethics

  • Strong concern about giving face/voice to a startup; questions about data storage, cloning, hacking, future misuse, and acquisitions.
  • Company reps state:
    • No session audio/video stored by default.
    • Users retain ownership of their clones; data used only for that clone and deletable.
  • Many remain skeptical, referencing low trust in AI firms and analogies to genetic‑data companies.
  • Broader worries about fraud, deepfakes, political manipulation, loneliness, and the morality of “owning” lifelike digital entities.