Show HN: Infinity – Realistic AI characters that can speak

Overall impressions & use cases

  • Many commenters find the talking-head results “breathtaking” and surprisingly expressive, great for memes, jokes, dubbing, and creative experiments (statues, paintings, pets, game characters, “AI Seinfeld”-style content, etc.).
  • Others see more serious applications: educational videos for developing countries, long-form talking-head content, political or corporate messaging, and integration into storytelling or agent frameworks.
  • Several people note the tech is fun but “creepy,” prompting reflection on the future of media and authenticity.

Model capabilities & limitations

  • Lip sync, head motion, and emotional alignment with audio are widely praised; some feel they can almost read lips.
  • Weak spots:
    • Teeth and fine facial details (partly due to ~320×320 training resolution).
    • Longer clips: quality drifts, identity changes, and “uncanny valley” behavior, especially with singing or expressive audio.
    • Last-frame “breakdown” artifacts, likely tied to how audio is padded to training-length buckets.
    • Cartoons, sketches, non-humanoid images, and some stylized portraits often fail or remain static.
    • Mouth size and expressions may vary with the image's style or the subject's race; one user flags this as a potential bias issue.
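
The last-frame artifact above is attributed to audio being padded up to a training-length bucket. As a rough illustration of what such padding could look like, here is a minimal sketch. The bucket lengths, the sample rate, and the function name `pad_to_bucket` are all hypothetical; the thread does not state the real values or how the team implements this.

```python
import numpy as np

# Hypothetical bucket lengths in seconds; the real training buckets are not stated.
BUCKETS_S = (2.0, 4.0, 8.0)
SAMPLE_RATE = 16_000  # assumed sample rate, for illustration only


def pad_to_bucket(audio: np.ndarray, buckets_s=BUCKETS_S, sr=SAMPLE_RATE):
    """Zero-pad a mono clip to the smallest bucket that fits it.

    Returns the padded audio and the number of padded (silent) samples.
    """
    for seconds in sorted(buckets_s):
        target = int(round(seconds * sr))
        if len(audio) <= target:
            pad = target - len(audio)
            # np.pad defaults to constant zero padding; pad only at the end.
            return np.pad(audio, (0, pad)), pad
    raise ValueError("clip is longer than the largest bucket")


# 3.1 s of dummy audio lands in the 4 s bucket, leaving a silent tail.
clip = np.ones(int(3.1 * SAMPLE_RATE), dtype=np.float32)
padded, pad = pad_to_bucket(clip)
print(len(padded), pad)
```

In this sketch the final stretch of the padded clip is pure silence, so any frames generated for that tail have no speech driving them. That may be one reason the end of a clip behaves differently from the rest, though the thread only says the link is likely.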

Performance, length & technical notes

  • The public configuration runs at 6 fps (5× slower than real time). With lower resolution and fewer diffusion steps, the team shows ~20–23 fps, which is near real time.
  • Base training supports ~8-second clips; longer videos are built autoregressively, so errors accumulate as the clip gets longer. The public tool is capped at ~30 seconds.
  • Model is a custom diffusion transformer with a 3D VAE and rectified flow for faster denoising. Fine-tuning on specific actors is possible but requires video, not just images.
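
The figures above can be turned into rough arithmetic. The sketch below assumes a 30 fps playback rate (implied by "6 fps is 5× slower than real time", but not stated outright) and assumes autoregressive chunks do not overlap. The helper names are illustrative only.

```python
import math

PLAYBACK_FPS = 30.0  # assumption: implied by 6 fps being 5x slower than real time


def real_time_factor(gen_fps: float, playback_fps: float = PLAYBACK_FPS) -> float:
    """Seconds of compute needed per second of output video."""
    return playback_fps / gen_fps


def num_chunks(clip_s: float, chunk_s: float) -> int:
    """Number of fixed-length chunks needed to cover a clip (no overlap assumed)."""
    return math.ceil(clip_s / chunk_s)


print(real_time_factor(6.0))        # public configuration: 5.0
print(real_time_factor(20.0))       # faster configuration, low end: 1.5
print(num_chunks(30.0, 8.0))        # 30 s cap built from ~8 s chunks: 4
print(30.0 * real_time_factor(6.0)) # compute seconds for a 30 s clip: 150.0
```

Under these assumptions a maximum-length clip takes about four autoregressive passes, and each pass after the first conditions on earlier output, which is where the accumulated drift described above would enter.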

Product design, access & pricing

  • Service is currently free; no detailed pricing or open-weights plan is given.
  • No sign-up is required for the demo; this is praised versus competitors that gate content. A watermark was removed after feedback.
  • The team is non-committal about an API but notes strong interest and asks about use cases.

Ethics, legality & societal impact

  • Concerns raised about default celebrity avatars and copyrighted music in demos; questions about legality and consent.
  • Debate over parody/fair use and whether using famous likenesses is acceptable.
  • Broad agreement that realistic video generation will erode trust in video evidence; suggested countermeasures include cryptographic hashes and device-level authenticity, though these are considered imperfect.
  • Some worry companies prioritize profit over misuse risks; others argue the “genie is out of the bottle” and costs will drop further.
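
The cryptographic-hash suggestion above can be illustrated with a small sketch: a publisher shares a digest of the original file, and a viewer recomputes it to check that the bytes are unchanged. The function names and the stand-in byte strings are hypothetical, and this is not a scheme proposed in the thread in any detail.

```python
import hashlib
import hmac


def sha256_digest(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def matches_published_digest(data: bytes, published_hex: str) -> bool:
    """True if the data hashes to the digest the publisher shared."""
    # compare_digest avoids leaking where the strings differ via timing.
    return hmac.compare_digest(sha256_digest(data), published_hex)


original = b"stand-in bytes for a video file"  # placeholder, not a real video
published = sha256_digest(original)
tampered = original + b"\x00"

print(matches_published_digest(original, published))
print(matches_published_digest(tampered, published))
```

A matching digest only shows the file is byte-identical to what was published. It says nothing about whether the footage was real in the first place, and any re-encoding changes the digest, which fits the thread's view that such measures are imperfect.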