Show HN: Infinity – Realistic AI characters that can speak
Overall impressions & use cases
- Many commenters find the talking-head results “breathtaking” and surprisingly expressive, great for memes, jokes, dubbing, and creative experiments (statues, paintings, pets, game characters, “AI Seinfeld”-style content, etc.).
- Others see more serious applications: educational videos for developing countries, long-form talking-head content, political or corporate messaging, and integration into storytelling or agent frameworks.
- Several people note the tech is fun but “creepy,” prompting reflection on the future of media and authenticity.
Model capabilities & limitations
- Lip sync, head motion, and emotional alignment with audio are widely praised; some feel they can almost read lips.
- Weak spots:
  - Teeth and fine facial details (partly due to ~320×320 training resolution).
  - Longer clips: quality drifts, identity changes, and “uncanny valley” behavior, especially with singing or expressive audio.
  - Last-frame “breakdown” artifacts, likely tied to how audio is padded to training-length buckets.
  - Cartoons, sketches, non-humanoid images, and some stylized portraits often fail or remain static.
  - Mouth size and expressions may vary by style/race; one user flags this as a potential bias issue.
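The last-frame artifact mentioned above is attributed to audio being padded up to a training-length bucket. A minimal sketch of what such padding could look like follows; the bucket lengths, the sample rate, and the function name `pad_to_bucket` are all assumptions for illustration, since the thread does not give the real values or the team's actual scheme.

```python
from bisect import bisect_left

# Hypothetical bucket lengths in seconds; the thread does not state the real ones.
BUCKETS_SEC = [2.0, 4.0, 8.0]
SAMPLE_RATE = 16_000  # assumed audio sample rate


def pad_to_bucket(samples, sample_rate=SAMPLE_RATE, buckets_sec=BUCKETS_SEC):
    """Zero-pad `samples` up to the smallest bucket that fits them.

    Returns (padded_samples, n_padding). Clips longer than the largest
    bucket are rejected here; a real system would split them instead.
    """
    bucket_lens = [int(b * sample_rate) for b in buckets_sec]
    idx = bisect_left(bucket_lens, len(samples))
    if idx == len(bucket_lens):
        raise ValueError("clip longer than the largest bucket")
    target = bucket_lens[idx]
    n_padding = target - len(samples)
    return list(samples) + [0.0] * n_padding, n_padding


# A 3-second clip lands in the 4-second bucket, so its final second is padding.
padded, n_padding = pad_to_bucket([0.1] * (3 * SAMPLE_RATE))
```

Under these assumptions the model would be asked to animate a stretch of silence at the end of the clip, which is one plausible way the final frames could degrade.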
Performance, length & technical notes
- Public config runs at 6 fps (5× slower than real time). With lower resolution and fewer diffusion steps, the team shows ~20–23 fps near-real-time generation.
- Base training supports ~8-second clips; longer videos are built autoregressively, which accumulates errors. Public tool is capped at ~30 seconds.
- Model is a custom diffusion transformer with a 3D VAE and rectified flow for faster denoising. Fine-tuning on specific actors is possible but requires video, not just images.
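The speed and length figures above can be checked with some quick arithmetic. This sketch assumes a 30 fps playback rate (inferred from "6 fps is 5× slower than real time", not stated directly) and non-overlapping 8-second segments; the real pipeline may overlap frames between segments, so treat the chunk count as a rough lower bound. The function names are illustrative.

```python
import math

PLAYBACK_FPS = 30  # assumption: inferred from "6 fps is 5x slower than real time"


def slowdown(gen_fps, playback_fps=PLAYBACK_FPS):
    """Seconds of compute needed per second of output video."""
    return playback_fps / gen_fps


def num_chunks(duration_sec, chunk_sec=8.0):
    """Autoregressive segments needed, assuming non-overlapping chunks."""
    return math.ceil(duration_sec / chunk_sec)


def generation_time_sec(duration_sec, gen_fps, playback_fps=PLAYBACK_FPS):
    """Total compute time for a clip of the given length."""
    return duration_sec * slowdown(gen_fps, playback_fps)


print(slowdown(6))                 # public config
print(num_chunks(30))              # segments in a clip at the 30-second cap
print(generation_time_sec(30, 6))  # seconds to render that clip
```

With these assumptions, a clip at the 30-second cap needs four segments, so errors have three hand-offs in which to accumulate, and it takes about two and a half minutes to render at the public setting.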
Product design, access & pricing
- Service is currently free; no detailed pricing or open-weights plan is given.
- No sign-up is required for the demo; this is praised versus competitors that gate content. A watermark was removed after feedback.
- The team is non-committal about an API but notes strong interest and asks about use cases.
Ethics, legality & societal impact
- Concerns raised about default celebrity avatars and copyrighted music in demos; questions about legality and consent.
- Debate over parody/fair use and whether using famous likenesses is acceptable.
- Broad agreement that realistic video generation will erode trust in video evidence; suggestions include cryptographic hashes and device-level authenticity, though considered imperfect.
- Some worry companies prioritize profit over misuse risks; others argue the “genie is out of the bottle” and costs will drop further.
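To make the "cryptographic hashes and device-level authenticity" suggestion concrete, here is a minimal sketch using only Python's standard library. It is not a scheme anyone in the thread proposed in detail: the function names are hypothetical, and a keyed HMAC stands in for the asymmetric signature a real capture device would more likely use.

```python
import hashlib
import hmac


def video_digest(video_bytes: bytes) -> str:
    """SHA-256 fingerprint of the exact bytes of a video file."""
    return hashlib.sha256(video_bytes).hexdigest()


def sign_digest(digest: str, device_key: bytes) -> str:
    """Stand-in for a device signature: HMAC-SHA256 over the digest."""
    return hmac.new(device_key, digest.encode(), hashlib.sha256).hexdigest()


def verify(video_bytes: bytes, signature: str, device_key: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = sign_digest(video_digest(video_bytes), device_key)
    return hmac.compare_digest(expected, signature)


key = b"hypothetical-device-key"
original = b"raw video bytes"
sig = sign_digest(video_digest(original), key)
print(verify(original, sig, key))               # untouched file
print(verify(original + b"edited", sig, key))   # any change breaks the match
```

The sketch also shows why commenters call the approach imperfect: any re-encode, crop, or platform recompression changes the bytes and therefore the hash, even when the footage is genuine.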