2024-09-06

Show HN: Infinity – Realistic AI characters that can speak

Overall impressions & use cases

Many commenters find the talking-head results “breathtaking” and surprisingly expressive, great for memes, jokes, dubbing, and creative experiments (statues, paintings, pets, game characters, “AI Seinfeld”-style content, etc.).
Others see more serious applications: educational videos for developing countries, long-form talking-head content, political or corporate messaging, and integration into storytelling or agent frameworks.
Several people note the tech is fun but “creepy,” prompting reflection on the future of media and authenticity.

Model capabilities & limitations

Lip sync, head motion, and emotional alignment with audio are widely praised; some feel they can almost read lips.
Weak spots:
- Teeth and fine facial details (partly due to ~320×320 training resolution).
- Longer clips: quality drifts, identity changes, and “uncanny valley” behavior, especially with singing or expressive audio.
- Last-frame “breakdown” artifacts, likely tied to how audio is padded to training-length buckets.
- Cartoons, sketches, non-humanoid images, and some stylized portraits often fail or remain static.
- Mouth size and expressions may vary by style/race; one user flags this as a potential bias issue.
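A toy example makes the padding hypothesis behind the last-frame artifacts easier to picture. This minimal sketch assumes the model was trained on a fixed set of clip lengths and that each clip's per-frame audio features (one float per frame here, for simplicity) are padded up to the nearest such bucket. The bucket sizes and the `pad_to_bucket` helper are hypothetical, not Infinity's actual values or code.

```python
from bisect import bisect_left

# Hypothetical training-length buckets, in video frames (not the team's real values).
BUCKETS = [64, 128, 192, 240]

def pad_to_bucket(audio_frames, buckets=BUCKETS, pad_value=0.0):
    """Pad per-frame audio features up to the smallest bucket that fits the clip.

    Returns (padded_frames, n_padding). Frames past the real audio carry only
    padding, which is one plausible source of end-of-clip artifacts.
    """
    n = len(audio_frames)
    idx = bisect_left(buckets, n)  # first bucket >= n
    if idx == len(buckets):
        raise ValueError(f"clip of {n} frames exceeds largest bucket {buckets[-1]}")
    target = buckets[idx]
    return list(audio_frames) + [pad_value] * (target - n), target - n
```

A 100-frame clip lands in the 128-frame bucket, so its last 28 frames are generated against padding rather than real speech.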

Performance, length & technical notes

The public config runs at ~6 fps (~5× slower than real time). With lower resolution and fewer diffusion steps, the team shows ~20–23 fps, near-real-time generation.
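The speed figures can be turned into wall-clock estimates with simple arithmetic. This sketch assumes a 30 fps playback rate, which is only inferred from "~6 fps is ~5× slower than real time" and was not stated by the team; the function names are illustrative.

```python
def realtime_factor(gen_fps: float, playback_fps: float = 30.0) -> float:
    """How many times slower than real time generation runs (> 1 means slower).

    playback_fps = 30 is an assumption inferred from the thread's numbers.
    """
    return playback_fps / gen_fps

def generation_seconds(clip_seconds: float, gen_fps: float, playback_fps: float = 30.0) -> float:
    """Wall-clock seconds needed to generate a clip of the given length."""
    return clip_seconds * playback_fps / gen_fps

factor_public = realtime_factor(6.0)       # ~5x slower, matching the thread
max_clip_wait = generation_seconds(30, 6.0)  # a full 30-second clip on the public config
```

Under that assumption, a maximum-length 30-second clip takes about two and a half minutes on the public config, while 20–23 fps would be roughly 1.3–1.5× slower than real time.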
Base training supports ~8-second clips; longer videos are built autoregressively, which accumulates errors. Public tool is capped at ~30 seconds.
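The autoregressive extension described above can be sketched as a loop over fixed-length windows, each conditioned on the tail of what has already been generated. Everything here is a stand-in: frames are floats instead of images, `toy_window` is a hypothetical "model" that adds a small error per frame to mimic drift, and 240 frames per 8-second window assumes 30 fps.

```python
from typing import Callable, List

Frame = float  # stand-in for an image tensor, to keep the sketch self-contained

def generate_long_clip(
    total_frames: int,
    window: int,
    context: int,
    generate_window: Callable[[List[Frame], int], List[Frame]],
) -> List[Frame]:
    """Build a long clip from fixed-length windows, autoregressively.

    Each call receives the last `context` generated frames (empty for the first
    window) and returns `window` new frames. Later windows condition on generated
    rather than real frames, so any error is carried forward and accumulates.
    """
    frames: List[Frame] = []
    while len(frames) < total_frames:
        ctx = frames[-context:] if frames and context else []
        frames.extend(generate_window(ctx, window))
    return frames[:total_frames]

def toy_window(ctx: List[Frame], n: int) -> List[Frame]:
    # Hypothetical model: continues from the last context frame and adds a small
    # per-frame error, so drift grows with every window.
    start = ctx[-1] if ctx else 0.0
    return [start + 0.01 * (i + 1) for i in range(n)]

# 30-second cap at an assumed 30 fps = 900 frames, built from 240-frame windows.
clip = generate_long_clip(900, window=240, context=16, generate_window=toy_window)
```

In this toy, the error at the end of the 30-second clip is nearly four times the error at the end of the first window, which mirrors the drift and identity changes commenters saw in longer clips.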
Model is a custom diffusion transformer with a 3D VAE and rectified flow for faster denoising. Fine-tuning on specific actors is possible but requires video, not just images.
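Rectified flow helps explain why fewer sampling steps can still work. The following is a generic textbook sketch, not the team's sampler. It assumes the convention x_t = (1 − t)·data + t·noise and integrates the velocity field from t = 1 (noise) to t = 0 (data) with Euler steps. A real model learns the velocity with a network; here a hypothetical toy case, where all data sits at a single point, gives an exact, perfectly straight velocity field.

```python
from typing import Callable, List

Vec = List[float]

def euler_sample(velocity: Callable[[Vec, float], Vec], noise: Vec, steps: int) -> Vec:
    """Integrate dx/dt = velocity(x, t) from t=1 (noise) down to t=0 (data)."""
    x = list(noise)
    dt = 1.0 / steps
    for k in range(steps):
        t = 1.0 - k * dt  # never reaches 0 inside the loop
        v = velocity(x, t)
        x = [xi - dt * vi for xi, vi in zip(x, v)]
    return x

def point_mass_velocity(target: Vec) -> Callable[[Vec, float], Vec]:
    """Toy assumption: every data sample equals `target`.

    With x_t = (1 - t) * target + t * noise, dx/dt = noise - target = (x_t - target) / t.
    """
    def v(x: Vec, t: float) -> Vec:
        return [(xi - ti) / t for xi, ti in zip(x, target)]
    return v

target, noise = [0.5, -1.0], [2.0, 3.0]
one_step = euler_sample(point_mass_velocity(target), noise, steps=1)
four_steps = euler_sample(point_mass_velocity(target), noise, steps=4)
```

Because the paths in this toy are exactly straight, even a single Euler step lands on the target. Learned flows are only approximately straight, but the straighter they are, the fewer steps sampling tends to need.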

Product design, access & pricing

Service is currently free; no detailed pricing or open-weights plan is given.
No sign-up is required for the demo; this is praised versus competitors that gate content. A watermark was removed after feedback.
The team is non-committal about an API but notes strong interest and asks about use cases.

Ethics, legality & societal impact

Concerns raised about default celebrity avatars and copyrighted music in demos; questions about legality and consent.
Debate over parody/fair use and whether using famous likenesses is acceptable.
Broad agreement that realistic video generation will erode trust in video evidence; suggestions include cryptographic hashes and device-level authenticity, though considered imperfect.
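The hash-based suggestion amounts to a capture device binding the exact bytes of a recording to a key it controls. The minimal sketch below uses an HMAC from the Python standard library purely as a stand-in. A real device-level scheme would use public-key signatures so that verifiers never hold the signing key. `DEVICE_KEY`, `sign_video`, and `verify_video` are hypothetical names.

```python
import hashlib
import hmac

# Hypothetical per-device secret. A real scheme would keep an asymmetric private
# key in secure hardware and publish the matching public key for verification.
DEVICE_KEY = b"example-device-key"

def sign_video(video_bytes: bytes, key: bytes = DEVICE_KEY) -> str:
    """Return a hex tag binding the exact video bytes to the device key."""
    digest = hashlib.sha256(video_bytes).digest()
    return hmac.new(key, digest, hashlib.sha256).hexdigest()

def verify_video(video_bytes: bytes, tag: str, key: bytes = DEVICE_KEY) -> bool:
    """Compare tags in constant time; any change to the bytes fails verification."""
    return hmac.compare_digest(sign_video(video_bytes, key), tag)
```

This also shows why commenters called the approach imperfect. Any legitimate edit or re-encode breaks the tag, and a valid tag proves only that a key holder signed the bytes, not that the scene it shows is real.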
Some worry companies prioritize profit over misuse risks; others argue the “genie is out of the bottle” and costs will drop further.

Related topics