GPT-4o
Model capabilities & demos
- GPT-4o is a new “flagship” multimodal model: text + images now via API, with end‑to‑end audio and video promised to a small set of partners soon.
- Key claims: 2× faster and ~50% cheaper than GPT‑4 Turbo, with 5× higher rate limits; still 128k context.
- Live demos highlighted: real‑time voice conversation with interruptions, video-based understanding (e.g., reading equations, commenting on scenes), translation, breathing/voice emotion cues, simple tutoring and coding help.
- Some viewers found the demo “best ever” and close to sci‑fi (“Her”, universal translator); others saw it as evolutionary, not revolutionary.
Voice, emotion, and UX reactions
- Audio2audio (no explicit text TTS layer) is widely seen as a big leap: natural intonation, emotions, sarcasm, singing, responsive interruption.
- Many dislike the default “over‑enthusiastic podcast host” personality and want concise, neutral or “stoic” modes; some already use custom instructions to reduce verbosity.
- Strong uncanny‑valley reactions: laughter, flirting tone, and “AI girlfriend” implications made some users uneasy.
Performance, cost, and API details
- New tokenizer (200k vocab) significantly reduces token counts, especially for non‑English languages (e.g., big gains for Gujarati, Japanese).
- Developers report 4o is noticeably faster than 4‑Turbo, sometimes approaching 3.5‑level latency, but not as fast as some specialized hosts (e.g., Groq+Llama3).
- As of the discussion, API supports text+vision; audio/video streaming and image output are not yet exposed broadly.
Model quality, reasoning & benchmarks
- Many say 4o is “not much smarter” than GPT‑4; described as between 3.5 and 4 Turbo for reasoning, but better at “not being lazy” and goal‑seeking across tool calls.
- Some independent tests: modest improvement over 4‑Turbo on certain programming and reasoning tasks; big jump on one chess‑puzzle benchmark; but no clear GPT‑3→4‑style leap.
- Multiple reports of increased hallucinations vs gpt‑4‑0125‑preview; some users are sticking with older 4‑Turbo for critical work.
- Debate over scaling limits: some think reasoning has plateaued due to data constraints; others argue scaling and multimodal training still have runway.
Free vs paid, business model
- GPT‑4o text+vision is being rolled out to free users with lower message limits; Plus gets ~5× higher limits and likely earlier access to future “frontier” models.
- Many paid users question what they now get for $20–25/month beyond limits and early access; some consider canceling until GPT‑5 or a clearly superior model ships.
- Others speculate this move signals either confidence in a much better upcoming model or competitive pressure from open models (e.g., Llama 3) and other providers.
Privacy, safety, and misuse
- Real‑time screen‑sharing and continuous camera use are seen as both powerful and a “privacy nightmare.”
- Deepfake and voice‑cloning concerns raised; current plan is preset voices only, no arbitrary custom cloning.
- Obvious misuse vectors: romance scams, call‑center fraud, mass propaganda; many worry about older or vulnerable users.
- Some expect regulators and platform policies to heavily constrain custom voices and agentic behaviors.
Accessibility and positive use cases
- Strong excitement around applications for blind/low‑vision and DeafBlind users (e.g., Be My Eyes), navigation help, reading environments, playing instruments with guidance.
- Real‑time translation + natural voice seen as potentially transformative for language learning and cross‑lingual collaboration, though current pronunciation/tones can be poor in some languages.
Broader implications & skepticism
- Split sentiment: some see this as clear progress toward conversational AGI; others say it’s sophisticated “stochastic parroting” with no true world model.
- Concerns about economic impact (job displacement, surveillance, enshittification via ad deals) and about training future models on AI‑generated, private conversational data.
- Meta‑discussion on hype: many note advances are stunning, yet core reasoning hasn’t leapt; some predict an “AI crash” if expectations aren’t reset.