GPT-4o

Model capabilities & demos

  • GPT-4o is a new “flagship” multimodal model: text + images now via API, with end‑to‑end audio and video promised to a small set of partners soon.
  • Key claims: 2× faster and ~50% cheaper than GPT‑4 Turbo, with 5× higher rate limits; still 128k context.
  • Live demos highlighted: real‑time voice conversation with interruptions, video‑based understanding (e.g., reading equations, commenting on scenes), translation, picking up on breathing and emotional cues in the voice, simple tutoring and coding help.
  • Some viewers found the demo “best ever” and close to sci‑fi (“Her”, universal translator); others saw it as evolutionary, not revolutionary.
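
The key claims above are all relative to GPT‑4 Turbo, so their practical effect depends on the baseline. The following is a minimal sketch of that arithmetic. The baseline figures (price, latency, rate limit) are made-up placeholders rather than real OpenAI numbers, and `ModelProfile` and `apply_claims` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    price_per_1m_tokens: float   # arbitrary currency units (placeholder)
    seconds_per_request: float   # typical latency (placeholder)
    requests_per_minute: float   # rate limit (placeholder)


def apply_claims(baseline: ModelProfile) -> ModelProfile:
    """Apply the announced ratios: ~50% cheaper, 2x faster, 5x rate limits."""
    return ModelProfile(
        price_per_1m_tokens=baseline.price_per_1m_tokens * 0.5,
        seconds_per_request=baseline.seconds_per_request / 2,
        requests_per_minute=baseline.requests_per_minute * 5,
    )


# Hypothetical GPT-4 Turbo baseline; substitute measured values.
baseline = ModelProfile(
    price_per_1m_tokens=10.0,
    seconds_per_request=8.0,
    requests_per_minute=100.0,
)
claimed = apply_claims(baseline)
print(claimed)
```

Treating "2× faster" as half the latency per request is itself an assumption; the claim could equally refer to token throughput.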

Voice, emotion, and UX reactions

  • Direct audio‑to‑audio (no separate speech‑to‑text and TTS stages) is widely seen as a big leap: natural intonation, emotions, sarcasm, singing, responsive interruption.
  • Many dislike the default “over‑enthusiastic podcast host” personality and want concise, neutral or “stoic” modes; some already use custom instructions to reduce verbosity.
  • Strong uncanny‑valley reactions: laughter, flirting tone, and “AI girlfriend” implications made some users uneasy.

Performance, cost, and API details

  • New tokenizer (200k vocab) significantly reduces token counts, especially for non‑English languages (e.g., big gains for Gujarati, Japanese).
  • Developers report GPT‑4o is noticeably faster than GPT‑4 Turbo, sometimes approaching GPT‑3.5‑level latency, but not as fast as some specialized inference hosts (e.g., Llama 3 on Groq).
  • As of the discussion, the API supports text and vision input; audio/video streaming and image output are not yet broadly exposed.
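
The tokenizer change and the price cut compound: if the same text needs fewer tokens and each token costs less, the cost per document falls by the product of the two ratios. Below is a minimal sketch of that calculation. It assumes the "~50% cheaper" claim applies per token, and the token ratios are hypothetical stand-ins, not measured values for any language; `relative_cost` is an illustrative name.

```python
def relative_cost(token_ratio: float, price_ratio: float = 0.5) -> float:
    """Cost of the same text on the new model as a fraction of the old cost.

    token_ratio: new token count / old token count for the same text.
    price_ratio: new per-token price / old per-token price
                 (0.5 reflects the "~50% cheaper" claim, assumed per token).
    """
    if token_ratio <= 0 or price_ratio <= 0:
        raise ValueError("ratios must be positive")
    return token_ratio * price_ratio


# Hypothetical token ratios, for illustration only.
examples = {
    "text whose token count barely changes": 0.9,
    "text that needs a quarter of the tokens": 0.25,
}
for label, ratio in examples.items():
    print(f"{label}: {relative_cost(ratio):.3f} of the old cost")
```

Real ratios would come from tokenizing a representative sample of text with both the old and the new tokenizer and comparing the counts.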

Model quality, reasoning & benchmarks

  • Many say GPT‑4o is “not much smarter” than GPT‑4; some place its reasoning between GPT‑3.5 and GPT‑4 Turbo, but find it better at “not being lazy” and at goal‑seeking across tool calls.
  • Some independent tests show a modest improvement over GPT‑4 Turbo on certain programming and reasoning tasks and a big jump on one chess‑puzzle benchmark, but no clear GPT‑3→4‑style leap.
  • Multiple reports of increased hallucinations vs gpt‑4‑0125‑preview; some users are sticking with older 4‑Turbo for critical work.
  • Debate over scaling limits: some think reasoning has plateaued due to data constraints; others argue scaling and multimodal training still have runway.

Free vs paid, business model

  • GPT‑4o text+vision is being rolled out to free users with lower message limits; Plus gets ~5× higher limits and likely earlier access to future “frontier” models.
  • Many paid users question what they now get for $20–25/month beyond limits and early access; some consider canceling until GPT‑5 or a clearly superior model ships.
  • Others speculate this move signals either confidence in a much better upcoming model or competitive pressure from open models (e.g., Llama 3) and other providers.

Privacy, safety, and misuse

  • Real‑time screen‑sharing and continuous camera use are seen as both powerful and a “privacy nightmare.”
  • Deepfake and voice‑cloning concerns were raised; the current plan is preset voices only, with no arbitrary custom cloning.
  • Obvious misuse vectors: romance scams, call‑center fraud, mass propaganda; many worry about older or vulnerable users.
  • Some expect regulators and platform policies to heavily constrain custom voices and agentic behaviors.

Accessibility and positive use cases

  • Strong excitement around applications for blind/low‑vision and DeafBlind users (e.g., Be My Eyes), navigation help, reading environments, playing instruments with guidance.
  • Real‑time translation + natural voice seen as potentially transformative for language learning and cross‑lingual collaboration, though current pronunciation/tones can be poor in some languages.

Broader implications & skepticism

  • Split sentiment: some see this as clear progress toward conversational AGI; others say it’s sophisticated “stochastic parroting” with no true world model.
  • Concerns about economic impact (job displacement, surveillance, enshittification via ad deals) and about training future models on AI‑generated, private conversational data.
  • Meta‑discussion on hype: many note advances are stunning, yet core reasoning hasn’t leapt; some predict an “AI crash” if expectations aren’t reset.