No elephants: Breakthroughs in image generation

Capabilities and Perceived Breakthroughs

  • Many see GPT‑4o’s image generation as a “before/after” step: better prompt adherence, more consistent scenes, readable text, and convincing multi-step edits compared to earlier diffusion models.
  • Users highlight multimodal workflows: describing changes directly on an existing image, generating comics with consistent characters, UI mockups, marketing assets, YouTube thumbnails, and meme-style humor.
  • Some compare this favorably to Midjourney and Stable Diffusion, which were strong on aesthetics but weak at following detailed, structured prompts.

Limitations, Artifacts, and UX Friction

  • Edits often regenerate the entire image, subtly mutating unrelated elements (furniture, lighting, colors) with each pass; partial-edit tools don’t reliably confine changes.
  • Classic issues persist: distorted hands and eyes, scale errors, nonsense text, wrong numbers on gauges and watches, and odd geometry.
  • Negation (“no elephants”, “not green”) still degrades reliability; the “pink elephant effect” is reduced but not gone.
  • Image generation remains slow or rate-limited for many, limiting play and iteration.
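The standard workaround for the negation problem in diffusion tooling is to move negated concepts out of the main prompt and into a separate negative prompt (as accepted by, e.g., Stable Diffusion pipelines via a `negative_prompt` argument). A minimal sketch of that rewrite; the phrase patterns here are illustrative assumptions, not any product’s actual parser:

```python
import re

def split_negations(prompt: str) -> tuple[str, str]:
    """Split 'no X' / 'not X' / 'without X' clauses out of a prompt.

    Returns (positive_prompt, negative_prompt) so negated concepts can be
    passed to a pipeline's negative-prompt channel instead of appearing
    (and being inadvertently reinforced) in the main prompt.
    """
    negatives: list[str] = []

    def capture(match: re.Match) -> str:
        negatives.append(match.group(1).strip())
        return ""

    # Illustrative patterns only: real prompt language is far messier.
    pattern = r"\b(?:no|not|without)\s+([a-z ]+?)(?=,|\.|$)"
    positive = re.sub(pattern, capture, prompt, flags=re.IGNORECASE)
    positive = re.sub(r"\s*,\s*(?=,|\.|$)", "", positive)  # tidy dangling commas
    positive = re.sub(r"\s{2,}", " ", positive).strip(" ,.")
    return positive, ", ".join(negatives)

pos, neg = split_negations("a savanna at dusk, no elephants, not green")
# pos -> "a savanna at dusk"; neg -> "elephants, green"
```

This only sidesteps the failure mode; models trained on paired negative conditioning still handle exclusions imperfectly, which matches the reliability complaints above.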

Practical Uses vs “Toy” Feeling

  • Some users report little real-world need (especially if they never used stock photos), seeing image/LLM tools as novelties, mood boards, or idea generators rather than serious production tools.
  • Others find them transformative for non-artists: quickly creating game assets, internal logos, classroom or hobby projects, kids’ games, and “good enough” website illustrations.
  • Compared to stock libraries, AI shines for highly specific or obscure concepts (e.g., “squirrels doing math in high school”), but still often fails when requirements are concrete and exacting.

Impact on Creative Labor and Copyright

  • Heated disagreement over whether using these tools “actively harms creative labor” or simply replaces work that was never going to hire an artist in the first place (e.g., incidental blog/slide images).
  • Fears that distinctive studio styles (e.g., anime/Ghibli) become trivial to imitate, devaluing decades of craft and encouraging AI “slop” over new visual languages.
  • Others argue style shouldn’t be protectable, that imitation has always existed, and that expanding copyright would mostly empower large companies, not small artists.
  • Broader copyright/IP debate: calls for shorter terms or revenue-based limits; skepticism that current law meaningfully protects most artists; speculation that regulation (GDPR/AI‑Act style) could eventually constrain AI training and use.

Content Pollution and Social Trajectory

  • Observations that YouTube, LinkedIn, and even local restaurants are already filling up with low-effort AI imagery (garbled menus, uncanny decor, template infographics).
  • Some experience a growing “ick” response to obviously AI-generated visuals, even while privately finding them useful for thinking or play.
  • Expectation that certain market segments (thumbnails, stock-like illustration, cheap animation inbetweening) will shift heavily to AI, while higher-end or more intentional work may resist.

Architecture and Open Questions

  • Curiosity about how “image tokens” work in autoregressive models; speculation that OpenAI uses a VAR-like, multi-scale token approach, possibly with additional agentic prompt-processing.
  • Recognition that open models and local equivalents still lag in multimodal integration and controllable editing, even as diffusion-based systems continue to improve.
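For context on the VAR speculation: VAR (“Visual Autoregressive Modeling”, Tian et al. 2024) replaces raster-order next-token prediction with next-scale prediction, emitting whole token maps at increasing resolutions, each conditioned on all coarser maps. A rough sketch of the resulting sequence layout, using the scale schedule from the published VAR work for 256px images; whether GPT‑4o’s generator actually works this way is, as the thread notes, pure speculation:

```python
# Side lengths of the multi-scale token maps, coarse -> fine
# (schedule from the public VAR paper; GPT-4o's internals are unknown).
SCALES = (1, 2, 3, 4, 5, 6, 8, 10, 13, 16)

def sequence_layout(scales=SCALES):
    """Return (tokens per scale, autoregressive steps, total tokens)."""
    tokens_per_scale = [s * s for s in scales]
    # One autoregressive step per *scale* (all tokens of a map decoded in
    # parallel), versus one step per *token* in classic raster-order AR.
    return tokens_per_scale, len(scales), sum(tokens_per_scale)

per_scale, steps, total = sequence_layout()
raster_steps = SCALES[-1] ** 2  # plain next-token decoding over a 16x16 grid
# steps -> 10 decoding passes instead of raster_steps -> 256
```

The design trade-off: far fewer sequential decoding passes (10 vs. 256 here) at the cost of a longer total token sequence (680 vs. 256), since coarse scales are re-represented alongside the finest one.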