OpenAI releases image generation in the API

Pricing, Value, and Performance

  • Many see pricing as high: roughly $0.04–0.07 per medium-quality 1024×1024 image and about $0.16–0.25 at high quality, with 10–20 s of latency. Several say this is too expensive for high‑volume or consumer products, but acceptable for “get it right first try” workflows.
  • Some confusion over pricing (per-image vs per-token) gets clarified using OpenAI’s docs.
  • Comparisons to Imagen, Flux, Midjourney, and Stable Diffusion: for pure “pretty picture” text-to-image, cheaper diffusion models often win on aesthetics and cost; gpt-image-1 is seen as differentiated by control and prompt adherence, not raw beauty.
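
To make the "too expensive for high volume" argument concrete, here is a rough back-of-the-envelope sketch. The price and latency ranges are the approximate figures quoted by commenters above, not official pricing, and the function names are made up for illustration.

```python
# Approximate per-image prices in USD, as quoted in the discussion (not official).
PRICE_RANGES = {
    "medium": (0.04, 0.07),
    "high": (0.16, 0.25),
}

# Approximate per-image latency in seconds, as quoted in the discussion.
LATENCY_RANGE_S = (10, 20)


def estimate_cost(num_images, quality):
    """Return a (low, high) total cost estimate in USD for num_images."""
    low, high = PRICE_RANGES[quality]
    return (round(num_images * low, 2), round(num_images * high, 2))


def estimate_serial_latency(num_images):
    """Return (low, high) total seconds if images are generated one at a time."""
    low, high = LATENCY_RANGE_S
    return (num_images * low, num_images * high)


if __name__ == "__main__":
    for quality in PRICE_RANGES:
        print(quality, estimate_cost(10_000, quality))
    print("serial seconds for 6 images:", estimate_serial_latency(6))
```

At these quoted rates, 10,000 medium images land somewhere around $400 to $700, which is the scale at which commenters start comparing against cheaper diffusion models.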

Model Capabilities vs Diffusion

  • Strong praise for:
    • Prompt adherence and fine detail (including complex constraints, text in image, multi-reference style transfer).
    • Integrated multimodal flow (LLM reasoning + image generation + editing in one loop).
    • Image editing, restyling, and “graphics workflow engine” type tasks (e.g., ad comps, complex composites, reference-based editing).
  • Critiques:
    • Some tasks still fail (e.g., specific clock times, left-handed writing, exact likeness of a real person).
    • Limited controllability vs diffusion pipelines with LoRAs, ControlNet, ComfyUI graphs.
    • Lower perceived quality at “medium” vs top diffusion models.

Architecture and Ecosystem

  • Multiple commenters note it’s an autoregressive / hybrid (transformer + diffusion-like) system embedded in GPT‑4o, not a standalone diffusion model.
  • Some argue this architecture is a major shift, possibly building a moat that smaller/open-source diffusion efforts can’t match.
  • Others think open-source and alternative providers (e.g., Google’s Gemini image models) will catch up.

Moderation, Verification, and Access Tiers

  • gpt-image-1 requires organization verification (including ID/biometric checks for some), which several find off-putting.
  • Default content filters are similar to ChatGPT’s; the API exposes a moderation setting (auto|low). Even “low” still blocks many celebrities, copyrighted characters, weapons, etc.
  • Claims (disputed but detailed) that defense contractors have less-moderated tiers, used for synthetic training data (e.g., military vehicles, CV datasets).
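
As a minimal sketch of the moderation setting mentioned above, the snippet below only builds and validates a request payload; it does not call any API. The field names (model, prompt, size, quality, moderation) are assumed to mirror OpenAI's Images API and should be checked against the current docs, and build_image_request is a hypothetical helper.

```python
import json

# The two moderation values mentioned in the discussion.
ALLOWED_MODERATION = {"auto", "low"}


def build_image_request(prompt, moderation="auto",
                        size="1024x1024", quality="medium"):
    """Build a request payload dict (hypothetical helper, no network call).

    Field names are an assumption based on the thread; verify them
    against the official API reference before use.
    """
    if moderation not in ALLOWED_MODERATION:
        raise ValueError(f"moderation must be one of {sorted(ALLOWED_MODERATION)}")
    return {
        "model": "gpt-image-1",
        "prompt": prompt,
        "size": size,
        "quality": quality,
        "moderation": moderation,
    }


if __name__ == "__main__":
    payload = build_image_request("a red bicycle", moderation="low")
    print(json.dumps(payload, indent=2))
```

Per the thread, choosing "low" relaxes filtering but does not disable it, so callers still need to handle rejected prompts.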

APIs, UX, and Developer Friction

  • Complaints about:
    • Needing verification plus prepaid credits just to try the playground.
    • Credits expiring after a year.
    • Inconsistent image API design (different endpoints, content-types, response formats).
  • Some are surprised that long-running image generation is exposed as a single blocking call rather than async job polling.
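
For readers who want the submit-then-poll shape on top of a blocking call, one client-side workaround is to run the call in a thread pool. This is only a sketch: blocking_generate is a hypothetical stand-in that sleeps instead of calling a real API, and ImageJobs is an illustrative wrapper, not part of any SDK.

```python
import time
import uuid
from concurrent.futures import ThreadPoolExecutor


def blocking_generate(prompt):
    """Hypothetical stand-in for a slow, blocking image-generation call."""
    time.sleep(0.1)  # a real call might take 10-20 s per the discussion
    return f"image-bytes-for:{prompt}".encode()


class ImageJobs:
    """Wrap a blocking function so callers can submit jobs and poll status."""

    def __init__(self, generate_fn, max_workers=4):
        self._generate = generate_fn
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs = {}

    def submit(self, prompt):
        """Start a job in the background and return its id immediately."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(self._generate, prompt)
        return job_id

    def status(self, job_id):
        """Return 'running', 'succeeded', or 'failed' without blocking."""
        future = self._jobs[job_id]
        if not future.done():
            return "running"
        return "failed" if future.exception() else "succeeded"

    def result(self, job_id, timeout=None):
        """Block until the job finishes and return its result."""
        return self._jobs[job_id].result(timeout=timeout)

    def shutdown(self):
        self._pool.shutdown(wait=True)


if __name__ == "__main__":
    jobs = ImageJobs(blocking_generate)
    job_id = jobs.submit("a lighthouse at dusk")
    while jobs.status(job_id) == "running":
        time.sleep(0.05)  # poll instead of blocking the caller
    print(jobs.status(job_id), len(jobs.result(job_id)), "bytes")
    jobs.shutdown()
```

This keeps the caller responsive but does not change server-side behavior: each worker thread still holds an open request for the full generation time.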

Use Cases and Products

  • Suggested applications: marketing/ads, personalized storybooks, AI icon libraries, headshot enhancement, education content, 2D game sprites, interior design, fashion, and “agentic” workflows.
  • Debate over whether multi-modal generality makes specialized products obsolete; many argue UX, curation, and prebuilt prompts still add value.

Ethics, Culture, and Backlash

  • Dismissive comments about “AI slop,” environmental cost, and enshittification fears.
  • Concerns that centralized, moderated APIs give vendors too much control over what can be generated.