OpenAI releases image generation in the API
Pricing, Value, and Performance
- Many see pricing as high: a medium-quality 1024×1024 image costs around $0.04–0.07 and a high-quality one about $0.16–0.25, with 10–20 s latency. Several say this is too expensive for high‑volume or consumer products, but acceptable for “get it right first try” workflows.
- Some confusion over pricing (per-image vs per-token) gets clarified using OpenAI’s docs.
- Comparisons to Imagen, Flux, Midjourney, and Stable Diffusion: for pure “pretty picture” text-to-image generation, cheaper diffusion models often win on aesthetics and cost; gpt-image-1 is seen as differentiated by control and prompt adherence, not raw beauty.
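To make the volume concern concrete, here is a minimal sketch that turns the per-image ranges quoted in the thread into a spend estimate. The figures are the commenters' rough numbers rather than an official rate card, and `estimate_cost_usd` is a hypothetical helper introduced only for illustration.

```python
# Rough per-image price ranges in US cents, as quoted in the thread for
# 1024x1024 output. These are commenters' figures, not an official rate card.
PRICE_RANGE_CENTS = {
    "medium": (4, 7),
    "high": (16, 25),
}


def estimate_cost_usd(images: int, quality: str) -> tuple[float, float]:
    """Return the (low, high) estimated spend in dollars for `images` images."""
    if images < 0:
        raise ValueError("images must be non-negative")
    low_cents, high_cents = PRICE_RANGE_CENTS[quality]
    # Multiply in integer cents first, then convert once to dollars.
    return images * low_cents / 100, images * high_cents / 100


if __name__ == "__main__":
    for quality in PRICE_RANGE_CENTS:
        low, high = estimate_cost_usd(10_000, quality)
        print(f"10,000 {quality} images: ${low:,.2f} to ${high:,.2f}")
```

At 10,000 images this works out to roughly $400 to $700 at medium quality and $1,600 to $2,500 at high quality, which is the scale at which commenters say consumer products struggle.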
Model Capabilities vs Diffusion
- Strong praise for:
  - Prompt adherence and fine detail (including complex constraints, text in image, multi-reference style transfer).
  - Integrated multimodal flow (LLM reasoning + image generation + editing in one loop).
  - Image editing, restyling, and “graphics workflow engine” type tasks (e.g., ad comps, complex composites, reference-based editing).
- Critiques:
  - Some tasks still fail (e.g., specific clock times, left-handed writing, exact likeness of a real person).
  - Limited controllability vs diffusion pipelines with LoRAs, ControlNet, ComfyUI graphs.
  - Lower perceived quality at “medium” vs top diffusion models.
Architecture and Ecosystem
- Multiple commenters note it’s an autoregressive / hybrid (transformer + diffusion-like) system embedded in GPT‑4o, not a standalone diffusion model.
- Some argue this architecture is a major shift, possibly building a moat that smaller/open-source diffusion efforts can’t match.
- Others think open-source and alternative providers (e.g., Google’s Gemini image models) will catch up.
Moderation, Verification, and Access Tiers
- gpt-image-1 requires organization verification (including ID/biometric checks for some), which several find off-putting.
- Default content filters similar to ChatGPT; API exposes moderation: auto|low. Even “low” still blocks many celebrities, copyrighted characters, weapons, etc.
- Claims (disputed but detailed) that defense contractors have less-moderated tiers, used for synthetic training data (e.g., military vehicles, CV datasets).
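As an illustration of the moderation setting discussed above, the sketch below builds (but does not send) a JSON request body with `moderation` set to one of the two values mentioned in the thread. The helper name `build_image_request` is hypothetical, and the other field names (`model`, `prompt`, `size`, `quality`) are assumptions about the Images API request shape that should be checked against OpenAI's documentation.

```python
import json

# The two moderation values mentioned in the thread.
ALLOWED_MODERATION = {"auto", "low"}


def build_image_request(prompt: str, moderation: str = "auto",
                        quality: str = "medium",
                        size: str = "1024x1024") -> str:
    """Build a JSON body for an image generation request (nothing is sent)."""
    if moderation not in ALLOWED_MODERATION:
        raise ValueError(
            f"moderation must be one of {sorted(ALLOWED_MODERATION)}"
        )
    body = {
        "model": "gpt-image-1",
        "prompt": prompt,
        "size": size,        # assumed field name
        "quality": quality,  # assumed field name
        "moderation": moderation,
    }
    return json.dumps(body)


if __name__ == "__main__":
    print(build_image_request("a red bicycle on a beach", moderation="low"))
```

Per the thread, choosing "low" only relaxes the filter; requests involving many celebrities, copyrighted characters, or weapons may still be refused.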
APIs, UX, and Developer Friction
- Complaints about:
  - Needing verification plus prepaid credits just to try playground.
  - Credits expiring after a year.
  - Inconsistent image API design (different endpoints, content-types, response formats).
- Some surprised that long-running image generation is exposed as a single blocking call rather than async job polling.
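The submit-then-poll pattern that some commenters expected can be layered on top of a blocking call by the caller. The following is a minimal sketch under stated assumptions: `blocking_generate` is a hypothetical stand-in for the real image call (it only sleeps briefly), and `ImageJobs` is an illustrative in-memory wrapper, not part of any OpenAI SDK.

```python
import time
import uuid
from concurrent.futures import Future, ThreadPoolExecutor


def blocking_generate(prompt: str) -> bytes:
    """Hypothetical stand-in for a slow, blocking image call."""
    time.sleep(0.2)  # stands in for the 10-20 s generation time
    return f"image for: {prompt}".encode()


class ImageJobs:
    """In-memory job table: submit() returns an id, poll() reports status."""

    def __init__(self, generate=blocking_generate, workers: int = 4):
        self._generate = generate
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._jobs: dict[str, Future] = {}

    def submit(self, prompt: str) -> str:
        # Run the blocking call on a worker thread and hand back a job id.
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(self._generate, prompt)
        return job_id

    def poll(self, job_id: str) -> dict:
        future = self._jobs[job_id]
        if not future.done():
            return {"status": "pending"}
        error = future.exception()
        if error is not None:
            return {"status": "failed", "error": str(error)}
        return {"status": "done", "image": future.result()}


if __name__ == "__main__":
    jobs = ImageJobs()
    job_id = jobs.submit("a lighthouse at dusk")
    while (state := jobs.poll(job_id))["status"] == "pending":
        time.sleep(0.05)
    print(state["status"], len(state["image"]), "bytes")
```

A production version would need persistence, timeouts, and cleanup of finished jobs; the point here is only that the blocking design pushes this bookkeeping onto each client.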
Use Cases and Products
- Suggested applications: marketing/ads, personalized storybooks, AI icon libraries, headshot enhancement, education content, 2D game sprites, interior design, fashion, and “agentic” workflows.
- Debate over whether multi-modal generality makes specialized products obsolete; many argue UX, curation, and prebuilt prompts still add value.
Ethics, Culture, and Backlash
- Dismissive comments about “AI slop,” environmental cost, and enshittification fears.
- Concerns that centralized, moderated APIs give vendors too much control over what can be generated.