Create and edit images with Gemini 2.0 in preview
Perceived Image Quality & Official Examples
- The thread is split on quality: some outputs are called “impressive,” others (e.g., polar bear mug, lamp-on-desk, table-with-missing-legs) embarrassingly bad for a launch blog post.
- The co-drawing/doodle demo is seen as a fun showcase but visually rough; some say it looks “vibe coded.”
- Users report frequent failures on:
  - Precise edits (e.g., changing specific windows to bi-fold doors, modifying clothing in a photo)
  - Correct object placement and scale (lamp vs. sofa, room decor, architectural proportions)
  - Understanding stick-figure sketches (inflating them into unintended 3D figures)
- Some find compositing/editing weaker than OpenAI’s gpt-image-1, though others say Gemini preserves the original image better than GPT-4o when editing.
Speed vs Quality & Cost
- Strong consensus that Gemini is very fast—often ~5 seconds vs 30+ seconds for OpenAI image models.
- Several worry Google has over-optimized for speed, yielding “fast but junk” outputs that drive users back to Midjourney or others.
- Pricing is about $0.039 per image, slightly above Imagen 3; several users report surprise bills when a prompt triggers “many illustrations” and dozens of images come back in one response (rough arithmetic sketched below).
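To make the surprise-bill concern concrete, here is a rough sketch of the arithmetic, treating the thread’s ~$0.039-per-image figure as a flat rate (actual billing may be token-based, so this is only an approximation):

```python
# Rough cost estimate using the ~$0.039-per-image figure quoted in the
# thread; treat this as an approximation, not official pricing.
PRICE_PER_IMAGE_USD = 0.039

def estimated_cost(images_per_response: int, responses: int = 1) -> float:
    """Approximate spend when one prompt fans out into several images."""
    return PRICE_PER_IMAGE_USD * images_per_response * responses

print(f"1 image:                   ${estimated_cost(1):.3f}")        # $0.039
print(f"30 images in one response: ${estimated_cost(30):.2f}")       # ~$1.17
print(f"30 images x 100 requests:  ${estimated_cost(30, 100):.2f}")  # ~$117.00
```

The per-image price looks negligible until a single prompt multiplies it by dozens of images per response and then by request volume.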
Prompting, Usability & Workflows
- System is highly prompt-sensitive; small wording changes cause big quality swings.
- Conversational interfaces expose the limits of users’ ability to describe mental images; many find it hard to specify clutter, lighting, composition, or technical effects.
- Suggested strategies:
  - Feed reference images and ask Gemini (or another model) to describe them “in extreme detail,” then adapt that description as the prompt.
  - Ramble your intent and have an LLM distill it into a precise prompt; iterate based on results.
  - Chain models: one to analyze texture/layout/typography, another to rewrite that analysis into richer visual instructions, then back to Gemini for generation (a sketch of this chain appears after this list).
- Co-drawing’s usefulness is questioned if you must describe everything in text anyway.
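Below is a minimal sketch of that chained workflow, assuming the google-genai Python SDK. The model IDs, file names, and prompt wording are illustrative assumptions rather than details from the thread, and the image-generation model name in particular may differ by region and release.

```python
# Sketch: describe a reference image in extreme detail, distill the user's
# rambled intent into a precise prompt, then generate with the image model.
# Assumes the google-genai SDK; model IDs and prompts are illustrative only.
from io import BytesIO
from pathlib import Path

from google import genai
from google.genai import types
from PIL import Image

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder

# 1) Ask a text model to describe the reference image "in extreme detail".
reference = types.Part.from_bytes(
    data=Path("reference.png").read_bytes(), mime_type="image/png"
)
description = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[
        reference,
        "Describe this image in extreme detail: layout, lighting, textures, "
        "typography, composition, and color palette.",
    ],
).text

# 2) Distill a rambled intent plus that description into one precise prompt.
intent = "Same room and style, but turn the two left windows into bi-fold doors."
prompt = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=(
        "Rewrite the following as a single, precise image-generation prompt.\n"
        f"Reference description: {description}\n"
        f"Requested change: {intent}"
    ),
).text

# 3) Send the distilled prompt to the image-capable model and save the result.
result = client.models.generate_content(
    model="gemini-2.0-flash-preview-image-generation",  # assumed model ID
    contents=prompt,
    config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
)
for part in result.candidates[0].content.parts:
    if part.inline_data is not None:  # image parts arrive as inline bytes
        Image.open(BytesIO(part.inline_data.data)).save("output.png")
```

Keeping the description and distillation steps on text-only calls means the single image call receives one fully specified prompt, which also makes it easier to iterate on wording without paying for extra images.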
Model Zoo, Availability & Comparisons
- Users complain about the confusing, fast-changing lineup of Gemini variants (Flash, Flash Image Gen March/May, 2.5 Pro/Flash/Live, “IO Edition”) and want a clear capability/price matrix.
- Some benchmarking suggests Imagen 3 and OpenAI’s GPT-4o still lead in aesthetic quality and prompt fidelity; Gemini’s main wins are multimodality and speed.
- Gemini 2.0 Flash image models are unavailable in parts of Europe/EMEA despite earlier access, adding to confusion.
Wider Concerns & “AI Slop”
- Google’s “product” examples are read as a push toward mass-produced synthetic catalog images and marketing assets.
- Commenters worry about deceptive e-commerce/real-estate imagery and a coming flood of low-effort “AI slop,” with doubts about long-term consumer tolerance.