2025-11-13

Nano Banana can be prompt engineered for nuanced AI image generation

Model capabilities and limitations

Many commenters are impressed by Nano Banana’s fidelity: good prompt adherence, strong HTML-to-screenshot rendering, stable scene geometry across edits, and well-preserved fine detail, which some attribute to low spatial scaling or pixel-space behavior.
Others report persistent failures: random additions (e.g., fireplaces, garages) despite “do not change” instructions, trouble with simple geometry such as irregular polygons, and difficulty with multi-constraint scenes (for example, a composition containing a shark, a surfer, a seal, and a boat).
Spatial reasoning is a recurring weak spot: confusion about left/right relative to subject vs viewer, trouble with up/down, rotation, and “upside‑down” requests. Depth‑of‑field control and removing reflections are also unreliable.

Editing, masks, and control

Several note that, unlike many models, Nano Banana handles masked edits relatively well, often preserving lighting, texture, and sharpness.
Others still see pervasive small changes in “unchanged” areas when they diff the images, and find that once a session goes off‑track it is hard to recover without starting fresh.
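The image-diff check that commenters describe can be sketched in a few lines. This is a minimal illustration, not anyone's actual tooling: images are assumed to be plain nested lists of RGB tuples, and the function name changed_outside_mask and its tolerance parameter are hypothetical. A real workflow would more likely load files with an imaging library.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]   # rows of RGB pixels, indexed image[y][x]
Mask = List[List[bool]]     # True where the edit was allowed


def changed_outside_mask(before: Image, after: Image, mask: Mask,
                         tolerance: int = 0) -> List[Tuple[int, int]]:
    """Return (x, y) for pixels outside the mask whose color drifted.

    A pixel counts as drifted when any channel differs by more than
    `tolerance` between the original and the edited image.
    """
    if len(before) != len(after) or any(
        len(rb) != len(ra) for rb, ra in zip(before, after)
    ):
        raise ValueError("images must have the same dimensions")

    drifted = []
    for y, (row_before, row_after) in enumerate(zip(before, after)):
        for x, (pb, pa) in enumerate(zip(row_before, row_after)):
            if mask[y][x]:
                continue  # changes inside the edit region are expected
            if max(abs(cb - ca) for cb, ca in zip(pb, pa)) > tolerance:
                drifted.append((x, y))
    return drifted
```

A small nonzero tolerance separates compression-level noise from genuine unrequested edits; which threshold is appropriate is a judgment call that the thread does not settle.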
Users hack around the lack of native bounding boxes by drawing colored boxes on the image and referencing them in the prompt, sometimes using a second LLM to rewrite the edit prompt more precisely.
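The colored-box workaround might look roughly like the sketch below. Everything here is illustrative: the image is assumed to be a nested list of RGB tuples, the helpers draw_box and box_prompt are hypothetical names, and the prompt wording is only one plausible phrasing, not a formula reported in the thread.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of RGB pixels, indexed image[y][x]


def draw_box(image: Image, left: int, top: int, right: int, bottom: int,
             color: Pixel) -> Image:
    """Return a copy of `image` with a 1-pixel outline (bounds inclusive)."""
    out = [list(row) for row in image]  # copy so the original is untouched
    for x in range(left, right + 1):
        out[top][x] = color
        out[bottom][x] = color
    for y in range(top, bottom + 1):
        out[y][left] = color
        out[y][right] = color
    return out


def box_prompt(color_name: str, instruction: str) -> str:
    """Build an edit prompt that refers to the drawn box by its color."""
    return (
        f"Inside the {color_name} box only: {instruction}. "
        "Leave everything outside the box unchanged, "
        "and remove the box outline from the final image."
    )
```

The marked-up image and the matching prompt would then be sent together as the edit request.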

Style transfer and text rendering

The article’s claim that Nano Banana is “terrible at style transfer” is contested. Some find it uniquely good at turning 3D renders, drawings, or engravings into plausible photos while preserving structure.
However, it struggles with explicit “copy this artist/style” transfers and cannot generalize well from arbitrary style reference images; even simple “Starry Night” examples fall short.
Text in images remains error‑prone. Workarounds include supplying a screenshot of correctly spelled text and asking the model to copy it.

Prompt engineering and tooling

The thread debates whether “prompt engineering” is a real skill or a buzzword. Defenders point to the difficulty of getting small models to follow precise, low‑token specs, and to techniques like multi‑layer prompts, session management, and generator–critic loops.
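A generator–critic loop of the kind defenders mention can be sketched as follows. The generate and critique callables are placeholders for whatever image model and reviewing model a user wires in; their signatures, the feedback format, and the max_rounds default are assumptions made for illustration.

```python
from typing import Any, Callable, Optional, Tuple


def generator_critic_loop(
    prompt: str,
    generate: Callable[[str], Any],                  # prompt -> image (hypothetical)
    critique: Callable[[str, Any], Optional[str]],   # None means "accepted"
    max_rounds: int = 3,
) -> Tuple[Any, int, bool]:
    """Regenerate until the critic accepts or the round budget runs out.

    Returns (last_image, rounds_used, accepted).
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")

    current_prompt = prompt
    image = None
    for round_number in range(1, max_rounds + 1):
        image = generate(current_prompt)
        # The critic always judges against the ORIGINAL request.
        feedback = critique(prompt, image)
        if feedback is None:
            return image, round_number, True
        # Fold the critic's complaint into the next attempt.
        current_prompt = f"{prompt}\n\nFix from previous attempt: {feedback}"
    return image, max_rounds, False
```

Capping the number of rounds reflects the observation elsewhere in the thread that a session which has gone off‑track is often better restarted than repaired.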
Others mock the “engineer” title and see it as coping for lack of traditional creative or technical skills.
Several share workflows: Python/CLI wrappers around the API, LLMs that auto‑rewrite prompts into multiple variants, pipelines for comics and storyboards, and chaining Gemini 2.5 (for rich prompts) into Nano Banana (for rendering).
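The chaining workflow, in which a text model expands a brief into detailed prompts and an image model renders each one, could be structured along these lines. This is a sketch under stated assumptions: expand and render are hypothetical callables standing in for the text-model and image-model API calls, and no real client library is used.

```python
from typing import Any, Callable, List, Tuple


def render_variants(
    brief: str,
    expand: Callable[[str, int], List[str]],  # brief, n -> detailed prompts (hypothetical)
    render: Callable[[str], Any],             # prompt -> image (hypothetical)
    n_variants: int = 3,
) -> List[Tuple[str, Any]]:
    """Expand a short brief into prompt variants and render each one once."""
    results = []
    seen = set()
    for candidate in expand(brief, n_variants):
        prompt = candidate.strip()
        if not prompt or prompt in seen:
            continue  # skip empty or duplicate rewrites to save render calls
        seen.add(prompt)
        results.append((prompt, render(prompt)))
    return results
```

Keeping each prompt next to its image makes it easy to see which rewrite produced which result when comparing variants.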

Ethics, watermarks, and openness

A client‑side trick to block Google’s visible watermark is described; some see this as dangerous, others note the visible mark was always trivially removable and that an invisible watermark likely remains.
There’s enthusiasm for open‑weight editing models (e.g., Qwen‑Edit) versus closed US models, with speculation about distilling Nano Banana via (image, instruction → completion) tuples.
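The (image, instruction → completion) tuples that commenters speculate about could be stored as simple records like the ones below. The EditExample class, its field names, and the JSON Lines layout are assumptions for illustration only; the thread does not describe a concrete dataset format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable


@dataclass(frozen=True)
class EditExample:
    """One training tuple: input image plus instruction, and the edited result."""
    source_image: str      # path to the input image (hypothetical layout)
    instruction: str       # the edit request given to the teacher model
    completion_image: str  # path to the teacher model's output image


def to_jsonl(examples: Iterable[EditExample]) -> str:
    """Serialize examples as JSON Lines, one record per line."""
    return "\n".join(
        json.dumps(asdict(example), ensure_ascii=False) for example in examples
    )
```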
NSFW generation is acknowledged as possible; one commenter questions why sharing such outputs is treated as obviously off‑limits.
