ChatGPT's image generator can be manipulated to produce violent, sexual content
Nature of the “exploit” and “spontaneity”
- Prompt: “restore an attached image” with no image, plus language like “apologies for the content” and “no censorship,” can lead to violent/sexual images.
- Disagreement over “spontaneously generated”: some say it’s just responding to a suggestive prompt; others say generating gore from a missing, undescribed image is qualitatively different.
- Some report the exploit is already patched or only works intermittently; outputs appear random and context‑dependent.
Prompt Injection & Model Architecture Debate
- One camp: prompt injection and similar attacks are inherent to LLMs; user and system prompts are inseparably “mixed,” unlike code/data separation in classic software. Therefore, guardrails can only ever be partial and adversarially bypassable.
- Others argue this “intractable” claim is unproven; architectures or training schemes might separate “command” vs “user” channels, or use provenance-like embeddings.
- Broader point: adversarial/test‑time attacks have long existed in ML; prompt injection is seen by some as just the LLM-flavored version.
Training Data, Latent Space, and Filtering
- Many infer that violent/sexual imagery exists in the training data; some argue that without such data, these outputs wouldn’t appear.
- Counterpoint: even with only milder content (e.g., PG‑13 violence, surgery photos), models can extrapolate to gore. Removing all such data may also harm overall capability.
- Some worry that if models can regurgitate or closely imitate training images, this implicates CSAM and other highly problematic data.
Guardrails, Classifiers, and “Bugs”
- Several are surprised that OpenAI apparently doesn’t run a basic nudity/gore classifier on outputs, given such tools exist and are lightweight.
- Others suspect classifiers exist but have false negatives, or that this exploit sidesteps them via tool invocation or prompt rewriting.
- Disagreement on whether this is a “vulnerability” or simply “garbage in, garbage out”; but many agree it contradicts the advertised “no violent/sexual images” behavior and thus is at least a bug in policy enforcement.
Harm, Ethics, and Expectations
- Some see real harm: unexpected exposure to graphic imagery, especially for trauma survivors; normalization of extreme violence/sexualization in a mainstream tool.
- Others minimize harm: liken it to searching for gore online or drawing violent art; emphasize user agency and freedom to generate disturbing content privately.
- Debate over expectations: one side insists general‑audience tools “should never” output such images given company claims; others say complete prevention is unrealistic with probabilistic models.
Critiques of the Article and “AI Safety” Framing
- Many find the blog post sensationalized, melodramatic, or marketing for a security product.
- Some object to emotional framing (“shaken, in tears”) as unprofessional for a red‑teaming write‑up.
- Others defend the discomfort as understandable, given repeated exposure to disturbing content in safety testing.
Broader Concerns: Regulation and Access
- Some fear incidents like this will justify over‑restricting public models while powerful, less‑restricted versions remain for governments and large firms.
- Underlying tension: desire for powerful, uncensored tools vs. pressure for strict safety, especially around sexual violence and CSAM.