ChatGPT's image generator can be manipulated to produce violent, sexual content

Nature of the “exploit” and “spontaneity”

  • Prompt: “restore an attached image” with no image, plus language like “apologies for the content” and “no censorship,” can lead to violent/sexual images.
  • Disagreement over “spontaneously generated”: some say it’s just responding to a suggestive prompt; others say generating gore from a missing, undescribed image is qualitatively different.
  • Some report the exploit is already patched or only works intermittently; outputs appear random and context‑dependent.

Prompt Injection & Model Architecture Debate

  • One camp: prompt injection and similar attacks are inherent to LLMs; user and system prompts are inseparably “mixed,” unlike code/data separation in classic software. Therefore, guardrails can only ever be partial and adversarially bypassable.
  • Others argue this “intractable” claim is unproven; architectures or training schemes might separate “command” vs “user” channels, or use provenance-like embeddings.
  • Broader point: adversarial/test‑time attacks have long existed in ML; prompt injection is seen by some as just the LLM-flavored version.

Training Data, Latent Space, and Filtering

  • Many infer that violent/sexual imagery exists in the training data; some argue that without such data, these outputs wouldn’t appear.
  • Counterpoint: even with only milder content (e.g., PG‑13 violence, surgery photos), models can extrapolate to gore. Removing all such data may also harm overall capability.
  • Some worry that if models can regurgitate or closely imitate training images, this implicates CSAM and other highly problematic data.

Guardrails, Classifiers, and “Bugs”

  • Several are surprised that OpenAI apparently doesn’t run a basic nudity/gore classifier on outputs, given such tools exist and are lightweight.
  • Others suspect classifiers exist but have false negatives, or that this exploit sidesteps them via tool invocation or prompt rewriting.
  • Disagreement on whether this is a “vulnerability” or simply “garbage in, garbage out”; but many agree it contradicts the advertised “no violent/sexual images” behavior and thus is at least a bug in policy enforcement.

Harm, Ethics, and Expectations

  • Some see real harm: unexpected exposure to graphic imagery, especially for trauma survivors; normalization of extreme violence/sexualization in a mainstream tool.
  • Others minimize harm: liken it to searching for gore online or drawing violent art; emphasize user agency and freedom to generate disturbing content privately.
  • Debate over expectations: one side insists general‑audience tools “should never” output such images given company claims; others say complete prevention is unrealistic with probabilistic models.

Critiques of the Article and “AI Safety” Framing

  • Many find the blog post sensationalized, melodramatic, or marketing for a security product.
  • Some object to emotional framing (“shaken, in tears”) as unprofessional for a red‑teaming write‑up.
  • Others defend the discomfort as understandable, given repeated exposure to disturbing content in safety testing.

Broader Concerns: Regulation and Access

  • Some fear incidents like this will justify over‑restricting public models while powerful, less‑restricted versions remain for governments and large firms.
  • Underlying tension: desire for powerful, uncensored tools vs. pressure for strict safety, especially around sexual violence and CSAM.