ChatGPT just (accidentally) shared all of its secret rules

Image and other hard/soft restrictions

  • Debate over whether capping the number of generated images via natural-language instructions is “stupid,” given that a fuzzy model follows such instructions only approximately.
  • Some argue hard limits at the API/tool level are necessary for resource control, with prompts just reducing user-facing errors.
  • Others note that telling the model “nicely” not to do things is not a true hard restriction and is vulnerable to adversarial prompts.
  • There’s speculation about intermediate layers (function-calling / image APIs) where true caps could be enforced, but details are unclear.
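
The split between a prompt-level "soft" rule and a tool-level "hard" cap can be sketched as follows. This is a minimal illustration only: `generate_images`, `ImageLimitError`, and `MAX_IMAGES_PER_REQUEST` are hypothetical names, and nothing here reflects how any real image API is implemented.

```python
# Hypothetical tool-layer wrapper: the cap is enforced in code,
# regardless of what the model was told (or talked into) in the prompt.
MAX_IMAGES_PER_REQUEST = 1  # assumed value, for illustration only


class ImageLimitError(Exception):
    """Raised when a tool call asks for more images than allowed."""


def generate_images(prompts, max_images=MAX_IMAGES_PER_REQUEST):
    """Reject any call that exceeds the cap (hard restriction)."""
    if len(prompts) > max_images:
        raise ImageLimitError(
            f"requested {len(prompts)} images, limit is {max_images}"
        )
    # Placeholder for the real image backend.
    return [f"<image for: {p}>" for p in prompts]


def generate_images_clamped(prompts, max_images=MAX_IMAGES_PER_REQUEST):
    """Alternative policy: silently truncate instead of raising."""
    return generate_images(prompts[:max_images], max_images)
```

Under this arrangement the natural-language instruction only serves to keep the model from issuing calls that would be rejected, which matches the "prompts just reduce user-facing errors" view.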

Authenticity and significance of the leaked prompt

  • Several users report being able to reproduce large chunks of the system instructions using simple queries, suggesting it’s not random hallucination.
  • Others point out we still can’t be 100% sure it’s the exact internal wrapper prompt, but it is likely very close.
  • Some think the article is overblown clickbait, since it is well known that system prompts have “leaked” before; others stress it’s still an unintended disclosure of proprietary config.

Prompting as behavior control

  • Many are struck by the need to shout, repeat, and over-specify rules (“I REPEAT”, all caps) to get reliable behavior, likening the model to a “smart toddler.”
  • Some find this sad or creepy; others see it as a fun new programming paradigm.
  • Some report needing similarly emphatic prompting in their own projects to force pure SQL output, avoid markdown, etc.
  • Observations that models can sometimes be talked out of refusals simply by insisting.
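
A common complement to emphatic prompting is to validate or normalize the reply in code rather than trust the model to comply. The sketch below is not what any commenter described; `extract_sql` and `looks_like_sql` are hypothetical helpers, and the keyword check is a deliberately crude heuristic.

```python
import re

FENCE = "`" * 3  # a markdown code fence, built indirectly for readability

# Matches a reply that consists of exactly one fenced block.
FENCED_BLOCK = re.compile(
    r"^" + FENCE + r"[\w+-]*[ \t]*\n(.*?)\n?" + FENCE + r"\s*$",
    re.DOTALL,
)

SQL_KEYWORDS = ("SELECT", "WITH", "INSERT", "UPDATE", "DELETE")


def extract_sql(reply: str) -> str:
    """Strip a surrounding markdown fence if the model added one."""
    text = reply.strip()
    match = FENCED_BLOCK.match(text)
    if match:
        text = match.group(1).strip()
    return text


def looks_like_sql(text: str) -> bool:
    """Crude heuristic: does the text start with a SQL keyword?"""
    return text.lstrip().upper().startswith(SQL_KEYWORDS)
```

A caller could retry or reject when `looks_like_sql(extract_sql(reply))` is false, instead of adding another "I REPEAT" to the prompt.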

Seaborn vs matplotlib

  • Users notice the prompt’s explicit anti-seaborn rule and that the model sometimes refuses seaborn even when asked.
  • One explanation offered: the execution environment likely only has matplotlib installed, so seaborn would fail.
  • Another comment notes the LLM’s own justification for avoiding seaborn is likely confabulated, not the real reason.
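
The "seaborn may simply not be installed" explanation is easy to test inside any Python sandbox. The sketch below uses the standard library's `importlib.util.find_spec`; the helper names `is_installed` and `pick_plotting_backend` are made up for illustration, and whether ChatGPT's actual environment lacks seaborn remains the commenter's guess.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def pick_plotting_backend(preferred=("seaborn", "matplotlib")):
    """Return the first available library name, or None if none is found."""
    for name in preferred:
        if is_installed(name):
            return name
    return None
```

Running `is_installed("seaborn")` in the execution environment would settle the question more reliably than asking the model why it avoids the library.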

Why rules are text prompts instead of “compiled in”

  • Reasons given: prompts are cheaper and easier to change than retraining; they allow reuse of the same base model in different products.
  • Some argue the system instructions are probably passed as vector embeddings rather than raw text each time.
  • A technical sub-thread disputes how much you can cache or reuse such vectors, with disagreement on how transformers handle context.
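
One piece of the caching dispute can be checked on a toy model. With causal (left-to-right) masking, the attention output at a prefix position does not depend on any later token, so the work done for a fixed prefix could in principle be reused. The sketch below is a single-head toy in pure Python that uses the same vectors as queries, keys, and values; it makes no claim about how ChatGPT is actually served.

```python
import math


def causal_attention(queries, keys, values):
    """Single-head attention where position i only sees positions 0..i."""
    outputs = []
    dim = len(values[0])
    for i, q in enumerate(queries):
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(len(q))
            for j in range(i + 1)
        ]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append(
            [sum(w * values[j][d] for j, w in enumerate(weights)) / total
             for d in range(dim)]
        )
    return outputs


def self_attend(tokens):
    """Toy shortcut: reuse the token vectors as queries, keys, and values."""
    return causal_attention(tokens, tokens, tokens)


prefix = [[1.0, 0.0], [0.0, 1.0]]            # stands in for a system prompt
with_a = self_attend(prefix + [[1.0, 1.0]])   # user message A
with_b = self_attend(prefix + [[-2.0, 3.0]])  # user message B
print(with_a[:2] == with_b[:2])  # prefix outputs are identical
```

Only the output at the final position differs between the two runs, which is the property that makes reusing prefix computations possible in causally masked models.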

Alignment, censorship, and jailbreaks

  • Users note the DALL·E-related rules banning realistic public-figure images and copyrighted styles, plus workarounds using stylistic adjectives.
  • General behavioral and “no controversial topics” filters are believed to come mostly from RLHF and additional training, not just from prompts.
  • System prompts are seen as easier to jailbreak than RLHF; examples include base64/ROT13 encodings to evade simple output checks.
  • Some ask whether it’s possible to fully “neutralize” these safety instructions; replies say the deeper RLHF layer makes that difficult.
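
The point about encodings evading simple output checks can be shown with a toy substring filter. The sketch below uses only the standard library's `base64` and `codecs` modules; `BLOCKLIST` and `naive_output_check` are hypothetical, and real moderation systems are presumably more involved than this.

```python
import base64
import codecs

BLOCKLIST = ("forbidden phrase",)  # hypothetical blocked term


def naive_output_check(text: str) -> bool:
    """Return True if the text passes a plain substring filter."""
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKLIST)


plain = "this contains a forbidden phrase"
as_base64 = base64.b64encode(plain.encode("utf-8")).decode("ascii")
as_rot13 = codecs.encode(plain, "rot_13")

print(naive_output_check(plain))      # blocked
print(naive_output_check(as_base64))  # passes the filter
print(naive_output_check(as_rot13))   # passes the filter
```

Both encoded strings decode back to the original text, so a filter that inspects only the literal output never sees the blocked term.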