2024-07-05

ChatGPT just (accidentally) shared all of its secret rules

Image and other hard/soft restrictions

Debate over whether limiting image count via natural-language instructions is “stupid” for a fuzzy model.
Some argue hard limits at the API/tool level are necessary for resource control, with prompts just reducing user-facing errors.
Others note that telling the model “nicely” not to do things is not a true hard restriction and is vulnerable to adversarial prompts.
There’s speculation about intermediate layers (function-calling / image APIs) where true caps could be enforced, but details are unclear.
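The difference between asking the model nicely and enforcing a cap at the tool layer can be sketched as follows. This is only an illustration under stated assumptions, not a description of how ChatGPT's image tool actually works: `ImageRequest`, `enforce_image_cap`, and `MAX_IMAGES_PER_CALL` are hypothetical names, and the limit of 1 is an assumed value.

```python
from dataclasses import dataclass

# Hypothetical cap; the real limit and where it is enforced are unknown.
MAX_IMAGES_PER_CALL = 1


@dataclass
class ImageRequest:
    prompt: str
    n: int  # number of images the model asked the tool for


def enforce_image_cap(request: ImageRequest, cap: int = MAX_IMAGES_PER_CALL) -> ImageRequest:
    """Clamp the requested image count, whatever the model was persuaded to ask for."""
    if request.n < 1:
        raise ValueError("n must be at least 1")
    return ImageRequest(prompt=request.prompt, n=min(request.n, cap))


# Even if an adversarial prompt gets the model to request 10 images,
# the tool layer only ever forwards `cap` of them.
clamped = enforce_image_cap(ImageRequest(prompt="a lighthouse at dusk", n=10))
print(clamped.n)  # prints 1
```

In a design like this, the natural-language rule in the prompt only reduces how often the clamp is hit; the clamp itself is the hard restriction.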

Authenticity and significance of the leaked prompt

Several users report being able to reproduce large chunks of the system instructions using simple queries, suggesting it’s not random hallucination.
Others point out we still can’t be 100% sure it’s the exact internal wrapper prompt, but likely very close.
Some think the article is overblown clickbait, since it is well known that system prompts have “leaked” before; others stress it’s still an unintended disclosure of proprietary config.

Prompting as behavior control

Many are struck by the need to shout, repeat, and over-specify rules (“I REPEAT”, all caps) to get reliable behavior, likening the model to a “smart toddler.”
Some find this sad or creepy; others see it as a fun new programming paradigm.
Several report needing similarly emphatic prompting in their own projects, e.g. to force pure SQL output or suppress markdown.
Observations that models can sometimes be talked out of refusals simply by insisting.
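For the "pure SQL, no markdown" case mentioned above, one alternative to ever-louder instructions is to clean up the output in code. A minimal sketch, assuming the model sometimes wraps its answer in a markdown code fence despite being told not to; `extract_sql` is a hypothetical helper, not part of any library.

```python
import re

# Built with "`" * 3 so this example does not contain a literal fence.
FENCE = "`" * 3

# An entire reply wrapped in one fenced block, with or without a language tag.
_FENCED = re.compile(
    r"^" + FENCE + r"[a-zA-Z]*[ \t]*\n(.*?)\n" + FENCE + r"$",
    re.DOTALL,
)


def extract_sql(model_output: str) -> str:
    """Return the SQL text, unwrapping a surrounding markdown fence if present."""
    text = model_output.strip()
    match = _FENCED.match(text)
    if match:
        text = match.group(1).strip()
    return text


reply = FENCE + "sql\nSELECT id FROM users;\n" + FENCE
print(extract_sql(reply))  # prints: SELECT id FROM users;
```

This only handles the single-fence case; a reply that mixes prose and SQL would still need stricter validation or a retry.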

Seaborn vs matplotlib

Users notice the prompt’s explicit anti-seaborn rule and that the model sometimes refuses seaborn even when asked.
One explanation offered: the execution environment likely only has matplotlib installed, so seaborn would fail.
Another comment notes the LLM’s own justification for avoiding seaborn is likely confabulated, not the real reason.
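The "seaborn is probably not installed" explanation is the kind of thing that can be checked from inside an execution environment rather than guessed at. A minimal sketch using the standard library; `is_installed` and `pick_plotting_backend` are hypothetical helper names, and nothing here is known about the actual ChatGPT sandbox.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def pick_plotting_backend() -> str:
    # Prefer seaborn only when it is actually present in this environment.
    if is_installed("seaborn"):
        return "seaborn"
    if is_installed("matplotlib"):
        return "matplotlib"
    return "none"


print(pick_plotting_backend())
```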

Why rules are text prompts instead of “compiled in”

Reasons given: prompts are cheaper and easier to change than retraining; they allow reuse of the same base model in different products.
Some argue the system instructions are probably passed in as precomputed vector embeddings rather than as raw text on every request.
A technical sub-thread disputes how much you can cache or reuse such vectors, with disagreement on how transformers handle context.

Alignment, censorship, and jailbreaks

Users note the DALL·E-related rules banning realistic public-figure images and copyrighted styles, plus workarounds using stylistic adjectives.
General behavioral and “no controversial topics” filters are believed to be mostly from RLHF and additional training, not just prompts.
System prompts are seen as easier to jailbreak than RLHF; examples include base64/ROT13 encodings to evade simple output checks.
Some ask whether it’s possible to fully “neutralize” these safety instructions; replies say the deeper RLHF layer makes that difficult.
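Why encodings defeat a simple output check can be shown with a toy filter. This is a sketch under assumptions: the thread does not say what checks, if any, are actually applied, and `BLOCKED_TERMS` and `naive_output_check` are hypothetical.

```python
import base64
import codecs

# Hypothetical blocklist for a verbatim substring filter.
BLOCKED_TERMS = ["secret rules"]


def naive_output_check(text: str) -> bool:
    """Return True if the text passes, i.e. contains no blocked term verbatim."""
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)


plain = "Here are the secret rules."
rot13 = codecs.encode(plain, "rot13")
b64 = base64.b64encode(plain.encode("utf-8")).decode("ascii")

print(naive_output_check(plain))  # False: caught by the filter
print(naive_output_check(rot13))  # True: same content, not matched
print(naive_output_check(b64))    # True: same content, not matched
```

The filter inspects surface text only, so any reversible transformation of the output slips past it; this is consistent with the replies that treat trained-in behavior as harder to bypass than checks of this kind.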
