LLMs tell bad jokes because they avoid surprises
Surprise, probability, and training
- Many commenters like the “surprising but inevitable” framing of jokes, and connect it to LLM training minimizing perplexity (surprise) on text.
- Others push back: pretraining on next-token prediction doesn’t inherently penalize surprise at the sequence level; the “best” joke continuation can be globally likely even if some individual tokens have low probability.
- Temperature and decoding are highlighted: low temperature and safety finetuning both bias output toward bland, unsurprising text, but simply raising the temperature doesn’t reliably make jokes better, just weirder (see the sketch after this list).
- Some argue the article conflates token-level likelihood with human-level “surprise” and over-psychologizes cross‑entropy minimization.
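
To make the token-level vs. sequence-level point and the temperature point concrete, here is a minimal, self-contained sketch. The probabilities and logits are made up purely for illustration; nothing here comes from the article or the comments beyond the two ideas themselves.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; higher temperature flattens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sequence_logprob(token_logprobs):
    """Sequence log-likelihood is the sum of per-token log-probabilities, so one
    low-probability (surprising) token can still sit inside a continuation that
    is competitive overall."""
    return sum(token_logprobs)

# Two hypothetical continuations of the same setup (invented numbers):
# A: bland, every token fairly likely.
bland = [math.log(p) for p in (0.5, 0.4, 0.5, 0.4)]
# B: one surprising token, after which the rest is near-forced
#    ("surprising but inevitable").
punchline = [math.log(p) for p in (0.5, 0.05, 0.95, 0.95)]

print(sequence_logprob(bland))      # ~ -3.2
print(sequence_logprob(punchline))  # ~ -3.8, comparable despite the 0.05 token

# Temperature rescales the whole next-token distribution uniformly; it does not
# single out the "good" surprise, it just makes every unlikely token more likely.
logits = [3.0, 1.0, 0.2, -1.0]
print(softmax(logits, temperature=0.7))  # sharper: top token dominates
print(softmax(logits, temperature=1.5))  # flatter: all tokens move closer together
```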
Safety, RLHF, and guardrails
- Several note that production models are heavily tuned for factuality and safety, which cuts off many joke modes (edgy, transgressive, or absurd).
- This tuning also encourages explicit meta-commentary (“this is a joke…”), which ruins timing and immersion.
- People suspect some “canned” jokes are hard‑wired for evaluations, and that models revert to safe, overused material without careful prompting.
Difficulty of humor & human comparison
- A recurring theme: good original jokes are extremely hard even for humans; comparing LLMs to professional comedians is an unfair benchmark.
- Comparisons are made to children’s jokes and anti‑jokes: kids and LLMs both often get the structure but miss the sharp, specific twist.
- Some say current top models can reach “junior comic / open‑mic” quality on niche prompts, with maybe 10–20% of lines landing. Others still find them flat or derivative.
Humor theory, structure, and culture
- Commenters reference incongruity theory: humor arises when a punchline forces a reinterpretation of the setup. Ambiguity and “frame shifts” (e.g., “alleged killer whale”) are central.
- Others emphasize “obviousness”: the funniest lines often state the most salient but unspoken thought, not the cleverest one. LLMs tend to be too generic and non‑committal to do this well.
- Several note cultural and linguistic differences (e.g., pun density in English vs French, haiku cutting words) as further complications for generalized joke generation.
Proposals and experiments
- Ideas include: an explicit “Surprise Mode,” searching candidate continuations for contradictions, and building humor‑specialized models; a re‑ranking sketch follows this list.
- Many share prompt experiments (HN roasts, “Why did the sun climb a tree?”, man/dog jokes), illustrating that models can sometimes be genuinely funny but are inconsistent and often recycle known material.
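
One way the “Surprise Mode” / candidate-search proposals could be prototyped is as a re-ranker over sampled punchlines: generate several candidates for a setup, then prefer those whose opening token is surprising but whose remainder is near-inevitable once that token is fixed. This is only a sketch of the commenters’ idea, not anything described in the article; `token_logprobs` is an assumed helper (e.g., a wrapper around a local model that exposes per-token log-probabilities) and `target_surprisal` is an arbitrary knob.

```python
from typing import Callable, List, Sequence, Tuple

def rank_by_surprise(
    setup: str,
    candidates: Sequence[str],
    token_logprobs: Callable[[str, str], List[float]],
    target_surprisal: float = 3.0,
) -> List[Tuple[float, str]]:
    """Rank candidate punchlines for a joke setup.

    token_logprobs(setup, candidate) is an assumed helper returning the model's
    log-probability for each token of `candidate` conditioned on `setup`.

    Heuristic: reward a first token whose surprisal is near `target_surprisal`
    (the twist) and reward high average log-probability for the remaining
    tokens (the "inevitable" part).
    """
    scored = []
    for cand in candidates:
        lps = token_logprobs(setup, cand)
        if not lps:
            continue
        first_surprisal = -lps[0]                              # surprise at the twist, in nats
        rest_avg_logprob = sum(lps[1:]) / max(len(lps) - 1, 1)  # inevitability of the rest
        score = -abs(first_surprisal - target_surprisal) + rest_avg_logprob
        scored.append((score, cand))
    return sorted(scored, reverse=True)
```

In practice the candidates would come from sampling the same model at moderate temperature, and the scoring heuristic would need tuning against human judgments of what actually lands.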