LLMs are mortally terrified of exceptions

Satirical example vs real behavior

  • Many note that the Python division function in the tweet is clearly satirical, but argue it exaggerates a very real LLM tendency: hyper‑defensive, cluttered code paths (a sketch follows this list).
  • Others took the snippet literally at first and point out that it is logically inconsistent (e.g., conflicting NaN/None handling, impossible conditions, sign errors).
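
The tweet's code isn't reproduced here, but a minimal sketch in the same spirit shows the pattern being mocked; safe_divide is an invented example, and its inconsistencies (a None convention here, a NaN convention there, a silent fallback) are deliberate:

    import math

    def safe_divide(a, b):
        # Every branch is "handled", in conflicting ways, instead of
        # letting Python raise ZeroDivisionError with a stack trace.
        try:
            if b == 0:
                return None                # error convention #1
            if math.isnan(a) or math.isnan(b):
                return float("nan")        # error convention #2
            result = a / b
            if math.isinf(result):         # overflow or infinite input
                return 0.0                 # silently wrong fallback
            return result
        except Exception:
            return None                    # swallow anything else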

LLM “paranoia” about exceptions

  • Common experience: LLMs add excessive try/except blocks, “security‑theater” checks, and fallback values instead of letting errors surface.
  • This leads to:
    • Silent failures and misleading “success” exits.
    • Hard‑to‑read and hard‑to‑test code with many unexercised branches.
    • Overuse of logging, status enums, wrapper classes, and “future‑proofing.”
  • Several users explicitly instruct models to “fail fast” or forbid catch‑all handlers, but say models still tend to swallow exceptions; the sketch below contrasts the two styles.
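
A sketch of that contrast, using a hypothetical config loader (load_config is illustrative, not from the thread):

    import json
    import logging

    logger = logging.getLogger(__name__)

    # The pattern commenters complain about: a missing or corrupt file
    # silently becomes an empty dict, and the run "succeeds" on bad data.
    def load_config_defensive(path):
        try:
            with open(path) as f:
                return json.load(f)
        except Exception:
            logger.warning("config load failed, using defaults")
            return {}

    # The fail‑fast alternative: catch nothing here, so FileNotFoundError
    # or json.JSONDecodeError surfaces immediately with a stack trace.
    def load_config(path):
        with open(path) as f:
            return json.load(f)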

Prompts, training, and incentives

  • Some argue the example likely came from a prompt like “handle all edge cases and be extremely safe,” so the model is doing what it was asked.
  • Others blame:
    • RLHF/RLVR tuned on passing tests: swallowing exceptions can raise pass rates without improving correctness (toy example after this list).
    • Training data heavy on tutorials and “defensive programming” patterns, plus beginner code that over‑handles errors.
    • Non‑expert user feedback that rewards “safety” and verbosity (including comments, READMEs, emojis).
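
A toy illustration of that incentive, assuming a reward that only counts crash‑free test runs (both functions and the test are hypothetical):

    def divide_strict(a, b):
        return a / b               # ZeroDivisionError surfaces on b == 0

    def divide_defensive(a, b):
        try:
            return a / b
        except Exception:
            return 0.0             # wrong value, but "no crash"

    # A weak, crash-only test never checks the b == 0 answer, so the
    # defensive version scores full marks while the strict one fails.
    def run_tests(divide):
        assert divide(6, 3) == 2.0
        divide(1, 0)               # only requirement: do not raise

    run_tests(divide_defensive)    # passes
    # run_tests(divide_strict)     # would raise ZeroDivisionError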

Exceptions vs return types and numerical edge cases

  • A long subthread debates:
    • Whether exceptions are needed at all vs using richer return types or checked exceptions.
    • IEEE 754 semantics for division by zero (Inf/‑Inf/NaN) vs domain‑specific handling where the standard can be “wrong enough” to fry hardware or mis‑model physics.
    • Trade‑offs between exceptions (stack traces, less clutter) and value‑based errors (visibility, type checking, but less context); the sketch after this list compares the conventions.
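
A minimal sketch of the three conventions under debate; ieee_div and checked_div are invented names, and the IEEE 754 version ignores signed‑zero divisors for brevity:

    import math

    # 1) Exception-based: Python's own `/` raises, yielding a stack trace.
    try:
        1.0 / 0.0
    except ZeroDivisionError:
        pass

    # 2) IEEE 754-style values: 1/0 -> inf, -1/0 -> -inf, 0/0 -> nan.
    #    These propagate silently through later arithmetic, which is the
    #    "wrong enough" risk raised in the thread.
    def ieee_div(a, b):
        if b == 0.0:
            if a == 0.0 or math.isnan(a):
                return math.nan
            return math.copysign(math.inf, a)
        return a / b

    # 3) Value-based errors: visible in the return type and checkable,
    #    but with no stack context at the point of failure.
    def checked_div(a, b):
        if b == 0:
            return None, "division by zero"
        return a / b, None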

Impact and mitigations

  • Real incidents: LLM‑written code that logs and continues on every failure, producing no output yet exiting as if it had succeeded.
  • Some developers maintain explicit guidelines (AGENTS.md, Claude.md) describing when to throw vs catch, trying to “re‑train” their assistants (hypothetical excerpt after this list).
  • Consensus: LLMs over‑correct toward defensive coding; better reward design and clearer instructions are needed to balance safety with simplicity and debuggability.
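
A hypothetical excerpt of the kind of rule such a guidelines file might contain (wording invented here, not quoted from any commenter):

    # Error handling
    - Let exceptions propagate by default; add try/except only when the
      handler does something beyond logging and continuing.
    - Never use bare `except:` or `except Exception:` outside a top‑level
      entry point.
    - Raise early on invalid input instead of returning None/0/"" defaults.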