LLMs are mortally terrified of exceptions

Satirical example vs real behavior

  • Many note that the Python division function in the tweet is clearly satirical, but argue it exaggerates a very real LLM tendency: hyper‑defensive, cluttered code paths (a sketch follows this list).
  • Others took the snippet literally at first and point out that it is logically inconsistent (e.g., conflicting NaN/None handling, impossible conditions, sign errors).
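
The tweet's code isn't reproduced here, but a minimal sketch in the same spirit shows the pattern being mocked; safe_divide is an invented example, and its inconsistencies (a None convention here, a NaN convention there, a silent fallback) are deliberate:

    import math

    def safe_divide(a, b):
        # Every branch is "handled", in conflicting ways, instead of
        # letting Python raise ZeroDivisionError with a stack trace.
        try:
            if b == 0:
                return None                # error convention #1
            if math.isnan(a) or math.isnan(b):
                return float("nan")        # error convention #2
            result = a / b
            if math.isinf(result):         # overflow or infinite input
                return 0.0                 # silently wrong fallback
            return result
        except Exception:
            return None                    # swallow anything else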

LLM “paranoia” about exceptions

  • Common experience: LLMs add excessive try/except blocks, “security‑theater” checks, and fallback values instead of letting errors surface.
  • This leads to:
    • Silent failures and misleading “success” exits.
    • Hard‑to‑read and hard‑to‑test code with many unexercised branches.
    • Overuse of logging, status enums, wrapper classes, and “future‑proofing.”
  • Several users explicitly instruct models to “fail fast” or forbid catch‑all handlers, but say models still tend to swallow exceptions; the sketch below contrasts the two styles.
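
A sketch of that contrast, using a hypothetical config loader (load_config is illustrative, not from the thread):

    import json
    import logging

    logger = logging.getLogger(__name__)

    # The pattern commenters complain about: a missing or corrupt file
    # silently becomes an empty dict, and the run "succeeds" on bad data.
    def load_config_defensive(path):
        try:
            with open(path) as f:
                return json.load(f)
        except Exception:
            logger.warning("config load failed, using defaults")
            return {}

    # The fail‑fast alternative: catch nothing here, so FileNotFoundError
    # or json.JSONDecodeError surfaces immediately with a stack trace.
    def load_config(path):
        with open(path) as f:
            return json.load(f)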

Prompts, training, and incentives

  • Some argue the example likely came from a prompt like “handle all edge cases and be extremely safe,” so the model is doing what it was asked.
  • Others blame:
    • RLHF/RLVR tuned on passing tests: swallowing exceptions can raise pass rates without improving correctness (toy example after this list).
    • Training data heavy on tutorials and “defensive programming” patterns, plus beginner code that over‑handles errors.
    • Non‑expert user feedback that rewards “safety” and verbosity (including comments, READMEs, emojis).
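
A toy illustration of that incentive, assuming a reward that only counts crash‑free test runs (both functions and the test are hypothetical):

    def divide_strict(a, b):
        return a / b               # ZeroDivisionError surfaces on b == 0

    def divide_defensive(a, b):
        try:
            return a / b
        except Exception:
            return 0.0             # wrong value, but "no crash"

    # A weak, crash-only test never checks the b == 0 answer, so the
    # defensive version scores full marks while the strict one fails.
    def run_tests(divide):
        assert divide(6, 3) == 2.0
        divide(1, 0)               # only requirement: do not raise

    run_tests(divide_defensive)    # passes
    # run_tests(divide_strict)     # would raise ZeroDivisionError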

Exceptions vs return types and numerical edge cases

  • A long subthread debates:
    • Whether exceptions are needed at all vs using richer return types or checked exceptions.
    • IEEE 754 semantics for division by zero (Inf/‑Inf/NaN) vs domain‑specific handling where the standard can be “wrong enough” to fry hardware or mis‑model physics.
    • Trade‑offs between exceptions (stack traces, less clutter) and value‑based errors (visibility, type checking, but less context); the sketch after this list compares the conventions.
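
A minimal sketch of the three conventions under debate; ieee_div and checked_div are invented names, and the IEEE 754 version ignores signed‑zero divisors for brevity:

    import math

    # 1) Exception-based: Python's own `/` raises, yielding a stack trace.
    try:
        1.0 / 0.0
    except ZeroDivisionError:
        pass

    # 2) IEEE 754-style values: 1/0 -> inf, -1/0 -> -inf, 0/0 -> nan.
    #    These propagate silently through later arithmetic, which is the
    #    "wrong enough" risk raised in the thread.
    def ieee_div(a, b):
        if b == 0.0:
            if a == 0.0 or math.isnan(a):
                return math.nan
            return math.copysign(math.inf, a)
        return a / b

    # 3) Value-based errors: visible in the return type and checkable,
    #    but with no stack context at the point of failure.
    def checked_div(a, b):
        if b == 0:
            return None, "division by zero"
        return a / b, None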

Impact and mitigations

  • Real incidents: LLM‑written code that logs and continues on every failure, producing no output yet exiting as if it had succeeded.
  • Some developers maintain explicit guidelines (AGENTS.md, Claude.md) describing when to throw vs catch, trying to “re‑train” their assistants (hypothetical excerpt after this list).
  • Consensus: LLMs over‑correct toward defensive coding; better reward design and clearer instructions are needed to balance safety with simplicity and debuggability.
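
A hypothetical excerpt of the kind of rule such a guidelines file might contain (wording invented here, not quoted from any commenter):

    # Error handling
    - Let exceptions propagate by default; add try/except only when the
      handler does something beyond logging and continuing.
    - Never use bare `except:` or `except Exception:` outside a top‑level
      entry point.
    - Raise early on invalid input instead of returning None/0/"" defaults.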