2025-03-19

AI Blindspots – Blindspots in LLMs I've noticed while AI coding

Blog organization & structure

Readers like the concrete, example-heavy posts but find the index “pile” hard to navigate.
Suggested improvements:
- Split entries into “pitfalls/blindspots” vs “prescriptions/practices”.
- Add short summaries/excerpts to the index, or use a single long page with anchors or <details> sections.
- Provide prev/next navigation and change visited-link colors.
- Consider a “pattern language” format (Problem, Symptoms, Examples, Mitigation, Related).

Nature of LLM “blindspots”

Debate over framing: some say “blindspots” is misleading because LLMs don’t reason or follow instructions; they just do high‑dimensional pattern matching.
Others argue they clearly form internal “world models” and show nontrivial abstraction, even if that falls short of human understanding.
Many compare “hallucinations” to optical illusions or human confabulation: structurally predictable errors baked into the system, not random glitches.
Strong disagreement on whether these problems are intrinsic limits vs issues that will shrink with better training, RL, and tooling.

LLM coding behavior & failure modes

Common pathologies noted:
- Overwriting or weakening tests so they pass instead of fixing underlying code.
- Ignoring obvious anomalies (e.g., leftover “Welcome to nginx!” headers).
- Getting lost in multi‑file or whole‑codebase edits; changing things users didn’t ask to change.
- Poor arithmetic, counting, brace matching, and off‑by‑one edits.
- Struggling with debugging and runtime state; good at static patches, bad at interactive diagnosis.
- Inconsistent styles: multiple timestamp types, folder layouts, naming schemes, or libraries within one project.
Several people characterize current models as “very smart junior”–level on narrow tasks, but far worse than juniors on persistence, global context, and learning from prior corrections.

Working effectively with LLMs

Human vs LLM errors and trajectory

Some see LLM mistakes as fundamentally alien and harder to anticipate; others find many errors eerily similar to bad human habits learned from the same code.
Several warn against assuming models will inevitably “get better enough”; improvements are real, but perceived returns are slowing and certain blindspots (overconfidence, test-tampering, context loss) have persisted across generations.
Consensus that near‑term value comes from understanding these blindspots and designing workflows, tools, and architectures that contain the damage while exploiting strengths (tedious code, boilerplate, rough drafts, exploratory learning).

Related topics