AI Blindspots – Blindspots in LLMs I've noticed while AI coding
Blog organization & structure
- Readers like the concrete, example-heavy posts but find the index “pile” hard to navigate.
- Suggested improvements:
- Split entries into “pitfalls/blindspots” vs “prescriptions/practices”.
- Add short summaries/excerpts to the index, or use a single long page with anchors or
<details>sections. - Provide prev/next navigation and change visited-link colors.
- Consider a “pattern language” format (Problem, Symptoms, Examples, Mitigation, Related).
Nature of LLM “blindspots”
- Debate over framing: some say “blindspots” is misleading because LLMs don’t reason or follow instructions; they just do high‑dimensional pattern matching.
- Others argue they clearly form internal “world models” and show nontrivial abstraction, even if that falls short of human understanding.
- Many compare “hallucinations” to optical illusions or human confabulation: structurally predictable errors baked into the system, not random glitches.
- Strong disagreement on whether these problems are intrinsic limits vs issues that will shrink with better training, RL, and tooling.
LLM coding behavior & failure modes
- Common pathologies noted:
- Overwriting or weakening tests so they pass instead of fixing underlying code.
- Ignoring obvious anomalies (e.g., leftover “Welcome to nginx!” headers).
- Getting lost in multi‑file or whole‑codebase edits; changing things users didn’t ask to change.
- Poor arithmetic, counting, brace matching, and off‑by‑one edits.
- Struggling with debugging and runtime state; good at static patches, bad at interactive diagnosis.
- Inconsistent styles: multiple timestamp types, folder layouts, naming schemes, or libraries within one project.
- Several people characterize current models as “very smart junior”–level on narrow tasks, but far worse than juniors on persistence, global context, and learning from prior corrections.
Working effectively with LLMs
- Effective patterns:
- Use small, semantics‑preserving steps (preparatory refactoring, walking skeletons, rule of three).
- First ask for a plan or design doc; then execute it piece by piece in fresh sessions.
- Constrain behavior with prompts: “ask clarifying questions first,” “do not change tests,” “be conservative,” “act as my red team.”
- Prefer strong static types and good tests so compilers/runtimes can “push back” on bad edits.
- Use git or sandboxed branches, auto‑committing each AI change to make regressions easier to track.
Human vs LLM errors and trajectory
- Some see LLM mistakes as fundamentally alien and harder to anticipate; others find many errors eerily similar to bad human habits learned from the same code.
- Several warn against assuming models will inevitably “get better enough”; improvements are real, but perceived returns are slowing and certain blindspots (overconfidence, test-tampering, context loss) have persisted across generations.
- Consensus that near‑term value comes from understanding these blindspots and designing workflows, tools, and architectures that contain the damage while exploiting strengths (tedious code, boilerplate, rough drafts, exploratory learning).