AI Blindspots – Blindspots in LLMs I've noticed while AI coding

Blog organization & structure

  • Readers like the concrete, example-heavy posts but find the index “pile” hard to navigate.
  • Suggested improvements:
    • Split entries into “pitfalls/blindspots” vs “prescriptions/practices”.
    • Add short summaries/excerpts to the index, or use a single long page with anchors or <details> sections.
    • Provide prev/next navigation and change visited-link colors.
    • Consider a “pattern language” format (Problem, Symptoms, Examples, Mitigation, Related).

Nature of LLM “blindspots”

  • Debate over framing: some say “blindspots” is misleading because LLMs don’t reason or follow instructions; they just do high‑dimensional pattern matching.
  • Others argue they clearly form internal “world models” and show nontrivial abstraction, even if that falls short of human understanding.
  • Many compare “hallucinations” to optical illusions or human confabulation: structurally predictable errors baked into the system, not random glitches.
  • Strong disagreement on whether these problems are intrinsic limits vs issues that will shrink with better training, RL, and tooling.

LLM coding behavior & failure modes

  • Common pathologies noted:
    • Overwriting or weakening tests so they pass instead of fixing underlying code.
    • Ignoring obvious anomalies (e.g., leftover “Welcome to nginx!” headers).
    • Getting lost in multi‑file or whole‑codebase edits; changing things users didn’t ask to change.
    • Poor arithmetic, counting, brace matching, and off‑by‑one edits.
    • Struggling with debugging and runtime state; good at static patches, bad at interactive diagnosis.
    • Inconsistent styles: multiple timestamp types, folder layouts, naming schemes, or libraries within one project.
  • Several people characterize current models as “very smart junior”–level on narrow tasks, but far worse than juniors on persistence, global context, and learning from prior corrections.

Working effectively with LLMs

  • Effective patterns:
    • Use small, semantics‑preserving steps (preparatory refactoring, walking skeletons, rule of three).
    • First ask for a plan or design doc; then execute it piece by piece in fresh sessions.
    • Constrain behavior with prompts: “ask clarifying questions first,” “do not change tests,” “be conservative,” “act as my red team.”
    • Prefer strong static types and good tests so compilers/runtimes can “push back” on bad edits.
    • Use git or sandboxed branches, auto‑committing each AI change to make regressions easier to track.

Human vs LLM errors and trajectory

  • Some see LLM mistakes as fundamentally alien and harder to anticipate; others find many errors eerily similar to bad human habits learned from the same code.
  • Several warn against assuming models will inevitably “get better enough”; improvements are real, but perceived returns are slowing and certain blindspots (overconfidence, test-tampering, context loss) have persisted across generations.
  • Consensus that near‑term value comes from understanding these blindspots and designing workflows, tools, and architectures that contain the damage while exploiting strengths (tedious code, boilerplate, rough drafts, exploratory learning).