The last six months in LLMs in five minutes

Perceived progress in the last 6–12 months

  • Many commenters feel late‑2025/early‑2026 models (various 5.x and 4.x releases) were a real step up, especially for coding and math.
  • Others argue “inflection point” talk is mostly marketing: each new model is hyped as transformative, but practical capability gains are still incremental.
  • Several note that recent gains are strongest on verifiable tasks (code, math), which they attribute to reinforcement learning with verifiable rewards (RLVR).

Coding agents, “vibe coding,” and harnesses

  • Some report they now delegate most coding to agents and act more like architects/reviewers; they claim big productivity gains, especially on web and CRUD‑style work.
  • Others still find agents fragile, lazy, or hallucination‑prone, particularly on games, complex architectures, or niche stacks; pure “vibe coding” often produces messy, hard‑to‑maintain code.
  • Strong theme: harnesses (AGENTS.md/CLAUDE.md, skills, multi‑stage pipelines such as plan→design→code→test) and good test suites matter as much as, or more than, the underlying model.
  • Differences between top models are described as noticeable mainly at the “edge of difficulty” and in large codebases; for many tasks they feel similar.

Capabilities vs. understanding

  • Multiple comments stress that LLMs excel at pattern synthesis, “code that compiles,” and debugging, but lack deep conceptual understanding and cannot yet be relied on to write good documentation.
  • Benchmarks like “pelican riding a bicycle in SVG” are debated: once novel, they are now likely baked into training data and overfitted, and some see them as poor proxies for real reasoning.
  • Long context windows help, but the “smart zone” where a model still performs well may be much smaller than the advertised window; careful task chunking and sub‑agents are common strategies.

Jobs, roles, and quality

  • Reports of QA teams being cut and SWE headcount reduced; anxiety about future employability, especially for “feature factory” or low‑skill roles.
  • Counter‑argument: writing syntax is only part of the job; architecture, requirements, trade‑offs, and responsibility for outcomes still require humans.
  • Many suspect claims like “I never write code anymore” understate how much human steering, review, and debugging is still happening.

Security and broader impacts

  • Security researchers see a sharp uptick in vulnerability discovery, including many serious local privilege escalations (LPEs) and supply‑chain issues, attributed to AI‑assisted analysis.
  • Debate over whether AI‑driven vulnerability finding will improve security on net (faster defense) or fuel chaos (offense becomes cheaper to scale).
  • Outside coding, office workers widely use copilots for slide decks, emails, and data summaries; some educators report being warned or encouraged to offload lesson prep to AI, which raises concerns about quality and student engagement.
  • Commenters worry about AI‑generated media (video, stories) displacing creative work and worsening misinformation, but also note many current outputs are still obviously flawed.