2026-05-19

The last six months in LLMs in five minutes

Perceived progress in the last 6–12 months

Many commenters feel late‑2025/early‑2026 models (various 5.x and 4.x releases) were a real step up, especially for coding and math.
Others argue “inflection point” talk is mostly marketing: each new model is hyped as transformative, but practical capability is still incremental.
Several note that recent gains are strongest on verifiable tasks (code, math), likely driven by RL with verifiable rewards.
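
To make "verifiable rewards" concrete: on tasks like code, a training signal can be computed mechanically by running a candidate answer against known test cases. The sketch below is illustrative only and is not taken from the thread or from any lab's training setup; the function name `verifiable_reward`, the `solve` entry point, and the pass-fraction scoring are all assumptions.

```python
def verifiable_reward(candidate_source, test_cases, func_name="solve"):
    """Score a candidate program by the fraction of test cases it passes.

    Hypothetical sketch: `candidate_source` is Python source expected to
    define `func_name`; `test_cases` is a list of (args_tuple, expected).
    A real system would run untrusted code in a sandbox, not via exec().
    """
    namespace = {}
    try:
        exec(candidate_source, namespace)  # unsafe outside a sandbox
        func = namespace[func_name]
    except Exception:
        return 0.0  # code that fails to load or define the function earns nothing

    if not test_cases:
        return 0.0

    passed = 0
    for args, expected in test_cases:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crash on one case counts as a failure for that case
    return passed / len(test_cases)


cases = [((1, 2), 3), ((0, 0), 0)]
good = "def solve(a, b):\n    return a + b\n"
bad = "def solve(a, b):\n    return a - b\n"
print(verifiable_reward(good, cases), verifiable_reward(bad, cases))
```

The point of the design is that the score needs no human judgment, which is one plausible reason gains would concentrate on tasks with checkable answers.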

Coding agents, “vibe coding,” and harnesses

Some report that they now delegate most coding to agents and act more like architects or reviewers; they claim big productivity gains, especially on web and CRUD‑style work.
Others still find agents fragile, lazy, or hallucination‑prone, particularly on games, complex architectures, or niche stacks; pure “vibe coding” often produces messy, hard‑to‑maintain code.
Strong theme: harnesses (AGENTS.md/CLAUDE.md, skills, multi‑stage pipelines: plan→design→code→test) and good test suites matter as much as, or more than, the underlying model.
Differences between top models are described as noticeable mainly at the “edge of difficulty” and in large codebases; for many tasks they feel similar.

Capabilities vs. understanding

Multiple comments stress that LLMs excel at pattern synthesis, “code that compiles,” and debugging, but lack deep conceptual understanding and do not write reliable documentation.
Benchmarks like “pelican riding a bicycle in SVG” are debated: once novel, they are now likely baked into training data and overfit; some see them as poor proxies for real reasoning.
Long context windows help, but the “smart zone” in which a model reasons well may be much smaller than the nominal window; careful task chunking and sub‑agents are common strategies.
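
Task chunking can be as simple as packing work items into batches that each stay under a self-imposed token budget, with each batch then handed to a separate sub-agent. The sketch below is an assumption-laden illustration: `chunk_by_budget` and `estimate_tokens` are hypothetical helpers, and the four-characters-per-token estimate is a rough rule of thumb rather than a real tokenizer.

```python
def estimate_tokens(text):
    # Crude assumption: roughly 4 characters per token.
    return max(1, len(text) // 4)


def chunk_by_budget(items, budget_tokens):
    """Greedily group (name, text) items into batches under a token budget.

    An item larger than the budget still gets a batch of its own, so
    nothing is silently dropped.
    """
    batches = []
    current = []
    used = 0
    for name, text in items:
        cost = estimate_tokens(text)
        if current and used + cost > budget_tokens:
            batches.append(current)
            current, used = [], 0
        current.append(name)
        used += cost
    if current:
        batches.append(current)
    return batches


files = [("a.py", "x" * 400), ("b.py", "x" * 400), ("c.py", "x" * 400)]
print(chunk_by_budget(files, budget_tokens=250))
```

Setting the budget well below the advertised context length is one way to act on the "smart zone" observation; the right value is something commenters describe finding by experiment.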

Jobs, roles, and quality

Reports of QA teams being cut and SWE headcount reduced; anxiety about future employability, especially for “feature factory” or low‑skill roles.
Counter‑argument: writing syntax is only part of the job; architecture, requirements, trade‑offs, and responsibility for outcomes still require humans.
Many suspect claims like “I never write code anymore” understate how much human steering, review, and debugging is still happening.

Security and broader impacts

Security researchers see a sharp uptick in vulnerability discovery, including many serious LPEs and supply‑chain issues, attributed to AI‑assisted analysis.
Debate over whether AI‑driven vuln finding will net‑improve security (faster defense) or fuel chaos (offense scales cheaper).
Outside coding, office workers widely use copilots for slide decks, emails, and data summaries; some educators report being either warned against or encouraged toward offloading lesson prep to AI, raising quality and engagement concerns.
Commenters worry about AI‑generated media (video, stories) displacing creative work and worsening misinformation, but also note many current outputs are still obviously flawed.
