2026-01-14

Claude is good at assembling blocks, but still falls apart at creating them

Perceived Capability: “Good Junior Dev,” Not Senior

Many compare Claude and similar tools to a competent junior developer: fast at localized tasks, but needing close review and architectural guidance.
Some report shipping “most of their code” with Claude (Opus 4.5), including production systems, with clear gains in velocity and bug-fixing (e.g., generating PRs from Datadog errors).
Others argue even a good human junior is still more capable, especially at handling ambiguity and understanding systems.

Abstraction, Architecture, and API Design

Strong consensus that LLMs excel at filling in details (implementing features, wiring code, refactoring with specific patterns) but struggle to invent good abstractions, APIs, or module boundaries without human direction.
Examples: inefficient data copying instead of rearchitecting for pointer-style sharing; poor React component design; Python code with nested ifs, mis-scoped imports, and swallowed exceptions.
Several note this mirrors the median human developer: most people are bad at API design and high-level abstraction anyway.
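
To make the Python complaints above concrete, here is a minimal sketch of the patterns commenters describe (nested ifs, a swallowed exception) next to a flatter alternative. The function names and the config shape (`server.port`) are hypothetical, invented for illustration; they are not taken from the discussion.

```python
import json  # imported at module level rather than inside the function body


def load_port_nested(raw):
    """Anti-pattern sketch: nested ifs and a swallowed exception."""
    try:
        cfg = json.loads(raw)
        if cfg is not None:
            if "server" in cfg:
                if "port" in cfg["server"]:
                    return cfg["server"]["port"]
    except Exception:
        pass  # the error disappears; the caller cannot tell why it got None
    return None


def load_port(raw):
    """Flatter sketch: guard clauses, a narrow exception, explicit failure."""
    try:
        cfg = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"config is not valid JSON: {exc}") from exc
    server = cfg.get("server") if isinstance(cfg, dict) else None
    if not isinstance(server, dict) or "port" not in server:
        raise ValueError("config is missing server.port")
    return server["port"]
```

Both functions return the port for well-formed input. They differ on bad input: the first quietly returns `None`, while the second raises a `ValueError` that says what went wrong.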

“Just Search” vs Compression and Novelty

One camp frames LLMs as “really good search”: semantic retrieval over training data plus the user's code, recombining known patterns. This mental model helps set realistic expectations: great at mapping, translating, and modifying; weak at truly “from scratch” creation.
Others call “just search” reductive, likening it to calling CPUs “just transistor states.” They emphasize:
- LLMs act as lossy probabilistic compressors of human knowledge, synthesizing and recombining concepts.
- Internal “circuits” and conceptual relationships can enable interpolation, limited extrapolation, and emergent reasoning-like behavior.
The debate over whether outputs can ever be genuinely novel, rather than only “novel to the user,” continues without consensus.

Reliability, Hallucinations, and Verification

Experiences are highly mixed: some see large quality improvements and fewer hallucinations over time; others still hit made-up APIs, types, or misleading solutions that waste time.
Automated checks (e.g., static typing, linting, tests, formal methods) can catch many hallucinations in code, but most domains outside programming lack comparable verifiers.
A common pattern: Claude often chooses minimal or local edits, sometimes suboptimal globally; attempts to correct via CLAUDE.md–style instructions have only partial success.
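
As a rough illustration of the verifier idea mentioned in this section, the sketch below flags references to module attributes that do not exist, which is one cheap way to catch a made-up API before running generated code. The helper name `find_missing_attributes` is hypothetical, and the check is deliberately narrow: it only handles plain `import x` and `import x as y` statements and direct `x.attr` references. Note that it imports the named modules in order to inspect them, so it should only be pointed at modules you already trust.

```python
import ast
import importlib


def find_missing_attributes(source):
    """Return sorted 'module.attr' names that the source uses but the module lacks."""
    tree = ast.parse(source)

    # Map each local alias to the imported module object.
    modules = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if "." in alias.name:
                    continue  # dotted imports are skipped to keep the sketch small
                local_name = alias.asname or alias.name
                modules[local_name] = importlib.import_module(alias.name)

    # Collect attribute references on those aliases that the module does not define.
    missing = set()
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Attribute)
            and isinstance(node.value, ast.Name)
            and node.value.id in modules
        ):
            module = modules[node.value.id]
            if not hasattr(module, node.attr):
                missing.add(f"{module.__name__}.{node.attr}")
    return sorted(missing)


generated = "import json\ndata = json.loads('{}')\nother = json.parse('{}')\n"
print(find_missing_attributes(generated))
```

Here `json.parse` is reported because Python's `json` module has no such function, while `json.loads` passes. A type checker or a test suite covers far more ground; this only shows why code is unusually easy to verify compared with other domains.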

Workflow, Learning, and Productivity

Many feel “unlocked”: able to try more ideas, run more experiments, and explore design alternatives quickly, similar to the shift from film to digital.
Others worry this leads to shallow thinking: quick prototyping replacing deeper internal reasoning and design “marination.”
On learning: some say LLMs accelerate conceptual understanding by enabling more experiments; others feel they learn little unless they deeply review and debug the generated code themselves.

Future Trajectory and Limits

One side sees a moving frontier: LLMs have progressed from small functions to multi-file subsystems, so higher-level abstraction and multi-service design may improve similarly within 6–12 months.
Another side argues there are hard ceilings:
- Persistent failures at abstraction and hallucinations despite scale-ups.
- Training on “random internet code” bakes in bad patterns; prompts can’t fully fix that.
No agreement on whether we’re nearing a plateau or just mid-curve.

Organizational and Economic Implications

Speculation ranges from “mid-level-equivalent AI would be revolutionary” to tongue-in-cheek visions of a CEO plus fleets of agents (and even questioning whether a CEO would still be needed).
Some foresee boards and owners still wanting a human “ringable neck” and guardian against misaligned AI-provider incentives.
Broad concern that another path to the middle class (junior dev work) is narrowing, even if senior design and oversight roles remain human for the foreseeable future.
