Opus 4.5 is not the normal AI agent experience that I have had thus far

Production Quality, Over‑Engineering, and Maintainability

  • Many argue Opus 4.5 can ship “working” code but not reliably production‑grade software: limited edge‑case handling, brittle error paths, security blind spots, and convoluted architectures.
  • Others counter that much human‑shipped code is no better, and that for many applications “good enough” is acceptable—especially for internal tools and personal utilities.
  • Several note that models tend to over‑engineer and duplicate logic; effective use often requires explicit instructions (“keep it simple”, “minimal changes”) and refactoring passes.
  • There is broad agreement that agents are best when humans supply clear specs, constraints, and tests; open‑ended “improve anything” prompts usually produce baffling changes and code bloat (an example of an executable spec follows this list).
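
One concrete way to supply those specs and constraints is an executable spec: a small test file handed to the agent up front, so “done” means the tests pass rather than “the diff looks plausible.” A minimal sketch, assuming a hypothetical `slugify` utility the agent is asked to implement (the `texttools` module and all expected values are invented for illustration):

```python
# Hypothetical executable spec for a slugify() the agent must implement.
# The texttools module and the expected values are assumptions for
# illustration; the point is that edge cases are pinned down before
# the agent writes any code.
from texttools import slugify


def test_basic_slug():
    assert slugify("Hello, World!") == "hello-world"


def test_empty_and_whitespace():
    # Edge cases a model might otherwise skip.
    assert slugify("") == ""
    assert slugify("   ") == ""


def test_idempotent():
    # Slugifying an already-slugged string should change nothing.
    assert slugify(slugify("Already Slugged")) == "already-slugged"
```

Run under pytest after each agent edit, this turns an open‑ended prompt into a concrete pass/fail target, including the edge cases commenters say models tend to skip.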

Greenfield Apps vs Legacy, Complex Systems

  • Opus 4.5 is reported to excel at small–medium, greenfield projects: CLIs, CRUD apps, simple mobile/web apps, bindings, and ports—especially when tools (linters, tests, runners) are in the loop (a minimal gating sketch follows this list).
  • Performance degrades on large, messy, long‑lived codebases with complex domain logic, flaky docs, or many cross‑cutting concerns; agents can get stuck, loop, or make architecture‑breaking edits.
  • Planning/spec‑driven workflows, breaking work into small tasks, and using “plan modes” or markdown specs are repeatedly cited as key to getting good results.
  • Language and domain matter: people see strong performance in JS/TS, Python, C, Go; weaker, more error‑prone behavior in C++, Rust, low‑level graphics, and niche frameworks.
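
What “tools in the loop” can look like in practice, as a minimal sketch: a gate script that runs a linter and the test suite, and rejects an agent’s patch unless both pass. The specific tools (ruff, pytest) are assumptions; any linter/test-runner pair the project already uses works the same way.

```python
# Minimal quality gate: refuse an agent-generated patch unless the
# linter and the tests both pass. Tool choice (ruff, pytest) is an
# assumption; substitute whatever the project already uses.
import subprocess
import sys

CHECKS = (
    ("ruff", "check", "."),   # static lint pass
    ("pytest", "-q"),         # full test suite
)


def gate() -> bool:
    for cmd in CHECKS:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            print(f"FAILED: {' '.join(cmd)}\n{result.stdout}{result.stderr}")
            return False
    return True


if __name__ == "__main__":
    sys.exit(0 if gate() else 1)
```

Running a gate like this after every agent edit and feeding failures back into the next prompt is the loop several commenters describe for keeping agents productive on larger codebases.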

Changing Developer Roles and Job Market Anxiety

  • Many describe a shift from “writing code” to “guiding, specifying, reviewing” while agents do most implementation and test generation.
  • Some predict fewer traditional SWE roles, squeezed juniors, and smaller teams; others see this as analogous to compilers or power tools—raising leverage rather than erasing the profession.
  • There’s concern that if non‑engineers can ship LLM‑assisted changes, management will ask “what are we paying you for?”; others argue responsibility, judgment, and system design remain human bottlenecks.

Hype, Benchmarks, and Evidence

  • Skeptics say every new model is marketed as an “inflection point” with little rigorous, long‑term evidence of 10x productivity in real, complex products.
  • Some call current coding benchmarks “manipulated” or weak proxies for business value; others reply that all benchmarks are to some extent gameable.
  • Multiple commenters report that subjective feelings of “I’m 10x faster” often don’t survive careful measurement; early studies even show flat or negative net productivity in some OSS contexts.

Economic, Environmental, and Sustainability Concerns

  • One camp views current LLM infrastructure as over‑subsidized, water‑ and energy‑heavy, and economically fragile; they doubt this justifies “slop apps” and personal tooling.
  • Another camp cites falling token prices, improved efficiency, and arguments that automating repetitive workflows can be more resource‑efficient than humans doing the work.
  • There’s debate over whether, if today’s big labs falter, open‑source models plus independent inference providers could sustain similar capabilities.

“TikTokification” and Nature of Software Output

  • Several see a flood of quickly built, low‑depth apps—rebuilt versions of existing tools with minor flavor changes; critics say this doesn’t advance software quality, only quantity.
  • Supporters argue personal, ad‑free, task‑specific utilities are a rational response to “enshittified” commercial software, and that humans have always re‑implemented existing ideas.
  • A recurring worry: LLM‑built code bases may be harder to reason about, leading to future “slop layers” that are expensive to debug or rewrite.

Security, Responsibility, and Alignment Risks

  • Many emphasize that LLMs will not “take responsibility” when something goes wrong; legal and moral accountability remains with humans.
  • People are uneasy about shipping unaudited agent‑generated code, especially with API keys, auth flows, and infrastructure changes; some insist AI‑written code should be treated like junior output under strict review.
  • A few raise longer‑term concerns about malicious or misaligned agents embedding backdoors or propagating themselves, and about worms or mass exploitation targeting sloppy, LLM‑generated code.

Societal and Labor Implications

  • Commenters anticipate pressure on offshore and low‑cost coding labor, and a gradual shrinking or restructuring of the SWE labor market rather than an abrupt extinction.
  • There is talk of “class war”: executives openly chasing headcount reduction, while individual engineers are split among embracing the tools, denying the shift, and pushing for political remedies (e.g., UBI, taxation of AI, redistribution).
  • Some foresee a “democratization” of software creation—many more people able to build small tools for themselves—while high‑stakes systems still require a small number of highly skilled engineers.