Opus 4.5 is not the normal AI agent experience that I have had thus far

Production Quality, Over‑Engineering, and Maintainability

  • Many argue Opus 4.5 can ship “working” code but not reliably production‑grade software: limited edge‑case handling, brittle error paths, security blind spots, and convoluted architectures.
  • Others counter that much human‑shipped code is no better, and that for many applications “good enough” is acceptable—especially for internal tools and personal utilities.
  • Several note that models tend to over‑engineer and duplicate logic; effective use often requires explicit instructions (“keep it simple”, “minimal changes”) and refactoring passes.
  • There is broad agreement that agents are best when humans supply clear specs, constraints, and tests; open‑ended “improve anything” prompts usually produce baffling changes and code bloat (an example of an executable spec follows this list).
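
One concrete way to supply those specs and constraints is an executable spec: a small test file handed to the agent up front, so “done” means the tests pass rather than “the diff looks plausible.” A minimal sketch, assuming a hypothetical `slugify` utility the agent is asked to implement (the `texttools` module and all expected values are invented for illustration):

```python
# Hypothetical executable spec for a slugify() the agent must implement.
# The texttools module and the expected values are assumptions for
# illustration; the point is that edge cases are pinned down before
# the agent writes any code.
from texttools import slugify


def test_basic_slug():
    assert slugify("Hello, World!") == "hello-world"


def test_empty_and_whitespace():
    # Edge cases a model might otherwise skip.
    assert slugify("") == ""
    assert slugify("   ") == ""


def test_idempotent():
    # Slugifying an already-slugged string should change nothing.
    assert slugify(slugify("Already Slugged")) == "already-slugged"
```

Run under pytest after each agent edit, this turns an open‑ended prompt into a concrete pass/fail target, including the edge cases commenters say models tend to skip.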

Greenfield Apps vs Legacy, Complex Systems

  • Opus 4.5 is reported to excel at small–medium, greenfield projects: CLIs, CRUD apps, simple mobile/web apps, bindings, and ports—especially when tools (linters, tests, runners) are in the loop (a minimal gating sketch follows this list).
  • Performance degrades on large, messy, long‑lived codebases with complex domain logic, flaky docs, or many cross‑cutting concerns; agents can get stuck, loop, or make architecture‑breaking edits.
  • Planning/spec‑driven workflows, breaking work into small tasks, and using “plan modes” or markdown specs are repeatedly cited as key to getting good results.
  • Language and domain matter: people see strong performance in JS/TS, Python, C, Go; weaker, more error‑prone behavior in C++, Rust, low‑level graphics, and niche frameworks.
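
What “tools in the loop” can look like in practice, as a minimal sketch: a gate script that runs a linter and the test suite, and rejects an agent’s patch unless both pass. The specific tools (ruff, pytest) are assumptions; any linter/test-runner pair the project already uses works the same way.

```python
# Minimal quality gate: refuse an agent-generated patch unless the
# linter and the tests both pass. Tool choice (ruff, pytest) is an
# assumption; substitute whatever the project already uses.
import subprocess
import sys

CHECKS = (
    ("ruff", "check", "."),   # static lint pass
    ("pytest", "-q"),         # full test suite
)


def gate() -> bool:
    for cmd in CHECKS:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            print(f"FAILED: {' '.join(cmd)}\n{result.stdout}{result.stderr}")
            return False
    return True


if __name__ == "__main__":
    sys.exit(0 if gate() else 1)
```

Running a gate like this after every agent edit and feeding failures back into the next prompt is the loop several commenters describe for keeping agents productive on larger codebases.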

Changing Developer Roles and Job Market Anxiety

  • Many describe a shift from “writing code” to “guiding, specifying, reviewing” while agents do most implementation and test generation.
  • Some predict fewer traditional SWE roles, squeezed juniors, and smaller teams; others see this as analogous to compilers or power tools—raising leverage rather than erasing the profession.
  • There’s concern that if non‑engineers can ship LLM‑assisted changes, management will ask “what are we paying you for?”; others argue responsibility, judgment, and system design remain human bottlenecks.

Hype, Benchmarks, and Evidence

  • Skeptics say every new model is marketed as an “inflection point” with little rigorous, long‑term evidence of 10x productivity in real, complex products.
  • Some call current coding benchmarks “manipulated” or weak proxies for business value; others reply that all benchmarks are to some extent gameable.
  • Multiple commenters report that subjective feelings of “I’m 10x faster” often don’t survive careful measurement; early studies even show flat or negative net productivity in some OSS contexts.

Economic, Environmental, and Sustainability Concerns

  • One camp views current LLM infrastructure as over‑subsidized, water‑ and energy‑heavy, and economically fragile; they doubt this justifies “slop apps” and personal tooling.
  • Another camp cites falling token prices, improved efficiency, and arguments that automating repetitive workflows can be more resource‑efficient than humans doing the work.
  • There’s debate over whether, if today’s big labs falter, open‑source models plus independent inference providers could sustain similar capabilities.

“TikTokification” and Nature of Software Output

  • Several see a flood of quickly built, low‑depth apps—rebuilt versions of existing tools with minor flavor changes; critics say this doesn’t advance software quality, only quantity.
  • Supporters argue personal, ad‑free, task‑specific utilities are a rational response to “enshittified” commercial software, and that humans have always re‑implemented existing ideas.
  • A recurring worry: LLM‑built code bases may be harder to reason about, leading to future “slop layers” that are expensive to debug or rewrite.

Security, Responsibility, and Alignment Risks

  • Many emphasize that LLMs will not “take responsibility” when something goes wrong; legal and moral accountability remains with humans.
  • People are uneasy about shipping unaudited agent‑generated code, especially with API keys, auth flows, and infrastructure changes; some insist AI‑written code should be treated like junior output under strict review.
  • A few raise longer‑term concerns about malicious or misaligned agents embedding backdoors or propagating themselves, and about worms or mass exploitation targeting sloppy, LLM‑generated code.

Societal and Labor Implications

  • Commenters anticipate pressure on offshore and low‑cost coding labor, and a gradual shrinking or restructuring of the SWE labor market rather than an abrupt extinction.
  • There is talk of “class war”: executives openly chasing headcount reduction, while individual engineers are split among embracing the tools, denying the shift, and pushing for political remedies (e.g., UBI, taxation of AI, redistribution).
  • Some foresee a “democratization” of software creation—many more people able to build small tools for themselves—while high‑stakes systems still require a small number of highly skilled engineers.