2025-10-14

Beliefs that are true for regular software but false when applied to AI

Reliability of Old Software vs AI

Long-running non-AI systems are often more operationally reliable because they’ve been exercised in production, patched, and surrounded with procedures and workarounds.
Commenters distinguish code quality from product reliability: hacks can improve user-visible behavior while making code worse.
Others push back: many old codebases are still terrible; survivorship bias and management priorities skew which systems mature.

Nature of Bugs: Code vs Data

In classic software, people think bugs are in code, but many issues arise from config, deployment environment, concurrency, or integration.
For LLMs, the article’s claim “bugs come from training data” is criticized as oversimplified: even with “perfect” data, finite models and interpolation guarantee failures.
Some stress that LLMs optimize for plausibility, not correctness; they lack an internal mechanism to verify logic, so they systematically produce confident errors.

Determinism, Non‑Determinism, and “Fixing” AI

Deterministic software lets you reason about “all inputs,” enumerate and regress bugs, and expect the same behavior each run.
Neural networks are continuous, high-dimensional systems: tiny input changes can flip outputs; “counting bugs” or proving global properties is essentially intractable.
The only practical levers for improving models are dataset, loss/objective, architecture, and hyperparameters—more like empirical science than traditional debugging.
Non-deterministic sampling (temperature, top‑k/p) is both a quality tool and a source of unpredictability, not just a “realism” trick.

Safety, Power, and Misuse

Many see concentrated human power plus AI as the main danger: surveillance, manipulation, and strengthened authoritarianism, not sci‑fi “Matrix batteries.”
Others worry about information pollution: AI-generated text and images drowning out authentic sources and breaking search.
The “lethal trifecta” pattern (models given untrusted inputs, access to secrets, and external actions) is flagged as structurally risky, especially via tool protocols like MCP.
Sandbox ideas are discussed but seen as leaky once models can influence humans or networked systems.

Current Capabilities and Limits

Several developers report LLMs failing badly on real coding tasks (loops of broken unit tests, shallow debugging), reinforcing skepticism about near-term AGI.
Others counter with rapid capability gains and empirical studies suggesting task competence is improving on a steep curve, though limits of the current paradigm are debated.

Critiques of the Article’s Framing

Some argue the “true for regular software, false for AI” bullets were never really true even for traditional software (e.g., regressions, specs vs reality).
Others defend them as deliberately simplified to explain to non-technical managers why “just fix the bug in the code” doesn’t map to modern LLMs.
There is broad agreement that nobody really “understands” LLM internals at a human-comprehensible level, despite knowing the math and training process.

Related topics