Beliefs that are true for regular software but false when applied to AI
Reliability of Old Software vs AI
- Long-running non-AI systems are often more operationally reliable because they’ve been exercised in production, patched, and surrounded with procedures and workarounds.
- Commenters distinguish code quality from product reliability: hacks can improve user-visible behavior while making code worse.
- Others push back: many old codebases are still terrible; survivorship bias and management priorities skew which systems mature.
Nature of Bugs: Code vs Data
- In classic software, people assume bugs live in the code, but many issues actually arise from configuration, the deployment environment, concurrency, or integration.
- For LLMs, the article’s claim “bugs come from training data” is criticized as oversimplified: even with “perfect” data, finite models and interpolation guarantee failures (see the sketch after this list).
- Some stress that LLMs optimize for plausibility, not correctness; they lack an internal mechanism to verify logic, so they systematically produce confident errors.
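A minimal sketch of that “perfect data still fails” argument, using an invented toy problem (the target function, model family, and sample count are assumptions chosen for illustration): a finite-capacity model fit to exact, noise-free samples still errs between the points it saw and especially beyond them.

```python
# Sketch: a finite-capacity model fit on "perfect" (noise-free) data still
# errs on inputs it never saw. The target function, model family, and
# sample count are illustrative choices, not from the discussion.
import numpy as np

target = np.sin

# "Perfect" training data: exact, noise-free samples.
x_train = np.linspace(0, 3, 8)
y_train = target(x_train)

# Finite model: a degree-3 polynomial cannot represent sin exactly.
model = np.poly1d(np.polyfit(x_train, y_train, deg=3))

x_between = np.linspace(0, 3, 200)   # between the training points
x_beyond = np.linspace(3, 6, 200)    # outside the training range

print("max |error| on training inputs :",
      f"{np.max(np.abs(model(x_train) - target(x_train))):.4f}")
print("max |error| between inputs     :",
      f"{np.max(np.abs(model(x_between) - target(x_between))):.4f}")
print("max |error| outside the range  :",
      f"{np.max(np.abs(model(x_beyond) - target(x_beyond))):.4f}")
```

The structural point carries over to LLMs: a finite model can only interpolate within what it has represented, so flawless training data does not guarantee flawless outputs.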
Determinism, Non‑Determinism, and “Fixing” AI
- Deterministic software lets you reason about “all inputs,” enumerate and regress bugs, and expect the same behavior each run.
- Neural networks are continuous, high-dimensional systems: tiny input changes can flip outputs; “counting bugs” or proving global properties is essentially intractable (a small perturbation sketch follows this list).
- The only practical levers for improving models are dataset, loss/objective, architecture, and hyperparameters—more like empirical science than traditional debugging.
- Non-deterministic sampling (temperature, top‑k/p) is both a quality tool and a source of unpredictability, not just a “realism” trick (a sampling sketch also follows below).
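For the “tiny input changes can flip outputs” point, a toy sketch; the single random linear layer and all numbers are invented stand-ins, not measurements from any real model. It searches for the smallest nudge along one direction that changes a classifier’s decision.

```python
# Sketch: how small an input change can flip a classifier's decision.
# A single random linear layer stands in for a network; everything here
# is illustrative.
import numpy as np

rng = np.random.default_rng(1)

W = rng.normal(size=(3, 10))          # 3 classes, 10-d input

def decision(x):
    return int(np.argmax(W @ x))      # softmax omitted; argmax is unchanged

x = rng.normal(size=10)
base = decision(x)

# Push toward the runner-up class and grow the step until the decision flips.
logits = W @ x
runner_up = int(np.argsort(logits)[-2])
direction = W[runner_up] - W[base]
direction /= np.linalg.norm(direction)

eps = 1e-4
while decision(x + eps * direction) == base and eps < 1e3:
    eps *= 1.5

print(f"input norm        : {np.linalg.norm(x):.3f}")
print(f"flip perturbation : {eps:.4f} (class {base} -> "
      f"{decision(x + eps * direction)})")
```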
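And for the sampling bullet, a minimal sketch of temperature, top‑k, and top‑p (nucleus) sampling over an invented next-token distribution; the vocabulary, logits, and helper names are assumptions, not any real model’s API.

```python
# Sketch of temperature, top-k and top-p (nucleus) sampling over a toy
# next-token distribution. Vocabulary and logits are invented.
import numpy as np

rng = np.random.default_rng(42)

vocab = ["the", "cat", "sat", "on", "a", "mat", "moon"]
logits = np.array([2.0, 1.5, 1.2, 0.8, 0.5, 0.3, -1.0])

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def sample(logits, temperature=1.0, top_k=None, top_p=None):
    # Temperature rescales the logits: <1 sharpens, >1 flattens.
    probs = softmax(logits / temperature)

    if top_k is not None:
        # Keep only the k most likely tokens, renormalize.
        keep = np.argsort(probs)[-top_k:]
        mask = np.zeros_like(probs)
        mask[keep] = probs[keep]
        probs = mask / mask.sum()

    if top_p is not None:
        # Keep the smallest set of tokens whose cumulative mass reaches top_p.
        order = np.argsort(probs)[::-1]
        cutoff = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
        mask = np.zeros_like(probs)
        mask[order[:cutoff]] = probs[order[:cutoff]]
        probs = mask / mask.sum()

    return vocab[rng.choice(len(probs), p=probs)]

print([sample(logits, temperature=0.2) for _ in range(5)])  # near-greedy
print([sample(logits, temperature=1.5) for _ in range(5)])  # more varied
print([sample(logits, top_k=3) for _ in range(5)])          # only top 3 tokens
print([sample(logits, top_p=0.9) for _ in range(5)])        # nucleus sampling
```

Low temperature collapses toward greedy decoding; higher temperature or wider k/p admit more continuations, which can improve perceived quality while making outputs harder to predict or reproduce.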
Safety, Power, and Misuse
- Many see concentrated human power plus AI as the main danger: surveillance, manipulation, and strengthened authoritarianism, not sci‑fi “Matrix batteries.”
- Others worry about information pollution: AI-generated text and images drowning out authentic sources and breaking search.
- The “lethal trifecta” pattern (models given untrusted inputs, access to secrets, and external actions) is flagged as structurally risky, especially via tool protocols like MCP (a policy-check sketch follows this list).
- Sandbox ideas are discussed but seen as leaky once models can influence humans or networked systems.
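One way to operationalize the trifecta warning, sketched with invented names and a made-up grant structure (this is not MCP’s API or any real framework): a gate that refuses to start an agent session whose combined tool grants cover untrusted input, secret access, and external actions at the same time.

```python
# Sketch: refuse an agent session that combines all three legs of the
# "lethal trifecta". Names and structure are invented for illustration;
# this is not MCP or any real framework's API.
from dataclasses import dataclass

@dataclass
class ToolGrant:
    name: str
    reads_untrusted_input: bool   # e.g. fetches web pages, reads inbound email
    touches_secrets: bool         # e.g. reads API keys, private documents
    acts_externally: bool         # e.g. sends requests, writes files, emails

def trifecta_violation(grants: list[ToolGrant]) -> bool:
    """True if the combined grants cover all three risky capabilities."""
    return (
        any(g.reads_untrusted_input for g in grants)
        and any(g.touches_secrets for g in grants)
        and any(g.acts_externally for g in grants)
    )

session_tools = [
    ToolGrant("web_search", True, False, False),
    ToolGrant("vault_read", False, True, False),
    ToolGrant("send_email", False, False, True),
]

if trifecta_violation(session_tools):
    raise PermissionError(
        "Refusing session: untrusted input + secrets + external actions"
    )
```

Dropping any one leg, for example denying secret access or only acting on trusted input, breaks the combination the commenters flag as structurally risky.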
Current Capabilities and Limits
- Several developers report LLMs failing badly on real coding tasks (loops of broken unit tests, shallow debugging), reinforcing skepticism about near-term AGI.
- Others counter with rapid capability gains and empirical studies suggesting task competence is improving on a steep curve, though limits of the current paradigm are debated.
Critiques of the Article’s Framing
- Some argue the “true for regular software, false for AI” bullets were never really true even for traditional software (e.g., regressions reappear, and specs rarely match actual behavior).
- Others defend them as deliberately simplified to explain to non-technical managers why “just fix the bug in the code” doesn’t map to modern LLMs.
- There is broad agreement that nobody really “understands” LLM internals at a human-comprehensible level, despite knowing the math and training process.