Beliefs that are true for regular software but false when applied to AI

Reliability of Old Software vs AI

  • Long-running non-AI systems are often more operationally reliable because they’ve been exercised in production, patched, and surrounded with procedures and workarounds.
  • Commenters distinguish code quality from product reliability: hacks can improve user-visible behavior while making code worse.
  • Others push back: many old codebases are still terrible; survivorship bias and management priorities skew which systems mature.

Nature of Bugs: Code vs Data

  • In classic software, people think bugs are in code, but many issues arise from config, deployment environment, concurrency, or integration.
  • For LLMs, the article’s claim “bugs come from training data” is criticized as oversimplified: even with “perfect” data, finite models and interpolation guarantee failures.
  • Some stress that LLMs optimize for plausibility, not correctness; they lack an internal mechanism to verify logic, so they systematically produce confident errors.

Determinism, Non‑Determinism, and “Fixing” AI

  • Deterministic software lets you reason about “all inputs,” enumerate and regress bugs, and expect the same behavior each run.
  • Neural networks are continuous, high-dimensional systems: tiny input changes can flip outputs; “counting bugs” or proving global properties is essentially intractable.
  • The only practical levers for improving models are dataset, loss/objective, architecture, and hyperparameters—more like empirical science than traditional debugging.
  • Non-deterministic sampling (temperature, top‑k/p) is both a quality tool and a source of unpredictability, not just a “realism” trick.

Safety, Power, and Misuse

  • Many see concentrated human power plus AI as the main danger: surveillance, manipulation, and strengthened authoritarianism, not sci‑fi “Matrix batteries.”
  • Others worry about information pollution: AI-generated text and images drowning out authentic sources and breaking search.
  • The “lethal trifecta” pattern (models given untrusted inputs, access to secrets, and external actions) is flagged as structurally risky, especially via tool protocols like MCP.
  • Sandbox ideas are discussed but seen as leaky once models can influence humans or networked systems.

Current Capabilities and Limits

  • Several developers report LLMs failing badly on real coding tasks (loops of broken unit tests, shallow debugging), reinforcing skepticism about near-term AGI.
  • Others counter with rapid capability gains and empirical studies suggesting task competence is improving on a steep curve, though limits of the current paradigm are debated.

Critiques of the Article’s Framing

  • Some argue the “true for regular software, false for AI” bullets were never really true even for traditional software (e.g., regressions, specs vs reality).
  • Others defend them as deliberately simplified to explain to non-technical managers why “just fix the bug in the code” doesn’t map to modern LLMs.
  • There is broad agreement that nobody really “understands” LLM internals at a human-comprehensible level, despite knowing the math and training process.