The coming knowledge-work supply-chain crisis

Reliability, Confidence, and “Calculator” Analogies

  • Many see the core flaw as this: LLMs are confidently and unpredictably wrong, so every output must be reviewed.
  • Calls for “confidence scores” run into a basic obstacle: token probabilities don’t map to truth, only to “looks like plausible human text” (see the sketch after this list).
  • Models can still hallucinate obviously wrong things (e.g., glue on pizza, fake command-line flags, imaginary hardware).
  • Unlike calculators (a single right answer, rare failures, no one re-checks the result), language is inherently probabilistic and admits many valid answers; people fear we’ll treat LLMs like calculators anyway and stop checking.
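
A minimal sketch of why raw “confidence” numbers mislead, with per-token log-probabilities invented for illustration: aggregating them yields a fluency score (how typical the text looks to the model), not a truth score.

```python
import math

def sequence_confidence(token_logprobs):
    """Naive 'confidence': mean token log-probability mapped back to a probability.
    It measures how typical the text looks to the model, not whether it is true."""
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_logprob)

# Invented per-token log-probabilities for a fluent but factually wrong answer.
# A confidently hallucinated sentence can score just as high as a correct one.
wrong_but_fluent = [-0.1, -0.3, -0.2, -0.15, -0.25]
print(f"naive confidence: {sequence_confidence(wrong_but_fluent):.2f}")  # ~0.82
```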

Review Burden and Hypervigilance

  • You can get nine good PRs and then a catastrophic tenth, so reviewers must treat all LLM‑generated code like risky intern work.
  • Passive oversight is cognitively exhausting; parallels with self‑driving cars and self‑checkout suggest that sustained “monitoring” is a task humans do poorly.
  • Senior engineers report burnout from reviewing growing volumes of often‑opaque AI code, with little mentoring payoff.

Juniors, Learning Ladders, and Labor Structure

  • Concern: if LLMs do “junior” work, how do humans gain the experience needed to become seniors?
  • Counter: LLMs mostly replace “copy‑from‑StackOverflow” coders; serious juniors still read docs, reason, and learn.
  • Some foresee law/accounting–style pyramids: layers of juniors and seniors iteratively editing AI output.
  • Others argue LLMs don’t learn from feedback today, so “tutoring the model” yields no compounding return.

Testing, Specs, and Viable Use Cases

  • Strong theme: rely less on trust, more on tests and (ideally) formal methods.
  • Proposed workflow: have the LLM generate tests, review those tests by hand, then iterate LLM‑generated code until the tests pass (a minimal loop is sketched after this list).
  • LLMs are viewed as very useful for: bug‑finding, code search, low‑risk utilities, info retrieval with human fact‑checking, and ambient dictation in medicine.
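
A minimal sketch of that test-first loop, assuming a hypothetical `generate_code` wrapper around whichever model is in use and a human-reviewed pytest suite in `tests/` that the model never edits:

```python
import subprocess

MAX_ATTEMPTS = 5

def generate_code(prompt: str) -> str:
    """Placeholder for whatever LLM call is in use (hypothetical)."""
    raise NotImplementedError

def run_tests() -> subprocess.CompletedProcess:
    # The human-reviewed tests are the fixed point; the model only sees their output.
    return subprocess.run(["pytest", "tests/", "-q"], capture_output=True, text=True)

def test_first_loop(spec: str, target_file: str) -> bool:
    prompt = spec
    for _ in range(MAX_ATTEMPTS):
        code = generate_code(prompt)
        with open(target_file, "w") as f:
            f.write(code)
        result = run_tests()
        if result.returncode == 0:
            return True  # tests green; still worth a final human read
        # Feed the failing output back so the next attempt can correct itself.
        prompt = f"{spec}\n\nPrevious attempt failed these tests:\n{result.stdout}"
    return False  # give up and escalate to a human after MAX_ATTEMPTS
```

This concentrates review effort where it scales: on the tests and the final diff, rather than on every intermediate generation.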

Meaning-Making and Decision Work

  • The article’s claim that “meaningmaking” is uniquely human is contested: ML can score options given criteria, but humans must define those criteria and out‑compete other humans doing the same (e.g., in trading); a toy sketch of that split follows this list.
  • Others argue the hardest part is externalizing tacit expert judgment into explicit frameworks that models (and juniors) can use.
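
A toy sketch of that division of labor, with invented criteria, weights, and numbers: the scoring and ranking are mechanical once the criteria exist, but nothing in the code decides which criteria matter or how much.

```python
# Human judgment enters here: which criteria matter, and how much.
CRITERIA_WEIGHTS = {"expected_return": 0.5, "downside_risk": -0.3, "liquidity": 0.2}

def score_option(features: dict) -> float:
    """The mechanical part a model can do: score an option against given criteria."""
    return sum(w * features.get(name, 0.0) for name, w in CRITERIA_WEIGHTS.items())

# Hypothetical candidate trades with features already estimated.
options = {
    "trade_a": {"expected_return": 0.8, "downside_risk": 0.6, "liquidity": 0.9},
    "trade_b": {"expected_return": 0.5, "downside_risk": 0.2, "liquidity": 0.7},
}
ranked = sorted(options, key=lambda name: score_option(options[name]), reverse=True)
print(ranked)  # the ranking is only as good as the human-chosen weights
```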

Organizational, Data, and Job-Quality Concerns

  • Fear of complacency once models feel “95% right,” enabling subtle errors, manipulation, or prompt‑injection–style attacks.
  • Worries that future training data will degrade (enshittified web, AI‑generated noise), reducing model quality.
  • Many dislike a future where skilled people mostly validate stochastic parrots, analogous to self‑checkout supervisors or outsourced body‑shops.
  • Several commenters think “decision velocity” and exponential productivity are overstated; real bottlenecks are prioritization, strategy, user adoption, and maintaining quality.