The coming knowledge-work supply-chain crisis
Reliability, Confidence, and “Calculator” Analogies
- Many see the core flaw this way: LLMs are confidently and unpredictably wrong, so every output must be reviewed.
- Calls for "confidence scores" run into a basic objection: token probabilities measure how much the output "looks like human text," not whether it is true (see the sketch after this list).
- Models can still hallucinate obviously wrong things (e.g., glue on pizza, fake command-line flags, imaginary hardware).
- Unlike calculators (a single right answer, rare failures, no one re-checks), language is inherently probabilistic and admits many valid answers; people fear we'll treat LLMs like calculators anyway and stop checking.
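
To make the "confidence score" objection concrete, here is a minimal sketch. It assumes per-token log-probabilities are available from some model API; the tokens and values below are invented for illustration. The point is that a probability-derived score measures how fluent and statistically typical the text is, not whether it is factually correct.

```python
import math

# Hypothetical (token, logprob) pairs for a generated answer, as a model API
# might return them. The values are made up for illustration only.
generated = [("The", -0.1), ("capital", -0.3), ("of", -0.05),
             ("Australia", -0.4), ("is", -0.02), ("Sydney", -1.2), (".", -0.01)]

# A naive "confidence score": the geometric mean of token probabilities,
# i.e. exp of the mean log-probability. A high score means the text looks
# like typical human text; the confidently wrong "Sydney" barely dents it.
mean_logprob = sum(lp for _, lp in generated) / len(generated)
confidence = math.exp(mean_logprob)
print(f"naive confidence: {confidence:.2f}")  # ~0.74, despite the factual error
```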
Review Burden and Hypervigilance
- You can get 9 good PRs then a catastrophic 10th, so reviewers must treat all LLM code like risky intern work.
- Passive oversight is cognitively exhausting; parallels are drawn to self-driving cars and self-checkout, where "monitoring" turns out to be a task humans do poorly.
- Senior engineers report burnout from reviewing growing volumes of often‑opaque AI code, with little mentoring payoff.
Juniors, Learning Ladders, and Labor Structure
- Concern: if LLMs do “junior” work, how do humans gain the experience needed to become seniors?
- Counter: LLMs mostly replace “copy‑from‑StackOverflow” coders; serious juniors still read docs, reason, and learn.
- Some foresee law/accounting–style pyramids: layers of juniors and seniors iteratively editing AI output.
- Others argue LLMs don’t learn from feedback today, so “tutoring the model” yields no compounding return.
Testing, Specs, and Viable Use Cases
- Strong theme: rely less on trust, more on tests and (ideally) formal methods.
- Proposed workflow: have the LLM generate tests, review those by hand, then iterate LLM-generated code until the tests pass (a sketch follows this list).
- LLMs are viewed as very useful for: bug‑finding, code search, low‑risk utilities, info retrieval with human fact‑checking, and ambient dictation in medicine.
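
A minimal sketch of that test-first loop, assuming a human-reviewed pytest suite already exists and some LLM client sits behind a hypothetical `ask_llm_for_code` helper (the helper name, file paths, and prompts are illustrative, not from the article or any specific library):

```python
import subprocess
import sys

def ask_llm_for_code(prompt: str) -> str:
    """Placeholder for whatever model client you use; swap in your own call."""
    raise NotImplementedError("plug in your LLM client here")

def run_tests(test_path: str = "tests/") -> tuple[bool, str]:
    """Run the human-reviewed test suite and return (passed, combined output)."""
    result = subprocess.run(
        [sys.executable, "-m", "pytest", test_path, "-q"],
        capture_output=True, text=True,
    )
    return result.returncode == 0, result.stdout + result.stderr

def iterate_until_green(spec: str, target_file: str, max_rounds: int = 5) -> bool:
    """Ask the model for an implementation, run the reviewed tests,
    and feed failures back until the suite passes or we give up."""
    prompt = spec
    for _ in range(max_rounds):
        code = ask_llm_for_code(prompt)
        with open(target_file, "w") as f:
            f.write(code)
        passed, output = run_tests()
        if passed:
            return True
        # Feed the failing test output back to the model and try again.
        prompt = f"{spec}\n\nYour last attempt failed these tests:\n{output}"
    return False
```

The reviewed tests, not the reviewer's trust in the model, act as the gate; the human effort shifts from reading every line of generated code to auditing the spec and the test suite.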
Meaning-Making and Decision Work
- The article's claim that "meaning-making" is uniquely human is contested: ML can score options given criteria, but humans must define those criteria and still have to beat other humans (e.g., in trading).
- Others argue the hardest part is externalizing tacit expert judgment into explicit frameworks models (and juniors) can use.
Organizational, Data, and Job-Quality Concerns
- Fear of complacency once models feel “95% right,” enabling subtle errors, manipulation, or prompt‑injection–style attacks.
- Worries that future training data will degrade (enshittified web, AI‑generated noise), reducing model quality.
- Many dislike a future where skilled people mostly validate stochastic parrots, analogous to self‑checkout supervisors or outsourced body‑shops.
- Several commenters think “decision velocity” and exponential productivity are overstated; real bottlenecks are prioritization, strategy, user adoption, and maintaining quality.