The coming knowledge-work supply-chain crisis

Reliability, Confidence, and “Calculator” Analogies

  • Many see the core flaw as this: LLMs are confidently and unpredictably wrong, so every output must be reviewed.
  • Calls for “confidence scores” run into a basic obstacle: token probabilities don’t map to truth, only to “looks like plausible human text” (see the sketch after this list).
  • Models can still hallucinate obviously wrong things (e.g., glue on pizza, fake command-line flags, imaginary hardware).
  • Unlike calculators (a single right answer, rare failures, no one re-checks the result), language is inherently probabilistic and admits many valid answers; people fear we’ll treat LLMs like calculators anyway and stop checking.
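
A minimal sketch of why raw “confidence” numbers mislead, with per-token log-probabilities invented for illustration: aggregating them yields a fluency score (how typical the text looks to the model), not a truth score.

```python
import math

def sequence_confidence(token_logprobs):
    """Naive 'confidence': mean token log-probability mapped back to a probability.
    It measures how typical the text looks to the model, not whether it is true."""
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_logprob)

# Invented per-token log-probabilities for a fluent but factually wrong answer.
# A confidently hallucinated sentence can score just as high as a correct one.
wrong_but_fluent = [-0.1, -0.3, -0.2, -0.15, -0.25]
print(f"naive confidence: {sequence_confidence(wrong_but_fluent):.2f}")  # ~0.82
```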

Review Burden and Hypervigilance

  • You can get nine good PRs and then a catastrophic tenth, so reviewers must treat all LLM‑generated code like risky intern work.
  • Passive oversight is cognitively exhausting; parallels with self‑driving cars and self‑checkout suggest that sustained “monitoring” is a task humans do poorly.
  • Senior engineers report burnout from reviewing growing volumes of often‑opaque AI code, with little mentoring payoff.

Juniors, Learning Ladders, and Labor Structure

  • Concern: if LLMs do “junior” work, how do humans gain the experience needed to become seniors?
  • Counter: LLMs mostly replace “copy‑from‑StackOverflow” coders; serious juniors still read docs, reason, and learn.
  • Some foresee law/accounting–style pyramids: layers of juniors and seniors iteratively editing AI output.
  • Others argue LLMs don’t learn from feedback today, so “tutoring the model” yields no compounding return.

Testing, Specs, and Viable Use Cases

  • Strong theme: rely less on trust, more on tests and (ideally) formal methods.
  • Proposed workflow: have the LLM generate tests, review those tests by hand, then iterate LLM‑generated code until the tests pass (a minimal loop is sketched after this list).
  • LLMs are viewed as very useful for: bug‑finding, code search, low‑risk utilities, info retrieval with human fact‑checking, and ambient dictation in medicine.
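
A minimal sketch of that test-first loop, assuming a hypothetical `generate_code` wrapper around whichever model is in use and a human-reviewed pytest suite in `tests/` that the model never edits:

```python
import subprocess

MAX_ATTEMPTS = 5

def generate_code(prompt: str) -> str:
    """Placeholder for whatever LLM call is in use (hypothetical)."""
    raise NotImplementedError

def run_tests() -> subprocess.CompletedProcess:
    # The human-reviewed tests are the fixed point; the model only sees their output.
    return subprocess.run(["pytest", "tests/", "-q"], capture_output=True, text=True)

def test_first_loop(spec: str, target_file: str) -> bool:
    prompt = spec
    for _ in range(MAX_ATTEMPTS):
        code = generate_code(prompt)
        with open(target_file, "w") as f:
            f.write(code)
        result = run_tests()
        if result.returncode == 0:
            return True  # tests green; still worth a final human read
        # Feed the failing output back so the next attempt can correct itself.
        prompt = f"{spec}\n\nPrevious attempt failed these tests:\n{result.stdout}"
    return False  # give up and escalate to a human after MAX_ATTEMPTS
```

This concentrates review effort where it scales: on the tests and the final diff, rather than on every intermediate generation.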

Meaning-Making and Decision Work

  • The article’s claim that “meaningmaking” is uniquely human is contested: ML can score options given criteria, but humans must define those criteria and out‑compete other humans doing the same (e.g., in trading); a toy sketch of that split follows this list.
  • Others argue the hardest part is externalizing tacit expert judgment into explicit frameworks that models (and juniors) can use.
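
A toy sketch of that division of labor, with invented criteria, weights, and numbers: the scoring and ranking are mechanical once the criteria exist, but nothing in the code decides which criteria matter or how much.

```python
# Human judgment enters here: which criteria matter, and how much.
CRITERIA_WEIGHTS = {"expected_return": 0.5, "downside_risk": -0.3, "liquidity": 0.2}

def score_option(features: dict) -> float:
    """The mechanical part a model can do: score an option against given criteria."""
    return sum(w * features.get(name, 0.0) for name, w in CRITERIA_WEIGHTS.items())

# Hypothetical candidate trades with features already estimated.
options = {
    "trade_a": {"expected_return": 0.8, "downside_risk": 0.6, "liquidity": 0.9},
    "trade_b": {"expected_return": 0.5, "downside_risk": 0.2, "liquidity": 0.7},
}
ranked = sorted(options, key=lambda name: score_option(options[name]), reverse=True)
print(ranked)  # the ranking is only as good as the human-chosen weights
```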

Organizational, Data, and Job-Quality Concerns

  • Fear of complacency once models feel “95% right,” enabling subtle errors, manipulation, or prompt‑injection–style attacks.
  • Worries that future training data will degrade (enshittified web, AI‑generated noise), reducing model quality.
  • Many dislike a future where skilled people mostly validate stochastic parrots, analogous to self‑checkout supervisors or outsourced body‑shops.
  • Several commenters think “decision velocity” and exponential productivity are overstated; real bottlenecks are prioritization, strategy, user adoption, and maintaining quality.