ML promises to be profoundly weird
Nature of LLMs: “Bullshit machines” vs human fallibility
- Many agree LLMs often generate fluent but ungrounded text; “bullshit” is used in the Frankfurt sense: output unconcerned with truth, not deliberate lying.
- Some argue humans also confabulate and self‑deceive; differences are of degree and scale, not kind.
- Others push back: humans have metacognition, can genuinely know they don’t know, have goals and values, and care (at least sometimes) about truth; LLMs just emit statistically likely tokens.
- Several warn against sloppy anthropomorphism: “hallucination” and “confabulation” are metaphors, not literal cognitive processes.
Reliability, hallucinations, and evaluation
- Broad consensus that models can be extremely helpful yet still unreliable, with failure modes unlike a typical human’s: they can write complex code yet fail on trivial factual or logical tasks.
- Strong disagreement over current error rates: some claim near‑perfect performance from top models on short text prompts (e.g., “thinking” modes with tools); others provide concrete counterexamples and cite benchmarks with high hallucination rates in factual QA (a minimal evaluation sketch follows this list).
- A long sub‑thread debates a “challenge” to make GPT-5.4-thinking hallucinate on ≤4 pages of text, with back‑and‑forth over methodology, versions, and what counts as falsification.
- Several emphasize that even low single‑digit hallucination rates are unacceptable in high‑stakes domains and that “confidence scores” from models about their own answers are likely meaningless.
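To make the disagreement over rates concrete, here is a minimal sketch of the kind of factual‑QA hallucination measurement those benchmarks rely on. The `qa_pairs` data and the `query_model` callable are hypothetical stand‑ins, and the string‑match scoring is deliberately crude; real benchmarks use human or model judges, and the measured rate depends heavily on that choice.

```python
# Minimal sketch of a factual-QA hallucination check. query_model() and the
# qa_pairs format are hypothetical stand-ins, not any real benchmark's API.

from typing import Callable, Iterable


def normalize(text: str) -> str:
    """Lowercase and strip punctuation so formatting differences don't count as misses."""
    return "".join(ch for ch in text.lower().strip() if ch.isalnum() or ch.isspace())


def hallucination_rate(
    qa_pairs: Iterable[tuple[str, list[str]]],
    query_model: Callable[[str], str],
) -> float:
    """Fraction of questions whose answer contains none of the accepted references.

    Substring matching is a crude proxy; real evaluations use judges or
    entailment models, and the exact rate depends heavily on that choice.
    """
    total = 0
    misses = 0
    for question, references in qa_pairs:
        total += 1
        answer = normalize(query_model(question))
        if not any(normalize(ref) in answer for ref in references):
            misses += 1
    return misses / total if total else 0.0
```

Even a model that misses only two or three questions out of a hundred here lands in the “low single‑digit” range that several commenters consider unacceptable for high‑stakes use.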
Productive uses and guardrails
- Many practitioners report large productivity gains in software development: drafting code, writing tests, refactoring, migrating frameworks, etc., provided every line is reviewed and tested.
- Tools and unit tests act as strong external verifiers (see the sketch after this list); comparable checks are seen as missing in many non‑programming domains.
- Others describe models confidently producing wrong or dangerous code, or “gaslighting” users about bugs, reinforcing the need for tight oversight.
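As a concrete illustration of tests as external verifiers, here is a minimal sketch. `parse_semver` is a hypothetical example of a function a model might draft; the test encodes the reviewer’s intent independently of how confident the draft sounded.

```python
# Minimal sketch of a test acting as an external verifier for model-drafted
# code. parse_semver() is a hypothetical stand-in for an LLM-drafted function.

import re


def parse_semver(version: str) -> tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH' into integers, rejecting anything else."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"not a semantic version: {version!r}")
    return tuple(int(part) for part in match.groups())


def test_parse_semver() -> None:
    # Happy path: ordinary versions must round-trip.
    assert parse_semver("1.2.3") == (1, 2, 3)
    assert parse_semver("10.0.0") == (10, 0, 0)
    # Edge cases are where confidently wrong drafts tend to slip through.
    for bad in ("1.2", "v1.2.3", "1.2.3-rc1", ""):
        try:
            parse_semver(bad)
        except ValueError:
            continue
        raise AssertionError(f"accepted invalid version {bad!r}")
```

The point is not these particular assertions but that the check runs outside the model: a confidently wrong draft fails in exactly the same way a careless human one would.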
Scale, deployment, and social harms
- Concern that LLMs enable misinformation, spam, deepfakes, and political manipulation at unprecedented scale, amplifying Brandolini’s law (cheap to generate bullshit, costly to refute).
- Debate over capitalism’s role: some see AI as another profit‑maximizing tool with harmful externalities; others frame capitalism as a neutral tool misused when unregulated.
- Analogy to the Industrial Revolution: AI as the “industrialization of information,” raising questions about ownership, copyright, and the incentive for humans to keep creating public content that can be endlessly harvested.
Architecture, scaling, and future progress
- The article’s suggestion that progress is mostly “more parameters” is disputed; commenters attribute modern gains to architectural and training advances (MoE, attention variants, reasoning RL, tool use; a routing sketch follows this list), not just size.
- Some think we’re hitting data and scaling limits without a new “Attention is All You Need”-level breakthrough; others expect more breakthroughs but acknowledge growing costs.
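For readers unfamiliar with the architectural advances mentioned above, here is a minimal sketch of top‑k mixture‑of‑experts routing in plain NumPy. The shapes, the k=2 choice, and the toy experts are illustrative assumptions, not any particular model’s implementation.

```python
# Minimal sketch of top-k mixture-of-experts routing: an example of the kind
# of architectural change (as opposed to raw parameter count) the thread cites.

import numpy as np


def moe_layer(x: np.ndarray, gate_w: np.ndarray, experts: list, k: int = 2) -> np.ndarray:
    """Route each token to its top-k experts and mix their outputs.

    x:       (tokens, d_model) activations
    gate_w:  (d_model, n_experts) router weights
    experts: callables, each mapping a (d_model,) vector to a (d_model,) vector
    """
    logits = x @ gate_w                          # (tokens, n_experts) router scores
    top = np.argsort(logits, axis=-1)[:, -k:]    # indices of the k highest-scoring experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        scores = logits[t, top[t]]
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                 # softmax over the chosen k only
        # Only k experts actually run for this token; the rest are skipped,
        # which is how parameter count and per-token compute come apart.
        out[t] = sum(w * experts[e](x[t]) for w, e in zip(weights, top[t]))
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_model, n_experts = 8, 4
    experts = [
        (lambda w: (lambda v: np.tanh(v @ w)))(rng.normal(size=(d_model, d_model)))
        for _ in range(n_experts)
    ]
    tokens = rng.normal(size=(5, d_model))
    gate_w = rng.normal(size=(d_model, n_experts))
    print(moe_layer(tokens, gate_w, experts).shape)  # -> (5, 8)
```

The design point: with many experts and small k, only a small fraction of the expert parameters runs per token, so total parameter count can grow much faster than per‑token compute, which is why “more parameters” alone understates what has changed.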
Intelligence, consciousness, and the Turing test
- Many see LLMs as powerful pattern‑matching and text‑transformation engines, not minds with world models, object permanence, or “souls.”
- Others argue that if consciousness is tied to certain computational or feedback patterns, advanced models plus agentic harnesses might eventually qualify.
- Turing‑test claims are contested: experienced users report distinctive LLM “tells” over longer interactions, especially as context windows are exhausted.