ML promises to be profoundly weird
Nature of LLMs: “Bullshit machines” vs human fallibility
- Many agree LLMs often generate fluent but ungrounded text; “bullshit” is used in the Frankfurt sense: output unconcerned with truth, not deliberate lying.
- Some argue humans also confabulate and self‑deceive; differences are of degree and scale, not kind.
- Others push back: humans have metacognition, can genuinely know they don’t know, have goals and values, and care (at least sometimes) about truth; LLMs just emit statistically likely tokens.
- Several warn against sloppy anthropomorphism: “hallucination” and “confabulation” are metaphors, not literal cognitive processes.
Reliability, hallucinations, and evaluation
- Broad consensus that models can be extremely helpful yet still unreliable, with failure modes unlike a typical human’s: they can write complex code yet fail on trivial factual or logical tasks.
- Strong disagreement over current error rates: some claim near‑perfect performance from top models on short text prompts (e.g., “thinking” modes with tools); others provide concrete counterexamples and cite benchmarks with high hallucination rates in factual QA (a minimal evaluation sketch follows this list).
- A long sub‑thread debates a “challenge” to make GPT-5.4-thinking hallucinate on ≤4 pages of text, with back‑and‑forth over methodology, versions, and what counts as falsification.
- Several emphasize that even low single‑digit hallucination rates are unacceptable in high‑stakes domains and that “confidence scores” from models about their own answers are likely meaningless.
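To make the disagreement over rates concrete, here is a minimal sketch of the kind of factual‑QA hallucination measurement those benchmarks rely on. The `qa_pairs` data and the `query_model` callable are hypothetical stand‑ins, and the string‑match scoring is deliberately crude; real benchmarks use human or model judges, and the measured rate depends heavily on that choice.

```python
# Minimal sketch of a factual-QA hallucination check. query_model() and the
# qa_pairs format are hypothetical stand-ins, not any real benchmark's API.

from typing import Callable, Iterable


def normalize(text: str) -> str:
    """Lowercase and strip punctuation so formatting differences don't count as misses."""
    return "".join(ch for ch in text.lower().strip() if ch.isalnum() or ch.isspace())


def hallucination_rate(
    qa_pairs: Iterable[tuple[str, list[str]]],
    query_model: Callable[[str], str],
) -> float:
    """Fraction of questions whose answer contains none of the accepted references.

    Substring matching is a crude proxy; real evaluations use judges or
    entailment models, and the exact rate depends heavily on that choice.
    """
    total = 0
    misses = 0
    for question, references in qa_pairs:
        total += 1
        answer = normalize(query_model(question))
        if not any(normalize(ref) in answer for ref in references):
            misses += 1
    return misses / total if total else 0.0
```

Even a model that misses only two or three questions out of a hundred here lands in the “low single‑digit” range that several commenters consider unacceptable for high‑stakes use.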
Productive uses and guardrails
- Many practitioners report large productivity gains in software development: drafting code, writing tests, refactoring, migrating frameworks, etc., provided every line is reviewed and tested.
- Tools and unit tests act as strong external verifiers (see the sketch after this list); comparable checks are seen as missing in many non‑programming domains.
- Others describe models confidently producing wrong or dangerous code, or “gaslighting” users about bugs, reinforcing the need for tight oversight.
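As a concrete illustration of tests as external verifiers, here is a minimal sketch. `parse_semver` is a hypothetical example of a function a model might draft; the test encodes the reviewer’s intent independently of how confident the draft sounded.

```python
# Minimal sketch of a test acting as an external verifier for model-drafted
# code. parse_semver() is a hypothetical stand-in for an LLM-drafted function.

import re


def parse_semver(version: str) -> tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH' into integers, rejecting anything else."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"not a semantic version: {version!r}")
    return tuple(int(part) for part in match.groups())


def test_parse_semver() -> None:
    # Happy path: ordinary versions must round-trip.
    assert parse_semver("1.2.3") == (1, 2, 3)
    assert parse_semver("10.0.0") == (10, 0, 0)
    # Edge cases are where confidently wrong drafts tend to slip through.
    for bad in ("1.2", "v1.2.3", "1.2.3-rc1", ""):
        try:
            parse_semver(bad)
        except ValueError:
            continue
        raise AssertionError(f"accepted invalid version {bad!r}")
```

The point is not these particular assertions but that the check runs outside the model: a confidently wrong draft fails in exactly the same way a careless human one would.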
Scale, deployment, and social harms
- Concern that LLMs enable misinformation, spam, deepfakes, and political manipulation at unprecedented scale, amplifying Brandolini’s law (cheap to generate bullshit, costly to refute).
- Debate over capitalism’s role: some see AI as another profit‑maximizing tool with harmful externalities; others frame capitalism as a neutral tool misused when unregulated.
- Analogy to the Industrial Revolution: AI as the “industrialization of information,” raising questions about ownership, copyright, and the incentive for humans to keep creating public content that can be endlessly harvested.
Architecture, scaling, and future progress
- The article’s suggestion that progress is mostly “more parameters” is disputed; commenters attribute modern gains to architectural and training advances (MoE, attention variants, reasoning RL, tool use; a routing sketch follows this list), not just size.
- Some think we’re hitting data and scaling limits without a new “Attention is All You Need”-level breakthrough; others expect more breakthroughs but acknowledge growing costs.
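For readers unfamiliar with the architectural advances mentioned above, here is a minimal sketch of top‑k mixture‑of‑experts routing in plain NumPy. The shapes, the k=2 choice, and the toy experts are illustrative assumptions, not any particular model’s implementation.

```python
# Minimal sketch of top-k mixture-of-experts routing: an example of the kind
# of architectural change (as opposed to raw parameter count) the thread cites.

import numpy as np


def moe_layer(x: np.ndarray, gate_w: np.ndarray, experts: list, k: int = 2) -> np.ndarray:
    """Route each token to its top-k experts and mix their outputs.

    x:       (tokens, d_model) activations
    gate_w:  (d_model, n_experts) router weights
    experts: callables, each mapping a (d_model,) vector to a (d_model,) vector
    """
    logits = x @ gate_w                          # (tokens, n_experts) router scores
    top = np.argsort(logits, axis=-1)[:, -k:]    # indices of the k highest-scoring experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        scores = logits[t, top[t]]
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                 # softmax over the chosen k only
        # Only k experts actually run for this token; the rest are skipped,
        # which is how parameter count and per-token compute come apart.
        out[t] = sum(w * experts[e](x[t]) for w, e in zip(weights, top[t]))
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_model, n_experts = 8, 4
    experts = [
        (lambda w: (lambda v: np.tanh(v @ w)))(rng.normal(size=(d_model, d_model)))
        for _ in range(n_experts)
    ]
    tokens = rng.normal(size=(5, d_model))
    gate_w = rng.normal(size=(d_model, n_experts))
    print(moe_layer(tokens, gate_w, experts).shape)  # -> (5, 8)
```

The design point: with many experts and small k, only a small fraction of the expert parameters runs per token, so total parameter count can grow much faster than per‑token compute, which is why “more parameters” alone understates what has changed.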
Intelligence, consciousness, and the Turing test
- Many see LLMs as powerful pattern‑matching and text‑transformation engines, not minds with world models, object permanence, or “souls.”
- Others argue that if consciousness is tied to certain computational or feedback patterns, advanced models plus agentic harnesses might eventually qualify.
- Turing‑test claims are contested: experienced users report distinctive LLM “tells” over longer interactions, especially as context windows are exhausted.