The Unreliability of LLMs and What Lies Ahead
Perceived Capabilities and Hype
- Many see LLMs as doing “more of what computers already did”: pattern matching, data analysis, boilerplate generation, rather than some magical new form of intelligence.
- Others point out qualitatively new-feeling abilities (philosophical framing of news, reasoning about images, bespoke code/library suggestions) but agree it’s still statistical text/data processing.
- Strong skepticism that current LLMs justify their valuations or the “Cyber Christ” narrative, though most agree they’ll remain a useful technology.
Reliability, Hallucinations, and “Lying”
- Core complaint: models confidently output plausible but false information and fabricated rationales; in critical work this is indistinguishable from lying.
- Several argue “lying” and “hallucination” are misleading anthropomorphic metaphors: the model has no self-knowledge or grounding, just produces likely text.
- RLHF/feedback schemes may inadvertently select for outputs that are persuasively wrong, optimizing for deception-like behavior (see the toy sketch below).
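The selection-pressure argument can be made concrete with a deliberately simplified sketch. Nothing here is a real RLHF pipeline: the reward function, marker lists, and candidate answers are all hypothetical, standing in for a preference model that scores surface fluency with no access to ground truth.

```python
# Toy illustration (not a real RLHF pipeline): a stand-in "reward model"
# that scores responses on surface confidence, with no notion of truth.
# Preference optimization against such a signal would reinforce the
# persuasively wrong answer over the honestly hedged one.

CONFIDENCE_MARKERS = ("definitely", "clearly", "certainly", "the answer is")
HEDGE_MARKERS = ("might", "not sure", "possibly", "i think")

def toy_reward(response: str) -> float:
    """Score text the way a shallow preference model might: reward
    confident phrasing, penalize hedging. Truth never enters the score."""
    text = response.lower()
    score = 2.0 * sum(text.count(m) for m in CONFIDENCE_MARKERS)
    score -= 1.5 * sum(text.count(m) for m in HEDGE_MARKERS)
    return score

candidates = {
    "wrong_but_confident": "The answer is definitely 42; this is clearly settled.",
    "right_but_hedged": "I think it might be 41, but I'm not sure; please verify.",
}

# The "preferred" sample, i.e. the one a preference-based update reinforces.
best = max(candidates, key=lambda k: toy_reward(candidates[k]))
print(best)  # -> wrong_but_confident
```

Under this (assumed) reward, the confident falsehood wins every comparison, which is precisely the dynamic those commenters worry about.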
Divergent User Experiences
- One camp: “mostly right enough” for coding, writing, brainstorming, learning; willing to live with uncertainty and verify when needed.
- Other camp: finds outputs “mostly wrong in subtle ways,” making the cost of review higher than doing the work from scratch.
- This divide is framed as stemming from differing expectations, tolerance for uncertainty, domain expertise, and even personality.
Software Development Use Cases
- Positive reports: big time savings on glue code, scripts, YAML transforms, CI configs, documentation, small DB queries, and unit tests, especially in mainstream languages (see the YAML sketch after this list).
- Critics say productivity gains are overstated: time shifts from typing to careful review, especially for large changes or legacy systems.
- Concerns about “vibe-coded” codebases, security flaws, and the future maintenance burden of LLM-generated sludge.
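To give a sense of scale for the “glue” tasks mentioned above, here is a minimal sketch of a one-off YAML transform, the sort of script users report generating and then verifying by eye. The file names, keys, and pinning policy are hypothetical; it assumes PyYAML is installed (`pip install pyyaml`).

```python
# Hypothetical glue task: load a docker-compose-style file and pin every
# ':latest' image tag to a fixed version, then write the result back out.
import yaml

def pin_image_tags(path_in: str, path_out: str, tag: str) -> None:
    """Replace ':latest' image tags with an explicit version tag."""
    with open(path_in) as f:
        config = yaml.safe_load(f)
    for service in config.get("services", {}).values():
        image = service.get("image", "")
        if image.endswith(":latest"):
            service["image"] = image[: -len("latest")] + tag
    with open(path_out, "w") as f:
        yaml.safe_dump(config, f, sort_keys=False)

# Example usage (paths and tag are placeholders):
# pin_image_tags("docker-compose.yml", "docker-compose.pinned.yml", "1.4.2")
```

A script like this is quick to review line by line, which is why the “verify when needed” camp counts it as a net win.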
High-Stakes vs Low-Stakes Applications
- Widely accepted for low-consequence tasks: vacation ideation, travel “vibe checks,” children’s books, vanity content, internal summaries.
- Strong pushback on using LLMs in law, government benefits, safety-critical engineering, or financial analysis where “mostly right” is unacceptable.
Search, Summarization, and Knowledge Quality
- LLM-based summaries in search are praised for convenience but criticized for factual inversions and for diverting traffic from original sources.
- Worry that powerful “bullshit machines” exploit a Gell-Mann-amnesia-like tendency to trust fluent text outside one’s own expertise.
Scientific/Technical Domains and Causality
- Scientists report that even with tool access and citations, models conflate correlated concepts, mis-group topics, and mishandle basic domain math.
- Multiple comments argue that genuine progress requires causal/world models and rigorous evaluation theory, not just bigger LLMs or prompt tricks (a minimal confounding demo follows this list).
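The correlation-versus-causation point can be shown with a stdlib-only toy: a shared confounder makes two causally unrelated variables correlate strongly, and only an intervention reveals the difference. The data-generating process below is invented purely for illustration; no model trained only on the observational co-occurrence statistics could tell the two cases apart.

```python
# Toy confounding demo: Z drives both X and Y; X has no causal effect on Y.
# Observationally X and Y correlate strongly, but intervening on X (do(X))
# breaks the link to Z and the correlation vanishes.
import random

random.seed(0)
N = 100_000

z = [random.gauss(0, 1) for _ in range(N)]            # confounder
x = [zi + random.gauss(0, 0.3) for zi in z]           # X caused by Z
y = [zi + random.gauss(0, 0.3) for zi in z]           # Y caused by Z

def corr(a, b):
    """Pearson correlation, computed from scratch."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)
    va = sum((ai - ma) ** 2 for ai in a) / len(a)
    vb = sum((bi - mb) ** 2 for bi in b) / len(b)
    return cov / (va * vb) ** 0.5

print(f"observational corr(X, Y) ~= {corr(x, y):.2f}")        # high, ~0.92

# Intervention: set X by fiat, severing its dependence on Z.
x_do = [random.gauss(0, 1) for _ in range(N)]
print(f"interventional corr(do(X), Y) ~= {corr(x_do, y):.2f}")  # ~0.00
```

This is the gap the causal-modeling commenters point to: predicting text about X and Y is not the same as knowing what happens when you change X.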