2024-07-10

Ask HN: Why does no one seem to care that AI gives wrong answers?

Perceived Problem: Wrong Answers & “Hallucinations”

Many commenters say they do care; they’ve been “burned” enough to stop trusting LLMs for factual Q&A.
“Hallucination” is viewed by some as PR spin for faulty output; others treat it as an inherent property of current LLMs.
For some users, frequent, confident errors make LLMs effectively useless as answer bots, especially when even simple explanations are wrong.
Others argue hallucination isn’t a showstopper in all contexts; it’s a risk to be managed with tests, checks, and product design.

Why Some Still Use It

Strong adoption for low‑stakes or creative tasks: drafting emails, rewriting, summarizing, translations, basic code scaffolding, text-to-speech, word transformations.
Many accept “90% right” if it saves time and they plan to verify or edit outputs.
For inherently probabilistic tasks (e.g., sentiment analysis), higher accuracy than older methods is considered good enough.
Some prefer LLMs over ad‑ridden, SEO‑polluted search, even if both can be wrong.

Limits of LLMs and Difficulty of Fixing

Multiple comments stress that LLMs are language models, not factual databases; they produce likely next tokens, not guaranteed truths.
Some see current architectures as intrinsically prone to overfitting, hidden failure modes, and irreducible hallucinations; they expect progress to approach an asymptote rather than improve explosively.
Others claim that with retrieval (RAG), few-shot prompting, and domain constraints, wrong answers can be treated as normal software bugs and pushed very low for narrow tasks.
Commenters disagree on whether better training or more complexity will ever make them reliably factual.
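
To make the "wrong answers as normal software bugs" position concrete, here is a minimal sketch of a domain constraint combined with a regression-style check for a narrow labeling task. Everything in it is an assumption for illustration: `fake_model` stands in for a real LLM call, and the label set, the test cases, and the function names are hypothetical and do not come from the thread.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical narrow task: the only acceptable outputs are these labels.
ALLOWED_LABELS = {"positive", "negative", "neutral"}


def constrained_answer(model: Callable[[str], str], prompt: str) -> Optional[str]:
    """Return the model's answer only if it is one of the allowed labels."""
    raw = model(prompt).strip().lower()
    return raw if raw in ALLOWED_LABELS else None


def error_rate(model: Callable[[str], str], cases: List[Tuple[str, str]]) -> float:
    """Fraction of labeled cases where the constrained answer is wrong or rejected."""
    wrong = sum(
        1 for prompt, expected in cases
        if constrained_answer(model, prompt) != expected
    )
    return wrong / len(cases)


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption); returns canned text."""
    canned = {
        "I love it": "Positive",
        "I hate it": "negative",
        "It is a phone": "The sentiment is unclear.",  # off-format answer
    }
    return canned.get(prompt, "neutral")


# Hypothetical labeled regression set.
CASES = [
    ("I love it", "positive"),
    ("I hate it", "negative"),
    ("It is a phone", "neutral"),
    ("Fine, I guess", "neutral"),
]

if __name__ == "__main__":
    print(f"error rate: {error_rate(fake_model, CASES):.2f}")
```

Tracking the error rate on a fixed case set lets a team treat a regression like any other failing test. It only measures the narrow task it covers, which is the caveat the skeptical commenters raise.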

Comparison to Humans and Expectations

Analogies to junior engineers: useful but must be supervised. Critics respond that juniors learn and stop repeating the same mistakes, whereas LLMs do not.
Humans can say “I don’t know” or convey uncertainty; LLMs typically answer confidently regardless of reliability.
Many note that users anthropomorphize models and are misled by fluent, confident language, much as they can be by a confident fraudster.
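
One way a product could approximate the "I don't know" behavior discussed above is to sample the model several times and answer only when the samples agree. The sketch below assumes the samples have already been collected from some model. The function name and the 0.8 threshold are hypothetical, and agreement among samples is a proxy for confidence, not a guarantee of correctness.

```python
from collections import Counter
from typing import List

ABSTAIN = "I don't know"


def answer_or_abstain(samples: List[str], min_agreement: float = 0.8) -> str:
    """Return the majority answer if enough samples agree, otherwise abstain."""
    if not samples:
        return ABSTAIN
    answer, count = Counter(samples).most_common(1)[0]
    if count / len(samples) >= min_agreement:
        return answer
    return ABSTAIN


if __name__ == "__main__":
    # Hypothetical repeated samples for two different questions.
    print(answer_or_abstain(["Paris"] * 5))
    print(answer_or_abstain(["1912", "1915", "1912", "1921", "1908"]))
```

A model that is consistently wrong will still pass this check, so it reduces confident guessing without removing it.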

Business Incentives, Hype, and Regulation

Commenters highlight incentives: investors and large vendors profit from shipping “good enough” AI now, betting that the “next model” will fix accuracy.
Hype cycles (VR, blockchain, NFTs, now AI) are seen as driving deployment even where correctness is critical.
Some predict a coming reckoning as enterprises discover AI agents and automation can’t deliver promised reliability.
Concerns include lack of regulation, massive energy use, and a broader culture that tolerates broken, ad‑driven tech as long as “line go up.”
