2024-06-03

Scientists should use AI as a tool, not an oracle

AI as Tool vs Oracle

Strong agreement that AI (especially LLMs) should be a tool, not an authority.
Concern that many users, including some scientists, effectively let AI “write for them” or source facts without verification.
Some suggest formal norms, such as citing AI as a writing assistant, to increase transparency.

Hallucinations, References, and Reliability

Multiple reports of fabricated or misrepresented citations; one user reported that about 95% of references were fake.
Complaints that LLMs distort even simple factual text when asked to “rewrite” in different tones.
Clarification that hallucination is not a “bug” in the code path but an inherent consequence of the modeling approach.

Comparison to Search and Source Evaluation

LLMs differ from search because they strip away context, provenance, and competing answers.
Traditional search lets users judge credibility via site, author, and links; LLMs offer a single, confident narrative.
Some think people overestimate their ability to detect unreliable web data anyway.

Use in Science, Academia, and Public Sector

Worry that scientists will use chatbots to interpret results or draft papers, driven by publish‑or‑perish pressures.
A contrasting view from public‑sector science: AI assistants could help triage huge backlogs (e.g., toxicology literature, QSAR trends) if used under expert oversight.
Serious concern about public bureaucracies replacing human checks with AI for efficiency, leading to harmful decisions.

Expertise, Trust, and Epistemology

Broader problem: people confidently argue against domain experts while uncritically trusting machines.
Counterpoint: some “experts” in high‑profile domains are politicized, so laypeople struggle to know whom to trust.
Several note that more information of lower average quality worsens existing epistemic problems.

Definitions: Leakage, Overfitting, and “Curve Fitting”

“Leakage” is described as training on information that would not be available at inference time, often because data was mislabeled or improperly split; it is related to, but distinct from, overfitting.
Example: models learning background artifacts (e.g., trees) instead of the intended object.
Some argue that calling the technology “curve fitting” rather than “AI” would demystify it and clarify legal responsibility.
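
The leakage definition in this section can be made concrete with a small sketch. It assumes a hypothetical dataset in which every item appears twice (for example, the same sample measured twice) and the labels are pure noise, so no honest evaluation should beat chance. All names here (`make_dataset`, `leaky_split`, `grouped_split`, `memorizer_accuracy`) are illustrative and do not come from the discussion.

```python
import random

def make_dataset(n_items=2000, seed=0):
    """Build rows of (item_id, features, label); every item appears twice."""
    rng = random.Random(seed)
    rows = []
    for item_id in range(n_items):
        features = (rng.random(), rng.random())
        label = rng.randint(0, 1)  # pure noise: features carry no signal
        rows.append((item_id, features, label))
        rows.append((item_id, features, label))  # duplicate measurement
    return rows

def leaky_split(rows, seed=0):
    """Shuffle individual rows, so copies of one item can land on both sides."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def grouped_split(rows, seed=0):
    """Split by item_id, so all copies of an item stay on one side."""
    rng = random.Random(seed)
    ids = sorted({item_id for item_id, _, _ in rows})
    rng.shuffle(ids)
    train_ids = set(ids[: len(ids) // 2])
    train = [r for r in rows if r[0] in train_ids]
    test = [r for r in rows if r[0] not in train_ids]
    return train, test

def memorizer_accuracy(train, test):
    """Score a model that only memorizes training rows and guesses 0 otherwise."""
    lookup = {features: label for _, features, label in train}
    hits = sum(1 for _, features, label in test if lookup.get(features, 0) == label)
    return hits / len(test)

if __name__ == "__main__":
    rows = make_dataset()
    print("row-level split :", memorizer_accuracy(*leaky_split(rows)))
    print("grouped split   :", memorizer_accuracy(*grouped_split(rows)))
```

The memorizing model is equally overfit in both runs; only the split changes. With the row-level split, copies of test items sit in the training set, so the score should land well above chance even though the labels are noise. With the grouped split it should fall back toward chance, which is why leakage is treated as related to but distinct from overfitting.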

Intelligence, Correctness, and Anthropomorphism

Debate over whether LLMs are “intelligent” or just probabilistic text generators.
One side stresses that they merely predict next tokens and have no internal concept of truth or correctness.
Others insist correctness is still a meaningful external criterion: if the output fails the user’s task, it is wrong, regardless of internal mechanics.
Warnings that anthropomorphizing models and marketing them as oracles causes misuse and misplaced trust.
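
Both sides of this debate can be illustrated with a toy sketch. It assumes made-up scores for three candidate continuations of the prompt "The capital of Australia is". The scores, the three-word vocabulary, and the function names are hypothetical; a real LLM computes scores over a far larger vocabulary with a neural network, so this only mirrors the final sampling step.

```python
import math
import random

# Hypothetical scores (logits) for the next token; not taken from any real model.
TOY_SCORES = {"Canberra": 2.0, "Sydney": 1.5, "Melbourne": 0.5}

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {token: math.exp(s - top) for token, s in scores.items()}
    total = sum(exps.values())
    return {token: e / total for token, e in exps.items()}

def sample_next_token(scores, rng):
    """Pick one token at random, weighted by its probability.

    Nothing in this step checks whether the chosen token is true.
    """
    probs = softmax(scores)
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

def is_correct(token):
    """External check applied by the user, outside the sampler."""
    return token == "Canberra"

if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_next_token(TOY_SCORES, rng) for _ in range(1000)]
    wrong = sum(1 for t in draws if not is_correct(t))
    print("share of wrong answers:", wrong / len(draws))
```

The sampler only ever sees scores, which matches the "just predicts next tokens" position. The `is_correct` function lives entirely outside it, which matches the counterpoint that correctness is judged against the user's task regardless of internal mechanics.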

Safety, Influence, and Corporate Incentives

Speculation about using AI to subtly steer human behavior (e.g., toward “better” choices), with ethical and trust risks.
View that corporations will integrate imperfect AI wherever it is economically beneficial, but must surround it with validation pipelines, as in software development.
Some skepticism toward highly speculative AI‑doomer narratives; emphasis that genuine safety work should be grounded in real technical understanding.
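
The "validation pipelines" idea in this section could look roughly like the sketch below, under stated assumptions: model output is plain text, citations appear as bracketed keys such as `[smith2020]`, and a list of trusted keys already exists. The check functions, the citation format, and the status strings are all hypothetical and chosen only for illustration.

```python
import re
from typing import Callable, List, Optional, Set, Tuple

# A check returns an error message, or None when the output passes.
Check = Callable[[str], Optional[str]]

def not_empty(output: str) -> Optional[str]:
    return None if output.strip() else "empty output"

def cites_only_known_sources(known: Set[str]) -> Check:
    """Build a check that flags bracketed citation keys missing from `known`."""
    def check(output: str) -> Optional[str]:
        cited = re.findall(r"\[(\w+)\]", output)
        unknown = [key for key in cited if key not in known]
        return f"unknown citations: {unknown}" if unknown else None
    return check

def accept_or_escalate(output: str, checks: List[Check]) -> Tuple[str, List[str]]:
    """Accept output only if every check passes; otherwise route it to a person."""
    problems = []
    for check in checks:
        message = check(output)
        if message is not None:
            problems.append(message)
    status = "accepted" if not problems else "needs_human_review"
    return status, problems

if __name__ == "__main__":
    checks = [not_empty, cites_only_known_sources({"smith2020"})]
    print(accept_or_escalate("The effect replicates [smith2020].", checks))
    print(accept_or_escalate("The effect replicates [doe1999].", checks))
```

Failed checks lead to human review rather than silent rejection or automatic retry, which keeps an expert in the loop. Checks like these only catch the failure modes someone thought to encode, so they complement rather than replace that oversight.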
