Scientists should use AI as a tool, not an oracle

AI as Tool vs Oracle

  • Strong agreement that AI (especially LLMs) should be a tool, not an authority.
  • Concern that many users, including some scientists, effectively let AI “write for them” or take facts from it without verification.
  • Some suggest formal norms like citing AI as a writing assistant to increase transparency.

Hallucinations, References, and Reliability

  • Multiple reports of fabricated or misrepresented citations; one user counted ~95% of references as fake.
  • Complaints that LLMs distort even simple factual text when asked to “rewrite” in different tones.
  • Clarification that hallucination is not a “bug” in the code path but an inherent consequence of the modeling approach.

Comparison to Search and Source Evaluation

  • LLMs differ from search because they strip away context, provenance, and competing answers.
  • Traditional search lets users judge credibility via site, author, and links; LLMs offer a single, confident narrative.
  • Some think people overestimate their ability to detect unreliable web data anyway.

Use in Science, Academia, and Public Sector

  • Worry that scientists will use chatbots to interpret results or draft papers, driven by publish‑or‑perish pressures.
  • A contrasting view from public‑sector science: AI assistants could help triage huge backlogs (e.g., toxicology literature, QSAR trends) if used under expert oversight.
  • Serious concern about public bureaucracies replacing human checks with AI for efficiency, leading to harmful decisions.

Expertise, Trust, and Epistemology

  • Broader problem: people confidently argue against domain experts while uncritically trusting machines.
  • Counterpoint: some “experts” in high‑profile domains are politicized, so laypeople struggle to know whom to trust.
  • Several note that more information of lower average quality worsens existing epistemic problems.

Definitions: Leakage, Overfitting, and “Curve Fitting”

  • “Leakage” is described as training on information that would not be available at inference time, often through mislabeled or improperly split data; it is related to, but distinct from, overfitting.
  • Example: models learning background artifacts (e.g., trees) instead of the intended object.
  • Some argue calling it “curve fitting” rather than “AI” would demystify it and clarify legal responsibility.
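
The leakage point above can be made concrete with a small sketch. It assumes a toy dataset in which several records come from the same source (the `group` field, the `subject_N` labels, and all function names are hypothetical, not from the discussion). Splitting record by record lets the same source appear on both sides of the split, while splitting by group keeps each source entirely in train or entirely in test.

```python
import random


def naive_split(records, test_fraction=0.25, seed=0):
    """Shuffle individual records and cut, ignoring which source each came from."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def group_split(records, test_fraction=0.25, seed=0):
    """Assign whole groups to one side so no group spans train and test."""
    rng = random.Random(seed)
    groups = sorted({r["group"] for r in records})
    rng.shuffle(groups)
    n_test = max(1, int(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [r for r in records if r["group"] not in test_groups]
    test = [r for r in records if r["group"] in test_groups]
    return train, test


def leaked_groups(train, test):
    """Return the groups that appear on both sides of a split."""
    return {r["group"] for r in train} & {r["group"] for r in test}


# Hypothetical data: 10 sources with 4 records each (40 records in total).
records = [{"group": f"subject_{i // 4}", "x": i} for i in range(40)]

# The naive split puts 10 records in test. Groups have 4 records each,
# so at least one group must straddle the split.
naive_leak = leaked_groups(*naive_split(records))

# The group-aware split never shares a group between train and test.
group_leak = leaked_groups(*group_split(records))
```

A model evaluated on the naive split can look better than it is because it has already seen records from the same sources. That differs from overfitting in the usual sense: the problem sits in how the evaluation data was prepared, not only in model capacity.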

Intelligence, Correctness, and Anthropomorphism

  • Debate over whether LLMs are “intelligent” or just probabilistic text generators.
  • One side stresses that they merely predict the next token and have no internal concept of truth or correctness.
  • Others insist correctness is still a meaningful external criterion: if the output fails the user’s task, it is wrong, regardless of internal mechanics.
  • Warnings that anthropomorphizing models and marketing them as oracles causes misuse and misplaced trust.
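
Both sides of this debate can be illustrated with a deliberately tiny sketch. It is a bigram counter, not a neural network, and it shares with an LLM only the objective of predicting the next token; the corpus, the function names, and the `expected` answer are made up for illustration. The model returns whatever continuation was most frequent in its training text, and correctness is judged separately, from the outside.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count, for each token, which tokens follow it in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def most_likely_next(counts, token):
    """Return the most frequent follower of `token`, or None if unseen."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def external_check(answer, expected):
    """Correctness is defined by the user's task, outside the model."""
    return answer == expected


# Toy corpus in which the false statement is more frequent than the true one.
corpus = [
    "the moon is made of cheese",
    "the moon is made of cheese",
    "the moon is made of rock",
]

model = train_bigram(corpus)
prediction = most_likely_next(model, "of")          # "cheese": frequent, not true
is_correct = external_check(prediction, "rock")     # False
```

Nothing inside `most_likely_next` refers to truth, which is the first side's point. The second side's point is `external_check`: the output still fails the task, so it is wrong regardless of how it was produced.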

Safety, Influence, and Corporate Incentives

  • Speculation about using AI to subtly steer human behavior (e.g., toward “better” choices), with ethical and trust risks.
  • View that corporations will integrate imperfect AI wherever it is economically beneficial, but must surround it with validation pipelines, as in software development.
  • Some skepticism toward highly speculative AI‑doomer narratives; emphasis that genuine safety work should be grounded in real technical understanding.
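
The validation-pipeline view above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a description of any real deployment: the model functions are stubs standing in for an actual model call, and the checks, the range, and all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Review:
    output: str
    accepted: bool
    failures: List[str] = field(default_factory=list)


def validated(generate: Callable[[str], str],
              checks: List[Tuple[str, Callable[[str], bool]]]):
    """Wrap a generator so every output is run through named checks."""
    def run(prompt: str) -> Review:
        output = generate(prompt)
        failures = [name for name, check in checks if not check(output)]
        # Accept only when every check passes; otherwise a human should review.
        return Review(output, not failures, failures)
    return run


def is_number(text: str) -> bool:
    try:
        float(text)
        return True
    except ValueError:
        return False


def in_range(text: str, low: float = 0.0, high: float = 10.0) -> bool:
    return is_number(text) and low <= float(text) <= high


CHECKS = [("is_number", is_number), ("in_range", in_range)]


def overconfident_model(prompt: str) -> str:
    # Hypothetical stub: a confident answer outside the plausible range.
    return "12.5"


def plausible_model(prompt: str) -> str:
    # Hypothetical stub: an answer that passes both checks.
    return "7.0"


rejected = validated(overconfident_model, CHECKS)("estimate the value")
accepted = validated(plausible_model, CHECKS)("estimate the value")
```

Here `rejected.failures` names the check that failed, which gives a reviewer something specific to look at. Passing the checks shows only that the output is plausible, not that it is right, so expert oversight still applies to accepted outputs.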