AI in my plasma physics research didn’t go the way I expected
Academic incentives & publication bias
- Commenters stress that overselling, cherry-picking, and non-publication of negative results predate AI and stem from how careers and journals reward “exciting” results and citations.
- AI hype amplifies this: “flag-planting” papers from big labs are hard to ignore or critique, especially for under-resourced universities that can’t replicate large-scale experiments.
- Several note that a key function of PhD training is learning to “read through” papers, treating them as artifacts of a sociotechnical system rather than as neutral statements of truth.
Benchmarks, statistics, and replication
- A linked medical-imaging paper argues that many “state-of-the-art” claims evaporate once confidence intervals are considered: the competing models turn out to be statistically indistinguishable (a minimal numerical sketch of this point follows the list).
- Commenters are surprised that basic statistical practice (e.g., reporting confidence intervals) is often missing in high‑stakes fields like medicine.
- Benchmarks in AI are criticized as fragile, often relying on secret datasets, non-replicable setups, and single-number summaries that hide uncertainty.
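
To make the statistics point concrete, here is a minimal sketch of the paired-bootstrap check the linked paper implies: resample the test items, recompute the accuracy gap between two models, and see whether the 95% interval excludes zero. Everything here is invented for illustration (the accuracies, the test-set size of 500); with numbers in this range, a two-point accuracy gap routinely yields an interval straddling zero.

```python
# Hypothetical sketch: paired bootstrap CI for the accuracy gap between two
# models scored on the same test set. All numbers below are invented.
import numpy as np

rng = np.random.default_rng(0)
n = 500                                    # invented test-set size

# Per-item correctness for two "competing" models (invented accuracies).
correct_a = rng.random(n) < 0.86
correct_b = rng.random(n) < 0.84

point_gap = correct_a.mean() - correct_b.mean()

# Paired bootstrap: resample test items, keeping each item's two scores together.
gaps = np.empty(10_000)
for i in range(gaps.size):
    idx = rng.integers(0, n, n)
    gaps[i] = correct_a[idx].mean() - correct_b[idx].mean()
lo, hi = np.percentile(gaps, [2.5, 97.5])

print(f"accuracy gap = {point_gap:+.3f}, 95% CI = [{lo:+.3f}, {hi:+.3f}]")
# A CI that straddles zero means the "state-of-the-art" point gap is
# statistically indistinguishable from a tie.
```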
AI for physics & numerical methods (PINNs, FEM, etc.)
- Multiple researchers report that physics-informed neural networks and AI structural/FEM solvers work tolerably only in simple, linear regimes and break down on nonlinear or out-of-distribution problems.
- A recurring pattern: ML models reproduce their training data but generalize poorly, while papers still imply broad applicability without actually testing it (a toy sketch after this list illustrates the pattern).
- Some characterize “AI for numerical simulations” as “industrial-scale p‑hacking” or a hammer in search of nails.
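
To ground the pattern, here is a minimal PINN sketch, assuming PyTorch is available. The problem, architecture, and training choices are all illustrative, not taken from any cited work: the network fits the linear ODE u′(x) = −u(x), u(0) = 1 (exact solution e^(−x)) from collocation points in [0, 4] only, then is probed on [4, 8], outside the training range.

```python
# Minimal PINN sketch (assumes PyTorch). Toy linear ODE: u' + u = 0, u(0) = 1,
# exact solution exp(-x). Training uses collocation points in [0, 4] only.
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(5000):
    x = torch.rand(128, 1) * 4.0                             # collocation points in [0, 4]
    x.requires_grad_(True)
    u = net(x)
    du = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
    residual = du + u                                        # physics loss: u' + u = 0
    bc = (net(torch.zeros(1, 1)) - 1.0) ** 2                 # boundary loss: u(0) = 1
    loss = residual.pow(2).mean() + bc.mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    x_in = torch.linspace(0.0, 4.0, 5).unsqueeze(1)          # inside training range
    x_out = torch.linspace(4.0, 8.0, 5).unsqueeze(1)         # outside training range
    for tag, xs in [("in-dist ", x_in), ("out-dist", x_out)]:
        err = (net(xs) - torch.exp(-xs)).abs().max().item()
        print(f"{tag} max |error| = {err:.2e}")
# Typical outcome: small error on [0, 4], rapidly growing error beyond it,
# i.e., the "reproduces training data, generalizes poorly" pattern above.
```

Even on this textbook linear problem the model typically tracks the exact solution only where it saw collocation points; nothing in the loss constrains it elsewhere, which is the same gap between in-sample fit and claimed applicability the commenters describe for harder nonlinear cases.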
Universities vs industry & funding politics
- Once a topic becomes a resource arms race with industry, some argue it no longer fits the core mission of universities (long‑term, foundational, low‑resource work).
- Discussion of NSF funding cuts and political attacks: waste exists (e.g., end-of-fiscal-year “use up the budget” equipment purchases), but commenters view research and education as extremely high-ROI and judge academic waste mild compared with corporate boondoggles.
What counts as AI success in science?
- Skeptics ask where the genuine AI-driven breakthroughs are; others cite protein folding, numerical weather prediction, drug-discovery hit rates, and recent algorithm‑design work (e.g., matrix multiplication, kissing‑number bounds; a classical matrix‑multiplication example is sketched after this list).
- There’s disagreement over how overfitted or fragile some of these successes might be, and whether they represent general scientific reasoning versus powerful prediction/hypothesis‑generation tools.
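
For readers unsure what “algorithm-design work on matrix multiplication” refers to: the target is schemes that use fewer scalar multiplications than the naive method, and the classical human-found instance is Strassen’s 1969 construction, sketched below in plain NumPy. Recent AI systems in this line of work search for analogous low-multiplication schemes for other matrix shapes; the sketch itself is only the classical baseline, not any AI-discovered scheme.

```python
# Illustration: Strassen's 1969 scheme multiplies 2x2 matrices with 7 scalar
# multiplications instead of the naive 8. Applied recursively to block
# matrices, this is what drops the asymptotic cost below O(n^3).
import numpy as np

def strassen_2x2(A, B):
    """Product of two 2x2 matrices using 7 multiplications."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    return np.array([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4, m1 - m2 + m3 + m6]])

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])
assert np.allclose(strassen_2x2(A, B), A @ B)   # agrees with the naive product
```

One reason this line of work keeps appearing in the “genuine breakthroughs” column of the debate: a scheme with fewer multiplications is exactly and mechanically verifiable, which sidesteps the overfitting concerns raised about benchmark-based successes.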
LLMs, productivity, and erosion of competence
- Many report substantial gains from LLMs for coding, document drafting, search over messy corpora, and meeting transcription; others find them slow, noisy, or dangerous in high‑stakes scientific programming.
- A tension emerges: LLMs can speed up routine work, but may also encourage shallow understanding and brittle workflows if users stop deeply engaging with code, math, or data.
Conceptual confusion around “AI” & hype dynamics
- Several argue “AI” is an almost meaningless marketing term, lumping together classic ML, deep learning, LLMs, and domain‑specific models; serious discussion requires more precise labels.
- Others defend “AI” as a useful umbrella for recent neural‑network advances, while acknowledging rampant buzzword abuse (from smartphone cameras to “smart toilets”).
- Underneath the disagreements, commenters broadly converge on three points:
  - The current “AI will revolutionize science” narrative runs ahead of robust evidence.
  - Incentives (career, funding, corporate valuation) strongly favor overstating AI’s scientific impact.
  - Nonetheless, as a tool for search, pattern-finding, and accelerating certain workflows, AI is already meaningfully useful, and it may yet yield deeper advances if paired with rigorous methods and honest statistics.