AI in my plasma physics research didn’t go the way I expected

Academic incentives & publication bias

  • Commenters stress that overselling, cherry-picking, and non-publication of negative results predate AI and stem from how careers and journals reward “exciting” results and citations.
  • AI hype amplifies this: “flag-planting” papers from big labs are hard to ignore or critique, especially for under-resourced universities that can’t replicate large-scale experiments.
  • Several note that a key function of PhD training is learning to “read through” papers, treating them as artifacts of a sociotechnical system rather than as neutral truth.

Benchmarks, statistics, and replication

  • A linked medical-imaging paper argues that many “state-of-the-art” claims evaporate once confidence intervals are considered: the competing models turn out to be statistically indistinguishable (see the bootstrap sketch after this list).
  • Commenters are surprised that basic statistical practice (e.g., reporting confidence intervals) is often missing in high‑stakes fields like medicine.
  • Benchmarks in AI are criticized as fragile, often relying on secret datasets, non-replicable setups, and single-number summaries that hide uncertainty.
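
To make the confidence-interval point concrete, here is a minimal sketch (Python; the accuracies, test-set size, and models are hypothetical, not taken from the discussion) of a paired bootstrap comparing two classifiers on the same test set. If the interval for the accuracy gap straddles zero, the “state-of-the-art” model is statistically indistinguishable from the baseline:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 500                                # hypothetical test-set size

    # Per-example correctness (True = correct) for two hypothetical models.
    model_a = rng.random(n) < 0.87         # claimed "state of the art", ~87%
    model_b = rng.random(n) < 0.85         # baseline, ~85%

    # Paired bootstrap: resample the same test indices for both models,
    # preserving any per-example correlation between them.
    idx = rng.integers(0, n, size=(10_000, n))
    diffs = model_a[idx].mean(axis=1) - model_b[idx].mean(axis=1)

    lo, hi = np.percentile(diffs, [2.5, 97.5])
    print(f"accuracy gap, 95% CI: [{lo:+.3f}, {hi:+.3f}]")
    # On a 500-example test set, a ~2-point gap typically yields an
    # interval that straddles zero: no statistically distinguishable winner.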

AI for physics & numerical methods (PINNs, FEM, etc.)

  • Multiple researchers report that physics-informed neural networks (PINNs) and AI structural/FEM solvers work tolerably only in simple, linear regimes and break down on nonlinear or out-of-distribution problems (a minimal PINN sketch follows this list).
  • A recurring pattern: ML models reproduce training data but generalize poorly, while papers still imply broad applicability without actually testing it.
  • Some characterize “AI for numerical simulations” as “industrial-scale p‑hacking” or a hammer in search of nails.
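
For readers unfamiliar with the technique, here is a minimal sketch of the physics-informed idea (Python with JAX; the toy ODE, network size, and training loop are illustrative assumptions, not from the discussion). The network is penalized for violating the governing equation u'(x) = -u(x) at sampled collocation points, plus a boundary-condition term:

    import jax
    import jax.numpy as jnp

    def mlp(params, x):
        # Tiny one-hidden-layer network: scalar x in, scalar u(x) out.
        h = jnp.tanh(params["w1"] * x + params["b1"])
        return jnp.dot(params["w2"], h) + params["b2"]

    def pinn_loss(params, xs):
        u = lambda x: mlp(params, x)
        du = jax.vmap(jax.grad(u))(xs)   # u'(x) at collocation points, via autodiff
        residual = du + jax.vmap(u)(xs)  # physics residual for u' + u = 0
        bc = (u(0.0) - 1.0) ** 2         # boundary condition u(0) = 1
        return jnp.mean(residual ** 2) + bc

    k1, k2 = jax.random.split(jax.random.PRNGKey(0))
    params = {
        "w1": 0.5 * jax.random.normal(k1, (32,)),
        "b1": jnp.zeros(32),
        "w2": 0.5 * jax.random.normal(k2, (32,)),
        "b2": jnp.zeros(()),
    }
    xs = jnp.linspace(0.0, 2.0, 64)      # collocation points

    step = jax.jit(lambda p: jax.tree_util.tree_map(
        lambda w, g: w - 1e-2 * g, p, jax.grad(pinn_loss)(p, xs)))
    for _ in range(2000):
        params = step(params)
    # After training, mlp(params, x) should approximate exp(-x) on [0, 2].

Even on a linear toy problem like this, the optimization is delicate; the reports above are that on nonlinear or out-of-distribution regimes such physics losses often fail to produce usable solutions.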

Universities vs industry & funding politics

  • Some argue that once a topic becomes a resource arms race with industry, it no longer fits the core mission of universities (long‑term, foundational, low‑resource work).
  • Discussion of NSF funding cuts and political attacks: waste exists (e.g., “use up the budget” equipment), but commenters view research/education as extremely high ROI and compare academic waste favorably to corporate boondoggles.

What counts as AI success in science?

  • Skeptics ask where the genuine AI-driven breakthroughs are; others cite protein folding, numerical weather prediction, drug discovery hit rates, and recent algorithm‑design work (e.g., matrix multiplication, kissing‑number bounds).
  • There’s disagreement over how overfitted or fragile some of these successes might be, and whether they represent general scientific reasoning versus powerful prediction/hypothesis‑generation tools.

LLMs, productivity, and erosion of competence

  • Many report substantial gains from LLMs for coding, document drafting, search over messy corpora, and meeting transcription; others find them slow, noisy, or dangerous in high‑stakes scientific programming.
  • A tension emerges: LLMs can speed up routine work, but may also encourage shallow understanding and brittle workflows if users stop deeply engaging with code, math, or data.

Conceptual confusion around “AI” & hype dynamics

  • Several argue “AI” is an almost meaningless marketing term, lumping together classic ML, deep learning, LLMs, and domain‑specific models; serious discussion requires more precise labels.
  • Others defend “AI” as a useful umbrella for recent neural‑network advances, while acknowledging rampant buzzword abuse (from smartphone cameras to “smart toilets”).
  • Beneath the disagreements, commenters broadly converge on a few points:
    • The current “AI will revolutionize science” narrative is ahead of robust evidence.
    • Incentives (career, funding, corporate valuation) strongly favor overstating AI’s scientific impact.
    • Nonetheless, as a tool for search, pattern-finding, and acceleration of certain workflows, AI is already meaningfully useful—and may yet yield deeper advances if used with rigorous methods and honest statistics.