Major AI conference flooded with peer reviews written by AI
Scale of AI-Generated Reviews
- Many readers expected the share to be higher than 21% and found the figure “shockingly low” given the incentives to offload tedious reviewing.
- Others stress that a 21% rate of fully AI‑generated reviews implies widespread dereliction of duty in a process that is supposed to be “peer” review.
Does AI Use Matter or Only Review Quality?
- One camp: the tool used is irrelevant; what matters is whether reviews catch errors and provide useful feedback.
- Opposing view: even if the reviews are accurate, a conference that promises peer review cannot ethically substitute an LLM for a human peer.
- Several note common workflows where humans draft bullets and use LLMs to rewrite, translate, or polish; they argue these should not be equated with fraud.
AI Detectors and Pangram’s Claims
- Strong skepticism toward AI detectors in general: earlier tools had high false positive rates, especially on text by non‑native English speakers, and were easily fooled.
- Pangram’s cofounder claims a very low false positive rate and presents benchmarks; critics find “near-zero” error rates implausible and worry about data leakage and overfitting.
- Some see the Nature piece as PR for Pangram and emphasize that detector statistics are not “proof” for individual cases.
- Others counter that even imperfect detectors can be useful for aggregate statistics, provided they are not used to punish individuals; see the sketch below.
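The aggregate-use argument can be made concrete with a standard prevalence correction: if a detector's false positive and true positive rates are known from a benchmark, the raw flag rate over a corpus of reviews can be adjusted to estimate the true share of AI-written reviews, even though no single flag proves anything about an individual reviewer. Below is a minimal Python sketch; the error rates and the 21% flag rate used in the example are purely illustrative placeholders, not Pangram's published numbers.

```python
# Sketch: correcting an aggregate "share of AI-written reviews" estimate
# for a detector's known error rates (a Rogan-Gladen style correction).
# All numbers are hypothetical placeholders, not Pangram's published figures.

def corrected_prevalence(observed_rate: float, fpr: float, tpr: float) -> float:
    """Estimate the true share of AI-generated documents from the raw flag rate.

    observed_rate: fraction of documents the detector flagged as AI-generated
    fpr: detector's false positive rate on known human-written text
    tpr: detector's true positive rate (sensitivity) on known AI-generated text
    """
    if tpr <= fpr:
        raise ValueError("Detector is uninformative: TPR must exceed FPR.")
    estimate = (observed_rate - fpr) / (tpr - fpr)
    return min(max(estimate, 0.0), 1.0)  # clamp to a valid proportion


if __name__ == "__main__":
    # Hypothetical example: 21% of reviews flagged, detector with 1% FPR and 95% TPR.
    flagged = 0.21
    print(f"Corrected share: {corrected_prevalence(flagged, fpr=0.01, tpr=0.95):.1%}")
    # A small FPR shifts the aggregate estimate slightly, but no individual flag is "proof".
```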
Harms and Misuse of Detection
- Educators report “knowing” that many student essays are AI‑assisted but lacking provable evidence; fear of being flagged pushes students toward degraded, oversimplified writing styles.
- Commenters warn that unreliable detectors create bias and witch-hunt dynamics: once content is flagged, humans start seeing “evidence” everywhere.
Broader Concerns About Peer Review and AI Slop
- Many describe peer review as already overloaded and low-quality; AI simply drives the effort invested even lower and expands the “market for lemons.”
- Some fear AI’s bland, formulaic style is infecting human writing norms across the web and academia.
- Others suggest more transparency about LLM use, reputation systems with consequences for abusive use, or even structuring conferences around AI-generated baseline reviews that humans must correct, while acknowledging that these measures could also be gamed.