Do AI detectors work? Students face false cheating accusations
Reliability of AI Detectors
- Many commenters say current detectors are “garbage”: they flag good, simple prose, pre‑AI essays, teachers’ own writing, and even warning emails about cheating.
- Reported false positives include strong writers, autistic students, non‑native speakers, and a 7th‑grader whose work was mostly flagged.
- Small informal tests (e.g., running pre‑AI essays through detectors) show non‑trivial false positive rates; administrators rarely understand the precision vs. recall tradeoff or the base‑rate problem behind it.
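The base‑rate problem the bullets above allude to can be made concrete with a little arithmetic. This sketch uses hypothetical numbers (the false positive rate, detection rate, and share of students actually using AI are all assumptions for illustration, not measured values):

```python
# Hypothetical illustration of the base-rate problem with AI detectors.
# All rates below are assumed for illustration only.

def honest_flag_share(fpr: float, tpr: float, cheat_rate: float) -> float:
    """Of all flagged essays, what fraction came from honest students?

    fpr        -- false positive rate: honest essays flagged as AI
    tpr        -- true positive rate: AI-written essays correctly flagged
    cheat_rate -- fraction of students actually submitting AI writing
    """
    flagged_honest = fpr * (1 - cheat_rate)
    flagged_cheaters = tpr * cheat_rate
    return flagged_honest / (flagged_honest + flagged_cheaters)

# Even a seemingly modest 5% false positive rate, combined with 10% of
# students using AI and a 90% detection rate, means a third of all
# accusations land on honest students.
share = honest_flag_share(fpr=0.05, tpr=0.90, cheat_rate=0.10)
print(f"{share:.0%} of flagged essays are from honest students")  # 33%
```

This is why a detector's headline accuracy says little about how many accused students are innocent: when most students are honest, even a small false positive rate produces a large share of false accusations.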
Use in Academic Discipline & Due Process
- Strong concern that schools treat detector scores (e.g., “85% AI”) as proof, reversing the burden of proof onto students.
- Described as “Kafkaesque”: students often have no meaningful appeals process and must screen‑record writing or use Google Docs history to defend themselves.
- Some argue this conflicts with basic standards of evidence and, in some jurisdictions, with rules against purely automated decisions.
Arms Race and Technical Limits
- Many believe reliable AI vs. AI detection is fundamentally unwinnable: models can imitate human style or a specific student, and open‑source tools can be tuned to evade detectors.
- Suggestions like watermarking are viewed as fragile: easy to strip or mask with rewriting tools.
Cheating, Homework, and Assessment Design
- Broad agreement that out‑of‑class writing and homework are now weak signals of individual understanding; cheating was already common, and AI just makes it cheaper and easier.
- Some propose making homework low‑ or zero‑stakes and basing grades mainly on in‑class, proctored or handwritten exams.
- Others defend homework as essential practice, but argue it should be ungraded or used purely for learning, not evaluation.
Effects on Learning and Writing
- Worries that LLMs and tools like Grammarly push “beige,” formulaic language and that students will internalize this style (“AI slop”).
- Counterpoint: for some (e.g., dyslexic or non‑native writers) these tools are empowering and can improve grammar over time.
Equity, Bias, and Systemic Issues
- Concerns that detectors disproportionately mislabel certain linguistic styles (non‑Western English, autistic writing) and could become a civil‑rights issue.
- Broader criticism of education systems: incentives favor surveillance tech and numerical grading over trust, feedback, and genuine teaching.
Proposed Alternatives and Adaptations
- More in‑class essays, oral exams, vivas on projects, and version‑history‑based grading (repos, Docs history).
- Some advocate embracing AI: allow its use but require transparency, logs, and then assess understanding via discussion or exams.
- General consensus: AI detection alone is not a viable or fair solution; assessment methods must change.