Do AI detectors work? Students face false cheating accusations
Reliability of AI Detectors
- Many commenters say current detectors are “garbage”: they flag good, simple prose, pre‑AI essays, teachers’ own writing, and even warning emails about cheating.
- Reported false positives include strong writers, autistic students, non‑native speakers, and a 7th‑grader whose work was mostly flagged.
- Small informal tests (e.g., running pre‑AI essays through detectors) show non‑trivial false positive rates; administrators rarely understand the precision vs. recall tradeoff or the base‑rate problem behind it.
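The base‑rate problem the bullets above allude to can be made concrete with a little arithmetic. This sketch uses hypothetical numbers (the false positive rate, detection rate, and share of students actually using AI are all assumptions for illustration, not measured values):

```python
# Hypothetical illustration of the base-rate problem with AI detectors.
# All rates below are assumed for illustration only.

def honest_flag_share(fpr: float, tpr: float, cheat_rate: float) -> float:
    """Of all flagged essays, what fraction came from honest students?

    fpr        -- false positive rate: honest essays flagged as AI
    tpr        -- true positive rate: AI-written essays correctly flagged
    cheat_rate -- fraction of students actually submitting AI writing
    """
    flagged_honest = fpr * (1 - cheat_rate)
    flagged_cheaters = tpr * cheat_rate
    return flagged_honest / (flagged_honest + flagged_cheaters)

# Even a seemingly modest 5% false positive rate, combined with 10% of
# students using AI and a 90% detection rate, means a third of all
# accusations land on honest students.
share = honest_flag_share(fpr=0.05, tpr=0.90, cheat_rate=0.10)
print(f"{share:.0%} of flagged essays are from honest students")  # 33%
```

This is why a detector's headline accuracy says little about how many accused students are innocent: when most students are honest, even a small false positive rate produces a large share of false accusations.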
Use in Academic Discipline & Due Process
- Strong concern that schools treat detector scores (e.g., “85% AI”) as proof, reversing the burden of proof onto students.
- Described as “Kafkaesque”: students often have no meaningful appeals process and must screen‑record writing or use Google Docs history to defend themselves.
- Some argue this conflicts with basic standards of evidence and, in some jurisdictions, with rules against purely automated decisions.
Arms Race and Technical Limits
- Many believe reliable AI vs. AI detection is fundamentally unwinnable: models can imitate human style or a specific student, and open‑source tools can be tuned to evade detectors.
- Suggestions like watermarking are viewed as fragile: easy to strip or mask with rewriting tools.
Cheating, Homework, and Assessment Design
- Broad agreement that out‑of‑class writing and homework are now weak signals of individual understanding; cheating was already common, and AI just makes it cheaper and easier.
- Some propose making homework low‑ or zero‑stakes and basing grades mainly on in‑class, proctored or handwritten exams.
- Others defend homework as essential practice, but argue it should be ungraded or used purely for learning, not evaluation.
Effects on Learning and Writing
- Worries that LLMs and tools like Grammarly push “beige,” formulaic language and that students will internalize this style (“AI slop”).
- Counterpoint: for some (e.g., dyslexic or non‑native writers) these tools are empowering and can improve grammar over time.
Equity, Bias, and Systemic Issues
- Concerns that detectors disproportionately mislabel certain linguistic styles (non‑Western English, autistic writing) and could become a civil‑rights issue.
- Broader criticism of education systems: incentives favor surveillance tech and numerical grading over trust, feedback, and genuine teaching.
Proposed Alternatives and Adaptations
- More in‑class essays, oral exams, vivas on projects, and version‑history‑based grading (repos, Docs history).
- Some advocate embracing AI: allow its use but require transparency, logs, and then assess understanding via discussion or exams.
- General consensus: AI detection alone is not a viable or fair solution; assessment methods must change.