Kids who use ChatGPT as a study assistant do worse on tests
Study design & core findings
- The study split students into three groups for math practice sessions:
  - Control: no GPT.
  - “GPT Base”: standard GPT‑4, which gives full answers.
  - “GPT Tutor”: GPT‑4 prompted to give hints rather than full answers, with prompts tailored to the problem set.
- With GPT access during practice:
  - GPT Base students solved ~48% more practice problems correctly.
  - GPT Tutor students solved ~127% more practice problems correctly.
- On a later closed-book exam with no GPT access:
  - The GPT Base group scored ~17% worse than control.
  - The GPT Tutor group was statistically indistinguishable from control (slightly lower, but not significantly so).
- GPT‑4 had a high error rate overall, especially in multi-step reasoning; the Tutor version was supplied the correct answers to compensate.
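The "hints, not answers" constraint described above can be approximated with a system prompt that also embeds the known-correct answer for checking. A minimal sketch, assuming a generic chat-completion message format; the prompt wording, function name, and message structure are illustrative, not the study's actual setup:

```python
# Sketch of a hint-only "tutor mode" message payload for a chat-completion API.
# The prompt text is illustrative, not the study's actual prompt.

def build_tutor_messages(problem: str, correct_answer: str, student_question: str) -> list[dict]:
    """Assemble chat messages that constrain the model to hints, not answers."""
    system_prompt = (
        "You are a math tutor. Give hints and guiding questions only; "
        "never reveal the full solution or the final answer. "
        f"The correct answer is {correct_answer} -- use it to check the "
        "student's work, but do not state it."
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Problem: {problem}\nStudent: {student_question}"},
    ]

messages = build_tutor_messages(
    problem="Solve 3x + 5 = 20",
    correct_answer="x = 5",
    student_question="Is x = 15 right?",
)
```

The resulting `messages` list could be passed to any chat-completion endpoint. The key design point, per the study's description, is the pairing of the two constraints: the model is restricted to hints *and* given the correct answer, so it can check the student's work without its own high error rate leaking in.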
How students used GPT & overconfidence
- Many commenters infer that students often offloaded their thinking to GPT, especially in the Base condition.
- Both GPT groups were more confident they’d done well, despite equal or worse exam scores, suggesting inflated self-assessment.
Struggle, learning, and “crutch” concerns
- Repeated theme: productive struggle (trying, failing, correcting) is where learning happens; instant answers short-circuit this.
- Several developers report similar effects with Copilot/LLMs: their “thinking turns off” when an easy button is available.
- Others say they’ve learned more (e.g., bash, AI, deep learning) from ChatGPT than from traditional resources when they use it as an explainer/tutor after first making a personal effort.
Comparisons to other tools
- Analogies drawn to:
  - Parents doing their kids’ homework.
  - Calculators in early math education.
  - Stack Overflow / Google search.
- Consensus: tools can be either force multipliers or crutches; experts benefit more because they can verify and direct the tool.
Assessment relevance & future skills
- Some argue the exam is like banning cars and then testing on horse racing; “real world” performance with AI may matter more.
- Others counter that:
  - Many tasks still can’t safely rely on LLMs.
  - You must already understand the domain to catch hallucinations.
  - Foundational reasoning skills remain essential.
Study limitations & policy implications
- Conducted in a single Turkish high school; generalizability is questioned, though many doubt the results would differ radically elsewhere.
- It’s a preprint, not yet peer-reviewed; some see the media title as misleading or overly strong.
- Broad takeaway in the thread: unmanaged “answer-giving” GPT harms learning; a carefully constrained “tutor mode” at least avoids the harm but, in this study, didn’t clearly improve learning.