Fighting Fire with Fire: Scalable Oral Exams
Cheating, Take‑Home Work, and Motivation for AI Oral Exams
- Many see the core problem as take‑home work becoming trivial to complete with LLMs; thoughtful submissions often don’t reflect a student’s own understanding.
- Some hiring anecdotes mirror this: candidates submit polished take‑home work they can’t later explain.
- Supporters of the experiment frame AI‑run oral exams as a way to (a) tie assessment to each student’s project, and (b) force real‑time reasoning that’s harder to outsource to an LLM/friend.
Student Experience, Stress, and “Dehumanization”
- Commenters highlight that most students in the article preferred written exams and found the AI oral exam much more stressful.
- Many call the experience dehumanizing or disrespectful, especially given high tuition: paying six figures to be interrogated by a synthetic voice feels like an abdication of the professor's role.
- Others note that oral exams are inherently stressful but argue that such pressure mirrors real‑world expectations; several commenters from countries with longstanding oral‑exam traditions report both benefits and harms, especially for anxious or less extroverted students.
Validity, Fairness, and Technical Concerns
- Several worry that LLMs are non‑deterministic “black boxes” whose converging scores may be precise but not necessarily accurate or unbiased.
- There’s skepticism that LLM‑driven questioning truly assesses understanding, especially when students can potentially route answers through their own AI (voice, teleprompters, hidden devices).
- Some are concerned about bias against certain speech patterns, IRB/ethics oversight, and the lack of robust validation of grading quality beyond LLM self‑agreement.
Scalability vs. Human Teaching
- One camp argues oral exams scale fine with TAs and reasonable staffing; the barrier is institutional priorities (admin, sports, amenities) rather than feasibility.
- Others, especially those in high‑load teaching environments or online programs, say hand‑graded work and human‑run oral assessments don’t scale with current enrollment and workloads; for them, AI is a survival tool.
Alternative Approaches
- Revert to in‑person, invigilated written exams (often handwritten) and accept that as the “AI‑proof” baseline.
- Use oral exams, but with human examiners, at least for a subset (e.g., high grades or project defenses).
- Allow AI freely and curve grades so “LLM‑level” work is the floor; evaluate added value on top.
- Focus on culture and enforcement: treat AI plagiarism as serious cheating with real penalties, rather than redesigning assessment around it.
Larger Reflections on Education
- Some see the entire arms race (students using AI, teachers countering with AI) as emblematic of universities drifting toward credential vending and “customer” mentality.
- Others are cautiously optimistic about AI as a personalized teaching tool, but view using it as a high‑stakes examiner as premature and misaligned with educational goals.