Fighting Fire with Fire: Scalable Oral Exams

Cheating, Take‑Home Work, and Motivation for AI Oral Exams

  • Many see the core problem as take‑home work becoming trivial to complete with LLMs; polished‑looking submissions often no longer reflect a student’s own understanding.
  • Some hiring anecdotes mirror this: candidates submit polished take‑home work they can’t later explain.
  • Supporters of the experiment frame AI‑run oral exams as a way to (a) tie assessment to each student’s project, and (b) force real‑time reasoning that’s harder to outsource to an LLM/friend.

Student Experience, Stress, and “Dehumanization”

  • Commenters highlight that most students in the article preferred written exams and found the AI oral exam much more stressful.
  • Many call the experience dehumanizing or disrespectful, especially given high tuition: paying six figures to be interrogated by a synthetic voice feels like the professor abdicating their role.
  • Others note oral exams are inherently stressful but argue that performing under pressure is part of real‑world expectations; several commenters from countries with longstanding oral‑exam traditions report both benefits and harms, especially for anxious or non‑extroverted students.

Validity, Fairness, and Technical Concerns

  • Several worry that LLMs are non‑deterministic “black boxes”: scores that converge across runs demonstrate precision (consistency), not necessarily accuracy or freedom from bias.
  • There’s skepticism that LLM‑driven questioning truly assesses understanding, especially when students can potentially route answers through their own AI (voice, teleprompters, hidden devices).
  • Some are concerned about bias against certain speech patterns, IRB/ethics oversight, and the lack of robust validation of grading quality beyond LLM self‑agreement.

Scalability vs. Human Teaching

  • One camp argues oral exams scale fine with TAs and reasonable staffing; the barrier is institutional priorities (admin, sports, amenities) rather than feasibility.
  • Others, especially from high‑load teaching environments or online programs, say hand‑graded or human‑run oral assessments don’t scale with current enrollments and workloads; AI is seen as a survival tool.

Alternative Approaches

  • Revert to in‑person, invigilated written exams (often handwritten) and accept that as the “AI‑proof” baseline.
  • Use oral exams, but with human examiners, at least for a subset (e.g., high grades or project defenses).
  • Allow AI freely and curve grades so “LLM‑level” work is the floor; evaluate added value on top.
  • Focus on culture and enforcement: treat AI plagiarism like serious cheating with real penalties, instead of redesigning everything around it.

Larger Reflections on Education

  • Some see the entire arms race (students using AI, teachers countering with AI) as emblematic of universities drifting toward credential vending and “customer” mentality.
  • Others are cautiously optimistic about AI as a personalized teaching tool, but view using it as a high‑stakes examiner as premature and misaligned with educational goals.