OpenAI’s o1 correctly diagnosed 67% of ER patients vs. 50-55% by triage doctors
Study design & limitations
- Many argue the trial is closer to a “paper quiz” than real ER work: AI and doctors saw text from electronic records and nurse notes, not real patients.
- Doctors were forced to diagnose from notes alone, which they rarely do in practice; physical exam, conversation, and observation were excluded.
- When both AI and humans had fuller case details, the performance gap shrank and was no longer statistically significant, weakening “AI beats doctors” claims.
- Some note that the study used older models and vignette-style cases; such work is a useful early step but far from real-world validation.
Comparisons with other AI-medical studies
- Commenters cite a recent chest X-ray benchmark in which an AI outperformed radiologists without even seeing the images, highlighting how flawed benchmarks can be.
- In another study, “ChatGPT Health” reportedly mis-triaged about half of emergency cases, showing how inconsistent results are across setups and models.
Human vs AI capabilities
- Supporters see diagnosis as largely pattern recognition over a vast body of knowledge and expect specialized medical models to surpass most doctors over time.
- Skeptics emphasize:
  - Physical exam, nuanced history-taking, and detecting deceit or missing information.
  - Judgment under uncertainty, and knowing when to say “I don’t know, we need more tests.”
  - Emotional presence during crises (e.g., cancer diagnoses) as fundamentally human.
Bias, trust, and patient experiences
- Multiple anecdotes:
  - Some describe missed or delayed diagnoses by human doctors, especially for women and for complex or rare conditions.
  - Others report LLMs helping identify conditions (e.g., long Covid, MCAS) or interpret labs better than rushed clinicians.
  - Still others had AI completely miss serious issues (e.g., hip problems on an X-ray), reinforcing caution.
System-level incentives and risks
- Concerns that:
  - AI may be optimized for liability or billing, not patient welfare.
  - Metric-driven use (e.g., ranking doctors against AI) could lead to gaming, overreliance, and eventual de-skilling.
  - Insurers and private equity may use AI to cut costs or deny care.
Proposed roles for AI
- A common middle-ground view casts AI as:
  - A triage aid, a second opinion, and a checker for guideline adherence.
  - A research assistant and note-taker.
  - A tool that should augment, not replace, accountable human clinicians.