2026-05-03

OpenAI’s o1 correctly diagnosed 67% of ER patients vs. 50-55% by triage doctors

Study design & limitations

Many argue the trial is closer to a “paper quiz” than real ER work: both the AI and the doctors saw only text from electronic health records and nurse notes, not real patients.
Doctors were forced to diagnose from notes alone, which they rarely do in practice; physical examination, conversation with the patient, and direct observation were all excluded.
When both AI and humans had fuller case details, the performance gap shrank and became statistically insignificant, weakening “AI beats doctors” claims.
Some note that the study uses older models and vignette-style cases; such studies are useful early steps but far from real-world validation.

Comparisons with other AI-medical studies

Commenters cite a recent chest x‑ray benchmark in which an AI outperformed radiologists even without seeing the images, highlighting how flawed benchmarks can be.
In another study, “ChatGPT Health” reportedly mis-triaged about half of emergency cases, showing inconsistency across setups and models.

Human vs AI capabilities

Supporters see diagnosis as largely pattern recognition over a vast body of knowledge and expect specialized medical models to surpass most doctors over time.
Skeptics emphasize:
- Physical exam, nuanced history-taking, and detecting deceit or missing info.
- Judgment under uncertainty, and knowing when to say “I don’t know, we need more tests.”
- Emotional presence during crises (e.g., cancer diagnoses) as fundamentally human.

Bias, trust, and patient experiences

Multiple anecdotes:
- Some describe missed or delayed diagnoses by human doctors, especially for women and for complex or rare conditions.
- Others report LLMs helping identify conditions (e.g., long Covid, MCAS) or interpret labs better than rushed clinicians.
- Still others saw AI completely miss serious issues (e.g., hip problems on an x‑ray), which reinforces caution.

System-level incentives and risks

Concerns that:
- AI may be optimized for liability or billing, not patient welfare.
- Metric-driven use (e.g., ranking doctors against AI) could lead to gaming, overreliance, and eventual de-skilling.
- Insurance and private equity may use AI to cut costs or deny care.

Proposed roles for AI

The common middle-ground view sees AI as:
- Triage aid, second opinion, and guideline-following checker.
- Research assistant and note-taker.
- Tool that should augment, not replace, accountable human clinicians.
