2025-02-25

Resident physicians' exam scores tied to patient survival

Study design, confounders, and effect size

Commenters immediately question whether results are driven by hospital quality, case mix, or specialty choice (e.g., high‑mortality vs low‑mortality fields).
Others note the study reportedly compares doctors within the same hospital to partially control for institutional differences and patient populations.
Some see the study as observational and potentially fragile: effect sizes are described as “marginal,” and commenters worry that exam organizations act as both data gatekeepers and study sponsors, which creates conflicts of interest and rules out external replication.
Skeptics ask for more granular statistics (score distributions, classification error between quartiles) before treating the “top 25% vs bottom 25%” difference as practically large.

Residency workload, filtering, and exploitation

A large subthread debates whether brutal 80–100 hour weeks in residency improve long‑term outcomes or just act as a resilience filter.
Several argue overwork harms learning (sleep debt, cognitive impairment) and may worsen care during training; any marginal outcome gains would need to justify serious human costs.
Others prioritize patient outcomes over resident comfort but are challenged with fairness arguments and claims that harsh conditions also limit physician supply.
Multiple comments frame US residency as partly historical hazing and cheap, semi‑captive labor, with much time spent on administrative “scut” unrelated to learning.

Complexity of medicine and health‑system structure

Some argue modern medicine is too complex for any one person to master; better memory and pattern matching (which exams may proxy) become critical, yet many patients still go undiagnosed.
Others emphasize system‑level complexity: huge revenues flowing through insurers, PBMs, and distributors, with multiple middlemen each skimming a little and driving up costs.
There is debate over whether eliminating middle layers would simply produce vertically integrated conglomerates rather than true simplification.

What exam scores may actually reflect

Several commenters see the findings as intuitive: high scores indicate discipline, prioritization, and sufficient “cleverness” to understand the literature.
Others stress that great doctors also need humility, meticulousness, communication skills, and manual abilities (for surgeons), which correlate imperfectly with written exams.
Some worry exams may also alter physicians’ confidence and risk tolerance, potentially affecting practice style in ways not directly captured by knowledge alone.

Race, DEI, and standardized testing

A contentious subthread links the study to existing gaps in MCAT/USMLE performance among demographic groups and to affirmative‑action‑style admissions.
One side argues that if board scores predict outcomes and test scores differ by race, then quality differences by race are “obvious,” even if not measured here.
Others reject extrapolating from entrance exams and partial datasets, calling this non‑rigorous and insisting on direct data on board scores and outcomes by race before drawing such conclusions.
There is disagreement over whether diversity efforts have “passed bad doctors” or simply expanded opportunity while the same exit standards apply.

Surgical skill variation and system response

Anecdotes describe large variability in surgical outcomes, with a small fraction of surgeons perceived as “horrific,” yet often still operating because systems struggle to identify and reassign them.
Proposals to remove or redirect chronically poor performers collide with concerns about surgeon shortages and perverse incentives (e.g., surgeons avoiding high‑risk but necessary cases to protect metrics).

AI and test‑taking vs real practice

One commenter notes that chatbots can score very highly on medical exams; another counters that real practice hinges on physical examination, incomplete histories, and nuanced judgment, where a text model alone is clearly insufficient.
