AI outperforms doctors in Harvard trial of emergency triage diagnoses

A patient arrived at a Boston emergency room with a blood clot in the lungs and worsening symptoms. The doctors attributed the problem to failing anticoagulants and were ready to adjust treatment. But an AI system saw what the physicians missed: the patient's history of lupus could be causing the lung inflammation. The AI was right, and the revised diagnosis may have changed the course of care.

This single case crystallizes the findings of a groundbreaking Harvard study published in Science that has redrawn the boundaries of what artificial intelligence can accomplish in medicine's most pressured moments. In trials conducted at Boston's Beth Israel Deaconess Medical Center, large language models outperformed human doctors at emergency triage: the rapid-fire diagnostics that unfold when patients first arrive, often with incomplete information and stakes measured in minutes.

The numbers tell a striking story. When given the same electronic health records as human physicians—vital signs, demographics, and a brief nursing note—OpenAI's o1 reasoning model identified the correct or very close diagnosis in 67% of cases. The doctors achieved 50-55% accuracy. The AI's advantage was sharpest in exactly those high-pressure moments when decisions must be made with minimal information. When additional clinical detail became available, the AI's accuracy climbed to 82%, while expert humans reached 70-79%—a difference the researchers noted was not statistically significant.

The study expanded beyond snapshot diagnoses. Researchers asked an AI and 46 doctors to examine five clinical case studies and develop longer-term treatment plans: antibiotic regimens, end-of-life care strategies, complex decision trees. The AI scored 89%. The doctors, working with conventional resources such as search engines, scored 34%. That gap suggests AI excels not just at pattern recognition but at synthesizing medical knowledge under time pressure.

Yet the researchers were careful to frame this not as replacement but as profound reshaping. The AI had access only to textual information. It could not read a patient's visual appearance, sense their distress, or perform the thousand subtle calibrations that doctors make in a room. Arjun Manrai, one of the study's lead authors and head of an AI laboratory at Harvard Medical School, put it plainly: "I don't think our findings mean that AI replaces doctors. I think it does mean that we're witnessing a really profound change in technology that will reshape medicine."

His co-author, Dr. Adam Rodman of Beth Israel Deaconess, imagined a future already taking shape: a "triadic care model" of doctor, patient, and AI working in concert. Nearly one in five U.S. physicians are already using AI to assist with diagnosis. In the UK, 16% use it daily and another 15% weekly, according to a Royal College of Physicians survey, with clinical decision-making among the most common applications.

But questions remain. Doctors worry about AI error and liability when algorithms guide critical decisions. Some researchers note that physicians may unconsciously defer to AI recommendations rather than thinking independently—a tendency that could deepen as the technology becomes routine. The researchers themselves acknowledged a crucial limitation: the study did not reveal which patients the AI struggled with, or whether it had blind spots with elderly populations or other demographics.

The Harvard study suggests AI has moved beyond acing standardized tests. It is becoming a second opinion that clinicians cannot afford to ignore—particularly in those critical moments when a missed diagnosis can mean a life lost.

AI outperforms doctors in Harvard trial of emergency triage diagnoses | AI (artificial intelligence) | The Guardian