When Dr. Ethan Goh faced a hypothetical patient with a large lung nodule (silent, incidentally found, but potentially deadly), he didn’t just weigh biopsy options. He considered whether the patient trusted hospitals, whether past appointments had been missed, and whether the local health system could reliably follow up. These are the unscripted moments of medicine, where textbook answers fall short and judgment reigns. Now, a new study shows that when physicians like Goh team up with AI, their decision-making doesn’t just improve; it matches the performance of AI working alone, even as the AI outpaces doctors working without it.

Published in Nature Medicine, the research led by Dr. Jonathan H. Chen of Stanford and Dr. Adam Rodman of Harvard dives into the “squishier” side of clinical care: not diagnosis, but management. Once a disease is found, what comes next? Should surgery wait? Should medications be adjusted for a patient’s history of adverse reactions? These decisions hinge on nuance, context, and human insight. To test how AI handles them, the team presented five complex, de-identified patient cases to three groups: an AI chatbot operating independently, 46 U.S. physicians using a large language model (LLM) as support, and another 46 physicians relying only on internet searches and medical references. All responses were evaluated by board-certified doctors using a standardized rubric.

The results were striking. The AI chatbot alone scored an average of 67% on clinical management reasoning, outperforming the physicians limited to internet searches and medical references, who averaged 60%. But the physicians using AI assistance? They matched the chatbot’s 67%, seven percentage points above their unassisted peers, suggesting that human-AI collaboration can close the gap. One case involved a patient on blood thinners needing surgery, a scenario demanding careful timing to avoid both clotting and bleeding risks. Another probed how to adjust treatment for someone allergic to common antibiotics. In each, the AI offered rapid, evidence-based options, while physicians brought in patient preferences, system constraints, and ethical considerations.

This isn’t about replacing doctors, Chen emphasizes. It’s about redefining roles. “When combined, human plus computer is going to do better than either one by itself,” he says. The study builds on earlier work from October 2024, led by Chen and Goh, which found AI outperformed physicians in diagnostic accuracy, a trend now extending into treatment planning. As AI becomes more embedded in clinical workflows, the question shifts from whether it should be used to how: where it excels in speed and data recall, and where humans must lead with empathy and context.

The future of medicine may not be human or machine, but human with machine, navigating not just to the right diagnosis but through the maze of what comes after.