ChatGPT is getting remarkably good at diagnosing

A father watches his feverish toddler tug at one ear, pulls out his phone, and types the symptoms into an AI chatbot. Seconds later: "Your child likely has an ear infection." The diagnosis feels immediate, plausible, possibly correct. And here's what's remarkable—there's a good chance it is.

Artificial intelligence has quietly crossed a threshold in medicine that many thought was still years away. OpenAI's o1 model achieved 78% accuracy on complex diagnostic cases published in the New England Journal of Medicine, and in doing so, outperformed experienced doctors when diagnosing actual emergency room patients. ChatGPT, working independently, has similarly outperformed physicians in diagnosing intricate cases, even when those physicians were given access to ChatGPT themselves. For pattern recognition—matching symptoms to diseases—the machines have become formidable.

Yet diagnosis is only half the battle. A doctor's real work begins when the question shifts from "What does the patient have?" to "What do we do about it?" This second question, known as management reasoning, is where human judgment hasn't been surpassed. It's also where the art of medicine lives.

Consider two men, both 68, both just diagnosed with early-stage prostate cancer with identical slow-growing tumors confined to the prostate. Both face the same two options: treat now with surgery or radiation, accepting risks of incontinence and sexual dysfunction, or monitor closely with regular tests, intervening only if the cancer grows. Which path is right? The answer, as doctors will tell you, is almost always: "It depends." It depends on the patient's age, overall health, values, family history, life expectancy, and how much uncertainty he can psychologically tolerate. The same diagnosis calls for different responses for different people.

This is where AI still struggles. Large language models predict patterns brilliantly—they're trained on vast medical literature where certain symptom clusters reliably precede certain diagnoses. But medical management doesn't have one right answer. It requires weighing multiple reasonable options and prioritizing which is best for the specific human being in front of you. That prioritization demands more than pattern matching. It demands judgment forged through years of clinical experience, conversations with patients, and familiarity with how diseases actually unfold in real bodies.

Experienced doctors build what researchers call "illness scripts"—mental frameworks that capture not just what a disease looks like, but who tends to get it, how it typically progresses, and which details warrant a second look. When a symptom doesn't fit the pattern, when a patient's recent trip abroad or workplace exposure suggests something unexpected, the doctor's script helps them see beyond the obvious. These scripts take time and thousands of hours to develop.

The emerging picture isn't one of replacement but partnership. For straightforward concerns with clear paths forward—a numbing cream for a baby's teething, or a referral to a cardiologist—an AI diagnosis may guide patients toward the care they need. But for the messier, more uncertain cases that fill actual clinical practice, the human doctor remains irreplaceable. The machine can identify the disease. The person across from you—who knows your life, your fears, your priorities—must decide what happens next.

ChatGPT is getting remarkably good at diagnosing health problems, but doctors are still better at treatment options