Google’s AMIE beats doctors on key simulated disease-management tasks

When Dr. Sarah Thompson reviewed the case of a 58-year-old patient with worsening type 2 diabetes and hypertension across three simulated visits, she followed guidelines, adjusted medications, and scheduled follow-ups—just as she would in real practice. But she wasn’t competing against a seasoned colleague. She was up against AMIE, Google’s experimental AI, and in a head-to-head comparison involving 21 doctors and 100 complex cases, the AI didn’t just keep pace—it pulled ahead in key areas of care planning. Published in Nature, this study marks a turning point in how we think about AI’s role not just in diagnosis, but in the long-term, nuanced work of managing chronic disease.

Chronic conditions like diabetes, heart disease, and asthma don’t resolve in a single visit. They require careful monitoring, medication adjustments, and consistent alignment with evolving clinical guidelines. Yet in today’s fragmented healthcare systems—where patients see multiple providers across disconnected clinics—continuity often falters. That’s where AMIE shows promise. Built on Google’s Gemini models, AMIE uses long-context reasoning to track patient history across visits, cross-references up-to-date guidelines from sources like the UK’s NICE and BMJ Best Practice, and delivers treatment recommendations with a precision that, in several measures, exceeds that of trained physicians.

In the study, AMIE matched or outperformed 21 primary care doctors across 100 multi-visit scenarios. It scored significantly higher in treatment precision, investigation appropriateness, and adherence to clinical guidelines. Notably, AMIE was better at avoiding inappropriate treatments and at making follow-up recommendations that aligned with best practices at least once across the three visits. When it came to medication reasoning, tested using the RxQA benchmark developed from OpenFDA and the British National Formulary, AMIE outperformed doctors even on high-difficulty questions, whether or not external resources were available. While both clinicians and AI benefited from access to drug databases, AMIE’s internal knowledge proved more robust in closed-book conditions.

Still, the researchers emphasize that AMIE is not ready for the clinic. It remains a research prototype, tested only in simulated environments. No patient data was used, and real-world variables—emotional cues, socioeconomic barriers, unexpected side effects—weren’t fully captured. Yet the implications are profound. For health systems strained by shortages and siloed records, an AI that can maintain continuity, reduce errors, and stay rigorously aligned with guidelines could one day support overburdened clinicians.

The future of care may not be human versus machine, but human and machine—working in tandem. As AMIE demonstrates, the most powerful tool we can build isn’t one that replaces doctors, but one that helps them remember everything, every time.
