Researchers at Mass General Brigham have uncovered a hidden epidemic: approximately 18 million Americans, double the commonly cited estimates, are living with long COVID, yet the vast majority remain invisible to the surveillance systems designed to count them.

The discovery came through a novel artificial intelligence algorithm that sifted through the electronic health records of 457,950 COVID-19 patients across 58 U.S. hospitals in New England, Southeast Texas, Southern California, and Western Pennsylvania. What the algorithm found was striking: roughly 16% of COVID-19 survivors developed long COVID—conditions that persisted long after the initial infection had cleared. Standard diagnostic coding systems, by contrast, capture fewer than 7% of these cases, meaning over 10 million people with long COVID go entirely undetected in the official counts that shape healthcare policy and resource allocation.

The researchers deployed what they call "precision phenotyping," a technique that analyzes the temporal sequences of clinical events in patient records to identify long COVID as a diagnosis of exclusion: the conditions appeared after COVID-19 infection and cannot be explained by the patient's preexisting medical history. This granular approach revealed that the disease burden is far more severe than conventional surveillance systems suggest. Across the full cohort, 14.5% of COVID-19 patients, or 66,587 individuals, developed chronic conditions requiring sustained clinical care.
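To make the idea of a diagnosis of exclusion concrete, here is a minimal Python sketch of the general logic. It is not the study's algorithm. The `Event` record, the `candidate_conditions` function, the 60-day gap, and the two-occurrence persistence threshold are all hypothetical choices made for illustration. The sketch keeps only conditions that have no record before infection and that recur after a waiting period.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Event:
    """One coded clinical event in a patient's record (hypothetical schema)."""
    patient_id: str
    condition: str
    when: date


def candidate_conditions(events, infection_date, min_gap_days=60, min_occurrences=2):
    """Return conditions that are new after infection and that persist.

    Assumed rules, for illustration only:
      * exclusion: any condition recorded before the infection date is
        treated as preexisting and dropped;
      * persistence: the condition must be recorded at least
        `min_occurrences` times on or after `infection_date + min_gap_days`.
    """
    # Conditions already present in the patient's history before infection.
    prior = {e.condition for e in events if e.when < infection_date}

    window_start = infection_date + timedelta(days=min_gap_days)
    counts = {}
    for e in events:
        if e.when >= window_start and e.condition not in prior:
            counts[e.condition] = counts.get(e.condition, 0) + 1

    return sorted(c for c, n in counts.items() if n >= min_occurrences)


if __name__ == "__main__":
    infection = date(2021, 1, 10)
    record = [
        Event("p1", "hypertension", date(2019, 5, 1)),   # preexisting
        Event("p1", "hypertension", date(2021, 6, 1)),
        Event("p1", "fatigue", date(2021, 1, 20)),       # acute phase only
        Event("p1", "arrhythmia", date(2021, 4, 15)),    # new and recurring
        Event("p1", "arrhythmia", date(2021, 7, 20)),
    ]
    print(candidate_conditions(record, infection))
```

In this toy record, hypertension is excluded because it predates the infection, and fatigue is excluded because it appears only during the acute phase. A real pipeline would also have to rule out other explanations for each new condition, which this sketch does not attempt.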

The findings also revealed stark regional variations. Long COVID rates ranged from 13.6% to 22.7% across the four U.S. regions studied, with prediabetes—an emerging consequence of long COVID—appearing at dramatically different rates depending on geography. Perhaps most troubling is what the data showed about the pandemic's arc: contrary to expectations that long COVID would fade as a legacy of early waves, cumulative prevalence continued to increase across all regions studied. Statistical modeling indicated significant quarterly increases in New England, Southern California, and Western Pennsylvania, with projections suggesting continued growth over the next decade if current patterns persist.

Hossein Estiri, the study's corresponding author and a faculty member in Mass General Brigham's Department of Medicine, notes that these figures are almost certainly undercounts. The study excluded undocumented infections—which have become the majority since widespread testing ended—and patients without longitudinal medical records in their health system. The true toll of long COVID likely extends well beyond 18 million Americans.

What makes this research particularly significant is how it exposes a gap between clinical reality and administrative identification. As lead researcher Jiazi Tian observed, patients are not absent from clinical care; they are simply absent from the diagnostic codes that would label them as long COVID patients. The cardiologist treating new heart arrhythmias, the endocrinologist managing new metabolic disease, the neurologist addressing unexplained cognitive complaints: all of them are seeing long COVID arrive without the diagnostic label that would connect these diverse presentations to a single cause.

The findings, published in JAMA Network Open, demonstrate how artificial intelligence designed for public health purposes can reveal the true dimensions of complex post-viral conditions that conventional surveillance systems have systematically undercounted. With long COVID continuing to accumulate across all age groups and regions, the implications for healthcare systems and policy are profound.