Researchers at the University of Pennsylvania have turned Reddit's vast archive of patient conversations into an unexpected clinical resource, using artificial intelligence to scan more than 400,000 posts and uncover hidden side effects of popular weight loss and diabetes medications that traditional clinical trials may have missed.
The study, published in Nature Health, analyzed posts from nearly 70,000 Reddit users over more than five years, focusing on discussions about semaglutide and tirzepatide—medications that have transformed obesity and blood sugar management. What makes this research significant is not just its scale, but the insights it reveals about what patients actually experience versus what gets formally documented. About 44% of users mentioned at least one side effect, with gastrointestinal problems being the most commonly reported. But alongside these well-known symptoms, researchers identified patterns that deserve closer attention: nearly 4% of users reported menstrual irregularities, while others described temperature-related symptoms like chills, hot flashes, and fever-like sensations. Fatigue emerged as the second most frequently discussed complaint overall.
"Some of the side effects we found, like nausea, are well known, and that shows that the method is picking up a real signal," says Sharath Chandra Guntuku, a Research Associate Professor in Computer and Information Science at Penn Engineering and the study's senior author. "The underreported symptoms are leads that came from patients themselves, unprompted, and clinicians could potentially pay attention to them."
The researchers are careful to note that their findings do not prove these medications caused the reported symptoms—rather, they point to patterns that warrant further investigation. Neil Sehgal, the study's first author and a doctoral student at Penn, highlights the menstrual irregularity data as particularly striking: while nearly 4% of the overall sample reported this symptom, the figure would climb considerably higher if calculated only among female users. That signal, he argues, is worth pursuing through rigorous scientific study.
What makes this work possible is the rise of large language models like GPT and Gemini, which can now process enormous amounts of online discussion with a speed and consistency that would have been impossible just a few years ago. Traditional clinical trials remain the gold standard for drug safety, but they move slowly, a particular challenge when a medication goes from niche to mainstream almost overnight, as has happened with GLP-1 drugs. Online patient communities, by contrast, operate like what Lyle Ungar, a co-author and CIS professor, calls "a neighborhood grapevine," where people living with these medications swap experiences in real time, sharing concerns that rarely surface in a doctor's office or official adverse event reports.
The study builds on nearly two decades of research into mining user-generated internet content for drug safety signals, but the scale and sophistication available today represent a fundamental shift. Reddit users are not perfectly representative of the general population: they tend to be younger, are more likely to be male, and are concentrated in the United States. Even so, they provide an unfiltered window into patient concerns that might otherwise remain invisible to clinicians and regulators.
The implications extend beyond these specific medications. As social media platforms continue to expand and large language models grow more capable, researchers see an emerging tool for accelerating the detection of rare or underreported side effects. Clinical trials may remain the gold standard, but this complementary approach offers something they cannot: the speed and scale to capture what patients are actually experiencing, unprompted and in their own words.
