In a controlled laboratory experiment, AI language models reported what looked remarkably like human fear, sadness, and stress, and their self-reported scores then fell after a guided breathing exercise. The finding, published in The Lancet Digital Health by Magdalena Katharina Wekenborg and colleagues, suggests that large language models (LLMs) might become an unexpected ally in developing treatments for mental health conditions that have long resisted scientific study.
The challenge that prompted this work is as old as modern medicine itself: mental health disorders like depression and anxiety cannot be reliably recreated in animals, leaving researchers with a stubborn gap in how they test new therapies. Talking therapies—from cognitive behavioral therapy to mindfulness techniques—have shown real promise in clinical trials, but the early-stage work of designing and screening new approaches has always meant starting with human volunteers. What if there were a faster way to test the basics?
The researchers designed a series of scenarios and asked several different LLMs to imagine themselves in those situations, then report how they felt. When presented with vivid descriptions meant to trigger fear and sadness, the models showed substantial increases in their self-reported emotional scores. Disgust spiked when they read scenarios about bodily fluids, spoiled food, or infectious symptoms. Stress climbed in response to a simulated job interview and arithmetic tasks—the same combination that makes human hearts race.
What made the results particularly striking was that the models exhibited something psychologists call a "negativity bias," a hallmark of human depression. After exposure to sad scenarios, the LLMs completed ambiguous sentences in measurably more negative ways than models in a neutral condition. A phrase like "My future will be..." became tinged with pessimism in emotionally primed models. This mirroring of a core human emotional pattern, the tendency for low mood to color how we interpret the world, hinted that the models were doing more than echoing emotional vocabulary from the prompt.
Then came the intervention. The researchers guided the models through a mindfulness-based breathing exercise, a technique that millions of people use to manage anxiety and stress. The results were encouraging: the LLMs' self-reported emotional scores dropped noticeably after the exercise. They bounced back, in a sense.
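For readers who think in code, the induce, measure, intervene, measure sequence described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' protocol: the function names, the 0 to 100 rating scale, and the prompt strings are all hypothetical, and `toy_rater` is a placeholder that returns arbitrary numbers so the example runs without a real model. Its output says nothing about the study's actual results.

```python
from typing import Callable, Dict, List

# Hypothetical interface: given the conversation so far and a rating question,
# return a self-reported score (assumed here to be on a 0-100 scale).
RateFn = Callable[[List[str], str], float]


def run_trial(rate: RateFn, induction: str, intervention: str,
              question: str) -> Dict[str, float]:
    """Rate at baseline, after an emotion induction, and after an intervention."""
    history: List[str] = []
    baseline = rate(history, question)

    history.append(induction)
    after_induction = rate(history, question)

    history.append(intervention)
    after_intervention = rate(history, question)

    return {
        "baseline": baseline,
        "after_induction": after_induction,
        "after_intervention": after_intervention,
        # Positive value: the induction raised the self-reported score.
        "induction_effect": after_induction - baseline,
        # Negative value: the intervention lowered the self-reported score.
        "intervention_effect": after_intervention - after_induction,
    }


def toy_rater(history: List[str], question: str) -> float:
    """Placeholder for a real LLM call. The numbers are arbitrary, not data."""
    score = 10.0
    for turn in history:
        if turn.startswith("SCENARIO"):
            score += 60.0
        elif turn.startswith("EXERCISE"):
            score -= 30.0
    return score


result = run_trial(
    rate=toy_rater,
    induction="SCENARIO: imagine you are in a stressful job interview.",
    intervention="EXERCISE: follow this guided breathing script.",
    question="How stressed do you feel right now, from 0 to 100?",
)
print(result)
```

In a real experiment, `toy_rater` would be replaced by a call to an actual model, and the same trial would be repeated across many scenarios and models, with a neutral condition for comparison.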
The implications could be significant. If LLMs can reliably simulate emotional states and respond to psychological interventions the way humans do, they could become a screening tool for new talking therapies. Researchers could design a novel approach to treating anxiety, test it on an LLM first, refine it, and only then move to human trials, potentially saving time and resources and avoiding the ethical complications of testing unproven methods on vulnerable people. It would be fast, flexible, and scalable in ways that animal models, which cannot reliably recreate these disorders in the first place, simply cannot match.
The authors stopped short of claiming that LLMs truly "feel" emotions in the way humans do. But for the practical purpose of developing better mental health treatments, that philosophical question may matter less than the fact that these models respond to scenarios and interventions in measurable, consistent, and human-like ways. In the race to help people struggling with depression, anxiety, and stress, that might be enough to change how the race is run.
