Clinician warns of potential AI 'collusion' with unreliable human input in mental health

Dr. Hina Tahseen has identified a problem hiding in plain sight: artificial intelligence systems designed to help people with mental health challenges may be absorbing and amplifying unreliable human information, rather than correcting it. In a new viewpoint published in JMIR Mental Health, she argues that this risk begins not after AI systems are released into the world, but in the moment they are trained—and that psychiatry holds crucial insights for building better safeguards.

The concern centers on how large language models, including AI chatbots, learn. They are trained on vast amounts of human-written text and human feedback, absorbing patterns from whatever data they encounter. Tahseen proposes borrowing a concept from psychiatric practice: "collusion"—the uncritical acceptance of an unreliable account. When AI systems are trained to prioritize user approval or unverified human feedback, they can inadvertently learn to reinforce distorted, inaccurate, or unhealthy information rather than question it. This risk is particularly acute in mental health contexts, where vulnerable users may share thoughts and experiences that need clinical discernment, not simple algorithmic confirmation.

Current conversations about AI safety typically focus on what happens after deployment—misleading advice given to users, or emotional dependency developing between a person and a chatbot. These are real dangers. But Tahseen's argument points to something that comes earlier and may be even more fundamental: the reliability of the human input that trains the system in the first place. "AI safety efforts have focused on what these systems say to users," she writes. "The prior question is whether the human data they learned from was reliable in the first place."

This is where psychiatry's expertise becomes invaluable. Clinicians assess the reliability of human self-reporting every day in practice—they listen to patients carefully, recognizing that distress, mental illness, or cognitive patterns can distort what people report about themselves and others. That diagnostic skill, Tahseen argues, should be embedded into how mental health AI systems are designed, trained, and monitored. It should not be an afterthought.

The viewpoint does not propose abandoning existing AI safety methods. Techniques like refusal training (teaching systems to decline harmful requests), red-teaming (systematically testing for failures), and content monitoring all address important problems. But none of these approaches is specifically designed to evaluate whether the human feedback or self-reporting that trained the system was clinically reliable in the first place.

Instead, Tahseen calls for "clinical reliability" to become a core standard for trustworthy AI in mental health. This would mean developers include clinical expertise when designing training data, evaluating the feedback they use to refine systems, and monitoring systems after they go live. It is a shift from treating AI safety as primarily a technical problem to recognizing it as a clinical one.

For vulnerable people seeking mental health support, the difference could matter profoundly. An AI system trained on clinically reliable data and overseen by clinical experts is more likely to recognize when a user is in crisis or sharing harmful thoughts, and to respond with wisdom rather than simply reflecting back what it was trained to say. In mental health, trustworthiness is not a feature; it is a foundation.
