Shengpu Tang, an assistant professor of computer science at Emory University, has discovered something troubling lurking inside the code of cutting-edge sepsis treatment models: a time-slip error that makes artificial intelligence agents see the future to predict the past.

The flaw matters urgently because sepsis kills with devastating speed. One in three adults who die in a hospital had sepsis during their stay, according to the Centers for Disease Control and Prevention. Sepsis is a life-threatening cascade, a chain reaction of organ failure triggered by infection, and when doctors can't treat it quickly enough, the consequences can be fatal. So when researchers began using reinforcement learning, an AI method that learns sequences of decisions from feedback on their outcomes, to guide treatment decisions for sepsis patients, the promise seemed real: algorithms that could recommend optimal combinations of intravenous fluids, antibiotics, blood-pressure medications, and other interventions in real time.

But Tang and his colleagues found a critical problem. Most peer-reviewed studies using reinforcement learning for sepsis treatment contain a subtle time-misalignment error during data preprocessing. The researchers demonstrated the flaw through simulation experiments and published their findings in npj Digital Medicine. The misalignment causes the AI agent to slip off the arrow of time, using future events to predict past outcomes. "The flaw is masked behind 'inflated' performance metrics that look great on paper but will fail in practice," Tang explains.
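To make the idea of a time-misalignment error concrete, here is a minimal Python sketch of one way such an error could arise during preprocessing. It is an illustration under stated assumptions, not the code or the exact formulation from the study: the toy data, the fixed time windows, and the function names `misaligned_pairs` and `aligned_pairs` are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical toy record: one (observation, dose) summary per time window.
# observation = a measurement summarized over the window (e.g. blood pressure)
# dose        = the treatment given during that same window
Window = Tuple[float, float]


def misaligned_pairs(windows: List[Window]) -> List[Tuple[float, float]]:
    """Pair each window's observation with the dose from the SAME window.

    The observation may have been recorded after the dose was given, so the
    "state" can already reflect the effect of the "action" it is paired with.
    """
    return [(obs, dose) for obs, dose in windows]


def aligned_pairs(windows: List[Window]) -> List[Tuple[float, float]]:
    """Pair each window's observation with the dose from the NEXT window.

    The state then contains only information available before the action.
    The last window has no following dose, so it yields no pair.
    """
    return [
        (windows[t][0], windows[t + 1][1])
        for t in range(len(windows) - 1)
    ]


# Made-up numbers: pressure drops, a dose is given, pressure recovers.
toy_windows: List[Window] = [(60.0, 0.0), (55.0, 2.0), (75.0, 1.0)]

print(misaligned_pairs(toy_windows))
print(aligned_pairs(toy_windows))
```

In the misaligned pairing of this toy example, the dose of 2.0 is matched with the reading of 55.0 from its own window, even though that summary could include measurements taken after the dose. In the aligned pairing, the same dose is matched with the earlier reading of 60.0, which was available when the decision would have been made.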

The danger becomes concrete when you consider deployment. Tang's team showed that if these flawed systems were actually used in hospitals, they would recommend either overtreatment or undertreatment in nearly half of all patient states. Yet the deception runs deeper: if the testing data is misaligned in the same way as the training data, the problem stays hidden, and clinicians would have no warning that something is wrong.

What makes this discovery particularly sobering is its scope. "We found that the large majority of the papers that use reinforcement learning to analyze sepsis treatment over the last decade made this time-misalignment mistake—including our own work," Tang says. The problem has been quietly propagating through the scientific literature, with flawed models passing peer review and influencing the direction of the field.

Tang and colleagues, including Sonali Parbhoo at Imperial College London, Jenna Wiens at the University of Michigan, and Jiayu Yao at Columbia University, developed a straightforward workaround that nonetheless represents a fundamental shift in how these problems are formulated. When they eliminated the time-shift flaw in their simulations based on real clinical data, the results were striking: an 8 to 10 percent decrease in patient mortality. That improvement, which for sepsis patients can be the difference between life and death, vanished entirely when the flaw remained unaddressed.

The work serves as both a warning and a roadmap. It underscores why the deployment of AI in healthcare must proceed carefully, especially in life-or-death settings. But it also demonstrates that these problems are solvable. By identifying the flaw and fixing it, researchers can build safer, more reliable reinforcement-learning models ready for the clinical bedside. For patients battling sepsis, where every minute and every treatment decision matters, that correction could mean the difference between recovery and organ failure.