An international research consortium led by Ludwig Maximilian University (LMU) Munich has published the first comprehensive guide for using routinely collected health data in medical research, setting new standards for a field that is transforming how scientists answer questions about disease and treatment.
Routinely collected data—information from electronic health records, patient registries, and billing systems—reflects the messy reality of medicine as it happens every day. Unlike carefully controlled clinical trials, these datasets capture how thousands of patients actually receive care, offering unprecedented scale and real-world relevance. But that richness comes with methodological challenges that have long puzzled researchers: they rarely know exactly how the data was generated, cannot control how it was collected, and face significant risks of biased or unreliable results.
The new guidelines, published in The BMJ, systematically address these pitfalls. Sabine Hoffmann, lead author and head of the Statistical Consulting Laboratory at LMU Munich, explains the motivation: "Routine data open up enormous possibilities for investigating medical questions more quickly and broadly. At the same time, we must be aware of the methodological challenges in order to achieve valid and trustworthy results." The researchers identified key problem areas, including lack of representativeness, insufficient data quality, poor temporal alignment between measurements and interventions, and the risk that nonrandomized treatment decisions introduce bias into analyses.
What makes this guide distinctive is its practical focus. Rather than simply cataloguing problems, the authors—an interdisciplinary team of statisticians, methodologists, artificial intelligence experts, and cardiologists—developed a structured roadmap with concrete recommendations. These include specific strategies for ensuring data quality, correctly defining time points when events occurred, and reporting findings in transparent, reproducible ways. The guide also critically assesses the role of modern analytical methods, particularly those involving artificial intelligence. While these approaches hold genuine promise, the authors warn, they can produce misleading results without methodological rigor.
The guide places particular emphasis on the risk of biased results and on problems with missing or erroneous data. A dataset might look comprehensive yet systematically exclude certain patient populations. Data entry errors might go undetected. Researchers might unknowingly compare treatments applied at different stages of disease progression. Each of these pitfalls can lead to conclusions that seem solid but crumble under scrutiny. The guide helps researchers spot these dangers before they compromise a study's findings.
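To make these pitfalls concrete, here is a minimal Python sketch of the kind of basic screening a researcher might run before analysis. It is an illustration only, not a method taken from the guide: the records, field names, plausible value range, and reference population shares are all invented assumptions.

```python
from collections import Counter

# Hypothetical toy records; field names and values are invented for illustration.
records = [
    {"patient_id": 1, "age_group": "18-64", "ldl": 130.0},
    {"patient_id": 2, "age_group": "18-64", "ldl": None},    # missing measurement
    {"patient_id": 3, "age_group": "65+", "ldl": 155.0},
    {"patient_id": 4, "age_group": "18-64", "ldl": 1550.0},  # likely entry error
]

# Assumed shares of each group in the target population (not real figures).
reference_shares = {"18-64": 0.5, "65+": 0.5}


def missing_rate(rows, field):
    """Fraction of rows where `field` is absent or None."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(field) is None) / len(rows)


def out_of_range(rows, field, low, high):
    """IDs of rows whose non-missing `field` value lies outside [low, high]."""
    return [
        r["patient_id"]
        for r in rows
        if r.get(field) is not None and not (low <= r[field] <= high)
    ]


def share_gaps(rows, field, reference):
    """Observed share minus reference share for each group in `reference`."""
    total = len(rows)
    if total == 0:
        return {group: -ref for group, ref in reference.items()}
    counts = Counter(r[field] for r in rows)
    return {group: counts.get(group, 0) / total - ref
            for group, ref in reference.items()}


print(missing_rate(records, "ldl"))
# The range 20-400 is an assumed plausibility window, chosen for illustration.
print(out_of_range(records, "ldl", 20.0, 400.0))
print(share_gaps(records, "age_group", reference_shares))
```

A large gap between observed and reference shares would hint that a group is under-represented, and flagged IDs would point to values worth checking against the source system. Checks like these only surface candidates for review; they do not by themselves establish that a dataset is representative or free of errors.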
This work emerged from a unique international collaboration. Several contributing authors belong to STRATOS—the STRengthening Analytical Thinking for Observational Studies initiative—an internationally recognized network dedicated to improving how statisticians design and conduct observational research. The effort also reflects a broader movement toward high-quality evidence in health policy, aligned with institutions like the Institute for Quality and Efficiency in Health Care (IQWiG), which emphasize that robust methodology matters as much as novel findings.
Georg Nickenig, director of the Department of Cardiology at the University Hospital Bonn, and Holger Thiele, director of the Department of Cardiology at the Leipzig Heart Center, stated: "With this guide, we are providing, for the first time, comprehensive, practical guidance that combines clinical and methodological expertise." Their goal is to harness the full potential of routinely collected data while maintaining the scientific rigor that evidence-based medicine demands. In a healthcare landscape drowning in data, clear standards for using that data responsibly could change how quickly and confidently researchers answer pressing medical questions.
