Feeding data to AI to speed up drug discovery

Tim Cernak and his team at the University of Michigan College of Pharmacy have spent more than a decade building something that could fundamentally change how we discover new medicines. Their creation: the world's largest chemical reaction database, containing 50,688 carefully documented carbon-nitrogen coupling reactions, the bond-forming steps used to assemble countless drugs we rely on today.

The database, published in the Journal of the American Chemical Society, is now freely available to researchers everywhere through the Open Reaction Database. For Cernak, an associate professor of medicinal chemistry, this isn't just a research milestone—it's an invitation.

"We are excited about the discoveries that other scientists can make within this new data set," Cernak said. "There's so much data to mine."

The need for this kind of resource is urgent. Developing a new drug traditionally requires thousands of chemistry experiments, each one testing different combinations of ingredients and conditions in search of the right recipe. It's slow, labor-intensive work, and the newest drug candidates demand increasingly complex syntheses. Meanwhile, many of the reactions used to make these medicines depend on catalysts made from precious metals, and those supply chains are increasingly fragile.

Palladium is the workhorse catalyst for most drug synthesis, but global reserves are concentrated in just a few countries. Cernak's team tested tens of thousands of reactions comparing palladium with nickel and copper alternatives. The findings were striking: certain reactions performed equally well with nickel, and some even with copper, which can be sourced far more widely across the globe.

"The latest drugs in the pipeline are raising the bar of sophistication for chemical synthesis," Cernak noted. "At the same time, supply chains for precious metals and other critical reaction components are being exposed as risks. Big data drops like this one are going to be needed to build the predictive models that can make better drugs faster."

One unexpected discovery from the massive dataset: highly reactive molecules called arynes were forming at far lower temperatures than anyone predicted. The pattern, which emerged across hundreds of experiments, would have been invisible in smaller studies, but the scale of this database made it impossible to ignore.

"One key takeaway was that large, systematically designed reaction data sets can uncover patterns that are difficult to see from traditional scope studies alone," Cernak said. "This is exciting as a possibility to synthesize drugs without precious metal catalysts."

For researchers working to bring down the cost and timeline of drug development, this open archive represents something rare: a foundation they can build on together. What took one lab more than ten years to assemble is now available to anyone with an internet connection—and that's exactly the point.
