
ProDock Turns Messy Drug Discovery Pipelines into Reproducible Science

A new open-source toolkit stitches together every broken step in protein-ligand docking—from raw PDB files to a queryable database—so labs stop losing results t

Most docking studies fail at the workflow level, not the science—ProDock fixes the plumbing.

Somewhere between a promising drug candidate and a published docking result, something often goes wrong—and it usually isn't the science. It's the plumbing. A receptor file cleaned by one lab member's script, a docking box defined by eyeballing a 3D viewer, a score extracted by a one-off parser written at midnight before a deadline. These fragmented steps—each seemingly minor, collectively catastrophic—are why so many computational drug discovery studies are difficult to reproduce, compare, or hand off to a collaborator. ProDock, a new open-source Python toolkit from researchers at Leipzig University, the University of Southern Denmark, Texas A&M, and the University of Medicine and Pharmacy at Ho Chi Minh City, is designed to fix exactly this problem (Phan et al., 2026).

The package doesn't propose a new algorithm for predicting how tightly a drug binds to its target. It does something arguably more important: it connects every stage of the docking workflow into one coherent, auditable, and reusable system. That might sound unglamorous. But in a field where the difference between a reproducible result and a dead end often comes down to whether someone remembered to add hydrogens to a protein structure, unglamorous infrastructure is what actually moves science forward.

The Science

To understand why ProDock matters, it helps to understand what protein–ligand docking actually involves in practice. Docking — the computational prediction of how a small molecule (a potential drug, or ligand) binds to a target protein (receptor) — is one of the most commonly used tools in structure-based drug discovery. It's fast, relatively cheap, and can screen thousands of candidate compounds against a target before anyone synthesizes a single molecule in a lab.

But "docking" is never just one step. A typical study requires retrieving a protein's 3D crystal structure from a public database (usually the Protein Data Bank, or PDB), cleaning that structure — removing water molecules, adding missing atoms, correcting protonation states — converting file formats (SDF, PDB, PDBQT, and others that only chemists love), generating 3D coordinates for candidate ligands from flat chemical notation called SMILES strings, defining the spatial "search box" that tells the docking engine where on the protein to look, running the docking engine itself, parsing its output, extracting scores, and then doing it all again for the next receptor, the next ligand, the next engine. Each of these steps is a place where something can silently break.

ProDock (Phan et al., 2026) organizes this chaos into four connected stages: preprocessing of receptors and ligands, provenance-aware docking execution, postprocessing of poses and interaction fingerprints, and persistent storage in a local SQLite database. SQLite is a lightweight, file-based database that requires no server infrastructure — it lives in the project folder alongside all the other files, and can be queried like any database long after the computation is done.

Figure 1: Overview of the ProDock workflow. The package organizes docking into four connected stages: preprocessing of receptor and ligand inputs, docking execution, postprocessing of poses and interaction fingerprints, and SQLite result repository for downstream querying and comparison. Source: Tieu-Long Phan, Lai Hoang Son Le

The package is implemented in Python and integrates several well-established tools: RDKit for cheminformatics, Open Babel and PDBFixer for structure handling, OpenMM for optional energy minimization, ProLIF for interaction fingerprinting, and the AutoDock Vina family of engines — including Smina, QuickVina 2, GNINA, and QVina-W — for the actual docking calculations. ProDock doesn't replace any of these; it connects them, normalizes their inputs and outputs, and keeps a record of every parameter used along the way.

The research team demonstrated the toolkit with a screening campaign against EGFR (Epidermal Growth Factor Receptor), a protein implicated in several cancers and the target of established drugs including erlotinib and gefitinib. They used five distinct crystal structures of EGFR (PDB IDs: 2ITY, 1M17, 4G5J, 4I23, and 4ZAU), a library of known inhibitors and decoy molecules, and four docking engines — all orchestrated by ProDock from nothing more than PDB identifiers and SMILES strings as inputs.

What They Found

The central contribution of ProDock is architectural rather than empirical — it is a software paper, not a benchmarking study — but its design decisions are themselves findings about what computational drug discovery workflows actually need.

The first insight is about preprocessing as quality control, not convenience. ProDock's ReceptorPrep module repairs common structural issues, removes unwanted heterogens, adds missing atoms and hydrogens, and converts the result into docking-ready PDBQT format. Crucially, it retains every prepared file as an explicit artifact. The same logic applies to ligand preparation through LigandPrep, which accepts SMILES strings, generates 3D conformers, adds hydrogens, and optimizes coordinates before docking. Because the same prepared ligand library can be reused across multiple receptors, separating preparation from execution isn't just tidy; it also avoids repeating the same work for every receptor.

The second insight is about box definition. Telling a docking engine where on a protein to search is more subtle than it sounds. ProDock's GridBox module offers six distinct autobox algorithms, each suited to different molecular shapes and use cases.

ProDock Autobox Algorithms by Use Case

The six autobox sizing strategies available in ProDock's GridBox module, mapped to their primary application niche. Each algorithm represents a different approach to automatically defining the 3D docking search region from a reference ligand.

Label            Value
scale            1
pad              2
advanced         3
percentile       4
centroid-fixed   2
pca-aabb         5

The simplest — scale — just multiplies the ligand span along each axis by a factor. The most sophisticated — pca-aabb — constructs an oriented envelope in principal-component space before converting it to the axis-aligned box the docking engine needs. This is particularly useful for elongated ligands, where a naive bounding box might be badly oversized. A percentile mode can prevent rare outlier atoms from inflating the box. All of these replace a step that researchers typically do by eye in a separate 3D viewer, and all capture their output as a reproducible workflow artifact.
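To make the difference between these sizing strategies concrete, here is a minimal sketch of three of them in plain Python. It is not ProDock's implementation: the function names, default values, and exact formulas are assumptions for illustration. Only the scale rule (ligand span multiplied by a factor) is taken from the description above, and the coordinates are toy values rather than a real ligand.

```python
from statistics import quantiles

# Toy ligand atom coordinates in angstroms (illustrative values only).
coords = [
    (0.0, 0.0, 0.0),
    (2.0, 1.0, 0.0),
    (4.0, 0.0, 1.0),
    (6.0, 1.0, 0.0),
    (8.0, 0.0, 2.0),
]


def axis_ranges(points):
    """Return a (min, max) pair per axis for a list of (x, y, z) points."""
    return [(min(axis), max(axis)) for axis in zip(*points)]


def box_from_ranges(ranges, size_fn):
    """Center the box on each range midpoint; derive each edge from its span."""
    center = tuple((lo + hi) / 2 for lo, hi in ranges)
    size = tuple(size_fn(hi - lo) for lo, hi in ranges)
    return {"center": center, "size": size}


def scale_box(points, factor=1.5):
    """'scale': multiply the ligand span along each axis by a factor."""
    return box_from_ranges(axis_ranges(points), lambda span: span * factor)


def pad_box(points, padding=4.0):
    """'pad' (assumed rule): add a fixed margin on both sides of each axis."""
    return box_from_ranges(axis_ranges(points), lambda span: span + 2 * padding)


def percentile_box(points, lower=5, upper=95, padding=4.0):
    """'percentile' (assumed rule): use percentiles instead of min/max,
    so a few outlier atoms cannot inflate the box."""
    ranges = []
    for axis in zip(*points):
        cuts = quantiles(axis, n=100, method="inclusive")  # 99 cut points
        ranges.append((cuts[lower - 1], cuts[upper - 1]))
    return box_from_ranges(ranges, lambda span: span + 2 * padding)


print(scale_box(coords))
print(pad_box(coords))
print(percentile_box(coords + [(40.0, 0.0, 0.0)]))  # one outlier atom added
```

With an outlier atom appended, the percentile variant yields a shorter box along x than the padded min/max box would, which is the behavior the percentile mode is meant to provide. The pca-aabb strategy is left out here because it needs an eigen-decomposition step.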

The third insight is about campaigns as first-class objects. Rather than treating each receptor-ligand pair as an isolated command-line call, ProDock represents an entire study as a machine-readable JSON campaign file. A single campaign can associate multiple receptors with multiple ligands and multiple docking backends — what the authors call a many-to-many study design. The campaign file records CPU allocation, random seed, exhaustiveness (how thoroughly the engine searches), and requested pose count. It functions simultaneously as an execution plan and a provenance record: someone can read it months later and know exactly how a result was produced.
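A campaign file of this kind might look roughly like the sketch below. Every key name is hypothetical, since ProDock's actual schema is not shown here. The receptor IDs are those of the EGFR demonstration, the engine labels are picked from the Vina-family tools the toolkit integrates, and the two SMILES strings (ethanol and benzene) are simple placeholders, not EGFR inhibitors. The code expands the many-to-many design into one job per receptor, ligand, and engine combination.

```python
import json
from itertools import product

# Hypothetical campaign layout: all key names are assumptions for illustration.
campaign_text = """
{
  "receptors": ["2ITY", "1M17", "4G5J", "4I23", "4ZAU"],
  "ligands": {"ligand_a": "CCO", "ligand_b": "c1ccccc1"},
  "engines": ["smina", "qvina2", "gnina", "qvina-w"],
  "settings": {"cpu": 4, "seed": 42, "exhaustiveness": 8, "num_poses": 9}
}
"""


def expand_jobs(campaign):
    """Return one job dict per receptor x ligand x engine combination.

    Each job carries the shared settings, so it can be rerun on its own
    and still record how it was produced.
    """
    settings = campaign["settings"]
    combos = product(campaign["receptors"], campaign["ligands"], campaign["engines"])
    return [
        {"receptor": receptor, "ligand": ligand, "engine": engine, **settings}
        for receptor, ligand, engine in combos
    ]


campaign = json.loads(campaign_text)
jobs = expand_jobs(campaign)
print(len(jobs))  # 5 receptors x 2 ligands x 4 engines = 40 jobs
print(jobs[0])
```

Because the seed, exhaustiveness, and pose count travel with every job, the same file serves as both the execution plan and the record of how each result was produced.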

ProDock's Four-Stage Workflow Coverage

Coverage of each workflow stage by ProDock's core modules, scored qualitatively on automation, reproducibility, and analytical depth based on the paper's feature descriptions.

Label                   Value
Preprocessing           5
Docking Execution       5
Interaction Profiling   4
Database Storage        5

The fourth insight is about interaction fingerprints as standard output. Docking scores — the numerical estimates of binding affinity — are notoriously noisy. A compound ranked fifth by score might actually be more interesting than the compound ranked first, if it makes better contact with the residues that matter biologically. ProDock's InteractionProfiler module, built on ProLIF, automatically profiles which amino acid residues in the receptor interact with each docked pose, and in what way (hydrogen bond, hydrophobic contact, ionic interaction, and so on). These residue-level records are stored alongside scores in the SQLite database, enabling queries like "show me all ligands that make a hydrogen bond with residue Thr790" — a question that would previously require writing a custom parser for each engine's output format.

The database schema itself is worth examining. It uses separate dimension tables for receptors, ligands, and engines, linked to factual tables for poses, scores, and interactions.

Figure S3: Database schema used by the ProDock SQLite layer. Dimension tables manage receptors, ligands, and engines, while factual tables capture docking geometries, scores, and residue-level interactions. Source: Tieu-Long Phan, Lai Hoang Son Le

This normalized design keeps the database compact enough to archive with the project, while making cross-target, cross-ligand, and cross-engine comparisons straightforward SQL queries. No reparsing of raw files required.
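As a rough illustration of how such a layout turns a residue-level question into a single query, the sketch below builds a small in-memory SQLite database with Python's standard sqlite3 module. The table names, column names, and interaction labels are assumptions rather than ProDock's actual schema; scores are folded into the poses table for brevity; and the inserted rows are placeholder values, not docking results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: dimension tables plus pose and interaction fact tables.
conn.executescript("""
CREATE TABLE receptors (receptor_id INTEGER PRIMARY KEY, pdb_id TEXT UNIQUE);
CREATE TABLE ligands   (ligand_id INTEGER PRIMARY KEY, name TEXT UNIQUE, smiles TEXT);
CREATE TABLE engines   (engine_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE poses (
    pose_id     INTEGER PRIMARY KEY,
    receptor_id INTEGER REFERENCES receptors(receptor_id),
    ligand_id   INTEGER REFERENCES ligands(ligand_id),
    engine_id   INTEGER REFERENCES engines(engine_id),
    pose_rank   INTEGER,
    score       REAL
);
CREATE TABLE interactions (
    pose_id          INTEGER REFERENCES poses(pose_id),
    residue          TEXT,
    interaction_type TEXT
);
""")

# Placeholder rows so the query below has something to return.
conn.execute("INSERT INTO receptors VALUES (1, '2ITY')")
conn.executemany(
    "INSERT INTO ligands VALUES (?, ?, ?)",
    [(1, "ligand_a", "CCO"), (2, "ligand_b", "c1ccccc1")],
)
conn.execute("INSERT INTO engines VALUES (1, 'smina')")
conn.executemany(
    "INSERT INTO poses VALUES (?, ?, ?, ?, ?, ?)",
    [(1, 1, 1, 1, 1, -7.5), (2, 1, 2, 1, 1, -6.0)],
)
conn.executemany(
    "INSERT INTO interactions VALUES (?, ?, ?)",
    [(1, "THR790", "hbond"), (1, "LEU718", "hydrophobic"), (2, "LEU718", "hydrophobic")],
)


def ligands_with_contact(connection, residue, interaction_type):
    """List (ligand, receptor, engine, score) for poses making the given contact."""
    query = """
        SELECT DISTINCT l.name, r.pdb_id, e.name, p.score
        FROM interactions AS i
        JOIN poses     AS p ON p.pose_id = i.pose_id
        JOIN ligands   AS l ON l.ligand_id = p.ligand_id
        JOIN receptors AS r ON r.receptor_id = p.receptor_id
        JOIN engines   AS e ON e.engine_id = p.engine_id
        WHERE i.residue = ? AND i.interaction_type = ?
        ORDER BY p.score
    """
    return connection.execute(query, (residue, interaction_type)).fetchall()


hits = ligands_with_contact(conn, "THR790", "hbond")
print(hits)
```

The same function answers the question for any residue or interaction type, and the join through the dimension tables is what lets one query span receptors, ligands, and engines.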

Why This Changes Things

Reproducibility in computational science is a known crisis. A 2016 Nature survey found that more than 70% of researchers had tried and failed to reproduce another scientist's experiment. Computational studies are not immune — if anything, the opacity of bespoke analysis scripts makes the problem worse, because there is rarely a single clean "Methods" section that captures every parameter of every preprocessing step.

In docking specifically, the problem is compounded by the sheer number of steps that precede and follow the actual engine call. A 2021 review estimated that receptor and ligand preparation choices can shift predicted binding affinities by as much as a full kcal/mol — enough to reverse the ranking of candidate compounds. If those choices aren't recorded, they can't be scrutinized, and results can't be fairly compared across studies or laboratories.

ProDock addresses this not by enforcing a single "correct" workflow, but by making whatever workflow is chosen explicit, recorded, and reusable. The campaign JSON is portable: a collaborator in a different country can pick it up, install ProDock, and reproduce the same docking study. The SQLite database is self-contained: it can be emailed, archived, or deposited alongside a paper's supplementary materials. The prepared receptor and ligand files are retained rather than hidden inside temporary directories that vanish after the run.

There is also a practical efficiency argument. In multi-target drug discovery — increasingly the norm for complex diseases like cancer, where resistance mutations force researchers to consider multiple protein variants simultaneously — running the same ligand library against five receptor structures using four engines previously meant managing twenty separate pipelines. ProDock collapses that into one campaign. The EGFR demonstration is a direct illustration: five crystal structures, four engines, one workflow, zero manual file conversions.

The comparison to existing tools is instructive. Frameworks like DockStream (Guo et al., 2021) and pyscreener (Graff et al., 2022) have improved reproducibility in similar ways, and ProDock acknowledges them directly (Phan et al., 2026). What distinguishes ProDock is the tight integration of all four stages — including the database backend and interaction fingerprinting — into a single project-local system, without requiring external server infrastructure. It is explicitly designed for application-oriented studies: real drug discovery projects, not just benchmarking exercises.

The modular design also matters for adoption. Not every laboratory will want to replace its entire workflow. ProDock's architecture allows groups to plug in only the components they need — using just the PoseCrawler to organize existing outputs, or just the PoseDatabase to store results from a pipeline they already trust. This flexibility is rarely offered by more opinionated frameworks, and it dramatically lowers the barrier to entry.

What's Next

ProDock is intentionally modest about what it doesn't do. It doesn't claim to improve docking accuracy — the scoring functions of Vina-family engines have well-known limitations, including difficulty ranking diverse chemotypes and poor handling of protein flexibility. It doesn't replace established cheminformatics tools; it wraps them. And it currently focuses on local execution rather than high-performance computing clusters, which limits its scalability for truly large-scale virtual screens involving millions of compounds.

The most immediate open question is how the framework will extend to newer docking paradigms. Machine-learning-based scoring functions like GNINA are already supported, but deep-learning docking engines that predict binding poses without traditional grid-based search — tools like DiffDock or AlphaFold-based structure prediction pipelines — represent a different kind of integration challenge. The campaign and database abstractions in ProDock are generic enough that they could, in principle, accommodate these backends, but the preprocessing and box-definition stages would need rethinking.

There is also an opportunity to integrate ProDock more tightly with experimental data. The SQLite database already stores rich metadata about each pose and interaction. Linking those records to experimental binding affinity measurements, such as IC50 values or isothermal titration calorimetry data, would enable retrospective analysis of how well each docking configuration actually predicted real-world binding. That kind of ground-truth comparison is what eventually tells researchers which preprocessing choices and which engines are most trustworthy for a given target class.

For now, what ProDock offers is something the field has needed for years: a stable, transparent foundation on which reproducible docking studies can actually be built. The scientific question in computational drug discovery has never really been "can we dock?" The answer to that has been yes for decades. The harder question has been "can we trust what we docked, compare it to what someone else docked, and hand it off to the next person without losing half the information along the way?" ProDock is a serious attempt at answering yes to that question too.

The toolkit is freely available at https://github.com/Medicine-Artificial-Intelligence/ProDock, installable in a single command via Conda or PyPI, and documented at https://prodock.readthedocs.io. For a field that generates enormous volumes of docking data every year — much of it trapped in formats that only its creator can parse — that kind of open, structured infrastructure is not a footnote. It's the point.

ProDock converts fragmented engine-specific outputs into structured analytical results that are easier to compare, reuse, and audit.
