Helmholtz Munich I Daniela Barreto

MISATO Dataset: Transforming Drug Discovery with AI

Featured Publication, STB

A team led by Helmholtz Munich scientist Dr. Grzegorz Popowicz has unveiled the Molecular Interactions Structurally Optimized (MISATO) dataset, which offers a transformative approach to training AI models for designing new drug molecules. MISATO represents a pivotal step forward in leveraging AI for drug discovery and holds promise for the future of medicine and pharmaceutical research. The results were published in Nature Computational Science.

With the rise of revolutionary AI technologies, the drug discovery community anticipates significant advances on the major challenges of designing new drugs. Just as the AI system “AlphaFold2” transformed structural biology by accurately predicting the 3D structures of proteins, a sufficiently powerful AI model holds the promise of designing new drug molecules. However, the key to creating such a model lies in providing meticulously curated, reliable training data.

In a pioneering initiative, a team led by Dr. Grzegorz Popowicz, research group leader at the Helmholtz Munich Institute of Structural Biology, has developed the Molecular Interactions Structurally Optimized (MISATO) dataset to support the training of drug discovery models. Unlike traditional approaches that rely on simplistic ball-and-stick models, the MISATO dataset provides small molecules characterized with quantum chemistry, yielding a much more realistic representation. Furthermore, protein targets in the MISATO dataset are treated as dynamic entities: molecular dynamics data is integrated to capture their behavior over time.

Transformative Dataset for AI-Driven Drug Design

“MISATO represents a significant leap forward in AI-driven drug discovery. By providing comprehensive and dynamic molecular information, we are equipping AI models with the necessary tools to transform the field,” says Dr. Till Siebenmorgen, first author of the MISATO study. “What sets MISATO apart is its accessibility. The dataset is freely available to the AI community, accessible with only a single line of code! This accessibility has sparked a vibrant community of AI enthusiasts worldwide, eager to explore the dataset’s potential.”

“We believe that AI models trained on the MISATO dataset will achieve unparalleled accuracy in drug discovery,” states Dr. Grzegorz Popowicz, the coordinator of the study. “By democratizing access to high-quality training data, we are accelerating progress towards safer and more effective drug development.”


Original publication

Siebenmorgen et al., 2024: MISATO: Machine learning dataset of protein-ligand complexes for structure-based drug discovery. Nature Computational Science. DOI 10.1038/s43588-024-00627-2

About the scientists

Dr. Grzegorz Popowicz, Research Group Leader at the Helmholtz Munich Institute of Structural Biology.

Dr. Till Siebenmorgen, Postdoc at the Helmholtz Munich Institute of Structural Biology.

Funding information
The paper received funding from BMWi ZIM (KK 5197901TS0) and BMBF SUPREME (031L0268). It was supported by the Helmholtz Association's Initiative and Networking Fund on the HAICORE@FZJ partition.