
MolE: Pre-trained molecular representations enable antimicrobial discovery

This paper introduces a novel computational strategy using the MolE framework, which leverages self-supervised learning to create molecular representations that significantly enhance the prediction and discovery of antimicrobial compounds, offering a promising approach to tackle antimicrobial resistance. The work has the potential to accelerate the discovery of new antibiotics and was carried out in the lab of Prof. Dr. Christian Müller.


Congratulations on your recent paper about the MolE framework. Could you briefly explain the paper's main findings and their significance in your field?

Roberto Olayo Alarcon: We developed a novel framework, named MolE, that leverages unlabeled molecular structures available in PubChem to learn a general representation of molecular structures. This representation captures relevant chemical features and improves the performance of machine-learning algorithms that aim to predict molecular properties. We used this pre-trained representation to identify novel bacterial growth inhibitors and experimentally validated their activity against Staphylococcus aureus. In this way, our study offers a strategy that incorporates recent advances in unsupervised deep learning to tackle chemical and biological data scarcity, thereby accelerating antimicrobial discovery.

What inspired you to pursue this research topic, and what challenges did you encounter during your study?

Roberto Olayo Alarcon: The underlying motivation behind our work was to address antimicrobial resistance, a major global health crisis. Determining the antimicrobial activity of new chemical compounds through experimental methods is still time-consuming and costly. Compound-centric deep learning models promise to speed up the search and prioritization process. However, current end-to-end learning strategies require large amounts of training data, and no large-scale public data resource for antimicrobial discovery is available yet. This prompted us to look at pre-training strategies as a way to address this scarcity.

How do your findings advance our understanding of the subject, and what potential real-world applications do they have?

Roberto Olayo Alarcon: The MolE representation can be used by machine-learning methods for various molecular property prediction tasks. Furthermore, our framework for predicting antimicrobial activity can serve as a valuable tool for researchers who wish to assess the antimicrobial potential of any compound whose structure is known.

What are the next steps in your research, and how do you plan to build on these findings?

Roberto Olayo Alarcon: Improvements to the pre-training framework are being explored. We also continuously explore different uses for the pre-trained molecular representation. Together with experimental collaborators, we are further interested in examining the effect that different chemical stressors have on bacterial stress response mechanisms.

How can other researchers or stakeholders in the community benefit from your work, and are there any opportunities for collaboration?

Roberto Olayo Alarcon: Because MolE is a task-independent molecular representation, it can be used directly for a variety of property prediction tasks. Researchers who want to predict a particular molecular property can use our work to prioritize compounds for experimental validation.
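To make the idea of prioritizing compounds with a fixed, pre-trained representation more concrete, here is a minimal sketch. It is not the MolE code or the authors' prediction model: the three-dimensional vectors stand in for real embeddings, the compound names are hypothetical, and the nearest-centroid score is a deliberately simple substitute for whatever classifier one would actually train on top of the representation.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a list of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_candidates(
    labeled: List[Tuple[Vector, int]],
    candidates: Dict[str, Vector],
) -> List[Tuple[str, float]]:
    """Rank candidates by similarity to active vs. inactive training compounds.

    `labeled` holds (embedding, label) pairs with label 1 = active, 0 = inactive.
    The score is cos(candidate, active centroid) - cos(candidate, inactive centroid).
    """
    actives = [vec for vec, label in labeled if label == 1]
    inactives = [vec for vec, label in labeled if label == 0]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive example")
    active_center = centroid(actives)
    inactive_center = centroid(inactives)
    scores = {
        name: cosine(vec, active_center) - cosine(vec, inactive_center)
        for name, vec in candidates.items()
    }
    # Highest score first: these would be the first compounds to test in the lab.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Placeholder embeddings (assumption): real ones would come from a pre-trained model.
labeled = [
    ([1.0, 0.1, 0.0], 1),
    ([0.9, 0.2, 0.1], 1),
    ([0.0, 1.0, 0.9], 0),
    ([0.1, 0.8, 1.0], 0),
]
candidates = {
    "compound_A": [0.8, 0.1, 0.1],
    "compound_B": [0.1, 0.9, 0.8],
    "compound_C": [0.5, 0.5, 0.5],
}
ranking = rank_candidates(labeled, candidates)
for name, score in ranking:
    print(f"{name}: {score:+.3f}")
```

The point of the sketch is the division of labor: the representation is computed once and reused, while only the small scoring step depends on the labeled data for the property of interest.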

The work was done in close collaboration with the labs of Dr. Cynthia Sharma (University of Würzburg) and Dr. Ana Rita Brochado (University of Tübingen). It was supported by a grant awarded within the Bavarian research network funded through the Bavarian State Ministry of Science and Arts, Germany.
