DinoBloom: A Foundation Model for Generalizable Cell Embeddings in Hematology
We are excited to present our latest interview with Valentin Koch, who, jointly with Sophia Wagner, has led the research on DinoBloom - the first foundation model specifically designed for single-cell image analysis in hematology.
Built with a tailored DINOv2 pipeline and trained on the largest multi-cohort dataset in the field, the model demonstrates exceptional performance in cell-type classification and acute myeloid leukemia subtyping, offering new potential for automated and accurate hematological diagnoses. DinoBloom was developed in the labs of Dr. Tingying Peng, Dr. Carsten Marr, and Prof. Dr. Schnabel, and at the Dr. von Haunersches Kinderspital of the Ludwig-Maximilians-University.
Congratulations on your recent paper about DinoBloom. Can you please briefly explain your paper's main findings and significance in your field?
Valentin Koch: A foundation model in hematology is needed to extract meaningful features for patient-level disease classification, because multiple instance learning models depend on strong features. We show that our models are much stronger than previous ones that were not tailored to hematology.
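To illustrate why feature quality matters for multiple instance learning, here is a minimal sketch of how per-cell embeddings might be pooled into a single patient-level vector. It is not the DinoBloom authors' pipeline: the function name `attention_pool`, the `scoring_vector` parameter, and the toy two-dimensional embeddings are all hypothetical. In practice the embeddings would come from a feature extractor and the scoring weights would be learned.

```python
import math
from typing import List, Sequence


def attention_pool(
    embeddings: Sequence[Sequence[float]],
    scoring_vector: Sequence[float],
) -> List[float]:
    """Pool per-cell embeddings into one patient-level vector.

    Each cell gets a score (dot product with a hypothetical, normally
    learned, scoring vector); a softmax turns the scores into weights,
    and the output is the weighted average of the cell embeddings.
    """
    # One scalar relevance score per cell.
    scores = [
        sum(e_i * w_i for e_i, w_i in zip(cell, scoring_vector))
        for cell in embeddings
    ]
    # Numerically stable softmax over the cells of one patient.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [x / total for x in exps]
    # Weighted average of the embeddings, dimension by dimension.
    dim = len(embeddings[0])
    return [
        sum(w * cell[d] for w, cell in zip(weights, embeddings))
        for d in range(dim)
    ]


# Toy example: two cells with 2-dimensional embeddings (made-up values).
cells = [[1.0, 0.0], [3.0, 2.0]]
patient_vector = attention_pool(cells, scoring_vector=[0.0, 0.0])
```

With an all-zero scoring vector every cell receives the same weight, so the pooled vector is simply the mean of the cell embeddings. A downstream classifier only ever sees this pooled vector, which is why uninformative cell embeddings cannot be repaired at the aggregation stage.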
What inspired you to pursue this research topic, and what challenges did you encounter during your study?
Valentin Koch: A major driving force has been our realization that large-scale (and high-quality) data is important to build better models and reach the next level in diagnostic applications. This trend is clearly visible in Large Language Models, but strong vision models such as DINOv2 also use huge, curated datasets. The typical challenges in our work are that datasets are scattered and hard to come by, and that sufficient compute resources need to be available. We are thankful that we were able to train our model on 8 high-performance GPUs on our cluster.
How do your findings advance our understanding of the subject, and what potential real-world applications do they have?
Valentin Koch: A model such as DinoBloom can have a big impact on disease classification in hematology. Translating our findings and advances into applications is a major driving force for what we do.
What are the next steps in your research, and how do you plan to build on these findings?
Valentin Koch: A critical next step is gathering more data and including the latest methodological advancements. At the same time, we are glad that a collaboration with the Munich Leukemia Laboratory (MLL), a strong clinical partner, is already underway to gather more data and improve the model further. We are also looking for more evaluations and use cases to show our model's strength on a broader set of tasks and its clinical applicability.