Finding needles in a haystack of single-cell data: algorithm developed by Scialdone lab helps identify rare cells in single-cell sequencing datasets.


IES, IFE, June 12, 2023 Created by Ksenia Kuznetsova

The analysis of single-cell RNA-sequencing data is a powerful tool for identifying new types of cells. However, the rarer the cells are, the more challenging it becomes to find and characterize them. As a result, rare cells often go unnoticed: they may be omitted, masked, or merged with more common cell types if they share certain markers.

Lubatti et al. came up with an elegant way to tackle this problem. They developed an algorithm called CIARA (Cluster Independent Algorithm for the Identification of Markers of Rare Cell Types), which operates on the premise that markers of rare cells tend to exhibit high expression in small groups of cells with similar transcriptomes.

In simplified terms, the algorithm selects genes that are detected in a limited number of cells, and these cells are closer together than would be expected by chance in a K-nearest neighbor graph constructed from the dataset. The algorithm then employs these genes to identify potential groups of rare cell types.
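The idea of "closer together than expected by chance" can be illustrated with a small toy example. The sketch below is not the CIARA implementation and does not use its API. It assumes made-up 2D coordinates standing in for transcriptomes, a brute-force K-nearest neighbor graph, and a simple permutation test as a stand-in for whatever statistical test the published method uses. All function names and parameter values here are hypothetical.

```python
import math
import random


def knn_graph(points, k):
    """For each point, return the indices of its k nearest neighbors (brute force)."""
    neighbors = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def neighbor_sharing(expressing, neighbors):
    """Fraction of neighbor slots of expressing cells that also express the gene."""
    total = sum(len(neighbors[i]) for i in expressing)
    shared = sum(1 for i in expressing for j in neighbors[i] if j in expressing)
    return shared / total


def localization_p_value(expressing, neighbors, n_cells, n_perm, rng):
    """Permutation p-value: how often do randomly placed cells share as many neighbors?"""
    observed = neighbor_sharing(expressing, neighbors)
    hits = 0
    for _ in range(n_perm):
        shuffled = set(rng.sample(range(n_cells), len(expressing)))
        if neighbor_sharing(shuffled, neighbors) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data (assumption): 190 common cells and a tight group of 10 rare cells.
rng = random.Random(0)
common = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(190)]
rare = [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(10)]
cells = common + rare
neighbors = knn_graph(cells, k=5)

# One gene detected only in the rare group, one detected in 10 scattered common cells.
localized_gene = set(range(190, 200))
scattered_gene = set(rng.sample(range(190), 10))

p_localized = localization_p_value(localized_gene, neighbors, len(cells), 200, rng)
p_scattered = localization_p_value(scattered_gene, neighbors, len(cells), 200, rng)

print(f"localized gene: p = {p_localized:.3f}")
print(f"scattered gene: p = {p_scattered:.3f}")
```

In this toy setting, the gene detected only in the tight group of rare cells receives a small p-value, because its expressing cells are each other's neighbors far more often than randomly chosen cells would be. Genes that pass such a filter are the kind of candidates the algorithm would then use to look for groups of rare cells.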

The researchers demonstrated the effectiveness of this innovative algorithm by analyzing various datasets. Specifically, they discovered previously uncharacterized rare cell populations in a human embryo and in a mouse stem cell dataset. Furthermore, they demonstrated that CIARA can handle extensive datasets comprising hundreds of thousands of cells and can be applied to any type of single-cell sequencing data, such as ATAC-seq, thus enabling the analysis of multiomic datasets.

If you are interested in identifying any rare cells lurking within your single-cell data, the Scialdone lab has made the CIARA algorithm available in both R (Package CIARA) and Python (https://github.com/ScialdoneLab/CIARA_python).

To read the full publication, please go here.