SAFE-clustering: Single-cell Aggregated (from Ensemble) clustering for single-cell RNA-seq data.

Title: SAFE-clustering: Single-cell Aggregated (from Ensemble) clustering for single-cell RNA-seq data.
Publication Type: Journal Article
Year of Publication: 2019
Authors: Yang, Yuchen, Ruth Huh, Houston W. Culpepper, Yuan Lin, Michael I. Love, and Yun Li
Date Published: 2019 Apr 15
Keywords: Algorithms, Cluster Analysis, Gene Expression Profiling, RNA-Seq, Sequence Analysis, RNA, Single-Cell Analysis

MOTIVATION: Accurately clustering cell types from a mass of heterogeneous cells is a crucial first step for the analysis of single-cell RNA-seq (scRNA-Seq) data. Although several methods have been recently developed, they utilize different characteristics of the data and yield varying results in terms of both the number of clusters and the actual cluster assignments.

RESULTS: Here, we present SAFE-clustering, Single-cell Aggregated (from Ensemble) clustering, a flexible, accurate and robust method for clustering scRNA-Seq data. SAFE-clustering takes as input the results from multiple clustering methods and builds one consensus solution. SAFE-clustering currently embeds four state-of-the-art methods (SC3, CIDR, Seurat and t-SNE + k-means) and ensembles the solutions from these four methods using three hypergraph-based partitioning algorithms. Extensive assessment across 12 datasets, with the number of clusters ranging from 3 to 14 and the number of single cells ranging from 49 to 32,695, showcases the advantages of SAFE-clustering in terms of both cluster number (18.2-58.1% reduction in absolute deviation to the truth) and cluster assignment (on average 36.0% improvement, and up to 18.5% over the best of the four methods, measured by adjusted Rand index). Moreover, SAFE-clustering is computationally efficient enough to accommodate large datasets, taking <10 min to process 28,733 cells.

AVAILABILITY AND IMPLEMENTATION: SAFEclustering, including source code and tutorial, is freely available.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
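The consensus idea in the RESULTS paragraph, together with the adjusted Rand index used to score it, can be illustrated with a minimal sketch. This is not the SAFE-clustering implementation: the abstract states that the method uses three hypergraph-based partitioning algorithms, whereas the sketch below substitutes a simpler stand-in (a co-association matrix followed by thresholded connected components). The function names, the 0.5 threshold and the toy label vectors are all assumptions made for illustration and do not come from the paper.

```python
from collections import Counter
from math import comb


def co_association(labelings):
    """Fraction of labelings in which each pair of cells shares a cluster."""
    n, m = len(labelings[0]), len(labelings)
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


def consensus_labels(labelings, threshold=0.5):
    """Stand-in for hypergraph partitioning (assumption, not the paper's method):
    link cells that co-cluster in more than `threshold` of the labelings and
    return the connected components as consensus clusters."""
    sim = co_association(labelings)
    n = len(sim)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:  # depth-first walk over strongly co-clustered cells
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and sim[i][j] > threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


def adjusted_rand_index(truth, pred):
    """Adjusted Rand index computed from the contingency table of two labelings."""
    total_pairs = comb(len(truth), 2)
    same_both = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    same_truth = sum(comb(c, 2) for c in Counter(truth).values())
    same_pred = sum(comb(c, 2) for c in Counter(pred).values())
    expected = same_truth * same_pred / total_pairs
    max_index = (same_truth + same_pred) / 2
    if max_index == expected:  # degenerate case, e.g. every cell in one cluster
        return 1.0
    return (same_both - expected) / (max_index - expected)


# Toy data (invented for illustration): 8 cells, 3 true groups, and four
# hypothetical individual solutions, three of which contain an error.
truth = [0, 0, 0, 1, 1, 1, 2, 2]
labelings = [
    [0, 0, 0, 1, 1, 1, 2, 2],  # matches the truth
    [0, 0, 1, 1, 1, 1, 2, 2],  # one cell misassigned
    [1, 1, 1, 0, 0, 0, 0, 0],  # two groups merged, label names permuted
    [0, 0, 0, 1, 1, 2, 2, 2],  # one cell misassigned
]
consensus = consensus_labels(labelings)
for labels in labelings:
    print("individual ARI:", round(adjusted_rand_index(truth, labels), 3))
print("consensus ARI:", round(adjusted_rand_index(truth, consensus), 3))
```

On this toy input the consensus recovers the three true groups even though three of the four individual solutions contain an error. Because the adjusted Rand index compares pairs of cells, it is unaffected by how cluster labels are named, which is why the permuted labels in the third solution do not matter.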

Alternate Journal: Bioinformatics
Original Publication: SAFE-clustering: Single-cell Aggregated (From Ensemble) clustering for single-cell RNA-seq data.
PubMed ID: 30202935
PubMed Central ID: PMC6477982
Grant List:
P01 CA142538 / CA / NCI NIH HHS / United States
R01 HG006292 / HG / NHGRI NIH HHS / United States
R01 HL129132 / HL / NHLBI NIH HHS / United States