Consistency and overfitting of multi-omics methods on experimental data.

Title: Consistency and overfitting of multi-omics methods on experimental data.
Publication Type: Journal Article
Year of Publication: 2020
Authors: McCabe, Sean D., Dan-Yu Lin, and Michael I. Love
Journal: Brief Bioinform
Volume: 21
Issue: 4
Pagination: 1277-1284
Date Published: 2020-07-15
ISSN: 1477-4054
Keywords: Computational Biology, Genomics
Abstract

Knowledge on the relationship between different biological modalities (RNA, chromatin, etc.) can help further our understanding of the processes through which biological components interact. The ready availability of multi-omics datasets has led to the development of numerous methods for identifying sources of common variation across biological modalities. However, evaluation of the performance of these methods, in terms of consistency, has been difficult because most methods are unsupervised. We present a comparison of sparse multiple canonical correlation analysis (Sparse mCCA), angle-based joint and individual variation explained (AJIVE) and multi-omics factor analysis (MOFA) using a cross-validation approach to assess overfitting and consistency. Both large and small-sample datasets were used to evaluate performance, and a permuted null dataset was used to identify overfitting through the application of our framework and approach. In the large-sample setting, we found that all methods demonstrated consistency and lack of overfitting; however, in the small-sample size setting, AJIVE provided the most stable results. We provide an R package so that our framework and approach can be applied to evaluate other methods and datasets.
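
The cross-validation idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' R package, nor any of the three methods compared. It assumes a simple stand-in (the leading singular vectors of the training cross-covariance matrix), simulated data with one shared latent factor, and hypothetical function names. Weights are fit on training samples only, and the correlation of the resulting scores is then compared between training and held-out samples. A copy of the second modality with its samples permuted serves as a null dataset, on the assumption that permuting samples breaks the link between modalities.

```python
import numpy as np

def fit_weights(X, Y):
    """Leading singular vectors of the cross-covariance matrix (a stand-in method)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    U, _, Vt = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
    return U[:, 0], Vt[0]

def score_correlation(X, Y, u, v):
    """Pearson correlation between the two modalities' projected scores."""
    return float(np.corrcoef(X @ u, Y @ v)[0, 1])

def cv_correlations(X, Y, n_folds=5, seed=0):
    """Mean training and held-out score correlation across folds."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    train_r, test_r = [], []
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        u, v = fit_weights(X[train], Y[train])  # fit on training samples only
        train_r.append(score_correlation(X[train], Y[train], u, v))
        test_r.append(score_correlation(X[held_out], Y[held_out], u, v))
    return float(np.mean(train_r)), float(np.mean(test_r))

# Simulated data (an assumption for illustration): one latent factor shared
# by two modalities, each with more features than samples.
rng = np.random.default_rng(1)
n, p, q = 100, 300, 300
z = rng.standard_normal(n)
X = np.outer(z, rng.standard_normal(p)) + rng.standard_normal((n, p))
Y = np.outer(z, rng.standard_normal(q)) + rng.standard_normal((n, q))

# Permuted null: shuffling the samples of Y removes the shared signal.
Y_null = Y[rng.permutation(n)]

real_train, real_test = cv_correlations(X, Y)
null_train, null_test = cv_correlations(X, Y_null)

print(f"real data:     train r = {real_train:.2f}, held-out r = {real_test:.2f}")
print(f"permuted null: train r = {null_train:.2f}, held-out r = {null_test:.2f}")
```

In a sketch like this, a held-out correlation close to the training correlation suggests a consistent result, while a training correlation that does not carry over to held-out samples suggests overfitting. The permuted null gives a reference for what to expect when no shared signal exists.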

DOI: 10.1093/bib/bbz070
Alternate Journal: Brief Bioinform
Original Publication: Consistency and overfitting of multi-omics methods on experimental data.
PubMed ID: 31281919
PubMed Central ID: PMC7373174
Grant List:
P30 ES010126 / ES / NIEHS NIH HHS / United States
R01 HG009974 / HG / NHGRI NIH HHS / United States
R01 HG009125 / HG / NHGRI NIH HHS / United States
T32 CA106209 / CA / NCI NIH HHS / United States
P01 CA142538 / CA / NCI NIH HHS / United States
R01 HL149683 / HL / NHLBI NIH HHS / United States