A general framework for integrative analysis of incomplete multiomics data.

Title: A general framework for integrative analysis of incomplete multiomics data.
Publication Type: Journal Article
Year of Publication: 2020
Authors: Lin, Dan-Yu, Donglin Zeng, and David Couper
Journal: Genet Epidemiol
Volume: 44
Issue: 7
Pagination: 646-664
Date Published: 2020 Oct
ISSN: 1098-2272
Keywords: Algorithms; Data Analysis; Genomics; Genotype; Humans; Linear Models; Models, Genetic; Phenotype; Proteomics; Sequence Analysis, DNA; Sequence Analysis, RNA
Abstract

There is a tremendous current interest in measuring multiple types of omics features (e.g., DNA sequences, RNA expressions, methylation profiles, metabolic profiles, protein expressions) on a large number of subjects. Although genotypes are typically available for all study subjects, other data types may be measured only on a subset of subjects due to cost or other constraints. In addition, quantitative omics measurements, such as metabolite levels and protein expressions, are subject to detection limits in that the measurements below (or above) certain thresholds are not detectable. In this article, we propose a rigorous and powerful approach to handle missing values and detection limits in integrative analysis of multiomics data. We relate quantitative omics variables to genetic variants and other variables through linear regression models and relate phenotypes to quantitative omics variables and other variables through generalized linear models. We derive the joint-likelihood for the two sets of models by allowing arbitrary patterns of missing values and detection limits for quantitative omics variables. We carry out maximum-likelihood estimation through computationally fast and stable algorithms. The resulting estimators are approximately unbiased and statistically efficient. An application to a major study on chronic obstructive lung disease yielded new biological insights.
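The way a detection limit and a missing measurement enter a likelihood can be illustrated with a minimal sketch. It covers only the linear regression of one quantitative omics variable on one genotype, with normal errors and a single lower detection limit. It is not the authors' algorithm: the function names, the `BELOW` marker, and the record layout are hypothetical, and the phenotype model is left out (in the joint likelihood described above, an unmeasured omics value would also have to be integrated out of the phenotype model).

```python
import math

# Hypothetical marker for a value that fell below the detection limit.
BELOW = "below"


def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def normal_logcdf(x, mean, sd):
    """Log of P(X <= x) for a normal X; may fail for extremely small tails."""
    z = (x - mean) / sd
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))


def omics_loglik(records, intercept, slope, sd, lower_limit):
    """Log-likelihood of the omics regression alone.

    records: list of (genotype, value) pairs, where value is a float
    (measured), BELOW (under the detection limit), or None (not measured).
    """
    total = 0.0
    for genotype, value in records:
        mean = intercept + slope * genotype
        if value is None:
            # Not measured: the density integrates to 1, so log-contribution is 0.
            continue
        if value == BELOW:
            # Censored: only P(value < lower_limit) is known.
            total += normal_logcdf(lower_limit, mean, sd)
        else:
            # Fully observed: ordinary normal density.
            total += normal_logpdf(value, mean, sd)
    return total
```

In this sketch, maximizing `omics_loglik` over `intercept`, `slope`, and `sd` would give censored-regression estimates for the omics model only; the estimators described in the abstract come from the full joint likelihood, which this example does not reproduce.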

DOI: 10.1002/gepi.22328
Alternate Journal: Genet Epidemiol
Original Publication: A general framework for integrative analysis of incomplete multiomics data.
PubMed ID: 32691502
PubMed Central ID: PMC7951090
Grant List:
HHSN268200900019C / HL / NHLBI NIH HHS / United States
R01 HG009974 / HG / NHGRI NIH HHS / United States
HHSN268200900015C / HL / NHLBI NIH HHS / United States
HHSN268200900016C / HL / NHLBI NIH HHS / United States
U01 HL137880 / HL / NHLBI NIH HHS / United States
HHSN268200900018C / HL / NHLBI NIH HHS / United States
HHSN268200900013C / HL / NHLBI NIH HHS / United States
P01 CA142538 / CA / NCI NIH HHS / United States
HHSN268200900017C / HL / NHLBI NIH HHS / United States
HHSN268200900020C / HL / NHLBI NIH HHS / United States
HHSN268200900014C / HL / NHLBI NIH HHS / United States