Integrative analysis of sequencing and array genotype data for discovering disease associations with rare mutations.

Publication Type: Journal Article
Year of Publication: 2015
Authors: Hu, Yi-Juan, Yun Li, Paul L. Auer, and Dan-Yu Lin
Journal: Proc Natl Acad Sci U S A
Volume: 112
Issue: 4
Pagination: 1019-24
Date Published: 2015 Jan 27
ISSN: 1091-6490
Keywords: DNA Mutational Analysis; Genetic Diseases, Inborn; Genotype; Genotyping Techniques; Humans; Models, Genetic; Mutation; Oligonucleotide Array Sequence Analysis
Abstract

In the large cohorts that have been used for genome-wide association studies (GWAS), it is prohibitively expensive to sequence all cohort members. A cost-effective strategy is to sequence subjects with extreme values of quantitative traits or those with specific diseases. By imputing the sequencing data from the GWAS data for the cohort members who are not selected for sequencing, one can dramatically increase the number of subjects with information on rare variants. However, ignoring the uncertainties of imputed rare variants in downstream association analysis will inflate the type I error when sequenced subjects are not a random subset of the GWAS subjects. In this article, we provide a valid and efficient approach to combining observed and imputed data on rare variants. We consider commonly used gene-level association tests, all of which are constructed from the score statistic for assessing the effects of individual variants on the trait of interest. We show that the score statistic based on the observed genotypes for sequenced subjects and the imputed genotypes for nonsequenced subjects is unbiased. We derive a robust variance estimator that reflects the true variability of the score statistic regardless of the sampling scheme and imputation quality, such that the corresponding association tests always have correct type I error. We demonstrate through extensive simulation studies that the proposed tests are substantially more powerful than the use of accurately imputed variants only and the use of sequencing data alone. We provide an application to the Women's Health Initiative. The relevant software is freely available.
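The abstract describes gene-level tests built from a score statistic that uses observed genotypes for sequenced subjects and imputed genotypes for everyone else, paired with a variance estimate that reflects the statistic's actual variability. The sketch below illustrates that idea for a weighted burden test of a quantitative trait. It makes several assumptions that do not come from the paper. It assumes a simple linear model with no covariates. It assumes imputed genotypes are stored as expected dosages. The helper names `merge_genotypes` and `burden_score_test` are hypothetical. Its variance is a generic empirical (sandwich-style) estimate, the sum of squared per-subject score contributions, and it is not claimed to be the robust estimator derived in the article.

```python
import math
from typing import List, Optional, Sequence, Tuple

Row = List[float]


def merge_genotypes(observed: Sequence[Optional[Row]], imputed: Sequence[Row]) -> List[Row]:
    """Hypothetical helper: take the observed genotype row for sequenced subjects
    (observed[i] is not None) and the imputed dosage row for the rest."""
    if len(observed) != len(imputed):
        raise ValueError("observed and imputed must cover the same subjects")
    return [list(obs) if obs is not None else list(imp) for obs, imp in zip(observed, imputed)]


def burden_score_test(y: Sequence[float], G: Sequence[Row],
                      weights: Sequence[float]) -> Tuple[float, float]:
    """Weighted burden score test for a quantitative trait (no covariates).

    Returns (1-df chi-square statistic, p-value). The variance is a generic
    empirical estimate (sum of squared per-subject contributions), an assumption
    standing in for the paper's robust variance estimator.
    """
    n, m = len(y), len(weights)
    if n == 0 or len(G) != n or any(len(row) != m for row in G):
        raise ValueError("y, G and weights have incompatible shapes")
    ybar = sum(y) / n
    gbar = [sum(row[j] for row in G) / n for j in range(m)]
    # Each subject's contribution to the weighted sum of per-variant scores.
    contrib = [
        (y[i] - ybar) * sum(weights[j] * (G[i][j] - gbar[j]) for j in range(m))
        for i in range(n)
    ]
    score = sum(contrib)
    variance = sum(c * c for c in contrib)
    if variance == 0:
        raise ValueError("score variance is zero; the test is undefined")
    stat = score * score / variance
    # Upper tail of chi-square(1): P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


if __name__ == "__main__":
    # Toy data: subjects 1 and 3 were sequenced, subjects 2 and 4 carry imputed dosages.
    G = merge_genotypes([[0.0], None, [1.0], None], [[0.1], [0.2], [0.9], [0.8]])
    print(burden_score_test([1.0, 2.0, 3.0, 4.0], G, [1.0]))
```

The variance is computed from the spread of the per-subject contributions rather than from a formula that treats imputed dosages as if they were observed genotypes. This is the general property the abstract emphasizes. Whether this simple estimator keeps the correct type I error under outcome-dependent sampling is not established here.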

DOI: 10.1073/pnas.1406143112
PubMed ID: 25583502
PubMed Central ID: PMC4313847
Grant List:
R01 CA082659 / CA / NCI NIH HHS / United States
R01 HG006292 / HG / NHGRI NIH HHS / United States
RC2 HL102926 / HL / NHLBI NIH HHS / United States
HHSN268201100046C / HL / NHLBI NIH HHS / United States
HHSN268201100003C / WH / WHI NIH HHS / United States
HHSN268201100004C / WH / WHI NIH HHS / United States
HHSN271201100004C / AG / NIA NIH HHS / United States
HHSN268201100002C / WH / WHI NIH HHS / United States
R01 HG006703 / HG / NHGRI NIH HHS / United States
RC2 HL102925 / HL / NHLBI NIH HHS / United States
HHSN268201100001C / WH / WHI NIH HHS / United States
HHSN268201100001I / HL / NHLBI NIH HHS / United States
R01 GM105785 / GM / NIGMS NIH HHS / United States
HHSN268201100004I / HL / NHLBI NIH HHS / United States
R37 GM047845 / GM / NIGMS NIH HHS / United States
R01 GM047845 / GM / NIGMS NIH HHS / United States
RC2 HL102924 / HL / NHLBI NIH HHS / United States
P01 CA142538 / CA / NCI NIH HHS / United States
HHSN268201100003I / HL / NHLBI NIH HHS / United States
HHSN268201100002I / HL / NHLBI NIH HHS / United States