Analysis of untyped SNPs: maximum likelihood and imputation methods.

Title: Analysis of untyped SNPs: maximum likelihood and imputation methods.
Publication Type: Journal Article
Year of Publication: 2010
Authors: Hu, Y. J., and D. Y. Lin
Journal: Genet Epidemiol
Volume: 34
Issue: 8
Pagination: 803-15
Date Published: 2010 Dec
ISSN: 1098-2272
Keywords: Algorithms; Alleles; Case-Control Studies; Computer Simulation; Confidence Intervals; Cross-Sectional Studies; Diabetes Mellitus, Type 1; Environment; Genetic Variation; Genome, Human; Genome-Wide Association Study; Genotype; Haplotypes; Humans; Likelihood Functions; Linkage Disequilibrium; Polymorphism, Single Nucleotide; Risk; Software
Abstract

Analysis of untyped single nucleotide polymorphisms (SNPs) can facilitate the localization of disease-causing variants and permit meta-analysis of association studies with different genotyping platforms. We present two approaches for using the linkage disequilibrium structure of an external reference panel to infer the unknown value of an untyped SNP from the observed genotypes of typed SNPs. The maximum-likelihood approach integrates the prediction of untyped genotypes and estimation of association parameters into a single framework and yields consistent and efficient estimators of genetic effects and gene-environment interactions with proper variance estimators. The imputation approach is a two-stage strategy, which first imputes the untyped genotypes by either the most likely genotypes or the expected genotype counts and then uses the imputed values in a downstream association analysis. The latter approach has proper control of type I error in single-SNP tests with possible covariate adjustments even when the reference panel is misspecified; however, type I error may not be properly controlled in testing multiple-SNP effects or gene-environment interactions. In general, imputation yields biased estimators of genetic effects and gene-environment interactions, and the variances are underestimated. We conduct extensive simulation studies to compare the bias, type I error, power, and confidence interval coverage between the maximum likelihood and imputation approaches in the analysis of single-SNP effects, multiple-SNP effects, and gene-environment interactions under cross-sectional and case-control designs. In addition, we provide an illustration with genome-wide data from the Wellcome Trust Case-Control Consortium (WTCCC) [2007].
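The two imputation strategies mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes that each subject's genotype probabilities for the untyped SNP, given the typed SNPs, have already been derived from a reference panel (that step is not shown). The downstream analysis is taken to be a simple linear regression on a quantitative trait. All function names and data are hypothetical. `mixture_loglik` is a simplified illustration of carrying genotype uncertainty into the likelihood instead of substituting a single imputed value. It treats the genotype probabilities as fixed, so it is not the authors' maximum-likelihood formulation.

```python
import math

# Hypothetical input: per subject, (P(G=0), P(G=1), P(G=2)) for the untyped SNP
# given the typed SNPs. These are assumed to come from a reference panel.
Probs = list[tuple[float, float, float]]


def best_guess(probs: Probs) -> list[int]:
    """Most likely genotype (0, 1 or 2 copies of the coded allele) per subject.

    Ties go to the smaller genotype, because max() keeps the first maximum.
    """
    return [max(range(3), key=lambda g: p[g]) for p in probs]


def dosage(probs: Probs) -> list[float]:
    """Expected genotype count E[G] = P(G=1) + 2 * P(G=2) per subject."""
    return [p[1] + 2.0 * p[2] for p in probs]


def ols_slope(x: list[float], y: list[float]) -> float:
    """Least-squares slope of y on x (simple linear regression with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def mixture_loglik(y: list[float], probs: Probs,
                   alpha: float, beta: float, sigma: float) -> float:
    """sum_i log sum_g P(G_i=g) * Normal(y_i; alpha + beta*g, sigma^2).

    Simplified sketch: the genotype probabilities are treated as known weights.
    No underflow protection, so math.log fails if a mixture density is 0.
    """
    total = 0.0
    for yi, p in zip(y, probs):
        lik = 0.0
        for g in range(3):
            z = (yi - alpha - beta * g) / sigma
            lik += p[g] * math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
        total += math.log(lik)
    return total


if __name__ == "__main__":
    # Made-up illustrative data: four subjects with a quantitative trait y.
    probs = [(0.9, 0.1, 0.0), (0.2, 0.7, 0.1), (0.05, 0.15, 0.8), (0.6, 0.35, 0.05)]
    y = [1.0, 2.1, 2.9, 1.4]
    print("best-guess slope:", ols_slope(best_guess(probs), y))
    print("dosage slope:", ols_slope(dosage(probs), y))
    print("mixture log-likelihood:", mixture_loglik(y, probs, 1.0, 1.0, 0.5))
```

When either imputed value is passed to `ols_slope`, it is treated as though it were observed. This is the two-stage strategy that, according to the abstract, generally produces biased effect estimates and underestimated variances. The best-guess genotype also discards the uncertainty that the dosage retains.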

DOI: 10.1002/gepi.20527
Alternate Journal: Genet Epidemiol
Original Publication: Analysis of untyped SNPs: Maximum likelihood and imputation methods.
PubMed ID: 21104886
PubMed Central ID: PMC3030127
Grant List:
P01 CA142538 / CA / NCI NIH HHS / United States
P01 CA142538-01 / CA / NCI NIH HHS / United States
R01 CA082659 / CA / NCI NIH HHS / United States
R01 CA082659-12 / CA / NCI NIH HHS / United States