Title | Analysis of untyped SNPs: maximum likelihood and imputation methods. |
Publication Type | Journal Article |
Year of Publication | 2010 |
Authors | Hu, Y J., and D Y. Lin |
Journal | Genet Epidemiol |
Volume | 34 |
Issue | 8 |
Pagination | 803-15 |
Date Published | 2010 Dec |
ISSN | 1098-2272 |
Keywords | Algorithms, Alleles, Case-Control Studies, Computer Simulation, Confidence Intervals, Cross-Sectional Studies, Diabetes Mellitus, Type 1, Environment, Genetic Variation, Genome, Human, Genome-Wide Association Study, Genotype, Haplotypes, Humans, Likelihood Functions, Linkage Disequilibrium, Polymorphism, Single Nucleotide, Risk, Software |
Abstract | Analysis of untyped single nucleotide polymorphisms (SNPs) can facilitate the localization of disease-causing variants and permit meta-analysis of association studies with different genotyping platforms. We present two approaches for using the linkage disequilibrium structure of an external reference panel to infer the unknown value of an untyped SNP from the observed genotypes of typed SNPs. The maximum-likelihood approach integrates the prediction of untyped genotypes and estimation of association parameters into a single framework and yields consistent and efficient estimators of genetic effects and gene-environment interactions with proper variance estimators. The imputation approach is a two-stage strategy, which first imputes the untyped genotypes by either the most likely genotypes or the expected genotype counts and then uses the imputed values in a downstream association analysis. The latter approach has proper control of type I error in single-SNP tests with possible covariate adjustments even when the reference panel is misspecified; however, type I error may not be properly controlled in testing multiple-SNP effects or gene-environment interactions. In general, imputation yields biased estimators of genetic effects and gene-environment interactions, and the variances are underestimated. We conduct extensive simulation studies to compare the bias, type I error, power, and confidence interval coverage between the maximum likelihood and imputation approaches in the analysis of single-SNP effects, multiple-SNP effects, and gene-environment interactions under cross-sectional and case-control designs. In addition, we provide an illustration with genome-wide data from the Wellcome Trust Case-Control Consortium (WTCCC) [2007]. |
DOI | 10.1002/gepi.20527 |
Alternate Journal | Genet Epidemiol |
Original Publication | Analysis of untyped SNPs: Maximum likelihood and imputation methods. |
PubMed ID | 21104886 |
PubMed Central ID | PMC3030127 |
Grant List | P01 CA142538 / CA / NCI NIH HHS / United States; P01 CA142538-01 / CA / NCI NIH HHS / United States; R01 CA082659 / CA / NCI NIH HHS / United States; R01 CA082659-12 / CA / NCI NIH HHS / United States |
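The two-stage imputation approach described in the abstract can be illustrated with a minimal Python sketch. It assumes that, for each subject, posterior probabilities of carrying 0, 1 or 2 copies of the untyped SNP's allele have already been derived from the typed SNPs and an external reference panel. That derivation is not shown. The trait is assumed to be quantitative (cross-sectional design) with no covariates. All function names (`most_likely_genotype`, `expected_genotype_count`, `ols_slope`, `two_stage_estimate`) are hypothetical and do not come from the authors' software. The sketch shows only the two imputation rules the abstract names ("most likely genotypes" and "expected genotype counts") and a downstream single-SNP regression.

```python
from typing import List, Sequence, Tuple

# Hypothetical input: (P(G=0), P(G=1), P(G=2)) for one subject's untyped SNP,
# assumed to be precomputed from typed SNPs plus an external reference panel.
Posterior = Tuple[float, float, float]


def most_likely_genotype(p: Posterior) -> int:
    """Stage 1, best-guess rule: genotype with the highest posterior probability.

    Ties resolve to the smallest genotype value, because max() keeps the first maximum.
    """
    return max(range(3), key=lambda g: p[g])


def expected_genotype_count(p: Posterior) -> float:
    """Stage 1, dosage rule: E[G] = 0*P(G=0) + 1*P(G=1) + 2*P(G=2)."""
    return p[1] + 2.0 * p[2]


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Stage 2: least-squares slope of trait y on imputed genotype x (model has an intercept)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("imputed genotype has no variation")
    return sxy / sxx


def two_stage_estimate(
    posteriors: List[Posterior], trait: Sequence[float], method: str = "dosage"
) -> float:
    """Impute the untyped SNP, then treat the imputed values as observed in the regression."""
    if method == "dosage":
        x = [expected_genotype_count(p) for p in posteriors]
    elif method == "best_guess":
        x = [float(most_likely_genotype(p)) for p in posteriors]
    else:
        raise ValueError("method must be 'dosage' or 'best_guess'")
    return ols_slope(x, trait)


if __name__ == "__main__":
    post = [(0.9, 0.1, 0.0), (0.2, 0.7, 0.1), (0.0, 0.3, 0.7)]
    y = [1.0, 3.0, 5.0]
    print(two_stage_estimate(post, y, "best_guess"))
    print(two_stage_estimate(post, y, "dosage"))
```

On these toy inputs the two rules give different slope estimates (2.0 for best guess, about 2.5 for dosage). This shows that the choice of imputation rule changes the downstream estimate. Stage 2 plugs imputed values in as if they were observed genotypes. According to the abstract, this plug-in step is why imputation generally yields biased estimators and underestimated variances. The maximum-likelihood approach avoids it by combining genotype prediction and association estimation in a single framework.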