A general framework for studying genetic effects and gene-environment interactions with missing data.

Title: A general framework for studying genetic effects and gene-environment interactions with missing data.
Publication Type: Journal Article
Year of Publication: 2010
Authors: Hu, Y. J., D. Y. Lin, and D. Zeng
Journal: Biostatistics
Volume: 11
Issue: 4
Pagination: 583-98
Date Published: 2010 Oct
ISSN: 1468-4357
Keywords: Algorithms; Biostatistics; Carcinoma, Non-Small-Cell Lung; Case-Control Studies; Cohort Studies; Computer Simulation; Cross-Sectional Studies; Cysteine Endopeptidases; Disease; Environment; Genetic Association Studies; Genotype; Haplotypes; Humans; Likelihood Functions; Nerve Tissue Proteins; Odds Ratio; Phenotype; Polymorphism, Single Nucleotide; Receptors, Nicotinic; Regression Analysis; Smoking
Abstract

Missing data arise in genetic association studies when genotypes are unknown or when haplotypes are of direct interest. We provide a general likelihood-based framework for making inference on genetic effects and gene-environment interactions with such missing data. We allow genetic and environmental variables to be correlated while leaving the distribution of environmental variables completely unspecified. We consider 3 major study designs (cross-sectional, case-control, and cohort) and construct appropriate likelihood functions for all common phenotypes (e.g., case-control status, quantitative traits, and potentially censored ages at onset of disease). The likelihood functions involve both finite- and infinite-dimensional parameters. The maximum likelihood estimators are shown to be consistent, asymptotically normal, and asymptotically efficient. Expectation-Maximization (EM) algorithms are developed to implement the corresponding inference procedures. Extensive simulation studies demonstrate that the proposed inferential and numerical methods perform well in practical settings. An illustration with a genome-wide association study of lung cancer is provided.

DOI: 10.1093/biostatistics/kxq015
Alternate Journal: Biostatistics
PubMed ID: 20348396
PubMed Central ID: PMC3294269
Grant List: R01 CA082659 / CA / NCI NIH HHS / United States
P01 CA142538-01 / CA / NCI NIH HHS / United States
R01 CA133996 / CA / NCI NIH HHS / United States
R37 GM047845 / GM / NIGMS NIH HHS / United States
R01 CA055769 / CA / NCI NIH HHS / United States
R01CA55769 / CA / NCI NIH HHS / United States
R01CA133996 / CA / NCI NIH HHS / United States
P01 CA142538 / CA / NCI NIH HHS / United States