| Title | A powerful statistical framework for generalization testing in GWAS, with application to the HCHS/SOL. |
| Publication Type | Publication |
| Year | 2017 |
| Authors | Sofer T, Heller R, Bogomolov M, Avery CL, Graff M, North KE, Reiner AP, Thornton TA, Rice K, Benjamini Y, Laurie CC, Kerr KF |
| Journal | Genet Epidemiol |
| Volume | 41 |
| Issue | 3 |
| Pagination | 251-258 |
| Date Published | 2017 Apr |
| ISSN | 1098-2272 |
| Keywords | Algorithms; Computer Simulation; Follow-Up Studies; Genome, Human; Genome-Wide Association Study; Genomics; Hispanic or Latino; Humans; Linkage Disequilibrium; Models, Statistical; Phenotype; Polymorphism, Single Nucleotide |
| Abstract | In genome-wide association studies (GWAS), "generalization" is the replication of genotype-phenotype association in a population with different ancestry than the population in which it was first identified. Current practices for declaring generalizations rely on testing associations while controlling the family-wise error rate (FWER) in the discovery study, then separately controlling error measures in the follow-up study. This approach does not guarantee control over the FWER or false discovery rate (FDR) of the generalization null hypotheses. It also fails to leverage the two-stage design to increase power for detecting generalized associations. We provide a formal statistical framework for quantifying the evidence of generalization that accounts for the (in)consistency between the directions of associations in the discovery and follow-up studies. We develop the directional generalization FWER (FWER_g) and FDR (FDR_g) controlling r-values, which are used to declare associations as generalized. This framework extends to generalization testing when applied to a published list of Single Nucleotide Polymorphism-(SNP)-trait associations. Our methods control FWER_g or FDR_g under various SNP selection rules based on P-values in the discovery study. We find that it is often beneficial to use a more lenient P-value threshold than the genome-wide significance threshold. In a GWAS of total cholesterol in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), when testing all SNPs with P-values <5×10⁻⁸ (15 genomic regions) for generalization in a large GWAS of whites, we generalized SNPs from 15 regions. But when testing all SNPs with P-values <6.6×10⁻⁵ (89 regions), we generalized SNPs from 27 regions. |
| DOI | 10.1002/gepi.22029 |
| Alternate Journal | Genet Epidemiol |
| PubMed ID | 28090672 |
| PubMed Central ID | PMC5340573 |
| Grant List | HHSN268201300005C / HL / NHLBI NIH HHS / United States; P01 GM099568 / GM / NIGMS NIH HHS / United States; R01 HL129132 / HL / NHLBI NIH HHS / United States; N01HC65236 / HL / NHLBI NIH HHS / United States; N01HC65235 / HL / NHLBI NIH HHS / United States; N01HC65234 / HL / NHLBI NIH HHS / United States; N01HC65233 / HL / NHLBI NIH HHS / United States; N01HC65237 / HL / NHLBI NIH HHS / United States |
MS#: 0389
ECI: Yes
Manuscript Affiliation: HCHS/SOL Genetic Analysis Center - University of Washington, Seattle
Manuscript Status: Published
