Faster algorithm to control the Benjamini-Hochberg false discovery rate and its application for analysis of huge genomic data. | Innovative Methods Program for Advancing Clinical Trials (IMPACT)

Title	Faster algorithm to control the Benjamini-Hochberg false discovery rate and its application for analysis of huge genomic data.
Publication Type	Presentation
Year of Publication	2014
Authors	Madar, V
Keywords	Symposium III
Abstract	Quantitative methods in genetics mainly focus on studying samples with huge numbers of genes and markers: a typical gene expression study consists of about 20K-50K genes, while a GWAS searches for significant markers among 1 million markers. The more complex expression quantitative trait loci (eQTL) analysis searches for significant correlations among all possible combinations of genes and markers, which can number in the billions. In these analyses, all genes and markers (or their combinations) are of equal importance, and each contributes at least one p-value to measure its statistical significance. This leaves researchers with two major problems: (1) the computational effort needed to perform such an enormous number of statistical tests and to record all computed p-values, and (2) the theoretical inflation of false positives. In the new reality of huge to enormous scale hypothesis testing, even novel methods such as the Benjamini-Hochberg (1995) false discovery rate controlling procedure start to struggle under the computational burden. Sorting the entire set of p-values becomes less desirable, and the sheer number of hypotheses usually requires splitting the tests into groups of more computationally convenient size. Many researchers choose to make the significance decision at the group level and, in doing so, considerably inflate the number of false discoveries. Others do the opposite and use stringent Bonferroni cut-offs that are extremely conservative. In my talk I will show how to alter the Benjamini-Hochberg algorithm to give faster results for huge data without ordering p-values and without changing any of the final outcomes. I will demonstrate my approach on actual examples of eQTL studies.
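
The abstract does not spell out the modified algorithm, so the following is only a minimal sketch of one well-known way to reduce the sorting cost, and not necessarily the method presented in the talk. It relies on the fact that the Benjamini-Hochberg step-up rule rejects the k smallest p-values, where k is the largest rank with p_(k) <= k*q/m, so no p-value above the FDR level q can ever be rejected. P-values can therefore be streamed in groups, counted toward the total m, and discarded unless they are at most q; only the survivors are sorted. The function names, the chunked input, and the example values are illustrative assumptions.

```python
from typing import Iterable, List, Tuple


def bh_reject_full_sort(pvalues: List[float], q: float) -> List[int]:
    """Textbook BH step-up: sort all m p-values, return indices of rejected tests."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0  # largest rank whose p-value is at or below its BH threshold
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k = rank
    return sorted(order[:k])


def bh_reject_prefiltered(chunks: Iterable[List[float]], q: float) -> List[int]:
    """Same decisions as above, but only p-values <= q are stored and sorted.

    Assumption for illustration: p-values arrive in chunks (groups of tests),
    and the global index of a test is its position in the concatenated stream.
    """
    m = 0  # total number of tests seen, including discarded ones
    candidates: List[Tuple[float, int]] = []
    for chunk in chunks:
        for p in chunk:
            if p <= q:  # anything larger can never satisfy p <= k*q/m
                candidates.append((p, m))
            m += 1
    candidates.sort()  # sorts the (usually small) surviving subset only
    k = 0
    for rank, (p, _) in enumerate(candidates, start=1):
        # Ranks among survivors equal ranks among all m p-values,
        # because every discarded p-value is larger than every survivor.
        if p <= rank * q / m:
            k = rank
    return sorted(idx for _, idx in candidates[:k])


pvals = [0.001, 0.8, 0.012, 0.04, 0.5, 0.03, 0.9, 0.002]
groups = [pvals[:3], pvals[3:6], pvals[6:]]
rejected = bh_reject_prefiltered(groups, q=0.05)
```

Because the threshold still uses the total count m rather than the group size, the decision is made at the global level even though the data are processed group by group, which avoids the inflation of false discoveries that group-level decisions cause.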