Heavy-tailed prior distributions for sequence count data: removing the noise and preserving large differences. | Innovative Methods Program for Advancing Clinical Trials (IMPACT)

Title	Heavy-tailed prior distributions for sequence count data: removing the noise and preserving large differences.
Publication Type	Journal Article
Year of Publication	2019
Authors	Zhu, Anqi, Joseph G. Ibrahim, and Michael I. Love
Journal	Bioinformatics
Volume	35
Issue	12
Pagination	2084-2092
Date Published	2019 Jun 01
ISSN	1367-4811
Keywords	Likelihood Functions, Linear Models, Sequence Analysis, RNA, Software
Abstract	MOTIVATION: In RNA-seq differential expression analysis, investigators aim to detect those genes with changes in expression level across conditions, despite technical and biological variability in the observations. A common task is to accurately estimate the effect size, often in terms of a logarithmic fold change (LFC). RESULTS: When the read counts are low or highly variable, the maximum likelihood estimates for the LFCs have high variance, leading to large estimates not representative of true differences, and poor ranking of genes by effect size. One approach is to introduce filtering thresholds and pseudocounts to exclude or moderate estimated LFCs. Filtering may result in a loss of genes from the analysis with true differences in expression, while pseudocounts provide a limited solution that must be adapted per dataset. Here, we propose the use of a heavy-tailed Cauchy prior distribution for effect sizes, which avoids the use of filter thresholds or pseudocounts. The proposed method, Approximate Posterior Estimation for generalized linear model, apeglm, has lower bias than previously proposed shrinkage estimators, while still reducing variance for those genes with little information for statistical inference. AVAILABILITY AND IMPLEMENTATION: The apeglm package is available as an R/Bioconductor package at https://bioconductor.org/packages/apeglm, and the methods can be called from within the DESeq2 software. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
DOI	10.1093/bioinformatics/bty895
Alternate Journal	Bioinformatics
Original Publication	Heavy-tailed prior distributions for sequence count data: removing the noise and preserving large differences.
PubMed ID	30395178
PubMed Central ID	PMC6581436
Grant List	P01 CA142538 / CA / NCI NIH HHS / United States; P30 ES010126 / ES / NIEHS NIH HHS / United States; R01 GM070335 / GM / NIGMS NIH HHS / United States; R01 HG009125 / HG / NHGRI NIH HHS / United States
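
Illustrative note: the abstract's central claim, that a heavy-tailed Cauchy prior removes noise while preserving large differences, can be sketched with a toy calculation. The Python below is not the apeglm implementation. It assumes a normal approximation to the likelihood of a single LFC (apeglm itself works with a count-data GLM likelihood), a fixed prior scale of 1, and a plain grid search for the posterior mode; all function names are hypothetical.

```python
import math


def log_normal_lik(beta, beta_hat, se):
    """Log-likelihood of beta, up to a constant, under the ASSUMED normal
    approximation: beta_hat ~ Normal(beta, se^2)."""
    return -0.5 * ((beta_hat - beta) / se) ** 2


def log_cauchy_prior(beta, scale):
    """Log-density of a zero-centred Cauchy prior, up to a constant."""
    return -math.log1p((beta / scale) ** 2)


def log_normal_prior(beta, scale):
    """Log-density of a zero-centred Normal prior, up to a constant."""
    return -0.5 * (beta / scale) ** 2


def posterior_mode(beta_hat, se, log_prior, scale=1.0,
                   half_width=20.0, step=0.001):
    """Posterior mode of beta found by grid search over
    [-half_width, half_width] in increments of `step`."""
    n_steps = int(round(2 * half_width / step))
    best_beta, best_lp = 0.0, -math.inf
    for i in range(n_steps + 1):
        beta = -half_width + i * step
        lp = log_normal_lik(beta, beta_hat, se) + log_prior(beta, scale)
        if lp > best_lp:
            best_beta, best_lp = beta, lp
    return best_beta


if __name__ == "__main__":
    # (label, maximum likelihood estimate, standard error); made-up values
    cases = [
        ("small, noisy estimate", 1.0, 2.0),
        ("large, precise estimate", 5.0, 0.5),
    ]
    for label, beta_hat, se in cases:
        cauchy = posterior_mode(beta_hat, se, log_cauchy_prior)
        normal = posterior_mode(beta_hat, se, log_normal_prior)
        print(f"{label}: MLE={beta_hat:.2f}  "
              f"Cauchy prior={cauchy:.2f}  Normal prior={normal:.2f}")
```

Under these assumptions, both priors pull the small, noisy estimate strongly toward zero, while the Cauchy prior leaves the large, precise estimate much closer to its maximum likelihood value than the Normal prior does. For real analyses, the estimator is available in DESeq2 through `lfcShrink` with `type="apeglm"`.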

Project:

Project 2.3