Integrative gene set analysis of multi-platform data with sample heterogeneity.

Title: Integrative gene set analysis of multi-platform data with sample heterogeneity.
Publication Type: Journal Article
Year of Publication: 2014
Authors: Hu, Jun, and Jung-Ying Tzeng
Date Published: 2014 Jun 01
Keywords: Breast Neoplasms; DNA Copy Number Variations; DNA Methylation; Female; Gene Expression Profiling; Genes; Genomics; Humans; Sequence Analysis, RNA; Statistics, Nonparametric

MOTIVATION: Gene set analysis is a popular method for large-scale genomic studies. Because genes that have common biological features are analyzed jointly, gene set analysis often achieves better power and generates more biologically informative results. With the advancement of technologies, genomic studies with multi-platform data have become increasingly common. Several strategies have been proposed that integrate genomic data from multiple platforms to perform gene set analysis. To evaluate the performances of existing integrative gene set methods under various scenarios, we conduct a comparative simulation analysis based on The Cancer Genome Atlas breast cancer dataset.

RESULTS: We find that existing methods for gene set analysis are less effective when sample heterogeneity exists. To address this issue, we develop three methods for multi-platform genomic data with heterogeneity: two non-parametric methods, multi-platform Mann-Whitney statistics and multi-platform outlier robust T-statistics, and a parametric method, multi-platform likelihood ratio statistics. Using simulations, we show that the proposed multi-platform Mann-Whitney statistics method has higher power for heterogeneous samples and comparable performance for homogeneous samples when compared with the existing methods. Our real data applications to two datasets of The Cancer Genome Atlas also suggest that the proposed methods are able to identify novel pathways that are missed by other strategies.

AVAILABILITY AND IMPLEMENTATION: ∼jytzeng/Software/Multiplatform_gene_set_analysis/

Alternate Journal: Bioinformatics
Original Publication: Integrative gene set analysis of multi-platform data with sample heterogeneity.
PubMed ID: 24489370
PubMed Central ID: PMC4029033
Grant List: P01 CA142538 / CA / NCI NIH HHS / United States
R01 MH084022 / MH / NIMH NIH HHS / United States