Publications
"Statistical significance for hierarchical clustering." Biometrics 73, no. 3 (2017): 811-821.
"Statistical inferences for data from studies conducted with an aggregated multivariate outcome-dependent sample design." Stat Med 36, no. 6 (2017): 985-997.
"SAFE-clustering: Single-cell Aggregated (from Ensemble) clustering for single-cell RNA-seq data." Bioinformatics 35, no. 8 (2019): 1269-1277.
"Biclustering via sparse clustering." Biometrics 76, no. 1 (2020): 348-358.
"Sample size calculation for cluster randomization trials with a time-to-event endpoint." Stat Med 39, no. 25 (2020): 3608-3623.
"bcSeq: an R package for fast sequence mapping in high-throughput shRNA and CRISPR screens." Bioinformatics 34, no. 20 (2018): 3581-3583.
"On model selections for repeated measurement data in clinical studies." Stat Med 34, no. 10 (2015): 1621-33.
"Sex differences in grey matter atrophy patterns among AD and aMCI patients: results from ADNI." Neuroimage 56, no. 3 (2011): 890-906.
"Optimizing delivery of a behavioral pain intervention in cancer patients using a sequential multiple assignment randomized trial SMART." Contemp Clin Trials 57 (2017): 51-57.
"Bayesian modeling and inference for clinical trials with partial retrieved data following dropout." Stat Med 32, no. 24 (2013): 4180-95.
"Predicting Alzheimer's Disease Using Combined Imaging-Whole Genome SNP Data." J Alzheimers Dis 46, no. 3 (2015): 695-702.
"SMAC: Spatial multi-category angle-based classifier for high-dimensional neuroimaging data." Neuroimage 175 (2018): 230-245.
"Analysis of secondary phenotypes in multigroup association studies." Biometrics 76, no. 2 (2020): 606-618.
"Marginal hazard regression for correlated failure time data with auxiliary covariates." Lifetime Data Anal 18, no. 1 (2012): 116-38.
"Marginal additive hazards model for case-cohort studies with multiple disease outcomes: an application to the Atherosclerosis Risk in Communities (ARIC) study." Biostatistics 14, no. 1 (2013): 28-41.
"A general framework for studying genetic effects and gene-environment interactions with missing data." Biostatistics 11, no. 4 (2010): 583-98.
"A general framework for association tests with multivariate traits in large-scale genomics studies." Genet Epidemiol 37, no. 8 (2013): 759-67.
"Multivariate recurrent events in the presence of multivariate informative censoring with applications to bleeding and transfusion events in myelodysplastic syndrome." J Biopharm Stat 24, no. 2 (2014): 429-42.
"A hybrid Bayesian hierarchical model combining cohort and case-control studies for meta-analysis of diagnostic tests: Accounting for partial verification bias." Stat Methods Med Res 25, no. 6 (2016): 3015-3037.
"Joint modeling of longitudinal and survival data with missing and left-censored time-varying covariates." Stat Med 33, no. 26 (2014): 4560-76.
"Sample size/power calculation for stratified case-cohort design." Stat Med 33, no. 23 (2014): 3973-85.
"Meta-analysis of sequencing studies with heterogeneous genetic associations." Genet Epidemiol 38, no. 5 (2014): 389-401.
"Estimating effect of environmental contaminants on women's subfecundity for the MoBa study data with an outcome-dependent sampling scheme." Biostatistics 15, no. 4 (2014): 636-50.
"Provider-based research networks and diffusion of surgical technologies among patients with early-stage kidney cancer." Cancer 121, no. 6 (2015): 836-43.
"Parameter estimation in Cox models with missing failure indicators and the OPPERA study." Stat Med 34, no. 30 (2015): 3984-96.
"Improving the efficiency of estimation in the additive hazards model for stratified case-cohort design with multiple diseases." Stat Med 35, no. 2 (2016): 282-93.
"Improving efficiency of parameter estimation in case-cohort studies with multivariate failure time data." Biometrics 73, no. 3 (2017): 1042-1052.
"A regularized variable selection procedure in additive hazards model with stratified case-cohort design." Lifetime Data Anal 24, no. 3 (2018): 443-463.
"Genetic association analysis under complex survey sampling: the Hispanic Community Health Study/Study of Latinos." Am J Hum Genet 95, no. 6 (2014): 675-88.
"Regression analysis for secondary response variable in a case-cohort study." Biometrics 74, no. 3 (2018): 1014-1022.
"Analysis of multiple survival events in generalized case-cohort designs." Biometrics 74, no. 4 (2018): 1250-1260.
"Multiplicative rates model for recurrent events in case-cohort studies." Lifetime Data Anal 26, no. 1 (2020): 134-157.
"Genetic analyses of diverse populations improves discovery for complex traits." Nature 570, no. 7762 (2019): 514-518.
"Accelerated failure time model for data from outcome-dependent sampling." Lifetime Data Anal 27, no. 1 (2021): 15-37.
"Comparative effectiveness of oxaliplatin vs non-oxaliplatin-containing adjuvant chemotherapy for stage III colon cancer." J Natl Cancer Inst 104, no. 3 (2012): 211-27.
"Checking semiparametric transformation models with censored data." Biostatistics 13, no. 1 (2012): 18-31.
"The association between copy number aberration, DNA methylation and gene expression in tumor samples." Nucleic Acids Res 46, no. 6 (2018): 3009-3018.
"Bayesian analysis on meta-analysis of case-control studies accounting for within-study correlation." Stat Methods Med Res 24, no. 6 (2015): 836-55.
"Bayesian gamma frailty models for survival data with semi-competing risks and treatment switching." Lifetime Data Anal 20, no. 1 (2014): 76-105.
"Pattern mixture models for clinical validation of biomarkers in the presence of missing data." Stat Med 36, no. 19 (2017): 2994-3004.
"Genetic variation determines VEGF-A plasma levels in cancer patients." Sci Rep 8, no. 1 (2018): 16332.
"PIK3CA mutations enable targeting of a breast tumor dependency through mTOR-mediated MCL-1 translation." Sci Transl Med 8, no. 369 (2016): 369ra175.
"Data for cancer comparative effectiveness research: past, present, and future potential." Cancer 118, no. 21 (2012): 5186-97.
"A framework for understanding cancer comparative effectiveness research data needs." J Clin Epidemiol 65, no. 11 (2012): 1150-8.
"Statistical considerations for analysis of microarray experiments." Clin Transl Sci 4, no. 6 (2011): 466-77.
"DiNAMIC: a method to identify recurrent DNA copy number aberrations in tumors." Bioinformatics 27, no. 5 (2011): 678-85.
"Rapid and robust resampling-based multiple-testing correction with application in a genome-wide expression quantitative trait loci study." Genetics 190, no. 4 (2012): 1511-20.
"Module-based association analysis for omics data with network structure." PLoS One 10, no. 3 (2015): e0122309.
"SynthEx: a synthetic-normal-based DNA sequencing tool for copy number alteration detection and tumor heterogeneity profiling." Genome Biol 18, no. 1 (2017): 66.
"PreMeta: a tool to facilitate meta-analysis of rare-variant associations." BMC Genomics 18, no. 1 (2017): 160.
"Ten Simple Rules for Effective Statistical Practice." PLoS Comput Biol 12, no. 6 (2016): e1004961.
"Swimming downstream: statistical analysis of differential transcript usage following Salmon quantification." F1000Res 7 (2018): 952.
"Identifying individual risk rare variants using protein structure guided local tests (POINT)." PLoS Comput Biol 15, no. 2 (2019): e1006722.
"Purity Independent Subtyping of Tumors (PurIST), A Clinically Robust, Single-sample Classifier for Tumor Subtyping in Pancreatic Cancer." Clin Cancer Res 26, no. 1 (2020): 82-92.
"Consistency and overfitting of multi-omics methods on experimental data." Brief Bioinform 21, no. 4 (2020): 1277-1284.
"Tximeta: Reference sequence checksums for provenance identification in RNA-seq." PLoS Comput Biol 16, no. 2 (2020): e1007664.
"Association test using Copy Number Profile Curves (CONCUR) enhances power in rare copy number variant analysis." PLoS Comput Biol 16, no. 5 (2020): e1007797.
"Joint skeleton estimation of multiple directed acyclic graphs for heterogeneous population." Biometrics 75, no. 1 (2019): 36-47.
"Robust test method for time-course microarray experiments." BMC Bioinformatics 11 (2010): 391.
"Improved doubly robust estimation when data are monotonely coarsened, with application to longitudinal studies with dropout." Biometrics 67, no. 2 (2011): 536-45.
"Analysis of untyped SNPs: maximum likelihood and imputation methods." Genet Epidemiol 34, no. 8 (2010): 803-15.
"Evaluating haplotype effects in case-control studies via penalized-likelihood approaches: prospective or retrospective analysis?" Genet Epidemiol 34, no. 8 (2010): 892-911.
"Semiparametric inference for a 2-stage outcome-auxiliary-dependent sampling design with continuous outcome." Biostatistics 12, no. 3 (2011): 521-34.
"Additive mixed effect model for clustered failure time data." Biometrics 67, no. 4 (2011): 1340-51.
"Sample size and power determination in joint modeling of longitudinal and survival data." Stat Med 30, no. 18 (2011): 2295-309.
"Studying gene and gene-environment effects of uncommon and common variants on continuous traits: a marker-set approach using gene-trait similarity regression." Am J Hum Genet 89, no. 2 (2011): 277-88.
"A general framework for detecting disease associations with rare variants in sequencing studies." Am J Hum Genet 89, no. 3 (2011): 354-67.
"Bayesian meta-experimental design: evaluating cardiovascular risk in new antidiabetic therapies to treat type 2 diabetes." Biometrics 68, no. 2 (2012): 578-86.
"Phase II cancer clinical trials with heterogeneous patient populations." J Biopharm Stat 22, no. 2 (2012): 312-28.
"A robust method for estimating optimal treatment regimes." Biometrics 68, no. 4 (2012): 1010-8.
"ROC curve estimation under test-result-dependent sampling." Biostatistics 14, no. 1 (2013): 160-72.
"Meta-analysis methods and models with applications in evaluation of cholesterol-lowering drugs." Stat Med 31, no. 28 (2012): 3597-616.
"Semiparametric additive marginal regression models for multiple type recurrent events." Lifetime Data Anal 18, no. 4 (2012): 504-27.
"Multivariate phenotype association analysis by marker-set kernel machine regression." Genet Epidemiol 36, no. 7 (2012): 686-95.
"Time-varying latent effect model for longitudinal data with informative observation times." Biometrics 68, no. 4 (2012): 1093-102.
"Sample size estimation in educational intervention trials with subgroup heterogeneity in only one arm." Stat Med 32, no. 12 (2013): 2140-54.
"Sample size considerations of prediction-validation methods in high-dimensional data for survival outcomes." Genet Epidemiol 37, no. 3 (2013): 276-82.
"Kappa statistic for clustered dichotomous responses from physicians and patients." Stat Med 32, no. 21 (2013): 3700-19.
"Inference for optimal dynamic treatment regimes using an adaptive m-out-of-n bootstrap scheme." Biometrics 69, no. 3 (2013): 714-23.
"Factor selection and structural identification in the interaction ANOVA model." Biometrics 69, no. 1 (2013): 70-9.
"Inference on treatment effects from a randomized clinical trial in the presence of premature treatment discontinuation: the SYNERGY trial." Biostatistics 12, no. 2 (2011): 258-69.
"A spatial Dirichlet process mixture model for clustering population genetics data." Biometrics 67, no. 2 (2011): 381-90.
"Bayesian lasso for semiparametric structural equation models." Biometrics 68, no. 2 (2012): 567-77.
"Maximum likelihood estimation in generalized linear models with multiple covariates subject to detection limits." Stat Med 30, no. 20 (2011): 2551-61.
"Empirical pathway analysis, without permutation." Biostatistics 14, no. 3 (2013): 573-85.
"Bayesian design of noninferiority trials for medical devices using historical data." Biometrics 67, no. 3 (2011): 1163-70.
"A global logrank test for adaptive treatment strategies based on observational studies." Stat Med 33, no. 5 (2014): 760-71.
"A variable selection method for genome-wide association studies." Bioinformatics 27, no. 1 (2011): 1-8.
"Meta-analysis of gene-level associations for rare variants based on single-variant statistics." Am J Hum Genet 93, no. 2 (2013): 236-48.
"Multicategory reclassification statistics for assessing improvements in diagnostic accuracy." Biostatistics 14, no. 2 (2013): 382-94.
"Block-diagonal discriminant analysis and its bias-corrected rules." Stat Appl Genet Mol Biol 12, no. 3 (2013): 347-59.
"Pathway-based identification of SNPs predictive of survival." Eur J Hum Genet 19, no. 6 (2011): 704-9.
"Fixed and random effects selection in mixed effects models." Biometrics 67, no. 2 (2011): 495-503.
"Multiscale adaptive generalized estimating equations for longitudinal neuroimaging data." Neuroimage 72 (2013): 91-105.