Interaction Screening for Ultra-High Dimensional Data. | Innovative Methods Program for Advancing Clinical Trials (IMPACT)

Title	Interaction Screening for Ultra-High Dimensional Data.
Publication Type	Journal Article
Year of Publication	2014
Authors	Hao, Ning, and Hao Helen Zhang
Journal	J Am Stat Assoc
Volume	109
Issue	507
Pagination	1285-1301
Date Published	2014
ISSN	0162-1459
Abstract	In ultra-high dimensional data analysis, it is extremely challenging to identify important interaction effects, and a top concern in practice is computational feasibility. For a data set with n observations and p predictors, the augmented design matrix including all linear and order-2 terms is of size n × (p² + 3p)/2. When p is large, say more than tens of hundreds, the number of interactions is enormous and beyond the capacity of standard machines and software tools for storage and analysis. In theory, the interaction selection consistency is hard to achieve in high dimensional settings. Interaction effects have heavier tails and more complex covariance structures than main effects in a random design, making theoretical analysis difficult. In this article, we propose to tackle these issues by forward-selection based procedures called iFOR, which identify interaction effects in a greedy forward fashion while maintaining the natural hierarchical model structure. Two algorithms, iFORT and iFORM, are studied. Computationally, the iFOR procedures are designed to be simple and fast to implement. No complex optimization tools are needed, since only OLS-type calculations are involved; the iFOR algorithms avoid storing and manipulating the whole augmented matrix, so the memory and CPU requirement is minimal; the computational complexity is linear in p for sparse models, hence feasible for p ≫ n. Theoretically, we prove that they possess sure screening property for ultra-high dimensional settings. Numerical examples are used to demonstrate their finite sample performance.
DOI	10.1080/01621459.2014.881741
Alternate Journal	J Am Stat Assoc
Original Publication	Interaction screening for ultra-high dimensional data.
PubMed ID	25386043
PubMed Central ID	PMC4224119
Grant List	P01 CA142538 / CA / NCI NIH HHS / United States; R01 CA085848 / CA / NCI NIH HHS / United States

Project:

Project 2.1
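
Illustrative sketch (not part of the published record): the abstract describes greedy forward selection that respects a hierarchical model structure and needs only OLS-type calculations. The Python below is a minimal, simplified illustration of that idea, not the authors' iFORT or iFORM implementation. The function names `augmented_columns`, `rss`, and `forward_hierarchical`, the fixed step count `n_steps`, and the rule that an order-2 term becomes a candidate only once both of its parent main effects are in the model are all assumptions made for this example. Interaction columns are formed on demand, so the full augmented matrix is never stored.

```python
import numpy as np


def augmented_columns(p):
    """Count of linear plus order-2 terms for p predictors: (p^2 + 3p) / 2."""
    return (p * p + 3 * p) // 2


def rss(X, y):
    """Residual sum of squares of an OLS fit of y on X with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)


def forward_hierarchical(X, y, n_steps):
    """Greedy forward selection over main effects and order-2 terms.

    A term is a tuple: (j,) for main effect j, (j, k) with j <= k for the
    product of predictors j and k. Order-2 terms are candidates only when
    both parents are already selected (assumed hierarchy rule).
    """
    n, p = X.shape
    selected = []                # terms in the order they were added
    mains = []                   # indices of selected main effects
    columns = np.empty((n, 0))   # design columns of the current model
    for _ in range(n_steps):
        candidates = [(j,) for j in range(p) if (j,) not in selected]
        candidates += [
            (j, k) for j in mains for k in mains
            if j <= k and (j, k) not in selected
        ]
        best = None
        for term in candidates:
            # Build the candidate column on the fly.
            if len(term) == 1:
                col = X[:, term[0]]
            else:
                col = X[:, term[0]] * X[:, term[1]]
            score = rss(np.column_stack([columns, col]), y)
            if best is None or score < best[0]:
                best = (score, term, col)
        _, term, col = best
        selected.append(term)
        if len(term) == 1:
            mains.append(term[0])
        columns = np.column_stack([columns, col])
    return selected
```

For example, `augmented_columns(1000)` evaluates (1000² + 3·1000)/2 = 501500, which shows how quickly the augmented matrix grows with p. The sketch refits every candidate from scratch at each step, so it is meant for reading rather than for ultra-high dimensional use.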