How to Evaluate Variable Selection Approaches: Are We Too Traditional?
Howard Bondell

In today’s world, where technology runs rampant, we as statisticians are well aware of the influx of potential variables that we are able to measure. Because of this data explosion, methods for variable selection have received a great deal of attention over the past few decades. While variable selection is not a new topic, it now differs greatly in scale from what was formerly the norm. Traditionally, there were a few, or perhaps, with some luck, dozens of variables to sort through. Now we have hundreds, thousands, or even more. Although a vast number of new methods have been developed to handle this larger scale, the way that we as statisticians evaluate these methods often reverts back to the more traditional.

Of course, upon proposing a new method, we would like to compare it to established approaches via simulation, in which the true data-generating mechanism is known, unlike in reality. This step is necessary in order to examine the properties of the proposal. Data are generated from a model, usually with only a handful of predictors being truly relevant and the remainder simply noise. A large number of datasets are generated, and the methods are then compared on how often each is able to identify the exact set of predictors that appear in the true generating model. This is the traditional method of comparison.
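
To make the setup concrete, here is a minimal sketch of this traditional comparison, using the lasso (scikit-learn's LassoCV) as a stand-in selection method; the dimensions, signal strengths, and number of replications are illustrative assumptions rather than settings taken from any particular study.

```python
# A minimal sketch of the traditional comparison: simulate data with a few
# truly relevant predictors, run a selection method, and record how often the
# exact true subset is recovered. LassoCV is only a convenient stand-in here;
# any selection method could be plugged in.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p = 100, 500                   # many candidate predictors, few observations
true_support = {0, 1, 2}          # only a handful of truly relevant predictors
n_datasets = 50

exact_recoveries = 0
for _ in range(n_datasets):
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[list(true_support)] = 2.0
    y = X @ beta + rng.standard_normal(n)

    fit = LassoCV(cv=5).fit(X, y)
    selected = set(np.flatnonzero(fit.coef_))
    exact_recoveries += (selected == true_support)

# The traditional score: proportion of datasets with the exact correct subset.
print("Proportion exactly correct:", exact_recoveries / n_datasets)
```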

However, in the high-dimensional world, does anyone really expect any method to recover the exact correct subset of predictors, with no mistakes? In reality, it is not even clear what this means. I would argue that this traditional measure for comparing selection methods is no longer very useful. Unless the designed simulation is excessively simple, it seems unlikely in the high-dimensional case that any method would select the exact subset of correct predictors even once. In a somewhat realistic setting, every method would have a proportion correct of exactly zero!

So how should we compare methods?

In many cases, the desired output from a selection method is not the presentation of what is thought to be the “correct model,” but an ordering of the variables that can be presented for further investigation. Oftentimes, the size of the desired subset is not dictated by some statistical stopping rule, but instead by the resources that are available for this further investigation.

In line with this goal, comparisons of variable lists are potentially more relevant, and surely more informative. Consider, for simplicity, the sequence of variable subsets generated by a method such as forward selection. For forward selection these subsets are nested, but they do not need to be; we can simply order the subsets by increasing complexity. A variable selection method can typically be viewed in two stages: first, the approach provides this sequence of variable subsets, and second, it provides a stopping rule to pick out the subset to report.
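
As a concrete illustration of this first stage, here is a brief sketch of greedy forward selection producing a nested sequence of variable subsets, ranked by reduction in residual sum of squares; the details are one reasonable choice for illustration, not a prescription.

```python
# A sketch of the "first stage": greedy forward selection that returns the
# nested sequence of variable subsets, ordered by increasing complexity.
import numpy as np

def forward_selection_path(X, y, max_size):
    """Return the nested sequence of subsets chosen by greedy forward selection."""
    n, p = X.shape
    selected, path = [], []
    remaining = set(range(p))
    for _ in range(min(max_size, p)):
        best_j, best_rss = None, np.inf
        for j in remaining:
            cols = selected + [j]
            beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            rss = np.sum((y - X[:, cols] @ beta) ** 2)
            if rss < best_rss:
                best_j, best_rss = j, rss
        selected.append(best_j)
        remaining.remove(best_j)
        path.append(list(selected))   # subsets of increasing size
    return path
```

It is this sequence of subsets, rather than any single member of it, that the evaluation discussed next is meant to target.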

Traditional evaluation combines these two stages and evaluates only the final model. This is a shame, as it entangles the two distinct parts. Instead, we should evaluate the first stage on its own. This can be done directly using Receiver Operating Characteristic (ROC) curves or, perhaps more relevant to the high-dimensional setting, Precision-Recall (or False Discovery) curves.
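
As one way to carry this out, the sketch below scores each variable by how early it enters the sequence and traces ROC and Precision-Recall curves against the known true support (which is available in a simulation); the scoring scheme and helper names are illustrative assumptions, not a fixed recipe.

```python
# Evaluating the ordering itself rather than a single chosen subset:
# variables that enter earlier receive larger scores, and the scores are
# compared against the known true support via ROC and Precision-Recall curves.
import numpy as np
from sklearn.metrics import auc, precision_recall_curve, roc_curve

def evaluate_ordering(entry_order, p, true_support):
    """entry_order: variable indices in the order the method selects them."""
    labels = np.zeros(p, dtype=int)
    labels[list(true_support)] = 1

    # Earlier entry -> larger score; variables never entered score zero.
    scores = np.zeros(p)
    for rank, j in enumerate(entry_order):
        scores[j] = len(entry_order) - rank

    fpr, tpr, _ = roc_curve(labels, scores)
    precision, recall, _ = precision_recall_curve(labels, scores)
    return {"roc_auc": auc(fpr, tpr), "pr_curve": (recall, precision)}
```

Averaging these curves (or their areas) over simulated datasets compares methods on the quality of the orderings they produce, without ever asking whether any single reported subset is exactly correct.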


Comments/Discussion
Hi Howard,

This article makes a great deal of sense, and I'm interested in getting more details. Can you give some references where I can read more about Precision-Recall and False Discovery curves?

All the best,

Butch Tsiatis

The Precision-Recall curve has been used extensively in information retrieval, where there are a small number of cases and lots of controls, i.e., unbalanced class distributions. The main difference between Precision-Recall and ROC is the following. Both look at the tradeoff between correct identification and false identification, and both measure success by the true positive rate (as a measure of power). But as a measure of error, the ROC curve uses the total number of negative examples (or controls) in its denominator, so it is looking at the fraction of controls chosen out of the total number of controls, much like a Type I error rate. The Precision-Recall curve instead uses the total number selected as its denominator, so it measures error more in line with a false discovery rate. If you think about the ROC curve when there are lots of controls and only a few cases, there may be only a small region of interest, and it is compressed into the very left-hand portion of the x-axis (beyond that region, the number of false discoveries is unacceptably high). That is why there have been proposals to use quantities like the partial area under the ROC curve. The Precision-Recall curve essentially zooms in on that region automatically, without having to specify a cutoff for the region as the partial ROC curve requires.
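
A small numerical illustration may help; the counts below (10 truly relevant variables among 10,000 candidates, with a method flagging 50 of which 8 are truly relevant) are assumed purely for illustration.

```python
# Illustrative (assumed) counts: 10 relevant variables among 10,000 candidates;
# a method flags 50 variables, 8 of them truly relevant.
true_positives = 8
false_positives = 42
total_negatives = 10_000 - 10          # irrelevant variables ("controls")

# ROC x-axis: false positives relative to all negatives -- looks tiny here.
fpr = false_positives / total_negatives
# Precision: true positives relative to the number selected.
precision = true_positives / (true_positives + false_positives)

print(f"False positive rate: {fpr:.4f}")    # about 0.0042
print(f"Precision:           {precision:.2f}")  # 0.16
```

Even though only 16% of the selected variables are truly relevant, the corresponding point sits far to the left of the ROC x-axis, which is exactly the compression that the partial ROC and the Precision-Recall curve are meant to address.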

Here is a reference that discusses some of the differences between the Precision-Recall Curve and the ROC Curve.

Davis, J. and Goadrich, M. (2006). The relationship between Precision-Recall and ROC curves. Proceedings of the 23rd International Conference on Machine Learning, ICML '06, 233-240.

There is a version available on the author's website http://pages.cs.wisc.edu/~jdavis/davisgoadrichcamera2.pdf.

