Reinforcement learning trees

TitleReinforcement learning trees
Publication TypeReport
Year of Publication2012
AuthorsZhu, Ruoqing, Donglin Zeng, and Michael R. Kosorok
Series TitleThe University of North Carolina at Chapel Hill Department of Biostatistics Technical Report Series
Document NumberWorking Paper 33
Date Published12/2012
InstitutionThe University of North Carolina
CityChapel Hill

In this paper, we introduce a new type of tree-based regression method, reinforcement learning trees (RLT), which exhibits significantly improved performance over traditional methods such as random forests (Breiman, 2001). The innovations are three-fold. First, the new method implements reinforcement learning at each selection of a splitting variable during the tree construction processes. By splitting on the variable that brings the greatest future improvement in later splits, rather than choosing the one with largest marginal effect from the immediate split, the constructed tree utilizes the available samples in a more efficient way. Moreover, such an approach can be adapted to make high-dimensional cuts available at a relatively small computational cost. Second, we propose a variable screening method that progressively mutes noise variables during the construction of each individual tree. The muting procedure also takes advantage of reinforcement learning and prevents noise variables from being considered in the search for splitting rules, so that towards a terminal node when the sample size is small, the splitting rules are still constructed from only strong variables. Last, we investigate asymptotic properties of the proposed method. We can show that under the proposed splitting variable selection procedure, the constructed trees are consistent. The error bounds for the proposed RLT are shown to depend on a pre-selected number p0, where p0 is an educated guess of the number of strong variables which is usually much smaller than the total number of variables p but at least as large as the true number of strong variables p1. Hence when p0 is properly chosen, the error bounds can be significantly improved.

Alternate TitleUNC at Chapel Hill
Original PublicationReinforcement learning trees