Reinforcement learning trees | Innovative Methods Program for Advancing Clinical Trials (IMPACT)

Title	Reinforcement learning trees
Publication Type	Report
Year of Publication	2012
Authors	Zhu, Ruoqing, Donglin Zeng, and Michael R. Kosorok
Series Title	The University of North Carolina at Chapel Hill Department of Biostatistics Technical Report Series
Document Number	Working Paper 33
Date Published	12/2012
Institution	The University of North Carolina
City	Chapel Hill
Abstract	In this paper, we introduce a new type of tree-based regression method, reinforcement learning trees (RLT), which exhibits significantly improved performance over traditional methods such as random forests (Breiman, 2001). The innovations are three-fold. First, the new method implements reinforcement learning at each selection of a splitting variable during the tree construction processes. By splitting on the variable that brings the greatest future improvement in later splits, rather than choosing the one with largest marginal effect from the immediate split, the constructed tree utilizes the available samples in a more efficient way. Moreover, such an approach can be adapted to make high-dimensional cuts available at a relatively small computational cost. Second, we propose a variable screening method that progressively mutes noise variables during the construction of each individual tree. The muting procedure also takes advantage of reinforcement learning and prevents noise variables from being considered in the search for splitting rules, so that towards a terminal node when the sample size is small, the splitting rules are still constructed from only strong variables. Last, we investigate asymptotic properties of the proposed method. We can show that under the proposed splitting variable selection procedure, the constructed trees are consistent. The error bounds for the proposed RLT are shown to depend on a pre-selected number p0, where p0 is an educated guess of the number of strong variables which is usually much smaller than the total number of variables p but at least as large as the true number of strong variables p1. Hence when p0 is properly chosen, the error bounds can be significantly improved.
Alternate Title	UNC at Chapel Hill
Original Publication	Reinforcement learning trees

Project:

Project 2.4