|Year of Publication||2012|
|Authors||Linn, Kristin A.|
Clinicians who want to form evidence-based rules for optimal treatment allocation over time have begun to estimate such rules from data collected in observational or randomized studies. Popular methods for estimating optimal sequential decision rules from data, such as Q-learning, are approximate dynamic programming algorithms that require modeling nonsmooth transformations of the data. Unfortunately, it is difficult to postulate a model for the transformed data that is both adequately expressive and parsimonious. Under many simple generative models, the most commonly employed working models, namely linear models, turn out to be misspecified. Furthermore, such estimators are nonregular, which makes statistical inference difficult. We propose an alternative strategy for estimating optimal sequential decision rules in which all modeling takes place before nonsmooth transformations of the data are applied. This simple interchange of modeling and transforming the data yields high-quality estimated sequential decision rules. In many cases it also allows for simplified exploratory data analysis, model building and validation, and normal limit theory. We illustrate the proposed method using data from the STAR*D study of major depressive disorder.
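To make "modeling nonsmooth transformations of the data" concrete, here is a minimal sketch of *standard* two-stage Q-learning with linear working models. It is not the proposed method. It assumes binary treatments coded as -1/+1 and a hypothetical simulated generative model. The helper names (`fit_linear_q`, `q_learning_two_stage`, `rule`) are illustrative only. In this simulation the stage-2 linear model is correctly specified. The stage-1 regression, however, is fit to the pseudo-outcome `x2 + |0.5 - x2|` produced by the maximization. Its conditional mean is nonlinear in `(x1, a1)`, so the linear stage-1 working model is misspecified, which is the issue the abstract describes.

```python
import numpy as np

def design(h, a):
    """Stack main effects h with treatment interactions a*h (a coded -1/+1)."""
    return np.hstack([h, a[:, None] * h])

def fit_linear_q(h, a, y):
    """Least-squares fit of the linear working model Q(h, a) = h'beta + a * h'psi."""
    coef, *_ = np.linalg.lstsq(design(h, a), y, rcond=None)
    p = h.shape[1]
    return coef[:p], coef[p:]

def q_learning_two_stage(x1, a1, x2, a2, y):
    """Hypothetical helper: standard two-stage Q-learning with linear models."""
    n = len(y)
    h1 = np.column_stack([np.ones(n), x1])          # stage-1 history
    h2 = np.column_stack([np.ones(n), x1, a1, x2])  # stage-2 history
    # Stage 2: model the final outcome given the full history and a2.
    beta2, psi2 = fit_linear_q(h2, a2, y)
    # Nonsmooth step: maximize the fitted Q2 over a2 in {-1, +1}.
    # For this linear model the maximum is h2'beta2 + |h2'psi2|.
    y_tilde = h2 @ beta2 + np.abs(h2 @ psi2)
    # Stage 1: a linear working model is fit to the transformed data.
    beta1, psi1 = fit_linear_q(h1, a1, y_tilde)
    return {"h1": h1, "h2": h2, "beta1": beta1, "psi1": psi1,
            "beta2": beta2, "psi2": psi2, "y_tilde": y_tilde}

def rule(h, psi):
    """Estimated decision rule: recommend +1 when the fitted contrast h'psi > 0."""
    return np.where(h @ psi > 0, 1, -1)

# Usage on simulated (hypothetical) data with randomized treatments.
rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)
a1 = rng.choice([-1, 1], size=n)
x2 = 0.5 * x1 + 0.5 * a1 + rng.normal(size=n)
a2 = rng.choice([-1, 1], size=n)
y = x2 + a2 * (0.5 - x2) + rng.normal(size=n)  # true stage-2 psi = (0.5, 0, 0, -1)

fit = q_learning_two_stage(x1, a1, x2, a2, y)
d1 = rule(fit["h1"], fit["psi1"])  # estimated first-stage rule
d2 = rule(fit["h2"], fit["psi2"])  # estimated second-stage rule
```

The least-squares fits themselves are smooth. The `np.abs` in the pseudo-outcome is the nonsmooth transformation that makes the stage-1 working model hard to specify and the resulting estimators nonregular.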