Data Mining with Stepwise Regression

Author: Robert Stine, Department of Statistics, Wharton School, University of Pennsylvania

Abstract: This talk discusses using stepwise regression to fit models to very large datasets. The goal is to show that, with several modifications, stepwise regression becomes an easy and effective tool for data mining large datasets. The standard version of stepwise regression has several pitfalls that one must avoid. Three modifications to standard stepwise regression address the most serious of these problems: (1) use interactions to capture non-linearities and cope with missing data, (2) use Bonferroni-type procedures to select which variables to include in the model, and (3) use an adjusted critical value to accommodate high-leverage points. The talk will describe each of these three modifications and explain why each is necessary. With all three modifications in place, stepwise regression becomes a data-mining procedure well suited to almost any dataset.
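
As a rough illustration of modifications (1) and (2), the sketch below runs forward stepwise selection over main effects plus pairwise interactions and admits a variable only if its p-value falls below alpha divided by the number of candidates. It is not the procedure presented in the talk, which the abstract does not spell out. It assumes the plain Bonferroni rule as the simplest "Bonferroni-type" choice, uses a normal approximation to the t distribution for p-values, and omits the high-leverage adjustment of modification (3). The function names (with_interactions, bonferroni_stepwise) and the simulated data are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def center(v):
    mean = sum(v) / len(v)
    return [a - mean for a in v]


def sweep(v, z, zz):
    """Return v minus its projection onto z, where zz = dot(z, z)."""
    coef = dot(v, z) / zz
    return [a - coef * b for a, b in zip(v, z)]


def with_interactions(columns):
    """Candidate list: the original columns plus all pairwise products
    (squares included). Labels are (j,) for column j and (j, k) for products."""
    expanded = list(columns)
    labels = [(j,) for j in range(len(columns))]
    for j in range(len(columns)):
        for k in range(j, len(columns)):
            expanded.append([a * b for a, b in zip(columns[j], columns[k])])
            labels.append((j, k))
    return expanded, labels


def bonferroni_stepwise(columns, y, alpha=0.05):
    """Forward stepwise selection with a plain Bonferroni entry threshold.

    columns: list of m candidate columns, each a list of n floats.
    Returns the indices of the selected columns in order of entry.
    """
    n, m = len(y), len(columns)
    threshold = alpha / m  # assumption: plain Bonferroni over all m candidates
    # Centering accounts for the intercept; residuals are updated after each entry.
    resid_y = center(y)
    resid_x = [center(c) for c in columns]
    selected = []
    while len(selected) < m and n - len(selected) - 2 > 0:
        df = n - len(selected) - 2  # intercept + selected + one candidate
        yy = dot(resid_y, resid_y)
        if yy <= 1e-12:
            break  # nothing left to explain
        best_j, best_t = None, 0.0
        for j in range(m):
            if j in selected:
                continue
            z = resid_x[j]
            zz = dot(z, z)
            if zz < 1e-10:
                continue  # candidate is (nearly) collinear with the model
            zy = dot(z, resid_y)
            se = math.sqrt(max(yy - zy * zy / zz, 0.0) / (df * zz))
            t = math.copysign(math.inf, zy) if se == 0.0 else (zy / zz) / se
            if abs(t) > abs(best_t):
                best_j, best_t = j, t
        if best_j is None:
            break
        # Two-sided p-value under a normal approximation to the t distribution.
        p_value = math.erfc(abs(best_t) / math.sqrt(2.0))
        if p_value >= threshold:
            break
        selected.append(best_j)
        z = resid_x[best_j]
        zz = dot(z, z)
        # Remove the new variable from the response and from every other candidate.
        resid_y = sweep(resid_y, z, zz)
        resid_x = [c if k == best_j else sweep(c, z, zz)
                   for k, c in enumerate(resid_x)]
    return selected


# Synthetic example: y depends on column 0 and on the product of columns 1 and 2.
random.seed(0)
n = 200
base = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(5)]
y = [3.0 * base[0][i] + 2.0 * base[1][i] * base[2][i] + random.gauss(0.0, 1.0)
     for i in range(n)]
candidates, labels = with_interactions(base)
selected = bonferroni_stepwise(candidates, y)
print([labels[j] for j in selected])
```

Adding interactions multiplies the number of candidates (here 5 columns become 20), which is one reason a multiplicity-adjusted entry threshold matters: a fixed cutoff such as 0.05 would admit more spurious variables as the candidate pool grows.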