High Dimensional Nonparametric Regression Using Two-Dimensional Polynomial Cascades Author: Greg Grudic Department of Computer Science, University of Colorado, Boulder Abstract: Many real nonlinear regression problems are characterized by 1. thousands of relevant input variables, each contributing a small but significant amount to the final regression function; 2. no subset of these variables can adequately describe the regression function; and 3. the relevant variables are confounded by thousands of other irrelevant variables (i.e. variables which have no influence on the target regression function). Examples of these problem domains can be found in such varied fields as robotics, computer vision, and economic modeling. Successful application of standard nonparametric regression techniques to these types of domains has been prohibitive because thousands of input variables must be selected from tens of thousands of potential candidates, many of which are irrelevant. Even newer kernel based techniques, such as Support Vector Regression, cannot adequately address these problems because feature selection is needed to create effective regression models, and kernel based feature selection techniques are computationally intractable for large input spaces. We propose a Polynomial Cascade (PC) based algorithm to deal with such problems (Grudic and Lawrence, 1997). The algorithm builds a cascade of two dimensional, second order polynomials, with each level of the cascade receiving input from the previous level plus a single input variable. Because each level of the cascade feeds into the next level, the resulting regression function is a very high dimensional, high order, polynomial (for example, even with only 5 levels, polynomial terms are generated of order x^2^2^2^2^2 = x^65536). As with algorithms such as Random Forests (Breiman, 1999), the polynomial cascade algorithm avoids overfitting by injecting randomness in the form of random ordering of input variables into the cascade (including the irrelevant variables), and the use of bootstrap samples to fit the two dimensional polynomial at each level. Furthermore, the PC algorithm deals with irrelevant or noisy inputs not by excluding them from the cascade, but by treating them as infusions of noise which are averaged out over many levels. We present proofs showing under which conditions the PC algorithm converges to the target regression function. We further show that the rate of convergence of the PC is independent of the size of the input space. The efficacy of the polynomial cascade is demonstrated on problem domains ranging from 4 to 19,200 inputs (with as many as half of the inputs being irrelevant), and with training sample sizes ranging from 200 to 40,000.