
I have a new data set which is basically as bad as the last one (the same sort of data), and I have been asked to try non-linear regression on it, with the focus on partitioning methods (I will be using boosting and bagging). The dependent variable, however, is continuous.

I need the input variables kept intact (that is, not factorized), because I again need to identify which input variables affect the dependent variable of interest, and how.

Again, the many X input variables have very different distributions, and I have categorical inputs too.
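For partition (tree) methods, differing distributions are arguably less of a concern than in linear regression: a regression-tree split depends only on the ordering of a variable's values, so a strictly monotone transformation (for example, the log of a skewed input) yields the same partition. The minimal sketch below illustrates this under stated assumptions: the data are simulated, and `best_split_left_mask` is a hypothetical hand-written least-squares stump in plain numpy, not a library API.

```python
import numpy as np


def best_split_left_mask(x, y):
    """Hypothetical least-squares single split on one feature (a regression stump).

    Returns a boolean mask marking which observations fall in the left node.
    """
    order = np.argsort(x, kind="stable")
    xs, ys = x[order], y[order]
    n = len(ys)
    csum, csq = np.cumsum(ys), np.cumsum(ys ** 2)
    nl = np.arange(1, n)                 # left-node size for each candidate split
    nr = n - nl                          # right-node size
    ls, lq = csum[:-1], csq[:-1]         # left sums and sums of squares
    rs, rq = csum[-1] - ls, csq[-1] - lq # right sums and sums of squares
    sse = (lq - ls ** 2 / nl) + (rq - rs ** 2 / nr)
    sse = np.where(xs[1:] != xs[:-1], sse, np.inf)  # no split between tied values
    i = int(np.argmin(sse))
    left = np.zeros(n, dtype=bool)
    left[order[: i + 1]] = True          # the i+1 smallest x values go left
    return left


# Simulated (hypothetical) data: a heavily right-skewed input with a step effect.
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.5, size=200)
y = np.where(x > 3, 5.0, 0.0) + rng.normal(0, 1, 200)

raw = best_split_left_mask(x, y)
logged = best_split_left_mask(np.log(x), y)   # strictly monotone transform of x
print(np.array_equal(raw, logged))
```

Because the sort order, and therefore every candidate split's error, is unchanged by the log, both calls should select the same observations for the left node.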

I have not found much information in general on non-linear regression and linear correlation. Leaving the data set as is means that strongly correlated variables are effectively included multiple times, particularly in bootstrapping (the reason seems obvious given the nature of the bootstrap).

To what degree is this a problem? For example, if X1 and X2 are collinear and X1 is out of bag while X2 is in bag, you have effectively not taken that variable out. Is this even an issue?
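In bagging, "out of bag" strictly refers to rows (observations) left out of a bootstrap sample; variables are left out only when features are also subsampled, as in random forests (per split) or the random subspace method (per tree). Under that reading, the substitution effect described above can be sketched as follows: each stump sees a bootstrap sample of rows and exactly one randomly chosen feature, on simulated data where `x2` is a near-copy of `x1` and `x3` is noise. The helpers `fit_stump` and `oob_r2_by_feature` are hypothetical, written in plain numpy rather than taken from any library.

```python
import numpy as np

# Simulated (hypothetical) data: x2 is a near-copy of x1, x3 is pure noise.
rng = np.random.default_rng(0)
n = 300
x1 = rng.uniform(-2, 2, n)
x2 = x1 + rng.normal(0, 0.1, n)          # strongly collinear with x1
x3 = rng.uniform(-2, 2, n)               # unrelated to y
X = np.column_stack([x1, x2, x3])
y = np.where(x1 > 0, 2.0, -1.0) + rng.normal(0, 0.5, n)  # non-linear (step) in x1 only


def fit_stump(x, y):
    """Best least-squares single split on one feature: (threshold, left_mean, right_mean)."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    m = len(ys)
    csum, csq = np.cumsum(ys), np.cumsum(ys ** 2)
    nl = np.arange(1, m)                 # left-node size for each candidate split
    nr = m - nl
    ls, lq = csum[:-1], csq[:-1]
    rs, rq = csum[-1] - ls, csq[-1] - lq
    sse = (lq - ls ** 2 / nl) + (rq - rs ** 2 / nr)
    sse = np.where(xs[1:] != xs[:-1], sse, np.inf)  # bootstrap duplicates create ties
    i = int(np.argmin(sse))
    return (xs[i] + xs[i + 1]) / 2, ls[i] / nl[i], rs[i] / nr[i]


def oob_r2_by_feature(X, y, n_trees=300, seed=1):
    """Bag stumps, each restricted to ONE random feature; mean out-of-bag R^2 per feature."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    scores = {j: [] for j in range(p)}
    for _ in range(n_trees):
        boot = rng.integers(0, n, n)                 # bootstrap sample of row indices
        oob = np.setdiff1d(np.arange(n), boot)       # rows not drawn this round
        j = int(rng.integers(0, p))                  # the only feature this stump may use
        thr, lm, rm = fit_stump(X[boot, j], y[boot])
        pred = np.where(X[oob, j] <= thr, lm, rm)
        mse = np.mean((y[oob] - pred) ** 2)
        scores[j].append(1 - mse / np.var(y[oob]))
    return {j: float(np.mean(s)) for j, s in scores.items()}


r2 = oob_r2_by_feature(X, y)
for j, name in enumerate(["x1", "x2", "x3"]):
    print(f"{name}: mean OOB R^2 = {r2[j]:.3f}")
```

If the intuition in the question holds, stumps restricted to `x2` should score nearly as well out of bag as those restricted to `x1`, while `x3` stays near zero: excluding `x1` from a tree does little when its near-copy is still available.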

Samuel