I'm currently taking my first applied linear regression class at the graduate level, and I'm struggling with predictor variable transformations in multiple linear regression. The text I'm using, Kutner et al.'s "Applied Linear Statistical Models", doesn't seem to cover the question I'm having, apart from suggesting that there is a Box-Cox method for transforming multiple predictors.
When faced with a response variable and several predictor variables, what conditions does one strive to meet with each predictor variable? I understand we're ultimately looking for constancy of error variance and normally distributed errors (at least in the techniques I've been taught so far). I've had many exercises come back where the solution was something like y ~ x1 + (1/x2) + log(x3), with one or more predictors transformed.
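To be explicit, the assumptions I have in mind are those of the normal error regression model,

$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1} x_{i,p-1} + \varepsilon_i, \qquad \varepsilon_i \overset{iid}{\sim} N(0, \sigma^2),$$

so as I understand it, the transformations are there to make these assumptions hold, at least approximately.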
I understood the rationale in simple linear regression, since it was easy to look at y ~ x1 and the related diagnostics (Q-Q plots of the residuals, residuals vs. fitted values, residuals vs. x, etc.) and test whether y ~ log(x1) fit our assumptions better.
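For concreteness, here is roughly what that workflow looks like for me; a minimal sketch in Python with statsmodels, where the simulated data frame df and the columns y and x1 are just placeholders standing in for my own data:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Simulated stand-in for my data: y is actually linear in log(x1).
rng = np.random.default_rng(0)
x1 = rng.uniform(1, 100, size=200)
df = pd.DataFrame({"x1": x1,
                   "y": 3 + 2 * np.log(x1) + rng.normal(0, 0.5, size=200)})

def fit_and_diagnose(x, y, label):
    """Fit y on a single predictor and draw the usual diagnostics."""
    fit = sm.OLS(y, sm.add_constant(x)).fit()
    fig, axes = plt.subplots(1, 2, figsize=(8, 3))
    axes[0].scatter(fit.fittedvalues, fit.resid, s=10)
    axes[0].axhline(0, color="gray", linewidth=1)
    axes[0].set(title=f"{label}: residuals vs. fitted",
                xlabel="fitted value", ylabel="residual")
    sm.qqplot(fit.resid, line="s", ax=axes[1])
    axes[1].set_title(f"{label}: normal Q-Q")
    fig.tight_layout()
    return fit

# Compare the untransformed and log-transformed predictor.
fit_and_diagnose(df["x1"], df["y"], "y ~ x1")
fit_and_diagnose(np.log(df["x1"]), df["y"], "y ~ log(x1)")
plt.show()
```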
Is there a good place to start understanding when to transform a predictor in the presence of many predictors?
Thank you in advance. Matt
A recent example I went through in a text had the resulting model y ~ x1 + log(x2), and the only note about the transformation was "it was apparent that x2 was well suited for a logarithmic transformation." I'm trying to improve my sense of when transformations are applicable. Is it enough to just look at the y ~ x_i plots, as in the sketch below, and proceed as we would in the single-predictor case? What else should I consider?
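To make that concrete, this is the kind of marginal plot I mean; again a minimal sketch with simulated placeholder data mimicking a y ~ x1 + log(x2) relationship:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Simulated stand-in for a data set with three candidate predictors.
rng = np.random.default_rng(1)
df = pd.DataFrame({"x1": rng.normal(0, 1, size=150),
                   "x2": rng.uniform(1, 50, size=150),
                   "x3": rng.uniform(1, 10, size=150)})
df["y"] = 1 + 2 * df["x1"] + 3 * np.log(df["x2"]) + rng.normal(0, 0.5, size=150)

# One marginal scatterplot of y against each predictor,
# read the same way as in the single-predictor case.
fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, col in zip(axes, ["x1", "x2", "x3"]):
    ax.scatter(df[col], df["y"], s=10)
    ax.set(xlabel=col, ylabel="y", title=f"y vs. {col}")
fig.tight_layout()
plt.show()
```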
– Matt Nov 14 '11 at 00:11