I'm currently taking my first applied linear regression class at the graduate level, and I'm struggling with predictor variable transformations in multiple linear regression. The text I'm using, Kutner et al.'s "Applied Linear Statistical Models", doesn't seem to cover the question I'm having, apart from suggesting that there is a Box-Cox method for transforming multiple predictors.
When faced with a response variable and several predictor variables, what conditions does one strive to meet with each predictor variable? I understand we're ultimately looking for constancy of error variance and normally distributed errors (at least in the techniques I've been taught so far). I've had many exercises come back where the solution was something like y ~ x1 + (1/x2) + log(x3), with one or more predictors transformed.
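To be explicit, the assumptions I have in mind are those of the normal error regression model,

$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1} x_{i,p-1} + \varepsilon_i, \qquad \varepsilon_i \overset{iid}{\sim} N(0, \sigma^2),$$

so as I understand it, the transformations are there to make these assumptions hold, at least approximately.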
I understood the rationale in simple linear regression, since it was easy to look at y ~ x1 and the related diagnostics (Q-Q plots of the residuals, residuals vs. fitted values, residuals vs. x, etc.) and test whether y ~ log(x1) fit our assumptions better.
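For concreteness, here is roughly what that workflow looks like for me; a minimal sketch in Python with statsmodels, where the simulated data frame df and the columns y and x1 are just placeholders standing in for my own data:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Simulated stand-in for my data: y is actually linear in log(x1).
rng = np.random.default_rng(0)
x1 = rng.uniform(1, 100, size=200)
df = pd.DataFrame({"x1": x1,
                   "y": 3 + 2 * np.log(x1) + rng.normal(0, 0.5, size=200)})

def fit_and_diagnose(x, y, label):
    """Fit y on a single predictor and draw the usual diagnostics."""
    fit = sm.OLS(y, sm.add_constant(x)).fit()
    fig, axes = plt.subplots(1, 2, figsize=(8, 3))
    axes[0].scatter(fit.fittedvalues, fit.resid, s=10)
    axes[0].axhline(0, color="gray", linewidth=1)
    axes[0].set(title=f"{label}: residuals vs. fitted",
                xlabel="fitted value", ylabel="residual")
    sm.qqplot(fit.resid, line="s", ax=axes[1])
    axes[1].set_title(f"{label}: normal Q-Q")
    fig.tight_layout()
    return fit

# Compare the untransformed and log-transformed predictor.
fit_and_diagnose(df["x1"], df["y"], "y ~ x1")
fit_and_diagnose(np.log(df["x1"]), df["y"], "y ~ log(x1)")
plt.show()
```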
Is there a good place to start understanding when to transform a predictor in the presence of many predictors?
Thank you in advance. Matt
A recent example I went through in a text had the resulting model y ~ x1 + log(x2), and the only note about the transformation was "it was apparent that x2 was well suited for a logarithmic transformation." I'm trying to improve my sense of when transformations are applicable. Is it enough to just look at the y ~ x_i plots, as in the sketch below, and proceed as we would in the single-predictor case? What else should I consider?
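To make that concrete, this is the kind of marginal plot I mean; again a minimal sketch with simulated placeholder data mimicking a y ~ x1 + log(x2) relationship:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Simulated stand-in for a data set with three candidate predictors.
rng = np.random.default_rng(1)
df = pd.DataFrame({"x1": rng.normal(0, 1, size=150),
                   "x2": rng.uniform(1, 50, size=150),
                   "x3": rng.uniform(1, 10, size=150)})
df["y"] = 1 + 2 * df["x1"] + 3 * np.log(df["x2"]) + rng.normal(0, 0.5, size=150)

# One marginal scatterplot of y against each predictor,
# read the same way as in the single-predictor case.
fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, col in zip(axes, ["x1", "x2", "x3"]):
    ax.scatter(df[col], df["y"], s=10)
    ax.set(xlabel=col, ylabel="y", title=f"y vs. {col}")
fig.tight_layout()
plt.show()
```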
– Matt Nov 14 '11 at 00:11