Is it OK to split non-normal variables into tertiles and put them into multivariate regression models?

Question

I am now reviewing a paper in which the authors predict a DV through linear regression using, among other variables, dummy variables obtained from a tertile split of continuous variables that were not normally distributed. In other words, they took a non-normally distributed variable, split it into tertiles, created three dummy variables, one per tertile (I imagine they assigned 1 to subjects falling into the corresponding tertile), and put all the dummy variables in the regression model. Their regression models get a really high R^2 value (.90). Is it correct to do so?

The IVs aren't assumed to be normal in regression.
Did they have any better reason than that?

How many predictors were there, and what was the sample size? — Glen_b, Apr 16 '14 at 15:05
This is a good point from your answer: "Binning's really only a good idea when you'd expect a discontinuity in the response at the cut-points, say the temperature something boils at, or the legal age for driving, and when the response is flat between them." — wcampbell, Apr 17 '14 at 12:36
The sample size is 157 and we have something like 15 predictors, which I think is adequate. The binned variable is not discontinuous at the cut points. — user43897, Apr 17 '14 at 13:00

score 1 · Accepted Answer · answered Apr 16 '14 at 14:42

You can do this (put a continuous variable into bins) but it's generally considered a loss of information.

It would be appropriate if there is clearly a different effect when moving from one bin to another, or if the relationship between the IV and DV appears to be stepwise: the same effect from 0 to 5, a different effect from 5 to 10, etc. You would want to look at a scatter plot of the DV against the IV that was binned to see what the relationship between the two looks like. If the relationship looks linear, I would not understand why they chose to bin the variable.
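To make the loss of information concrete, here is a minimal sketch on simulated data (not the paper's data). It assumes a right-skewed predictor with a truly linear effect, and compares the R^2 from keeping the predictor continuous with the R^2 from a tertile split. The data-generating values, the seed, and the helper `r_squared` are all made up for illustration; only the sample size of 157 comes from the comments.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the sketch is reproducible
n = 157                          # sample size mentioned in the comments

# Hypothetical data: a right-skewed (non-normal) predictor with a linear effect.
x = rng.lognormal(mean=0.0, sigma=1.0, size=n)
y = 2.0 + 1.5 * x + rng.normal(scale=1.0, size=n)


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


intercept = np.ones(n)

# Model 1: keep x continuous.
r2_linear = r_squared(np.column_stack([intercept, x]), y)

# Model 2: tertile split, coded as two dummies (lowest tertile is the reference).
cuts = np.quantile(x, [1 / 3, 2 / 3])
tertile = np.digitize(x, cuts)           # 0, 1 or 2 for each observation
X_binned = np.column_stack([intercept, tertile == 1, tertile == 2])
r2_tertile = r_squared(X_binned, y)

print(f"R^2, continuous x:    {r2_linear:.3f}")
print(f"R^2, tertile dummies: {r2_tertile:.3f}")
```

Under these assumptions the binned model explains less variance, because everything that happens within a tertile is thrown away. If the true relationship were a step function at the cut points, the comparison could go the other way, which is why the scatter plot is worth checking first.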

Keep in mind that the dummy variables are interpreted relative to the single dummy variable left out of the model, which would probably be the first tertile.
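As a rough illustration of that interpretation, the sketch below fits a model with an intercept and dummies for the second and third tertiles, leaving the first tertile out as the reference. The data are simulated and the variable names are hypothetical. With no other predictors in the model, the intercept equals the mean of the DV in the reference tertile and each dummy coefficient equals the difference in means from that tertile. It also shows why one dummy has to be left out: an intercept plus all three dummies gives a rank-deficient design matrix.

```python
import numpy as np

rng = np.random.default_rng(1)   # fixed seed, simulated data only
n = 150

x = rng.lognormal(size=n)
y = 1.0 + 0.8 * x + rng.normal(scale=0.5, size=n)

# Tertile membership (0, 1, 2) and one indicator column per tertile.
cuts = np.quantile(x, [1 / 3, 2 / 3])
tertile = np.digitize(x, cuts)
dummies = (tertile[:, None] == np.arange(3)).astype(float)   # shape (n, 3)

# Intercept plus dummies for tertiles 2 and 3; tertile 1 is the reference.
X = np.column_stack([np.ones(n), dummies[:, 1:]])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Mean of y within each tertile, for comparison with the coefficients.
means = np.array([y[tertile == k].mean() for k in range(3)])

print("intercept       :", beta[0], " mean of T1     :", means[0])
print("coef for T2     :", beta[1], " mean T2 - T1   :", means[1] - means[0])
print("coef for T3     :", beta[2], " mean T3 - T1   :", means[2] - means[0])

# Intercept plus all three dummies: the dummies sum to the intercept column,
# so the four columns are linearly dependent.
X_full = np.column_stack([np.ones(n), dummies])
rank_full = np.linalg.matrix_rank(X_full)
print("columns:", X_full.shape[1], " rank:", rank_full)
```

With other predictors in the model, as in the paper under review, the coefficients are adjusted differences from the reference tertile rather than raw differences in means, but the reference-category reading stays the same.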
