
I have two groups of participants and they all took the same measurements.

When I perform a median split on one of the measurements to test my hypothesis about an interaction, the dependent variable fails Levene's test for equality of variances. When I perform a tertiary split, it passes Levene's test, both when I compare only the high and low thirds and when I compare all three thirds.

There are main effects and interaction effects in all of the ANOVAs that I have run, but only the 2x2 ANOVA fails Levene's test; neither of the 3x2 ANOVAs fails it.

How is this possible?

jpf66

1 Answer


(This isn't a direct answer to the question, more a bunch of references relating to why the approach should be avoided.)

Some of the issues include downward bias in estimation of effects, inflation of error variance and (consequently) low power. There's also the dependence issue that impacts the calculation of p-values (i.e. p-values calculated in the 'usual' way are not correct).
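As a rough illustration of the downward bias, here is a small simulation sketch. Everything in it is an assumption made for the example rather than something taken from the question: the sample size `n`, the true correlation `rho`, the seed, and the choice of normally distributed data. It compares the correlation between the response and the continuous predictor with the correlation after a median split. For bivariate normal data the split version should come out at roughly 0.8 times the continuous one.

```python
import random
import statistics

def pearson(a, b):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5

rng = random.Random(1)   # arbitrary fixed seed (assumption)
n = 20000                # hypothetical sample size
rho = 0.5                # hypothetical true correlation

# Simulate a continuous predictor x and a response y correlated with it.
x = [rng.gauss(0, 1) for _ in range(n)]
y = [rho * xi + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1) for xi in x]

# Median split: replace x by a 0/1 indicator for "above the median".
cut = statistics.median(x)
x_split = [1.0 if xi > cut else 0.0 for xi in x]

r_continuous = pearson(x, y)
r_split = pearson(x_split, y)

print(f"continuous r   = {r_continuous:.3f}")
print(f"median-split r = {r_split:.3f}")
print(f"ratio          = {r_split / r_continuous:.3f}")
```

The smaller correlation after the split is the same information loss that shows up as reduced power in an ANOVA on the split groups.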

There's a wealth of material on why median (etc.) splits of variables are a bad idea.

http://www.uvm.edu/~dhowell/gradstat/psych341/lectures/Factorial2Folder/Median-split.html

http://psych.colorado.edu/~mcclella/MedianSplit/

http://core.ecu.edu/psyc/wuenschk/stathelp/Dichot-Not.doc

MacCallum, R. C., Zhang, S., Preacher, K. J., & Rucker, D. D. (2002). On the practice of dichotomization of quantitative variables. Psychological Methods, 7, 19–40.

Aiken, L. S., & West, S. G. (1991). Multiple regression: Testing and interpreting interactions. Newbury Park, CA: Sage.

http://www.theanalysisfactor.com/continuous-and-categorical-variables-the-trouble-with-median-splits/

Google turns up a bunch more references and links.

Cutting in 3 or 4 doesn't avoid the problems but it's not quite as bad.
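To get a feel for "not quite as bad", the sketch below cuts a simulated predictor into 2, 3 and 4 equal-sized groups and reports the share of response variance explained by group membership (eta-squared from a one-way layout), next to the squared correlation with the uncut predictor. The data-generating setup (`n`, `rho`, the seed, normal data) and the helper names `equal_bins` and `eta_squared` are assumptions for the illustration only.

```python
import random
import statistics

def equal_bins(x, k):
    """Assign each value to one of k equal-sized bins by rank (0 .. k-1)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    bins = [0] * len(x)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(x)
    return bins

def eta_squared(groups, y):
    """Share of the variance in y explained by group membership."""
    grand = statistics.fmean(y)
    total = sum((v - grand) ** 2 for v in y)
    by_group = {}
    for g, v in zip(groups, y):
        by_group.setdefault(g, []).append(v)
    between = sum(
        len(vs) * (statistics.fmean(vs) - grand) ** 2 for vs in by_group.values()
    )
    return between / total

rng = random.Random(2)   # arbitrary fixed seed (assumption)
n = 40000                # hypothetical sample size
rho = 0.5                # hypothetical true correlation

x = [rng.gauss(0, 1) for _ in range(n)]
y = [rho * xi + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1) for xi in x]

# Squared correlation with the continuous predictor, for reference.
mx, my = statistics.fmean(x), statistics.fmean(y)
cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
vx = sum((a - mx) ** 2 for a in x)
vy = sum((b - my) ** 2 for b in y)
r2_continuous = cov ** 2 / (vx * vy)

shares = {k: eta_squared(equal_bins(x, k), y) for k in (2, 3, 4)}

for k, share in shares.items():
    print(f"{k} groups: variance explained = {share:.3f}")
print(f"continuous: variance explained = {r2_continuous:.3f}")
```

In this setup more groups recover more of the relationship, but every cut still explains less than the continuous predictor does.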

If you do cut into more than two segments, you're not necessarily best off making them all the same size or giving them all the same weight. The optimal sizes and weights will depend on what you are doing: a straight-up ANOVA would be different from a regression-like model where you're trying to find how much the response changes on average with a given amount of change in the predictor.

Glen_b