Justification for low/high or tertiary splits in ANOVA

Question

I have two groups of participants and they all took the same measurements.

When I perform a median split on one of the measurements to test my hypothesis about an interaction, the dependent variable fails Levene's test for equality of variances. When I perform a tertiary split, it passes Levene's test, both when I compare only the high and low thirds and when I compare all three thirds.

There are main effects and interaction effects in all of the ANOVAs that I have run, but only the 2x2 ANOVA fails Levene's test; neither of the 3x2 ANOVAs fails.

How is this possible?

(1) I really think you should avoid categorizing continuous variables (either dichotomizing, or splitting into thirds). (2) My first guess is that you end up w/ less power w/ the tertiary split, but I'm not sure. — gung - Reinstate Monica, Oct 26 '12 at 02:47
I have heard and read arguments for both sides when it comes to categorizing continuous variables, but everyone does it, especially in the social and cognitive sciences. The effects are all significant at p < .01 or better, no matter the split. — jpf66, Oct 26 '12 at 02:51
this is a helpful site, especially the main wiki http://www.psychwiki.com/wiki/Why_cut_into_thirds_or_fourths_instead_of_dichotomizing%3F — jpf66, Oct 26 '12 at 02:57
I discuss this issue in this answer: how-to-choose-between-anova-and-ancova-in-a-designed-experiment. — gung - Reinstate Monica, Oct 26 '12 at 03:23
That psychwiki site gives awful advice. Leave variables continuous. All the problems the site alludes to regarding dichotomizing also happen when you split into 3 or 4 levels. — Peter Flom, Oct 26 '12 at 10:43

Glen_b · Accepted Answer · 2024-01-16T23:20:26.603

(This isn't a direct answer to the question, more a bunch of references relating to why the approach should be avoided.)

Some of the issues include downward bias in estimation of effects, inflation of error variance and (consequently) low power. There's also the dependence issue that impacts the calculation of p-values (i.e. p-values calculated in the 'usual' way are not correct).
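The downward bias is easy to see in a quick simulation. The following is only a minimal sketch under assumptions that are not from the question: a normally distributed predictor, a linear relationship, and an arbitrary true correlation of 0.5 (the names `true_r`, `x_split`, etc. are purely illustrative). In that idealized setup, the correlation after a median split should come out at roughly sqrt(2/pi), about 0.8, times the correlation with the continuous predictor.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n = 100_000                     # assumed sample size, large to reduce noise
true_r = 0.5                    # assumed population correlation

# Predictor x and response y, jointly normal with correlation true_r.
x = rng.standard_normal(n)
y = true_r * x + np.sqrt(1 - true_r**2) * rng.standard_normal(n)

# Median split: 1 for the "high" half, 0 for the "low" half.
x_split = (x > np.median(x)).astype(float)

r_continuous = np.corrcoef(x, y)[0, 1]
r_split = np.corrcoef(x_split, y)[0, 1]

print(f"r with continuous x:   {r_continuous:.3f}")
print(f"r with median-split x: {r_split:.3f}")
print(f"ratio:                 {r_split / r_continuous:.3f}")
```

The weaker correlation means the split variable explains less of the response, so more of the variation ends up in the error term, which is where the loss of power comes from.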

There's a wealth of material on why median (etc) splits of variables are a bad idea.

http://www.uvm.edu/~dhowell/gradstat/psych341/lectures/Factorial2Folder/Median-split.html

http://psych.colorado.edu/~mcclella/MedianSplit/

http://core.ecu.edu/psyc/wuenschk/stathelp/Dichot-Not.doc

MacCallum, R. C., Zhang, S., Preacher, K. J., & Rucker, D. D. (2002). On the practice of dichotomization of quantitative variables. Psychological Methods, 7, 19–40.

Aiken, L. S., & West, S. G. (1991). Multiple regression: Testing and interpreting interactions. Newbury Park, CA: Sage.

http://www.theanalysisfactor.com/continuous-and-categorical-variables-the-trouble-with-median-splits/

Google turns up a bunch more references and links.

Cutting in 3 or 4 doesn't avoid the problems but it's not quite as bad.
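As a rough illustration of "not quite as bad", here is a small simulation sketch comparing how much of the response variance is explained when the predictor is cut into 2, 3 or 4 equal-sized groups versus left continuous. The setup is an assumption for illustration only (normal predictor, linear effect, true correlation 0.5), and `eta_squared` is a hypothetical helper written for this example, not a library function.

```python
import numpy as np

def eta_squared(x, y, k):
    """Share of the variance in y explained by k equal-sized groups cut from x."""
    # Interior quantile cut points, e.g. k=3 gives the 1/3 and 2/3 quantiles.
    cuts = np.quantile(x, np.linspace(0, 1, k + 1)[1:-1])
    groups = np.searchsorted(cuts, x)  # group label 0..k-1 for each case
    grand_mean = y.mean()
    ss_between = sum(
        (groups == g).sum() * (y[groups == g].mean() - grand_mean) ** 2
        for g in range(k)
    )
    ss_total = ((y - grand_mean) ** 2).sum()
    return ss_between / ss_total

rng = np.random.default_rng(1)  # fixed seed so the sketch is reproducible
n = 100_000                     # assumed sample size
x = rng.standard_normal(n)
y = 0.5 * x + np.sqrt(0.75) * rng.standard_normal(n)  # assumed true r = 0.5

r2_continuous = np.corrcoef(x, y)[0, 1] ** 2
results = {k: eta_squared(x, y, k) for k in (2, 3, 4)}

for k, value in results.items():
    print(f"{k} groups:   eta^2 = {value:.3f}")
print(f"continuous: R^2   = {r2_continuous:.3f}")
```

Under these assumptions the explained variance climbs as the number of groups goes up but stays below what the continuous predictor gives, which is the pattern described above.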

If you do cut into more than two segments, you're not necessarily best off making them all the same size or giving them all the same weight. The optimal sizes and weights depend on what you are doing: a straight-up ANOVA is different from a regression-like model where you're trying to find how much the response changes on average for a given change in the predictor.
