I am checking the assumptions of my ANOVA model and want to see whether the homogeneity-of-variances condition is met. I made a residuals-vs-fitted plot, but I am not sure what to look for or what conclusions to draw.

For a one-way ANOVA model like this, you have one estimated mean for each of the 3 groups. The residuals are the differences between each individual observation and its group mean.
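To make "one mean per group, residual = observation minus group mean" concrete, here is a minimal sketch in plain Python. The data are made up for illustration (three hypothetical groups `A`, `B`, `C`), and the helper name `anova_residuals` is hypothetical, not from any library.

```python
from statistics import mean

def anova_residuals(groups):
    """Return (group_means, residuals) for a one-way ANOVA.

    `groups` maps a group label to its list of observations. The fitted
    value for every observation is its group's mean, so each residual is
    the observation minus that mean.
    """
    group_means = {label: mean(values) for label, values in groups.items()}
    residuals = {
        label: [y - group_means[label] for y in values]
        for label, values in groups.items()
    }
    return group_means, residuals

# Made-up Time values for three hypothetical groups.
data = {
    "A": [20.0, 24.0, 28.0],
    "B": [30.0, 40.0, 50.0],
    "C": [25.0, 45.0, 65.0],
}
means, resid = anova_residuals(data)
print(means)  # {'A': 24.0, 'B': 40.0, 'C': 45.0}
print(resid)
```

Because every fitted value in a group is the same number, a residuals-vs-fitted plot for this kind of model shows one vertical strip of points per group.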
Roughly, the spread of the residuals should be about the same in all groups and should not depend on the fitted values. In your plot, the spread seems to increase somewhat with the fitted values. With an ANOVA model like this, a q-q plot of the residuals can be more helpful, as it gets around the problem of overlapping points in the residuals-vs-fitted plot that you show.
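One way to put numbers on "about the same spread in all groups" is to compare the per-group standard deviations, optionally with the Brown-Forsythe statistic (a one-way ANOVA on absolute deviations from each group's median). This test is not mentioned in the answer above; it is shown here only as one common option, in a plain-Python sketch with made-up data and hypothetical helper names. It returns the F statistic only; a p-value would need an F(k - 1, N - k) distribution from a stats library.

```python
from statistics import mean, median, stdev

def group_spreads(groups):
    """Sample standard deviation of each group's observations.

    Subtracting the group mean does not change the spread, so this is also
    the standard deviation of that group's residuals.
    """
    return {label: stdev(values) for label, values in groups.items()}

def brown_forsythe_f(groups):
    """Brown-Forsythe statistic: a one-way ANOVA F on |y - group median|.

    Larger values point towards unequal spreads.
    """
    devs = [[abs(y - median(v)) for y in v] for v in groups.values()]
    k = len(devs)
    n_total = sum(len(d) for d in devs)
    grand = mean([z for d in devs for z in d])
    dev_means = [mean(d) for d in devs]
    between = sum(len(d) * (m - grand) ** 2 for d, m in zip(devs, dev_means))
    within = sum((z - m) ** 2 for d, m in zip(devs, dev_means) for z in d)
    return (between / (k - 1)) / (within / (n_total - k))

# Made-up data whose spread grows with the group mean.
data = {
    "A": [20.0, 24.0, 28.0],
    "B": [30.0, 40.0, 50.0],
    "C": [25.0, 45.0, 65.0],
}
print(group_spreads(data))  # {'A': 4.0, 'B': 10.0, 'C': 20.0}
print(brown_forsythe_f(data))
```

With only a few observations per group such a test has little power, so the plots remain useful alongside it.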
Both the middle and high fitted groups seem to show clumping among the residuals: some very close to the fitted value, some substantially higher, and some substantially lower (residuals of $\pm 20$ around means of about 40 to 50). With ideally normal residuals, the distribution would be less clumped, with residuals becoming gradually less frequent as you move away from 0 in either direction.
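The coordinates behind a normal q-q plot can be sketched with the standard library alone. This assumes the common plotting positions $(i - 0.5)/n$ and scales residuals by their plain sample standard deviation (a rough choice; statistics packages may standardise differently). The residual values are made up to mimic the clumped pattern described above.

```python
from statistics import NormalDist, stdev

def normal_qq_points(residuals):
    """Pair each sorted, scaled residual with a theoretical normal quantile.

    Uses plotting positions (i - 0.5) / n. If the residuals are roughly
    normal, the points fall near the line y = x.
    """
    n = len(residuals)
    sd = stdev(residuals)
    observed = sorted(r / sd for r in residuals)
    std_normal = NormalDist()  # mean 0, sd 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))

# Made-up clumped residuals: several near 0, the rest far out.
resid = [-4.0, 0.0, 4.0, -10.0, 0.0, 10.0, -20.0, 0.0, 20.0]
for theo, obs in normal_qq_points(resid):
    print(f"{theo:6.2f} {obs:6.2f}")
```

With clumped residuals like these, the points tend to form a flat run around 0 and pull away from the y = x line towards the tails, rather than tracking it smoothly.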
A non-normal distribution of residuals doesn't necessarily invalidate ANOVA. This page goes into detail about how you can handle non-normality of residuals and heteroscedasticity in ANOVA.
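As one example of handling unequal variances, Welch's one-way ANOVA weights each group by $n_i / s_i^2$ instead of assuming a common variance. This is not necessarily what the linked page recommends; it is a plain-Python sketch of the standard Welch formulas, with a hypothetical helper name and made-up data. It needs at least two observations and nonzero variance in every group, and it returns the statistic and degrees of freedom only (the p-value comes from an F(df1, df2) distribution).

```python
from statistics import mean, variance

def welch_anova(groups):
    """Welch's one-way ANOVA: returns (F, df1, df2)."""
    stats = [(len(v), mean(v), variance(v)) for v in groups.values()]
    k = len(stats)
    w = [n / s2 for n, _, s2 in stats]            # precision weights n_i / s_i^2
    w_sum = sum(w)
    grand = sum(wi * m for wi, (_, m, _) in zip(w, stats)) / w_sum
    a = sum(wi * (m - grand) ** 2 for wi, (_, m, _) in zip(w, stats)) / (k - 1)
    lam = sum((1 - wi / w_sum) ** 2 / (n - 1) for wi, (n, _, _) in zip(w, stats))
    b = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam
    return a / b, k - 1, (k ** 2 - 1) / (3 * lam)

# Made-up data with unequal spreads.
data = {
    "A": [20.0, 24.0, 28.0],
    "B": [30.0, 40.0, 50.0],
    "C": [25.0, 45.0, 65.0],
}
print(welch_anova(data))
```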
That distribution of residuals, however, makes me wonder if there's some other variable to consider. For example, might the age or comorbidity status of a patient be contributing to this, with older/sicker patients having high residuals and younger/healthier ones lower? Alternatively, could this reflect day-of-the-week effects? If performance tests are only done on weekdays, there may be gaps in the observed Time values because nothing is observed on weekends.
I'd worry about those potential systematic issues first. Then consult the pages linked above.
"That distribution of residuals, however, makes me wonder if there's some other variable to consider" - do you mean that it is better to consider a model such as `Time ~ Program + Day_of_the_week` and do a two-way ANOVA? – MaximeTars May 22 '22 at 06:17
@MaximeTars that model wouldn't do what you need. For the weekend/weekday problem you have what are called "interval-censored" time values: you know lower and upper limits for the time, but not the actual time. That can be handled by some types of survival analysis. In general, a 2-way ANOVA includes an interaction between the 2 categorical predictors, not just their additive terms as in your formula. You would write `Time ~ factorA * factorB`, not `Time ~ factorA + factorB`. – EdM May 22 '22 at 14:05
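The difference between the two formulas can be written out as models. The symbols are my own notation, not from the thread: $\text{Time}_{ijk}$ is observation $k$ at level $i$ of factorA and level $j$ of factorB, $\mu$ is the grand mean, $\alpha_i$ and $\beta_j$ are main effects, and $\varepsilon_{ijk}$ is the error.

$$
\text{Additive } (\texttt{Time ~ factorA + factorB}):\quad \text{Time}_{ijk} = \mu + \alpha_i + \beta_j + \varepsilon_{ijk}
$$

$$
\text{With interaction } (\texttt{Time ~ factorA * factorB}):\quad \text{Time}_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}
$$

In R formula syntax, `factorA * factorB` is shorthand for `factorA + factorB + factorA:factorB`; the extra $(\alpha\beta)_{ij}$ term lets the effect of one factor differ across the levels of the other.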
`Time ~ Programme` model, see this page for how to merge your accounts. – EdM May 21 '22 at 19:32