
I'd like to perform an ANOVA with a normally distributed response variable and several explanatory variables. Some of the explanatory variables are continuous and some are categorical (factor(..)).

aov(a ~ as.numeric(b) + factor(c) + as.numeric(d) + factor(e))

The residuals of this model are perfectly normally distributed but the assumption of homoscedasticity is not respected. What can I do?

  • Welch correction?
    • Does it work for multi-way ANOVA? How can it be performed in R?
  • Ordered logit model?
    • I tried the function polr (in R), but I get an error message saying that the response should be a factor.
  • Friedman test?
    • I tried friedman.test, but I got an error message saying that the formula is incorrect (although it is exactly the same as for aov(..)).
  • Kruskal-Wallis test?
    • I think it works only for one-way ANOVA.
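For reference, here is a minimal sketch of the formula shapes that the base R versions of these tests accept. The data frame and its column names (y, group, block) are simulated and purely hypothetical; they are not the data from this question. As far as I know, oneway.test and kruskal.test take a single grouping factor, and friedman.test expects response ~ groups | blocks with one observation per cell, which may be why an aov-style formula is rejected.

```r
set.seed(42)

# Simulated, hypothetical data: one response, one grouping factor,
# one blocking factor, with each group observed once per block
dat <- data.frame(
  y     = rnorm(30),
  group = factor(rep(c("g1", "g2", "g3"), times = 10)),
  block = factor(rep(1:10, each = 3))
)

# Welch's one-way test: a single grouping factor only
welch <- oneway.test(y ~ group, data = dat, var.equal = FALSE)

# Kruskal-Wallis rank-sum test: also a single grouping factor
kw <- kruskal.test(y ~ group, data = dat)

# Friedman test: the formula must have the form response ~ groups | blocks
fr <- friedman.test(y ~ group | block, data = dat)

print(welch)
print(kw)
print(fr)
```

None of these calls accepts several factors plus continuous covariates, so they do not directly replace the aov model above.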

Update

m <- aov(myFormula, data = myData)
# Residuals against fitted values
plot(x = fitted(m), y = residuals(m))
abline(lm(residuals(m) ~ fitted(m)))

[Figure: residuals of m plotted against its fitted values, with the regression line added by abline]

Remi.b
  • Note that with continuous predictors, aov fits a general linear model - usually called regression rather than ANOVA, even though you may want to look at an ANOVA table. – Scortchi - Reinstate Monica Nov 18 '13 at 13:56
  • Is there an apparent relationship between the variance of the residuals and the fitted values? – Scortchi - Reinstate Monica Nov 18 '13 at 14:17
  • @Scortchi Should I check this on this kind of plot: plot(residuals(m), predict(m)) ? (where m = aov(myFormula, myData)) or doing this: summary(lm(predict(m)~residuals(m)))? Or something else? – Remi.b Nov 18 '13 at 14:22
  • You should plot residuals against fits. It may suggest a useful transformation or a mean-variance relationship to use in an estimating-equation approach with robust standard errors. See robust regression & sandwich estimators for standard errors. – Scortchi - Reinstate Monica Nov 18 '13 at 14:27
  • @Scortchi See my update post. – Remi.b Nov 18 '13 at 14:35
  • It's not gross heteroskedasticity. Investigate outliers & try sandwich estimators for the standard errors to see the difference. – Scortchi - Reinstate Monica Nov 18 '13 at 14:42
  • I don't fully understand how I can investigate further the violation of the assumption. I did sandwich(m) and vcovHC(m, type = "HC") and I get a 30*30 matrix. What does it mean? Thanks a lot for your help @Scortchi – Remi.b Nov 18 '13 at 16:32
  • Well, read up on sandwich estimators, don't just type sandwich in R! It's giving you a robust estimate of the variance-covariance matrix for your model's coefficients, from which you can calculate their standard errors; all without assuming homoskedasticity of the error terms. – Scortchi - Reinstate Monica Nov 18 '13 at 16:51
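To illustrate the last comment, here is a rough sketch of how the matrix returned by vcovHC can be turned into robust standard errors. It assumes the sandwich and lmtest packages are installed. The data are simulated and the variable names only mimic the question's formula, so none of the numbers relate to the real model. The matrix has one row and one column per model coefficient, which would explain a 30*30 result for a model with 30 coefficients.

```r
# Assumes the 'sandwich' and 'lmtest' packages are installed
library(sandwich)
library(lmtest)

set.seed(1)

# Simulated, hypothetical data whose error spread grows with b
n <- 200
dat <- data.frame(
  b = runif(n, 1, 10),
  c = factor(sample(letters[1:3], n, replace = TRUE))
)
dat$a <- 2 + 0.5 * dat$b + as.numeric(dat$c) + rnorm(n, sd = 0.3 * dat$b)

m <- lm(a ~ b + c, data = dat)

# Heteroskedasticity-consistent covariance matrix of the coefficients
# (one row and one column per coefficient)
V <- vcovHC(m, type = "HC3")

# Standard errors are the square roots of the diagonal
robust_se    <- sqrt(diag(V))
classical_se <- sqrt(diag(vcov(m)))
print(cbind(classical_se, robust_se))

# Coefficient tests that use the robust covariance matrix
print(coeftest(m, vcov. = V))
```

Comparing the two columns shows how much the standard errors change once homoskedasticity is no longer assumed; the choice of type = "HC3" here is just one option among those vcovHC offers.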

0 Answers