I have the following model where X is the duration of a particular event, A is a factor with five levels and B is a factor with two levels. I want to run a type III ANOVA analysis.

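library(car)  # provides Anova()
# Sum-to-zero contrasts, needed for meaningful type III tests
type3.model1 <- list(A = contr.sum, B = contr.sum)
model1 <- lm(X ~ A * B, data = X.duration, contrasts = type3.model1)
summary(model1)
Anova(model1, type = 3)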

Examination of the diagnostic plots (see below) shows that the assumption of normality of residuals (plot 2) is violated:

[Figure: diagnostic plots for model1]

Neither a log nor a square-root transformation of the dependent variable improved matters (in both cases the p-value from a Shapiro-Wilk test decreased after transformation), so I wanted to use a Box-Cox transformation to identify lambda and thus the best way to transform the data. However, I have only been able to find code that does this for models with a single independent variable (as below):

library(MASS)
x1<-boxcox(model1)

[Figure: Box-Cox plot produced by boxcox(model1)]

To calculate lambda:

lambda.1 <- x1$x[which.max(x1$y)]  # x1$x: candidate lambda values; x1$y: their profile log-likelihoods

Is there a way I can calculate lambda for a model with more than one independent variable?
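For reference, boxcox() in MASS takes the fitted lm object as a whole, so a model with two factors and their interaction should be handled the same way as a one-predictor model. A minimal sketch on simulated data follows; the data frame dur, the gamma-distributed response and the lambda grid are assumptions for illustration, not the data from this question:

```r
library(MASS)  # provides boxcox()

set.seed(1)
# Simulated stand-in for X.duration (hypothetical data)
dur <- expand.grid(A = factor(1:5), B = factor(1:2), rep = 1:10)
dur$X <- rgamma(nrow(dur), shape = 2, rate = 0.5)  # positive, right-skewed durations

sum.contrasts <- list(A = contr.sum, B = contr.sum)
# y = TRUE stores the response in the fit, which boxcox() needs
fit <- lm(X ~ A * B, data = dur, contrasts = sum.contrasts, y = TRUE)

# Profile log-likelihood over a grid of lambda values for the whole model
bc <- boxcox(fit, lambda = seq(-2, 2, by = 0.01), plotit = FALSE)

# bc$x is the lambda grid, bc$y the profile log-likelihood at each value
lambda.hat <- bc$x[which.max(bc$y)]
lambda.hat
```

The response has to be strictly positive for this to run, and the lambda it returns is only the maximizer on the chosen grid.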

  • (1) You don't need to transform the data. (2) If you do, you won't be doing exactly the same analysis: you will be comparing means of transformed data. (3) Box-Cox transformations will not cure this form of non-Normal behavior, because it's close to symmetric already. (4) Transforming dependent variables is not the right approach in the first place: it's for making relationships more linear. See https://stats.stackexchange.com/a/24236/919 for a brief explanation. – whuber Mar 20 '24 at 20:22
  • @whuber So in this case would it be preferable just to continue without transformation? I should also add that the model has two categorical independent variables not response variables (I apologize for the error in the title). The response variable is continuous. – Insect_biologist Mar 21 '24 at 12:28
  • Note also that your residuals are bounded presumably because not only is the outcome variable bounded, there are also several ties. This complication alone would frustrate search for normal residuals. As durations are (again presumably) positive by definition, this raises a question of looking beyond plain linear models to some kind of generalized linear model. – Nick Cox Mar 21 '24 at 13:27
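
One possible reading of the generalized linear model suggestion is a Gamma GLM with a log link, which models the mean of a positive response without transforming the response itself. The sketch below uses simulated data; the data frame dur, the Gamma family and the log link are all assumptions chosen for illustration, and the comment above does not name a specific family:

```r
set.seed(2)
# Simulated stand-in for X.duration (hypothetical data)
dur <- expand.grid(A = factor(1:5), B = factor(1:2), rep = 1:10)
dur$X <- rgamma(nrow(dur), shape = 2, rate = 0.5)  # positive durations

# Gamma GLM with log link, same factors and sum-to-zero contrasts as the lm
fit.glm <- glm(X ~ A * B, family = Gamma(link = "log"), data = dur,
               contrasts = list(A = contr.sum, B = contr.sum))

# Sequential F tests for A, B and the A:B interaction
tests <- anova(fit.glm, test = "F")
tests
```

These are sequential tests from base R; car's Anova(fit.glm, type = 3) would be the closer analogue of the type III table in the question.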

0 Answers