I am computing the following model in R, using lme4::lmer:
m3 = lmer(e ~ (X*Y*Z) + (1|ID/R), data = data_transform)
e is a continuous variable. X, Y, and Z are categorical factors. ID is the subject ID and R is the measurement repetition. The error was measured multiple times for each subject under the conditions, given by X, Y, and Z (repeated measures). The design is slightly unbalanced.
My goal is to compute an ANOVA, using the lmer model:
m3_anova = Anova(m3, type = 3, test.statistic = "F")
which does work and produces plausible results in the context of my experiment.
Now, I am checking the assumptions of the mixed-effects model, which are, as far as I understand, homoscedasticity and normality of the residuals as follows:
res = residuals(m3)
interaction_factor = with(data_transform, interaction(X,Y,Z))
leveneTest(res ~ interaction_factor, center = median)
which is significant with p << 0.001. Furthermore:
shapiro.test(res[1:5000])
is significant with p << 0.0001. I am already transforming my target variable error, using rcompanion::transformTukey.
The residual variances vary from ~0.3 to ~1.
My questions are:
- Am I checking the assumptions correctly?
- Can I still run the ANOVA and if not what can I do to get normally distributed residuals and homoscedasticity?
- Edit: From the plots below, are my assumptions valid or not? I feel like I am lacking the experience to rate them.
Edit:
I am using the packages:
library(lme4)
library(car)
library(lmerTest)
I also checked the assumptions of normality using a qqplot:
ggqqplot(res)
I know that for normal distribution the points are suppposed to lie on the main diagonal. From this plot, it is obvious that there is some deviation in the area around "x=2".
Furthermore, I plotted the residuals variaces across the factor level combinations:
data_transform$residuals = res
var3 = data_transform %>% group_by(X,Y,Z) %>% summarise(Variance = var(residuals, na.rm = TRUE))
boxplot(Variance ~ X + Y + Z, data = var3)
which looks like:
Additionally, residuals vs. predicted values look like:
plot(m3)
gives:
As suggested, I tried the DHARMa package:
plot(simulateResiduals(m3))
leading to:



