In my current research I'm using linear mixed-effects models to explain reaction-time data from a repeated-measures design.
When testing different models for best fit I tried various random-effects structures, keeping the fixed effects constant, and noticed that the denominator degrees of freedom change, but I cannot explain why.
My main question is: how should I report results from a Type III ANOVA given that the dfs differ a lot depending on which model I choose? Is there a general rule, or is this discipline-specific? Should I explain it somehow (with reviewer #2 in mind)?
The other question is: why does this happen, especially with unbalanced factors? Please note that I will most likely understand only plain English.
Here is an example of the data I'm working with (scroll down for a reproducible example):
- Factors: A, B and C are balanced within-subject factors.
- response is a dichotomous (yes/no) answer to an experimental trial, which I treat as a factor.
- logRT is my outcome; it has some missing values due to extremely short latencies.
- Each combination of A*B*C is repeated 16 times within participants, making a total of 128 trials per subject.
I'm working with mixed() from the afex package and using Type III sums of squares for the F tests, as this is what 90% of people in my discipline report.
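For reference, Type III tests are only meaningful with sum-to-zero contrasts. As far as I understand, mixed() applies these to its factors automatically, but they can also be set globally with afex's helper (a minimal sketch, just to show the setting):

library(afex)
# Type III sums of squares require effects (sum-to-zero) coding;
# set_sum_contrasts() applies it globally, i.e. it calls
# options(contrasts = c("contr.sum", "contr.poly"))
set_sum_contrasts()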
First I fitted a random-intercept-only model with the 4 factors in the fixed part (see m0 below). This yielded constant degrees of freedom for every effect in the ANOVA table. I may not understand why, but I'm fine with that.
Then I fitted the maximal model justified by the design (see m1 below), which gave dfs closer to what I'd expect from a repeated-measures ANOVA. I'm also fine with those, since they look consistent.
As model m1 took ages to compute and threw convergence errors, among others, I decided to remove the correlations among the random slopes using the || syntax (model m2 below).
This in turn gave me something strange: every effect involving response (is this due to empty cells?) has a very high den Df value, while effects involving only A, B, or C don't, with the exception of the main effect of A (which makes this even more confusing for me).
I will (most likely) pick m2 as my main model, since it's lighter to compute and gives results very similar to those I get with m1 (on my real data).
Any hints on how to approach this, or links to similar questions or answers, are very welcome. Since I don't feel very comfortable with mixed-effects models yet, I'd like to minimize the risk of submitting a paper for review with some really stupid mistakes.
Reproducible example: a simulated dataset and results from three models:
###############################################################
#Simulate data replicating the structure of a real life example
###############################################################
library(AlgDesign) # for generating all levels of a factorial design
set.seed(1)
df <- gen.factorial(c(16, 2, 2, 2, 100), factors = "all",
                    varNames = c("rep", "A", "B", "C", "Subject"))
df$rep <- as.numeric(df$rep)
df$Subject <- as.numeric(df$Subject)
logRT <- rnorm(n = 12800, mean = 7, sd = 2)
trialno <- rep(1:128, times = 100)
response <- factor(sample(0:1, 12800, prob = c(0.3, 0.7), replace = TRUE),
                   labels = c("no", "yes"))
sex <- factor(sample(0:1, 12800, prob = c(0.3, 0.7), replace = TRUE),
              labels = c("M", "F"))
# Simulate some values to drop (set as missing) due to extremely low latency
missingRTs <- as.logical(sample(0:1, 12800, prob = c(0.96, 0.04), replace = TRUE))
logRT[missingRTs] <- NA
df <- cbind(df, logRT, trialno, response, sex)
df <- df[complete.cases(df),]
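##################################################################
# Optional balance check (relevant to the second question above):
# after dropping the NA rows the cells are no longer exactly 16
# per subject, so the design is only approximately balanced.
# table() and xtabs() are base/stats functions; the exact counts
# will vary with the seed.
##################################################################
with(df, table(A, B, C))                        # overall counts per cell
range(xtabs(~ A + B + C + Subject, data = df))  # min/max per-subject counts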
##########################
#Setup 3 models with afex
##########################
library(afex)
m0 <- mixed(logRT ~ A*B*C*response + (1|Subject),
data = df, progress = TRUE, method = "S")
m1 <- mixed(logRT ~ A*B*C*response + (A*B*C*response|Subject),
data = df, progress = TRUE, method = "S")
m2 <- mixed(logRT ~ A*B*C*response + (A*B*C*response||Subject),
data = df, progress = TRUE, method = "S", expand_re = TRUE)
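#####################################################################
# Optional sanity check: compare the underlying lme4 fits directly.
# mixed() is assumed to store the fitted merMod in $full_model (as
# current afex does); anova() on merMod objects refits with ML and
# reports AIC/BIC plus likelihood-ratio tests, which could help
# justify preferring the lighter m2 over the maximal m1.
#####################################################################
anova(m0$full_model, m2$full_model, m1$full_model)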
###################################
#Print anovas and "compare" results
###################################
> anova(m0)
Mixed Model Anova Table (Type 3 tests, S-method)
Model: logRT ~ A * B * C * response + (1 | Subject)
Data: df
num Df den Df F Pr(>F)
A 1 12225 1.0287 0.31049
B 1 12225 0.0039 0.95042
C 1 12225 0.1011 0.75058
response 1 12225 0.5895 0.44264
A:B 1 12225 0.1253 0.72341
A:C 1 12225 1.4926 0.22184
B:C 1 12225 0.8637 0.35273
A:response 1 12225 1.1236 0.28916
B:response 1 12225 0.1597 0.68941
C:response 1 12225 0.5611 0.45382
A:B:C 1 12225 0.7137 0.39822
A:B:response 1 12225 1.2668 0.26040
A:C:response 1 12225 0.1672 0.68260
B:C:response 1 12225 0.0605 0.80576
A:B:C:response 1 12225 4.8535 0.02761 *
> anova(m1)
Mixed Model Anova Table (Type 3 tests, S-method)
Model: logRT ~ A * B * C * response + (A * B * C * response | Subject)
Data: df
num Df den Df F Pr(>F)
A 1 158.213 1.0767 0.30102
B 1 155.312 0.0177 0.89427
C 1 167.177 0.0916 0.76255
response 1 306.484 0.4871 0.48575
A:B 1 161.697 0.1026 0.74912
A:C 1 182.233 1.4885 0.22403
B:C 1 162.636 0.7466 0.38882
A:response 1 97.846 1.0823 0.30074
B:response 1 156.889 0.4600 0.49862
C:response 1 318.239 0.5207 0.47108
A:B:C 1 151.198 0.6763 0.41217
A:B:response 1 137.734 1.2008 0.27508
A:C:response 1 133.204 0.2930 0.58920
B:C:response 1 192.306 0.0290 0.86486
A:B:C:response 1 203.593 5.0284 0.02601 *
> anova(m2)
Mixed Model Anova Table (Type 3 tests, S-method)
Model: logRT ~ A * B * C * response + (A * B * C * response || Subject)
Data: df
num Df den Df F Pr(>F)
A 1 11826.44 1.1241 0.28905
B 1 136.94 0.0010 0.97443
C 1 139.59 0.0868 0.76867
response 1 12024.37 0.5834 0.44498
A:B 1 134.01 0.1322 0.71676
A:C 1 131.59 1.4965 0.22339
B:C 1 139.83 0.8194 0.36691
A:response 1 133.61 1.0513 0.30707
B:response 1 12013.99 0.1860 0.66625
C:response 1 12018.27 0.5957 0.44023
A:B:C 1 138.87 0.6922 0.40684
A:B:response 1 134.98 1.1751 0.28029
A:C:response 1 131.62 0.1604 0.68946
B:C:response 1 12013.31 0.0646 0.79936
A:B:C:response 1 12002.06 4.9410 0.02625 *
Comments:

- Compare the den Df column across models m0, m1 and especially m2, where the effect of response, the effect of A, and all interactions with response have huge dfs while the other effects don't. In model m0 all den Df are constant (and closely tied to the number of rows); in model m1 all den Df are estimated in some way (and tend to be much lower); in m2 some are similar to those from the correlated random structure (m1) and some (response and A) are similar to those from the intercept-only model (m0). – blazej Jun 12 '18 at 05:30
- What happens with method = "KR"? Also, just out of curiosity, nothing seems to be significant apart from perhaps the 4-way (!) interaction. Was this your research hypothesis? – amoeba Jun 13 '18 at 09:46
- I tried "KR" but gave up after an hour or so. This example has "only" 12800 rows; my real dataset is twice as big (244 participants with 128 trials per subject). – blazej Jun 13 '18 at 09:49
- […] lmerTest). – amoeba Jun 13 '18 at 10:00
- If you look at summary(m2) you will see that some random effects are estimated to have 0 variance. In a way, your model is still degenerate, even after you excluded all correlations. The large DFs that lmerTest is reporting correspond exactly to the factors that get 0 random variance. I think this is expected behaviour of the Satterthwaite approximation. – amoeba Jun 13 '18 at 10:23
- m2 did not throw any warnings or errors. I will run an analogous analysis (with method = "KR") on my real dataset overnight and see if that helps. – blazej Jun 13 '18 at 10:26
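Following up on amoeba's comment, a minimal check (assuming, as above, that the fitted lme4 model sits in the $full_model slot) to see which random slopes collapse to zero variance; these should line up with the effects that get the huge den Df values in m2:

library(lme4)
# Random-effect variances of the uncorrelated model; slopes whose
# variance is estimated as (near) 0 are the ones whose fixed effects
# get the near-residual Satterthwaite denominator dfs:
print(VarCorr(m2$full_model), comp = "Variance")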