In my current research I'm using linear mixed-effects models to explain reaction-time data from a repeated-measures design.
When testing different models for best fit I tried various random-effects structures, keeping the fixed effects constant, and noticed that the denominator degrees of freedom change, but I cannot explain why.
My main question is: how should I report results from a Type III ANOVA given that the dfs differ a lot depending on which model I choose? Is there a general rule, or is this discipline-specific? Should I explain it somehow (with reviewer #2 in mind)?
The other question is: why does this happen, especially with unbalanced factors? Please note that I will most likely understand only plain English.
Here is an example of the data I'm working with (scroll down for a reproducible example):
- Factors: A, B and C are balanced within-subject factors.
- response is a dichotomous (yes/no) answer to an experimental trial, which I treat as a factor.
- logRT is my outcome; it has some missing values due to extremely short latencies.
- Each combination of A*B*C is repeated 16 times within participants, making a total of 128 trials per subject.
I'm working with mixed() from the afex package and using Type III sums of squares for the F tests, as this is what 90% of people in my discipline report.
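For reference, Type III tests are only meaningful with sum-to-zero contrasts. As far as I understand, mixed() applies these to its factors automatically, but they can also be set globally with afex's helper (a minimal sketch, just to show the setting):

library(afex)
# Type III sums of squares require effects (sum-to-zero) coding;
# set_sum_contrasts() applies it globally, i.e. it calls
# options(contrasts = c("contr.sum", "contr.poly"))
set_sum_contrasts()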
First I fitted a random-intercept-only model with the 4 factors in the fixed part (see m0 below). This yielded constant degrees of freedom for every effect in the ANOVA table. I may not understand why, but I'm fine with that.
Then I fitted the maximal model justified by the design (see m1 below), which gave dfs closer to what I'd expect from a repeated-measures ANOVA. I'm also fine with those, since they look consistent.
As model m1 took ages to compute and threw convergence errors, among others, I decided to remove the correlations among the random slopes using the || syntax (model m2 below).
This in turn gave me something strange: every effect involving response (is this due to empty cells?) has a very high den Df value, while effects involving only A, B, or C don't, with the exception of the main effect of A (which makes this even more confusing for me).
I will (most likely) pick m2 as my main model, since it's lighter to compute and gives results very similar to those I get with m1 (on my real data).
Any hints on how to approach this, or links to similar questions or answers, are very welcome. Since I don't feel very comfortable with mixed-effects models yet, I'd like to minimize the risk of submitting a paper for review with some really stupid mistakes.
Reproducible example: a simulated dataset and results from three models:
###############################################################
#Simulate data replicating the structure of a real life example
###############################################################
library(AlgDesign) # for generating all levels of a factorial design
set.seed(1)
df <- gen.factorial(c(16, 2, 2, 2, 100), factors = "all",
                    varNames = c("rep", "A", "B", "C", "Subject"))
df$rep <- as.numeric(df$rep)
df$Subject <- as.numeric(df$Subject)
logRT <- rnorm(n = 12800, mean = 7, sd = 2)
trialno <- rep(1:128, times = 100)
response <- factor(sample(0:1, 12800, prob = c(0.3, 0.7), replace = TRUE),
                   labels = c("no", "yes"))
sex <- factor(sample(0:1, 12800, prob = c(0.3, 0.7), replace = TRUE),
              labels = c("M", "F"))
# Simulate some values to drop (set as missing) due to extremely low latency
missingRTs <- as.logical(sample(0:1, 12800, prob = c(0.96, 0.04), replace = TRUE))
logRT[missingRTs] <- NA
df <- cbind(df, logRT, trialno, response, sex)
df <- df[complete.cases(df),]
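##################################################################
# Optional balance check (relevant to the second question above):
# after dropping the NA rows the cells are no longer exactly 16
# per subject, so the design is only approximately balanced.
# table() and xtabs() are base/stats functions; the exact counts
# will vary with the seed.
##################################################################
with(df, table(A, B, C))                        # overall counts per cell
range(xtabs(~ A + B + C + Subject, data = df))  # min/max per-subject counts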
##########################
#Setup 3 models with afex
##########################
library(afex)
m0 <- mixed(logRT ~ A*B*C*response + (1|Subject),
data = df, progress = TRUE, method = "S")
m1 <- mixed(logRT ~ A*B*C*response + (A*B*C*response|Subject),
data = df, progress = TRUE, method = "S")
m2 <- mixed(logRT ~ A*B*C*response + (A*B*C*response||Subject),
data = df, progress = TRUE, method = "S", expand_re = TRUE)
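#####################################################################
# Optional sanity check: compare the underlying lme4 fits directly.
# mixed() is assumed to store the fitted merMod in $full_model (as
# current afex does); anova() on merMod objects refits with ML and
# reports AIC/BIC plus likelihood-ratio tests, which could help
# justify preferring the lighter m2 over the maximal m1.
#####################################################################
anova(m0$full_model, m2$full_model, m1$full_model)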
###################################
#Print anovas and "compare" results
###################################
> anova(m0)
Mixed Model Anova Table (Type 3 tests, S-method)
Model: logRT ~ A * B * C * response + (1 | Subject)
Data: df
num Df den Df F Pr(>F)
A 1 12225 1.0287 0.31049
B 1 12225 0.0039 0.95042
C 1 12225 0.1011 0.75058
response 1 12225 0.5895 0.44264
A:B 1 12225 0.1253 0.72341
A:C 1 12225 1.4926 0.22184
B:C 1 12225 0.8637 0.35273
A:response 1 12225 1.1236 0.28916
B:response 1 12225 0.1597 0.68941
C:response 1 12225 0.5611 0.45382
A:B:C 1 12225 0.7137 0.39822
A:B:response 1 12225 1.2668 0.26040
A:C:response 1 12225 0.1672 0.68260
B:C:response 1 12225 0.0605 0.80576
A:B:C:response 1 12225 4.8535 0.02761 *
> anova(m1)
Mixed Model Anova Table (Type 3 tests, S-method)
Model: logRT ~ A * B * C * response + (A * B * C * response | Subject)
Data: df
num Df den Df F Pr(>F)
A 1 158.213 1.0767 0.30102
B 1 155.312 0.0177 0.89427
C 1 167.177 0.0916 0.76255
response 1 306.484 0.4871 0.48575
A:B 1 161.697 0.1026 0.74912
A:C 1 182.233 1.4885 0.22403
B:C 1 162.636 0.7466 0.38882
A:response 1 97.846 1.0823 0.30074
B:response 1 156.889 0.4600 0.49862
C:response 1 318.239 0.5207 0.47108
A:B:C 1 151.198 0.6763 0.41217
A:B:response 1 137.734 1.2008 0.27508
A:C:response 1 133.204 0.2930 0.58920
B:C:response 1 192.306 0.0290 0.86486
A:B:C:response 1 203.593 5.0284 0.02601 *
> anova(m2)
Mixed Model Anova Table (Type 3 tests, S-method)
Model: logRT ~ A * B * C * response + (A * B * C * response || Subject)
Data: df
num Df den Df F Pr(>F)
A 1 11826.44 1.1241 0.28905
B 1 136.94 0.0010 0.97443
C 1 139.59 0.0868 0.76867
response 1 12024.37 0.5834 0.44498
A:B 1 134.01 0.1322 0.71676
A:C 1 131.59 1.4965 0.22339
B:C 1 139.83 0.8194 0.36691
A:response 1 133.61 1.0513 0.30707
B:response 1 12013.99 0.1860 0.66625
C:response 1 12018.27 0.5957 0.44023
A:B:C 1 138.87 0.6922 0.40684
A:B:response 1 134.98 1.1751 0.28029
A:C:response 1 131.62 0.1604 0.68946
B:C:response 1 12013.31 0.0646 0.79936
A:B:C:response 1 12002.06 4.9410 0.02625 *
Comments:

- Compare the den Df column across models m0, m1 and especially m2, where the effect of response, the effect of A, and all interactions with response have huge dfs while the other effects don't. In model m0 all den Df are constant (and closely tied to the number of rows); in model m1 all den Df are estimated in some way (and tend to be much lower); in m2 some are similar to those from the correlated random structure (m1) and some (response and A) are similar to those from the intercept-only model (m0). – blazej Jun 12 '18 at 05:30
- What happens with method = "KR"? Also, just out of curiosity, nothing seems to be significant apart from perhaps the 4-way (!) interaction. Was this your research hypothesis? – amoeba Jun 13 '18 at 09:46
- I tried "KR" but gave up after an hour or so. This example has "only" 12800 rows; my real dataset is twice as big (244 participants with 128 trials per subject). – blazej Jun 13 '18 at 09:49
- […] lmerTest). – amoeba Jun 13 '18 at 10:00
- If you look at summary(m2) you will see that some random effects are estimated to have 0 variance. In a way, your model is still degenerate, even after you excluded all correlations. The large DFs that lmerTest is reporting correspond exactly to the factors that get 0 random variance. I think this is expected behaviour of the Satterthwaite approximation. – amoeba Jun 13 '18 at 10:23
- m2 did not throw any warnings or errors. I will run an analogous analysis (with method = "KR") on my real dataset overnight and see if that helps. – blazej Jun 13 '18 at 10:26
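Following up on amoeba's comment, a minimal check (assuming, as above, that the fitted lme4 model sits in the $full_model slot) to see which random slopes collapse to zero variance; these should line up with the effects that get the huge den Df values in m2:

library(lme4)
# Random-effect variances of the uncorrelated model; slopes whose
# variance is estimated as (near) 0 are the ones whose fixed effects
# get the near-residual Satterthwaite denominator dfs:
print(VarCorr(m2$full_model), comp = "Variance")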