3

I am running a glmer model

    glmer(RT ~ Prob * Bl * Session * Gr + (1  | Participant), 
          data= Data.trimmed, family = Gamma(link = "log"), 
            control=glmerControl(optimizer="bobyqa", 
            optCtrl=list(maxfun=1000000)), nAGQ = 0)

with gamma family and I am trying to determine whether this model shows a good fit. It doesn't seem like it does, but I would like to get someone else's opinion since these models are a lot more complicated to deal with. If you think this is poorly fit, do you have any suggestion for how to improve the model?
I have 209062 rows of data and this is response time data. I want to determine whether there differences between groups (Gr - 2 levels) on the learning of a task (Pr - 2 levels) across time (within sessions - Bl - 4 levels / across sessions - Session - 2 levels). It doesn't have zero response times, but some close to zero. I have also added the lmer model at the end.

Glmer Model summary:

Generalized linear mixed model fit by maximum likelihood (Adaptive Gauss-Hermite Quadrature, nAGQ = 0) ['glmerMod']
 Family: Gamma  ( log )
Formula: RT ~ Probability * Block * Session * Group + (1 | Participant)
   Data: Data.trimmed
Control: glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e+06))
 AIC      BIC   logLik deviance df.resid 

2456107 2456538 -1228012 2456023 209020

Scaled residuals: Min 1Q Median 3Q Max -4.297 -0.625 -0.158 0.440 35.691

Random effects: Groups Name Variance Std.Dev. Participant (Intercept) 0.002203 0.04694 Residual 0.053481 0.23126 Number of obs: 209062, groups: Participant, 130

Fixed effects: Estimate Std. Error t value Pr(>|z|)
(Intercept) 6.024e+00 4.182e-03 1440.439 < 2e-16 *** Probability1 -2.835e-02 7.041e-04 -40.265 < 2e-16 *** Block2-1 -2.925e-02 2.077e-03 -14.084 < 2e-16 *** Block3-2 -3.676e-03 2.131e-03 -1.725 0.084500 .
Block4-3 4.085e-03 2.307e-03 1.771 0.076577 .
Block5-4 -1.220e-02 2.380e-03 -5.125 2.98e-07 *** Session1 4.795e-02 7.323e-04 65.478 < 2e-16 *** Group1 -4.616e-02 4.182e-03 -11.037 < 2e-16 *** Probability1:Block2-1 -7.228e-03 2.077e-03 -3.480 0.000501 *** Probability1:Block3-2 -5.332e-03 2.131e-03 -2.503 0.012331 *
Probability1:Block4-3 -2.076e-02 2.307e-03 -8.999 < 2e-16 *** Probability1:Block5-4 6.044e-03 2.380e-03 2.539 0.011104 *
Probability1:Session1 1.656e-03 7.046e-04 2.351 0.018743 *
Block2-1:Session1 -1.972e-02 2.077e-03 -9.494 < 2e-16 *** Block3-2:Session1 -8.521e-03 2.131e-03 -3.999 6.35e-05 *** Block4-3:Session1 4.380e-05 2.308e-03 0.019 0.984856
Block5-4:Session1 -3.768e-03 2.380e-03 -1.583 0.113389
Probability1:Group1 1.515e-03 7.041e-04 2.151 0.031478 *
Block2-1:Group1 -6.161e-03 2.077e-03 -2.966 0.003015 ** Block3-2:Group1 -1.129e-02 2.131e-03 -5.301 1.15e-07 *** Block4-3:Group1 7.095e-03 2.307e-03 3.076 0.002101 ** Block5-4:Group1 -4.055e-03 2.380e-03 -1.704 0.088414 .
Session1:Group1 3.782e-03 7.323e-04 5.164 2.41e-07 *** Probability1:Block2-1:Session1 5.729e-05 2.077e-03 0.028 0.977997
Probability1:Block3-2:Session1 3.543e-03 2.131e-03 1.663 0.096363 .
Probability1:Block4-3:Session1 -6.877e-03 2.308e-03 -2.980 0.002886 ** Probability1:Block5-4:Session1 4.329e-03 2.380e-03 1.819 0.068952 .
Probability1:Block2-1:Group1 -1.238e-03 2.077e-03 -0.596 0.550980
Probability1:Block3-2:Group1 1.022e-02 2.131e-03 4.795 1.63e-06 *** Probability1:Block4-3:Group1 -6.532e-03 2.307e-03 -2.831 0.004634 ** Probability1:Block5-4:Group1 2.351e-03 2.380e-03 0.988 0.323373
Probability1:Session1:Group1 -1.805e-03 7.046e-04 -2.562 0.010412 *
Block2-1:Session1:Group1 -2.060e-04 2.077e-03 -0.099 0.920984
Block3-2:Session1:Group1 -4.211e-03 2.131e-03 -1.977 0.048094 *
Block4-3:Session1:Group1 3.339e-03 2.308e-03 1.447 0.147888
Block5-4:Session1:Group1 -3.956e-03 2.380e-03 -1.662 0.096539 .
Probability1:Block2-1:Session1:Group1 -1.270e-03 2.077e-03 -0.611 0.540933
Probability1:Block3-2:Session1:Group1 1.678e-03 2.131e-03 0.788 0.430929
Probability1:Block4-3:Session1:Group1 -4.640e-03 2.308e-03 -2.010 0.044392 *
Probability1:Block5-4:Session1:Group1 4.714e-03 2.380e-03 1.980 0.047649 *


Signif. codes: 0 ‘*’ 0.001 ‘’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation matrix not shown by default, as p = 40 > 12. Use print(x, correlation=TRUE) or vcov(x) if you need it

plotting residuals:

enter image description here enter image description here qqplot

enter image description here

plot(fitted(RT.model), residuals(RT.model))  

enter image description here

Dharma plot:

enter image description here

plot(RT.model,sqrt(abs(residuals(.))) ~  fitted(.), 
            type=c("p","smooth"))

enter image description here


For the lmer model:

RT.model <- lmer(logRT ~ Probability * Block * Session * Group + (1  | Participant), data= Data.trimmed)

Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest'] Formula: logRT ~ Probability * Block * Session * Group + (1 | Participant) Data: Data.trimmed

REML criterion at convergence: -51844.2

Scaled residuals: Min 1Q Median 3Q Max -23.0325 -0.6283 -0.0801 0.5553 11.0990

Random effects: Groups Name Variance Std.Dev. Participant (Intercept) 0.02413 0.1553
Residual 0.04541 0.2131
Number of obs: 209062, groups: Participant, 130

Fixed effects: Estimate Std. Error df t value Pr(>|t|)
(Intercept) 5.999e+00 1.364e-02 1.283e+02 439.726 < 2e-16 *** Probability1 -2.731e-02 6.488e-04 2.089e+05 -42.087 < 2e-16 *** Block2-1 -2.597e-02 1.914e-03 2.089e+05 -13.569 < 2e-16 *** Block3-2 -6.639e-03 1.963e-03 2.089e+05 -3.381 0.000721 *** Block4-3 3.532e-03 2.126e-03 2.089e+05 1.662 0.096610 .
Block5-4 -1.598e-02 2.193e-03 2.089e+05 -7.287 3.18e-13 *** Session1 4.636e-02 6.754e-04 2.090e+05 68.638 < 2e-16 *** Group1 -4.793e-02 1.364e-02 1.283e+02 -3.513 0.000613 *** Probability1:Block2-1 -7.919e-03 1.914e-03 2.089e+05 -4.138 3.51e-05 *** Probability1:Block3-2 -3.684e-03 1.963e-03 2.089e+05 -1.877 0.060567 .
Probability1:Block4-3 -2.211e-02 2.126e-03 2.089e+05 -10.399 < 2e-16 *** Probability1:Block5-4 6.860e-03 2.193e-03 2.089e+05 3.128 0.001763 ** Probability1:Session1 2.198e-03 6.493e-04 2.089e+05 3.385 0.000711 *** Block2-1:Session1 -1.879e-02 1.914e-03 2.089e+05 -9.817 < 2e-16 *** Block3-2:Session1 -7.896e-03 1.963e-03 2.089e+05 -4.022 5.78e-05 *** Block4-3:Session1 -1.407e-03 2.126e-03 2.089e+05 -0.661 0.508296
Block5-4:Session1 -5.281e-03 2.193e-03 2.089e+05 -2.408 0.016046 *
Probability1:Group1 1.695e-03 6.488e-04 2.089e+05 2.612 0.009010 ** Block2-1:Group1 -7.155e-03 1.914e-03 2.089e+05 -3.739 0.000185 *** Block3-2:Group1 -1.071e-02 1.963e-03 2.089e+05 -5.453 4.96e-08 *** Block4-3:Group1 6.027e-03 2.126e-03 2.089e+05 2.835 0.004578 ** Block5-4:Group1 -1.975e-03 2.193e-03 2.089e+05 -0.900 0.367938
Session1:Group1 3.271e-03 6.754e-04 2.090e+05 4.843 1.28e-06 *** Probability1:Block2-1:Session1 3.932e-05 1.914e-03 2.089e+05 0.021 0.983609
Probability1:Block3-2:Session1 2.946e-03 1.963e-03 2.089e+05 1.500 0.133491
Probability1:Block4-3:Session1 -5.279e-03 2.127e-03 2.089e+05 -2.482 0.013068 *
Probability1:Block5-4:Session1 4.600e-03 2.193e-03 2.089e+05 2.098 0.035950 *
Probability1:Block2-1:Group1 -1.013e-03 1.914e-03 2.089e+05 -0.529 0.596705
Probability1:Block3-2:Group1 1.014e-02 1.963e-03 2.089e+05 5.165 2.41e-07 *** Probability1:Block4-3:Group1 -6.676e-03 2.126e-03 2.089e+05 -3.140 0.001687 ** Probability1:Block5-4:Group1 2.475e-03 2.193e-03 2.089e+05 1.129 0.259079
Probability1:Session1:Group1 -1.077e-03 6.493e-04 2.089e+05 -1.658 0.097259 .
Block2-1:Session1:Group1 2.446e-03 1.914e-03 2.089e+05 1.278 0.201365
Block3-2:Session1:Group1 -3.793e-03 1.963e-03 2.089e+05 -1.932 0.053342 .
Block4-3:Session1:Group1 1.633e-03 2.126e-03 2.089e+05 0.768 0.442515
Block5-4:Session1:Group1 -2.495e-03 2.193e-03 2.089e+05 -1.137 0.255393
Probability1:Block2-1:Session1:Group1 -2.243e-03 1.914e-03 2.089e+05 -1.172 0.241206
Probability1:Block3-2:Session1:Group1 1.190e-03 1.963e-03 2.089e+05 0.606 0.544432
Probability1:Block4-3:Session1:Group1 -3.378e-03 2.127e-03 2.089e+05 -1.588 0.112179
Probability1:Block5-4:Session1:Group1 5.139e-03 2.193e-03 2.089e+05 2.343 0.019126 *


Signif. codes: 0 ‘*’ 0.001 ‘’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation matrix not shown by default, as p = 40 > 12. Use print(x, correlation=TRUE) or vcov(x) if you need it

plotting residuals

enter image description here

plot(fitted(RT.model),residuals(RT.model))

enter image description here

qqplot

enter image description here

DHARMa plot

enter image description here

CatM
  • 476
  • There are many similar posts, like https://stats.stackexchange.com/questions/390063/what-are-the-assumptions-of-a-gamma-glm-or-glmm-for-hypothesis-testing some comments: 1) The histogram has way to few points to be useful! 2) There are no normal assumption in a gamma glm, so why a normal qqplot? Try simulated residuals, see example at https://stats.stackexchange.com/questions/295340/what-to-do-with-glm-gamma-when-residuals-are-not-normally-distributed/302413#302413 3) In the other plots, there is way to much overplotting. Use smaller symbols and transparency – kjetil b halvorsen Jul 12 '21 at 22:07
  • I used the DHARMa plots as you can see and they look disappointing. I did fit a nlme model as well which gave the exact same results but this model's fit seems very poor. – CatM Jul 12 '21 at 23:02
  • I agree, but without more onformation on data and model, variables used we can say no more! – kjetil b halvorsen Jul 12 '21 at 23:08
  • all variables are categorical and dependent variable is response times, I'm going to add this to the post – CatM Jul 12 '21 at 23:09
  • What is the goal of modeling? Prediction, interpretation? There is way to many extreme residuals for a gamma model, is there many zero response times? Please tell us a lot of the context – kjetil b halvorsen Jul 12 '21 at 23:18
  • I want to determine whether there differences between groups (Gr - 2 levels) on the learning of a task (Pr - 2 levels) across time (within sessions - Bl - 4 levels / across sessions - Session - 2 levels). It doesn't have zero response times, but some close to zero. – CatM Jul 12 '21 at 23:21
  • Your response variable is logRT, is it already logged? For a gamma glm it should not be! Otherwise, there is not enough information in your Q for me to say more, Maybe include better information & better plots, or even share the data? Or maybe ask a new question about alternatives to gamma glm's when the gamma model dont have capacity to fit the variance (or outliers)? – kjetil b halvorsen Jul 13 '21 at 15:25
  • Yes, I do have logRT but the plots look very much identical with RT as DV or logRT. Yet, the model fails to converge when using RT. Which plots would you recommend using? Just changed all plots to show the results for "RT" instead of logRT. – CatM Jul 13 '21 at 17:18
  • At least I would give the summary of the model fit, and do what I said above about more bins, smaller symbols and transparency ... – kjetil b halvorsen Jul 13 '21 at 17:32
  • I am sorry, I should have added the output. It never crossed my mind. I added a better plot for the residuals (I think). I think the problem is the left side, there are a lot more data points than there should be for a gamma distribution. I under if the best option would be truncate the data but not sure if removing data points means losing important information, since one of the groups is clearly faster than the other and there's theoretical reasons for why this happens. – CatM Jul 13 '21 at 18:11
  • One option is to try ordinal regression, see for instance https://stats.stackexchange.com/questions/410421/analysis-for-ordinal-categorical-outcome – kjetil b halvorsen Jul 13 '21 at 18:19
  • 1
    Throwing in a quick two cents after reading through this, have you tried an ex-gaussian regression? I don't have much personal experience with modeling reaction time data, but I see these models commonly run where the assumption is an ex-gaussian distribution. Also possible that you need to run a distributional model where there are other variables predicting the gamma's shape value. Gamma assumes variance increases with the mean, but running a distributional model lets you also add other predictors of that shape distribution parameter – Billy Jul 13 '21 at 18:55
  • What do the distributions of residuals look like if you subdivide them by predictor category? – Ben Bolker Jul 13 '21 at 23:38
  • @BenBolker Bolker which predictor? Should I do that for all? – CatM Jul 14 '21 at 00:51
  • I was thinking that you could draw Q-Q plots for all of the predictor combinations (I guess there are 40?) Since you've fit a "saturated" model (all possible interactions of categorical predictors), there can't be any problems with linearity (mean values). – Ben Bolker Jul 14 '21 at 01:32
  • I am sorry for bothering you again but is there any efficient way of doing the qq plots for each combination? – CatM Jul 14 '21 at 02:01

0 Answers0