
Edit with graph:

I am struggling a bit conceptually to make sense of a result I get when applying a linear mixed model to my reaction time data.

I have a 2×2 within-subjects design. When I plot the data as an interaction plot, one of the two lines lies above the other, with non-overlapping confidence intervals. However, when I fit a linear mixed model, which looks like this:

library(nlme)

model26 <- lme(log(RT_times) ~ location * task,
               random = ~ 1 + location * task | participant,
               weights = varComb(varIdent(form = ~ 1 | location * task)),
               data = data, method = "REML",
               control = list(msMaxIter = 1000, msMaxEval = 1000))
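For reference, here is a minimal sketch of how the main-effect tests can be read off a fit like this (assuming the nlme package; anova.lme reports conditional F-tests and summary gives the coefficient t-table):

anova(model26)                     # sequential conditional F-tests
anova(model26, type = "marginal")  # marginal tests, not dependent on term order
summary(model26)$tTable            # conditional t-tests for each coefficient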

I don't find any significant main effect. This is the output:

Linear mixed-effects model fit by REML
Data: data_sac

Random effects:
 Formula: ~1 + task * condition | pp
 Structure: General positive-definite, Log-Cholesky parametrization
                         StdDev    Corr
(Intercept)              0.2479765 (Intr) tskndf cndtnv
taskundef                0.1391700 -0.708
conditionvalid           0.1722409 -0.672  0.651
taskundef:conditionvalid 0.1848967  0.652 -0.627 -0.990
Residual                 0.2490666

Combination of variance functions:
 Structure: Different standard deviations per stratum
 Formula: ~1 | condition * task
 Parameter estimates:
  invaliddef     validdef invalidundef   validundef
   1.0000000    0.8943147    0.8514028    0.8917650

Fixed effects: log(latency) ~ condition * task
 Correlation:
                         (Intr) cndtnv tskndf
conditionvalid           -0.680
taskundef                -0.688  0.646
conditionvalid:taskundef  0.628 -0.938 -0.673

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max
-7.10755334 -0.40245682  0.02502696  0.51551241  4.18246501

Number of Observations: 5209
Number of Groups: 56


On the contrary, the p-value for task is about 0.7. I find this very strange, because for another dataset with a comparable graph I do get significant results. I understand that the 95% CIs and the linear mixed model are computed differently, so they might lead to different results, but I don't see how they can be SO different. There does not seem to be anything wrong with my data (I even removed outliers), so it is difficult for me to grasp what is going on.
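As a rough way to see how different the two interval computations can be, here is a sketch (assuming the dplyr package and the column names from the model call; seaborn's default CI is bootstrapped over all trials, but the idea is similar): one interval treats every trial as independent, the other first averages within each participant.

library(dplyr)

# Naive 95% CI: all trials within a cell treated as independent observations
naive_ci <- data %>%
  group_by(task, location) %>%
  summarise(m  = mean(log(RT_times)),
            se = sd(log(RT_times)) / sqrt(n()),
            lo = m - 1.96 * se,
            hi = m + 1.96 * se,
            .groups = "drop")

# Subject-level 95% CI: average within each participant first, then take
# the spread over the 56 participant means per cell
subj_ci <- data %>%
  group_by(participant, task, location) %>%
  summarise(m = mean(log(RT_times)), .groups = "drop") %>%
  group_by(task, location) %>%
  summarise(mm = mean(m),
            se = sd(m) / sqrt(n()),
            lo = mm - 1.96 * se,
            hi = mm + 1.96 * se,
            .groups = "drop")

naive_ci
subj_ci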


Hope the question is clear now. Many thanks for any insight you might provide!

SinC
  • @EdM thanks for your quick comments! I have edited my question and I hope that now it is clearer. – SinC May 17 '23 at 08:26
  • @SextusEmpiricus I updated the post, thanks a lot for the answers – SinC May 17 '23 at 08:27
  • How many measurements do you have and how many participants? Are the confidence intervals also computed with the assumption of random effects? – Sextus Empiricus May 17 '23 at 10:31
  • In particular, how many observations are there per individual for each location/task combination? You might be trying to fit too complex a random-effect model for the data that you have. What happens if you use a simpler intercept-only random effect instead of this model with both an intercept and four slopes as random effects? It also would help to show summaries of the random-effect estimates in addition to the fixed-effect estimates that you display. Were there any warnings returned when you fit the model? – EdM May 17 '23 at 11:54
  • So, I have 56 participants in total. Each has 25 trials per task/location combination (it is not much data per subject, as this is a subset of my original data and this is an exploratory analysis). The confidence intervals are simply those obtained as a default output from Python seaborn. When I do not model the random slopes, the main effect of task is significant (p = 0.03; a sketch of that simpler model is given after these comments). There were no warnings in the output from the model, but I am adding the full output for clarity – SinC May 17 '23 at 12:11
  • Your situation might be a case of "the influence of adding a (random) effect". By including the random effect you effectively reduce the degrees of freedom: a study based on 56 independent measurements versus one based on 1400 independent measurements can give very different estimates of the standard error. This difference is not visible in your confidence intervals, which are computed under the assumption that the 1400 measurements are independent. – Sextus Empiricus May 17 '23 at 12:49
  • There's a difference in predictor terminology among your displays. Some say "condition" (valid vs. invalid) while others say "location" (left vs. right). Please make sure that we are seeing commands and outputs for the same model in all of the displays. Also (admittedly, unlikely to be a problem) your interaction plot is on a linear scale, not the log scale that your model uses. – EdM May 18 '23 at 12:12
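For reference, here is a minimal sketch of the random-intercept-only model suggested in the comments (model_simple is just an illustrative name; the call is the same as above, only the random slopes are dropped). The likelihood-ratio comparison at the end is only a rough check, since the test for dropping random slopes is conservative (variance parameters sit on the boundary under the null):

library(nlme)

# Random intercept only: each participant gets their own baseline, but the
# location/task effects are assumed to be the same for every participant
model_simple <- lme(log(RT_times) ~ location * task,
                    random = ~ 1 | participant,
                    weights = varComb(varIdent(form = ~ 1 | location * task)),
                    data = data, method = "REML",
                    control = list(msMaxIter = 1000, msMaxEval = 1000))

anova(model_simple)           # F-tests for the fixed effects

# Both fits use REML with identical fixed effects, so the random-effects
# structures can be compared directly
anova(model_simple, model26)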