In one of my research projects, we decided to "update" the procedure mid-study. In short, we initially tested a repeated-measures design (1 factor with 3 levels, 5 trials per condition, for a total of 15 trials). Let's call the levels of factor A: A1, A2, and A3. I'm interested in whether factor A (A1, A2, or A3) affects our dependent variables (e.g., sway amplitude).
Factor A represents visual stimulation (A1 = no visual stimulation, A2 = visual stimulation, A3 = erroneous visual stimulation).
After 15 participants, we decided to add a new repeated-measures factor to our design: feet position. Let's call it factor B, with B1 = feet together and B2 = feet shoulder width. We then tested 25 additional participants with the 2x3 design ([A1 A2 A3] * [B1 B2]) for a total of 30 trials per participant (5 per condition).
Note that B1 (feet together) was also inherently present for the first 15 participants; we only added the shoulder-width level, thus creating B2.
In other words,
15 participants:
- (B1*[A1 A2 A3]) -->A1B1, A2B1, A3B1
- 3 visual conditions under the feet together (repeated measures)
- 15 trials/participant.
25 participants:
- ([B1 B2] * [A1 A2 A3]) --> A1B1, A2B1, A3B1, A1B2, A2B2, A3B2
- 3 visual conditions under the feet-together and feet-shoulder-width positions (repeated measures)
- 30 trials/participant
The procedure and instructions were identical apart from this modification to the design.
Since the 15 participants share conditions with the next 25 participants (A1B1, A2B1, A3B1), I would like to analyze them together.
My question now is: how should I build my model to account for this modification to the protocol? Should I include the first 15 participants in the analysis or leave them out?
I did some preliminary statistical analysis using lmer, with a factor "ExperimentNumber" indicating whether a participant was among the first 15 or the last 25:
model <- lmer(Value ~ A * B * ExperimentNumber + (1 | Participant) + (1 | Trial), data = data_clean)
and found a main effect of ExperimentNumber (no interaction with that predictor).
This suggests that, for our dependent variables, the average of the first 15 participants was significantly lower than that of the 25 additional participants (under the B1 level, of course). I honestly can't see why; it's probably just sampling variability.
Because of that, should I remove the 15 initial participants from my analysis? Should I look at additional things?
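One additional check I could run (a sketch, assuming the `data_clean` data frame and columns from the model above; `data_B1` and the model names are hypothetical): restrict the data to the B1 cells that both cohorts share, so the cohort comparison isn't mixed with the extra B2 trials.

```r
library(lme4)

# Keep only the feet-together (B1) conditions, which both cohorts performed
data_B1 <- subset(data_clean, B == "B1")

# Fit with and without the cohort terms within the shared conditions
m_B1   <- lmer(Value ~ A * ExperimentNumber + (1 | Participant) + (1 | Trial),
               data = data_B1)
m_B1_0 <- lmer(Value ~ A + (1 | Participant) + (1 | Trial), data = data_B1)

# Likelihood-ratio test of the ExperimentNumber terms (anova refits with ML)
anova(m_B1, m_B1_0)
```

If the cohort difference disappears in this restricted comparison, that would support treating it as sampling variability rather than a procedural change.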
I also compared my first model with a second model without ExperimentNumber, and a third model with ExperimentNumber as a random factor, as pointed out here. I'm not sure yet why ExperimentNumber should be random instead of fixed, but there was no significant difference between the models.
Therefore:
> model <- lmer(Value ~ FB * Realized * Experiment + (1 | Participant) + (1 | Trial), data = data_filtered)
> model2 <- lmer(Value ~ FB * Realized + (1 | Participant) + (1 | Trial), data = data_filtered)
> model3 <- lmer(Value ~ FB * Realized + (1 | Participant) + (1 | Trial) + (1 | Experiment), data = data_filtered)
> anova(model,model2,model3)
refitting model(s) with ML (instead of REML)
Data: data_filtered
Models:
model2: Value ~ FB * Realized + (1 | Participant) + (1 | Trial)
model3: Value ~ FB * Realized + (1 | Participant) + (1 | Trial) + (1 | Experiment)
model: Value ~ FB * Realized * Experiment + (1 | Participant) + (1 | Trial)
npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
model2 9 7066.4 7105.3 -3524.2 7048.4
model3 10 7067.1 7110.3 -3523.6 7047.1 1.2994 1 0.2543
model 15 7069.3 7134.0 -3519.6 7039.3 7.8743 5 0.1633
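Since my concern is really the main effect of Experiment (there was no interaction with it), a more targeted 1-df comparison might drop only the interaction terms (a sketch using the same assumed data frame; `model4` is a hypothetical name):

```r
library(lme4)

# Main effect of Experiment only, no interactions with the design factors
model4 <- lmer(Value ~ FB * Realized + Experiment + (1 | Participant) + (1 | Trial),
               data = data_filtered)

# 1-df likelihood-ratio test of the cohort (Experiment) main effect
anova(model2, model4)
```

This avoids spending 5 degrees of freedom on interaction terms that the full model already suggested were negligible.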
Therefore, what would be the best way to deal with my data? Should I combine them or should I stick with my 25 participants only? Should I make additional corrections? I've found conflicting information in the literature.
Thanks for your time!