
I recently started to work on a case-control study using repeated measures over time (Modeling repeated measures data in R - Interpretation and Validation). I have several dependent variables and I would like to fit a linear mixed model on each of them.

My question is: should I adjust the p-values of the coefficients obtained from each model, and how would I do that? Additionally, as I am reporting simple contrasts to display differences between groups for each time point, should I also adjust the p-values obtained through the simple contrasts?

Ed9012

2 Answers


As Jacob Cohen put it, decades ago, "this is a subject on which reasonable people can differ." This has also been discussed many times here, in various contexts. See the tag "multiple-comparisons." This thread seems pretty general. The same issues arise regardless of what tests you are running.

Also see Statistics as Principled Argument by Abelson (a great book on a lot of topics). Abelson repeatedly says that a lot of this sort of statistical question is not "what should I do?" or (even more) "what can I do?" but "what can I justify doing?"

Peter Flom
  • I appreciate your answer, though I am seeking practical advice on how to do it, as I don't have any examples. What I mean is: would it be correct to make a list of all the coefficients' p-values from all the models and then apply the correction? The same for the simple contrasts. The problem with this approach is that I feel I would end up with too many comparisons, even though I have only 2 independent variables and 1 interaction. – Ed9012 Feb 15 '24 at 13:12
  • @Ed9012 Peter's answer says that "would it be correct" is something that can be disagreed about. I think the only definitive answer that is never wrong is to be open and honest about what you choose to do, which includes reporting all the comparisons you perform regardless of the result, and how you do or don't adjust for multiple comparisons. – Bryan Krause Feb 15 '24 at 21:48

As Peter Flom said, there are many reasonable points of view on what one should do in this situation, and no single correct answer. However, since you are after concrete advice, I can say what I'd do in a comparable situation. I'm not a statistician, though, but a psychology researcher who uses statistics, and this is just what I typically do, not necessarily what you should do.

I understand that you have a dataset with several dependent variables, and that you want to run separate linear models with two categorical predictors and their interaction, where the interaction results are what you are mainly interested in. I think you mentioned in another question that you have 36(?) different outcomes. In a situation like this, I would 1) run the interaction contrasts in emmeans without any adjustment, and 2) take all the p-values from these interaction contrasts and run them through the Benjamini-Hochberg correction. In R:

pvalues <- c(p1, p2, ... pn)  # in order from smallest to largest (p.adjust sorts internally, so this is optional). Edited to add: so these would be the p-values from healthy vs. patient within each time point, i.e. 5 p-values per outcome variable.
p_adjusted <- p.adjust(pvalues, method = "BH")

This p-value adjustment is more lenient than Bonferroni, but still offers some "guard" against false positives when you run a lot of tests (BH controls the false discovery rate rather than the family-wise error rate). I know there are many reasonable counterarguments to p-value adjustment in general. However, I'm more comfortable with BH than with running 36 models with no correction (I'm learning the Bayesian way but not quite there yet). Generally, I try to avoid situations where I have to run that many models within the same research design, but sometimes that can be difficult.
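
For concreteness, the two steps above might look something like the sketch below. This is only an illustration, not code from the thread: the data frame dat, the outcome names y1–y3, and the Group-within-Timepoint contrasts (matching the edited comment in the code above and the model formula in the comments below) are all assumptions.

# Minimal sketch (assumed names: dat, y1...y3, Group, Timepoint, SubjectID)
library(lme4)
library(lmerTest)
library(emmeans)

outcomes <- c("y1", "y2", "y3")    # stand-ins for the many outcome columns
all_p <- c()

for (y in outcomes) {
  f   <- as.formula(paste(y, "~ Group * Timepoint + (1 | SubjectID)"))
  fit <- lmer(f, data = dat)
  # Step 1: contrasts of Group within each Timepoint, without adjustment
  emm <- emmeans(fit, ~ Group | Timepoint)
  cmp <- summary(pairs(emm), adjust = "none")
  all_p <- c(all_p, cmp$p.value)
}

# Step 2: a single Benjamini-Hochberg correction across all pooled p-values
p_adjusted <- p.adjust(all_p, method = "BH")

The key design choice here is that the correction is applied once to the pooled p-values from all models, rather than separately within each model.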

Sointu
  • I have 36 different dependent variables, so 36 different models with two categorical predictors and their interaction, like lmm_model: Response ~ Group * Timepoint + (1 | SubjectID), where Timepoint can have 4 levels. So if I run simple contrasts by Timepoint I obtain 36x4=144 p-values. The FDR-BH procedure would then use 144 as the denominator. However, in some papers (https://www.jstor.org/stable/40835672) the Bonferroni correction is performed with n = number of outcomes. In my circumstances, using Bonferroni with n=36 leads to slightly less stringent p-values than FDR. Am I missing something? – Ed9012 Feb 16 '24 at 16:11
  • In my understanding, you need to run Bonferroni using the number of comparisons/tests of interest, of which you have 144. At a quick glance, in the article you linked they had continuous predictors, so they didn't have pairwise post-hoc comparisons like you do; they counted an effect as significant if the continuous x continuous interaction was significant in the omnibus model. That's reasonable with cont x cont, but you can't do that if you want to conduct pairwise comparisons within each time point. So, the Bonferroni correction for you would lead to a critical p-value of 0.05 / 144 = 0.000347 (see the sketch below). – Sointu Feb 16 '24 at 17:08
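
To make the arithmetic in this exchange concrete, here is a small illustrative sketch (not from the thread); the counts come from the comments above, and the thresholds are simply the Bonferroni cutoffs under the two choices of n.

# Illustrative only: comparing the two correction denominators discussed above
m_outcomes  <- 36          # number of dependent variables (models)
m_contrasts <- 36 * 4      # 144 Group contrasts, one per Timepoint per outcome

0.05 / m_outcomes          # Bonferroni with n = outcomes:  0.00139
0.05 / m_contrasts         # Bonferroni with n = contrasts: 0.000347

# BH, by contrast, is applied to the full pooled vector of 144 p-values, e.g.
# p_adjusted <- p.adjust(all_p, method = "BH")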