
I have data on the level of damage in tissue samples, scored on a Likert scale. Our main goal is to understand whether the levels differ between two groups: control and treated.

An example of a dataset:

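set.seed(1)  # make the random example reproducible
df <- data.frame(
  Class = as.character(sample(1:5, size = 50, replace = TRUE)),  # damage score, 1 to 5
  Trt   = sample(c("Ctrl", "Exp"), size = 50, replace = TRUE),   # treatment group
  Exp   = rep(paste0("Exp", 1:5), each = 10)                     # experiment ID
)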

Edit: We will have data from 5-7 independent experiments, which leads us to think that mixed-effects regression is the ideal approach (to account for among-experiment variability). Per experiment, we expect at most 12 observations, but there might be fewer (<10).

I was thinking of performing a multinomial mixed-effects regression. Do you agree? Or is there another recommended alternative? Any package recommendations for this type of analysis?

Thanks in advance!

  • How many "independent experiments" do you have? How many observations within each "experiment"? Your example suggests 5 experiments with 10 observations each, but the actual details matter. Also, in what ways are the experiments different? Is there some reason why you expect their results to differ? Please edit the question to provide that information, as comments are easy to overlook and can be deleted. Also, it sounds like your outcome categories are ordered, so ordinal logistic regression would be preferable. – EdM Aug 29 '22 at 16:43
  • Thanks @EdM, I added some information to the question text. Thanks also for the suggestion for ordinal logistic regression, might indeed be the right approach! – André Barros Aug 29 '22 at 16:57

1 Answer


It's good to be asking questions about data analysis before you collect the data.

First, with ordered categorical outcomes you should be considering a form of ordinal regression. It comes in two major versions, briefly summarized here, and it's possible in both versions to choose link functions besides the standard logit for logistic regression. This UCLA web page has a helpful introduction, via a proportional-odds logistic-regression fixed-effects model.
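As a minimal sketch of the fixed-effects proportional-odds model mentioned above, the following uses `polr()` from the MASS package on random data shaped like the question's example. The column names `Class`, `Trt`, and `Exp` come from the question; the data contain no real treatment effect, so the estimate is for illustration only.

```r
library(MASS)  # provides polr() for proportional-odds logistic regression

set.seed(1)
# Random data in the shape of the question's example (no real effect built in)
df <- data.frame(
  Class = sample(1:5, size = 50, replace = TRUE),
  Trt   = sample(c("Ctrl", "Exp"), size = 50, replace = TRUE),
  Exp   = rep(paste0("Exp", 1:5), each = 10)
)

# The outcome must be an ordered factor for ordinal regression
df$Class <- factor(df$Class, levels = 1:5, ordered = TRUE)
df$Trt   <- factor(df$Trt, levels = c("Ctrl", "Exp"))

# Proportional-odds model with treatment as the only predictor
fit <- polr(Class ~ Trt, data = df, method = "logistic", Hess = TRUE)

# Log cumulative odds ratio for Exp vs Ctrl, with a Wald 95% interval
est <- coef(fit)["TrtExp"]
se  <- sqrt(diag(vcov(fit)))["TrtExp"]
ci  <- est + c(-1, 1) * qnorm(0.975) * se
```

Exponentiating `est` and `ci` gives the cumulative odds ratio and its interval. This model ignores the experiment structure entirely, so it is only a starting point.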

Second, experiment numbers of 5 to 7 are close to a common decision boundary for using fixed effects versus random effects to account for such differences. I'll point you to the GLMM FAQ and this Cross Validated page (among many others) for discussion of the issues. With 5 to 7 experiments you are unlikely to get a good estimate of the variance among experiments, if that's of interest in itself.

Third, you have to think carefully about what type of "among-experiment variability" you are trying to "account for." A simple fixed or random effect would only "account for" baseline (intercept) variability in Damage. It wouldn't "account for" variability in the association between Treatment and Damage. To account for that you would have to add interaction terms between Treatment and fixed experiment effects, or random "slopes" with respect to Treatment (along with an assumption about random intercept-slope correlations).
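One way to write down the distinction, as a sketch in notation that is not in the original answer: let $Y_{ij}$ be the damage score of sample $i$ in experiment $j$, $x_{ij}$ an indicator for treatment, and $\theta_k$ the thresholds between adjacent categories. A cumulative-logit mixed model with random intercepts and random treatment slopes is then

$$
\operatorname{logit} P(Y_{ij} \le k) = \theta_k - \left(\beta\, x_{ij} + b_{0j} + b_{1j}\, x_{ij}\right), \qquad k = 1, \dots, 4, \qquad (b_{0j}, b_{1j})^\top \sim N(0, \Sigma).
$$

Setting $b_{1j} = 0$ gives the random-intercept-only model, which lets baseline damage vary among experiments but assumes the same treatment effect $\beta$ in all of them. The off-diagonal element of $\Sigma$ is the intercept-slope correlation mentioned above.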

The R ordinal package can handle proportional-odds logistic regression mixed models, or similar models with other cumulative link functions. This page shows how to format and analyze data for continuation-ratio mixed models with tools in the GLMMadaptive package.
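A minimal sketch of a random-intercept proportional-odds model with `clmm()` from the ordinal package, assuming the package is installed and using random data shaped like the question's example (so the fitted effects are meaningless in themselves):

```r
library(ordinal)  # provides clmm() for cumulative link mixed models

set.seed(1)
# Random data in the shape of the question's example
df <- data.frame(
  Class = factor(sample(1:5, 50, replace = TRUE), levels = 1:5, ordered = TRUE),
  Trt   = factor(sample(c("Ctrl", "Exp"), 50, replace = TRUE)),
  Exp   = factor(rep(paste0("Exp", 1:5), each = 10))
)

# Random intercept per experiment: baseline damage may differ among experiments
fit_int <- clmm(Class ~ Trt + (1 | Exp), data = df, link = "logit")
summary(fit_int)

# Random intercept plus random treatment slope (shown but not run here);
# probably too rich a structure for 5 to 7 experiments of about 10 samples each
# fit_slope <- clmm(Class ~ Trt + (1 + Trt | Exp), data = df, link = "logit")
```

Other cumulative link functions can be selected through the `link` argument, for example `link = "probit"`.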

As you seem to still be in the study design phase, it would make sense to simulate outcomes of the type you expect, to get a handle on these analyses and to ensure that your design has enough power to answer your question. In particular, evaluate different scenarios for the variability among experiments. That would seem to be a big potential limiting factor. You might consider tradeoffs between the number of experiments and the number of cases per experiment, as inter-experiment variability changes.
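The simulation idea can be sketched as follows. Everything numeric here (treatment effect `beta`, experiment standard deviation `sd_exp`, thresholds `cuts`, sample sizes) is an illustrative assumption, and the helper names `simulate_study()` and `estimate_power()` are hypothetical. For speed and to avoid extra dependencies, the analysis step uses `MASS::polr()` with fixed experiment effects as a stand-in; a mixed model such as `ordinal::clmm()` could be swapped in.

```r
library(MASS)  # polr(); used here as a simple stand-in for a mixed model

# Simulate one study under a latent-variable proportional-odds model.
# All default parameter values are illustrative assumptions, not estimates.
simulate_study <- function(n_exp = 6, n_per_exp = 12, beta = 1, sd_exp = 0.5,
                           cuts = c(-1.5, -0.5, 0.5, 1.5)) {
  exp_id <- rep(seq_len(n_exp), each = n_per_exp)
  trt    <- rep(rep(c(0, 1), length.out = n_per_exp), times = n_exp)
  u      <- rnorm(n_exp, mean = 0, sd = sd_exp)            # experiment-level shifts
  latent <- beta * trt + u[exp_id] + rlogis(length(trt))   # logistic errors give a logit link
  score  <- findInterval(latent, cuts) + 1                 # damage category, 1 to 5
  data.frame(
    Class = factor(score, levels = 1:5, ordered = TRUE),
    Trt   = factor(trt, levels = 0:1, labels = c("Ctrl", "Exp")),
    Exp   = factor(exp_id)
  )
}

# Fraction of simulated studies in which the Wald test for treatment has p < alpha.
# Fits that fail (for example, too few observed categories) are dropped.
estimate_power <- function(n_sim = 200, alpha = 0.05, ...) {
  sim_args <- list(...)  # captured once and passed on to simulate_study()
  pvals <- vapply(seq_len(n_sim), function(i) {
    d <- do.call(simulate_study, sim_args)
    d$Class <- droplevels(d$Class)  # remove categories that were never observed
    tryCatch({
      fit <- polr(Class ~ Trt + Exp, data = d, Hess = TRUE)
      z   <- coef(fit)["TrtExp"] / sqrt(diag(vcov(fit)))["TrtExp"]
      unname(2 * pnorm(-abs(z)))
    }, error = function(e) NA_real_)
  }, numeric(1))
  mean(pvals < alpha, na.rm = TRUE)
}

set.seed(42)
estimate_power(n_sim = 200, n_exp = 5, n_per_exp = 10, beta = 1)  # fewer, smaller experiments
estimate_power(n_sim = 200, n_exp = 7, n_per_exp = 12, beta = 1)  # more, larger experiments
```

Varying `sd_exp`, `n_exp`, and `n_per_exp` across calls is one way to explore the tradeoffs described above before committing to a design.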

EdM
  • Thanks for an amazing answer! Regarding points two and three, this is something I'm currently exploring, since the amount of sampling I have per experiment might not allow me to fit a random intercept + slope structure. In that case, I'd tend to either stick with the random intercept or indeed use an interaction term with experiment (that way, I could then ascertain whether the treatment effect is similar across experiments). – André Barros Aug 30 '22 at 05:42
  • @AndréBarros an interaction term of Treatment by experiment might overfit or require a larger study. With 5 experiments, you need to estimate 4 extra individual coefficients (differences in intercepts) and 4 more for their Treatment interactions, in addition to 2 for your main model (intercept and Treatment coefficient), for 10 in total. To avoid overfitting, you need about 15 observations per coefficient you are estimating (Section 4.4, Frank Harrell's course notes), or about 150 observations. You can account for the random variability without estimating it closely. – EdM Aug 30 '22 at 12:38
  • @AndréBarros I added a paragraph to the answer, noting how simulation could be helpful during study design. – EdM Aug 30 '22 at 12:44
  • A good addition, thank you once again. – André Barros Aug 30 '22 at 13:03