
I have a data set that looks like this toy data

library(tidyverse)
data <- tibble(ID = rep(c("Billie", "Elizabeth", "Louis"), times = 1, each = 6),
           Group = c(rep("control", 12), rep("patient", 6)),
           Time = rep(c("T1", "T2"), times = 3, each = 3),
           Item = rep(c("a", "b", "c"), times = 6, each = 1), 
           answer = sample(1:7, size = 18, replace = TRUE))

There are some individual participants (ID), who can be either patient or control participants (Group). The participants take part in an experiment two times (Time). At each time, they answer three items (Item), which all measure the same construct. The answer column shows their answer on a 7-point Likert scale (if you are not from the psychology world: the patients can have biopsies two times, and each time, three samples (items a, b, c) are taken). The research question is: does group membership alter the change in answers between time points, i.e. is the change in answers between the time points different for the two groups? (Are the changes in biopsied tissues different for the two groups?) To analyze the data, I use the brms package.

If I wanted an easy life, I would just calculate the average answer per person and time point and continue from there.

easy <- data %>% 
        group_by(ID, Time) %>% 
        summarize(Group = unique(Group), 
                  mean_answer = mean(answer),  # same name as in the formula below
                  .groups = "drop")

To analyze with brms, my formula would then be

bf(mean_answer ~ 1 + Group * Time + (1|ID))

(At least I hope so...)

But life is nicer when it's complicated, so my question is: how can I specify a brms-formula that allows me to include the item-level information that is present in my data? I think what I would like to write is something like this

bf(answer ~ 1 + Group * Time + (1 | Item|Time|ID)) 

Reading into crossed and nested random effects here and here, I was under the impression that my data are crossed, leading to the following formula:

bf(answer ~ 1 + Group * Time +  (1+ ID) + (1|Time) + (1|Item))

But does this formula take into account the correlation structure of my data?

Moving on, following this paper, I was under the impression that my data are the "crossed and nested" part of the figure. Following this track, at the end of this site is a guide on how to specify this case in lme4, but I have a hard time translating this into brms formulas. Finally, I found this great site on country-year panel data, which I am currently exploring, but I am struggling to map the scenarios there onto my case. I would greatly appreciate any help with this. Thank you in advance!

lilla

1 Answer


Let's go through each of the specifications that you've suggested one by one.

The simplest model you suggest is:

bf(mean_answer ~ 1 + Group * Time + (1|ID))

This corresponds to the 'random intercept' model. This will allow you to estimate the mean of mean_answer at T1 and T2 in each group. The Group:Time interaction will indicate whether the patient group changed more or less than the control group between T1 and T2.
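To make the interaction concrete, here is a rough sketch of the quantity it targets: the difference between the two groups in their T1-to-T2 change, computed from raw cell means. It uses the toy data from the question with a seed added (the seed is my assumption, so the numbers will differ from yours). The model estimate will not match this exactly once random effects and priors are involved.

```r
# Toy data from the question; set.seed() is added only for reproducibility
set.seed(1)
data <- data.frame(
  ID     = rep(c("Billie", "Elizabeth", "Louis"), times = 1, each = 6),
  Group  = c(rep("control", 12), rep("patient", 6)),
  Time   = rep(c("T1", "T2"), times = 3, each = 3),
  Item   = rep(c("a", "b", "c"), times = 6, each = 1),
  answer = sample(1:7, size = 18, replace = TRUE)
)

# Mean answer in each Group x Time cell (rows: Group, columns: Time)
cell_means <- tapply(data$answer, list(data$Group, data$Time), mean)

# Change from T1 to T2 within each group
change <- cell_means[, "T2"] - cell_means[, "T1"]

# Difference in change between the groups (patient minus control)
did <- unname(change["patient"] - change["control"])
did
```

With R's default treatment coding, this is the same number an ordinary `lm(answer ~ Group * Time)` reports for the interaction coefficient, which can be a handy sanity check before moving to brms.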

But it would be nice to use the item-level information, as you suggest. Keeping things on their natural scale (7-point Likert) rather than a relatively artificial average might help for interpretation/presentation.

You suggest the following model:

bf(answer ~ 1 + Group * Time + (1 | Item|Time|ID)) 

I don't know what's going on here so I'm going to skip it.

The next one is:

bf(answer ~ 1 + Group * Time +  (1+ ID) + (1|Time) + (1|Item))

There is a mistake in this specification (at least, I think so). We usually don't want to include time (a 2-level variable) as a random intercept. It's already included as a fixed effect so it's redundant anyway. Also note that (1+ ID) adds ID as a fixed effect; a random intercept per participant is written (1|ID). Rather than a random intercept for time, we want to allow the effect of time, that is, the trajectory from baseline to follow-up, to be different for each participant. In other words, we want a random slope of time. That would look like this:

bf(answer ~ 1 + Group * Time + (1 + Time | ID) + (1 | Item))

Here Item and ID are crossed random effects as each participant completes each item. A few issues. Firstly, it is usually suggested that you need about 5-6+ levels of the random effect to reliably estimate the variance and Item only has 3. Likely this will be fine with a little regularisation from the prior, though you might try just including Item as a fixed effect.
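If you want to check the crossed structure rather than take my word for it, a quick cross-tabulation is enough. This is a minimal sketch on the toy data from the question (seed added by me): every ID should appear with every Item, whereas each ID should appear in only one Group.

```r
# Toy data from the question; set.seed() is added only for reproducibility
set.seed(1)
data <- data.frame(
  ID     = rep(c("Billie", "Elizabeth", "Louis"), times = 1, each = 6),
  Group  = c(rep("control", 12), rep("patient", 6)),
  Time   = rep(c("T1", "T2"), times = 3, each = 3),
  Item   = rep(c("a", "b", "c"), times = 6, each = 1),
  answer = sample(1:7, size = 18, replace = TRUE)
)

# ID x Item: no empty cells means the two factors are fully crossed
id_by_item <- xtabs(~ ID + Item, data = data)
fully_crossed <- all(id_by_item > 0)

# ID x Group: each participant sits in exactly one group (nested in Group)
id_by_group <- xtabs(~ ID + Group, data = data)
nested_in_group <- all(rowSums(id_by_group > 0) == 1)

id_by_item
c(fully_crossed = fully_crossed, nested_in_group = nested_in_group)
```

The number of distinct Item levels (`ncol(id_by_item)`) is also the number of levels available for estimating the Item variance, which is the concern raised above.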

Secondly, for a pre-post study, you might struggle to estimate a random intercept for ID and a random slope for Time. The reason is that allowing each participant to have their own starting point and their own trajectory can result in a model with about as many parameters as there are data points. Again, the prior might help with this, I'm not sure.
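One way to see how tight things are is to count the rows each participant contributes. This is only a sketch on the toy data from the question (seed added by me). With the averaged data there are two rows per participant, so a participant-specific intercept plus slope could reproduce them exactly; at item level there are more rows per participant, which is what gives the model some room.

```r
# Toy data from the question; set.seed() is added only for reproducibility
set.seed(1)
data <- data.frame(
  ID     = rep(c("Billie", "Elizabeth", "Louis"), times = 1, each = 6),
  Group  = c(rep("control", 12), rep("patient", 6)),
  Time   = rep(c("T1", "T2"), times = 3, each = 3),
  Item   = rep(c("a", "b", "c"), times = 6, each = 1),
  answer = sample(1:7, size = 18, replace = TRUE)
)

# Item-level data: rows per participant
rows_item_level <- table(data$ID)

# Averaged data (one mean per ID and Time): rows per participant
averaged <- aggregate(answer ~ ID + Time, data = data, FUN = mean)
rows_averaged <- table(averaged$ID)

rows_item_level
rows_averaged
```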

Here is one last formulation to consider:

bf(answer ~ 1 + Group * Time + Item + (1|ID))

This model differs from the one above in that Item is included as a fixed effect and the random slope for Time is dropped (i.e., participants within a group are assumed to share the same trajectory). It would be interesting to contrast this model with one that includes a random slope for Time and see how they compare. Also, keep in mind, for Likert data, you are better off with an ordinal regression model rather than a metric (Gaussian likelihood) model.
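Putting those last points together, here is a rough sketch of how the comparison could look with an ordinal likelihood. The cumulative probit family, the priors, and the seed are my assumptions for illustration, not something the question specifies; with only 18 toy rows, expect sampler warnings, especially for the random slope model.

```r
library(brms)

# Toy data from the question; set.seed() is added only for reproducibility
set.seed(1)
data <- data.frame(
  ID     = rep(c("Billie", "Elizabeth", "Louis"), times = 1, each = 6),
  Group  = c(rep("control", 12), rep("patient", 6)),
  Time   = rep(c("T1", "T2"), times = 3, each = 3),
  Item   = rep(c("a", "b", "c"), times = 6, each = 1),
  answer = sample(1:7, size = 18, replace = TRUE)
)

# Assumed weakly regularising priors (illustrative choices only)
priors <- c(
  prior(normal(0, 1), class = b),     # fixed effects on the probit scale
  prior(exponential(1), class = sd)   # group-level standard deviations
)

# Item as fixed effect, random intercept per participant
fit_intercept <- brm(
  answer ~ 1 + Group * Time + Item + (1 | ID),
  data = data, family = cumulative("probit"),
  prior = priors, seed = 1
)

# Same model plus a participant-specific slope for Time
fit_slope <- brm(
  answer ~ 1 + Group * Time + Item + (1 + Time | ID),
  data = data, family = cumulative("probit"),
  prior = priors, seed = 1
)

# Approximate leave-one-out comparison of the two fits
loo_compare(loo(fit_intercept), loo(fit_slope))
```

On real data you would want to check convergence diagnostics (Rhat, divergences) for both fits before reading anything into the comparison.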


Lachlan
    Thank you very much for your answer and for picking my brain! Your intuition about the item-as-random-intercept-and-time-as-random-slope was right: brms struggled with this formula, at least with default priors. However, the last formula works fine and makes sense to me (especially since in the real data, sometimes there are not that many items either). And yes, ordinal regression is what I use of course. For anyone interested, here is a nice tutorial to start. Thanks again @Lachlan! – lilla Apr 12 '23 at 07:49