My data look like this: 100 patients answer 3 sets of questions every year for 5 years, and they receive a score for each set of questions. So I have data like this:

id year s1 s2 s3
 1    1 60 30 50
 1    2 65 30 45
 1    3 40 25 50
 1    4 34 20 40
 1    5 32 23 45
 2    1 32 43 32
 ...

I want to fit a mixed model in R with lme4. I transformed the data into this long format:

id year score question
 1    1   60   1
 1    1   30   2
 1    1   50   3
 1    2   65   1
 1    2   30   2
 1    2   45   3
 ...
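The wide-to-long reshaping described above could be done along these lines. This is a minimal base R sketch, not necessarily how the data were actually transformed; the data frame names `wide` and `long` are assumptions, and only the five rows shown for patient 1 are used.

```r
# Rows for patient 1 as shown in the wide table above
# (the object names `wide` and `long` are hypothetical)
wide <- data.frame(
  id   = c(1, 1, 1, 1, 1),
  year = 1:5,
  s1   = c(60, 65, 40, 34, 32),
  s2   = c(30, 30, 25, 20, 23),
  s3   = c(50, 45, 50, 40, 45)
)

# Stack the three score columns; each block of nrow(wide) rows is one question
long <- data.frame(
  id       = rep(wide$id, times = 3),
  year     = rep(wide$year, times = 3),
  score    = c(wide$s1, wide$s2, wide$s3),
  question = factor(rep(1:3, each = nrow(wide)))
)

# Sort so the three questions for each id/year sit together
long <- long[order(long$id, long$year, long$question), ]
rownames(long) <- NULL
head(long)
```

Storing `question` as a factor matters here: as a plain number, R would treat it as a continuous predictor rather than as three categories.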

So my question is how should I fit this model?

I am new to mixed models; I am thinking:

score~1+year+(1+year|id)+(1+year|question)

But I am not sure if this is right.

I appreciate your suggestions!

1 Answer

Why do you want to nest observations within question? I believe you could use question as a fixed effect:

score ~ 1 + year + question + (1+year|id)

You could also then do a year by question interaction; for instance, do scores improve every year, but only for certain questions?

score ~ 1 + year + question + year*question + (1+year|id)
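As a rough sketch, the two formulas above could be fit with `lmer()` from lme4 as follows. The data here are simulated purely for illustration (all effect sizes and standard deviations are made-up assumptions, not the real patient data), and `question` is assumed to be a factor.

```r
library(lme4)

set.seed(1)  # simulated data, for illustration only
n_id <- 100
long <- expand.grid(id = 1:n_id, year = 1:5, question = factor(1:3))

b0 <- rnorm(n_id, sd = 8)  # hypothetical patient-specific intercept shifts
b1 <- rnorm(n_id, sd = 2)  # hypothetical patient-specific yearly slopes
q_shift <- c(0, -10, 5)    # hypothetical mean difference per question

long$score <- 50 + b0[long$id] + (-3 + b1[long$id]) * long$year +
  q_shift[as.integer(long$question)] + rnorm(nrow(long), sd = 4)

# Question as a fixed effect, random intercept and year slope per patient
m_main <- lmer(score ~ 1 + year + question + (1 + year | id), data = long)

# Adding the year-by-question interaction
m_inter <- lmer(score ~ 1 + year + question + year * question + (1 + year | id),
                data = long)

fixef(m_inter)  # intercept, year, two question contrasts, two interaction terms
```

With three questions, the interaction model estimates a separate yearly slope for each question, expressed as differences from the reference question.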

That assumes that the three different questions are measuring different constructs. From your question, that's what it sounds like is happening. Maybe a few more details about what the questions are like would help clear that up.

For example, if all three questions are measuring the same thing (e.g., three different measures of depression), you could fit a latent growth curve model with the three questions being indicators of a latent factor for whatever those three questions measure.

But from what I can tell, I would make question a predictor, not a cluster.

Mark White
  • +1. As a nit-pick: year * question in R becomes year + question + year:question, which thus includes the two main effects and the interaction. R figures out the redundancy in year + question + year * question and you get the same result, but it's probably good to distinguish between a*b and a:b. – Wayne Apr 27 '17 at 20:37
  • @Wayne correct, thanks for the clarification. I was redundant since I was making a demonstration, and I find the * to be more intuitive for indicating an interaction than :. If I were fitting the model myself, I would have just said score ~ year*question + (1+year|id), but I didn't want to assume anything, in case anyone reading doesn't know much about the code. – Mark White Apr 27 '17 at 20:41
  • Thanks Mark! Yes, the three different questions are measuring 3 different dimensions of cognitive ability (like memory, logic, ...) for a person (the disease is Alzheimer's disease). Do you think I should include a random slope on question as well? Like (1 + year + question | id) – user3669725 Apr 28 '17 at 13:58
  • @necoli Yup! You could fit that model first, and then fit a number of nested models without slopes to compare if they are necessary. I posted a comment yesterday about how to test for the significance of the variance in a random slope: https://stats.stackexchange.com/a/276555/130869 – Mark White Apr 29 '17 at 16:01
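
The nested-model comparison suggested in the last comment might look like the sketch below. It is only an illustration under stated assumptions: the data are simulated (all numbers are made up), and for brevity it tests the random slope on `year` rather than the larger `(1 + year + question | id)` structure, which would be compared against its reduced versions in the same way.

```r
library(lme4)

set.seed(1)  # simulated data, for illustration only
n_id <- 100
long <- expand.grid(id = 1:n_id, year = 1:5, question = factor(1:3))
b0 <- rnorm(n_id, sd = 8)  # hypothetical patient-specific intercept shifts
b1 <- rnorm(n_id, sd = 2)  # hypothetical patient-specific yearly slopes
long$score <- 50 + b0[long$id] + (-3 + b1[long$id]) * long$year +
  c(0, -10, 5)[as.integer(long$question)] + rnorm(nrow(long), sd = 4)

# Full model: random intercept and random year slope per patient
m_slope <- lmer(score ~ year * question + (1 + year | id), data = long)

# Reduced model: random intercept only
m_nosl <- lmer(score ~ year * question + (1 | id), data = long)

# Both models share the same fixed effects, so the REML fits can be
# compared directly; refit = FALSE stops anova() from refitting with ML
cmp <- anova(m_nosl, m_slope, refit = FALSE)
cmp
```

Because a variance cannot be negative, the null value sits on the boundary of the parameter space, so the p-value from this likelihood-ratio test tends to be conservative.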