
Is there a way to design a mixed model for an uneven number of measurements per subject and, more importantly, for uneven time intervals between measurements taken at different time points (the dataset contains observations spanning several years)?

I have data on pig reproductive traits. The goal is to determine whether a particular mutation (SNP) affects the traits. Sample data:

pig_id measurement(order) breed year snp age(days) y
1 1 A 2020 AA 250 330
1 2 A 2020 AA 290 290
... ... ... ... ... ... ...
1 80 A 2021 AA 600 320
2 1 B 2016 BB 330 400
2 2 B 2016 BB 350 385
2 3 B 2017 BB 365 360

The biggest problem I see is that measurement 1 for pig 1 was taken at a completely different time point (date) from measurement 1 for pig 2.

Using SAS, I wanted to try something like this:

proc mixed data=have;
    class pig_id breed year snp measurement;
    model y = age interval breed year snp measurement;
    /* SP(POW) needs the time coordinate in parentheses; age (days) is used here */
    repeated measurement / subject=pig_id(snp) type=SP(POW)(age);
run;

where interval is the number of days since the previous measurement. But I am not sure whether this solves the problem above.
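The following minimal Python sketch makes the interval covariate concrete. It derives the covariate from each pig's ages and uses only the rows of the sample data that are fully shown. The function name `intervals_since_previous` is hypothetical. Each pig's first measurement has no previous one, so its interval is undefined and would need to be handled (dropped or coded separately) before it could enter a model as a covariate.

```python
from typing import Dict, List, Optional, Sequence


def intervals_since_previous(pig_ids: Sequence[int], ages: Sequence[int]) -> List[Optional[int]]:
    """Days since the same pig's previous measurement, aligned with the input rows.

    Rows are ordered by (pig_id, age) internally, so the input need not be sorted.
    The first measurement of each pig gets None (there is no previous measurement).
    """
    order = sorted(range(len(ages)), key=lambda i: (pig_ids[i], ages[i]))
    result: List[Optional[int]] = [None] * len(ages)
    last_age: Dict[int, int] = {}
    for i in order:
        pig = pig_ids[i]
        if pig in last_age:
            result[i] = ages[i] - last_age[pig]
        last_age[pig] = ages[i]
    return result


# Rows shown in the sample data: pig 1, measurements 1-2; pig 2, measurements 1-3
print(intervals_since_previous([1, 1, 2, 2, 2], [250, 290, 330, 350, 365]))
# -> [None, 40, None, 20, 15]
```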

I also considered giving every unique observation date its own "serial number" (so instead of measurement order I would use a time point from 1 to n), but then I end up with thousands of levels for the fixed effect of time...

So is there a solution within these mixed models, or is my only option to turn to methods for unevenly spaced time series?

opiczak

1 Answer


I also considered giving every unique date of observation its "serial number" (so instead of measurement_order I would use a time point from 1 to n), but then I end up having thousands of levels for fixed effect of time...

The trick is to do that, but then model time flexibly as a continuous predictor, for example with a regression spline. That reduces your "thousands of levels for fixed effect of time" to a handful of regression coefficients (maybe only 4 or 5) that describe the spline, estimated from the data.
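The answer doesn't say which regression spline to use. One common choice is a restricted (natural) cubic spline. The NumPy sketch below assumes Harrell's parameterization, with knots at the 5th, 27.5th, 50th, 72.5th and 95th percentiles of age. That knot placement is a conventional default, not a requirement, and `rcs_basis` is a hypothetical helper. The sketch illustrates the answer's point: however many distinct ages there are, five knots yield only four fixed-effect columns for time. Those columns would replace the categorical time effect in the model's fixed part. In R, `splines::ns()` builds a basis for the same kind of natural cubic spline.

```python
import numpy as np


def rcs_basis(x, knots):
    """Restricted cubic spline basis (Harrell's parameterization).

    Returns len(knots) - 1 columns: x itself plus len(knots) - 2 nonlinear terms.
    Any linear combination is cubic between knots and linear outside the outer knots.
    """
    x = np.asarray(x, dtype=float)
    t = np.sort(np.asarray(knots, dtype=float))
    k = len(t)
    if k < 3:
        raise ValueError("need at least 3 knots")

    def pos3(u):
        # truncated cubic (u)_+^3
        return np.maximum(u, 0.0) ** 3

    d = t[-1] - t[-2]
    scale = (t[-1] - t[0]) ** 2  # keeps the nonlinear columns on a scale comparable to x
    cols = [x]
    for j in range(k - 2):
        cols.append((pos3(x - t[j])
                     - pos3(x - t[-2]) * (t[-1] - t[j]) / d
                     + pos3(x - t[-1]) * (t[-2] - t[j]) / d) / scale)
    return np.column_stack(cols)


# Hypothetical grid of 1000 distinct ages (days), standing in for many unique time points
ages = np.arange(250, 1250)
knots = np.quantile(ages, [0.05, 0.275, 0.5, 0.725, 0.95])
B = rcs_basis(ages, knots)
print(B.shape)  # (1000, 4): four time coefficients instead of 1000 levels
```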

That's a strength of mixed modeling: you don't need the same number of observations per individual or the same spacing between observations, provided you model time appropriately.

You'll have to apply your understanding of the subject matter to decide how to set your time reference. For example: do you use the same calendar time directly for all individuals, or instead use something like birth as the time reference for each individual and then include calendar date of birth as an additional covariate? The same principle applies whichever choice you make: model time flexibly and continuously.
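The two time references can be sketched as follows in Python. The helper names and the dates are hypothetical and for illustration only. With calendar time, every animal shares one clock. With birth as time 0, the time covariate is age at each observation. A birth-date covariate such as birth year is then constant within each animal, just like breed or SNP genotype.

```python
from datetime import date
from typing import Tuple


def calendar_time(observed: date, origin: date) -> int:
    """Option 1: one calendar clock for all animals (days since a common origin date)."""
    return (observed - origin).days


def birth_referenced(observed: date, birth: date) -> Tuple[int, int]:
    """Option 2: time 0 is each animal's birth.

    Returns (age in days at this observation, calendar year of birth).
    The birth year is the same for every observation of a given animal.
    """
    return (observed - birth).days, birth.year


# Hypothetical dates, not taken from the question's data
birth = date(2019, 4, 1)
observed = date(2019, 12, 7)
print(calendar_time(observed, origin=date(2019, 1, 1)))  # 340 days since 1 Jan 2019
print(birth_referenced(observed, birth))                 # (250, 2019)
```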

EdM
  • Thanks a lot. I guess in SAS I will need to use a RANDOM statement instead of REPEATED (because REPEATED requires a classification effect, so measurement would have to be a CLASS variable). Or use lmer in R right away. But what about the covariance structure? If I control for time and for the intervals between observations (both continuous predictors), can I then use an autoregressive structure? – opiczak Oct 23 '22 at 09:43
  • @opiczak I don't know SAS so I can't comment on that. I'm also not an expert on covariance structures. There are some links to reading on that here. The lme4 functions don't seem to provide as much flexibility for that as do other packages, but they seem to work well in practice. – EdM Oct 23 '22 at 16:43
  • Sorry, one more thing: "use something like birth as the time reference for each individual and then include calendar date of birth as an additional covariate". At first, I thought you had said to use date of birth as the time reference and add date of observation (or age) as a covariate. But now I read it again and I do not follow. Can you clarify, please? – opiczak Oct 24 '22 at 16:19
  • @opiczak I think that birth as time = 0 reference makes the most sense, so that for each observation you would include the age at that observation as a time covariate, to model smoothly and flexibly. In case there are systematic differences among animals depending on the year in which they were born, you might include an additional covariate representing something related to the actual date of birth. That additional covariate would be the same value for all observations of an individual (like breed or SNP). If you don't expect differences related to date of birth, no need to do that. – EdM Oct 24 '22 at 16:53
  • Ah, now I see what you meant. Thank you again. – opiczak Oct 24 '22 at 18:14
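This bears on the covariance question raised in the comments. The SP(POW) structure in the question's SAS code is the spatial power model, a generalization of AR(1) to unequally spaced times. AR(1) raises the correlation to the number of steps between measurements. SP(POW) instead raises it to the actual time gap: corr(e_i, e_j) = rho^|t_i - t_j|. Below is a minimal NumPy sketch for pig 2's three ages from the sample data. The helper name and rho = 0.98 per day are hypothetical.

```python
import numpy as np


def spatial_power_corr(times, rho):
    """Residual correlation matrix for one subject under a spatial power structure.

    corr(e_i, e_j) = rho ** |t_i - t_j|, with 0 < rho < 1, so the correlation
    decays with the actual time gap rather than with the gap in measurement order.
    """
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must be in (0, 1)")
    t = np.asarray(times, dtype=float)
    return rho ** np.abs(t[:, None] - t[None, :])


# Pig 2's ages (days) from the sample data; rho = 0.98 per day is hypothetical
R = spatial_power_corr([330, 350, 365], rho=0.98)
print(np.round(R, 3))

# With equally spaced unit times the structure reduces to AR(1)
print(spatial_power_corr([0, 1, 2], rho=0.5))
```

With equally spaced unit times this reproduces the AR(1) matrix. That shows why AR(1) indexed by measurement order ignores the uneven gaps between measurements.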