
I want to compare X between the groups of "material". Material is a categorical variable with four levels: A, B, C and D. patient_ID is a subject-specific identifier, which I use as a random effect.

lmer(X ~ material + (1 | patient_ID), data = data)

Now my question/problem: the number of samples varies between the four groups:

  • A: 1 per patient_ID
  • B: 0-1 per patient_ID
  • C: 0-2 per patient_ID
  • D: 0-2 per patient_ID

(Edit: when more than one sample was taken, it was a repeated measurement, so those two samples are expected to be very similar and are not independent.)

"Missing" data should be no problem in an LMM, but in C and D, when two samples exist, they are not independent because they are just a "repeated measurement". How do I handle this? Or does my model already account for it through the random effect for patient_ID?
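To make the layout concrete, here is a minimal sketch that simulates data with the sampling scheme described above (A: 1, B: 0-1, C and D: 0-2 samples per patient) and fits the same random-intercept model. It assumes the lme4 package is installed; the number of patients, the material means and both standard deviations are made-up values for illustration, not the asker's data.

```r
# Minimal sketch with simulated (hypothetical) data matching the described layout.
library(lme4)

set.seed(1)
n_patients <- 30
sigma_eta  <- 2    # assumed SD of the patient random intercepts
sigma_eps  <- 0.5  # assumed SD of the measurement error
mu <- c(A = 10, B = 11, C = 12, D = 10.5)  # assumed material means

rows <- list()
for (i in seq_len(n_patients)) {
  eta_i <- rnorm(1, 0, sigma_eta)  # shared by all samples of patient i
  n_per_material <- c(A = 1,
                      B = sample(0:1, 1),
                      C = sample(0:2, 1),
                      D = sample(0:2, 1))
  for (m in names(n_per_material)) {
    k <- n_per_material[[m]]
    if (k > 0) {
      rows[[length(rows) + 1]] <- data.frame(
        patient_ID = i,
        material   = m,
        X          = mu[[m]] + eta_i + rnorm(k, 0, sigma_eps)
      )
    }
  }
}
data <- do.call(rbind, rows)
data$patient_ID <- factor(data$patient_ID)
data$material   <- factor(data$material, levels = c("A", "B", "C", "D"))

table(data$material)  # unbalanced counts per material

fit <- lmer(X ~ material + (1 | patient_ID), data = data)
summary(fit)
```

The unbalanced counts need no special handling in the call to lmer; every sample of a patient shares that patient's simulated intercept `eta_i`.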

User1865345
  • With the (1 | patient_ID) component, the model accounts for the correlation between two observations from the same patient irrespective of the material variable. Say patient 1 has three observations: one with A and two with B. Let's denote these observations A1, B1, B2. Then the correlation between all three pairs of observations (A1, B1), (A1, B2) and (B1, B2) is the same. – dipetkov Mar 18 '23 at 15:40
  • Thank you for your answer, I really appreciate it. Could you elaborate on what this means? I would like to see whether there is a difference between A, B, C, D. As samples are matched and variability between subjects is high, I use the random effect. Is this correctly reflected in the model? – statistic_noob_MD Mar 18 '23 at 15:51
  • What do you mean by "samples are matched"? – dipetkov Mar 18 '23 at 15:53
  • Matched -> they are from the same subject = have the same patient_ID – statistic_noob_MD Mar 18 '23 at 15:56
  • Given the description, the LMM seems appropriate: you want to focus on the differences between the four materials (fixed effects) while accounting for the fact that we would expect two observations of the same patient to be more similar than two observations from two different patients. PS: I just wanted to point out that the two B observations (B1 & B2) of the same patient are not expected to be more similar than one A and one B observation (A1 & B1) of the same patient. – dipetkov Mar 18 '23 at 15:57
  • Exactly, but I also wonder whether it is a problem that the number of samples differs between the groups A, B, C, D, as I always have only 0-1 in B and (mostly) 2 in C/D. Edit regarding the PS: OK! So this seems to be a problem, as I would expect B1 and B2 to be very similar. How would I account for this? – statistic_noob_MD Mar 18 '23 at 15:59
  • The LMM doesn't require a balanced design. But since you have more C & D observations, you would expect to estimate the C & D fixed effects more accurately (a narrower confidence interval, for example). – dipetkov Mar 18 '23 at 16:01
  • "I just wanted to point out that the two B observations (B1 & B2) of the same patient are not expected to be more similar than one A and one B observation (A1 & B1) of the same patient." Why "not expected"? B1 and B2 are probably very similar, as it's a repeated measurement. – statistic_noob_MD Mar 18 '23 at 16:04
  • Okay, I didn't do a good job explaining this. It's about the correlation in the errors of the measurements. Say the two measurements B1 and B2 are taken one after the other, while the A1 measurement is taken one week later. Compare this with the situation where three measurements are taken one week apart. Obviously, a lot depends on the experimental design, which you haven't explained in detail. – dipetkov Mar 18 '23 at 16:06
  • Sorry my experimental design description was insufficient. – statistic_noob_MD Mar 18 '23 at 16:09
  • Samples were taken from subjects, each with a specific patient_ID. Samples from each subject were taken at the same time and under the same conditions. Repeated sampling means that 2 samples were taken instead of just one (they should not differ a lot). – statistic_noob_MD Mar 18 '23 at 16:12
  • Does this mean that each patient is observed under one condition only? – dipetkov Mar 18 '23 at 16:19
  • Yes, a single time point for each patient, but at different body sites (material). – statistic_noob_MD Mar 18 '23 at 16:20

1 Answer


Yes, the linear mixed model accounts for the dependence between two measurements taken from the same patient.

Say patient $i$ has two measurements taken at material/site B. Let's denote these by $x_{i,B_1}$ and $x_{i,B_2}$, respectively. Under the LMM X ~ material + (1 | patient_ID), the covariance between these two measurements is:

$$ \begin{aligned} \operatorname{Cov}\left\{x_{i,B_1}, x_{i,B_2}\right\} &= \operatorname{Cov}\left\{\mu_B + \eta_i + \epsilon_{i,B_1}, \mu_B + \eta_i + \epsilon_{i,B_2} \right\} = \operatorname{Cov}\left\{\eta_i + \epsilon_{i,B_1}, \eta_i + \epsilon_{i,B_2} \right\} \\ &= \sigma^2_\eta \end{aligned} $$ where $\mu_B$ is the fixed effect of B, $\eta_i$ is the random effect of patient $i$ and $\epsilon_{i,B_j}$ are measurement errors.

The random effects $\eta_i$ are similar to measurement errors in the sense that the $\eta_i$ are iid $\operatorname{Normal}(0, \sigma^2_\eta)$, while the errors are iid $\operatorname{Normal}(0, \sigma^2)$ and independent of the $\eta_i$. Since all measurements of patient $i$ share the random component $\eta_i$, they are correlated.
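The covariance result can be checked numerically. The sketch below uses base R only; the values of `mu_B`, `sigma_eta` and `sigma` are arbitrary assumptions. It simulates many patients with two site-B measurements each. Because each measurement has variance $\sigma^2_\eta + \sigma^2$, the implied correlation between two measurements of the same patient is $\sigma^2_\eta / (\sigma^2_\eta + \sigma^2)$.

```r
# Monte Carlo check of Cov{x_{i,B1}, x_{i,B2}} = sigma_eta^2 under
# x = mu_B + eta_i + eps. All parameter values are assumptions.
set.seed(42)
n         <- 1e5  # number of simulated patients
mu_B      <- 11
sigma_eta <- 2
sigma     <- 0.5

eta  <- rnorm(n, 0, sigma_eta)            # one random effect per patient
x_B1 <- mu_B + eta + rnorm(n, 0, sigma)   # first measurement at site B
x_B2 <- mu_B + eta + rnorm(n, 0, sigma)   # second measurement, same patient

cov(x_B1, x_B2)  # should be close to sigma_eta^2 = 4
cor(x_B1, x_B2)  # should be close to 4 / (4 + 0.25)
```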

Compare this within-patient covariance with the covariance between measurements of two different patients $i$ and $j$ at site B:

$$ \begin{aligned} \operatorname{Cov}\left\{x_{i,B_1}, x_{j,B_1}\right\} &= \operatorname{Cov}\left\{\mu_B + \eta_i + \epsilon_{i,B_1}, \mu_B + \eta_j + \epsilon_{j,B_1} \right\} = \operatorname{Cov}\left\{\eta_i + \epsilon_{i,B_1}, \eta_j + \epsilon_{j,B_1} \right\} \\ &= 0 \end{aligned} $$ because the random effects $\eta_i$ and $\eta_j$ are independent, the errors are independent, and the $\eta$s and $\epsilon$s are independent of each other.
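Both results can be collected into the marginal covariance matrix implied by the model, $V = \sigma^2_\eta Z Z^\top + \sigma^2 I$, where $Z$ is the patient indicator matrix. The sketch below builds $V$ in base R for a small hypothetical design (patient i with A1, B1, B2; patient j with A1, B1), using assumed values $\sigma_\eta = 2$ and $\sigma = 0.5$.

```r
# Sketch: implied marginal covariance V = sigma_eta^2 * Z Z' + sigma^2 * I
# for a small hypothetical design. Variance values are assumptions.
sigma_eta <- 2
sigma     <- 0.5

toy <- data.frame(
  patient_ID = factor(c("i", "i", "i", "j", "j")),
  material   = c("A", "B", "B", "A", "B")
)
Z <- model.matrix(~ 0 + patient_ID, data = toy)  # one indicator column per patient
V <- sigma_eta^2 * Z %*% t(Z) + sigma^2 * diag(nrow(toy))

labs <- c("i,A1", "i,B1", "i,B2", "j,A1", "j,B1")
dimnames(V) <- list(labs, labs)
V
```

With these values the diagonal entries are 4.25 ($\sigma^2_\eta + \sigma^2$), every off-diagonal entry within a patient is 4 ($\sigma^2_\eta$, whether the pair is A1/B1 or B1/B2), and every entry across patients is 0, so $V$ is block diagonal by patient.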

PS: An alternative to the linear mixed model (LMM) is generalized least squares (GLS). With GLS you specify the variance/covariance structure explicitly. For example, you might want to let each material/site have a different variance while multiple measurements taken from the same patient are correlated as above. In R you can fit a GLS model with nlme::gls.
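A minimal sketch of the GLS alternative with nlme::gls on simulated data. Assumptions: the data are hypothetical (40 patients; for simplicity every patient has one A sample, one B sample, two C samples and two D samples); `corCompSymm` gives all measurements of a patient one common correlation, and `varIdent` lets the residual variance differ by material. This is one way to express the structure described above, not the only one.

```r
library(nlme)

set.seed(1)
n_patients <- 40
# Hypothetical, simplified layout: 1 x A, 1 x B, 2 x C, 2 x D per patient
d <- expand.grid(patient_ID = factor(seq_len(n_patients)),
                 material   = c("A", "B", "C", "D"),
                 rep        = 1:2)
d <- d[d$rep == 1 | d$material %in% c("C", "D"), ]
d <- d[order(d$patient_ID), ]

mu  <- c(A = 10, B = 11, C = 12, D = 10.5)  # assumed material means
eta <- rnorm(n_patients, 0, 2)              # assumed patient effects, SD 2
d$X <- unname(mu[as.character(d$material)]) +
  eta[as.integer(d$patient_ID)] +
  rnorm(nrow(d), 0, 0.5)                    # assumed measurement error, SD 0.5

fit_gls <- gls(
  X ~ material,
  data        = d,
  correlation = corCompSymm(form = ~ 1 | patient_ID),  # common within-patient correlation
  weights     = varIdent(form = ~ 1 | material)        # separate variance per material
)
summary(fit_gls)
```

Unlike the LMM, this model has no random effects: the within-patient dependence is stated directly through the `correlation` argument. A random-intercept LMM with a single error variance implies a compound-symmetric covariance within each patient, so `varIdent` is what adds flexibility here.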

dipetkov