
Quick question on a GLMM with a repeated-measures (crossover) design and how to deal with missing covariate data.

In this study, 28 participants complete two conditions (crossover design with a 1-week washout between conditions), comparing the glucose response (AUC) to 1) a sugary drink vs. 2) water. I am adjusting for several covariates in the model (e.g. BMI, age, fasting values, and physical activity before each trial condition).

For the physical activity (PA) covariate, data are missing for both conditions in 6 participants and for one condition in a further 4 participants (i.e. for those 4 I have complete covariate data for only one condition).

When I control for prior physical activity as a covariate, my total observations are 22 (suggesting that the participants with fully missing PA data are being dropped).

QU1: Is there a way to deal with this, or somehow keep these participants in the model to avoid the loss of sample size (imputation?), or does one just have to concede this as a limitation?

QU2: What is the mixed model actually doing with the participants who have partially missing PA data (i.e. missing for only one of the conditions)? They seem to be kept in the model.

Ben Bolker
Patrick
  • I don't understand your third paragraph. You have one of two conditions in 6 participants (?), then what is the difference with the other 4 you call partially missing? – Frans Rodenburg Feb 01 '18 at 01:23
  • Sorry, I should have been clearer (I added confusion with 'further'). 6 are missing activity data on both arms; 4 are missing data on only 1 arm. – Patrick Feb 01 '18 at 01:53

1 Answer


If the missing covariates are missing at random (MAR) or missing completely at random (MCAR), you can impute them. However, if a covariate were missing entirely from one of the two conditions being compared, you could not include it, as it would carry no information about the difference between those conditions.

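Mixed models, whether fitted by maximum likelihood or restricted maximum likelihood (REML), do not require every participant to contribute both conditions, so incomplete data can still be used to estimate the fixed effects. Rows with a missing covariate are dropped, but a participant with one complete row stays in the model. This is why you see $28-6=22$ participants being used: only those missing PA in both conditions are deleted (the fit itself uses $2 \times 18 + 4 = 40$ rows). This is also why you normally don't have to impute the 4 partially missing ones.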
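To see where the 22 comes from, here is a minimal bookkeeping sketch. It assumes long-format data with one row per participant per condition (56 rows in total) and row-wise complete-case deletion of rows with missing PA, which is what R's default `na.action = na.omit` does. The participant IDs, condition labels, and which participants are missing are made up for illustration. Under these assumptions, software reporting 22 is most likely counting participants (groups), while 40 rows enter the fit.

```python
# Hypothetical missingness pattern matching the question:
# 28 participants x 2 conditions, PA missing in both arms for 6
# participants and in exactly one arm for 4 more.
participants = list(range(1, 29))
conditions = ["sugar", "water"]

missing_both = set(range(1, 7))  # participants 1-6: PA missing in both arms
missing_one = {7: "water", 8: "water", 9: "sugar", 10: "sugar"}  # one arm missing

# Long-format rows: (participant, condition, pa_is_missing)
rows = []
for pid in participants:
    for cond in conditions:
        missing = pid in missing_both or missing_one.get(pid) == cond
        rows.append((pid, cond, missing))

# Complete-case deletion drops individual rows, not whole participants.
kept = [r for r in rows if not r[2]]
kept_ids = {pid for pid, _, _ in kept}

# Participants left with a single row still enter the mixed model as a group.
rows_per_id = {}
for pid, _, _ in kept:
    rows_per_id[pid] = rows_per_id.get(pid, 0) + 1
singletons = sorted(pid for pid, n in rows_per_id.items() if n == 1)

print(len(rows), len(kept), len(kept_ids), singletons)
```

The 4 partially missing participants end up with one row each: they still inform the fixed effects, but contribute less than the 18 complete participants.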

If you are using R, you could try the package mice for imputation, but carefully consider whether the 6 doubly missing physical activity values are MCAR, MAR or missing not at random (MNAR) before doing so.
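With multiple imputation, the model is fitted once per imputed data set and the results are then pooled, usually with Rubin's rules (in R, mice's `pool()` does this step). The sketch below shows only that pooling step for a single coefficient; the function name `pool_rubin` and the estimates and standard errors are hypothetical stand-ins for, say, the condition effect from $m = 5$ imputed fits.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across m imputed-data fits using Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two imputations with matching SEs")
    q_bar = mean(estimates)                       # pooled point estimate
    within = mean(se ** 2 for se in std_errors)   # average within-imputation variance
    between = variance(estimates)                 # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between        # Rubin's total variance
    return q_bar, math.sqrt(total)


# Hypothetical condition-effect estimates and SEs from m = 5 imputed data sets.
est = [1.20, 1.35, 1.10, 1.28, 1.22]
se = [0.30, 0.31, 0.29, 0.30, 0.32]
pooled_est, pooled_se = pool_rubin(est, se)
print(round(pooled_est, 3), round(pooled_se, 3))
```

The $(1 + 1/m)$ inflation of the between-imputation variance is what makes the pooled standard error reflect the extra uncertainty from not knowing the missing PA values.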

You can find various explanations of REML in the linked answer. Here is a simple one.