
I have data that looks something like this:

Patient | Side  |  X  |  Y  |  Z
--------|-------|-----|-----|-----
1       | Left  | 1.3 | 0.5 | 3
1       | Left  | 1.2 | 0.6 | 3.14
1       | Right | 1.3 | 0.5 | 4.5
1       | Right | 1.4 | 0.4 | 31
2       | Left  | 1.3 | 0.5 | 3
2       | Left  | 1.3 | 0.5 | 3
2       | Right | 1.3 | 0.5 | 3

Here Patient is the patient identifier, Side is the side identifier, and X, Y, Z, ... are factors (all of them continuous). Generally, I want to determine which factors differ significantly between the left and right sides. However, I cannot use simple tests or linear models because most of the data are dependent (one patient has multiple observations for each side). At the moment I have just grouped the data by patient, computed the mean for each one, and performed a two-sided paired t-test. But I assume that this is neither the best nor even a good choice.

Can you advise a better solution?


1 Answer


Your current approach (grouping the data by patient, computing per-patient means, and performing a paired t-test) isn't that bad, actually. There are two potential problems with it, though:

  • Collapsing your samples by taking the mean for each patient makes it look like you have fewer observations than you actually do, reducing your statistical power.
  • You lose your estimate of the variability within patients. If you have a highly reliable measure, you should get a boost in power from that; and if you have a very noisy measure, that error should be taken into account in your final estimates. Moreover, if you have more measurements from some patients than others, ideally you would like to be able to weight those patients more heavily (they're more informative) than patients with fewer observations.

All of the above can be accomplished neatly by analyzing this as a mixed effects model.

Mixed effects model

To do this, you would run a separate model for each outcome (X,Y,Z, etc.). If that doesn't meet your needs or if you have so many outcome variables that this is ungainly, you'll want to switch to a multivariate analysis.

Side (left or right) would be a fixed effect in the model, and Patient would be a random effect with observations nested within Patient. I would recommend allowing both intercepts and slopes (the effect of Side) to vary randomly, as per Barr et al., 2013.

In R, you can do that with the lme4 or nlme packages.
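For example, with lme4 the model for one outcome might look like this (a sketch, assuming your data frame is called `dat` with columns named `Patient`, `Side`, and `X` as in your table; `lmerTest` is an optional add-on package that supplies p-values for the fixed effects):

```r
library(lme4)
library(lmerTest)  # optional: adds p-values for fixed effects to summary()

# Make sure the grouping variables are factors
dat$Patient <- factor(dat$Patient)
dat$Side    <- factor(dat$Side)

# Random intercept and random slope of Side within Patient,
# i.e. the "maximal" structure recommended by Barr et al. (2013):
m <- lmer(X ~ Side + (1 + Side | Patient), data = dat)
summary(m)

# With very few patients the random-slope model may fail to
# converge; a random-intercept-only model is a common fallback:
m_simple <- lmer(X ~ Side + (1 | Patient), data = dat)
```

You would then fit the same model separately with Y, Z, etc. as the outcome. Note that with only four patients, estimating the random-effects variances will be difficult, so be prepared to simplify the random-effects structure.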

Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255–278. https://doi.org/10.1016/j.jml.2012.11.001

Rose Hartman
  • 2,185
  • Thank you for the answers! > You can try to average/pool the multiple observations per side for a given patient. I did exactly that. However, with this approach I am losing a lot of information (at the moment I have only four patients). > Mixed effects model Thank you, Rose! I will try that. I thought I didn't have an account here and couldn't upvote you, I am sorry. – user3365834 Jan 05 '18 at 22:59
  • You can comment and upvote provided you log on with your original user ID. – whuber Jan 05 '18 at 23:18