
I understand what "repeated measures" denotes: given a set of subjects, if each subject receives more than one treatment, we call this repeated measures, or "within subjects." But what if we give the same subject the same treatment at different times, and time itself is not a relevant factor?

Let me give a more specific, albeit graphic, example. Say that I am giving a patient a pill that will cause their body to purge in a multitude of ways: they will sweat, cry, urinate, etc. The same pill is used each time, so there is no difference in the treatment. I will measure the quantity of each output, and I will do this 7 times for each patient to get an accurate representation of these outputs. The independent variables are patient and type of excretion (both categorical, non-ordinal). The dependent variable is the amount of liquid excreted (continuous).

I want to ask two questions of my data:

  1. Is there a statistically significant difference between the patients?
  2. Is there a statistically significant difference between the types of excretion in the amount that is released?

Do these two questions require two separate statistical analyses, or is there a single test that can help me answer both questions?

I have been breaking the data into separate sets by excretion type and running a Kruskal-Wallis test on each (the data is non-parametric) to identify whether there is a difference between patients for that excretion type. But what if I want to talk about the difference across patients more generally?

Is this considered repeated measures?

I have dug myself into quite a statistical rabbit hole on this one.

1 Answer


The question comes down to the independence among the observations. Even if you don't care about the time course, an individual who cried more than others in one exposure to the drug is likely to do so in other exposures. You can't treat the 7 drug exposures of that individual as independent observations; it's not the same as having 7 separate individuals who happened to have the same responses to the drug. Yet simple statistical tests assume that all observations are independent.
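To make the dependence concrete, here is a minimal Python sketch (not part of the original analysis) that estimates the one-way intraclass correlation (ICC) from repeated exposures. The function name `icc_oneway`, the simulated patient effects, and every numeric setting are illustrative assumptions. An ICC well above zero means that exposures within one patient resemble each other more than exposures from different patients, which is exactly what breaks the independence assumption.

```python
import random
from statistics import mean


def icc_oneway(groups):
    """One-way ANOVA estimate of the intraclass correlation, ICC(1).

    `groups` is a list of equal-length lists, one per patient,
    holding that patient's repeated measurements.
    """
    n = len(groups)        # number of patients
    k = len(groups[0])     # exposures per patient
    if any(len(g) != k for g in groups):
        raise ValueError("balanced design required")
    grand = mean(v for g in groups for v in g)
    group_means = [mean(g) for g in groups]
    # Mean squares between and within patients.
    ms_between = k * sum((m - grand) ** 2 for m in group_means) / (n - 1)
    ms_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical simulation: 20 patients, 7 exposures each.
rng = random.Random(0)
simulated = []
for _ in range(20):
    patient_effect = rng.gauss(0.0, 2.0)  # stable per-patient tendency
    simulated.append(
        [10.0 + patient_effect + rng.gauss(0.0, 0.5) for _ in range(7)]
    )

print(f"estimated ICC: {icc_oneway(simulated):.2f}")
```

With a large patient-to-patient spread relative to the exposure-to-exposure noise, as assumed here, the estimate should come out close to 1.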

One way to model this for a small number of patients (say 6 or fewer) and continuous outcomes would be with a multivariate analysis of variance, with patientID as a predictor and separate outcome columns for each type of secretion. A textbook appendix by Fox and Weisberg illustrates that with the classic iris data set. Alternatively, you could use both patientID and a categorical predictor of secretionType, including an interaction between the two predictors to allow for differences among patients with respect to secretion types.
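As a rough illustration of the second option (patientID, secretionType, and their interaction), the sketch below computes the classical sums of squares and F ratios for a balanced, fully crossed layout using only the standard library. The function name `two_way_anova`, the dictionary layout, and the toy numbers are assumptions made for illustration, and both factors are treated as fixed effects. A statistics package would normally do this for you and also handle unbalanced data.

```python
from statistics import mean


def two_way_anova(cells):
    """Sums of squares and F ratios for a balanced two-factor layout.

    `cells` maps (patient, secretion_type) -> list of replicate values.
    Every combination must be present with the same number (>= 2) of
    replicates.
    """
    patients = sorted({p for p, _ in cells})
    types = sorted({t for _, t in cells})
    a, b = len(patients), len(types)
    n = len(next(iter(cells.values())))
    if n < 2 or len(cells) != a * b or any(len(v) != n for v in cells.values()):
        raise ValueError("balanced, fully crossed design with replicates required")

    grand = mean(y for v in cells.values() for y in v)
    cell_mean = {key: mean(v) for key, v in cells.items()}
    p_mean = {p: mean(cell_mean[p, t] for t in types) for p in patients}
    t_mean = {t: mean(cell_mean[p, t] for p in patients) for t in types}

    ss = {
        "patient": b * n * sum((m - grand) ** 2 for m in p_mean.values()),
        "secretion": a * n * sum((m - grand) ** 2 for m in t_mean.values()),
        "interaction": n * sum(
            (cell_mean[p, t] - p_mean[p] - t_mean[t] + grand) ** 2
            for p in patients for t in types
        ),
        "residual": sum(
            (y - cell_mean[key]) ** 2 for key, v in cells.items() for y in v
        ),
    }
    df = {
        "patient": a - 1,
        "secretion": b - 1,
        "interaction": (a - 1) * (b - 1),
        "residual": a * b * (n - 1),
    }
    ms_resid = ss["residual"] / df["residual"]
    table = {}
    for term in ss:
        ms = ss[term] / df[term]
        f_ratio = ms / ms_resid if term != "residual" else None
        table[term] = {"ss": ss[term], "df": df[term], "ms": ms, "F": f_ratio}
    return table


# Hypothetical toy data: 2 patients x 2 secretion types x 2 exposures.
toy = {
    ("P1", "sweat"): [1.0, 3.0],
    ("P1", "tears"): [3.0, 5.0],
    ("P2", "sweat"): [5.0, 7.0],
    ("P2", "tears"): [7.0, 9.0],
}
for term, row in two_way_anova(toy).items():
    print(term, row)
```

Each F ratio would then be compared against an F distribution with the listed degrees of freedom (for example via `scipy.stats.f.sf`, if SciPy is available) to obtain a p-value. The patient row addresses the first question, the secretion row the second, and the interaction row whether patients differ in their pattern across secretion types.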

With a larger number of patients you might be more interested in the overall variance of responses among patients rather than specific individuals. That could be handled with a mixed model, with secretionType as the "fixed" predictor and patientID included for random effects. To allow for differences among patients with respect to secretion types, you could use both random intercepts and random slopes (coefficients for the secretion types) as you have a large number of replicates.
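Written out, the random-intercept, random-slope model described above could look like the following. The notation is an illustrative assumption rather than anything tied to a specific package: $i$ indexes patients, $j = 1, \dots, J$ indexes secretion types (with $j = 1$ as the reference type), and $k$ indexes the repeated exposures.

$$
y_{ijk} = (\beta_0 + b_{0i}) + (\beta_j + b_{ji}) + \varepsilon_{ijk}, \qquad \beta_1 = 0,\; b_{1i} = 0
$$

$$
(b_{0i}, b_{2i}, \dots, b_{Ji})^\top \sim \mathcal{N}(\mathbf{0}, \Sigma), \qquad \varepsilon_{ijk} \sim \mathcal{N}(0, \sigma^2)
$$

Here the $\beta_j$ are the fixed secretion-type effects, the $b_{0i}$ are patient-specific random intercepts, and the $b_{ji}$ are patient-specific deviations in each secretion-type effect. The question about differences among patients in general then becomes a question about the variance components in $\Sigma$.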

You say that "the data is non-parametric", but all data are non-parametric. "Parametric" is about the type of model used to describe the data. You might do well with the parametric models described above, perhaps with some transformations of outcome values. A proportional odds model can be considered a semi-parametric extension of non-parametric tests like Wilcoxon-Mann-Whitney and Kruskal-Wallis if your data can't readily be represented with parametric models.
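On transformations, one point is worth seeing in numbers: a model fitted to log-transformed outcomes describes the mean of the logs, so back-transforming gives a geometric mean rather than the arithmetic mean. The values below are hypothetical and chosen only to make the gap obvious.

```python
import math
from statistics import geometric_mean, mean

amounts = [1.0, 10.0, 100.0]  # hypothetical right-skewed outcome values

arithmetic = mean(amounts)                       # mean on the original scale
mean_of_logs = mean(math.log(y) for y in amounts)  # what a log-scale model targets
back_transformed = math.exp(mean_of_logs)        # equals the geometric mean

print(f"arithmetic mean:       {arithmetic:.2f}")
print(f"exp(mean of logs):     {back_transformed:.2f}")
print(f"geometric mean:        {geometric_mean(amounts):.2f}")
```

For right-skewed data like this, the back-transformed value is smaller than the arithmetic mean, so results from a log-scale model should be reported as statements about geometric means (or ratios), not about average amounts.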

EdM
  • Thank you for the thoughtful and thorough response. My apologies: what I meant is that the data are not normally distributed. They actually appear to be log-normal, but data transformations are not common practice in my field. Can the multivariate ANOVA be applied to non-normally distributed data? – data-toast Oct 09 '22 at 17:44
  • @DataOnToast the distributions of the data values themselves (whether outcomes or predictors) don't matter for ANOVA. What matters is the distribution of the error terms of observations around the model predictions. If they are close enough to normal, ANOVA is OK. There's nothing wrong with log transformations to make error distributions closer to normal, but then you're modeling the mean of the logs of the outcomes, not the mean of the outcomes. I would lean toward the second choice in that paragraph (patientID, secretionType and their interaction in a linear model), or a mixed model. – EdM Oct 09 '22 at 19:05
  • Thanks again, EdM. Two more questions for you, if I may -
    1. If the errors are NOT normally distributed (very cone-shaped residual plot / does not pass normality of residuals tests), then what?
    2. Where can I take your class? (;
    – data-toast Oct 10 '22 at 13:03
  • @DataOnToast 1. see this page and its links for suggestions about non-normal residuals. You often don't need normal residuals, if residuals don't change a lot as a function of estimated values and you have enough cases. Proportional odds (PO) models don't make assumptions about residuals; you can test the PO assumption and you can often get away with violations. 2. Almost everything I know about statistics I learned on this site, following its links to other resources. – EdM Oct 10 '22 at 13:26