
Permutation tests (also called randomization tests, re-randomization tests, or exact tests) are very useful when the normality assumption required by, for instance, the t-test is not met, and when the rank transformation used by a non-parametric test such as the Mann-Whitney U test would discard too much information. However, one assumption should not be overlooked when using this kind of test: the exchangeability of the samples under the null hypothesis. It is also worth noting that this approach can be applied when there are more than two samples, as implemented in the coin R package.

Can you please use some figurative language or conceptual intuition in plain English to illustrate this assumption? This would be very useful to clarify this overlooked issue among non-statisticians like me.

Note:
It would be very helpful to mention a case where a permutation test is invalid because this assumption does not hold.

Update:
Suppose that I have 50 subjects recruited at random from the local clinic in my district. They were randomly assigned to receive a drug or a placebo at a 1:1 ratio. They were all measured for parameter 1 (Par1) at V1 (baseline), V2 (3 months later), and V3 (1 year later). All 50 subjects can be subgrouped into 2 groups based on feature A: A positive = 20 and A negative = 30. They can also be subgrouped into another 2 groups based on feature B: B positive = 15 and B negative = 35.
Now, I have values of Par1 from all subjects at all visits. Under the assumption of exchangeability, can I compare levels of Par1 using a permutation test if I would:
- Compare subjects who received the drug with those who received the placebo at V2?
- Compare subjects with feature A with those having feature B at V2?
- Compare subjects having feature A at V2 with those having feature A but at V3?
- In which of these situations would the comparison be invalid and violate the assumption of exchangeability?

doctorate
  • 4
    Suppose I had each observation on a separate sheet of loose-leaf paper and as I handed you the stack, I slipped, and the sheets came flying out in all directions as they settled to the floor. It would be a shame if that destroyed the validity of the test you were hoping to perform on those data. If your observations are exchangeable and you were applying a test based on that, you'd comfort me and tell me not to worry while helping me collect up the papers off the floor. If not, and the data collection was especially expensive, I might need to run for my life. – cardinal Nov 13 '13 at 13:00
  • 2
    On the other hand, order does matter for things like time-series data (in general) and tests should generally respect this order in an appropriate way. – cardinal Nov 13 '13 at 13:01
  • 1
    @cardinal, your intuitive story paints a vivid picture of what this assumption looks like, but I am still confused about how to judge whether the fallen valuable papers were exchangeable or not. You may run for another comment if it is possible! – doctorate Nov 13 '13 at 13:13

1 Answer


First, the non-figurative description: Exchangeability means that the joint distribution is invariant to permutations of the values of each variable in the joint distribution (i.e., $f_{XYZ}(x=1, y=3, z=2) = f_{XYZ}(x=3, y=2, z=1)$, etc.). If this is not the case, then counting permutations is not a valid way of testing the null hypothesis, as each permutation will have a different weight (probability/density). Permutation tests depend on each assignment of a given set of numerical values to your variables having the same density/probability.
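
To make "counting permutations" concrete, here is a minimal sketch of a two-sample Monte Carlo permutation test in Python. The function name `permutation_test`, the difference-in-means statistic, and the `drug`/`placebo` numbers are all hypothetical and chosen only for illustration. Note that every shuffle is counted with the same weight, which is exactly the step that relies on exchangeability.

```python
import random
from statistics import mean


def permutation_test(x, y, n_resamples=5000, seed=0):
    """Two-sided Monte Carlo permutation test for a difference in means.

    Assumes the pooled observations are exchangeable under the null
    hypothesis, so every relabelling is treated as equally likely.
    """
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n_x = len(x)
    observed = abs(mean(x) - mean(y))
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # random relabelling of the pooled data
        stat = abs(mean(pooled[:n_x]) - mean(pooled[n_x:]))
        if stat >= observed:
            hits += 1
    # Add 1 to numerator and denominator so the estimate is never exactly 0.
    return (hits + 1) / (n_resamples + 1)


# Hypothetical Par1 values, invented purely for this example.
drug = [5.1, 6.3, 5.8, 6.9, 6.1, 5.7]
placebo = [4.2, 4.9, 5.0, 4.4, 5.3, 4.6]
p_value = permutation_test(drug, placebo)
print(p_value)
```

If the relabellings were not equally likely under the null, the simple fraction `hits / n_resamples` would no longer estimate a valid p-value.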

A concrete example where exchangeability is absent: You have N jars, each filled with 100 numbered tickets. The first M jars have tickets with only the odd numbers from 1 to 200 (1 ticket per number); the remaining N-M jars have tickets with only the even numbers from 1 to 200. If you select a ticket from each jar at random, you get a joint distribution on sample results. In this case, $f(X_1=1, X_2=2, X_3=3, \dots, X_N=N) \neq f(X_1=N, X_2=N-1, X_3=N-2, \dots, X_N=1)$

so you cannot just count permutations of the values 1 through N. In general, exchangeability fails when your sample can be stratified into sub-groups (as I have done with the jars). Exchangeability would be restored if, instead of taking 1 sample from each of N jars, you took N samples from 1 jar. Then, the joint distribution would be invariant to permutations.
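
The jar example can be sketched in a few lines of Python. The helper names `draw_from_jars` and `is_possible`, and the choice of 4 jars with 2 odd jars, are assumptions made for illustration only. The sketch draws one ticket per jar and then checks whether a reordering of the same values could have been produced by the same jars.

```python
import random


def draw_from_jars(n_jars, n_odd_jars, rng):
    """Draw one ticket from each jar.

    The first n_odd_jars jars hold the odd numbers 1..199; the remaining
    jars hold the even numbers 2..200.
    """
    odd_tickets = range(1, 200, 2)
    even_tickets = range(2, 201, 2)
    return [
        rng.choice(odd_tickets) if j < n_odd_jars else rng.choice(even_tickets)
        for j in range(n_jars)
    ]


def is_possible(values, n_odd_jars):
    """Return True if this assignment of values to jars has nonzero probability."""
    return all(
        (v % 2 == 1) if j < n_odd_jars else (v % 2 == 0)
        for j, v in enumerate(values)
    )


rng = random.Random(1)
sample = draw_from_jars(4, 2, rng)       # [odd, odd, even, even]
reversed_sample = sample[::-1]           # [even, even, odd, odd]
print(is_possible(sample, 2))            # True
print(is_possible(reversed_sample, 2))   # False
```

The reversed ordering puts even tickets in the odd jars, so it can never occur, while the original ordering can. A permutation test that counted both orderings equally would therefore be using the wrong reference distribution.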

  • 1
    +1. Although exchangeability is well explained, I still stumbled when trying to apply the jars metaphor to the study at hand (please see the update to the question). Given the duration of visits and the subgrouping based on features, how can I judge whether the comparison of these values would be exchangeable or not? – doctorate Nov 13 '13 at 17:15
  • @doctorate: It sounds like you are stratifying your groups by factors that are relevant to the outcome of Par1, correct? As long as you permute within a particular A/B feature quadrant, I would assume your subjects are exchangeable. Your first test, which cuts across the features, will need further processing before you can use a test that relies on exchangeability. In particular, you need to quantify the effect of the treatment and correct for the confounding effects of features A and B; otherwise, group size will influence the overall results (Simpson's paradox). –  Nov 13 '13 at 18:37
  • 1
    @doctorate: I realized that my above comment may have been kind of oblique with respect to what you want: the jars in your case would be the pairs of features, i.e., (A+,B+), (A-,B+), (A+,B-), (A-,B-), for a total of 4 "jars". Does that help make it more concrete? –  Nov 13 '13 at 19:49
  • Thanks, but what confuses non-statisticians like me is how one can judge whether this assumption is met or not. There are often tests to examine assumptions; e.g., for normality there is the Shapiro-Wilk test. But I wonder what test would examine exchangeability? Otherwise the definition would be difficult or vague, and two statisticians may not agree on this or that subgrouping. As you mentioned, within an A/B quadrant there is no problem, but for Drug/Placebo you showed some concern. So is there any acid test for this assumption? – doctorate Nov 14 '13 at 08:29
  • The other issue left without comment, what about test 3 in the update, between two groups at two different time points?! – doctorate Nov 14 '13 at 08:33
  • Ah, sorry: Since you are trying to establish a difference, your null hypothesis in each case will be that both samples came from the same distribution. Therefore, test 3 is totally fine. For test 2, you will need to tease apart the group effect vs. the treatment effect in V1->V2 using the placebo groups for A and B in each time period. For test 1, you will need to do the opposite: you will need to remove the group effects to get a common "placebo group effect" sample and a "treatment group effect" sample. You see what we are doing? You need to make it so that the only difference is the treatment. –  Nov 14 '13 at 19:32
  • 2
    As for exchangeability, there is no "test" for it. Unlike independence (which is testable), exchangeability is more of a modelling assumption: had you taken repeated samples like the one you took, you would find that each permutation occurs exactly the same fraction of the time. You only have 1 sample, so you can't "test" it. –  Nov 14 '13 at 19:40
  • Here's a simple, concrete example of why you can't test exchangeability with 1 sample: You have 100 coins that you toss simultaneously and then note how many heads you have. Are the coins in this sample exchangeable? With only 1 sample, you wouldn't know. You could test for independence, but that is just a specific case of exchangeability. What if 25 of the coins were two-headed and 25 were two-tailed? In that case, the coins are not exchangeable, and modeling them as a Binomial(100, 0.5) would overstate the sampling variability. Does that help? Look up exchangeability on Google too. –  Nov 14 '13 at 19:44
  • @user31668: It is not true that you cannot test exchangeability - that is precisely what permutation tests are for. In the case of coin tosses, exchangeability could be tested with a single sample by using a simple runs test (i.e., compare the observed number of runs to its distribution under the null hypothesis of exchangeability). – Ben Apr 17 '18 at 05:00
  • @Ben: I don't believe there are any useful omnibus tests for exchangeability (neither for independence). Only tests against very specific alternatives. So no test which can (from one sample) distinguish between iid and exchangeability – kjetil b halvorsen Dec 31 '18 at 01:47
  • @user31668: Consider a sequence of 100 outcomes which results in 50 "heads" and then 50 "tails" (in that order). Were the outcomes exchangeable? Do you seriously maintain that you cannot test exchangeability in this case? – Ben Dec 31 '18 at 05:43
  • 1
    Actually, the way you described the problem, $f(X_1=1,X_2=2,X_3=3...X_N=N)= f(X_1=N,X_2=N-1,X_3=N-2...X_N=1)=0$. Since the first M should only have odds, and the last N-M should only have evens. – Maverick Meerkat Aug 29 '21 at 13:57
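
For readers following the disagreement in the comments, the runs test Ben describes can be sketched as a Monte Carlo permutation test in Python. The names `count_runs` and `runs_permutation_test` are hypothetical, and the test is one-sided (too few runs), so it only has power against clustering in the order of the outcomes; it is not an omnibus test of exchangeability, which is the point kjetil b halvorsen raises.

```python
import random


def count_runs(seq):
    """Number of maximal blocks of identical consecutive outcomes."""
    return 1 + sum(a != b for a, b in zip(seq, seq[1:]))


def runs_permutation_test(seq, n_resamples=2000, seed=0):
    """One-sided Monte Carlo runs test.

    Estimates how often a random reordering of seq has as few runs as
    the observed order. Assumes that, under the null hypothesis, every
    ordering of the outcomes is equally likely.
    """
    rng = random.Random(seed)
    observed = count_runs(seq)
    shuffled = list(seq)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(shuffled)
        if count_runs(shuffled) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_resamples + 1)


# The sequence from the comments: 50 heads followed by 50 tails.
outcomes = ["H"] * 50 + ["T"] * 50
n_runs, p_value = runs_permutation_test(outcomes)
print(n_runs, p_value)
```

A very small p-value here is evidence against "every ordering is equally likely" in the direction of clustering. A sequence that violates exchangeability in some other way could still pass this particular check.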