12

Suppose we have two samples, say A and B, each consisting of some integers, and we wish to determine whether they are drawn from the same distribution.

If we test this using a two-sample permutation test, specifically by counting permutations where the difference in the sample means is at least as extreme as the observed difference: is there any reason to think that we can calculate the two-tailed p-value by looking at one tail and doubling its probability?

This is what my lecture notes appear to say, but I don't understand why we can assume that the tails are symmetric (or why the method doesn't entail that assumption). No explanation was given.

NULL
  • 580
Harri
  • 121

1 Answer

14

The permutation distribution of your test statistic is not guaranteed to be symmetric, so you can't do it that way. Instead, you add both tails. In your case of two independent samples, the null hypothesis is that the two location parameters are equal. Assuming continuous distributions and equal spread in both groups, we have exchangeability under the null hypothesis. The test statistic $T$ is the difference in means, with $E(T) = 0$ under the null.

The value of $T$ in the original sample is $T_{\text{emp}}$, and its values for the permutations are $T^{\star}$. $\sharp(\cdot)$ is short for "number of" something, e.g., $\sharp(T^{\star})$ is the number of permutation test statistics. Then the $p$-value for the two-sided hypothesis is $p_{\text{ts}} = p_{\text{left}} + p_{\text{right}}$, where

$p_{\text{left}} = \frac{\sharp(T^{\star} \le \min(T_{\text{emp}}, -T_{\text{emp}}))}{\sharp(T^{\star})}$

$p_{\text{right}} = \frac{\sharp(T^{\star} \ge \max(T_{\text{emp}}, -T_{\text{emp}}))}{\sharp(T^{\star})}$

(assuming we have the complete permutation distribution). Let's compare both approaches for the case of two independent samples when we can calculate the exact (complete) permutation distribution.

set.seed(1234)
Nj   <- c(9, 8)                      # group sizes
DVa  <- rnorm(Nj[1], 5, 20)^2        # data group 1
DVb  <- rnorm(Nj[2], 10, 20)^2       # data group 2
DVab <- c(DVa, DVb)                  # data from both groups
IV   <- factor(rep(c("A", "B"), Nj)) # grouping factor
idx  <- seq(along=DVab)              # all indices
idxA <- combn(idx, Nj[1])            # all possible first groups

# function to calculate test statistic for a given permutation x
getDM <- function(x) { mean(DVab[x]) - mean(DVab[!(idx %in% x)]) }
resDM <- apply(idxA, 2, getDM)       # test statistic for all permutations
diffM <- mean(DVa) - mean(DVb)       # empirical test statistic

Now calculate the $p$-values and validate the proposed solution against the implementation in R's coin package. Observe that $p_{\text{left}} \neq p_{\text{right}}$, so it matters which tail you double when calculating $p_{\text{ts}}$.

> (pL <- sum(resDM <= min(diffM, -diffM)) / length(resDM))  # left p-value
[1] 0.1755245

> (pR <- sum(resDM >= max(diffM, -diffM)) / length(resDM))  # right p-value
[1] 0.1585356

> 2*pL        # doubling left p-value
[1] 0.351049

> 2*pR        # doubling right p-value
[1] 0.3170712

> pL+pR       # two-sided p-value
[1] 0.3340601

> sum(abs(resDM) >= abs(diffM)) / length(resDM)  # two-sided p-value (more concise)
[1] 0.3340601

# validate with coin implementation
> library(coin)              # for oneway_test()    
> oneway_test(DVab ~ IV, alternative="two.sided", distribution="exact")
Exact 2-Sample Permutation Test
data:  DVab by IV (A, B) 
Z = 1.0551, p-value = 0.3341
alternative hypothesis: true mu is not equal to 0 

P.S. For the Monte-Carlo case where we only sample from the permutation distribution, the $p$-values would be defined like this:

$p_{\text{left}} = \frac{\sharp(T^{\star} \le \min(T_{\text{emp}}, -T_{\text{emp}})) + 1}{\sharp(T^{\star}) + 1}$

$p_{\text{right}} = \frac{\sharp(T^{\star} \ge \max(T_{\text{emp}}, -T_{\text{emp}})) + 1}{\sharp(T^{\star}) + 1}$

$p_{\text{ts}} = \frac{\sharp(|T^{\star}| \ge |T_{\text{emp}}|) + 1}{\sharp(T^{\star}) + 1}$

Intuitively, the reason for adding one more extreme case is that we need to count the empirical sample itself as one of the permutations. Otherwise the permutation $p$-value could be 0, which cannot happen in the continuous case (see here; note that some texts recommend this correction and some don't).
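A minimal Monte-Carlo sketch of those definitions, using the same data as above (the names `nPerm` and `TstarMC` are illustrative choices, not from the answer):

```r
set.seed(1234)
Nj    <- c(9, 8)                     # group sizes
DVa   <- rnorm(Nj[1], 5, 20)^2       # data group 1
DVb   <- rnorm(Nj[2], 10, 20)^2      # data group 2
DVab  <- c(DVa, DVb)                 # data from both groups
Temp  <- mean(DVa) - mean(DVb)       # empirical test statistic

nPerm   <- 9999                      # number of sampled permutations
TstarMC <- replicate(nPerm, {
    x <- sample(DVab)                # random relabeling of all observations
    mean(x[1:Nj[1]]) - mean(x[(Nj[1]+1):sum(Nj)])
})

# p-values with the +1 correction counting the empirical sample itself
pL  <- (sum(TstarMC <= min(Temp, -Temp)) + 1) / (nPerm + 1)
pR  <- (sum(TstarMC >= max(Temp, -Temp)) + 1) / (nPerm + 1)
pTS <- (sum(abs(TstarMC) >= abs(Temp)) + 1) / (nPerm + 1)
```

With enough sampled permutations, `pTS` should be close to the exact two-sided value from the complete enumeration above.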

caracal
  • 12,009
  • Doesn't this assume the expectation of $T$ is zero? – whuber Apr 05 '12 at 19:02
  • @whuber I added that with a null hypothesis of equal location parameters in both groups, we have exchangeability under the null and $E(T) = 0$ under the null (assuming continuity and equal spread). – caracal Apr 05 '12 at 19:19
  • Thanks, that's an improvement. Could you explain then how the statistic could fail to have a symmetric distribution under this assumption? – whuber Apr 05 '12 at 19:20
  • 2
    @whuber The permutation distribution can be asymmetric as it depends on the sample values. Values group A: 1, values group B: 2, 2. Three relevant permutations are possible, yielding $T^{\star} = \{-1, .5, .5\}$. – caracal Apr 05 '12 at 19:47
  • Thank you for the clarification: I follow the logic now. – whuber Apr 05 '12 at 20:21
  • Instead of max(diffM, -diffM) and min(diffM, -diffM), abs(diffM) and -abs(diffM) are clearer, imo. – Firebug May 31 '20 at 19:10
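The three-permutation example from the comments can be reproduced directly, reusing the `combn` approach from the answer (a quick sketch):

```r
DVab  <- c(1, 2, 2)                  # group A: 1; group B: 2, 2
idxA  <- combn(seq(along=DVab), 1)   # all possible first groups
Tstar <- apply(idxA, 2, function(x) mean(DVab[x]) - mean(DVab[-x]))
Tstar                                # -1.0  0.5  0.5 -- not symmetric around 0
```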