Permutation test in R

Question

I have the following data for 10 subjects based on before and after measurements:

x <- c(12.9, 13.5, 12.8, 15.6, 17.2, 19.2, 12.6, 15.3, 14.4, 11.3) 
y <- c(12.7, 13.6, 12.0, 15.2, 16.8, 20.0, 12.0, 15.9, 16.0, 11.1)

and would like to perform a permutation test.

I used the permTS(...) to perform a two-sided t-test and obtained a value of 0.982.

But I would like to use the expand.grid(rep(x,k )) by interchanging the before and after tags. Does anyone know how this can be achieved?

Given the numbers of possible permutations it is not possible to stimulate every permutation, hence why I would like to use expand.grid to check if the same result is obtained.

Thanks

whuber · Accepted Answer · 2012-11-19T20:24:44.213

14

What you are saying is you would like to compare the paired t-test statistic to the distribution of such statistics obtained by independently switching all possible pairs of data. There are $2^{10}=1024$ such switches, small enough to enable fast computation of the full distribution.

It is convenient in R to code this as a t-test for the difference between the two sets of data: instead of switching values, we merely need to negate them.

Let's first run the t-test:

x <- c(12.9, 13.5, 12.8, 15.6, 17.2, 19.2, 12.6, 15.3, 14.4, 11.3)
y <- c(12.7, 13.6, 12.0, 15.2, 16.8, 20.0, 12.0, 15.9, 16.0, 11.1)
(value <- t.test(x,y, paired=TRUE, alternative="two.sided"))

The statistic and p-value are $-0.213$ and $0.836$, as expected. Now let's generate the permutation distribution (using expand.grid as requested):

perms <- do.call(expand.grid, lapply(as.list(1:length(x)), function(i) c(-1,1)))
dist <- apply(perms, 1, function(p) t.test(p*(x-y), alt="t")$statistic)

(This takes $0.33$ seconds.) As a quick check, let's graph the results:

hist(dist)
abline(v = value$statistic, col="Red", lwd=2)

Permutation distribution

Because the actual statistic is near the middle of the distribution and this is a two-sided test, the p-value looks approximately to be $0.9$ or so. We can compute it:

sum(abs(dist) > abs(value$statistic)) / 2^length(x)

The result is $0.836$, the same as the t-distribution gave us.

edited Nov 19 '12 at 20:24

answered Nov 19 '12 at 19:53

whuber

322,774

1

Thanks, this is what I was looking for. I was n't sure on how to use expand.grid
Also just to add, I thought the permutations where 2^20, which did indeed take a while.

Many thanks
– user1453477 Nov 19 '12 at 20:01
2

$2^{20}$ takes $6$ minutes :-). (That's about the limit of my patience: by that time, we should be content to sample from the permutation distribution.) – whuber Nov 19 '12 at 20:25
Since the p-value is the probability for the test statistic being at least as extreme as the empirically observed value, shouldn't the calculation be sum(abs(dist) >= abs(value$statistic)) / 2^length(x), giving a p-value of 0.8710938? – caracal Nov 19 '12 at 22:56
@caracal That's a subtle point which I should have checked. The issue here is that the permutation distribution is discrete. It might be best to report that the p-value lies in the interval $[0.836, 0.871]$. The difference in this case is of course inconsequential, due to the large p-value(s), but it becomes of consequence when the interval spans one's threshold of significance. By reporting the full interval, you allow readers to apply (for instance) randomized tests. – whuber Nov 19 '12 at 23:12
Hi, just to follow up from that remark, why is the permutation test statistic value almost exactly the same value as the original t-test? Given that this is a permutation of all possible values, should n't the t value be different, or is it a unique case for this data.
Thanks.
– user1453477 Nov 26 '12 at 19:47
1

It's an accident that they are so close, but it is no accident that they are pretty close: the t-test is a good approximation to the permutation test in most cases. – whuber Nov 26 '12 at 20:42

AdamO · Answer 2 · 2012-12-12T17:10:41.853

2

You should be using a paired T-test since these are paired data, not a 2 sample test.

All possible permutations of pre-post data would be obtained using

expand.grid(pre=x, post=y)

And we know it's only a 2 by 100 matrix which is far from impossible. I don't know why you're replicating x, or what k is.

edited Dec 12 '12 at 17:10

answered Nov 19 '12 at 19:07

AdamO

62,637

I wrote expand.grid(rep(x,k)) as an example of the function. My question is how would I perform the test after obtaining this matrix, i.e. how to calculate the p-value and check if it matches the one from permTS() – user1453477 Nov 19 '12 at 19:19
This is where I got the data from: http://www.r-bloggers.com/paired-students-t-test/ – user1453477 Nov 19 '12 at 19:32

Permutation test in R

2 Answers2

Linked

Related