First of all, I need to use a permutation test to assess the significance of some data. In the permutation test, I shuffle the data and compare the statistic from the original data with the one from the shuffled data.

Then a problem arises: in some scenarios, the way I make the comparison significantly affects the result. For example, I can count the number of permutations in which the original value <= the shuffled value, or I can count the number in which the original value < the shuffled value. The two counts can be very different.
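To make the difference concrete, here is a minimal sketch of the two counting rules for an upper-tail test. The function name `permutation_pvalues` and the toy numbers are hypothetical and only for illustration. The "inclusive" rule counts ties as "as or more extreme" and adds 1 to the numerator and denominator so that the observed statistic is treated as one of the permutations; the "strict" rule ignores ties.

```python
import numpy as np

def permutation_pvalues(observed, null_stats):
    """One-sided (upper-tail) p-values under two counting rules.

    observed   : statistic computed on the original data
    null_stats : statistics computed on the shuffled data
    """
    null_stats = np.asarray(null_stats)
    n = null_stats.size
    n_ge = int(np.sum(null_stats >= observed))  # original <= shuffled (ties counted)
    n_gt = int(np.sum(null_stats > observed))   # original <  shuffled (ties ignored)
    return {
        # ties count as "as extreme"; +1 treats the observed value as a permutation
        "inclusive": (n_ge + 1) / (n + 1),
        # ties are dropped, which can make the p-value too small
        "strict": n_gt / n,
    }

# Toy example (made-up numbers): a discrete statistic with many ties.
observed = 3
null_stats = [1, 2, 3, 3, 3, 4, 5, 2, 3, 1]
p = permutation_pvalues(observed, null_stats)
print(p)
```

When the statistic takes only a few distinct values, many shuffled statistics equal the observed one, so the two rules can give very different answers, as in this toy example.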

Could anyone tell me a decent and robust way to perform the permutation test? Mainly, how should I compare the original and the shuffled data?

Or, under what circumstances does this problem arise?


Update

Thanks to @Glen_b and @jbowman for the comments. I should elaborate here.

First, it is a one-tailed test. The null hypothesis is that there is no difference between the original data and the shuffled data.

More specifically, suppose we have a network/graph in which each node has an attribute, such as age. The network has 5000+ nodes. Based on the nodes' connectivity, we want to calculate the degree of local enrichment.

To do that, I compare the sum of the attribute over a specific region with the same sum after shuffling the data.

Taken together, I use the permutation test to decide whether a region is enriched.
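The procedure described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the tmap implementation: the function name `neighborhood_enrichment_pvalues`, the representation of each region as an array of node indices, and the toy data are all hypothetical. It shuffles the attribute across nodes, recomputes the summed attribute per region, and counts how often the shuffled sum is at least as large as the observed one.

```python
import numpy as np

def neighborhood_enrichment_pvalues(neighborhoods, attr, n_perm=1000, seed=0):
    """Upper-tail permutation p-value of the summed attribute per region.

    neighborhoods : list of integer index arrays, one per region (assumed input)
    attr          : 1-D array with one attribute value per node
    """
    rng = np.random.default_rng(seed)
    attr = np.asarray(attr)
    observed = np.array([attr[idx].sum() for idx in neighborhoods])
    n_ge = np.zeros(len(neighborhoods), dtype=int)
    for _ in range(n_perm):
        shuffled = rng.permutation(attr)  # reassign attribute values to nodes
        null = np.array([shuffled[idx].sum() for idx in neighborhoods])
        n_ge += null >= observed          # ties count as "as extreme"
    # +1 in numerator and denominator counts the observed data as a permutation
    return observed, (n_ge + 1) / (n_perm + 1)

# Toy data: 20 nodes, the attribute is 1 on the first five nodes only.
attr = np.array([1] * 5 + [0] * 15)
neighborhoods = [np.arange(0, 5), np.arange(10, 15)]
observed, pvals = neighborhood_enrichment_pvalues(neighborhoods, attr)
print(observed, pvals)
```

In this toy setup the first region contains all the ones and should get a small p-value, while the second region has an observed sum of 0, so every shuffled sum ties or exceeds it and its p-value is 1.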

thliao
  • The definition of p-value involves statistics "as, or more extreme". Is this a one-or-two-tailed test? What's the hypothesis you're trying to test? (/what are you trying to find out?) What's the statistic you're using as your test statistic? – Glen_b Sep 20 '21 at 02:43
  • Also, how many observations do you have? The presence of a lot of ties needs a little explaining too... – jbowman Sep 20 '21 at 03:39
  • Thanks, @Glen_b. I have explained the details in the updated question and welcome your comments. – thliao Sep 20 '21 at 06:24
  • Unless there is an extremely limited number of ages, it is implausible you would find many equalities, if any at all. This suggests there might be some error in your code. Would it be possible to present a tiny version of your data and an example of how your test works on it? – whuber Sep 20 '21 at 13:32
  • @whuber Maybe my example differs a little from my actual situation. My attribute is actually binary, so each value is either 0 or 1. That should answer part of my question, namely why including 'equal' has such a large impact. I think the code is fine. If you are interested, you can find this specific part at lines 92-110 of https://github.com/GPZ-Bioinfo/tmap/blob/master/tmap/netx/SAFE.py – thliao Sep 21 '21 at 01:22
