
I have two algorithms, $A_1$ and $A_2$.

  • $A_1$ either accepts or rejects samples from dataset $P$
  • $A_2$ either accepts or rejects samples from dataset $Q$

Below is a 2x2 contingency table for the number of accepted and rejected samples by $A_1$ and $A_2$ on their respective datasets:

[Image: 2x2 contingency table of accepted and rejected sample counts for $A_1$ on $P$ and $A_2$ on $Q$]

Here, $r_P$ is the number of samples rejected by $A_1$ on dataset $P$ and $n_P$ is the number of samples in $P$; $r_Q$ and $n_Q$ are defined analogously for $A_2$ and $Q$.
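For concreteness, a minimal sketch of how such a table could be laid out in code. The orientation (one row per algorithm, rejected count first), the helper name `contingency_table`, and the counts are all assumptions made for illustration, not real data.

```python
def contingency_table(r_p, n_p, r_q, n_q):
    """Build a 2x2 table: one row per algorithm, as [rejected, accepted]."""
    if not (0 <= r_p <= n_p and 0 <= r_q <= n_q):
        raise ValueError("rejected counts must lie between 0 and the sample size")
    return [[r_p, n_p - r_p],   # A_1 on dataset P
            [r_q, n_q - r_q]]   # A_2 on dataset Q


# Hypothetical counts, for illustration only.
table = contingency_table(r_p=3, n_p=20, r_q=9, n_q=22)

# Observed rejection rates, which are what the comparison is about.
rate_p = table[0][0] / sum(table[0])
rate_q = table[1][0] / sum(table[1])
print(table, rate_p, rate_q)
```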

I would like to determine whether $A_2$ is statistically more likely to reject samples in its dataset than $A_1$ is in its own.

My first thought was to use Fisher's exact test, as suggested here. But upon learning more here, I see that the null hypothesis for Fisher's exact test is not suitable, and that I instead need to use Boschloo's exact test. The only problem is that the SciPy implementation of Boschloo's test relies on something called simplicial homology global optimization, which is extremely slow for large $n_P$ or $n_Q$.

First, I'm wondering whether Fisher's test is indeed unsuitable for this situation. Second, what would be a good exact test to use here? (Since my sample sizes may not be large, I would like the test to be exact.) I'm also only concerned with the one-sided alternative, i.e., that $A_2$ rejects at a higher rate than $A_1$ ($r_Q / n_Q > r_P / n_P$).
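To make the two candidate tests concrete, here is a rough pure-Python sketch of both one-sided p-values. It assumes independent binomial rejection counts for the two datasets. The function names and the `grid` parameter are hypothetical. The Boschloo part replaces the global optimizer with a plain grid search over the nuisance rejection probability, so it only approximates the supremum and may slightly understate the p-value. It is intended for small samples.

```python
from math import comb


def fisher_greater(x1, n1, x2, n2):
    """One-sided Fisher p-value: P(X2 >= x2 | X1 + X2 = x1 + x2) under equal rates."""
    t = x1 + x2
    total = comb(n1 + n2, t)
    # Hypergeometric upper tail; comb() returns 0 for impossible splits.
    tail = sum(comb(n2, k) * comb(n1, t - k) for k in range(x2, min(n2, t) + 1))
    return tail / total


def boschloo_greater(x1, n1, x2, n2, grid=1000):
    """Return (Fisher p-value, grid-approximated Boschloo p-value)."""
    p_obs = fisher_greater(x1, n1, x2, n2)
    # Tables at least as extreme as the observed one, ranked by Fisher's p-value.
    # The small relative tolerance guards against floating-point ties.
    region = [(a, b) for a in range(n1 + 1) for b in range(n2 + 1)
              if fisher_greater(a, n1, b, n2) <= p_obs * (1 + 1e-12)]
    worst = 0.0
    for i in range(1, grid):
        pi = i / grid  # common rejection probability under the null
        prob = sum(comb(n1, a) * pi**a * (1 - pi)**(n1 - a)
                   * comb(n2, b) * pi**b * (1 - pi)**(n2 - b)
                   for a, b in region)
        worst = max(worst, prob)
    return p_obs, worst


# Hypothetical counts: x1 = r_P, n1 = n_P, x2 = r_Q, n2 = n_Q.
fisher_p, boschloo_p = boschloo_greater(3, 20, 9, 22)
print(fisher_p, boschloo_p)
```

Because Fisher's p-value is valid conditionally on every value of the total number of rejections, the Boschloo p-value computed this way should never exceed the Fisher one, which is the sense in which Boschloo's test is the less conservative of the two.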

  • You can still use Fisher's exact if the margins are not fixed in advance, see this post. – philbo_baggins Jan 11 '22 at 18:18
  • While I agree with the points made in Gordon Smyth's answer linked by philbo_baggins above (indeed I have made similar points in several answers myself), it looks to me like your alternative hypothesis is directional: "I would like to determine whether $A_2$ is statistically more likely to reject samples in its dataset", which would change things somewhat, if that's indeed what you seek to test. – Glen_b Jan 12 '22 at 09:19
  • @Glen_b Yes, this is what I seek. Can you elaborate? – user2757771 Jan 12 '22 at 20:56
