How can I test whether classifiers' outcomes are significantly different?

Question

I have a relatively small data set which I'm trying to classify. My training set consists of 24 items from two different groups, 12 items from each group. Each item has two properties. I have 12 new items which I would like to classify based on this training set. (I'm not sure whether the numbers matter for my question...)

I have tested 5 different classifiers, and their classification results on the 12 new items are as follows (one row per classifier):

1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0
0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1
0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1
0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1

My question is: how can I test whether those results are significantly different from each other?

thanks!

What do you mean by significantly different? Are you talking about statistical significance or "performance significance"? There could be a statistically significant difference even if the difference between the classifiers is so small that their performance can be said to be the same for practical purposes. – MånsT, Jan 15 '12 at 09:03
Not sure what to say... My "practical purpose" is this: I would like to run a statistical test to check whether those results are different from each other (which I don't know exactly how to do; this is why I'm asking here). From the classifiers that are different from each other I would like to construct an ensemble-based system as described here: http://users.rowan.edu/~polikar/RESEARCH/PUBLICATIONS/csm06.pdf Am I missing something here? – Dov, Jan 15 '12 at 09:39
Some questions for clarification: 1. Do you have two classes? 2. What do 1 and 0 mean? 1 = correctly classified, 0 = incorrectly classified? – steffen, Jan 16 '12 at 09:28
...I have no idea what the correct labels of those 12 items are. All I would like to do is run a statistical test (which one?) to check whether those results are different from each other. – Dov, Jan 16 '12 at 11:38

MånsT · Accepted Answer · 2012-01-16T07:53:25.670

To evaluate the difference you need a dissimilarity measure. Letting $C_1$ denote one classifier and $C_2$ another classifier, one such measure could be $d(C_1,C_2)=\#\{\text{points that the classifiers disagree about}\}$. If that number is "large enough" then you have a significant difference. The question is what "large enough" means in this context. It should be related to what you would expect if there was "no difference" between the classifiers.
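
As a rough illustration of this measure, here is a minimal Python sketch (standard library only) that counts the disagreements for every pair of the five classifiers. The predictions are copied from the question; the names `predictions`, `disagreement` and `pairwise` are made up for this example.

```python
from itertools import combinations

# Predictions of the five classifiers on the 12 new items (copied from the question).
predictions = [
    [1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1],
    [0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1],
]


def disagreement(a, b):
    """Return d(a, b): the number of items on which two label vectors differ."""
    return sum(x != y for x, y in zip(a, b))


# Map each pair of classifiers (numbered from 1) to its disagreement count.
pairwise = {
    (i + 1, j + 1): disagreement(predictions[i], predictions[j])
    for i, j in combinations(range(len(predictions)), 2)
}

for pair, d in pairwise.items():
    print(pair, d)
```

Note that this only measures how often two classifiers give different labels. It says nothing about which of them is right, since the true labels of the 12 items are unknown.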

This is where the practical purposes enter. When is a difference between the classifiers significant in a practical sense? If they disagree 1 % of the time? 10 %? 20 %? Given that proportion $p$, and treating the 12 points as independent, $d(C_1,C_2)\sim \mathrm{Bin}(12,p)$ if there is "no difference", and you could use a one-sided binomial test to check whether the observed number of disagreements is larger than that.
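
To make the test concrete, here is a small Python sketch (standard library only) of the exact one-sided binomial test, i.e. the upper-tail probability $P(X \ge d)$ for $X\sim \mathrm{Bin}(12,p)$. The function name `binom_upper_tail` is made up for this example, $p=0.1$ is just an illustrative choice of tolerance, and $d=7$ is the disagreement count between the first two classifiers in the question.

```python
from math import comb


def binom_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Bin(n, p), summed term by term (no normal approximation)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_items = 12   # number of new items being classified
p_null = 0.1   # assumed tolerated disagreement rate (illustrative choice)
d = 7          # observed disagreements between classifiers 1 and 2

p_value = binom_upper_tail(d, n_items, p_null)
print(f"one-sided p-value: {p_value:.2e}")
```

A small p-value means the two classifiers disagree more often than the tolerated rate would explain. The conclusion depends directly on the choice of $p$, so it is worth trying a few values.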

Bear in mind though that with such a small training sample, any differences that you find may tell you more about your training data than they do about your classifiers...

edited Jan 16 '12 at 07:53

answered Jan 16 '12 at 07:41

MånsT

Thanks for your answer! But did you mean Bin(24,p)? Also, sorry to ask what is probably a newbie question, but how do you compute this? – Dov Jan 16 '12 at 11:42
No, it should be Bin(12,p) since you are comparing it for 12 new data points. See http://en.m.wikipedia.org/wiki/Binomial_test for more on the binomial test (but make sure not to use the large sample version)! – MånsT Jan 16 '12 at 15:39
I highly appreciate your time and effort! However, I still do not understand something very basic. I have read the wiki definition and there are 3 parameters there: (1) "number of successes", (2) "number of trials", (3) "the probability". Here you have mentioned only two ("number of trials", "the probability"). Can you please say a few more words about this? Thank you very much! – Dov Jan 17 '12 at 07:14
One more little thing: don't I need to use some pairwise measurement? – Dov Jan 17 '12 at 07:28
The "number of successes" is $d(C_1,C_2)$, i.e. the number of points where the two classifiers give different results. This measure is by its very nature pairwise!
As an example, consider the first two classifiers. They disagree about 7 out of 12 points. We choose $p=0.1$. In R, you can then perform the binomial test as follows: binom.test(7,12,0.1,alternative="greater")
– MånsT Jan 17 '12 at 07:42
