This is a question I posted here a while ago, and I would like to know whether you can think of additional solutions to it from the ML perspective. Unfortunately, I can't use the standard McNemar's test because of the small sample size (the large-sample approximation the test relies on doesn't hold with so few items).
How can I test whether classifiers' outputs differ significantly?
I have a relatively small data set that I'm trying to classify. My training set consists of 24 items from two different groups, 12 from each group, and each item has two properties (features). I have 12 new items that I would like to classify based on this training set. (I'm not sure whether the exact numbers matter for my question.)
I have tested 5 different classifiers; their predicted labels for the 12 new items are as follows:
Classifier 1: 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0
Classifier 2: 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1
Classifier 3: 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1
Classifier 4: 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1
Classifier 5: 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1
How can I test whether these results are significantly different from each other?
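To make the setup concrete, here is a minimal sketch (in Python, using statsmodels; both are my own choices and not part of the original problem) of how the exact, binomial version of McNemar's test could be applied to one pair of these prediction vectors. The exact version sidesteps the large-sample approximation mentioned above, but it still only compares two classifiers at a time, which is part of why I'm asking for other ideas.

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# Predicted labels for the 12 new items from each of the 5 classifiers
preds = np.array([
    [1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1],
    [0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1],
])

def paired_table(a, b):
    """2x2 table of paired predictions: rows = classifier a (0/1), cols = classifier b (0/1)."""
    return np.array([
        [np.sum((a == 0) & (b == 0)), np.sum((a == 0) & (b == 1))],
        [np.sum((a == 1) & (b == 0)), np.sum((a == 1) & (b == 1))],
    ])

# Exact (binomial) McNemar test between classifier 1 and classifier 2;
# exact=True uses the binomial distribution of the discordant pairs
# instead of the chi-square approximation, which is unreliable here.
result = mcnemar(paired_table(preds[0], preds[1]), exact=True)
print(result.statistic, result.pvalue)
```

With all five classifiers this pairwise approach would also need some correction for multiple comparisons, which is another reason I'm not sure it's the right tool for my situation.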