
I have to use a p-value test to compare two classifiers. I get the error vectors of the two hypotheses and use their difference as the input to MATLAB's ttest function:

[H,p] = ttest(error,'Tail','both');

I get H = 0 and p = 0.34. As far as I have read, this means that we cannot reject the null hypothesis in this case, but I am not sure how to interpret that. Can anybody help? And what would a one-tailed test do in this case?

KingJames

2 Answers


The $t$-test is not a very good choice to compare classifiers, since their performance is not normally distributed. I suggest considering the Wilcoxon signed-ranks test instead.
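
For instance, a minimal MATLAB sketch, assuming err1 and err2 are the paired error vectors of the two classifiers (the variable names are illustrative):

% Paired Wilcoxon signed-ranks test (Statistics and Machine Learning Toolbox)
% Null hypothesis: the paired differences err1 - err2 have zero median.
[p, h] = signrank(err1, err2);   % h = 1: reject equal performance at the default alpha = 0.05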

For a thorough read on this topic, please refer to this paper:

Demšar, Janez. "Statistical comparisons of classifiers over multiple data sets." The Journal of Machine Learning Research 7 (2006): 1-30.

The above paper lists multiple arguments against using the $t$-test for classifier comparison.

Marc Claesen

In the example you provided, ttest with a single input tests whether your vector "error" comes from a Gaussian distribution with zero mean, or in other words, whether both classifiers have the same accuracy. If you use two inputs you can also specify the tail. With 'Tail','right' (ttest(classifier1,classifier2,'Tail','right')) the alternative hypothesis is that the mean of classifier1 - classifier2 is greater than zero, i.e. classifier1 scores higher than classifier2 (better if the inputs are accuracies, worse if they are error rates); with 'Tail','left' it is the opposite. In all cases the null hypothesis is rejected when h = 1 (at the significance level you choose, typically 0.05).
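
A minimal sketch of these calls, where classifier1 and classifier2 are illustrative names for the two paired result vectors:

% One-sample test on the paired differences, then the tailed two-input variants
diffs = classifier1 - classifier2;                            % paired differences
[h, p]   = ttest(diffs);                                      % two-sided: is the mean difference zero?
[hR, pR] = ttest(classifier1, classifier2, 'Tail', 'right');  % alternative: mean(classifier1 - classifier2) > 0
[hL, pL] = ttest(classifier1, classifier2, 'Tail', 'left');   % alternative: mean(classifier1 - classifier2) < 0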

Take into consideration, however, that ttest is a parametric test and is not suitable in all cases. You should first check each variable for normality, e.g. with a Chi-square goodness-of-fit test: h = chi2gof(classifier). If neither test rejects normality (h = 0), you can treat the data as Gaussian and use the ttest; otherwise you should use a non-parametric test such as p = ranksum(classifier1,classifier2). The interpretation of the p-value is the same.
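
A sketch of that workflow, under the same assumption about the variable names:

% Check normality of each vector, then pick a parametric or non-parametric test
h1 = chi2gof(classifier1);      % null: the sample comes from a normal distribution
h2 = chi2gof(classifier2);
if h1 == 0 && h2 == 0
    [h, p] = ttest(classifier1, classifier2);    % paired t-test on the two vectors
else
    p = ranksum(classifier1, classifier2);       % Wilcoxon rank-sum test (non-parametric)
end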

ASantosRibeiro