Suppose I have two models $M_1$ and $M_2$ and I want to compare their performance in terms of a measure: classification accuracy, i.e. the number of correct predictions divided by the total number of instances. I conjecture that the difference in their results is not significant, so:
- Null hypothesis $H_0$: $M_1$ and $M_2$ are different (in terms of the accuracy measure).
- Alternative hypothesis $H_1$: $M_1$ and $M_2$ are not different (in terms of the accuracy measure).
First: Is this hypothesis testable? Is it well-defined?
I have seen the reverse scenario in several places. For example, the paired permutation test in [1] is designed for the opposite hypothesis, where the null is that the two predictors are not different.
At that link, generalization accuracies (the response) are compared for two models, and the test statistic is the mean difference in generalization accuracy. Under the null that the models have equal mean accuracy (plus the additional assumption that they have the same distribution under the null), the model labels attached to the pairs of accuracies are arbitrary: you could interchange them (flipping the sign of the difference in accuracy) without altering the distribution of the differences.
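To make the setup concrete, here is a minimal sketch of that sign-flipping test in Python. The function name, the inputs `acc1`/`acc2` (paired accuracy estimates, e.g. one per cross-validation fold), and the defaults are my own illustrative choices, not taken from [1]:

```python
import numpy as np

def paired_permutation_test(acc1, acc2, n_perm=10000, seed=0):
    """Paired permutation test of 'no difference in mean accuracy'."""
    rng = np.random.default_rng(seed)
    diffs = np.asarray(acc1, dtype=float) - np.asarray(acc2, dtype=float)
    mu_old = abs(diffs.mean())              # observed |mean difference|
    count = 0
    for _ in range(n_perm):
        # Under the null, the model labels within each pair are
        # exchangeable, which amounts to flipping signs of differences.
        signs = rng.choice((-1.0, 1.0), size=diffs.size)
        mu_new = abs((signs * diffs).mean())
        if mu_new >= mu_old:                # the |mu_new| >= |mu_old| rule
            count += 1
    return count / n_perm                   # permutation p-value

# Example with made-up per-fold accuracies:
# p = paired_permutation_test([0.81, 0.79, 0.84], [0.78, 0.80, 0.82])
```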
Second: If, in the original algorithm in [1], I change the condition from $$ |\mu_{new}| \geq |\mu_{old}| $$ to $$ |\mu_{new}| \leq |\mu_{old}| $$ (where $\mu_{old}$ is the observed mean difference and $\mu_{new}$ is the mean difference under a permutation), is that a valid algorithm for testing my hypothesis as defined above?
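Purely to make this question concrete, here is what the flipped condition would look like in code; I am not claiming this is a valid test, that is exactly what I am asking:

```python
import numpy as np

def reversed_permutation_test(acc1, acc2, n_perm=10000, seed=0):
    # Identical to paired_permutation_test above, except the comparison
    # is flipped: count permutations that are NO MORE extreme than the
    # observed statistic instead of at least as extreme.
    rng = np.random.default_rng(seed)
    diffs = np.asarray(acc1, dtype=float) - np.asarray(acc2, dtype=float)
    mu_old = abs(diffs.mean())
    count = 0
    for _ in range(n_perm):
        signs = rng.choice((-1.0, 1.0), size=diffs.size)
        mu_new = abs((signs * diffs).mean())
        if mu_new <= mu_old:                # flipped: |mu_new| <= |mu_old|
            count += 1
    return count / n_perm
```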
Third: If not, are there any suggestions on how to test this hypothesis?
[1] http://axon.cs.byu.edu/Dan/478/assignments/permutation_test.php