
Suppose I have two models $M_1$ and $M_2$ and I want to compare their performance in terms of accuracy on classification instances (i.e. the ratio of the number of correct predictions to the total number of instances). I conjecture that the difference in their results is not significant. So:

  • Null H: $M_1$ and $M_2$ are different (in terms of accuracy measure).
  • Alternative H: $M_1$ and $M_2$ are not different (in terms of accuracy measure).

First: Is this hypothesis testable? Is it well-defined?

I have seen the reverse scenario in several places. For example, the paired permutation test in [1] is designed for a different hypothesis (where the null is no difference between the two predictors).

At the link they're comparing generalization accuracies (the response) for two models. The test statistic is the mean difference in generalization accuracy. Under the null that the models have equal mean accuracy (and the additional assumption that they have the same distribution under the null), the model labels that go with the pairs of accuracies are arbitrary -- you could interchange them (flipping the sign of the difference in accuracy) without altering the distribution of differences.
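The sign-flipping procedure described above can be sketched in Python as follows. This is a minimal sketch in the spirit of the algorithm at [1], assuming the two lists hold accuracies paired by fold or dataset; the function name `paired_permutation_test` and the `n_perm`/`seed` parameters are hypothetical, not taken from [1].

```python
import random


def paired_permutation_test(acc1, acc2, n_perm=10_000, seed=None):
    """Two-sided paired (sign-flip) permutation test of equal mean accuracy.

    acc1, acc2: accuracies of the two models, paired by index (e.g. per fold).
    Returns a Monte Carlo estimate of the p-value.
    """
    if len(acc1) != len(acc2) or not acc1:
        raise ValueError("need two non-empty sequences of equal length")
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(acc1, acc2)]
    mu_old = sum(diffs) / len(diffs)  # observed mean paired difference
    tol = 1e-12  # keeps exact ties from being lost to floating-point round-off
    count = 0
    for _ in range(n_perm):
        # Swapping the model labels within a pair flips the sign of that pair's
        # difference; under the null each swap happens with probability 1/2.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        mu_new = sum(flipped) / len(flipped)
        if abs(mu_new) >= abs(mu_old) - tol:
            count += 1
    return count / n_perm
```

For example, `paired_permutation_test([0.9] * 10, [0.8] * 10, seed=0)` gives a small p-value, because only the two sign patterns "all unswapped" and "all swapped" reach the observed absolute mean difference.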

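If in the original algorithm in [1] (where $\mu_{old}$ is the observed mean of the paired differences in accuracy and $\mu_{new}$ is the mean difference after randomly swapping the labels within pairs) I change the condition from $$ |\mu_{new}| \geq |\mu_{old}| $$ to $$ |\mu_{new}| \leq |\mu_{old}| $$

Second: Is it a valid algorithm for testing the hypotheses I defined?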

Third: If not, any suggestion on how to test this hypothesis?

[1] http://axon.cs.byu.edu/Dan/478/assignments/permutation_test.php

Daniel
  • You should carefully define the terms you use. What do $M$ and $\mu$ represent? I suspect $\mu$ is being used here in a way that would be likely to be misunderstood given the usual statistical conventions. – Glen_b Jan 26 '15 at 22:46
  • See the link I have included in the question. – Daniel Jan 26 '15 at 22:48
  • Simply referring to the link for definitions is not sufficient: (1) Later readers should not have to read the link simply to understand the question. (2) In any case, a post must have enough context to stand alone, since links can die without warning; this is explained in the help. Links for additional context are fine, but you can't rely on them continuing to be available. (3) I don't think the link sufficiently explains the situation; some background is absent. While your edits have definitely improved the question substantially, please clarify further in your question. – Glen_b Jan 26 '15 at 23:24
  • Your edits clarified things enough that I was able to add a paragraph giving some context from the link. If you think it needs to change, please edit, but something along those lines is required; if you can move some of that context earlier in the question (where it also applies to your setup), that may help. The $\mu$'s should still be defined (I think they're each differences (within pairs) of sample accuracy), but I'll leave that for you. I think it's now clear enough that it's a reasonably good question (+1). – Glen_b Jan 26 '15 at 23:52
  • The current title ("Permutation test: when the null hypothesis is") seems to be missing something from the end? – Silverfish Jan 26 '15 at 23:56
  • This question needs to be cleaned up in order to be answerable. Your stated null, "$M_1$ and $M_2$ are not different," is not a testable hypothesis. It is too indefinite. Later in the question you appear to reverse the null and alternative: "Under the null that the models have equal mean accuracy." In light of this confusion, the question "how to test this hypothesis?" cannot be understood. – whuber Jan 27 '15 at 00:11
  • Your edit has improved the question. Nevertheless, the null and alternative should definitely be made more precise. The models are clearly different, so it's pointless to hypothesize about that. For example, you might state that the (population) mean generalization accuracy for the two models is the same or different for the corresponding hypotheses. – Glen_b Jan 27 '15 at 05:14
  • @Glen_b updated! – Daniel Jan 27 '15 at 05:21
  • Well, they're still not specific enough to actually find a null distribution -- "in terms of a desired measure" doesn't pin down the measure. I believe it's clear enough to you which measure is intended, but I think it's better to just state it and reduce the opportunity for people to think it's still unclear in at least that respect. – Glen_b Jan 27 '15 at 05:38
  • But why does it matter what measure? I fixed it in the question. – Daniel Jan 27 '15 at 17:24
  • @whuber you want to open this? – Daniel Jan 27 '15 at 20:40
  • I think this question raises issues that have been extensively discussed and answered elsewhere on this site. My impression is that you are seeking information about how to formulate a null hypothesis, or perhaps even about what one is and what role it plays in statistical testing. There is plenty of good material about these issues available upon searching our site for null alternative hypothesis. If you like, add keywords "equivalence" or "self-study" to narrow down the material further. – whuber Jan 27 '15 at 23:08

1 Answer


Hypotheses should usually be phrased as a statement about population quantities (though in some cases, possibly infinite-dimensional), and that statement should generally be a null statement (typically about lack of difference); in particular, it needs to be a statement under which a null distribution of a test statistic can be computed (or at least an edge-case null can).

In the case of a permutation test, you need that the permutations are equally likely; the null needs to make the labels (or whatever it is that's being 'swapped around') at least exchangeable. Yours does not.
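To make the "equally likely" requirement concrete: with $n$ paired differences, a null under which the labels within each pair are exchangeable gives every one of the $2^n$ sign patterns the same probability, so an exact p-value is just the fraction of patterns at least as extreme as the observed one. The sketch below assumes that setup; the function name is hypothetical, and full enumeration is only practical for small $n$ (roughly 20 or fewer pairs).

```python
from itertools import product


def exact_paired_permutation_pvalue(acc1, acc2):
    """Exact two-sided sign-flip p-value, enumerating all 2**n relabellings.

    Valid only if, under the null, each relabelling is equally likely
    (the labels within each pair are exchangeable).
    """
    diffs = [a - b for a, b in zip(acc1, acc2)]
    n = len(diffs)
    mu_old = abs(sum(diffs) / n)
    tol = 1e-12  # keeps exact ties from being lost to floating-point round-off
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=n):
        mu_new = sum(s * d for s, d in zip(signs, diffs)) / n
        total += 1
        if abs(mu_new) >= mu_old - tol:
            extreme += 1
    # Each pattern carries probability 1 / 2**n under the null.
    return extreme / total
```

With three pairs that all differ by about 0.1, only the all-unswapped and all-swapped patterns are as extreme as the data, so the exact p-value is 2/8 = 0.25.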

Assuming I've correctly guessed at the meaning of some of the symbols, the two expressions of null hypotheses in the first few lines of your link are suitable null hypotheses (though I'd probably phrase them slightly differently, they're clear enough and should be fine).

You say:

The pair-permutation test in [1] is designed for a different hypothesis (where null is non-significance of the two predictors).

This is how null hypotheses have to work; if you make "difference" the null (an "open" compound hypothesis), the difficulty is in finding the distribution of the test statistic under the null; the limiting case would actually be in the alternative. In the case of a permutation test, this means that the permutation could only be done under the alternative you give... so that, perforce, must be the null hypothesis.

Your needs might be better served by some form of equivalence test, but this may be tricky in a permutation-test context.

[Note that you can, however, flip the direction of a one-sided test if you need to (between $\leq$ and $\geq$), as long as it definitely includes the equality (the edge-case under which you can compute the distribution of the null).]
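As a minimal sketch of how one-sided tests and the equivalence idea can fit together, the code below implements a one-sided sign-flip test in either direction (each null includes the equality edge case) and combines two of them in a two-one-sided-tests (TOST) style check of "the mean accuracies differ by at least `margin`" as the null. Everything here is an illustrative assumption rather than a standard routine: the function names, the analyst-chosen `margin`, and in particular the assumption that the paired differences are symmetric about their mean, which is what justifies sign-flipping after shifting the differences by the margin.

```python
import random


def one_sided_signflip_pvalue(diffs, alternative, n_perm=10_000, seed=None):
    """Monte Carlo sign-flip p-value for a one-sided test.

    alternative="greater": H0 mean(diffs) <= 0 vs H1 mean(diffs) > 0.
    alternative="less":    H0 mean(diffs) >= 0 vs H1 mean(diffs) < 0.
    Sign flips are justified at the edge case mean == 0 only if the
    differences are symmetric about 0 there (an assumption).
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    rng = random.Random(seed)
    n = len(diffs)
    mu_old = sum(diffs) / n
    tol = 1e-12  # keeps exact ties from being lost to floating-point round-off
    count = 0
    for _ in range(n_perm):
        mu_new = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        if alternative == "greater" and mu_new >= mu_old - tol:
            count += 1
        elif alternative == "less" and mu_new <= mu_old + tol:
            count += 1
    return count / n_perm


def equivalence_signflip_tost(acc1, acc2, margin, alpha=0.05, n_perm=10_000, seed=None):
    """Hypothetical TOST-style check.

    H0: |mean difference| >= margin  vs  H1: |mean difference| < margin.
    Returns (p-value, whether equivalence is declared at level alpha).
    """
    diffs = [a - b for a, b in zip(acc1, acc2)]
    # H01: mean diff <= -margin. Shift so the edge case sits at 0, then test "greater".
    p_lower = one_sided_signflip_pvalue([d + margin for d in diffs], "greater", n_perm, seed)
    # H02: mean diff >= +margin. Shift so the edge case sits at 0, then test "less".
    p_upper = one_sided_signflip_pvalue([d - margin for d in diffs], "less", n_perm, seed)
    # Equivalence is declared only if both one-sided nulls are rejected.
    p = max(p_lower, p_upper)
    return p, p < alpha
```

Taking the larger of the two one-sided p-values means equivalence is claimed only when a difference of at least `margin` in either direction is rejected; the choice of `margin` (for example, one percentage point of accuracy) has to come from the problem, not from the data.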

Glen_b