
Disclaimer: I have no statistical background, so please excuse me, and correct me, if I make amateur mistakes below.

I have two groups (let's call them $A$ and $B$) and a particular measured variable for the groups, $v$. I conduct $20$ trials of $A$ and $B$. The data suggest that $v$ is not predicted by group membership ($A$ vs. $B$); that is, the null hypothesis "$v$ is not affected by $A$ vs. $B$" seems to be true. I would like to confirm this suspicion statistically.

To show statistical significance for the alternative hypothesis, I would calculate the $p$-value under the null hypothesis. But then, to show statistical significance for the null hypothesis, should I calculate the $p$-value under the alternative hypothesis? That doesn't seem to make sense in this case.

I realize that it is impossible to prove that two variables are independent. But it should be possible to express a degree of suspicion that they are at least not very dependent; for example, "with $95\%$ certainty, the difference in means between groups $A$ and $B$ is at most $0.02\sigma$". How is this generally done?
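
For concreteness, here is the kind of computation I imagine, sketched in Python with made-up data and an equal-variance assumption: a $95\%$ confidence interval for the difference in means, expressed in units of the pooled standard deviation.

```python
# A sketch of the kind of bound I have in mind: a 95% confidence
# interval for the difference in group means, expressed in units of
# the pooled standard deviation. (The data here are made up.)
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(loc=0.0, scale=1.0, size=20)  # 20 trials of group A
b = rng.normal(loc=0.0, scale=1.0, size=20)  # 20 trials of group B

diff = a.mean() - b.mean()
# Pooled standard deviation (assuming equal variances in both groups).
sp = np.sqrt(((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1))
             / (len(a) + len(b) - 2))
se = sp * np.sqrt(1 / len(a) + 1 / len(b))
t_crit = stats.t.ppf(0.975, df=len(a) + len(b) - 2)

lo, hi = diff - t_crit * se, diff + t_crit * se
print(f"95% CI for (mean A - mean B), in units of sigma: "
      f"[{lo / sp:.3f}, {hi / sp:.3f}]")
```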

1 Answer

What you have hit on here is the asymmetric nature of classical hypothesis testing, i.e., the fact that swapping the null and alternative hypotheses does not necessarily lead to a test whose results mirror those of the original.

To understand this, it is necessary to understand the logic of a classical hypothesis test. This is essentially an inductive analogue to a deductive proof by contradiction. In a classical hypothesis test we have a scale of evidence ranking all possible observable outcomes according to how conducive they are to the null/alternative hypotheses. (This scale is implicit in the test statistic for the test.) Just as in a proof by contradiction, we begin with a null hypothesis that is accepted for the sake of argument, and we calculate the p-value, defined as the probability that we would observe evidence at least as conducive to the alternative hypothesis as what was actually observed, if the null hypothesis is true. If this value is low (compared to our chosen significance level), we regard the outcome as implausible under the null hypothesis, and reject the null.
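
To make this logic concrete, here is a minimal sketch of a permutation test in Python, with made-up data for two groups of $20$ trials. Under the null hypothesis the group labels are exchangeable, so repeatedly relabelling the data yields the null distribution of the test statistic, and the $p$-value is the fraction of relabellings at least as conducive to the alternative as the observed data.

```python
# A minimal sketch of the p-value logic described above, using a
# permutation test on the difference in group means (data are made up;
# under the null hypothesis the group labels are exchangeable).
import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(size=20)
b = rng.normal(size=20)

observed = abs(a.mean() - b.mean())   # the test statistic
pooled = np.concatenate([a, b])

n_perm = 10_000
count = 0
for _ in range(n_perm):
    rng.shuffle(pooled)               # relabel the data under the null
    stat = abs(pooled[:20].mean() - pooled[20:].mean())
    if stat >= observed:              # at least as conducive to the alternative
        count += 1

p_value = count / n_perm
print(f"p-value: {p_value:.3f}")      # low p-value => reject the null
```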

This procedure works in a way that has been likened to a criminal trial. In that case there is a default position of acquittal, which applies if there is no evidence one way or the other. The onus is on a prosecutor to prove guilt beyond a reasonable doubt in order for the court to reject this initial hypothesis. An acquittal means that there was not enough evidence to convict, which is not the same as saying that there was positive evidence of innocence. In other words, absence of evidence is not the same as evidence of absence. Similarly, in a classical hypothesis test, the null hypothesis is given a privileged status as the default hypothesis, and the test checks if there is enough evidence to falsify this hypothesis beyond some specified evidentiary level. If there is not enough evidence to reject the null hypothesis, this merely means that it is not falsified, which is different from saying that there is evidence in its favour.


How do you choose the null hypothesis in a classical hypothesis test? For a test of independence vs. dependence, the former hypothesis is a simple hypothesis (leading to one specific model) whereas the latter is a compound hypothesis (leading to a set of possible models). In cases like this it is usual to take the simple hypothesis of independence as the null hypothesis, since this allows you to specify a null distribution for the test statistic, without any further model formulation. It would be possible to perform a hypothesis test the other way, with dependence as the null hypothesis, but this would require specification of the form of the dependence, so as to allow you to obtain a null distribution for the test.
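
For example, the chi-squared test of independence exploits exactly this: under the simple null hypothesis of independence, the expected cell counts of a contingency table follow from the margins alone, with no further model formulation. A sketch, with a made-up $2 \times 2$ table:

```python
# A sketch illustrating why independence works well as the null: under
# independence the expected cell counts follow from the table margins
# alone, so the null distribution of the test statistic requires no
# extra modelling. (The contingency table here is made up.)
import numpy as np
from scipy import stats

table = np.array([[12, 8],    # group A: counts of outcomes 1 and 2
                  [9, 11]])   # group B: counts of outcomes 1 and 2

chi2, p, dof, expected = stats.chi2_contingency(table)
print("expected counts under independence:\n", expected)
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p:.3f}")
```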


Is there any symmetric alternative to classical hypothesis testing? Yes, there is. In Bayesian hypothesis testing you would formulate a model that encapsulates independence and dependence as special cases, and you would also formulate prior beliefs for the parameters of your model under each of these special cases. You would then calculate the posterior probability of the hypotheses as a function of your prior probabilities for the hypotheses. This method of testing is fully symmetric in the sense that it makes no difference if you reverse the labels for the hypotheses.
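
As a rough sketch of how such a calculation might look, under simplifying assumptions of my own (the observed difference in sample means is treated as normal with known standard error; $H_0$ fixes the difference at zero, while $H_1$ places a normal prior of assumed scale $\tau$ on it):

```python
# A rough sketch of Bayesian hypothesis testing for the setting above,
# under simplifying assumptions: the observed difference in sample means
# d is modelled as N(delta, se^2) with se treated as known; H0 fixes
# delta = 0, while H1 places a N(0, tau^2) prior on delta. Both marginal
# likelihoods are then normal densities evaluated at d.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
a = rng.normal(size=20)
b = rng.normal(size=20)

d = a.mean() - b.mean()
se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
tau = 1.0                      # assumed prior scale for delta under H1

m0 = stats.norm.pdf(d, loc=0.0, scale=se)                       # P(data | H0)
m1 = stats.norm.pdf(d, loc=0.0, scale=np.sqrt(se**2 + tau**2))  # P(data | H1)

prior_h0 = 0.5                 # symmetric prior over the two hypotheses
post_h0 = m0 * prior_h0 / (m0 * prior_h0 + m1 * (1 - prior_h0))
print(f"Bayes factor BF01 = {m0 / m1:.2f}, P(H0 | data) = {post_h0:.3f}")
```

Note the symmetry: swapping the hypothesis labels simply inverts the Bayes factor, so neither hypothesis occupies a privileged default position.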


Ben