
This has been bugging me for a while, and I'm not getting an answer where I study, so maybe someone here can help me out. (Don't worry, this isn't homework) Consider this question:

Assume that you, Bob, and Nate have heights of 172, 174, and 169 cm. Also assume the heights to be independently normally distributed with a known variance of 50. We have a hypothesis that the average height of the people in this course is 188 cm. Test this hypothesis at the 5% significance level.

When we did this in class, the teacher chose hypotheses as such:
$H_0 : \mu = 188$
$H_1: \mu \neq 188$
$T = \sqrt{3}\frac{\frac{1}{3}\sum_{i=1}^3\, X_i - 188}{\sqrt{50}} \approx -4$
Now using $R=(-\infty,z_{0.025}) \cup (z_{0.975},\infty)=(-\infty,-1.96) \cup (1.96,\infty)$
$T \in R \rightarrow$ we reject $H_0$
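
For reference, here is the same computation written out as a small Python sketch (my own addition for clarity, not part of what we did in class); the heights, the hypothesized mean of 188 cm, the known variance of 50, and the 5% two-sided cutoff are all taken from the problem statement above:

```python
# Minimal sketch of the z-test above: known variance, two-sided test at the 5% level.
import math

heights = [172, 174, 169]            # you, Bob, Nate
n = len(heights)
mu_0 = 188.0                         # hypothesized mean under H0
sigma2 = 50.0                        # known variance

x_bar = sum(heights) / n
T = math.sqrt(n) * (x_bar - mu_0) / math.sqrt(sigma2)

z_crit = 1.96                        # z_{0.975}
print(f"T = {T:.2f}")                # about -4.00
print("reject H0" if abs(T) > z_crit else "fail to reject H0")
```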

  1. My question is: didn't they make a mistake and flip the hypotheses around? Imagine if we failed to reject $H_0$; wouldn't this mean that we would "probably erroneously" have to assume $\mu = 188$ for eternity, until shown otherwise?

  2. My follow-up question is: is it possible to perform a hypothesis test with no prior knowledge? How would one test for:
    $H_0 : \mu \neq 188$
    $H_1: \mu = 188$

This is my first post here in stats, so I hope it doesn't break any rules. Many thanks for your help.

  • The short answer: you don't accept the null hypothesis, you either reject it or fail to reject it, so you keep your null (but you don't accept it). In other words you fail to show that this value is different from 188, but you don't really know. There are many posts on this site on this topic. – user2974951 Jan 19 '23 at 08:31
  • This particular example has some trouble in computing p-values (just imagine that your class has a size $n=3$), but that's another topic. – Sextus Empiricus Jan 19 '23 at 08:58
  • a hypothesis test with no prior knowledge: we use hypothesis testing to verify/falsify a theory/discovery. Under $H_0$ goes the current state of nature, and under the $H_1$ goes what we want to discover. – utobi Jan 19 '23 at 09:07
  • Any sample mean you might observe would be consonant with some value of $\mu$ in $H_0 : \mu \neq 188$. See Seeking to understand asymmetry in hypothesis testing. – Scortchi - Reinstate Monica Jan 19 '23 at 09:12

1 Answer

  1. ...if we failed to reject $H_0$, wouldn't this mean that we would "probably erroneously" have to assume $\mu = 188$...

It means that your data, which indicate an average height smaller than 188 cm, are not statistically significant evidence against that value. If 188 cm were the true mean, then a discrepancy of the magnitude observed in the sample could still occur with a reasonable probability.

A sample of size 3 is simply not a good enough indicator to let you say that 188 cm is false. That is different from saying that 188 cm is true.
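
To make that concrete, here is a small sketch (my own addition, under the question's assumptions of known variance 50 and $n = 3$): the 95% confidence interval for $\mu$ is about $\pm 8$ cm wide, so failing to reject 188 cm would leave a whole range of other values just as compatible with the data.

```python
# Sketch under the question's assumptions: n = 3, known variance 50.
import math

n, sigma2 = 3, 50.0
x_bar = (172 + 174 + 169) / n                 # about 171.67 cm
half_width = 1.96 * math.sqrt(sigma2 / n)     # about 8.0 cm

# Every mu inside this interval would NOT be rejected at the 5% level,
# which is why "not rejecting 188" is far from "188 is true".
print(f"95% CI for mu: [{x_bar - half_width:.1f}, {x_bar + half_width:.1f}]")
```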

  2. is it possible to perform a hypothesis test with no prior knowledge?

There will always be assumptions about the model. For instance, in your example there is the assumption that the heights have a variance of 50 and can be approximated by a normal distribution.

However, for hypothesis testing there is no need for a prior as in Bayesian analysis, where a hypothesis or a range of hypotheses is assigned a probability (density) of being true.

Informally there might be ideas about prior probabilities for hypotheses. For instance, the choice of an appropriate significance level (the cutoff that a p-value must fall below for us to reject) is based on practical information. If in practice too many hypotheses are falsely rejected (afterwards we find out that there are too many false positives), then people might decide to use a different (smaller) level of significance.
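
As a rough illustration of that link between the significance level and the false-positive rate (my own simulation sketch, reusing the question's model of normal heights with variance 50): when $H_0$ is true, the test rejects it in roughly a fraction $\alpha$ of repeated experiments, and lowering $\alpha$ lowers that fraction.

```python
# Monte Carlo sketch: under a true H0, the rejection rate matches the significance level.
import math
import random

random.seed(0)
n, sigma, mu0 = 3, math.sqrt(50.0), 188.0
n_sims = 20_000

for alpha, z_crit in [(0.05, 1.960), (0.01, 2.576)]:
    rejections = 0
    for _ in range(n_sims):
        sample = [random.gauss(mu0, sigma) for _ in range(n)]   # data generated with mu = 188
        t = math.sqrt(n) * (sum(sample) / n - mu0) / sigma
        rejections += abs(t) > z_crit
    print(f"alpha = {alpha}: falsely rejected H0 in {rejections / n_sims:.3f} of runs")
```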

didn't they make a mistake and flip the hypotheses around?

There is a way to more or less accept the hypothesis of 188 cm, or, more precisely, to reject that the value is far away from 188 cm and accept the hypothesis that the value lies within some range around 188 cm.

This relates to tests for equivalence. See, for instance, the explanation in an answer to this question: Why are standard frequentist hypotheses so uninteresting?

An example is the use of two one-sided t-tests for equivalence testing (TOST). It can be explained with the following image and can be considered as testing three hypotheses for the absolute difference instead of two:

$$\begin{array}{}H_0&:& \text{|difference|} = 0\\ H_\epsilon&:& 0 <\text{|difference|} \leq \epsilon\\ H_\text{effect}&:& \epsilon < \text{|difference|} \end{array}$$

Below is a sketch of the position of the confidence interval within these 3 regions (unlike the typical sketch of TOST, there are actually 5 situations instead of 4).

plot of extended TOST
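
For concreteness, here is a sketch of the two one-sided tests in code (my own illustration, keeping the question's known variance of 50; the equivalence margin `epsilon`, the function name, and the example margins are hypothetical choices for illustration):

```python
# Two one-sided z-tests (TOST): declare "equivalent to mu0 within +/- epsilon"
# only if BOTH one-sided null hypotheses are rejected.
import math

def tost_z(sample, mu0, sigma2, epsilon, z_crit=1.645):
    """Equivalence test built from two one-sided z-tests at the 5% level each."""
    n = len(sample)
    se = math.sqrt(sigma2 / n)
    x_bar = sum(sample) / n
    z_lower = (x_bar - (mu0 - epsilon)) / se   # reject "mu <= mu0 - epsilon" if large
    z_upper = (x_bar - (mu0 + epsilon)) / se   # reject "mu >= mu0 + epsilon" if very negative
    return z_lower > z_crit and z_upper < -z_crit

heights = [172, 174, 169]
print(tost_z(heights, mu0=188, sigma2=50, epsilon=5))    # False: not shown equivalent to 188 cm
print(tost_z(heights, mu0=172, sigma2=50, epsilon=10))   # True: shown to be within 10 cm of 172
```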

The point of observations and experiments is to find a data-driven answer to questions by excluding/eliminating what is (probably) not the answer (Popper's falsification).

Null hypothesis testing does this in a somewhat crude manner and does not differentiate between situations B, C, and E. However, in many situations this is not much of a problem. Often the problem is not to test for tiny effects with $H_0: |\mu-\mu_0|<\epsilon$; the effect size is expected to be sufficiently large, well above some $\epsilon$. In many practical cases testing $|\text{difference}| > \epsilon$ is nearly the same as testing $|\text{difference}| > 0$, and the null hypothesis test is a reasonable simplification. It is only in the modern days of very large amounts of data that effect sizes on the order of $\epsilon$ start to play a role in the results.

  • Thank you for your very thorough answer, correcting many of my misconceptions. Just one question left: I'm still uneasy about the choice of $H_0$. In our course we have learnt that, to choose the right hypotheses, one can look at the so-called "Type I" and "Type II" errors, somewhat informally described as: Type I error: $H_0$ is correct but we "go for" $H_1$. Type II error: $H_1$ is correct but we "go for" $H_0$. We should choose $H_0$ and $H_1$ such that a Type I error is "worse" than a Type II error. In this case, II is "worse". Doesn't this imply we should use the opposite hypotheses? – FlutterTubes Jan 20 '23 at 07:02
  • @FlutterTubes If you have a 'choice' in deciding what $H_0$ is then you shouldn't really be doing a hypothesis test. $H_0$ can be translated as 'zero effect hypothesis'. It is a hypothesis (that might not even be likely true) that you use to compare your observed effect with in a statistical sense ("how easily can the effect be observed considering the hypothetical case that the effect wouldn't be there", in other words "how good or bad is the precision of my experiment/test"). – Sextus Empiricus Jan 20 '23 at 07:09
  • Possibly you confuse 'choice of $H_0$' with the 'choice of significance level'? – Sextus Empiricus Jan 20 '23 at 07:16
  • The 188 cm seems like something that was chosen. But that is probably because of the hypothetical nature of the question. In real life problems that value of 188 cm might be something like a theoretically determined value that is a baseline. It would be some sort of standard. Take a different example, a bread factory makes 800 grams breads and wants to test occasionally whether the machines are still running ok (making bread that is not too different from 800 grams). They sample 3 breads of 798, 805, 809 grams, which is on average 4 grams too high. Should the machine be adjusted? – Sextus Empiricus Jan 20 '23 at 07:25
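
For the bread-factory example in the comment above, a quick numeric sketch (my own addition, using an ordinary one-sample t-test since the variance here is estimated from the sample rather than known):

```python
# One-sample t-test of the three loaves against the 800 g target.
from scipy import stats

loaves = [798, 805, 809]
t_stat, p_value = stats.ttest_1samp(loaves, popmean=800)
print(f"mean = {sum(loaves) / len(loaves):.0f} g, t = {t_stat:.2f}, p = {p_value:.2f}")
# t is about 1.24 and p about 0.34: the +4 g average is well within ordinary
# sampling noise for n = 3, so by itself it gives no reason to adjust the machine.
```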