
I'm (re)studying hypothesis testing from the book All of Statistics by Larry Wasserman (2005, Chapter 10), because when I studied the concept in one of my university courses I didn't understand it very well.

Some points that many may find simple are still unclear to me, so I'll first review step by step the main concepts behind hypothesis testing and p-values as presented in the book, and then point out my doubts.

Context

Wasserman begins by saying that first we choose a null hypothesis and an alternative hypothesis:

$H_0: \theta \in \Theta_0$ versus $H_1: \theta \in \Theta_1$

where $\theta$ is a parameter we want to test. The goal of a hypothesis test is to find a rejection region $R$ such that:

  • if $X \in R$ then we reject $H_0$
  • if $X \notin R$ then we don't reject $H_0$

Then the book talks about the type I and type II errors and introduces the concept of power function and size:

  • Power function: $\beta(\theta)=P_{\theta}(X \in R)$
  • Size: $\alpha=\sup_{\theta \in \Theta_0} \beta(\theta)$

And it says:

A test is said to have level $\alpha$ if its size is less than or equal to $\alpha$.

Next it introduces one-sided and two-sided tests, and then gives a one-sided test example.

Let $X_1, ..., X_n \sim N(\mu, \sigma)$ where $\sigma$ is known. We want to test $H_0: \mu \le 0$ versus $H_1: \mu > 0$. [...] reject $H_0$ if $T>c$, where $T=\bar{X}$.

Then it calculates the power function for the mean $\mu$:

$\beta(\mu)=P_{\mu}(\bar{X}>c)=P_{\mu}(\frac{\sqrt{n}(\bar{X}-\mu)}{\sigma}>\frac{\sqrt{n}(c-\mu)}{\sigma})=P(Z>\frac{\sqrt{n}(c-\mu)}{\sigma})=1-\Phi(\frac{\sqrt{n}(c-\mu)}{\sigma})$

Then it calculates the size:

$\text{size}=\sup_{\mu\le0}\beta(\mu)=1-\Phi(\frac{\sqrt{n}c}{\sigma})$
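To make this concrete for myself, here is a small numerical sketch of the power function and the size (the values of $n$, $\sigma$ and $c$ are illustrative ones I picked, not from the book):

```python
import numpy as np
from scipy.stats import norm

n, sigma, c = 25, 1.0, 0.3  # illustrative values

def power(mu):
    # beta(mu) = P_mu(Xbar > c) = 1 - Phi(sqrt(n) * (c - mu) / sigma)
    return 1 - norm.cdf(np.sqrt(n) * (c - mu) / sigma)

# power(mu) is increasing in mu, so the sup over mu <= 0 is attained at mu = 0
size = power(0.0)
print(size)  # equals 1 - Phi(sqrt(n) * c / sigma)
```

The supremum is attained at the boundary $\mu = 0$ because the power function is increasing in $\mu$.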

And then:

For a size $\alpha$ test, we set this equal to $\alpha$ and solve for $c$ to get $c=\frac{\sigma\Phi^{-1}(1-\alpha)}{\sqrt{n}}$. We reject when $\bar{X}>\frac{\sigma\Phi^{-1}(1-\alpha)}{\sqrt{n}}$. Equivalently, we reject when $\frac{\sqrt{n}(\bar{X} - 0)}{\sigma}>z_{\alpha}$
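The critical value $c$ for a chosen $\alpha$ can be computed directly (`norm.ppf` is scipy's $\Phi^{-1}$; the values of $n$, $\sigma$, $\alpha$ are again ones I picked for illustration):

```python
import numpy as np
from scipy.stats import norm

n, sigma, alpha = 25, 1.0, 0.05  # illustrative values
c = sigma * norm.ppf(1 - alpha) / np.sqrt(n)

# sanity check: the size of the test with this c is exactly alpha
size = 1 - norm.cdf(np.sqrt(n) * c / sigma)
print(c, size)  # size recovers alpha up to floating point
```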

Then after some pages Wasserman introduces the concept of p-value:

$\text{p-value}=\inf\{\alpha: T(X^n) \in R_{\alpha}\}$

That is, the p-value is the smallest level at which we can reject $H_0$. Informally, the p-value is a measure of the evidence against $H_0$: the smaller the p-value, the stronger the evidence against $H_0$.

The p-value is the probability (under $H_0$) of observing a value of the test statistic the same as or more extreme than what was actually observed.
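For the one-sided normal example above, the p-value works out to $1-\Phi(\sqrt{n}\,\bar{x}/\sigma)$, and I can check numerically that it coincides with the smallest $\alpha$ at which the observed $\bar{x}$ falls in $R_\alpha$ (a sketch with made-up data, not from the book):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, sigma = 25, 1.0
x = rng.normal(0.2, sigma, size=n)  # made-up sample with true mu = 0.2
xbar = x.mean()

p_value = 1 - norm.cdf(np.sqrt(n) * xbar / sigma)

def rejects(alpha):
    # reject H0 at level alpha if xbar exceeds the level-alpha critical value
    c = sigma * norm.ppf(1 - alpha) / np.sqrt(n)
    return xbar > c

# we reject for every alpha above the p-value and fail to reject below it
print(p_value, rejects(p_value + 1e-6), rejects(p_value - 1e-6))
```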

Questions

As I understood, the steps in the example are:

  1. Define the null and alternative hypotheses in terms of the parameter we want to test.
  2. Choose the test statistic. If the test statistic (from now on t.s.) is the sample mean, as in Wasserman's example, then by the central limit theorem the standardized t.s. $\frac{\sqrt{n}(\bar{X}-\mu)}{\sigma}$ is approximately a standard normal random variable.
  3. Choose the level $\alpha$ at which we perform the test. By definition, this number is the supremum over $\theta \in \Theta_0$ of the probability that $X \in R$. So can we say that $\alpha$ is the maximum probability of rejecting the null hypothesis when it is true? Notice that this is not the same as saying $\alpha$ is the probability that the null is true.
  4. After choosing $\alpha$ we calculate $R$. In the example $R$ is determined by $c$, which is found from the standard normal quantile at level $\alpha$. In this sense $\alpha$ represents the area under the normal curve to the right of the quantile: the smaller $\alpha$, the smaller the area, and the farther to the right the quantile must lie.
  5. Then we check whether $\bar{X}>c$, that is, whether we reject the null. If $\bar{X}>c$, this means we chose a priori an $\alpha$ big enough that a posteriori the t.s. falls in the rejection region (i.e. we were lucky, assuming the null is true); that is, we chose a probability threshold big enough that we can say $\bar{X} \in R$. But if we choose $\alpha$ too low a priori, it can happen that a posteriori the t.s. lies to the left of the quantile, hence $\bar{X} \notin R$ (i.e. we weren't lucky, assuming the null is true). But then, why is it that the smaller the p-value, the stronger the evidence against the null? From the previous reasoning it seems that choosing a higher $\alpha$ would give a higher probability of finding the t.s. in $R$, hence a higher probability that the data suggest rejecting the null.

Also, if the p-value is found from the observed data, and the p-value is the lowest $\alpha$ that gives $\bar{X} \in R$, why do I need the p-value if I already know the value of the t.s. and hence know with certainty whether $\bar{X} \in R$? You could say that the p-value is needed in order to find $R$, but since $R$ is then arbitrarily chosen, why can't I choose an $\alpha$ so big that I'm sure $\bar{X} \in R$?
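As a sanity check on my own reasoning, I ran a small simulation under the null (illustrative values again): it shows that the fraction of false rejections of a true null approaches $\alpha$, which is what makes a very big $\alpha$ look costly to me.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n, sigma, alpha, reps = 25, 1.0, 0.3, 20000  # deliberately big alpha
c = sigma * norm.ppf(1 - alpha) / np.sqrt(n)

# simulate many datasets under H0 (mu = 0) and count how often we reject
xbars = rng.normal(0.0, sigma, size=(reps, n)).mean(axis=1)
print((xbars > c).mean())  # close to alpha: a big alpha rejects a true null often
```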

  • "[...] Why can't I choose an alpha so big that I'm sure $\bar{X} \in R$". You can think about the consequences of choosing a high alpha level in decisions involving humans or animals. Do you want to reject the null at any cost, or do you want to reduce the risk of making a bad decision? – J-J-J Jun 09 '23 at 16:49
  • Please read the posts at https://stats.stackexchange.com/questions/31. – whuber Jun 09 '23 at 17:57
  • @J-J-J thanks. I don't quite fully understand your answer though: I don't want to reject the null at any cost, but then if it doesn't make sense to maximize the probability that $\bar{x} \in R$ (which I agree on), what is the goal? To see if at a certain alpha level the data suggests rejecting the null? But then why do I need this info if alpha is arbitrary? I'm confused. – SuperFluo Jun 10 '23 at 16:44
  • @SuperFluo alpha isn't (or shouldn't be) chosen arbitrarily. It should be a reasoned choice, made before conducting the test. See this answer for example: https://stats.stackexchange.com/a/245434/164936 – J-J-J Jun 11 '23 at 16:50
