
Many sources suggest that there is a duality between confidence intervals and hypothesis testing.(*) But I'm having trouble making sense of this philosophically. The frequentist interpretation of a confidence interval is something like (per Wikipedia):

Were this procedure to be repeated on multiple samples, the calculated [90%] confidence interval (which would differ for each sample) would encompass the true population parameter 90% of the time.

Yet the p-value is defined in terms of values the sample mean might take on if the null hypothesis is true. (I.e. in the one-tailed case: $p = P(\bar x\ge \bar x_{observed}\mid\mu = H_0)$).

How is it possible to manipulate a statement about a procedure that is likely to correctly bound the true population mean into a statement about the probability of the observed sample mean?

If we understand the confidence interval as characterizing the distribution of the means of samples from a population (a view the bootstrap procedure invites), then there's no problem. There is an obvious symmetry between two cases: the case in which there is a < 5% chance of the sample mean being more extreme than $H_0$, given the actual population (i.e. $H_0$ falls outside the 95% CI), and the case in which there is a < 5% chance of getting a sample mean as extreme as the one observed, given that the population is really centered at $H_0$ (i.e. $p < 0.05$).
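For a normal sampling distribution, that symmetry can be checked directly: the chance of the sample mean landing at or beyond $H_0$ when the population is centered at $\mu$ equals the chance of it landing at or beyond $\mu$ when the population is centered at $H_0$. A minimal sketch (the particular means, $\sigma$, and $n$ below are made-up illustrative values):

```python
import numpy as np
from scipy import stats

# Hypothetical numbers: true mean mu, null value h0, known sigma, sample size n
mu, h0, sigma, n = 5.0, 4.0, 2.0, 16
se = sigma / np.sqrt(n)  # standard error of the sample mean

# P(sample mean <= h0 | population centered at mu)
p1 = stats.norm.cdf(h0, loc=mu, scale=se)

# P(sample mean >= mu | population centered at h0)
p2 = stats.norm.sf(mu, loc=h0, scale=se)

print(p1, p2)  # equal, by the symmetry of the normal distribution
```

The two probabilities coincide because the normal density depends only on the distance $|\mu - H_0|$ in standard-error units, regardless of which point is taken as the center.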

But this interpretation of CIs seems to be disfavored! In particular, the Wikipedia article admonishes: "A confidence interval is not a range of plausible values for the sample mean, though it may be understood as an estimate of plausible values for the population parameter."

Even if the CI is in fact a range of plausible values of the sample mean, a question remains. How precisely is such a definition equivalent to the frequentist procedure definition above?

(*) A good example is this Minitab blog post:

The confidence level is equivalent to 1 – the alpha level. So, if your significance level is 0.05, the corresponding confidence level is 95%.

  • If the P value is less than your significance (alpha) level, the hypothesis test is statistically significant.
  • If the confidence interval does not contain the null hypothesis value, the results are statistically significant.
  • If the P value is less than alpha, the confidence interval will not contain the null hypothesis value.
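These three statements can be verified numerically. A sketch using a one-sample t-test (the data are simulated, and the effect size, sample size, and seed are arbitrary choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=0.4, scale=1.0, size=30)  # simulated sample

# Two-sided one-sample t-test of H0: mu = 0
t, p = stats.ttest_1samp(x, popmean=0.0)

# 95% t-based confidence interval for the mean
n = len(x)
se = x.std(ddof=1) / np.sqrt(n)
lo, hi = stats.t.interval(0.95, df=n - 1, loc=x.mean(), scale=se)

# Duality: p < 0.05 exactly when 0 lies outside the 95% interval
print(p < 0.05, not (lo <= 0.0 <= hi))
```

Both tests of significance agree because the t-test and the t-based interval are built from the same statistic, standard error, and critical value.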

1 Answer


You use different null hypotheses in each situation.

When performing a hypothesis test, you set the null hypothesis to some value you are attempting to test the implausibility of. Let's consider the following model:

$$ Y = \beta X + \epsilon $$

You will collect some data and with it, compute an estimate of $\beta$, which we will call $\hat{\beta}$. Then, you will generally set up a hypothesis test as follows:

$$ H_0 : \beta = 0 $$ $$ H_1 : \beta \neq 0 $$

The p-value is computed in the ordinary way for whatever test you are using. To compute a confidence interval, you use the following null hypothesis, which tests whether or not the true value for $\beta$ is equal to the estimated value you observed.

$$ H_0 : \beta = \hat{\beta} $$ $$ H_1 : \beta \neq \hat{\beta} $$

Say you are trying to compute a 95% confidence interval. You would find the bounds of the rejection region of the null distribution (that is, the points at which each one-tailed test gives p = 0.025) and, after converting your test statistic back to the units of $\beta$, you have your confidence interval.
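As a sketch of that inversion, assume a z-test with a known standard error (the values of $\hat{\beta}$ and its standard error below are hypothetical; a t-test would use the t distribution with the appropriate degrees of freedom):

```python
import numpy as np
from scipy import stats

# Hypothetical estimate and standard error
beta_hat, se = 1.8, 0.6

# Under H0: beta = beta_hat, the statistic (beta_hat - beta) / se is N(0, 1).
# Find the bound where each one-tailed p-value equals 0.025.
z_crit = stats.norm.ppf(0.975)

# Converting the rejection bounds back to the units of beta gives the CI
ci = (beta_hat - z_crit * se, beta_hat + z_crit * se)
print(ci)
```

The familiar "estimate ± critical value × standard error" formula is exactly this inversion written out in closed form.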

This is where the duality of hypothesis testing and confidence interval computation comes in: this confidence interval contains the true value for $\beta$ 95% of the time for the same reason that setting $\alpha$ to 0.05 gives you a 5% Type I error rate. Of course, this depends on your test of choice actually being able to maintain the nominal Type I error rate for your dataset, but that's a separate issue entirely.
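The coverage claim can be checked by simulating the repeated-sampling procedure from the Wikipedia definition. A sketch with a known-$\sigma$ z-interval (the true mean, $\sigma$, sample size, and replication count are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu_true, sigma, n, reps = 2.0, 1.0, 25, 2000
z = stats.norm.ppf(0.975)
half = z * sigma / np.sqrt(n)  # half-width of the 95% z-interval

# Repeat the procedure on many samples and count how often the
# (differing) interval encompasses the true population mean.
covered = 0
for _ in range(reps):
    x = rng.normal(mu_true, sigma, n)
    if x.mean() - half <= mu_true <= x.mean() + half:
        covered += 1

print(covered / reps)  # close to 0.95
```

The miss rate (about 5%) is exactly the Type I error rate of the corresponding z-test at $\alpha = 0.05$: the interval misses $\mu$ precisely when the test would reject the true value.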

rishi-k
  • +1 A way I like to think about it is that $p$ is the value such that a $(1 - p)$% confidence interval will have $\hat{\beta}$ as an endpoint. – Dave Jun 29 '21 at 20:52