3

Is there a particular reason for conventionally dividing the tail probability equally in a two-tailed test? Consider an $\alpha$-level test with a statistic following a standard normal distribution. Why do we take $(-z_{\alpha/2}, z_{\alpha/2})$ as the confidence interval? Can it be proven to minimize the confidence interval width? Visually, I think it does, but is there a way to prove it?
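Edit: a quick numerical check of the width claim (a sketch using scipy, not a proof): split the total tail mass $\alpha$ unevenly between the two tails and measure the width of the resulting interval for a standard normal. The minimum lands at the equal split.

```python
from scipy.stats import norm

alpha = 0.05

# Split the total tail mass alpha as (a, alpha - a) and measure the width
# of the resulting interval (norm.ppf(a), norm.ppf(1 - (alpha - a))).
splits = [i / 1000 * alpha for i in range(1, 1000)]
widths = [norm.ppf(1 - (alpha - a)) - norm.ppf(a) for a in splits]

best = splits[widths.index(min(widths))]
print(best)  # the equal split a = alpha / 2 gives the shortest interval
```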

User1865345
  • 8,202
reyna
  • 365
  • 2
    Not to be flip, but ... what's a better convention? Some conventions in statistics are purely arbitrary (p = 0.05!) but this one seems like a natural. – Peter Flom Aug 22 '23 at 17:50
    @PeterFlom it does, I agree and I am not questioning it. But, is there a reason for this particular convention? Or is it simply a natural looking convention, is my question. – reyna Aug 22 '23 at 17:52
    The sense of "equal tails" is usually in terms of probability. That does not require the sampling distribution of the estimator to be symmetric. Whether that minimizes the interval width depends on the distribution and the confidence level. – whuber Aug 22 '23 at 19:56

2 Answers

5

You can absolutely propose a test with uneven tails. There is no written rule to do otherwise.

As @PeterFlom says, people have adopted equal-tail tests for (a) convenience: you can take the absolute value or square of the test statistic, compare it to the known distribution, and calculate a single $p$-value; and (b) pragmatism: many statistical tests are simply pragmatic. In theory, reporting the result of a significant two-sided test affords the statistician no definitive comment about the direction of the effect; in practice, most people accept significance and then check the direction against their assumptions. "Does alcohol rehab prevent recidivism? The two-sided test rejects the null, but by golly, rehab increases recidivism?? The data must be wrong."
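The convenience point can be sketched in code: with a symmetric null distribution, the two-sided $p$-value is a single computation on $|z|$ (the observed statistic $z = 1.7$ here is made up):

```python
from scipy.stats import norm

z = 1.7  # hypothetical observed test statistic

# One computation on |z| covers both tails of the symmetric null distribution.
p_two_sided = 2 * (1 - norm.cdf(abs(z)))

# It equals the sum of the two tail probabilities computed separately.
p_by_tails = (1 - norm.cdf(z)) + norm.cdf(-z)
print(p_two_sided)
```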

Ideally, a statistical hypothesis is connected to a decision: to approve a drug, to implement a policy, or even, in a sense, to raise a flag to the scientific community. The agnosticism that usually accompanies two-sided hypotheses does not accord with a cost-benefit evaluation of science. However, if a compelling argument could be made about the respective costs of a type 1 error in each direction, you could justify uneven bounds. Otherwise, you would be expected to balance them because, if you truly don't know anything, the equal-tail test can be shown to be minimax.

There are examples. In practice, these hypotheses are posited as a family of hypotheses. In randomized clinical trials, a group sequential design might separately test the hypothesis that the drug is favorable, $\mathcal{H}_{1,a}: \theta>0$, or that the drug causes harm, $\mathcal{H}_{2,a}: \theta<0$, each with its own nominal alpha. Another interesting example I have seen is a sequential hypothesis design in which one first tests for non-inferiority, $\mathcal{H}_{1,a}: \theta > -d$, where $d$ is a non-inferiority margin, and then for superiority, $\mathcal{H}_{2,a}: \theta>0$.
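A minimal sketch of that sequential design, with hypothetical numbers (the margin $d$, the estimate, and its standard error are all made up for illustration):

```python
from scipy.stats import norm

alpha = 0.025                # one-sided level
d = 0.1                      # assumed non-inferiority margin
theta_hat, se = 0.08, 0.03   # hypothetical estimate and standard error

z_crit = norm.ppf(1 - alpha)

# Step 1: non-inferiority, H_a: theta > -d
non_inferior = (theta_hat + d) / se > z_crit

# Step 2: superiority, H_a: theta > 0, tested only if step 1 rejects
superior = non_inferior and (theta_hat / se > z_crit)
print(non_inferior, superior)
```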

AdamO
  • 62,637
5

Why is it convention to take equal tails in a two-tail test with a statistic following a symmetric distribution?

It is not a convention. For example, people also use one-sided tests, in which one tail has zero probability and the other carries all of $\alpha$.

The reason to switch between a one-sided test and a two-sided test relates to the alternative hypothesis and the maximisation of the power of the test (indirectly, the size of the interval; size can be determined in different ways, and power is a good indirect criterion).

See the image below for how the power of a one-tail test and a two-tail test can differ depending on the true value of the parameter.

[Figure: power of a one-tail vs. a two-tail test as a function of the true parameter value.] This image appears in the question: Why does $\mu > 0$ (or even $\mu > \epsilon$) "seem easier" to substantiate than $\mu \neq 0$?

These two cases are the extremes: either a fully symmetric interval or a maximally asymmetric interval. They are not really conventions but happen to be the most practical cases. An in-between situation is uncommon; the problem at hand is usually one of the extremes.
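The power comparison in the figure can be reproduced with a short sketch (assuming a standard normal test statistic with unit variance, so $Z \sim N(\mu, 1)$ under the true mean $\mu$):

```python
from scipy.stats import norm

alpha = 0.05
z_one = norm.ppf(1 - alpha)      # one-sided critical value
z_two = norm.ppf(1 - alpha / 2)  # two-sided critical value

def power_one_sided(mu):
    # Reject when Z > z_one; Z ~ N(mu, 1) under the true mean mu.
    return 1 - norm.cdf(z_one - mu)

def power_two_sided(mu):
    # Reject when |Z| > z_two.
    return (1 - norm.cdf(z_two - mu)) + norm.cdf(-z_two - mu)

for mu in (0.5, 1.0, -1.0):
    print(mu, power_one_sided(mu), power_two_sided(mu))
```

As the figure shows, the one-sided test has higher power when the true effect lies in the hypothesized direction, and essentially none when it lies in the other.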

  • Where does the "one-and-a-half-sided" test fall on the power plot? I would imagine the one-sided test is superior if you get the directionality right ($\mu>0$), but worse if you get it wrong. I could see a "1.5-sided" test doing better than the two-sided test by hedging your bet toward the direction you expect, so long as you get it right. – Nuclear Hoagie Aug 22 '23 at 18:43
  • I would argue that the 1 sided test is not comparable because we write the hypothesis differently. – AdamO Aug 22 '23 at 18:46
  • @NuclearHoagie other types of tests would fall in between the two cases in the current graph. Your idea of hedging sounds interesting, but I wonder what practical situation it would relate to. If people have nuanced prior beliefs about the direction of the effect, then a Bayesian approach might make more sense. – Sextus Empiricus Aug 22 '23 at 18:57
  • @AdamO a one-sided test can be written using the same non-composite null hypothesis. It is the alternative hypothesis that is more important. – Sextus Empiricus Aug 22 '23 at 18:59