Is it possible for the power curve to be lower under some alternative parameter value than under the null parameter value?

Question

Let $p_{n\theta}$ be the power function for testing $H_0:\theta=\theta_0$ versus $H_1:\theta\neq\theta_0$ using test statistic $T_n$. Is it possible that $p_{n\theta}<p_{n\theta_0}$ for some $\theta\neq\theta_0$ and $n$? If so, can you give an example? (Either an analytical example or a simulation is fine.)

My guess is that it's entirely possible, but I don't know how to construct an example.

Or even a non-awful test. For example, an exact test for a binomial parameter often has its size considerably below the stated significance level, and the power curve has a sawtooth appearance due to the discreteness of the data. — Russ Lenth, Aug 26 '21 at 23:10
Oops, I may be mistaken here. With the sawtooth phenomenon, I was thinking of power vs. sample size rather than power vs. the parameter value. — Russ Lenth, Aug 27 '21 at 16:17
@RussLenth You were right about there being non-awful examples though. My answer includes three examples that wouldn't usually be considered awful (though the sample size on one of them was very small; on the other hand, one of the examples had a very large sample size, and the first was in between). — Glen_b, Aug 29 '21 at 07:54
Example: https://stats.stackexchange.com/questions/279162/unbiased-hypothesis-test-what-does-it-mean-actually/279199#279199 — Christoph Hanck, Sep 21 '21 at 07:12

Glen_b · Accepted Answer · 2023-09-09T05:08:45.357

Sure, it's possible. Tests that do this are called biased tests. Many tests in common use have some bias against some alternatives, generally at smaller sample sizes. This is notably the case where the alternatives are very broad.
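
To pin the term down in the question's notation (a sketch for a simple null, where the actual size is $p_{n\theta_0}$): the test is unbiased when

$$p_{n\theta}\;\ge\;p_{n\theta_0}\quad\text{for all }\theta\neq\theta_0,$$

and biased when $p_{n\theta}<p_{n\theta_0}$ for at least one alternative $\theta$. Textbook definitions usually state this relative to the level $\alpha$, requiring $p_{n\theta}\ge\alpha$ on the alternative and $p_{n\theta_0}\le\alpha$; for a test whose size is exactly $\alpha$ the two versions coincide.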

A common example is the chi-squared goodness-of-fit test with unequal null probabilities; it's biased against some alternatives, and sometimes the bias is surprisingly large.

An example of such bias for the usual chi-squared goodness-of-fit test for the multinomial occurs when the null is $\pi = (0.1, 0.2, 0.7)$. If you use the chi-squared $5\%$ critical value, the actual significance level is about $5.2\%$ for $n=50$. Against the alternative $\pi = (\frac{27}{300}, \frac{58}{300}, \frac{215}{300})$, the power at $n=50$ is about $4.5\%$ (rejection rates based on $10^6$ simulations).
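
The figures above came from simulation, but with $n=50$ and three cells the sample space is small enough to enumerate, so the rejection probabilities can be checked exactly. The following is a minimal sketch rather than the code used for the numbers above; the function names are my own, and it relies on the fact that a chi-squared variable with 2 degrees of freedom has upper-$\alpha$ quantile $-2\ln\alpha$, so it only works for three cells.

```python
import math


def multinomial_pmf(counts, probs):
    """Multinomial probability of `counts` under cell probabilities `probs`."""
    n = sum(counts)
    log_p = math.lgamma(n + 1)
    for x, p in zip(counts, probs):
        log_p += x * math.log(p) - math.lgamma(x + 1)
    return math.exp(log_p)


def rejection_probability(n, null_probs, true_probs, alpha=0.05):
    """Exact P(reject) for the Pearson chi-squared test with three cells.

    The statistic is computed against `null_probs`; the data are assumed
    to be generated from `true_probs`.
    """
    # Chi-squared with 2 df is exponential with mean 2, hence this quantile.
    crit = -2.0 * math.log(alpha)
    expected = [n * p for p in null_probs]
    total = 0.0
    # Enumerate every possible table (x1, x2, x3) with x1 + x2 + x3 = n.
    for x1 in range(n + 1):
        for x2 in range(n - x1 + 1):
            counts = (x1, x2, n - x1 - x2)
            stat = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
            if stat > crit:
                total += multinomial_pmf(counts, true_probs)
    return total


null = (0.1, 0.2, 0.7)
alt = (27 / 300, 58 / 300, 215 / 300)

size = rejection_probability(50, null, null)   # actual type I error rate
power = rejection_probability(50, null, alt)   # rejection rate under the alternative

print(f"size  = {size:.4f}")
print(f"power = {power:.4f}")
```

If the simulated rates quoted above are right, `power` should come out below `size`, which is exactly the bias in question. Passing other alternatives as `true_probs` lets you map out where the power dips below the size.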

Indeed, omnibus distributional goodness-of-fit tests are typically biased against some alternatives. The Kolmogorov-Smirnov and Anderson-Darling tests provide additional examples (e.g. vs light-tailed alternatives, most severely for the Anderson-Darling). An example of testing a uniform against symmetric beta alternatives is here; the plot is reproduced below:

[Plot: power against symmetric beta alternatives to the uniform]

Here's another example I did a couple of years ago where you can draw a power curve and see the bias. In this case it's comparing mortality from a standard life table with simulated observed deaths where there's a constant percentage shift up or down in actual mortality rate from the standard table ($q_{x}=(1+\theta)q_x^s$ for adult ages). Here the model is not multinomial but rather a vector of binomials (which can also be formed as a $2\times k$ chi-squared table, but with $k$ degrees of freedom, since the total of observed frequencies and expected frequencies are not required to match across ages).
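
To spell out why the $2\times k$ table and the vector-of-binomials forms give the same statistic, here is a short derivation. The symbols $n_x$ (number exposed at age $x$) and $D_x$ (observed deaths at age $x$) are notation introduced for this sketch, and I'm assuming the expected counts are taken directly from the standard-table rates $q_x^s$ with nothing estimated from the data:

$$X^2=\sum_{x=1}^{k}\left[\frac{(D_x-n_xq_x^s)^2}{n_xq_x^s}+\frac{\big((n_x-D_x)-n_x(1-q_x^s)\big)^2}{n_x(1-q_x^s)}\right]=\sum_{x=1}^{k}\frac{(D_x-n_xq_x^s)^2}{n_xq_x^s(1-q_x^s)}$$

The second equality holds because both numerators equal $(D_x-n_xq_x^s)^2$ and $\frac{1}{q}+\frac{1}{1-q}=\frac{1}{q(1-q)}$. Each term is a squared standardized binomial count, and since no margin is constrained there are $k$ such terms free to vary, which is where the $k$ degrees of freedom come from.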

The hypothesized mortality is from a standard national life table, and the sample size is very large (representing observed mortality across age for membership of a hypothetical largish pension fund). The lowest rejection rate - the lowest point on the power curve - occurs near $\theta=-0.1$ (mortality is $90\%$ of hypothesized mortality), where the power is below $3.5\%$, and this is in spite of the actual type I error rate being above the nominal $\alpha=0.05$ (the true significance level is around $6\%$).

This bias would be a problem for a pension fund, since a substantial drop in mortality would adversely impact their financial position. This sort of test is pretty commonly used for such a purpose, so such poor performance in a situation where the fund would definitely want to detect the change should be a concern: a test that is fairly commonly taught for detecting shifts in mortality is unable to detect just the sort of change they need to be able to identify. The specific alternative is somewhat artificial, since we don't expect mortality change to be at a constant percentage rate across the range of adult ages, but it serves the purpose of illustrating a problem with using a chi-squared test to detect potentially lower mortality more broadly.

On the other hand the test does fairly well at detecting increases in mortality, albeit the power is likely well below what a life insurer (rather than a pension fund) would like.

One explanation for the chi-square phenomenon is that you're pulling a bait-and-switch: the right power curve to construct should consider the intended alternatives for the test; but by considering all possible alternatives, you are contemplating applying the test in an unintended way. This is not a criticism! Indeed, it points out one way to construct many practical examples of nonmonotonic power functions. — whuber, Aug 27 '21 at 13:52
I'm not sure I follow your intent there, sorry. The usual chi-squared goodness of fit test is a test for a specified set of multinomial probabilities; usually the intended alternatives would be any other sets of multinomial probabilities. There are other situations, such as test of a vector of specified binomial probabilities which would also be chi-squared (in this case still a test of goodness of fit but not the usual one) and which also have a bias problem. — Glen_b, Aug 27 '21 at 17:43
@whuber I included a specific example for the chi-squared; I don't know if that helps clarify the claim. — Glen_b, Aug 28 '21 at 04:00
