
I've read somewhere that we often choose an estimator of the parameter that appears in $H_0$ and $H_1$ as the test statistic. I've also noticed that we very often use that statistic's sampling distribution as the reference distribution.

Do we always do this?

user_anon
  • Please specify your question to some more detail as it is too wide to answer in its current state. – Rachel Aug 23 '16 at 06:35
  • I will give an example. Let's say $H_0$: $\mu$ (the population mean) $= 10$. OK, what do I do now? Do I just think of the sampling distribution of the sample mean, or try to find an unbiased estimator for $\mu$ and then work out the distribution of that estimator? Or something else? – user_anon Aug 23 '16 at 06:40

1 Answer

  1. At heart, for a test to work well (at least, to tend to give a better chance of rejection under the alternative than under the null), you need the test statistic to behave differently under $H_0$ and $H_1$. In many hypothesis tests involving a single parameter, an obvious test statistic is some estimator of that parameter -- a reasonable estimator will indeed tend to have a different sampling distribution under the null than under the alternative, since a useful estimator tends to be close, in some sense, to the true value, and those values differ.

    Note that it might not necessarily be the "obvious" estimator, such as the sample mean for the population mean, since once you specify the distribution you're sampling from there may be a much more efficient way to estimate the parameter (for instance, when sampling from a Laplace distribution the sample median is a more efficient estimator of the population mean than the sample mean is). Alternatively, you might deliberately construct a robust estimator, with the aim of getting something that performs reasonably even if the assumed distribution the data came from is somewhat misspecified.

    [You may want to look into pivotal quantities which are often used either as test statistics or as the basis for forming confidence intervals.]

    However, this is not a requirement -- for example, sometimes with likelihood ratio tests (which are widely used) the test statistic (the ratio of the likelihoods evaluated at their respective maxima) doesn't necessarily correspond to an obvious/explicit estimator of the parameter(s), though the values of the maximum likelihood estimates themselves come into the computation of the likelihood.

  2. Naturally, if you have a test statistic, the sampling distribution of that statistic under the null hypothesis is the distribution you use to define the boundary of your rejection region (but the sampling distribution under the alternative shows you which direction the rejection region should go if you want good power against that alternative).

    Sometimes it's hard to evaluate the exact distribution in small samples, in which case people often use asymptotic approximations to the sampling distribution.

    We don't necessarily need to be able to compute the sampling distribution under the null explicitly. For example, we might be able to simulate values from the distribution of the test statistic and so obtain approximate critical values or p-values as desired. [Or we might not even be willing to specify a distribution for the data, and so construct a permutation test/randomization test or a bootstrap test using our test statistic.]
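
As an illustration of that last point, here is a minimal sketch in Python of getting approximate critical values and a Monte Carlo p-value by simulating the null distribution of a test statistic; the null model, sample size and observed value are purely hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_null_dist(statistic, draw_null_sample, n_sims=10_000):
    """Approximate the sampling distribution of `statistic` under H0
    by repeatedly drawing samples from the null model."""
    return np.array([statistic(draw_null_sample()) for _ in range(n_sims)])

# Hypothetical example: H0 says the data are exponential with mean 10;
# the test statistic is the sample mean of n = 25 observations.
n = 25
null_stats = simulate_null_dist(
    statistic=np.mean,
    draw_null_sample=lambda: rng.exponential(scale=10.0, size=n),
)

# Approximate two-sided critical values at level 0.05, and a Monte Carlo
# p-value for a made-up observed value of the statistic.
lower, upper = np.quantile(null_stats, [0.025, 0.975])
observed = 13.2
p_value = 2 * min((null_stats >= observed).mean(), (null_stats <= observed).mean())
print(lower, upper, p_value)
```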


To address the title question -- how to choose the test statistic -- there are many ways to do so, as the foregoing suggests. However, generally people seek to have tests with high power, and as a result many tests are based on the likelihood ratio, because of the Neyman-Pearson lemma.
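
As a concrete (and deliberately simple) illustration of the likelihood ratio idea: for a sample of size $n$ from a normal distribution with known variance $\sigma^2$, testing $H_0: \mu = \mu_0$ against $H_1: \mu \neq \mu_0$, the likelihood ratio statistic reduces to a familiar quantity,

$$-2\log\Lambda = -2\log\frac{\sup_{\mu=\mu_0} L(\mu)}{\sup_{\mu} L(\mu)} = \frac{n(\bar{x}-\mu_0)^2}{\sigma^2} = z^2,$$

so in this simple case the likelihood ratio test is just the usual two-sided $z$ test in disguise.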

However, we need not work that way (what if I don't know what distribution my data were drawn from? I almost never do) and can devise almost any statistic we might think would do well -- though not all choices will lead to tests with good properties, naturally.

Indeed, in some cases, particularly when the alternative may be rather broad (goodness-of-fit testing is one example), there is typically no uniformly most powerful test; the choice of test statistic might then depend on the kinds of alternatives you'd like to be able to pick up, and many different tests might be used in different situations.

--

(Edit in response to comment)

Let's look at the statistic for a one-sample test of variance under sampling from a normal distribution ($H_0: \sigma^2 = \sigma_0^2$ vs $H_1: \sigma^2 \neq \sigma_0^2$, taking the two-tailed test specifically).

Now the "obvious" statistic for testing a variance would be $s^2$, but the problem is that the distribution of that test statistic depends on $\sigma^2$, the variance of the population it comes from. [This would not be a problem for a permutation test however - we could actually use $s^2$ as the test statistic without difficulty in that case. But let's leave the case of permutation tests aside.]

We actually only need to worry about the distribution under the null hypothesis, but even that depends on the particular value of $\sigma_0^2$. Specifically, under sampling from a normal distribution with variance $\sigma_0^2$, $s^2$ has a distribution from the gamma family whose mean is $\sigma_0^2$ and whose variance is $\frac{2}{n-1}\sigma_0^4$.
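
To spell out where those moments come from: in the usual shape-scale parameterization, that gamma distribution has shape $\frac{n-1}{2}$ and scale $\frac{2\sigma_0^2}{n-1}$, so

$$E(s^2) = \frac{n-1}{2}\cdot\frac{2\sigma_0^2}{n-1} = \sigma_0^2, \qquad \operatorname{Var}(s^2) = \frac{n-1}{2}\cdot\left(\frac{2\sigma_0^2}{n-1}\right)^2 = \frac{2\sigma_0^4}{n-1}.$$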

That's not a problem as such, but it would mean we'd need a different table for every possible $\sigma_0^2$, which would make for a very large set of tables (not so much of an issue on a computer, where you could just calculate the whole distribution on the fly, but a big issue when people had to carry tables).

This is where pivotal quantities come in (which I mentioned before). The ratio $Q_0 = s^2/\sigma_0^2$ has a distribution that doesn't depend on the value of $\sigma_0^2$ -- at a given sample size the distribution is always the same, no matter which particular value $\sigma_0^2$ takes. So now we'd only need a new table for every different sample size, which is a much smaller problem. (Specifically, $Q_0$ has a gamma distribution with mean $1$ and variance $2/(n-1)$; this gamma distribution is a scaled version of a chi-squared distribution. The chi-squared distribution was first derived in relation to the sample variance from a normal distribution by Helmert in 1876, though its name arises from the notation Karl Pearson used when writing the multivariate normal distribution.)

So then -- because the chi-squared distribution was already well known in exactly this context -- we need only rescale from $Q_0$ to $Q_1 = (n-1)Q_0 = (n-1)s^2/\sigma_0^2$, and $Q_1$ has a chi-squared distribution with $n-1$ degrees of freedom. Now we're dealing with a situation where tables already existed, which is even more convenient.
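
For completeness, here is a minimal numerical sketch of the resulting two-sided test in Python (the function name and simulated data are purely illustrative; it assumes the data are sampled from a normal distribution):

```python
import numpy as np
from scipy import stats

def variance_test(x, sigma0_sq):
    """Two-sided test of H0: sigma^2 = sigma0_sq for a normal sample."""
    n = len(x)
    s_sq = np.var(x, ddof=1)           # unbiased sample variance s^2
    q1 = (n - 1) * s_sq / sigma0_sq    # Q1 = (n-1) s^2 / sigma_0^2 ~ chi^2_{n-1} under H0
    cdf = stats.chi2.cdf(q1, df=n - 1)
    p_value = 2 * min(cdf, 1 - cdf)    # two-tailed p-value
    return q1, p_value

# Made-up data: test H0: sigma^2 = 4 against the two-sided alternative.
rng = np.random.default_rng(1)
x = rng.normal(loc=10, scale=2.5, size=30)
print(variance_test(x, sigma0_sq=4.0))
```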

Glen_b
  • Thank you very much for your reply! I am also trying to find the answer to other questions regarding what you said, but I am not used to the 'likelihood' concept. I would like to know why the chi-squared test for variance (link) uses that statistic. The mean of that test statistic (which I calculated in order to see if it is an unbiased estimator) is $n-1$ and not the variance... :( – user_anon Aug 23 '16 at 09:34
  • @user_anon see the discussion added to the bottom of the answer – Glen_b Aug 23 '16 at 11:32
  • Wow! Thanks again! :) You mentioned the "obvious" statistic. Did you mean just taking the statistic from the null/alternative hypothesis (here, the variance), or thinking about that statistic, checking its expectation to see if it's an unbiased estimator of the population variance, and then using it only if it is unbiased? – user_anon Aug 23 '16 at 12:09
  • Sorry, I've just come from confidence intervals, where you MUST have the sampling distribution of an unbiased estimator, and now, here in hypothesis testing, I just don't know if it is a "must" just like in confidence intervals... – user_anon Aug 23 '16 at 12:09
  • Unbiasedness doesn't need to come into that consideration at all. You probably want a consistent estimator, but why would unbiasedness matter? I simply meant that if your hypothesis is about population variance, sample variance is the most obvious way to estimate it (and in the case of sampling from a normal, it's a good choice). It would hurt the test in no way at all if we had used the biased ML estimator $s_n^2 = \frac{1}{n}\sum_i (x_i-\bar{x})^2$ instead -- it would just change the scale factor we'd need out the front to make it have a chi-squared distribution. – Glen_b Aug 23 '16 at 12:48
  • You're mistaken -- there's no need to use an unbiased estimator with confidence intervals. How did you come by that notion? Is that stated in some book or other? (You might be leaving out some particular condition, for example.) – Glen_b Aug 23 '16 at 12:49
  • Perhaps not. It is my 'intuition' from thinking of confidence intervals: the normal distribution, with $\mu$ in the middle. So I thought it should always be like that (the expectation should equal the population mean). – user_anon Aug 23 '16 at 13:03