
There are multiple goodness-of-fit tests, such as Kolmogorov-Smirnov, Cramér-von Mises, Anderson-Darling, and Kuiper's. Which one should one use? There are threads (e.g. 1, 2, 3) and a nice textbook chapter (García-Portugués, "Notes for Nonparametric Statistics", section 6.1) discussing this and related questions. My takeaway is that none of these tests uniformly dominates the others in terms of power over all possible alternatives. Therefore, we cannot eliminate any of them right away. Still, if we have an idea of what the alternative is likely to be, we can choose the test that has the highest power against it.
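
To make the notion of power against a specific alternative concrete, here is a minimal simulation sketch in Python. It estimates the empirical power of the Kolmogorov-Smirnov and Cramér-von Mises tests against one arbitrary alternative to the Uniform[0,1] null; the Beta(2, 2) alternative, sample size, and replication count are all illustrative choices, not anything from the question itself.

```python
# Hypothetical power simulation: how often do KS and Cramér-von Mises
# reject a Uniform[0,1] null when the data actually come from Beta(2, 2)?
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, n_sims, alpha = 100, 2000, 0.05

rejections = {"KS": 0, "CvM": 0}
for _ in range(n_sims):
    x = rng.beta(2, 2, size=n)  # draw from the alternative
    if stats.kstest(x, "uniform").pvalue < alpha:
        rejections["KS"] += 1
    if stats.cramervonmises(x, "uniform").pvalue < alpha:
        rejections["CvM"] += 1

for name, count in rejections.items():
    print(f"{name} empirical power: {count / n_sims:.3f}")
```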

Meanwhile, I am interested in a different aspect of comparing these tests and choosing between them. Does the null distribution of the random variable of interest $W$ (not of the test statistic) matter at all for the comparison? E.g. does it matter whether $\text{H}_0\colon \ W\sim\text{Uniform}[0,1]$ or $\text{H}_0\colon \ W\sim\text{Normal}(0,1)$?

Concretely, I am interested in testing whether the probability integral transform (w.r.t. my model) of a given time series is Uniform[0,1]. (If not, that indicates my model is inadequate.) Should this knowledge influence my choice between the goodness-of-fit tests in any way?
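
For illustration, here is a minimal sketch of how such tests could be applied to PIT values; `pit_values` is a hypothetical placeholder for the model-based transforms (filled with uniform draws here), and the sketch ignores any serial dependence in the series.

```python
# Sketch: goodness-of-fit tests of PIT values against Uniform[0,1].
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
pit_values = rng.uniform(size=200)  # placeholder for F(observations) from the model

print(stats.kstest(pit_values, "uniform"))          # Kolmogorov-Smirnov
print(stats.cramervonmises(pit_values, "uniform"))  # Cramér-von Mises
```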

Richard Hardy

1 Answer


If I understand correctly, the situation is as follows: some part of a model involves a specific distributional assumption $F$, which is used, say through maximum likelihood estimation, to obtain parameter estimates and a predicted series $\{\hat W\}$ of a variable whose population counterpart $\{W\}$ is the one assumed to follow $F$.

Then one could think that a criterion to assess the distributional assumption would be

$$F\left(\{\hat W\}\right) \sim_{approx} U(0,1)\;??$$

in which case why not go further and examine

$$\Phi^{-1}\left[F\left(\{\hat W\}\right)\right] \sim_{approx} N(0,1)\; ??$$
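
For concreteness, here is a sketch of this two-step transform, assuming purely for illustration that $F$ is Exponential(1); in practice one would substitute the model's assumed distribution and the actual predicted series.

```python
# Sketch: apply the assumed CDF F, then the standard normal quantile
# function, and check each step against its reference distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
w_hat = rng.exponential(scale=1.0, size=200)  # placeholder for {W-hat}

u = stats.expon.cdf(w_hat)  # F({W-hat}): approximately U(0,1) under H0
z = stats.norm.ppf(u)       # Phi^{-1}[F({W-hat})]: approximately N(0,1) under H0

print(stats.kstest(u, "uniform"))  # test the uniform step
print(stats.kstest(z, "norm"))     # test the normal step
```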

I have used this approach in my research, but

a) this sequence of non-linear transformations risks magnifying the effect of the inherent approximations (you don't really believe that you hit the nail on the head with your distributional assumption, do you?)

b) there is an issue as to how well your predicted series behaves in the sample as a continuous random variable. What matters is not whether it is, conceptually, a continuous random variable, but how it behaves in your sample. The problem is that for a discrete random variable $Y$ following a distribution $G$, while it is true that $G^{-1}[U] \sim_d Y$ (that is the principle of inverse transform sampling), it is not true that $G(Y) \sim U(0,1)$. So if, in your sample, $\{W\}$ (or, if unobservable, $\{\hat W\}$) exhibits "few" distinct values and many ties or near-ties, it will "behave like" a discrete r.v., and the first step above is likely to be invalid in practice, even if $W\sim_d F$ indeed holds in the population (see the small demonstration below).
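
A small demonstration of this point: even when $Y$ genuinely follows $G$, uniformity of $G(Y)$ is rejected once $G$ is discrete. Poisson(3) is an arbitrary illustrative choice here.

```python
# Demonstration: for discrete Y ~ G, G(Y) is not Uniform(0,1),
# so a uniformity test rejects even though the model is correct.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
y = rng.poisson(lam=3, size=500)  # Y genuinely follows G = Poisson(3)
u = stats.poisson.cdf(y, mu=3)    # G(Y) takes only a handful of distinct values

print(stats.kstest(u, "uniform"))  # decisively rejects uniformity
```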

Opinion: I don't think a reject/not reject approach (i.e. formal hypothesis testing) is the most appropriate here; I would rather use plots such as Q-Q plots or related visualizations that show how much the empirical distribution of your (transformed as above) predicted series deviates from what it is assumed to be.
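
A minimal sketch of this visual approach, using `scipy.stats.probplot` to draw a Q-Q plot against the standard normal; `z` is a placeholder for the transformed series $\Phi^{-1}\left[F\left(\{\hat W\}\right)\right]$.

```python
# Sketch: Q-Q plot of the transformed series against N(0,1).
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
z = rng.standard_normal(200)  # placeholder for the transformed series

stats.probplot(z, dist="norm", plot=plt)
plt.title("Q-Q plot of transformed series vs. N(0,1)")
plt.show()
```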

  • I think my question is something like this: if we have $\text{H}_0\colon \ W\sim\text{Normal}(0,1)$, could you say "Do not use Anderson-Darling, use Kolmogorov-Smirnov instead" based on $\text{H}_0$ alone, ignoring for now which deviations from $\text{H}_0$ we might envision as likely? And if, at the same time, $\text{H}_0\colon \ W\sim\text{Uniform}[0,1]$ led you to say "Do not use Kolmogorov-Smirnov, use Anderson-Darling instead", then your recommendation would depend on $\text{H}_0$. – Richard Hardy Jan 09 '24 at 09:43