
As a MATLAB user, I have been using coefTest to perform linear hypothesis testing. For example, in $y=\beta_0+\beta_1x_1+\beta_2x_2+\beta_3x_3$, if I want to test whether $\beta_1=\beta_2$, I can simply use the linear contrast $$C=\begin{bmatrix}0&1&-1&0\end{bmatrix}.$$ Then, the test statistic will follow an $F$-distribution, whereby I can compute my $p$-value.

  • Does this hold for all generalized linear models? In particular, I am concerned about the general linear model (Gaussian case) and the logistic regression (binomial case).

  • If so, why does the test statistic, despite so many different instantiations of GLM, always follow an $F$-distribution?

It seems that many sources take this for granted, probably because it is too basic. Yet I need to understand why, so that I can be confident enough to use it. I would sincerely appreciate it if someone could point me to an authoritative book.

Nick Cox

1 Answer


Why does the linear test statistic of a GLM follow an F-distribution?

It doesn't.

Then, the test statistic will follow an $F$-distribution [...] does this hold for all generalized linear models?

There's no result that establishes it in the general case, and indeed we can show (e.g. by simulation in particular instances) that it's not the case in general.

It holds for the Gaussian case, of course, but the derivation relies on the normality of the data. You can see it's not the case for logistic regression, since the data (and hence "F"-statistics based on the data) are discrete.

There is an asymptotic chi-square result. This, combined with Slutsky's theorem, should give us that the F-statistic will asymptotically be distributed as a scaled chi-square.
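That Slutsky argument can be checked numerically: an $F(q, d)$ variate is a ratio $(\chi^2_q/q)/(\chi^2_d/d)$, and as $d$ grows the denominator concentrates at 1, so $q \cdot F(q, d)$ approaches $\chi^2_q$. A small illustrative simulation (the values of $q$, $d$, and the sample size are arbitrary choices for the demonstration):

```python
import numpy as np

rng = np.random.default_rng(1)
q, d, n = 3, 10_000, 200_000

# F(q, d) = (chi2_q / q) / (chi2_d / d); for large d the denominator -> 1,
# so q * F(q, d) converges in distribution to chi2_q (Slutsky).
F = (rng.chisquare(q, n) / q) / (rng.chisquare(d, n) / d)
scaled = q * F
chi2 = rng.chisquare(q, n)

print(scaled.mean(), chi2.mean())  # both should be near E[chi2_q] = q = 3
```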

However, in sufficiently large samples (where how large "large" is will depend on a number of things), we might anticipate that the $F$-distribution would still be approximately correct, since the $F$ distribution being used to compute p-values and the actual distribution of the test statistic both converge to the same scaled chi-square distribution asymptotically.

We see the same issue with the common use of t-tests for parameter significance in GLMs (which many packages do) even though it's only t-distributed for the Gaussian case; for the others we only have an asymptotic normal result (but a similar argument for why the $t$ shouldn't do badly in sufficiently large samples can be made).

I don't have a good book suggestion. Some books give a handwavy argument for using the $F$ (some akin to mine above), others seem to ignore the need to justify it at all.

Glen_b
  • I see. Thanks a lot. So in the Gaussian case, I can just use the $F$-distribution to test whether $\beta_1=\beta_2$, and in the logistic regression case, I can do a full-vs-reduced-model likelihood ratio test. Is this correct? Hmmm... So there are no such books at all? That is a bit tough for newbies like me... Where should I start learning linear hypothesis testing? – Sibbs Gambling Jun 14 '15 at 09:05
  • Yes, you can use an F for the normal case, and you can do an LR test for logistic regression (though there are other tests that may be suitable). You could use the F-test in large samples, but there's no guarantee that it will do better than the chi-square (for logistic regression I expect it won't).
  • – Glen_b Jun 14 '15 at 09:13
  • I didn't say there were no books at all. I said that I didn't have a good suggestion for one. Specifically, you said you needed to understand why it was the case that the test was distributed F, and so asked for an authoritative reference. Leaving aside the false premise, I don't know of one that addresses the issue of the distribution of the F-statistic (outside of the normal case) that is both suitable for newbies and authoritative.
  • – Glen_b Jun 14 '15 at 09:13
  • Thanks again! I was misled to "this applies to all generalized linear models" by the "Definitions" paragraph. There is also a "Poisson" example on that page. The function indeed seems to be applicable to all generalized linear models. How do we reconcile this? – Sibbs Gambling Jun 14 '15 at 14:53
  • As I said, some people (and so books and packages) do use an F-test in that situation. I've given (in my answer) the only justification I'm aware of for doing so. To my mind, it's no more or less outrageous than using a t-test for a coefficient in the same circumstances, which lots of packages do (and which R stopped doing a while back, much to my delight). [If you want to understand why the Matlab people state the F-distribution for the statistic as if it was a plain fact instead of an approximation to an approximation, they'd need to justify it. They don't seem to offer any reference for it] – Glen_b Jun 14 '15 at 23:36
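The full-versus-reduced likelihood-ratio test discussed in the comments can be sketched in a few lines. This is an illustrative simulation, not code from the thread: the IRLS fit is written out by hand rather than via a GLM package, and the data-generating coefficients are made up, with $\beta_1=\beta_2$ so the null holds:

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Fit logistic regression by Newton's method (IRLS); returns coefficients."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)                     # IRLS weights
        beta = beta + np.linalg.solve((X * W[:, None]).T @ X, X.T @ (y - p))
    return beta

def deviance(X, y, beta):
    """Binomial deviance (-2 * log-likelihood, up to the saturated-model constant)."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    eps = 1e-12
    return -2.0 * np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

rng = np.random.default_rng(2)
n = 1000
x = rng.normal(size=(n, 3))
X_full = np.column_stack([np.ones(n), x])
beta_true = np.array([0.3, 0.8, 0.8, -0.5])   # beta1 == beta2, so H0 holds
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-X_full @ beta_true))).astype(float)

# Reduced model imposes beta1 = beta2 by giving x1 + x2 a single shared coefficient
X_red = np.column_stack([np.ones(n), x[:, 0] + x[:, 1], x[:, 2]])

lr = deviance(X_red, y, fit_logistic(X_red, y)) - deviance(X_full, y, fit_logistic(X_full, y))
# Under H0, lr is asymptotically chi-square with 1 df (one restriction);
# the p-value is the upper tail of that chi-square.
```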