4

I am looking for a good criterion to measure "goodness of fit" in generalized linear models. To be clear: I am not looking for a criterion that answers the question "does overdispersion occur?". What do you think about Nagelkerke's pseudo R-squared? Any thoughts would be appreciated!

whuber
  • 322,774
MarkDollar
  • 5,955

2 Answers

2

You can construct an $R^2$-type quantity by noting what $R^2$ is for normal OLS and expressing it in the framework of an exponential family. An exponential family likelihood (with dispersion) can be written as follows:

$$f(y_i|\mu_i,\phi)=\exp\left(\frac{y_ib(\mu_i)-c(\mu_i)}{\phi}+d(y_i,\phi)\right)$$

where $b(\cdot),c(\cdot),d(\cdot,\cdot)$ are known functions. For normal OLS, we have $b(\mu_i)=\mu_i$, $c(\mu_i)=\frac{1}{2}\mu_i^2$ and $d(y_i,\phi)=-\frac{1}{2}\left(\log(2\pi\phi)+\frac{1}{\phi}y_i^2\right)$. A goodness-of-fit measure for each observation, the squared deviance residual, can be obtained from the scaled likelihood ratio statistic, which compares the saturated fit $\mu_i=y_i$ with the model fit $\mu_i=\hat{\mu}_i$:

$$d_i^2=2\phi\left(\log[f(y_i|\mu_i=y_i,\phi)]-\log[f(y_i|\mu_i=\hat{\mu}_i,\phi)]\right)$$

$$=2\left[y_ib(y_i)-y_ib(\hat{\mu}_i)-c(y_i)+c(\hat{\mu}_i)\right]$$

The $d(y_i,\phi)$ terms cancel in the difference, and the factor $\phi$ drops out. In the OLS case, the squared deviance residual is therefore:

$$d_i^2=2\left[y_i^2-y_i\hat{\mu}_i-\frac{1}{2}y_i^2+\frac{1}{2}\hat{\mu}_i^2\right]=[y_i-\hat{\mu}_i]^2=e_i^2$$

which is just the ordinary squared residual. For OLS, we have $R^2=1-\frac{SSE}{SST}$, where SSE is the sum of squared residuals from the fitted model and SST is the sum of squared residuals from the intercept-only model. Hence, we can analogously define $R^2$ for GLMs as:

$$R^2_{GLM}=1-\frac{\sum_id_{i,model}^2}{\sum_id_{i,null}^2}$$
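To make the definition concrete, here is a minimal sketch in plain Python. It assumes the Poisson case, where $b(\mu_i)=\log\mu_i$ and $c(\mu_i)=\mu_i$, so the general formula above gives $d_i^2=2\left[y_i\log(y_i/\hat{\mu}_i)-(y_i-\hat{\mu}_i)\right]$. The function names, the counts `y` and the fitted means `mu_hat` are all made up for illustration; in practice `mu_hat` would come from your fitted GLM, and the null means from the intercept-only fit (which is the sample mean of $y$).

```python
import math


def gaussian_unit_deviance(y, mu):
    # b(mu) = mu, c(mu) = mu^2 / 2  ->  d^2 = (y - mu)^2
    return (y - mu) ** 2


def poisson_unit_deviance(y, mu):
    # b(mu) = log(mu), c(mu) = mu  ->  d^2 = 2 * [y * log(y / mu) - (y - mu)]
    # The y * log(y / mu) term is taken as 0 when y == 0 (its limiting value).
    term = y * math.log(y / mu) if y > 0 else 0.0
    return 2.0 * (term - (y - mu))


def deviance_r2(y, mu_model, mu_null, unit_deviance):
    # R^2_GLM = 1 - (sum of model deviance residuals^2) / (sum of null ones)
    dev_model = sum(unit_deviance(yi, mi) for yi, mi in zip(y, mu_model))
    dev_null = sum(unit_deviance(yi, m0) for yi, m0 in zip(y, mu_null))
    return 1.0 - dev_model / dev_null


# Hypothetical counts and fitted means, for illustration only
y = [0, 1, 3, 4, 7]
mu_hat = [0.5, 1.2, 2.6, 4.1, 6.6]
mu_null = [sum(y) / len(y)] * len(y)  # intercept-only fit: the sample mean

r2_poisson = deviance_r2(y, mu_hat, mu_null, poisson_unit_deviance)
r2_gauss = deviance_r2(y, mu_hat, mu_null, gaussian_unit_deviance)

print(f"Poisson deviance R^2:  {r2_poisson:.3f}")
print(f"Gaussian deviance R^2: {r2_gauss:.3f}")
```

With the Gaussian unit deviance the function reduces to the usual $1-\frac{SSE}{SST}$, which is a handy sanity check. Most GLM software reports the residual and null deviance directly, so in practice you rarely need to compute the $d_i^2$ by hand.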

1

It will depend on what kind of GLM you're using and on your data. For example, the Wald chi-square and likelihood ratio tests are good statistics for categorical data.

  • Hello! I'm not searching for tests that can compare nested models (you mentioned LR Tests). And I'm not searching for quasi t-tests (like Wald Test) to look for significance of the coefficients! I'm searching for something which is based on the (deviance) residuals, something like an adj. R^2 for linear models. I just want to say something like, "This model describes 70% of the deviation". – MarkDollar Jun 06 '11 at 08:27