
If I were to fit an example GAM using mgcv:

m1 <- gam(cases~s(salary,bs="cs",k=50)+s(age,bs="cs",k=50)+
            s(density,bs="cs",k=50)+
            s(longitude,latitude,bs="ds",k=100),
          data=example,family=poisson(),method="REML")

And if

  • summary(m1) shows all smooth terms are significant
  • adjusted R-squared is 0.999
  • deviance explained is 99%
  • gam.check(m1) shows all k are penalized appropriately
  • the QQ plot follows the line closely (bar a few points straying off at the tails)
  • the residuals vs. linear predictor plot shows no trend
  • the histogram of residuals doesn't have overly fat tails and closely follows a Gaussian distribution

Given all of this, why does a chi-squared test on the deviance suggest that the model fit is bad?

pchisq(m1$deviance,df=m1$df.residual,lower.tail=F)

which gives something close to zero (e.g. 1.234e-20).

summary(m1):

Parametric coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) 2.697446   0.003262   698.2   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
                         edf Ref.df  Chi.sq p-value    
s(salary)             19.132 49.000 5793.49  <2e-16 ***
s(age)                29.933 49.000  393.71  <2e-16 ***
s(density)            36.198 49.000  354.83  <2e-16 ***
s(longitude,latitude) 88.230 96.095 2545.63  <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) = 0.999   Deviance explained = 99.6%
-REML = 7402.2   Scale est. = 1   n = 2155

Furthermore, should there be a "next course of action"? For example, adding variation of a covariate across space (e.g. ti(longitude, latitude, age)). If so, should this be done for each and every pair of covariates (salary, density, etc.)?
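If it helps, here is a sketch of what I have in mind for a space-varying age effect (the d argument tells ti() that the first marginal is the 2-D spatial coordinate and the second is 1-D age; the basis choices and k values below are placeholders, not tuned):

```r
## Hypothetical extension of m1: let the age effect vary smoothly over space
m2 <- gam(cases ~ s(salary, bs = "cs", k = 50) +
            s(age, bs = "cs", k = 50) +
            s(density, bs = "cs", k = 50) +
            s(longitude, latitude, bs = "ds", k = 100) +
            ti(longitude, latitude, age,
               d = c(2, 1), bs = c("ds", "cs"), k = c(50, 10)),
          data = example, family = poisson(), method = "REML")
```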

Knovolt
  • Please see the thread here: https://stats.stackexchange.com/questions/141177/ I think it answers your query. If it doesn't, can you please add the output of summary()? – usεr11852 May 07 '21 at 08:36
  • @usεr11852 added summary(). I have seen that thread before; they suggest that 1-pchisq needs to be done or, equivalently, that lower.tail=FALSE be set. Yet my p-value is still close to zero. Are you/they suggesting that pchisq(m1$deviance,df=m1$df.residual,lower.tail=F) is wrong, and that I need to take the difference between the null deviance and the model deviance? If so, where do I get the null deviance? The model m1 is the first model that I am using. – Knovolt May 07 '21 at 13:15
  • So if my p-value is close to zero, it implies that m1 is no better than the null model (which I interpret as not exactly a good fit). If so, why do the bullet points outlined above suggest that the model looks good? It seems contradictory. – Knovolt May 07 '21 at 13:18
  • The null model here is m0 <- glm(cases~1, data=example, family=poisson). (Also, we reasonably want to use ML for our GAM rather than REML to make the comparison fair.) Then we just do pchisq(deviance(m0) - deviance(m1), df=m0$df.residual - m1$df.residual, lower.tail=FALSE). Notice that this latest result being "zero" is a "good thing"; the p-value is exceptionally small, so we can reject the null hypothesis that the two models are equal. – usεr11852 May 07 '21 at 22:13
  • Thank you for this information, I consolidated my main comment and some additional points to a post; please see my answer below. – usεr11852 May 07 '21 at 22:26

1 Answer


The null model here is m0 <- glm(cases~1, data=example, family=poisson). (Also, we reasonably want to use ML for our GAM rather than REML to make the comparison fair.) Then we just do pchisq(deviance(m0) - deviance(m1), df=m0$df.residual - m1$df.residual, lower.tail=FALSE). Notice that this latest result being "zero" is a "good thing"; the p-value is exceptionally small, so we can reject the null hypothesis that the two models are equal.
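Putting this together as a self-contained sketch — with simulated toy data standing in for the original example data frame and a trimmed-down formula, since I do not have the actual data:

```r
library(mgcv)

## Toy stand-in for the original `example` data (assumed structure)
set.seed(1)
n <- 500
example <- data.frame(salary = runif(n), age = runif(n, 20, 80))
example$cases <- rpois(n, exp(2 + sin(2 * pi * example$salary) + example$age / 40))

## Fit the GAM with method = "ML" so the deviance comparison is fair
m1 <- gam(cases ~ s(salary, bs = "cs") + s(age, bs = "cs"),
          data = example, family = poisson(), method = "ML")

## Intercept-only null model on the same data and family
m0 <- glm(cases ~ 1, data = example, family = poisson())

## Test the deviance *reduction* relative to the null, not the raw deviance
pchisq(deviance(m0) - deviance(m1),
       df = m0$df.residual - m1$df.residual,
       lower.tail = FALSE)
```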

I think the "next course of action" should be to justify why the adjusted $R^2$ is so high (0.999). While counter-intuitive, such extremely high values of $R^2$ should be scrutinised as to "why" they are so good; in effect, this suggests we have no sampling errors, no unobserved confounders, no competing exposures, absolutely nothing aside from the deterministic variation due to our observed features. Yeah, I have been to the Stats rodeo enough times to know there are bulls there...
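On actually quantifying how well m1 fits in practical terms, a cross-validated error estimate such as RMSE is more informative than a single GoF test. A minimal sketch, again using simulated toy data and a reduced formula in place of the real ones:

```r
library(mgcv)

## Toy stand-in data (assumed structure, for illustration only)
set.seed(1)
n <- 500
dat <- data.frame(salary = runif(n), age = runif(n, 20, 80))
dat$cases <- rpois(n, exp(1 + sin(2 * pi * dat$salary) + dat$age / 40))

## Simple k-fold cross-validated RMSE for the Poisson GAM
k <- 5
folds <- sample(rep(seq_len(k), length.out = n))
rmse <- sapply(seq_len(k), function(i) {
  fit <- gam(cases ~ s(salary, bs = "cs") + s(age, bs = "cs"),
             data = dat[folds != i, ], family = poisson(), method = "REML")
  pred <- predict(fit, newdata = dat[folds == i, ], type = "response")
  sqrt(mean((dat$cases[folds == i] - pred)^2))
})
mean(rmse)  # cross-validated RMSE estimate
```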

usεr11852
  • Yes, the p-value is now small, indicating that the two models are not equal. But does this also indicate that m1 is a good fit? Or is the pchisq test used here only for telling me whether m1 is bad or not? i.e. a large p-value means the two models are practically equal and thus it's a bad fit, but a small p-value means that m1 is NOT a bad fit (though it can range anywhere from an OK fit to an amazing fit)? – Knovolt May 07 '21 at 23:10
  • The p-value itself suggests we have some significant association between at least one of our predictor variables and our response variable; here m1 is not a bad fit compared to m0. It is obvious that some signal is captured, ergo m1 is a better model than m0, but we cannot really quantify by how much, as you correctly recognise. It would be possible to have a "worse model" (with a significant $p$-value from pchisq compared to m0) if the $R^2$ were negative or something else pathological like that, but clearly this is not the case here. – usεr11852 May 07 '21 at 23:42
  • So I guess I was using pchisq for the wrong reason (to test whether m1 was a good fit to the data) and should instead rely on other methods to determine the goodness of fit of this Poisson GAM? – Knovolt May 08 '21 at 03:25
  • Yes, you got it right now. :) m1 is "better" than m0 in terms of goodness-of-fit (GoF) for this data, if that is the question; that said, assessing the goodness-of-fit (in practical terms: predictions) of any regression model cannot be reduced to a dichotomous decision such as rejecting or failing to reject a null hypothesis. For example, providing a cross-validated estimate of RMSE or MAE would be much more relevant. But that is a rather large topic, suitable for a separate question. :) – usεr11852 May 08 '21 at 03:43
  • Thank you very much for all your help! – Knovolt May 08 '21 at 05:18
  • Cool! I am glad I could help! – usεr11852 May 09 '21 at 02:29