
This is a follow-up to this question: Independence in Poisson regression when used for rates estimation

I have a set of thousands of observations. Each relates to an individual, and for each I have the date of entry into the study, the date of exit, the date of death, an indicator for smoking or not, the birth date, the gender (and a few others that don't matter here). I cut the whole interval of time covered, from the first entry to the last exit, at each time something happens (someone enters or leaves the study, or a covariate changes). For each interval of time, and for each set of similar age and other covariates, I compute the total exposure and the number of deaths. This way I get "pseudo-observations", each with a number of deaths, an exposure, and covariates. Then I fit a Poisson regression (generalised linear model, with log-exposure as offset).
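For concreteness, the construction described above can be sketched in Python. This is only a minimal illustration under stated assumptions: the toy cohort, the helper names `split_person` and `pseudo_observations`, and the 10-year age bands are all hypothetical, and the split by calendar time is omitted.

```python
import math
from collections import defaultdict

def split_person(age_entry, age_exit, died, band=10):
    """Split one person's follow-up into age bands (hypothetical helper).

    Yields (age_band, exposure_years, deaths); a death is credited to the
    band in which follow-up ends.
    """
    age = age_entry
    while age < age_exit:
        upper = (math.floor(age / band) + 1) * band  # next band boundary
        stop = min(upper, age_exit)
        yield math.floor(age / band), stop - age, int(died and stop == age_exit)
        age = stop

def pseudo_observations(people):
    """Aggregate (smoker, age_entry, age_exit, died) records into cells."""
    cells = defaultdict(lambda: [0, 0.0])  # (smoker, band) -> [deaths, exposure]
    for smoker, age_entry, age_exit, died in people:
        for band, exposure, deaths in split_person(age_entry, age_exit, died):
            cells[(smoker, band)][0] += deaths
            cells[(smoker, band)][1] += exposure
    return dict(cells)

# Hypothetical toy cohort: (smoker, age at entry, age at exit, died at exit)
people = [
    (True, 45.0, 58.0, True),
    (True, 52.0, 61.0, False),
    (False, 48.0, 63.0, False),
    (False, 55.0, 57.5, True),
]
cells = pseudo_observations(people)
for key in sorted(cells):
    print(key, cells[key])
```

Note that the same person contributes exposure to several cells, which is where the dependence between pseudo-observations comes from. Each cell's deaths would then be the response, with the log of its exposure as the offset.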

How can I assess the goodness of fit of such a model? (You may want to read Jonny Lomond's answer to the previous question.)

Nick Cox

1 Answer


You are correct that the dependence between the pseudo-observations makes it misleading to treat them as independent. That said, not all is lost: we can still provide reasonable diagnostics for our goodness of fit.

First and foremost, assuming that these "pseudo-observations" are now treated as our real observations, the first thing to do is to show how well the resulting model fits them. As such, the suggestions in the CV.SE thread "Diagnostic plots for count regression" carry over. I would especially focus on the relevant rootograms; they are very informative and quite underused in my opinion (see Visualizing Count Data Regressions Using Rootograms (2014) by Kleiber & Zeileis for more details). Graphical diagnostics (rootograms, leverage points, etc.) should be well-behaved irrespective of whether the observations are "pseudo" or "real".
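To make the rootogram idea concrete, here is a minimal Python sketch of the numbers behind a hanging rootogram for a Poisson fit. The function names and the toy counts and fitted means are hypothetical; in practice `y` would be the deaths per pseudo-observation and `mu` the fitted means (fitted rate times exposure) from your model.

```python
import math
from collections import Counter

def poisson_pmf(k, mu):
    """Poisson probability of count k given mean mu."""
    if mu <= 0:
        return float(k == 0)
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def rootogram_table(y, mu, max_count=None):
    """Rows of (count, observed freq, expected freq, hanging deviation).

    The expected frequency of count k is the sum over observations of
    P(Y_i = k | mu_i); the deviation is sqrt(expected) - sqrt(observed).
    """
    if max_count is None:
        max_count = max(y)
    observed = Counter(y)
    rows = []
    for k in range(max_count + 1):
        expected = sum(poisson_pmf(k, m) for m in mu)
        obs = observed.get(k, 0)
        rows.append((k, obs, expected, math.sqrt(expected) - math.sqrt(obs)))
    return rows

# Hypothetical deaths and fitted means for ten pseudo-observations
y = [0, 0, 1, 0, 2, 1, 0, 3, 0, 1]
mu = [0.4, 0.6, 1.1, 0.3, 1.8, 0.9, 0.5, 2.2, 0.7, 1.0]

for k, obs, expected, deviation in rootogram_table(y, mu):
    print(f"count={k} observed={obs} expected={expected:.2f} deviation={deviation:+.2f}")
```

Deviations close to zero across counts suggest the fitted count distribution matches the data; a systematic pattern (for instance too many observed zeros) points to misfit such as overdispersion or excess zeros.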

Regarding testing procedures: having a correlated sample means that it is unclear how the degrees of freedom are calculated in such a model, i.e. what are our effective degrees of freedom given the "association" induced in our data. This is for example a pertinent point when dealing with mixed-effects models too.

The obvious solution here is to bootstrap at the person level and re-run the analysis, i.e. the procedure of creating the "pseudo-observations" is part of the modelling procedure, so it is subject to sampling variation too. This should give you a reasonable approximation of the null. Side-note: one might also be tempted to be (over-)conservative and possibly run all the usual tests, but with the degrees of freedom implied by the pseudo-observation sample divided by two. This would serve as a crude approximation to the effective degrees of freedom. I wouldn't strongly recommend it, but if it results in reasonable test statistics it can offer an additional tick-mark. I would suggest looking more carefully at the literature on the analysis of hierarchical data.
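A minimal Python sketch of the person-level bootstrap, under loud assumptions: the cohort is simulated, each person is stored as `(smoker, pieces)` with `pieces` already split by age band, and the "model" is a smoking-only Poisson rate model, chosen because its MLE has a closed form (deaths divided by exposure per smoking group). All names (`toy_cohort`, `pearson_x2`, `person_bootstrap`) are hypothetical. The key point is that whole persons are resampled and the pseudo-observations are rebuilt inside every replicate.

```python
import random
from collections import defaultdict

def toy_cohort(n=300, seed=1):
    """Simulated persons: (smoker, [(age_band, exposure, deaths), ...])."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        smoker = rng.random() < 0.3
        pieces = []
        for band in (4, 5, 6):  # three consecutive 10-year age bands
            rate = 0.005 * (band - 3) * (2.0 if smoker else 1.0)
            t = rng.expovariate(rate)
            if t < 10.0:  # death within this band ends follow-up
                pieces.append((band, t, 1))
                break
            pieces.append((band, 10.0, 0))
        people.append((smoker, pieces))
    return people

def pearson_x2(people):
    """Pipeline: build cells, fit smoking-only rates, return Pearson X^2."""
    cells = defaultdict(lambda: [0, 0.0])   # (smoker, band) -> [deaths, exposure]
    totals = defaultdict(lambda: [0, 0.0])  # smoker -> [deaths, exposure]
    for smoker, pieces in people:
        for band, exposure, deaths in pieces:
            cells[(smoker, band)][0] += deaths
            cells[(smoker, band)][1] += exposure
            totals[smoker][0] += deaths
            totals[smoker][1] += exposure
    x2 = 0.0
    for (smoker, _), (d, e) in cells.items():
        group_deaths, group_exposure = totals[smoker]
        mu = group_deaths / group_exposure * e if group_exposure > 0 else 0.0
        if mu > 0:
            x2 += (d - mu) ** 2 / mu
    return x2

def person_bootstrap(people, pipeline, n_boot=200, seed=0):
    """Resample persons with replacement and re-run the whole pipeline."""
    rng = random.Random(seed)
    return [pipeline(rng.choices(people, k=len(people))) for _ in range(n_boot)]

people = toy_cohort()
observed = pearson_x2(people)
replicates = sorted(person_bootstrap(people, pearson_x2))
lower, upper = replicates[5], replicates[194]  # rough 95% percentile interval
print(f"observed X2 = {observed:.2f}, bootstrap interval = ({lower:.2f}, {upper:.2f})")
```

This nonparametric version only shows the sampling variability of the statistic while respecting the within-person dependence. To get a reference distribution under the null you would additionally simulate or recentre under the fitted model, which is not shown here.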

To recap: Run the usual graphical diagnostics, those should look good. Consider bootstrapping at the appropriate sampling unit to get nulls. If pressed, use a conservative approximation to the model's DoF.

usεr11852
  • Thank you for this detailed answer, I learned some things. But the issue is that my pseudo-observations are not independent, and anything regarding residuals or involving the number of observations is irrelevant. – MrSmithGoesToWashington Mar 11 '22 at 13:21
  • That is exactly the point and that's why I suggest first looking at the graphical diagnostics like a rootogram. If that "fails", there is nothing to be said about the goodness of fit. Subsequently, as the creation of the null is unclear, I suggested you bootstrap at the appropriate level (i.e. patient prior to pseudo-observations creation) rather than blindly bootstrap the Poisson model itself. Finally, as the actual number of observations is "misleading" in the sense that it doesn't reflect accurately the sample size for a DoF calculation I briefly comment on an aggressive way to get ESS. – usεr11852 Mar 11 '22 at 13:44
  • (Thanks for the feedback, I will edit my answer accordingly) – usεr11852 Mar 11 '22 at 13:46
  • Thanks a lot for your clarifications. I will need a few days before trying what you suggest and will then give you feedback. – MrSmithGoesToWashington Mar 11 '22 at 14:16