What does this pattern in the residuals mean? I have seen it in other models as well. A prime example appears in the DHARMa vignette: https://cran.r-project.org/web/packages/DHARMa/vignettes/DHARMa.html
I recommend reading the DHARMa vignette - your question is answered there! – Florian Hartig Mar 30 '21 at 17:06
3 Answers
You could consider using randomized quantile residuals, which use randomization to average out the discrete patterns that appear in residuals from count response data. Randomized quantile residuals are the only type of residual that follows an exact normal distribution for non-normal generalized linear models, provided the fitted model is correct and apart from sampling variability in the estimated parameters. For example, if y is your vector of Poisson counts, you might use:
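# Fit a Poisson GLM; the family function is lowercase poisson() in R
fit <- glm(y ~ x, family = poisson())
library(statmod)        # provides qresiduals()
res <- qresiduals(fit)  # randomized quantile residuals
plot(x, res)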
Randomized quantile residuals were proposed by Dunn & Smyth (1996). An independent comparison study by Feng et al. (2020) showed that randomized quantile residuals perform better than traditional residuals for diagnosing problems in count models. Randomized quantile residuals are also used in the DHARMa vignette that you cite in your question.
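To make the randomization step concrete, here is a minimal sketch of how a randomized quantile residual can be computed by hand for a Poisson model. The simulated data, the coefficients and the variable names (`lower`, `upper`, `rqr`) are assumptions for illustration only; in practice `qresiduals()` from statmod does this for you.

```r
set.seed(1)                                   # the residuals are random, so fix the seed
n <- 200
x <- runif(n)
y <- rpois(n, lambda = exp(0.2 + 0.8 * x))    # simulated counts (assumed model)

fit <- glm(y ~ x, family = poisson())
mu  <- fitted(fit)

# For each observation, draw u uniformly between F(y - 1) and F(y),
# where F is the fitted Poisson CDF, then map u to the normal scale.
lower <- ppois(y - 1, lambda = mu)            # F(y - 1); equals 0 when y = 0
upper <- ppois(y, lambda = mu)                # F(y)
u     <- runif(n, min = lower, max = upper)
rqr   <- qnorm(u)

plot(mu, rqr, xlab = "Fitted mean", ylab = "Randomized quantile residual")
abline(h = 0, lty = 2)
```

Because `u` is spread across the whole jump of the CDF at each observed count, the distinct bands produced by the discrete response values are smeared out.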
References
Dunn PK, Smyth GK (1996). Randomized quantile residuals. J Comput Graph Stat 5(3), 236–244. https://doi.org/10.1080/10618600.1996.10474708
Feng C, Li L, Sadeghpour A (2020). A comparison of residual diagnosis tools for diagnosing regression models for count data. BMC Med Res Methodol 20, 175. https://doi.org/10.1186/s12874-020-01055-2
I suspect this is the result of fitting a logistic regression to binary data and computing the residuals. I can recreate a similar plot by doing so.
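The original plot is not reproduced here, so the following is only a sketch of the kind of simulation that produces such a pattern. The sample size, coefficients and variable names are assumptions chosen for illustration.

```r
set.seed(42)
n <- 300
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))  # assumed true model

fit   <- glm(y ~ x, family = binomial())
p_hat <- fitted(fit)
res   <- y - p_hat                 # raw (response) residuals

# Points with y = 1 lie on the curve 1 - p_hat and points with y = 0 on -p_hat,
# so the plot shows exactly two bands.
plot(p_hat, res, col = ifelse(y == 1, "blue", "red"),
     xlab = "Fitted probability", ylab = "Raw residual")
```

Each distinct value of the response contributes one band, which is why a response taking only a few values gives a few "lines".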
These sorts of residuals aren't very useful, because for logistic regression the quantity $Y - \hat{Y}$ doesn't follow a nice distribution, unlike in linear regression. There are several other types of residuals you can look at (see Frank Harrell's Regression Modeling Strategies), but I prefer the deviance residuals.
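As a rough illustration, deviance residuals can be extracted with `residuals(fit, type = "deviance")`. The sketch below, on simulated data with assumed coefficients, also recomputes them by hand for a binary response so the definition is visible.

```r
set.seed(7)
n <- 300
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(0.3 + x))   # assumed true model

fit   <- glm(y ~ x, family = binomial())
p_hat <- fitted(fit)

dev_res <- residuals(fit, type = "deviance")

# By hand for binary y: sign(y - p) * sqrt(-2 * log-likelihood contribution)
loglik_i   <- ifelse(y == 1, log(p_hat), log(1 - p_hat))
dev_manual <- sign(y - p_hat) * sqrt(-2 * loglik_i)

all.equal(unname(dev_res), unname(dev_manual))     # the two versions agree
all.equal(sum(dev_res^2), deviance(fit))           # squares sum to the residual deviance
```

Note that for a binary response the deviance residuals still fall into two bands, since they remain a function of the fitted probability and the observed 0 or 1.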
OK thanks, I understand. You were not far from the answer, my GLM has a Poisson distribution, but my data values only take 0, 1, 2. That's probably why there are 3 "lines" instead of 2. – christophe Mar 04 '21 at 15:48
To understand what residuals should look like, and whether yours look as they should (no identifiable problems) or indicate that more work is needed, I recommend reading this paper:
Buja, A., Cook, D., Hofmann, H., Lawrence, M., Lee, E.-K., Swayne, D. F. and Wickham, H. (2009). Statistical inference for exploratory data analysis and model diagnostics. Phil. Trans. R. Soc. A 367, 4361–4383. doi: 10.1098/rsta.2009.0120
Residuals from glm models have their own characteristics, and even the different families within glm will behave differently. The above paper suggests simulating data for which the assumptions hold and then plotting the residuals/diagnostics to familiarize yourself with what to expect. It also suggests putting your actual plot in a "line up" with plots from simulated models to see whether the "real" diagnostic plot stands out enough to indicate a problem.
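One way such a line up might be put together in base R is sketched below. This is not code from the paper: the data are simulated stand-ins, and the panel count, variable names and choice of deviance residuals are assumptions for illustration.

```r
set.seed(2009)
n <- 150
x <- runif(n)
y <- rpois(n, lambda = exp(0.1 + 0.5 * x))       # stand-in for the real data
dat <- data.frame(x = x, y = y)
fit <- glm(y ~ x, family = poisson(), data = dat)

n_panels <- 9
real_pos <- sample(n_panels, 1)                  # hide the real plot at a random position
sims     <- simulate(fit, nsim = n_panels - 1)   # responses for which the model holds

op <- par(mfrow = c(3, 3), mar = c(2, 2, 2, 1))
k <- 0
for (i in seq_len(n_panels)) {
  if (i == real_pos) {
    f <- fit                                     # the actual fitted model
  } else {
    k <- k + 1
    sim_dat   <- dat
    sim_dat$y <- sims[[k]]                       # replace the response with a simulated one
    f <- glm(y ~ x, family = poisson(), data = sim_dat)
  }
  plot(fitted(f), residuals(f, type = "deviance"),
       main = paste("Panel", i), xlab = "", ylab = "")
}
par(op)

real_pos   # look at this only after picking the panel that seems most unusual
```

If you (or a colleague) cannot pick the real panel out of the simulated ones, the residual pattern is consistent with what the model itself produces.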

