The question "What do the residuals in a logistic regression mean?" describes the different ways one can generate residuals from a logistic regression.
My question is not about logistic regression specifically, but about any regression with a binary outcome other than OLS and the generalized linear model for binary outcomes with an identity link.
Here: https://stats.stackexchange.com/a/118121/159259 I read "What you can do with plots of residuals against individual predictors is check to see if the functional form is properly specified. For example, if the residuals form a parabola, there is some curvature in the data that you have missed. To see an example, look at the second plot in @Glen_b's answer here: Checking model quality in linear regression. However, these issues don't apply with a binary predictor".
That's the point: I'd like to have residuals I can analyze with respect to the predictors even if the link is not the identity function. To do so, I'd need residuals that are uncorrelated with the predictors even for logit, probit or complementary log-log regressions. Is this possible?
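For reference, here is what maximum likelihood already guarantees. This is standard GLM theory rather than something taken from the linked answers, and the notation ($p_i$, $\eta_i$, $g$) is introduced here for illustration. Writing $\eta_i = x_i^\top\beta$ and $p_i = g^{-1}(\eta_i)$, the score equations at the estimate $\hat\beta$ are

$$\sum_i x_i \, \frac{\partial p_i / \partial \eta_i}{\hat p_i (1 - \hat p_i)} \, (y_i - \hat p_i) = 0 .$$

For the logit link, $\partial p_i / \partial \eta_i = p_i (1 - p_i)$, so this reduces to

$$\sum_i x_i \, (y_i - \hat p_i) = 0 ,$$

which means that the raw (response) residuals are exactly orthogonal to every included predictor and, if the model has an intercept, have zero sample correlation with each of them. For probit or complementary log-log the same orthogonality holds only for the weighted residuals in the first equation, and Pearson residuals, $(y_i - \hat p_i)/\sqrt{\hat p_i (1 - \hat p_i)}$, are in general not exactly uncorrelated with the predictors under any of these links.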
Federico Tedeschi
-
Could you explain why correlation between residuals and predictors would be relevant? After all, correlation is not a part of logistic regression, either conceptually or mathematically. – whuber Dec 01 '22 at 20:59
-
There are three reasons for it: 1) When one wants to evaluate whether a specific variable helps predict a binary outcome while controlling for a set of other predictors, including that variable in a logit/probit regression can sometimes lead to quasi-separation. Taking the Pearson residuals as the outcome solves this issue: https://ir.stonybrook.edu/xmlui/bitstream/handle/11401/72390/000000684.sbu.pdf In my opinion, the fact that the same predictors used to build the standardized residuals also predict them weakens the validity of this approach. – Federico Tedeschi Dec 01 '22 at 21:39
-
2) When the link test or the Ramsey RESET test rejects, I'd like to understand whether the issue is related to a specific variable that should appear in the model with a different functional form. I'd like to run this investigation using residuals that are uncorrelated with all the predictors, so that I can perform the specification tests on one variable at a time without creating many polynomial terms. 3) I struggle to understand how I can regard a variable as "residuals from a regression with given predictors" if those predictors are correlated with the residuals. – Federico Tedeschi Dec 01 '22 at 21:46
-
As for my point 1 above (in my first comment), actually I meant the opposite, i.e. to use Pearson's residuals from a binary-outcome model with just ONE predictor to be regressed against many. Doing the opposite (first a model with a non-identity link and many covariates, then Pearson residuals used in a simple linear regression) would likely not solve the quasi-separation issue, so one should use some different methods, like Firth logit regression. – Federico Tedeschi Dec 02 '22 at 10:28
-
The reason you might struggle in (3) is that correlation is not relevant to logistic regression. Correlation assesses a degree of linearity of relationship, but logistic regression is not a linear model in this sense. – whuber Dec 02 '22 at 14:07
-
Ok: let me try to put it a different way. Suppose I perform a logistic regression with outcome Y and predictor X. I interpret the residuals from that regression as something that cannot be further explained by X, unless we consider X in a different functional form (like a logarithm, a squared value, or a fractional polynomial). If I can't observe this through a regression, how can I get this intuition? At the moment, the only way I know to observe it is to perform a regression with the same link and the same outcome, using as regressors the fitted value from the previous regression and X. – Federico Tedeschi Dec 02 '22 at 14:18
-
The standard approach is to develop additional nonlinear functions of $X$ and include them in the logistic regression. – whuber Dec 02 '22 at 14:21
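
As a rough illustration of the approach in the last comment, the sketch below simulates a binary outcome whose true log-odds are quadratic in X, fits a logistic regression with and without the squared term, and compares deviances (a likelihood-ratio test with 1 degree of freedom, 5% critical value 3.84). Everything here is an assumption made for illustration: the simulated data, the coefficients, and the helper names `solve`, `predict` and `fit_logit`, which implement a plain Newton-Raphson fit rather than any particular library's routine. It also checks that the raw residuals of the logit fit are orthogonal to X.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def predict(X, beta):
    """Fitted probabilities under the logit link."""
    return [1.0 / (1.0 + math.exp(-sum(bj * v for bj, v in zip(beta, row))))
            for row in X]

def fit_logit(X, y, iters=25):
    """Newton-Raphson logistic regression; returns (beta, deviance, fitted p)."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        p = predict(X, beta)
        # Score vector: sum_i x_i (y_i - p_i)
        grad = [sum(row[j] * (yi - pi) for row, yi, pi in zip(X, y, p))
                for j in range(k)]
        # Observed information: sum_i p_i (1 - p_i) x_i x_i'
        hess = [[sum(row[a] * row[b] * pi * (1 - pi) for row, pi in zip(X, p))
                 for b in range(k)] for a in range(k)]
        beta = [bj + s for bj, s in zip(beta, solve(hess, grad))]
    p = predict(X, beta)
    dev = -2.0 * sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
                     for yi, pi in zip(y, p))
    return beta, dev, p

# Simulated data (hypothetical): true log-odds are -1 + 0.5 x + 1.0 x^2.
random.seed(0)
n = 2000
x = [random.uniform(-2, 2) for _ in range(n)]
y = [1 if random.random() < 1 / (1 + math.exp(-(-1 + 0.5 * xi + xi * xi))) else 0
     for xi in x]

X_lin = [[1.0, xi] for xi in x]            # intercept + X
X_quad = [[1.0, xi, xi * xi] for xi in x]  # intercept + X + X^2

_, dev_lin, p_lin = fit_logit(X_lin, y)
beta_quad, dev_quad, _ = fit_logit(X_quad, y)

# Likelihood-ratio statistic for adding X^2 (compare with 3.84).
lr = dev_lin - dev_quad
# Raw residuals of the linear-in-X logit fit are orthogonal to X.
score_x = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p_lin))

print("LR statistic for X^2:", lr)
print("sum of x * (y - p_hat):", score_x)
```

The point of the second printed quantity is that a zero correlation between raw residuals and X says nothing about whether the functional form is right: it holds by construction even when the model omits the squared term, while the likelihood-ratio comparison is what detects the missed curvature.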