Using splines to address non-linearity in logistic regression

Question

I was wondering if the following is a reasonable way to proceed:

I have a number of logistic models, fitted using glm, that I want to use to make predictions. The models have a continuous variable (call it cont_var), and for some of the models, a Box-Tidwell test suggests a non-linear relationship between cont_var and the logit of the outcome.

Should I worry? Here's what I'm thinking:

For any model where there might be non-linearity, I refit that model with splines, replacing cont_var with ns(cont_var, df = 4). Then I make the same predictions I made with the original, non-spline model and check whether the two sets of predictions differ radically. If they don't, I conclude that any non-linearity isn't affecting my predictions.
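The comparison described above could be sketched roughly as follows. This is only an illustration: the data are simulated, and the names dat, x2, fit_lin, fit_ns, p_lin and p_ns are made up for the example rather than taken from the actual models.

```r
library(splines)  # ships with base R; provides ns()

set.seed(1)
# Simulated data (an assumption for illustration): cont_var has a curved
# effect on the logit, x2 is a hypothetical second covariate.
n <- 500
dat <- data.frame(cont_var = runif(n, 0, 10), x2 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(-2 + 0.08 * (dat$cont_var - 4)^2 + 0.5 * dat$x2))

# Original model: cont_var enters linearly on the logit scale.
fit_lin <- glm(y ~ cont_var + x2, family = binomial, data = dat)

# Spline model: same formula, cont_var replaced by a natural cubic spline.
fit_ns <- glm(y ~ ns(cont_var, df = 4) + x2, family = binomial, data = dat)

# Predicted probabilities from both models for the same observations.
p_lin <- predict(fit_lin, newdata = dat, type = "response")
p_ns  <- predict(fit_ns,  newdata = dat, type = "response")

# Summarise how far apart the two sets of predictions are.
summary(abs(p_lin - p_ns))

# Points far from the diagonal are cases where the two models disagree.
plot(p_lin, p_ns, xlab = "linear glm", ylab = "spline glm")
abline(0, 1)
```

What counts as "radically different" depends on how the predictions are used, so the threshold is left to judgment here.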

Does that make sense?

Yes. Personally, I'd use mgcv::gam and let it choose the degrees of freedom (you can add the non-spline terms in the same way as in glm). That way you get its estimate of the degree of non-linearity. When the edf (estimated degrees of freedom) is around 1, cont_var has a near-linear effect and the glm is fine. — Carsten, Feb 15 '20 at 21:35
Very interesting question. I wonder what the decision boundary looks like if we use a spline basis in logistic regression ... — Haitao Du, Feb 17 '20 at 12:40

score 3 · Accepted Answer · answered Feb 17 '20 at 12:17

Yes, that makes sense. For a post doing this, and showing some plots, see Make Nonlinear Smooth Interpretable in Logistic GAM Regression. You should not only compare the predictions numerically, but look at plots of the estimated nonlinear smooth.
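As a rough sketch of looking at the estimated smooth, here is one way to do it with mgcv::gam, which was suggested in the comments. The data are simulated and the object names (dat, fit_gam) are assumptions for illustration, not taken from the linked post.

```r
library(mgcv)  # recommended package, included with standard R installations

set.seed(1)
# Simulated data (an assumption for illustration) with a curved effect
# of cont_var on the logit.
n <- 500
dat <- data.frame(cont_var = runif(n, 0, 10))
dat$y <- rbinom(n, 1, plogis(-2 + 0.08 * (dat$cont_var - 4)^2))

# s() fits a penalised smooth; the amount of smoothing is estimated
# from the data rather than fixed in advance.
fit_gam <- gam(y ~ s(cont_var), family = binomial, data = dat,
               method = "REML")

# Estimated degrees of freedom of the smooth: a value near 1 suggests
# a near-linear effect on the logit scale.
summary(fit_gam)$edf

# Estimated smooth on the logit scale with a shaded confidence band.
plot(fit_gam, shade = TRUE)
```

If the plotted curve is close to a straight line within its band, that agrees with the linear glm being adequate for cont_var.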

For how to do formal tests, see Anova on logistic regressions linearity. Here is a list of other relevant Qs with answers.
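One common form such a test can take is a likelihood ratio test between the linear fit and the spline fit. The sketch below is not necessarily what the linked post does. It relies on the fact that a linear term lies within the span of a natural cubic spline basis once the model has an intercept, so the two models are nested. The data are simulated and the object names are assumptions.

```r
library(splines)  # ships with base R; provides ns()

set.seed(1)
# Simulated data (an assumption for illustration).
n <- 500
dat <- data.frame(cont_var = runif(n, 0, 10))
dat$y <- rbinom(n, 1, plogis(-2 + 0.08 * (dat$cont_var - 4)^2))

# Smaller model: linear effect of cont_var on the logit.
fit_lin <- glm(y ~ cont_var, family = binomial, data = dat)

# Larger model: natural cubic spline with 4 degrees of freedom.
fit_ns <- glm(y ~ ns(cont_var, df = 4), family = binomial, data = dat)

# Likelihood ratio (deviance) test of the linear model against the
# spline model; the 3 extra parameters are the non-linear part.
lrt <- anova(fit_lin, fit_ns, test = "Chisq")
lrt
```

A small p-value in the second row points to non-linearity that the linear term misses. A large one does not prove linearity, which is one reason to look at the plots as well.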
