I was wondering if the following is a reasonable way to proceed:
I have a number of logistic models, fitted using glm, that I want to use to make predictions. The models have a continuous variable (call it cont_var), and for some of the models, a Box-Tidwell test suggests a non-linear relationship between cont_var and the logit of the outcome.
Should I worry? Here's what I'm thinking:
For any model where there might be non-linearity, I refit that model with splines, replacing cont_var with ns(cont_var, df = 4). Then I generate predictions for the same cases from both the spline model and the original, non-spline model, and check whether they differ radically. If not, I conclude that any non-linearity isn't affecting my predictions.
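A minimal sketch of that comparison, on simulated data. The data frame `dat`, the outcome `y`, the quadratic true effect, and the prediction grid are all assumptions made up for illustration; only cont_var and ns(cont_var, df = 4) come from the description above.

```r
library(splines)  # provides ns() for natural cubic splines

set.seed(1)
n <- 500

# Hypothetical data: the true logit is quadratic in cont_var,
# so the linear-in-the-logit model is misspecified on purpose.
dat <- data.frame(cont_var = runif(n, 0, 10))
dat$y <- rbinom(n, 1, plogis(-2 + 0.08 * (dat$cont_var - 5)^2))

# Original model (linear in the logit) and the spline version
fit_lin <- glm(y ~ cont_var, family = binomial, data = dat)
fit_ns  <- glm(y ~ ns(cont_var, df = 4), family = binomial, data = dat)

# Predicted probabilities from both models on the same grid of values
grid  <- data.frame(cont_var = seq(0, 10, length.out = 101))
p_lin <- predict(fit_lin, newdata = grid, type = "response")
p_ns  <- predict(fit_ns,  newdata = grid, type = "response")

# Largest absolute difference in predicted probability across the grid
max_diff <- max(abs(p_lin - p_ns))
max_diff

# The linear model is nested in the natural-spline model,
# so a likelihood-ratio test can complement the eyeball comparison.
anova(fit_lin, fit_ns, test = "Chisq")
```

What counts as a "radically different" prediction is a judgment call that depends on how the predictions will be used; the sketch only reports the size of the gap.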
Does that make sense?
Use mgcv::gam and let it choose the dfs (you can simply add the non-spline terms in the same way as in glm). That way you get its estimate of the degree of non-linearity. When the edf (estimated d.f.) is around 1, cont_var has a near-linear effect and the glm is fine. – Carsten Feb 15 '20 at 21:35
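A rough sketch of the suggestion in this comment, again on simulated data. The data frame `dat`, the extra covariate `other_var`, and the choice of method = "REML" are assumptions for illustration and are not part of the comment.

```r
library(mgcv)  # gam() with penalized smooth terms

set.seed(1)
n <- 500

# Hypothetical data: quadratic effect of cont_var plus a linear covariate
dat <- data.frame(cont_var = runif(n, 0, 10), other_var = rnorm(n))
dat$y <- rbinom(n, 1,
                plogis(-2 + 0.08 * (dat$cont_var - 5)^2 + 0.5 * dat$other_var))

# s(cont_var) is a penalized smooth whose wiggliness is estimated from
# the data; other_var enters as an ordinary linear term, just as in glm.
fit_gam <- gam(y ~ s(cont_var) + other_var,
               family = binomial, data = dat, method = "REML")

# Estimated degrees of freedom of the smooth term:
# a value near 1 suggests a near-linear effect on the logit scale.
edf <- summary(fit_gam)$edf
edf
```

If the edf comes out close to 1 for a given model, that model's plain glm fit is probably adequate for cont_var; larger values point to curvature worth keeping.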