I have a dataset of 300 independent variables and 120,000 observations, on which I will fit a binary logistic regression for my dependent variable. Because logistic regression assumes a linear relationship between the continuous independent variables and the logit (log odds) of the dependent variable, I am performing a Box-Tidwell test.

I read in the binary logistic regression manual with SPSS that

Although logistic regression is often thought of as having no assumptions, we do assume that the relationships between the continuous predictors and the logit (log odds) is linear. This assumption can be tested by including in the model interactions between the continuous predictors and their logs. If such an interaction is significant, then the assumption has been violated. I should caution you that sample size is a factor here too, so you should not be very concerned with a just significant interaction when sample sizes are large. If none of the log interactions is significant, remove those terms from the model, report that there were no problems with the assumption, and present the results of the model without the log interaction terms.

I really don't understand (1) why sample size would be a factor here, and (2) how I should then interpret the p-values (i.e. when to reject the linearity assumption).

I found a similar statement here on slide 7, but it does not spell out the implications.

Does this imply that a Box-Tidwell test is not useful (does not apply) in my case? I should add that my variables are probably highly collinear. However, I wanted to check for linearity first, since variables that violate it will have to be removed anyway, and I do not want to delete variables that are merely collinear but do not violate the linearity assumption.
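For concreteness, the check described in the quoted passage can be sketched as below: fit the logistic model with an extra `x * ln(x)` term and look at the Wald test on that term. This is only a minimal illustration under several assumptions of mine: a single, strictly positive predictor, simulated data whose logit really is linear in `x`, and hand-rolled hypothetical helpers (`solve`, `fit_logit`, `box_tidwell`) written with the standard library only. A real analysis would use a statistics package instead.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_logit(X, y, iters=25):
    """Newton-Raphson logistic regression: returns (coefficients, standard errors)."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        H = [[0.0] * k for _ in range(k)]  # observed information X' W X
        for xi, yi in zip(X, y):
            eta = sum(b * v for b, v in zip(beta, xi))
            eta = max(-30.0, min(30.0, eta))  # guard against overflow in exp
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for c in range(k):
                    H[a][c] += w * xi[a] * xi[c]
        step = solve(H, grad)
        beta = [b + s for b, s in zip(beta, step)]
    # H was evaluated just before the final update; once converged this is
    # numerically the same as the information at the final estimate.
    se = []
    for a in range(k):
        unit = [1.0 if j == a else 0.0 for j in range(k)]
        se.append(math.sqrt(solve(H, unit)[a]))  # sqrt of a-th diagonal of H^-1
    return beta, se

def box_tidwell(x, y):
    """Fit logit(p) = b0 + b1*x + b2*x*ln(x); return (beta, se, p-value of b2)."""
    X = [[1.0, xi, xi * math.log(xi)] for xi in x]  # requires x > 0
    beta, se = fit_logit(X, y)
    z = beta[2] / se[2]
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided Wald test
    return beta, se, p_value

# Simulated data (an assumption): the logit is truly linear in x.
random.seed(0)
n = 2000
x = [random.uniform(1.0, 10.0) for _ in range(n)]
y = [1 if random.random() < 1.0 / (1.0 + math.exp(-(-2.0 + 0.5 * xi))) else 0
     for xi in x]

beta, se, p_interaction = box_tidwell(x, y)
print("coefficients:", beta)
print("p-value for the x*ln(x) term:", p_interaction)
```

With 300 predictors the same idea would mean one `x * ln(x)` term per continuous predictor, which is where the collinearity I mentioned is likely to make the individual tests hard to read.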

Jim R.
  • Very large sample sizes often produce small $p$ values even when the effect is very small (not just in this case of interaction but in general). You will be able to see this from a small estimated interaction value – Henry Oct 10 '22 at 23:31
  • @Henry thanks. So does that imply that I could wrongly omit variables from my logistic regression because their interaction term had a p-value of, e.g., p = 0.045? – Jim R. Oct 11 '22 at 08:20
  • See https://stats.stackexchange.com/questions/498672/valid-alternative-to-box-tidwell-method-for-linear-regression. Rather than Box-Tidwell, go for more modern methods, like splines. – kjetil b halvorsen Oct 11 '22 at 23:12
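
The point made in the first comment can be illustrated with a few lines of arithmetic. The sketch below holds a small coefficient fixed and lets its standard error shrink like 1/sqrt(n), which is the usual large-sample behaviour. The effect size `effect` and the scale `unit_se` are hypothetical numbers chosen only for illustration, and `wald_p_value` is a helper name of my own.

```python
import math

def wald_p_value(estimate, std_error):
    """Two-sided p-value of a Wald z-test for H0: coefficient = 0."""
    z = estimate / std_error
    return math.erfc(abs(z) / math.sqrt(2.0))

effect = 0.02   # hypothetical, practically negligible interaction coefficient
unit_se = 2.0   # hypothetical scale: standard error is unit_se / sqrt(n)

p_by_n = {}
for n in (1_000, 10_000, 120_000, 1_000_000):
    se = unit_se / math.sqrt(n)
    p_by_n[n] = wald_p_value(effect, se)
    print(f"n = {n:>9,d}  se = {se:.5f}  p = {p_by_n[n]:.3g}")
```

The estimated effect is identical in every row; only the sample size changes. This is why, at n = 120,000, the size of the estimated interaction is more informative than whether its p-value falls below 0.05.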
