
I am building a logistic regression model with five predictors: A, B, C, D, and E. From domain knowledge, I know that A can interact with each of B, C, D, and E. The constraint is that I can add only one interaction term to the model. Assume that each of the four interaction terms is statistically significant when it is added to the model on its own. How should I choose the best interaction term, and how do I justify that choice?
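Written out, the setup in the question is a main-effects logistic model plus exactly one product term. Taking A×B purely as an illustration (the other candidates swap in C, D, or E for B in the last term), the model is:

```latex
\operatorname{logit} \Pr(Y = 1)
  = \beta_0 + \beta_1 A + \beta_2 B + \beta_3 C + \beta_4 D + \beta_5 E + \beta_6 \, A B,
\qquad
\frac{\partial \operatorname{logit} \Pr(Y = 1)}{\partial A} = \beta_1 + \beta_6 B .
```

The second expression shows what the interaction does: the effect of A on the log-odds is no longer a single number but shifts linearly with B. Every candidate model has the same seven coefficients, so the four candidates are like-for-like in complexity and differ only in which variable modifies the effect of A.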

My guess is that I should select the term with the lowest p-value, together with whichever variable had the largest coefficient before the interaction term was added.
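One caution about the "largest coefficient" criterion: in logistic regression a coefficient's size depends on the units of its variable, so it does not by itself say which variable matters most. The sketch below illustrates this on hypothetical synthetic data, using a small hand-written Newton-Raphson fitter (`fit_logistic` is an illustrative helper, not a library function). Rescaling x by 1/1000 multiplies its coefficient by 1000 while leaving every fitted probability unchanged.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50):
    """Hypothetical helper: unpenalised logistic regression via Newton-Raphson.
    X must already contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))           # current fitted probabilities
        W = p * (1.0 - p)                              # IRLS weights
        grad = X.T @ (y - p)                           # score vector
        hess = X.T @ (X * W[:, None])                  # observed information matrix
        beta = beta + np.linalg.solve(hess, grad)      # Newton step
    return beta

# Synthetic data (an assumption for illustration only)
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(0.5 + 1.0 * x)))).astype(float)

X_orig = np.column_stack([np.ones(n), x])
X_scaled = np.column_stack([np.ones(n), x / 1000.0])  # same variable, different units

b_orig = fit_logistic(X_orig, y)
b_scaled = fit_logistic(X_scaled, y)
print("slope, original units:", b_orig[1])
print("slope, rescaled units:", b_scaled[1])          # 1000 times larger

p_orig = 1.0 / (1.0 + np.exp(-X_orig @ b_orig))
p_scaled = 1.0 / (1.0 + np.exp(-X_scaled @ b_scaled))
print("max difference in fitted probabilities:", np.max(np.abs(p_orig - p_scaled)))
```

Standardizing the predictors first makes coefficients more comparable, but coefficient size still measures an effect per unit of the predictor, not how much adding a particular interaction would improve predictions.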

I appreciate your answer.

Lee Lu
  • (1) What is your goal? Inference or prediction? (2) Why can you only add one interaction term? – Stephan Kolassa Oct 12 '22 at 09:57
  • (1) My goal is prediction. (2) No particular reason; I just want to understand interaction terms a bit better. – Lee Lu Oct 12 '22 at 10:22
  • If your goal is inference, then any model selection step (especially anything that relies on p-values and does any kind of "stepwise" model selection) will render the standard distributional results invalid, so you can't just use the p-values your software outputs on the final model (see here). Instead, you need to account for this data snooping in a very careful way. This is highly nontrivial. Don't do this unless you know what you are doing. – Stephan Kolassa Oct 12 '22 at 10:29
  • The information you present, including the scenario with its artificial constraint, tells me that what's important is not any single answer to this question, but that you continue to learn model-building strategies through books, courses, coaches, and so on. Cheers ~ – rolando2 Oct 12 '22 at 12:59
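Since the stated goal is prediction, and p-value-based selection undermines inference anyway, one common alternative is to compare the four candidate models on out-of-sample performance. The sketch below scores each candidate (main effects A to E plus one A× interaction) by mean 5-fold cross-validated log loss on the same folds and keeps the lowest. The data are hypothetical and synthetic (only A×C is truly present), and `fit_logistic`, `design`, and `cv_log_loss` are illustrative helpers, not library APIs.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50):
    """Hypothetical helper: unpenalised logistic regression via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, X.T @ (y - p))
    return beta

def log_loss(y, p, eps=1e-12):
    """Mean negative Bernoulli log-likelihood; lower is better."""
    p = np.clip(p, eps, 1.0 - eps)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def design(data, partner):
    """Intercept, main effects A..E, plus the single product column A*partner."""
    cols = [np.ones(len(data["A"]))] + [data[k] for k in "ABCDE"]
    cols.append(data["A"] * data[partner])
    return np.column_stack(cols)

def cv_log_loss(data, y, partner, k=5, seed=0):
    """Mean k-fold CV log loss; a fixed seed gives every candidate the same folds."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    X = design(data, partner)
    losses = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        beta = fit_logistic(X[train], y[train])
        p_test = 1.0 / (1.0 + np.exp(-X[test] @ beta))
        losses.append(log_loss(y[test], p_test))
    return float(np.mean(losses))

# Hypothetical synthetic data: the true model contains only the A:C interaction
rng = np.random.default_rng(1)
n = 3000
data = {k: rng.normal(size=n) for k in "ABCDE"}
eta = (0.3 + 0.5 * data["A"] - 0.4 * data["B"] + 0.3 * data["C"]
       + 0.2 * data["D"] - 0.2 * data["E"] + 0.8 * data["A"] * data["C"])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)

scores = {partner: cv_log_loss(data, y, partner) for partner in "BCDE"}
for partner, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"A:{partner}  mean CV log loss = {score:.4f}")
best = min(scores, key=scores.get)
print("selected interaction: A:" + best)
```

Because all four candidates have the same number of parameters, comparing their CV log loss (or, similarly, their AIC) is a like-for-like comparison. On real data, replace the synthetic `data` and `y`. Note that the winning model's CV score is optimistically biased because it was used for the selection; an honest performance estimate would need nested cross-validation or a held-out test set.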
