
Suppose there are two models:

  1. $y$ ~ $x_1+x_2+x_2^2+x_1:x_2+x_1:x_2^2$
  2. $y$ ~ $x_1+x_2+x_2^2+x_1:x_2^2$

The two models have the same adjusted $R^2$, and their BIC values are similar, with model 2 having a slightly lower BIC. However, the interaction term $x_1:x_2$ in model 1 is insignificant.

I'm not quite sure which model I should adopt as a result. More specifically, do I need to include interactions of $x_1$ with every polynomial order of $x_2$ that is present? Are there any mathematical considerations here?
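For concreteness, here is a minimal sketch (Python with NumPy) of fitting both models by least squares and computing adjusted $R^2$ and BIC by hand. The data-generating process and seed below are my own illustrative assumptions, not from the question:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# Hypothetical data-generating process, chosen for illustration; it has no
# true x1:x2 effect, mirroring a situation where that term looks insignificant.
y = 1 + x1 + x2 + 0.5 * x2**2 + 0.3 * x1 * x2**2 + rng.normal(scale=0.5, size=n)

def fit_stats(X, y):
    """OLS fit; return (adjusted R^2, BIC up to an additive constant)."""
    n, p = X.shape                      # p counts the intercept column too
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    r2 = 1 - rss / tss
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - p)
    bic = n * np.log(rss / n) + p * np.log(n)  # Gaussian likelihood, constants dropped
    return r2_adj, bic

ones = np.ones(n)
X_m1 = np.column_stack([ones, x1, x2, x2**2, x1 * x2, x1 * x2**2])  # model 1
X_m2 = np.column_stack([ones, x1, x2, x2**2, x1 * x2**2])           # model 2

print(fit_stats(X_m1, y))
print(fit_stats(X_m2, y))
```

With data like this, the two summaries typically come out close, which is exactly the ambiguity described above.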

scooch
  • At https://stats.stackexchange.com/a/408855/919 I discuss a few of the considerations related to using polynomials in multiple variables for regressors. See the section titled "polynomials in multiple variables." – whuber Oct 06 '22 at 14:54

1 Answer


You can think about the $x_1:x_2^2$ term as an interaction between $x_1:x_2$ and $x_2$. In that context consider:

Main effects are not significant anymore after adding interaction terms in my linear regression

with respect to your observation that "the interaction term, $x_1:x_2$ in model 1 is insignificant."

The choice between model 1 and model 2 is mostly a function of how you want to apply your model and of your tradeoff between accuracy and parsimony. If model 1 isn't overfit, it would be fine. If you do choose model 1, however, keep its "insignificant" $x_1:x_2$ term. See:
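Because model 2 is nested in model 1 (it drops only the $x_1:x_2$ column), the "insignificance" can also be checked directly with a partial F-test on the dropped term. A minimal sketch, again on simulated data (the data-generating process and seed are my assumptions, not from the question):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# Illustrative data with no true x1:x2 effect
y = 1 + x1 + x2 + 0.5 * x2**2 + 0.3 * x1 * x2**2 + rng.normal(scale=0.5, size=n)

def rss(X, y):
    """Residual sum of squares of the OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta) ** 2)

ones = np.ones(n)
X_full = np.column_stack([ones, x1, x2, x2**2, x1 * x2, x1 * x2**2])  # model 1
X_red = np.column_stack([ones, x1, x2, x2**2, x1 * x2**2])            # model 2

df_full = n - X_full.shape[1]
# Partial F statistic for the single dropped term (1 numerator df);
# compare against an F(1, df_full) reference distribution.
F = (rss(X_red, y) - rss(X_full, y)) / (rss(X_full, y) / df_full)
print(F)
```

This is the same test that the coefficient's t-statistic reports (F is the square of that t), so it should agree with the "insignificant" finding rather than add new information; its value is in making the nesting explicit.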

Including the interaction but not the main effects in a model

More generally, think about whether a polynomial like this is what you want to use as a predictor. Unless there's some theoretical reason for that fixed polynomial form, a regression spline might work better. See this page for example, and Section 2.4 of Frank Harrell's course notes.
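As an illustration of the spline alternative, here is one way to build a cubic regression-spline basis by hand using a truncated power basis. This is a simpler, unrestricted cousin of the restricted cubic splines Harrell recommends, and the knot locations below are arbitrary choices for the sketch:

```python
import numpy as np

def cubic_spline_basis(x, knots):
    """Truncated power basis for a cubic spline in x (intercept excluded):
    columns x, x^2, x^3, then (x - k)_+^3 for each interior knot k."""
    cols = [x, x ** 2, x ** 3]
    for k in knots:
        cols.append(np.clip(x - k, 0.0, None) ** 3)
    return np.column_stack(cols)

rng = np.random.default_rng(0)
x2 = rng.uniform(-2, 2, size=100)
S = cubic_spline_basis(x2, knots=[-1.0, 0.0, 1.0])
print(S.shape)  # (100, 6): 3 polynomial columns + 3 knot columns
```

The spline columns for $x_2$ would then replace the fixed $x_2 + x_2^2$ terms in the design matrix, and products of $x_1$ with those columns would play the role of the interaction terms.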

EdM