I am running a series of mixed-effects models that include both the linear and quadratic terms of a continuous variable T, along with the main IV I (categorical), and I am facing a dilemma.

Model 2 includes interaction terms between I and both the linear and quadratic terms of T (I:T and I:T^2). Model 1 includes only the quadratic interaction (I:T^2).
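
For concreteness, the two specifications could be written out as below. This is only a sketch: the dummy coding of I (indicators d_k for levels 2..K), the single random intercept u_j per group j, and all coefficient names are assumptions made for illustration, since the question does not state the random-effects structure.

```latex
% Assumed, not stated in the question: I has K levels coded by dummies d_k,
% and the only random effect is an intercept u_j for group j.
\begin{align*}
\text{Model 1:}\quad y_{ij} &= \beta_0 + \beta_1 T_{ij} + \beta_2 T_{ij}^2
  + \sum_{k=2}^{K} \alpha_k d_{k,ij}
  + \sum_{k=2}^{K} \gamma_k d_{k,ij} T_{ij}^2
  + u_j + \varepsilon_{ij} \\
\text{Model 2:}\quad y_{ij} &= \beta_0 + \beta_1 T_{ij} + \beta_2 T_{ij}^2
  + \sum_{k=2}^{K} \alpha_k d_{k,ij}
  + \sum_{k=2}^{K} \delta_k d_{k,ij} T_{ij}
  + \sum_{k=2}^{K} \gamma_k d_{k,ij} T_{ij}^2
  + u_j + \varepsilon_{ij}
\end{align*}
```

Under these assumptions the two models differ only in the K - 1 coefficients \delta_k, which Model 1 fixes at zero. In other words, Model 1 forces every level of I to share the same linear slope in T while allowing the curvature to differ.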

With Model 1, I obtained the lowest AIC and BIC. In comparison, for Model 2 the AIC is 5.8 higher and the BIC is 26 higher. If I follow the rule of thumb for AIC/BIC-based selection, Model 1 is the one I should select (and the one I prefer).

On the other hand, there is also a lot of discussion about the need to include the lower-order interaction (I:T) whenever the quadratic interaction (I:T^2) is in the model.

However, it will be very difficult for me to justify the selection of a model that's very unlikely (probability: exp(-5.8/2)=5%) to be better than the one with AICmin. Thus the dilemma.
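
The arithmetic behind that figure can be checked with a short sketch. The function names here are hypothetical, and the only inputs are the AIC differences given above (0 for Model 1, 5.8 for Model 2). Note that exp(-delta/2) is a relative likelihood; whether it, or its normalised form (often called an Akaike weight), may be read as a probability is exactly what the comments dispute.

```python
import math


def relative_likelihood(delta_aic):
    """Return exp(-delta/2) for a model whose AIC exceeds the minimum by delta_aic."""
    return math.exp(-delta_aic / 2.0)


def akaike_weights(deltas):
    """Normalise the relative likelihoods over the candidate set so they sum to 1."""
    rel = [relative_likelihood(d) for d in deltas]
    total = sum(rel)
    return [r / total for r in rel]


# AIC differences from the question: Model 1 is the minimum, Model 2 is 5.8 higher.
deltas = [0.0, 5.8]

rel = relative_likelihood(5.8)
weights = akaike_weights(deltas)

print(f"relative likelihood of Model 2: {rel:.3f}")
print(f"normalised weights (Model 1, Model 2): {weights[0]:.3f}, {weights[1]:.3f}")
```

The relative likelihood comes out at roughly 0.055, so "5%" is a slight rounding down. The normalised weights depend on which models are in the candidate set, so they would change if further models were compared.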

I appreciate your time.

  • 1) Did you mix things up here? You say you should select model 1 which has linear and quadratic term. Note by the way that some literature has AIC and BIC as "larger is better" and some as "lower is better"; you should always say which version you are referring to. 2) Is there a subject matter reason in your specific situation to not include a linear term? Sometimes it can be justified. 3) "model that's very unlikely (probability: exp(-5.8/2)=5%) to be better" - this looks like a misinterpretation of the AIC. There is no such probability for a model to be better, unless you go Bayesian. – Christian Hennig Oct 02 '20 at 23:47
  • @Lewian Thanks for the comment. I follow the recommendation that the lower the AIC and BIC, the better. Model 2 has an AIC that's 5.8 higher and a BIC that's 26 higher, compared with Model 1. And for the probability, I was using this guidance: "the quantity exp((AICmin − AICi)/2) can be interpreted as being proportional to the probability that the ith model minimizes the (estimated) information loss." Finally, there is no theoretical justification for including or excluding the linear interaction term. – user6606453 Oct 03 '20 at 01:53
  • @Lewian running out of space :) We have always included the linear term T and quadratic term T^2, but we are unsure if we can include the interaction term I:T^2 while leaving out the interaction term I:T. – user6606453 Oct 03 '20 at 01:58
  • So your question says "With Model 2, I obtained the lowest AIC and BIC", but then that its AIC is "higher". This looks contradictory. You also use "lower is better" versions of AIC, so effectively it now reads like Model 1 is better according to AIC. Model 1 has linear and quadratic interactions as you state, so what's wrong with it? To me the things that you state look inconsistent and I strongly recommend to revise the writing of your question. – Christian Hennig Oct 03 '20 at 09:10
  • "I was using this guidance 'the quantity exp((AICmin − AICi)/2) can be interpreted as being proportional to the probability that the ith model minimizes the (estimated) information loss.'" You don't state where this comes from and within what model this "probability" is defined. The estimated information loss is observable, so it does not make sense to talk about a probability to minimise it. The guidance looks either confused or taken out of context. – Christian Hennig Oct 03 '20 at 09:12
  • @Lewian Yes, it was a typo. I meant to say Model 1. I updated my post. – user6606453 Oct 03 '20 at 18:37
  • I still read that Model 1 which has the linear and quadratic interactions is actually better according to AIC, so what's the problem? – Christian Hennig Oct 03 '20 at 21:27
  • @Lewian Ah, now I see why you said I got things mixed up. Sorry about the confusion. Edited again. – user6606453 Oct 04 '20 at 04:15
  • OK, finally I'm with the first answer; I'd include the interaction terms anyway. AIC is not sacred. – Christian Hennig Oct 04 '20 at 10:44