I entered my data set into R and it returned two AIC values, one for a model with interactions and one for a model without. Without interactions I got 682.4, and with interactions I got 684. The difference is minimal, but I would like to understand what it means.
4 Answers
I recommend Burnham & Anderson's book Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. They explicitly discuss differences in AIC.
A difference of less than 2 is not much evidence that the model with the lower AIC is truly a better description of the data. (Technically: that it has a lower Kullback-Leibler divergence from the true data-generating process.) In such a case, Burnham & Anderson recommend going with the simpler model.
In their parlance, AIC differences of 5-10 constitute some evidence, and AIC differences larger than 10 constitute strong evidence, in favor of the model with the lower AIC.
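To put a number on "not a lot of evidence", here is a minimal sketch that turns the two AIC values from the question into AIC differences and Akaike weights, using the relative likelihood exp(-delta/2). The function name `akaike_weights` and the model labels are my own, not anything R prints.

```python
import math

def akaike_weights(aic_values):
    """Return (deltas, weights) for AIC values of models fit to the same data."""
    best = min(aic_values)
    # Difference of each model from the best (lowest-AIC) model.
    deltas = [a - best for a in aic_values]
    # Relative likelihood of each model given the data: exp(-delta / 2).
    rel_lik = [math.exp(-d / 2.0) for d in deltas]
    total = sum(rel_lik)
    # Normalize so the weights sum to 1 across the candidate set.
    weights = [r / total for r in rel_lik]
    return deltas, weights

# AIC values reported in the question; the labels are illustrative.
aic = {"no_interaction": 682.4, "with_interaction": 684.0}
deltas, weights = akaike_weights(list(aic.values()))
for name, d, w in zip(aic, deltas, weights):
    print(f"{name}: delta = {d:.1f}, weight = {w:.2f}")
```

With a difference of 1.6, the weights come out at roughly 0.69 versus 0.31, so the interaction model is far from ruled out. That is consistent with falling back on the simpler model rather than claiming it is clearly better.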
AIC is the Akaike information criterion. Among models fitted to the same data, the one with the smaller AIC is generally preferred. As an alternative, people also use BIC to choose a model. For Bayesian models, DIC is also very popular. In modern statistics, cross-validation is often used to choose models instead of AIC.
Without more information, I assume you have two models, one with interactions and one without. Your interaction model should fit no worse than the model without; however, AIC has penalised it for having more parameters. The AIC values tell you to stick with the model without interactions because its AIC is lower.
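As a rough illustration of that penalty, the sketch below backs out how much the interaction terms improved the fit, using AIC = 2k - 2 log L. The question does not say how many interaction parameters were added, so the values 1, 2 and 3 are assumptions, and `implied_fit_gain` is a hypothetical helper name.

```python
def implied_fit_gain(aic_simple, aic_complex, extra_params):
    """Drop in -2*log-likelihood achieved by the more complex model.

    From AIC = 2k - 2*logL it follows that
    aic_complex - aic_simple = 2*extra_params - gain.
    """
    return 2 * extra_params - (aic_complex - aic_simple)

# AIC values from the question; the number of extra parameters is assumed.
for m in (1, 2, 3):
    gain = implied_fit_gain(682.4, 684.0, m)
    print(f"{m} extra parameter(s): -2*logLik improved by {gain:.1f}, "
          f"penalty added {2 * m}")
```

In every case the improvement in fit falls 1.6 short of the extra penalty, which is exactly the AIC difference reported.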
Looking at the AIC formula, the parts that are model-dependent are goodness-of-fit (usually RSS) and dimensionality (d).
The formula penalizes both a higher RSS (i.e. a worse goodness-of-fit) and a higher d.
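For reference, here is the formula in question, written out under the assumption of a linear model with Gaussian errors fitted by least squares. The symbols n (number of observations), \hat{L} (maximized likelihood) and C (a constant that depends only on n) are my notation, not the original poster's.

```latex
\mathrm{AIC} = 2d - 2\ln\hat{L}
\qquad\text{and, for Gaussian least squares,}\qquad
-2\ln\hat{L} = n\ln\!\left(\frac{\mathrm{RSS}}{n}\right) + n\bigl(1 + \ln 2\pi\bigr),
\qquad\text{so}\qquad
\mathrm{AIC} = n\ln\!\left(\frac{\mathrm{RSS}}{n}\right) + 2d + C .
```

Note that RSS enters through a logarithm scaled by n, while d enters linearly.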
It's hard to get an intuition on which "pulls" stronger, since d is multiplied by 2, but RSS squares inaccurate predictions.
Feature interactions promote a lower RSS (a more complicated model that fits the data more closely), while directly adding to d (d' = d + # of interactions).
So, that probably explains why the 2 AIC results are close, yet the non-interactions one is a bit better (lower AIC is better). For that reason, and for many other good reasons to keep your model dimensionality low, I recommend going with the non-interactions one.
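To make the trade-off between RSS and d concrete, here is a small sketch assuming the Gaussian least-squares form of AIC, n ln(RSS/n) + 2d up to a constant. The sample size, RSS values and parameter count are made up for illustration, and both function names are hypothetical.

```python
import math

def gaussian_aic(rss, n, d):
    """AIC up to an additive constant for a least-squares fit with Gaussian errors."""
    return n * math.log(rss / n) + 2 * d

def break_even_rss_ratio(n, extra_params):
    """RSS_new / RSS_old at which the added parameters leave AIC unchanged."""
    return math.exp(-2.0 * extra_params / n)

# Hypothetical numbers: 100 observations, 4 parameters, RSS of 250.
n, rss_old, d_old = 100, 250.0, 4

ratio = break_even_rss_ratio(n, 1)
print(f"One extra parameter pays off only if RSS drops below {ratio:.3f} of its old value")

# Compare a small and a larger RSS improvement after adding one parameter.
for rss_new in (249.0, 240.0):
    diff = gaussian_aic(rss_new, n, d_old + 1) - gaussian_aic(rss_old, n, d_old)
    verdict = "worse" if diff > 0 else "better"
    print(f"RSS {rss_old} -> {rss_new}: AIC changes by {diff:+.2f} ({verdict})")
```

Under these assumptions a single interaction term has to cut RSS by about 2% before AIC favors it. A smaller improvement leaves the simpler model slightly ahead, which matches the pattern in the question.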