2

I'd like to compare several distributions fitted to one dataset (of i.i.d. random variables) by AIC. Are there specific rules of thumb for such a situation?
It seems that most such rules are either for regression models or need extra conditions (e.g. nested models and large sample sizes, as in the book by Burnham and Anderson).

To be precise, I have a distribution $F$ and a distribution $G$ fitted to the same sample of size $n=100$, and $$ AIC(G) - AIC(F) = 17.9. $$ Can I claim that there is substantial evidence that the fit of $F$ is better than that of $G$?
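For concreteness, here is a minimal sketch of how such AIC values can be computed (the sample and the two candidate families below are placeholders, not my actual data):

```python
# Minimal sketch: AIC for two candidate distributions fitted by maximum
# likelihood to one i.i.d. sample. Data and families are placeholders.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.gamma(shape=2.0, scale=3.0, size=100)    # hypothetical sample, n = 100

def aic(dist, data):
    """AIC = 2k - 2*logL, where k is the number of fitted parameters."""
    params = dist.fit(data)                       # MLE via scipy's generic fitter
    loglik = np.sum(dist.logpdf(data, *params))
    return 2 * len(params) - 2 * loglik

aic_f = aic(stats.gamma, x)     # candidate F
aic_g = aic(stats.lognorm, x)   # candidate G
print(aic_g - aic_f)            # the difference asked about above
```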

Ievgen
  • 231
  • What do you mean by compare? Different aspects of that call for different tools. See https://stats.stackexchange.com/questions/198799. – Richard Hardy Mar 22 '22 at 07:40
  • Avoid "Rules of thumb." The best way to compare is by getting a good feeling for what log-probabilities are and what they mean. A model with an AIC that's 0.1 higher than another is "10% more likely" or "10% better, as judged by likelihood." – Closed Limelike Curves Mar 24 '22 at 01:47

2 Answers

4

AIC penalizes the likelihood by the number of parameters of the model, under the assumption that a less complex model is preferable. The rules of thumb you mention are not specific to regression models, so you can use them here as well; however, they are only rules of thumb, so there is no guarantee they will be meaningful.

When comparing distributions, the number of parameters is usually not your biggest concern; what you care about is comparing the likelihoods. If the models are nested, there are not only rules of thumb but a rigorous criterion for the comparison, since the procedure becomes a likelihood-ratio test whose test statistic asymptotically follows a $\chi^2$ distribution. In fact, likelihood-ratio tests are a common criterion for judging goodness of fit in such cases.
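As a concrete illustration of the nested case, here is a sketch with simulated data, assuming an exponential model nested inside a gamma model (none of these choices come from the question itself):

```python
# Hedged sketch of a likelihood-ratio test for nested distributions:
# the exponential is a gamma with shape a = 1, so the models are nested.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.gamma(shape=2.0, scale=3.0, size=100)   # hypothetical sample

# Restricted model: exponential (fix loc = 0, so only the scale is estimated).
loc_e, scale_e = stats.expon.fit(x, floc=0)
ll_expon = np.sum(stats.expon.logpdf(x, loc=loc_e, scale=scale_e))

# Full model: gamma (fix loc = 0; shape and scale are estimated).
a, loc_g, scale_g = stats.gamma.fit(x, floc=0)
ll_gamma = np.sum(stats.gamma.logpdf(x, a, loc=loc_g, scale=scale_g))

lr_stat = 2 * (ll_gamma - ll_expon)             # likelihood-ratio statistic
p_value = stats.chi2.sf(lr_stat, df=1)          # one extra free parameter
print(lr_stat, p_value)
```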

Tim
  • 138,066
  • 3
    Surely AIC is useful for fitting distributions; regression models are just a particular form of describing a distribution (one that also happens to be conditional on some covariates)? E.g. you could trade off the number of mixture components in a mixture distribution via AIC, surely? – Björn Mar 21 '22 at 12:06
  • 2
    @Björn ok, I improved the wording. – Tim Mar 21 '22 at 12:16
  • The likelihood-ratio test statistic follows the $\chi^2$ distribution only for nested models, which is not necessarily the case in the OP's question. – matteo Mar 23 '22 at 11:41
  • @matteo right, good catch. – Tim Mar 23 '22 at 11:49
2

Distributions are statistical models, so you can quantify the evidence in a semantically clearer way using Akaike weights or evidence ratios (see sections 2.9 and 2.10 in the book by Burnham and Anderson).

In your case, say $\Delta_G = AIC(G) - AIC_{\text{min}} = 17.9$ (where $AIC_{\text{min}}$ is the smallest AIC among the set of models considered); we then have that $$ \exp \left( - \frac{1}{2} \Delta_G \right) = 0.0001297372 $$
can be understood as the 'unnormalized' likelihood of $G$. The unnormalized likelihood of $F$, instead, is $1$, since $AIC(F) - AIC_{\text{min}} = 0$.

Thus you can say that model $F$ is $\frac{1}{0.0001297372} = 7707.892$ times more likely to be the best approximating model in a K-L (Kullback-Leibler) sense; this is the evidence ratio for model $F$ (see page 78 in Burnham and Anderson). (One way to think about this is that $F$ is the model more likely to make better predictions of new data.)
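A short sketch of this arithmetic (the absolute AIC values are arbitrary placeholders; only the difference of 17.9 is taken from the question):

```python
# Akaike weights and evidence ratio from a set of AIC values.
import numpy as np

aics = np.array([100.0, 117.9])             # hypothetical AIC(F), AIC(G); only the difference matters
deltas = aics - aics.min()                  # Delta_i = AIC_i - AIC_min
rel_lik = np.exp(-0.5 * deltas)             # 'unnormalized' likelihood of each model
weights = rel_lik / rel_lik.sum()           # Akaike weights (sum to 1)
evidence_ratio = rel_lik[0] / rel_lik[1]    # evidence for F relative to G

print(rel_lik)          # [1.0, 0.0001297...]
print(weights)
print(evidence_ratio)   # about 7707.9
```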

matteo
  • 3,203