I find that there are two ways to calculate AIC:

AIC = -2*ln(likelihood) + 2K and
AIC = n*ln(RSS/n) + 2K

I have:

crf <- c(0.3333333, 0.5000000, 0.6666667, 0.6666667, 0.6666667, 0.8333333, 0.1666667, 0.3333333, 0.5000000, 0.5000000, 0.8333333, 0.5000000,0.6666667, 0.5000000, 0.6666667, 1.0000000)

co3 <- c(218.20, 243.84, 267.97, 286.31, 315.01, 315.01, 241.09, 242.52, 243.84, 245.04, 246.10, 284.15, 285.79, 287.31, 288.67, 289.49)

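n <- length(crf)
model <- lm(crf ~ co3)        # y = a*x + b
aic1 <- AIC(model, k = 2)     # -2*logLik + k*K, with penalty k = 2 per parameter
RSS <- sum(resid(model)^2)    # residual sum of squares
aic2 <- n*log(RSS/n) + 2*3    # K = 3: a, b and the error variance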

I get aic1 = -7.220462 and aic2 = -52.62649.

But if I compute it like this:

aic3 <- n + n*log(2*pi) + n*log(RSS/n) + 2*3

I get aic3 = -7.22046. I don't understand why: everywhere it is written that AIC = n*ln(RSS/n)+2K, so what is this extra term n + n*log(2*pi)?

whuber
  • 322,774
Tali
  • 163

1 Answer

Your first formula is the correct, general definition. The second and third formulas assume a Gaussian likelihood.
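To see where the third formula comes from, here is a short derivation sketch for a linear model with Gaussian errors. The symbols $\beta$ (regression coefficients) and $\sigma^2$ (error variance) are notation introduced here, not taken from the question, and RSS is evaluated at the least-squares coefficients:

$$
\begin{aligned}
\ln L(\beta, \sigma^2) &= -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 - \frac{\mathrm{RSS}}{2\sigma^2}, \\
\hat\sigma^2 &= \frac{\mathrm{RSS}}{n} \quad \text{(maximum likelihood estimate)}, \\
-2\ln \hat L &= n\ln(2\pi) + n\ln\!\left(\frac{\mathrm{RSS}}{n}\right) + n, \\
\mathrm{AIC} &= -2\ln \hat L + 2K = n + n\ln(2\pi) + n\ln\!\left(\frac{\mathrm{RSS}}{n}\right) + 2K.
\end{aligned}
$$

With $n = 16$, the extra term $n + n\ln(2\pi) \approx 45.406$ is exactly the gap between the reported aic2 and aic1.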

The second formula drops the constant n + n*log(2*pi), so the two are not equivalent. Dropping it is fine when comparing models fitted to the same data, because the constant shifts the AIC of every model by the same amount and so makes no difference to the relative ordering of models. Consequently, some textbooks and a lot of software use the second formula. But as a result, you can't compare an AIC value from one formula with a value from the other.
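A small sketch of why the constant does not affect model comparison, reusing the crf and co3 data from the question. The intercept-only model m0 and the helper aic_short are not from the original post; they are introduced here purely for illustration.

```r
# Data from the question
crf <- c(0.3333333, 0.5000000, 0.6666667, 0.6666667, 0.6666667, 0.8333333,
         0.1666667, 0.3333333, 0.5000000, 0.5000000, 0.8333333, 0.5000000,
         0.6666667, 0.5000000, 0.6666667, 1.0000000)
co3 <- c(218.20, 243.84, 267.97, 286.31, 315.01, 315.01, 241.09, 242.52,
         243.84, 245.04, 246.10, 284.15, 285.79, 287.31, 288.67, 289.49)
n <- length(crf)

# Two candidate models fitted to the same data
m0 <- lm(crf ~ 1)    # intercept only: K = 2 (intercept, error variance)
m1 <- lm(crf ~ co3)  # straight line:  K = 3 (intercept, slope, error variance)

# Hypothetical helper: the shortened formula, without n + n*log(2*pi)
aic_short <- function(model, K) {
  rss <- sum(resid(model)^2)
  n_obs <- length(resid(model))
  n_obs * log(rss / n_obs) + 2 * K
}

full  <- c(m0 = AIC(m0), m1 = AIC(m1))
short <- c(m0 = aic_short(m0, 2), m1 = aic_short(m1, 3))

full - short   # the dropped constant, identical for both models
diff(full)     # AIC difference between the models, full formula
diff(short)    # AIC difference between the models, shortened formula
all.equal(diff(full), diff(short))
```

Within each formula the difference between the two models is the same, so both formulas rank the models identically; only values taken from different formulas are incomparable.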

In general, don't compare AIC values from different software packages, as they often handle the constant differently. Some packages (e.g., EViews) also apply additional scaling.

Rob Hyndman
  • 56,782
  • Thanks for the answer @RobHyndman. Would it be possible to have a reference for the general definition? – ecjb Oct 21 '21 at 07:17
  • Burnham, K. P.; Anderson, D. R. (2002), Model Selection and Multimodel Inference: A practical information-theoretic approach (2nd ed.), Springer-Verlag. – Rob Hyndman Oct 21 '21 at 21:30
  • Dear Mr. Hyndman (@RobHyndman), I would like to ask you the following question: let's say I have a slightly more complicated model with $\mathbf{y}(\mathbf{x}) = a \cdot e^{b\cdot \mathbf{x}}$. There are 2 parameters to fit ($a$ and $b$). In the book you recommended (Burnham, p. 61 ed. 2002), it is written "K = the number of estimable parameters". According to this, it should be $k=2$, no? If $k = 2+1$, why do we have to add 1? What does it represent? (And $n$ is the length of vector $\mathbf{x}$ (or $\mathbf{y}$), correct?) – ecjb Feb 11 '22 at 13:55
  • Ok @RobHyndman. I posted a proper question there: https://stats.stackexchange.com/questions/564100/how-to-properly-count-the-number-k-in-the-aikaike-information-criterion-aic-in – ecjb Feb 12 '22 at 09:50