
Looking at https://en.wikipedia.org/wiki/Akaike_information_criterion I find the well-known log-likelihood $\ln\mathcal{L}(\mu,\sigma) \, = \, -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^n (x_i-\mu)^2$, which assumes the errors scatter with a normal distribution around the model (with $\operatorname{RSS}=\operatorname{SSR}=\sum_{i=1}^n (x_i-\mu)^2$).

If $AIC = -2\ln\mathcal{L} + 2k$, why do the answers to "How can I apply Akaike Information Criterion and calculate it for Linear Regression?" and e.g. statsmodels https://github.com/statsmodels/statsmodels/blob/2eac8066b068a88f00a29f8ff728b04e58248375/statsmodels/regression/linear_model.py#L595 compute the AIC using a log-likelihood of the form `llf = -log(SSR) + ...`? Shouldn't it be `llf = -SSR + ...` (without a log applied to SSR)?
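For concreteness, here is a minimal sketch (made-up data, purely for illustration) of what I believe statsmodels computes: the log-likelihood with $\sigma^2$ replaced by its maximum likelihood estimate $\operatorname{RSS}/n$, and $k$ counting only the regression coefficients:

```python
import numpy as np
import statsmodels.api as sm

# Made-up data purely for illustration.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
X = sm.add_constant(x)                     # design matrix with intercept
y = 1.0 + 2.0 * x + rng.normal(size=100)

fit = sm.OLS(y, X).fit()
n, k = len(y), X.shape[1]                  # k = number of regression coefficients
rss = np.sum(fit.resid ** 2)

# Concentrated log-likelihood: sigma^2 replaced by its MLE, RSS/n.
llf = -n / 2 * (np.log(2 * np.pi) + np.log(rss / n) + 1)
aic = -2 * llf + 2 * k

print(np.isclose(llf, fit.llf), np.isclose(aic, fit.aic))  # expect: True True
```

(Note that some implementations also count $\sigma^2$ as a parameter, i.e. use $k + 1$, which shifts the AIC of every model by the same constant 2 and so does not change model rankings.)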

Ggjj11
  • Yes, to summarize the main point: $\sigma^2 \approx \hat{\sigma}^2 = \operatorname{RSS}/n$ is inserted in the formula for the log-likelihood of the model. Then the term $-\frac{n}{2}\ln(\hat{\sigma}^2) = -\frac{n}{2}\ln(\operatorname{RSS}/n)$ is left, and the term $\frac{1}{\hat{\sigma}^2}\sum_{i=1}^n (x_i-\mu)^2 = n$ is constant. Therefore a term $\ln(\operatorname{RSS})$ remains. Thank you a lot! – Ggjj11 Jul 28 '22 at 09:00
    The error variance $\sigma^2$ is a parameter. We can either: (a) assume it's known and plug in a specific value, or (b) estimate it simultaneously with the mean structure parameters, the $\beta$s. The likelihood (and hence the AIC) is different in cases (a) and (b). – dipetkov Jul 28 '22 at 09:57
  • Very true! Thank you for the clarification! – Ggjj11 Jul 28 '22 at 11:40

1 Answer


This question is about the Akaike Information Criterion (AIC) of a linear regression. However, the gist is to understand how we can write the likelihood $\operatorname{L}$ as a function of the residual sum of squares $\operatorname{RSS}$. Depending on whether we assume the error variance $\sigma^2$ is known or not, the log-likelihood contains either a $-\log(\operatorname{RSS})$ term or a $-\operatorname{RSS}$ term.

The AIC is defined as $-2\log(\operatorname{L}) + 2k$, where $k$ is the number of estimated parameters.

What are the parameters in a linear regression? The model is $Y_i = \mathbf{x}_i\boldsymbol{\beta} + \epsilon_i$ for observations $i = 1,\ldots,n$ and the errors $\epsilon_i$ are iid $\operatorname{N}(0,\sigma^2)$.

Sometimes we know the error variance $\sigma^2$, or at least have a reliable estimate of it, $\hat{\sigma}_r^2$. In this case the parameters of the linear regression are just the regression coefficients $\beta_1,\ldots,\beta_d$, so $k = d$. We can show that:

$$ \log(\operatorname{L}) \propto - \operatorname{RSS} / (2\hat{\sigma}_r^2) $$

And since $\hat{\sigma}_r^2$ is known, we can ignore it together with the other constant terms.
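Explicitly, plugging the fixed $\hat{\sigma}_r^2$ into the log-likelihood from the question gives

$$ \log(\operatorname{L}) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\hat{\sigma}_r^2) - \frac{\operatorname{RSS}}{2\hat{\sigma}_r^2} $$

where only the last term depends on the fitted coefficients.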

Otherwise, we estimate the error variance together with the regression coefficients and $k = d + 1$. We can show that in this case (with the maximum likelihood estimate $\hat{\sigma}^2 = \operatorname{RSS}/n$):

$$ \log(\operatorname{L}) \propto -\frac{n}{2} \log(\operatorname{RSS}) $$
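To make the step explicit (this is what the comments above summarize): plugging $\hat{\sigma}^2 = \operatorname{RSS}/n$ into the log-likelihood gives

$$ \log(\operatorname{L}) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\!\left(\frac{\operatorname{RSS}}{n}\right) - \frac{n}{2} = \text{const} - \frac{n}{2}\log(\operatorname{RSS}), $$

since $n$ is fixed for a given dataset.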

See the derivations in more detail here.

dipetkov