Here is a more detailed explanation to complement the excellent answer of @Frank Harrell.
The test statistic of a Likelihood-ratio test (LRT) is defined as (Wikipedia)
$$
\lambda_{\text{LR}} = -2(\ell_0 - \ell_A)
$$
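where $\ell_i$ is the maximized log likelihood of model $i$. Under $H_0$, with $q$ denoting the number of parameters constrained under the null model, asymptotically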
$$
\lambda_{\text{LR}} \overset{a}{\sim} \chi^2_q.
$$
The AIC is defined as (Wikipedia)
$$
\text{AIC} = 2k - 2\ell
$$
where $k$ is the number of estimated parameters and $\ell$ is the log likelihood.
The difference in AIC between the two models (let's say Model $0$ and Model $A$ where the difference in the number of free parameters is $q$) is given by
\begin{align*}
\Delta\text{AIC} &= 2k_0 - 2\ell_0 - (2k_A - 2\ell_A) \\
&= -2q - 2(\ell_0 - \ell_A).
\end{align*}
Therefore,
$$
\Delta\text{AIC} + 2q = \underbrace{-2(\ell_0 - \ell_A)}_{\lambda_{\text{LR}}}.
$$
This shows a direct association between the AIC and the LRT:
- The AIC of both models will be equal if $\lambda_{\text{LR}} = 2q$
- The AIC of the null model will be smaller if $\lambda_{\text{LR}} < 2q$
- The AIC of the alternative model will be smaller if $\lambda_{\text{LR}} > 2q$
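
The identity and the three cases above can be checked numerically. The following is only a minimal sketch: the simulated data, the variable names, and the choice of linear models fitted with `lm` are assumptions made for illustration and are not part of the original argument.

```r
# Simulated data (illustrative assumption): only x1 truly affects y
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
y  <- 1 + 0.5 * x1 + rnorm(n)

fit0 <- lm(y ~ x1)            # null model
fitA <- lm(y ~ x1 + x2 + x3)  # alternative model with two extra parameters

# q: difference in the number of estimated parameters, as counted by logLik()
q <- attr(logLik(fitA), "df") - attr(logLik(fit0), "df")

# LRT statistic: -2 * (l_0 - l_A)
lambda_LR <- -2 * (as.numeric(logLik(fit0)) - as.numeric(logLik(fitA)))

# Delta AIC = AIC of the null model minus AIC of the alternative model
delta_AIC <- AIC(fit0) - AIC(fitA)

# Identity: Delta AIC + 2q equals the LRT statistic
print(all.equal(delta_AIC + 2 * q, lambda_LR))

# AIC prefers the alternative model exactly when lambda_LR > 2q
print((AIC(fitA) < AIC(fit0)) == (lambda_LR > 2 * q))
```

Both printed values should be `TRUE` whatever the seed, because the identity is algebraic and does not depend on the data.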
If we select models by AIC, we implicitly apply an LRT and check whether $\lambda_{\text{LR}}$ is larger or smaller than $2q$. The $2q$ threshold corresponds to a specific p-value of the LRT, which can be calculated in R using `pchisq(2 * q, df = q, lower.tail = FALSE)`. The following table lists these p-values for different values of $q$.
$$
\begin{array}{rrr}
\hline
q & \lambda_{\text{LR}} = 2q & p\text{-value} \\
\hline
1 & 2 & 0.157 \\
2 & 4 & 0.135 \\
3 & 6 & 0.112 \\
5 & 10 & 0.075 \\
10 & 20 & 0.029 \\
20 & 40 & 0.005 \\
\end{array}
$$
For example, selecting between two nested models by AIC, where the null model constrains 3 parameters relative to the alternative one, is equivalent to performing an LRT and rejecting the null model at a significance level of $0.112$.
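
The AIC table can be reproduced with the `pchisq` call mentioned earlier. This is a small sketch; the object names (`threshold`, `aic_table`) are chosen here for illustration.

```r
# Number of extra free parameters in the alternative model
q <- c(1, 2, 3, 5, 10, 20)

# AIC prefers the alternative model exactly when lambda_LR exceeds 2 * q
threshold <- 2 * q

# Upper-tail chi-squared probability at the threshold,
# i.e. the significance level implied by AIC selection
p_value <- pchisq(threshold, df = q, lower.tail = FALSE)

aic_table <- data.frame(q = q,
                        threshold = threshold,
                        p_value = round(p_value, 3))
print(aic_table)
```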
A similar association can be made between LRT and BIC which is defined as (Wikipedia)
$$
\text{BIC} = \log(n)k - 2\ell
$$
where $n$ is the number of observations, $k$ is the number of estimated parameters, and $\ell$ is the log likelihood. Using the same approach as above we see that
$$
\Delta\text{BIC} + \log(n)q = \underbrace{-2(\ell_0 - \ell_A)}_{\lambda_{\text{LR}}}.
$$
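Spelled out, using $k_A - k_0 = q$ as in the AIC case, the intermediate steps are
\begin{align*}
\Delta\text{BIC} &= \log(n)k_0 - 2\ell_0 - (\log(n)k_A - 2\ell_A) \\
&= -\log(n)q - 2(\ell_0 - \ell_A),
\end{align*}
so BIC prefers the alternative model exactly when $\lambda_{\text{LR}} > \log(n)q$.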
The following table lists the corresponding p-values of the LRT at the threshold $\lambda_{\text{LR}} = \log(n)q$ (natural logarithm) for different values of $n$ and $q$:
$$
\begin{array}{rrrr}
\hline
n & q & \lambda_{\text{LR}} = \log(n)q & p\text{-value} \\
\hline
10 & 1 & 2.30 & 0.1292 \\
10 & 2 & 4.61 & 0.1000 \\
10 & 3 & 6.91 & 0.0749 \\
100 & 1 & 4.61 & 0.0319 \\
100 & 2 & 9.21 & 0.0100 \\
100 & 3 & 13.82 & 0.0032 \\
\end{array}
$$
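
The BIC table can be reproduced in the same way, with the threshold $\log(n)q$ replacing $2q$. Again this is only a sketch, and the object names (`grid`, `bic_table`) are illustrative.

```r
# All combinations of n and q, with q varying fastest to match the table order
grid <- expand.grid(q = c(1, 2, 3), n = c(10, 100))

# BIC prefers the alternative model exactly when lambda_LR exceeds log(n) * q
grid$threshold <- log(grid$n) * grid$q

# Upper-tail chi-squared probability at the threshold
grid$p_value <- pchisq(grid$threshold, df = grid$q, lower.tail = FALSE)

bic_table <- data.frame(n = grid$n,
                        q = grid$q,
                        threshold = round(grid$threshold, 2),
                        p_value = round(grid$p_value, 4))
print(bic_table)
```

Unlike the AIC threshold, the BIC threshold grows with $n$, so the implied significance level shrinks as the sample size increases.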