It depends on how this log-likelihood value is calculated. When you assume a Gaussian error distribution, maximizing the log-likelihood is equivalent to minimizing the sum of squared residuals. Since the maximization does not depend on the constants out front, and software may or may not include them, it is not clear to me how exactly this log-likelihood is calculated. Consequently, I would not use that value. I would directly calculate the sum of squared residuals: $\overset{N}{\underset{i=1}{\sum}}\left(y_i-\hat y_i\right)^2$. If you want to normalize this to get some kind of $R^2$, you can calculate $R^2=1-\left(\dfrac{\overset{N}{\underset{i=1}{\sum}}\left(y_i-\hat y_i\right)^2}{\overset{N}{\underset{i=1}{\sum}}\left(y_i-\bar y\right)^2}\right)$.
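For concreteness, here is a minimal Python sketch of those two quantities. The array names `y` and `y_hat` and the toy numbers are my own placeholders standing in for your observed responses and your model's predictions:

```python
import numpy as np

# Toy data standing in for your observations and one model's predictions
y = np.array([2.3, 1.7, 3.1, 4.0, 2.9])      # observed responses
y_hat = np.array([2.1, 1.9, 3.3, 3.7, 3.0])  # model predictions

# Sum of squared residuals
sse = np.sum((y - y_hat) ** 2)

# R^2 = 1 - SSE / (total sum of squares around the mean of y)
tss = np.sum((y - np.mean(y)) ** 2)
r2 = 1 - sse / tss

print(f"SSE = {sse:.4f}, R^2 = {r2:.4f}")
```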
If you apply this formula to your linear model, you will get the same value of $R^2$ as your software returns (unless the linear model lacks an intercept). Likewise, you can use this equation to convert your model's $R^2$ to a sum of squared residuals by calculating the denominator term (though I would expect it to be easier just to calculate the sum of squared residuals directly).
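To make that back-conversion explicit, rearranging the same definition gives $\overset{N}{\underset{i=1}{\sum}}\left(y_i-\hat y_i\right)^2=\left(1-R^2\right)\overset{N}{\underset{i=1}{\sum}}\left(y_i-\bar y\right)^2$, so all you need beyond the reported $R^2$ is the total sum of squares around $\bar y$.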
As far as determining whether there is a statistical difference between the two models, first consider what exactly you mean. Especially if sample sizes are large, statistics can detect very small differences that might not be of practical interest. (My comments here about the Princess and the Pea fairy tale concern such a situation.) However, if you want to calculate some statistics about differences in model performance, Benavoli et al. (2017) give some standard ways of doing such a comparison and also make a strong argument for why their proposed approach is superior. Even if you do not buy their argument, the paper at least walks through the more standard approaches.
Benavoli et al. (2017) deal with classification accuracy rather than regression metrics, but I see no reason why their proposed or referenced approaches could not apply to $R^2$ or the sum of squared residuals.
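As one deliberately simple (and non-Bayesian) sketch of the kind of comparison discussed there, you could collect a per-fold performance metric for each model on the same cross-validation splits and run a paired test on the fold-wise scores. The variable names, the toy numbers, and the choice of a paired $t$-test are my own illustration, not the procedure Benavoli et al. recommend:

```python
import numpy as np
from scipy import stats

# Hypothetical per-fold R^2 values for two models evaluated on the
# same cross-validation folds (same splits, same order)
r2_model_a = np.array([0.71, 0.68, 0.74, 0.70, 0.72])
r2_model_b = np.array([0.69, 0.66, 0.73, 0.71, 0.70])

# Fold-wise differences in performance
diffs = r2_model_a - r2_model_b

# A standard paired t-test on the fold-wise scores; this is the
# familiar baseline that Benavoli et al. (2017) argue should be
# replaced by a Bayesian analysis.
t_stat, p_value = stats.ttest_rel(r2_model_a, r2_model_b)

print(f"mean difference = {diffs.mean():.4f}, t = {t_stat:.3f}, p = {p_value:.4f}")
```

Keep in mind that scores from overlapping cross-validation folds are not independent, which is one of the issues Benavoli et al. raise, so a correction for that correlation (or their Bayesian analogue) is preferable to the naive test above.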
REFERENCE
Benavoli, Alessio, et al. "Time for a change: a tutorial for comparing multiple classifiers through Bayesian analysis." The Journal of Machine Learning Research 18.1 (2017): 2653-2688.