
I am using scikit-learn's Bayesian ridge regression model. I train my model on log10(y), exponentiate my predictions back to the original scale (10 ** y_i), and then calculate my error metrics to assess the model's performance. The error metrics I use are R^2 and MAE.
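For reference, a minimal sketch of what I am doing, with synthetic data standing in for my actual X and y:

```python
import numpy as np
from sklearn.linear_model import BayesianRidge
from sklearn.metrics import r2_score, mean_absolute_error

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
# synthetic positive target with multiplicative (lognormal) noise
y = 10 ** (X @ np.array([0.5, -0.3, 0.2]) + rng.normal(scale=0.2, size=500))

# train on log10(y), then exponentiate predictions back to the original scale
model = BayesianRidge().fit(X, np.log10(y))
y_pred = 10 ** model.predict(X)

print(r2_score(y, y_pred), mean_absolute_error(y, y_pred))
```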

I chose to do this because (1) my dependent variable's distribution has a long tail, and logging it makes the distribution closer to normal, and (2) the standard deviation in my residual plots increases with y.

However, I am not sure why logging my dependent variable, training my model on the log10(y), then exponentiating my predictions back seems to result in slightly increased model performance. Does anyone have an explanation for this?

My first intuition is that linear models perform better when the dependent variable is normally distributed, and that logging the dependent variable to make it normal therefore results in better training... but I am not sure if this is right.

P.S: Please let me know if I need to make this question clearer

1 Answer


Let us assume your target variable is conditionally normally distributed on the log scale (equivalently, that the residuals on the log scale are normal). Then your point prediction on the log scale is both the conditional expectation and the conditional median, which coincide for the normal distribution, again on the log scale.

Exponentiating the log-expectation and the residuals does not result in a normal distribution on the original scale, but in a lognormal distribution. For this distribution, the expectation and the median are no longer the same: the median is lower than the expectation. The exponentiation thus turns your point predictions on the log scale into median predictions on the original scale. They are no longer expectation predictions there. (If you want an expectation forecast, see here under "bias adjustments" for $\lambda=0$.)
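A quick numerical check of this gap, written here with natural logs (the base only rescales the constants): for a lognormal distribution, the median is $e^\mu$ while the expectation is the bias-adjusted $e^{\mu + \sigma^2/2}$.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 2.0, 0.8
z = rng.normal(mu, sigma, size=1_000_000)  # prediction + residuals on the log scale
y = np.exp(z)                              # back-transformed: lognormal

print(np.median(y), np.exp(mu))                 # median ~ exp(mu)
print(y.mean(), np.exp(mu + sigma**2 / 2))      # mean ~ exp(mu + sigma^2/2), the bias adjustment
```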

The final piece of the puzzle is that the MAE is minimized precisely by the median, not the expectation. This is why your exponentiated predictions (which aim at the conditional median on the original scale) perform better under MAE than modeling directly on the original scale, because (presumably and almost certainly) your model on the original scale targets the expectation and not the median.
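That the median minimizes the MAE while the mean minimizes the MSE is easy to verify on a skewed sample:

```python
import numpy as np

rng = np.random.default_rng(2)
y = np.exp(rng.normal(size=100_000))  # right-skewed (lognormal) sample

for point in (np.median(y), y.mean()):
    mae = np.abs(y - point).mean()
    mse = ((y - point) ** 2).mean()
    print(f"point={point:.3f}  MAE={mae:.3f}  MSE={mse:.3f}")
# the median attains the lower MAE; the mean attains the lower MSE
```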

Comparing the two predictions does not make sense, because they aim at minimizing two different loss functions: modeling on the original scale (presumably and almost certainly) aims at an unbiased expectation prediction, which minimizes the MSE, while modeling on the log scale and exponentiating aims at a median forecast, which minimizes the MAE. I have written a little paper on this (Kolassa, 2020, IJF).

If you want an expectation forecast, you should do a bias-adjusted back-transformation (see the link above) and assess both models using the MSE. If you want a median forecast, you can go the route you took, and/or model on the original scale using the MAE as a loss function, and in any case evaluate the predictions using the MAE. Per this argument, expanded in the paper, it does not make sense to evaluate a forecast using multiple different error metrics.

There are illustrations of how optimal forecasts differ by error metric for different conditional distributions here and here.

Stephan Kolassa
  • Hi, Stephan, thank you! Wow, this makes sense. I am essentially making median predictions when I log and exponentiate back. And thank you for letting me know about the comparison; I do want to minimize MAE and, thus, I would rather have a median forecast than an expectation forecast. – lambdaChops Dec 04 '22 at 19:25