
Regression models are usually evaluated using $R^2$. I understand this metric can be misleading at times, but as far as I know, it is the first measure one looks at.

There is another measure that is often used, $MAPE$. Both are functions of the errors between predicted and true values. I am wondering whether there are cases where one should be preferred over the other.

$R^2$ can become negative if being used on test data (on which the model is not built). Is this a reason to not use $R^2$ in these cases?
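(To make the negative-$R^2$ point concrete, here is a small sketch with made-up numbers: a badly calibrated constant forecast whose squared errors exceed the variance of the test actuals around their own mean.)

```python
import numpy as np

# Hypothetical illustration: on held-out data, a model's squared errors can
# exceed the variance of the test actuals around their own mean, which makes
# R^2 negative.
rng = np.random.default_rng(0)
y_test = rng.normal(10.0, 1.0, size=100)  # test actuals
preds = np.full(100, 13.0)                # a poorly calibrated model's predictions

ss_res = np.sum((y_test - preds) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print(r2 < 0)  # True: the model does worse than predicting the test mean
```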

Can anyone give qualitative insight into where $MAPE$ should be used and where $R^2$ should be used?

PagMax

1 Answer


Let's look at the definition of $R^2$: if $y_i$ are the actuals and $f_i$ the predictions, then we let $\overline{y}$ denote the mean of the actuals and $e_i = y_i-f_i$ the error. Then

$$ R^2 := 1-\frac{\sum e_i^2}{\sum (y_i-\overline{y})^2}. $$

If we want to maximize $R^2$, we note that we cannot influence the denominator in this formula. Thus, maximizing $R^2$ is equivalent to minimizing the sum of squared errors (or, equivalently, the Mean Squared Error, MSE).
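A quick numerical sketch of this equivalence: for a fixed set of actuals, $R^2$ is a decreasing affine function of the MSE, so ranking two sets of predictions by $R^2$ always agrees with ranking them by MSE (the variable names below are made up for illustration).

```python
import numpy as np

# For fixed actuals y, R^2 = 1 - n * MSE / SS_tot: an affine, decreasing
# function of the MSE. So comparing predictions by R^2 or by MSE gives
# the same ordering.
rng = np.random.default_rng(1)
y = rng.normal(size=50)  # actuals

def r2(y, f):
    return 1 - np.sum((y - f) ** 2) / np.sum((y - y.mean()) ** 2)

def mse(y, f):
    return np.mean((y - f) ** 2)

f_good = y + rng.normal(0, 0.1, size=50)  # predictions with small errors
f_bad = y + rng.normal(0, 1.0, size=50)   # predictions with large errors

# Lower MSE corresponds to higher R^2
print(mse(y, f_good) < mse(y, f_bad), r2(y, f_good) > r2(y, f_bad))
```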

And this actually makes a lot of sense. The prediction that minimizes the expected MSE is the expected value of each $Y_i$ (the distribution from which we observe $y_i$). This is often what we want. Note that other error measures, like the MAPE, are minimized by other quantities, so minimizing the MAPE may yield biased point predictions.
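The bias from minimizing the MAPE can be seen in a small simulation (an assumed setup, with right-skewed lognormal data): the constant forecast that minimizes the MAPE sits well below the mean of the data, i.e. the MAPE rewards under-forecasting.

```python
import numpy as np

# Sketch with simulated right-skewed data: find the constant forecast that
# minimizes the MAPE by a grid search, and compare it to the mean (which is
# the MSE-optimal constant forecast).
rng = np.random.default_rng(2)
y = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)  # right-skewed actuals

candidates = np.linspace(0.1, 3.0, 300)
mape = [np.mean(np.abs(y - c) / y) for c in candidates]
best = candidates[int(np.argmin(mape))]

print(best < np.mean(y))  # True: the MAPE-optimal forecast is biased low
```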

For the difference between the MAPE and the MSE (which, as we have seen, is what maximizing $R^2$ amounts to), see this earlier post: The difference between MSE and MAPE.

Stephan Kolassa