I have a multiple linear regression model that I fit on different datasets. Suppose the first dataset produces y values in the range [1, 100] and the second in the range [1, 1000].
I can't simply compare the MAE across the two datasets. If the MAE is 2 on the first and 20 on the second, I'd say the model performs equally well on both, but I could not find a rigorous way to show this.
As far as I can tell, there is no standard normalised MAE. I could use NRMSE, i.e. RMSE / (y_max - y_min), but are there better ways to compare the effectiveness of the same model across different datasets?
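For concreteness, this is roughly what I have in mind (a minimal NumPy sketch; the helper names are my own, and the range-normalised MAE is just the obvious analogue of NRMSE, not a standard named metric):

```python
import numpy as np

def nrmse(y_true, y_pred):
    """Range-normalised RMSE: RMSE / (y_max - y_min) of the observed values."""
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return rmse / (y_true.max() - y_true.min())

def nmae(y_true, y_pred):
    """Range-normalised MAE, by analogy with NRMSE (not a standard metric)."""
    mae = np.mean(np.abs(y_true - y_pred))
    return mae / (y_true.max() - y_true.min())
```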
I am also aware of MAPE and MASE; I am just wondering what the best practice is for reporting a scale-independent forecast error metric.
I am interested in the theory: which of NRMSE, MAPE, or MASE works for my case?
I'm also using Python.
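In Python I would compute MAPE and MASE along these lines (a sketch only; note that MASE, as usually defined, needs the training series for the naive-forecast denominator, and MAPE breaks down when y_true contains values at or near zero):

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error (unstable if y_true contains zeros)."""
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

def mase(y_true, y_pred, y_train):
    """Mean absolute scaled error: test MAE scaled by the in-sample MAE
    of a naive one-step-ahead forecast on the training series."""
    naive_mae = np.mean(np.abs(np.diff(y_train)))
    return np.mean(np.abs(y_true - y_pred)) / naive_mae
```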
sMAPE as a solution, but I fear sMAPE is not as common as something like NRMSE. So, again, I would like to choose the most common metric that is familiar to someone who is not necessarily a statistician. – towi_parallelism Aug 06 '19 at 12:58

NRMSE, shall I use the min and max of the whole data? Or should it be "Split, scale your training data, then use the scaling from your training data on the testing data.", as described in https://stackoverflow.com/questions/43302871/do-you-apply-min-max-scaling-separately-on-training-and-test-data – towi_parallelism Aug 06 '19 at 14:12
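To make the second comment concrete, the "train-only" variant would look something like this (a sketch with stand-in data; the idea is to take the normalisation range from the training targets only, mirroring fitting a scaler on the training data and applying it to the test data):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Stand-in data; replace with the real dataset.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Normalisation constant from the training targets only,
# analogous to "fit the scaling on train, apply it to test".
train_range = y_train.max() - y_train.min()

rmse = np.sqrt(np.mean((y_test - y_pred) ** 2))
nrmse = rmse / train_range
print(f"NRMSE (normalised by training range): {nrmse:.4f}")
```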