The Huber loss function is widely used because it combines the good properties of the squared and absolute losses. Therefore, when I apply penalized regressions, i.e. LASSO, elastic net and ridge, to make predictions, the Huber loss is used to tune the hyperparameter by cross-validation during training, and the MAE or MSE is applied for evaluation in the validation and test stages. In Kolassa (2020), the author claims that it makes no sense for a model to be fitted by minimizing the in-sample MSE but for holdout forecasts to be evaluated using the MAPE (see the third point in Section 4, "Takeaways").

So my main question is: does it make sense to use the Huber loss for training, but use other measures, such as the MAE or MSE, to evaluate the forecasts?
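
For concreteness, here is a minimal sketch of the setup I describe, assuming scikit-learn and simulated data (the estimator, data, and parameter grid are illustrative assumptions, not from my actual application): the model is fitted with the Huber loss via SGDRegressor(loss="huber"), the penalty strength is tuned by 8-fold cross-validation scored with MSE, and the holdout forecasts are evaluated with MSE and MAE.

```python
# Illustrative sketch only: SGDRegressor with loss="huber" stands in for a
# Huber-loss penalized regression; the data and parameter grids are made up.
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=500, n_features=20, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Training loss: Huber, with an elastic-net penalty (l1_ratio sweeps ridge -> lasso).
model = make_pipeline(
    StandardScaler(),
    SGDRegressor(loss="huber", penalty="elasticnet", max_iter=5000, random_state=0),
)

# Evaluation loss in the validation stage: MSE, via the cross-validation scoring rule.
param_grid = {
    "sgdregressor__alpha": [1e-4, 1e-3, 1e-2, 1e-1],
    "sgdregressor__l1_ratio": [0.0, 0.5, 1.0],
}
search = GridSearchCV(model, param_grid, cv=8, scoring="neg_mean_squared_error")
search.fit(X_train, y_train)

# Evaluation loss in the test stage: MSE and MAE on the holdout forecasts.
y_pred = search.predict(X_test)
print("test MSE:", mean_squared_error(y_test, y_pred))
print("test MAE:", mean_absolute_error(y_test, y_pred))
```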

  • Might this be helpful? https://stats.stackexchange.com/questions/470626/why-is-using-squared-error-the-standard-when-absolute-error-is-more-relevant-to/470786#470786 – Richard Hardy Mar 22 '22 at 07:43
  • Why would you like to do that? – Tim Mar 22 '22 at 07:59
  • @RichardHardy, thanks for your comment. This link provides an explicit clarification of estimation loss and prediction loss. What concerns me is the mismatch of the loss function between the training and evaluation processes. About this question, I have looked through a lot of answers on the StackExchange website, but the related answers are not that consistent. – John Williams Mar 22 '22 at 08:55
  • According to this link: https://stats.stackexchange.com/a/518526/351802, the answerer thinks the measure for fitting has nothing to do with the test measure. – John Williams Mar 22 '22 at 09:01
  • It would be great to disambiguate as much as possible. Estimation loss and training loss are used as synonyms in my link. How does your definition of training loss differ from estimation loss? Are you perhaps referring to evaluation loss in the validation phase vs. evaluation loss in the test phase? – Richard Hardy Mar 22 '22 at 10:04
  • I suggest evaluation loss in the validation stage and evaluation loss in the test stage; I think these terms would be in line with most of the standard terminology. Regardless of which terminology you choose, consider editing your post accordingly. To answer your question briefly, I do not think it makes sense to use different losses in the two stages, so I guess I am in line with Kolassa (2020). But what motivates your question? What example do you have where this seems questionable or wrong? – Richard Hardy Mar 22 '22 at 11:27
  • For cross-validation in regression models, I think there should be three loss functions: the training loss (which I think is equivalent to the estimation loss in your comment) in the tuning process, the evaluation loss in the validation stage, and the evaluation loss for holdout forecasts (maybe the test stage in your comment). – John Williams Mar 23 '22 at 03:31
  • I am still confused even after the edit. Cross-validation is the validation stage. I have a hard time parsing phrases like "to tune the hyperparameter by cross validation method in training process" and "the training loss (which I think equivalent to the estimation loss as your comment) in the tuning process". – Richard Hardy Mar 23 '22 at 08:43
  • It is also hard for me to express exactly... What I mean is that for the cross-validation procedure, like an 8-fold cross-validation, 7 subgroups are randomly used as training samples to estimate the parameters, and the Huber loss is used to calculate the training loss. The remaining subgroup is used as the validation subsample to calculate the evaluation loss (i.e. MSE). After completing the cross-validation procedure, the hyperparameter is selected according to the minimum evaluation loss in the validation stage. Using the selected hyperparameter, holdout forecasts are generated and evaluated by a loss (i.e. MSE). (A fold-by-fold code sketch of this procedure appears after these comments.) – John Williams Mar 23 '22 at 09:28
  • OK, so we have (i) training/estimation loss (Huber), (ii) evaluation loss in the validation stage (MSE) and (iii) evaluation loss in the test stage (MSE). I maintain that it makes little sense to choose different loss functions for (ii) and (iii), but (i) can be different. A classical example is estimation of the median of a normal distribution (corresponding to absolute loss), where the efficient estimator is the sample mean (square training loss), not the sample median (absolute training loss); see the simulation sketch after these comments. Does that make sense? Does that contradict Kolassa's paper? – Richard Hardy Mar 23 '22 at 09:34
  • I tried my best to express it clearly... So, following your last comment, the evaluation loss in the validation stage and the test stage (evaluating holdout forecasts) is the same, i.e. MSE, and the training loss is the Huber loss. Then, in terms of the loss functions, does this setup make sense? – John Williams Mar 23 '22 at 09:34
  • OK, I guess I grasp your ideas. Thank you very much! I will look into this question further. – John Williams Mar 23 '22 at 09:39
  • I have now checked the takeaway point #3 and do not think it is valid more generally than stated. (It may still be valid for MAPE which it focuses on.) Even so, I do not think one can claim (i) and (iii) must be equal. As I mentioned, there exist classical examples showing they need not be equal. – Richard Hardy Mar 23 '22 at 10:07
  • Thank you very much for the detailed explanation. So you mean that (ii) and (iii) should be the same, but (i) need not be the same as (ii) and (iii)? Then how should I understand point #3 in Kolassa (2020)? I don't quite understand your phrase "I have now checked the takeaway point #3 and do not think it is valid more generally than stated. (It may still be valid for MAPE which it focuses on.)" – John Williams Mar 23 '22 at 13:41
  • I do not have time to read Kolassa (2020) in detail. What he says may well be right, but he is talking specifically about MSE vs. MAPE. However, it is not true that estimation loss must equal evaluation loss. If they were forced to be equal, inefficient estimators would have to be used, and forecast accuracy would be lower than without this restriction under the same fixed evaluation loss function. – Richard Hardy Mar 23 '22 at 14:20
  • OK, thanks very much for your detailed and insightful comment. Your answers really helped resolve my confusion. – John Williams Mar 23 '22 at 15:30
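
A fold-by-fold code sketch of the procedure described in the comments above, with assumed data, penalty grid, and estimator (SGDRegressor with a lasso penalty is used here purely for illustration): within each of the 8 folds the model is fitted with the Huber loss (loss (i)), the held-out fold is scored with MSE (loss (ii)), the penalty with the smallest mean validation MSE is selected, and the final holdout forecasts are scored with MSE (loss (iii)).

```python
# Manual 8-fold cross-validation sketch: Huber training loss, MSE validation loss,
# MSE test loss. Data, grid, and estimator settings are assumptions for illustration.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, train_test_split

X, y = make_regression(n_samples=400, n_features=10, noise=5.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

alphas = [1e-4, 1e-3, 1e-2, 1e-1]                 # candidate penalty strengths
cv = KFold(n_splits=8, shuffle=True, random_state=1)

cv_mse = {}
for alpha in alphas:
    fold_mse = []
    for train_idx, val_idx in cv.split(X_train):
        # (i) training/estimation loss: Huber (with an L1/lasso penalty).
        model = SGDRegressor(loss="huber", penalty="l1", alpha=alpha,
                             max_iter=5000, random_state=1)
        model.fit(X_train[train_idx], y_train[train_idx])
        # (ii) evaluation loss in the validation stage: MSE on the held-out fold.
        pred = model.predict(X_train[val_idx])
        fold_mse.append(mean_squared_error(y_train[val_idx], pred))
    cv_mse[alpha] = np.mean(fold_mse)

best_alpha = min(cv_mse, key=cv_mse.get)          # hyperparameter with lowest validation MSE

# (iii) evaluation loss in the test stage: MSE on the holdout forecasts.
final = SGDRegressor(loss="huber", penalty="l1", alpha=best_alpha,
                     max_iter=5000, random_state=1).fit(X_train, y_train)
print("best alpha:", best_alpha)
print("test MSE:", mean_squared_error(y_test, final.predict(X_test)))
```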

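A small simulation sketch of the classical example mentioned in the comments (the sample size and number of replications are arbitrary choices): for normal data the population median equals the mean, and the sample mean (the estimator minimizing the squared training loss) attains a smaller expected absolute error for the median than the sample median (the estimator minimizing the absolute training loss), illustrating that the training loss need not equal the evaluation loss.

```python
# Estimating the median of a normal distribution: compare the expected absolute
# error of the sample mean vs. the sample median. For normal data the mean and
# median coincide, and the sample mean is the more efficient estimator.
import numpy as np

rng = np.random.default_rng(0)
n, reps, true_median = 50, 100_000, 0.0

samples = rng.normal(loc=true_median, scale=1.0, size=(reps, n))
mean_est = samples.mean(axis=1)          # estimator fitted under squared loss
median_est = np.median(samples, axis=1)  # estimator fitted under absolute loss

# Evaluation under absolute loss: the sample mean's error comes out noticeably smaller.
print("mean absolute error of sample mean:  ", np.abs(mean_est - true_median).mean())
print("mean absolute error of sample median:", np.abs(median_est - true_median).mean())
```
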
0 Answers