
I am analyzing a data set for tree volume prediction. I'm using regularized least squares as my prediction model, with RMSE and cross-validation to evaluate it.

Currently, I have simply used cross-validation for selecting the model parameters and RMSE for evaluating the performance of the model, i.e. I have calculated the predictions $\hat{y}$ of my model and compared them with the true values $y$ using RMSE.

Before calculating the RMSE value, I did not transform the predictions $\hat{y}$ in any way. What I mean is that if my model gave negative predictions $\hat{y}$, I simply plugged these negative values into the RMSE formula, even though negative values don't make sense for tree volume.

My colleague told me to first transform negative predictions to 0 and then evaluate the model.
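For concreteness, here is a minimal sketch of the two evaluations. The data is made up, and I'm assuming a scikit-learn `Ridge` model as the regularized least squares; `X` and `y` are placeholders for my actual measurements:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# Made-up stand-in data: X = tree measurements, y = volumes (non-negative).
# Small trees get volume 0, so a linear fit can produce negative predictions.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(100, 2))
y = np.maximum(0.0, 2.0 * X[:, 0] - 4.0 + rng.normal(0.0, 1.0, size=100))

model = Ridge(alpha=1.0).fit(X, y)
y_hat = model.predict(X)

# My approach: RMSE on the raw predictions, negatives included.
rmse_raw = np.sqrt(mean_squared_error(y, y_hat))

# My colleague's suggestion: clip negatives to zero first, then compute RMSE.
rmse_clipped = np.sqrt(mean_squared_error(y, np.clip(y_hat, 0.0, None)))
print(rmse_raw, rmse_clipped)
```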

My question is:

is the transformation of negative predictions in this case allowed before calculating the RMSE value?

This transformation bugged me for some reason, because it seemed to me that I'm evaluating a different model than the one I trained. Is my concern valid?

jjepsuomi

1 Answer


It is perfectly valid

The simple reason: you are using an algorithm that does something and outputs something, and the only reason you trust it is that you test it with a metric. It is completely up to you to implement your own algorithm; if it outperforms the previous one, take the new one.

And by "your own algorithm" I mean "the old algorithm with all weights below zero set to zero". You just add another "rule" to the algorithm. No problem with that.

To broaden the view: have you already considered using other algorithms? They may outperform your current one. And have you done the other things right: enough data, no overfitting, etc.? Negative predictions hint at something being wrong with your model.

To summarize: you take the best-performing algorithm. What the algorithm looks like inside (e.g. whether you add another transformation to a pre-existing "black box") is up to you; just evaluate with cross-validation (plus a hold-out set). But do consider looking at other algorithms, as they may well be better than your current one even with the additional transformation. Testing will give you the answer.
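One way to run that test, reusing `X` and `y` from the question's sketch and the `ClippedRegressor` wrapper above (this is just one possible setup, not the only one):

```python
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Evaluate both variants under the same (deterministic) CV splits.
for name, estimator in [("plain ridge", Ridge(alpha=1.0)),
                        ("clipped ridge", ClippedRegressor(Ridge(alpha=1.0)))]:
    scores = cross_val_score(estimator, X, y, cv=5,
                             scoring="neg_root_mean_squared_error")
    print(name, -scores.mean())
```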

Mayou36
  • Hello @Mayou36, and thank you, I appreciate it. I think I got it now, but I'm not perfectly satisfied; I don't know why this simple thing is bugging me x) It somehow seemed like a violation to me, because I'm falsifying the functional relationship my model found between $x$ and $y$ by altering the predictions. If I give some input $x$ to my model and the output is negative, $\hat{y}<0$, then it is negative, period; that is the relationship my model found, and I cannot try to falsify it. That is how it seemed strange to me :) – jjepsuomi May 16 '17 at 11:34
  • I think my confusion was that I was not thinking of this second transformation (i.e. negatives to 0) as part of my learning model. – jjepsuomi May 16 '17 at 11:38
  • What are you evaluating it for? For prediction? Then pretty much anything goes, as long as your model has good predictive power on the test set. But if you are interested in specifically evaluating the original model for whatever reason, then you are right, you shouldn't tamper with it as that would give you a different model. – rinspy May 16 '17 at 13:59
  • @rinspy thank you for your help :) Yes, that's what I was thinking about myself. If e.g. $f(x_1) = \hat{y}_1 = -5$ and $f(x_2) = \hat{y}_2 = -45$, then after the transformation we would have $f(x_1) = f(x_2) = \hat{y}_1 = \hat{y}_2 = 0$, which seemed weird to me (information about the learned dependency relationship between $x$ and $y$ is lost). But I get it now! Thanks :) – jjepsuomi May 16 '17 at 14:18
  • @jjepsuomi: sure, if you have to predict negative values, things may break (I implicitly assumed you would not). Yes, you change the correlations your algorithm has "found", but these correlations are somewhat "wrong" anyway (negative values...). Anyway, don't forget the big picture! My main advice was to check other algorithms, to change your hyper-parameters and so on. An implicit assumption I made is that you will predict the "same kind of data" as you train/test on; if you want to predict negative values, things are different, of course... – Mayou36 May 16 '17 at 14:50