
Consider choosing a model for prediction. The criterion is expected prediction loss: the lower the expected loss, the better the model. Suppose the distribution of prediction errors has relatively heavy tails for each of the models under consideration AND the loss function is such that the loss grows relatively fast w.r.t. prediction errors (e.g. quadratic loss or even exponential loss). Suppose that taken together, the distributions of prediction errors from different models and the shape of the loss function result in the expected loss being infinite.

This makes it hard to choose among models, as comparing infinity with infinity is hard. Note that both the distributions of prediction errors and the loss function are beyond our control (we have a given set of models, and our client has specified the loss function reflecting his/her preferences). At the same time, the logic of choosing the model based on expected loss is appealing, and we would probably like to stick to it (though if there is no feasible solution, we might be willing to give it up).

How do we choose a model in this situation?
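To make the difficulty concrete, here is a minimal simulation sketch (the Cauchy, i.e. $t(1)$, error distribution and quadratic loss are illustrative assumptions, not part of any particular model): when the expected loss is infinite, the empirical average loss never settles down, so even a very long evaluation sample does not produce a stable number to compare models on.

```python
import numpy as np

# Illustrative sketch: quadratic loss with Cauchy (t with 1 df) prediction
# errors has infinite expectation, so the running average of observed
# losses never stabilizes as the sample grows.
rng = np.random.default_rng(0)
errors = rng.standard_cauchy(10**6)   # heavy-tailed prediction errors
losses = errors**2                    # quadratic loss
running_mean = np.cumsum(losses) / np.arange(1, losses.size + 1)

# With a finite-variance error distribution the running mean would
# converge; here it jumps whenever an extreme error arrives.
for n in (10**3, 10**4, 10**5, 10**6):
    print(n, running_mean[n - 1])
```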

Richard Hardy
  • If your client gave you models and their loss function and told you to select the best model for them, then you should just tell them that according to their loss all those models are really horrible, actually infinitely horrible. And if that's really their loss, then they have bigger problems than model selection – rep_ho Sep 10 '19 at 13:54
  • @rep_ho, perhaps. However, one needs to make a decision nevertheless, even if that would lead to a poor outcome. The point is to secure the outcome that is the best among the ones we are choosing from. Importantly, I think the situation above is far from being unrealistic. – Richard Hardy Sep 10 '19 at 14:14
  • Might it be possible to normalize using the loss of one of the models? Then, all others would be of a different order of infinity -- hence clearly better or worse -- or of a similar order up to different constants k_1, k_2, etc. One would in that case choose the one with the smallest k_i, provided k_i < 1, or else the "benchmark" model if all k_i > 1. – F. Tusell Sep 10 '19 at 18:21
  • @F.Tusell, this makes sense. – Richard Hardy Sep 10 '19 at 19:01
  • Does it make sense to use something like the median loss (or some other percentile)? Then you wouldn't be as affected by the heavy tail. – roundsquare Sep 10 '19 at 20:36
  • @roundsquare, that seems like an arbitrary choice that is hard to justify. Meanwhile, expected loss is justified by decision-theoretic arguments. – Richard Hardy Sep 11 '19 at 05:12
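F. Tusell's normalization suggestion in the comments can be sketched in code. The setup below is deliberately stylized and entirely assumed: model B's errors are a fixed multiple of a benchmark model's errors, so both cumulative losses diverge, yet their ratio is the stable constant k = 1.5² = 2.25, which would favor the benchmark (k > 1).

```python
import numpy as np

# Stylized illustration of the normalization idea: even when each model's
# expected loss is infinite, the realized loss of one model relative to a
# benchmark can be a meaningful, finite number. Assuming (purely for
# illustration) that model B's errors are a fixed multiple of the
# benchmark's errors.
rng = np.random.default_rng(42)

benchmark_errors = rng.standard_cauchy(10**6)  # heavy tails, E[e^2] = inf
model_b_errors = 1.5 * benchmark_errors        # "same order of infinity"

benchmark_loss = np.sum(benchmark_errors**2)
model_b_loss = np.sum(model_b_errors**2)

# Both cumulative losses diverge as the sample grows, but their ratio is
# the constant k = 1.5^2 = 2.25, so the benchmark model is preferred.
ratio = model_b_loss / benchmark_loss
print(ratio)
```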

1 Answer


You actually have two "kinds" of expectation in your expected loss, at least in your current formulation:

  1. You take an expectation of the loss over the error distribution at a particular setting of the predictors
  2. You take an expectation of these pointwise expected losses over the distribution of the predictors

(I'm not going to go into iterated expectations, Fubini etc. here, let's keep this hand-wavy.)

An infinite expected loss can occur at either one of these two points. You could have an infinite expected loss for a particular setting of your predictors. Or the expected loss could be finite for each particular predictor setting, but your predictors could be spread so widely that when you integrate over them, you end up with an infinite overall expected loss.

So I would say the first step should be to understand how losses behave at each particular predictor setting. Are there particularly problematic settings with infinite pointwise expected loss? If so, can we control the predictors so this situation does not occur? Once we have controlled the "pointwise" expected losses, can we constrain the possible predictor settings so that our many finite expected losses do not integrate up to infinity?

And if this doesn't work, i.e., you cannot even control the predictors to achieve "pointwise" finite expected losses, then I think you have a very strange problem on your hands, and you would need to go deeper into just what is happening here. I don't think this situation could be dealt with in general.
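The two-step decomposition above can be illustrated with a simulation (all distributions here are assumptions chosen for illustration): conditional on a predictor value x, residuals are N(0, x²), so the pointwise expected quadratic loss is x², finite for every x. But if the predictors themselves are Cauchy-distributed, integrating x² over the predictor distribution diverges, so the infinity arises only at the second step.

```python
import numpy as np

# Sketch of the two "kinds" of expectation. Conditional on x, the
# residual is N(0, x^2), so the pointwise expected quadratic loss
# E[L | x] = x^2 is finite for every x. With Cauchy-distributed
# predictors, E[x^2] is infinite, so the overall expected loss blows
# up only when integrating over the predictor distribution.
rng = np.random.default_rng(1)

def pointwise_mean_loss(x, n=10**5):
    """Monte Carlo estimate of E[(Y - prediction)^2 | x] = x^2."""
    residuals = rng.normal(0.0, abs(x), size=n)
    return np.mean(residuals**2)

# Finite at every fixed predictor setting (estimates close to x**2):
for x in (0.5, 2.0, 10.0):
    print(x, pointwise_mean_loss(x))

# ...but averaging x^2 over Cauchy-distributed predictors never
# stabilizes, no matter how large the sample:
xs = rng.standard_cauchy(10**6)
overall = np.cumsum(xs**2) / np.arange(1, xs.size + 1)
print(overall[[10**3 - 1, 10**6 - 1]])
```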

Stephan Kolassa
  • Perhaps a simple example could help: under square loss, I am trying to predict the value of a random variable that is $t(1)$-distributed, not having any additional information. My best guess is 0, but I could add some random noise around this guess to form predictions of the different prediction models. Since the second moment of $t(1)$ is infinite, the expected loss will be infinite. Could this be dealt with? – Richard Hardy Sep 11 '19 at 05:18
  • Just to note, @F.Tusell had a nice suggestion in a comment to the OP. I also hope that my last comment offers a convincing example of how a situation with infinite expected loss arises quite naturally. – Richard Hardy Feb 11 '20 at 14:32