
Given an input X, my goal is to predict a list of probabilities for n factors, where the factors could be attributes like feasibility, comfort, ease, etc., using a neural network. I have computed soft targets for each of these factors. The factors are not necessarily related to each other (and even if they are, an earlier part of the neural network can take care of that). What would be the best loss for this situation? PyTorch examples would be a bonus! Thanks

  • How have you wound up with soft targets? Is there any way to access the original data that generated those proportions? – Dave Aug 15 '23 at 03:15
  • Yes. Let us say they are user ratings. Multiple users provide a binary label for each class. And I average out the ratings (i.e. no_of_ones/total_ratings) to get the soft labels – OlorinIstari Aug 15 '23 at 03:22
  • So you have the original labels and the pairing information? Why do you not want to work with those data? – Dave Aug 15 '23 at 03:24
  • Let us say for one input, x1, I had 3 user ratings for 2 features: feasibility: [0,1,0], comfort: [1,1,1]. I would want my NN to predict for x1: feasibility: 0.333, comfort: 1. – OlorinIstari Aug 15 '23 at 03:28
  • “Classification” models can and often do predict probabilities/proportions, not just categories. Indeed, “classification” is often a misnomer. It is not clear that you should be using the soft labels at all. It seems like you can approach this as a multi-label problem and get the model to predict probabilities. The software function to access these probabilities might be more like predict_proba instead of just predict to predict the category with the highest predicted probability. – Dave Aug 15 '23 at 03:32

1 Answer


From the comments, it sounds like you have a fairly standard multi-label problem and want the model predictions to be the probabilities of class membership instead of the predicted classes. The good news is that the math already gives you this; if you're only getting predicted categories, the path forward is to change your code, not your model. For instance, a predict method will give you the category with the highest predicted probability, while predict_proba gives the raw predicted probabilities. Your software should have such a capability, and if it does not, you might want to use different software.

Consequently, I would not use any special kind of loss function with the soft labels to predict probabilities. Rather, I would model the original binary decisions as a multi-label problem and predict the probabilities from that model.
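Since you asked for PyTorch: a minimal sketch of this approach, using the ratings from your comment (feasibility: [0,1,0], comfort: [1,1,1]). The architecture and tensor names here are placeholders; the point is that `BCEWithLogitsLoss` on the original binary user labels, with one output logit per factor, trains the sigmoid outputs toward the rating proportions you computed by hand.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

n_factors = 2  # e.g. feasibility, comfort

# Hypothetical architecture; any shared trunk with one logit per factor works.
model = nn.Sequential(
    nn.Linear(10, 32),
    nn.ReLU(),
    nn.Linear(32, n_factors),  # raw logits, one per factor
)

# Sigmoid + binary cross-entropy, applied independently per factor.
loss_fn = nn.BCEWithLogitsLoss()

# One input x1 repeated once per rater, each rater's binary labels as the
# target row: (user, factor) = [[0,1], [1,1], [0,1]].
x1 = torch.randn(10)
X = x1.repeat(3, 1)  # shape (3, 10)
y = torch.tensor([[0., 1.], [1., 1.], [0., 1.]])

optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

# Predicted probabilities -- the analogue of predict_proba.
probs = torch.sigmoid(model(x1))
print(probs)
```

Because all three training rows share the same input, the BCE-minimizing prediction for each factor is the mean of its labels, so the output drifts toward (0.333, 1) without you ever averaging the ratings yourself.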

One caveat is that neural networks tend to give predicted probabilities that are overconfident. Calibrating these predictions so they better reflect observed event frequencies is possible but not always trivial. The multi-label aspect of this problem, where one event's occurrence can be influenced by other events occurring, further muddies the picture. (This may very well be an open problem.)
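One common post-hoc approach is temperature scaling: fit a single scalar T on held-out data and divide the logits by it before the sigmoid. A minimal sketch, assuming you already have validation logits and binary labels from a trained model (the toy data below is fabricated to show the mechanism, not a claim about your model):

```python
import torch

def fit_temperature(val_logits, val_labels, steps=100):
    # Parameterize T = exp(log_t) so it stays positive during optimization.
    log_t = torch.zeros(1, requires_grad=True)
    opt = torch.optim.Adam([log_t], lr=0.05)
    loss_fn = torch.nn.BCEWithLogitsLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(val_logits / log_t.exp(), val_labels)
        loss.backward()
        opt.step()
    return log_t.exp().item()

# Toy check: a model that outputs logit 5 (~99% confidence) for events
# that actually occur about 70% of the time.
torch.manual_seed(0)
labels = (torch.rand(1000) < 0.7).float()
logits = torch.full((1000,), 5.0)
T = fit_temperature(logits, labels)
print(T)  # T > 1 softens the overconfident predictions
```

This only rescales confidence; it cannot repair the per-factor independence assumption if your factors interact, which is why I'd treat calibration here as a caveat rather than a solved step.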

Dave