
Let's say we have a binary classification task, but our dataset contains more fine-grained values of how much an example belongs to the class or not. So the labels are real numbers in $\left[0,1\right]$. I can see two ways to make use of this additional information:

  • Approach this as a classification problem and use the cross-entropy loss, but just have non-binary labels. This would basically mean we interpret the soft labels as a confidence in the label that the model might pick up during learning.

  • Frame this as a regression problem, where we want to predict the exact degree to which an example belongs to the class. In this case, we would use a regression loss like MSE or Huber loss (a short sketch of both options follows this list).
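For concreteness, here is a minimal sketch of the two loss choices side by side (PyTorch assumed; the tensors are placeholders for a real model's outputs and labels):

```python
# Minimal sketch of the two approaches (PyTorch assumed; tensors are placeholders).
import torch
import torch.nn.functional as F

logits = torch.randn(8)    # raw model outputs for 8 examples
targets = torch.rand(8)    # soft labels in [0, 1]

# Approach 1: classification with cross-entropy and non-binary targets.
# binary_cross_entropy_with_logits accepts real-valued targets in [0, 1].
ce_loss = F.binary_cross_entropy_with_logits(logits, targets)

# Approach 2: regression on the degree of class membership.
# Squash the output to [0, 1] and use a regression loss such as MSE or Huber.
preds = torch.sigmoid(logits)
mse_loss = F.mse_loss(preds, targets)
huber_loss = F.huber_loss(preds, targets, delta=1.0)
```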

What is the difference between the two approaches? How do I decide between them?

danijar
  • This isn't part of the question, but you could use a logistic regression, which makes use of both the target and the proportions. – Firebug Aug 27 '16 at 00:59
  • One approach is to use quasi-likelihood, where you would essentially just be doing ordinary logistic regression, only on a continuous response. To fit this model you would use quasi or quasibinomial as the family argument in R's glm function (a Python analogue is sketched after these comments). – dsaxton Aug 27 '16 at 02:18
  • @danijar so what's been your best working shot at it so far? How did the cross-entropy approach work for your case? – matanox Mar 17 '18 at 11:34
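A rough Python analogue of the quasi-binomial GLM described in the comment above (statsmodels is assumed here; the data is synthetic and only illustrates the call):

```python
# Sketch of a logistic-link GLM fit on a continuous response in [0, 1]
# (statsmodels assumed; synthetic data for illustration only).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(200, 3)))   # design matrix with intercept
y = rng.uniform(size=200)                        # continuous labels in [0, 1]

# Binomial family with an estimated dispersion (scale="X2") roughly mirrors
# family = quasibinomial in R's glm; statsmodels warns about the non-integer
# response but fits the model anyway.
result = sm.GLM(y, X, family=sm.families.Binomial()).fit(scale="X2")
print(result.summary())
```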

1 Answer


In this answer, cross-entropy is presented as a possibility, since it does not assume the underlying distributions are discrete.
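Concretely, with soft labels the usual cross-entropy expression is simply evaluated with the non-binary target plugged in,

$$\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\right],$$

and for a fixed $y_i \in [0,1]$ it is minimized at $\hat{y}_i = y_i$, so nothing in the loss requires the targets to be 0 or 1.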

Another possibility is the Beta distribution, a continuous distribution with support on $[0,1]$, which often arises in problems involving the Bernoulli distribution (it is the Bernoulli's conjugate prior).
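As a sketch of how that could be turned into a loss (PyTorch assumed; the two-headed output, the softplus mapping, and the clamping are illustrative choices, not part of the original answer):

```python
# Minimal sketch of a Beta negative log-likelihood loss (PyTorch assumed).
import torch
import torch.nn.functional as F

raw = torch.randn(8, 2)                  # stand-in for a model's two outputs per example
alpha = F.softplus(raw[:, 0]) + 1e-6     # Beta parameters must be strictly positive
beta = F.softplus(raw[:, 1]) + 1e-6

# Keep labels away from exactly 0/1, where the Beta density can be 0 or infinite.
targets = torch.rand(8).clamp(1e-4, 1 - 1e-4)

nll = -torch.distributions.Beta(alpha, beta).log_prob(targets).mean()
# The predicted degree of class membership is the Beta mean alpha / (alpha + beta).
```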

A third possibility is the Continuous Bernoulli, which was introduced precisely to correct the likelihood function in Variational Autoencoders applied to continuous data in $[0,1]$ (for example, normalized pixel intensity values). Cross-entropy, while a valid loss function, does not lead to a proper likelihood in this context, requiring this continuity correction.
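A small sketch of that difference (PyTorch assumed, which ships a ContinuousBernoulli distribution; the tensors are placeholders):

```python
# Cross-entropy vs. continuous Bernoulli negative log-likelihood (PyTorch assumed).
import torch
import torch.nn.functional as F

logits = torch.randn(8)     # raw model outputs
targets = torch.rand(8)     # continuous targets in [0, 1]

# Plain cross-entropy: a valid loss, but not a normalized likelihood for continuous targets.
bce = F.binary_cross_entropy_with_logits(logits, targets)

# Continuous Bernoulli: the same functional form plus a log normalizing constant,
# so it is a proper density on [0, 1].
cb_nll = -torch.distributions.ContinuousBernoulli(logits=logits).log_prob(targets).mean()
```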

Firebug