
Let's say we have a binary classification task, but our dataset contains more fine-grained values of how much an example belongs to the class or not. So the labels are real numbers in $\left[0,1\right]$. I can see two ways to make use of this additional information:

  • Approach this as a classification problem and use the cross-entropy loss, but just have non-binary labels. This would basically mean we interpret the soft labels as a confidence in the label that the model might pick up during learning.

  • Frame this as a regression problem, where we want to predict the exact degree to which an example belongs to the class. In this case, we would use a regression loss like MSE or Huber loss (a short sketch of both options follows this list).
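For concreteness, here is a minimal sketch of the two loss choices side by side (PyTorch assumed; the tensors are placeholders for a real model's outputs and labels):

```python
# Minimal sketch of the two approaches (PyTorch assumed; tensors are placeholders).
import torch
import torch.nn.functional as F

logits = torch.randn(8)    # raw model outputs for 8 examples
targets = torch.rand(8)    # soft labels in [0, 1]

# Approach 1: classification with cross-entropy and non-binary targets.
# binary_cross_entropy_with_logits accepts real-valued targets in [0, 1].
ce_loss = F.binary_cross_entropy_with_logits(logits, targets)

# Approach 2: regression on the degree of class membership.
# Squash the output to [0, 1] and use a regression loss such as MSE or Huber.
preds = torch.sigmoid(logits)
mse_loss = F.mse_loss(preds, targets)
huber_loss = F.huber_loss(preds, targets, delta=1.0)
```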

What is the difference between the two approaches? How do I decide between them?

danijar
  • This isn't part of the question, but you could use a logistic regression, which makes use of both the target and the proportions. – Firebug Aug 27 '16 at 00:59
  • One approach is to use quasi-likelihood, where you would essentially just be doing ordinary logistic regression, only on a continuous response. To fit this model you would use quasi or quasibinomial as the family argument in R's glm function (a Python analogue is sketched after these comments). – dsaxton Aug 27 '16 at 02:18
  • @danijar so what's been your best working shot at it so far? How did the cross-entropy approach work for your case? – matanox Mar 17 '18 at 11:34
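A rough Python analogue of the quasi-binomial GLM described in the comment above (statsmodels is assumed here; the data is synthetic and only illustrates the call):

```python
# Sketch of a logistic-link GLM fit on a continuous response in [0, 1]
# (statsmodels assumed; synthetic data for illustration only).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(200, 3)))   # design matrix with intercept
y = rng.uniform(size=200)                        # continuous labels in [0, 1]

# Binomial family with an estimated dispersion (scale="X2") roughly mirrors
# family = quasibinomial in R's glm; statsmodels warns about the non-integer
# response but fits the model anyway.
result = sm.GLM(y, X, family=sm.families.Binomial()).fit(scale="X2")
print(result.summary())
```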

1 Answer


In this answer, cross-entropy is presented as a possibility, since it does not assume the underlying distributions are discrete.
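Concretely, with soft labels the usual cross-entropy expression is simply evaluated with the non-binary target plugged in,

$$\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\right],$$

and for a fixed $y_i \in [0,1]$ it is minimized at $\hat{y}_i = y_i$, so nothing in the loss requires the targets to be 0 or 1.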

Another possibility is the Beta distribution, a continuous distribution with support on $[0,1]$, which often arises in problems involving the Bernoulli distribution (it is the Bernoulli's conjugate prior).
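As a sketch of how that could be turned into a loss (PyTorch assumed; the two-headed output, the softplus mapping, and the clamping are illustrative choices, not part of the original answer):

```python
# Minimal sketch of a Beta negative log-likelihood loss (PyTorch assumed).
import torch
import torch.nn.functional as F

raw = torch.randn(8, 2)                  # stand-in for a model's two outputs per example
alpha = F.softplus(raw[:, 0]) + 1e-6     # Beta parameters must be strictly positive
beta = F.softplus(raw[:, 1]) + 1e-6

# Keep labels away from exactly 0/1, where the Beta density can be 0 or infinite.
targets = torch.rand(8).clamp(1e-4, 1 - 1e-4)

nll = -torch.distributions.Beta(alpha, beta).log_prob(targets).mean()
# The predicted degree of class membership is the Beta mean alpha / (alpha + beta).
```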

A third possibility is the Continuous Bernoulli, which was introduced precisely to correct the likelihood function in Variational Autoencoders applied to continuous data in $[0,1]$ (for example, normalized pixel intensity values). Cross-entropy, while a valid loss function, does not lead to a proper likelihood in this context, requiring this continuity correction.
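A small sketch of that difference (PyTorch assumed, which ships a ContinuousBernoulli distribution; the tensors are placeholders):

```python
# Cross-entropy vs. continuous Bernoulli negative log-likelihood (PyTorch assumed).
import torch
import torch.nn.functional as F

logits = torch.randn(8)     # raw model outputs
targets = torch.rand(8)     # continuous targets in [0, 1]

# Plain cross-entropy: a valid loss, but not a normalized likelihood for continuous targets.
bce = F.binary_cross_entropy_with_logits(logits, targets)

# Continuous Bernoulli: the same functional form plus a log normalizing constant,
# so it is a proper density on [0, 1].
cb_nll = -torch.distributions.ContinuousBernoulli(logits=logits).log_prob(targets).mean()
```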

Firebug