
Binary cross entropy is normally used in situations where the "true" result or label is one of two values (hence "binary"), typically encoded as 0 and 1.

However, the documentation for PyTorch's binary_cross_entropy function has the following:

target (Tensor) – Tensor of the same shape as input with values between 0 and 1.

(In this context "target" is the "true" result/label.)

The "between" seems rather odd. It's not "either 0 or 1, just with a real-valued type", but explicitly between. Further digging reveals this to be deliberate on the part of the PyTorch programmers. (Though I can't seem to find out why.)

Granted, given the definition of BCE, $-\big( y\log x + (1-y)\log(1-x) \big)$, it's certainly possible to compute the loss with target values that aren't strictly in $\{0, 1\}$, but I'm not sure what the potential use of such a situation is.
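For concreteness, a minimal check (the specific numbers are purely illustrative) shows that PyTorch's binary_cross_entropy will happily evaluate the loss with a fractional target:

```python
import torch
import torch.nn.functional as F

# A probability output (as if from a sigmoid) and a fractional target.
pred = torch.tensor([0.6])
target = torch.tensor([0.75])

# Computes -(y*log(x) + (1-y)*log(1-x)) elementwise, then averages by default.
loss = F.binary_cross_entropy(pred, target)
print(loss.item())  # ≈ 0.612 = -(0.75*log(0.6) + 0.25*log(0.4))
```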

Under what sort of situations would one potentially compute the binary cross entropy with target values which are intermediate? What would a class label of 0.75 actually mean, philosophically speaking?

R.M.

1 Answer


This is about something that the ML community has taken to calling “soft labels”.

Think of the original zero-or-one labels as probability distributions: they place all of the probability mass on one outcome or the other. By smoothing the labels, we ascribe fractional certainty to each outcome, and the model fits to these smoothed values instead of exactly 1.0 and 0.0. One observed benefit is that this avoids the saturation problems associated with the sigmoid and tanh functions, since matching a target of exactly 0 or 1 requires driving the pre-activation arbitrarily far into the saturated region.
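A minimal sketch of what label smoothing looks like for binary targets (the smoothing factor eps = 0.1 and the example tensors are just illustrative choices, not recommendations):

```python
import torch
import torch.nn.functional as F

def smooth_labels(hard: torch.Tensor, eps: float = 0.1) -> torch.Tensor:
    """Binary label smoothing: 0 -> eps/2, 1 -> 1 - eps/2."""
    return hard * (1.0 - eps) + 0.5 * eps

hard = torch.tensor([0.0, 1.0, 1.0, 0.0])
soft = smooth_labels(hard)                      # tensor([0.05, 0.95, 0.95, 0.05])
pred = torch.tensor([0.10, 0.98, 0.70, 0.30])   # sigmoid outputs from some model

loss_hard = F.binary_cross_entropy(pred, hard)  # targets only reachable at saturation
loss_soft = F.binary_cross_entropy(pred, soft)  # targets the model can actually attain
```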

Aside from the empirical benefit of label smoothing, sometimes you want to explicitly model prior uncertainty in the label. If you have noisy labels for your data, or if your data are aggregations of multiple trials with different outcomes, then there is inherent uncertainty about what the correct label for a given instance is. You can interpret the number as a probability, under whatever philosophical stance you take toward what probabilities mean.
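For example, if each instance was labelled by several annotators (or observed over several trials), the empirical fraction of positive outcomes is a natural soft target; a label of 0.75 then literally means "3 of 4 raters said positive". A hypothetical sketch (the vote counts here are made up):

```python
import torch
import torch.nn.functional as F

# Hypothetical per-instance counts: positive votes out of total annotator votes.
positive_votes = torch.tensor([3.0, 0.0, 5.0, 2.0])
total_votes    = torch.tensor([4.0, 4.0, 5.0, 4.0])

# The soft target is the empirical probability of the positive outcome.
target = positive_votes / total_votes           # tensor([0.75, 0.00, 1.00, 0.50])

pred = torch.tensor([0.8, 0.1, 0.9, 0.4])       # sigmoid outputs from some model
loss = F.binary_cross_entropy(pred, target)
```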