I understand that for multi-class classification the correct loss to use is categorical cross-entropy. However, when performing mixup as a regularisation technique, two samples $(X_1, y_1)$ and $(X_2, y_2)$ are combined to create a new sample such that $(X_{new}, y_{new}) = \lambda(X_1, y_1) + (1-\lambda)(X_2, y_2)$, which effectively gives the new sample two labels with different weights.
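For concreteness, here is a minimal sketch of how I construct a mixed sample and its soft label (the function name and the Beta-distributed $\lambda$ are just my illustration, following the original mixup paper):

```python
import numpy as np

def mixup_pair(x1, y1, x2, y2, alpha=0.2):
    """Combine two samples and their one-hot labels with mixup.

    y1 and y2 are one-hot vectors; the returned y_new is a soft label
    with weight lam on one class and (1 - lam) on the other.
    """
    lam = np.random.beta(alpha, alpha)     # mixing coefficient lambda
    x_new = lam * x1 + (1 - lam) * x2      # convex combination of inputs
    y_new = lam * y1 + (1 - lam) * y2      # convex combination of labels
    return x_new, y_new
```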
My question is: should I use categorical cross-entropy because we are classifying non-mixed samples during evaluation, or should I use binary cross-entropy because the training has effectively become a multi-label classification problem?
Edit: Just to clarify, this is a multi-class classification problem where all 100 classes are mutually exclusive; however, during training mixup can cause a sample to be labelled with two classes, where class $i$ has label weight $\lambda$ and class $j$ has label weight $1 - \lambda$. The two losses I am comparing are specifically keras.losses.BinaryCrossentropy and keras.losses.CategoricalCrossentropy. During evaluation, samples can only be labelled with one class.
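As a small made-up check of what I mean by a soft label, keras.losses.CategoricalCrossentropy accepts such targets directly, since it computes $-\sum_i y_i \log \hat{y}_i$ whether or not $y$ is one-hot (the numbers below are arbitrary):

```python
import numpy as np
from tensorflow import keras

lam = 0.7
# soft label from mixing class 1 (weight lambda) and class 3 (weight 1 - lambda)
y_true = np.array([[0.0, lam, 0.0, 1.0 - lam]])
y_pred = np.array([[0.05, 0.60, 0.05, 0.30]])  # predicted class probabilities

cce = keras.losses.CategoricalCrossentropy()
print(float(cce(y_true, y_pred)))  # -(lam*log(0.60) + (1-lam)*log(0.30)) ≈ 0.72
```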
keras.losses.BinaryCrossentropy is for the case of 2 classes ("Use this cross-entropy loss for binary (0 or 1) classification applications.") but you have 100. The documentation for keras.losses.CategoricalCrossentropy says "Use this crossentropy loss function when there are two or more label classes." Does this answer your question? – Sycorax Jun 29 '21 at 21:55