
When discussing linear regression, it is well known that you can add a regularization term such as

$$\lambda \|w\|^2 \quad \text{(Tikhonov regularization)}$$

to the empirical error/loss function.
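
For concreteness, here is a minimal sketch of what that looks like for ridge regression, using its closed-form solution; the toy data and the $\lambda$ value are made up for illustration:

```python
import numpy as np

# Minimal sketch: ridge (Tikhonov) regression via its closed-form solution,
# which minimizes ||Xw - y||^2 + lam * ||w||^2. Data and lambda are made up.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                      # toy design matrix
true_w = np.array([1.0, 0.0, -2.0, 0.5, 3.0])
y = X @ true_w + rng.normal(scale=0.1, size=100)   # noisy targets

lam = 0.1                                          # regularization strength
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
print(w)  # shrunk toward 0 relative to the unregularized least-squares fit
```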

However, regularization seems to be under-discussed when it comes to training binary/multi-class classifiers.

For example, I've browsed through hundreds of code examples online for CNN training, and not one of them added a regularization term to the cross-entropy loss function.
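
To make concrete what I mean, something like the following sketch, assuming a PyTorch-style setup; `model`, `inputs`, `targets`, and the `lam` value are placeholders:

```python
import torch
import torch.nn.functional as F

# Sketch: cross-entropy loss with an explicit Tikhonov-style L2 penalty.
# `model`, `inputs`, `targets`, and `lam` are placeholders for illustration.
def regularized_loss(model, inputs, targets, lam=1e-4):
    logits = model(inputs)
    ce = F.cross_entropy(logits, targets)
    l2 = sum(p.pow(2).sum() for p in model.parameters())
    return ce + lam * l2
```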

This makes me wonder a few things:

  1. does adding regularization to the loss functions for binary/multi-class classification training make sense?

  2. if so, what type of regularization makes sense and why?

  3. if not, why not?

Hope someone can answer.

1 Answer


Depending on what you are trying to do with your CNN, regularization may indeed make sense. Using regularization to prune your network, i.e. to make it sparse, has two main advantages:

  • It simplifies the network, making training and computation faster and easier;
  • It helps prevent overfitting, so that your network is more likely to generalize well to new data.

An intuitive way to reach these objectives is to perform $L_0$ regularization, which penalizes every parameter that is not exactly equal to 0, i.e. the number of nonzero parameters. This directly induces sparsity in the network. Since this count is non-differentiable, it cannot be minimized by gradient descent as-is; a differentiable relaxation is described in the following paper (Louizos et al., "Learning Sparse Neural Networks through $L_0$ Regularization"): https://arxiv.org/abs/1712.01312
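
As a rough, hedged sketch of the paper's idea: each weight gets a stochastic "hard concrete" gate, and the expected number of open gates is penalized. The constants below follow the paper's defaults, but the class structure and names here are illustrative, not the authors' reference implementation:

```python
import math
import torch

# Sketch of a hard concrete gate, which makes the L0 penalty differentiable.
# GAMMA, ZETA, BETA follow the paper's default stretch/temperature values.
GAMMA, ZETA, BETA = -0.1, 1.1, 2.0 / 3.0

class HardConcreteGate(torch.nn.Module):
    def __init__(self, n_gates):
        super().__init__()
        # log_alpha controls how likely each gate is to stay open
        self.log_alpha = torch.nn.Parameter(torch.zeros(n_gates))

    def forward(self):
        # Reparameterized sample: sigmoid of noisy logits, stretched to
        # (GAMMA, ZETA), then clamped so gates can be exactly 0 or 1.
        u = torch.rand_like(self.log_alpha).clamp(1e-6, 1 - 1e-6)
        s = torch.sigmoid((u.log() - (1 - u).log() + self.log_alpha) / BETA)
        return (s * (ZETA - GAMMA) + GAMMA).clamp(0.0, 1.0)

    def expected_l0(self):
        # Differentiable expected number of nonzero gates; this is the
        # quantity added (times a strength lambda) to the training loss.
        return torch.sigmoid(
            self.log_alpha - BETA * math.log(-GAMMA / ZETA)
        ).sum()
```

A layer's weights would then be used as `w * gate()` during training, with `lam * gate.expected_l0()` added to the cross-entropy loss; gates driven to zero can be pruned away afterwards.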

The authors also discuss other kinds of regularization (namely $L_1$ regularization).
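
For comparison, an explicit $L_1$ penalty is a one-line change from the $L_2$ sketch in the question; because its gradient does not vanish near zero, it pushes small weights toward exactly zero and so also encourages (approximate) sparsity. Again a sketch with placeholder names:

```python
import torch
import torch.nn.functional as F

# Sketch: lasso-style L1 penalty on cross-entropy; names are placeholders.
def l1_regularized_loss(model, inputs, targets, lam=1e-5):
    ce = F.cross_entropy(model(inputs), targets)
    l1 = sum(p.abs().sum() for p in model.parameters())
    return ce + lam * l1
```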

Camille Gontier