
When discussing linear regression, it is well known that you can add a regularization term such as

$$\lambda \|w\|^2 \quad \text{(Tikhonov regularization)}$$

to the empirical error/loss function.
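
For concreteness, here is a minimal sketch of what that looks like for ridge regression, using its closed-form solution; the toy data and the $\lambda$ value are made up for illustration:

```python
import numpy as np

# Minimal sketch: ridge (Tikhonov) regression via its closed-form solution,
# which minimizes ||Xw - y||^2 + lam * ||w||^2. Data and lambda are made up.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                      # toy design matrix
true_w = np.array([1.0, 0.0, -2.0, 0.5, 3.0])
y = X @ true_w + rng.normal(scale=0.1, size=100)   # noisy targets

lam = 0.1                                          # regularization strength
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
print(w)  # shrunk toward 0 relative to the unregularized least-squares fit
```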

However, regularization seems to be under-discussed when it comes to training binary/multi-class classifiers.

For example, I've browsed through hundreds of code examples online for CNN training, and not one of them added a regularization term to the cross-entropy loss function.
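
To make concrete what I mean, something like the following sketch, assuming a PyTorch-style setup; `model`, `inputs`, `targets`, and the `lam` value are placeholders:

```python
import torch
import torch.nn.functional as F

# Sketch: cross-entropy loss with an explicit Tikhonov-style L2 penalty.
# `model`, `inputs`, `targets`, and `lam` are placeholders for illustration.
def regularized_loss(model, inputs, targets, lam=1e-4):
    logits = model(inputs)
    ce = F.cross_entropy(logits, targets)
    l2 = sum(p.pow(2).sum() for p in model.parameters())
    return ce + lam * l2
```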

This makes me wonder a few things:

  1. does adding regularization to the loss functions for binary/multi-class classification training make sense?

  2. if so, what type of regularization makes sense and why?

  3. if not, why not?

Hope someone can answer.

1 Answer


Depending on what you are trying to do with your CNN, regularization may indeed make sense. Using regularization to prune your network, i.e. to make it sparse, has two main advantages:

  • It simplifies the network, making training and computation faster and easier;
  • It helps prevent overfitting, so that your network is more likely to generalize well to new data.

An intuitive way to reach these objectives is to perform $L_0$ regularization, which penalizes every parameter that is not exactly equal to 0, i.e. the number of nonzero parameters. This directly induces sparsity in the network. Since this count is non-differentiable, it cannot be minimized by gradient descent as-is; a differentiable relaxation is described in the following paper (Louizos et al., "Learning Sparse Neural Networks through $L_0$ Regularization"): https://arxiv.org/abs/1712.01312
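
As a rough, hedged sketch of the paper's idea: each weight gets a stochastic "hard concrete" gate, and the expected number of open gates is penalized. The constants below follow the paper's defaults, but the class structure and names here are illustrative, not the authors' reference implementation:

```python
import math
import torch

# Sketch of a hard concrete gate, which makes the L0 penalty differentiable.
# GAMMA, ZETA, BETA follow the paper's default stretch/temperature values.
GAMMA, ZETA, BETA = -0.1, 1.1, 2.0 / 3.0

class HardConcreteGate(torch.nn.Module):
    def __init__(self, n_gates):
        super().__init__()
        # log_alpha controls how likely each gate is to stay open
        self.log_alpha = torch.nn.Parameter(torch.zeros(n_gates))

    def forward(self):
        # Reparameterized sample: sigmoid of noisy logits, stretched to
        # (GAMMA, ZETA), then clamped so gates can be exactly 0 or 1.
        u = torch.rand_like(self.log_alpha).clamp(1e-6, 1 - 1e-6)
        s = torch.sigmoid((u.log() - (1 - u).log() + self.log_alpha) / BETA)
        return (s * (ZETA - GAMMA) + GAMMA).clamp(0.0, 1.0)

    def expected_l0(self):
        # Differentiable expected number of nonzero gates; this is the
        # quantity added (times a strength lambda) to the training loss.
        return torch.sigmoid(
            self.log_alpha - BETA * math.log(-GAMMA / ZETA)
        ).sum()
```

A layer's weights would then be used as `w * gate()` during training, with `lam * gate.expected_l0()` added to the cross-entropy loss; gates driven to zero can be pruned away afterwards.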

The authors also discuss other kinds of regularization (namely $L_1$ regularization).
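
For comparison, an explicit $L_1$ penalty is a one-line change from the $L_2$ sketch in the question; because its gradient does not vanish near zero, it pushes small weights toward exactly zero and so also encourages (approximate) sparsity. Again a sketch with placeholder names:

```python
import torch
import torch.nn.functional as F

# Sketch: lasso-style L1 penalty on cross-entropy; names are placeholders.
def l1_regularized_loss(model, inputs, targets, lam=1e-5):
    ce = F.cross_entropy(model(inputs), targets)
    l1 = sum(p.abs().sum() for p in model.parameters())
    return ce + lam * l1
```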

Camille Gontier