
I was reading a paper and came across this sentence:

"ML methods perform well by employing regularization to reduce variance and trading off regularization bias with overfitting in practice.:

I am wondering if anyone has any insight into this sentence. I understand that regularization involves shrinkage of parameters, but I fail to see how shrinkage reduces variance, or what it means to trade off the bias. Does anyone have any further insight here? Thanks.

user321627

1 Answer


The variance referred to here is the variance between different fitted models. For example, if you train 5 different nets from scratch, each with different initialization weights, you will get 5 sets of results on the final test data (or on the dev/validation data). The stronger the regularization, the more similar those results will be, at the expense of fitting the training data less well.
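A rough sketch of that idea (not from the paper; the toy sine data, scikit-learn's MLPRegressor, and its `alpha` L2-penalty parameter are my own choices here): train the same net from several random initializations at a weak and a strong regularization strength, and compare how much the test predictions disagree. The spread is typically much smaller with the stronger penalty.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Toy regression problem: noisy sine curve.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)
X_test = np.linspace(-3, 3, 50).reshape(-1, 1)

for alpha in (1e-5, 1.0):              # weak vs strong L2 regularization
    preds = []
    for seed in range(5):              # 5 nets, identical except for init weights
        net = MLPRegressor(hidden_layer_sizes=(50,), alpha=alpha,
                           random_state=seed, max_iter=2000)
        net.fit(X, y)
        preds.append(net.predict(X_test))
    # Disagreement between the 5 nets, averaged over the test inputs.
    spread = np.std(preds, axis=0).mean()
    print(f"alpha={alpha}: mean prediction std across nets = {spread:.4f}")
```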

For example, if you use L2 regularization, in the extreme of infinite regularization strength the weights will all be 0, the output will be 0 for every input, and the variance between models will be zero (while the bias is maximal).
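A minimal illustration of that limiting case, using scikit-learn's Ridge (L2-penalized linear regression) on made-up data of my own: as the penalty `alpha` grows, the fitted coefficients are driven toward zero, so every fit converges to the same (flat) model.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Increasing the L2 penalty shrinks the coefficients toward zero.
for alpha in (1e-3, 1.0, 1e6):
    model = Ridge(alpha=alpha).fit(X, y)
    print(f"alpha={alpha:g}: coefficients = {np.round(model.coef_, 4)}")
```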

Hugh Perkins