
I've been looking over http://www.chioka.in/differences-between-l1-and-l2-as-loss-function-and-regularization/ and trying to get a deeper understanding of a stable vs unstable solution for L1 vs L2.

They seem to show that L1 generally has a larger slope, which makes it less "stable." The piece I'm missing is why L1 would generally have a larger slope. Is this true, or am I missing something?

gitness
  • The webpage you linked to compares the l1 and l2 norms for two applications: as a loss function, and for regularization. Which of these applications are you asking about? – user20160 Aug 12 '19 at 23:31

1 Answer


I think you are asking why the slope of the $L_1$-regularized loss function is (allegedly) steeper than the slope of the $L_2$-regularized loss function. Here I am referring to an $L_R$-regularized loss function $Loss(data,\omega, q, R) = your$-$favorite$-$unregularized$-$loss(data,\omega) + q ||\omega||_R$, where $\omega \in \mathbb{R}^p$ are the $p$ parameters $\omega_1\dots\omega_p$ for the $p$ predictors and $q \in \mathbb{R}$ is the hyperparameter that weights the $R$-norm $||\cdot||_R$. Some points:

  1. The $L_1$ slope isn't larger in general, but it is larger when you are within a critical distance in $\omega$ space from the optimum $\omega_*$. The $L_1$ slope is $dLoss/d||\omega||_1=q$, so it is constant everywhere, while the $L_2$ slope is $dLoss/d||\omega||_2=2q||\omega-\omega_*||$, which is linear in $||\omega-\omega_*||$ and so approaches zero as $\omega \rightarrow \omega_*$. This means that the $L_1$ slope is smaller than the $L_2$ slope far from the minimum and larger close to the minimum.

  2. Because the $L_1$ slope is constant, algorithms designed to minimize it cannot help but "overshoot" the minimum. One might call this "unstable," but the instability isn't hard to fix: change the step size when overshooting is detected. This is the same step-size management used when minimizing $L_2$, just not quite as efficient, since with $L_2$ you can see the minimum coming and with $L_1$ you can't until you've passed it.
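As a rough one-dimensional sketch of both points, take the penalty terms alone with the minimum at $\omega_* = 0$: $q|\omega|$ for $L_1$ and $q\omega^2$ for $L_2$ (the squared form, which is what gives the $2q$ slope above). The function names, `q = 1.0`, the step size of 0.3 and the starting point are all assumptions picked for illustration.

```python
def l1_slope(w, q=1.0):
    """Slope of q*|w|: constant magnitude q, sign follows w (0 exactly at w == 0)."""
    if w > 0:
        return q
    if w < 0:
        return -q
    return 0.0


def l2_slope(w, q=1.0):
    """Slope of q*w**2: linear in the distance from the minimum at 0."""
    return 2.0 * q * w


def descend(slope, w0, step, n_steps):
    """Plain fixed-step gradient descent; returns every iterate."""
    path = [w0]
    for _ in range(n_steps):
        path.append(path[-1] - step * slope(path[-1]))
    return path


# Point 1: far from the minimum the L2 slope is larger,
# close to it the L1 slope is larger (crossover at |w| = 0.5 for q = 1).
print("far :", abs(l1_slope(5.0)), abs(l2_slope(5.0)))
print("near:", abs(l1_slope(0.1)), abs(l2_slope(0.1)))

# Point 2: with a fixed step, the L1 iterates keep jumping across 0,
# while the L2 iterates shrink smoothly towards it.
l1_path = descend(l1_slope, w0=1.0, step=0.3, n_steps=20)
l2_path = descend(l2_slope, w0=1.0, step=0.3, n_steps=20)
print("L1 tail:", [round(w, 3) for w in l1_path[-4:]])
print("L2 tail:", [round(w, 3) for w in l2_path[-4:]])
```

The $L_1$ tail alternates in sign and never settles, which is the "overshoot" described in point 2; the $L_2$ tail decays geometrically because each update is proportional to the remaining distance.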

The "instability" is not a particular concern in practice when you are using a well-tested library. If you are writing your own code, then yes, it is something to worry about.
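For hand-written code, the step-size fix from point 2 might look something like the sketch below. It assumes a one-dimensional penalty $q|\omega|$ with its minimum at 0. `descend_with_halving` is a hypothetical helper, and halving the step whenever the slope changes sign is just one simple rule, not what any particular library does.

```python
def l1_slope(w, q=1.0):
    """Slope of q*|w|: constant magnitude q, sign follows w (0 exactly at w == 0)."""
    if w > 0:
        return q
    if w < 0:
        return -q
    return 0.0


def descend_with_halving(slope, w0, step, n_steps):
    """Gradient descent that halves the step each time it overshoots.

    Overshooting is detected as a sign flip of the slope between
    consecutive iterates, i.e. the minimum was just stepped over.
    """
    w = w0
    g_prev = slope(w)
    for _ in range(n_steps):
        w = w - step * g_prev
        g = slope(w)
        if g * g_prev < 0:  # slope changed sign: we passed the minimum
            step *= 0.5
        g_prev = g
    return w, step


w_final, step_final = descend_with_halving(l1_slope, w0=1.0, step=0.3, n_steps=40)
print(w_final, step_final)
```

Because the overshoot is only noticed after the minimum has been passed, each correction costs at least one wasted step, which is the inefficiency relative to $L_2$ mentioned above.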

Peter Leopold