
The objective function of the Lasso is often written as:
$ f(\boldsymbol{b}) = \color{red}{\frac{1}{2}} (\boldsymbol{y} - \boldsymbol{Xb})^T (\boldsymbol{y} - \boldsymbol{Xb}) + \lambda ||\boldsymbol{b}||_1$

I understand that scaling the residual sum of squares by one half doesn't affect the LASSO estimate. But why write it this way? "The Elements of Statistical Learning" (p. 93) says the scaling was done out of "convenience", without any further explanation.

  • The $1/2$ is cancelled by the $2$ that appears when taking the partial derivative of $f(b)$ with respect to $b$. – Zhanxiong Jan 17 '18 at 18:12

1 Answer


In some sense, this is a mathematics version of code golf. You get to save ONE character when writing later math using the $\frac{1}{2}$ objective function.

Motivating example:

Consider two optimization problems A and B:

  • Optimization problem A:

\begin{equation} \mbox{minimize (over $x$)} \quad x^2 \end{equation}

  • Optimization problem B:

\begin{equation} \mbox{minimize (over $x$)} \quad \frac{1}{2}x^2 \end{equation}

Problems A and B have the same solution, since applying a strictly increasing transformation to the objective function (here, multiplying by $\frac{1}{2}$) does not change its minimizer.
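As a quick sanity check, here is a minimal sketch that minimizes both objectives by brute-force grid search. The function names and the grid are purely illustrative assumptions, not part of any library.

```python
# Illustrative check: scaling an objective by a positive constant
# leaves its minimizer unchanged. All names here are hypothetical.

def objective_a(x):
    """Problem A: x^2."""
    return x ** 2

def objective_b(x):
    """Problem B: (1/2) x^2."""
    return 0.5 * x ** 2

def argmin_on_grid(f, grid):
    """Return the grid point with the smallest objective value."""
    return min(grid, key=f)

# Grid from -5.00 to 5.00 in steps of 0.01.
grid = [i / 100 for i in range(-500, 501)]

x_a = argmin_on_grid(objective_a, grid)
x_b = argmin_on_grid(objective_b, grid)
print(x_a, x_b)  # both minimizers are 0.0
```

The objective values differ by a factor of two at every grid point, but the location of the minimum is the same.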

The only difference is purely aesthetic: the first-order condition for Problem A is $2x = 0$, while the first-order condition for Problem B is $x = 0$. The latter has one fewer character.
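The same thing happens with the Lasso objective from the question. As a sketch, using the question's notation and writing $\boldsymbol{s}$ (a symbol introduced here) for a subgradient of $||\boldsymbol{b}||_1$, the gradient of the smooth part is

\begin{equation} \nabla_{\boldsymbol{b}} \, \frac{1}{2} (\boldsymbol{y} - \boldsymbol{Xb})^T (\boldsymbol{y} - \boldsymbol{Xb}) = -\boldsymbol{X}^T (\boldsymbol{y} - \boldsymbol{Xb}) \end{equation}

so the optimality condition reads $\boldsymbol{X}^T (\boldsymbol{y} - \boldsymbol{Xb}) = \lambda \boldsymbol{s}$. Without the $\frac{1}{2}$, the gradient would be $-2\boldsymbol{X}^T (\boldsymbol{y} - \boldsymbol{Xb})$ and the factor of $2$ would have to be carried through every later expression. One caveat: once the penalty is present, dropping the $\frac{1}{2}$ gives the same estimate only if $\lambda$ is replaced by $2\lambda$, so the factor is effectively absorbed into the tuning parameter.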

Matthew Gunn