The objective function of Lasso is often presented by:
$ f(\boldsymbol{b}) = \color{red}{\frac{1}{2}} (\boldsymbol{y} - \boldsymbol{Xb})^T (\boldsymbol{y} - \boldsymbol{Xb}) + \lambda ||\boldsymbol{b}||_1$
I understand that scaling the residual sum of squares by half doesn't affect the LASSO estimate. But why to write it like this? In "The Elements of Statistical Learning" (p. 93) , it's explained that scaling was done out of "convenience" without any further explanation.