
This article presents the following formulation of the best subset selection problem

$$\min_{\|\beta\|_0\leq k}\frac{1}{2}\|y-X\beta\|^2_2$$

I wonder where the $1/2$ comes from. Help appreciated.

k88074

1 Answer


The factor of $\frac{1}{2}$ is just for convenience, so that the Hessian of the objective does not have a factor of two.

The argmin (optimal value of $\beta$) is the same, whether or not the factor $\frac{1}{2}$ is included.

Note that rather than $\frac{1}{2}$, some authors use $\frac{1}{2n}$. This also does not affect the argmin.
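The answer can be illustrated numerically. A minimal sketch with synthetic data (the matrix sizes and seed are arbitrary choices, not from the question): scaling the objective by $\frac{1}{2}$, $1$, or $\frac{1}{2n}$ leaves the minimizer unchanged, while the Hessian picks up the corresponding constant.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
y = rng.standard_normal(50)

# Unconstrained least-squares minimizer from the normal equations.
# It is the same for (1/2)||y - Xb||^2, ||y - Xb||^2, and (1/(2n))||y - Xb||^2,
# because multiplying an objective by a positive constant does not move its argmin.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Hessians of the scaled objectives differ only by that constant:
H_half = X.T @ X        # objective (1/2)||y - Xb||^2  -> Hessian X^T X
H_plain = 2 * X.T @ X   # objective ||y - Xb||^2       -> Hessian 2 X^T X

# The gradient of the 1/2-scaled objective vanishes at beta_hat.
grad_half = X.T @ X @ beta_hat - X.T @ y
```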

Mark L. Stone
  • Well, the square on the norm is also optional, in the same sense as the 0.5. Needed if you want a QP, though. – ErlingMOSEK Jan 15 '20 at 18:28
  • @ErlingMOSEK Yes, that is true. And as a true conic man, I know you appreciate the potential advantages, including not squaring the condition number. Nevertheless, least squares, and to a large extent nonlinear least squares, is the genesis of the 1/2. – Mark L. Stone Jan 15 '20 at 18:49
  • Thanks @MarkL.Stone, could you please clarify in more detail the statement "so that the Hessian of the objective does not have a factor of two" (the rest is rather clear)? – k88074 Jan 16 '20 at 08:28
  • @k88074 Without the factor of 1/2 the Hessian of the objective would be $2X^TX$. With the factor of 1/2, the Hessian is $X^TX$. – Mark L. Stone Jan 16 '20 at 12:50
  • Is the following reasoning correct? If I want to minimize the objective (ignoring the constraint), I set the gradient of the objective to zero, that is $-2X^\top y+2X^\top X\beta=0$. Why does the factor of 2 in $2X^\top X$ bother us? We still get the same $\beta$. – k88074 Jan 18 '20 at 08:00
  • Yes, you get the same optimal $\beta$. So it really doesn't matter. Someone just liked having a Hessian without the factor of 2. That's all. I think the origin, for the people who do it this way, is Quadratic Programming. – Mark L. Stone Jan 18 '20 at 13:16
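The stationarity argument in the comments can be checked directly (a small sketch with made-up data): the condition without the 1/2, $-2X^\top y+2X^\top X\beta=0$, and the condition with it, $-X^\top y+X^\top X\beta=0$, are solved by the same $\beta$.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((40, 2))
y = rng.standard_normal(40)

# Stationarity without the 1/2: 2 X^T X b = 2 X^T y
beta_no_half = np.linalg.solve(2 * X.T @ X, 2 * X.T @ y)

# Stationarity with the 1/2:      X^T X b =   X^T y
beta_half = np.linalg.solve(X.T @ X, X.T @ y)

# The factor of 2 cancels, so both systems have the same solution.
```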