Why do we have to divide by 2 in the ML squared error cost function?

Question

I'm not sure why you need to multiply by $\frac{1}{2m}$ at the beginning. I understand that you would have to divide the whole sum by $m$, but why do we have to multiply $m$ by two?

Is it because we have two $\theta$ here in the example?

I think it is just for convenience when differentiating the squared-error cost: the derivative of 1/(2m) * loss**2 is 1/m * loss, because 2/(2m) = 1/m. It does not mean the samples are split into halves or parts. – JeeyCi, Feb 22 '24 at 06:55

TwinPenguins · Accepted Answer · 2019-05-18T13:58:53.507

24

It is simple: when you take the derivative of the cost function, which is what gradient descent uses to update the parameters, the $2$ from the exponent cancels with the $\frac{1}{2}$ multiplier, so the derivative is cleaner. Tricks like this are widely used in math "to make the derivations mathematically more convenient". You can simply remove the multiplier, see here for example, and expect the same minimizing parameters.
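To spell the cancellation out, here is a sketch that assumes the usual linear-regression setup with hypothesis $h_\theta(x) = \theta_0 + \theta_1 x$ and the convention $x_0^{(i)} = 1$, $x_1^{(i)} = x^{(i)}$. This notation is an assumption, since the formula itself is not reproduced in the question.

$$J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

$$\frac{\partial J}{\partial \theta_j} = \frac{1}{2m}\sum_{i=1}^{m} 2\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$

Without the $\frac{1}{2}$, the same derivative would simply carry an extra factor of $2$, which the learning rate can absorb.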

edited May 18 '19 at 13:58

answered May 18 '19 at 13:46


Thanks for your clean answer!
But doesn't this affect the result at all?
– Marton Langa May 18 '19 at 14:11

Not at all. It is just a constant factor. After all, the cost is a convex function, and it will lead to the same global minimum! – TwinPenguins May 18 '19 at 15:13

score 7 · Answer 2 · answered May 18 '19 at 13:43


It makes the math easier to handle. Including the factor of one half or not doesn't actually matter, since multiplying the cost by a positive constant does not change where its minimum is.
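As a quick numerical check of that claim, here is a minimal sketch in plain Python. The data points, the learning rate, the step count and the helper name `gradient_descent` are all made up for illustration. It runs gradient descent on a one-feature linear regression twice, once with the $\frac{1}{2m}$ scaling and once with $\frac{1}{m}$.

```python
def gradient_descent(xs, ys, scale, lr, steps=5000):
    """Minimize scale * sum((theta0 + theta1*x - y)**2) by gradient descent.

    `scale` is the positive constant in front of the sum,
    e.g. 1/(2m) or 1/m. Hypothetical helper for illustration only.
    """
    theta0, theta1 = 0.0, 0.0
    for _ in range(steps):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Differentiating error**2 brings down a factor of 2;
        # with scale = 1/(2m) that factor cancels to 1/m.
        grad0 = scale * 2 * sum(errors)
        grad1 = scale * 2 * sum(e * x for e, x in zip(errors, xs))
        theta0 -= lr * grad0
        theta1 -= lr * grad1
    return theta0, theta1


# Toy data (made up), roughly following y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]
m = len(xs)

with_half = gradient_descent(xs, ys, scale=1 / (2 * m), lr=0.05)
without_half = gradient_descent(xs, ys, scale=1 / m, lr=0.05)

print("with 1/(2m):", with_half)
print("with 1/m:   ", without_half)
```

Both runs should settle on the same intercept and slope. The only practical difference is that the version without the half takes steps twice as large for the same learning rate, so the learning rate may need to be halved to get identical behaviour along the way.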


Kane Chua

