26

I'm not sure why we need to multiply by $\frac{1}{2m}$ at the beginning. I understand that we would have to divide the whole sum by $m$, but why do we have to multiply $m$ by two?

Is it because we have two $\theta$ parameters in the example?

Simon Larsson
Marton Langa
  • I think it is just for convenience in the later derivation with the squared-error cost: the derivative of $\frac{1}{2m}\,\text{loss}^2$ becomes $\frac{1}{m}\,\text{loss}$ because $\frac{2}{2m} = \frac{1}{m}$. The 2 does not mean the samples are split into halves or parts. – JeeyCi Feb 22 '24 at 06:55

2 Answers

24

It is simple. When you take the derivative of the cost function, which is what gradient descent uses to update the parameters, the $2$ from the exponent cancels with the $\frac{1}{2}$ multiplier, so the derivative is cleaner. Tricks like this are widely used in math "to make the derivations mathematically more convenient". You can simply remove the multiplier, see here for example, and expect the same result: the gradient is only scaled by a factor of 2, which can be absorbed into the learning rate.
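
To spell out the cancellation, here is a short derivation. It assumes the usual single-feature linear regression setup that the question seems to refer to; the hypothesis $h_\theta(x) = \theta_0 + \theta_1 x$ and the exact form of the cost are my assumption, not quoted from the question, with $x_0^{(i)} = 1$ and $x_1^{(i)} = x^{(i)}$:

$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

$$\frac{\partial J}{\partial \theta_j} = \frac{1}{2m} \sum_{i=1}^{m} 2 \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$

Without the $\frac{1}{2}$, the same gradient would carry a factor of $\frac{2}{m}$ instead of $\frac{1}{m}$, and nothing else would change.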

TwinPenguins
7

It makes the math easier to handle. Whether you include the half or not doesn't actually matter, since multiplying the cost by a positive constant does not change where its minimum is.
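
A quick numerical check of this claim, as a minimal sketch rather than a reference implementation. The toy data, the helper names `gradient` and `gradient_descent`, and the learning rates are all made up for illustration; the model is assumed to be $h_\theta(x) = \theta_0 + \theta_1 x$.

```python
def gradient(theta0, theta1, xs, ys, scale):
    """Gradient of scale * sum((theta0 + theta1*x - y)**2) w.r.t. (theta0, theta1)."""
    g0 = sum(2 * scale * (theta0 + theta1 * x - y) for x, y in zip(xs, ys))
    g1 = sum(2 * scale * (theta0 + theta1 * x - y) * x for x, y in zip(xs, ys))
    return g0, g1


def gradient_descent(xs, ys, scale, alpha, steps):
    """Plain batch gradient descent starting from theta = (0, 0)."""
    theta0, theta1 = 0.0, 0.0
    for _ in range(steps):
        g0, g1 = gradient(theta0, theta1, xs, ys, scale)
        theta0 -= alpha * g0
        theta1 -= alpha * g1
    return theta0, theta1


# Hypothetical toy data generated from y = 1 + 2x, so the minimizer is (1, 2).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
m = len(xs)

# Cost with the 1/(2m) factor.
half = gradient_descent(xs, ys, scale=1 / (2 * m), alpha=0.1, steps=5000)
# Cost with the 1/m factor and half the learning rate: the updates match step for step.
full = gradient_descent(xs, ys, scale=1 / m, alpha=0.05, steps=5000)
# Cost with the 1/m factor and the same learning rate: a different path, same minimum.
same_alpha = gradient_descent(xs, ys, scale=1 / m, alpha=0.1, steps=5000)

print(half, full, same_alpha)
```

All three runs should end up at essentially the same parameters. The constant only rescales the gradient, so it changes the effective step size but not the minimizer.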

Kane Chua