I'm not sure why you need to multiply by $\frac1{2m}$ in the beginning. I understand that you would have to divide the whole sum by $m$ (i.e. multiply it by $\frac1{m}$), but why do we have to multiply $m$ by two?
Is it because we have two $\theta$ here in the example?
It is simple: when you take the derivative of the cost function, which is used to update the parameters during gradient descent, the exponent $2$ cancels with the $\frac{1}{2}$ multiplier, so the derivation is cleaner. This and similar techniques are widely used in math "to make the derivations mathematically more convenient". You can simply remove the multiplier (see here for example) and expect the same result.
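A short derivation shows where the $2$ cancels. It assumes the usual squared-error cost with a linear hypothesis $h_\theta(x) = \theta_0 + \theta_1 x$ and the convention $x_0^{(i)} = 1$ (the hypothesis form is an assumption for illustration). Note that the $2$ comes from the square, not from the number of parameters:

$$
J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2
$$

$$
\frac{\partial J}{\partial \theta_j} = \frac{1}{2m} \sum_{i=1}^{m} 2 \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
$$

Without the $\frac{1}{2}$, the same gradient would simply carry an extra factor of $2$.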
But doesn't this affect the result at all?
– Marton Langa May 18 '19 at 14:11

It makes the math easier to handle. Adding a half or not doesn't actually matter, since multiplying the cost by a positive constant does not change where its minimum is.
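To illustrate that the constant does not move the minimum, here is a minimal sketch (plain Python, assuming a linear hypothesis $h_\theta(x) = \theta_0 + \theta_1 x$ and a made-up toy dataset lying exactly on $y = 1 + 2x$). It runs gradient descent once with the $\frac{1}{2m}$ scaling and once with $\frac{1}{m}$; the helper names `gradient` and `fit` are hypothetical, chosen for this example.

```python
def gradient(theta0, theta1, xs, ys, scale):
    """Gradient of scale * sum((h(x) - y)**2) for h(x) = theta0 + theta1 * x."""
    g0 = g1 = 0.0
    for x, y in zip(xs, ys):
        err = theta0 + theta1 * x - y
        g0 += 2 * err      # d/d(theta0) of err**2
        g1 += 2 * err * x  # d/d(theta1) of err**2
    return scale * g0, scale * g1


def fit(xs, ys, scale, lr=0.1, steps=5000):
    """Plain batch gradient descent starting from theta = (0, 0)."""
    t0 = t1 = 0.0
    for _ in range(steps):
        g0, g1 = gradient(t0, t1, xs, ys, scale)
        t0 -= lr * g0
        t1 -= lr * g1
    return t0, t1


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # toy data: exactly y = 1 + 2x
m = len(xs)

with_half = fit(xs, ys, scale=1 / (2 * m))  # cost scaled by 1/(2m)
without_half = fit(xs, ys, scale=1 / m)     # cost scaled by 1/m
print(with_half, without_half)              # both end up near (1.0, 2.0)
```

The only practical difference is that dropping the $\frac{1}{2}$ doubles the gradient, so with the same learning rate each step is twice as large; the point both runs converge to is the same.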
The derivative of `1/(2m) * loss**2` becomes `1/m * loss`, because `2/(2m) = 1/m`; there is no further division of the number of samples by halves or other parts. – JeeyCi Feb 22 '24 at 06:55