The claim is that, for a regression task, the conditional regression function $f(x) = \mathbb{E}[\mathbf{Y}\,|\,\mathbf{X}=x]$ minimizes the expected squared (L2) loss, i.e. it solves $\arg\min_f \mathbb{E}\big[(\mathbf{Y} - f(\mathbf{X}))^2\big]$. I can see why it's true for a normal distribution. But why is it true in general?
1 Answer
To find $\arg\min_\theta \mathbb{E}\big[(\mathbf{Y} - f(\mathbf{X}|\theta))^2\big]$ you set up the first-order condition $$\frac \partial {\partial \theta} \mathbb{E}\big[(\mathbf{Y} - f(\mathbf{X}|\theta))^2\big]=0,$$ where the differentiation is with respect to the parameters $\theta$ of your model $f(x|\theta)$. Proceed with the differentiation: $$ \mathbb{E}\left[\frac \partial {\partial \theta}(\mathbf{Y} - f(\mathbf{X}|\theta))^2\right]= \mathbb{E}\left[-2(\mathbf{Y} - f(\mathbf{X}|\theta))\frac \partial {\partial \theta}f(\mathbf{X}|\theta)\right]. $$ Since $\frac \partial {\partial \theta}f(\mathbf{X}|\theta)$ depends on the data only through $\mathbf{X}$, conditioning on $\mathbf{X}$ lets you pull it out of the inner expectation: $$\mathbb{E}\left[\mathbb{E}\big[\mathbf{Y} - f(\mathbf{X}|\theta)\,\big|\,\mathbf{X}\big]\left(-2\frac \partial {\partial \theta}f(\mathbf{X}|\theta)\right) \right].$$ This vanishes for every direction of $\theta$ when the following condition holds: $$\mathbb{E}\big[\mathbf{Y} - f(\mathbf{X}|\theta)\,\big|\,\mathbf{X}\big]=0,$$ i.e. when $$\mathbb{E}\left[\mathbf{Y}\,|\,\mathbf{X}\right]= f(\mathbf{X}|\theta).$$
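A parameter-free way to see the same claim, as a short sketch (assuming only that $\mathbf{Y}$ has finite second moment): for any candidate predictor $g$, add and subtract $\mathbb{E}[\mathbf{Y}\,|\,\mathbf{X}]$ and expand the square, $$\mathbb{E}\big[(\mathbf{Y} - g(\mathbf{X}))^2\big] = \mathbb{E}\big[(\mathbf{Y} - \mathbb{E}[\mathbf{Y}|\mathbf{X}])^2\big] + \mathbb{E}\big[(\mathbb{E}[\mathbf{Y}|\mathbf{X}] - g(\mathbf{X}))^2\big].$$ The cross term vanishes because, conditionally on $\mathbf{X}$, the factor $\mathbb{E}[\mathbf{Y}|\mathbf{X}] - g(\mathbf{X})$ is a constant while $\mathbf{Y} - \mathbb{E}[\mathbf{Y}|\mathbf{X}]$ has mean zero. The first term does not involve $g$, and the second is nonnegative and equals zero exactly when $g(\mathbf{X}) = \mathbb{E}[\mathbf{Y}|\mathbf{X}]$ almost surely, so the conditional mean minimizes the L2 loss for any distribution, not just the normal.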
- Why were you able to pull out $-2\frac \partial {\partial \theta}f(\mathbf{X}|\theta)$? – Ryker Feb 04 '19 at 18:57
- @Ryker, if your function is constant for any input data X, then it's a trivial solution with all coefficients zero, no? – Aksakal Feb 04 '19 at 21:35
- Which function is constant, the derivative? – Ryker Feb 05 '19 at 01:33
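As a quick numerical sanity check of the claim, here is a minimal sketch; the data-generating process $Y = \sin(X) + \varepsilon$ below is an assumption chosen purely for illustration, so that $\mathbb{E}[Y|X=x] = \sin(x)$ is known in closed form:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy model: Y = sin(X) + eps, so E[Y | X = x] = sin(x).
# The noise is zero-mean; nothing here relies on normality.
n = 200_000
x = rng.uniform(-3.0, 3.0, size=n)
y = np.sin(x) + rng.standard_normal(n)

def mse(pred):
    """Empirical squared-error risk of a vector of predictions."""
    return np.mean((y - pred) ** 2)

print("E[Y|X] = sin(x):     ", mse(np.sin(x)))        # ~1.0, the noise variance
print("shifted sin(x) + 0.3:", mse(np.sin(x) + 0.3))  # larger by ~0.3**2
print("best linear fit:     ", mse(np.polyval(np.polyfit(x, y, 1), x)))
```

Any predictor $g$ other than the conditional mean pays an extra $\mathbb{E}\big[(\mathbb{E}[\mathbf{Y}|\mathbf{X}] - g(\mathbf{X}))^2\big]$ on top of the irreducible noise variance, which is exactly the gap the printed numbers show.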