The claim is that, for a regression task, the conditional regression function $f(x) = \mathbb{E}[\mathbf{Y}\,|\,\mathbf{X}=x]$ minimizes the expected squared (L2) loss, i.e. it solves $\arg\min_f \mathbb{E}\big[(\mathbf{Y} - f(\mathbf{X}))^2\big]$. I can see why it's true for a normal distribution. But why is it true in general?
1 Answer
To find $\arg\min_\theta \mathbb{E}\big[(\mathbf{Y} - f(\mathbf{X}|\theta))^2\big]$ you set up the first-order condition $$\frac \partial {\partial \theta} \mathbb{E}\big[(\mathbf{Y} - f(\mathbf{X}|\theta))^2\big]=0,$$ where the differentiation is with respect to the parameters $\theta$ of your model $f(x|\theta)$. Proceed with the differentiation: $$ \mathbb{E}\left[\frac \partial {\partial \theta}(\mathbf{Y} - f(\mathbf{X}|\theta))^2\right]= \mathbb{E}\left[-2(\mathbf{Y} - f(\mathbf{X}|\theta))\frac \partial {\partial \theta}f(\mathbf{X}|\theta)\right]. $$ Since $\frac \partial {\partial \theta}f(\mathbf{X}|\theta)$ depends on the data only through $\mathbf{X}$, conditioning on $\mathbf{X}$ lets you pull it out of the inner expectation: $$\mathbb{E}\left[\mathbb{E}\big[\mathbf{Y} - f(\mathbf{X}|\theta)\,\big|\,\mathbf{X}\big]\left(-2\frac \partial {\partial \theta}f(\mathbf{X}|\theta)\right) \right].$$ This vanishes for every direction of $\theta$ when the following condition holds: $$\mathbb{E}\big[\mathbf{Y} - f(\mathbf{X}|\theta)\,\big|\,\mathbf{X}\big]=0,$$ i.e. when $$\mathbb{E}\left[\mathbf{Y}\,|\,\mathbf{X}\right]= f(\mathbf{X}|\theta).$$
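A parameter-free way to see the same claim, as a short sketch (assuming only that $\mathbf{Y}$ has finite second moment): for any candidate predictor $g$, add and subtract $\mathbb{E}[\mathbf{Y}\,|\,\mathbf{X}]$ and expand the square, $$\mathbb{E}\big[(\mathbf{Y} - g(\mathbf{X}))^2\big] = \mathbb{E}\big[(\mathbf{Y} - \mathbb{E}[\mathbf{Y}|\mathbf{X}])^2\big] + \mathbb{E}\big[(\mathbb{E}[\mathbf{Y}|\mathbf{X}] - g(\mathbf{X}))^2\big].$$ The cross term vanishes because, conditionally on $\mathbf{X}$, the factor $\mathbb{E}[\mathbf{Y}|\mathbf{X}] - g(\mathbf{X})$ is a constant while $\mathbf{Y} - \mathbb{E}[\mathbf{Y}|\mathbf{X}]$ has mean zero. The first term does not involve $g$, and the second is nonnegative and equals zero exactly when $g(\mathbf{X}) = \mathbb{E}[\mathbf{Y}|\mathbf{X}]$ almost surely, so the conditional mean minimizes the L2 loss for any distribution, not just the normal.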
- Why were you able to pull out $-2\frac \partial {\partial \theta}f(\mathbf{X}|\theta)$? – Ryker Feb 04 '19 at 18:57
- @Ryker, if your function is constant for any input data X, then it's a trivial solution with all coefficients zero, no? – Aksakal Feb 04 '19 at 21:35
- Which function is constant, the derivative? – Ryker Feb 05 '19 at 01:33
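As a quick numerical sanity check of the claim, here is a minimal sketch; the data-generating process $Y = \sin(X) + \varepsilon$ below is an assumption chosen purely for illustration, so that $\mathbb{E}[Y|X=x] = \sin(x)$ is known in closed form:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy model: Y = sin(X) + eps, so E[Y | X = x] = sin(x).
# The noise is zero-mean; nothing here relies on normality.
n = 200_000
x = rng.uniform(-3.0, 3.0, size=n)
y = np.sin(x) + rng.standard_normal(n)

def mse(pred):
    """Empirical squared-error risk of a vector of predictions."""
    return np.mean((y - pred) ** 2)

print("E[Y|X] = sin(x):     ", mse(np.sin(x)))        # ~1.0, the noise variance
print("shifted sin(x) + 0.3:", mse(np.sin(x) + 0.3))  # larger by ~0.3**2
print("best linear fit:     ", mse(np.polyval(np.polyfit(x, y, 1), x)))
```

Any predictor $g$ other than the conditional mean pays an extra $\mathbb{E}\big[(\mathbb{E}[\mathbf{Y}|\mathbf{X}] - g(\mathbf{X}))^2\big]$ on top of the irreducible noise variance, which is exactly the gap the printed numbers show.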