
We know that in the task of predicting $Y$ given $X$, $g(x) = E[Y|X=x]$ is the best predictor under expected squared error loss, $E[(Y-g(X))^2]$ (albeit not the only loss for which it is optimal).
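
(This is because of the decomposition $E[(Y-g(X))^2] = E[(Y-E[Y|X])^2] + E[(E[Y|X]-g(X))^2]$, where the cross term vanishes by the tower property; the second term is minimized by choosing $g(X) = E[Y|X]$.)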

For which expected loss function (or functions) is the conditional variance $VAR[Y|X]$ the optimal predictor?

I acknowledge that such a loss function may not exist, since it may require the loss function to have a priori knowledge of the mean. And if it exists, it is definitely not a proper loss function, because the variance is always non-negative while $Y$ itself may take values on the negative part of the real line. Nevertheless, I cannot prove its absence.

  • The question is confusing. First you say you want to predict Y given X and point out correctly that for squared error loss the conditional mean is the best predictor. But then you pick VAR[Y|X]. Are you still trying to predict Y? – Michael R. Chernick Dec 18 '16 at 01:48
  • @MichaelChernick There is nothing confusing about it. I provided the answer below, albeit with some restrictions (which were actually conjectured as part of the question). – Cagdas Ozgenc Jun 26 '17 at 07:22
  • If there is no a priori knowledge of the mean, it is possible to prove that no loss function exists; see my answer to this question. – picky_porpoise Nov 25 '23 at 16:34

1 Answer


It turns out that the question has a neat answer under some conditions.

The LINEX loss function is defined as follows:

$L(Y, \hat{Y}; a) = \frac{2}{a^2} (e^{a(Y-\hat{Y})}-a(Y-\hat{Y})-1)$
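For concreteness, here is a minimal sketch of this loss in Python (the function name `linex` and the example values are mine, not from the original post):

```python
import numpy as np

def linex(y, y_hat, a):
    """LINEX loss: exponential penalty for errors of one sign,
    roughly linear penalty for the other (direction set by the sign of a)."""
    d = y - y_hat
    return (2.0 / a**2) * (np.exp(a * d) - a * d - 1.0)

# Asymmetry for a > 0: under-prediction (y > y_hat) is penalized
# exponentially, over-prediction only (approximately) linearly.
print(linex(1.0, 0.0, a=2.0))   # ~2.19
print(linex(-1.0, 0.0, a=2.0))  # ~0.57
```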

If we assume that $Y \sim N(\mu,\sigma^2)$, then

$\hat{Y} = \operatorname{argmin}_{\hat{y}} \left( E[e^{a(Y-\hat{y})}] - aE[Y-\hat{y}] \right)$

(dropping the factor $\frac{2}{a^2}$ and the constant $-1$, which do not affect the minimizer). Using the moment generating function of the normal distribution, $E[e^{aY}] = e^{a\mu+\frac{1}{2}a^2\sigma^2}$, this becomes

$\hat{Y} = \operatorname{argmin}_{\hat{y}} \left[ e^{a(\mu-\hat{y})+\frac{1}{2}a^2\sigma^2} - a(\mu-\hat{y}) \right]$

Taking the derivative with respect to $\hat{y}$ and setting it to $0$,

$-a\,e^{a(\mu-\hat{y})+\frac{1}{2}a^2\sigma^2} + a = 0 \quad\Longrightarrow\quad a(\mu-\hat{y}) + \frac{1}{2}a^2\sigma^2 = 0,$

yields

$\hat{Y} = \mu + \frac{a}{2}\sigma^2$

If the mean is known to be $0$, LINEX loss with parameter $a = 2$ therefore yields the variance as the optimal predictor: $\hat{Y} = 0 + \frac{2}{2}\sigma^2 = \sigma^2$. The same derivation goes through for a conditional distribution $Y|X \sim N(\mu(X), \sigma^2(X))$ with $\mu(X) = 0$, in which case the optimal predictor is the conditional variance $VAR[Y|X]$. A quick numerical sanity check is sketched below.
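
Here is a hedged Monte Carlo sketch of that check (the sample size, seed, and use of SciPy's `minimize_scalar` are my choices, not part of the original answer): it minimizes the empirical LINEX loss with $a=2$ over draws from a zero-mean normal and compares the minimizer with the true variance.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
sigma2 = 1.5                                  # true variance; mean is 0
y = rng.normal(0.0, np.sqrt(sigma2), size=1_000_000)

a = 2.0
def empirical_linex(y_hat):
    # Empirical expected LINEX loss at a candidate constant predictor y_hat.
    d = y - y_hat
    return np.mean((2.0 / a**2) * (np.exp(a * d) - a * d - 1.0))

res = minimize_scalar(empirical_linex, bounds=(-5.0, 5.0), method="bounded")
print(res.x)    # should be close to mu + (a/2) * sigma2 = 1.5
print(sigma2)   # the true variance
```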

  • Perhaps you can address here why the result is conditional variance for a conditional distribution. I kind of get it (something about a zero-mean error term), but I’d like to see it made explicit. – Dave Aug 18 '22 at 16:11