1

I am really confused... why does linear regression model the expected value of the response (or rather, the conditional expected value)?

If we don't use mean squared error as the loss function to minimise, is it still modeling $E[Y]$?

My understanding is that we assume there is a linear relationship between $X$ and $Y$, though any realization of $Y$ contains some error, while the fitted value $\hat{Y}$ does not:

$Y = f(X) + \varepsilon$, where $\varepsilon$ is a random error term

$\hat{Y} = f(X)$

Why/how does $E[Y \mid X]$ have anything to do with this?
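My attempt at connecting the two (assuming, as I believe is standard, that the error term has zero conditional mean, $E[\varepsilon \mid X] = 0$):

$$E[Y \mid X] = E[f(X) + \varepsilon \mid X] = f(X) + E[\varepsilon \mid X] = f(X),$$

so fitting $f$ would be the same thing as modeling the conditional mean. Is this the right way to see it?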

Thank you all in advance

Richard Hardy
  • 67,272
danni
  • 11
  • One possible explanation is that one can define least squares regression as a projection, that is, $\hat{y} = X(X^TX)^{-1}X^Ty$. Weighted regressions are also projections, and are unbiased for the mean response. I don't think the same would necessarily be the case if you chose a skewed error term and derived the maximum likelihood estimator, however. – AdamO Apr 11 '22 at 20:47
  • @AdamO That's correct. A nice (and commonly encountered) example of an asymmetric loss function is analyzed at https://stats.stackexchange.com/questions/251600, where it is shown that such a loss is modeling $F^{-1}(q; X)$, the $q^\text{th}$ quantile of the conditional distribution $F(\cdot\,; X)$. In a linear regression model, if this conditional quantile is a linear function of $X$, then unless $F(\cdot\,; X)$ has a special functional form, the conditional expectation $E[Y \mid X]$ will not be linear. Thus, there is no valid way one could claim $E[Y \mid X]$ is even a valid proxy for the correct solution. – whuber Apr 12 '22 at 11:17

1 Answer

0

If we don't use mean squared error as the loss function to minimise, is it still modeling $E[Y]$?

Pretty much anything can be an estimator. For instance, we might choose to calculate the empirical median and use it as an estimator of the mean. To estimate variance, we might calculate $\frac{1}{n}\sum(x_i - \bar x)^2$, $\frac{1}{n-1}\sum(x_i - \bar x)^2$, or $\frac{1}{n+1}\sum(x_i - \bar x)^2$. Depending on the situation, each of these can be defended as an estimator of variance (MLE for a Gaussian, unbiased estimator, minimum MSE estimator for a Gaussian, respectively).
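As a minimal sketch of that menu of variance estimators (assuming NumPy; the sample here is simulated, so the numbers are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=30)  # true variance is 4

n = len(x)
ss = np.sum((x - x.mean()) ** 2)  # sum of squared deviations

print(ss / n)        # MLE under a Gaussian model
print(ss / (n - 1))  # unbiased estimator
print(ss / (n + 1))  # minimum-MSE estimator under a Gaussian model
```

All three divide the same sum of squares; they differ only in how they trade bias against variance.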

Consequently, you might choose to use the estimator calculated by minimizing absolute loss (which explicitly models the conditional median, not the mean) or ridge loss, and you are perfectly within your rights to say that you are modeling (estimating) $\mathbb E[Y]$, even if you are not doing so explicitly.

That might be a rather poor estimator of the conditional mean, but the OLS estimator might also be a rather poor one, depending on the particulars of the problem.
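To make both halves of that concrete, here is a minimal simulation sketch (assuming statsmodels is available; the data-generating values are arbitrary). With symmetric noise, the absolute-loss (LAD) fit recovers the conditional mean about as well as OLS; with skewed noise, its intercept is pulled toward the conditional median.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
x = rng.uniform(0, 10, size=n)
X = sm.add_constant(x)

# Symmetric (Laplace) noise: conditional mean and median coincide,
# so median (absolute-loss) regression is a sensible estimator of E[Y|X].
y_sym = 1.0 + 2.0 * x + rng.laplace(scale=1.0, size=n)
print(sm.OLS(y_sym, X).fit().params)            # roughly (1, 2)
print(sm.QuantReg(y_sym, X).fit(q=0.5).params)  # also roughly (1, 2)

# Skewed (exponential) noise: the mean and median of the errors differ,
# so the LAD intercept is biased for the conditional mean.
y_skew = 1.0 + 2.0 * x + rng.exponential(scale=2.0, size=n)
print(sm.OLS(y_skew, X).fit().params)            # intercept near 1 + E[eps] = 3
print(sm.QuantReg(y_skew, X).fit(q=0.5).params)  # intercept near 1 + median(eps) ≈ 2.39
```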

Dave
  • 62,186
  • Um... no. When you minimize absolute loss, for instance, you are modeling the conditional median of $Y,$ not its expectation. – whuber Apr 11 '22 at 19:33
  • @whuber Explicitly, yes, but that doesn't invalidate its use as an estimator of the conditional mean. – Dave Apr 11 '22 at 19:35
  • It does, except when there is a simple predictable relationship between the mean and median, which misses the point anyway: when you are using a different loss function, you usually are *not* estimating the conditional expectation. In particular, when the conditional response distribution is not symmetric, a linear regression for the expectation will usually not be a linear regression for the median (or some other property). – whuber Apr 11 '22 at 20:48
  • @whuber The empirical median is a lower-variance estimator of the mean of a Laplace distribution than $\bar X$ (see the simulation sketch after these comments). If I want to estimate the mean of a Laplace distribution using the empirical median, I can (and arguably should). – Dave Apr 11 '22 at 20:55
  • Again, that's not really relevant to the question. The distinctions that are relevant are made salient by considering asymmetric conditional response distributions and/or asymmetric loss functions. – whuber Apr 11 '22 at 20:57
  • @whuber The maximum likelihood estimator of the mean of a Laplace distribution is the empirical median. – Dave Apr 12 '22 at 10:29
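A quick numerical check of the Laplace point raised in these comments (a minimal sketch, assuming NumPy; the scale and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 100, 20000

# Laplace(loc=0, scale=1): mean and median are both 0
samples = rng.laplace(loc=0.0, scale=1.0, size=(reps, n))

means = samples.mean(axis=1)
medians = np.median(samples, axis=1)

# Both estimate the center without bias, but the median is more efficient:
# asymptotically Var(mean) = 2b^2/n versus Var(median) = b^2/n.
print(means.var())    # roughly 0.02 for n = 100
print(medians.var())  # roughly 0.01
```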