
In machine learning, bias is defined mathematically as $f-E(\hat{f})$, where $f$ is the true model and $\hat{f}$ is the estimate.

I was wondering how we can theoretically compute $E(\hat{f})$, given some data points $\{ (x_i,y_i)\}$. As a simple starting point, we can let $\hat{f} = w_0 + w_1x$ (for a single feature input $x$). Please give some insights.
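
For concreteness, here is a minimal Monte Carlo sketch of what I have in mind: draw many data sets from an assumed data-generating process, fit $\hat{f} = w_0 + w_1x$ by OLS each time, and average the fitted predictions to approximate $E(\hat{f})$. The quadratic true model, the noise level, and all constants below are my own illustrative assumptions, not part of the question:

```python
# Minimal Monte Carlo sketch: approximate E(f_hat) for an OLS line fit to
# data from an assumed (here deliberately nonlinear) true model, so that
# the bias f - E(f_hat) is visibly nonzero.
import numpy as np

rng = np.random.default_rng(0)

def f_true(x):
    return x**2  # assumed true model (an arbitrary illustrative choice)

x_grid = np.linspace(0.0, 1.0, 21)  # points at which to evaluate the bias
n, sigma, n_sims = 30, 0.1, 10_000  # arbitrary sample size, noise, repetitions

preds = np.empty((n_sims, x_grid.size))
for s in range(n_sims):
    x = rng.uniform(0.0, 1.0, size=n)
    y = f_true(x) + rng.normal(0.0, sigma, size=n)
    w1, w0 = np.polyfit(x, y, deg=1)  # OLS fit of y = w0 + w1*x
    preds[s] = w0 + w1 * x_grid

E_fhat = preds.mean(axis=0)     # Monte Carlo estimate of E(f_hat) on the grid
bias = f_true(x_grid) - E_fhat  # bias as defined above: f - E(f_hat)
print(np.round(bias, 3))
```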

  • I am not sure I understand your question... To obtain $\hat f$ from the data you could use OLS, could you not? – Richard Hardy Feb 02 '24 at 08:44
  • @RichardHardy though that may not help illustrate the question if OLS gives unbiased estimates – Henry Feb 02 '24 at 11:06
  • @Henry, I was thinking about the more general case where $\hat f$ is actually some $\hat g$ where $g\not\equiv f$ (a model does not coincide with the DGP). If $\hat f$ has the same shape as $f$ (the model coincides with the DGP), then indeed there may not be any bias. If the OP wants a biased estimator, they may try quantile regression. So cgo, are we getting there? Or did you mean something else? – Richard Hardy Feb 02 '24 at 12:35
  • Sorry, I made a mistake. I meant, how do you compute $E(\hat{f})$, the expectation of the estimate? And in effect, how do you compute the bias? – cgo Feb 02 '24 at 12:56
  • The method of calculating the bias depends on what you know: you might be able to do this theoretically, or you might use some form of simulation (a simulation sketch follows these comments). Perhaps take a well-known example as an illustration: a distribution with unknown mean $\mu$ and variance $\sigma^2$, and you want to estimate $\sigma^2$ from your sample. If you use $\hat{s}^2_{k} = \frac{1}{k} \sum\limits_{i=1}^n(x_i-\bar x)^2$ as your estimator of $\sigma^2$, some simple analysis will tell you $\mathbb E\left[\hat{s}^2_{k}\right] = \frac{n-1}{k}\sigma^2$, and a large enough simulation will give you a similar result. – Henry Feb 02 '24 at 13:15
  • ... The bias is then $\sigma^2- \mathbb E\left[\hat{s}^2_{k}\right] = \frac{k-n+1}{k}\sigma^2$ (or, for some people, its negative). Hence $\hat{s}^2_{n-1}$ is an unbiased estimator, though not necessarily the best one: if the sample comes from a normal distribution, then the biased estimator $\hat{s}^2_{n}$ is the maximum likelihood estimator of $\sigma^2$, while the even more biased estimator $\hat{s}^2_{n+1}$ minimises the mean squared error, so this is also an example of the bias-variance tradeoff. – Henry Feb 02 '24 at 13:23
  • Yes, I am familiar with your example, thank you. But then, the bias is defined as the difference between the true $f$ and the expected value of the model/estimate. Comparing this with your example, does $f$ represent a particular parameter rather than the model itself? – cgo Feb 02 '24 at 15:15
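
For reference, here is a minimal simulation sketch of Henry's variance-estimator example (assuming normally distributed data; the sample size, true variance, and number of repetitions are arbitrary illustrative choices). It compares the simulated mean of $\hat{s}^2_{k}$ with the theoretical value $\frac{n-1}{k}\sigma^2$ for $k = n-1, n, n+1$:

```python
# Simulation sketch of the variance-estimator bias discussed above:
# check that E[s2_k] ~ (n-1)/k * sigma^2 for k = n-1, n, n+1.
import numpy as np

rng = np.random.default_rng(1)
n, sigma2, n_sims = 10, 4.0, 200_000  # arbitrary illustrative constants

# n_sims independent samples of size n from N(0, sigma2)
x = rng.normal(0.0, np.sqrt(sigma2), size=(n_sims, n))
ss = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)  # sum of squared deviations

for k in (n - 1, n, n + 1):
    est = ss / k  # the estimator s2_k from the comments
    print(f"k={k}: simulated mean {est.mean():.4f}, theory {(n - 1) / k * sigma2:.4f}")
```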

0 Answers