Bias, in machine learning, is mathematically defined as $f - E(\hat{f})$, where $f$ is the true function, $\hat{f}$ is its estimate, and the expectation is taken over training sets drawn from the data-generating process.
I was wondering how we can theoretically compute $E(\hat{f})$, given some data points $\{(x_i, y_i)\}$. As a simple starting point, let $\hat{f}(x) = w_0 + w_1 x$ (a single input feature $x$). Any insights would be appreciated.
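To make concrete what I mean by the expectation, here is a minimal simulation sketch (my own illustration, not a derivation): it repeatedly redraws a training set from an assumed true model `f_true` with noise level `sigma`, refits $w_0 + w_1 x$ each time, and averages the fitted predictions as a Monte Carlo stand-in for $E(\hat{f}(x))$. All names and values here (`f_true`, `sigma`, the grid) are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def f_true(x):                       # assumed true model f (illustrative choice)
    return 1.0 + 2.0 * x

x_grid = np.linspace(0.0, 1.0, 50)   # points at which to evaluate E(f_hat(x))
n, sigma, n_trials = 20, 0.5, 10_000

preds = np.empty((n_trials, x_grid.size))
for t in range(n_trials):
    x = rng.uniform(0.0, 1.0, n)                 # fresh training inputs
    y = f_true(x) + rng.normal(0.0, sigma, n)    # noisy targets
    w1, w0 = np.polyfit(x, y, 1)                 # least-squares fit of w0 + w1*x
    preds[t] = w0 + w1 * x_grid

E_fhat = preds.mean(axis=0)                      # Monte Carlo estimate of E(f_hat(x))
bias = f_true(x_grid) - E_fhat                   # f - E(f_hat), pointwise
print("max |bias| over the grid:", np.abs(bias).max())
```

What I am after is how to obtain this $E(\hat{f})$ analytically rather than by simulation.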