Rigorous statement of expectations for the bias-variance trade-off

Question

Consider a data generating process $$Y=f(X)+\varepsilon$$ where $\varepsilon$ is independent of $x$ with $\mathbb E(\varepsilon)=0$ and $\text{Var}(\varepsilon)=\sigma^2_\varepsilon$. According to Hastie et al. "The Elements of Statistical Learning" (2nd edition, 2009) Section 7.3 p. 223, we can derive an expression for the expected prediction error of a regression fit $\hat f(X)$ at an input point $X=x_0$, using squared-error loss:

\begin{align} \text{Err}(x_0) &=\mathbb E[(Y-\hat f(x_0))^2|X=x_0]\\ &=(\mathbb E[\hat f(x_0)-f(x_0)])^2+\mathbb E[(\hat f(x_0)-\mathbb E[\hat f(x_0)])^2]+\sigma^2_\varepsilon\\ &=\text{Bias}^2\ \ \ \quad\quad\quad\quad\quad\;\;+\text{Variance } \quad\quad\quad\quad\quad\quad+ \text{ Irreducible Error} \end{align}

(where I use the notation $\text{Bias}^2$ instead of $\text{Bias}$).

Question: What are the expectations taken over? What is held fixed and what is random?

(The question arose in the comments of the thread "Why is there a bias variance tradeoff? A counterexample".)

$x_0$ and $f(x_0)$ are assumed fixed. $\hat{f}(.)$, however, depends on the training data, which is taken as random. Finally, there is randomness in $Y|X=x_0$, which is assumed independent of $\hat{f}(.)$ (conditional on $X=x_0$). — Tim Mak, Jun 09 '20 at 10:01
@TimMak, sounds reasonable. So what are the expectations taken over? The random sampling of the training set jointly with the randomness in $\varepsilon$, i.e. a double integral? I have added the qualification that $\varepsilon$ is independent of $x$, but should that be the case? (I guess it should.) — Richard Hardy, Jun 09 '20 at 10:16
Actually, I read the book in more detail and found that in their presentation $X$ is actually assumed fixed. Section 7.3 refers back I think, to Section 2.9, where they said "For simplicity here we assume that the values of $x_i$ in the sample are fixed in advance (nonrandom)". However, in other presentations, it is common to have $X$ assumed random also. Either way, $\hat{f}(.)$ depends on both $X$ and $Y$, and hence is random even when $X$ is held fixed. — Tim Mak, Jun 10 '20 at 02:04
@TimMak, since you are knowledgeable about the matter, consider writing up an answer. — Richard Hardy, Jun 10 '20 at 05:52
https://www.cs.cornell.edu/courses/cs4780/2018fa/lectures/lecturenote12.html - best explanation I found — TheRajVJain, Aug 12 '23 at 05:55

Richard Hardy · Answer 1 · 2020-09-03T11:34:23.820

$X$ is assumed fixed; see Section 2.9, p. 37:

"For simplicity here we assume that the values of $x_i$ in the sample are fixed in advance (nonrandom)."

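Then the only source of random variation here is $\varepsilon$: the errors in the training responses, which make $\hat f(\cdot)$ random, and the error in the new response $Y$ at $x_0$. Hence, the expectations are taken with respect to the distribution of $\varepsilon$.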
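To make this concrete, here is a minimal Monte Carlo sketch of the fixed-design reading. None of the specifics come from the book: the true function $f(x)=\sin(2\pi x)$, the straight-line least-squares fit, the equally spaced design points, the Gaussian errors, and the function name `simulate_decomposition` are all assumptions chosen for illustration. The design points and $x_0$ stay fixed across replications; only the training errors and the independent error in the new $Y$ are redrawn, so every average below approximates an expectation over $\varepsilon$ alone.

```python
import numpy as np


def simulate_decomposition(n_reps=20000, n_train=30, sigma=0.5, x0=0.3, seed=0):
    """Monte Carlo check of Err(x0) = Bias^2 + Variance + sigma^2 (fixed design)."""
    rng = np.random.default_rng(seed)

    # Hypothetical true regression function (an assumption for illustration).
    def f(x):
        return np.sin(2 * np.pi * x)

    # Fixed (nonrandom) design points, identical in every replication.
    x_train = np.linspace(0.0, 1.0, n_train)
    design = np.column_stack([np.ones(n_train), x_train])

    # A straight-line fit is linear in y: f_hat(x0) = h @ y_train.
    # The line is deliberately misspecified so that the bias is nonzero.
    h = np.array([1.0, x0]) @ np.linalg.pinv(design)

    # Randomness 1: training errors, redrawn in every replication.
    eps_train = rng.normal(0.0, sigma, size=(n_reps, n_train))
    f_hat_x0 = (f(x_train) + eps_train) @ h

    # Randomness 2: the error in the new response Y at x0,
    # drawn independently of the training errors.
    y0 = f(x0) + rng.normal(0.0, sigma, size=n_reps)

    err = np.mean((y0 - f_hat_x0) ** 2)          # estimates Err(x0)
    bias_sq = (np.mean(f_hat_x0) - f(x0)) ** 2   # estimates Bias^2
    variance = np.var(f_hat_x0)                  # estimates Variance
    return err, bias_sq, variance, sigma ** 2


if __name__ == "__main__":
    err, bias_sq, variance, noise = simulate_decomposition()
    print("Err(x0)                   :", err)
    print("Bias^2 + Variance + sigma2:", bias_sq + variance + noise)
```

The two printed numbers should agree up to Monte Carlo error. If $X$ were instead treated as random, `x_train` would also have to be redrawn inside each replication, and the averages would then be over the joint distribution of the training inputs and the errors.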

edited Sep 03 '20 at 11:34

answered Sep 03 '20 at 11:19

Richard Hardy

