I'm currently reading p. 228 of The Elements of Statistical Learning, which covers training error, in-sample error, and optimism. Let me quote the relevant passage from the textbook.

The $Y^{0}$ notation indicates that we observe N new response values at each of the training points $x_i$. We define the optimism as the difference between $\text{Err}_{in}$ and the training error $\overline{\text{err}}$ as $$\text{op} = \text{Err}_{in} - \overline{\text{err}}.$$ Finally, the average optimism is the expectation of the optimism over training sets: $$w = \mathbb E_{\mathbf y}[\text{op}].$$ Here the predictors in the training set are fixed, and the expectation is over the training set outcome values, hence we have used the notation $\mathbb E_{\mathbf y}$ instead of $\mathbb E_{\tau}$.

I'm uncertain about the meaning of "training set outcome values". As I understand it, $Y^0$ and $\mathbf y$ have the same meaning; that is, they can be regarded as i.i.d. random variables. If so, why do the authors deliberately use different notation for the same thing? If not, does $\mathbf y$ refer to all of the $y_i$ in the training set $\tau$?
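To make the second reading concrete, here is a minimal simulation sketch; it is not taken from the book. It assumes that $\mathbf y$ denotes the $N$ training responses used to fit the model, that $Y^0$ denotes fresh responses drawn independently at the same fixed $x_i$, and, purely for illustration, a Gaussian linear model fit by least squares without an intercept. The function name `average_optimism` and all of its parameters are hypothetical.

```python
import numpy as np

def average_optimism(n=50, d=3, sigma=1.0, n_train_draws=4000,
                     n_new_draws=20, seed=0):
    """Monte Carlo estimate of E_y[Err_in - err_bar] with fixed predictors.

    Assumed setup (illustrative only): y = X @ beta + Gaussian noise,
    fitted by ordinary least squares.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))        # predictors x_i, drawn once and then held fixed
    beta = rng.normal(size=d)          # hypothetical true coefficients
    f = X @ beta                       # true regression function at the x_i
    H = X @ np.linalg.pinv(X)          # hat matrix of the least-squares fit

    ops = []
    for _ in range(n_train_draws):
        # Bold y: the training outcomes; the outer average is E_y.
        y = f + sigma * rng.normal(size=n)
        y_hat = H @ y
        err_bar = np.mean((y - y_hat) ** 2)      # training error

        # Y^0: new responses at the same x_i, independent of y.
        # Averaging over them approximates Err_in for this training set.
        y0 = f + sigma * rng.normal(size=(n_new_draws, n))
        err_in = np.mean((y0 - y_hat) ** 2)

        ops.append(err_in - err_bar)             # optimism for this y
    return float(np.mean(ops))

if __name__ == "__main__":
    print(average_optimism())
```

If this reading is right, the estimate should land near $2d\sigma^2/N$, the value the book gives for a linear fit with $d$ inputs (here $2 \cdot 3 \cdot 1 / 50 = 0.12$), up to Monte Carlo noise.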

I'm confused by this concept, so any help would be appreciated.

Thank you.

jason 1
  • Related: https://stats.stackexchange.com/questions/228394/what-is-the-difference-between-in-sample-error-and-training-error-and-intuition and https://stats.stackexchange.com/questions/357623/in-which-scenarios-are-the-in-sample-error-and-training-error-not-the-same – Henry Dec 07 '23 at 02:11