In Chapter 7 (page 228) of the book Elements of Statistical Learning, the training error is defined as: $$ \overline{err} = \frac{1}{N}\sum_{i=1}^{N}{L(y_i,\hat{f}(x_i))} $$
whereas the in-sample error is defined as: $$ Err_{in} = \frac{1}{N}\sum_{i=1}^{N}{E_{Y^0}[L(Y_{i}^{0},\hat{f}(x_i))\mid\tau]} $$
The $Y^0$ notation indicates that we observe $N$ new response values at each of the training points $x_i$, $i = 1, 2, \dots, N$.
This seems to be exactly the same as the training error, because the training error is also calculated by evaluating the fitted estimate $\hat{f}(x)$ at the training points. I have checked this and this explanation of the concept, but I could not understand the difference between training error and in-sample error, or why the optimism is not always 0: $$ op \equiv Err_{in} - \overline{err} $$
So how are the errors $Err_{in}$ and $\overline{err}$ different, and what is the intuitive understanding of optimism in this context?
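To make my confusion concrete, here is a minimal simulation sketch I put together (my own construction, not from the book): a least-squares line fit on $N$ fixed inputs with Gaussian noise and squared-error loss. It computes $\overline{err}$ from the training responses and estimates $Err_{in}$ by drawing fresh responses $Y^0$ at the same $x_i$ with $\hat{f}$ held fixed. If I read page 229 correctly, the average optimism for such a linear fit should come out near $2\sigma^2 d/N$, with $d = 2$ fitted parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed training inputs and the true (unknown) regression function.
N = 20
x = np.linspace(0, 1, N)
f_true = 2.0 * x          # true mean of Y at each x_i
sigma = 0.5               # noise standard deviation

n_sims = 2000             # number of training sets tau
n_new = 2000              # fresh responses Y^0 drawn at the same x_i

op = np.empty(n_sims)
for s in range(n_sims):
    # Draw a training set tau = {(x_i, y_i)} and fit by least squares.
    y = f_true + sigma * rng.standard_normal(N)
    X = np.column_stack([np.ones(N), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    f_hat = X @ beta

    # Training error: squared loss against the same y_i used for fitting.
    err_bar = np.mean((y - f_hat) ** 2)

    # In-sample error: average squared loss against NEW responses Y^0
    # drawn at the SAME x_i, with f_hat held fixed (conditioned on tau).
    y0 = f_true + sigma * rng.standard_normal((n_new, N))
    err_in = np.mean((y0 - f_hat) ** 2)

    op[s] = err_in - err_bar

print("average optimism:  ", op.mean())
# For squared loss and a linear fit with d parameters, page 229 gives
# the expected optimism as (2/N) * sum_i Cov(yhat_i, y_i) = 2*sigma^2*d/N.
print("theoretical value: ", 2 * sigma**2 * 2 / N)
```

The inputs $x_i$ are deliberately reused when drawing $Y^0$, since $Err_{in}$ conditions on the training points $\tau$ rather than on fresh inputs; even so, the two errors do not appear to coincide.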
Additionally, what does the author mean by "usually biased downward" in the following statement, made while describing optimism (Elements of Statistical Learning, page 229)?

"This is typically positive since $\overline{err}$ is usually biased downward as an estimate of prediction error."