
Say we have a regression $y_i=\beta_1 x_i + \epsilon_i$, for $i=1,...,n$, i.e. with no $y$-intercept. How would we go about working out the LOOCV error? I know LOOCV is the special case of $k$-fold CV with $k=n$, but I'm not sure how to work it out here.

ILE2091

1 Answer


The LOOCV error is the average of the squared prediction errors over all $n$ folds:

$$\text{CV} = \frac{1}{n}\sum_{i=1}^n e_i^2 = \frac{1}{n}\sum_{i=1}^n \left(y_i - \hat{y}_{(i)}\right)^2$$ where $y_i$ is the $y$-value for the $i$-th data point and $\hat{y}_{(i)}$ is the prediction for it from the OLS regression fitted with the $i$-th data point left out. That is, $e_i = y_i - \hat{y}_{(i)}$.
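This definition can be computed directly by refitting the no-intercept model $n$ times. A minimal sketch with hypothetical simulated data (the data and noise scale are my assumptions, not from the question):

```python
import numpy as np

# Hypothetical example data for a no-intercept model y_i = beta * x_i + eps_i
rng = np.random.default_rng(0)
x = rng.normal(size=20)
y = 2.0 * x + rng.normal(scale=0.5, size=20)
n = len(x)

# Brute-force LOOCV: for each i, fit the slope on the other n-1 points,
# then record the squared error of the prediction at the held-out point.
sq_errs = []
for i in range(n):
    mask = np.arange(n) != i
    # no-intercept OLS slope: sum(x*y) / sum(x^2) on the remaining points
    beta_i = (x[mask] @ y[mask]) / (x[mask] @ x[mask])
    sq_errs.append((y[i] - beta_i * x[i]) ** 2)

cv = np.mean(sq_errs)  # LOOCV error (mean squared leave-one-out error)
```

This is $O(n)$ model fits; the shortcut formula below needs only one.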

A shortcut formula that computes this LOOCV error from a single fit on the full data is:

$$\text{CV} = \dfrac{1}{n}\sum\limits_{i=1}^{n}\left(\dfrac{y_i - \hat{y}_i}{1-h_i}\right)^2$$

where $\hat{y}_i$ is the fitted value for the $i$-th observation from the regression on all $n$ data points, and $h_i$ is the leverage of the $i$-th observation. For simple regression *with* an intercept,

$$h_i = \dfrac{1}{n}+\dfrac{(x_i - \bar{x})^2}{\sum\limits_{j=1}^{n}(x_j - \bar{x})^2}\text{,}$$

while for the no-intercept model in your question it is

$$h_i = \dfrac{x_i^2}{\sum\limits_{j=1}^{n}x_j^2}\text{.}$$

The formula and its derivation are given in this post: Proof of LOOCV formula

You can see this formula as recovering each leave-one-out term $\left(y_i - \hat{y}_{(i)}\right)^2$ from the ordinary full-data residual $\left(y_i - \hat{y}_{i}\right)^2$, inflated by the factor $1/(1-h_i)^2$; the leverage $h_i$ captures how much the fit changes when the $i$-th point is dropped.
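As a sanity check, the shortcut with the no-intercept leverage $h_i = x_i^2/\sum_j x_j^2$ agrees exactly with brute-force refitting. A sketch with hypothetical simulated data (data-generating values are my assumptions):

```python
import numpy as np

# Hypothetical example data for the no-intercept model
rng = np.random.default_rng(1)
x = rng.normal(size=30)
y = 2.0 * x + rng.normal(scale=0.5, size=30)
n = len(x)

# Shortcut: one full-data fit plus the leverages h_i = x_i^2 / sum_j x_j^2
beta = (x @ y) / (x @ x)            # full-data no-intercept OLS slope
h = x**2 / (x @ x)                  # diagonal of the hat matrix
cv_shortcut = np.mean(((y - beta * x) / (1 - h)) ** 2)

# Brute force: refit without point i, predict it, average the squared errors
cv_brute = np.mean([
    (y[i] - ((np.delete(x, i) @ np.delete(y, i))
             / (np.delete(x, i) @ np.delete(x, i))) * x[i]) ** 2
    for i in range(n)
])

# The identity e_(i) = e_i / (1 - h_i) is exact for OLS, so these match
assert np.isclose(cv_shortcut, cv_brute)
```

The agreement is exact (up to floating point), not approximate, because the leave-one-out identity holds for any OLS fit.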