(TL;DR version below) If my understanding is correct, bias/variance are measures of goodness of fit of a statistical estimator w.r.t. the sampling distribution. So if I have a statistic $t(X)$ that estimates a given population parameter $\theta$ with high variance, it means that for every sample $x$ drawn from the population, the estimate $t(x)$ will have high variance w.r.t. the true parameter $\theta$ in the sampling distribution.

Now I'm trying to think of how this applies to cross-validation techniques such as:

  1. Holdout
  2. Leave-one-out cross-validation (LOOCV)
  3. K-fold cross-validation (kCV)

My understanding is that these techniques are also meta-estimators where the population is the training set and the parameter to be estimated is the generalization error on the entire unseen portion of the data. What I do not understand, however, is how the bias and variance of these estimators are measured. I guess that for Holdout and kCV, the sampling distribution comprises the different ways in which the dataset can be divided into partitions. But what about LOOCV? Its partitioning appears to be deterministic, and yet many textbooks seem to suggest that this method exhibits high variance. I know that there are other answers that tackle this question (Bias and variance in leave-one-out vs K-fold cross validation), but I am trying to understand cross-validation methods as statistical estimators from a theoretical perspective.

TL;DR: What does the sampling distribution of cross-validation methods (especially LOOCV) look like and how can bias/variance be calculated?
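
To make the question concrete, here is a minimal sketch of one possible reading of "sampling distribution" for LOOCV. I am not sure it is the right one. It assumes that the randomness comes from drawing the dataset of size $n$ from the population, so that the LOOCV estimate is deterministic given the data but varies from dataset to dataset. Everything specific here is an assumption made for illustration: the population ($y = 2x + \text{noise}$), the model (ordinary least squares), the squared-error loss, and the helper names `draw_dataset`, `fit_ols`, `mse`, `loocv_estimate` and `simulate`.

```python
import numpy as np


def draw_dataset(rng, n):
    """Draw n points from an assumed population: y = 2x + standard normal noise."""
    x = rng.normal(size=n)
    y = 2.0 * x + rng.normal(size=n)
    return x, y


def fit_ols(x, y):
    """Least-squares fit of y = a + b*x; returns the coefficients (a, b)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def mse(coef, x, y):
    """Mean squared prediction error of the fitted line on (x, y)."""
    pred = coef[0] + coef[1] * x
    return float(np.mean((y - pred) ** 2))


def loocv_estimate(x, y):
    """LOOCV estimate of the squared-error loss; deterministic given (x, y)."""
    n = len(x)
    errors = []
    for i in range(n):
        keep = np.arange(n) != i          # train on all points except i
        coef = fit_ols(x[keep], y[keep])
        errors.append(mse(coef, x[i:i + 1], y[i:i + 1]))
    return float(np.mean(errors))


def simulate(n=20, n_datasets=300, n_test=20000, seed=0):
    """Repeat: draw a dataset, record its LOOCV estimate and its 'true' error.

    The 'true' error is approximated on a large independent test sample,
    which is only possible here because the population is simulated.
    """
    rng = np.random.default_rng(seed)
    x_test, y_test = draw_dataset(rng, n_test)
    cv_estimates, true_errors = [], []
    for _ in range(n_datasets):
        x, y = draw_dataset(rng, n)
        cv_estimates.append(loocv_estimate(x, y))
        true_errors.append(mse(fit_ols(x, y), x_test, y_test))
    return np.array(cv_estimates), np.array(true_errors)


cv, err = simulate()
print("mean LOOCV estimate across datasets:", cv.mean())
print("mean approximate true error:", err.mean())
print("variance of LOOCV estimate across datasets:", cv.var(ddof=1))
```

Under this reading, the bias would be the gap between the first two printed numbers and the variance would be the third, both taken over repeated draws of the dataset rather than over different partitions of one fixed dataset. Whether this is the sense in which textbooks call LOOCV "high variance" is exactly what I am asking.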

statkun
  • I can't make sense of anything in the first paragraph. According to standard definitions, none of it is correct. Your final question is difficult to understand, too, because it is so extremely general. – whuber Dec 22 '22 at 02:42
  • I apologize for the confusion. Since LOOCV is a statistical estimator of the generalization error, I am trying to understand what the sampling distribution looks like for a training dataset $X$ of fixed size $n$ (particularly, what will be the random variables that the sampling distribution will be defined on?). Does this make sense or is it still too general? I will change the question accordingly. – statkun Dec 22 '22 at 03:12
