Average of the OOS MSEs should generally decrease as k increases. This is right, but the difference is much smaller than on your chart. Suppose we have a dataset where the error halves whenever we increase the amount of data 10 times (approximately true in the paper "Scaling to Very Very Large Corpora for Natural Language Disambiguation"). With k folds, each model is trained on a fraction 1 - 1/k of the data, so the difference between 5-fold and 20-fold validation will be only about 5% (the ratio of the 20-fold to the 5-fold MSE is 1/2^log10(0.95/0.8) ≈ 0.95), not halving like on your graph. And the difference between 20-fold and infinity-fold will be only about 1.5% (1/2^log10(1/0.95) ≈ 0.985).
For the chart you could use the formula: Average OOS MSE = MSE_inf / 2^log10(1 - 1/k). This assumes that the MSE tends to MSE_inf as k goes to infinity, i.e. when the model is trained on essentially all of the data.
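As an illustration, here is a minimal Python sketch of that formula (not part of the original answer; `avg_oos_mse` and `mse_inf` are made-up names). Plugging in k = 5, 20 and a large k reproduces the roughly 5% and 1.5% gaps mentioned above:

```python
import math

def avg_oos_mse(k, mse_inf):
    """Expected average OOS MSE under the assumption that the error halves
    every time the training data grows by a factor of 10.
    With k folds, each model is trained on a fraction 1 - 1/k of the data."""
    return mse_inf / 2 ** math.log10(1 - 1 / k)

mse_inf = 0.05  # value assumed for the chart below
for k in (5, 20, 1000):
    print(k, round(avg_oos_mse(k, mse_inf), 5))

# avg_oos_mse(5, .)  / avg_oos_mse(20, .) ≈ 1.05  -> about a 5% difference
# avg_oos_mse(20, .) / mse_inf            ≈ 1.015 -> about a 1.5% difference
```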
Variance of the OOS MSEs should generally increase as k increases. MSE is an average, and by the Central Limit Theorem (if the squared errors (SE) are independent and identically distributed, which is the usual assumption for most machine learning algorithms) its variance should equal Var(SE)/N, where N is the number of data points used to calculate that MSE. So for 5-fold validation each fold's MSE has variance Var(SE)/(Npop/5), where Npop is the total number of points you have. For the MSE averaged over all k folds the variance is the same regardless of k and equals Var(SE)/Npop, because every point is used exactly once across the folds. So the answer is that the variance of the individual MSE of each fold increases with k, but the variance of the final averaged MSE does not depend on the number of folds.
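A quick Monte Carlo sketch can confirm both claims (this is purely illustrative; the squared errors are drawn from an arbitrary example distribution, and names like `n_pop` and `n_trials` are made up). The per-fold MSE variance grows roughly like k * Var(SE) / Npop, while the variance of the fold-averaged MSE stays near Var(SE) / Npop:

```python
import numpy as np

rng = np.random.default_rng(0)
n_pop = 1000       # total number of data points
n_trials = 5000    # Monte Carlo repetitions

for k in (5, 20, 100):          # n_pop must be divisible by k here
    # i.i.d. squared errors; chi-square(1) is just an example distribution
    se = rng.chisquare(df=1, size=(n_trials, n_pop))
    # MSE of each fold: shape (n_trials, k)
    per_fold = se.reshape(n_trials, k, n_pop // k).mean(axis=2)
    var_single_fold = per_fold[:, 0].var()      # variance of one fold's MSE
    var_averaged = per_fold.mean(axis=1).var()  # variance of the k-fold average
    print(f"k={k:3d}  Var(single-fold MSE)≈{var_single_fold:.4f}  "
          f"Var(averaged MSE)≈{var_averaged:.4f}")

# Var(single-fold MSE) grows roughly like k * Var(SE) / n_pop,
# while Var(averaged MSE) stays near Var(SE) / n_pop for every k.
```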
To estimate the variance of the final MSE from the per-fold MSEs (treating the fold MSEs as approximately independent):
Var(MSE_final) = Var(MSE_folds)/k = Sum((MSE_fold_i - Mean(MSE_folds))^2)/k^2
(this uses the biased sample variance; with the unbiased one the denominator becomes k*(k-1)).
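In code, this estimator could be computed from the fold-level MSEs as follows (a sketch; `var_of_final_mse` and the sample numbers are hypothetical):

```python
import numpy as np

def var_of_final_mse(fold_mses):
    """Estimated variance of the k-fold-averaged MSE:
    sum((MSE_i - mean)^2) / k^2, i.e. the biased sample variance divided by k."""
    fold_mses = np.asarray(fold_mses, dtype=float)
    k = len(fold_mses)
    return np.sum((fold_mses - fold_mses.mean()) ** 2) / k ** 2

fold_mses = [0.052, 0.049, 0.055, 0.047, 0.051]   # hypothetical 5-fold results
print(var_of_final_mse(fold_mses))                # estimated variance of their average
```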
Chart: MSE change as a function of the number of folds (MSE at infinity is assumed to be 0.05).
Chart: Variance of the individual k-fold MSEs as a function of the number of folds (Var(squared error for one point) is assumed to be 10, with 1000 observations).
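If you want to regenerate both charts, here is a minimal matplotlib sketch under the same assumptions (MSE_inf = 0.05, Var(SE) = 10, Npop = 1000); the variable names are made up for illustration:

```python
import math
import matplotlib.pyplot as plt

mse_inf = 0.05     # assumed MSE with infinite training data
var_se = 10.0      # assumed Var(squared error for one point)
n_pop = 1000       # total number of observations

ks = range(2, 101)
avg_mse = [mse_inf / 2 ** math.log10(1 - 1 / k) for k in ks]  # first chart
fold_var = [var_se / (n_pop / k) for k in ks]                 # second chart

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(list(ks), avg_mse)
ax1.set_xlabel("number of folds k")
ax1.set_ylabel("average OOS MSE")
ax2.plot(list(ks), fold_var)
ax2.set_xlabel("number of folds k")
ax2.set_ylabel("Var of an individual fold's MSE")
plt.tight_layout()
plt.show()
```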