
I found a question (Question 7) here:

Question: For k-fold cross-validation, a larger k value implies more bias
Options: True or False

My answer is: True.
Reason: Larger k means more folds, which means a smaller test set and therefore a larger training set in each fold. As you increase the training data you bring down variance, which (by the bias-variance trade-off) should increase bias.

So as k increases --> training data size increases --> variance reduces --> bias increases. Hence the answer is True.

But the website says answer is False.
Can someone explain whether my logic is wrong and why their answer is right?
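For concreteness, here is a minimal sketch (the sample size n = 100 and the helper name `fold_sizes` are illustrative, not from the quiz) of the arithmetic behind "larger k means a larger training set in each fold":

```python
def fold_sizes(n, k):
    """Return (train_size, test_size) for one fold of k-fold CV,
    ignoring remainder samples when n is not divisible by k."""
    test = n // k       # size of the held-out fold
    train = n - test    # the remaining k-1 folds are used for training
    return train, test

for k in (2, 5, 10):
    train, test = fold_sizes(100, k)
    print(f"k={k:2d}: train={train}, test={test}")
# k=10 trains on 90 of 100 samples, but tests on only 10
```

So the premise "larger k gives a larger training set" is correct; the disputed step is what that implies for bias.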

desertnaut
Hitesh Somani

2 Answers


Why should bias increase as the training set size increases?

My intuition is that as you increase k, your test sets get smaller, increasing the variance of your evaluation metric. At the same time, the training set in each fold approaches the full dataset, so the models you fit in each fold are trained on nearly identical, nearly complete data. Their performance tracks that of a model trained on the full set, which is why larger k is associated with lower bias (and higher variance) in the performance estimate, not more bias.
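A hedged sketch of this intuition, assuming scikit-learn is available (the synthetic regression data and the choice of R² scoring are illustrative): with larger k, each held-out fold is smaller, so the per-fold scores tend to be noisier even though each fold's model is trained on more data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic linear data: y = X @ w + noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=200)

for k in (2, 10):
    # One R^2 score per held-out fold
    scores = cross_val_score(LinearRegression(), X, y, cv=k, scoring="r2")
    print(f"k={k:2d}: mean R^2={scores.mean():.3f}, "
          f"std across folds={scores.std():.3f}")
```

The exact numbers depend on the data, but the general pattern is that the spread of per-fold scores grows as the held-out folds shrink, while the mean score approaches what you would get training on the full dataset.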

Matt Kaye
  • I respect your opinion. But what I have learned from courses is that if you want to bring down variance (which will increase bias), you should increase the training set size. I think this question is ambiguous. I think it depends upon the training set and model complexity also. If the model is very complicated, like a deep learning model or maybe even a gradient boosting model, it may start overfitting even as the fold's training set approaches the entire dataset; but if it's a simple linear regression model, then increasing the fold size may not cause overfitting. – Hitesh Somani Aug 01 '21 at 09:47
    I think it's important to remember that in this case, bias and variance are not always trading off. Generally, a good rule of thumb is that increasing the size of your training set will decrease both the bias and the variance of your models. So you can't just reason "increase the size of the training set, so variance decreases, so bias must increase," since that isn't necessarily true. – Matt Kaye Aug 02 '21 at 11:59

Larger k in cross-validation means that many more models are created from slices of your dataset. So the average of the predictions from each of the k models would even out the bias associated with outliers in your dataset.
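One way to sketch the idea in this answer (the toy one-feature linear model and the helper `fit_slope` are purely illustrative, not from the original post): fit one model per fold and average what they learn.

```python
import numpy as np

# Toy data: y = 3*x + noise, including whatever "outliers" the noise produces
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 1))
y = 3.0 * X[:, 0] + rng.normal(scale=1.0, size=100)

def fit_slope(X, y):
    # Least-squares slope for a one-feature, no-intercept toy model
    return float(X[:, 0] @ y / (X[:, 0] @ X[:, 0]))

k = 5
folds = np.array_split(np.arange(100), k)
slopes = []
for i in range(k):
    # Train on all folds except fold i, as in k-fold CV
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    slopes.append(fit_slope(X[train_idx], y[train_idx]))

avg_slope = float(np.mean(slopes))
print(f"per-fold slopes: {[round(s, 3) for s in slopes]}")
print(f"averaged slope:  {avg_slope:.3f}  (true slope is 3.0)")
```

Note that standard k-fold cross-validation averages *evaluation scores* across folds rather than model parameters or predictions; averaging the k fold-models themselves is closer to an ensemble, which is one reading of this answer.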

Jayaram Iyer
  • Thanks for answering, but in that quiz it was not clear whether the dataset has outliers. Practically, I think we can assume that the dataset will have outliers. – Hitesh Somani Aug 01 '21 at 09:52