Questions tagged [cross-validation]

Refers to general procedures that attempt to determine the generalizability of a statistical result. Cross-validation arises frequently in the context of assessing how well a particular model fit predicts future observations. Methods for cross-validation usually involve withholding a random subset of the data during model fitting, quantifying how accurately the withheld data are predicted, and repeating this process to obtain a measure of prediction accuracy.

641 questions
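
A minimal sketch of the procedure described above, using scikit-learn; the diabetes dataset and ridge regression are arbitrary stand-ins, not part of the tag description:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

X, y = load_diabetes(return_X_y=True)

# Each of the 5 folds is withheld once while the model is fit on the other 4;
# the withheld fold is then scored, and the mean over folds estimates prediction accuracy.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(Ridge(), X, y, cv=cv, scoring="neg_mean_squared_error")
print("per-fold MSE:", -scores)
print("CV estimate of prediction MSE:", -scores.mean())
```
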
6 votes • 1 answer

Cross-validation for model comparison: use the same folds?

Let's say we have model M1 and model M2 that we want to compare. When we do 5-fold (say) cross-validation, would the correct method be to partition the data into F1, F2, F3, F4, and F5 and then run both models through those folds? Then would the…
Dave • 3,818 • 1 • 8 • 29
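
A rough sketch of the setup this question asks about, sharing one fixed partition between two candidate models so the per-fold scores are paired; the dataset and the two models below are placeholders, not the asker's:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# One fixed partition F1..F5: a KFold with a set random_state yields the same
# splits every time it is used, so both models see identical folds.
folds = KFold(n_splits=5, shuffle=True, random_state=42)
scores_m1 = cross_val_score(LogisticRegression(max_iter=5000), X, y, cv=folds)
scores_m2 = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=folds)

print("paired per-fold differences (M1 - M2):", scores_m1 - scores_m2)
```
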
6 votes • 1 answer

Can we use k fold Cross Validation without any extra (excluded) Test Set?

I have seen this in two papers: the authors use 10-fold cross-validation and then present the results from this validation, or, even odder, the results from the best fold, as their modelling result. There has been no test data put aside to validate…
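
To make the practice the question describes concrete, here is a hedged sketch that contrasts reporting the mean over all folds with quoting only the best fold; the classifier and dataset are arbitrary choices:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

scores = cross_val_score(GradientBoostingClassifier(random_state=0), X, y, cv=10)

# The usual report is the mean (and spread) over all 10 folds ...
print(f"10-fold CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
# ... whereas quoting only the best fold, as the question describes, is optimistically biased.
print(f"best single fold (not a fair summary): {scores.max():.3f}")
```
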
5 votes • 2 answers

Cross Validation: how to determine when to Early Stop?

When using "K-Fold Cross Validtion" for Neural Net, do we: Pick and save initial weights of the network randomly (let's call it $W_0$) Split data into $N$ equal chunks Train model on $N-1$ chunks, validating against the left-out chunk (the $K$'th…
Kari
  • 2,726
  • 2
  • 20
  • 49
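
One possible sketch of per-fold early stopping, not necessarily the asker's exact recipe, using scikit-learn's MLPClassifier with warm_start to train one pass at a time; the data, network size, patience, and epoch budget are all assumptions:

```python
import warnings
import numpy as np
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.metrics import log_loss
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPClassifier

warnings.simplefilter("ignore", ConvergenceWarning)   # each 1-iteration fit() call warns

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
kf = KFold(n_splits=5, shuffle=True, random_state=0)
patience, stop_epochs = 5, []

for train_idx, val_idx in kf.split(X):
    # random_state=0 means every fold starts from the same initial weights (the $W_0$ idea).
    net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1, warm_start=True, random_state=0)
    best_loss, wait, stop_epoch = np.inf, 0, 0
    for epoch in range(200):
        net.fit(X[train_idx], y[train_idx])            # warm_start: one more pass per call
        val_loss = log_loss(y[val_idx], net.predict_proba(X[val_idx]))
        if val_loss < best_loss - 1e-4:
            best_loss, wait, stop_epoch = val_loss, 0, epoch + 1
        else:
            wait += 1
            if wait >= patience:                        # stop once the held-out fold stops improving
                break
    stop_epochs.append(stop_epoch)

print("early-stopping epoch per fold:", stop_epochs, "| median:", int(np.median(stop_epochs)))
```
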
4 votes • 1 answer

Cross-Validation: Repeated K-Fold/Group K-Fold

Repeated K-Fold vs Group K-Fold. As per my understanding from the sklearn docs, Repeated K-Fold: RepeatedKFold repeats K-Fold n times. It can be used when one requires to run KFold n times, producing different splits in each repetition. Repeated…
Pluviophile • 3,808 • 13 • 31 • 54
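
A small sketch contrasting the two splitters the question compares, RepeatedKFold and GroupKFold; the toy arrays and group labels are invented for illustration:

```python
import numpy as np
from sklearn.model_selection import GroupKFold, RepeatedKFold

X = np.arange(12).reshape(6, 2)
y = np.array([0, 1, 0, 1, 0, 1])
groups = np.array([0, 0, 1, 1, 2, 2])   # e.g. repeated measurements per subject

# RepeatedKFold: K-Fold run n_repeats times, with a different shuffle each repetition.
rkf = RepeatedKFold(n_splits=3, n_repeats=2, random_state=0)
for train_idx, test_idx in rkf.split(X):
    print("RepeatedKFold test fold:", test_idx)

# GroupKFold: all samples sharing a group label end up in the same fold.
gkf = GroupKFold(n_splits=3)
for train_idx, test_idx in gkf.split(X, y, groups):
    print("GroupKFold test fold:", test_idx, "held-out groups:", np.unique(groups[test_idx]))
```
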
4 votes • 1 answer

Cross-validation of a cross-validated stacking ensemble?

Let me begin by saying that I understand how to build a stacked ensemble by using cross-validation to generate out-of-fold predictions from the base learners as meta-features. My question is about the methodology when cross-validating the…
Reii Nakano • 291 • 1 • 5
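
One way to read the question is: wrap the whole (internally cross-validated) stacking procedure in an outer CV loop. A sketch under that assumption, with placeholder base learners and data:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# The inner cv=5 builds out-of-fold predictions that become the meta-features
# for the LogisticRegression meta-learner.
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("svc", SVC(probability=True, random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

# The outer CV treats the entire stacking procedure as one estimator, so the
# outer test folds are never seen while the meta-features are being built.
outer_scores = cross_val_score(stack, X, y, cv=5)
print("nested estimate of the stack's accuracy:", outer_scores.mean())
```
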
3 votes • 1 answer

Why does cross validation have a pessimistic bias?

My course notes list two reasons why cross-validation has a pessimistic bias. The first one is that the accuracy is measured for models that are trained on less data, which I understand. However, I don't understand the second reason. Supposedly,…
Uberfatty • 131 • 1
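
A small demonstration of the first reason the excerpt mentions (models trained on less data tend to score lower); it does not address the second reason, and the dataset and classifier are arbitrary:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# With k folds the model is fit on a fraction (k-1)/k of the data, so small k means
# less training data per fit and, typically, a slightly lower (more pessimistic) score.
for k in (2, 5, 10):
    scores = cross_val_score(SVC(), X, y, cv=k)
    print(f"k={k:2d}: trained on {100 * (k - 1) / k:.0f}% of the data each fold, "
          f"mean accuracy {scores.mean():.3f}")
```
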
3 votes • 1 answer

K-fold cross-validation: how do MSE average and variance vary with K?

I'd like to get an intuition about how varying k impacts k-fold validation. Is the following right? The average of the OOS MSEs should generally decrease with k, because a bigger k means the training sets are larger, so we have more data to fit the…
elemolotiv • 133 • 5
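
A quick empirical sketch of the question: vary k and watch the mean and spread of the out-of-sample fold MSEs; the dataset and model are placeholders:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

X, y = load_diabetes(return_X_y=True)

# For each k, look at the average and the variance of the out-of-sample fold MSEs.
for k in (2, 5, 10, 20):
    cv = KFold(n_splits=k, shuffle=True, random_state=0)
    mse = -cross_val_score(Ridge(), X, y, cv=cv, scoring="neg_mean_squared_error")
    print(f"k={k:2d}: mean OOS MSE {mse.mean():8.1f}, variance across folds {mse.var():10.1f}")
```
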
2 votes • 3 answers

Publish without validation score?

My mentor wants me to write and submit an academic paper reporting a predictive model, but without any validation score. Everything I have read in textbooks or on the Internet says that this is wrong, but is there any case where only reporting a train…
user82841 • 23 • 2
2 votes • 1 answer

choosing classifiers

From what I read, the 5x2cv t test is "a procedure for comparing the performance of two models (classifiers or regressors) that was proposed by Dietterich to address shortcomings in other methods such as the resampled paired t test and the k-fold…
Lila • 217 • 2 • 7
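
A sketch of Dietterich's 5x2cv paired t test written out by hand against scikit-learn estimators; the two classifiers and the dataset are arbitrary, and the statistic follows the usual formula $t = p_1^{(1)} / \sqrt{\tfrac{1}{5}\sum_i s_i^2}$ with 5 degrees of freedom:

```python
import numpy as np
from scipy import stats
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
clf_a, clf_b = LogisticRegression(max_iter=1000), DecisionTreeClassifier(random_state=0)

p1 = None                      # score difference on the very first fold (numerator of t)
variances = []
for i in range(5):             # 5 repetitions of 2-fold CV
    cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=i)
    diffs = []
    for train_idx, test_idx in cv.split(X, y):
        acc_a = clf_a.fit(X[train_idx], y[train_idx]).score(X[test_idx], y[test_idx])
        acc_b = clf_b.fit(X[train_idx], y[train_idx]).score(X[test_idx], y[test_idx])
        diffs.append(acc_a - acc_b)
    if p1 is None:
        p1 = diffs[0]
    mean_diff = np.mean(diffs)
    variances.append((diffs[0] - mean_diff) ** 2 + (diffs[1] - mean_diff) ** 2)

t_stat = p1 / np.sqrt(np.mean(variances))
p_value = 2 * stats.t.sf(abs(t_stat), df=5)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```
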
2 votes • 4 answers

Best practice with cross validation

I have done 10-fold cross-validation on my data and have selected the best model from the results. With cross-validation, I will have 10 models trained on different folds of the data. For the final model to use, should I take the average of the…
william007 • 775 • 1 • 10 • 20
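
One commonly recommended workflow, sketched under the assumption that CV is used only to choose between candidates and the winner is then refit on all of the data; the candidate models and dataset are placeholders:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
candidates = {
    "logreg": LogisticRegression(max_iter=5000),
    "forest": RandomForestClassifier(random_state=0),
}

# Cross-validation only *selects* a candidate; the ten fold-models are then
# discarded and the selected model is refit once on the full dataset.
cv_means = {name: cross_val_score(m, X, y, cv=10).mean() for name, m in candidates.items()}
best_name = max(cv_means, key=cv_means.get)
final_model = candidates[best_name].fit(X, y)
print("selected:", best_name, "CV accuracy:", round(cv_means[best_name], 3))
```
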
2 votes • 3 answers

Should I use GridSearchCV for hyper-parameter tuning in a data-rich context?

My textbook states that k-fold cross-validation is a resampling technique that is useful for estimating generalization error in a data-poor setting. Ideally, if we had enough data, we would set aside a validation set and use it to assess the…
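
A sketch of the data-rich alternative the excerpt alludes to: GridSearchCV driven by a single fixed validation split (via PredefinedSplit) rather than k folds; the split sizes, parameter grid, and estimator are assumptions:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, PredefinedSplit, train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_trval, X_test, y_trval, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Mark the last 25% of the train+val portion as the single validation set (-1 = always train).
test_fold = np.full(len(y_trval), -1)
test_fold[int(0.75 * len(y_trval)):] = 0
single_split = PredefinedSplit(test_fold)

# GridSearchCV accepts this one fixed split just as it would accept k folds.
search = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=single_split)
search.fit(X_trval, y_trval)
print("best params:", search.best_params_, "held-out test accuracy:", search.score(X_test, y_test))
```
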
2 votes • 1 answer

leave one pair out cross validation

I am trying to train and validate on a collection of 17 datasets. I have divided them into 15 for training and 2 for validation. In the process, I train on the 15 datasets and use the resulting model to predict the results on the remaining 2…
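
If the 17 datasets are treated as groups, the scheme the question describes resembles scikit-learn's LeavePGroupsOut with n_groups=2; a sketch on synthetic stand-in data:

```python
import numpy as np
from sklearn.model_selection import LeavePGroupsOut

# Pretend the 17 datasets are stacked into one array, with a group label per row.
rng = np.random.default_rng(0)
X = rng.normal(size=(17 * 10, 4))
y = rng.integers(0, 2, size=17 * 10)
groups = np.repeat(np.arange(17), 10)            # which of the 17 datasets each row came from

# Every round holds out one pair of datasets for validation and trains on the other 15.
lpgo = LeavePGroupsOut(n_groups=2)
print("number of train/validate rounds:", lpgo.get_n_splits(X, y, groups))  # C(17, 2) = 136
for train_idx, val_idx in lpgo.split(X, y, groups):
    held_out = np.unique(groups[val_idx])
    # ...fit on X[train_idx] here and evaluate on the two held-out datasets...
    break                                         # show just the first round
print("first held-out pair of datasets:", held_out)
```
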
1 vote • 2 answers

Cross Validation and bias relation

I found a question (Question 7) here. Question: for k-fold cross-validation, a larger k value implies more bias. Options: True or False. My answer is: True. Reason: a larger k means more folds, which means a smaller test set, which means a larger training set. As you…
Hitesh Somani • 399 • 2 • 10
1 vote • 2 answers

Does adding a model complexity penalty to the loss function allow you to skip cross-validation?

It's my understanding that selecting for small models, i.e. having a multi-objective function where you're optimizing for both model accuracy and simplicity, automatically takes care of the danger of overfitting the data. Do I have this right? It…
Redrock
  • 11
  • 1
1 vote • 1 answer

nested CV feature selection

I have a small dataset of 150 records with 25 features (too small to do a train/test split). I'm using nested CV for both hyperparameter tuning and feature selection: 10-fold CV in the outer loop, 5-fold CV in the inner loop. Eventually I'm getting 10 sets of…
XPeriment • 11 • 1
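
A sketch of one way to set this up, with feature selection inside a pipeline that is tuned by an inner 5-fold search and assessed by an outer 10-fold loop; the synthetic 150×25 data and the particular selector and classifier are assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline

# Stand-in for the 150 x 25 dataset described in the question.
X, y = make_classification(n_samples=150, n_features=25, n_informative=8, random_state=0)

# Inner 5-fold CV tunes both the number of selected features and C; keeping the
# selector inside the pipeline means it is refit on every training split.
pipe = Pipeline([("select", SelectKBest(f_classif)),
                 ("clf", LogisticRegression(max_iter=5000))])
inner = GridSearchCV(pipe,
                     param_grid={"select__k": [5, 10, 15], "clf__C": [0.1, 1, 10]},
                     cv=5)

# Outer 10-fold CV estimates the performance of the whole tune-and-select procedure.
outer_scores = cross_val_score(inner, X, y, cv=10)
print("nested CV accuracy: %.3f +/- %.3f" % (outer_scores.mean(), outer_scores.std()))
```
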