
I am trying to evaluate different classification models on the MNIST dataset. Kaggle provides two datasets: train (42000 labelled images) and test (28000 unlabelled images).

I first divided the original training dataset (42000 images) into an 80:20 split: train_set (33600 images) and test_set (8400 images). I trained several models on train_set, cross-validated them on train_set only, and finally evaluated the chosen model on test_set to estimate the generalization error.
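For concreteness, here is a minimal sketch of that workflow, assuming scikit-learn and the usual Kaggle Digit Recognizer file layout (a `train.csv` with a `label` column followed by pixel columns; the file name, column name, and `RandomForestClassifier` are assumptions, not part of my actual setup):

```python
import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import RandomForestClassifier

# Load the Kaggle training data (42000 labelled images).
train_df = pd.read_csv("train.csv")
X = train_df.drop(columns=["label"])
y = train_df["label"]

# 80:20 split: 33600 images for training, 8400 held out for the final check.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Model selection via cross-validation on the training split only.
model = RandomForestClassifier(n_estimators=100, random_state=42)
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
print("CV accuracy:", cv_scores.mean())

# Fit the chosen model and estimate generalization error on the held-out 8400.
model.fit(X_train, y_train)
print("Held-out accuracy:", model.score(X_test, y_test))
```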

Now that my final model is ready to generate the submission file from the Kaggle-provided test set, should I retrain it on the whole Kaggle-provided training set, i.e. train_set + test_set (the full 42000 images, instead of just the 33600 I split off), given that Kaggle will evaluate my model on its own held-out test set?
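If the answer is yes, I believe the retraining step would look something like this sketch (same assumptions as above: scikit-learn, Kaggle's `train.csv`/`test.csv`, and the standard `ImageId`/`Label` submission columns, with `RandomForestClassifier` standing in for whatever final model was chosen):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Refit the final model on all 42000 labelled images (train_set + test_set).
train_df = pd.read_csv("train.csv")
X_full = train_df.drop(columns=["label"])
y_full = train_df["label"]

final_model = RandomForestClassifier(n_estimators=100, random_state=42)
final_model.fit(X_full, y_full)

# Predict on Kaggle's unlabelled 28000-image test set and write the submission.
test_df = pd.read_csv("test.csv")
predictions = final_model.predict(test_df)

submission = pd.DataFrame(
    {"ImageId": range(1, len(predictions) + 1), "Label": predictions}
)
submission.to_csv("submission.csv", index=False)
```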

  • Similar questions have been asked before; see https://stats.stackexchange.com/questions/361494/how-to-correctly-retrain-model-using-all-data-after-cross-validation-with-early – Adrian Apr 22 '23 at 18:03
  • This too: https://stats.stackexchange.com/questions/11602/training-on-the-full-dataset-after-cross-validation – Adrian Apr 22 '23 at 18:03
  • One more that may help: https://stats.stackexchange.com/questions/52274/how-to-choose-a-predictive-model-after-k-fold-cross-validation – Adrian Apr 22 '23 at 18:05
