
I understand the concept of training, validation, and testing datasets for model building. Typically, when searching for the optimal hyperparameters for a given class of model, we choose the hyperparameter configuration that optimizes our chosen measure of performance on the validation dataset. Then we test the model's "true" out-of-sample performance on the test set.
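For concreteness, here's a minimal sketch of the workflow I'm describing (scikit-learn, a synthetic dataset, and a logistic-regression C grid are just placeholders, not my actual data or model):

```python
# Minimal sketch of the tune-on-validation / report-on-test workflow.
# The dataset, estimator, and hyperparameter grid are placeholders.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# 60% train, 20% validation, 20% test.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

best_C, best_val = None, -float("inf")
for C in [0.01, 0.1, 1.0, 10.0]:          # hyperparameter grid (illustrative)
    model = LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train)
    val = model.score(X_val, y_val)       # accuracy on the validation set
    if val > best_val:
        best_C, best_val = C, val

# Final out-of-sample check on the untouched test set.
final = LogisticRegression(C=best_C, max_iter=1000).fit(X_train, y_train)
print(f"C={best_C}: validation={best_val:.3f}, test={final.score(X_test, y_test):.3f}")
```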

Would it be poor form to select the model that minimizes the difference, in whatever performance measure I've chosen, between the training and validation sets? Suppose I have two models:

model    train_performance    validation_performance
1        0.95                 0.75
2        0.77                 0.74

It looks to me like model 1 is overfitting the training set, given the stark difference in performance between the training and validation datasets. I have a hunch this model won't generalize well to the test set, and would rather go with model 2, which doesn't do nearly as well on the training dataset but performs similarly across the training and validation datasets. I believe model 2 will generalize better, and that the small difference between its training and validation performance should produce more consistent results out-of-sample, even though model 1 has the higher validation performance.
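In code, the two selection rules I'm weighing look something like this (plain Python, with the scores from the table above hard-coded just for illustration):

```python
# Hypothetical scores from the table above; only the comparison logic matters.
candidates = {
    "model_1": {"train": 0.95, "validation": 0.75},
    "model_2": {"train": 0.77, "validation": 0.74},
}

# Usual rule: pick the candidate with the best validation score.
by_validation = max(candidates, key=lambda m: candidates[m]["validation"])

# Rule I'm considering: pick the candidate with the smallest train-validation gap.
by_gap = min(candidates, key=lambda m: candidates[m]["train"] - candidates[m]["validation"])

print(by_validation)  # model_1
print(by_gap)         # model_2
```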

  • I wouldn't make it a formal objective, e.g., consider the possibility that model 1 has a train/validation performance of $(0.77, 0.74)$ but model 2 has a train/validation performance of $(0.95, 0.88)$; you'd probably prefer the second to the first even though the train-validation gap is larger. But in your case I think your reasoning is sound, and I'd go with model 2 as well. You can always extend your analysis by using repeated K-fold cross-validation to get a more accurate estimate of validation performance in the two cases, at some possibly considerable expense of runtime. – jbowman Feb 17 '20 at 19:55
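A minimal sketch of the repeated K-fold cross-validation the comment suggests, assuming scikit-learn; the classifier and synthetic data are placeholders for the real problem:

```python
# Repeated K-fold cross-validation to get a less noisy validation estimate.
# The classifier and synthetic data below are placeholders.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedKFold, cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
model = LogisticRegression(max_iter=1000)

# 5 folds repeated 10 times gives 50 validation scores per candidate model,
# which makes both the mean validation score and its spread more reliable.
cv = RepeatedKFold(n_splits=5, n_repeats=10, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print(f"validation accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```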

0 Answers