I understand the concept of training, validation, and testing datasets for model building. Typically, when searching for the optimal hyperparameters for a given class of model, we choose the hyperparameter configuration that optimizes our chosen measure of performance on the validation dataset. Then we estimate the model's "true" out-of-sample performance on the test set.
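To be concrete, here is a minimal sketch of that usual workflow. The random-forest model, the 60/20/20 split, and accuracy as the metric are just placeholders for whatever would actually be tuned; only the selection rule (maximize validation performance) matters:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Placeholder data and a 60/20/20 train/validation/test split.
X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# Candidate hyperparameter configurations (placeholder grid).
candidates = [{"max_depth": d} for d in (2, 5, 10, None)]

results = []
for params in candidates:
    clf = RandomForestClassifier(random_state=0, **params).fit(X_train, y_train)
    results.append((params, clf.score(X_train, y_train), clf.score(X_val, y_val)))

# Usual rule: pick the configuration with the best validation score,
# then report its performance on the held-out test set.
best_params, best_train, best_val = max(results, key=lambda r: r[2])
print(best_params, best_train, best_val)
```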
Would it be poor form to instead select the model that minimizes the difference between training and validation performance (for whatever measure I've chosen)? Suppose I have two models:
| Model | Train performance | Validation performance |
|-------|-------------------|------------------------|
| 1     | 0.95              | 0.75                   |
| 2     | 0.77              | 0.74                   |
It looks to me like model 1 is overfitting the training set, given the stark difference between its training and validation performance. I have a hunch it won't generalize well to the test set, and I'd rather go with model 2, which doesn't do nearly as well on the training data but performs similarly across the training and validation sets. Even though model 1 has the higher validation performance, I believe model 2 will generalize better, and that its small train-validation gap should produce more consistent results out-of-sample.
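In code, the rule I'm asking about would look something like the sketch below. The numbers are just the two models from the table above; the only thing being illustrated is swapping the selection criterion from "highest validation score" to "smallest train-validation gap":

```python
# Performance of the two candidate models from the table.
models = {
    1: {"train": 0.95, "val": 0.75},
    2: {"train": 0.77, "val": 0.74},
}

# Usual rule: highest validation performance -> picks model 1.
by_val = max(models, key=lambda m: models[m]["val"])

# Proposed rule: smallest train-validation gap -> picks model 2.
by_gap = min(models, key=lambda m: models[m]["train"] - models[m]["val"])

print(by_val, by_gap)  # 1 2
```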