I know that a train-validation-test split divides the data into:
- a training dataset - obviously my "in-sample" data
- a validation dataset
- a test dataset - obviously my "out-of-sample" data
My question is: Should I refer to the validation dataset as in-sample or out-of-sample data?
If we're using the validation dataset to fine-tune the parameter values, then the model has seen this data before. So I'm thinking it is "in-sample" data. Am I right?
Thanks for your help!
Kitty Kenty.
In the context of hyperparameter tuning, however, you can argue that it is in-sample data because you have seen it and possibly tuned the model to overfit it. This is why we need a third set for testing the final model.
– runcoderun May 22 '19 at 19:21
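To make the distinction in the comment above concrete, here is a minimal sketch, not a definitive recipe. Everything in it is an assumption made for illustration: the synthetic data, the 60/20/20 split, the toy k-nearest-neighbour regressor, and the helper names `knn_predict` and `mse`. The model is fit on the training set only, the hyperparameter `k` is chosen by validation error, and the test set is touched once at the end.

```python
import random

def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mse(train, data, k):
    """Mean squared error of k-NN (fit on `train`) evaluated on `data`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)

rng = random.Random(0)

# Synthetic 1-D regression data, purely illustrative: y = x^2 + noise.
xs = [rng.uniform(-1, 1) for _ in range(300)]
points = [(x, x * x + rng.gauss(0, 0.1)) for x in xs]

# Assumed 60/20/20 split (the data is already in random order).
train, val, test = points[:180], points[180:240], points[240:]

# Hyperparameter tuning: pick k by validation error.
# The choice of k has now "seen" the validation set.
candidate_ks = [1, 3, 5, 9, 15, 25]
val_errors = {k: mse(train, val, k) for k in candidate_ks}
best_k = min(val_errors, key=val_errors.get)

# The test set was never used for fitting or for choosing k,
# so this is the only untouched out-of-sample estimate.
train_error = mse(train, train, best_k)
test_error = mse(train, test, best_k)

print("best k:", best_k)
print("train MSE:", train_error)
print("validation MSE:", val_errors[best_k])
print("test MSE:", test_error)
```

In this sketch the validation set is out-of-sample with respect to fitting (no validation point is ever stored in `train`), but in-sample with respect to the choice of `k`, since `best_k` is by construction the value that minimises validation error. That selection step is why the validation error can be optimistic and why the separate test estimate is reported.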