In a machine learning context, suppose I have 100 observations (#1 ~ #100) that will be split into a training set and a validation set, plus a completely separate set of 100 observations (#101 ~ #200) for the test set. Assume the observations #1 ~ #100 have no particular order.
I tried 5 different splits. Model 1: #1 ~ #20 as the validation set and #21 ~ #100 as the training set. Model 2: #21 ~ #40 as the validation set and #1 ~ #20, #41 ~ #100 as the training set. ... Model 5: #81 ~ #100 as the validation set and #1 ~ #80 as the training set.
I fitted 5 machine learning models using these 5 splits and measured each model's performance (e.g. RMSE) on the test set (#101 ~ #200). Suppose I pick whichever of models 1 ~ 5 has the lowest RMSE on the test set, call its split the 'best split' of observations #1 ~ #100 into training and validation sets, and use that model as my final model. Is this a valid argument?
I feel something is wrong with this argument, but I cannot logically rebut it.
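
To make the procedure concrete, it can be sketched roughly as below. This is only a minimal illustration under assumptions: the data are synthetic, the model is a ridge regression whose penalty is tuned on the validation set, and the names `fit_ridge`, `rmse` and `lambdas` are hypothetical helpers introduced for this sketch, not part of any particular library.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (an assumption; no real data is given in the question).
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=1.0, size=200)

X_dev, y_dev = X[:100], y[:100]      # observations #1 ~ #100 (training + validation)
X_test, y_test = X[100:], y[100:]    # observations #101 ~ #200 (test)


def fit_ridge(X, y, lam):
    """Closed-form ridge regression without an intercept (hypothetical helper)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def rmse(X, y, w):
    """Root mean squared error of the linear predictor X @ w."""
    return float(np.sqrt(np.mean((X @ w - y) ** 2)))


lambdas = [0.01, 0.1, 1.0, 10.0]  # hypothetical hyperparameter grid
test_rmse = []

for k in range(5):
    # Model k+1: block k of 20 observations is the validation set, the rest is training.
    val_idx = np.arange(20 * k, 20 * (k + 1))
    train_idx = np.setdiff1d(np.arange(100), val_idx)

    # Use the validation set to choose the ridge penalty.
    val_scores = [
        rmse(X_dev[val_idx], y_dev[val_idx],
             fit_ridge(X_dev[train_idx], y_dev[train_idx], lam))
        for lam in lambdas
    ]
    best_lam = lambdas[int(np.argmin(val_scores))]

    # Refit on the training part with the chosen penalty, then score on the test set.
    w = fit_ridge(X_dev[train_idx], y_dev[train_idx], best_lam)
    test_rmse.append(rmse(X_test, y_test, w))

# The step in question: pick the model with the lowest RMSE on the test set.
chosen = int(np.argmin(test_rmse)) + 1
print("test RMSE per model:", [round(r, 3) for r in test_rmse])
print("model chosen as 'best split':", chosen)
```

The last step is the one I am unsure about: the same test set (#101 ~ #200) is used both to compare the five models and to report the performance of the chosen one.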