Out-of-sample testing simulates how machine learning models are actually used. Think about how Amazon designs Alexa: the goal is to handle future speech, from sentences and speakers she has not necessarily heard before.
The goal of out-of-sample testing is to determine whether your model has overfit the data on which it was trained. In other words, we don't want a model that merely memorizes patterns in the training data or fits coincidences. We want something that will generalize to new data, much like Alexa should.
I do agree that out-of-sample testing is an excellent way to assess performance, since it is about the ultimate test of how well a model generalizes. However, in the regression setting (think OLS), there is adjusted $R^2$, which penalizes a model for having many parameters, the assumption being that throwing all kinds of parameters at a model will end up memorizing the data. Remember, if you have $N$ points in the plane with distinct $x$ coordinates, you can hit all of them with a polynomial of degree $N-1$ (e.g., a parabola passes through any $3$ such points) and achieve perfect accuracy on your training data, but that is little more than playing connect-the-dots.
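For reference, adjusted $R^2$ is $\bar{R}^2 = 1 - (1 - R^2)\frac{n-1}{n-p-1}$, where $n$ is the sample size and $p$ is the number of predictors. And here is a minimal sketch of the connect-the-dots problem, with simulated data invented purely for illustration: a degree-$(N-1)$ polynomial nails the $N$ training points but falls apart on fresh data, and out-of-sample $R^2$ catches it.

```python
# "Connect the dots" overfitting: with N training points, a degree-(N-1)
# polynomial fits them exactly, yet does poorly on new data.
# The data-generating process and sample sizes are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)

def simulate(n):
    x = rng.uniform(-3, 3, n)
    y = 2 * x + rng.normal(scale=1.0, size=n)  # true relationship is linear
    return x, y

x_train, y_train = simulate(10)   # N = 10 training points
x_test, y_test = simulate(1000)   # fresh out-of-sample data

def r_squared(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1 - ss_res / ss_tot

for degree in (1, len(x_train) - 1):  # honest model vs. interpolating polynomial
    coefs = np.polyfit(x_train, y_train, degree)
    r2_in = r_squared(y_train, np.polyval(coefs, x_train))
    r2_out = r_squared(y_test, np.polyval(coefs, x_test))
    print(f"degree {degree}: in-sample R^2 = {r2_in:.3f}, out-of-sample R^2 = {r2_out:.3f}")

# The degree-1 fit scores about the same in and out of sample; the degree-9
# interpolation reports (near-)perfect in-sample R^2 but a wildly worse
# out-of-sample R^2, which is exactly what adjusted R^2 tries to guard against.
```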
Biostatistics and epidemiology do have a variant of out-of-sample testing where the model is repeatedly trained on bootstrap samples and then tested on the original data set, if that counts as what you mean (and even such an approach does test the trained model on points on which the model was not fit). Splitting off a designated test set is itself not without its critics, though. Even Harrell has remarked that splitting the data to have a designated test set is reasonable only once the data set gets to be quite large (the number he tends to give, such as in the link, is $20000$ observations).
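If it helps to see the core loop, here is a rough sketch of that bootstrap idea, with made-up data and a plain OLS fit standing in for whatever model you would actually use. (Harrell's full procedure goes further and uses these refits to estimate and subtract the optimism of the apparent performance; this is just the resample-refit-rescore loop.)

```python
# Sketch of bootstrap validation: refit the model on bootstrap resamples
# and score each refit on the original data set. Data and model are placeholders.
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])  # intercept + 3 predictors
beta_true = np.array([1.0, 0.5, -0.3, 0.0])
y = X @ beta_true + rng.normal(scale=1.0, size=n)

def fit_ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def r_squared(y, y_hat):
    return 1 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)

apparent = r_squared(y, X @ fit_ols(X, y))  # performance on the data used to fit

boot_scores = []
for _ in range(500):
    idx = rng.integers(0, n, size=n)              # bootstrap resample, with replacement
    beta_b = fit_ols(X[idx], y[idx])              # refit on the resample
    boot_scores.append(r_squared(y, X @ beta_b))  # score on the original data

print(f"apparent R^2: {apparent:.3f}")
print(f"mean R^2 of bootstrap refits on the original data: {np.mean(boot_scores):.3f}")
```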