I read that when:

RMSE of test > RMSE of train => OVERFITTING of the data.

RMSE of test < RMSE of train => UNDERFITTING of the data.

Is there actually a delta threshold that determines whether the model is overfit or underfit? It is almost impossible to get equal RMSE for test and train data, and if they are not equal, then by the rule above the model is always either overfit or underfit.

I also read that whether RMSE is good or bad depends on the range of the dependent variable (DV). For example, if the RMSE is 300 and the range of the DV is 20 to 100,000, is this considered small? Should RMSE be measured as a percentage, such as (RMSE / range of DV) or (RMSE / stddev of y_test data), when we want to compare accuracy across multiple datasets?
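
To make the second question concrete, here is a minimal sketch of the normalisation I have in mind, assuming NumPy (the helper `normalized_rmse` and the simulated data are my own illustration):

```python
import numpy as np

def normalized_rmse(y_true, y_pred):
    """RMSE plus two scale-free versions of it for cross-dataset comparison."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return {
        "rmse": rmse,
        "rmse_over_range": rmse / (y_true.max() - y_true.min()),  # NRMSE by range
        "rmse_over_std": rmse / y_true.std(),                     # NRMSE by std dev
    }

# Hypothetical example matching the numbers above: errors of around 300
# on a DV that spans roughly 20 to 100000.
rng = np.random.default_rng(0)
y_test = rng.uniform(20, 100000, size=1000)
y_pred = y_test + rng.normal(0, 300, size=1000)
print(normalized_rmse(y_test, y_pred))
```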

  • I do not think that is the right way to think about overfitting or underfitting. There is no threshold that everyone is seeking. One can always construct train and test data such that the test error is less than the train error, so this is a heuristic rather than a rule. The precise characterization of fit is a much broader and more subjective question. – psiyumm Nov 19 '20 at 09:44
  • Hi, what if I have a test RMSE of 347 and a train RMSE of 342? The range (max minus min) of my test data is 1291, with a standard deviation of 275.03; the model is an autoregression. Does this mean my model is good or bad? – user3782604 Nov 19 '20 at 09:52
  • As said above, it is not as easy as just comparing two numbers. Often it is easier to see evidence of overfitting with a learning curve, that is, a plot of the training and testing accuracy over some third variable like model complexity, training time, or even the number of training samples (see the sketch after these comments). – Cameron Chandler Nov 19 '20 at 13:10
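
For reference, a minimal sketch of the learning-curve diagnostic mentioned in the last comment, assuming scikit-learn and matplotlib (the Ridge model and the synthetic data are illustrative assumptions, not part of the original thread):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge
from sklearn.model_selection import learning_curve

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=0.5, size=500)

# Train/validation RMSE as a function of the number of training samples.
sizes, train_scores, test_scores = learning_curve(
    Ridge(), X, y,
    train_sizes=np.linspace(0.1, 1.0, 8),
    cv=5,
    scoring="neg_root_mean_squared_error",
)

# Negate the scores so the y-axis reads as RMSE (lower is better).
plt.plot(sizes, -train_scores.mean(axis=1), label="train RMSE")
plt.plot(sizes, -test_scores.mean(axis=1), label="validation RMSE")
plt.xlabel("number of training samples")
plt.ylabel("RMSE")
plt.legend()
plt.show()
```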

2 Answers


If you are training neural network models, you will quickly find that validation performance is worse than training performance.

This is fine as long as the validation error does not increase while the training error is decreasing. Personally, I reserve the word "overfitting" for the situation where the training error decreases while the validation error increases. The size of the gap between them is largely irrelevant as long as you don't have bugs.
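
As an illustration of that rule of thumb, here is a minimal sketch that watches validation RMSE across training epochs and stops once it has been rising for a while; the MLPRegressor, the synthetic data, and the patience value are assumptions for the example:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = X @ rng.normal(size=20) + rng.normal(scale=0.3, size=1000)
X_train, X_val, y_train, y_val = X[:800], X[800:], y[:800], y[800:]

model = MLPRegressor(hidden_layer_sizes=(64,), random_state=0)
best_val, best_epoch, patience = float("inf"), 0, 10

for epoch in range(200):
    model.partial_fit(X_train, y_train)  # one pass over the training data
    val_rmse = np.sqrt(mean_squared_error(y_val, model.predict(X_val)))
    if val_rmse < best_val:
        best_val, best_epoch = val_rmse, epoch
    elif epoch - best_epoch >= patience:
        # validation RMSE has been rising while training continues:
        # that trend, not the size of the train/val gap, is the signal
        print(f"stopping at epoch {epoch}; validation was best at epoch {best_epoch}")
        break
```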


Generally speaking, there isn't a specific threshold on a single value; rather, there is a range of values over which certain indicators follow a specific trend. More concretely, with a linear model, the range of degrees of freedom over which the in-sample error keeps decreasing while the out-of-sample error increases is the range of fitted models that would be considered overfit.
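
A minimal sketch of that idea, assuming scikit-learn, with polynomial degree standing in for degrees of freedom (the synthetic sine data is an assumption for the example):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Sweep model complexity and compare in-sample vs out-of-sample RMSE.
for degree in range(1, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_rmse = np.sqrt(mean_squared_error(y_train, model.predict(X_train)))
    test_rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
    # degrees where train RMSE keeps falling but test RMSE rises are the
    # overfit region this answer describes
    print(f"degree={degree:2d}  train RMSE={train_rmse:.3f}  test RMSE={test_rmse:.3f}")
```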