
Conceptually, where do you draw the line between an overfit model and an adequately fit model?

It's clear that if your model is doing a couple of percent better on your training set than on your test set, you are overfitting. But say, hypothetically, that I trained a model on a training set, then validated it on a test set, and found that my training accuracy was 0.2% higher than my test accuracy. Is this too much overfitting?

foboi1122

1 Answer


It's clear that if your model is doing a couple of percent better on your training set than on your test set, you are overfitting.

That is not true. Your model has learned from the training set and has never "seen" the test set before, so it is expected to perform somewhat better on the training set. The fact that it performs (a little bit) worse on the test set does not by itself mean that the model is overfitting -- though a noticeable difference can suggest it.
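As a minimal sketch of this point (assuming scikit-learn is available; the dataset is synthetic and purely illustrative, not the one from the question), even a simple, adequately fit model typically scores slightly higher on the data it was fit on:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic, illustrative data -- not the data from the question.
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("train accuracy:", model.score(X_train, y_train))
print("test accuracy: ", model.score(X_test, y_test))
# A small train/test gap is expected here even though a plain logistic
# regression with 20 features and 5000 observations is not excessively complex.
```

A gap of a fraction of a percent, as in the question, is entirely compatible with an adequately fit model.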

Check the definition and description from Wikipedia:

Overfitting occurs when a statistical model describes random error or noise instead of the underlying relationship. Overfitting generally occurs when a model is excessively complex, such as having too many parameters relative to the number of observations. A model that has been overfit will generally have poor predictive performance, as it can exaggerate minor fluctuations in the data.

The possibility of overfitting exists because the criterion used for training the model is not the same as the criterion used to judge the efficacy of a model. In particular, a model is typically trained by maximizing its performance on some set of training data. However, its efficacy is determined not by its performance on the training data but by its ability to perform well on unseen data. Overfitting occurs when a model begins to "memorize" training data rather than "learning" to generalize from trend.

In the extreme case, an overfitted model fits the training data perfectly and the test data poorly. In most real-life examples, however, the effect is much more subtle and overfitting can be much harder to judge. Finally, it can happen that your training and test data are similar, so the model seems to perform fine on both sets, but it performs poorly on some new dataset because of overfitting, as in the Google Flu Trends example.
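A minimal sketch of the extreme end (synthetic data, scikit-learn assumed): an unconstrained decision tree can memorize noisy training labels almost perfectly while generalizing worse than a much simpler tree.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Small, noisy, synthetic dataset: flip_y=0.2 flips 20% of the labels,
# so a model that fits the training set perfectly is fitting noise.
X, y = make_classification(n_samples=500, n_features=20, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
deep = DecisionTreeClassifier(max_depth=None, random_state=0).fit(X_train, y_train)

for name, m in [("depth-3 tree ", shallow), ("unpruned tree", deep)]:
    print(name, "train:", round(m.score(X_train, y_train), 3),
          "test:", round(m.score(X_test, y_test), 3))
# The unpruned tree reaches ~100% train accuracy but typically scores
# lower on the test set than the shallow tree -- the classic overfitting pattern.
```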

Imagine you have data about some quantity $Y$ and its time trend (plotted below). You have observations for times 0 to 30 and decide to use the 0-20 portion as a training set and 21-30 as a hold-out sample. The model performs very well on both samples and there is an obvious linear trend, yet when you make predictions on new, previously unseen data for times greater than 30, the good fit turns out to be illusory.

[Figure: simulated time series $Y$ with an apparent linear trend; the fit looks good over times 0-30 but diverges from the data beyond time 30.]
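Here is a minimal sketch of this time-trend example; the data-generating process below is an assumption chosen only to reproduce the qualitative picture, not the actual data behind the plot:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(0, 50, 0.5)
# Assumed process: nearly linear up to time 30, then it starts to bend.
y = t + 0.02 * np.maximum(t - 30, 0) ** 2 + rng.normal(0, 1, t.size)

train = t <= 20                      # training window
holdout = (t > 20) & (t <= 30)       # hold-out sample
future = t > 30                      # genuinely new data

coef = np.polyfit(t[train], y[train], deg=1)   # fit a linear trend on 0-20
pred = np.polyval(coef, t)

for name, mask in [("train 0-20", train), ("hold-out 21-30", holdout), ("future > 30", future)]:
    rmse = np.sqrt(np.mean((y[mask] - pred[mask]) ** 2))
    print(f"{name}: RMSE = {rmse:.2f}")
# The linear fit looks fine on both the training and hold-out windows,
# but its error grows sharply beyond time 30, where the trend changes.
```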

This is an abstract example, but imagine a real-life one: you have a model that predicts sales of some product; it performs very well in summer, but autumn comes and its performance drops. Your model is overfitting to the summer data -- maybe it is good only for summer data, maybe it performed well only on this year's summer data, maybe this autumn is an outlier and the model is fine...

Tim
  • With kernel models, such as the SVM, it is not uncommon to get the best generalisation performance with zero error on the training set. IMHO, looking at the training set error causes more problems than it is worth; it is better to just look at the validation set error (although that can be over-fit as well if you tune the hyper-parameters too much). – Dikran Marsupial Mar 18 '16 at 11:27
  • Should the difference between error on the training set and the test set be taken into consideration when comparing two different models, or should whichever model gives the least error on the test set simply be selected? – Siddhesh Mar 18 '16 at 11:35
  • @Siddhesh you have two models: model1 correctly classified 2% of cases in the train set and 2% in the test set (0% difference); model2 correctly classified 90% of cases in the train set and 50% in the test set (40% difference) -- which one would you choose? The difference can suggest problems, but it does not measure model performance per se. – Tim Mar 18 '16 at 11:39
  • @Tim: But what if the test errors are comparable? Here is the question I have: http://stats.stackexchange.com/questions/202339/should-difference-between-accuracy-of-model-on-training-data-and-testing-data-be – Siddhesh Mar 18 '16 at 11:53
  • @Siddhesh as written by Dikran Marsupial, and as stated in my answer and the comment above, the difference does not have to suggest anything. The example in my answer illustrates a situation where there is no difference between the train and test sets but the model still behaves poorly on future data. – Tim Mar 18 '16 at 12:13
  • So what if we assume that we have fully represented the data that we wish to model? E.g. we are modeling a time series and sample across 100 years of data, sampled every second, to predict the temperature the next day. Does a small difference between training and test error matter then? – foboi1122 Mar 19 '16 at 01:04
  • @foboi1122 If you had 100% representative data, you would train your model on the whole dataset and you wouldn't care about overfitting, since overfitting would then mean a perfect fit to the data :) – Tim Mar 19 '16 at 06:30
  • @Tim - Let's say my dataset is imbalanced and I focus on the F1 metric only. If there is an 8-10 point drop in F1 score between my train and test sets, is that considered overfitting? Train F1 is 70 and test F1 is 62, whereas the individual metrics, precision and recall, separately differ by more than 15 points. But since I am interested only in F1, do you think an 8-10 point drop in F1 score is considered overfitting? If you wish to know more, you can refer to this post: https://datascience.stackexchange.com/questions/109416/assess-overfitting-all-model-metrics-or-only-specific-metric – The Great Mar 28 '22 at 09:30
  • @TheGreat there's no hard criterion for overfitting. – Tim Mar 28 '22 at 10:21
  • @Tim - Here on Stack Exchange, I have created a detailed post on my problem. Would you be interested in having a look? https://stats.stackexchange.com/questions/569423/how-to-not-overfit-with-little-data – The Great Mar 28 '22 at 10:26
  • @Tim - So you suggest that a 10 point drop may not necessarily be overfitting? It's up to us whether to keep this model or not. – The Great Mar 28 '22 at 10:48
  • @TheGreat it's not black & white that the model overfits or not and that you can easily tell it from the metrics. – Tim Mar 28 '22 at 10:52
  • You mean I CAN easily tell it from the metrics? But you say it's not black and white, so I am confused. How, then, do we say whether it is overfit or not? If we can easily tell from the metrics, can you share tips, based on your experience, on how to identify it from the metrics? – The Great Mar 28 '22 at 10:56
  • @TheGreat you can if you have 100% train accuracy and 0% test accuracy; everything in between is a gray area. If a 10% difference were the threshold, why not 9%, 9.99%, or 11%? – Tim Mar 28 '22 at 11:04
  • Okay, yeah. So you are saying that, in the end, it's up to the business to decide, except when there is 100 in train and 0 in test. Your argument makes sense. Do you usually leave that to the business? How do you handle such scenarios? – The Great Mar 28 '22 at 11:05
  • @TheGreat the business doesn't care what overfitting is and won't tell you that. I'm just saying that there is no threshold you could use to decide. It's subjective and problem-specific. – Tim Mar 28 '22 at 11:09
  • One last question: when we assess overfitting (given its subjective nature), is it logical to look only at the metric we are interested in, or should we look at all the metrics? I am just trying to understand how it is done and what the right approach is.