In my research group we are discussing whether it is possible to say that a model is overfitting just by comparing its training and test errors, without knowing anything else about the experiment.
PS: I am personally interested in small, non-redundant datasets (i.e., without duplicates or very similar instances), say 100 instances, and in classifiers with few or no parameters to tune, like decision trees (which is why I have no validation error to mention at all).
I can think of some arguments against the comparison stated in the title of this question:
- A comparison to the random-guessing error on the test set (i.e., always betting on the majority class) seems more informative; see the sketch after this list.
- Depending on the complexity and noise level of the data, the tendency to overfit can be increased or attenuated.
- Depending on the classifier, the data can perfectly match its representation bias (e.g., a linearly separable problem and a linear model) or, at the other extreme, every instance can be fit exactly by the classifier (k-NN with k=1).
- Ensembles can achieve 100% training accuracy without hurting test accuracy; see this apparent paradox discussed on page 82 here: link
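
To make the baseline point concrete, here is a minimal sketch, assuming scikit-learn and a synthetic stand-in for my ~100-instance dataset (`X`, `y`, and the random-forest settings are placeholders, not my actual data or code), comparing a classifier's LOO accuracy against a majority-class dummy:

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Hypothetical small dataset (~100 instances, 5 classes) standing in for mine.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = rng.integers(0, 5, size=100)

loo = LeaveOneOut()

# "Random error": always bet on the majority class of the training fold.
baseline = DummyClassifier(strategy="most_frequent")
base_acc = cross_val_score(baseline, X, y, cv=loo).mean()

# Classifier under test (settings mirror the table below).
model = RandomForestClassifier(n_estimators=1000, random_state=0)
model_acc = cross_val_score(model, X, y, cv=loo).mean()

print(f"majority-class baseline LOO accuracy: {base_acc:.3f}")
print(f"random-forest LOO accuracy:           {model_acc:.3f}")
```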
Below is one of my results, using Leave-One-Out (LOO) for example (with 10x10-fold cross-validation the results were similar). The standard-deviation column can be ignored, since the standard deviation is not meaningful in LOO:
| classifier | train accuracy / std. dev. | test accuracy / std. dev. |
|---|---|---|
| random forest w/ 1000 trees | 1.000 / 0.000 | 0.479 / 0.502 |
| k-NN, k = 5 neighbors | 0.613 / 0.019 | 0.479 / 0.501 |
| C4.5 w/ 5 trees | 0.732 / 0.018 | 0.500 / 0.503 |
| **random guessing** | 0.372 / 0.005 | 0.372 / 0.486 |
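
For reference, a hedged sketch of how train/test numbers like those above could be produced under LOO, again assuming scikit-learn with placeholder data; the classifiers here are only stand-ins for the ones in the table:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import LeaveOneOut, cross_validate

# Hypothetical 100-instance dataset; replace with the real one.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = rng.integers(0, 5, size=100)

classifiers = {
    "random forest (1000 trees)": RandomForestClassifier(n_estimators=1000, random_state=0),
    "k-NN (k=5)": KNeighborsClassifier(n_neighbors=5),
}

for name, clf in classifiers.items():
    # return_train_score gives the accuracy on each 99-instance training fold,
    # while test_score is the 0/1 accuracy on each left-out instance.
    res = cross_validate(clf, X, y, cv=LeaveOneOut(), return_train_score=True)
    print(f"{name}: "
          f"train {res['train_score'].mean():.3f}/{res['train_score'].std():.3f}  "
          f"test {res['test_score'].mean():.3f}/{res['test_score'].std():.3f}")
```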
Histogram of classes:
35
Histogram of classes predicted by the random forest on the test set:
43 <- A
32 <- B
18 <- C
1 <- D
0 <- E
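
For completeness, a small sketch (same assumptions as above: scikit-learn, placeholder `X`/`y`) of how such a histogram of LOO predictions can be obtained, by collecting the single prediction each left-out instance receives:

```python
from collections import Counter

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# Hypothetical data; replace with the real dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = rng.integers(0, 5, size=100)

# Each instance is predicted exactly once, by the model trained on the other 99.
pred = cross_val_predict(RandomForestClassifier(n_estimators=1000, random_state=0),
                         X, y, cv=LeaveOneOut())

print("histogram of true classes:     ", sorted(Counter(y).items()))
print("histogram of predicted classes:", sorted(Counter(pred).items()))
```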
