In my research group we are discussing whether it is possible to say that a model is overfitting just by comparing its training and test errors, without knowing anything else about the experiment.
PS: I am personally interested in small, non-redundant datasets (i.e., without duplicates or very similar instances), say 100 instances, and in classifiers with few or no parameters to tune, like decision trees (which is why I have no validation error to mention at all).
I can think of some arguments against the comparison stated in the title of this question:
- A comparison to the random-guessing error on the test set (i.e., always betting on the majority class) seems more informative; see the sketch after this list.
- Depending on the complexity and noise level of the data, the tendency to overfit can be increased or attenuated.
- Depending on the classifier, the data can perfectly match its representation bias (e.g., a linearly separable problem and a linear model) or, at the other extreme, every instance can be fit exactly by the classifier (k-NN with k=1).
- Ensembles can achieve 100% training accuracy without hurting test accuracy; see this apparent paradox discussed on page 82 here: link
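
To make the baseline point concrete, here is a minimal sketch, assuming scikit-learn and a synthetic stand-in for my ~100-instance dataset (`X`, `y`, and the random-forest settings are placeholders, not my actual data or code), comparing a classifier's LOO accuracy against a majority-class dummy:

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Hypothetical small dataset (~100 instances, 5 classes) standing in for mine.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = rng.integers(0, 5, size=100)

loo = LeaveOneOut()

# "Random error": always bet on the majority class of the training fold.
baseline = DummyClassifier(strategy="most_frequent")
base_acc = cross_val_score(baseline, X, y, cv=loo).mean()

# Classifier under test (settings mirror the table below).
model = RandomForestClassifier(n_estimators=1000, random_state=0)
model_acc = cross_val_score(model, X, y, cv=loo).mean()

print(f"majority-class baseline LOO accuracy: {base_acc:.3f}")
print(f"random-forest LOO accuracy:           {model_acc:.3f}")
```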
Below is one of my results, using Leave-One-Out (LOO) for example (with 10x10-fold cross-validation the results were similar). The standard-deviation column can be ignored, since the standard deviation is not meaningful in LOO:
| classifier | train accuracy / std. dev. | test accuracy / std. dev. |
|---|---|---|
| random forest w/ 1000 trees | 1.000 / 0.000 | 0.479 / 0.502 |
| k-NN, k = 5 neighbors | 0.613 / 0.019 | 0.479 / 0.501 |
| C4.5 w/ 5 trees | 0.732 / 0.018 | 0.500 / 0.503 |
| **random guessing** | 0.372 / 0.005 | 0.372 / 0.486 |
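
For reference, a hedged sketch of how train/test numbers like those above could be produced under LOO, again assuming scikit-learn with placeholder data; the classifiers here are only stand-ins for the ones in the table:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import LeaveOneOut, cross_validate

# Hypothetical 100-instance dataset; replace with the real one.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = rng.integers(0, 5, size=100)

classifiers = {
    "random forest (1000 trees)": RandomForestClassifier(n_estimators=1000, random_state=0),
    "k-NN (k=5)": KNeighborsClassifier(n_neighbors=5),
}

for name, clf in classifiers.items():
    # return_train_score gives the accuracy on each 99-instance training fold,
    # while test_score is the 0/1 accuracy on each left-out instance.
    res = cross_validate(clf, X, y, cv=LeaveOneOut(), return_train_score=True)
    print(f"{name}: "
          f"train {res['train_score'].mean():.3f}/{res['train_score'].std():.3f}  "
          f"test {res['test_score'].mean():.3f}/{res['test_score'].std():.3f}")
```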
Histogram of classes:
35
Histogram of classes predicted by the random forest on the test set:
43 <- A
32 <- B
18 <- C
1 <- D
0 <- E
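
For completeness, a small sketch (same assumptions as above: scikit-learn, placeholder `X`/`y`) of how such a histogram of LOO predictions can be obtained, by collecting the single prediction each left-out instance receives:

```python
from collections import Counter

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# Hypothetical data; replace with the real dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = rng.integers(0, 5, size=100)

# Each instance is predicted exactly once, by the model trained on the other 99.
pred = cross_val_predict(RandomForestClassifier(n_estimators=1000, random_state=0),
                         X, y, cv=LeaveOneOut())

print("histogram of true classes:     ", sorted(Counter(y).items()))
print("histogram of predicted classes:", sorted(Counter(pred).items()))
```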
