A huge gap between training and validation accuracy, confusion with the concept of Overfitting

Question

I have a fairly small dataset: 12 classes with 100 examples each. Across all the CNN models I have tried, the pattern is the same: training accuracy plateaus at 97%, but validation accuracy is 7-8%, which is no better than random guessing.

So where is the problem?

Is my dataset too small?
Is my code wrong? (I am not asking for code advice, just a conceptual question)
Something else

If you have 12 classes and can decide which one is the majority class, then simply classifying everything as that majority class will give you an accuracy of at least 100%/12 = 8.3%. See also: "Why is accuracy not the best measure for assessing classification models?" – Stephan Kolassa, Aug 03 '18 at 16:37
Before going gung-ho with "all the CNN models", have you tried a plain multinomial logistic regression to get some idea of baseline performance? – usεr11852, Aug 03 '18 at 23:56
@Acccumulation No, I haven't used k-fold validation, but I will definitely add it. – rishabhBudhouliya, Aug 04 '18 at 12:59
@usεr11852 Actually, I have tried an SVM with multiple kernels and could reach 83% accuracy, but my assigned work is to achieve >95% accuracy, and to do so, I plan to increase the dataset by various methods. – rishabhBudhouliya, Aug 04 '18 at 13:03
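
The chance-level figure from the first comment can be checked with a few lines of Python. This is only a sketch: it assumes the balanced setup described in the question (12 classes, 100 examples each), the helper name `majority_class_accuracy` is made up for illustration, and `reported_val` is an assumed midpoint of the 7-8% quoted in the question.

```python
from collections import Counter


def majority_class_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    _, majority_count = counts.most_common(1)[0]
    return majority_count / len(labels)


# Assumed setup from the question: 12 balanced classes, 100 examples each.
labels = [c for c in range(12) for _ in range(100)]
baseline = majority_class_accuracy(labels)

# Assumed midpoint of the 7-8% validation accuracy quoted in the question.
reported_val = 0.075

print(f"chance-level accuracy: {baseline:.1%}")
print(f"reported validation accuracy beats chance: {reported_val > baseline}")
```

A validation accuracy at or below this baseline means the model is doing no better on unseen data than always predicting a single class.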

rinspy · Answer 1 · 2018-08-03T15:45:53.603

13

Sounds like you are severely overfitting. Basically, you need to use a simpler model than the one you are currently using or collect (a lot) more data. Generally, the more data you have, the more complex a model you can fit without overfitting.

I do not think you are going to get meaningful results using a CNN on such a small dataset. Start with a simple decision tree with 1 to 3 levels to establish a benchmark. Maybe try linear models with high regularization. You are looking for poor performance (but better than random) on the training set and similar performance on the validation set. Then you can start trying more complex models that fit the training set better and maybe generalize to the validation set a bit better, too.
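
The benchmarking procedure described above could be sketched with scikit-learn roughly as follows. This is an illustration under assumptions, not the answerer's code: the data is a synthetic stand-in generated with `make_classification` (replace it with the real features and labels), and `C=0.01` is an arbitrary choice of strong regularization.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real data (an assumption):
# 12 classes with roughly 100 examples each.
X, y = make_classification(
    n_samples=1200, n_features=20, n_informative=10, n_redundant=0,
    n_classes=12, n_clusters_per_class=1, random_state=0,
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Simple benchmark models: shallow trees (1 to 3 levels) and a
# strongly regularized linear model (small C means strong regularization).
models = {
    f"tree_depth_{d}": DecisionTreeClassifier(max_depth=d, random_state=0)
    for d in (1, 2, 3)
}
models["logreg_C_0.01"] = LogisticRegression(C=0.01, max_iter=1000)

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    train_acc = model.score(X_train, y_train)
    val_acc = model.score(X_val, y_val)
    results[name] = (train_acc, val_acc)
    print(f"{name}: train={train_acc:.3f}  val={val_acc:.3f}  "
          f"gap={train_acc - val_acc:.3f}")
```

The number to watch is the gap between the two accuracies: a small gap with modest accuracy is the starting point, and model complexity is increased only while validation accuracy keeps improving.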

edited Aug 03 '18 at 15:45

answered Aug 03 '18 at 14:57

5

I disagree that a CNN could not be used in this case, although I do think that training a CNN from scratch will not work without overfitting. This could be a good candidate for transfer learning: using a CNN pretrained on ImageNet to extract image features, then classifying those features with a linear SVM, can work well for small datasets. – J_Heads Aug 03 '18 at 17:41
5

Never underestimate the ability of a ninth order polynomial to fit your data. – Joshua Aug 03 '18 at 23:29
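
The point of the last comment can be illustrated in plain Python. The sketch below is a toy under stated assumptions: the ground truth is a made-up straight line with Gaussian noise, and the ninth-order polynomial is obtained by Lagrange interpolation, since a degree-9 polynomial fitted to 10 points passes through all of them.

```python
import random


def lagrange_predict(xs, ys, x):
    """Evaluate at x the unique polynomial of degree len(xs)-1 through (xs, ys)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += yi * weight
    return total


def mse(xs, ys, predict):
    """Mean squared error of predict over the points (xs, ys)."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


random.seed(0)


def true_fn(x):
    # Hypothetical ground truth: a straight line.
    return 2.0 * x + 1.0


# 10 noisy training points, and 9 validation points placed between them.
train_x = [i / 9 for i in range(10)]
train_y = [true_fn(x) + random.gauss(0.0, 0.1) for x in train_x]
val_x = [(i + 0.5) / 9 for i in range(9)]
val_y = [true_fn(x) + random.gauss(0.0, 0.1) for x in val_x]


def poly9(x):
    # Ninth-order polynomial through all 10 training points.
    return lagrange_predict(train_x, train_y, x)


train_mse = mse(train_x, train_y, poly9)
val_mse = mse(val_x, val_y, poly9)
print(f"train MSE: {train_mse:.6f}")
print(f"validation MSE: {val_mse:.6f}")
```

The polynomial reproduces every training point, noise included, so its training error says nothing about how it behaves between those points. That is the same pattern as 97% training accuracy next to chance-level validation accuracy.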
