The dataset we are using consists of ~3000 images, split 60/40 into training and testing sets. For hyperparameter tuning we have tried sklearn's GridSearchCV and RandomizedSearchCV, Bayesian optimization, and a Hyperband implementation. With all of these methods we get around 96% accuracy on the training set and around 78% on the test set. Without changing the dataset size or the partition split, and without augmenting the data in any way, we want to increase test accuracy as much as possible. Overfitting is most likely occurring. For cross-validation we are using sklearn's StratifiedKFold with n_splits=10. We are using an SVC for classification, and there are two classes (pictures of wind turbines and pictures without wind turbines).
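For reference, the setup described above can be sketched roughly as follows. This is a minimal sketch, not the actual code: the synthetic data from `make_classification` stands in for the flattened image features, and the `StandardScaler` step and the parameter grid are assumptions added for illustration. It checks that each fold keeps the class ratio and reports the train/validation gap for the best candidate.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the image features (assumption): roughly a
# 24% / 76% class balance, similar to an imbalanced two-class training set.
X, y = make_classification(
    n_samples=400, n_features=20, weights=[0.24, 0.76], random_state=0
)

# Stratified 10-fold CV, as in the question.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Fraction of class 1 in each validation fold; stratification keeps
# these close to the overall class ratio.
fold_ratios = [y[val_idx].mean() for _, val_idx in cv.split(X, y)]

# Scaling inside the pipeline is an assumption; it is refit on each
# training fold so no validation data leaks into preprocessing.
pipe = make_pipeline(StandardScaler(), SVC())

# Hypothetical grid, for illustration only.
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01]}

search = GridSearchCV(pipe, param_grid, cv=cv, return_train_score=True)
search.fit(X, y)

best = search.best_index_
mean_train = search.cv_results_["mean_train_score"][best]
mean_val = search.cv_results_["mean_test_score"][best]
print("class-1 ratio per fold:", np.round(fold_ratios, 2))
print(f"train accuracy {mean_train:.3f} vs validation accuracy {mean_val:.3f}")
```

A large gap between the two printed accuracies for the selected parameters would be consistent with the overfitting described above.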

Is there a better cross-validation method to use, ideally one that still preserves the class ratios in each fold? Or are there any other suggestions for preventing overfitting?

  • Approximately how many of the images are of wind turbines vs. not? – Sterling Oct 28 '22 at 21:41
  • You may consider using nested CV with stratified CV. https://scikit-learn.org/stable/auto_examples/model_selection/plot_nested_cross_validation_iris.html, https://stats.stackexchange.com/questions/357926/intuitive-explanation-of-stratified-cross-validation-and-nested-cross-validation – Sterling Oct 28 '22 at 21:44
  • Training set consists of 1820 WT/572 NWT. Testing set is split in half, 780 WT/780 NWT – Colton Seegmiller Oct 28 '22 at 21:57
  • Ok, thanks. Are you attached to the idea of using SVC? Not related to the stratification, but you might consider using a FastAI classification model (see for example, https://docs.fast.ai/tutorial.medical_imaging.html). Or you could use skorch https://medium.datadriveninvestor.com/train-a-cnn-using-skorch-for-mnist-digit-recognition-53d7d2f971c7. My guess is that you'll only get so much mileage with SVC, even with a ton of hyperparameter tuning. – Sterling Oct 28 '22 at 22:23
  • Ya I'm required to stick with an SVC. I think I'm getting as good as I can get as well, especially since the images I'm given are at 5% resolution. Thank you! – Colton Seegmiller Oct 30 '22 at 00:39
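
The nested, stratified cross-validation suggested in the comments could look something like the sketch below. It is only an illustration under stated assumptions: the data is synthetic (`make_classification`), the parameter grid is hypothetical, and 5 folds are used in both loops purely to keep the run short. The inner loop tunes the SVC; the outer loop scores the whole tuning procedure on folds it never saw during tuning.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Synthetic, imbalanced two-class data standing in for the image features (assumption).
X, y = make_classification(
    n_samples=300, n_features=20, weights=[0.24, 0.76], random_state=0
)

# Both loops are stratified, so every fold preserves the class ratio.
inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=2)

# Hypothetical grid, for illustration only.
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01]}

# Inner loop: hyperparameter search on each outer training split.
tuned_svc = GridSearchCV(SVC(), param_grid, cv=inner_cv)

# Outer loop: accuracy of the tuned model on held-out outer folds.
nested_scores = cross_val_score(tuned_svc, X, y, cv=outer_cv)
print(f"nested CV accuracy: {nested_scores.mean():.3f} +/- {nested_scores.std():.3f}")
```

The nested score does not by itself reduce overfitting; it gives an estimate of generalization that is less optimistic than the best score reported by the search alone.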
