I am trying to train a NN model in MATLAB to predict the amount of overflow for flooded junctions in an urban runoff system, and I have 45 samples and 15 features. The issue is, I don't think 45 samples are quite enough for predictive modeling. Is there anything I can do to have a good model without the need to augment my dataset (maybe training another model instead of a NN is a better choice?), or should I just jump right into data augmentation techniques? Is data augmentation even a wise choice?

Ari
  • 11

1 Answer

As I wrote yesterday, there are issues with data augmentation. If you have a small sample size, you put yourself at a similar risk of overfitting as you would by fitting a complex model, and if you have a large sample size where that is not such a concern, then I question the need to synthesize artificial data.

With just $45$ observations, you lack the sample size to do sophisticated modeling like neural networks (unless you just want to learn the mechanics of writing neural network code). Unless this overflow follows a simple pattern, consistently strong performance is unlikely.

My suggestion is to work with a simple model like a linear regression on a few features, perhaps three to five, following the rule of thumb of one feature per $10$-$15$ observations. This is unlikely to achieve the kind of performance you would get from sophisticated modeling on a large data set, but it is probably all your data will allow.
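A minimal sketch of what this looks like, here in Python with NumPy on hypothetical synthetic data standing in for your flooding dataset (in MATLAB the equivalent would be `fitlm` on a few selected columns of your feature matrix). The feature count `k = 3` and the data-generating coefficients are illustrative assumptions, not anything derived from your problem:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the 45-sample, 15-feature overflow dataset.
n_samples, n_features = 45, 15
X = rng.normal(size=(n_samples, n_features))
# Assume (for illustration) only the first 3 features drive the overflow.
y = X[:, :3] @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=n_samples)

# Keep only a few features, per the ~1 feature per 10-15 observations
# rule of thumb, then fit ordinary least squares with an intercept.
k = 3
X_small = np.column_stack([np.ones(n_samples), X[:, :k]])
coef, residuals, rank, _ = np.linalg.lstsq(X_small, y, rcond=None)

print(coef)  # intercept followed by 3 slope estimates
```

Which three to five features to keep is the real modeling decision; domain knowledge about the runoff system is a better guide than automated selection at this sample size.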

Dave
  • 62,186