I am playing with the Iris dataset and want to see underfitting and overfitting in action. I am using a multilayer perceptron (2 layers).
The problem is that I can neither underfit nor overfit the data (see the plot below). I understand why I cannot underfit: that can happen when the data is easily separable. But why can't I overfit? The dataset capacity is 600 (number of samples (150) times number of features (4)), so I should be able to overfit with a network whose capacity is larger than that. I have tried multilayer perceptrons with a total number of parameters ranging from 15 to ~32000, but neither underfitting nor overfitting happens. What is going on? Does overfitting fail to appear for the same reason, namely that the data is easily separable? Thank you!
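For reference, here is a rough sketch of how the parameter counts compare with the 600 figure. It assumes a fully connected network with one hidden layer, bias terms, 4 inputs, and 3 output classes; my actual architecture may differ slightly, and `count_mlp_params` is just a helper name made up for this illustration.

```python
def count_mlp_params(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the width of every layer, input first and output last,
    e.g. [4, 8, 3] for 4 features, 8 hidden units, 3 classes.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # weight matrix between the two layers
        total += n_out         # one bias per unit in the receiving layer
    return total


# Iris: 150 samples, 4 features, 3 classes.
n_samples, n_features, n_classes = 150, 4, 3
dataset_size = n_samples * n_features  # the "600" figure used above

# Hidden widths are illustrative assumptions, not the exact ones I trained.
for hidden in (1, 8, 75, 4000):
    n_params = count_mlp_params([n_features, hidden, n_classes])
    print(hidden, n_params, n_params > dataset_size)
```

Under these assumptions the parameter count is 8h + 3 for h hidden units, so it passes 600 at around 75 hidden units, and a few thousand hidden units gives roughly 32000 parameters.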
