
I know the basics of machine learning from Andrew Ng's course, but I'm quite new to practical work with TensorFlow. I'm creating a model with 12288 nodes in the input layer, 100 in the hidden layer, and 2 in the output layer. It receives low-resolution images and classifies whether they contain a cat or not, but when I plot the graphs, they look like this:

[plot: training/validation accuracy over epochs, oscillating heavily]

[plot: training/validation loss over epochs, decreasing consistently]

I tried to find causes for this on sites that offer some plot analysis like this one, but I wasn't successful. I also know that I can tweak the learning rate, regularization, number of nodes in the hidden layer, etc., but I'd like to understand the causes of this problem. So what could cause an accuracy graph that oscillates a LOT alongside a consistently decreasing loss graph?

More info:

  1. The 12288 input nodes come from a low-resolution RGB image (64 * 64 * 3 = 12288), one for each pixel of each color channel. The values originally range from 0 to 255, but I normalized them to 0–1 by dividing by 255.

  2. My 3 layers are as follows:

    model.add(layers.Flatten())
    model.add(layers.Dense(100, kernel_regularizer=tf.keras.regularizers.l2(0.1), kernel_initializer="random_uniform", bias_initializer="random_uniform", activation="relu"))
    model.add(layers.Dense(2, kernel_initializer="random_uniform", bias_initializer="random_uniform", activation="softmax"))

  3. My model.compile is as follows:

    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.005), loss="categorical_crossentropy", metrics=["accuracy"])

  4. My model.fit is as follows (the pieces are consolidated into one runnable sketch below):

    results = model.fit(Xtrainset, ytrainset, validation_data=(Xtestset, ytestset), batch_size=64, epochs=300, verbose=1)
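Putting the snippets above together, here is a consolidated sketch of the whole setup. The Sequential wrapper, imports, input_shape, and the explicit /255 normalization are filled in here as assumptions; Xtrainset, ytrainset, Xtestset, ytestset are assumed to be loaded elsewhere as (N, 64, 64, 3) pixel arrays and one-hot (N, 2) labels:

    import tensorflow as tf
    from tensorflow.keras import layers

    # Pixel values are normalized from 0-255 to 0-1 by dividing by 255.
    Xtrainset, Xtestset = Xtrainset / 255.0, Xtestset / 255.0

    model = tf.keras.Sequential()
    model.add(layers.Flatten(input_shape=(64, 64, 3)))  # 64 * 64 * 3 = 12288 inputs
    model.add(layers.Dense(100,
                           kernel_regularizer=tf.keras.regularizers.l2(0.1),
                           kernel_initializer="random_uniform",
                           bias_initializer="random_uniform",
                           activation="relu"))
    model.add(layers.Dense(2,
                           kernel_initializer="random_uniform",
                           bias_initializer="random_uniform",
                           activation="softmax"))
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.005),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])

    results = model.fit(Xtrainset, ytrainset,
                        validation_data=(Xtestset, ytestset),
                        batch_size=64, epochs=300, verbose=1)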

N. Joppi
  • It is really impossible to know just by looking at the figure.

    Both curves for the training and validation sets are very unstable, meaning that small changes in your network's weights are causing large changes in the predictions.

    You mention that your network consists of 12,000 inputs. Does this mean you are inputting the images as simply a vector of pixel values? Also, what is your loss function and what optimization method are you using?

    – KirkD_CO Jul 22 '22 at 18:19
  • Hey! I added a "More info" section at the bottom of my question for you, with this information – N. Joppi Jul 22 '22 at 19:14
  • How many images do you have? How large is the validation set? You should be aware that accuracy is a very poor way to assess classifiers (cf. Why is accuracy not the best measure for assessing classification models?). Among other things, it can fluctuate wildly due to its discrete nature. That would be my first guess here. – gung - Reinstate Monica Jul 22 '22 at 20:15
  • Very few images: 209 for training, 50 for testing/validation – N. Joppi Jul 22 '22 at 20:16

1 Answer


I agree with gung's comment regarding accuracy.

The instability you're seeing is likely due to the small size of the training and test sets and the nature of your data. With such a small training set but such a large number of inputs, your model is likely learning noise as much as or more than the real features that distinguish cats. A change in the classification of even just 1 image in the 50-image test set is a 2% change in accuracy.

The increasing instability in training-set accuracy is also likely due to learning noise, with classifications shifting from epoch to epoch.
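To make that concrete, here is a minimal illustrative sketch with made-up numbers (not the asker's data): with 50 validation images, one borderline prediction crossing the 0.5 threshold moves accuracy by a full 2%, while the cross-entropy loss barely changes.

    import numpy as np

    y_true = np.ones(50)              # pretend all 50 validation images are cats
    p_before = np.full(50, 0.52)      # borderline predictions, all just "correct"
    p_after = p_before.copy()
    p_after[0] = 0.48                 # a tiny weight update flips ONE prediction

    def accuracy(p):
        return np.mean((p > 0.5) == y_true)

    def log_loss(p):
        return np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

    print(accuracy(p_before), accuracy(p_after))   # 1.00 -> 0.98: a 2% jump
    print(log_loss(p_before), log_loss(p_after))   # ~0.6539 -> ~0.6555: barely moves

This is why a proper scoring rule like log-loss gives a much smoother picture of training progress than accuracy on a 50-image set.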

KirkD_CO
  • In my case I can't get more images, so I'll have to work with the 209. Could it be better in this case to lower the resolution of the images so there are fewer inputs? Or maybe to use the same images multiple times, each time shifted 1 pixel up/down or left/right, so the pixels activate different neurons? – N. Joppi Jul 24 '22 at 02:03
  • Data augmentation may help - here's a good tutorial - https://machinelearningmastery.com/how-to-configure-image-data-augmentation-when-training-deep-learning-neural-networks/

    Also, I see you have 100 neurons in your hidden layer. I would reduce the size of the hidden layer to see if you can get similar performance with a less complex model. I would start with only 10 and move up in steps of 10 from there to see what works.

    Another minor point is that you have 2 neurons in the output layer. Given this is a binary classification problem, you could reduce that to one neuron with a sigmoid output (see the sketch after this thread).

    – KirkD_CO Aug 08 '22 at 12:57
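A minimal sketch of the shift-based augmentation and the single-sigmoid-output variant discussed above, assuming Keras's ImageDataGenerator. The regularizer and initializers from the question are omitted for brevity, and ytrain_binary/ytest_binary are hypothetical names for the one-hot labels collapsed to a 0/1 vector:

    import tensorflow as tf
    from tensorflow.keras import layers

    # Shift each image by up to 1 pixel in each direction (an integer
    # argument means a pixel range), as proposed in the comments.
    datagen = tf.keras.preprocessing.image.ImageDataGenerator(
        width_shift_range=1,
        height_shift_range=1,
        fill_mode="nearest",
    )

    # Single-output variant: one sigmoid neuron with binary cross-entropy.
    # e.g. ytrain_binary = ytrainset[:, 1] if ytrainset is one-hot.
    model = tf.keras.Sequential([
        layers.Flatten(input_shape=(64, 64, 3)),
        layers.Dense(10, activation="relu"),    # smaller hidden layer, per the comment
        layers.Dense(1, activation="sigmoid"),  # one output neuron instead of two
    ])
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.005),
                  loss="binary_crossentropy", metrics=["accuracy"])

    # flow() yields augmented batches indefinitely; Keras infers epoch length.
    results = model.fit(datagen.flow(Xtrainset, ytrain_binary, batch_size=64),
                        validation_data=(Xtestset, ytest_binary),
                        epochs=300)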