
I am trying to implement this paper on a set of medical images. I am doing it in Keras. The network essentially consists of four conv + max-pool layers, followed by a fully connected layer and a softmax classifier.

As far as I can tell, I have followed the architecture described in the paper. However, the validation loss and accuracy remain flat throughout training; the accuracy seems stuck at ~57.5%.

Any help on where I might be going wrong would be greatly appreciated.

My code:

from keras.models import Sequential
from keras.layers import Activation, Dropout, Dense, Flatten  
from keras.layers import Convolution2D, MaxPooling2D
from keras.optimizers import SGD
from keras.utils import np_utils
from PIL import Image
import numpy as np
from sklearn.utils import shuffle
from sklearn.cross_validation import train_test_split
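# (note: sklearn.cross_validation was removed in scikit-learn 0.20; newer releases provide train_test_split in sklearn.model_selection)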
import theano
import os
import glob
import cv2
from matplotlib import pyplot as plt

nb_classes = 2
img_rows, img_cols = 100,100
img_channels = 3


#################### DATA DIRECTORY SETTING######################

data = '/home/raghuram/Desktop/data'
os.chdir(data)
file_list = os.listdir(data)
##################################################################

## Test lines
#I = cv2.imread(file_list[1000])
#print np.shape(I)
####
non_responder_file_list = glob.glob('0_*FLAIR_*.png')
responder_file_list = glob.glob('1_*FLAIR_*.png')
print len(non_responder_file_list),len(responder_file_list)

labels = np.ones((len(file_list)),dtype = int)
labels[0:len(non_responder_file_list)] = 0
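# NB: this labelling assumes the 0_* (non-responder) files come first in file_list; os.listdir order is arbitrary, so the first len(non_responder_file_list) entries are treated as class 0 regardless of their actual names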
immatrix = np.array([np.array(cv2.imread(data+'/'+image)).flatten() for image in file_list])
#img = immatrix[1000].reshape(100,100,3)
#plt.imshow(img,cmap = 'gray')


data,Label = shuffle(immatrix,labels, random_state=2)
train_data = [data,Label]
X,y = (train_data[0],train_data[1])
# Also need to look at how to preserve spatial extent in the conv network
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=4)
X_train = X_train.reshape(X_train.shape[0], 3, img_rows, img_cols)
X_test = X_test.reshape(X_test.shape[0], 3, img_rows, img_cols)
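# NB: cv2.imread returns height x width x channel; reshaping the flattened vector directly to (3, rows, cols) interleaves pixel values across the "channel" planes rather than separating B, G and R - reshaping to (rows, cols, 3) and transposing would give true channel planes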
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')

X_train /= 255
X_test /= 255

Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

model = Sequential()

## First conv layer and its activation followed by the max-pool layer#
model.add(Convolution2D(16,5,5, border_mode = 'valid', subsample = (1,1), init = 'glorot_normal',input_shape = (3,100,100))) # Glorot normal is similar to Xavier initialization
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size = (2,2),strides = None))
# Output is 48x48 (5x5 'valid' conv: 100 -> 96, then 2x2 pool: 96 -> 48)

print 'First layer setup'
###########################Second conv layer#################################
model.add(Convolution2D(32,3,3,border_mode = 'same', subsample = (1,1),init = 'glorot_normal'))
model.add(Activation('relu'))
model.add(Dropout(0.6))
model.add(MaxPooling2D(pool_size = (2,2),strides = None))
#############################################################################

print ' Second layer setup'
# Output is 24x24

##########################Third conv layer###################################
model.add(Convolution2D(64,3,3, border_mode = 'same', subsample = (1,1), init = 'glorot_normal'))
model.add(Activation('relu'))
model.add(Dropout(0.6))
model.add(MaxPooling2D(pool_size = (2,2),strides = None))
#############################################################################
# Output is 12x12

print ' Third layer setup'
###############################Fourth conv layer#############################
model.add(Convolution2D(128,3,3, border_mode = 'same', subsample = (1,1), init = 'glorot_normal'))
model.add(Activation('relu'))
model.add(Dropout(0.6))
model.add(MaxPooling2D(pool_size = (2,2),strides = None))
############################################################################# 

print 'Fourth layer setup'

# Output is 6x6x128
# Flatten the 6x6x128 output into a 128*6*6 vector for the fully connected softmax layer
model.add(Flatten()) 
model.add(Dense(2,init = 'glorot_normal',input_dim = 128*6*6))
model.add(Dropout(0.6))
model.add(Activation('softmax'))

print 'Setting up fully connected layer'
print 'Now compiling the network'
sgd = SGD(lr=0.01, decay=1e-4, momentum=0.6, nesterov=True)
model.compile(loss = 'mse',optimizer = 'sgd', metrics=['accuracy'])

# Fit the network to the data#
print 'Network setup successfully. Now fitting the network to the data'
model.fit(X_train,Y_train,batch_size = 100, nb_epoch = 20, validation_split = 0.2,verbose = 1) # hold out 20% of the training data to report the validation metrics described above
print 'Testing'
loss,accuracy = model.evaluate(X_test,Y_test,batch_size = 32,verbose = 1)
print "Test fraction correct (Accuracy) = {:.2f}".format(accuracy)
Ethan
pseudomonas

2 Answers

It seems that you are using MSE as the loss function. From a glimpse at the paper, it looks like they use NLL (cross-entropy). MSE is considered sensitive to data imbalance, among other issues, and it may be the cause of the problem you are experiencing. I would try training with the categorical_crossentropy loss in your case. Moreover, a learning rate of 0.01 seems too large; I would play with it and try 0.001 or even 0.0001.
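For reference, a minimal sketch of the suggested change, using the same Keras 1 API as the question (the 0.001 learning rate is just a starting point to tune, not a value from the paper). Note also that the original compile call passes the string 'sgd', which makes Keras build a default SGD and silently ignore the configured sgd object; passing the object itself is what applies the chosen learning rate and momentum:

sgd = SGD(lr=0.001, decay=1e-4, momentum=0.6, nesterov=True)
model.compile(loss = 'categorical_crossentropy', optimizer = sgd, metrics=['accuracy']) # pass the object, not the string 'sgd'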

koltun

Though I am a bit late here, I would like to put in my two cents, as it helped me solve a similar issue recently. What came to my rescue, besides the categorical cross-entropy loss, was scaling the features into the (0, 1) range. That said, feature scaling only helps when the features are on different scales and vary by orders of magnitude relative to one another, as they did in my case. Scaling can also be very useful with the hinge loss, since max-margin classifiers are generally sensitive to the distances between feature values. Hope this helps some future visitors!
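As an illustration, here is a minimal sketch of (0, 1) min-max scaling with scikit-learn's MinMaxScaler, fitted on the training split only so no test-set statistics leak in (the variable names follow the question's code):

from sklearn.preprocessing import MinMaxScaler

# fit the scaler on the flattened training images, then apply the same transform to the test set
scaler = MinMaxScaler(feature_range=(0, 1))
X_train = scaler.fit_transform(X_train.reshape(X_train.shape[0], -1)).reshape(X_train.shape)
X_test = scaler.transform(X_test.reshape(X_test.shape[0], -1)).reshape(X_test.shape)

For uint8 images, the question's X_train /= 255 already lands in (0, 1); per-feature scaling mainly matters when the features live on very different scales.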

Saurav--
  • How about LSTM, please? I used MSE loss and got the same issue as above (constant val_loss and loss). – Avv Dec 20 '21 at 15:16