I am working on the Street View House Numbers (SVHN) dataset with a CNN in Keras on the TensorFlow backend. The network's loss is not decreasing, and I suspect I may be using the wrong loss function. After preprocessing, my training, validation, and test sets have these shapes:
('Training set and labels', (26721,32,32,1), (26721, 6) )
('Validation set and labels ', (6680,32,32,1), (6680,6) )
('Test set and labels', (13068,32,32,1), (13068,6) )
The labels array has 6 columns because the maximum number of digits in one image is 6. For example, if an image contains the two digits "1" and "2", its label row is [1,2,10,10,10,10], where 10 means "no digit".
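For illustration, the label layout described above could be one-hot encoded with NumPy as in the minimal sketch below. The names `NUM_CLASSES`, `MAX_DIGITS`, and `one_hot_labels` are hypothetical and not part of the preprocessing shown in this post; the sketch assumes every label is an integer from 0 to 10.

```python
import numpy as np

NUM_CLASSES = 11  # digits 0-9 plus class 10 for "no digit" (assumed encoding)
MAX_DIGITS = 6    # maximum number of digits per image


def one_hot_labels(labels):
    """Convert integer labels of shape (N, 6) to one-hot vectors of shape (N, 6, 11)."""
    labels = np.asarray(labels, dtype=np.int64)
    # Indexing the identity matrix with the label array picks one row per label.
    return np.eye(NUM_CLASSES, dtype=np.float32)[labels]


example = np.array([[1, 2, 10, 10, 10, 10]])
encoded = one_hot_labels(example)
print(encoded.shape)
```

Taking `encoded.argmax(axis=-1)` recovers the original integer labels, so no information is lost in the conversion.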
Now this is my model definition:
from keras.models import Sequential
from keras.layers import (Activation, BatchNormalization, Convolution2D,
                          Dense, Dropout, Flatten, MaxPooling2D, PReLU)

def modelCNN(input_shape, num_classes):
    model = Sequential()
    model.add(Convolution2D(16, 7, 4, border_mode='same',
                            input_shape=input_shape))
    model.add(PReLU())
    model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 2), border_mode='same'))
    model.add(BatchNormalization())
    model.add(Convolution2D(32, 5, 3, border_mode='same'))
    model.add(PReLU())
    model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 2), border_mode='same'))
    model.add(BatchNormalization())
    model.add(Convolution2D(64, 3, 3, border_mode='same'))
    model.add(PReLU())
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2), border_mode='same'))
    model.add(Flatten())
    model.add(Dense(512))
    model.add(PReLU())
    model.add(Dropout(0.5))
    model.add(Dense(num_classes))
    model.add(Activation('softmax'))
    return model
I have tried several different custom models, but nothing seems to work. I compile and train the model like this:
import numpy as np
import tensorflow as tf
from keras import backend as K
from keras.callbacks import CSVLogger, EarlyStopping, ModelCheckpoint

model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
print(model.summary())
csv_logger = CSVLogger('training.log')
early_stop = EarlyStopping('val_acc', patience=200, verbose=1)
model_checkpoint = ModelCheckpoint(model_save_path,
                                   'val_acc', verbose=0,
                                   save_best_only=True)
model_callbacks = [early_stop, model_checkpoint, csv_logger]
K.get_session().run(tf.global_variables_initializer())
model.fit_generator(train,
                    samples_per_epoch=np.ceil(len(train_dataset)/batch_size),
                    epochs=num_epochs,
                    verbose=1,
                    validation_data=valid,
                    validation_steps=batch_size,
                    callbacks=model_callbacks)
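For reference, the number of batches per epoch follows directly from the dataset sizes listed above. The sketch below only illustrates that arithmetic; it assumes a hypothetical batch size of 32 (the actual `batch_size` is not shown in this post), and `steps_for` is an illustrative helper, not a Keras function. In Keras 2 such values would normally be passed as `steps_per_epoch` and `validation_steps`.

```python
import math


def steps_for(num_samples, batch_size):
    """Number of batches needed to cover every sample once."""
    # float() keeps the division exact under Python 2 as well as Python 3.
    return int(math.ceil(num_samples / float(batch_size)))


batch_size = 32  # hypothetical value, not taken from the post
train_steps = steps_for(26721, batch_size)  # training set size from above
valid_steps = steps_for(6680, batch_size)   # validation set size from above
print(train_steps, valid_steps)
```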
As far as I understand, categorical_crossentropy expects one-hot encoded targets, but I am not sure how I should convert my labels array, because unlike the MNIST dataset, each image here has multiple labels.
Is there a loss function I can use for this? Or any other suggestion to make this work?
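One possible way to frame the problem, sketched here under assumptions rather than as a verified fix, is to treat each of the 6 positions as its own 11-way classification and average the cross-entropy over all positions. The NumPy code below only illustrates that computation. The function name `per_digit_cross_entropy` and the flat 66-value output layout are assumptions made for illustration; in Keras this would presumably correspond to a final `Dense(6 * 11)` layer reshaped to `(6, 11)` with a softmax over the last axis.

```python
import numpy as np

MAX_DIGITS = 6
NUM_CLASSES = 11  # digits 0-9 plus class 10 for "no digit"


def per_digit_cross_entropy(logits, labels):
    """Mean cross-entropy over all digit positions.

    logits: float array of shape (N, 66), raw outputs (assumed layout)
    labels: int array of shape (N, 6) with values 0-10
    """
    n = logits.shape[0]
    logits = logits.reshape(n, MAX_DIGITS, NUM_CLASSES)
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Pick the log-probability of the true class at every position.
    rows = np.arange(n)[:, None]
    cols = np.arange(MAX_DIGITS)[None, :]
    picked = log_probs[rows, cols, labels]
    return float(-picked.mean())


labels = np.array([[1, 2, 10, 10, 10, 10]])
uniform_logits = np.zeros((1, MAX_DIGITS * NUM_CLASSES))
# Uniform predictions give a loss of log(11), roughly 2.398.
print(per_digit_cross_entropy(uniform_logits, labels))
```

With this framing the integer labels of shape (N, 6) can be used directly, and a loss that stays near log(11) would indicate predictions no better than uniform guessing.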
Note: I am using the CNN for classification only, not for detection.
Thanks very much in advance