
Here is my model:

model = Sequential()
model.add(LSTM(4, input_shape=(look_back, trainX.shape[2])))
model.add(Dense(1))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(trainX, trainY, validation_split=0.3, epochs=100, batch_size=50, verbose=1)

and here is what I get... I see that the accuracy suddenly falls for some epochs and remains low until the end, while the validation accuracy stays quite good.

What's wrong? Is it under- or overfitting (I guess not overfitting, otherwise the validation score would be low)? How can I prevent such a thing a priori (i.e. by initializing a parameter or something like that)? Would reducing the number of epochs be a solution?

What is quite strange is that the accuracy first grows and reaches a very high value, and then suddenly falls, for no apparent reason...

Thanks for the help

2 Answers


I think the reason for the instability of your network is the missing activation function on the last layer.

As you are performing a binary classification, you should add a sigmoid activation function right after the fully-connected layer. You can do it like this:

model.add(Dense(1, activation="sigmoid"))
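Putting the answer together with the question's model, the corrected version looks like the sketch below. The shapes are hypothetical stand-ins (the question's `look_back` and `trainX.shape[2]` are not given), and the example uses the `tensorflow.keras` API:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Hypothetical shapes standing in for look_back and trainX.shape[2].
look_back, n_features = 3, 3

model = Sequential()
model.add(LSTM(4, input_shape=(look_back, n_features)))
# The sigmoid goes inside the Dense layer, not as an argument to add().
model.add(Dense(1, activation="sigmoid"))
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
```

With the sigmoid, the single output is squashed into (0, 1), so `binary_crossentropy` receives a proper probability instead of an unbounded logit.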
Tonca
  • I'll test it and give you feedback :) . Do you think I should change my number of epochs? Indeed, it's only from the 68th epoch that it falls dramatically – MysteryGuy Aug 17 '18 at 09:49
  • I put model.add(LSTM(50, return_sequences=True, input_shape=(look_back, trainX.shape[2]))) model.add(LSTM(50)) model.add(Dense(1), activation='sigmoid') and I got an "add() got an unexpected keyword argument 'activation'" error for the last layer... Why? Should I set return_sequences=True for the second LSTM layer? – MysteryGuy Aug 17 '18 at 10:06
  • 1
    I've corrected the answer, I made a typo earlier – Tonca Aug 17 '18 at 10:07
  • So return_sequences is not needed (why?)? – MysteryGuy Aug 17 '18 at 10:09
  • 1
    No, you don't need any information about the hidden states. – Tonca Aug 17 '18 at 10:14
  • But between the two LSTM layers, it seems it's necessary ? – MysteryGuy Aug 17 '18 at 10:17
  • 1
    You're right I'm sorry, it's necessary for the Dense layer as well. – Tonca Aug 17 '18 at 10:19
  • That's okay, I don't have the problem anymore BUT get another problem. Please look at https://stats.stackexchange.com/questions/362676/how-to-interpret-that-my-model-gives-no-negative-class-prediction-on-test-set – MysteryGuy Aug 17 '18 at 13:32
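To summarize the comment thread: when stacking LSTM layers, every LSTM except the last needs return_sequences=True so that the next LSTM receives the full sequence rather than only the final hidden state, and the activation belongs inside the Dense constructor. A sketch with the same hypothetical shapes as above:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

look_back, n_features = 3, 3  # hypothetical shapes

model = Sequential()
# First LSTM returns the full sequence so the second LSTM has a 3-D input.
model.add(LSTM(50, return_sequences=True, input_shape=(look_back, n_features)))
# Last LSTM returns only its final hidden state, which the Dense layer expects.
model.add(LSTM(50))
# activation is a keyword of Dense, not of model.add().
model.add(Dense(1, activation="sigmoid"))
```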

I had a similar situation.

What worked for me was to decrease the learning rate as the epochs increase.

What probably happens is that the model arrives close to some local maximum and then jumps out of it, since the learning rate is too high.

For example, you can run a few epochs with a high learning rate, optimizer = optimizers.SGD(0.02), then switch to a smaller value such as SGD(0.015), and after that to SGD(0.01).

The actual numbers might be different.
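One way to do this without restarting training by hand is a step-decay schedule. The epoch boundaries and rates below are illustrative, not the answer's exact values:

```python
# Hypothetical step decay: drop the learning rate at fixed epoch boundaries.
def step_decay(epoch):
    if epoch < 30:
        return 0.02
    elif epoch < 60:
        return 0.015
    return 0.01
```

In Keras this can be plugged into training via the LearningRateScheduler callback, e.g. `model.fit(trainX, trainY, epochs=100, callbacks=[LearningRateScheduler(step_decay)])`.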

Michael D