I am creating a CNN-LSTM model to forecast sequential simulation data. At the moment I am not sure where the best place to use Dropout is in a CNN-LSTM architecture: between the CNN and LSTM layers, or after the LSTM layer?
At this site, they mention that dropout is less effective in convolutional layers:
dropout is generally less effective at regularizing convolutional layers.
The reason? Since convolutional layers have few parameters, they need less regularization to begin with. Furthermore, because of the spatial relationships encoded in feature maps, activations can become highly correlated.
And in this post, Where should I place dropout layers in a neural network?:
We must not use a dropout layer after a convolutional layer: as we slide the filter over the width and height of the input image, we produce a 2-dimensional activation map that gives the responses of that filter at every spatial position.
In this paper they conclude:
It seems like dropout is best close to the inputs and outputs of the network.
So I am still not sure where the best place is to add dropout in a CNN-LSTM architecture, but I assume it is after the LSTM layer, right?
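For reference, here is a minimal sketch (Keras, Conv1D + LSTM) of the kind of architecture I mean, with the two candidate Dropout placements marked. The layer sizes, dropout rates, and input shape are placeholder values, not my real configuration:

import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(100, 8)),   # (timesteps, features) -- placeholder values
    layers.Conv1D(32, kernel_size=3, padding="same", activation="relu"),
    layers.MaxPooling1D(pool_size=2),
    # Option A: dropout between the CNN block and the LSTM
    layers.Dropout(0.2),
    layers.LSTM(64),
    # Option B: dropout after the LSTM, before the output layer
    layers.Dropout(0.5),
    layers.Dense(1),                # single-value forecast output
])
model.compile(optimizer="adam", loss="mse")
model.summary()

(I am aware that Keras's LSTM layer also has its own dropout and recurrent_dropout arguments, which would be a third option, but my question is about explicit Dropout layers.)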