
Consider transfer learning with a pretrained model in Keras/TensorFlow. For each pretrained layer, the trainable parameter is set to False so that its weights are not updated during training, whereas the last layer(s) have been replaced with new layers that must be trained. In particular, two fully connected hidden layers with 512 and 1024 neurons and ReLU activation have been added. After these layers a Dropout layer with rate 0.2 is used, meaning that at each training step 20% of the neurons are randomly dropped.

Which layers does this dropout layer affect? Does it affect the whole network, including the pretrained layers for which layer.trainable = False has been set, does it affect only the newly added layers, or does it affect only the previous layer (i.e., the one with 1024 neurons)?

In other words, to which layer(s) do the neurons that are turned off by the dropout at each training step belong?

import os

from tensorflow.keras import layers
from tensorflow.keras import Model
from tensorflow.keras.applications.inception_v3 import InceptionV3
from tensorflow.keras.optimizers import RMSprop

local_weights_file = 'weights.h5'

pre_trained_model = InceptionV3(input_shape = (150, 150, 3), 
                                include_top = False, 
                                weights = None)

pre_trained_model.load_weights(local_weights_file)

for layer in pre_trained_model.layers:
  layer.trainable = False
  
# pre_trained_model.summary()

last_layer = pre_trained_model.get_layer('mixed7')
last_output = last_layer.output

# Flatten the output layer to 1 dimension
x = layers.Flatten()(last_output)
# Add two fully connected layers with 512 and 1,024 hidden units and ReLU activation
x = layers.Dense(512, activation='relu')(x)
x = layers.Dense(1024, activation='relu')(x)
# Add a dropout rate of 0.2
x = layers.Dropout(0.2)(x)                  
# Add a final sigmoid layer for classification
x = layers.Dense(1, activation='sigmoid')(x)

model = Model(pre_trained_model.input, x)

model.compile(optimizer = RMSprop(learning_rate = 0.0001), 
              loss = 'binary_crossentropy', 
              metrics = ['accuracy'])
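
A quick sanity check, assuming the model built above, to confirm that only the newly added Dense layers contribute trainable parameters after freezing the pretrained ones (this snippet is illustrative, not part of the original code):

import numpy as np

# Count trainable vs. frozen parameters; only the new head should be trainable.
trainable = int(sum(np.prod(w.shape.as_list()) for w in model.trainable_weights))
frozen = int(sum(np.prod(w.shape.as_list()) for w in model.non_trainable_weights))
print('trainable params:', trainable, '| frozen params:', frozen)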
Francesco Boi

3 Answers


The dropout layer will affect the output of the previous layer.

If we look at the specific part of your code:

x = layers.Dense(1024, activation='relu')(x)
# Add a dropout rate of 0.2
x = layers.Dropout(0.2)(x)                  
# Add a final sigmoid layer for classification
x = layers.Dense(1, activation='sigmoid')(x)

In your case, 20% of the output of the layer defined by `x = layers.Dense(1024, activation='relu')(x)` will be dropped at random, before being passed to the final Dense layer.
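
A quick way to see this behaviour (a minimal sketch with a toy tensor, independent of the model in the question) is to call a Dropout layer directly with training=True:

import tensorflow as tf

x = tf.ones((1, 10))                     # stand-in for the output of the 1024-unit Dense layer
drop = tf.keras.layers.Dropout(0.2)

print(drop(x, training=False).numpy())   # inference: values pass through unchanged
print(drop(x, training=True).numpy())    # training: roughly 20% of entries are zeroed,
                                         # the survivors are scaled by 1 / (1 - 0.2)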

Lescurel

Only the previous layer's neurons are "turned off", but all layers are "affected" in terms of backprop.

  • Later layers: Dropout's output is the input to the next layer, so the next layer's outputs will change, and so will those of the layers after it.
  • Previous layers: as the "effective output" of the pre-Dropout layer is changed, so are the gradients to it, and thus the gradients to all earlier layers. In the extreme case of Dropout(rate=1), zero gradient will flow (see the sketch below).
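
A toy check of the backprop point (hypothetical layer sizes, not from the question): wherever Dropout zeroes an activation, the gradient flowing back through that unit is zero as well, so the pre-Dropout layer gets no signal from it.

import tensorflow as tf

dense = tf.keras.layers.Dense(8, activation='relu')
drop = tf.keras.layers.Dropout(0.5)

x = tf.random.normal((4, 16))
with tf.GradientTape() as tape:
    h = dense(x)                     # pre-Dropout activations
    y = drop(h, training=True)       # ~50% of entries zeroed
    loss = tf.reduce_sum(y)

grad_h = tape.gradient(loss, h)      # gradient w.r.t. the pre-Dropout activations:
print(grad_h.numpy())                # 0 exactly where y was dropped, 1/(1 - 0.5) = 2 elsewhere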

Also, note that whole neurons are only dropped if the input to Dropout is 2D (batch_size, features); Dropout applies a random uniform mask over all dimensions (which is equivalent to dropping whole neurons only in the 2D case). To drop whole neurons in the 3D case, set Dropout(.2, noise_shape=(batch_size, 1, features)). To drop the same neurons across all samples, use noise_shape=(1, 1, features) (or (1, features) for 2D).
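
For illustration, a small sketch of the noise_shape point with toy shapes (batch_size=2, timesteps=4, features=6; none of these values come from the question):

import tensorflow as tf

x = tf.ones((2, 4, 6))

elementwise = tf.keras.layers.Dropout(0.2)                         # independent mask per element
per_neuron  = tf.keras.layers.Dropout(0.2, noise_shape=(2, 1, 6))  # mask shared across timesteps,
                                                                   # i.e. whole neurons are dropped
shared      = tf.keras.layers.Dropout(0.2, noise_shape=(1, 1, 6))  # same neurons dropped for every
                                                                   # sample in the batch

print(per_neuron(x, training=True).numpy())                        # zeroed columns span all timesteps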

OverLordGoldDragon
    Sorry, the last part of your answer is unclear: what do you mean by `whole neurons` and by `if input is 2D`? do you mean two parameters or what? – Francesco Boi Sep 04 '20 at 10:22
  • @FrancescoBoi Every neuron outputs a tensor shaped `(batch_size, ...)`; if input is 3D, then it's `(batch_size, timesteps)`. If there are `features` neurons, the entire layer's output shape is `(batch_size, timesteps, features)`. A "whole neuron" is dropped if all of its outputs are zeroed - example [here](https://stackoverflow.com/a/59294133/10133797) (at bottom): neuron 0, 5, and 11 are dropped. – OverLordGoldDragon Sep 04 '20 at 10:28

The dropout technique is not applied to every single layer of a neural network; it is commonly used on the last few layers of the network.

The technique works by randomly reducing the number of interconnecting neurons within the network: at every training step, each neuron has a chance of being left out, or rather dropped out of the collated contribution from connected neurons.

There is some debate as to whether dropout should be placed before or after the activation function. As a rule of thumb, place dropout after the activation function for all activation functions other than ReLU.

You can add dropout after every hidden layer; generally it affects only the previous layer (in your case, `x = layers.Dense(1024, activation='relu')(x)`). In the original paper that proposed dropout, Hinton et al. (2012) used dropout (with p=0.5) on each of the fully connected (dense) layers before the output; it was not used on the convolutional layers. This became the most commonly used configuration, as sketched below.
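
For example, a minimal sketch of that common configuration (made-up layer sizes; dropout only on the dense layers before the output, none on the convolutional layers):

from tensorflow.keras import layers, models

example_model = models.Sequential([
    layers.Conv2D(32, 3, activation='relu', input_shape=(150, 150, 3)),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(512, activation='relu'),
    layers.Dropout(0.5),                   # p=0.5 as in Hinton et al. (2012)
    layers.Dense(1024, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid'),
])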

Here are some resource links that might help you:

https://towardsdatascience.com/understanding-and-implementing-dropout-in-tensorflow-and-keras-a8a3a02c1bfa

https://towardsdatascience.com/dropout-on-convolutional-layers-is-weird-5c6ab14f19b2

https://towardsdatascience.com/machine-learning-part-20-dropout-keras-layers-explained-8c9f6dc4c9ab

Tasnuva