I know that Dropout will remove neurons randomly to reduce over-fitting.
If Dropout is going to remove neurons, why are those neurons built? We could remove those neurons from the architecture.
Why don't we do that?
To add to @frank's answer: the reason that using dropout is not equivalent to training a smaller network is that the neurons to drop are randomly selected each time the weights are updated. So while on each iteration only some of the neurons are used and updated, over the entire training cycle all of the neurons are trained. According to Jason Brownlee's *A Gentle Introduction to Dropout for Regularizing Deep Neural Networks*, dropout can be thought of as training an ensemble of models in parallel.
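A small simulation can make this concrete (a sketch, not code from any of the cited sources): with a drop rate of 0.5, each neuron is dropped independently on every update, so over many updates every neuron still participates in roughly half of them — none of them goes untrained.

```python
import numpy as np

# Illustrative sketch: count how often each neuron "survives" dropout
# across many weight updates.
rng = np.random.default_rng(0)
n_neurons, n_updates, p_drop = 10, 10_000, 0.5

# One Bernoulli mask per update: 1 = neuron kept, 0 = neuron dropped.
masks = (rng.random((n_updates, n_neurons)) >= p_drop).astype(int)

# Fraction of updates in which each neuron was active.
participation = masks.mean(axis=0)
print(participation.round(2))  # every entry is close to 1 - p_drop = 0.5
```

Each random mask effectively selects one "thinned" sub-network from the full architecture, which is why dropout resembles training an ensemble of sub-networks that share weights.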
The neurons are only dropped temporarily during training; they are not removed from the network altogether. It just turns out that we get better weights if we randomly set some activations to zero, temporarily, so that each neuron "learns" it cannot rely on the others and has to perform well itself. The neural network that you get out at the end contains all of the neurons.
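This train-time-only behavior is easy to sketch. Below is a minimal "inverted dropout" forward step (an illustration, not any particular framework's implementation): during training, activations are randomly zeroed and the survivors scaled by `1 / (1 - p)` so the expected activation stays the same; at inference, the input passes through untouched — the full network is used.

```python
import numpy as np

def dropout_forward(x, p_drop, training, rng):
    """Inverted dropout: zero random activations at train time, identity at test time."""
    if not training:
        return x  # inference: all neurons kept, no scaling needed
    mask = (rng.random(x.shape) >= p_drop) / (1.0 - p_drop)  # scale survivors
    return x * mask

rng = np.random.default_rng(1)
x = np.ones(8)
print(dropout_forward(x, 0.5, training=True, rng=rng))   # entries are 0.0 or 2.0
print(dropout_forward(x, 0.5, training=False, rng=rng))  # unchanged: full network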
The goal of dropout isn't to disable neurons permanently. It is to prevent the network from splitting into separate sections that function in parallel without using the interconnections between them. An extreme, undesired case would be a network relying on only a single neuron to process a certain feature.
Dropout disables neurons randomly during training, which forces the subsequent layer to optimize the weights coming from its other inputs, thus making better use of all the connections available in the architecture.
An analogy (the start of this comment appears to be cut off; the reconstruction assumes a character-recognition example): imagine writing a letter recognizer. You wouldn't want one function that checks whether a character is an i and starts by searching for a dot, and then another function that checks if it might be a j and searches for the dot again. You'd rather search for the dot once, and reuse the search outcome for both i and j. That would either save you CPU or allow you to write a more advanced dot finder. This is what dropout helps to achieve.
– user1079505
Oct 03 '22 at 17:27
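The mechanism described in this answer can be sketched as a two-layer forward pass with dropout between the layers (a hypothetical toy network; the names and sizes are illustrative, not from the post). Because the mask changes on every pass, layer 2 sees a different subset of layer-1 activations each time and cannot come to rely on any single hidden neuron.

```python
import numpy as np

rng = np.random.default_rng(42)
W1 = rng.normal(size=(4, 6))   # layer 1: 4 inputs -> 6 hidden neurons
W2 = rng.normal(size=(6, 2))   # layer 2: 6 hidden -> 2 outputs
x = rng.normal(size=4)

def forward(x, p_drop, rng):
    h = np.maximum(0.0, x @ W1)                            # ReLU hidden activations
    mask = (rng.random(h.shape) >= p_drop) / (1 - p_drop)  # inverted dropout mask
    return (h * mask) @ W2                                 # layer 2 sees a random subset

# Repeated passes with the same input use different hidden subsets,
# so the outputs vary from pass to pass during training.
outputs = [forward(x, 0.5, rng) for _ in range(5)]
for out in outputs:
    print(out)
```

Averaged over many passes, every hidden-to-output connection gets exercised, which is the "better use of all the possible connections" the answer describes.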