
I'm trying to understand the Dropout algorithm. In this paper, the authors say that the nodes are randomly switched off with probability $p$ for each "Training Example".

Does this literally mean that every row in your dataset gets a different dropout layout? Or can "Training Example" mean a training batch?

Connor

1 Answer


As detailed in this question and answer, a training example refers to a "row", not a "batch".

The paper you refer to actually says that nodes are randomly switched off for "each presentation of each training example". So each time a training example (a row) is used (assuming you train for more than one epoch), a different set of nodes is dropped out. There is no permanent association between a row and a particular "dropout layout"; over the entire training process, all of the data is used to train the whole network.
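
To make the per-example behaviour concrete, here is a minimal NumPy sketch (my own illustration, not code from the paper) that samples an independent mask for each row of a batch, using the common "inverted dropout" rescaling; the commented-out lines show the shared per-batch mask alternative for comparison.

```python
import numpy as np

rng = np.random.default_rng(0)

p = 0.5                                # probability of switching a node off
batch = rng.standard_normal((4, 6))    # 4 training examples, 6 hidden activations each

# Per-example dropout: every row gets its own mask, resampled on each
# presentation of that example.
mask = rng.random(batch.shape) > p     # True = node kept (probability 1 - p)
dropped = batch * mask / (1 - p)       # rescale kept activations at training time

# Per-batch variant: one mask shared by every row in the batch.
# shared_mask = rng.random((1, batch.shape[1])) > p
# dropped = batch * shared_mask / (1 - p)

print(mask.astype(int))                # typically a different 0/1 pattern per row
```

(Note that the paper itself scales the weights at test time rather than rescaling activations during training; the inverted form above is just the more common modern convention.)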

Lynn
  • Surely that would create issues with backpropagation? The original paper says dropout applies to a "Training Case", which is explicitly a mini-batch. Have the authors used the wrong terminology, or is "Training Example" often used in place of "Training Case"? – Connor Dec 14 '22 at 09:44
  • See the answer to the question I linked: it explains why backpropagation works with dropout. – Lynn Dec 14 '22 at 10:48
  • Okay, interesting. Conceptually I always considered it like training a mini neural net using a batch of examples. But I guess having a different dropout for every example is taking that concept to its maximum extent. I presume either choice is mathematically fine to apply. Has anyone studied what effect choosing one option over the other has on the final model? – Connor Dec 14 '22 at 11:51