I'm trying to understand the Dropout algorithm. In this paper, the authors say that the nodes are randomly switched off with probability $p$ for each "Training Example".
Does this literally mean that every row in your dataset gets its own dropout layout? Or can "Training Example" also refer to an entire training batch?
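To make the question concrete, here is a minimal numpy sketch of inverted dropout as I understand it (my own illustration, not code from the paper; the function name and the 1/(1-p) scaling convention are assumptions). Because the mask is sampled with the full batch shape, each row ends up with its own layout, which is the first interpretation above:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p, training=True):
    """Inverted dropout: zero each activation independently with
    probability p, then scale survivors by 1/(1-p) so the expected
    activation is unchanged."""
    if not training or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p  # fresh mask on every forward pass
    return x * mask / (1.0 - p)

# A batch of 4 examples with 5 features each: the mask has the same
# shape as the batch, so every row (example) gets a different layout.
batch = np.ones((4, 5))
print(dropout(batch, p=0.5))
```

If "Training Example" instead meant a batch, the mask would be sampled once per batch with shape `(1, 5)` and broadcast across the rows, so all examples in the batch would share the same dropped units.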