Are there any tutorials or guides on conventional wisdom for designing neural networks? For example, how do you pick:
- the number of layers or number of units per layer?
- an activation function?
- a step size?
- a regularization parameter?
- the minibatch size?
I think the 3rd one (the step size) should be something like $1/L$, where $L$ is the Lipschitz constant of the gradient of the (convex) loss on the minibatch, but I'm not entirely sure how that goes. (In practice I'm just using $\beta / |B|$, where $\beta$ is a fixed constant less than 1 and $|B|$ is the minibatch size, but from my few empirical experiments it looks like the right step size also depends on the number of layers / units per layer? See the sketch below for exactly what I'm doing.)
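To be concrete, here's a minimal sketch of the heuristic I'm using, on a toy least-squares problem with plain SGD (the data, `beta`, and `batch_size` are made-up stand-ins for my setup):

```python
import numpy as np

# Toy linear regression with squared loss, just to illustrate the heuristic.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = X @ rng.normal(size=20) + 0.1 * rng.normal(size=1000)

w = np.zeros(20)
beta = 0.1              # my fixed constant, < 1
batch_size = 32
lr = beta / batch_size  # the heuristic: step size = beta / |B|

for step in range(2000):
    idx = rng.choice(len(X), size=batch_size, replace=False)
    Xb, yb = X[idx], y[idx]
    # Gradient of the *summed* (not averaged) minibatch squared loss;
    # with lr = beta / |B| this is equivalent to a step of size beta
    # along the averaged minibatch gradient.
    grad = Xb.T @ (Xb @ w - yb)
    w -= lr * grad

print(np.mean((X @ w - y) ** 2))  # final full-data mean squared error
```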
I know we can use cross-validation for the 4th one (the regularization parameter), but conventional wisdom would also be helpful there; a sketch of what I have in mind is below.
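For reference, this is the kind of cross-validation loop I mean, using scikit-learn's `GridSearchCV` with ridge regression as a stand-in for weight decay in a network (the grid and `cv=5` are arbitrary choices on my part):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = X @ rng.normal(size=20) + 0.1 * rng.normal(size=200)

# Sweep the regularization strength on a log-spaced grid and pick the
# value with the best 5-fold cross-validation score.
search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": np.logspace(-4, 2, 13)},
    cv=5,
)
search.fit(X, y)
print(search.best_params_)
```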
Thanks!