When it comes to tuning neural networks, there are lots of design options:
- Optimiser
- Learning rate
- Activation function
- Number of hidden layers
- Number of nodes/neurons in each hidden layer
- Regularisation, e.g. early stopping
- Batch size
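To make the space concrete, here is roughly what I mean, written out in Python. Every name and value range below is just an illustrative guess on my part, not a set of options I've settled on:

```python
# Illustrative search space only -- the names and ranges are placeholders.
search_space = {
    "optimiser":       ["sgd", "adam"],
    "learning_rate":   [1e-4, 1e-3, 1e-2, 1e-1],
    "activation":      ["relu", "tanh"],
    "n_hidden_layers": [1, 2, 3],
    "hidden_width":    [16, 32, 64, 128],
    "early_stopping":  [True, False],
    "batch_size":      [16, 32, 64],
}
```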
I know the general principle is to start simple and incrementally add complexity. However, I have no idea what order to go in.
Say I start with a small fully connected network with a single hidden layer, an SGD optimiser, a batch size of 32, and ReLU activations. What should I change or explore first? And what after that? All the design choices seem to affect each other, so I'm struggling to find a systematic approach.
Could someone provide a step-by-step guide, and also a justification for the order proposed?
EDIT: In response to Sycorax's comment:
Essentially, I am training a simple fully connected network on a regression problem with 6 inputs and 6 outputs. I wish to find the configuration -- i.e. the set of design choices/hyperparameters -- that yields the lowest validation loss for this problem.
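For concreteness, here is a minimal PyTorch sketch of my current baseline. The hidden width of 32 and the learning rate are arbitrary placeholders I picked, not tuned values:

```python
import torch
import torch.nn as nn

# Baseline: 6 inputs -> one ReLU hidden layer -> 6 outputs.
model = nn.Sequential(
    nn.Linear(6, 32),   # hidden width 32 chosen arbitrarily
    nn.ReLU(),
    nn.Linear(32, 6),
)
criterion = nn.MSELoss()                                  # regression loss
optimiser = torch.optim.SGD(model.parameters(), lr=1e-2)  # lr is a guess

def train_step(x, y):
    """One SGD step on a mini-batch (I use a batch size of 32)."""
    optimiser.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimiser.step()
    return loss.item()
```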
EDIT 2: My question isn't about generalisation; I'm not having trouble getting my model to generalise. I just want to know whether there is a systematic way of tuning neural network hyperparameters so that the best performance is achieved.
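The only "systematic" procedure I've come up with so far is a naive random search over the space above, along the lines of the sketch below, where `train_and_validate` is a hypothetical helper that trains a model from a config and returns its validation loss. But this samples every choice independently and ignores exactly the interactions I'm asking about:

```python
import random

def random_search(search_space, train_and_validate, n_trials=50):
    """Naive random search: sample configs independently, keep the best."""
    best_loss, best_config = float("inf"), None
    for _ in range(n_trials):
        # Sample one value per design choice, uniformly at random.
        config = {key: random.choice(values) for key, values in search_space.items()}
        val_loss = train_and_validate(config)  # hypothetical training helper
        if val_loss < best_loss:
            best_loss, best_config = val_loss, config
    return best_config, best_loss
```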
Comment: https://stats.stackexchange.com/questions/352036/what-should-i-do-when-my-neural-network-doesnt-learn/352037#352037 – Adrian Keister Feb 11 '24 at 20:54