Say I have some deep learning model architecture, as well as a chosen mini-batch size. How do I derive from these the expected memory requirements for training that model?
As an example, consider a (non-recurrent) model with an input of dimension 1000, 4 fully-connected hidden layers of dimension 100, and an additional output layer of dimension 10. The mini-batch size is 256 examples. How does one determine the approximate memory (RAM) footprint of the training process on the CPU and on the GPU? If it makes any difference, let's assume the model is trained on a GPU with TensorFlow (thus using cuDNN).
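For reference, my own naive back-of-the-envelope attempt counts only the float32 weights, their gradients, and the per-batch activations that must be kept for the backward pass. It deliberately ignores cuDNN workspaces, TensorFlow's own overhead, and optimizer state beyond plain SGD, which is part of why I'm asking:

    # Naive memory estimate: float32 weights + gradients + stored activations only.
    # Ignores framework overhead, cuDNN workspaces, and optimizer state.

    BYTES_PER_FLOAT = 4  # float32
    batch_size = 256
    layer_dims = [1000, 100, 100, 100, 100, 10]  # input, 4 hidden, output

    # Weights and biases of each fully-connected layer.
    n_params = sum(d_in * d_out + d_out
                   for d_in, d_out in zip(layer_dims, layer_dims[1:]))

    # Activations kept for the backward pass: one vector per layer, per example.
    n_activations = batch_size * sum(layer_dims)

    param_bytes = n_params * BYTES_PER_FLOAT     # weights and biases
    grad_bytes = param_bytes                     # one gradient per parameter
    act_bytes = n_activations * BYTES_PER_FLOAT  # forward activations

    total = param_bytes + grad_bytes + act_bytes
    print(f"params:      {n_params:,} ({param_bytes / 2**20:.2f} MiB)")
    print(f"gradients:   {grad_bytes / 2**20:.2f} MiB")
    print(f"activations: {act_bytes / 2**20:.2f} MiB")
    print(f"total:       {total / 2**20:.2f} MiB")

This gives about 131,410 parameters (~0.5 MiB), another ~0.5 MiB for gradients, and ~1.4 MiB of activations, so roughly 2.4 MiB total. Actual usage observed with nvidia-smi is far higher, so I'd like to understand what a principled estimate should include.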