
I am experimenting with the CIFAR-10 dataset. With my model I found that the larger the batch size, the better the model learns the dataset. From what I see on the internet the typical batch size is 32 to 128, but my optimal size is 512-1024. Is that OK? Or are there things I should look at to improve the model? Which indicators should I use to debug it?

P.S. It seems that the gradient is too noisy, and a larger batch size reduces that noise.
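
To make the P.S. concrete, here is a rough sketch of the kind of indicator I have in mind: the per-coordinate spread of mini-batch gradients at different batch sizes. The resnet18 stand-in model, the helper names, and the number of probe batches are arbitrary placeholder choices, not anything specific to my actual setup.

```python
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

def grad_vector(model):
    """Flatten all parameter gradients into a single vector."""
    return torch.cat([p.grad.detach().flatten()
                      for p in model.parameters() if p.grad is not None])

def gradient_noise(model, criterion, dataset, batch_size, n_batches=10):
    """Mean per-coordinate std of mini-batch gradients at one batch size."""
    loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True)
    s = s2 = None
    n = 0
    for i, (x, y) in enumerate(loader):
        if i == n_batches:
            break
        model.zero_grad()
        criterion(model(x), y).backward()
        g = grad_vector(model)
        s = g if s is None else s + g
        s2 = g * g if s2 is None else s2 + g * g
        n += 1
    var = s2 / n - (s / n) ** 2            # E[g^2] - E[g]^2 per coordinate
    return var.clamp(min=0).sqrt().mean().item()

transform = T.Compose([T.ToTensor()])
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
model = torchvision.models.resnet18(num_classes=10)   # stand-in model
criterion = nn.CrossEntropyLoss()

for bs in (32, 128, 512, 1024):
    print(bs, gradient_noise(model, criterion, train_set, bs))
```

The printed value should shrink roughly like 1/sqrt(batch size) if noisy gradients are really what is going on.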

2 Answers


Read the following paper; it's a great read: On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, Nitish Shirish Keskar et al., ICLR 2017.

It contains extensive discussion and empirical results on benchmark datasets comparing the effect of different batch sizes. The authors conclude that large batch sizes tend to generalize worse (i.e., over-fit), and they explain this by showing that large-batch training converges to sharp minima.

The code is also available here.
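
To make the generalization-gap point concrete, here is a rough sketch (not the paper's code) of how you could track the train/test accuracy gap for a small versus a large batch size on CIFAR-10. The resnet18 model, learning rate, and epoch count below are placeholder choices for illustration only.

```python
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

def accuracy(model, loader, device):
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            correct += (model(x).argmax(1) == y).sum().item()
            total += y.numel()
    return correct / total

def train_and_report(batch_size, epochs=5,
                     device="cuda" if torch.cuda.is_available() else "cpu"):
    tf = T.Compose([T.ToTensor()])
    train_set = torchvision.datasets.CIFAR10("./data", train=True, download=True, transform=tf)
    test_set = torchvision.datasets.CIFAR10("./data", train=False, download=True, transform=tf)
    train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size, shuffle=True)
    test_loader = torch.utils.data.DataLoader(test_set, batch_size=512)

    model = torchvision.models.resnet18(num_classes=10).to(device)
    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(epochs):
        model.train()
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

    gap = accuracy(model, train_loader, device) - accuracy(model, test_loader, device)
    print(f"batch={batch_size}  train-test accuracy gap={gap:.3f}")

for bs in (128, 1024):
    train_and_report(bs)
```

If the gap grows noticeably at the large batch size while training accuracy stays high, that is the generalization gap the paper describes.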

PickleRick

Too large a batch size can also introduce numerical instability, and layer-wise adaptive learning rates can help stabilize the training.
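
As an illustration only (not a drop-in replacement for a tuned optimizer), here is a minimal sketch of a LARS-style update in the spirit of layer-wise adaptive learning rates: each parameter's step is scaled by a trust ratio of weight norm to gradient norm. The class name, hyper-parameters, and the exact scaling formula are simplified assumptions on my part.

```python
import torch

class SimpleLARS:
    """Minimal LARS-style SGD step with a per-parameter trust ratio."""
    def __init__(self, params, lr=0.1, weight_decay=5e-4, trust_coef=0.001):
        self.params = list(params)
        self.lr = lr
        self.weight_decay = weight_decay
        self.trust_coef = trust_coef

    @torch.no_grad()
    def step(self):
        for p in self.params:
            if p.grad is None:
                continue
            g = p.grad + self.weight_decay * p     # gradient with weight decay
            w_norm, g_norm = p.norm(), g.norm()
            # Trust ratio keeps the effective step comparable across layers.
            local_lr = self.trust_coef * w_norm / g_norm if w_norm > 0 and g_norm > 0 else 1.0
            p.add_(g, alpha=-self.lr * float(local_lr))

    def zero_grad(self):
        for p in self.params:
            p.grad = None
```

Usage would mirror a normal optimizer, e.g. `opt = SimpleLARS(model.parameters(), lr=0.1)` followed by `opt.zero_grad()`, `loss.backward()`, `opt.step()`.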

Lerner Zhang