1

I am running a convolutional neural network on image data, and inspecting the gradients at each step shows that they are exactly zero. At the same time, the network is not converging and the loss stays high.

What does this mean for what I should do with the learning rate, momentum, decay, etc.?

Thanks!

user135237
  • What does it mean when the gradient of any function is 0? – Sycorax Dec 04 '16 at 23:24
  • I'm using ReLU, which has f'(x)=0 for x<0 and f'(x)=1 for x>0. – user135237 Dec 04 '16 at 23:44
  • I'm familiar with the ReLU function. It's possible that all of the neurons have died. However, my question was more general. – Sycorax Dec 05 '16 at 00:06
  • I'm not sure of the more general implications -- I just know that it means f'(x)=0, and that parameter would no longer update (see the snippet after these comments). Is there something else you're alluding to? – user135237 Dec 05 '16 at 02:17
  • Have a look at https://stats.stackexchange.com/questions/352036/what-should-i-do-when-my-neural-network-doesnt-learn/352037#352037 – kjetil b halvorsen Oct 19 '21 at 13:01
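
A tiny TensorFlow sketch of the point made in these comments (the values are purely illustrative): the gradient of ReLU is 0 wherever its input is negative, so those entries receive no update.

    import tensorflow as tf

    # ReLU passes gradient 1 where its input is positive and 0 where it
    # is negative, so a unit stuck in the negative regime stops learning.
    x = tf.Variable([-2.0, 3.0])
    with tf.GradientTape() as tape:
        y = tf.reduce_sum(tf.nn.relu(x))
    print(tape.gradient(y, x).numpy())  # prints [0. 1.]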

3 Answers

3

I am assuming there are no bugs in your code.

Sounds like it might be the vanishing gradient problem.

Try decreasing the depth of your network. Regularization might also help.
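
One way to check for this is to look at the gradient norm of each layer: with vanishing gradients, the norms shrink as you move from the output toward the input. A minimal sketch, assuming a Keras model named model, a loss function loss_fn, and a batch (x_batch, y_batch) from your own code:

    import tensorflow as tf

    def report_gradient_norms(model, loss_fn, x_batch, y_batch):
        # One forward/backward pass; near-zero norms in the early layers
        # point to vanishing gradients rather than a tuning problem.
        with tf.GradientTape() as tape:
            loss = loss_fn(y_batch, model(x_batch, training=True))
        grads = tape.gradient(loss, model.trainable_variables)
        for var, grad in zip(model.trainable_variables, grads):
            norm = 0.0 if grad is None else tf.norm(grad).numpy()
            print(f"{var.name}: gradient norm = {norm:.3e}")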

0

Gradients all equal to zero does not necessarily imply any problem with the network. Minima, maxima, and saddle points all occur where the gradient is zero, so it's possible that your network has arrived at one of these critical points. Determining which is the case requires additional information.

A corner case is that some combination of ReLU units has "died," so that they output 0 for every input in your data set. This is possible, but somewhat unlikely.
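
To test for this directly, you can probe the ReLU activations and count units that never fire. A rough sketch, assuming a built Sequential or functional Keras model named model and an input array x_data (both placeholders for your own objects):

    import tensorflow as tf

    def dead_relu_report(model, x_data):
        for layer in model.layers:
            if getattr(layer, "activation", None) is tf.keras.activations.relu:
                # Probe model exposing this layer's post-activation output.
                probe = tf.keras.Model(model.input, layer.output)
                acts = probe(x_data, training=False).numpy()
                acts = acts.reshape(acts.shape[0], -1)
                # A unit is "dead" here if it outputs 0 for every example.
                dead = (acts <= 0).all(axis=0).mean()
                print(f"{layer.name}: {dead:.1%} of units never activate")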

On the other hand, if there are bugs or mistakes in your code, then it's impossible to interpret the model's results. Always check for bugs.

Sycorax
-2

If you're using TensorFlow and feeding your model a batch, make sure you call model(batch, training=True). If training isn't set to True, then the gradients will be all 0s.
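
For context, here is a minimal custom training step where this matters; model, optimizer, loss_fn, x_batch, and y_batch are assumed to come from your own code:

    import tensorflow as tf

    with tf.GradientTape() as tape:
        # training=True switches layers such as dropout and batch norm
        # into training-mode behavior.
        preds = model(x_batch, training=True)
        loss = loss_fn(y_batch, preds)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))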