I am currently implementing a simple neural network and the backpropagation algorithm in Python with numpy. I have already tested my backprop method against central differences, and the analytical and numerical gradients agree.
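For concreteness, this is roughly the kind of central-difference check I mean; the helper name and signature are my own illustration, not the actual code from the pastebin:

```python
import numpy as np

def numerical_gradient(loss_fn, params, eps=1e-5):
    """Central-difference approximation of d(loss)/d(params).

    loss_fn is a zero-argument function that evaluates the loss
    using the current contents of `params` (a numpy array).
    """
    grad = np.zeros_like(params)
    it = np.nditer(params, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        orig = params[idx]
        params[idx] = orig + eps
        loss_plus = loss_fn()
        params[idx] = orig - eps
        loss_minus = loss_fn()
        params[idx] = orig                      # restore the weight
        grad[idx] = (loss_plus - loss_minus) / (2 * eps)
        it.iternext()
    return grad
```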
However, the network fails to approximate a simple sine curve. The network has one hidden layer (100 neurons) with $\tanh$ activation functions and an output layer with a linear activation function. Each unit also has a bias input. Training is done by plain gradient descent with a learning rate of 0.2.
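As a reference point, here is a minimal, self-contained sketch of the setup I am describing (not my actual pastebin code; the data range, initialization scale, and squared-error loss are illustrative assumptions):

```python
import numpy as np

# 1 input -> 100 tanh hidden units -> 1 linear output, mean squared error.
np.random.seed(0)
X = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
y = np.sin(X)

W1 = 0.1 * np.random.randn(1, 100); b1 = np.zeros(100)
W2 = 0.1 * np.random.randn(100, 1); b2 = np.zeros(1)
lr = 0.2

for epoch in range(2000):
    # forward pass
    h = np.tanh(X @ W1 + b1)            # hidden activations
    y_hat = h @ W2 + b2                 # linear output
    err = y_hat - y

    # backward pass (mean squared error)
    d_out = 2 * err / len(X)
    dW2 = h.T @ d_out;  db2 = d_out.sum(axis=0)
    d_h = (d_out @ W2.T) * (1 - h**2)   # tanh'(a) = 1 - tanh(a)^2
    dW1 = X.T @ d_h;    db1 = d_h.sum(axis=0)

    # gradient descent step
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
```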
The problem seems to come from the gradient, whose norm grows with every epoch, but I don't know why. Moreover, the behaviour is unchanged if I decrease the learning rate.
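To make "grows with every epoch" concrete, this is roughly how I track it (a small helper I would call inside the training loop of the sketch above; the name is illustrative):

```python
import numpy as np

def global_grad_norm(grads):
    """Overall L2 norm across all gradient arrays, to watch for blow-up."""
    return np.sqrt(sum(float((g ** 2).sum()) for g in grads))

# Inside the training loop of the sketch above, e.g.:
#     print(epoch, global_grad_norm([dW1, db1, dW2, db2]))
```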
EDIT: I have uploaded the code to pastebin: http://pastebin.com/R7tviZUJ
I think there's an argument to be made that if you make the learning rate small enough, it will become stable at some point (at least for the inputs that your network is exposed to). Clearly, as the learning rate goes to zero, the system becomes completely stable (although fairly uninteresting, since the weights will never change).
– T3am5hark Sep 22 '16 at 15:41