I am currently implementing a simple neural network and the backpropagation algorithm in Python with numpy. I have already tested my backprop method against central differences, and the analytical and numerical gradients agree.
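For concreteness, this is roughly the kind of central-difference check I mean; the helper name and signature are my own illustration, not the actual code from the pastebin:

```python
import numpy as np

def numerical_gradient(loss_fn, params, eps=1e-5):
    """Central-difference approximation of d(loss)/d(params).

    loss_fn is a zero-argument function that evaluates the loss
    using the current contents of `params` (a numpy array).
    """
    grad = np.zeros_like(params)
    it = np.nditer(params, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        orig = params[idx]
        params[idx] = orig + eps
        loss_plus = loss_fn()
        params[idx] = orig - eps
        loss_minus = loss_fn()
        params[idx] = orig                      # restore the weight
        grad[idx] = (loss_plus - loss_minus) / (2 * eps)
        it.iternext()
    return grad
```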
However, the network fails to approximate a simple sine curve. The network has one hidden layer (100 neurons) with $\tanh$ activation functions and an output layer with a linear activation function. Each unit also has a bias input. Training is done by plain gradient descent with a learning rate of 0.2.
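As a reference point, here is a minimal, self-contained sketch of the setup I am describing (not my actual pastebin code; the data range, initialization scale, and squared-error loss are illustrative assumptions):

```python
import numpy as np

# 1 input -> 100 tanh hidden units -> 1 linear output, mean squared error.
np.random.seed(0)
X = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
y = np.sin(X)

W1 = 0.1 * np.random.randn(1, 100); b1 = np.zeros(100)
W2 = 0.1 * np.random.randn(100, 1); b2 = np.zeros(1)
lr = 0.2

for epoch in range(2000):
    # forward pass
    h = np.tanh(X @ W1 + b1)            # hidden activations
    y_hat = h @ W2 + b2                 # linear output
    err = y_hat - y

    # backward pass (mean squared error)
    d_out = 2 * err / len(X)
    dW2 = h.T @ d_out;  db2 = d_out.sum(axis=0)
    d_h = (d_out @ W2.T) * (1 - h**2)   # tanh'(a) = 1 - tanh(a)^2
    dW1 = X.T @ d_h;    db1 = d_h.sum(axis=0)

    # gradient descent step
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
```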
The problem seems to come from the gradient, whose norm grows with every epoch, but I don't know why. Moreover, the behaviour is unchanged if I decrease the learning rate.
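To make "grows with every epoch" concrete, this is roughly how I track it (a small helper I would call inside the training loop of the sketch above; the name is illustrative):

```python
import numpy as np

def global_grad_norm(grads):
    """Overall L2 norm across all gradient arrays, to watch for blow-up."""
    return np.sqrt(sum(float((g ** 2).sum()) for g in grads))

# Inside the training loop of the sketch above, e.g.:
#     print(epoch, global_grad_norm([dW1, db1, dW2, db2]))
```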
EDIT: I have uploaded the code to pastebin: http://pastebin.com/R7tviZUJ
I think there's an argument to be made that if you make the learning rate small enough, it will become stable at some point (at least for the inputs that your network is exposed to). Clearly, as the learning rate goes to zero, the system becomes completely stable (although fairly uninteresting, since the weights will never change).
– T3am5hark Sep 22 '16 at 15:41