
I am following the CS231n NN case study — a derivation of gradient descent for a simple network with a single hidden layer.

I have worked through the rest of the tutorial and am confident that the derivations are correct.

However, when I run their code (the listing after “The full code looks very similar:”), the loss sometimes increases from one iteration to the next.

How is this possible? Do I have an error in my code, or is it actually possible for the loss function to increase? I have seen a reference to loss increases depending on step size…
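For reference, here is a condensed sketch of the kind of training loop that section of the notes builds. The random toy data (standing in for the notes' spiral dataset) and the exact hyperparameter values are my assumptions:

```python
import numpy as np

# Toy data: N points, D features, K classes (stand-ins for the notes'
# spiral dataset).
np.random.seed(0)
N, D, K = 300, 2, 3
X = np.random.randn(N, D)
y = np.random.randint(K, size=N)

h = 100                      # hidden layer size
W = 0.01 * np.random.randn(D, h)
b = np.zeros((1, h))
W2 = 0.01 * np.random.randn(h, K)
b2 = np.zeros((1, K))

step_size = 1e-0             # too large a value here can make the loss diverge
reg = 1e-3

for i in range(1000):
    hidden = np.maximum(0, X.dot(W) + b)          # ReLU hidden layer
    scores = hidden.dot(W2) + b2

    # softmax cross-entropy loss plus L2 regularization
    exp_scores = np.exp(scores - scores.max(axis=1, keepdims=True))
    probs = exp_scores / exp_scores.sum(axis=1, keepdims=True)
    loss = -np.log(probs[range(N), y]).mean() \
           + 0.5 * reg * (np.sum(W * W) + np.sum(W2 * W2))
    if i % 100 == 0:
        print(f"iteration {i}: loss {loss:.4f}")

    # backpropagate through softmax, second layer, ReLU, first layer
    dscores = probs
    dscores[range(N), y] -= 1
    dscores /= N
    dW2 = hidden.T.dot(dscores) + reg * W2
    db2 = dscores.sum(axis=0, keepdims=True)
    dhidden = dscores.dot(W2.T)
    dhidden[hidden <= 0] = 0                      # ReLU gradient
    dW = X.T.dot(dhidden) + reg * W
    db = dhidden.sum(axis=0, keepdims=True)

    # vanilla gradient descent update
    W += -step_size * dW
    b += -step_size * db
    W2 += -step_size * dW2
    b2 += -step_size * db2
```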

Henry
  • The duplicate explains that if the step size is too large, the loss can increase. While it's possible that you have a bug in your code, debugging is not on-topic here. In any event, it's impossible for someone to debug code that they can't see. – Sycorax Dec 19 '23 at 17:09
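To see how a too-large step size makes the loss climb, here is a minimal sketch on a one-dimensional quadratic (the function and step sizes are illustrative, not from the notes):

```python
# Gradient descent on f(w) = w^2, whose gradient is 2w.
def descend(step_size, w=1.0, iters=5):
    for i in range(iters):
        w -= step_size * 2 * w
        print(f"step {i}: w = {w: .3f}, loss = {w * w:.3f}")

descend(step_size=0.1)   # loss shrinks: 0.64, 0.41, 0.26, ...
descend(step_size=1.1)   # loss grows:  1.44, 2.07, 2.99, ...
```

For f(w) = w², the update is w ← (1 − 2·step_size)·w, so any step size above 1 overshoots the minimum by more than it approached, and both |w| and the loss grow on every iteration.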

0 Answers