
As a project, I've built a graphical neural network sandbox that can plot a loss graph to show how well the network is learning during training. I'm getting the really odd results shown below, though, and I can't figure them out.

[Loss function graph]

The loss decreases over time, telling me the network is learning, but eventually the trend reverses and the loss increases as I feed in more samples. These are all samples used in training, by the way, so it's not just overfitting. The samples are also well shuffled.

Is there a reason a pattern like this can occur? Or could it be caused by something wrong with my software, such as my implementation of backpropagation?
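
If it is my backpropagation, one standard way to catch it is a numerical gradient check: perturb each parameter, recompute the loss, and compare the finite-difference slope to the analytical gradient. Below is a minimal sketch (in Python/NumPy purely for illustration; my sandbox isn't written against this API, and `your_backprop` is a hypothetical stand-in for the gradient routine being tested):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mse_loss(params, x, y):
    """Forward pass of a tiny 2-3-1 all-sigmoid network; params is a flat vector of 13 values."""
    W1 = params[:6].reshape(3, 2)    # hidden-layer weights
    b1 = params[6:9]                 # hidden-layer biases
    W2 = params[9:12].reshape(1, 3)  # output weights
    b2 = params[12:]                 # output bias
    h = sigmoid(W1 @ x + b1)
    z = sigmoid(W2 @ h + b2)
    return 0.5 * np.sum((z - y) ** 2)

def numerical_gradient(f, params, eps=1e-5):
    """Central finite differences: (f(p + eps) - f(p - eps)) / (2 * eps), one parameter at a time."""
    grad = np.zeros_like(params)
    for i in range(len(params)):
        p_plus, p_minus = params.copy(), params.copy()
        p_plus[i] += eps
        p_minus[i] -= eps
        grad[i] = (f(p_plus) - f(p_minus)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
params = rng.normal(scale=0.5, size=13)
x, y = rng.normal(size=2), rng.normal(size=1)
num_grad = numerical_gradient(lambda p: mse_loss(p, x, y), params)
# analytic_grad = your_backprop(params, x, y)  # hypothetical: the implementation under test
# A correct backprop should give a relative error around 1e-7 or smaller:
# print(np.linalg.norm(analytic_grad - num_grad) / np.linalg.norm(analytic_grad + num_grad))
```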

  • What statistic is being measured by "Error"? Is this the proportion of samples with incorrect predictions, or something else (e.g. binary cross-entropy loss, or squared error)? – Sycorax Jun 06 '22 at 15:30
  • @Sycorax In this case I'm using MSE, however a similar pattern occurs with MAE as well. – Lemniscate Jun 06 '22 at 15:36
  • Does the pattern persist if you reduce the learning rate? I recommend trying a variety of learning rates on a logarithmic scale (e.g. 0.1, 0.01, 0.001, etc.) – Sycorax Jun 06 '22 at 15:44
  • @Sycorax Well, this is interesting: it does persist, but the learning rate does seem to affect when the reversal happens.

    A learning rate of 0.1 causes it to happen later (around the 80-sample mark), and oddly a learning rate of 0.01 causes it to happen sooner (around the 50-sample mark)

    – Lemniscate Jun 06 '22 at 16:01
  • Diagnosing exactly what's happening and why will require more detail about your data and your model. – Sycorax Jun 06 '22 at 16:40
  • My model is fully connected and has 2 inputs, a single hidden layer with 3 neurons, and a final output neuron. The activation of all neurons is sigmoid.

    I'm feeding it a 3-dimensional regression problem: I give it X and Y as inputs for it to learn to predict Z. (A runnable sketch of this setup appears after these comments.)

    – Lemniscate Jun 06 '22 at 16:56
  • I believe the duplicate answers your question. If it does not, please [edit] to clarify what you still want to know. Also, please include the model details in the question, along with: whether and how the data are scaled, how the network is initialized, how the model is trained (the optimizer, the learning rate, whether you're using mini-batches and, if so, the mini-batch size), the number of observations you have, and any regularization you're applying (what it is and its configuration). – Sycorax Jun 06 '22 at 16:58
  • It certainly sounds more like a coding issue; I'll have a look through my program to see what might be going wrong. – Lemniscate Jun 06 '22 at 18:08
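
For reference, here is a minimal runnable sketch of the setup described in the comments: a fully connected 2-3-1 network, sigmoid activations throughout, trained one sample at a time with MSE and plain gradient descent, swept over learning rates on a logarithmic scale as Sycorax suggested. The data-generating surface, initialization scale, and sample count are assumptions for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train(lr, n_samples=200, seed=0):
    """Per-sample gradient descent on a 2-3-1 all-sigmoid network with MSE loss."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(3, 2)); b1 = np.zeros(3)
    W2 = rng.normal(scale=0.5, size=(1, 3)); b2 = np.zeros(1)
    losses = []
    for _ in range(n_samples):
        x = rng.uniform(-1, 1, size=2)
        # Assumed target surface, rescaled into (0, 1) so a sigmoid output can reach it:
        z_true = np.array([(np.sin(np.pi * x[0]) * np.cos(np.pi * x[1]) + 1) / 2])
        # Forward pass
        h = sigmoid(W1 @ x + b1)
        z = sigmoid(W2 @ h + b2)
        err = z - z_true
        losses.append(float(0.5 * err @ err))
        # Backward pass through the two sigmoid layers
        dz = err * z * (1 - z)           # gradient at the output pre-activation
        dh = (W2.T @ dz) * h * (1 - h)   # gradient at the hidden pre-activation
        W2 -= lr * np.outer(dz, h); b2 -= lr * dz
        W1 -= lr * np.outer(dh, x); b1 -= lr * dh
    return losses

# Sweep learning rates on a log scale and compare early vs. late loss:
for lr in (0.1, 0.01, 0.001):
    losses = train(lr)
    print(f"lr={lr}: mean of first 10 losses {np.mean(losses[:10]):.4f}, "
          f"mean of last 10 losses {np.mean(losses[-10:]):.4f}")
```

One design note: with a sigmoid output the network can only ever predict values in (0, 1), so if the real Z targets aren't scaled into that range the loss can plateau or behave oddly regardless of the learning rate, which is why the assumed surface above is rescaled.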

0 Answers