When training a neural network with gradient descent (or an analogous optimizer), the cost of the model changes throughout the training process; that is the point of the algorithm. However, the cost does not necessarily decrease monotonically, and at some points it may even increase.
So, is it OK to keep track of the parameters that produce the lowest cost and use those as the best parameters for the model? In the image, this would mean using the parameters that produced the lowest cost rather than the parameters from the final iteration.
Assume that the model corresponding to the lowest cost achieves acceptable accuracy.
Does doing this cause some kind of problem?
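To make the question concrete, here is a minimal sketch of what I mean (plain Python; `cost_fn`, `grad_fn`, and `train_keep_best` are placeholder names just for illustration, not from any particular framework):

```python
import copy

def train_keep_best(params, cost_fn, grad_fn, lr=0.1, steps=200):
    """Plain gradient descent that also snapshots the lowest-cost parameters seen."""
    best_params, best_cost = copy.deepcopy(params), cost_fn(params)
    for _ in range(steps):
        grads = grad_fn(params)
        params = [p - lr * g for p, g in zip(params, grads)]
        cost = cost_fn(params)
        if cost < best_cost:                    # new lowest cost -> keep a snapshot
            best_cost = cost
            best_params = copy.deepcopy(params)
    return best_params, best_cost               # return the snapshot, not the last iterate

# Toy usage: minimize (w - 3)^2 starting from w = 0
cost_fn = lambda p: (p[0] - 3.0) ** 2
grad_fn = lambda p: [2.0 * (p[0] - 3.0)]
best, cost = train_keep_best([0.0], cost_fn, grad_fn)
print(best, cost)
```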
