How to quantitatively determine when to stop training ANN

Question

I've implemented an artificial recurrent neural network and want to start training it on a variety of tasks. I've searched extensively online and haven't found a satisfactory answer on how the algorithm can autonomously determine when to terminate training.

So far I check whether the past n errors are all below a hardcoded threshold, which is definitely not universal across tasks. Perhaps there is some probabilistic interpretation of the problem, so that I can terminate on some universal probability of the derivative of the network decreasing by some order of magnitude?

I could throw together some statistical measure of this sort, but I'm not sure of its pitfalls and perhaps something better has been developed by researchers.

Why can't you just check the absolute value of the gradient of your cost function? If it is small enough, then the optimizer has found a local minimum and training can be stopped. — Alexander Rodin, Jun 01 '16 at 15:35

score 1 · Answer 1 · answered Jun 01 '19 at 15:45

In a blog post, Davis King suggests treating the task as a regression problem. With respect to the validation data, we have a series of estimates of out-of-sample error. We would like to continue training as long as the validation error is decreasing. On the other hand, if the validation error has a low probability of decreasing, then we can infer that there is not much benefit to continued training.

His blog post is more elaborate than this summary, but this is the basic premise.
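As a rough illustration of the regression idea (not a reproduction of the blog post's implementation), one could fit a straight line to the most recent validation losses and estimate the probability that its slope is negative. The function name `prob_loss_decreasing`, the normal approximation to the slope's distribution, and the choice of window are all assumptions made for this sketch.

```python
import math

def prob_loss_decreasing(losses):
    """Estimate P(slope < 0) for a least-squares line through recent losses.

    Assumption: the slope estimate is treated as normally distributed around
    the fitted value with its usual OLS standard error. A Student-t
    distribution would be more exact for short windows.
    """
    n = len(losses)
    if n < 3:
        raise ValueError("need at least 3 loss values")
    t_mean = (n - 1) / 2.0
    y_mean = sum(losses) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(losses))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    # Residual sum of squares gives the noise estimate for the slope.
    rss = sum((y - (intercept + slope * t)) ** 2 for t, y in enumerate(losses))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    if std_err == 0.0:
        return 1.0 if slope < 0 else 0.0
    # Normal CDF evaluated at -slope / std_err.
    return 0.5 * (1.0 + math.erf(-slope / (std_err * math.sqrt(2.0))))

# Hypothetical validation-loss windows, for illustration only.
falling = [1.0, 0.8, 0.7, 0.55, 0.5, 0.38, 0.3]
plateau = [0.50, 0.52, 0.49, 0.51, 0.50, 0.52, 0.49, 0.51]
print(prob_loss_decreasing(falling), prob_loss_decreasing(plateau))
```

Training would continue while this probability stays high and stop once it falls toward 0.5 or below. The window length and the cutoff are tuning choices this sketch does not settle.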

A sharp corner that I have observed is that the optimizer may not make much progress for several epochs before finally finding a good path forward and making more significant progress once again. I don't know that there's an ideal way to mitigate this issue in the neural network setting, other than some trial-and-error.

score 0 · Answer 2 · answered Jun 01 '16 at 17:03

As far as I'm aware, there is no established convergence criterion that works in all cases. Keep in mind that you're dealing with a non-convex optimization problem in which many local minima, in fact an unknown number of them, may exist across a large stretch of the search space.

I presume you're using some sort of gradient descent algorithm, so the optimum you find is only one of many solutions. There is no point in putting in a huge effort just to land precisely on one local optimum when you don't even know how good it is.

My advice is to pick many random starts and run gradient descent to a small gradient neighborhood, then pick the best solution.
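A minimal sketch of this advice, using a one-dimensional toy function in place of a network's cost function. The function `f`, the learning rate, the gradient tolerance, and the number of starts are all illustrative assumptions, not recommended values.

```python
import random

def f(x):
    # Toy non-convex cost with two local minima (stand-in for a network loss).
    return x ** 4 - 3 * x ** 2 + x

def grad_f(x):
    return 4 * x ** 3 - 6 * x + 1

def gradient_descent(grad, x0, lr=0.01, tol=1e-6, max_steps=10000):
    """Run plain gradient descent until |gradient| < tol or max_steps is hit."""
    x = x0
    for _ in range(max_steps):
        g = grad(x)
        if abs(g) < tol:
            break
        x -= lr * g
    return x

def best_of_random_starts(cost, grad, n_starts=20, low=-2.0, high=2.0, seed=0):
    """Descend from several random starting points and keep the lowest-cost result."""
    rng = random.Random(seed)
    candidates = [gradient_descent(grad, rng.uniform(low, high))
                  for _ in range(n_starts)]
    return min(candidates, key=cost)

best_x = best_of_random_starts(f, grad_f)
print(best_x, f(best_x))
```

For a real network each "start" is a fresh random weight initialization, so the cost of this approach scales with the number of restarts.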

score 0 · Answer 3 · answered Jun 03 '16 at 06:32

The same way you would decide when to stop for any ML algorithm. What you want is good generalization. Common sense dictates that you keep measuring performance on a held-out dataset. You want the training loss to be small and close to the loss on the held-out dataset, so you stop when the training loss keeps falling but the loss on the held-out dataset stagnates.
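One common way to turn "the held-out loss stagnates" into a concrete rule is patience-based early stopping, sketched below. The `patience` and `min_delta` parameters and the function name `early_stopping_epoch` are assumptions introduced for this sketch. They are not part of the answer above.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of held-out losses.

    Training stops at the first epoch where the held-out loss has not
    improved by more than min_delta for `patience` consecutive epochs.
    If that never happens, stop_epoch is the last epoch seen.
    """
    best_loss = float("inf")
    best_epoch = -1
    last_epoch = -1
    for epoch, loss in enumerate(val_losses):
        last_epoch = epoch
        if loss < best_loss - min_delta:
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return last_epoch, best_epoch

# Hypothetical held-out losses, one per epoch.
val_losses = [1.0, 0.8, 0.7, 0.65, 0.66, 0.67, 0.66, 0.7, 0.5]
print(early_stopping_epoch(val_losses, patience=3))
```

In practice the weights from `best_epoch` are the ones to keep. Note that a short patience can stop before a late improvement, such as the final value in the example list.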
