I've recently learned about gradient descent, and it clearly can get stuck in local minima when applied to non-convex functions.
Couldn't we just randomly kick the values between steps while iterating? Kind of like quantum tunneling. Wouldn't that drastically increase the probability of reaching the global minimum?
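
Something like this rough sketch is what I have in mind (the test function, step size, and kick scale are just placeholder values I made up, not a tuned method):

```python
import numpy as np

def kicked_gradient_descent(f, grad, x0, lr=0.1, kick=0.5, steps=2000, seed=0):
    """Plain gradient descent, but with a random Gaussian 'kick' after every step."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    best_x, best_f = x.copy(), f(x)
    for _ in range(steps):
        x = x - lr * grad(x)                      # usual descent step
        x = x + kick * rng.normal(size=x.shape)   # random kick between steps
        fx = f(x)
        if fx < best_f:                           # remember the best point seen,
            best_x, best_f = x.copy(), fx         # since kicks can also move uphill
    return best_x, best_f

# Example: a 1-D function with many local minima.
f = lambda x: np.sum(x**2 + 3 * np.sin(3 * x))
grad_f = lambda x: 2 * x + 9 * np.cos(3 * x)
print(kicked_gradient_descent(f, grad_f, x0=[4.0]))
```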
