The basic idea behind simulated annealing optimization is that it is a random search. Practically, you draw candidate solutions from a distribution over your solution space and accept or reject them based on certain conditions (e.g. if the new solution is better than the old one, $f(x_{old}) > f(x_{new})$ for a minimization problem).
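The accept/reject step is usually a bit richer than "only accept improvements". As a minimal sketch, here is the commonly used Metropolis-style rule (the exact criterion varies between SA variants, so treat this as one representative choice):

```python
import math
import random

def accept(f_old, f_new, T):
    """Metropolis-style acceptance: always take improvements,
    and sometimes take worse moves -- more often when T is high."""
    if f_new < f_old:  # better solution: always accept
        return True
    # worse solution: accept with probability exp(-(f_new - f_old) / T)
    return random.random() < math.exp(-(f_new - f_old) / T)
```

The point of occasionally accepting *worse* solutions is exactly what makes this more than greedy descent: it lets the search climb out of shallow local minima.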
To get into a bit more detail now: simulated annealing simulates a system that slowly cools down after being heated (given energy). In metallurgy, annealing is done because causing the particles of the metal to "jump around" lets them explore the whole surface, "fill the cracks", and settle into a more rigid structure, i.e. more efficient crystallization.
Now, in the case of an optimization problem, the surface you try to "fill", or more accurately explore, is (in a 2D case) the surface dictated by your cost function. You want to find the "deepest crack", your minimum of $f$. You immediately see that a very important parameter is how "hot" your system is. On the one hand, more energy/heat results in bigger fluctuations in the solution search and translates into exploring more of your optimization space. On the other hand, if you never "cool down" your system, then even if you "land" in a "deep crack" it is possible that you "jump out" of it again. That is why you introduce a cooling schedule: as your iteration count increases you decrease the size of your jumps in the solution space, and you explore the region around where you are at the moment (hopefully close to a good minimum) more carefully.
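Putting the acceptance rule and a cooling schedule together, a plain SA loop can be sketched like this (the geometric schedule $T_k = T_0 \alpha^k$ and the Gaussian proposal whose width shrinks with $T$ are just common, simple choices, not the only ones):

```python
import math
import random

def simulated_annealing(f, x0, T0=1.0, alpha=0.99, n_iter=2000, seed=0):
    """Plain SA with a geometric cooling schedule T_k = T0 * alpha**k.
    The proposal step size is tied to T, so jumps shrink as the system cools."""
    rng = random.Random(seed)
    x, fx, T = x0, f(x0), T0
    best_x, best_f = x, fx
    for _ in range(n_iter):
        x_new = x + rng.gauss(0.0, T)   # jump size scales with temperature
        f_new = f(x_new)
        # Metropolis rule: accept improvements, worse moves with prob exp(-df/T)
        if f_new < fx or rng.random() < math.exp(-(f_new - fx) / T):
            x, fx = x_new, f_new
            if fx < best_f:
                best_x, best_f = x, fx
        T *= alpha                      # cool down
    return best_x, best_f

# toy example: the minimum of (x - 3)^2 is at x = 3
x, fx = simulated_annealing(lambda x: (x - 3.0) ** 2, x0=-5.0)
```

For a smooth 1D bowl like this, SA is of course overkill; the example is only meant to make the moving parts (proposal, acceptance, cooling) concrete.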
The problem here is what happens if that "hopefully close to a good minimum" does not really materialize. Here is where Adaptive SA comes in: it presents a methodology to control the cooling. Practically, it says that if the optimization procedure is not satisfied with the progress seen in the minimization of the cost function $f$, it "does not cool" the system; it keeps making significant jumps. As such, through an adaptive cooling scheme you control the transition probability of your random search, both in terms of the acceptance rate of candidate solutions and the variability of their generating distribution.
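A hand-wavy illustration of that idea (this is a toy adaptation rule of my own for illustration, not the scheme from any specific ASA paper): cool only when recent iterations actually made progress, otherwise hold, or even reheat, the temperature so the search keeps exploring.

```python
def adaptive_cooling_step(T, alpha, recent_improvement, tol=1e-6, reheat=1.5):
    """Toy adaptive rule: cool only if the last batch of iterations
    actually lowered the best cost; otherwise hold/raise T so the
    search keeps making large jumps. Illustrative only -- real ASA
    schemes are considerably more sophisticated."""
    if recent_improvement > tol:
        return T * alpha             # progress seen: cool down as usual
    return min(T * reheat, 1.0)      # stuck: reheat (capped), keep exploring
```

You would call this once per batch of iterations instead of the unconditional `T *= alpha` in a plain SA loop, with `recent_improvement` being the drop in best cost over that batch.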
And an initial reference: I personally found the article Generalized Simulated Annealing for Function Optimization by Bohachevsky et al. the most straightforward introduction to Sim. Annealing. Oldie, but it has got everything you need to move forward. I don't add an Adaptive Sim. Annealing reference mostly because all of them are "special cases" of the original Sim. Annealing. Once you get your head around that, the adaptive schemes are mostly tweaks - OK, some are quite sophisticated tweaks, but let's not dwell on that. :)