The basic idea behind simulated annealing optimization is that it is a random search. Practically, you draw candidate solutions from a distribution over your solution space and accept or reject them based on certain conditions (e.g. if the new solution is better than the old one, $f(x_{old}) > f(x_{new})$ for a minimization problem).
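The accept/reject step is usually a bit richer than "only accept improvements". As a minimal sketch, here is the commonly used Metropolis-style rule (the exact criterion varies between SA variants, so treat this as one representative choice):

```python
import math
import random

def accept(f_old, f_new, T):
    """Metropolis-style acceptance: always take improvements,
    and sometimes take worse moves -- more often when T is high."""
    if f_new < f_old:  # better solution: always accept
        return True
    # worse solution: accept with probability exp(-(f_new - f_old) / T)
    return random.random() < math.exp(-(f_new - f_old) / T)
```

The point of occasionally accepting *worse* solutions is exactly what makes this more than greedy descent: it lets the search climb out of shallow local minima.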
To get into a bit more detail now: simulated annealing simulates a system that slowly cools down after being heated (given energy). In metallurgy, annealing is done because causing the particles of the metal to "jump around" lets them explore the whole surface, "fill the cracks", and settle into a more rigid structure, i.e. more efficient crystallization.
Now, in the case of an optimization problem, the surface you try to "fill", or more accurately explore, is (in a 2D case) the surface dictated by your cost function. You want to find the "deepest crack", your minimum of $f$. You immediately see that a very important parameter is how "hot" your system is. On the one hand, more energy/heat results in bigger fluctuations in the solution search and translates into exploring more of your optimization space. On the other hand, if you never "cool down" your system, then even if you "land" in a "deep crack" it is possible that you "jump out" of it again. That is why you introduce a cooling schedule: as your iteration count increases you decrease the size of your jumps in the solution space, and you explore the region around where you are at the moment (hopefully close to a good minimum) more carefully.
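Putting the acceptance rule and a cooling schedule together, a plain SA loop can be sketched like this (the geometric schedule $T_k = T_0 \alpha^k$ and the Gaussian proposal whose width shrinks with $T$ are just common, simple choices, not the only ones):

```python
import math
import random

def simulated_annealing(f, x0, T0=1.0, alpha=0.99, n_iter=2000, seed=0):
    """Plain SA with a geometric cooling schedule T_k = T0 * alpha**k.
    The proposal step size is tied to T, so jumps shrink as the system cools."""
    rng = random.Random(seed)
    x, fx, T = x0, f(x0), T0
    best_x, best_f = x, fx
    for _ in range(n_iter):
        x_new = x + rng.gauss(0.0, T)   # jump size scales with temperature
        f_new = f(x_new)
        # Metropolis rule: accept improvements, worse moves with prob exp(-df/T)
        if f_new < fx or rng.random() < math.exp(-(f_new - fx) / T):
            x, fx = x_new, f_new
            if fx < best_f:
                best_x, best_f = x, fx
        T *= alpha                      # cool down
    return best_x, best_f

# toy example: the minimum of (x - 3)^2 is at x = 3
x, fx = simulated_annealing(lambda x: (x - 3.0) ** 2, x0=-5.0)
```

For a smooth 1D bowl like this, SA is of course overkill; the example is only meant to make the moving parts (proposal, acceptance, cooling) concrete.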
The problem here is what happens if that "hopefully close to a good minimum" does not really materialize. Here is where Adaptive SA comes in: it presents a methodology to control the cooling. Practically, it says that if the optimization procedure is not satisfied with the progress seen in the minimization of the cost function $f$, it "does not cool" the system; it keeps making significant jumps. As such, through an adaptive cooling scheme you control the transition probability of your random search, both in terms of the acceptance rate of candidate solutions and the variability of their generating distribution.
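A hand-wavy illustration of that idea (this is a toy adaptation rule of my own for illustration, not the scheme from any specific ASA paper): cool only when recent iterations actually made progress, otherwise hold, or even reheat, the temperature so the search keeps exploring.

```python
def adaptive_cooling_step(T, alpha, recent_improvement, tol=1e-6, reheat=1.5):
    """Toy adaptive rule: cool only if the last batch of iterations
    actually lowered the best cost; otherwise hold/raise T so the
    search keeps making large jumps. Illustrative only -- real ASA
    schemes are considerably more sophisticated."""
    if recent_improvement > tol:
        return T * alpha             # progress seen: cool down as usual
    return min(T * reheat, 1.0)      # stuck: reheat (capped), keep exploring
```

You would call this once per batch of iterations instead of the unconditional `T *= alpha` in a plain SA loop, with `recent_improvement` being the drop in best cost over that batch.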
And an initial reference: I personally found the article Generalized Simulated Annealing for Function Optimization by Bohachevsky et al. the most straightforward introduction to Sim. Annealing. Oldie, but it has got everything you need to move forward. I don't add an Adaptive Sim. Annealing reference mostly because all of them are "special cases" of the original Sim. Annealing. Once you get your head around that, the adaptive schemes are mostly tweaks - OK, some are quite sophisticated tweaks, but let's not dwell on that. :)