
Simulated Annealing and Parallel Tempering

dhaho edited this page Aug 23, 2017

Simulated Annealing

Simulated annealing is a search algorithm that requires no prior knowledge of the search space and therefore works without a differentiable cost function. The parameters to tune this algorithm are: the starting temperature, the number of iteration steps, the cooling schedule (see below), and the temperature decay parameter or the ending temperature (depending on the type of cooling schedule). The algorithm works as follows:

init position and temperature
cost = costfunction(position)
for i in range(number_iterations):
    position_new = take_random_step(position)
    cost_new = costfunction(position_new)
    if cost_new < cost:
        position, cost = position_new, cost_new
    elif randn() < exp(-(cost_new - cost) / (constant * temperature)):
        position, cost = position_new, cost_new
    temperature = decay_temperature(temperature)

Here randn() is a realization of a random variable uniformly distributed between 0 and 1, and costfunction is the function to be minimized, which is usually the bottleneck in terms of computation time. decay_temperature is a function that assigns a decreasing value to the temperature in each iteration. The different choices for it are discussed further below.
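The loop above can be sketched as runnable Python. The quadratic cost function, the uniform ±1 random step, and the exponential decay with alpha = 0.99 are illustrative assumptions for this sketch, not part of the LTL implementation:

```python
import math
import random

def simulated_annealing(costfunction, position, temperature,
                        n_iterations, alpha=0.99, k=1.0):
    """Minimize costfunction over a 1-D search space, starting from position."""
    cost = costfunction(position)
    for _ in range(n_iterations):
        position_new = position + random.uniform(-1.0, 1.0)  # random step
        cost_new = costfunction(position_new)
        # always accept improvements; accept worse moves with Boltzmann probability
        if cost_new < cost or random.random() < math.exp(-(cost_new - cost) / (k * temperature)):
            position, cost = position_new, cost_new
        temperature *= alpha  # multiplicative (exponential) cooling
    return position, cost

# minimize x^2 starting far from the optimum at 0
best_x, best_cost = simulated_annealing(lambda x: x * x, position=5.0,
                                        temperature=10.0, n_iterations=2000)
```

Note that only the current position is tracked, so strictly the returned cost is that of the last accepted state; by the time the temperature has decayed, only improvements are accepted, so this is usually also the best state visited.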

Parallel Tempering

Parallel tempering is an algorithm that runs several simulated annealing instances simultaneously. Each instance uses different parameters, and the positions of two instances can be swapped in each iteration step according to the Metropolis-Hastings criterion:

init positions and temperatures
for j in range(number_parallel_algs):
    cost[j] = costfunction(position[j])
for i in range(number_iterations):
    for j in range(number_parallel_algs):
        position_new = take_random_step(position[j])
        cost_new = costfunction(position_new)
        if cost_new < cost[j]:
            position[j], cost[j] = position_new, cost_new
        elif randn() < exp(-(cost_new - cost[j]) / (constant * temperature[j])):
            position[j], cost[j] = position_new, cost_new
        temperature[j] = decay_temperature(temperature[j])
    # up to this point: ordinary simulated annealing, run once per instance
    # the swapping below turns parallel simulated annealing into parallel tempering
    for j in range(number_parallel_algs - 1):
        # pick a random partner instance with a higher index
        m = random_choice(range(j + 1, number_parallel_algs))
        k = constant  # Boltzmann constant if temperature and energy/cost have a physical correlate
        p = exp(-abs((cost[j] - cost[m]) * (1 / (k * temperature[j]) - 1 / (k * temperature[m]))))
        if randn() < p:
            swap(position[j], position[m])
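A minimal runnable sketch of the scheme above, assuming one-dimensional positions, a uniform ±1 random step, and fixed (non-decaying) temperatures per chain for simplicity; none of these choices are prescribed by the LTL implementation:

```python
import math
import random

def parallel_tempering(costfunction, positions, temperatures, n_iterations, k=1.0):
    """Run one Metropolis chain per temperature and swap states between chains."""
    n = len(positions)
    costs = [costfunction(p) for p in positions]
    for _ in range(n_iterations):
        # Metropolis step for each chain
        for j in range(n):
            candidate = positions[j] + random.uniform(-1.0, 1.0)
            c = costfunction(candidate)
            if c < costs[j] or random.random() < math.exp(-(c - costs[j]) / (k * temperatures[j])):
                positions[j], costs[j] = candidate, c
        # swap step: compare each chain with a randomly chosen higher-index chain
        for j in range(n - 1):
            m = random.choice(range(j + 1, n))
            delta = (costs[j] - costs[m]) * (1 / (k * temperatures[j]) - 1 / (k * temperatures[m]))
            if random.random() < math.exp(-abs(delta)):
                positions[j], positions[m] = positions[m], positions[j]
                costs[j], costs[m] = costs[m], costs[j]
    best = min(range(n), key=lambda j: costs[j])
    return positions[best], costs[best]

# minimize (x - 2)^2 with three chains at different fixed temperatures
best_pos, best_cost = parallel_tempering(lambda x: (x - 2) ** 2,
                                         positions=[10.0, 10.0, 10.0],
                                         temperatures=[0.1, 1.0, 10.0],
                                         n_iterations=2000)
```

The cold chain exploits while the hot chains explore; the swap step lets a good position found by a hot chain migrate into a colder chain, which is the point of running the chains in parallel rather than independently.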

Cooling Schedules

Different kinds of optimization problems need different approaches in terms of exploration rate and the ability to settle into local minima. This can be accounted for by choosing the right cooling schedule, represented by decay_temperature in the algorithms above. There are generally two types of cooling schedules.

In the first type, the temperature decay is independent of the current step and depends only on the current temperature and a decay parameter 'alpha'. These schedules, called 'Multiplicative Monotonic Cooling' [1], are more useful for problems without a fixed number of iterations and for problems that benefit from a steep cooling curve at the beginning and a shallow one at the end of the optimization.

The second type has a fixed trajectory shape, determined only by the starting temperature, the ending temperature, and the number of algorithm iterations. These schedules allow different cooling behaviour in different stages of the algorithm and can combine the advantages of high exploration in the beginning with a good ability to find a minimum in the end, but they do not cool steeply at the start and need the exact number of iteration steps in advance to perform as intended. This type is called 'Additive Monotonic Cooling' and is based on the work of Brian Luke [2].
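As an illustration, here is one common representative of each family: exponential decay for the multiplicative type and a linear trajectory for the additive type. The function and parameter names are placeholders for this sketch, not the LTL API:

```python
def multiplicative_exponential(temperature, alpha=0.95):
    """Multiplicative monotonic cooling: the next temperature depends only on
    the current temperature and the decay parameter alpha."""
    return alpha * temperature

def additive_linear(t_start, t_end, n_iterations, i):
    """Additive monotonic cooling: the temperature at step i lies on a fixed
    linear trajectory from t_start down to t_end over n_iterations steps."""
    return t_end + (t_start - t_end) * (n_iterations - i) / n_iterations
```

The multiplicative schedule is applied repeatedly to whatever the current temperature is, while the additive schedule is evaluated from the step counter, which is why the latter needs the total number of iterations in advance.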

Usage of the LTL code

To make use of these algorithms, adapt the example code in the repository: /bin/ltl-fun-pt.py and /bin/ltl-fun-sa.py. Replace the optimizee with the function that has to be optimized. For plain simulated annealing, all parameters can be specified via parameters = SimulatedAnnealingParameters(...) in the example code and adapted accordingly. For parallel tempering, the number of parallel algorithms (n_parallel_runs) has to be specified first, and for each of those runs the cooling schedule, the temperatures, and the temperature decay have to be specified separately, as exemplified in the code. Note that the algorithms are implemented to maximize the optimizee; specify optimizee_fitness_weights accordingly to adapt it to the optimizer (i.e. set it to -1 to make it minimize the optimizee).

Choosing of the algorithm parameters

In principle, simulated annealing, and by extension parallel tempering, is good at exploring unknown cost functions. If some properties of the cost function are known, they can be used to help the optimizer. For example, if there are few local minima, the starting temperature can be set very low and the decay very high, as the algorithm will probably not have to escape many of them. If the opposite is true, a higher average temperature is recommended. For parallel tempering, using many different temperatures and schedules is recommended.

References

[1] S. Kirkpatrick, C. D. Gelatt and M. P. Vecchi, "Optimization by Simulated Annealing": http://www.jstor.org/stable/1690046?seq=1#page_scan_tab_contents

[2] http://www.btluke.com/simann0.html