Simulated Annealing : Methods and Real-World Applications

Summary

Optimization techniques play a critical role in numerous challenges within machine learning and signal processing spaces. This blog specifically focuses on a significant class of methods for global optimization known as Simulated Annealing (SA). We cover the motivation, procedures and types of simulated annealing that have been used over the years. Finally, we look at some real world applications of simulated annealing, not limited to the realms of Machine Learning, demonstrating the power of this technique.

Introduction

Non-convex optimization problems can be challenging due to the presence of numerous local minima, potentially causing algorithms to get stuck with sub-optimal solutions. To overcome this issue, it becomes crucial to employ global optimization methods that can escape local minima and discover the overall optimal solution. Introducing randomness is often key to this process. Stochastic optimization algorithms generate a sequence of random variables, guiding the convergence toward global minima.

The concept of simulated annealing was first pioneered by Kirkpatrick et al., who drew inspiration from the annealing process used in metal working [1]. This process seeks to determine the most efficient molecular arrangements of metal particles by gradually cooling the metal after exposing the metal to high heat, thereby minimizing the potential energy of the mass. In a broader sense, the procedure follows an iterative approach guided by a variable temperature parameter, mimicking the gradual cooling observed in the annealing process of metals until the metal reaches a state of minimal energy configuration.

(Image taken from Neonickel)

Method

The general procedure of a simulated annealing algorithm initially starts with the system choosing a high temperature (T), followed by the implementation of a cooling or annealing scheme, gradually reducing T based on a predefined procedure. At each temperature level, a set of random new states is generated. States that lead to an improvement in the cost function are accepted. In contrast to outright rejecting states that do not enhance the cost function, there is a possibility of accepting such states with a finite probability. This approach introduces randomness into the iterative improvement phase and allows occasional uphill moves. These random moves may not enhance the solution, but they aim to minimize the likelihood of getting trapped in a local minimum.

Although there have been multiple variations in simulated annealing procedure over the years, a general SA algorithm has the following structure [7]:

PROCEDURE SIMULATED ANNEALING

    Set S ← S₀ (random initial state)
    Set T ← T₀ (initial temperature)
    While (stopping criterion is not satisfied) do
        While (required number of states is not generated) do
            Generate a new state (S') by perturbing S
            Evaluate cost function E
            Compute ΔE = E(S') − E(S)
            If (ΔE ≤ 0) then
                S ← S'
            Else
                Generate a random variable α, 0 ≤ α ≤ 1
                If α ≤ e^(−ΔE)/T then S ← S'
        End
    End
    Update T (decrement)
End

The cost function represents a factor that is dependent on the application / use-case. In certain applications, the cost function can take on an analytical value, allowing for apriori determination of the function. Conversely, in other instances, the cost function can be non-analytical and requires determination through indirect means by observing the optimized process. Despite the nature of the cost function, the difference in its values (ΔE) holds paramount importance for the success of the iterative process. The significance of e^-ΔE/T implies that as the ratio of T to ΔE diminishes, the likelihood of accepting that state also decreases.

A schedule for annealing needs to be developed to regulate the temperature throughout the annealing process. A general implementation typically outlines the start and final temperatures, along with the rate at which the temperature is reduced. Besides the rate of temperature change, the number of iterations for a given T can also be a variable. Several strategies have been suggested for estimating the termination (freezing) temperature. One such formula [7] which determines the temperature at which no further improvements can occur is found to be:

Here,

E_m is the absolute minimum value of the cost function, potentially predetermined based on application
E_m′ is the next largest value of the cost function compared to E_m
ν is the number of moves required to transition from E_m to E_m′.

Using T_f ensures that the annealing process does not terminate too early, preventing the process from stopping before the minimum solution is found. An alternative approach can be to stop the annealing process when no new solutions are accepted after a fixed number of consecutive iterations of T reduction.

Further Reading:
Zomaya, Albert Y., and Rick Kazman. “Simulated annealing techniques.” Algorithms and theory of computation handbook: general concepts and techniques. 2010. 33-33

Variations

There have been multiple variants of the standard SA algorithm over the years. Many of these variants focus on faster convergence by introducing adaptive parameters, parallel processing and so on [7].

Fast Simulated Annealing

Fast simulated annealing (FSA) is characterized as a semi-local search approach with intermittent long jumps [3]. The cooling schedule employed in the FSA algorithm is linearly inversely proportional to time, allowing for quicker convergence compared to the classical simulated annealing (SA). In contrast, SA is strictly a local search method and necessitates the cooling schedule to be inversely proportional to the logarithmic function of time.

Adaptive Simulated Annealing

Adaptive simulated annealing is a modified version, wherein the parameters governing the annealing schedule and step selection are automatically tuned based on the algorithm’s progress. This adaptation enhances the algorithm’s efficiency and reduces its sensitivity to user-defined parameters compared to the conventional approach. Another variant of ASA, known as thermodynamic simulated annealing, dynamically adjusts the temperature at each step by considering the energy difference between two states, aligning with the principles of thermodynamics [10].

Sequential Monte Carlo Simulated Annealing

A Sequential Monte Carlo Simulated Annealing (SMC-SA) algorithm was introduced in 2013 [9] as a population-based optimization technique designed for continuous global optimization. SMC-SA integrates the sequential Monte Carlo method to monitor the converging sequence of Boltzmann distributions in simulated annealing. It has been demonstrated that SMC-SA is a more favorable choice than the multi-start (parallel) simulated annealing method, particularly when the sample size is sufficiently large.

Further Reading (Sign in with GT credentials if required):

Applications

Simulated Annealing finds its use in many applications like data association, vehicle routing, data transmission in computer networks, scheduling, drilling of printed circuit boards, analysis of crystal structures, clustering of data arrays, assignment of routes for specified fleets of planes, image processing, pattern recognition, transportation, and logistics, highlight the potential benefits of obtaining near-optimal solutions with acceptable costs.

VLSI Design

The efficacy of simulated annealing in solving a variety of optimization challenges within the domain of design automation has been well known, particularly for VLSI circuits as shown by Wong et al. [4]. Simulated annealing has proven successful in addressing problems such as placement, floorplan design, routing, PLA folding, Gate Matrix layout, logic array optimization, two-dimensional compaction, via minimization, logic minimization, transistor sizing, and digital filter design. We will look at some of these use cases in more depth.

(Image generated by Midjourney)

1. Placement

The fundamental placement problem involves arranging a collection of circuit modules on a chip in a manner that minimizes a specified objective function. The primary objective is to reduce the overall chip area utilized by both the circuit modules and the interconnections (wire length reduction) among them. One standard algorithm used is the Timber Wolf [11] standard-cell placement algorithm, which consists of two stages: the first allowing module rearrangements both within and between rows, including module overlaps, and the second eliminating overlaps and restricting interchanges to adjacent modules in the same row. The cost function evaluates interconnection cost, module overlaps, and row length deviations. The annealing schedule involves temperature reduction with a specified rate, and various moves modify the placement solution.

2. Channel Routing

The channel routing problem involves finding optimal interconnections between pins across two rows in a rectangular channel, using horizontal and vertical wire segments with connections through via holes. The objective is to minimize the number of tracks needed. Three types of moves are defined to modify solutions in the solution space: swapping subnets between groups, moving a subnet between groups, and creating a new group for a subnet. The cost function considers channel width, longest path length, and track sparsity. The annealing schedule adapts temperature based on the channel width, allowing more rearrangements at higher temperatures. The algorithm terminates when acceptance probabilities decrease, or the temperature becomes sufficiently low, and may end early if an optimal solution is found.

3. PLA Folding

Programmable Logic Arrays (PLAs) are widely used circuits implementing multiple-output combinational logic functions, commonly described by a sparse personality matrix. Direct PLA implementation can lead to significant chip area waste, impacting yield and performance. PLA folding is a technique aimed at reducing PLA area, achieved by allowing input variables or output functions to share columns. The goal of PLA folding is to find a folding arrangement minimizing the total number of columns in the PLA, optimizing chip area usage. The solution space considers permutations of rows in the personality matrix, aiming to find a minimum size folding. The simulated-annealing PLA folding algorithm uses a cost function considering the square of input and output densities, providing a refined measure of solution quality. The annealing schedule involves a temperature decrease with a factor r and the algorithm starts with any initial permutation and terminates based on specific conditions regarding accepted moves and temperature levels.

Further Reading:
Wong, D. F., Hon Wai Leong, and H. W. Liu. Simulated annealing for VLSI design. Vol. 42. Springer Science & Business Media, 2012. Sign in with GT credentials if required

Multiprocessor Scheduling Problem

The challenge of multiprocessor scheduling involves allocating a workload uniformly across a distributed computer system, which comprises multiple processors with individual distributed local memories. Program modules are represented by nodes in a program graph, with edge connections indicating communication paths. Module and edge weights reflect code segment length and communication cost. The objective function involves minimizing system load imbalance and communication cost. The classical simulated annealing algorithm as well as variations like enhanced simulated annealing algorithm, multi-thread simulated annealing algorithm, modified-uniform simulated annealing algorithm, and multi-thread modified-uniform simulated annealing algorithm have been applied as scheduling methods [5], each with specific steps and applications. The algorithms aim to optimize system execution time and achieve load balancing.

Further Reading:
Nasr, George E., A. Harb, and George V. Meghabghab. “Enhanced Simulated Annealing Techniques for Multiprocessor Scheduling.” FLAIRS Conference. 1999.

Vehicle Routing Problem

The Vehicle Routing Problem involves optimizing routes for a limited number of vehicles to serve customers with specific time windows. The problem is defined on a complete graph with nodes representing the depot, customers, and connections between them. Customers have fixed delivery and pickup demands, and the central depot serves as the distribution and collection center. A fleet of identical vehicles with capacity and dispatching cost is available at the depot. The depot and customers have time windows, treated as hard constraints, and vehicles must adhere to these time limits. Distances between customers are based on Euclidean distance, and routes start and end at the depot, visiting customers at most once. The objective is to minimize overall cost while satisfying constraints related to vehicle capacity, time windows, and service requirements. Simulated annealing approaches have been shown to be effective [2], with further potential to add carbon footprint to the cost function, thus enabling “green” and efficient vehicle routing.

Further Reading:
Wang, Chao, et al. “A parallel simulated annealing method for the vehicle routing problem with simultaneous pickup–delivery and time windows.” Computers & Industrial Engineering 83 (2015): 111-122. Sign in with GT credentials if required

Traveling Salesman Problem (TSP)

(Image taken from Github)

The Traveling Salesman Problem (TSP) involves finding the most efficient closed tour that covers every city exactly once. Recognized for its computational difficulty, the TSP falls within the category of NP-complete problems, and its complexity escalates exponentially with an increase in the number of cities. In applying Simulated Annealing to the TSP, one starts by obtaining an initial solution, which can be any valid route for the salesman. Then, two cities within the route are randomly chosen, and their positions are swapped. SA is used to calculate the probability of accepting or rejecting the new solution. This process is repeated for a specified number of iterations, gradually decreasing the temperature. Throughout iterations, a record is maintained of the best overall solution discovered.

Prediction of crystal structures

Crystal structure prediction has been modeled as a global optimization challenge, aiming to minimize a potential energy function. Researchers have proposed potential energy functions suitable for determining accurate structures in diverse ionic crystals, utilizing empirical rules from electrostatics and crystal chemistry [6]. A sample optimization approach involves using a Simulated Annealing method with parameters tailored to crystal structure prediction topology and subsequently employing deterministic optimization methods for final refinement of atomic positional parameters.

(Image generated by Midjourney)

Further Reading:
Bollweg, Wilfried, Helmut Maurer, and Herbert Kroll. Numerical prediction of crystal structures by simulated annealing. Springer US, 1997. Sign in with GT credentials if required

Conclusion

We have now seen a powerful optimization technique which finds its use across the boundaries of Machine Learning and Computer Science. In may real world use cases, the absolute optimal solution is not easily reachable due to computational or model complexity constraints. SA does a good job of trading off solution optimality for computational and time complexity.

Featured image generated by Midjourney.