|
...carrying stochastic jump based on the annealing update density, the update density is used to select a fixed number of candidate parameter vectors which are all fed to the next iteration of the algorithm. The selection criterion involves not only the update density height, but also information about the origin of the candidate vector. Thus, each iteration produces a cooperative collection of parameter vectors in hope of exploring the parameter space in search of the optimum more thoroughly than the regular annealing. The technique is shown to outperform regular annealing on the problem of restoration of lattice images consisting of simple-shaped objects.
Key Words: Deterministic updates; Posterior mode.
1. INTRODUCTION
Simulated annealing (Kirkpatrick, Gelatt, Jr., and Vecchi 1983) provides a way to find a global optimum of a function. The scope of its applicability includes target functions of a large number of arguments that may have many local extrema. Its motivation includes its ability to escape from these local extrema. Without loss of generality, this article considers the maximization problem.
Let vector [theta] = ([[theta].sub.1], ..., [[theta].sub.P]) index the P-dimensional parameter space on which a target function [pi]([theta]) is defined. We are looking for [??] = [argmax.sub.[theta]] [pi]([theta]). A sequence of vectors is defined in the course of annealing. To this end, one traverses all the parameters in a predefined (or random) order. Some mild restrictions are imposed on the order to make sure that all parameters are visited equally frequently. When a parameter is visited, its value can be changed. This value is updated by making a random jump according to the annealing update density computed based on the vector from the previous iteration. At regular time intervals one decreases the temperature according to an annealing schedule, which makes the update density more concentrated around its maximum.
In applications, simulated annealing does not generally find [??] because the conditions guaranteeing convergence (e.g., in Hajek 1988) cannot usually be met in practice. Instead, it produces an estimate where the target function [pi] has a reasonably high value. Such an estimate is normally useful on the same grounds as [??] and is sometimes a good approximant to the latter.
An important area where simulated annealing is used is maximum a posteriori estimation. Given a posterior distribution of a parameter vector [theta], simulated annealing presents a way of finding a posterior mode, a point estimate that maximizes the posterior distribution. This estimator is optimal under a 0-1 loss for a discrete parameter space and is also the most probable value of [theta]. For flat priors, it also coincides with the maximum likelihood estimator which has nice asymptotic properties. This estimator is frequently used in a variety of problems including image restoration (Besag and Green 1993: Gilks, Richardson, and Spiegelhalter 1996).
This article proposes a new technique for finding an estimate with a high value of [pi]. The technique is deterministic with a possible exception for the choice of the initial population. It is similar to an evolutionary strategy (Back, Fogel, and Michalewitz 1997). An iterative part of a general evolutionary strategy consists of recombination, mutation, and selection parts. An iteration begins with a collection of parameter vectors called parent vectors. Parts of several parent parameter vectors are combined and the result is perturbed or mutated to produce an offspring This is repeated some m number of times to produce m offspring. Then some of these offspring are selected as parent vectors for the next iteration.
In our algorithm, recombination is omitted. In the mutation or, more appropriately, exploration part, each parent's evolution is similar to an annealing update, in that an offspring possibly differs from its parents only in the value of the visited parameter. M sibling offspring are obtained for each parent vector including the parent itself. Unlike standard annealing or the mutation philosophy that implies a random perturbation, our choices are deterministic. For a discrete parameter space with a small or moderate number of levels for each coordinate, all possible annealing updates enter the selection part.
The selection operation is deterministic as in evolution strategies, such as ([mu] + [lambda]), ([mu], [lambda]) and their variants (Back et al. 1997, sec. C.2.1). but unlike other exploration/selection algorithms, such as Francois (2002). The selection criterion is based not only on the relative or absolute values of the target function [pi] on the offspring set, but also on the offspring "pedigree." This is done in order to improve exploration. It is recognized that sibling offspring tend to be close. Therefore, choosing many siblings at the expense of offspring of other parents may deteriorate the quality of the exploration. In particular, we implicitly assume a notion of distance between two parameter vectors with the goal being to encourage greater distance between the selected offspring as well as high values of [pi] on the chosen set. The selection part carries most of the complexity of the algorithm.
We now describe the technique. We maintain a list of L parameter vectors for some L, for example, L = 20. Assume some deterministic parameter visiting order of regular annealing. During a visit to parameter [[theta].sub.p], L update densities are computed based on the L vectors from the previous iteration. So far it just looks like we have L different copies of simulated annealing running concurrently (e.g., from different starting places). At this point, each copy of annealing would choose the next vector by making a random jump according to its update probabilities. Instead, we use all the update probabilities together to...
NOTE: All illustrations and photos
have been removed from this article.

More articles from Journal of Computational & Graphical Statistics
Statistical simulations on parallel computers., December 01, 2004 Population Monte Carlo., December 01, 2004 Asymmetric linear dimension reduction for classification., December 01, 2004 Weber correspondence analysis: the one-dimensional case., December 01, 2004 Feature significance in geostatistics., December 01, 2004
Looking for additional articles?
Search our database of over 3 million articles.
Looking for more in-depth information on this industry?
Search our complete database of Industry & Market reports by text, subject, publication
name or publication date.
About Goliath
Whether you're looking for sales prospects, competitive information, company
analysis or best practices in managing your organization,
Goliath can help you meet your business needs.
Our extensive business information databases empower business
professionals with both the breadth and depth of credible,
authoritative information they need to support their business
goals. Whether it be strategic planning, sales prospecting,
company research or defining management best practices -
Goliath is your leading source for accurate information.
|