Improved Metropolis–Hastings algorithms via landscape modification with applications to simulated annealing and the Curie–Weiss model

Michael C. H. Choi

doi:10.1017/apr.2023.35

Improved Metropolis–Hastings algorithms via landscape modification with applications to simulated annealing and the Curie–Weiss model

Part of: Markov processes

Published online by Cambridge University Press: 30 August 2023

Michael C. H. Choi

Show author details

Michael C. H. Choi*: Affiliation:
National University of Singapore
*: *Postal address: Department of Statistics and Data Science, National University of Singapore, Singapore. Email address: mchchoi@nus.edu.sg

Article contents

Abstract
Introduction
Metropolis–Hastings with landscape modification
The Curie–Weiss model with landscape modification
Discrete simulated annealing with landscape modification
Funding information
Competing interests
References

Rights & Permissions

Abstract

In this paper, we propose new Metropolis–Hastings and simulated annealing algorithms on a finite state space via modifying the energy landscape. The core idea of landscape modification rests on introducing a parameter c, such that the landscape is modified once the algorithm is above this threshold parameter to encourage exploration, while the original landscape is utilized when the algorithm is below the threshold for exploitation purposes. We illustrate the power and benefits of landscape modification by investigating its effect on the classical Curie–Weiss model with Glauber dynamics and external magnetic field in the subcritical regime. This leads to a landscape-modified mean-field equation, and with appropriate choice of c the free energy landscape can be transformed from a double-well into a single-well landscape, while the location of the global minimum is preserved on the modified landscape. Consequently, running algorithms on the modified landscape can improve the convergence to the ground state in the Curie–Weiss model. In the setting of simulated annealing, we demonstrate that landscape modification can yield improved or even subexponential mean tunnelling time between global minima in the low-temperature regime by appropriate choice of c, and we give a convergence guarantee using an improved logarithmic cooling schedule with reduced critical height. We also discuss connections between landscape modification and other acceleration techniques, such as Catoni’s energy transformation algorithm, preconditioning, importance sampling, and quantum annealing. The technique developed in this paper is not limited to simulated annealing, but is broadly applicable to any difference-based discrete optimization algorithm by a change of landscape.

Keywords

Metropolis–Hastings simulated annealing spectral gap Curie–Weiss metastability landscape modification energy transformation

MSC classification

Primary: 60J27: Continuous-time Markov processes on discrete state spaces

Secondary: 60J28: Applications of continuous-time Markov processes on discrete state spaces

Information

Type: Original Article
Information: Advances in Applied Probability , Volume 56 , Issue 2 , June 2024 , pp. 587 - 620

DOI: https://doi.org/10.1017/apr.2023.35 [Opens in a new window]
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial licence (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. The written permission of Cambridge University Press must be obtained for commercial re-use.
Copyright: © The Author(s), 2023. Published by Cambridge University Press on behalf of Applied Probability Trust

1. Introduction

Given a target Gibbs distribution with Hamiltonian function $\mathcal{H}$ and temperature $\epsilon$ , the Metropolis–Hastings (MH) algorithm is a popular and important Markov chain Monte Carlo algorithm that has been applied to various sampling and optimization problems in a wide range of disciplines, including but not limited to Bayesian computation, statistical physics, and theoretical computer science. While there are many improved variants of the MH algorithm that have been investigated in the literature, this paper centres around a method that is known as landscape modification. This technique is particularly suitable in the context of stochastic minimization with respect to $\mathcal{H}$ , either by running the MH algorithm at a low enough temperature $\epsilon$ or by driving the temperature $\epsilon$ to zero as in simulated annealing. The core idea of landscape modification relies on targeting a modified Hamiltonian function instead of $\mathcal{H}$ , where the modification or the transformation is based upon two parameters, namely f and the threshold parameter c. On the part of landscape of $\mathcal{H}$ that is below c, the original landscape is utilized, which allows for exploitation and concentration of the MH algorithm at low temperatures. On the other hand, on the region of the landscape of $\mathcal{H}$ which is above c, the function f is applied to transform this part of the landscape to encourage and facilitate exploration of the chain. More precisely, the acceptance–rejection probability on this part is increased by the transformation, which consequently leads to a higher transition rate for the modified MH chain, which promotes exploration.

The advantage of landscape modification stems from the notion of critical height, which measures the difficulty of the landscape in a broad sense. With appropriate tuning of both f and c, the critical height is reduced, which leads to an improved simulated annealing algorithm or, for instance, reduced mean crossover time or relaxation time in the Curie–Weiss (CW) model, which we shall discuss in detail.

We now summarize and highlight the key achievements and contributions of this paper:

1. We propose and analyse MH algorithms with landscape modification. In Section 2, we first define a new MH algorithm using a modified Hamiltonian function. In general, the corresponding acceptance–rejection probability in the modified MH is of integral form, but we show that this integral can be readily calculated upon specializing to various choices of f, such as linear, quadratic, or square root functions. This is then followed by an example of landscape modification in the Ehrenfest urn with a linear Hamiltonian in Section 2.4, where we prove an upper bound on the spectral gap with polynomial dependence on the dimension, whereas the same technique yields an exponential dependence on the dimension for the classical MH. We provide a discussion on possible strategies for tuning f and c in Section 2.5. In the final subsection, that is, Section 2.6, we elaborate on the similarities and differences between landscape modification and other acceleration techniques in the Markov chain Monte Carlo literature, such as Catoni’s energy transformation algorithm [Reference Catoni1, Reference Catoni2], preconditioning of the Hamiltonian, importance sampling, and quantum annealing [Reference Wang, Wu and Zou3].
2. We obtain improved mean crossover time and relaxation time in the CW model with landscape modification. In Section 3, we investigate the effect of landscape modification on the CW model. We first consider the CW model under a fixed external magnetic field in the subcritical regime, where the free energy landscape, as a function of the magnetization, has two local minima. Using a Glauber dynamics with landscape modification, we introduce a new mean-field equation, and with appropriate choice of c, we transform the free energy landscape from a double-well into a single-well landscape, while preserving the location of the global minimum on the modified landscape. As a result, running algorithms on the modified landscape can accelerate the convergence towards the ground state. We prove a subexponential mean tunnelling time in such a setting and discuss related metastability results. Similar results are then extended to the random-field CW model.
3. We obtain an improved simulated annealing algorithm with an improved logarithmic cooling schedule. In Section 4, we consider the simulated annealing setting by driving the temperature down to zero. We define a clipped critical height $c^*$ (that depends on the threshold parameter c) associated with the improved simulated annealing algorithm, and prove tight spectral gap asymptotics based on this parameter. This leads to similar asymptotic results concerning the total-variation mixing time and the mean tunnelling time in the low-temperature regime. Utilizing existing results concerning simulated annealing with time-dependent target function, we prove a convergence guarantee for the landscape-modified simulated annealing on a finite state space with an improved logarithmic cooling schedule. These theoretical results on improved convergence are corroborated by numerical experiments on a travelling salesman problem, in Section 4.1.

This paper can be seen as a sequel to an earlier work by the author [Reference Choi4]. The original motivation for landscape modification is from Fang et al. [Reference Fang, Qian and Gong5], who propose a variant of overdamped Langevin diffusion with state-dependent diffusion coefficient. Choi [Reference Choi4] applies this idea of state-dependent noise via landscape modification in the setting of kinetic simulated annealing and develops an improved kinetic simulated annealing algorithm with a convergence guarantee. In the present paper, we recognize that this idea of landscape modification can also be applied to the finite-state-space setting in MH and simulated annealing, and investigate the benefits and speed-ups that this technique can bring, in particular, to the analysis of the CW model and stochastic optimization. While the technique developed in Choi [Reference Choi4] can readily be applied to gradient-based continuous optimization algorithms, we emphasize that the landscape modification technique proposed in this paper can be analogously implemented in essentially all discrete optimization algorithms by a change of landscape and is not limited to simulated annealing or the CW model.

1.1. Notation

Throughout this paper, we adopt the following notation. For $x,y \in \mathbb{R}$ , we let $x_+ = \max\{x,0\}$ denote the non-negative part of x, and we write $x \wedge y = \min\{x,y\}$ . For two functions $g_1, g_2 \;:\; \mathbb{R} \to \mathbb{R}$ , we say that $g_1 = \mathcal{O}(g_2)$ if there exists a constant $C > 0$ such that for sufficiently large x, we have $|g_1(x)| \leqslant C g_2(x)$ . We write $g_1 = o(g_2)$ if $\lim_{x \to \infty} g_1(x)/g_2(x) = 0$ , and we write $g_1 \sim g_2$ if $\lim_{x \to \infty} g_1(x)/g_2(x) = 1$ . We say that $g_1(x)$ is a subexponential function if $\lim_{x \to \infty} \frac{1}{x} \log g_1(x) = 0$ .

2. Metropolis–Hastings with landscape modification

Let $\mathcal{X}$ be a finite state space, let $Q = (Q(x,y))_{x,y \in \mathcal{X}}$ be the transition matrix of a reversible proposal chain with respect to the probability measure $\mu = (\mu(x))_{x \in \mathcal{X}}$ , and let $\mathcal{H}\;:\;\mathcal{X} \to \mathbb{R}$ be the target Hamiltonian function. Denote by $M^{0} = (M^{0}(x,y))_{x,y \in \mathcal{X}}$ the infinitesimal generator of the continuized classical MH chain $X^0 = (X^0(t))_{t \geqslant 0}$ , with proposal chain Q, and with target distribution being the Gibbs distribution $\pi^{0}(x) \propto e^{-\frac{1}{\epsilon} \mathcal{H}(x)}\mu(x)$ at temperature $\epsilon > 0$ . Recall that its dynamics is given by

\begin{align*}M^{0}(x,y) &= M^0_{\epsilon}(Q,\pi^0)(x,y)\\[5pt] & \;:\!=\; \begin{cases} Q(x,y) \min \left\{ 1,e^{\frac{1}{\epsilon}(\mathcal{H}(x)-\mathcal{H}(y))} \right\} = Q(x,y) e^{-\frac{1}{\epsilon}(\mathcal{H}(y)-\mathcal{H}(x))_+} &\mbox{if } x \neq y, \\[8pt] - \sum_{z: z \neq x} M^0(x,z) & \mbox{if } x = y. \end{cases}\end{align*}

We shall explain the superscript of 0 in both $M^0$ and $\pi^0$ in Definition 1 below.

Let us denote the ground-state energy level or the global minimum value of $\mathcal{H}$ by $\mathcal{H}_{\textrm{min}} \;:\!=\; \min_{x \in \mathcal{X}} \mathcal{H}(x)$ . Instead of directly targeting the Hamiltonian $\mathcal{H}$ in the Gibbs distribution, we instead target the following modified or transformed function $\mathcal{H}_{\epsilon,c}^f$ at temperature $\epsilon$ :

(2.1)

\begin{align} \mathcal{H}_{\epsilon}(x) = \mathcal{H}^f_{\epsilon,c}(x) \;:\!=\; \int_{\mathcal{H}_{\textrm{min}}}^{\mathcal{H}(x)} \dfrac{1}{f((u-c)_+) + \epsilon}\,du,\end{align}

where the function f and the parameter c are chosen to satisfy the following assumptions.

Assumption 1.

1. The function $f\;:\; \mathbb{R}^+ \to \mathbb{R}^+$ is differentiable and non-decreasing, and it satisfies
\begin{align*}f(0) = 0.\end{align*}
2. The parameter c satisfies $c \geqslant \mathcal{H}_{\textrm{min}}$ .

While it is impossible to calculate $\mathcal{H}_{\epsilon,c}^f$ without knowing $\mathcal{H}_{\textrm{min}}$ a priori, in an MH chain what matters is the difference of the energy function. For $x,y \in \mathcal{X}$ , we see that

\begin{align*}\mathcal{H}_{\epsilon,c}^f(y) - \mathcal{H}_{\epsilon,c}^f(x) = \int_{\mathcal{H}(x)}^{\mathcal{H}(y)} \dfrac{1}{f((u-c)_+) + \epsilon}\,du,\end{align*}

which does not depend on $\mathcal{H}_{\textrm{min}}$ . In the special case where we choose $f = 0$ , the above equation reduces to $\mathcal{H}_{\epsilon,c}^0(y) - \mathcal{H}_{\epsilon,c}^0(x) = \frac{1}{\epsilon}(\mathcal{H}(y) - \mathcal{H}(x))$ . On the other hand, in the case where $c < \mathcal{H}(x) < \mathcal{H}(y)$ and f is chosen such that $f(z) > 0$ whenever $z > 0$ , we have

\begin{align*}\mathcal{H}_{\epsilon,c}^f(y) - \mathcal{H}_{\epsilon,c}^f(x) \leqslant \dfrac{1}{f(\mathcal{H}(x)-c) + \epsilon}(\mathcal{H}(y) - \mathcal{H}(x)) < \frac{1}{\epsilon} (\mathcal{H}(y) - \mathcal{H}(x)).\end{align*}

Since f is assumed to be non-decreasing in Assumption 1, the greater the difference between $\mathcal{H}(x)$ and c, the smaller the upper bound is in the first inequality in the above equation; thus, the smaller we expect $\mathcal{H}_{\epsilon,c}^f(y) - \mathcal{H}_{\epsilon,c}^f(x)$ to be, and the higher the transition rate $Q(x,y)\exp\Big\{-(\mathcal{H}_{\epsilon,c}^f(y) - \mathcal{H}_{\epsilon,c}^f(x))\Big\}$ is. This is the sense in which the landscape is modified. Intuitively speaking, when the algorithm is above the threshold parameter c, the landscape is modified so that the transition rate is higher, to encourage exploration, while the original landscape is utilized for exploitation when the algorithm is below c.

To illustrate the effect of the proposed transformation, we plot the function

\begin{align*}\mathcal{H}(x) =\cos\! (2 x)+\frac{1}{2} \sin\! (x)+\frac{1}{3} \sin\! (10 x)\end{align*}

as in Monmarché [Reference Monmarché6] and compare this landscape with that of $\mathcal{H}_{\epsilon,c}^f$ in Figure 1, where we take $\epsilon \in \{0.25,0.5,0.75,1\}$ , $c = -1.5$ , and $f(z) = z$ . With these choices of parameters, we compute that

\begin{align*}\mathcal{H}^f_{\epsilon,c}(x) = \frac{1}{\epsilon} (\!\min\{c,\mathcal{H}(x)\} - \min \mathcal{H}) + \ln\!\bigg(1 + \frac{1}{\epsilon}(\mathcal{H}(x) - c)_+\bigg).\end{align*}

In view of the above equation, we plot and compare $\frac{1}{\epsilon} \mathcal{H}(x)$ and $\mathcal{H}^f_{\epsilon,c}(x) + \frac{1}{\epsilon} \mathcal{H}_{\textrm{min}}$ in Figure 1 on the domain $\mathcal{X} = \{-5 + \frac{k}{1000} 10;\; k = 1,2,\ldots,1000\}$ . The shift of $+\frac{1}{\epsilon}\mathcal{H}_{\textrm{min}}$ is necessary to make these two landscapes match exactly in the region where $\{x;\;\mathcal{H}(x) \leqslant c\}$ , so that they are on the same scale. When $x \in (\!-\!4,-2)$ in Figure 1, we can see that the gradient of $\mathcal{H}_{\epsilon,c}^f$ is smaller than that of $\frac{1}{\epsilon} \mathcal{H}$ ; thus it is easier to climb up the hill in this region (i.e. there is a higher transition rate). From the plot we also note that both $\frac{1}{\epsilon} \mathcal{H}$ (the solid red curve) and $\mathcal{H}_{\epsilon,c}^f$ (the dashed blue curve) have exactly the same set of stationary points.

Figure 1. Landscape of $\frac{1}{\epsilon} \mathcal{H}$ and $\mathcal{H}^f_{\epsilon,c} + \frac{1}{\epsilon} \mathcal{H}_{\textrm{min}}$ , where $\mathcal{H}(x) =\cos\! (2 x)+\frac{1}{2} \sin\! (x)+\frac{1}{3} \sin\! (10 x)$ , $\epsilon \in \{0.25,0.5,0.75,1\}$ , $c = -1.5$ , and $f(z) = z$ .

Keeping in mind the ideas and notation above, we are now ready to introduce the MH chain with landscape modification.

Definition 1. (Metropolis–Hastings with landscape modification.) Let $\mathcal{H}$ be the target Hamiltonian function, and let $\mathcal{H}_{\epsilon,c}^f$ be the landscape-modified function at temperature $\epsilon$ introduced in (2.1), where f and c satisfy Assumption 1. The continuized Metropolis–Hastings chain with landscape modification $X^f_{\epsilon,c} = \big(X^f_{\epsilon,c}(t)\big)_{t \geqslant 0}$ has target distribution $\pi^f(x) = \pi^f_{\epsilon,c}(x) \propto e^{-\mathcal{H}_{\epsilon,c}^f(x)}\mu(x)$ and proposal chain Q, and its infinitesimal generator $M^f = \big(M^f(x,y)\big)_{x,y \in \mathcal{X}}$ is given by

\begin{align*}M^{f}(x,y) = M_{\epsilon,c}^f\big(Q,\pi^f\big)(x,y) \;:\!=\; \begin{cases} Q(x,y) e^{-(\mathcal{H}_{\epsilon,c}^f(y)-\mathcal{H}_{\epsilon,c}^f(x))_+} &\mbox{if } x \neq y, \\[5pt] - \sum_{z: z \neq x} M^f(x,z) & \mbox{if } x = y. \end{cases}\end{align*}

Note that when $f = 0$ , the above dynamics reduces to the classical MH $X^0$ .

We now fix some notation and recall some important concepts and results from the literature on Markov chains. We endow the Hilbert space $\ell^2\big(\pi^f_{\epsilon,c}\big)$ with the usual inner product weighted by the invariant measure $\pi^f_{\epsilon,c}$ : for $g_1,g_2 \in \ell^2 \big(\pi^f_{\epsilon,c}\big)$ ,

\begin{align*}\langle g_1,g_2 \rangle_{\pi^f_{\epsilon,c}} \;:\!=\; \sum_{x \in \mathcal{X}} g_1(x) g_2(x) \pi^f_{\epsilon,c}(x),\end{align*}

and for $p > 1$ we denote the $\ell^p$ norm by $\left\lVert\cdot\right\rVert_{\ell^p(\pi^f_{\epsilon,c})}$ . We write $\lambda_2\big(\!-\!M^f_{\epsilon,c}\big)$ for the spectral gap of $M^f_{\epsilon,c}$ ; that is,

(2.2)

\begin{align} \lambda_2\big(\!-\!M^f_{\epsilon,c}\big) \;:\!=\; \inf_{l \in \ell^2(\pi^f_{\epsilon,c})\;: \pi^f_{\epsilon,c}(l) = 0} \dfrac{\langle -M^f_{\epsilon,c}l,l \rangle_{\pi^f_{\epsilon,c}}}{\langle l,l \rangle_{\pi^f_{\epsilon,c}}}.\end{align}

Analogously, we denote by $\lambda_2\big(\!-\!M^0_{\epsilon}\big)$ the spectral gap of $M^0_{\epsilon}$ .

In the upcoming sections, we shall investigate and compare the total-variation mixing time between the MH chains with and without landscape modification. For any probability measures $\nu_1,\nu_2$ with support on $\mathcal{X}$ , the total-variation distance between $\nu_1$ and $\nu_2$ is

\begin{align*}||\nu_1 - \nu_2||_{TV} \;:\!=\; \sup_{A \subset \mathcal{X}} |\nu_1(A) - \nu_2(A)| = \dfrac{1}{2} \sum_{x \in \mathcal{X}} |\nu_1(x) - \nu_2(x)|.\end{align*}

The worst-case total-variation mixing time is defined to be

\begin{align*}t_{mix}\big(M^f_{\epsilon,c},1/4\big) \;:\!=\; \inf\left\{t;\;\sup_x ||P_t^{f}(x,\cdot) - \pi^f_{\epsilon,c}||_{TV} < 1/4\right\},\end{align*}

where $\big(P_t^{f} = e^{M^f_{\epsilon,c}t}\big)_{t \geqslant 0}$ is the transition semigroup of $X^f_{\epsilon,c}$ .

Besides the mixing time, we will also be interested in various hitting times of the MH chain. These variables naturally appear when we discuss metastability results in the CW model in Section 3 or in discrete simulated annealing in Section 4. For any $A \subset \mathcal{X}$ , we write $\tau^f_{A} \;:\!=\; \inf\big\{t \geqslant 0;\; X^f_{\epsilon,c}(t) \in A\big\}$ , and the usual convention that $\inf \emptyset = \infty$ applies. Similarly, we define $\tau^0_A$ to be the first hitting time of the set A for the MH chain $X^0$ . When $A = \{x\}$ , we shall simply write $\tau^f_x = \tau^f_{\{x\}}$ (resp. $\tau^0_x = \tau^0_{\{x\}}$ ). Also, we denote by $\mathbb{E}_x(\cdot)$ (resp. $\mathbb{P}_x(\cdot)$ ) the mathematical expectation (resp. probability) of the Markov chain with initial state at $x \in \mathcal{X}$ .

One crucial notion that quantifies the possible benefits of landscape modification is the concept of critical height, which in a broad sense measures the difficulty of the landscape. We now recall this classical notion, which originates from the literature on simulated annealing and metastability. A path from x to y is any sequence of points $x_0 = x, x_1, x_2,\ldots, x_n =y$ such that $Q(x_{i-1},x_i) > 0$ for $i = 1,2,\ldots,n$ . For any $x \neq y$ , such a path exists as the proposal chain Q is irreducible. We write $\chi^{x,y}$ for the set of paths from x to y, and elements of $\chi^{x,y}$ are denoted by $\gamma = (\gamma_i)_{i=0}^n$ . Given a target function $\mathcal{U}$ defined on $\mathcal{X}$ , the highest value of $\mathcal{U}$ along a path $\gamma \in \chi^{x,y}$ , known as the elevation, is defined to be

\begin{align*}\mathrm{Elev}(\mathcal{U},\gamma) \;:\!=\; \max\{\mathcal{U}(\gamma_i);\;\gamma_i \in \gamma\},\end{align*}

and the lowest possible highest elevation along paths from x to y is

(2.3)

\begin{align} G(x,y) = G(\mathcal{U},x,y) \;:\!=\; \min\{\mathrm{Elev}(\mathcal{U},\gamma);\;\gamma \in \chi^{x,y}\}.\end{align}

The associated critical height of $\mathcal{U}$ is then defined to be

\begin{align*} L(\mathcal{U}) &\;:\!=\; \max_{x,y \in \mathcal{X}}\{G(x,y) - \mathcal{U}(x) - \mathcal{U}(y) \} + \min \mathcal{U}.\end{align*}

Another related notion is the clipped critical height $c^*$ , which is defined to be

\begin{align*} c^*(\mathcal{U},c) \;:\!=\; \max_{x,y \in \mathcal{X}} \{(G(\mathcal{U},x,y) \wedge c) - (\mathcal{U}(x) \wedge c) - (\mathcal{U}(y) \wedge c) \} + \min \mathcal{U}.\end{align*}

One can understand $c^*$ as if we are optimizing with respect to the function $\mathcal{U} \wedge c$ . In Section 3, we shall consider $\mathcal{U}$ to be the free energy with and without landscape modification that arises in the CW model, while in Section 4, when we investigate an improved simulated annealing algorithm, we take $\mathcal{U}$ to be either $\mathcal{H}$ or $\mathcal{H}^f_{\epsilon,c}$ . We refer readers to Figure 4 in Section 4, where we offer a visual illustration of the notion of critical height.

To simulate $X^{f}_{\epsilon,c}$ practically, we would need to evaluate the acceptance–rejection probability, which amounts to the following integration:

(2.4)

\begin{align}& \exp\!\big(\!-\!\big(\mathcal{H}_{\epsilon,c}^f(y)-\mathcal{H}_{\epsilon,c}^f(x)\big)_+\big)\nonumber \\[5pt] &\qquad = \begin{cases} 1 &\mbox{if } \mathcal{H}(y) \leqslant \mathcal{H}(x), \\[5pt] \exp\!\left(\!-\!\dfrac{1}{\epsilon}(\mathcal{H}(y) - \mathcal{H}(x))\right) & \mbox{if } c\geqslant \mathcal{H}(y) > \mathcal{H}(x), \\[9pt] \exp\!\left(\!-\!\dfrac{1}{\epsilon}(c - \mathcal{H}(x)) - \int_{c}^{\mathcal{H}(y)} \dfrac{1}{f(u-c) + \epsilon}\,du\right) & \mbox{if } \mathcal{H}(y) > c \geqslant \mathcal{H}(x), \\[9pt] \exp\!\left(\!-\!\int_{\mathcal{H}(x)}^{\mathcal{H}(y)} \dfrac{1}{f(u-c) + \epsilon}\,du\right) & \mbox{if } \mathcal{H}(y) > \mathcal{H}(x) > c. \end{cases}\end{align}

In the following three subsections, we evaluate the above integrals (2.4) with f chosen to be a linear, quadratic, or cubic function, respectively.

2.1. Linear f: Metropolis–Hastings with logarithmic Hamiltonian and Catoni’s energy transformation algorithm

In this subsection, we specialize to $f(u) = u$ . It turns out we can understand the landscape modification as if the Hamiltonian is on a logarithmic scale whenever $\mathcal{H}(x) > c$ .

For $x,y \in \{\mathcal{H}(y) > \mathcal{H}(x) \geqslant c\}$ , since

\begin{align*}\int_{\mathcal{H}(x)}^{\mathcal{H}(y)} \frac{1}{u-c + \epsilon}\,du = \ln \bigg(\dfrac{\mathcal{H}(y)-c+\epsilon}{\mathcal{H}(x)-c+\epsilon}\bigg),\end{align*}

substituting the expression into (2.4) gives

\begin{align*} \exp\!\big(\!-\!\big(\mathcal{H}_{\epsilon,c}^f(y)-\mathcal{H}_{\epsilon,c}^f(x)\big)_+\big) &= \begin{cases} 1 &\mbox{if } \mathcal{H}(y) \leqslant \mathcal{H}(x), \\[5pt] \exp\!\bigg(\!-\!\dfrac{1}{\epsilon}(\mathcal{H}(y) - \mathcal{H}(x))\bigg) & \mbox{if } c\geqslant \mathcal{H}(y) > \mathcal{H}(x), \\[9pt] \exp\!\bigg(\!-\!\dfrac{1}{\epsilon}(c - \mathcal{H}(x))\bigg) \dfrac{\epsilon}{\mathcal{H}(y)-c+\epsilon} & \mbox{if } \mathcal{H}(y) > c \geqslant \mathcal{H}(x), \\[9pt] \dfrac{\mathcal{H}(x)-c+\epsilon}{\mathcal{H}(y)-c+\epsilon} & \mbox{if } \mathcal{H}(y) > \mathcal{H}(x) > c. \end{cases}\end{align*}

This resulting dynamics $M^f$ coincides with the energy transformation method introduced by Catoni [Reference Catoni1, Reference Catoni2] on $\{\mathcal{H}(y) > \mathcal{H}(x) > c\}$ , which is based on a logarithmic Hamiltonian. We refer readers to Section 2.6 for a more detailed account of the connection between landscape modification and Catoni’s energy transformation.

2.2. Quadratic f: Metropolis–Hastings with $\arctan$ Hamiltonian

In this subsection, we take $f(u) = u^2$ . In this case, the effect of the landscape modification gives an inverse-tangent-transformed Hamiltonian whenever $\mathcal{H}(x) > c$ .

For $x,y \in \{\mathcal{H}(y) > \mathcal{H}(x) \geqslant c\}$ , using the inverse-tangent difference formula we obtain

\begin{align*} \int_{\mathcal{H}(x)}^{\mathcal{H}(y)} \frac{1}{(u-c)^2 + \epsilon}\,du &= \sqrt{\frac{1}{\epsilon}}\left(\arctan\!\left(\sqrt{\frac{1}{\epsilon}}(\mathcal{H}(y)-c)\right) - \arctan\!\left(\sqrt{\frac{1}{\epsilon}}(\mathcal{H}(x)-c)\right)\right) \\[5pt] &= \sqrt{\frac{1}{\epsilon}} \arctan\!\left(\dfrac{\sqrt{\frac{1}{\epsilon}}(\mathcal{H}(y)-\mathcal{H}(x))}{1+\frac{1}{\epsilon}(\mathcal{H}(y)-c)(\mathcal{H}(x)-c)}\right),\end{align*}

and substituting the above expression into (2.4) gives

\begin{align*} & \exp\!\big(\!-\!\big(\mathcal{H}_{\epsilon,c}^f(y)-\mathcal{H}_{\epsilon,c}^f(x)\big)_+\big) \\[5pt] &= \begin{cases} 1 &\mbox{if } \mathcal{H}(y) \leqslant \mathcal{H}(x), \\[5pt] \exp\!\left(\!-\!\dfrac{1}{\epsilon}(\mathcal{H}(y) - \mathcal{H}(x))\right) &\mbox{if } c\geqslant \mathcal{H}(y) > \mathcal{H}(x), \\[9pt] \exp\!\left(\!-\!\dfrac{1}{\epsilon}(c - \mathcal{H}(x)) - \sqrt{\frac{1}{\epsilon}}\arctan\!\left(\sqrt{\frac{1}{\epsilon}}(\mathcal{H}(y)-c)\right)\right) &\mbox{if } \mathcal{H}(y) > c \geqslant \mathcal{H}(x), \\[9pt] \exp\!\left(\sqrt{\dfrac{1}{\epsilon}}\left(\arctan\!\left(\sqrt{\frac{1}{\epsilon}}(\mathcal{H}(x)-c)\right) - \arctan\!\left(\sqrt{\frac{1}{\epsilon}}(\mathcal{H}(y)-c)\right)\right)\right) &\mbox{if } \mathcal{H}(y) > \mathcal{H}(x) > c. \end{cases}\end{align*}

2.3. Square root f: Metropolis–Hastings with sum of square root and logarithmic Hamiltonian

In the final example, we let $f(z) = \sqrt{z}$ for $z \geqslant 0$ . For $x,y \in \{\mathcal{H}(y) > \mathcal{H}(x) \geqslant c\}$ , consider the integral

\begin{align*}\int_{\mathcal{H}(x)}^{\mathcal{H}(y)} \frac{1}{\sqrt{u-c} + \epsilon}\,du = 2\bigg(\Big(\sqrt{\mathcal{H}(y)-c} - \sqrt{\mathcal{H}(x)-c}\Big) - \epsilon \ln \!\bigg(\dfrac{\sqrt{\mathcal{H}(y)-c} + \epsilon}{\sqrt{\mathcal{H}(x)-c} + \epsilon}\bigg)\bigg).\end{align*}

Substituting the expression into (2.4) gives

\begin{align*} & \exp\!\big(\!-\!\big(\mathcal{H}_{\epsilon,c}^f(y)-\mathcal{H}_{\epsilon,c}^f(x)\big)_+\big)\\[5pt] &= \begin{cases} 1 &\mbox{if } \mathcal{H}(y) \leqslant \mathcal{H}(x), \\[5pt] \exp\!\bigg(\!-\!\dfrac{1}{\epsilon}(\mathcal{H}(y) - \mathcal{H}(x))\bigg) &\mbox{if } c\geqslant \mathcal{H}(y) > \mathcal{H}(x), \\[9pt] \exp\!\bigg(\!-\!\dfrac{1}{\epsilon}(c - \mathcal{H}(x)) - 2\sqrt{\mathcal{H}(y)-c}\bigg) \bigg(\dfrac{\sqrt{\mathcal{H}(y)-c} + \epsilon}{ \epsilon}\bigg)^{2\epsilon} &\mbox{if } \mathcal{H}(y) > c \geqslant \mathcal{H}(x), \\[9pt] \exp\!\left(2\sqrt{\mathcal{H}(x)-c} - 2\sqrt{\mathcal{H}(y)-c}\right) \bigg(\dfrac{\sqrt{\mathcal{H}(y)-c} + \epsilon}{\sqrt{\mathcal{H}(x)-c} + \epsilon}\bigg)^{2\epsilon} &\mbox{if } \mathcal{H}(y) > \mathcal{H}(x) > c. \end{cases}\end{align*}

Remark 1. (Use of landscape modification for sampling.) This paper focuses on investigating the acceleration effect of landscape modification in stochastic optimization and simulated annealing. However, the technique of landscape modification can also be applied to sampling from multimodal distributions. The paper Zhang and Choi [Reference Zhang and Choi7] analyses the use of landscape modification for sampling. We now briefly describe the setting therein. Suppose that we are interested in sampling from a multimodal distribution that we denote by $\nu(x) \propto e^{-\mathcal{H}(x)}$ . Applying the idea of landscape modification, we then construct an MH chain with the transformed Hamiltonian function

(2.5)

\begin{align} \mathcal{H}^f_{1,c,\alpha}(x) = \int_{\mathcal{H}_{\textrm{min}}}^{\mathcal{H}(x)} \dfrac{1}{\alpha f((u-c)_+) + 1}\,du, \end{align}

so that its stationary distribution is given by $\nu^f_{1,c,\alpha}(x) \propto e^{-\mathcal{H}^f_{1,c,\alpha}(x)}$ . We note that the parameter $\alpha \geqslant 0$ , introduced in (2.5), controls the bias between the distribution $\nu$ and its landscape-modified counterpart $\nu^f_{1,c,\alpha}$ . If we anneal this parameter by sending $\alpha_t \to 0$ as $t \to \infty$ , then $\nu^f_{1,c,\alpha_t}$ converges weakly to the target distribution $\nu$ . As a result, we are able to construct a non-homogeneous MH chain that converges to $\nu$ in the long run while enjoying the benefits of landscape modification.

2.4. A Metropolized Ehrenfest urn with landscape modification

In this section, we discuss a Metropolized Ehrenfest urn model; our exposition closely follows that of Deuschel and Mazza ([Reference Deuschel and Mazza8], Section 5.1.2). Let us first briefly fix the setting. We consider the state space $\mathcal{X} = \{0,1,\ldots,d\}$ with $d \in \mathbb{N}$ and take a linear Hamiltonian $\mathcal{H}(x) = x$ , where $\mathcal{H}_{\textrm{min}} = 0$ . The proposal birth–death chain has generator Q given by $Q(x,x+1) = 1 - x/d$ , $Q(x,x-1) = x/d$ , $Q(x,x) = -1$ and zero otherwise, and we note that the stationary measure of Q is $\mu(x) \propto 2^{-d} (\substack{d \\[5pt] x})$ . With these choices, the classical Metropolized dynamics is $M^0_{\epsilon}(x,x+1) = Q(x,x+1) e^{-\frac{1}{\epsilon}}$ , $M^0_{\epsilon}(x,x-1) = Q(x,x-1)$ , and $\pi^0(x) \propto e^{-\frac{1}{\epsilon} x} \mu(x)$ . It is shown in Deuschel and Mazza ([Reference Deuschel and Mazza8], Equation (5.1.1)) that

(2.6)

\begin{align} \lambda_2\big(\!-\!M^0_{\epsilon}\big) \leqslant \lambda_2(\!-\!Q) \dfrac{1}{\sum_{x=0}^d e^{-\frac{1}{\epsilon} x} \mu(x)} = \dfrac{d}{2} \dfrac{2^d}{\big(1+e^{-\frac{1}{\epsilon}}\big)^d}.\end{align}

At a fixed temperature $\epsilon$ , we note that the upper bound in (2.6) is exponential in d.

Now, we consider the landscape-modified MH with $f(z) = z$ and $c = 1$ . With these parameters, we compute

\begin{align*}\mathcal{H}(x) = \int_0^x \dfrac{1}{(u-1)_+ + \epsilon} \, du = \frac{1}{\epsilon} + \ln\!\bigg(\dfrac{(x-1)_+ + \epsilon}{\epsilon}\bigg).\end{align*}

Using Deuschel and Mazza ([Reference Deuschel and Mazza8], Equation (5.1.1)) leads to

(2.7)

\begin{align} \lambda_2\big(\!-\!M^f_{\epsilon,c=1}\big) \leqslant \lambda_2(\!-\!Q) \dfrac{1}{\sum_{x=0}^d e^{-\mathcal{H}(x)} \mu(x)} = \dfrac{d}{2} \dfrac{2^d e^{\frac{1}{\epsilon}}}{\frac{\epsilon}{\lfloor d/2 \rfloor - 1 + \epsilon} \left(\substack{d \\[5pt] {\lfloor d/2 \rfloor}}\right)} \sim d^3 \frac{1}{\epsilon} e^{\frac{1}{\epsilon}},\end{align}

where we use Stirling’s formula, which gives, for large enough d,

\begin{align*} \left(\substack{d \\[5pt] {\lfloor d/2 \rfloor}}\right) &\sim \dfrac{2^d}{\sqrt{\pi \lfloor d/2 \rfloor}}.\end{align*}

As a result, the upper bound in (2.7) gives a polynomial dependency on d, at the tradeoff of an exponential dependency on $\frac{1}{\epsilon}$ . In retrospect this result is not surprising, since we are working with a logarithmic Hamiltonian and hence the asymptotics is in the polynomial of d instead of $2^d$ . This section serves as a warm-up example to prepare us for discussing the more complicated CW model in Section 3.

2.5. Tuning strategies for f and c

In this section, we discuss tuning strategies for f and the threshold parameter c in the context of using the MH chain with landscape modification for stochastic optimization.

First, the function f controls how the landscape is transformed above the threshold parameter c. For example, taking a linear $f(z) = z$ gives a logarithmic transformation, while taking a quadratic $f(z) = z^2$ yields an $\arctan$ transformation, as shown in Sections 2.1 and 2.2. In general, we recommend using $f(z) = z^2$ , since the $\arctan$ transformation is uniformly bounded above by $\pi/2$ , which facilitates exploration on the part of the landscape above c by giving a higher transition rate compared with the choice of $f(z) = z$ . However, we suspect that in numerical investigations, and depending on the target Hamiltonian $\mathcal{H}$ , there are possibilities of using a linear f to yield improved convergence towards the global minimum.

Second, as we shall see in Section 4, the threshold parameter c controls the clipped critical height $c^*$ , and ideally it should be set as close to $\mathcal{H}_{\textrm{min}}$ as possible. This is possible if we have information about the value $\mathcal{H}_{\textrm{min}}$ , which is the case for some statistical physics and theoretical computer science models [Reference Nardi and Zocca9, Reference Zocca10]. In general, however, we may not have access to the value $\mathcal{H}_{\textrm{min}}$ , and one general method is to tune the threshold parameter adaptively by setting the value at time t to be the running minimum up to time t generated by the chain. However, the resulting process becomes non-Markovian because of the adaptive tuning. Another adaptive tuning strategy is to set the value of c at time t to be $c_t = \mathcal{H}(y_t) - d$ , where $y_t$ is the proposed state at time t generated by the proposal chain and $d \geqslant 0$ is a fixed number. In this way, the transition rate of the landscape-modified MH chain is always greater than or equal to that of the MH chain without landscape modification. We shall numerically investigate this tuning strategy and report positive results in Section 4.1.

As far as this paper is concerned, we assume a fixed c in all of the main theoretical results. We shall postpone to future work a systematic numerical study of investigating various choices of f and tuning strategies for c on benchmark functions, as well as a theoretical analysis of adaptively tuning c by the running minimum generated by the algorithm in an MH chain with landscape modification.

2.6. Connections between landscape modification and other acceleration techniques

In this section, we outline conceptual similarities and differences between MH with landscape modification and other common acceleration techniques in the literature for MH and simulated annealing.

2.6.1. Catoni’s energy transformation algorithm

Let $\alpha_1 \geqslant 0$ , $\alpha_2 \geqslant 0$ , $\alpha_3 > - \mathcal{H}_{\textrm{min}}$ be three parameters. In Catoni [Reference Catoni1, Reference Catoni2], the author introduces the energy transformation algorithm by transforming the Hamiltonian $\mathcal{H}$ to

\begin{align*}F_{\alpha_1,\alpha_2,\alpha_3}(x) \;:\!=\; \alpha_1 \mathcal{H}(x) + \alpha_2 \log \left(\mathcal{H}(x) + \alpha_3 \right).\end{align*}

Recall from Section 2.1 that MH with landscape modification can be considered as a state-dependent version of energy transformation if we take $f(z) = z$ , $\alpha_3 = -c + \epsilon$ , with $\alpha_1$ , $\alpha_2$ chosen in a state-dependent manner:

\begin{align*}\alpha_1(x) = \textbf{1}_{\{\mathcal{H}(x) \leqslant c\}}, \quad \alpha_2(x) = \textbf{1}_{\{\mathcal{H}(x) > c\}}.\end{align*}

Note that MH with landscape modification can give rise to other kinds of energy transformation through different choices of f; see for example the quadratic case or the square root case in Section 2.2 or 2.3, respectively. The idea of mapping or transforming the function from $\mathcal{H}$ to $F(\mathcal{H})$ with F being strictly increasing and concave can be dated back to R. Azencott.

2.6.2. Preconditioning of the Hamiltonian

Landscape modification can be understood as a state-dependent preconditioning of the Hamiltonian $\mathcal{H}$ . Recall that in (2.4) we compute the acceptance–rejection probability in MH by

\begin{align*} & \exp\!\big(\!-\!\big(\mathcal{H}_{\epsilon,c}^f(y)-\mathcal{H}_{\epsilon,c}^f(x)\big)_+\big)\\[5pt] &= \begin{cases} 1 &\mbox{if } \mathcal{H}(y) \leqslant \mathcal{H}(x), \\[5pt] \exp\!\left(\!-\!\dfrac{1}{\epsilon}(\mathcal{H}(y) - \mathcal{H}(x))\right) &\mbox{if } c\geqslant \mathcal{H}(y) > \mathcal{H}(x), \\[9pt] \exp\!\left(\!-\!\dfrac{1}{\epsilon}(c - \mathcal{H}(x)) - \int_{c}^{\mathcal{H}(y)} \frac{1}{f(u-c) + \epsilon}\,du\right) &\mbox{if } \mathcal{H}(y) > c \geqslant \mathcal{H}(x), \\[9pt] \exp\!\left(\!-\!\int_{\mathcal{H}(x)}^{\mathcal{H}(y)} \dfrac{1}{f(u-c) + \epsilon}\,du\right) &\mbox{if } \mathcal{H}(y) > \mathcal{H}(x) > c. \end{cases}\end{align*}

On the set $\{c\geqslant \mathcal{H}(y) > \mathcal{H}(x)\}$ , the acceptance–rejection probability is the same as the original MH to allow for exploitation, while on the set $\{\mathcal{H}(y) > c \geqslant \mathcal{H}(x)\}$ and $\{\mathcal{H}(y) > \mathcal{H}(x) > c\}$ , the acceptance–rejection probability is higher than that of the original MH to encourage exploration of the landscape. Therefore, there is a higher transition rate of moving to other states when the algorithm is above the threshold c.

2.6.3. Importance sampling

In importance sampling, the target distribution is altered for possible benefits and speed-ups such as variance reduction. In landscape modification, the target distribution in MH is altered from the original Gibbs distribution $\pi^0(x) \propto e^{-\frac{1}{\epsilon} \mathcal{H}(x)}$ to $\pi^f_{\epsilon,c}(x) \propto e^{-\mathcal{H}_{\epsilon,c}^f(x)}$ , while the set of stationary points is preserved in the sense that $\mathcal{H}$ and $\mathcal{H}^f_{\epsilon,c}$ share the same set of stationary points. In importance sampling, however, the set of stationary points need not be preserved between the altered Hamiltonian and the original Hamiltonian.

2.6.4 Quantum annealing

In quantum annealing [Reference Wang, Wu and Zou3], given a target Hamiltonian $\mathcal{H}$ and an initial Hamiltonian $\mathcal{H}_{init}$ that is usually easy to optimize, we optimize a time-dependent function $Q_t$ defined, for $t \in [0,T]$ , by

\begin{align*}Q_t(x) \;:\!=\; A(t) \mathcal{H}_{\textrm{init}}(x) + B(t) \mathcal{H}(x),\end{align*}

where A(t) and B(t) are smooth annealing schedules that satisfy $A(T) = B(0) = 0$ and T is the total annealing time. We also choose A(t) to be decreasing and B(t) to be increasing on the interval [0,T].

In simulated annealing with landscape modification, we also optimize a time-dependent function $\mathcal{H}^f_{\epsilon_t,c}$ which shares the same set of stationary points as the target $\mathcal{H}$ . In quantum annealing, $\mathcal{H}_{\textrm{init}}$ and $\mathcal{H}$ do not necessarily share the same set of stationary points. We mention the works Del Moral and Miclo [Reference Del Moral and Miclo11], Löwe [Reference Löwe12], and Frigerio and Grillo [Reference Frigerio and Grillo13] for simulated annealing with time-dependent energy function.

3. The Curie–Weiss model with landscape modification

In this section, we demonstrate the power of landscape modification by revisiting the Curie–Weiss (CW) model. With an appropriate choice of parameters, the landscape of the CW free energy is modified and the local minimum is eliminated while the global minimum is preserved on the transformed function. As a result, landscape modification convexifies the free energy from a double-well to a single-well as a function of the magnetization.

Let us first recall the setting of the CW model of a ferromagnet with external field $h \in \mathbb{R}$ and fix some notation. We shall follow the setting of Bovier and den Hollander ([Reference Bovier and den Hollander14], Chapters 13–14). Let $\mathcal{X} = \{-1,1\}^N$ be the set of possible configurations of the CW model with $N \in \mathbb{N}$ . The CW Hamiltonian is given, for $\sigma = (\sigma_i)_{i=1}^N \in \mathcal{X}$ , by

\begin{align*}H_{N}(\sigma) \;:\!=\; -\dfrac{1}{2N} \sum_{i,j=1}^{N} \sigma_i \sigma_j - h \sum_{i=1}^N \sigma_i = - \dfrac{N}{2} m_N(\sigma)^2 - h N m_N(\sigma) \;=\!:\; N E\left(m_N(\sigma)\right),\end{align*}

where $m_N(\sigma) = (1/N)\sum_{i=1}^N \sigma_i$ is the empirical magnetization. Consider the continuized Glauber dynamics by picking a node uniformly at random and flipping the sign of the selected spin, while targeting the Gibbs distribution with the CW Hamiltonian $H_N$ at temperature $\epsilon$ . The resulting Metropolis dynamics is given by

\begin{align*}P_{\epsilon,N}(\sigma,\sigma^{\prime}) = \begin{cases} (1/N) e^{-\frac{1}{\epsilon}(H_N(\sigma^{\prime})-H_N(\sigma))_+} &\mbox{if } \left\lVert\sigma-\sigma^{\prime}\right\rVert_1 = 2, \\[5pt] - \sum_{\eta: \eta \neq \sigma} P_{\epsilon,N}(\sigma,\eta) &\mbox{if } \sigma = \sigma^{\prime}, \\[5pt] 0 & \mbox{otherwise,} \end{cases}\end{align*}

where $\left\lVert\cdot\right\rVert_1$ is the $l^1$ norm on $\mathcal{X}$ .

The dynamics of the empirical magnetization $(m^0(t))_{t \geqslant 0}$ can be described by lumping the Glauber dynamics to give

\begin{align*} M^0_N(m,m^{\prime}) = M^0_{\epsilon,N}(m,m^{\prime}) = \begin{cases} \dfrac{1-m}{2} e^{-\frac{1}{\epsilon} N(E(m^{\prime})-E(m))_+} &\mbox{if } m^{\prime} = m + 2N^{-1}, \\[9pt] \dfrac{1+m}{2} e^{-\frac{1}{\epsilon} N(E(m^{\prime})-E(m))_+} &\mbox{if } m^{\prime} = m - 2N^{-1}, \\[5pt] - \sum_{m^{\prime}: m^{\prime} \neq m} M^0_{\epsilon,N}(m,m^{\prime}) &\mbox{if } m = m^{\prime}, \\[5pt] 0 &\mbox{otherwise,} \end{cases}\end{align*}

on the state space $\Gamma_N \;:\!=\; \{-1,-1+2N^{-1},\ldots,1-2N^{-1},1\}$ with the image Gibbs distribution

\begin{align*}\pi^0_{N}(m) = \pi^0_{\epsilon,N}(m) \propto e^{-\frac{1}{\epsilon} N E(m)} \bigg(\substack{N \\[5pt] \frac{1+m}{2}N}\bigg) 2^{-N}, \quad m \in \Gamma_N,\end{align*}

as the stationary distribution. Note that the dependency on $\epsilon$ is suppressed in the notation of $M^0_N$ and $\pi^0_N$ . Define

\begin{align*} I_N(m) &\;:\!=\; - \dfrac{1}{N} \ln \!\bigg(\bigg(\substack{N \\[5pt] \frac{1+m}{2}N}\bigg) 2^{-N}\bigg), \\[5pt] I(m) &\;:\!=\; \dfrac{1}{2} (1+m) \ln\!(1+m) + \dfrac{1}{2} (1-m) \ln\!(1-m), \\[5pt] g_{\epsilon,N}(m) &\;:\!=\; E(m) + \epsilon I_N(m),\end{align*}

where I(m) is the Cramér rate function for coin tossing. Then the image Gibbs distribution can be written as

\begin{align*} \pi^0_{\epsilon,N}(m) &\propto e^{-\frac{1}{\epsilon} N g_{\epsilon,N}(m)}, \\[5pt] \lim_{N \to \infty} I_N(m) &= I(m), \\[5pt] g_{\epsilon}(m) &\;:\!=\; \lim_{N \to \infty} g_{\epsilon,N}(m) = E(m) + \epsilon I(m).\end{align*}

The quantity $g_{\epsilon}$ is called the free energy of the CW model. The stationary points of $g_{\epsilon}$ satisfy the classical mean-field equation

(3.1)

\begin{align} m = \tanh\!\bigg(\frac{1}{\epsilon}(m+h)\bigg).\end{align}

To seek the ground state(s) of the free energy, we consider modifying the landscape of the CW Hamiltonian from $H_N$ to

(3.2)

\begin{align} E^f_{\epsilon,c}(m) &\;:\!=\; \int_{d}^{E(m)} \dfrac{1}{f((u-c)_+) + \epsilon} \, du, \end{align}

(3.3)

\begin{align} H^f_{\epsilon,c,N}(\sigma) &\;:\!=\; N \cdot E^f_{\epsilon,c}(m(\sigma)),\end{align}

where $d \in \mathbb{R}$ can be chosen arbitrarily, since we are only interested in the difference of $E^f_{\epsilon,c}$ . The infinitesimal generator of the magnetization $(m^f(t))_{t \geqslant 0}$ is

\begin{align*} M^f_{\epsilon,c,N}(m,m^{\prime}) = \begin{cases} \dfrac{1-m}{2} e^{- N(E^f_{\epsilon,c}(m^{\prime})-E^f_{\epsilon,c}(m))_+} &\mbox{if } m^{\prime} = m + 2N^{-1}, \\[9pt] \dfrac{1+m}{2} e^{- N(E^f_{\epsilon,c}(m^{\prime})-E^f_{\epsilon,c}(m))_+} &\mbox{if } m^{\prime} = m - 2N^{-1}, \\[5pt] - \sum_{m^{\prime}: m^{\prime} \neq m} M^f_{\epsilon,c,N}(m,m^{\prime}) &\mbox{if } m = m^{\prime}, \\[5pt] 0 &\mbox{otherwise,} \end{cases}\end{align*}

with stationary distribution

\begin{align*}\pi^f_{\epsilon,c,N}(m) \propto e^{- N E^f_{\epsilon,c,N}(m)} \left(\substack{N \\[5pt] \frac{1+m}{2}N}\right) 2^{-N} = e^{- N g^f_{\epsilon,c,N}(m)}, \quad m \in \Gamma_N,\end{align*}

where

\begin{align*} g^f_{\epsilon,c,N}(m) &\;:\!=\; E^f_{\epsilon,c}(m) + I_N(m).\end{align*}

By taking the limit $N \to \infty$ , we find that the free energy in the landscape-modified CW model is

\begin{align*}g^f_{\epsilon,c}(m) \;:\!=\; E^f_{\epsilon,c}(m) + I(m).\end{align*}

Setting the derivative of $g^f_{\epsilon,c}$ equal to zero gives the landscape-modified mean-field equation:

(3.4)

\begin{align} m &= \tanh\!\bigg(\dfrac{m+h}{f((E(m)-c)_+) + \epsilon}\bigg).\end{align}

Observe that if we take $f = 0$ , then (3.4) reduces to the classical mean-field equation in (3.1).

3.1. Main results

Without loss of generality, assume the external magnetic field is $h < 0$ . In the subcritical regime where $\frac{1}{\epsilon} > 1$ , it is known that there are two local minima of $g_{\epsilon}$ . We denote the global minimum of $g_{\epsilon}$ by $m_{-}^* < 0$ and the other local minimum by $m_+^* > 0$ , where $|m_-^*| > m_+^*$ , and let $z^*$ be the saddle point between $m_-^*$ and $m_+^*$ . We also write $m_-^*(N)$ (resp. $m_+^*(N)$ ) for the closest point in Euclidean distance to $m_-^*$ (resp. $m_+^*$ ) on $\Gamma_N$ .

Theorem 1. (Landscape modification in the subcritical regime.) Suppose $\frac{1}{\epsilon} > 1$ , $h < 0$ , and f,c are chosen as in Assumption 1.

1. (Convexification of the free energy $g_{\epsilon}$ and subexponential mean crossover time.) If we choose $c \in [E(m_-^*), E(m_+^*))$ , $c < h^2/2$ , $-h - \sqrt{h^2 - 2c} \leqslant z^*$ , and for $m \in [-h - \sqrt{h^2 - 2c},-h + \sqrt{h^2 - 2c}]$ ,
\begin{align*}m > \tanh\!\bigg(\dfrac{m+h}{f((E(m)-c)_+) + \epsilon}\bigg),\end{align*}
then $m_-^*$ is the only stationary point of the modified free energy $g^f_{\epsilon,c}$ , which is a global minimum. Consequently, we have subexponential mean crossover time on the modified landscape
(3.5) \begin{align} \lim_{N \to \infty} \dfrac{1}{N} \log \mathbb{E}_{m_+^*(N)} \Big(\tau^f_{m_-^*(N)}\Big) = 0, \end{align}
while the mean crossover time on the original landscape is exponential in N, $1/\epsilon$ , and the original critical height $g_{\epsilon}(z^*) - g_{\epsilon}(m_+^*)$ with
\begin{align*} \lim_{N \to \infty} \dfrac{1}{N} \log \mathbb{E}_{m_+^*(N)} \Big(\tau^0_{m_-^*(N)}\Big) = \frac{1}{\epsilon} (g_{\epsilon}(z^*) - g_{\epsilon}(m_+^*)). \end{align*}
2. If we choose $c \in [E(m_+^*), E(z^*)]$ and assume in addition that f is twice differentiable and satisfies $f^{\prime}(0) = f^{\prime \prime}(0) = 0$ , then there exists
\begin{align*}\textbf{z}^* = \arg \max_{m_-^* \leqslant m \leqslant m_+^*} g^f_{\epsilon,c}(m),\end{align*}
and as $N \to \infty$ ,
\begin{align*} \mathbb{E}_{m_+^*(N)} \Big(\tau^f_{m_-^*(N)}\Big) &= \exp\!\Big(N \Big(g_{\epsilon,c}^f(\textbf{z}^{*}) - g_{\epsilon,c}^f(m_+^*)\Big)\Big) \\[5pt] &\quad \times \frac{2}{1-|\textbf{z}^{*}|} \sqrt{\frac{1-\textbf{z}^{*2}}{1-m_{+}^{* 2}}} \frac{2 \pi N / 4}{ \sqrt{\left(\!-\!\Big(g_{\epsilon,c}^{f}\Big)^{\prime \prime}\left(\textbf{z}^{*}\right)\right) \Big(g_{\epsilon,c}^{f}\Big)^{\prime \prime}\left(m_{+}^{*}\right)}} (1+o(1)). \end{align*}
Consequently,
\begin{align*} \lim_{N \to \infty} \dfrac{1}{N} \log \mathbb{E}_{m_+^*(N)} \Big(\tau^f_{m_-^*(N)}\Big) &= g_{\epsilon,c}^f(\textbf{z}^{*}) - g_{\epsilon,c}^f(m_+^*) \\[5pt] &= \frac{1}{\epsilon} (c - E(m_+^*)) + \int_c^{E(\textbf{z}^{*})} \dfrac{1}{f((u-c)_+) + \epsilon} \,du \\[5pt] & \quad + \left(I(\textbf{z}^{*}) - I(m_+^*)\right) \\[5pt] &\leqslant \frac{1}{\epsilon} (g_{\epsilon}(z^*) - g_{\epsilon}(m_+^*)) = \lim_{N \to \infty} \dfrac{1}{N} \log \mathbb{E}_{m_+^*(N)} \Big(\tau^0_{m_-^*(N)}\Big). \end{align*}

Before we present the proof, we interpret the results in Theorem 1 intuitively: in Item 1, on the one hand we would like to choose c small enough so that the mapping

\begin{align*}m \mapsto \tanh\!\bigg(\dfrac{m+h}{f((E(m)-c)_+) + \epsilon}\bigg)\end{align*}

is flattened and only intersects the straight line $m \mapsto m$ at the global minimum $m_-^*$ . In this way the landscape of $g^f_{\epsilon,c}$ is transformed from a double-well to a single-well landscape, while the location of the global minimum at $m_-^*$ is preserved as that in the original landscape $g_{\epsilon}$ . This is illustrated in Figures 2 and 3. On the other hand, we cannot choose c to be too small if we are interested in seeking the ground state of $g_{\epsilon}$ , since otherwise, if $c < E(m_-^*)$ , then $m_-^*$ may no longer be the global minimum in the transformed free energy $g^f_{\epsilon,c}$ . This consequently yields a mean crossover time on the modified landscape that is subexponential in N, while the original mean crossover time is exponential in N, $1/\epsilon$ , and the original critical height $g_{\epsilon}(z^*) - g_{\epsilon}(m_+^*)$ . In Item 2 of Theorem 1, we choose a larger value of c compared with that in Item 1. Although the transformed free energy $g^f_{\epsilon,c}$ is not a convex function, it has a smaller critical height than the original free energy $g_{\epsilon}$ . This gives a reduced exponential dependence on the modified mean crossover time compared with the original mean crossover time.

Figure 2. Plots of the free energy $g_{\epsilon}$ and the modified free energy $g^{f}_{\epsilon,c} + E_{min}$ with $h = -0.05$ and $f(z) = z$ at temperature $\epsilon = 1/1.5$ , where $E_{min} = \min_{m \in [-1,1]} E(m)$ . We shift the modified free energy by $E_{min}$ so that it is on the same scale as the original free energy $g_{\epsilon}$ .

Figure 3. Plots of the mean-field equation (3.1) and the modified mean-field equation (3.4) with $h = -0.05$ and $f(z) = z$ at temperature $\epsilon = 1/1.5$ .

The power of landscape modification or energy transformation lies in tuning the parameter c appropriately. One way to tune c is to use the running minimum generated by the algorithm on the original free energy $g_{\epsilon}$ . Suppose we start in the well containing the local minimum $m_+^*$ ; setting c to be the running minimum eventually gives $c = E(m_+^*)$ , and hence Item 2 of Theorem 1 can be applied and the critical height on the modified landscape is reduced.

We illustrate Theorem 1 with a concrete numerical example in Figures 2 and 3, where we take $h = -0.05$ and $f(z) = z$ at temperature $\epsilon = 1/1.5$ . We numerically compute that $m_-^* = -0.8863$ , $m_+^* = 0.8188$ , and $z^* = 0.1524$ . As a result we have $E(m_-^*) = -0.4371$ , $E(m_+^*) = -0.2943$ , and $E(z^*) = -0.004$ . In the leftmost plots of Figures 2 and 3, we choose $c = -0.4 \in [E(m_-^*), E(m_+^*))$ . We numerically check that the conditions in Item 1 of Theorem 1 are satisfied, and we see that the location of the global minimum is the same for the blue curve as for the orange curve. In the rightmost plots of Figures 2 and 3, we choose $c = -0.2 \in [E(m_+^*), E(z^*)]$ . We see that for the blue curve and the red curve, the locations of the two local minima are the same, while the critical height is smaller than that in the original landscape $g_{\epsilon}$ .

3.2. Proof of Theorem 1

Before we give the proof, let us first recall the concept of critical height and the notation L as introduced in Section 2. In this section, we are interested in the CW model with and without landscape modification, with free energy $g^f_{\epsilon,c,N}$ and $g_{\epsilon,N}$ respectively. We therefore define the analogous concepts of critical height by inserting a subscript of N. This leads us to

\begin{align*} H^f_{\epsilon,c,N} &\;:\!=\; L\big(g^f_{\epsilon,c,N}\big) = \max_{x,y \in \mathcal{X}}\Big\{G\big(g^f_{\epsilon,c,N},x,y\big) - g^f_{\epsilon,c,N}(x) - g^f_{\epsilon,c,N}(y) \Big\} + \min_{x \in \Gamma_N} g^f_{\epsilon,c,N}(x), \\[5pt] H^0_{\epsilon,N} &\;:\!=\; L(g_{\epsilon,N}) = \max_{x,y \in \mathcal{X}}\Big\{G(g_{\epsilon,N},x,y) - g^f_{\epsilon,N}(x) - g_{\epsilon,N}(y) \Big\} + \min_{x \in \Gamma_N} g_{\epsilon,N}(x),\end{align*}

where we recall that G is introduced in (2.3).

We now proceed with the proof.

Proof of Theorem 1. First we prove Item 1. We observe that $\{E(m) \geqslant c\} = \{m \in [-h - \sqrt{h^2 - 2c},-h + \sqrt{h^2 - 2c}]\}$ . On this interval,

\begin{align*}\dfrac{d}{dm} g^f_{\epsilon,c}(m) = \textrm{arctanh}\,(m) - \dfrac{m+h}{f((E(m)-c)_+) + \epsilon} > 0,\end{align*}

and hence the modified free energy is strictly increasing on this interval. On the interval $\{m > -h + \sqrt{h^2 - 2c}\}$ ,

\begin{align*}\dfrac{d}{dm} g^f_{\epsilon,c}(m) = \dfrac{d}{dm} g_{\epsilon}(m) > 0,\end{align*}

as the original free energy is strictly increasing. On the interval $\{m < -h - \sqrt{h^2 - 2c}\}$ , we also have

\begin{align*}\dfrac{d}{dm} g^f_{\epsilon,c}(m) = \dfrac{d}{dm} g_{\epsilon}(m).\end{align*}

Thus, with these parameter choices, the only stationary point of $g^f_{\epsilon,c}$ is $m^*_-$ , which is the global minimum.

Next we proceed to proving (3.5). According to Löwe ([Reference Löwe12], Theorem 2.1), for $\xi_1(N)$ a polynomial function in N, we have

\begin{align*}\dfrac{1}{\lambda_2\Big(\!-\!M^f_{\epsilon,c,N}\Big)} \leqslant \xi_1(N) e^{N H^f_{\epsilon,c,N}}.\end{align*}

Using the random target lemma (Aldous and Fill, [Reference Aldous and Fill15], Section 4.2) and the above inequality leads to

(3.6)

\begin{align} \pi^f_{\epsilon,c,N}(m^*_-(N)) \mathbb{E}_{m_+^*(N)} \Big(\tau^f_{m_-^*(N)}\Big) \leqslant \sum_{y \in \Gamma_N} \pi^f_{\epsilon,c,N}(y) \mathbb{E}_{m_+^*(N)} \left(\tau^f_{y}\right) &\leqslant (|\Gamma_N|-1) \dfrac{1}{\lambda_2\Big(\!-\!M^f_{\epsilon,c,N}\Big)} \nonumber \\[5pt] &\leqslant N \xi_1(N) e^{N H^f_{\epsilon,c,N}}. \end{align}

Now, let $\overline{m}(N) \;:\!=\; \arg \min g^f_{\epsilon,c,N}(m)$ and compute

\begin{align*} g^f_{\epsilon,c,N}(m^*_-(N)) - g^f_{\epsilon,c,N}(\overline{m}(N)) &= E^f_{\epsilon,c}(m^*_-(N)) - E^f_{\epsilon,c}(\overline{m}(N)) + I_N(m^*_-(N)) - I_N(\overline{m}(N)) \\[5pt] &= E^f_{\epsilon,c}(m^*_-(N)) - E^f_{\epsilon,c}(\overline{m}(N)) \\[5pt] &\quad + I_N(m^*_-(N)) - I(m^*_-(N)) + I(m^*_-(N)) - I_N(\overline{m}(N)) \\[5pt] &= E^f_{\epsilon,c}(m^*_-(N)) - E^f_{\epsilon,c}(\overline{m}(N)) \\[5pt] &\quad + [1+o(1)] \frac{1}{2 N} \ln \left(\frac{\pi N\left(1-m^*_-(N)^{2}\right)}{2}\right) \\[5pt] &\quad + I(m_-^*(N)) - I(\overline{m}(N)) \\[5pt] &\quad - [1+o(1)] \frac{1}{2 N} \ln \left(\frac{\pi N\left(1-\overline{m}(N)^{2}\right)}{2}\right) \\[5pt] &\rightarrow 0 \quad \text{as} \, N \to \infty, \end{align*}

where we use Bovier and den Hollander ([Reference Bovier and den Hollander14], Equation (13.2.5)) in the third equality, and $m_-^*(N), \overline{m}(N) \to m_-^*$ as $N \to \infty$ . The above computation combined with (3.6) yields

(3.7)

\begin{align} \limsup_{N \to \infty} \mathbb{E}_{m_+^*(N)} \Big(\tau^f_{m_-^*(N)}\Big) \leqslant \lim_{N \to \infty} \dfrac{1}{N} \log H^f_{\epsilon,c,N} = 0. \end{align}

On the other hand, as the magnetization $(m^f(t))_{t \geqslant 0}$ is a birth–death process, using Equation (13.2.2) of Bovier and den Hollander [Reference Bovier and den Hollander14], the mean hitting time can be calculated explicitly as

\begin{align*} & \mathbb{E}_{m_{+}^{*}(N)}\Big(\tau^f_{m_{-}^{*}(N)}\Big) \\[5pt] &\quad= \sum_{\substack{m, m^{\prime} \in \Gamma_{N}, m \leqslant m^{\prime} \\ m_{-}^{*}(N)<m \leqslant m_{+}^{*}(N)}} \frac{\pi^f_{\epsilon,c, N}\left(m^{\prime}\right)}{\pi^f_{\epsilon,c, N}(m)} \frac{1}{M^f_{\epsilon,c, N}\left(m, m-2 N^{-1}\right)} \\ &\quad\geqslant \dfrac{1}{M^f_{\epsilon,c, N}\left(m_{+}^{*}(N), m_{+}^{*}(N)-2 N^{-1}\right)} = \dfrac{2}{1+m_{+}^{*}(N)} e^{ N(E^f_{\epsilon,c}(m_{+}^{*}(N)-2 N^{-1})-E^f_{\epsilon,c}(m))_+} \\ &\quad\geqslant \dfrac{2}{1+m_{+}^{*}(N)}. \end{align*}

As a result, as $m_+^*(N) \to m_+^*$ we have

(3.8)

\begin{align} \liminf_{N \to \infty} \dfrac{1}{N} \log \mathbb{E}_{m_+^*(N)} \Big(\tau^f_{m_-^*(N)}\Big) \geqslant 0. \end{align}

Using both (3.8) and (3.7) gives (3.5).

Next, we prove Item 2, the argument for which closely follows that of Bovier and den Hollander ([Reference Bovier and den Hollander14], Theorem 13.1). Since we choose $c \in [E(m_+^*), E(z^*)]$ , we have $E(m_-^*) - c < E(m_+^*) - c \leqslant 0$ , and hence the landscape-modified mean-field equation (3.4) has at least two solutions $m_+^*$ and $m_-^*$ , which are exactly the same as those of the original mean-field equation (3.1). As the landscape-modified mean-field equation is continuous in m, there exists $\textbf{z}^* = \arg \max_{m_-^* \leqslant m \leqslant m_+^*} g^f_{\epsilon,c}(m)$ which also satisfies (3.4). Now, for $m \in \Gamma_N$ we consider

\begin{align*} N \big(E^f_{\epsilon,c}(m - 2N^{-1}) - E^f_{\epsilon,c}(m)\big) &= N \int_{E(m)}^{E(m) + 2N^{-1}(m+h - N^{-1})} \dfrac{1}{f((u-c)_+) + \epsilon} \, du \\[5pt] &\rightarrow \dfrac{2(m+h)}{f((E(m)-c)_+) + \epsilon} \quad \text{as } N \to \infty. \end{align*}

If we take $N \to \infty$ and $m \rightarrow \textbf{z}^*$ , we obtain

\begin{align*} \frac{1}{M^f_{\epsilon,c, N}\big(m, m-2 N^{-1}\big)} \to \dfrac{2}{1+\textbf{z}^*} \exp \bigg(\dfrac{2(\textbf{z}^*+h)_+}{f((E(\textbf{z}^*)-c)_+) + \epsilon}\bigg) = \dfrac{2}{1 - |\textbf{z}^*|}, \end{align*}

since, if $\textbf{z}^* > 0$ , then $\textbf{z}^* + h > 0$ and satisfies the mean-field equation (3.4). Using the mean hitting time formula again leads to the following, for any $\delta > 0$ :

(3.9)

\begin{align} \mathbb{E}_{m_{+}^{*}(N)}\Big(\tau^f_{m_{-}^{*}(N)}\Big) &= \sum_{\substack{m, m^{\prime} \in \Gamma_{N}, m \leqslant m^{\prime} \\ m_{-}^{*}(N)<m \leqslant m_{+}^{*}(N)}} \frac{\pi^f_{\epsilon,c, N}\left(m^{\prime}\right)}{\pi^f_{\epsilon,c, N}(m)} \frac{1}{M^f_{\epsilon,c, N}\left(m, m-2 N^{-1}\right)} \nonumber \\ &= e^{N\left[g^f_{\epsilon,c,N}\left(\textbf{z}^{*}\right) - g^f_{\epsilon,c,N}\left(m_{+}^{*}\right)\right]} \frac{2}{1 - |\textbf{z}^*|}[1+o(1)] \nonumber \\ &\quad \times \sum_{\substack{m, m^{\prime} \in \Gamma_{N} \\ \left|m-\textbf{z}^{*}\right|<\delta,\left|m^{\prime}-m_{+}^{*}\right|<\delta}} \mathrm{e}^{N\left[g^f_{\epsilon,c, N}(m)-g^f_{\epsilon,c, N}\left(\textbf{z}^{*}\right)\right]- N\left[g^f_{\epsilon,c, N}\left(m^{\prime}\right)-g^f_{\epsilon,c, N}\left(m_{+}^{*}\right)\right]}. \end{align}

Using Stirling’s formula and the same argument as in Bovier and den Hollander ([Reference Bovier and den Hollander14], Equation (13.2.7)) yields

\begin{align*} e^{N\left[g^f_{\epsilon,c,N}\left(\textbf{z}^{*}\right) - g^f_{\epsilon,c,N}\left(m_{+}^{*}\right)\right]} &= (1+o(1)) e^{N\left[g^f_{\epsilon,c}\left(\textbf{z}^{*}\right) - g^f_{\epsilon,c}\left(m_{+}^{*}\right)\right]} \sqrt{\dfrac{1-\textbf{z}^{*2}}{1-m_+^{*2}}}. \end{align*}

Substitution into (3.9) gives

(3.10)

\begin{align} & \mathbb{E}_{m_{+}^{*}(N)}\left(\tau^f_{m_{-}^{*}(N)}\right) \nonumber \\[5pt] &= e^{N\left[g^f_{\epsilon,c}\left(\textbf{z}^{*}\right) - g^f_{\epsilon,c}\left(m_{+}^{*}\right)\right]} \frac{2}{1 - |\textbf{z}^*|} \sqrt{\dfrac{1-\textbf{z}^{*2}}{1-m_+^{*2}}}[1+o(1)] \nonumber \\[5pt] &\quad \times \sum_{\substack{m, m^{\prime} \in \Gamma_{N}\\ \left|m-\textbf{z}^{*}\right|<\delta,\left|m^{\prime}-m_{+}^{*}\right|<\delta}}\sqrt{\dfrac{1-m^{2}}{1-\textbf{z}^{*2}}} \sqrt{\dfrac{1-m^{*2}_+}{1-m^{\prime 2}}} \mathrm{e}^{N\left[g^f_{\epsilon,c}(m)-g^f_{\epsilon,c}\left(\textbf{z}^{*}\right)\right]- N\left[g^f_{\epsilon,c}\left(m^{\prime}\right)-g^f_{\epsilon,c}\left(m_{+}^{*}\right)\right]}. \end{align}

We use a Laplace method argument to handle the sum in (3.10). Note that as f is assumed to be twice differentiable with $f(0) = f^{\prime}(0) = f^{\prime \prime}(0) = 0$ , this implies that $g^f_{\epsilon,c}$ is three times differentiable, and applying the third-order Taylor expansion gives

(3.11)

\begin{align} g^f_{\epsilon,c}(m) - g^f_{\epsilon,c}\big(\textbf{z}^{*}\big) &= \dfrac{(m - \textbf{z}^{*})^2}{2} \big(g_{\epsilon,c}^{f}\big)^{\prime \prime}( \textbf{z}^*) + \mathcal{O}\left((m - \textbf{z}^{*})^3\right),\end{align}

(3.12)

\begin{align} g^f_{\epsilon,c}\left(m^{\prime}\right)-g^f_{\epsilon,c}\left(m_{+}^{*}\right) &= \dfrac{(m^{\prime} - m_{+}^{*})^2}{2} \big(g_{\epsilon,c}^{f}\big)^{\prime \prime}\big( m_{+}^{*}\big) + \mathcal{O}\left((m^{\prime} - m_{+}^{*})^3\right), \end{align}

where we use $(g_{\epsilon,c}^{f})^{\prime}(\textbf{z}^*) = (g_{\epsilon,c}^{f})^{\prime}(m_{+}^{*}) = 0$ . Now we observe that the sum in (3.10) is

\begin{align*} &\quad (1+o(1)) \dfrac{N}{4} \int_{\mathbb{R}} \int_{\mathbb{R}} \exp \left[\frac{1}{2} \big(g_{\epsilon,c}^{f}\big)^{\prime \prime}\left(\textbf{z}^{*}\right) u^{2}-\frac{1}{2} \big(g_{\epsilon,c}^{f}\big)^{\prime \prime}\left(m_{+}^{*}\right) u^{\prime 2}\right]\, du du^{\prime} \\[5pt] &= (1+o(1)) \dfrac{N}{4} \frac{2 \pi}{\sqrt{\left[-\big(g_{\epsilon,c}^{f}\big)^{\prime \prime}\left(\textbf{z}^{*}\right)\right] \big(g_{\epsilon,c}^{f}\big)^{\prime \prime}\left(m_{+}^{*}\right)}}, \end{align*}

since $\big(g_{\epsilon,c}^{f}\big)^{\prime \prime}\left(\textbf{z}^{*}\right) < 0$ and $\big(g_{\epsilon,c}^{f}\big)^{\prime \prime}\left(m_{+}^{*}\right) > 0$ .

3.3. Extension to the random-field Curie–Weiss model

In Section 3.1, we discussed the classical CW model with landscape modification under a fixed magnetic field. In this section, we consider the random-field CW model with landscape modification. We discuss related metastability results and the ground-state free energy in such a setting, with the aim of illustrating that landscape modification can also be applied in the setting of a random energy landscape. Let us begin by recalling the random-field CW model. We shall adapt the setting of Mathieu and Picco [Reference Mathieu and Picco16]. Let $(h_i)_{i \in \mathbb{N}}$ be a sequence of independent and identically distributed random variables with $\mathbb{P}(h_i = 1) = \mathbb{P}(h_i = -1) = 1/2$ . We consider the random Hamiltonian function given, for a fixed $\theta > 0$ and $\sigma \in \{-1,1\}^N$ , by

\begin{align*}\textbf{H}_N(\sigma) = \textbf{H}_N(\sigma, \omega) &\;:\!=\; - \dfrac{N}{2} \textbf{m}_N(\sigma)^2 - \theta \sum_{i=1}^N h_i(\omega) \sigma_i \\[5pt] &= - \dfrac{N}{2} (\textbf{m}_N^+(\sigma) + \textbf{m}_N^-(\sigma))^2 - \theta N (\textbf{m}_N^+(\sigma) - \textbf{m}_N^-(\sigma)),\end{align*}

where

\begin{align*}\textbf{m}_N(\sigma) \;:\!=\; (1/N) \sum_{i=1}^N \sigma_i, \qquad\textbf{m}_N^+(\sigma) \;:\!=\; (1/N) \sum_{i=1;\; h_i = 1}^N \sigma_i, \qquad\textbf{m}_N^-(\sigma) \;:\!=\; (1/N) \sum_{i=1;\; h_i = -1}^N \sigma_i.\end{align*}

In the sequel, we shall suppress the dependency on $\omega$ . Denote the Gibbs distribution at temperature $\epsilon$ on $\{-1,1\}^N$ by

\begin{align*}\boldsymbol{\nu}_N(\sigma) \propto \exp\bigg\{-\frac{1}{\epsilon} \textbf{H}_N(\sigma)\bigg\}.\end{align*}

Let $N^+ \;:\!=\; |\{i;\;h_i = +1\}|$ , $N^- \;:\!=\; |\{i;\;h_i = -1\}|$ , and define the random set

\begin{align*}\boldsymbol{\Gamma}_N \;:\!=\; \bigg(\!-\!\frac{N^{+}}{N},-\frac{N^{+}}{N} + \frac{2}{N}, \ldots, \frac{N^{+}}{N}\bigg) \times\bigg(\!-\!\frac{N^{-}}{N},-\frac{N^{-}}{N} + \frac{2}{N}, \ldots, \frac{N^{-}}{N}\bigg).\end{align*}

For $\textbf{m} = (\textbf{m}^+, \textbf{m}^-) \in \boldsymbol{\Gamma}_N$ , with slight abuse of notation we write

\begin{align*}\textbf{H}_N(\textbf{m}) = - \dfrac{N}{2} (\textbf{m}^+ + \textbf{m}^-)^2- \theta N (\textbf{m}^+ - \textbf{m}^-) \;=\!:\; N \cdot \textbf{E}(\textbf{m}),\end{align*}

where $\textbf{m}^+$ and $\textbf{m}^-$ are the magnetization among the sites i where respectively $h_i = 1$ and $h_i = -1$ . Let $\boldsymbol{\pi}_{\epsilon,N}^0$ denote the image Gibbs distribution of $\boldsymbol{\nu}_N$ by $\boldsymbol{\Gamma}_{N}$ , where

\begin{align*} \boldsymbol{\pi}_{\epsilon,N}^0(\textbf{m}) &\propto \exp\bigg\{- \frac{1}{\epsilon} N \textbf{g}_{\epsilon,N}(\textbf{m})\bigg\}, \\[5pt] \textbf{g}_{\epsilon,N}(\textbf{m}) &\;:\!=\; - \dfrac{1}{2} (\textbf{m}^+ + \textbf{m}^-)^2- \theta (\textbf{m}^+ - \textbf{m}^-) - \frac{1}{\frac{1}{\epsilon} N} \log \left(\begin{array}{c} N^{+} \\[5pt] \frac{N^{+}}{2}+\textbf{m}^{+} \frac{N}{2} \end{array}\right)\left(\begin{array}{c} N^{-} \\[5pt] \frac{N^{-}}{2}+\textbf{m}^{-} \frac{N}{2} \end{array}\right).\end{align*}

As $N \to \infty$ , by the strong law of large numbers, $\textbf{g}_{\epsilon,N}$ converges almost surely to the free energy given by

\begin{align*}\textbf{g}_{\epsilon}(\textbf{m}) \;:\!=\; - \dfrac{1}{2} (\textbf{m}^+ + \textbf{m}^-)^2- \theta (\textbf{m}^+ - \textbf{m}^-) + \frac{1}{2\frac{1}{\epsilon}}\left(I(2\textbf{m}^+) + I(2\textbf{m}^-)\right),\end{align*}

where I(m) is the Cramér rate function as introduced in Section 3. The critical points of $\textbf{g}_{\epsilon}$ satisfy

(3.13)

\begin{align} \textbf{m}^+ &= \dfrac{1}{2} \tanh\!\bigg(\frac{1}{\epsilon}(\textbf{m}^+ + \textbf{m}^- + \theta)\bigg),\end{align}

(3.14)

\begin{align} \textbf{m}^- &= \dfrac{1}{2} \tanh\!\bigg(\frac{1}{\epsilon}(\textbf{m}^+ + \textbf{m}^- - \theta)\bigg).\end{align}

In this section, we shall only consider the subcritical regime where $\frac{1}{\epsilon} > \cosh^2(\frac{1}{\epsilon} \theta)$ . It can be shown (see e.g. [Reference Mathieu and Picco16]) that there are exactly three critical points. Let $\textbf{m}_{*} > 0$ be the unique positive solution to the mean-field equation

\begin{align*}\textbf{m}_{*} = \dfrac{1}{2}\bigg(\tanh\!\bigg(\frac{1}{\epsilon}(\textbf{m}_{*} + \theta)\bigg) + \tanh\!\bigg(\frac{1}{\epsilon}(\textbf{m}_{*} - \theta)\bigg)\bigg).\end{align*}

The three critical points of $\textbf{g}_{\epsilon}$ are given by

\begin{align*} \textbf{m}_{0} &= \bigg(\frac{1}{2} \tanh \bigg(\frac{1}{\epsilon} \theta\bigg),-\frac{1}{2} \tanh \bigg(\frac{1}{\epsilon} \theta\bigg)\bigg), \\[5pt] \textbf{m}_{1} &= \bigg(\frac{1}{2} \tanh \bigg(\frac{1}{\epsilon} \textbf{m}_{*}+\frac{1}{\epsilon} \theta\bigg), \frac{1}{2} \tanh \bigg(\frac{1}{\epsilon} \textbf{m}_{*}-\frac{1}{\epsilon} \theta\bigg)\bigg), \\[5pt] \textbf{m}_{2} &= \bigg(\frac{1}{2} \tanh \bigg(\!-\!\frac{1}{\epsilon} \textbf{m}_{*}+\frac{1}{\epsilon} \theta\bigg),-\frac{1}{2} \tanh \bigg(\frac{1}{\epsilon} \textbf{m}_{*}+\frac{1}{\epsilon} \theta\bigg)\bigg),\end{align*}

where $\textbf{m}_0$ is the saddle point and $\textbf{m}_1, \textbf{m}_2$ are the two global minima. Consider the continuized Glauber dynamics $(\sigma_N(t))_{t \geqslant 0}$ by picking a node uniformly at random and changing the sign of the selected spin, while targeting the Gibbs distribution $\boldsymbol{\nu}_N$ at temperature $\epsilon$ . Denote by $\textbf{m}_N(t) \;:\!=\; \textbf{m}_N(\sigma_N(t))$ the induced dynamics on the magnetization, and denote its infinitesimal generator by $\textbf{M}^0_{\epsilon,N}$ . This is proven to be a Markov chain in Mathieu and Picco [Reference Mathieu and Picco16], with stationary measure $\boldsymbol{\pi}^0_{\epsilon,N}$ .

Now let us consider the landscape-modified Hamiltonian on $\boldsymbol{\Gamma}_N$ ,

(3.15)

\begin{align} \textbf{E}^f_{\epsilon,c}(\textbf{m}) &\;:\!=\; \int_{d}^{\textbf{E}(\textbf{m})} \dfrac{1}{f((u-c)_+) + \epsilon} \, du,\end{align}

(3.16)

\begin{align} \textbf{H}^f_{\epsilon,c,N}(\textbf{m}) &\;:\!=\; N \cdot \textbf{E}^f_{\epsilon,c}(\textbf{m}),\end{align}

where $d \in \mathbb{R}$ can be chosen arbitrarily since we are only interested in the difference of $\textbf{E}^f_{\epsilon,c}$ . The transformed image Gibbs distribution is therefore

\begin{align*} \boldsymbol{\pi}_{\epsilon,c,N}^f(\textbf{m}) &\propto \exp\{- N \textbf{g}^f_{\epsilon,c,N}(\textbf{m})\}, \\[5pt] \textbf{g}^f_{\epsilon,c,N}(\textbf{m}) &\;:\!=\; \textbf{E}^f_{\epsilon,c}(\textbf{m}) - \frac{1}{N} \log \left(\begin{array}{c} N^{+} \\[5pt] \frac{N^{+}}{2}+\textbf{m}^{+} \frac{N}{2} \end{array}\right)\left(\begin{array}{c} N^{-} \\[5pt] \frac{N^{-}}{2}+\textbf{m}^{-} \frac{N}{2} \end{array}\right).\end{align*}

The strong law of large numbers yields that as $N \to \infty$ , $\textbf{g}^f_{\epsilon,c,N}$ converges almost surely to the transformed free energy

\begin{align*}\textbf{g}^f_{\epsilon,c}(\textbf{m}) \;:\!=\; \textbf{E}^f_{\epsilon,c}(\textbf{m}) + \frac{1}{2}\left(I(2\textbf{m}^+) + I(2\textbf{m}^-)\right).\end{align*}

The critical points of $\textbf{g}^f_{\epsilon,c}$ satisfy the following landscape-modified mean-field equations:

(3.17)

\begin{align} \textbf{m}^+ &= \dfrac{1}{2} \tanh\!\bigg(\dfrac{\textbf{m}^+ + \textbf{m}^- + \theta}{f((\textbf{E}(\textbf{m})-c)_+) + \epsilon}\bigg),\end{align}

(3.18)

\begin{align} \textbf{m}^- &= \dfrac{1}{2} \tanh\!\bigg(\dfrac{\textbf{m}^+ + \textbf{m}^- - \theta}{f((\textbf{E}(\textbf{m})-c)_+) + \epsilon}\bigg).\end{align}

Note that (3.17) and (3.18) reduce to the classical case (3.13) and (3.14) if we take $f = 0$ . Consider the continuized Glauber dynamics $(\sigma_N^f(t))_{t \geqslant 0}$ by picking a node uniformly at random and changing the sign of the selected spin, while targeting the Gibbs distribution with Hamiltonian $\epsilon \textbf{H}^f_{\epsilon,c,N}$ at temperature $\epsilon$ . Denote by $\textbf{m}_N^f(t) \;:\!=\; \textbf{m}^f_N(\sigma_N^f(t))$ the induced dynamics on the magnetization, and denote by $\textbf{M}^f_{\epsilon,c,N}$ its infinitesimal generator, which is a Markov chain with stationary measure $\boldsymbol{\pi}^f_{\epsilon,c,N}$ .

In the following, we shall consider the case where $c \in [\textbf{E}(\textbf{m}_1), \textbf{E}(\textbf{m}_0)]$ . It can be seen that the two global minima of $\textbf{g}^f_{\epsilon,c}$ remain at $\textbf{m}_1, \textbf{m}_2$ with this choice of c. For any path $\gamma^{\textbf{m}_1,\textbf{m}_0}$ connecting $\textbf{m}_1$ and $\textbf{m}_0$ , we define

\begin{align*} \textbf{m}_3(\gamma^{\textbf{m}_1,\textbf{m}_0}) &\;:\!=\; \arg \max \{\textbf{g}^f_{\epsilon,c}(\gamma_i);\;\gamma_i \in \gamma^{\textbf{m}_1,\textbf{m}_0}\}, \\[5pt] \Delta \textbf{g}^f_{\epsilon,c} &\;:\!=\; \min_{\gamma^{\textbf{m}_1,\textbf{m}_0}} \textbf{m}_3(\gamma^{\textbf{m}_1,\textbf{m}_0}) - \textbf{g}^f_{\epsilon,c}(\textbf{m}_1) = \min_{\gamma^{\textbf{m}_1,\textbf{m}_0}} \textbf{m}_3(\gamma^{\textbf{m}_1,\textbf{m}_0}) - \textbf{g}^f_{\epsilon,c}(\textbf{m}_0),\end{align*}

where $\Delta \textbf{g}^f_{\epsilon,c}$ is the critical height on the modified landscape. We also write $\Delta \textbf{g}_{\epsilon}$ to denote the critical height on the original landscape. Suppose that $\Delta \textbf{g}_{\epsilon}$ is attained at $\textbf{m}_4$ , so that $\Delta \textbf{g}_{\epsilon} = \textbf{g}_{\epsilon}(\textbf{m}_4) - \textbf{g}_{\epsilon}(\textbf{m}_0)$ ; we deduce

\begin{align*} \Delta \textbf{g}^f_{\epsilon,c} \leqslant \textbf{g}^f_{\epsilon,c}(\textbf{m}_4) - \textbf{g}^f_{\epsilon,c}(\textbf{m}_0) &= \int_{\textbf{E}(\textbf{m}_0)}^{\textbf{E}(\textbf{m}_4)} \dfrac{1}{f((u-c)_+) + \epsilon} \, du \\[5pt] &\quad + \frac{1}{2}\big(I\big(2\textbf{m}_4^+\big) + I\big(2\textbf{m}_4^-\big)\big) - \frac{1}{2}\big(I\big(2\textbf{m}_0^+\big) + I\big(2\textbf{m}_0^-\big)\big) \\[5pt] &\leqslant \frac{1}{\epsilon} \left(\textbf{g}_{\epsilon}(\textbf{m}_4) - \textbf{g}_{\epsilon}(\textbf{m}_0)\right) = \frac{1}{\epsilon} \Delta \textbf{g}_{\epsilon}.\end{align*}

In other words, the critical height of the free energy in the modified landscape is bounded above by $\frac{1}{\epsilon}$ times the critical height of the free energy in the original landscape.

A direct application of Mathieu and Picco ([Reference Mathieu and Picco16], Theorem 2.7) yields the following result on the asymptotics of the spectral gap.

Theorem 2. (Asymptotics of the spectral gap.) Suppose $\theta > 0$ , $\frac{1}{\epsilon} > \cosh^2\left(\frac{1}{\epsilon} \theta\right)$ are fixed, and $\textbf{m}_0$ is the saddle point, while $\textbf{m}_1, \textbf{m}_2$ are the two global minima on the original free energy landscape $\textbf{g}_{\epsilon}$ . For $c \in [\textbf{E}(\textbf{m}_1), \textbf{E}(\textbf{m}_0)]$ , we have $\mathbb{P}$ -almost surely that

\begin{align*} \lim_{N \to \infty} \dfrac{1}{N} \log \lambda_2(\!-\!\textbf{M}^f_{\epsilon,c,N}) &= - \Delta \textbf{g}^f_{\epsilon,c} \geqslant - \frac{1}{\epsilon} \Delta \textbf{g}_{\epsilon} = \lim_{N \to \infty} \dfrac{1}{N} \log \lambda_2(\!-\!\textbf{M}^0_{\epsilon,N}). \end{align*}

In essence, the relaxation time in the mean-field limit of the transformed generator $\textbf{M}^f_{\epsilon,c,N}$ is asymptotically less than or equal to that of the original generator $\textbf{M}^0_{\epsilon,N}$ .

This yields a reduced exponential dependence of the relaxation time on the modified landscape, compared with the relaxation time on the original landscape.

4. Discrete simulated annealing with landscape modification

Unlike in previous sections of this paper, where the temperature parameter is fixed, in this section we consider the non-homogeneous MH with landscape modification where the temperature schedule $(\epsilon_t)_{t \geqslant 0}$ is time-dependent and non-increasing and goes to zero as $t \to \infty$ .

We first recall the concept of critical height as introduced in Section 2. To be precise, we define

(4.1)

\begin{align} H^f_{\epsilon,c} &\;:\!=\; L\big(\mathcal{H}^f_{\epsilon,c}\big) = \max_{x,y \in \mathcal{X}} \Big\{G\Big(\mathcal{H}^f_{\epsilon,c},x,y\Big) - \mathcal{H}^f_{\epsilon,c}(x) - \mathcal{H}^f_{\epsilon,c}(y) \Big\},\end{align}

(4.2)

\begin{align} H^0 &\;:\!=\; L(\mathcal{H}) = \max_{x,y \in \mathcal{X}}\{G(\mathcal{H},x,y) - \mathcal{H}(x) - \mathcal{H}(y)\} + \min \mathcal{H},\end{align}

(4.3)

\begin{align} c^* &\;:\!=\; c^*(\mathcal{H},c) = \max_{x,y \in \mathcal{X}} \{(G(\mathcal{H},x,y) \wedge c) - (\mathcal{H}(x) \wedge c) - (\mathcal{H}(y) \wedge c) \} + \min \mathcal{H}, \end{align}

where $H^f_{\epsilon,c}$ is the critical height associated with the modified landscape, $H^0$ is the critical height associated with the original landscape $\mathcal{H}$ , and $c^*$ is the clipped critical height. We shall see in the main results of this section that both $H^f_{\epsilon,c}$ and $c^*$ play fundamental roles in the relaxation time in the low-temperature regime and in determining the cooling schedule of an improved simulated annealing algorithm running on the modified landscape.

As an illustration to calculate and compare these critical heights, we consider a simple one-dimensional landscape with a saddle point at s, a local (but not global) minimum at m, and a single global minimum. At temperature $\epsilon = 1$ , the original critical height is attained at $H^0 = \mathcal{H}(s) - \mathcal{H}(m)$ while the modified critical height is $H^f_{1,c} = \mathcal{H}^f_{1,c}(s) - \mathcal{H}^f_{1,c}(m) \leqslant \mathcal{H}^0$ . In this setting, depending on whether c is above or below $\mathcal{H}(m)$ , the clipped critical height $c^*$ is

\begin{align*} c^* = \begin{cases} 0 \quad &\textrm{if} \, c \leqslant \mathcal{H}(m)\,, \\[5pt] c - \mathcal{H}(m) \quad &\textrm{if} \, c > \mathcal{H}(m)\,. \end{cases} \end{align*}

These critical heights are illustrated in Figure 4.

Figure 4. Comparing the critical heights $H^0, H^f_{1,c}, c^*$ on a one-dimensional landscape generated by $\mathcal{H}$ with a local minimum at m and a global minimum, where we take $\epsilon = 1$ , $f(z) = z$ , and the proposal chain is of nearest-neighbour type, i.e. going left or right with probability $1/2$ .

Our first result gives the asymptotic order of the spectral gap $\lambda_2\Big(\!-\!M^f_{\epsilon,c}\Big)$ in terms of $c^*$ in the low-temperature regime, which will be proven to be essential in obtaining a convergence result for simulated annealing.

Theorem 3. Assume that f and $\min \mathcal{H} \leqslant c \leqslant \max \mathcal{H}$ satisfy Assumption 1, and in addition for all small enough $z > 0$ we have $f(z) \geqslant z$ . Then there exist positive constants $C_2, C_3, C_4$ , depending on the state space $\mathcal{X}$ and the proposal generator Q but not on the temperature $\epsilon$ , and a subexponential function

\begin{align*}C_1(\epsilon) \;:\!=\; \begin{cases} \dfrac{1}{C_2} \left(1 + \dfrac{1}{\epsilon}(\!\max \mathcal{H} - c)\right)\exp\bigg\{\dfrac{1}{f(\delta)} \left(\max \mathcal{H} - \min \mathcal{H}\right)\bigg\} &\textit{if} \, c < \max \mathcal{H}, \\[9pt] \dfrac{1}{C_2} &\textit{if} \, c = \max \mathcal{H}, \end{cases}\end{align*}

where $\delta \;:\!=\; \min_{x;\; \mathcal{H}(x) > c} \{\mathcal{H}(x) - c\}$ , such that

\begin{align*} C_1^{-1}(\epsilon) e^{-\frac{1}{\epsilon} c^*} \leqslant C_2 e^{-H^f_{\epsilon,c}} \leqslant \lambda_2(\!-\!M^f_{\epsilon,c}) \leqslant C_3 e^{-H^f_{\epsilon,c}} \leqslant C_4 e^{-\frac{1}{\epsilon} c^*}, \end{align*}

where $H^f_{\epsilon,c}$ is introduced in (4.1) and $c^*$ is defined in (4.3). This leads to

\begin{align*} \lim_{\epsilon \to 0} \epsilon \log \lambda_2(\!-\!M^f_{\epsilon,c}) = - c^*. \end{align*}

As a corollary of Theorem 3, using the asymptotics of the spectral gap, we derive similar asymptotics for the mixing time and tunnelling time on the modified landscape.

Corollary 1. (Asymptotics of mixing and tunnelling times in the low-temperature regime.) Assume the same setting as in Theorem 3. Let $S_{\textrm{min}} \;:\!=\; \arg \min \mathcal{H}(x)$ be the set of global minima of $\mathcal{H}$ , let $\eta \in S_{\textrm{min}}$ , and assume that $\sigma,\eta$ attain $H^0$ such that $H^0 = G^0(\sigma,\eta) - \mathcal{H}(\sigma)$ . Then the following statements hold:

1.
\begin{align*}\lim_{\epsilon \to 0} \epsilon \log t_{mix}(M^f_{\epsilon,c},1/4) = c^*.\end{align*}
2.
\begin{align*}\lim_{\epsilon \to 0} \epsilon \log \mathbb{E}_{\sigma}(\tau^f_\eta) = c^* \leqslant H^0 = \lim_{\epsilon \to 0} \epsilon \log \mathbb{E}_{\sigma}(\tau^0_\eta).\end{align*}

In particular, when $c = \mathcal{H}_{\textrm{min}}$ , we have subexponential tunnelling time, as $\lim_{\epsilon \to 0} \epsilon \log \mathbb{E}_{\sigma}(\tau^f_\eta) = 0 = c^*$ .

Note that in the case where both $\sigma, \eta \in S_{\textrm{min}}$ with initial state $\sigma$ , it is a reasonable choice to pick the parameter $c = \mathcal{H}(X^f(0)) = \mathcal{H}(\sigma) = \mathcal{H}_{\textrm{min}}$ , and in this setting we have subexponential tunnelling time on the modified landscape. For instance, in applications we may know about a global minimizer $\sigma$ , and by setting $c = \mathcal{H}(\sigma)$ we can search for other possible global minimizers owing to the subexponential tunnelling in the low-temperature regime. For a concrete example, in the Widom–Rowlinson model with $m \in \mathbb{N}$ particle types, $S_{\textrm{min}}$ is precisely the set of configurations in which all sites are occupied by particles of the same type, and hence both $S_{\textrm{min}}$ and $\mathcal{H}_{\textrm{min}}$ are known in this model. In this direction, we refer interested readers to Nardi and Zocca [Reference Nardi and Zocca9] and Zocca [Reference Zocca10] for work on the energy landscape analysis of various statistical physics models.

To prove the convergence result for simulated annealing with landscape modification, as our target function $\mathcal{H}^f_{\epsilon,c}$ depends on time through the cooling schedule, we are in the setting of simulated annealing with time-dependent energy function as in Löwe [Reference Löwe12]. We first present the following auxiliary lemma, where we verify various assumptions from Löwe [Reference Löwe12] in our setting. We have included the statement of the lemma in this section, rather than with the proofs later on, since it will be helpful for understanding the convergence result in Theorem 4 below.

Lemma 1. Assume the same setting as in Theorem 3. Let $M \;:\!=\; \max \mathcal{H} - \min \mathcal{H}$ , $\beta_t \;:\!=\; 1/\epsilon_t$ , and assume the cooling schedule is

\begin{align*}\epsilon_t = \dfrac{c^* + \epsilon}{\ln(t+1)}\end{align*}

for $\epsilon$ small enough so that $M + \max \mathcal{H} - c > \epsilon > 0$ and $t \geqslant 0$ . Then we have the following:

1. For all $x \in \mathcal{X}$ and all $t \geqslant 0$ ,
\begin{align*}0 \leqslant \epsilon_t \mathcal{H}^f_{\epsilon_t,c}(x) \leqslant M.\end{align*}
2. For all $x \in \mathcal{X}$ ,
\begin{align*}\left|\dfrac{\partial}{\partial t} \epsilon_t \mathcal{H}^f_{\epsilon_t,c}(x)\right| \leqslant \dfrac{2M}{(\ln(1+t))(1+t)}.\end{align*}
3. Let $R_t \;:\!=\; \sup_x \frac{\partial}{\partial t} \epsilon_t \mathcal{H}^f_{\epsilon_t,c}(x)$ and $B \;:\!=\; 6M/(c^*+\epsilon)$ . For all $t \geqslant 0$ ,
\begin{align*}\beta_t^{\prime} M + \beta_t R_t \leqslant \dfrac{3M}{(c^*+\epsilon)(1+t)} = \dfrac{B}{2(1+t)}.\end{align*}
4. Let
\begin{align*}p \;:\!=\; \dfrac{2M}{M + \max \mathcal{H} - \epsilon - c} > 2\end{align*}
and
\begin{align*}A \;:\!=\; \begin{cases} \dfrac{1}{C_2 (\!\min_x \mu(x))^{(p-2)/p}}\exp\bigg\{\dfrac{1}{f(\delta)} \left(\max \mathcal{H} - \min \mathcal{H}\right)\bigg\} &\textit{if} \, c < \max \mathcal{H}, \\[11pt] \dfrac{1}{C_2 (\!\min_x \mu(x))^{(p-2)/p}} &\textit{if}\, c = \max \mathcal{H}, \end{cases}\end{align*}
where $C_2, \delta$ are as in Theorem 3, and we recall that $\mu$ is the stationary measure of the proposal generator Q. For $g \in \ell^p(\pi_{\epsilon_t,c}^f)$ , we have
\begin{align*}\left\lVert g - \pi_{\epsilon_t,c}^f(g)\right\rVert_{\ell^p(\pi^f_{\epsilon_t,c})}^2 \leqslant A (1+t) \langle -M^f_{\epsilon_t,c}g,g \rangle_{\pi^f_{\epsilon_t,c}}.\end{align*}

Remark 2. Items 1 and 2 correspond respectively to Equations (2.11) and (2.12) in Löwe [Reference Löwe12], while the lower bound of the spectral gap in Theorem 3 verifies Equation (2.13) in Löwe [Reference Löwe12]. Assumptions (A1) and (A2) in Löwe [Reference Löwe12] are checked in Items 3 and 4, respectively.

With the above lemma and the notation introduced there, we are ready to give one of the main results of this paper, which concerns the large-time convergence of discrete simulated annealing with landscape modification. The gain from landscape modification in simulated annealing can be seen by operating a possibly faster logarithmic cooling schedule with clipped critical height $c^*$ , while in classical simulated annealing the critical height is $H^0$ . The possible benefit thus depends on the tuning of c, since $c^* \leqslant c - \mathcal{H}_{\textrm{min}}$ . This result is analogous to the result obtained in Choi [Reference Choi4] for improved kinetic simulated annealing. A similar improvement of a logarithmic cooling schedule by means of reduction in critical height can be found in the infinite swapping algorithm [Reference Menz, Schlichting, Tang and Wu17].

Theorem 4. Assume the same setting as in Theorem 3. Let A,B,p be the quantities introduced in Lemma 1. Define

\begin{align*} \overline{\epsilon} &\;:\!=\; \dfrac{p-2}{p}, \quad K \;:\!=\; \dfrac{4(1+2AB)}{1 - \exp\{-\frac{1}{2A} - B\}}, \quad S_{\textrm{min}} \;:\!=\; \arg \min \mathcal{H}, \quad \underline{d} \;:\!=\; \min_{x;\;\mathcal{H}(x) \neq \mathcal{H}_{\textrm{min}}} \mathcal{H}(x). \end{align*}

Consider a cooling schedule of the following form: for any $\epsilon > 0$ as in Lemma 1,

\begin{align*}\epsilon_t = \dfrac{c^* + \epsilon}{\ln(t+1)}.\end{align*}

Under this cooling schedule, for any $x \in \mathcal{X} \backslash S_{\textrm{min}}$ and $t \geqslant e^{1/\overline{\epsilon}}-1$ , we have

\begin{align*}\mathbb{P}_x\left(\tau^f_{S_{\textrm{min}}} > t\right) \leqslant (1+K^{\frac{1}{2\overline{\epsilon}}}) \sqrt{\pi^f_{\epsilon_t,c}(\mathcal{X} \backslash S_{\textrm{min}})} + \pi^f_{\epsilon_t,c}(\mathcal{X} \backslash S_{\textrm{min}}) \to 0 \quad \textit{as } t \to \infty.\end{align*}

Note that

\begin{align*} \pi^f_{\epsilon_t,c}(\mathcal{X} \backslash S_{\textrm{min}}) &\leqslant \begin{cases} \dfrac{1}{\mu(S_{\textrm{min}})} e^{-\frac{1}{\epsilon_t}(\underline{d} - \mathcal{H}_{\textrm{min}})} &\textit{if} \, c \geqslant \underline{d}, \\[9pt] \dfrac{1}{\mu(S_{\textrm{min}})}\exp\bigg\{-\int_{c}^{\underline{d}} \frac{1}{f(u-c) + \epsilon_t}du\bigg\} &\textit{if} \, \mathcal{H}_{\textrm{min}} \leqslant c < \underline{d}. \end{cases} \end{align*}

Remark 3. (On tuning the threshold parameter c.) There are various ways to tune the parameter c for improved convergence. In Choi [Reference Choi4], we proposed to use the running minimum generated by the algorithm to tune c. Note that in the setting of the CW model, the parameter c can be tuned as explained earlier, in the second paragraph below Theorem 1.

4.1. Numerical illustrations

Before discussing the proofs of the main results above, we illustrate the convergence performance of simulated annealing with landscape modification, which we call improved simulated annealing (ISA), and compare it to the performance of the classical simulated annealing algorithm (SA) for the travelling salesman problem (TSP). We first state the parameters that we used to generate the numerical results:

• TSP and its objective function. We select 50 nodes uniformly at random on the grid $[0,100] \times [0,100]$ . The objective is to find a configuration that minimizes the total Euclidean distance with the same starting and ending point. Each node can only be visited once.
• Initial configuration. Both ISA and SA have the same initialization. They are initialized using the output of the nearest-neighbour algorithm: a node is randomly chosen as the starting point, which is then connected to the closest unvisited node. This is repeated until every node has been visited, and subsequently the last node is connected back to the starting node.
• Proposal chain. Both ISA and SA share the same proposal chain: at each step, a proposal move is generated using the 2-OPT algorithm [Reference Croes18].
• Acceptance–rejection mechanism. In SA, the proposed move is accepted with probability
\begin{align*}\min \left\{ 1,e^{\beta(\mathcal{H}(x)-\mathcal{H}(y))} \right\},\end{align*}
while in ISA, the acceptance probability is computed as in Section 2.1. In other words, we use a linear f in ISA. Both SA and ISA have the same source of randomness.
• Cooling schedule. Both ISA and SA use the same logarithmic cooling schedule, of the form
\begin{align*}\epsilon_t = \dfrac{\sqrt{50}}{\ln(t+1)}.\end{align*}
• Choice of c in ISA. In this experiment, if we denote the proposal configuration at time t by $y_t$ , we set $c = c_t$ to be
\begin{align*}c_t = \mathcal{H}(y_t) - 5.\end{align*}
This tuning strategy has been discussed in Section 2.5.
• Number of iterations. We run both ISA and SA for 100,000 iterations.

We generate 1000 random TSP instances. For each instance we compute what we call the improvement percentage ( $\textrm{IP}$ ) of ISA over SA, defined by

\begin{align*}\textrm{IP} \;:\!=\; 100 \dfrac{\min_{t \in [0,100,000]} \mathcal{H}(X^0(t)) - \min_{t \in [0,100,000]} \mathcal{H}(X^f(t))}{\min_{t \in [0,100,000]} \mathcal{H}(X^0(t))}.\end{align*}

The summary statistics of $\textrm{IP}$ are provided in Table 1, while its histogram over these 1000 instances can be found in Figure 5. The code for reproducing these results can be found at https://github.com/mchchoi/Improved-discrete-simulated-annealing.

The summary statistics in Table 1 and the histogram in Figure 5 offer empirical evidence for using ISA over SA: out of the 1000 TSP instances, there are 798 instances in which the $\textrm{IP}$ is non-negative. The sample mean of the $\textrm{IP}$ is approximately 1.87%, while its sample median is 1.47%.

Next, in Figure 6, we investigate the difference between SA and ISA in a particular instance. We see that SA (the blue curve) is stuck at a local minimum, while ISA (the orange curve) is able to escape the local minimum, owing to the increased acceptance probability compared with SA, and it reaches regions where the objective value is smaller than that of SA.

Table 1. Summary statistics of $\textrm{IP}$ on 1000 random TSP instances.

Figure 5. Histogram showing IP of ISA over SA on 1000 randomly generated TSP instances.

Figure 6. TSP objective value against iteration of ISA and SA.

The rest of this section is devoted to the proofs of Theorem 3, Corollary 1, Lemma 1, and Theorem 4.

4.2 Proof of Theorem 3

First, using the classical result of Holley and Stroock ([Reference Holley and Stroock19], Theorem 2.1), it is immediate that

\begin{align*} C_2 e^{-H^f_{\epsilon,c}} \leqslant \lambda_2(\!-\!M^f_{\epsilon,c}) \leqslant C_3 e^{-H^f_{\epsilon,c}}.\end{align*}

For any arbitrary $x_1, x_2 \in \{\mathcal{H}(x_1) \geqslant \mathcal{H}(x_2)\}$ , we deduce the following upper bound:

\begin{align*}& \mathcal{H}^f_{\epsilon,c}(x_1) - \mathcal{H}^f_{\epsilon,c}(x_2)\\[5pt] &= \int_{\mathcal{H}(x_2)}^{\mathcal{H}(x_1)} \dfrac{1}{f((u-c)_+) + \epsilon}\,du \\[5pt] &= \begin{cases} \dfrac{1}{\epsilon}(\mathcal{H}(x_1) - \mathcal{H}(x_2)) &\mbox{if } c\geqslant \mathcal{H}(x_1) > \mathcal{H}(x_2), \\[9pt] \dfrac{1}{\epsilon}(c - \mathcal{H}(x_2)) + \int_{c}^{\mathcal{H}(x_1)} \dfrac{1}{f(u-c) + \epsilon}\,du &\mbox{if } \mathcal{H}(x_1) > c \geqslant \mathcal{H}(x_2), \\[9pt] \int_{\mathcal{H}(x_2)}^{\mathcal{H}(x_1)} \dfrac{1}{f(u-c) + \epsilon}\,du &\mbox{if } \mathcal{H}(x_1) > \mathcal{H}(x_2) > c, \end{cases} \\[9pt] &\leqslant \begin{cases} \dfrac{1}{\epsilon}(\mathcal{H}(x_1) \wedge c - \mathcal{H}(x_2) \wedge c) &\mbox{if } c\geqslant \mathcal{H}(x_1) > \mathcal{H}(x_2), \\[9pt] \dfrac{1}{\epsilon}(\mathcal{H}(x_1) \wedge c - \mathcal{H}(x_2)\wedge c) + \ln\!\bigg(1+\dfrac{1}{\epsilon}(\max \mathcal{H}-c)\bigg) &\mbox{if } \mathcal{H}(x_1) > c \geqslant \mathcal{H}(x_2), \\[9pt] \dfrac{1}{f(\delta)} \left(\max \mathcal{H} - \min \mathcal{H}\right) &\mbox{if } \mathcal{H}(x_1) > \mathcal{H}(x_2) > c. \end{cases}\end{align*}

As a result, $C_2^{-1}e^{H^f_{\epsilon,c}} \leqslant e^{\frac{1}{\epsilon} c^*} C_1(\epsilon)$ . On the other hand, we have the following lower bound:

\begin{align*} \mathcal{H}^f_{\epsilon,c}(x_1) - \mathcal{H}^f_{\epsilon,c}(x_2) &\geqslant \begin{cases} \dfrac{1}{\epsilon}(\mathcal{H}(x_1) \wedge c - \mathcal{H}(x_2) \wedge c) &\mbox{if } c\geqslant \mathcal{H}(x_1) > \mathcal{H}(x_2), \\[9pt] \dfrac{1}{\epsilon}(\mathcal{H}(x_1) \wedge c - \mathcal{H}(x_2)\wedge c) &\mbox{if } \mathcal{H}(x_1) > c \geqslant \mathcal{H}(x_2), \\[9pt] \dfrac{1}{f(\mathcal{H}(x_1) - c)} \left(\mathcal{H}(x_1) - \mathcal{H}(x_2)\right) &\mbox{if } \mathcal{H}(x_1) > \mathcal{H}(x_2) > c, \end{cases}\end{align*}

and hence $e^{H^f_{\epsilon,c}} \geqslant e^{\frac{1}{\epsilon} c^*} C_4^{-1}$ .

4.3. Proof of Corollary 1

We first prove Item 1. For a continuous-time reversible Markov chain, by Levin and Peres ([Reference Levin and Peres20], Theorems 12.5 and 20.6), we can bound the total-variation mixing time by relaxation time via

\begin{align*}\dfrac{1}{\lambda_2(\!-\!M^f_{\epsilon,c}\Big)} \log 2 \leqslant t_{mix}\Big(M^f_{\epsilon,c},1/4\Big) \leqslant \dfrac{1}{\lambda_2(\!-\!M^f_{\epsilon,c})} \log\!\left(\dfrac{4}{\pi^f_{min}}\right),\end{align*}

where

\begin{align*}\pi_{min}^f \;:\!=\; \min_x \pi^f_{\epsilon,c}(x) = \dfrac{\mu(x^*)}{Z^f_{\epsilon,c}}\end{align*}

for some $x^* \in S_{\textrm{min}}$ , and

\begin{align*}Z^f_{\epsilon,c} \;:\!=\; \sum_{x \in \mathcal{X}} e^{-\mathcal{H}_{\epsilon,c}^f(x)}\mu(x)\end{align*}

is the normalization constant. Note that since $ \ln Z^f_{\epsilon,c} \to \ln \mu(S_{\textrm{min}})$ , we have

\begin{align*}\lim_{\epsilon \to 0} \epsilon \ln Z^f_{\epsilon,c} = 0.\end{align*}

Item 1 follows from collecting the above results together with Theorem 3.

Next, we prove Item 2. First, using the random target lemma ([Reference Aldous and Fill15], Section 4.2), we have

\begin{align*}\pi^f_{\epsilon,c}(\eta) \mathbb{E}_{\sigma}\big(\tau^f_\eta\big) \leqslant \sum_{x \in \mathcal{X}} \pi^f_{\epsilon,c}(x) \mathbb{E}_{\sigma}\big(\tau^f_x\big) \leqslant (|\mathcal{X}|-1) \dfrac{1}{\lambda_2(\!-\!M^f_{\epsilon,c})}.\end{align*}

Since $\mathcal{H}(\eta) = 0$ and $Z^f_{\epsilon,c} \leqslant 1$ , upon rearranging and using Theorem 2 we obtain

\begin{align*}\limsup_{\epsilon \to 0} \epsilon \log \mathbb{E}_{\sigma}\big(\tau^f_\eta\big) \leqslant \lim_{\epsilon \to 0} \epsilon \log \dfrac{1}{\lambda_2\Big(\!-\!M^f_{\epsilon,c}\Big)} = c^*.\end{align*}

Define the equilibrium potential and capacity of the pair $(\sigma,\eta)$ as in Bovier and den Hollander ([Reference Bovier and den Hollander14], Chapter 7.2) to be respectively

\begin{align*} h_{\sigma,\eta}^{f}(x) &\;:\!=\; \mathbb{P}_x\big(\tau^f_{\sigma} < \tau^f_{\eta}\big), \\[5pt] \mathrm{cap}^{M^f_{\epsilon,c}}(\sigma,\eta) &\;:\!=\; \inf_{f: f|_{A} = 1, f|_{B} = 0} \langle -M^f_{\epsilon,c}f,f \rangle_{\pi^f_{\epsilon,c}} = \langle -M^f_{\epsilon,c} h_{\sigma,\eta}^{f}, h_{\sigma,\eta}^{f} \rangle_{\pi^f_{\epsilon,c}}.\end{align*}

If we prove that

(4.4)

\begin{align} \mathrm{cap}^{M^f_{\epsilon,c}}(\sigma,\eta) \leqslant \dfrac{1}{Z^f_{\epsilon,c}} \left(\sum_{x,y} \mu(x) Q(x,y)\right) e^{-\frac{1}{\epsilon} G^0(\sigma,\eta)},\end{align}

then the mean hitting time formula combined with the equilibrium potential and capacity leads to

\begin{align*} \mathbb{E}_{\sigma}(\tau^f_{\eta}) &= \dfrac{1}{\mathrm{cap}^{M^f_{\epsilon,c}}(\sigma,\eta)} \sum_{y \in \mathcal{X}} \pi_{\epsilon,c}^f(y) h_{\sigma,\eta}^{f}(y) \\[5pt] &\geqslant \dfrac{1}{\mathrm{cap}^{M^f_{\epsilon,c}}(\sigma,\eta)} \pi_{\epsilon,c}^f(\sigma) h_{\sigma, \eta}^{f}(\sigma) \geqslant \dfrac{\mu(\sigma)}{\sum_{x,y} \mu(x) Q(x,y)} e^{\frac{1}{\epsilon} c^*},\end{align*}

and the desired result follows, since

\begin{align*}\liminf_{\epsilon \to 0} \epsilon \log \mathbb{E}_{\sigma}\big(\tau^f_\eta\big) \geqslant c^*.\end{align*}

It therefore remains to prove (4.4). Define

\begin{align*}\Phi(\sigma,\eta) \;:\!=\; \{x\in\mathcal{X};\; G^0(x, \sigma) \leqslant G^0(x,\eta)\}.\end{align*}

Let $\textbf{1}_{A}$ denote the indicator function of the set A; the Dirichlet principle of capacity gives

\begin{align*} \mathrm{cap}^{M^f_{\epsilon,c}}(\sigma,\eta) \leqslant \langle -M^f_{\epsilon,c}\textbf{1}_{\Phi(\sigma,\eta)},\textbf{1}_{\Phi(\sigma,\eta)}\rangle_{\pi^f_{\epsilon,c}} &= \dfrac{1}{Z^f_{\epsilon,c}}\sum_{x \in \Phi(\sigma,\eta), y \notin \Phi(\sigma,\eta)} e^{-\frac{1}{\epsilon} \left( H(x) \vee H(y) \right)} \mu(x) Q(x,y) \\[5pt] &\leqslant \dfrac{1}{Z^f_{\epsilon,c}} \left(\sum_{x,y} \mu(x) Q(x,y)\right) e^{-\frac{1}{\epsilon} G^0(\sigma,\eta)},\end{align*}

where in the last inequality we use the fact that $G^0(\sigma,\eta)$ is the lowest possible highest elevation between $\sigma$ and $\eta$ .

4.4. Proof of Lemma 1

We first prove Item 1. The lower bound is immediate, while the upper bound can be deduced via

\begin{align*}\epsilon_t \mathcal{H}^f_{\epsilon_t,c} \leqslant \epsilon_t \int_{\mathcal{H}_{\textrm{min}}}^{\mathcal{H}(x)} \dfrac{1}{\epsilon_t} \,du \leqslant M.\end{align*}

Next, we prove Item 2. We consider

\begin{align*} \dfrac{\partial}{\partial t} \epsilon_t \mathcal{H}^f_{\epsilon_t,c}(x) &= \mathcal{H}^f_{\epsilon_t,c}(x) \left(\dfrac{\partial}{\partial t} \epsilon_t\right) + \epsilon_t \dfrac{\partial}{\partial t} \mathcal{H}^f_{\epsilon_t,c}(x) \\[5pt] &= \mathcal{H}^f_{\epsilon_t,c}(x) \dfrac{-c^* - \epsilon}{(\ln(t+1))^2} \dfrac{1}{t+1} + \epsilon_t \epsilon_t^{\prime} \int_{\mathcal{H}_{\textrm{min}}}^{\mathcal{H}(x)} - \dfrac{1}{(f((u-c)_+) + \epsilon_t)^2} \, du.\end{align*}

This leads to

\begin{align*} \left|\dfrac{\partial}{\partial t} \epsilon_t \mathcal{H}^f_{\epsilon_t,c}(x)\right| \leqslant \dfrac{2M}{(\ln(t+1))(t+1)}.\end{align*}

Thirdly, we prove Item 3. Using Item 2, we calculate that

\begin{align*} \beta_t^{\prime} M + \beta_t R_t &\leqslant \dfrac{M}{(c^*+\epsilon)(t+1)} + \dfrac{2M}{(c^*+\epsilon)(1+t)} = \dfrac{3M}{(c^*+\epsilon)(1+t)}.\end{align*}

Finally, we prove Item 4. Following exactly the same calculation as in the proof of Löwe ([Reference Löwe12], Lemma 3.5), we see that

\begin{align*} \left\lVert g - \pi_{\epsilon_t,c}^f(g)\right\rVert_{\ell^p(\pi^f_{\epsilon_t,c})}^2 &\leqslant A \left(1 + \beta_t(\max \mathcal{H} - c)\right) e^{\beta_t(c^* + M \frac{p-2}{p})}\langle -M^f_{\epsilon_t,c}g,g \rangle_{\pi^f_{\epsilon_t,c}} \\[5pt] &\leqslant A e^{\beta_t(c^* + M \frac{p-2}{p} + \max \mathcal{H} - c)}\langle -M^f_{\epsilon_t,c}g,g \rangle_{\pi^f_{\epsilon_t,c}}.\end{align*}

The desired result follows if we let

\begin{align*}\epsilon = \dfrac{M(p-2)}{p} + \max \mathcal{H} - c,\end{align*}

so that

\begin{align*}p = \dfrac{2M}{M + \max \mathcal{H} - \epsilon - c}.\end{align*}

4.5 Proof of Theorem 4

We would like to invoke the results in Löwe [Reference Löwe12] for a time-dependent target function in simulated annealing. In Lemma 1, Items 1, 2, 3, and 4, we verify that Equations (11) and (12) and Assumptions (A1) and (A2), respectively, from Löwe [Reference Löwe12] hold. Consequently, if we let $h_t(y) \;:\!=\; \mathbb{P}_x(X^f_{\epsilon_t,c}(t) = y)/\pi^f_{\epsilon_t,c}(y)$ , then according to Holley and Stroock ([Reference Holley and Stroock19], Lemma 1.7), its $\ell^2$ norm is bounded by

\begin{align*}\left\lVert h_t\right\rVert_{\ell^2(\pi^f_{\epsilon_t,c})} \leqslant 1+K^{\frac{1}{2\overline{\epsilon}}},\end{align*}

for $t \geqslant e^{1/\overline{\epsilon}}-1$ . The desired result follows from exactly the same argument as in Löwe ([Reference Löwe12], Theorem 3.8).

Now we calculate $\pi^f_{\epsilon_t,c}(\mathcal{X} \backslash S_{\textrm{min}})$ . For $c \geqslant \underline{d}$ , we compute that

\begin{align*} \pi^f_{\epsilon_t,c}(\mathcal{X} \backslash S_{\textrm{min}}) &= \dfrac{\sum_{x \in \mathcal{X} \backslash S_{\textrm{min}}} e^{-\mathcal{H}_{\epsilon_t,c}^f(x) }\mu(x)}{\sum_{x \in \mathcal{X}} e^{-\mathcal{H}_{\epsilon_t,c}^f(x)}\mu(x)} \\[5pt] &\leqslant \dfrac{\sum_{x;\; \mathcal{H}(x) \geqslant \underline{d}} \mu(x)}{\sum_{x; \mathcal{H}(x) \leqslant \underline{d}} \exp\{\int_{\mathcal{H}(x)}^{\underline{d}} \frac{1}{\epsilon_t}\,du\}\mu(x)} \\[5pt] &\leqslant e^{-\frac{1}{\epsilon_t}(\underline{d} - \mathcal{H}_{\textrm{min}})} \dfrac{1}{\mu(S_{\textrm{min}})}.\end{align*}

On the other hand, for $\mathcal{H}_{\textrm{min}} \leqslant c < \underline{d}$ ,

\begin{align*} \pi^f_{\epsilon_t,c}(\mathcal{X} \backslash S_{\textrm{min}}) &= \dfrac{\sum_{x \in \mathcal{X} \backslash S_{\textrm{min}}} e^{-\mathcal{H}_{\epsilon_t,c}^f(x) }\mu(x)}{\sum_{x \in \mathcal{X}} e^{-\mathcal{H}_{\epsilon_t,c}^f(x)}\mu(x)} \\[5pt] &\leqslant \dfrac{1}{\mu(S_{\textrm{min}})}\exp\bigg\{-\int_{\mathcal{H}_{\textrm{min}}}^{\underline{d}} \dfrac{1}{f((u-c)_+) + \epsilon_t}du\bigg\} \\[5pt] &\leqslant \dfrac{1}{\mu(S_{\textrm{min}})}\exp\bigg\{-\int_{c}^{\underline{d}} \dfrac{1}{f(u-c) + \epsilon_t}du\bigg\}.\end{align*}

Acknowledgements

We thank Laurent Miclo for pointers to the work of Olivier Catoni, and we thank the two reviewers for careful reading and constructive feedback.

Funding information

The author acknowledges financial support from the startup grant of the National University of Singapore, Yale-NUS College, and a Singapore MoE Tier 1 grant entitled ‘MAPLE’.

Competing interests

There were no competing interests to declare which arose during the preparation or publication process of this article.

References

Catoni, O. (1996). Metropolis, simulated annealing, and iterated energy transformation algorithms: theory and experiments. J. Complexity 12, 595–623.CrossRef Google Scholar

Catoni, O. (1998). The energy transformation method for the Metropolis algorithm compared with simulated annealing. Prob. Theory Relat. Fields 110, 69–89.CrossRef Google Scholar

Wang, Y., Wu, S. and Zou, J. (2016). Quantum annealing with Markov chain Monte Carlo simulations and D-wave quantum computers. Statist. Sci. 31, 362–398.CrossRef Google Scholar

Choi, M. C. (2020). On the convergence of an improved and adaptive kinetic simulated annealing. Preprint. Available at https://arxiv.org/abs/2009.00195.Google Scholar

Fang, H., Qian, M. and Gong, G. (1997). An improved annealing method and its large-time behavior. Stoch. Process. Appl. 71, 55–74.CrossRef Google Scholar

Monmarché, P. (2018). Hypocoercivity in metastable settings and kinetic simulated annealing. Prob. Theory Relat. Fields 172, 1215–1248.CrossRef Google Scholar

Zhang, J. and Choi, M. C. (2021). Improved simulated annealing for sampling from multimodal distributions. Working paper.Google Scholar

Deuschel, J.-D. and Mazza, C. (1994).

$L^2$ convergence of time nonhomogeneous Markov processes: I. Spectral estimates. Ann. Appl. Prob. 4, 1012–1056.CrossRef Google Scholar

Nardi, F. R. and Zocca, A. (2019). Tunneling behavior of Ising and Potts models in the low-temperature regime. Stoch. Process. Appl. 129, 4556–4575.CrossRef Google Scholar

Zocca, A. (2018). Low-temperature behavior of the multicomponent Widom–Rowlison model on finite square lattices. J. Statist. Phys. 171, 1–37.CrossRef Google Scholar

Del Moral, P. and Miclo, L. (1999). On the convergence and applications of generalized simulated annealing. SIAM J. Control Optimization 37, 1222–1250.CrossRef Google Scholar

Löwe, M. (1996). Simulated annealing with time-dependent energy function via Sobolev inequalities. Stoch. Process. Appl. 63, 221–233.CrossRef Google Scholar

Frigerio, A. and Grillo, G. (1993). Simulated annealing with time-dependent energy function. Math. Z. 213, 97–116.CrossRef Google Scholar

Bovier, A. and den Hollander, F. (2015). Metastability: A Potential-Theoretic Approach. Springer, Cham.CrossRef Google Scholar

Aldous, D. and Fill, J. A. (2002). Reversible Markov Chains and Random Walks on Graphs. Unfinished monograph, recompiled 2014. Available at http://www.stat.berkeley.edu/aldous/RWG/book.html.Google Scholar

Mathieu, P. and Picco, P. (1998). Metastability and convergence to equilibrium for the random field Curie–Weiss model. J. Statist. Phys. 91, 679–732.CrossRef Google Scholar

Menz, G., Schlichting, A., Tang, W. and Wu, T. (2022). Ergodicity of the infinite swapping algorithm at low temperature. Stoch. Process. Appl. 151, 519–552.CrossRef Google Scholar

Croes, G. A. (1958). A method for solving traveling-salesman problems. Operat. Res. 6, 791–812.CrossRef Google Scholar

Holley, R. and Stroock, D. (1988). Simulated annealing via Sobolev inequalities. Commun. Math. Phys. 115, 553–569.CrossRef Google Scholar

Levin, D. A. and Peres, Y. (2017). Markov Chains and Mixing Times. American Mathematical Society, Providence, RI.CrossRef Google Scholar

Figure 1. Landscape of $\frac{1}{\epsilon} \mathcal{H}$ and $\mathcal{H}^f_{\epsilon,c} + \frac{1}{\epsilon} \mathcal{H}_{\textrm{min}}$, where $\mathcal{H}(x) =\cos\! (2 x)+\frac{1}{2} \sin\! (x)+\frac{1}{3} \sin\! (10 x)$, $\epsilon \in \{0.25,0.5,0.75,1\}$, $c = -1.5$, and $f(z) = z$.

Figure 2. Plots of the free energy $g_{\epsilon}$ and the modified free energy $g^{f}_{\epsilon,c} + E_{min}$ with $h = -0.05$ and $f(z) = z$ at temperature $\epsilon = 1/1.5$, where $E_{min} = \min_{m \in [-1,1]} E(m)$. We shift the modified free energy by $E_{min}$ so that it is on the same scale as the original free energy $g_{\epsilon}$.

Figure 3. Plots of the mean-field equation (3.1) and the modified mean-field equation (3.4) with $h = -0.05$ and $f(z) = z$ at temperature $\epsilon = 1/1.5$.

Figure 4. Comparing the critical heights $H^0, H^f_{1,c}, c^*$ on a one-dimensional landscape generated by $\mathcal{H}$ with a local minimum at m and a global minimum, where we take $\epsilon = 1$, $f(z) = z$, and the proposal chain is of nearest-neighbour type, i.e. going left or right with probability $1/2$.