
Anytime Monte Carlo

Published online by Cambridge University Press:  29 June 2021

Lawrence M. Murray
Affiliation:
Uber AI, San Francisco, CA, USA
Sumeetpal S. Singh*
Affiliation:
The Alan Turing Institute, University of Cambridge, Cambridge, United Kingdom
Anthony Lee
Affiliation:
The Alan Turing Institute, University of Bristol, Bristol, United Kingdom
*
*Corresponding author. E-mail: sss40@cam.ac.uk

Abstract

Monte Carlo algorithms simulate some prescribed number of samples, taking some random real time to complete the necessary computations. This work considers the converse: to impose a real-time budget on the computation, so that the number of samples simulated is random. To complicate matters, the real time taken for each simulation may depend on the sample produced, so that the samples themselves are not independent of their number, and a length bias with respect to compute time is apparent. This is especially problematic when a Markov chain Monte Carlo (MCMC) algorithm is used and the final state of the Markov chain, rather than an average over all states, is required, which is the case in parallel tempering implementations of MCMC. The length bias does not diminish with the compute budget in this case. It also occurs in sequential Monte Carlo (SMC) algorithms, which are the focus of this paper. We propose an anytime framework to address the concern, using a continuous-time Markov jump process to study the progress of the computation in real time. We first show that, for any MCMC algorithm, the length bias of the final state’s distribution due to the imposed real-time computing budget can be eliminated by using a multiple chain construction. The utility of this construction is then demonstrated on a large-scale SMC$ {}^2 $ implementation, using four billion particles distributed across a cluster of 128 graphics processing units on the Amazon EC2 service. The anytime framework imposes a real-time budget on the MCMC move steps within the SMC$ {}^2 $ algorithm, ensuring that all processors are simultaneously ready for the resampling step, demonstrably reducing idleness due to waiting times and providing substantial control over the total compute budget.
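To make the multiple-chain construction concrete, the following is a minimal sketch, not the authors' implementation: $K+1$ Metropolis–Hastings chains are simulated in round robin, each move consuming a state-dependent amount of (simulated) compute time; when the budget expires, the chain that was mid-simulation is discarded and the $K$ waiting states are reported. The target, proposal scale, and hold-time function here are hypothetical stand-ins for the paper's setting.

```python
import math
import random

def mh_step(x, log_target, scale=1.0):
    # One Metropolis-Hastings step with a Gaussian random-walk proposal.
    y = x + random.gauss(0.0, scale)
    if math.log(random.random()) < log_target(y) - log_target(x):
        return y
    return x

def anytime_sample(log_target, hold_time, budget, num_chains=2, x0=0.0):
    """Simulate num_chains (= K+1) interleaved chains until the real-time
    budget is exhausted, then report the states of the K waiting chains."""
    states = [x0] * num_chains
    clock, current = 0.0, 0
    while True:
        new_state = mh_step(states[current], log_target)
        cost = hold_time(new_state)  # compute time depends on the sample
        if clock + cost > budget:
            break  # budget expires while chain `current` is simulating
        clock += cost
        states[current] = new_state
        current = (current + 1) % num_chains
    # Discarding the chain caught mid-simulation removes the length bias;
    # the remaining K waiting states are reported.
    return [s for i, s in enumerate(states) if i != current]
```

With a single chain (K+1 = 1) nothing could be discarded, and the reported state would be length-biased toward samples with long hold times; the extra chain is what pays for unbiasedness.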

Information

Type
Research Article
Creative Commons
CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
© The Author(s), 2021. Published by Cambridge University Press

Figure 1. A realization of a Markov chain $ {\left({X}_n\right)}_{n=0}^{\infty } $ in real time, with hold times $ {\left({H}_n\right)}_{n=0}^{\infty } $ and arrival times $ {\left({A}_n\right)}_{n=0}^{\infty } $.


Figure 2. Illustration of the multiple chain concept with two Markov chains. At any time, one chain is being simulated (indicated with a dotted line) while one chain is waiting (indicated with a solid line). When querying the process at some time $ t $, it is the state of the waiting chain that is reported, so that the hold times of each chain are the compute times of the other chain. For $ K+1\ge 2 $ chains, there is always one chain simulating while $ K $ chains wait, and when querying the process at some time $ t $, the states of all $ K $ waiting chains are reported.


Figure 3. Convergence of Markov chains to the anytime distribution for the simulation study, with constant ($ p=0 $), linear ($ p=1 $), quadratic ($ p=2 $), and cubic ($ p=3 $) expected hold time. Each plot shows the evolution of the one-Wasserstein distance between the anytime distribution and the empirical distribution of $ {2}^{18} $ independent Markov chains initialized from the target distribution.
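In one dimension, the one-Wasserstein distance used in these convergence plots reduces to the average absolute difference between sorted samples, since the optimal coupling matches order statistics. A minimal sketch for two equal-size samples (not the paper's code):

```python
def wasserstein1(xs, ys):
    # 1D one-Wasserstein distance between two equal-size empirical samples:
    # sort both and average the absolute differences of matched order statistics.
    assert len(xs) == len(ys)
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)
```

For example, shifting a sample by a constant c gives a distance of exactly |c|.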


Figure 4. Correction of length bias for the simulation study, using $ K+1\in \left\{\mathrm{2,4,8,16,32}\right\} $ chains (light to dark), with constant ($ p=0 $), linear ($ p=1 $), quadratic ($ p=2 $), and cubic ($ p=3 $) expected hold time. Each plot shows the evolution of the one-Wasserstein distance between the empirical and target distributions. On the top row, the states of all chains contribute to the empirical distribution, which does not converge to the target. On the bottom row, the state of the extra chain is eliminated, so that only the remaining states contribute to the empirical distribution, which does converge to the target.


Figure 5. Elucidating the Lorenz ’96 model. The left column shows the range $ F\hskip.2em \in \hskip.2em \left[0,7\right] $ as in the uniform prior distribution, while the right column shows a narrower range of $ F $ in the vicinity of the posterior distribution. The solid vertical lines indicate the value $ F=4.8801 $, with which data are simulated. The first row is a bifurcation diagram depicting the stationary distribution of any element of $ \mathbf{X}(t) $ for various values of $ F $. Each column is a density plot for a particular value of $ F $; darker for higher density values, scaled so that the mode is black. Note the intricate behaviors of decay, periodicity, and chaos induced by $ F $. The second row depicts estimates of the marginal log-likelihood of the simulated dataset for the same values of $ F $, using SMC with $ {2}^{20} $ particles. Multiple modes and heteroskedasticity are apparent. The third row depicts the compute time taken to obtain these estimates, showing increasing compute time in $ F $ after an initial plateau.


Figure 6. Posterior distributions over $ F $ for the Lorenz ’96 case study. On the left, from conventional SMC$ {}^2 $; on the right, from SMC$ {}^2 $ with anytime moves; both running on the 128 GPU configuration.


Figure 7. Compute profiles for the Lorenz ’96 case study. On the left is a conventional distributed SMC$ {}^2 $ method with a fixed number of moves per particle after resampling. On the right is distributed SMC$ {}^2 $ with anytime move steps. Each row represents the activity of a single processor over time: light gray while active and dark gray while waiting. The top profiles are for an eight GPU shared system where contesting processes are expected. The conventional method on the left exhibits significant idle time on processors 2–8 due to a contesting job on Processor 1. The two bottom profiles are for the 128 GPU configuration with no contesting processes. Wait time in the conventional methods on the left is significantly reduced in the anytime methods on the right.

Supplementary material: File

Murray et al. supplementary material (76.4 KB)