
Exact convergence analysis for Metropolis–Hastings independence samplers in Wasserstein distances

Published online by Cambridge University Press:  05 June 2023

Austin Brown*
Affiliation:
University of Warwick
Galin L. Jones*
Affiliation:
University of Minnesota
*Postal address: Department of Statistics, University of Warwick, Coventry, UK. Email: austin.d.brown@warwick.ac.uk
**Postal address: School of Statistics, University of Minnesota, Minneapolis, MN, USA. Email: galin@umn.edu

Abstract

Under mild assumptions, we show that the exact convergence rate in total variation is also exact in weaker Wasserstein distances for the Metropolis–Hastings independence sampler. We develop new upper and lower bounds on the worst-case Wasserstein distance when the chain is initialized at a point. For an arbitrary point initialization, we show that the convergence rate is the same and matches the convergence rate in total variation. We derive exact convergence expressions for more general Wasserstein distances when the chain is initialized at a specific point. Using optimization, we construct a novel centered independent proposal to develop exact convergence rates in Bayesian quantile regression and many generalized linear model settings. We show that the exact convergence rate can be upper bounded in Bayesian binary response regression (e.g. logistic and probit) when the sample size and dimension grow together.
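The sampler studied in the abstract is the Metropolis–Hastings independence sampler, in which proposals are drawn independently of the current state and a proposed point y is accepted with probability min(1, w(y)/w(x)), where w = target/proposal is the importance weight. A minimal sketch in Python (the function names and the toy Gaussian target/proposal are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def independence_mh(log_target, log_proposal, sample_proposal, x0, n_iter, rng=None):
    """Metropolis-Hastings independence sampler (illustrative sketch).

    Proposals are drawn independently of the current state; a proposal y
    is accepted with probability min(1, w(y)/w(x)), where
    w(x) = target(x)/proposal(x). Normalizing constants cancel in the
    ratio, so unnormalized log densities suffice.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = x0
    log_w = log_target(x) - log_proposal(x)  # log importance weight at current state
    chain = np.empty(n_iter)
    for i in range(n_iter):
        y = sample_proposal(rng)
        log_w_y = log_target(y) - log_proposal(y)
        # Accept with probability min(1, w(y)/w(x)), computed on the log scale
        if np.log(rng.uniform()) < log_w_y - log_w:
            x, log_w = y, log_w_y
        chain[i] = x
    return chain

# Toy example: target N(0, 1), proposal N(0, 2^2) (heavier-tailed, so the
# weight ratio is bounded and the chain is uniformly ergodic).
chain = independence_mh(
    log_target=lambda x: -0.5 * x**2,
    log_proposal=lambda x: -0.5 * (x / 2.0) ** 2,
    sample_proposal=lambda rng: 2.0 * rng.standard_normal(),
    x0=0.0,
    n_iter=20000,
    rng=np.random.default_rng(0),
)
```

Choosing a proposal whose tails dominate the target keeps the importance weight bounded, which is exactly the regime in which the sampler converges geometrically; the paper's centered proposals are constructed with this in mind.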

Type
Original Article
Copyright
© The Author(s), 2023. Published by Cambridge University Press on behalf of Applied Probability Trust


References

Albert, J. H. and Chib, S. (1993). Bayesian analysis of binary and polychotomous response data. J. Amer. Statist. Assoc. 88, 669–679.
Belloni, A. and Chernozhukov, V. (2009). On the computational complexity of MCMC-based estimators in large samples. Ann. Statist. 37, 2011–2055.
Bogachev, V. I. (1998). Gaussian Measures. American Mathematical Society, Providence, RI.
Brooks, S., Gelman, A., Jones, G. L. and Meng, X.-L. (2011). Handbook of Markov Chain Monte Carlo. Chapman and Hall/CRC, New York.
Brown, A. and Jones, G. L. (2023). Lower bounds on the rate of convergence for accept–reject-based Markov chains. Preprint, arXiv:2212.05955.
Demidenko, E. (2001). Computational aspects of probit model. Math. Commun. 6, 233–247.
Durmus, A. and Moulines, É. (2015). Quantitative bounds of convergence for geometrically ergodic Markov chain in the Wasserstein distance with application to the Metropolis adjusted Langevin algorithm. Statist. Comput. 25, 5–19.
Durmus, A. and Moulines, É. (2019). High-dimensional Bayesian inference via the unadjusted Langevin algorithm. Bernoulli 25, 2854–2882.
Dwivedi, R., Chen, Y., Wainwright, M. J. and Yu, B. (2018). Log-concave sampling: Metropolis–Hastings algorithms are fast! Proc. Mach. Learn. Res. 75, 793–797.
Eberle, A. (2014). Error bounds for Metropolis–Hastings algorithms applied to perturbations of Gaussian measures in high dimensions. Ann. Appl. Prob. 24, 337–377.
Ekvall, K. O. and Jones, G. L. (2021). Convergence analysis of a collapsed Gibbs sampler for Bayesian vector autoregressions. Electron. J. Statist. 15, 691–721.
Geman, S. (1980). A limit theorem for the norm of random matrices. Ann. Prob. 8, 252–261.
Gibbs, A. L. (2004). Convergence in the Wasserstein metric for Markov chain Monte Carlo algorithms with applications to image restoration. Stoch. Models 20, 473–492.
Giraudo, D. (2014). Product measure with a Dirac delta marginal. Mathematics Stack Exchange. Available at https://math.stackexchange.com/questions/794299/product-measure-with-a-dirac-delta-marginal.
Hairer, M., Stuart, A. M. and Vollmer, S. J. (2014). Spectral gaps for a Metropolis–Hastings algorithm in infinite dimensions. Ann. Appl. Prob. 24, 2455–2490.
Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 97–109.
Hiriart-Urruty, J.-B. and Lemaréchal, C. (2001). Fundamentals of Convex Analysis. Springer, Berlin.
Jarner, S. F. and Hansen, E. (2000). Geometric ergodicity of Metropolis algorithms. Stoch. Process. Appl. 85, 341–361.
Jin, R. and Tan, A. (2020). Central limit theorems for Markov chains based on their convergence rates in Wasserstein distance. Preprint, arXiv:2002.09427.
Johndrow, J. E., Smith, A., Pillai, N. and Dunson, D. B. (2019). MCMC for imbalanced categorical data. J. Amer. Statist. Assoc. 114, 1394–1403.
Johnson, L. T. and Geyer, C. J. (2012). Variable transformation to obtain geometric ergodicity in the random-walk Metropolis algorithm. Ann. Statist. 40, 3050–3076.
Jones, G. L. (2004). On the Markov chain central limit theorem. Prob. Surv. 1, 299–320.
Joulin, A. and Ollivier, Y. (2010). Curvature, concentration and error estimates for Markov chain Monte Carlo. Ann. Prob. 38, 2418–2442.
Kantorovich, L. V. and Rubinstein, G. S. (1957). On a function space in certain extremal problems. Dokl. Akad. Nauk SSSR 115, 1058–1061.
Khare, K. and Hobert, J. P. (2012). Geometric ergodicity of the Gibbs sampler for Bayesian quantile regression. J. Multivar. Anal. 112, 108–116.
Komorowski, T. and Walczuk, A. (2011). Central limit theorem for Markov processes with spectral gap in the Wasserstein metric. Stoch. Process. Appl. 122, 2155–2184.
Liu, J. S. (1996). Metropolized independent sampling with comparisons to rejection sampling and importance sampling. Statist. Comput. 6, 113–119.
Madras, N. and Sezer, D. (2010). Quantitative bounds for Markov chain convergence: Wasserstein and total variation distances. Bernoulli 16, 882–908.
Mengersen, K. L. and Tweedie, R. L. (1996). Rates of convergence of the Hastings and Metropolis algorithms. Ann. Statist. 24, 101–121.
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. and Teller, E. (1953). Equation of state calculations by fast computing machines. J. Chem. Phys. 21, 1087–1092.
Meyn, S. P. and Tweedie, R. L. (2009). Markov Chains and Stochastic Stability, 2nd edn. Cambridge University Press.
Nesterov, Y. (2018). Lectures on Convex Optimization, 2nd edn. Springer, Cham.
Papaspiliopoulos, O., Roberts, G. O. and Zanella, G. (2019). Scalable inference for crossed random effects models. Biometrika 107, 25–40.
Papaspiliopoulos, O., Stumpf-Fétizon, T. and Zanella, G. (2021). Scalable computation for Bayesian hierarchical models. Preprint, arXiv:2103.10875.
Jacob, P., Robert, C. P. and Smith, M. H. (2011). Using parallel computation to improve independent Metropolis–Hastings based estimation. J. Comput. Graph. Statist. 20, 616–635.
Polson, N. G., Scott, J. G. and Windle, J. (2013). Bayesian inference for logistic models using Pólya-Gamma latent variables. J. Amer. Statist. Assoc. 108, 1339–1349.
Qin, Q. and Hobert, J. P. (2019). Convergence complexity analysis of Albert and Chib's algorithm for Bayesian probit regression. Ann. Statist. 47, 2320–2347.
Qin, Q. and Hobert, J. P. (2021). On the limitations of single-step drift and minorization in Markov chain convergence analysis. Ann. Appl. Prob. 31, 1633–1659.
Qin, Q. and Hobert, J. P. (2022). Geometric convergence bounds for Markov chains in Wasserstein distance based on generalized drift and contraction conditions. Ann. Inst. H. Poincaré 58, 872–889.
Qin, Q. and Hobert, J. P. (2022). Wasserstein-based methods for convergence complexity analysis of MCMC with applications. Ann. Appl. Prob. 32, 124–166.
Rajaratnam, B. and Sparks, D. (2015). MCMC-based inference in the era of big data: A fundamental analysis of the convergence complexity of high-dimensional chains. Preprint, arXiv:1508.00947.
Roberts, G. O. and Tweedie, R. L. (1996). Geometric convergence and central limit theorems for multidimensional Hastings and Metropolis algorithms. Biometrika 83, 95–110.
Robertson, N., Flegal, J. M., Vats, D. and Jones, G. L. (2021). Assessing and visualizing simultaneous simulation error. J. Comput. Graph. Statist. 30, 324–334.
Rosenthal, J. S. (1995). Minorization conditions and convergence rates for Markov chain Monte Carlo. J. Amer. Statist. Assoc. 90, 558–566.
Shephard, N. and Pitt, M. K. (1997). Likelihood analysis of non-Gaussian measurement time series. Biometrika 84, 653–667.
Smith, R. L. and Tierney, L. (1996). Exact transition probabilities for the independence Metropolis sampler. Technical report, Department of Statistics, University of Cambridge.
Sur, P. and Candès, E. J. (2019). A modern maximum-likelihood theory for high-dimensional logistic regression. Proc. Nat. Acad. Sci. 116, 14516–14525.
Tierney, L. (1994). Markov chains for exploring posterior distributions. Ann. Statist. 22, 1701–1728.
Vats, D., Flegal, J. M. and Jones, G. L. (2019). Multivariate output analysis for Markov chain Monte Carlo. Biometrika 106, 321–337.
Villani, C. (2003). Topics in Optimal Transportation. American Mathematical Society, Providence, RI.
Villani, C. (2009). Optimal Transport: Old and New. Springer, Berlin.
Wang, G. (2022). Exact convergence rate analysis of the independent Metropolis–Hastings algorithms. Bernoulli 28, 2012–2033.
Yang, Y., Wainwright, M. J. and Jordan, M. I. (2016). On the computational complexity of high-dimensional Bayesian variable selection. Ann. Statist. 44, 2497–2532.