
15 - Nonparametric hidden Markov models

from V - Nonparametric models

Published online by Cambridge University Press: 07 September 2011

Jurgen Van Gael, University of Cambridge
Zoubin Ghahramani, University of Cambridge
David Barber, University College London
A. Taylan Cemgil, Boğaziçi Üniversitesi, Istanbul
Silvia Chiappa, University of Cambridge

Summary

Introduction

Hidden Markov models (HMMs) are a rich family of probabilistic time series models with a long and successful history of applications in natural language processing, speech recognition, computer vision, bioinformatics, and many other areas of engineering, statistics and computer science. A defining property of HMMs is that the time series is modelled in terms of a number of discrete hidden states. Usually, the number of such states is specified in advance by the modeller, but this limits the flexibility of HMMs. Recently, attention has turned to Bayesian methods which can automatically infer the number of states in an HMM from data. A particularly elegant and flexible approach is to assume a countable but unbounded number of hidden states; this is the nonparametric Bayesian approach to hidden Markov models first introduced by Beal et al. [4] and called the infinite HMM (iHMM). In this chapter, we review the literature on Bayesian inference in HMMs, focusing on nonparametric Bayesian models. We show the equivalence between the Polya urn interpretation of the infinite HMM and the hierarchical Dirichlet process interpretation of the iHMM in Teh et al. [35]. We describe efficient inference algorithms, including the beam sampler which uses dynamic programming. Finally, we illustrate how to use the iHMM on a simple sequence labelling task and discuss several extensions.
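
To make the hierarchical Dirichlet process construction mentioned above concrete, the following is a minimal, self-contained sketch (not the chapter's own code) of generating data from a truncated HDP-HMM with Gaussian emissions: a top-level stick-breaking draw beta, as in Sethuraman's construction [30], gives shared weights over states, and each state's transition row is a Dirichlet draw centred on beta, tying the rows together as in Teh et al. [35]. The truncation level K, the concentration values gamma and alpha0, and the unit-variance Gaussian emission model are illustrative assumptions, not values from the chapter.

    import numpy as np

    rng = np.random.default_rng(0)

    def stick_breaking(alpha, K):
        # Truncated GEM(alpha) stick-breaking weights; the mass beyond the
        # truncation level is folded into the last atom so the weights sum to 1.
        b = rng.beta(1.0, alpha, size=K)
        w = b * np.concatenate(([1.0], np.cumprod(1.0 - b[:-1])))
        w[-1] += 1.0 - w.sum()
        return w

    K = 20                    # truncation level (illustrative; the iHMM is unbounded)
    gamma, alpha0 = 3.0, 5.0  # illustrative concentration parameters
    beta = stick_breaking(gamma, K)            # shared top-level weights over states
    pi = rng.dirichlet(alpha0 * beta, size=K)  # transition rows, all centred on beta
    mu = rng.normal(0.0, 5.0, size=K)          # one Gaussian emission mean per state

    # Generate a short sequence from the truncated model.
    T = 10
    states, obs = np.empty(T, dtype=int), np.empty(T)
    states[0] = rng.choice(K, p=beta)          # initial state from top-level weights
    obs[0] = rng.normal(mu[states[0]], 1.0)
    for t in range(1, T):
        states[t] = rng.choice(K, p=pi[states[t - 1]])
        obs[t] = rng.normal(mu[states[t]], 1.0)
    print(states, np.round(obs, 2))

The beam sampler of Van Gael et al. [36] avoids this kind of truncation altogether: auxiliary slice variables bound the set of states reachable at each time step, so standard forward-backward dynamic programming can be run over a finite set of states at every iteration.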

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2011


References

[1] H. Akaike. A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6):716–723, 1974.
[2] L. E. Baum, T. Petrie, G. Soules and N. Weiss. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Annals of Mathematical Statistics, 41(1):164–171, 1970.
[3] M. J. Beal. Variational algorithms for approximate Bayesian inference. PhD thesis, University of London, 2003.
[4] M. J. Beal, Z. Ghahramani and C. E. Rasmussen. The infinite hidden Markov model. In Advances in Neural Information Processing Systems, pages 577–584, 2002.
[5] Y. Bengio and P. Frasconi. An input output HMM architecture. In Advances in Neural Information Processing Systems, pages 427–434, 1995.
[6] S. Chiappa. A Bayesian approach to switching linear Gaussian state-space models for unsupervised time-series segmentation. In Proceedings of the International Conference on Machine Learning and Applications, pages 3–9, 2008.
[7] A. P. Dempster, N. M. Laird and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 39(1):1–38, 1977.
[8] L. E. Baum and T. Petrie. Statistical inference for probabilistic functions of finite state Markov chains. Annals of Mathematical Statistics, 37(6):1554–1563, 1966.
[9] E. B. Fox, E. B. Sudderth, M. I. Jordan and A. S. Willsky. Nonparametric Bayesian learning of switching linear dynamical systems. In Advances in Neural Information Processing Systems, pages 457–464, 2009.
[10] E. B. Fox, E. B. Sudderth, M. I. Jordan and A. S. Willsky. An HDP-HMM for systems with state persistence. In Proceedings of the International Conference on Machine Learning, volume 25, Helsinki, 2008.
[11] S. Frühwirth-Schnatter. Estimating marginal likelihoods for mixture and Markov switching models using bridge sampling techniques. Econometrics Journal, 7(1):143–167, 2004.
[12] J. Gao and M. Johnson. A comparison of Bayesian estimators for unsupervised hidden Markov model POS taggers. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 344–352, 2008.
[13] Z. Ghahramani and G. E. Hinton. Variational learning for switching state-space models. Neural Computation, 12(4):831–864, 2000.
[14] S. Goldwater and T. Griffiths. A fully Bayesian approach to unsupervised part-of-speech tagging. In Proceedings of the Association for Computational Linguistics, volume 45, page 744, 2007.
[15] S. Goldwater, T. Griffiths and M. Johnson. Interpolating between types and tokens by estimating power-law generators. In Advances in Neural Information Processing Systems, pages 459–466, 2006.
[16] P. J. Green. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82(4):711–732, 1995.
[17] N. Hjort, C. Holmes, P. Müller and S. Walker, editors. Bayesian Nonparametrics. Cambridge University Press, 2010.
[18] M. Johnson. Why doesn't EM find good HMM POS-taggers? In Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 296–305, 2007.
[19] D. Jurafsky and J. H. Martin. Speech and Language Processing. Pearson Prentice Hall, 2008.
[20] D. J. C. MacKay. Ensemble learning for hidden Markov models. Technical report, Cavendish Laboratory, University of Cambridge, 1997.
[21] C. D. Manning and H. Schütze. Foundations of Statistical Natural Language Processing. MIT Press, 1999.
[22] M. P. Marcus, M. A. Marcinkiewicz and B. Santorini. Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, 19(2):313–330, June 1993.
[23] R. M. Neal. Annealed importance sampling. Statistics and Computing, 11:125–139, 2001.
[24] R. M. Neal. Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics, 9(2):249–265, 2000.
[25] J. Pitman. Combinatorial Stochastic Processes, volume 1875 of Lecture Notes in Mathematics. Springer-Verlag, 2006.
[26] L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286, 1989.
[27] C. P. Robert, T. Rydén and D. M. Titterington. Bayesian inference in hidden Markov models through the reversible jump Markov chain Monte Carlo method. Journal of the Royal Statistical Society, Series B (Statistical Methodology), pages 57–75, 2000.
[28] G. Schwarz. Estimating the dimension of a model. Annals of Statistics, 6(2):461–464, 1978.
[29] S. L. Scott. Bayesian methods for hidden Markov models: recursive computing in the 21st century. Journal of the American Statistical Association, 97(457):337–351, 2002.
[30] J. Sethuraman. A constructive definition of Dirichlet priors. Statistica Sinica, 4:639–650, 1994.
[31] T. Stepleton, Z. Ghahramani, G. Gordon and T.-S. Lee. The block diagonal infinite hidden Markov model. In Proceedings of the International Conference on Artificial Intelligence and Statistics, pages 552–559, 2009.
[32] A. Stolcke and S. Omohundro. Hidden Markov model induction by Bayesian model merging. In Advances in Neural Information Processing Systems, volume 5, pages 11–18, 1993.
[33] Y. W. Teh. A hierarchical Bayesian language model based on Pitman-Yor processes. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pages 985–992, 2006.
[34] Y. W. Teh. Dirichlet processes. Encyclopedia of Machine Learning, to appear.
[35] Y. W. Teh, M. I. Jordan, M. J. Beal and D. M. Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476):1566–1581, 2006.
[36] J. Van Gael, Y. Saatci, Y. W. Teh and Z. Ghahramani. Beam sampling for the infinite hidden Markov model. In Proceedings of the International Conference on Machine Learning, pages 1088–1095, 2008.
[37] J. Van Gael, A. Vlachos and Z. Ghahramani. The infinite HMM for unsupervised POS tagging. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 678–687, 2009.
[38] S. G. Walker. Sampling the Dirichlet mixture model with slices. Communications in Statistics - Simulation and Computation, 36(1):45, 2007.
