
Bibliography

Published online by Cambridge University Press: 24 June 2019

Shinichi Nakajima, Technische Universität Berlin
Kazuho Watanabe, Toyohashi University of Technology
Masashi Sugiyama, University of Tokyo
Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2019


References

Akaho, S., and Kappen, H. J. 2000. Nonmonotonic Generalization Bias of Gaussian Mixture Models. Neural Computation, 12, 1411–1427.
Akaike, H. 1974. A New Look at the Statistical Model Identification. IEEE Transactions on Automatic Control, 19(6), 716–723.
Akaike, H. 1980. Likelihood and Bayes Procedure. Pages 143–166 of: Bernardo, J. M. (ed.), Bayesian Statistics. Valencia, Italy: University Press.
Alzer, H. 1997. On Some Inequalities for the Gamma and Psi Functions. Mathematics of Computation, 66(217), 373–389.
Amari, S., Park, H., and Ozeki, T. 2002. Geometrical Singularities in the Neuromanifold of Multilayer Perceptrons. Pages 343–350 of: Advances in NIPS, vol. 14. Cambridge, MA: MIT Press.
Aoyagi, M., and Nagata, K. 2012. Learning Coefficient of Generalization Error in Bayesian Estimation and Vandermonde Matrix-Type Singularity. Neural Computation, 24(6), 1569–1610.
Aoyagi, M., and Watanabe, S. 2005. Stochastic Complexities of Reduced Rank Regression in Bayesian Estimation. Neural Networks, 18(7), 924–933.
Asuncion, A., and Newman, D. J. 2007. UCI Machine Learning Repository. www.ics.uci.edu/∼mlearn/MLRepository.html
Asuncion, A., Welling, M., Smyth, P., and Teh, Y. W. 2009. On Smoothing and Inference for Topic Models. Pages 27–34 of: Proceedings of UAI. Stockholm, Sweden: Morgan Kaufmann Publishers Inc.
Attias, H. 1999. Inferring Parameters and Structure of Latent Variable Models by Variational Bayes. Pages 21–30 of: Proceedings of UAI. Stockholm, Sweden: Morgan Kaufmann Publishers Inc.
Babacan, S. D., Nakajima, S., and Do, M. N. 2012a. Probabilistic Low-Rank Subspace Clustering. Pages 2753–2761 of: Advances in Neural Information Processing Systems 25. Lake Tahoe, NV: NIPS Foundation.
Babacan, S. D., Luessi, M., Molina, R., and Katsaggelos, A. K. 2012b. Sparse Bayesian Methods for Low-Rank Matrix Estimation. IEEE Transactions on Signal Processing, 60(8), 3964–3977.
Baik, J., and Silverstein, J. W. 2006. Eigenvalues of Large Sample Covariance Matrices of Spiked Population Models. Journal of Multivariate Analysis, 97(6), 1382–1408.
Baldi, P. F., and Hornik, K. 1995. Learning in Linear Neural Networks: A Survey. IEEE Transactions on Neural Networks, 6(4), 837–858.
Banerjee, A., Merugu, S., Dhillon, I. S., and Ghosh, J. 2005. Clustering with Bregman Divergences. Journal of Machine Learning Research, 6, 1705–1749.
Beal, M. J. 2003. Variational Algorithms for Approximate Bayesian Inference. PhD thesis, University College London.
Bicego, M., Lovato, P., Ferrarini, A., and Delledonne, M. 2010. Biclustering of Expression Microarray Data with Topic Models. Pages 2728–2731 of: Proceedings of ICPR. Istanbul, Turkey: ICPR.
Bickel, P., and Chernoff, H. 1993. Asymptotic Distribution of the Likelihood Ratio Statistic in a Prototypical Non Regular Problem. New Delhi, India: Wiley Eastern Limited.
Bishop, C. M. 1999a. Bayesian Principal Components. Pages 382–388 of: Advances in NIPS, vol. 11. Denver, CO: NIPS Foundation.
Bishop, C. M. 1999b. Variational Principal Components. Pages 509–514 of: Proceedings of International Conference on Artificial Neural Networks, vol. 1. Edinburgh, UK: Computing and Control Engineering Journal.
Bishop, C. M. 2006. Pattern Recognition and Machine Learning. New York: Springer.
Bishop, C. M., and Tipping, M. E. 2000. Variational Relevance Vector Machines. Pages 46–53 of: Proceedings of the Sixteenth Annual Conference on Uncertainty in Artificial Intelligence. Stanford, CA: Morgan Kaufmann Publishers Inc.
Blei, D. M., and Jordan, M. I. 2005. Variational Inference for Dirichlet Process Mixtures. Bayesian Analysis, 1, 121–144.
Blei, D. M., Ng, A. Y., and Jordan, M. I. 2003. Latent Dirichlet Allocation. Journal of Machine Learning Research, 3, 993–1022.
Bouchaud, J. P., and Potters, M. 2003. Theory of Financial Risk and Derivative Pricing—From Statistical Physics to Risk Management, 2nd edn. Cambridge, UK: University Press.
Brown, L. D. 1986. Fundamentals of Statistical Exponential Families. IMS Lecture Notes–Monograph Series 9. Beachwood, OH: Institute of Mathematical Statistics.
Candès, E. J., Li, X., Ma, Y., and Wright, J. 2011. Robust Principal Component Analysis? Journal of the ACM, 58(3), 1–37.
Carroll, J. D., and Chang, J. J. 1970. Analysis of Individual Differences in Multidimensional Scaling via an N-way Generalization of “Eckart–Young” Decomposition. Psychometrika, 35, 283–319.
Chen, X., Hu, X., Shen, X., and Rosen, G. 2010. Probabilistic Topic Modeling for Genomic Data Interpretation. Pages 149–152 of: 2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM).
Chib, S. 1995. Marginal Likelihood from the Gibbs Output. Journal of the American Statistical Association, 90(432), 1313–1321.
Chu, W., and Ghahramani, Z. 2009. Probabilistic Models for Incomplete Multidimensional Arrays. Pages 89–96 of: Proceedings of International Conference on Artificial Intelligence and Statistics. Clearwater Beach, FL: Proceedings of Machine Learning Research.
Courant, R., and Hilbert, D. 1953. Methods of Mathematical Physics, Volume 1. New York: Wiley.
Cramer, H. 1949. Mathematical Methods of Statistics. Princeton, NJ: University Press.
Dacunha-Castelle, D., and Gassiat, E. 1997. Testing in Locally Conic Models, and Application to Mixture Models. Probability and Statistics, 1, 285–317.
Dempster, A. P., Laird, N. M., and Rubin, D. B. 1977. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, 39-B, 1–38.
Dharmadhikari, S., and Joag-Dev, K. 1988. Unimodality, Convexity, and Applications. Cambridge, MA: Academic Press.
Ding, X., He, L., and Carin, L. 2011. Bayesian Robust Principal Component Analysis. IEEE Transactions on Image Processing, 20(12), 3419–3430.
Drexler, F. J. 1978. A Homotopy Method for the Calculation of All Zeros of Zero-Dimensional Polynomial Ideals. Pages 69–93 of: Wacker, H. J. (ed.), Continuation Methods. New York: Academic Press.
D’Souza, A., Vijayakumar, S., and Schaal, S. 2004. The Bayesian Backfitting Relevance Vector Machine. In: Proceedings of the 21st International Conference on Machine Learning. Banff, AB: Association for Computing Machinery.
Durbin, R., Eddy, S., Krogh, A., and Mitchison, G. 1998. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge: Cambridge University Press.
Efron, B., and Morris, C. 1973. Stein’s Estimation Rule and its Competitors—An Empirical Bayes Approach. Journal of the American Statistical Association, 68, 117–130.
Elhamifar, E., and Vidal, R. 2013. Sparse Subspace Clustering: Algorithm, Theory, and Applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(11), 2765–2781.
Felzenszwalb, P. F., and Huttenlocher, D. P. 2004. Efficient Graph-Based Image Segmentation. International Journal of Computer Vision, 59(2), 167–181.
Fukumizu, K. 1999. Generalization Error of Linear Neural Networks in Unidentifiable Cases. Pages 51–62 of: Proceedings of International Conference on Algorithmic Learning Theory. Tokyo, Japan: Springer.
Fukumizu, K. 2003. Likelihood Ratio of Unidentifiable Models and Multilayer Neural Networks. Annals of Statistics, 31(3), 833–851.
Garcia, C. B., and Zangwill, W. I. 1979. Determining All Solutions to Certain Systems of Nonlinear Equations. Mathematics of Operations Research, 4, 1–14.
Gershman, S. J., and Blei, D. M. 2012. A Tutorial on Bayesian Nonparametric Models. Journal of Mathematical Psychology, 56(1), 1–12.
Ghahramani, Z., and Beal, M. J. 2001. Graphical Models and Variational Methods. Pages 161–177 of: Advanced Mean Field Methods. Cambridge, MA: MIT Press.
Girolami, M. 2001. A Variational Method for Learning Sparse and Overcomplete Representations. Neural Computation, 13(11), 2517–2532.
Girolami, M., and Kaban, A. 2003. On an Equivalence between PLSI and LDA. Pages 433–434 of: Proceedings of SIGIR. New York and Toronto, ON: Association for Computing Machinery.
Gopalan, P., Hofman, J. M., and Blei, D. M. 2013. Scalable Recommendation with Poisson Factorization. arXiv:1311.1704 [cs.IR].
Griffiths, T. L., and Steyvers, M. 2004. Finding Scientific Topics. PNAS, 101, 5228–5235.
Gunji, T., Kim, S., Kojima, M., Takeda, A., Fujisawa, K., and Mizutani, T. 2004. PHoM—A Polyhedral Homotopy Continuation Method. Computing, 73, 57–77.
Gupta, A. K., and Nagar, D. K. 1999. Matrix Variate Distributions. London, UK: Chapman and Hall/CRC.
Hagiwara, K. 2002. On the Problem in Model Selection of Neural Network Regression in Overrealizable Scenario. Neural Computation, 14, 1979–2002.
Hagiwara, K., and Fukumizu, K. 2008. Relation between Weight Size and Degree of Over-Fitting in Neural Network Regression. Neural Networks, 21(1), 48–58.
Han, T. S., and Kobayashi, K. 2007. Mathematics of Information and Coding. Providence, RI: American Mathematical Society.
Harshman, R. A. 1970. Foundations of the PARAFAC Procedure: Models and Conditions for an “Explanatory” Multimodal Factor Analysis. UCLA Working Papers in Phonetics, 16, 1–84.
Hartigan, J. A. 1985. A Failure of Likelihood Ratio Asymptotics for Normal Mixtures. Pages 807–810 of: Proceedings of the Berkeley Conference in Honor of J. Neyman and J. Kiefer. Berkeley, CA: Springer.
Hastie, T., and Tibshirani, R. 1986. Generalized Additive Models. Statistical Science, 1(3), 297–318.
Hinton, G. E., and van Camp, D. 1993. Keeping Neural Networks Simple by Minimizing the Description Length of the Weights. Pages 5–13 of: Proceedings of COLT. Santa Cruz, CA.
Hoffman, M. D., Blei, D. M., Wang, C., and Paisley, J. 2013. Stochastic Variational Inference. Journal of Machine Learning Research, 14, 1303–1347.
Hofmann, T. 2001. Unsupervised Learning by Probabilistic Latent Semantic Analysis. Machine Learning, 42, 177–196.
Hosino, T., Watanabe, K., and Watanabe, S. 2005. Stochastic Complexity of Variational Bayesian Hidden Markov Models. In: Proceedings of IJCNN. Montreal, QC.
Hosino, T., Watanabe, K., and Watanabe, S. 2006a. Free Energy of Stochastic Context Free Grammar on Variational Bayes. Pages 407–416 of: Proceedings of ICONIP. Hong Kong, China: Springer.
Hosino, T., Watanabe, K., and Watanabe, S. 2006b. Stochastic Complexity of Hidden Markov Models on the Variational Bayesian Learning (in Japanese). IEICE Transactions on Information and Systems, J89-D(6), 1279–1287.
Hotelling, H. 1933. Analysis of a Complex of Statistical Variables into Principal Components. Journal of Educational Psychology, 24, 417–441.
Hoyle, D. C. 2008. Automatic PCA Dimension Selection for High Dimensional Data and Small Sample Sizes. Journal of Machine Learning Research, 9, 2733–2759.
Hoyle, D. C., and Rattray, M. 2004. Principal-Component-Analysis Eigenvalue Spectra from Data with Symmetry-Breaking Structure. Physical Review E, 69(026124).
Huynh, T., Mario, F., and Schiele, B. 2008. Discovery of Activity Patterns Using Topic Models. Pages 9–10 of: International Conference on Ubiquitous Computing (UbiComp). New York and Seoul, South Korea: Association for Computing Machinery.
Hyvärinen, A., Karhunen, J., and Oja, E. 2001. Independent Component Analysis. New York: Wiley.
Ibragimov, I. A. 1956. On the Composition of Unimodal Distributions. Theory of Probability and Its Applications, 1(2), 255–260.
Ilin, A., and Raiko, T. 2010. Practical Approaches to Principal Component Analysis in the Presence of Missing Values. Journal of Machine Learning Research, 11, 1957–2000.
Ito, H., Amari, S., and Kobayashi, K. 1992. Identifiability of Hidden Markov Information Sources and Their Minimum Degrees of Freedom. IEEE Transactions on Information Theory, 38(2), 324–333.
Jaakkola, T. S., and Jordan, M. I. 2000. Bayesian Parameter Estimation via Variational Methods. Statistics and Computing, 10, 25–37.
James, W., and Stein, C. 1961. Estimation with Quadratic Loss. Pages 361–379 of: Proceedings of the 4th Berkeley Symposium on Mathematical Statistics and Probability, vol. 1. Berkeley: University of California Press.
Jeffreys, H. 1946. An Invariant Form for the Prior Probability in Estimation Problems. Pages 453–461 of: Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences, vol. 186. London, UK: Royal Society.
Jensen, F. V. 2001. Bayesian Networks and Decision Graphs. Springer.
Johnstone, I. M. 2001. On the Distribution of the Largest Eigenvalue in Principal Components Analysis. Annals of Statistics, 29, 295–327.
Jordan, M. I., Ghahramani, Z., Jaakkola, T. S., and Saul, L. 1999. Introduction to Variational Methods for Graphical Models. Machine Learning, 37, 183–233.
Kaji, D., Watanabe, K., and Watanabe, S. 2010. Phase Transition of Variational Bayes Learning in Bernoulli Mixture. Australian Journal of Intelligent Information Processing Systems, 11(4), 35–40.
Khan, M. E., Babanezhad, R., Lin, W., Schmidt, M., and Sugiyama, M. 2016. Faster Stochastic Variational Inference Using Proximal-Gradient Methods with General Divergence Functions. Pages 309–318 of: Proceedings of UAI. New York: AUAI Press.
Kim, Y. D., and Choi, S. 2014. Scalable Variational Bayesian Matrix Factorization with Side Information. Pages 493–502 of: Proceedings of AISTATS. Reykjavik, Iceland: Proceedings of Machine Learning Research.
Kingma, D. P., and Welling, M. 2014. Auto-Encoding Variational Bayes. In: International Conference on Learning Representations (ICLR). arXiv:1312.6114
Kolda, T. G., and Bader, B. W. 2009. Tensor Decompositions and Applications. SIAM Review, 51(3), 455–500.
Krestel, R., Fankhauser, P., and Nejdl, W. 2009. Latent Dirichlet Allocation for Tag Recommendation. Pages 61–68 of: Proceedings of the Third ACM Conference on Recommender Systems. New York: Association for Computing Machinery.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. 2012. ImageNet Classification with Deep Convolutional Neural Networks. Pages 1097–1105 of: Advances in NIPS. Lake Tahoe, NV: NIPS Foundation.
Kurihara, K., and Sato, T. 2004. An Application of the Variational Bayesian Approach to Probabilistic Context-Free Grammars. In: Proceedings of IJCNLP. Banff, AB.
Kurihara, K., Welling, M., and Teh, Y. W. 2007. Collapsed Variational Dirichlet Process Mixture Models. In: Proceedings of IJCAI. Hyderabad, India.
Kuriki, S., and Takemura, A. 2001. Tail Probabilities of the Maxima of Multilinear Forms and Their Applications. Annals of Statistics, 29(2), 328–371.
Lee, T. L., Li, T. Y., and Tsai, C. H. 2008. HOM4PS-2.0: A Software Package for Solving Polynomial Systems by the Polyhedral Homotopy Continuation Method. Computing, 83, 109–133.
Levin, E., Tishby, N., and Solla, S. A. 1990. A Statistical Approach to Learning and Generalization in Layered Neural Networks. Pages 1568–1574 of: Proceedings of the IEEE, vol. 78.
Li, F.-F., and Perona, P. 2005. A Bayesian Hierarchical Model for Learning Natural Scene Categories. Pages 524–531 of: Proceedings of CVPR. San Diego, CA.
Lim, Y. J., and Teh, Y. W. 2007. Variational Bayesian Approach to Movie Rating Prediction. In: Proceedings of KDD Cup and Workshop. New York and San Jose, CA: Association for Computing Machinery.
Lin, Z., Chen, M., and Ma, Y. 2009. The Augmented Lagrange Multiplier Method for Exact Recovery of Corrupted Low-Rank Matrices. UIUC Technical Report UILU-ENG-09-2215.
Liu, G., and Yan, S. 2011. Latent Low-Rank Representation for Subspace Segmentation and Feature Extraction. In: Proceedings of ICCV. Barcelona, Spain.
Liu, G., Lin, Z., and Yu, Y. 2010. Robust Subspace Segmentation by Low-Rank Representation. Pages 663–670 of: Proceedings of ICML. Haifa, Israel: Omnipress.
Liu, G., Xu, H., and Yan, S. 2012. Exact Subspace Segmentation and Outlier Detection by Low-Rank Representation. In: Proceedings of AISTATS. La Palma, Canary Islands: Proceedings of Machine Learning Research.
Liu, X., Pasarica, C., and Shao, Y. 2003. Testing Homogeneity in Gamma Mixture Models. Scandinavian Journal of Statistics, 30, 227–239.
Lloyd, S. P. 1982. Least Squares Quantization in PCM. IEEE Transactions on Information Theory, 28(2), 129–137.
MacKay, D. J. C. 1992. Bayesian Interpolation. Neural Computation, 4(2), 415–447.
MacKay, D. J. C. 1995. Developments in Probabilistic Modeling with Neural Networks—Ensemble Learning. Pages 191–198 of: Proceedings of the 3rd Annual Symposium on Neural Networks.
MacKay, D. J. C. 2001. Local Minima, Symmetry-Breaking, and Model Pruning in Variational Free Energy Minimization. Available from www.inference.phy.cam.ac.uk/mackay/minima.pdf.
MacKay, D. J. C. 2003. Information Theory, Inference, and Learning Algorithms. Cambridge: Cambridge University Press. Available from www.inference.phy.cam.ac.uk/mackay/itila/.
MacQueen, J. B. 1967. Some Methods for Classification and Analysis of Multivariate Observations. Pages 281–297 of: Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability, vol. 1. Berkeley: University of California Press.
Marčenko, V. A., and Pastur, L. A. 1967. Distribution of Eigenvalues for Some Sets of Random Matrices. Mathematics of the USSR-Sbornik, 1(4), 457–483.
Marshall, A. W., Olkin, I., and Arnold, B. C. 2009. Inequalities: Theory of Majorization and Its Applications, 2nd edn. Springer.
Minka, T. P. 2001a. Automatic Choice of Dimensionality for PCA. Pages 598–604 of: Advances in NIPS, vol. 13. Cambridge, MA: MIT Press.
Minka, T. P. 2001b. Expectation Propagation for Approximate Bayesian Inference. Pages 362–369 of: Proceedings of UAI. Seattle, WA: Morgan Kaufmann Publishers Inc.
Mørup, M., and Hansen, L. R. 2009. Automatic Relevance Determination for Multi-Way Models. Journal of Chemometrics, 23, 352–363.
Nakajima, S., and Sugiyama, M. 2011. Theoretical Analysis of Bayesian Matrix Factorization. Journal of Machine Learning Research, 12, 2579–2644.
Nakajima, S., and Sugiyama, M. 2014. Analysis of Empirical MAP and Empirical Partially Bayes: Can They Be Alternatives to Variational Bayes? Pages 20–28 of: Proceedings of International Conference on Artificial Intelligence and Statistics, vol. 33. Reykjavik, Iceland: Proceedings of Machine Learning Research.
Nakajima, S., and Watanabe, S. 2007. Variational Bayes Solution of Linear Neural Networks and Its Generalization Performance. Neural Computation, 19(4), 1112–1153.
Nakajima, S., Sugiyama, M., and Babacan, S. D. 2011 (June 28–July 2). On Bayesian PCA: Automatic Dimensionality Selection and Analytic Solution. Pages 497–504 of: Proceedings of 28th International Conference on Machine Learning (ICML2011). Bellevue, WA: Omnipress.
Nakajima, S., Sugiyama, M., Babacan, S. D., and Tomioka, R. 2013a. Global Analytic Solution of Fully-Observed Variational Bayesian Matrix Factorization. Journal of Machine Learning Research, 14, 1–37.
Nakajima, S., Sugiyama, M., and Babacan, S. D. 2013b. Variational Bayesian Sparse Additive Matrix Factorization. Machine Learning, 92, 319–347.
Nakajima, S., Takeda, A., Babacan, S. D., Sugiyama, M., and Takeuchi, I. 2013c. Global Solver and Its Efficient Approximation for Variational Bayesian Low-Rank Subspace Clustering. In: Advances in Neural Information Processing Systems 26. Lake Tahoe, NV: NIPS Foundation.
Nakajima, S., Sato, I., Sugiyama, M., Watanabe, K., and Kobayashi, H. 2014. Analysis of Variational Bayesian Latent Dirichlet Allocation: Weaker Sparsity Than MAP. Pages 1224–1232 of: Advances in NIPS, vol. 27. Montreal, QC: NIPS Foundation.
Nakajima, S., Tomioka, R., Sugiyama, M., and Babacan, S. D. 2015. Condition for Perfect Dimensionality Recovery by Variational Bayesian PCA. Journal of Machine Learning Research, 16, 3757–3811.
Nakamura, F., and Watanabe, S. 2014. Asymptotic Behavior of Variational Free Energy for Normal Mixtures Using General Dirichlet Distribution (in Japanese). IEICE Transactions on Information and Systems, J97-D(5), 1001–1013.
Neal, R. M. 1996. Bayesian Learning for Neural Networks. New York: Springer.
Opper, M., and Winther, O. 1996. A Mean Field Algorithm for Bayes Learning in Large Feed-Forward Neural Networks. Pages 225–231 of: Advances in NIPS. Denver, CO: NIPS Foundation.
Pearson, K. 1914. Tables for Statisticians and Biometricians. Cambridge: Cambridge University Press.
Purushotham, S., Liu, Y., and Kuo, C. C. J. 2012. Collaborative Topic Regression with Social Matrix Factorization for Recommendation Systems. In: Proceedings of ICML. Edinburgh, UK: Omnipress.
Rabiner, L. R. 1989. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Pages 257–286 of: Proceedings of the IEEE. Piscataway, NJ: IEEE.
Ranganath, R., Gerrish, S., and Blei, D. M. 2013. Black Box Variational Inference. In: Proceedings of AISTATS. Scottsdale, AZ: Proceedings of Machine Learning Research.
Reinsel, G. R., and Velu, R. P. 1998. Multivariate Reduced-Rank Regression: Theory and Applications. New York: Springer.
Rissanen, J. 1986. Stochastic Complexity and Modeling. Annals of Statistics, 14(3), 1080–1100.
Robbins, H., and Monro, S. 1951. A Stochastic Approximation Method. Annals of Mathematical Statistics, 22(3), 400–407.
Ruhe, A. 1970. Perturbation Bounds for Means of Eigenvalues and Invariant Subspaces. BIT Numerical Mathematics, 10, 343–354.
Rusakov, D., and Geiger, D. 2005. Asymptotic Model Selection for Naive Bayesian Networks. Journal of Machine Learning Research, 6, 1–35.
Sakamoto, T., Ishiguro, M., and Kitagawa, G. 1986. Akaike Information Criterion Statistics. Dordrecht: D. Reidel Publishing Company.
Salakhutdinov, R., and Mnih, A. 2008. Probabilistic Matrix Factorization. Pages 1257–1264 of: Platt, J. C., Koller, D., Singer, Y., and Roweis, S. (eds), Advances in Neural Information Processing Systems 20. Cambridge, MA: MIT Press.
Sato, I., Kurihara, K., and Nakagawa, H. 2012. Practical Collapsed Variational Bayes Inference for Hierarchical Dirichlet Process. Pages 105–113 of: Proceedings of KDD. New York and Beijing, China: Association for Computing Machinery.
Sato, M., Yoshioka, T., Kajihara, S., et al. 2004. Hierarchical Bayesian Estimation for MEG Inverse Problem. NeuroImage, 23, 806–826.
Schwarz, G. 1978. Estimating the Dimension of a Model. Annals of Statistics, 6(2), 461–464.
Seeger, M. 2008. Bayesian Inference and Optimal Design for the Sparse Linear Model. Journal of Machine Learning Research, 9, 759–813.
Seeger, M. 2009. Sparse Linear Models: Variational Approximate Inference and Bayesian Experimental Design. In: Journal of Physics: Conference Series, vol. 197. Bristol, UK: IOP Publishing.
Seeger, M., and Bouchard, G. 2012. Fast Variational Bayesian Inference for Non-Conjugate Matrix Factorization Models. Pages 1012–1018 of: Proceedings of International Conference on Artificial Intelligence and Statistics. La Palma, Canary Islands: Proceedings of Machine Learning Research.
Shi, J., and Malik, J. 2000. Normalized Cuts and Image Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888–905.
Soltanolkotabi, M., and Candès, E. J. 2011. A Geometric Analysis of Subspace Clustering with Outliers. CoRR. arXiv:1112.4258 [cs.IT].
Spall, J. 2003. Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control. New York: John Wiley and Sons.
Srebro, N., and Jaakkola, T. 2003. Weighted Low Rank Approximation. In: Fawcett, T., and Mishra, N. (eds), Proceedings of the Twentieth International Conference on Machine Learning. Washington, DC: AAAI Press.
Srebro, N., Rennie, J., and Jaakkola, T. 2005. Maximum Margin Matrix Factorization. In: Advances in Neural Information Processing Systems 17. Vancouver, BC: NIPS Foundation.
Stein, C. 1956. Inadmissibility of the Usual Estimator for the Mean of a Multivariate Normal Distribution. Pages 197–206 of: Proceedings of the 3rd Berkeley Symposium on Mathematical Statistics and Probability. Berkeley: University of California Press.
Takemura, A., and Kuriki, S. 1997. Weights of Chi-Squared Distribution for Smooth or Piecewise Smooth Cone Alternatives. Annals of Statistics, 25(6), 2368–2387.
Teh, Y. W., Newman, D., and Welling, M. 2007. A Collapsed Variational Bayesian Inference Algorithm for Latent Dirichlet Allocation. In: Advances in NIPS. Vancouver, BC: NIPS Foundation.
Tipping, M. E. 2001. Sparse Bayesian Learning and the Relevance Vector Machine. Journal of Machine Learning Research, 1, 211–244.
Tipping, M. E., and Bishop, C. M. 1999. Probabilistic Principal Component Analysis. Journal of the Royal Statistical Society, 61, 611–622.
Tomioka, R., Suzuki, T., Sugiyama, M., and Kashima, H. 2010. An Efficient and General Augmented Lagrangian Algorithm for Learning Low-Rank Matrices. In: Proceedings of International Conference on Machine Learning. Haifa, Israel: Omnipress.
Tron, R., and Vidal, R. 2007. A Benchmark for the Comparison of 3-D Motion Segmentation Algorithms. Pages 1–8 of: Proceedings of CVPR. Minneapolis, MN.
Tucker, L. R. 1966. Some Mathematical Notes on Three-Mode Factor Analysis. Psychometrika, 31, 279–311.
Ueda, N., Nakano, R., Ghahramani, Z., and Hinton, G. E. 2000. SMEM Algorithm for Mixture Models. Neural Computation, 12(9), 2109–2128.
van der Vaart, A. W. 1998. Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge and New York: Cambridge University Press.
Vidal, R., and Favaro, P. 2014. Low Rank Subspace Clustering. Pattern Recognition Letters, 43(1), 47–61.
Wachter, K. W. 1978. The Strong Limits of Random Matrix Spectra for Sample Matrices of Independent Elements. Annals of Probability, 6, 1–18.
Wainwright, M. J., and Jordan, M. I. 2008. Graphical Models, Exponential Families, and Variational Inference. Foundations and Trends in Machine Learning, 1, 1–305.
Watanabe, K. 2012. An Alternative View of Variational Bayes and Asymptotic Approximations of Free Energy. Machine Learning, 86(2), 273–293.
Watanabe, K., and Watanabe, S. 2004. Lower Bounds of Stochastic Complexities in Variational Bayes Learning of Gaussian Mixture Models. Pages 99–104 of: Proceedings of IEEE on CIS.
Watanabe, K., and Watanabe, S. 2005. Variational Bayesian Stochastic Complexity of Mixture Models. Pages 99–104 of: Advances in NIPS, vol. 18. Vancouver, BC: NIPS Foundation.
Watanabe, K., and Watanabe, S. 2006. Stochastic Complexities of Gaussian Mixtures in Variational Bayesian Approximation. Journal of Machine Learning Research, 7, 625–644.
Watanabe, K., and Watanabe, S. 2007. Stochastic Complexities of General Mixture Models in Variational Bayesian Learning. Neural Networks, 20(2), 210–219.
Watanabe, K., Shiga, M., and Watanabe, S. 2006. Upper Bounds for Variational Stochastic Complexities of Bayesian Networks. Pages 139–146 of: Proceedings of IDEAL. Burgos, Spain: Springer.
Watanabe, K., Shiga, M., and Watanabe, S. 2009. Upper Bound for Variational Free Energy of Bayesian Networks. Machine Learning, 75(2), 199–215.
Watanabe, K., Okada, M., and Ikeda, K. 2011. Divergence Measures and a General Framework for Local Variational Approximation. Neural Networks, 24(10), 1102–1109.
Watanabe, S. 2001a. Algebraic Analysis for Nonidentifiable Learning Machines. Neural Computation, 13(4), 899–933.
Watanabe, S. 2001b. Algebraic Information Geometry for Learning Machines with Singularities. Pages 329–336 of: Advances in NIPS, vol. 13. Vancouver, BC: NIPS Foundation.
Watanabe, S. 2009. Algebraic Geometry and Statistical Learning Theory. Cambridge: Cambridge University Press.
Watanabe, S. 2010. Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory. Journal of Machine Learning Research, 11, 3571–3594.
Watanabe, S. 2013. A Widely Applicable Bayesian Information Criterion. Journal of Machine Learning Research, 14, 867–897.
Watanabe, S., and Amari, S. 2003. Learning Coefficients of Layered Models When the True Distribution Mismatches the Singularities. Neural Computation, 15, 1013–1033.
Wei, X., and Croft, W. B. 2006. LDA-Based Document Models for Ad-Hoc Retrieval. Pages 178–185 of: Proceedings of SIGIR. New York and Seattle, WA: Association for Computing Machinery.
Wingate, D., and Weber, T. 2013. Automated Variational Inference in Probabilistic Programming. arXiv:1301.1299.
Yamazaki, K. 2016. Asymptotic Accuracy of Bayes Estimation for Latent Variables with Redundancy. Machine Learning, 102(1), 1–28.
Yamazaki, K., and Kaji, D. 2013. Comparing Two Bayes Methods Based on the Free Energy Functions in Bernoulli Mixtures. Neural Networks, 44, 36–43.
Yamazaki, K., and Watanabe, S. 2003a. Singularities in Mixture Models and Upper Bounds of Stochastic Complexity. Neural Networks, 16(7), 1029–1038.
Yamazaki, K., and Watanabe, S. 2003b. Stochastic Complexity of Bayesian Networks. Pages 592–599 of: Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence. Acapulco, Mexico: Morgan Kaufmann.
Yamazaki, K., and Watanabe, S. 2004. Newton Diagram and Stochastic Complexity in Mixture of Binomial Distributions. Pages 350–364 of: Proceedings of ALT. Padova, Italy: Springer.
Yamazaki, K., and Watanabe, S. 2005. Algebraic Geometry and Stochastic Complexity of Hidden Markov Models. Neurocomputing, 69, 62–84.
Yamazaki, K., Aoyagi, M., and Watanabe, S. 2010. Asymptotic Analysis of Bayesian Generalization Error with Newton Diagram. Neural Networks, 23(1), 35–43.
