Parallel Graph-Based Semi-Supervised Learning

doi:10.1017/CBO9781139042918.016

Chapter 15, from Part Three: Alternative Learning Settings

Published online by Cambridge University Press: 05 February 2012

Jeff Bilmes and Amarnag Subramanya

Edited by Ron Bekkerman, Mikhail Bilenko, and John Langford

Affiliations:
Jeff Bilmes: University of Washington
Amarnag Subramanya: Google Research, Mountain View, CA, USA
Ron Bekkerman: LinkedIn Corporation, Mountain View, California
Mikhail Bilenko: Microsoft Research, Redmond, Washington
John Langford: Yahoo! Research, New York


Summary

Semi-supervised learning (SSL) is the process of training decision functions using small amounts of labeled data and relatively large amounts of unlabeled data. In many applications, annotating training data is time-consuming and error-prone. Speech recognition is a typical example: producing an accurate system requires large amounts of meticulously annotated speech data (Evermann et al., 2005). In document classification for internet search, it is not even feasible to accurately annotate a relatively large number of web pages for every category of potential interest. SSL is therefore useful in many machine learning applications, because only a relatively small portion of the available data needs to be annotated.

SSL is related to the problem of transductive learning (Vapnik, 1998). In general, a learner is transductive if it is designed to make predictions only on a closed dataset, where the test set is revealed at training time. In practice, however, transductive learners can be modified to handle unseen data (Sindhwani, Niyogi, and Belkin, 2005; Zhu, 2005a). Chapter 25 of Chapelle, Schölkopf, and Zien (2007) discusses the relationship between SSL and transductive learning in full. In this chapter, SSL refers to the semi-supervised transductive classification problem.
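Graph-based SSL methods make the transductive setting concrete. A graph is built over the whole closed dataset, and label information from the few labeled nodes is spread to the unlabeled ones. The following is a minimal Python sketch of classic label propagation (Zhu and Ghahramani, 2002a). It is not the parallel algorithm developed in this chapter. The function names `label_propagation` and `predict`, the uniform initialization, and the fixed iteration count are illustrative assumptions.

```python
from typing import Dict, List, Sequence


def label_propagation(
    weights: Sequence[Sequence[float]],
    labels: Dict[int, int],
    num_classes: int,
    iterations: int = 200,
) -> List[List[float]]:
    """Hypothetical sketch of transductive label propagation.

    weights: symmetric, non-negative n x n affinity matrix over the whole
             closed dataset (labeled and unlabeled points alike).
    labels:  node index -> class index for the few labeled nodes.
    Returns one label distribution (num_classes floats) per node.
    """
    n = len(weights)
    # Labeled nodes start one-hot; unlabeled nodes start uniform.
    dist: List[List[float]] = []
    for i in range(n):
        if i in labels:
            dist.append([1.0 if c == labels[i] else 0.0 for c in range(num_classes)])
        else:
            dist.append([1.0 / num_classes] * num_classes)

    for _ in range(iterations):
        new_dist: List[List[float]] = []
        for i in range(n):
            total = sum(weights[i])
            if i in labels or total == 0:
                # Clamp labeled nodes; isolated nodes have nothing to absorb.
                new_dist.append(dist[i])
                continue
            # Each unlabeled node takes the weighted average of its
            # neighbors' distributions from the previous sweep.
            new_dist.append([
                sum(weights[i][j] * dist[j][c] for j in range(n)) / total
                for c in range(num_classes)
            ])
        dist = new_dist
    return dist


def predict(dist: List[List[float]]) -> List[int]:
    """Pick the highest-scoring class per node (ties go to the lower index)."""
    return [max(range(len(row)), key=row.__getitem__) for row in dist]


if __name__ == "__main__":
    # A 4-node chain 0-1-2-3: node 0 labeled class 0, node 3 labeled class 1.
    chain = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
    print(predict(label_propagation(chain, {0: 0, 3: 1}, num_classes=2)))  # [0, 0, 1, 1]
```

On the chain, the unlabeled nodes converge to the harmonic solution: node 1 gets class-0 probability 2/3, and node 2 gets class-1 probability 2/3. Predictions exist only for nodes in the graph, which is what makes the method transductive. Each node's update reads only its neighbors' values from the previous sweep, so all nodes within one sweep can be updated independently.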

Let $x \in \mathcal{X}$ denote the input to the decision function (classifier) $f$, and let $y \in \mathcal{Y}$ denote its output label, that is, $f : \mathcal{X} \to \mathcal{Y}$. In most cases, $f(x) = \operatorname{argmax}_{y \in \mathcal{Y}} p(y \mid x)$.
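The decision rule f(x) = argmax over y of p(y | x) returns the label with the largest posterior probability. This is a minimal Python sketch. It assumes the posterior is supplied as a function that returns a dictionary of label probabilities. `decide` and the toy logistic posterior are hypothetical and only for illustration.

```python
import math
from typing import Callable, Dict, Hashable, TypeVar

X = TypeVar("X")
Y = TypeVar("Y", bound=Hashable)


def decide(posterior: Callable[[X], Dict[Y, float]], x: X) -> Y:
    """Return f(x) = argmax over y of p(y | x) for the given posterior."""
    probs = posterior(x)
    # Rank the candidate labels by their posterior probabilities.
    return max(probs, key=lambda y: probs[y])


def toy_posterior(x: float) -> Dict[str, float]:
    """Hypothetical 1-D logistic model: p(pos | x) = 1 / (1 + exp(-x))."""
    p_pos = 1.0 / (1.0 + math.exp(-x))
    return {"pos": p_pos, "neg": 1.0 - p_pos}


if __name__ == "__main__":
    print(decide(toy_posterior, 2.0))   # pos
    print(decide(toy_posterior, -1.0))  # neg
```

In a transductive graph-based setting, the same rule would be applied to the label distribution inferred for each point in the closed dataset.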

Information

Type: Chapter
Book: Scaling up Machine Learning: Parallel and Distributed Approaches, pp. 307–330


Publisher: Cambridge University Press

Print publication year: 2011


References

Alexandrescu, A., and Kirchhoff, K. 2007. Graph-Based Learning for Statistical Machine Translation. In: Proceedings of the Human Language Technologies Conference (HLT-NAACL).

Arya, S., and Mount, D. M. 1993. Approximate Nearest Neighbor Queries in Fixed Dimensions. In: ACM-SIAM Symposium on Discrete Algorithms (SODA).

Arya, S., Mount, D. M., Netanyahu, N. S., Silverman, R., and Wu, A. 1998. An Optimal Algorithm for Approximate Nearest Neighbor Searching. Journal of the ACM.

Balcan, M.-F., and Blum, A. 2005. A PAC-Style Model for Learning from Labeled and Unlabeled Data. Pages 111–126 of: COLT.

Baluja, S., Seth, R., Sivakumar, D., Jing, Y., Yagnik, J., Kumar, S., Ravichandran, D., and Aly, M. 2008. Video Suggestion and Discovery for YouTube: Taking Random Walks through the View Graph. Pages 895–904 of: Proceedings of the 17th International Conference on World Wide Web. ACM.

Belkin, M., Niyogi, P., and Sindhwani, V. 2005. On Manifold Regularization. In: Proceedings of the Conference on Artificial Intelligence and Statistics (AISTATS).

Bengio, Y., Delalleau, O., and Le Roux, N. 2007. Label Propagation and Quadratic Criterion. In: Semi-Supervised Learning. Cambridge, MA: MIT Press.

Bertsekas, D. 1999. Nonlinear Programming. Athena Scientific.

De Bie, T., and Cristianini, N. 2003. Convex Methods for Transduction. Pages 73–80 of: Advances in Neural Information Processing Systems 16. Cambridge, MA: MIT Press.

Bilmes, J. A. 1998. A Gentle Tutorial on the EM Algorithm and Its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models. Technical Report ICSI-TR-97-021. International Computer Science Institute, Berkeley.

Bishop, C. 1995. Neural Networks for Pattern Recognition. New York: Oxford University Press.

Blitzer, J., and Zhu, J. 2008. ACL 2008 Tutorial on Semi-supervised Learning. http://ssl-acl08.wikidot.com/.

Blum, A., and Chawla, S. 2001. Learning from Labeled and Unlabeled Data Using Graph Mincuts. Pages 19–26 of: Proceedings of the 18th International Conference on Machine Learning. San Francisco, CA: Morgan Kaufmann.

Chapelle, O., Schölkopf, B., and Zien, A. 2007. Semi-Supervised Learning. Cambridge, MA: MIT Press.

Collobert, R., Sinz, F., Weston, J., Bottou, L., and Joachims, T. 2006. Large Scale Transductive SVMs. Journal of Machine Learning Research.

Corduneanu, A., and Jaakkola, T. 2003. On Information Regularization. In: Uncertainty in Artificial Intelligence.

Delalleau, O., Bengio, Y., and Le Roux, N. 2005. Efficient Non-parametric Function Induction in Semi-Supervised Learning. In: Proceedings of the Conference on Artificial Intelligence and Statistics (AISTATS).

Dempster, A. P., Laird, N. M., and Rubin, D. B. 1977. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 39(1), 1–38.

Deshmukh, N., Ganapathiraju, A., Gleeson, A., Hamaker, J., and Picone, J. 1998 (November). Resegmentation of Switchboard. Pages 1543–1546 of: Proceedings of the International Conference on Spoken Language Processing.

Evermann, G., Chan, H. Y., Gales, M. J. F., Jia, B., Mrva, D., Woodland, P. C., and Yu, K. 2005. Training LVCSR Systems on Thousands of Hours of Data. In: Proceedings of ICASSP.

Frey, B. J., and Dueck, D. 2007. Clustering by Passing Messages between Data Points. Science, 315(5814), 972–976.

Friedman, J. H., Bentley, J. L., and Finkel, R. A. 1977. An Algorithm for Finding Best Matches in Logarithmic Expected Time. ACM Transactions on Mathematical Software, 3.

Garcke, J., and Griebel, M. 2005. Semi-supervised Learning with Sparse Grids. In: Proceedings of the 22nd ICML Workshop on Learning with Partially Classified Training Data.

Godfrey, J., Holliman, E., and McDaniel, J. 1992 (March). SWITCHBOARD: Telephone Speech Corpus for Research and Development. Pages 517–520 of: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1.

Goldman, S., and Zhou, Y. 2000. Enhancing Supervised Learning with Unlabeled Data. Pages 327–334 of: Proceedings of the 17th International Conference on Machine Learning. San Francisco, CA: Morgan Kaufmann.

Greenberg, S. 1995. The Switchboard Transcription Project. Technical Report, The Johns Hopkins University (CLSP) Summer Research Workshop.

Greenberg, S., Hollenback, J., and Ellis, D. 1996. Insights into Spoken Language Gleaned from Phonetic Transcription of the Switchboard Corpus. Pages 24–27 of: ICSLP.

Haffari, G. R., and Sarkar, A. 2007. Analysis of Semi-supervised Learning with the Yarowsky Algorithm. In: UAI.

Hosmer, D. W. 1973. A Comparison of Iterative Maximum Likelihood Estimates of the Parameters of a Mixture of Two Normal Distributions under Three Different Types of Sample. Biometrics.

Huang, X., Acero, A., and Hon, H. 2001. Spoken Language Processing. Englewood Cliffs, NJ: Prentice-Hall.

Jebara, T., Wang, J., and Chang, S.-F. 2009. Graph Construction and b-Matching for Semi-supervised Learning. In: International Conference on Machine Learning.

Joachims, T. 2003. Transductive Learning via Spectral Graph Partitioning. In: Proceedings of the International Conference on Machine Learning (ICML).

Karlen, M., Weston, J., Erkan, A., and Collobert, R. 2008. Large Scale Manifold Transduction. In: International Conference on Machine Learning, ICML.

Lawrence, N. D., and Jordan, M. I. 2005. Semi-supervised Learning via Gaussian Processes. In: Neural Information Processing Systems.

Malkin, J., Subramanya, A., and Bilmes, J. A. 2009 (September). On the Semi-Supervised Learning of Multi-Layered Perceptrons. In: Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH).

McLachlan, G. J., and Ganesalingam, S. 1982. Updating a Discriminant Function on the Basis of Unclassified Data. Communications in Statistics: Simulation and Computation.

Nadler, B., Srebro, N., and Zhou, X. 2010. Statistical Analysis of Semi-supervised Learning: The Limit of Infinite Unlabelled Data. In: Advances in Neural Information Processing Systems (NIPS).

Ng, A., and Jordan, M. 2002. On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes. In: Advances in Neural Information Processing Systems (NIPS).

Nigam, K. 2001. Using Unlabeled Data to Improve Text Classification. Ph.D. thesis, CMU.

Pearl, J. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Francisco, CA: Morgan Kaufmann.

Scudder, H. J. 1965. Probability of Error of Some Adaptive Pattern-Recognition Machines. IEEE Transactions on Information Theory, 11.

Seeger, M. 2000. Learning with Labeled and Unlabeled Data. Technical Report, University of Edinburgh, UK.

Shi, J., and Malik, J. 2000. Normalized Cuts and Image Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence.

Sindhwani, V., and Selvaraj, S. K. 2006. Large Scale Semi-Supervised Linear SVMs. In: SIGIR '06: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.

Sindhwani, V., Niyogi, P., and Belkin, M. 2005. Beyond the Point Cloud: From Transductive to Semi-supervised Learning. In: Proceedings of the International Conference on Machine Learning (ICML).

Subramanya, A., and Bilmes, J. 2008. Soft-Supervised Text Classification. In: EMNLP.

Subramanya, A., and Bilmes, J. 2009a. Entropic Regularization in Non-parametric Graph-Based Learning. In: NIPS.

Subramanya, A., and Bilmes, J. 2009b. The Semi-supervised Switchboard Transcription Project. In: Interspeech.

Subramanya, A., and Bilmes, J. 2011. Semi-Supervised Learning with Measure Propagation. Journal of Machine Learning Research.

Subramanya, A., Bartels, C., Bilmes, J., and Nguyen, P. 2007. Uncertainty in Training Large Vocabulary Speech Recognizers. In: Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding.

Szummer, M., and Jaakkola, T. 2001. Partially Labeled Classification with Markov Random Walks. In: Advances in Neural Information Processing Systems, vol. 14.

Talukdar, P. P., and Crammer, K. 2009. New Regularized Algorithms for Transductive Learning. In: European Conference on Machine Learning (ECML-PKDD).

Tomkins, A. 2008. Keynote Speech. CIKM Workshop on Search and Social Media.

Tsang, I. W., and Kwok, J. T. 2006. Large-Scale Sparsified Manifold Regularization. In: Advances in Neural Information Processing Systems (NIPS) 19.

Tsuda, K. 2005. Propagating Distributions on a Hypergraph by Dual Information Regularization. In: Proceedings of the 22nd International Conference on Machine Learning.

Vapnik, V. 1998. Statistical Learning Theory. New York: Wiley.

Vazirani, V. V. 2001. Approximation Algorithms. New York: Springer.

Wang, F., and Zhang, C. 2006. Label Propagation through Linear Neighborhoods. Pages 985–992 of: Proceedings of the 23rd International Conference on Machine Learning. New York: ACM.

White, N. 1986. Theory of Matroids. Cambridge University Press.

Woess, W. 2000. Random Walks on Infinite Graphs and Groups. Cambridge Tracts in Mathematics 138. New York: Cambridge University Press.

Zhu, X. 2005a. Semi-Supervised Learning Literature Survey. Technical Report 1530. Computer Sciences, University of Wisconsin–Madison.

Zhu, X. 2005b. Semi-Supervised Learning with Graphs. Ph.D. thesis, Carnegie Mellon University.

Zhu, X., and Ghahramani, Z. 2002a. Learning from Labeled and Unlabeled Data with Label Propagation. Technical Report, Carnegie Mellon University.

Zhu, X., and Ghahramani, Z. 2002b. Towards Semi-supervised Classification with Markov Random Fields. Technical Report CMU-CALD-02-106. Carnegie Mellon University.

Zhu, X., and Goldberg, A. B. 2009. Introduction to Semi-supervised Learning. Morgan & Claypool.

Zhu, X., Ghahramani, Z., and Lafferty, J. 2003. Semi-supervised Learning Using Gaussian Fields and Harmonic Functions. In: Proceedings of the International Conference on Machine Learning (ICML).
