Distributed Transfer Learning via Cooperative Matrix Factorization

doi:10.1017/CBO9781139042918.017

16 - Distributed Transfer Learning via Cooperative Matrix Factorization

from Part Three - Alternative Learning Settings

Published online by Cambridge University Press: 05 February 2012

Evan Xiang, Nathan Liu and Qiang Yang

Edited by Ron Bekkerman, Mikhail Bilenko and John Langford

Evan Xiang: Affiliation:
Hong Kong University of Science and Technology
Nathan Liu: Affiliation:
Hong Kong University of Science and Technology
Qiang Yang: Affiliation:
Hong Kong University of Science and Technology
Ron Bekkerman: Affiliation:
LinkedIn Corporation, Mountain View, California
Mikhail Bilenko: Affiliation:
Microsoft Research, Redmond, Washington
John Langford: Affiliation:
Yahoo! Research, New York


Summary

Machine learning and data-mining technologies have already achieved significant success in many knowledge engineering areas, including web search, computational advertising, and recommender systems. A major challenge in machine learning is the data sparsity problem. For example, in the domain of online recommender systems, we attempt to recommend information items (e.g., movies, TV, books, news, images, and web pages) that are likely to be of interest to the user. However, the item space is usually very large and the number of user preference values is small. When the user data are too sparse, it is difficult to obtain a reliable and useful model for recommendation. Whereas large online sites like Amazon and Google can easily access huge volumes of user data, the enormous number of smaller online business sites, which collectively constitute the long tail of the web, are much more likely to have very sparse user data and have difficulty generating accurate recommendations. One potential solution to the data sparsity problem is to transfer knowledge from other information sources (e.g., Mehta and Hofmann, 2007; Li, Yang, and Xue, 2009). Such techniques for knowledge transfer are called transfer learning (see, e.g., Pan and Yang, 2010). An additional issue is that, in reality, many small websites often attract similar users and/or provide similar items, if not identical ones, which implies that data about such users/items could potentially be distributed across different systems. For example, Delicious and Digg are both popular online social bookmarking tools.

Information

Type: Chapter
Information: Scaling up Machine Learning: Parallel and Distributed Approaches, pp. 331–351

DOI: https://doi.org/10.1017/CBO9781139042918.017

Publisher: Cambridge University Press

Print publication year: 2011


References

Blei, D. M., Ng, A. Y., and Jordan, M. I. 2003. Latent Dirichlet Allocation. Journal of Machine Learning Research, 3, 993–1022.

Chen, W.-Y., Chu, J.-C., Luan, J., Bai, H., Wang, Y., and Chang, E. Y. 2009. Collaborative Filtering for Orkut Communities: Discovery of User Latent Behavior. Pages 681–690 of: WWW '09: Proceedings of the 18th International Conference on World Wide Web.

Dai, W., Xue, G.-R., Yang, Q., and Yu, Y. 2007. Co-clustering Based Classification for Out-of-Domain Documents. Pages 210–219 of: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2007, San Jose, California, USA.

Das, A. S., Datar, M., Garg, A., and Rajaram, S. 2007. Google News Personalization: Scalable Online Collaborative Filtering. Pages 271–280 of: WWW '07: Proceedings of the 16th International Conference on World Wide Web.

Dean, J., and Ghemawat, S. 2008. MapReduce: Simplified Data Processing on Large Clusters. Communications of the ACM, 51(1), 107–113.

Koren, Y., Bell, R., and Volinsky, C. 2009. Matrix Factorization Techniques for Recommender Systems. IEEE Computer, 42(8), 30–37.

Li, B., Yang, Q., and Xue, X. 2009. Can Movies and Books Collaborate? Cross-domain Collaborative Filtering for Sparsity Reduction. Pages 2052–2057 of: International Joint Conference on Artificial Intelligence (IJCAI).

Mehta, B., and Hofmann, T. 2007. Cross System Personalization and Collaborative Filtering by Learning Manifold Alignments. Pages 244–259 of: KI 2006: Advances in Artificial Intelligence.

Pan, S. J., and Yang, Q. 2010. A Survey on Transfer Learning. IEEE Transactions on Knowledge and Data Engineering, 22(10), 1345–1359.

Snir, M., Otto, S., Huss-Lederman, S., Walker, D., and Dongarra, J. 1998. MPI-The Complete Reference, Vol. 1: The MPI Core.

Xue, G.-R., Dai, W., Yang, Q., and Yu, Y. 2008. Topic-bridged PLSA for Cross-domain Text Classification. Pages 627–634 of: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2008, Singapore, July 20–24.

Zhou, Y., Wilkinson, D., Schreiber, R., and Pan, R. 2008. Large-scale Parallel Collaborative Filtering for the Netflix Prize. Pages 337–348 of: AAIM '08: Proceedings of the 4th International Conference on Algorithmic Aspects in Information and Management.

Zhu, S., Yu, K., Chi, Y., and Gong, Y. 2007. Combining Content and Link for Classification Using Matrix Factorization. Pages 487–494 of: SIGIR '07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
