Book contents
- Frontmatter
- Contents
- Contributors
- Preface
- 1 Scaling Up Machine Learning: Introduction
- Part One Frameworks for Scaling Up Machine Learning
- Part Two Supervised and Unsupervised Learning Algorithms
- Part Three Alternative Learning Settings
- 14 Parallel Online Learning
- 15 Parallel Graph-Based Semi-Supervised Learning
- 16 Distributed Transfer Learning via Cooperative Matrix Factorization
- 17 Parallel Large-Scale Feature Selection
- Part Four Applications
- Subject Index
- References
16 - Distributed Transfer Learning via Cooperative Matrix Factorization
from Part Three - Alternative Learning Settings
Published online by Cambridge University Press: 05 February 2012
Summary
Machine learning and data-mining technologies have already achieved significant success in many knowledge engineering areas, including web search, computational advertising, and recommender systems. A major challenge in machine learning is the data sparsity problem. For example, in the domain of online recommender systems, we attempt to recommend information items (e.g., movies, TV shows, books, news, images, or web pages) that are likely to be of interest to the user. However, the item space is usually very large, while the number of observed user preference values is small. When the user data are too sparse, it is difficult to obtain a reliable and useful model for recommendation. Whereas large online sites such as Amazon and Google can easily access huge volumes of user data, the enormous number of smaller online business sites, which collectively constitute the long tail of the web, are much more likely to have very sparse user data and thus have difficulty generating accurate recommendations. One potential solution to the data sparsity problem is to transfer knowledge from other information sources (e.g., Mehta and Hofmann, 2007; Li, Yang, and Xue, 2009). Such techniques for knowledge transfer are called transfer learning (see, e.g., Pan and Yang, 2010). An additional observation is that, in reality, many small websites attract similar users and/or provide similar, if not identical, items, which implies that data about such users and items may be distributed across different systems. For example, Delicious and Digg are both popular online social bookmarking tools.
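To make the sparsity setting concrete, the following is a minimal, illustrative sketch (not the chapter's cooperative algorithm) of the standard building block such methods extend: factorizing a sparse user-item rating matrix R ≈ U Vᵀ by stochastic gradient descent over only the observed entries. All names, hyperparameters, and the toy data below are assumptions chosen for illustration.

```python
import numpy as np

def factorize(ratings, n_users, n_items, rank=2, lr=0.05, reg=0.02,
              epochs=200, seed=0):
    """Fit low-rank factors U, V so that U[u] @ V[i] approximates each
    observed rating (u, i, r). Hyperparameters are illustrative only."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((n_users, rank))
    V = 0.1 * rng.standard_normal((n_items, rank))
    for _ in range(epochs):
        for u, i, r in ratings:              # iterate observed entries only
            err = r - U[u] @ V[i]            # prediction error on this entry
            U[u] += lr * (err * V[i] - reg * U[u])   # SGD step with L2 reg.
            V[i] += lr * (err * U[u] - reg * V[i])
    return U, V

# Toy data: 3 users, 4 items, six observed ratings (very sparse).
obs = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
       (1, 2, 1.0), (2, 1, 2.0), (2, 3, 4.0)]
U, V = factorize(obs, n_users=3, n_items=4)
preds = U @ V.T   # predictions for all (user, item) pairs, observed or not
```

Cooperative or collective variants, in the spirit of the chapter, would couple several such factorizations across sites by sharing or regularizing the user/item factors, so that a site with sparse data borrows strength from others.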
- Type: Chapter
- Book: Scaling Up Machine Learning: Parallel and Distributed Approaches, pp. 331-351
- Publisher: Cambridge University Press
- Print publication year: 2011