Computer vision is a challenging application area for learning algorithms. For instance, the task of object detection is a critical problem for many systems, like mobile robots, that remains largely unsolved. In order to interact with the world, robots must be able to locate and recognize large numbers of objects accurately and at reasonable speeds. Unfortunately, off-the-shelf computer vision algorithms do not yet achieve sufficiently high detection performance for these applications. A key difficulty with many existing algorithms is that they are unable to take advantage of large numbers of examples. As a result, they must rely heavily on prior knowledge and hand-engineered features that account for the many kinds of errors that can occur. In this chapter, we present two methods for improving performance by scaling up learning algorithms to large datasets: (1) using graphics processing units (GPUs) and distributed systems to scale up the standard components of computer vision algorithms and (2) using GPUs to automatically learn high-quality feature representations using deep belief networks (DBNs). These methods are capable of not only achieving high performance but also removing much of the need for hand-engineering common in computer vision algorithms.
The fragility of many vision algorithms comes from their lack of knowledge about the multitude of visual phenomena that occur in the real world. Whereas humans can intuit information about depth, occlusion, lighting, and even motion from still images, computer vision algorithms generally lack the ability to deal with these phenomena without being engineered to account for them in advance.
Facing the problem of clustering a multimillion-data-point collection, a machine learning practitioner may choose to apply the simplest clustering method possible, because it is hard to believe that fancier methods can be applicable to datasets of such scale. Whoever is about to adopt this approach should first weigh the following considerations:
Simple clustering methods are rarely effective. Indeed, four decades of research would not have been spent on data clustering if a simple method could solve the problem. Moreover, even the simplest methods may run for long hours on a modern PC, given a large-scale dataset. For example, consider a simple online clustering algorithm (which, we believe, is machine learning folklore): first initialize k clusters with one data point per cluster, then iteratively assign the rest of the data points to their closest clusters (in Euclidean space). If k is small enough, we can run this algorithm on one machine, because it is unnecessary to keep the entire dataset in RAM. However, besides being slow, it will produce low-quality results, especially when the data is highly multi-dimensional.
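Written out, the folklore algorithm is only a few lines. The Python sketch below is one reasonable reading of the description above (we summarize each cluster by a running mean and update it incrementally, a detail the text leaves open); the function name is ours. It shows why only k centroids and counts, rather than the whole dataset, need to sit in RAM, while also making the slow, point-by-point nature of the pass obvious.

```python
import numpy as np

def online_cluster(stream, k):
    """Folklore online clustering, as described above: seed k clusters with the
    first k points, then assign every remaining point to its nearest cluster.
    Each cluster is summarized by a running mean, so only k centroids and counts
    stay in RAM, not the whole dataset."""
    stream = iter(stream)
    centroids = np.array([next(stream) for _ in range(k)], dtype=float)
    counts = np.ones(k)
    labels = list(range(k))
    for x in stream:
        x = np.asarray(x, dtype=float)
        j = int(np.argmin(np.linalg.norm(centroids - x, axis=1)))  # closest cluster, Euclidean distance
        counts[j] += 1
        centroids[j] += (x - centroids[j]) / counts[j]             # incremental centroid update
        labels.append(j)
    return centroids, labels

# Example: one pass over 100,000 ten-dimensional points with k = 100.
centroids, labels = online_cluster(np.random.randn(100_000, 10), k=100)
```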
State-of-the-art clustering methods can scale well, which we aim to justify in this chapter.
With the deployment of large computational facilities (such as Amazon.com's EC2, IBM's BlueGene, and HP's XC), the parallel computing paradigm is probably the only currently available option for tackling gigantic data processing tasks. Parallel methods are becoming an integral part of any data processing system and are thus receiving special attention (e.g., universities are introducing parallel methods into their core curricula; see Johnson et al., 2008).
The set of features used by a learning algorithm can have a dramatic impact on the performance of the algorithm. Including extraneous features can make the learning problem more difficult by adding useless, noisy dimensions that lead to over-fitting and increased computational complexity. Conversely, excluding useful features can deprive the model of important signals. The problem of feature selection is to find a subset of features that allows the learning algorithm to learn the “best” model in terms of measures such as accuracy or model simplicity.
The problem of feature selection continues to grow in both importance and difficulty as extremely high-dimensional datasets become the standard in real-world machine learning tasks. Scalability can become a problem for even simple approaches. For example, common feature selection approaches that evaluate each new feature by training a new model containing that feature require learning a linear number of models each time they add a new feature. This computational cost can add up quickly when we iteratively add many new features. Even those techniques that use relatively computationally inexpensive tests of a feature's value, such as mutual information, require at least linear time in the number of features being evaluated.
As a simple illustrative example, consider the task of classifying websites. In this case, the dataset could easily contain many millions of examples. Including very basic features such as text unigrams on the page or HTML tags could easily provide many thousands of potential features for the model.
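As a concrete illustration of the cheaper, filter-style tests mentioned above, the sketch below ranks binary features (e.g., unigram or HTML-tag indicators for the website task) by their empirical mutual information with the class label. Even this inexpensive test requires a pass that is linear in the number of candidate features. The function names and the top-k cutoff are our illustrative choices, not a method from the chapter.

```python
import numpy as np

def mutual_information(feature, label):
    """Empirical mutual information I(X;Y), in nats, between two discrete arrays."""
    feature, label = np.asarray(feature), np.asarray(label)
    mi = 0.0
    for fv in np.unique(feature):
        for lv in np.unique(label):
            p_xy = np.mean((feature == fv) & (label == lv))
            p_x, p_y = np.mean(feature == fv), np.mean(label == lv)
            if p_xy > 0:
                mi += p_xy * np.log(p_xy / (p_x * p_y))
    return mi

def rank_features(X, y, top_k=1000):
    """Score every column of a (binary) feature matrix against the labels and keep the top_k.
    Note the cost: one scan of the data per candidate feature."""
    scores = [mutual_information(X[:, j], y) for j in range(X.shape[1])]
    return np.argsort(scores)[::-1][:top_k]
```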
Micro-robots, unmanned aerial vehicles, imaging sensor networks, wireless phones, and other embedded vision systems all require low cost and high-speed implementations of synthetic vision systems capable of recognizing and categorizing objects in a scene.
Many successful object recognition systems use dense features extracted on regularly spaced patches over the input image. The majority of the feature extraction systems have a common structure composed of a filter bank (generally based on oriented edge detectors or 2D Gabor functions), a nonlinear operation (quantization, winner-take-all, sparsification, normalization, and/or pointwise saturation), and finally a pooling operation (max, average, or histogramming). For example, the scale-invariant feature transform (SIFT) (Lowe, 2004) operator applies oriented edge filters to a small patch and determines the dominant orientation through a winner-take-all operation. Finally, the resulting sparse vectors are added (pooled) over a larger patch to form a local orientation histogram. Some recognition systems use a single stage of feature extractors (Lazebnik, Schmid, and Ponce, 2006; Dalal and Triggs, 2005; Berg, Berg, and Malik, 2005; Pinto, Cox, and DiCarlo, 2008).
Other models such as HMAX-type models (Serre, Wolf, and Poggio, 2005; Mutch, and Lowe, 2006) and convolutional networks use two more layers of successive feature extractors. Different training algorithms have been used for learning the parameters of convolutional networks. In LeCun et al. (1998b) and Huang and LeCun (2006), pure supervised learning is used to update the parameters. However, recent works have focused on training with an auxiliary task (Ahmed et al., 2008) or using unsupervised objectives (Ranzato et al., 2007b; Kavukcuoglu et al., 2009; Jarrett et al., 2009; Lee et al., 2009).
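The common structure described above (filter bank, pointwise nonlinearity, pooling) can be sketched generically. In the sketch below, the oriented-edge filters, absolute-value rectification, and average pooling are illustrative placeholders, not the particular choices of SIFT, HMAX, or any other cited system; a multi-stage extractor would simply stack such stages.

```python
import numpy as np
from scipy.signal import convolve2d

def extract_features(image, filters, pool=4):
    """One generic feature-extraction stage: filter bank -> pointwise nonlinearity -> pooling."""
    maps = []
    for f in filters:
        r = convolve2d(image, f, mode="valid")          # filter bank (oriented edge detectors)
        r = np.abs(r)                                   # pointwise nonlinearity (rectification)
        h, w = (r.shape[0] // pool) * pool, (r.shape[1] // pool) * pool
        pooled = r[:h, :w].reshape(h // pool, pool, w // pool, pool).mean(axis=(1, 3))
        maps.append(pooled)                             # average pooling over pool x pool patches
    return np.stack(maps)

# Two crude oriented-edge filters standing in for a Gabor filter bank.
edge = np.array([[1.0, 0.0, -1.0]] * 3)
features = extract_features(np.random.rand(64, 64), [edge, edge.T])
```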
In this chapter we look at leveraging the MapReduce distributed computing framework (Dean and Ghemawat, 2004) for parallelizing machine learning methods of wide interest, with a specific focus on learning ensembles of classification or regression trees. Building a production-ready implementation of a distributed learning algorithm can be a complex task. With the wide and growing availability of MapReduce-capable computing infrastructures, it is natural to ask whether such infrastructures may be of use in parallelizing common data mining tasks such as tree learning. For many data mining applications, MapReduce may offer scalability as well as ease of deployment in a production setting (for reasons explained later).
We initially give an overview of MapReduce and outline its application in a classic clustering algorithm, k-means. Subsequently, we focus on PLANET: a scalable distributed framework for learning tree models over large datasets. PLANET defines tree learning as a series of distributed computations and implements each one using the MapReduce model. We show how this framework supports scalable construction of classification and regression trees, as well as ensembles of such models. We discuss the benefits and challenges of using a MapReduce compute cluster for tree learning and demonstrate the scalability of this approach by applying it to a real-world learning task from the domain of computational advertising.
MapReduce is a simple model for distributed computing that abstracts away many of the difficulties in parallelizing data management operations across a cluster of commodity machines.
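To make the abstraction concrete before turning to PLANET, here is a minimal single-machine simulation of one k-means iteration phrased as a map and a reduce. The function names and the in-memory grouping are ours, standing in for what a real MapReduce runtime does across workers; the sketch conveys the shape of the computation rather than the chapter's implementation.

```python
import numpy as np

def kmeans_map(shard, centroids):
    """Map: for each point in this shard, emit (index of nearest centroid, (point, 1))."""
    for x in shard:
        j = int(np.argmin(np.linalg.norm(centroids - x, axis=1)))
        yield j, (np.asarray(x, dtype=float), 1)

def kmeans_reduce(key, values):
    """Reduce: average all points assigned to centroid `key` to get its new position."""
    total = sum(point for point, _ in values)
    count = sum(c for _, c in values)
    yield key, total / count

def one_kmeans_iteration(shards, centroids):
    """Simulate the shuffle by grouping map output by key, then apply the reducer."""
    grouped = {}
    for shard in shards:                     # on a real cluster, each shard runs on a different worker
        for key, value in kmeans_map(shard, centroids):
            grouped.setdefault(key, []).append(value)
    new_centroids = centroids.copy()
    for key, values in grouped.items():
        for k, centroid in kmeans_reduce(key, values):
            new_centroids[k] = centroid
    return new_centroids
```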
The graphics processing unit (GPU) of modern computers has evolved into a powerful, general-purpose, massively parallel numerical (co-)processor. The numerical computation in a number of machine learning algorithms fits well on the GPU. To help identify such algorithms, we present uniformly fine-grained data-parallel computing and illustrate it on two machine learning algorithms, clustering and regression clustering, on a GPU and central processing unit (CPU) mixed computing architecture. We discuss the key issues involved in a successful design of the algorithms, data structures, and computation partitioning between a CPU and a GPU. Performance gains on a CPU and GPU mixed architecture are compared with the performance of the regression clustering algorithm implemented completely on a CPU. Significant speedups are reported. A GPU and CPU mixed architecture also achieves better cost-performance and energy-performance ratios.
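One way to picture "uniformly fine-grained data-parallel computing" is the assignment step of (regression) clustering: every data point runs the same small, independent computation against all k centroids. The numpy broadcast below is a CPU stand-in for a GPU kernel launched with one thread per point; the partitioning shown is our illustration, not the chapter's implementation.

```python
import numpy as np

def assign_points(points, centroids):
    """Each point independently computes its squared distance to all k centroids and
    picks the nearest; on a GPU this body would be a kernel with one thread per point."""
    # (n, 1, d) - (1, k, d) broadcasts to (n, k, d); summing over d gives (n, k) distances.
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

points = np.random.rand(100_000, 8).astype(np.float32)
centroids = np.random.rand(64, 8).astype(np.float32)
labels = assign_points(points, centroids)    # identical, independent work items: ideal for a GPU
```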
The computing power of the CPU has increased dramatically in the past few decades, supported by both miniaturization and increasing clock frequencies. More and more electronic gates were packed onto the same area of a silicon die as miniaturization continued. Hardware-supported parallel computing, pipelining for example, further increased the computing power of CPUs. Frequency increases sped up CPUs even more directly. However, the long-predicted physical limit of the miniaturization process was finally hit a few years ago: increasing the frequency further was no longer feasible because of the accompanying nonlinear increase in power consumption, even though miniaturization itself continues.
From Part One - Frameworks for Scaling Up Machine Learning
By Edwin Pednault, IBM Research, Yorktown Heights, NY, USA; Elad Yom-Tov, Yahoo! Research, New York, NY, USA; Amol Ghoting, IBM Research, Yorktown Heights, NY, USA
In many ways, the objective of the IBM Parallel Machine Learning Toolbox (PML) is similar to that of Google's MapReduce programming model (Dean and Ghemawat, 2004) and the open source Hadoop system, which is to provide Application Programming Interfaces (APIs) that enable programmers who have no prior experience in parallel and distributed systems to nevertheless implement parallel algorithms with relative ease. Like MapReduce and Hadoop, PML supports associative-commutative computations as its primary parallelization mechanism. Unlike MapReduce and Hadoop, PML fundamentally assumes that learning algorithms can be iterative in nature, requiring multiple passes over data. It also extends the associative-commutative computational model in various aspects, the most important of which are:
The ability to maintain the state of each worker node between iterations, making it possible, for example, to partition and distribute data structures across workers
Efficient distribution of data, including the ability for each worker to read a subset of the data, to sample the data, or to scan the entire dataset
Access to both sparse and dense datasets
Parallel merge operations using tree structures for efficient collection of worker results on very large clusters
To support these extensions to the computational model while still addressing ease of use, PML provides an object-oriented API in which algorithms are objects that implement a predefined set of interface methods. The PML infrastructure then uses these interface methods to distribute algorithm objects and their computations across multiple compute nodes.
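The sketch below shows what such an algorithm object might look like for k-means. The method names (begin_iteration, process_record, merge, end_iteration) are hypothetical stand-ins, not PML's actual API; they are chosen only to illustrate worker-local state kept between iterations and associative-commutative results that can be merged in a tree.

```python
class KMeansAlgorithm:
    """Illustrative algorithm object in the style described above; method names are
    hypothetical, not PML's real interface. The infrastructure would instantiate this
    object on every worker, feed it that worker's data partition, and merge the results."""

    def __init__(self, initial_centroids):
        self.centroids = [list(c) for c in initial_centroids]   # worker-local state kept across iterations

    def begin_iteration(self):
        self.partial_sums = {}                                  # per-iteration accumulators

    def process_record(self, record):
        # Called once per record in this worker's partition of the data.
        j = min(range(len(self.centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(record, self.centroids[i])))
        s, c = self.partial_sums.get(j, ([0.0] * len(record), 0))
        self.partial_sums[j] = ([a + b for a, b in zip(s, record)], c + 1)

    def merge(self, other):
        # Associative-commutative merge of another worker's results; the infrastructure
        # can apply it pairwise in a tree to collect results efficiently on large clusters.
        for j, (s, c) in other.partial_sums.items():
            s0, c0 = self.partial_sums.get(j, ([0.0] * len(s), 0))
            self.partial_sums[j] = ([a + b for a, b in zip(s0, s)], c0 + c)

    def end_iteration(self):
        # Update the model from the merged sums; the caller decides whether to iterate again.
        for j, (s, c) in self.partial_sums.items():
            self.centroids[j] = [a / c for a in s]
```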
By Ron Bekkerman, LinkedIn Corporation, Mountain View, CA, USA; Mikhail Bilenko, Microsoft Research, Redmond, WA, USA; John Langford, Yahoo! Research, New York, NY, USA
Distributed and parallel processing of very large datasets has been employed for decades in specialized, high-budget settings, such as financial and petroleum industry applications. Recent years have brought dramatic progress in usability, cost effectiveness, and diversity of parallel computing platforms, with their popularity growing for a broad set of data analysis and machine learning tasks.
The current rise in interest in scaling up machine learning applications can be partially attributed to the evolution of hardware architectures and programming frameworks that make it easy to exploit the types of parallelism realizable in many learning algorithms. A number of platforms make it convenient to implement concurrent processing of data instances or their features. This allows fairly straightforward parallelization of many learning algorithms that view input as an unordered batch of examples and aggregate isolated computations over each of them.
Increased attention to large-scale machine learning is also due to the spread of very large datasets across many modern applications. Such datasets are often accumulated on distributed storage platforms, motivating the development of learning algorithms that can be distributed appropriately. Finally, the proliferation of sensing devices that perform real-time inference based on high-dimensional, complex feature representations drives additional demand for utilizing parallelism in learning-centric applications. Examples of this trend include speech recognition and visual object detection becoming commonplace in autonomous robots and mobile devices.
Semi-supervised learning (SSL) is the process of training decision functions using small amounts of labeled data and relatively large amounts of unlabeled data. In many applications, annotating training data is time consuming and error prone. Speech recognition is a typical example: it requires large amounts of meticulously annotated speech data (Evermann et al., 2005) to produce an accurate system. In the case of document classification for internet search, it is not even feasible to accurately annotate a relatively large number of web pages for all categories of potential interest. SSL lends itself as a useful technique in many machine learning applications because one need annotate only relatively small amounts of the available data. SSL is related to the problem of transductive learning (Vapnik, 1998). In general, a learner is transductive if it is designed for prediction only on a closed dataset, where the test set is revealed at training time. In practice, however, transductive learners can be modified to handle unseen data (Sindhwani, Niyogi, and Belkin, 2005; Zhu, 2005a). Chapter 25 of Chapelle, Scholkopf, and Zien (2007) gives a full discussion of the relationship between SSL and transductive learning. In this chapter, SSL refers to the semi-supervised transductive classification problem.
Let x ∈ X denote the input to the decision function (classifier) f, and y ∈ Y denote its output label, that is, f : X → Y. In most cases, f(x) = argmax_{y ∈ Y} p(y | x).
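In code, this decision rule is just an argmax over the class posteriors for a given input; the label set and probabilities below are invented purely for illustration.

```python
def classify(posteriors):
    """Decision rule f(x) = argmax over y in Y of p(y | x), given the posteriors for one input x."""
    return max(posteriors, key=posteriors.get)

# Toy example with an invented label set Y and an invented posterior p(y | x).
posteriors = {"sports": 0.2, "politics": 0.5, "technology": 0.3}
print(classify(posteriors))   # -> "politics"
```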
The web search ranking task has become increasingly important because of the rapid growth of the internet. With the growth of the web and the number of web search users, the amount of available training data for learning web ranking models has also increased. We investigate the problem of learning to rank on a cluster using web search data composed of 140,000 queries and approximately 14 million URLs. For datasets much larger than this, distributed computing will become essential, because of both speed and memory constraints. We compare against a baseline algorithm that has been carefully engineered to allow training on the full dataset on a single machine, in order to evaluate the loss or gain incurred by the distributed algorithms we consider. The underlying algorithm we use is a boosted tree ranking algorithm called LambdaMART, where a split at a given vertex in each decision tree is determined by the split criterion for a particular feature. Our contributions are twofold. First, we implement a method for improving the speed of training when the training data fits in main memory on a single machine by distributing the vertex split computations of the decision trees. The model produced is equivalent to the one produced by centralized training, but training is faster. Second, we develop a training method for the case where the training data size exceeds the main memory of a single machine. This second approach scales easily to far larger datasets, that is, billions of examples, and is based on data distribution.
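The first contribution rests on a simple observation: the search for the best vertex split decomposes by feature, so disjoint subsets of features can be scanned on different workers and a coordinator can take the overall best, yielding exactly the split that centralized training would choose. The sketch below illustrates this idea with a squared-error-reduction criterion; the criterion and function names are our illustrative choices, not LambdaMART's specifics.

```python
import numpy as np

def best_split_for_feature(x, targets):
    """Scan the sorted values of one feature and return (gain, threshold) for the best split.
    Squared-error reduction is used as an illustrative split criterion."""
    order = np.argsort(x)
    x, targets = x[order], targets[order]
    n, total = len(targets), targets.sum()
    best_gain, best_threshold = -np.inf, None
    left_sum = 0.0
    for i in range(1, n):
        left_sum += targets[i - 1]
        if x[i] == x[i - 1]:
            continue                                   # no threshold separates equal values
        right_sum = total - left_sum
        gain = left_sum ** 2 / i + right_sum ** 2 / (n - i) - total ** 2 / n
        if gain > best_gain:
            best_gain, best_threshold = gain, (x[i] + x[i - 1]) / 2
    return best_gain, best_threshold

def distributed_best_split(X, targets, feature_shards):
    """Each 'worker' scans a disjoint shard of feature indices; the coordinator takes the
    max over per-worker bests, so the chosen split matches centralized training exactly."""
    per_worker_best = []
    for shard in feature_shards:                       # each shard stands in for one worker
        best = max(((*best_split_for_feature(X[:, j], targets), j) for j in shard),
                   key=lambda t: t[0])
        per_worker_best.append(best)
    gain, threshold, feature = max(per_worker_best, key=lambda t: t[0])
    return feature, threshold, gain
```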
We provide an overall description of the Ciao multiparadigm programming system emphasizing some of the novel aspects and motivations behind its design and implementation. An important aspect of Ciao is that, in addition to supporting logic programming (and, in particular, Prolog), it provides the programmer with a large number of useful features from different programming paradigms and styles and that the use of each of these features (including those of Prolog) can be turned on and off at will for each program module. Thus, a given module may be using, e.g., higher order functions and constraints, while another module may be using assignment, predicates, Prolog meta-programming, and concurrency. Furthermore, the language is designed to be extensible in a simple and modular way. Another important aspect of Ciao is its programming environment, which provides a powerful preprocessor (with an associated assertion language) capable of statically finding non-trivial bugs, verifying that programs comply with specifications, and performing many types of optimizations (including automatic parallelization). Such optimizations produce code that is highly competitive with other dynamic languages or, with the (experimental) optimizing compiler, even that of static languages, all while retaining the flexibility and interactive development of a dynamic language. This compilation architecture supports modularity and separate compilation throughout. The environment also includes a powerful autodocumenter and a unit testing framework, both closely integrated with the assertion system. The paper provides an informal overview of the language and program development environment. It aims at illustrating the design philosophy rather than at being exhaustive, which would be impossible in a single journal paper, pointing instead to previous Ciao literature.
Spectral clustering is a technique for finding group structure in data. It makes use of the spectrum of the data similarity matrix to perform dimensionality reduction for clustering in fewer dimensions. Spectral clustering algorithms have been shown to be more effective in finding clusters than traditional algorithms such as k-means. However, spectral clustering suffers from a scalability problem in both memory use and computation time when the size of a dataset is large. To perform clustering on large datasets, in this work, we parallelize both memory use and computation using MapReduce and MPI. Through an empirical study on a document set of 534,135 instances and a photo set of 2,121,863 images, we show that our parallel algorithm can effectively handle large problems.
Clustering is one of the most important tasks in machine learning and data mining. In the last decade, spectral clustering (e.g., Shi and Malik, 2000; Meila and Shi, 2000; Fowlkes et al., 2004), motivated by normalized graph cut, has attracted much attention. Unlike traditional partition-based clustering, spectral clustering exploits a pairwise data similarity matrix. It has been shown to be more effective than traditional methods such as k-means, which considers only the similarity between instances and k centroids (Ng, Jordan, and Weiss, 2001). Because of its effectiveness, spectral clustering has been widely used in several areas such as information retrieval and computer vision (e.g., Dhillon, 2001; Xu, Liu, and Gong, 2003; Shi and Malik, 2000; Yu and Shi, 2003).
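For readers new to the method, the serial pipeline being parallelized looks roughly as follows: build a pairwise similarity matrix, normalize it, embed the data with the top-k eigenvectors, and run k-means on the embedding. The sketch below follows a common textbook recipe (Gaussian similarity, normalized affinity matrix) and is not the chapter's MapReduce/MPI implementation; its dense n x n similarity matrix is precisely where the scalability problem arises.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def spectral_clustering(X, k, sigma=1.0):
    """Serial sketch of spectral clustering. It materializes the full n x n similarity
    matrix, which is exactly the memory and compute bottleneck the parallel algorithm
    addresses by distributing these same steps."""
    # 1. Pairwise Gaussian similarities with a zeroed diagonal.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    S = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(S, 0.0)
    # 2. Normalized affinity matrix D^{-1/2} S D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))
    L = d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    # 3. Embed each point using the eigenvectors of the k largest eigenvalues.
    _, V = np.linalg.eigh(L)                       # eigenvalues in ascending order
    U = V[:, -k:]
    U /= np.linalg.norm(U, axis=1, keepdims=True)  # row-normalize the embedding
    # 4. Run k-means in the embedded space to obtain the final clusters.
    _, labels = kmeans2(U, k, minit="++")
    return labels

labels = spectral_clustering(np.random.rand(300, 2), k=3)
```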
From Part Two - Supervised and Unsupervised Learning Algorithms
By Edward Y. Chang, Hongjie Bai, Kaihua Zhu, Hao Wang, Jian Li, and Zhihuan Qiu, Google Research, Beijing, China
The paradigm of Tabled Logic Programming (TLP) is now supported by a number of Prolog systems, including XSB, YAP Prolog, B-Prolog, Mercury, ALS, and Ciao. The reasons for this are partly theoretical: tabling ensures termination and optimal known complexity for queries to a large class of programs. However, the overriding reasons are practical. TLP allows sophisticated programs to be written concisely and efficiently, especially when mechanisms such as tabled negation and call and answer subsumption are supported. As a result, TLP has now been used in a variety of applications from program analysis to querying over the semantic web. This paper provides a survey of TLP and its applications as implemented in the XSB Prolog, along with discussion of how XSB supports tabling with dynamically changing code, and in a multi-threaded environment.
Massive training datasets, ranging in size from tens of gigabytes to several terabytes, arise in diverse machine learning applications in areas such as text mining of web corpora, multimedia analysis of image and video data, retail modeling of customer transaction data, bioinformatic analysis of genomic and microarray data, medical analysis of clinical diagnostic data such as functional magnetic resonance imaging (fMRI) images, and environmental modeling using sensor and streaming data. Provost and Kolluri (1999), in their overview of machine learning with massive datasets, emphasize the need for developing parallel algorithms and implementations for these applications.
In this chapter, we describe the Transform Regression (TReg) algorithm (Pednault, 2006), which is a general-purpose, non-parametric methodology suitable for a wide variety of regression applications. TReg was originally created for the data mining component of the IBM InfoSphere Warehouse product, guided by a challenging set of requirements:
1. The modeling time should be comparable to linear regression.
2. The resulting models should be compact and efficient to apply.
3. The model quality should be reliable without any further tuning.
4. The model training and scoring should be parallelized for large datasets stored as partitioned tables in IBM's DB2 database systems.
Requirements 1 and 2 were deemed necessary for a successful commercial algorithm, although this ruled out certain ensemble-based methods that produce high-quality models but have high computation and storage requirements. Requirement 3 ensured that the chosen algorithm did not unduly compromise model quality in view of requirements 1 and 2.