Since the first WLAN-positioning system was introduced in 2000 [4], rapid advances in signal processing methods have been made in this area. A decade later, the fundamental positioning techniques have matured significantly, allowing these systems to deliver position estimates with accuracies on the order of several meters in both indoor and outdoor environments.
The previous chapters have reviewed the fundamental techniques used in WLAN positioning. In this chapter, we look ahead to opportunities and challenges remaining to be addressed in this area.
Highlights
The focus of this book has been WLAN-based positioning. Chapters 1 to 4 reviewed the history, applications, and a variety of existing positioning systems to motivate WLAN-based solutions. In particular, the development of these systems is driven by the need for accurate, reliable, and cost-efficient positioning to enable the delivery of location-based services (LBS).
The second part of the book was dedicated to fundamental signal processing concepts in these systems. In Chapter 5, we saw that the unpredictability of radio signal features poses a significant challenge to the development of accurate and reliable WLAN positioning systems. In Chapter 6, we discussed a number of nonparametric techniques that can be used to model these radio signals using training samples collected at a set of anchor points with known locations. In addition to their effectiveness as modeling tools, these nonparametric techniques also allow the estimation of a measure of uncertainty associated with position estimates.
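As a concrete illustration of the nonparametric flavor of these techniques, the sketch below estimates a position by weighted k-nearest-neighbor interpolation over received-signal-strength (RSS) fingerprints collected at known anchor points. The data layout, the choice of k, and the inverse-distance weighting are illustrative assumptions, not the specific estimators developed in Chapter 6.

```python
# A minimal sketch of nonparametric WLAN fingerprinting with weighted k-NN.
import numpy as np

def knn_position(rss_query, train_rss, train_pos, k=3, eps=1e-6):
    """rss_query: RSS vector over visible access points;
    train_rss: (n_samples, n_aps) fingerprints collected at anchor points;
    train_pos: (n_samples, 2) known anchor coordinates."""
    d = np.linalg.norm(train_rss - rss_query, axis=1)   # signal-space distances
    idx = np.argsort(d)[:k]                             # k closest fingerprints
    w = 1.0 / (d[idx] + eps)                            # closer samples weigh more
    return (w[:, None] * train_pos[idx]).sum(axis=0) / w.sum()
```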
In the previous chapters, we discussed the history and applications of modern positioning systems that enable the delivery of location-based services (LBS). In this chapter, we shift our attention to the fundamental positioning principles used in these systems. We begin by presenting the location stack, a model of location-aware systems, and identify the focus of this book (Section 3.1). We then proceed to discuss the most commonly used techniques for computing the position of mobile receivers. Similar to the techniques used in celestial navigation, modern positioning systems often employ a set of references with known locations for position computation. We discuss different positioning methods, differentiated by the type of references and signal measurements used (Sections 3.2 to 3.4). In addition to these techniques, which generally rely on wireless signals, we also briefly review dead reckoning (Section 3.6) and computer-based positioning (Section 3.7) methods. These two techniques employ modalities complementary to wireless measurements and as such offer a promising direction for hybrid positioning systems that combine multiple measurement types to improve the accuracy and reliability of positioning. Finally, we conclude the chapter by discussing the advantages and disadvantages of each positioning method (Section 3.8).
The location stack
To position this book within the wealth of information available on positioning systems used in LBS, we review a model of location-aware systems proposed by Hightower et al. [35].
For thousands of years, the ability to explore the world has significantly impacted human civilization. Human explorations have enabled the interaction of cultures for the purposes of geographic expansion (for example, through war and colonization) and economic development through trade. These interactions have also played a pivotal role in an exchange of knowledge that has supported the advancement of science, the development of religion, and the flourishing of the arts throughout the world.
World exploration is largely enabled by the ability to control the movement of a vessel from one position to another. This process, known as navigation, requires the knowledge of the locations of the source and destination points. The process of determining the location of points in space is known as positioning. In this book, we use the terms location and position interchangeably to refer to the point in physical space occupied by a person or object.
Throughout history, various positioning methods have been developed including methods using the relation of a point to various reference points such as celestial bodies and the Earth's magnetic pole. More recently, the advent of wireless communications has led to the development of a number of additional positioning systems that enable not only navigation, but also the delivery of additional value-added services. The focus of this book is one such positioning method that employs wireless local area signals to determine the location of wireless devices.
Traditionally, the application scope of positioning systems was limited to target tracking and navigation in civilian and military applications. This has changed in recent decades with the advent of mobile computing. In particular, the maturation of wireless communication and advances in microelectronics have given birth to mobile computing devices, such as laptops and smartphones, which are equipped with sensing and computing capabilities. The mobility of these devices in wireless networks means that users' communication, resource, and information needs now change with their physical location. More specifically, location information is now part of the context in which users access and consume wireless services. This, together with the availability of positioning information (for example, through the Global Positioning System), has both necessitated and enabled the development of services that cater to the changing needs of mobile users [34]. This need has sparked a new generation of applications for positioning known as location-based services (LBS) or location-aware systems. Formally, LBS have been defined in many ways [24, 50, 80]. In this book, the term LBS is used to indicate services that use the position of a user to add value to a service [50].
In this chapter, we discuss the economic and ethical implications of LBS. We begin with an assessment of the market potential for these services (Section 2.1). This is followed by a discussion of application areas where LBS can be employed (Section 2.2). Finally, we discuss the ethical implications of LBS (Section 2.3).
From Part One - Frameworks for Scaling Up Machine Learning
By Mihai Budiu, Dennis Fetterly, Michael Isard, Frank McSherry, and Yuan Yu (Microsoft Research, Mountain View, CA, USA)
This chapter describes DryadLINQ, a general-purpose system for large-scale data-parallel computing, and illustrates its use on a number of machine learning problems.
The main motivation behind the development of DryadLINQ was to make it easier for nonspecialists to write general-purpose, scalable programs that can operate on very large input datasets. In order to appeal to nonspecialists, we designed the programming interface to use a high level of abstraction that insulates the programmer from most of the detail and complexity of parallel and distributed execution. In order to support general-purpose computing, we embedded these high-level abstractions in .NET, giving developers access to full-featured programming languages with rich type systems and proven mechanisms (such as classes and libraries) for managing complex, long-lived, and geographically distributed software projects. In order to support scalability over very large data and compute clusters, the DryadLINQ compiler generates code for the Dryad runtime, a well-tested and highly efficient distributed execution engine.
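DryadLINQ programs themselves are written as LINQ queries in .NET languages such as C#, which are not reproduced here; the Python sketch below only mimics that declarative, high-level data-parallel style on a toy word-count task to convey the idea. The partitioning scheme and helper names are hypothetical, and a real runtime such as Dryad would ship the per-partition work to machines in a cluster rather than evaluate it locally.

```python
# A sketch of the high-level, declarative map/reduce style (not the DryadLINQ API).
from collections import Counter

def count_words(partition):
    # Per-partition work; a distributed runtime would execute this stage
    # on different machines, one partition per machine.
    return Counter(word for line in partition for word in line.split())

def word_count(partitions):
    partial = map(count_words, partitions)   # "map" stage over partitions
    return sum(partial, Counter())           # "reduce" stage merging partial counts

# Toy usage: two partitions of a small text dataset.
parts = [["the quick brown fox", "the lazy dog"], ["the end"]]
print(word_count(parts).most_common(2))
```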
As machine learning moves into the industrial mainstream and operates over diverse data types including documents, images, and graphs, it is increasingly appealing to move away from domain-specific languages like MATLAB and toward general-purpose languages that support rich types and standardized libraries. The examples in this chapter demonstrate that a general-purpose language such as C# supports effective, concise implementations of standard machine learning algorithms, and that DryadLINQ efficiently scales these implementations to operate over hundreds of computers and datasets whose size is limited primarily by available disk capacity.
From Part Two - Supervised and Unsupervised Learning Algorithms
By Joseph Gonzalez, Yucheng Low, and Carlos Guestrin (Carnegie Mellon University, Pittsburgh, PA, USA)
Probabilistic graphical models are used in a wide range of machine learning applications. From reasoning about protein interactions (Jaimovich et al., 2006) to stereo vision (Sun, Shum, and Zheng, 2002), graphical models have facilitated the application of probabilistic methods to challenging machine learning problems. A core operation in probabilistic graphical models is inference: the process of computing the probability of an event given particular observations. Although inference is NP-complete in general, there are several popular approximate inference algorithms that typically perform well in practice. Unfortunately, these approximate inference algorithms are still computationally intensive and can therefore benefit from parallelization. In this chapter, we parallelize loopy belief propagation (or loopy BP for short), which is used in a wide range of ML applications (Jaimovich et al., 2006; Sun et al., 2002; Lan et al., 2006; Baron, Sarvotham, and Baraniuk, 2010; Singla and Domingos, 2008).
We begin by briefly reviewing the sequential BP algorithm as well as the necessary background in probabilistic graphical models. We then present a collection of parallel shared-memory BP algorithms that demonstrate the importance of scheduling in parallel BP. Next, we develop the Splash BP algorithm, which combines new scheduling ideas to address the limitations of existing sequential BP algorithms and achieve theoretically optimal parallel performance. Finally, we show how to efficiently implement loopy BP in the distributed parallel setting by addressing the challenges of distributed state and load balancing.
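As background for the parallel variants, the sketch below implements one round-based form of sequential loopy BP (synchronous sum-product message passing) on a pairwise model. The graph representation, potentials, and fixed iteration count are illustrative assumptions; it is not the Splash BP algorithm or the scheduling schemes developed in this chapter.

```python
# A minimal sketch of synchronous loopy belief propagation on a pairwise model.
import numpy as np

def loopy_bp(nodes, edges, node_pot, edge_pot, iters=50):
    """nodes: variable ids; edges: list of (i, j) pairs;
    node_pot[i]: unary potential vector; edge_pot[(i, j)]: matrix indexed [x_i, x_j]."""
    # Directed messages m[(i, j)] from node i to node j, initialized uniformly.
    msgs = {(i, j): np.ones(len(node_pot[j]))
            for (a, b) in edges for (i, j) in ((a, b), (b, a))}
    for _ in range(iters):
        new = {}
        for (i, j) in msgs:
            # Unary potential at i times messages from all neighbors of i except j.
            b_i = np.asarray(node_pot[i], dtype=float)
            for (k, l), m in msgs.items():
                if l == i and k != j:
                    b_i = b_i * m
            pot = (np.asarray(edge_pot[(i, j)]) if (i, j) in edge_pot
                   else np.asarray(edge_pot[(j, i)]).T)
            m_ij = pot.T @ b_i               # sum-product update
            new[(i, j)] = m_ij / m_ij.sum()  # normalize for numerical stability
        msgs = new
    # Final beliefs: unary potential times all incoming messages, normalized.
    beliefs = {}
    for j in nodes:
        b = np.asarray(node_pot[j], dtype=float)
        for (i, l), m in msgs.items():
            if l == j:
                b = b * m
        beliefs[j] = b / b.sum()
    return beliefs

# Toy usage: two binary variables whose pairwise potential prefers agreement.
nodes = [0, 1]
edges = [(0, 1)]
node_pot = {0: np.array([0.7, 0.3]), 1: np.array([0.5, 0.5])}
edge_pot = {(0, 1): np.array([[0.9, 0.1], [0.1, 0.9]])}
print(loopy_bp(nodes, edges, node_pot, edge_pot))
```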
In this chapter, we address distributed learning algorithms for statistical latent variable models, with a focus on topic models. Many high-dimensional datasets, such as text corpora and image databases, are too large to allow one to learn topic models on a single computer. Moreover, a growing number of applications require that inference be fast or in real time, motivating the exploration of parallel and distributed learning algorithms.
We begin by reviewing topic models such as Latent Dirichlet Allocation and Hierarchical Dirichlet Processes. We discuss parallel and distributed algorithms for learning these models and show that these algorithms can achieve substantial speedups without sacrificing model quality. Next we discuss practical guidelines for running our algorithms within various parallel computing frameworks and highlight complementary speedup techniques. Finally, we generalize our distributed approach to handle Bayesian networks.
Several of the results in this chapter have appeared in previous papers in the specific context of topic modeling. The goal of this chapter is to present a comprehensive overview of distributed inference algorithms and to extend the general ideas to a broader class of Bayesian networks.
Latent Variable Models
Latent variable models are a class of statistical models that explain observed data with latent (or hidden) variables. Topic models and hidden Markov models are two examples of such models, where the latent variables are the topic assignment variables and the hidden states, respectively. Given observed data, the goal is to perform Bayesian inference over the latent variables and use the learned model to make inferences or predictions.
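To make the setting concrete, the sketch below runs collapsed Gibbs sampling for Latent Dirichlet Allocation on a toy corpus; the topic assignments are the latent variables being inferred. The hyperparameters, toy documents, and names are illustrative assumptions. The distributed algorithms discussed in this chapter instead partition documents across machines and periodically synchronize the topic-word counts rather than sweeping them on a single processor as done here.

```python
# A minimal sketch of collapsed Gibbs sampling for LDA on a toy corpus.
import numpy as np

def lda_gibbs(docs, n_topics, n_words, alpha=0.1, beta=0.01, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    # z[d][n] is the topic assigned to the n-th word of document d.
    z = [[rng.integers(n_topics) for _ in doc] for doc in docs]
    ndk = np.zeros((len(docs), n_topics))   # document-topic counts
    nkw = np.zeros((n_topics, n_words))     # topic-word counts
    nk = np.zeros(n_topics)                 # topic totals
    for d, doc in enumerate(docs):
        for n, w in enumerate(doc):
            k = z[d][n]
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                k = z[d][n]
                # Remove the current assignment from the counts.
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # Collapsed conditional p(z = k | everything else).
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + n_words * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][n] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    return ndk, nkw

# Toy usage: 3 documents over a 4-word vocabulary, 2 topics.
docs = [[0, 0, 1], [2, 3, 3], [0, 1, 2]]
doc_topic, topic_word = lda_gibbs(docs, n_topics=2, n_words=4)
```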
Automatic speech recognition (ASR) allows multimedia content to be transcribed from acoustic waveforms into word sequences. It is an exemplar of a class of machine learning applications where increasing compute capability is enabling new industries such as automatic speech analytics. Speech analytics helps customer service call centers search through recorded content, track service quality, and provide early detection of service issues. Fast and efficient ASR enables the cost-effective application of a wide range of text-based data analytics to multimedia content, opening the door to many possibilities.
In this chapter, we describe our approach for scalable parallelization of the most challenging component of ASR: the speech inference engine. This component takes a sequence of audio features extracted from a speech waveform as input, compares them iteratively to a speech model, and produces the most likely interpretation of the speech waveform as a word sequence. The speech model is a database of acoustic characteristics, word pronunciations, and phrases from a particular language. Speech models for natural languages are represented with large irregular graphs consisting of millions of states and arcs. Referencing these models involves accessing an unpredictable data working set guided by “what was said” in the speech input. The inference process is highly challenging to parallelize efficiently.
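The core of such an inference engine is a dynamic-programming traversal of the speech model graph. The sketch below shows plain Viterbi decoding over a toy state graph with per-frame acoustic log-likelihoods; the data structures and names are illustrative assumptions. A production engine instead traverses a weighted network with millions of states and arcs and prunes aggressively (beam search), which is part of what makes parallelization so challenging.

```python
# A minimal sketch of Viterbi decoding over a small state graph.
import math

def viterbi(obs_loglikes, arcs, start, finals):
    """obs_loglikes: per frame, {state: log-likelihood of the frame in that state};
    arcs: {state: [(next_state, transition log-prob), ...]}."""
    best = {start: 0.0}                    # best log-score per active state
    back = [{} for _ in obs_loglikes]      # backpointers for traceback
    for t, frame in enumerate(obs_loglikes):
        new = {}
        for s, score in best.items():
            for nxt, logp in arcs.get(s, []):
                if nxt not in frame:
                    continue               # state cannot emit this frame
                cand = score + logp + frame[nxt]
                if cand > new.get(nxt, -math.inf):
                    new[nxt] = cand
                    back[t][nxt] = s
        best = new
    # Trace back from the best reachable final state.
    end = max((s for s in best if s in finals), key=lambda s: best[s])
    path = [end]
    for t in range(len(obs_loglikes) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path)), best[end]

# Toy usage: two frames decoded through a two-word graph.
arcs = {"<s>": [("hi", math.log(1.0))], "hi": [("there", math.log(0.9))]}
frames = [{"hi": math.log(0.8)}, {"there": math.log(0.7)}]
print(viterbi(frames, arcs, start="<s>", finals={"there"}))
```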
We demonstrate that parallelizing an application is much more than recoding the program in another language. It requires careful consideration of data, task, and runtime concerns to successfully exploit the full parallelization potential of an application.
This book attempts to aggregate state-of-the-art research in parallel and distributed machine learning. We believe that parallelization provides a key pathway for scaling up machine learning to large datasets and complex methods. Although large-scale machine learning has become increasingly popular in both industrial and academic research communities, there has been no single resource covering the variety of recently proposed approaches. We did our best to assemble the most representative contemporary studies in one volume. While each contributed chapter concentrates on a distinct approach and problem, together with their references they provide a comprehensive view of the field.
We believe that the book will be useful to the broad audience of researchers, practitioners, and anyone who wants to grasp the future of machine learning. To smooth the ramp-up for beginners, the first five chapters provide introductory material on machine learning algorithms and parallel computing platforms. Although the book gets deeply technical in some parts, the reader is assumed to have only basic prior knowledge of machine learning and parallel/distributed computing, along with college-level mathematical maturity. We hope that an engineering undergraduate who is familiar with the notion of a classifier and has had some exposure to threads, MPI, or MapReduce will be able to understand the majority of the book's content. We also hope that a seasoned expert will find this book full of new, interesting ideas to inspire future research in the area.
Mining frequent subtrees in a database of rooted and labeled trees is an important problem in many domains, ranging from phylogenetic analysis to biochemistry and from linguistic parsing to XML data analysis. In this work, we revisit this problem and develop an architecture-conscious solution targeting emerging multicore systems. Specifically, we identify a sequence of memory-related optimizations that significantly improve the spatial and temporal locality of a state-of-the-art sequential algorithm, alleviating the effects of memory latency. Additionally, these optimizations are shown to reduce the pressure on the front-side bus, an important consideration in the context of large-scale multicore architectures. We then demonstrate that these optimizations, although necessary, are not sufficient for efficient parallelization on multicores, primarily because of parametric and data-driven factors that make load balancing a significant challenge. To address this challenge, we present a methodology that adaptively and automatically modulates the type and granularity of the work being shared among different cores. The resulting algorithm achieves near-perfect parallel efficiency on up to 16 processors on challenging real-world applications. The optimizations we present have general-purpose utility, and a key outcome is the development of a general-purpose scheduling service for moldable task scheduling on emerging multicore systems.
The field of knowledge discovery is concerned with extracting actionable knowledge from data efficiently. Although most of the early work in this field focused on mining simple transactional datasets, recently there has been a significant shift toward analyzing data with complex structure such as trees and graphs.
Computer vision is a challenging application area for learning algorithms. For instance, the task of object detection, which remains largely unsolved, is a critical problem for many systems such as mobile robots. In order to interact with the world, robots must be able to locate and recognize large numbers of objects accurately and at reasonable speeds. Unfortunately, off-the-shelf computer vision algorithms do not yet achieve sufficiently high detection performance for these applications. A key difficulty with many existing algorithms is that they are unable to take advantage of large numbers of examples. As a result, they must rely heavily on prior knowledge and hand-engineered features that account for the many kinds of errors that can occur. In this chapter, we present two methods for improving performance by scaling up learning algorithms to large datasets: (1) using graphics processing units (GPUs) and distributed systems to scale up the standard components of computer vision algorithms, and (2) using GPUs to automatically learn high-quality feature representations using deep belief networks (DBNs). These methods not only achieve high performance but also remove much of the need for hand-engineering common in computer vision algorithms.
The fragility of many vision algorithms comes from their lack of knowledge about the multitude of visual phenomena that occur in the real world. Whereas humans can intuit information about depth, occlusion, lighting, and even motion from still images, computer vision algorithms generally lack the ability to deal with these phenomena without being engineered to account for them in advance.
Facing a problem of clustering a multimillion-data-point collection, a machine learning practitioner may choose to apply the simplest clustering method possible, because it is hard to believe that fancier methods can be applicable to datasets of such scale. Whoever is about to adopt this approach should first weigh the following considerations:
Simple clustering methods are rarely effective. Indeed, four decades of research would not have been spent on data clustering if a simple method could solve the problem. Moreover, even the simplest methods may run for long hours on a modern PC, given a large-scale dataset. For example, consider a simple online clustering algorithm (which, we believe, is machine learning folklore): first initialize k clusters with one data point per cluster, then iteratively assign the remaining data points to their closest clusters (in Euclidean space); a minimal sketch of this procedure follows this list. If k is small enough, we can run this algorithm on one machine, because it is unnecessary to keep the entire dataset in RAM. However, besides being slow, it will produce low-quality results, especially when the data is highly multidimensional.
State-of-the-art clustering methods can scale well, which we aim to justify in this chapter.
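The following is a minimal sketch of the folklore online algorithm described in the first consideration above: seed k clusters with the first k points and assign each subsequent point to its nearest cluster in Euclidean space. The incremental centroid update and the function name are illustrative assumptions; the description above does not specify how (or whether) cluster centers are updated.

```python
# A minimal sketch of simple online clustering with incremental centroids.
import numpy as np

def online_cluster(points, k):
    centroids = [np.asarray(p, dtype=float) for p in points[:k]]
    counts = [1] * k
    labels = list(range(k))
    for p in points[k:]:
        p = np.asarray(p, dtype=float)
        dists = [np.linalg.norm(p - c) for c in centroids]
        j = int(np.argmin(dists))          # closest existing cluster
        labels.append(j)
        counts[j] += 1
        # Incrementally move the centroid toward the newly assigned point.
        centroids[j] += (p - centroids[j]) / counts[j]
    return labels, centroids

# Toy usage with two well-separated groups of 2-D points.
labels, centers = online_cluster([[0, 0], [10, 10], [0, 1], [9, 10]], k=2)
```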
With the deployment of large computational facilities (such as Amazon.com's EC2, IBM's BlueGene, and HP's XC), the parallel computing paradigm is probably the only currently available option for tackling gigantic data processing tasks. Parallel methods are becoming an integral part of any data processing system and are thus receiving special attention (for example, universities are introducing parallel methods into their core curricula; see Johnson et al., 2008).
The set of features used by a learning algorithm can have a dramatic impact on the performance of the algorithm. Including extraneous features can make the learning problem more difficult by adding useless, noisy dimensions that lead to over-fitting and increased computational complexity. Conversely, excluding useful features can deprive the model of important signals. The problem of feature selection is to find a subset of features that allows the learning algorithm to learn the “best” model in terms of measures such as accuracy or model simplicity.
The problem of feature selection continues to grow in both importance and difficulty as extremely high-dimensional datasets become the standard in real-world machine learning tasks. Scalability can become a problem for even simple approaches. For example, common feature selection approaches that evaluate each new feature by training a new model containing that feature require learning a linear number of models each time a new feature is added. This computational cost adds up quickly when we iteratively add many new features. Even techniques that use relatively inexpensive tests of a feature's value, such as mutual information, require time at least linear in the number of features being evaluated.
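Below is a minimal sketch of the kind of wrapper-style forward selection described above, where every candidate feature is scored by fitting a model that includes it, which is exactly why the cost of each round grows linearly with the number of remaining candidates. The scikit-learn model, the cross-validated accuracy score, and the function name are assumptions chosen for illustration, not a method from this chapter.

```python
# A minimal sketch of greedy forward feature selection.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def forward_select(X, y, max_features):
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        best_feat, best_score = None, -np.inf
        for f in remaining:
            cols = selected + [f]
            # One model fit per candidate feature: the linear-in-features cost.
            score = cross_val_score(LogisticRegression(max_iter=1000),
                                    X[:, cols], y, cv=3).mean()
            if score > best_score:
                best_feat, best_score = f, score
        selected.append(best_feat)
        remaining.remove(best_feat)
    return selected
```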
As a simple illustrative example, consider the task of classifying websites. In this case, the dataset could easily contain many millions of examples. Including very basic features such as text unigrams on the page or HTML tags could easily provide many thousands of potential features for the model.