This study concerns the development of autonomy in adult learners working on an online learning platform as part of a professional master's degree programme in “French as a Foreign Language”. Our goal was to identify the influence of reflective and collaborative dimensions on the construction of autonomy for online learners in this programme. The material analysed consisted of 27 self-analysis papers written in response to an assignment that asked students to review their distance learning experience (reflective dimension) and to highlight the role of others, if any, in their learning (collaborative dimension). Beyond these two major points, the categorical analysis of the corpus shows principally that, in qualitative terms, the factors of autonomisation in online learning are interconnected and include: the difficulties related to distance learning and the strategies learners develop to face them, the importance of interpersonal relationships, in social and emotional terms, in overcoming those difficulties, the specific modes of sociability developed for distance learning, and the related emergence of a new type of autonomy that is both individual and collective. The discussion examines the creation, over time, of a new “distance learning culture” that is nonetheless never easy to create and share.
We construct a sequence of finite graphs that weakly converges to a Cayley graph, but such that no labelling of the edges converges to the corresponding Cayley diagram. A similar construction gives graph sequences that converge to the same limit, such that a Hamiltonian cycle in one of them has a limit that is not approximable by any subgraph of the other. We also give an example where this holds with convergence in a stronger sense. This is related to the question of whether having a Hamiltonian cycle is a testable graph property.
This paper reports on the task-based interaction of English as a Foreign Language (EFL) learners in the 3D multiuser virtual environment (MUVE) Second Life. The discussion first explores research on the precursors of MUVEs, text-based 2D virtual worlds known as MOOs. This is followed by an examination of studies on the use of MUVEs in Computer Assisted Language Learning (CALL). The discussion then focuses on an investigation of the Second Life-based text chat of learners located at a university in Japan. Data analysis reveals that the environment and tasks elicited types of collaborative interaction hypothesized as beneficial in the sociocultural account of language development. Collaborative interaction identified in the data involved peer-scaffolding focused on lexis and correction. The data further showed that the participants actively maintained a supportive atmosphere through the provision of utterances designed to signal interest, and through the extensive use of positive politeness. These factors facilitated social cohesion, intersubjectivity, and the consistent production of coherent target language output focused on the tasks. Participant feedback was broadly positive, and indicates that specific features of Second Life such as individual avatars, coupled with the computer-based nature of the interaction, appeared to enhance discourse management, engagement, and participation. The findings suggest that Second Life provides an arena for learner-centered social interaction that offers valuable opportunities for target language practice and the development of autonomy. Areas of potential for future research are identified.
Our aim in this paper is to identify the limit behavior of the solutions of random degenerate equations of the form −div Aε(x′,∇Uε) + ρεω(x′)Uε = F with mixed boundary conditions on Ωε as ε→0, where Ωε is an N-dimensional thin domain with small thickness h(ε), ρεω(x′) = ρω(x′/ε), where ρω is the realization of a random function ρ(ω), and Aε(x′,ξ) = a(Tx′/ε ω, ξ), the map a(ω,ξ) being measurable in ω and satisfying degenerate structure conditions with weight ρ in ξ. As usual in dimension-reduction problems, we focus on the rescaled equations and prove that, under the condition h(ε)/ε → 0, the sequence of their solutions converges to a limit u0, where u0 is the solution of an (N−1)-dimensional limit problem with homogenized and auxiliary equations.
Any amicable pair ϕ, ψ of Sturmian morphisms enables a construction of a ternary morphism η which preserves the set of infinite words coding 3-interval exchange. We determine the number of amicable pairs with the same incidence matrix in SL±(2,ℕ) and we study incidence matrices associated with the corresponding ternary morphisms η.
From Part One - Frameworks for Scaling Up Machine Learning
By Mihai Budiu, Dennis Fetterly, Michael Isard, Frank McSherry, and Yuan Yu, Microsoft Research, Mountain View, CA, USA
This chapter describes DryadLINQ, a general-purpose system for large-scale data-parallel computing, and illustrates its use on a number of machine learning problems.
The main motivation behind the development of DryadLINQ was to make it easier for nonspecialists to write general-purpose, scalable programs that can operate on very large input datasets. In order to appeal to nonspecialists, we designed the programming interface to use a high level of abstraction that insulates the programmer from most of the detail and complexity of parallel and distributed execution. In order to support general-purpose computing, we embedded these high-level abstractions in .NET, giving developers access to full-featured programming languages with rich type systems and proven mechanisms (such as classes and libraries) for managing complex, long-lived, and geographically distributed software projects. In order to support scalability over very large data and compute clusters, the DryadLINQ compiler generates code for the Dryad runtime, a well-tested and highly efficient distributed execution engine.
As machine learning moves into the industrial mainstream and operates over diverse data types including documents, images, and graphs, it is increasingly appealing to move away from domain-specific languages like MATLAB and toward general-purpose languages that support rich types and standardized libraries. The examples in this chapter demonstrate that a general-purpose language such as C# supports effective, concise implementations of standard machine learning algorithms, and that DryadLINQ efficiently scales these implementations to operate over hundreds of computers and very large datasets, limited primarily by disk capacity.
Open Answer Set Programming (OASP) is an undecidable framework for integrating ontologies and rules. Although several decidable fragments of OASP have been identified, few reasoning procedures exist. In this paper, we provide a sound, complete, and terminating algorithm for satisfiability checking w.r.t. Forest Logic Programs (FoLPs), a fragment of OASP where rules have a tree shape and allow for inequality atoms and constants. The algorithm establishes a decidability result for FoLPs. Although FoLPs were believed to be decidable, decidability had so far been shown only for two small subsets, local FoLPs and acyclic FoLPs. We further introduce f-hybrid knowledge bases, a hybrid framework where knowledge bases and FoLPs coexist, and we show that reasoning with such knowledge bases can be reduced to reasoning with FoLPs only. We note that f-hybrid knowledge bases do not require the usual (weakly) DL-safety of the rule component, thus providing a genuine alternative to current approaches to integrating ontologies and rules.
From Part Two - Supervised and Unsupervised Learning Algorithms
By Joseph Gonzalez, Yucheng Low, and Carlos Guestrin, Carnegie Mellon University, Pittsburgh, PA, USA
Probabilistic graphical models are used in a wide range of machine learning applications. From reasoning about protein interactions (Jaimovich et al., 2006) to stereo vision (Sun, Shum, and Zheng, 2002), graphical models have facilitated the application of probabilistic methods to challenging machine learning problems. A core operation in probabilistic graphical models is inference – the process of computing the probability of an event given particular observations. Although inference is NP-complete in general, there are several popular approximate inference algorithms that typically perform well in practice. Unfortunately, these approximate inference algorithms are still computationally intensive and can therefore benefit from parallelization. In this chapter, we parallelize loopy belief propagation (loopy BP for short), which is used in a wide range of ML applications (Jaimovich et al., 2006; Sun et al., 2002; Lan et al., 2006; Baron, Sarvotham, and Baraniuk, 2010; Singla and Domingos, 2008).
We begin by briefly reviewing the sequential BP algorithm as well as the necessary background in probabilistic graphical models. We then present a collection of parallel shared-memory BP algorithms that demonstrate the importance of scheduling in parallel BP. Next, we develop the Splash BP algorithm, which combines new scheduling ideas to address the limitations of existing sequential BP algorithms and achieve theoretically optimal parallel performance. Finally, we show how to efficiently implement loopy BP algorithms in the distributed parallel setting by addressing the challenges of distributed state and load balancing.
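As background for the parallel variants discussed in the chapter, the sequential sum-product update at the heart of loopy BP can be sketched as follows. This is a minimal Python sketch for a pairwise Markov random field with synchronous message updates; the function and variable names are illustrative and not taken from the chapter:

```python
import numpy as np

def loopy_bp(unary, pairwise, edges, n_iters=50):
    """Synchronous loopy belief propagation on a pairwise MRF.

    unary:    dict node -> 1-D potential array (length = #states)
    pairwise: dict edge (i, j) -> 2-D potential array (states_i x states_j)
    edges:    list of undirected edges (i, j)
    Returns approximate marginal beliefs for each node.
    """
    # Messages live on directed edges; start uniform.
    msgs = {}
    for i, j in edges:
        msgs[(i, j)] = np.ones(len(unary[j]))
        msgs[(j, i)] = np.ones(len(unary[i]))
    neighbors = {v: [] for v in unary}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    for _ in range(n_iters):
        new = {}
        for i, j in msgs:
            # m_{i->j}(x_j) = sum_{x_i} phi_i(x_i) psi_ij(x_i, x_j)
            #                 * prod of incoming messages to i except from j
            prod = unary[i].copy()
            for k in neighbors[i]:
                if k != j:
                    prod = prod * msgs[(k, i)]
            pot = pairwise[(i, j)] if (i, j) in pairwise else pairwise[(j, i)].T
            m = prod @ pot
            new[(i, j)] = m / m.sum()  # normalize for numerical stability
        msgs = new  # synchronous update: all messages replaced at once

    beliefs = {}
    for v in unary:
        b = unary[v].copy()
        for k in neighbors[v]:
            b = b * msgs[(k, v)]
        beliefs[v] = b / b.sum()
    return beliefs
```

On a tree this converges to the exact marginals; on loopy graphs it is the approximate fixed-point iteration whose scheduling the chapter parallelizes.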
In this chapter, we address distributed learning algorithms for statistical latent variable models, with a focus on topic models. Many high-dimensional datasets, such as text corpora and image databases, are too large to allow one to learn topic models on a single computer. Moreover, a growing number of applications require that inference be fast or in real time, motivating the exploration of parallel and distributed learning algorithms.
We begin by reviewing topic models such as Latent Dirichlet Allocation and Hierarchical Dirichlet Processes. We discuss parallel and distributed algorithms for learning these models and show that these algorithms can achieve substantial speedups without sacrificing model quality. Next we discuss practical guidelines for running our algorithms within various parallel computing frameworks and highlight complementary speedup techniques. Finally, we generalize our distributed approach to handle Bayesian networks.
Several of the results in this chapter have appeared in previous papers in the specific context of topic modeling. The goal of this chapter is to present a comprehensive overview of distributed inference algorithms and to extend the general ideas to a broader class of Bayesian networks.
Latent Variable Models
Latent variable models are a class of statistical models that explain observed data with latent (or hidden) variables. Topic models and hidden Markov models are two examples of such models, where the latent variables are the topic assignment variables and the hidden states, respectively. Given observed data, the goal is to perform Bayesian inference over the latent variables and use the learned model to make inferences or predictions.
Automatic speech recognition (ASR) allows multimedia contents to be transcribed from acoustic waveforms into word sequences. It is an exemplar of a class of machine learning applications where increasing compute capability is enabling new industries such as automatic speech analytics. Speech analytics help customer service call centers search through recorded content, track service quality, and provide early detection of service issues. Fast and efficient ASR enables economic employment of a plethora of text-based data analytics on multimedia contents, opening the door to many possibilities.
In this chapter, we describe our approach for scalable parallelization of the most challenging component of ASR: the speech inference engine. This component takes as input a sequence of audio features extracted from a speech waveform, compares them iteratively to a speech model, and produces the most likely interpretation of the waveform as a word sequence. The speech model is a database of acoustic characteristics, word pronunciations, and phrases from a particular language. Speech models for natural languages are represented with large irregular graphs consisting of millions of states and arcs. Referencing these models involves accessing an unpredictable data working set guided by “what was said” in the speech input. This makes the inference process highly challenging to parallelize efficiently.
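The core of such an inference engine is a Viterbi-style search for the most likely state sequence given the observations. A minimal single-threaded Python sketch over a small hidden Markov model illustrates the recurrence; real speech decoders operate on large weighted graphs with beam pruning and the parallel traversal strategies this chapter develops, so the names and structure here are illustrative only:

```python
import numpy as np

def viterbi(log_trans, log_emit, log_init, obs):
    """Most-likely state path through a small HMM (log domain).

    log_trans: (S, S) log transition probabilities
    log_emit:  (S, O) log emission probabilities
    log_init:  (S,)   log initial state probabilities
    obs:       sequence of observation indices
    """
    S = len(log_init)
    T = len(obs)
    score = log_init + log_emit[:, obs[0]]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        # For each next state, pick the best-scoring predecessor.
        cand = score[:, None] + log_trans          # rows: prev, cols: next
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(S)] + log_emit[:, obs[t]]
    # Trace the best path backward through the stored pointers.
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```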
We demonstrate that parallelizing an application is much more than recoding the program in another language. It requires careful consideration of data, task, and runtime concerns to successfully exploit the full parallelization potential of an application.
This book attempts to aggregate state-of-the-art research in parallel and distributed machine learning. We believe that parallelization provides a key pathway for scaling up machine learning to large datasets and complex methods. Although large-scale machine learning has been increasingly popular in both industrial and academic research communities, there has been no singular resource covering the variety of approaches recently proposed. We did our best to assemble the most representative contemporary studies in one volume. While each contributed chapter concentrates on a distinct approach and problem, together with their references they provide a comprehensive view of the field.
We believe that the book will be useful to the broad audience of researchers, practitioners, and anyone who wants to grasp the future of machine learning. To smooth the ramp-up for beginners, the first five chapters provide introductory material on machine learning algorithms and parallel computing platforms. Although the book gets deeply technical in some parts, the reader is assumed to have only basic prior knowledge of machine learning and parallel/distributed computing, along with college-level mathematical maturity. We hope that an engineering undergraduate who is familiar with the notion of a classifier and has had some exposure to threads, MPI, or MapReduce will be able to understand the majority of the book's content. We also hope that a seasoned expert will find this book full of new, interesting ideas to inspire future research in the area.
Mining frequent subtrees in a database of rooted and labeled trees is an important problem in many domains, ranging from phylogenetic analysis to biochemistry and from linguistic parsing to XML data analysis. In this work, we revisit this problem and develop an architecture-conscious solution targeting emerging multicore systems. Specifically, we identify a sequence of memory-related optimizations that significantly improve the spatial and temporal locality of a state-of-the-art sequential algorithm – alleviating the effects of memory latency. Additionally, these optimizations are shown to reduce the pressure on the front-side bus, an important consideration in the context of large-scale multicore architectures. We then demonstrate that these optimizations, although necessary, are not sufficient for efficient parallelization on multicores, primarily because of parametric and data-driven factors that make load balancing a significant challenge. To address this challenge, we present a methodology that adaptively and automatically modulates the type and granularity of the work being shared among different cores. The resulting algorithm achieves near-perfect parallel efficiency on up to 16 processors on challenging real-world applications. The optimizations we present have general-purpose utility, and a key outcome is the development of a general-purpose scheduling service for moldable task scheduling on emerging multicore systems.
The field of knowledge discovery is concerned with extracting actionable knowledge from data efficiently. Although most of the early work in this field focused on mining simple transactional datasets, recently there has been a significant shift toward analyzing data with complex structure such as trees and graphs.
Computer vision is a challenging application area for learning algorithms. For instance, the task of object detection is a critical problem for many systems, like mobile robots, that remains largely unsolved. In order to interact with the world, robots must be able to locate and recognize large numbers of objects accurately and at reasonable speeds. Unfortunately, off-the-shelf computer vision algorithms do not yet achieve sufficiently high detection performance for these applications. A key difficulty with many existing algorithms is that they are unable to take advantage of large numbers of examples. As a result, they must rely heavily on prior knowledge and hand-engineered features that account for the many kinds of errors that can occur. In this chapter, we present two methods for improving performance by scaling up learning algorithms to large datasets: (1) using graphics processing units (GPUs) and distributed systems to scale up the standard components of computer vision algorithms and (2) using GPUs to automatically learn high-quality feature representations using deep belief networks (DBNs). These methods are capable of not only achieving high performance but also removing much of the need for hand-engineering common in computer vision algorithms.
The fragility of many vision algorithms comes from their lack of knowledge about the multitude of visual phenomena that occur in the real world. Whereas humans can intuit information about depth, occlusion, lighting, and even motion from still images, computer vision algorithms generally lack the ability to deal with these phenomena without being engineered to account for them in advance.
Facing a problem of clustering a multimillion-data-point collection, a machine learning practitioner may choose to apply the simplest clustering method possible, because it is hard to believe that fancier methods can be applicable to datasets of such scale. Whoever is about to adopt this approach should first weigh the following considerations:
Simple clustering methods are rarely effective. Indeed, four decades of research would not have been spent on data clustering if a simple method could solve the problem. Moreover, even the simplest methods may run for long hours on a modern PC, given a large-scale dataset. For example, consider a simple online clustering algorithm (which, we believe, is machine learning folklore): first initialize k clusters with one data point per cluster, then iteratively assign the remaining data points to their closest clusters (in Euclidean space). If k is small enough, we can run this algorithm on one machine, because it is unnecessary to keep the entire dataset in RAM. However, besides being slow, it will produce low-quality results, especially when the data is highly multidimensional.
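The folklore algorithm described above can be sketched in a few lines. This is a hypothetical Python rendering; updating each centroid's running mean after an assignment is one common variant of the scheme, not something the text specifies:

```python
import numpy as np

def online_cluster(points, k):
    """Simple online clustering: seed k clusters with the first k
    points, then assign each remaining point to its nearest centroid
    (Euclidean distance), updating that centroid's running mean.
    Streams through the data once, so the full dataset need not fit
    in RAM.
    """
    points = np.asarray(points, dtype=float)
    centroids = points[:k].copy()
    counts = np.ones(k)
    labels = list(range(k))
    for p in points[k:]:
        j = int(np.argmin(np.linalg.norm(centroids - p, axis=1)))
        counts[j] += 1
        centroids[j] += (p - centroids[j]) / counts[j]  # incremental mean
        labels.append(j)
    return centroids, labels
```

The weaknesses noted above are visible even in this sketch: the result depends on the order of the first k points, and a single pass in high dimensions offers no mechanism to recover from a poor initial seeding.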
State-of-the-art clustering methods can scale well, which we aim to justify in this chapter.
With the deployment of large computational facilities (such as Amazon.com's EC2, IBM's BlueGene, and HP's XC), the parallel computing paradigm is probably the only currently available option for tackling gigantic data processing tasks. Parallel methods are becoming an integral part of any data processing system and are thus receiving special attention (e.g., universities are introducing parallel methods into their core curricula; see Johnson et al., 2008).