In this chapter, we describe several common applications (including the ones we touched on before) and multiple possible neural approaches for each. We focus on simple neural approaches that work well and should be familiar to anybody beginning research in natural language processing or interested in deploying robust strategies in industry. In particular, we describe the implementation of the following applications: text classification, part-of-speech tagging, named entity recognition, syntactic dependency parsing, relation extraction, question answering, and machine translation.
The previous chapter introduced feed-forward neural networks and demonstrated that, theoretically, implementing the training procedure for an arbitrary feed-forward neural network is relatively simple. Unfortunately, neural networks trained this way will suffer from several problems such as instability of the training process – that is, slow convergence due to parameters jumping around a good minimum – and overfitting. In this chapter, we will describe several practical solutions that mitigate these problems. In particular, we discuss minibatching, multiple optimization algorithms, other activation and cost functions, regularization, dropout, temporal averaging, and parameter initialization and normalization.
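To make two of these remedies concrete, the sketch below (with made-up layer sizes and random toy data, not the book's actual model) shows how dropout and minibatching are typically combined with the Adam optimizer in PyTorch.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical feed-forward classifier illustrating dropout and minibatching.
model = nn.Sequential(
    nn.Linear(100, 64),
    nn.ReLU(),            # ReLU activation
    nn.Dropout(p=0.5),    # randomly zero activations during training
    nn.Linear(64, 2),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Dummy data: 256 examples with 100 features each, binary labels
X = torch.randn(256, 100)
y = torch.randint(0, 2, (256,))
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

model.train()
for xb, yb in loader:                 # each iteration sees one minibatch
    optimizer.zero_grad()
    loss = loss_fn(model(xb), yb)
    loss.backward()
    optimizer.step()
```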
In this chapter we investigate graph distances in preferential attachment models. We focus on typical distances as well as the diameter of preferential attachment models. We again rely on path-counting techniques, as well as local limit results. Since the local limit is a rather involved quantity, some parts of our analysis are considerably harder than those in Chapters 6 and 7.
As mentioned in the previous chapter, the perceptron does not perform smooth updates during training, which may slow down learning or cause it to miss good solutions entirely in real-world situations. In this chapter, we will discuss logistic regression, a machine learning algorithm that elegantly addresses this problem. We also extend the vanilla logistic regression, which was designed for binary classification, to handle multiclass classification. Through logistic regression, we introduce the concepts of the cost function (i.e., the function we aim to minimize during training) and of gradient descent, the algorithm that implements this minimization procedure.
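As a rough illustration of these two concepts, the following sketch trains a binary logistic regression classifier with batch gradient descent on the cross-entropy cost; the learning rate and epoch count are arbitrary placeholders rather than the chapter's settings.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, y, lr=0.1, epochs=100):
    """Binary logistic regression trained with batch gradient descent
    on the cross-entropy cost (a sketch, not the book's exact code).
    X: (n, d) feature matrix; y: (n,) labels in {0, 1}."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)        # predicted probabilities
        grad_w = X.T @ (p - y) / n    # gradient of the cost w.r.t. weights
        grad_b = np.mean(p - y)       # gradient w.r.t. the bias
        w -= lr * grad_w              # gradient descent update
        b -= lr * grad_b
    return w, b
```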
In this chapter we investigate the distance structure of the configuration model by investigating its typical distances and its diameter. We adapt the path-counting techniques in Chapter 6 to the configuration model, and obtain typical distances from the “giant is almost local” proof. To understand the ultra-small distances for infinite-variance degree configuration models, we investigate the generation growth of infinite-mean branching processes. The relation to branching processes informally leads to the power-iteration technique that allows one to deduce typical distance results in random graphs in a relatively straightforward way.
Sensing is a key requirement for any but the simplest mobile behavior. In order for Robot to be able to warn the crew of Lost in Space that there is danger ahead, it must be able to sense and reason about its sensor responses. Sensing is a critical component of the fundamental tasks of pose estimation – determining where the robot is in its environment; pose maintenance – maintaining an ongoing estimate of the robot’s pose; and map construction – building a representation of the robot’s environment.
In the previous chapters, we have discussed the theory behind the perceptron and logistic regression, including mathematical explanations of how and why they are able to learn from examples. In this chapter, we will transition from math to code. Specifically, we will discuss how to implement these models in the Python programming language. All the code that we will introduce throughout this book is available in this GitHub repository as well: https://github.com/clulab/gentlenlp. To get a better understanding of how these algorithms work under the hood, we will start by implementing them from scratch. However, as the book progresses, we will introduce some of the popular tools and libraries that make Python the language of choice for machine learning – for example, PyTorch and Hugging Face's transformers. The code for all the examples in the book is provided in the form of Jupyter notebooks. Fragments of these notebooks are presented in the implementation chapters so that the reader can get the complete picture from the book alone.
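For a flavor of what such a from-scratch implementation looks like, here is a minimal perceptron sketch in NumPy; it assumes labels in {-1, +1} and is not the repository's exact code.

```python
import numpy as np

def train_perceptron(X, y, epochs=10):
    """Minimal perceptron training loop (labels in {-1, +1});
    the notebooks in the repository contain the book's full version."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:   # misclassified example: update
                w += yi * xi
                b += yi
    return w, b
```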
Robotic systems, and in particular mobile robotic systems, are the embodiment of a set of complex computational processes, mechanical systems, sensors, user interfaces, and communications infrastructure. The problems inherent in integrating these components into a working robot can be very challenging. Overall system control requires an approach that can properly handle the complexity of the system goals while dealing with poorly defined tasks and the existence of unplanned and unexpected events. This task is complicated by the non-standard nature of much robotic equipment. Often the hardware seems to have been built following a philosophy of “ease of design” rather than with an eye toward assisting with later system integration.
The previous chapter was our first exposure to recurrent neural networks, which included intuitions for why they are useful for natural language processing, various architectures, and training algorithms. In this chapter, we will put them to use in order to implement a common sequence modeling task. In particular, we implement a Spanish part-of-speech tagger using a bidirectional long short-term memory network and a set of pretrained, static word embeddings. Through this process, we also introduce several new PyTorch features such as the pad_sequence, pack_padded_sequence, and pad_packed_sequence functions, which allow us to work more efficiently with variable-length sequences in recurrent neural networks.
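The snippet below sketches how these three functions fit together for a batch of variable-length sentences; the tensor dimensions and the random "embeddings" are invented for illustration and do not correspond to the chapter's Spanish tagger.

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

# Three sentences of different lengths, already mapped to 50-dimensional
# embedding vectors (toy values, generated randomly here).
sents = [torch.randn(5, 50), torch.randn(3, 50), torch.randn(7, 50)]
lengths = torch.tensor([len(s) for s in sents])

padded = pad_sequence(sents, batch_first=True)             # shape: (3, 7, 50)
packed = pack_padded_sequence(padded, lengths,
                              batch_first=True, enforce_sorted=False)

lstm = nn.LSTM(input_size=50, hidden_size=32,
               batch_first=True, bidirectional=True)
packed_out, _ = lstm(packed)                               # LSTM skips the padding
output, out_lengths = pad_packed_sequence(packed_out, batch_first=True)
print(output.shape)   # (3, 7, 64): the two directions of size 32 concatenated
```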
All the algorithms we covered so far rely on handcrafted features that must be designed and implemented by the machine learning developer. This is problematic for two reasons. First, designing such features can be a complicated endeavor. Second, most words in any language tend to be very infrequent. In our context, this means that word-occurrence features are very sparse, and a text classification algorithm trained on them may generalize poorly. For example, if the training data for a review classification dataset contains the word great but not the word fantastic, a learning algorithm trained on these data will not be able to properly handle reviews containing the latter word, even though there is a clear semantic similarity between the two. In this chapter, we will begin to address this limitation. In particular, we will discuss methods that learn numerical representations of words that capture some semantic knowledge. Under these representations, similar words such as great and fantastic will have similar forms, which will improve the generalization capability of our machine learning algorithms.
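The sketch below illustrates the idea with a handful of made-up three-dimensional vectors: under such representations, similar words end up with a high cosine similarity, while dissimilar ones do not. Real embeddings would be learned or loaded from pretrained vector files rather than hard-coded.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity between two word vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Hypothetical embeddings; in practice these would come from pretrained
# static vectors (e.g., word2vec or GloVe) or be learned from a corpus.
embeddings = {
    "great":     np.array([0.8, 0.1, 0.3]),
    "fantastic": np.array([0.7, 0.2, 0.4]),
    "terrible":  np.array([-0.6, 0.5, -0.2]),
}

print(cosine_similarity(embeddings["great"], embeddings["fantastic"]))  # high
print(cosine_similarity(embeddings["great"], embeddings["terrible"]))   # low
```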
In this chapter we investigate the connectivity structure of preferential attachment models. We start by discussing an important tool: exchangeable random variables and their distribution described in de Finetti’s Theorem. We apply these results to Pólya urn schemes, which, in turn, we use to describe the distribution of the degrees in preferential attachment models. It turns out that Pólya urn schemes can also be used to describe the local limit of preferential attachment models. A crucial ingredient is the fact that the edges in the Pólya urn representation are conditionally independent, given the appropriate randomness. The resulting local limit is the Pólya point tree, a specific multi-type branching process with continuous types.
One of the key advantages of transformer networks is the ability to take a model that was pretrained over vast quantities of text and fine-tune it for the task at hand. Intuitively, this strategy allows transformer networks to achieve higher performance on smaller datasets by relying on statistics acquired at scale in an unsupervised way (e.g., through the masked language model training objective). To this end, in this chapter, we will use the Hugging Face library, which has a rich repository of datasets and pretrained models, as well as helper methods and classes that make it easy to target downstream tasks. Using pretrained transformer encoders, we will implement the two tasks that served as use cases in the previous chapters: text classification and part-of-speech tagging.
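As a rough sketch of this workflow, the snippet below fine-tunes a pretrained encoder for binary text classification with the Hugging Face transformers library; the checkpoint name, toy examples, and single update step are illustrative stand-ins for the chapter's full training loop.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Illustrative checkpoint and data, not the chapter's exact setup.
checkpoint = "bert-base-cased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

texts = ["a great movie", "a terrible movie"]
labels = torch.tensor([1, 0])
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
outputs = model(**inputs, labels=labels)   # forward pass also computes the loss
outputs.loss.backward()                    # backpropagate through the whole encoder
optimizer.step()                           # update all parameters (fine-tuning)
```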
The ability to navigate purposefully through its environment is fundamental to most animals and to every intelligent organism. In this book we examine the computational issues specific to the creation of machines that move intelligently in their environment. From the earliest modern speculation regarding the creation of autonomous robots, it was recognized that regardless of the mechanisms used to move the robot around or the methods used to sense the environment, the computational principles that govern the robot are of paramount importance. As Powell and Donovan discovered in Isaac Asimov’s story “Runaround,” subtle definitions within the programs that control a robot can lead to significant changes in the robot’s overall behavior or action. Moreover, interactions among multiple complex components can lead to large-scale emergent behaviors that may be hard to predict.