Deep Learning for Natural Language Processing

7 - Implementing Text Classification with Feed-Forward Networks
Mihai Surdeanu, University of Arizona, Marco Antonio Valenzuela-Escárcega, University of Arizona
Book:

Deep Learning for Natural Language Processing

Published online:

01 February 2024

Print publication:

08 February 2024, pp 107-116
- Chapter
Summary

In this chapter, we provide an implementation of the multilayer neural network described in Chapter 5, along with several of the best practices discussed in Chapter 6. Still keeping things fairly simple, our network will consist of two fully connected layers: a hidden layer and an output layer. Between these layers, we will include dropout and a nonlinearity. Further, we make use of two PyTorch classes: a Dataset and a DataLoader. The advantage of using these classes is that they make several things easy, including data shuffling and batching. Last, since the classifier’s architecture has become more complex, for optimization we transition from stochastic gradient descent to the Adam optimizer to take advantage of its additional features such as momentum and L2 regularization.
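The pieces named in this summary can be put together in a minimal PyTorch sketch. This is not the chapter's own code: the `ToyDataset` class, the random data, the layer sizes, the dropout rate, and the ordering of the nonlinearity and dropout are all assumptions chosen for illustration. The `weight_decay` argument of `torch.optim.Adam` is used here as the L2 regularization term.

```python
import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader


class ToyDataset(Dataset):
    """Hypothetical dataset: random feature vectors with binary labels."""

    def __init__(self, n_examples=64, n_features=20):
        generator = torch.Generator().manual_seed(0)
        self.x = torch.randn(n_examples, n_features, generator=generator)
        # Made-up labeling rule: class 1 if the features sum to a positive value.
        self.y = (self.x.sum(dim=1) > 0).long()

    def __len__(self):
        return len(self.y)

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]


class FeedForwardClassifier(nn.Module):
    """Two fully connected layers with a nonlinearity and dropout in between."""

    def __init__(self, n_features, hidden_size, n_classes, dropout=0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden_size),  # hidden layer
            nn.ReLU(),                           # nonlinearity (assumed ReLU)
            nn.Dropout(dropout),                 # dropout (assumed rate)
            nn.Linear(hidden_size, n_classes),   # output layer
        )

    def forward(self, x):
        return self.net(x)


dataset = ToyDataset()
# The DataLoader takes care of shuffling and batching.
loader = DataLoader(dataset, batch_size=16, shuffle=True)

model = FeedForwardClassifier(n_features=20, hidden_size=32, n_classes=2)
# weight_decay adds an L2 penalty on the parameters.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
loss_fn = nn.CrossEntropyLoss()

model.train()
for epoch in range(3):
    for x_batch, y_batch in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x_batch), y_batch)
        loss.backward()
        optimizer.step()
```

At prediction time, `model.eval()` should be called first so that dropout is disabled.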

1 - Introduction
Mihai Surdeanu, University of Arizona, Marco Antonio Valenzuela-Escárcega, University of Arizona
Book:

Deep Learning for Natural Language Processing

Published online:

01 February 2024

Print publication:

08 February 2024, pp 1-7
- Chapter
Summary

This chapter motivates the need for a book that covers both theoretical and practical aspects of deep learning for natural language processing. We summarize the content of the book, as well as aspects that are not within scope, and current limitations of deep learning in general.

14 - Encoder-Decoder Methods
Mihai Surdeanu, University of Arizona, Marco Antonio Valenzuela-Escárcega, University of Arizona
Book:

Deep Learning for Natural Language Processing

Published online:

01 February 2024

Print publication:

08 February 2024, pp 216-228
- Chapter
Summary

In Chapters 10 and 12, we focused on two common usages of recurrent neural networks and transformer networks: acceptors and transducers. In this chapter, we discuss a third architecture for both recurrent neural networks and transformer networks: encoder-decoder methods. We introduce three encoder-decoder architectures, which enable important NLP applications such as machine translation. In particular, we discuss the sequence-to-sequence method of Sutskever et al. (2014), which couples an encoder long short-term memory with a decoder long short-term memory. We follow this method with the approach of Bahdanau et al. (2015), which extends the previous decoder with an attention component, which produces a different encoding of the source text for each decoded word. Last, we introduce the complete encoder-decoder transformer network, which relies on three attention mechanisms: one within the encoder (which we discussed in Chapter 12), a similar one that operates over decoded words, and, importantly, an attention component that connects the input words with the decoded ones.

12 - Contextualized Embeddings and Transformer Networks
Mihai Surdeanu, University of Arizona, Marco Antonio Valenzuela-Escárcega, University of Arizona
Book:

Deep Learning for Natural Language Processing

Published online:

01 February 2024

Print publication:

08 February 2024, pp 178-193
- Chapter
Summary

As mentioned in Chapter 8, the distributional similarity algorithms discussed there conflate all senses of a word into a single numerical representation (or embedding). For example, the word bank receives a single representation, regardless of its financial (e.g., as in the bank gives out loans) or geological (e.g., bank of the river) sense. This chapter introduces a solution for this limitation in the form of a new neural architecture called transformer networks, which learns contextualized embeddings of words, which, as the name indicates, change depending on the context in which the words appear. That is, the word bank receives a different numerical representation for each of its instances in the two texts above because the contexts in which they occur are different. We also discuss several architectural choices that enabled the tremendous success of transformer networks: self-attention, multiple heads, stacking of multiple layers, and subword tokenization, as well as how transformers can be pretrained on large amounts of data through masked language modeling and next-sentence prediction.
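To make the idea of contextualized embeddings concrete, the following pure-Python sketch applies a stripped-down form of self-attention to the bank example. It is a simplification under stated assumptions: there are no learned query, key, or value projections, a single head, and a single layer, and the two-dimensional embeddings are made-up numbers. The function names `softmax` and `self_attention` are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings):
    """Scaled dot-product self-attention without learned projections.

    Each word's output is a weighted average of all word vectors in the
    text, weighted by similarity to the word itself.
    """
    d = len(embeddings[0])
    outputs, all_weights = [], []
    for query in embeddings:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
            for key in embeddings
        ]
        weights = softmax(scores)
        contextual = [
            sum(w * value[i] for w, value in zip(weights, embeddings))
            for i in range(d)
        ]
        outputs.append(contextual)
        all_weights.append(weights)
    return outputs, all_weights


# Toy static embeddings (made-up values): "bank" has one input vector.
static = {"bank": [1.0, 1.0], "loans": [2.0, 0.0], "river": [0.0, 2.0]}

financial, _ = self_attention([static["bank"], static["loans"]])
geological, _ = self_attention([static["bank"], static["river"]])

# The output vector for "bank" now depends on its neighbors.
print(financial[0])
print(geological[0])
```

Although bank starts from the same static vector in both texts, its output vector differs because it is mixed with different neighboring words.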

10 - Recurrent Neural Networks
Mihai Surdeanu, University of Arizona, Marco Antonio Valenzuela-Escárcega, University of Arizona
Book:

Deep Learning for Natural Language Processing

Published online:

01 February 2024

Print publication:

08 February 2024, pp 147-164
- Chapter
Summary

Up to this point, we have only discussed neural approaches for text classification (e.g., review and news classification) that handle the text as a bag of words. That is, we aggregate the words either by representing them as explicit features in a feature vector or by averaging their numerical representations (i.e., embeddings). Although this strategy completely ignores the order in which words occur in a sentence, it has been repeatedly shown to be a good solution for many practical natural language processing applications that are driven by text classification. Nevertheless, for many natural language processing tasks such as part-of-speech tagging, we need to capture the word-order information more explicitly. Sequence models capture exactly this scenario, where classification decisions must be made using not only the current information but also the context in which it appears. In particular, we discuss several types of recurrent neural networks, including stacked (or deep) recurrent neural networks, bidirectional recurrent neural networks, and long short-term memory networks. Last, we introduce conditional random fields, which extend recurrent neural networks with an extra layer that explicitly models transition probabilities between two cells.
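The claim that averaging embeddings ignores word order can be checked with a small pure-Python sketch. The helper `average_embedding` and the two-dimensional embedding values are hypothetical, chosen only for illustration.

```python
def average_embedding(tokens, embeddings):
    """Average the embedding vectors of the tokens (bag-of-words style)."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for tok in tokens:
        for i, value in enumerate(embeddings[tok]):
            total[i] += value
    return [t / len(tokens) for t in total]


# Made-up 2-d embeddings for three words.
embeddings = {"dog": [1.0, 0.0], "bites": [0.0, 1.0], "man": [1.0, 1.0]}

a = average_embedding("dog bites man".split(), embeddings)
b = average_embedding("man bites dog".split(), embeddings)

# Same words in a different order yield the same averaged representation.
print(a == b)  # prints True
```

The two sentences mean different things, yet a classifier built on this representation cannot tell them apart, which is the gap that sequence models address.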

Deep Learning for Natural Language Processing

A Gentle Introduction
