
21 - Large Language Models
from Part VII - Large Language Models
Ruye Wang, Harvey Mudd College, California
Book:

Introduction to Machine Learning

Published online:

05 February 2026

Print publication:

18 December 2025, pp 523-553
- Chapter
Summary

This chapter offers a comprehensive overview of large language models (LLMs), examining their theoretical foundations, core mechanisms, and broad-ranging implications. We begin by situating LLMs within the domain of natural language processing (NLP), tracing the evolution of language modeling from early statistical approaches to modern deep learning methods.

The focus then shifts to the transformative impact of the Transformer architecture, introduced in the seminal paper Attention Is All You Need. By leveraging self-attention and parallel computation, Transformers have enabled unprecedented scalability and efficiency in training large models.

We explore the pivotal role of transfer learning in NLP, emphasizing how pretraining on large text corpora followed by task-specific fine-tuning allows LLMs to generalize across a wide range of linguistic tasks. The chapter also discusses reinforcement learning with human feedback (RLHF), a crucial technique for refining model outputs to better align with human preferences and values.

Key theoretical developments are introduced, including scaling laws, which describe how model performance improves predictably with increased data, parameters, and compute resources, and emergence, the surprising appearance of complex behaviors in sufficiently large models.

Beyond technical aspects, the chapter engages with deeper conceptual questions: Do LLMs genuinely "understand" language? Could advanced AI systems one day exhibit a form of consciousness, however rudimentary or speculative? These discussions draw from perspectives in cognitive science, philosophy of mind, and AI safety.

Finally, we explore future directions in the field, including the application of Transformer architectures beyond NLP, and the development of generative methods that extend beyond Transformer-based models, signaling a dynamic and rapidly evolving landscape in artificial intelligence.
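The scaling laws mentioned in this summary are often illustrated with a power law relating loss to model size. The summary does not give a formula, so the sketch below assumes the form loss = a * size^(-alpha) purely for illustration; the function name `fit_power_law` and all numbers are hypothetical, and the data points are synthetic rather than real measurements.

```python
import math


def fit_power_law(sizes, losses):
    """Least-squares fit of loss = a * size**(-alpha) in log-log space."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(v) for v in losses]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # In log space the model is log(loss) = log(a) - alpha * log(size).
    return math.exp(intercept), -slope


# Synthetic points generated from a known power law (assumed values, not measurements).
true_a, true_alpha = 10.0, 0.1
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [true_a * n ** (-true_alpha) for n in sizes]

a, alpha = fit_power_law(sizes, losses)

# Extrapolate the fitted curve to a model ten times larger than any observed.
predicted = a * 1e10 ** (-alpha)
```

Because the points lie exactly on a power law, the fit recovers the generating constants; with real measurements the fit would be approximate, and any extrapolation should be treated with caution.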

15 - Implementing Encoder-Decoder Methods
Mihai Surdeanu, University of Arizona, Marco Antonio Valenzuela-Escárcega, University of Arizona
Book:

Deep Learning for Natural Language Processing

Published online:

01 February 2024

Print publication:

08 February 2024, pp 229-245
- Chapter
Summary

In this chapter, we implement a machine translation application as an example of an encoder-decoder task. In particular, we build on pretrained encoder-decoder transformer models, which exist in the Hugging Face library for a wide variety of language pairs. We first show how to use one of these models out-of-the-box to perform translation for one of the language pairs it has been exposed to during pretraining: English to Romanian. Afterward, we fine-tune the model to a new language combination that it has not seen before: Romanian to English. In both use cases, we use the T5 encoder-decoder model, which has been pretrained for several tasks, including machine translation.
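The out-of-the-box use case described in this summary might look roughly like the sketch below. It is not the chapter's own code: the checkpoint name `t5-small`, the helper names `build_t5_input` and `translate`, and the generation length are assumptions made for illustration. T5 selects its task through a text prefix, which is what the first helper builds.

```python
def build_t5_input(text, source_lang="English", target_lang="Romanian"):
    """Prefix the text with the task instruction used by T5 for translation."""
    return f"translate {source_lang} to {target_lang}: {text}"


def translate(text, model_name="t5-small", max_new_tokens=40):
    """Translate with a pretrained T5 checkpoint (assumed: t5-small).

    Requires the transformers, torch, and sentencepiece packages, and
    downloads the model weights on first use.
    """
    # Imported lazily so build_t5_input works without the library installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    inputs = tokenizer(build_t5_input(text), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `translate("The weather is nice today.")` would return the model's Romanian rendering. Swapping the prefix to "translate Romanian to English:" is unlikely to work well without fine-tuning, since the summary states that this direction was not seen during pretraining.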

14 - Encoder-Decoder Methods
Mihai Surdeanu, University of Arizona, Marco Antonio Valenzuela-Escárcega, University of Arizona
Book:

Deep Learning for Natural Language Processing

Published online:

01 February 2024

Print publication:

08 February 2024, pp 216-228
- Chapter
Summary

In Chapters 10 and 12, we focused on two common usages of recurrent neural networks and transformer networks: acceptors and transducers. In this chapter, we discuss a third architecture for both recurrent neural networks and transformer networks: encoder-decoder methods. We introduce three encoder-decoder architectures, which enable important NLP applications such as machine translation. In particular, we discuss the sequence-to-sequence method of Sutskever et al. (2014), which couples an encoder long short-term memory with a decoder long short-term memory. We follow this method with the approach of Bahdanau et al. (2015), which extends the previous decoder with an attention component, which produces a different encoding of the source text for each decoded word. Last, we introduce the complete encoder-decoder transformer network, which relies on three attention mechanisms: one within the encoder (which we discussed in Chapter 12), a similar one that operates over decoded words, and, importantly, an attention component that connects the input words with the decoded ones.
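The attention component that connects input words with decoded ones can be sketched in a few lines. The example below uses scaled dot-product scoring, as in transformer networks, rather than the additive scoring of Bahdanau et al.; the function names and the toy vectors are hypothetical and chosen only to show that each decoded position receives its own encoding of the source.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(decoder_states, encoder_states):
    """For each decoder state (query), return a weighted average of encoder states."""
    dim = len(encoder_states[0])
    contexts, all_weights = [], []
    for query in decoder_states:
        # Score every source position against the current decoder state.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in encoder_states
        ]
        weights = softmax(scores)
        # The context vector is the attention-weighted mix of source encodings.
        context = [
            sum(w * enc[i] for w, enc in zip(weights, encoder_states))
            for i in range(dim)
        ]
        contexts.append(context)
        all_weights.append(weights)
    return contexts, all_weights


# Toy vectors (illustrative values only): three source words, two decoded words.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_states = [[4.0, 0.0], [0.0, 4.0]]
contexts, weights = cross_attention(decoder_states, encoder_states)
```

The two decoder states attend to different source positions, so `contexts[0]` and `contexts[1]` differ. A full transformer layer would also apply learned projections to queries, keys, and values, which this sketch omits.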

15 - Deep Learning
William W. Hsieh, University of British Columbia, Vancouver
Book:

Introduction to Environmental Data Science

Published online:

23 March 2023

Print publication:

23 March 2023, pp 494-517
- Chapter
Summary

NN models with more hidden layers than the traditional NN are referred to as deep neural network (DNN) or deep learning (DL) models, which are now widely used in environmental science. For image data, the convolutional neural network (CNN) has been developed; in its convolutional layers, a neuron is connected only to a small patch of neurons in the preceding layer, thereby greatly reducing the number of model weights. Popular DNN architectures include the encoder-decoder and U-net models. For time series modelling, the long short-term memory (LSTM) network and the temporal convolutional network have been developed. The generative adversarial network (GAN) produces highly realistic fake data.
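The reduction in model weights can be made concrete with a small count. The sketch below compares three hypothetical layers on an assumed 64 x 64 single-channel image with 16 output channels and a 3 x 3 patch; the sizes and function names are illustrative, not taken from the chapter. It also counts the effect of weight sharing, which standard convolutional layers add on top of the local connectivity described in the summary.

```python
def dense_weights(height, width, in_channels, out_units):
    """Fully connected layer: every output sees every input, plus one bias each."""
    return height * width * in_channels * out_units + out_units


def local_weights(height, width, in_channels, out_channels, kernel):
    """Locally connected layer: each output neuron has its own small patch of weights."""
    per_neuron = kernel * kernel * in_channels + 1  # patch weights plus bias
    return height * width * out_channels * per_neuron


def conv_weights(in_channels, out_channels, kernel):
    """Convolutional layer: one patch of weights per output channel, shared across positions."""
    return kernel * kernel * in_channels * out_channels + out_channels


# Assumed sizes for illustration: 64x64 input, 1 input channel, 16 output channels.
height, width, in_ch, out_ch, kernel = 64, 64, 1, 16, 3
dense = dense_weights(height, width, in_ch, height * width * out_ch)
local = local_weights(height, width, in_ch, out_ch, kernel)
conv = conv_weights(in_ch, out_ch, kernel)

print(f"dense: {dense:,}  local: {local:,}  conv: {conv:,}")
# prints "dense: 268,500,992  local: 655,360  conv: 160"
```

Restricting each neuron to a 3 x 3 patch already cuts the count by a factor of several hundred, and sharing the patch weights across positions reduces it to a few hundred parameters in total.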
