Emerging trends: A gentle introduction to fine-tuning

Published online by Cambridge University Press:  26 October 2021

Kenneth Ward Church*
Affiliation:
Baidu, Sunnyvale, CA, USA
Zeyu Chen
Affiliation:
Baidu, Beijing, China
Yanjun Ma
Affiliation:
Baidu, Beijing, China
*Corresponding author. E-mail: kenneth.ward.church@gmail.com

Abstract

The previous Emerging Trends article (Church et al., 2021. Natural Language Engineering 27(5), 631–645) introduced deep nets to poets. "Poets" is an imperfect metaphor, intended as a gesture toward inclusion. The future of deep nets will benefit from reaching out to a broad audience of potential users, including people with little or no programming skills and little interest in training models. That paper focused on inference, the use of pre-trained models, as is, without fine-tuning. The goal of this paper is to make fine-tuning more accessible to a broader audience. Since fine-tuning is more challenging than inference, the examples in this paper will require modest programming skills, as well as access to a GPU. Fine-tuning starts with a general purpose base (foundation) model and uses a small training set of labeled data to produce a model for a specific downstream application. There are many examples of fine-tuning in natural language processing, such as question answering (SQuAD) and the GLUE benchmark, as well as in vision and speech.
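
To make the last idea concrete, here is a toy sketch of one simple form of fine-tuning: the base model is kept frozen and only a small task-specific head is trained on a handful of labeled examples. Everything in it is an assumption made for illustration. `base_features` is a hypothetical stand-in for a pre-trained network, the four training examples are invented, and the function names are not from the paper. Real fine-tuning typically updates a large pre-trained model on a GPU; this sketch only shows the shape of the recipe.

```python
import math

def base_features(x):
    """Hypothetical stand-in for a frozen, pre-trained base model."""
    return [x[0], x[1], x[0] * x[1], 1.0]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fine_tune_head(examples, epochs=200, lr=0.5):
    """Train only a small logistic-regression head on labeled examples."""
    weights = [0.0] * 4
    for _ in range(epochs):
        for x, y in examples:
            feats = base_features(x)  # the base model itself is never updated
            pred = sigmoid(sum(w * f for w, f in zip(weights, feats)))
            # Gradient of the log loss for each weight is (pred - y) * feature.
            weights = [w - lr * (pred - y) * f for w, f in zip(weights, feats)]
    return weights

def predict(weights, x):
    feats = base_features(x)
    return int(sigmoid(sum(w * f for w, f in zip(weights, feats))) >= 0.5)

# Small labeled training set for an invented downstream task:
# the label is 1 when the two inputs have opposite signs.
train = [((1.0, 1.0), 0), ((-1.0, -1.0), 0), ((1.0, -1.0), 1), ((-1.0, 1.0), 1)]
head = fine_tune_head(train)
predictions = [predict(head, x) for x, _ in train]
```

The design choice to freeze the base and train only a head keeps the example small. Updating the base model's own weights follows the same loop, with more parameters receiving gradient updates.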

Information

Type
Emerging Trends
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
© The Author(s), 2021. Published by Cambridge University Press

Table 1. Base models are large in two respects: model size and training data

Table 2. Some popular datasets for training base models

Figure 1. Some pictures of flowers with labels.

Table 3. Some SQuAD 1.1 results

Table 4. English glosses of six types of questions from Chinese search logs

Table 5. GLUE Results for Human, HuggingFaceHub and Our replication