
Emerging trends: General fine-tuning (gft)

Published online by Cambridge University Press: 23 May 2022

Kenneth Ward Church* (Baidu, Sunnyvale, CA, USA)
Xingyu Cai (Baidu, Sunnyvale, CA, USA)
Yibiao Ying (Baidu, Beijing, China)
Zeyu Chen (Baidu, Beijing, China)
Guangxu Xun (Baidu, Sunnyvale, CA, USA)
Yuchen Bian (Baidu, Sunnyvale, CA, USA)

*Corresponding author. E-mail: Kenneth.Ward.Church@gmail.com

Abstract

This paper describes gft (general fine-tuning), a little language for deep nets, introduced at an ACL-2022 tutorial. gft makes deep nets accessible to a broad audience including non-programmers. It is standard practice in many fields to use statistics packages such as R. One should not need to know how to program in order to fit a regression or classification model and to use the model to make predictions for novel inputs. With gft, fine-tuning and inference are similar to fit and predict in regression and classification. gft demystifies deep nets; no one would suggest that regression-like methods are “intelligent.”
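
To make the analogy concrete, here is a minimal R sketch of the fit-and-predict pattern the abstract appeals to, in the spirit of the paper's Listing 1 (the toy data and variable names below are hypothetical, not taken from the paper):

    # Toy data (hypothetical): a roughly linear relationship between x and y.
    x <- c(1, 2, 3, 4, 5)
    y <- c(2.1, 3.9, 6.2, 8.1, 9.8)

    # Fit: estimate a regression model, g, from the data.
    g <- lm(y ~ x)

    # Predict: apply g to novel inputs it was not fitted on.
    newdata <- data.frame(x = c(6, 7))
    predict(g, newdata)

    # Plot the training data, with predictions from g shown in red.
    plot(x, y)
    points(newdata$x, predict(g, newdata), col = "red")

In gft, gft_fit and gft_predict play the same two roles, with a pre-trained deep net taking the place of the regression model.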

Information

Type: Emerging Trends
Creative Commons
CC BY 4.0
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
© The Author(s), 2022. Published by Cambridge University Press
Figures, tables, and listings

Table 1. The standard recipe consists of three steps
Listing 1. Example of fit and predict in R.
Figure 1. Results produced by Listing 1. Predictions from the model g are shown in red.
Listing 2. Example of gft_fit.
Listing 3. Example of gft_predict. The default model performs sentiment analysis.
Listing 4. Example of gft_predict with a model for emotion classification.
Listing 5. Example of gft_summary.
Table 2. Sentiment classification of $x =$ "I love you"
Table 3. More classifications of $x =$ "I love you"
Listing 6. Example of gft_summary as a search engine.
Listing 7. Example of gft_summary with the null string as a query.
Listing 8. Examples of gft_summary.
Listing 9. An example of gft_fit using P for PaddleNLP/PaddleHub.
Table 4. Fine-tuning for downstream tasks: GLUE, SQuAD, etc.
Table 5. Some examples of tasks
Listing 10. Example of token classification.
Listing 11. Example of token classification with PaddleNLP.
Listing 12. Example of fill-mask (also known as the cloze task).
Listing 13. Example of text generation.
Listing 14. Example of machine translation (MT).
Listing 15. Example of automatic speech recognition (ASR).
Listing 16. Example of image classification.
Listing 17. Example of input from a data set (as opposed to stdin).
Listing 18. gft_eval outputs a single score for a data set, as opposed to gft_predict, which outputs a prediction for each row.
Listing 19. Code to create a confusion matrix.
Table 6. Confusion matrix from Listing 19
Table 7. Some gold labels and predictions from the model $f_{post}$ from Listing 20
Listing 20. An equation with a vector on the left-hand side (lhs).
Table 8. Most $f_{pre}$ models are trained in industry because pretraining requires large capital investments in large teams and GPU clusters
Table 9. gft starts with large pre-trained base models, $f_{pre}$, typically trained on the large corpora in Table 10, using expensive GPU clusters
Table 10. Some popular corpora for training pre-trained models, $f_{pre}$