
How do control tokens affect natural language generation tasks like text simplification

Published online by Cambridge University Press:  23 January 2024

Zihao Li*
Affiliation: Manchester Metropolitan University, Manchester, UK
Matthew Shardlow
Affiliation: Manchester Metropolitan University, Manchester, UK
*Corresponding author: Zihao Li; Email: 21443696@stu.mmu.ac.uk

Abstract

Recent work on text simplification has focused on the use of control tokens to further the state-of-the-art. However, it is difficult to improve further without an in-depth understanding of the mechanisms underlying control tokens. One previously unexplored factor is the tokenization strategy, which we examine here. In this paper, we (1) reimplement AudienCe-CEntric Sentence Simplification, (2) explore the effects and interactions of varying control tokens, (3) test the influence of different tokenization strategies, (4) demonstrate how separate control tokens affect performance and (5) propose new methods to predict the value of control tokens. We show how performance varies across the four control tokens individually. We also uncover how the design of control tokens can influence performance and give suggestions for designing control tokens. The newly proposed method achieves higher performance on both SARI (a common scoring metric in text simplification) and BERTScore (a score derived from the BERT language model) and shows potential for real applications.
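For readers unfamiliar with control tokens, the preprocessing they imply can be sketched as follows. This is an illustrative Python sketch, not the authors' code: the token names follow the ACCESS convention mentioned in the abstract, and the rounding of ratios to 0.05 buckets is an assumption about ACCESS-style preprocessing, not a detail taken from this paper.

```python
def length_ratio(source: str, target: str, bucket: float = 0.05) -> float:
    """Character-length ratio of target to source, snapped to a bucket.

    Bucketing to the nearest 0.05 mirrors ACCESS-style preprocessing
    (an assumption here, not the authors' exact procedure).
    """
    ratio = len(target) / len(source)
    return round(round(ratio / bucket) * bucket, 2)


def prepend_control_tokens(sentence: str, tokens: dict) -> str:
    """Prefix a source sentence with <NAME_value> control tokens."""
    prefix = " ".join(f"<{name}_{value}>" for name, value in tokens.items())
    return f"{prefix} {sentence}"


# At training time the ratios are computed from each (source, reference)
# pair; at inference time they are set to the desired target values.
example = prepend_control_tokens(
    "The quick brown fox jumps over the lazy dog.",
    {"DEPENDENCYTREEDEPTHRATIO": 0.6, "LENGTHRATIO": 0.75},
)
```

The model is fine-tuned on such prefixed inputs, so that at inference time changing the prefix values steers how aggressively the output is simplified.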

Information

Type
Article
Creative Commons
Creative Commons Licence: CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press
Figure 1. An instance of preprocessed input and raw output of the system.

Figure 2. The reimplementation methodology is represented as a flow chart. We fine-tune BART-base on the preprocessed WikiLarge training set so that the model learns to simplify under the control token or control tokens. After optimisation on the target dataset, the model can apply the optimal value of control tokens to the input and generate desired simplifications.

Table 1. Tokenization under differing strategies for the input starting with: '<DEPENDENCYTREEDEPTHRATIO_0.6>'
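Whether a control token survives tokenization as a single unit depends on whether it is present in the tokenizer's vocabulary; if it is not, a subword tokenizer shatters it into many pieces. A minimal greedy longest-match sketch (a simplification; real BPE tokenizers differ in detail) illustrates the effect:

```python
def greedy_tokenize(text: str, vocab: set) -> list:
    """Greedy longest-match-first tokenization (a simplification of
    real subword algorithms such as BPE)."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            piece = text[i:j]
            # fall back to the single character if nothing longer matches
            if piece in vocab or j == i + 1:
                tokens.append(piece)
                i = j
                break
    return tokens


ctrl = "<DEPENDENCYTREEDEPTHRATIO_0.6>"
# Registered as a special token: kept whole, so the model sees one unit.
whole = greedy_tokenize(ctrl, {ctrl})
# Absent from the vocabulary: shattered into single characters here
# (a real subword tokenizer would produce several multi-character pieces).
shattered = greedy_tokenize(ctrl, set())
```

The same input therefore yields one token under one strategy and dozens under another, which is the contrast Table 1 tabulates.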

Table 2. SARI scores of supervised text simplification systems (p in brackets is the p-value of the SARI score against multilingual unsupervised sentence simplification without mined data)

Table 3. Results on SARI and BERTScore under differing tokenization strategies, with comparison to the baseline (p in brackets is the p-value of the SARI score against the baseline and shows no statistically significant improvement)

Table 4. SARI and BERTScore on the ASSET test set under different optimisation targets (p in brackets is the p-value of the SARI score against the baseline and shows no statistically significant improvement). The first three rows optimise on the validation set, the middle three rows optimise on the test set and the bottom three rows show performance under the average value of control tokens

Table 5. SARI and BERTScore at peak points for different control tokens (we choose the points with the highest SARI score in the three strategies and the corresponding points in the other two strategies)

Figure 3. The effect of varying control tokens with different tokenization strategies on SARI Score.

Table 6. SARI score by operation at turning points in Fig. 3

Table 7. Performance of regression-based control token predictors on the average values of the ASSET test set (Average Variance is the mean of the variances calculated across sentence pairs in the ASSET test set)

Figure 4. The density distribution of predictions, average values and values of all reference sentences.

Figure 5. The box plot of distributions of predictions, average values and values of all reference sentences for the four control tokens.

Table 8. Performance of single-control token models with predictors on the ASSET test set (the Regression and Classification methods pair single-control models with the control token predictors; the Average method sets the control token value to the average over reference sentences; the Optimisation method is as in the previous sections)

Table 9. Performance of the control token model with predictors on the ASSET test set. The top three rows use predicted values, and the bottom three rows use calculated or optimised values

Table 10. Performance with a single static value per control token (values taken from Table 4)

Table 11. Performance with one dynamic control token, the remaining control tokens held at static values of DTD: 0.35, WR: 0.7, LV: 0.85 and LR: 0.9, respectively (values taken from Table 4)

Table 12. Effect of varying Length ratio with the others remaining at 1.0

Table 13. Effect of varying ReplaceOnlyLevenshtein ratio with the others remaining at 1.0

Table 14. Effect of varying WordRank ratio and some other ratios with the others remaining at 1.0

Table 15. Effect of varying DependencyTreeDepth ratio with the others remaining at 1.0

Table 16. Examples of limitations of the optimisation method (the meaning is changed in the third row)

Table 17. Examples with mispredicted values (missing ‘or’ clause in the third row)

Table 18. Examples with properly predicted values

Figure A1. SARI scores across 128 optimisation runs.

Figure A2. The effect of varying control tokens with different tokenization strategies on BERTScore.

Table A1. Further insights into the two systems