
Data-to-text generation using conditional generative adversarial with enhanced transformer

Published online by Cambridge University Press:  28 November 2023

Elham Seifossadat
Affiliation:
Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
Hossein Sameti*
Affiliation:
Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
*
Corresponding author: Hossein Sameti; Email: sameti@sharif.edu

Abstract

In this paper, we propose an enhanced version of the vanilla transformer for data-to-text generation and then use it as the generator of a conditional generative adversarial model to improve the semantic quality and diversity of the output sentences. Specifically, by adding a diagonal mask matrix to the attention scores of the encoder and using the history of the attention weights in the decoder, this enhanced transformer prevents semantic defects in the output text. Moreover, using this enhanced transformer and a triplet network as the generator and discriminator, respectively, of a conditional generative adversarial network guarantees the diversity and semantic quality of the generated sentences. To demonstrate the effectiveness of the proposed model, called conditional generative adversarial with enhanced transformer (CGA-ET), we performed experiments on three different datasets and observed that our model achieves better results than the baseline models in terms of the BLEU, METEOR, NIST, ROUGE-L, CIDEr, BERTScore, and SER automatic evaluation metrics, as well as in human evaluation.

Information

Type
Article
Creative Commons
Creative Commons License - CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2023. Published by Cambridge University Press

Figure 1. Sample meaning representation (MR) and text chosen from (a) WebNLG, (b) MultiWoz, and (c) E2E datasets.

Figure 2. The block diagram of the CGA-ET model. In the generator diagram, improvements made to the transformer-based generator network are shown as blocks with dashed borders. The discriminator diagram shows a triplet network consisting of three vanilla transformer encoders with shared weights.

Figure 3. Generating Masked Scores for the self-attention sublayer of each encoder block.
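
As a rough illustration of this caption only, here is a minimal NumPy sketch assuming the diagonal mask adds a large negative value to the diagonal of the score matrix before the softmax, so that no input slot attends to itself; the paper's exact masking rule may differ.

```python
import numpy as np

def diagonal_masked_scores(q, k, neg=-1e9):
    """Scaled dot-product attention weights with an assumed diagonal mask.

    A large negative value is added to the diagonal of the score matrix,
    so each input slot must distribute its attention over the other slots.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                    # (n, n) raw scores
    scores = scores + neg * np.eye(scores.shape[0])  # mask the diagonal
    # softmax over the key dimension
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

x = np.random.default_rng(0).normal(size=(4, 8))
w = diagonal_masked_scores(x, x)   # diagonal weights are driven to ~0
```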

Figure 4. Generating the encoder–decoder self-attention vector in (a) the vanilla transformer and (b) the proposed model.
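
The attention-history idea in panel (b) is sketched below under an assumed coverage-style rule: the attention weights of earlier decoding steps are accumulated and subtracted from the current scores, steering the decoder toward input slots it has not yet covered. The scaling factor `beta` and the subtraction itself are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_with_history(dec_q, enc_k, history, beta=1.0):
    """One decoding step of encoder-decoder attention with a history term.

    `history` holds the accumulated attention weights of previous steps;
    subtracting it (scaled by beta) discounts already-attended slots.
    """
    d = dec_q.shape[-1]
    scores = enc_k @ dec_q / np.sqrt(d)      # (n_src,) scores for this step
    weights = softmax(scores - beta * history)
    return weights, history + weights        # updated cumulative history

rng = np.random.default_rng(1)
enc_k = rng.normal(size=(5, 8))              # five encoded input slots
hist = np.zeros(5)
for _ in range(3):                           # three decoding steps
    w, hist = attention_with_history(rng.normal(size=8), enc_k, hist)
```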

Figure 5. Discriminator network. The hidden state vectors obtained from the encoders contain the semantic information of the input sentences. The discriminator network is trained in such a way that the distance between the semantic vectors of the real sentence and MR (positive distance) is less than the distance between the semantic vectors of the fake sentence and MR (negative distance). The discriminator network also tries to increase the distance between the real and predicted distributions of data $(D_{Real}, D_{Fake})$.
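
The positive/negative distance ordering described in this caption can be expressed as a standard triplet margin loss; the Euclidean distance and the margin value below are assumptions for illustration, not necessarily the paper's exact objective.

```python
import numpy as np

def triplet_loss(mr_vec, real_vec, fake_vec, margin=1.0):
    """Triplet margin objective sketched from Figure 5.

    The MR embedding acts as the anchor; the real sentence embedding is
    the positive and the fake (generated) one the negative. The loss is
    zero once the positive distance is at least `margin` smaller than
    the negative distance.
    """
    pos = np.linalg.norm(mr_vec - real_vec)   # positive distance
    neg = np.linalg.norm(mr_vec - fake_vec)   # negative distance
    return max(pos - neg + margin, 0.0)

anchor = np.zeros(4)
real = np.full(4, 0.1)   # embedding close to the MR embedding
fake = np.full(4, 2.0)   # embedding far from the MR embedding
loss = triplet_loss(anchor, real, fake)  # well separated, so loss is 0.0
```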

Algorithm 1: Training process of our proposed model.
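
As a hypothetical outline of the alternating updates typical of adversarial training, the control flow below abstracts the enhanced-transformer generator and the triplet discriminator as callables; Algorithm 1's actual update schedule and losses may differ.

```python
def adversarial_train(gen_step, disc_step, batches, d_steps=1):
    """Alternate discriminator and generator updates over mini-batches.

    gen_step/disc_step are stand-ins for one optimization step of the
    generator and discriminator; each returns its loss for the batch.
    """
    g_losses, d_losses = [], []
    for batch in batches:
        for _ in range(d_steps):          # update the discriminator first
            d_losses.append(disc_step(batch))
        g_losses.append(gen_step(batch))  # then one generator update
    return g_losses, d_losses

# toy stand-ins just to show the control flow
g, d = adversarial_train(lambda b: 0.5, lambda b: 0.7, range(3), d_steps=2)
```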

Table 1. Dataset statistics. Attributes shows the total number of slot types, and Avg-Len indicates the average sentence length in each dataset

Table 2. Results on WebNLG seen and unseen test data. The best and second-best models are highlighted in bold and underlined, respectively

Table 3. Results on WebNLG full test data. The best and second-best models are highlighted in bold and underlined, respectively

Table 4. Results of human evaluations on WebNLG dataset in terms of Faithfulness, Coverage, and Fluency (rating out of 3). The symbols $\ast$ and $\dagger$ indicate statistically significant improvement with $p \lt 0.05$ and $p \lt 0.01$, based on the paired t-test and the ANOVA test, respectively

Table 5. Comparison of sentences generated from the WebNLG dataset by our proposed model and the baselines. Meaning labels missing from each generated sentence are shown in red

Table 6. Results on MultiWoz for generating sentences from the input MR. The best and second-best models are highlighted in bold and underlined, respectively

Table 7. Results of human evaluations on MultiWoz dataset in terms of Faithfulness, Coverage, and Fluency (rating out of 3)

Table 8. Comparison of the generated sentences from the MultiWoz dataset for our proposed model

Table 9. Results on E2E. The best and second-best models are highlighted in bold and underlined, respectively

Table 10. Results of human evaluations on E2E dataset in terms of Faithfulness, Coverage, and Fluency (rating out of 3)

Table 11. Comparison of the generated sentences from the E2E dataset for our proposed model

Figure 6. Encoder–decoder attention weights for an E2E MR and the equivalent sentence generated by (a) the vanilla transformer, (b) the vanilla transformer + Diagonal Mask, and (c) the vanilla transformer + Diagonal Mask + History.

Table 12. Comparison of the effect of modifications on the vanilla transformer in data-to-text generation

Figure 7. Loss curves of the CGA-ET model during the adversarial training process.