
Joint learning of text alignment and abstractive summarization for long documents via unbalanced optimal transport

Published online by Cambridge University Press:  15 May 2023

Xin Shen*
Affiliation:
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong, China
Wai Lam
Affiliation:
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong, China
Shumin Ma
Affiliation:
Guangdong Provincial Key Laboratory of Interdisciplinary Research and Application for Data Science, BNU-HKBU United International College, Zhuhai, China
Huadong Wang
Affiliation:
Department of Computer Science and Technology, Tsinghua University, Beijing, China
Corresponding author: X. Shen; Email: xshen@se.cuhk.edu.hk

Abstract

Recently, neural abstractive text summarization (NATS) models based on the sequence-to-sequence architecture have drawn a lot of attention. Real-world texts that need to be summarized range from short news articles with dozens of words to long reports with thousands of words. However, most existing NATS models struggle to summarize long documents, due to inherent limitations of their underlying neural architectures. In this paper, we focus on the task of long document summarization (LDS). Exploiting the inherent section structure of source documents, we divide an abstractive LDS problem into several smaller-sized problems. In this setting, providing a less biased target summary as the supervision for each section is vital to the model’s performance. As a preliminary step, we formally describe the section-to-summary-sentence (S2SS) alignment for LDS. Based on this, we propose a novel NATS framework for the LDS task. Our framework is built on the theory of unbalanced optimal transport (UOT) and is named UOTSumm. It jointly learns three targets in a unified training objective: the optimal S2SS alignment, a section-level NATS summarizer, and the number of aligned summary sentences for each section. In this way, UOTSumm learns the text alignment directly from summarization data, without resorting to any biased tool such as ROUGE. UOTSumm can be easily adapted to most existing NATS models. We implement two versions of UOTSumm, with and without the pretrain-finetune technique. We evaluate UOTSumm on three publicly available LDS benchmarks: PubMed, arXiv, and GovReport. UOTSumm clearly outperforms its counterparts that use ROUGE for text alignment. When combined with UOTSumm, the performance of two vanilla NATS models improves by a large margin. Moreover, UOTSumm achieves better or comparable performance when compared with recent strong baselines.
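The abstract describes alignment via unbalanced optimal transport, under which a section may be matched to several summary sentences or to none at all. Since the paper's exact training objective is not reproduced on this page, the following is only a minimal NumPy sketch of entropic unbalanced Sinkhorn scaling (in the style of Chizat et al.) applied to a hypothetical cost matrix of section/summary-sentence embedding distances; the function name, hyperparameters, and toy inputs are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def uot_alignment(C, eps=0.1, rho=1.0, a=None, b=None, n_iters=200):
    """Entropic unbalanced OT plan between n sections and m summary
    sentences, via Sinkhorn scaling with KL-relaxed marginals.

    C    : (n, m) cost matrix, e.g. cosine distances between embeddings.
    eps  : entropic regularization strength.
    rho  : marginal relaxation; larger rho enforces the marginals harder.
    a, b : reference marginals (default: uniform mass on each side).
    """
    n, m = C.shape
    a = np.full(n, 1.0 / n) if a is None else a
    b = np.full(m, 1.0 / m) if b is None else b
    K = np.exp(-C / eps)               # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    fi = rho / (rho + eps)             # damping exponent from the KL relaxation
    for _ in range(n_iters):
        u = (a / (K @ v)) ** fi
        v = (b / (K.T @ u)) ** fi
    # Transport plan P; P[i, j] > 0 means section i aligns with sentence j.
    return u[:, None] * K * v[None, :]

# Toy usage: 3 sections, 2 summary sentences; section 3 matches nothing well.
C = np.array([[0.1, 0.9],
              [0.8, 0.2],
              [0.9, 0.9]])
P = uot_alignment(C)
print(P.round(3))   # row 3 carries very little mass: that section stays (nearly) unaligned
```

In UOTSumm itself the plan is learned jointly with the summarizer rather than computed from fixed embeddings; this sketch only illustrates why UOT, unlike balanced OT, can leave a section unaligned while still covering every summary sentence.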

Information

Type
Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2023. Published by Cambridge University Press

Table 1. The statistics of average document and summary lengths on several popular summarization datasets


Figure 1. An illustration of S2SS alignment. An arrow from $\textbf{s}_i$ to $\textbf{y}_{j}$ indicates ${\textbf{P}}_{i,j}\gt 0$, that is, some degree of alignment between $\textbf{s}_i$ and $\textbf{y}_{j}$. A section may be aligned to more than one summary sentence, or to none at all, but every summary sentence must be aligned to at least one section.


Table 2. Two cases to support formulating ${\textbf{P}}_{i,j}$ as a continuous variable


Figure 2. The architecture of UOTSumm.


Figure 3. The procedure of training UOTSumm. This figure visualizes one loop of Algorithm 1.


Table 3. Results on the test set of arXiv


Table 4. Results on the test set of PubMed


Table 5. Results on the test set of GovReport


Table 6. Cases to demonstrate the differences in text alignment between UOTSumm and ROUGE-L precision


Table 7. Section types and corresponding common keywords


Table 8. A case of text generation by UOTSumm