
Controllable abstractive summarization with arbitrary textual context

Published online by Cambridge University Press:  26 August 2025

Tatiana Passali*
Affiliation:
School of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
Grigorios Tsoumakas
Affiliation:
School of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
*
Corresponding author: Tatiana Passali; Email: scpassali@csd.auth.gr

Abstract

Controllable summarization models are typically limited to a short text, such as a topic mention, a keyword, or an entity, for controlling the output summary. At the same time, existing models for controllable summarization are prone to generating artificial content, resulting in unreliable summaries. In this work, we propose a method for controllable abstractive summarization that can exploit arbitrary textual context, ranging from a short text to a collection of documents, to direct the focus of the generated summary. The proposed method incorporates a Sentence-BERT model to extract an embedding-based representation of the given context, which is then used to tag the words of the input document that are most representative of this context. In addition, we propose an unsupervised metric to evaluate the faithfulness of the topic-oriented sentences of the generated summaries with respect to the input document. Experimental results under different zero-shot setups demonstrate that the proposed method surpasses both state-of-the-art large language models (LLMs) and controllable summarization methods, producing summaries that are both reliable and relevant to the input document.

Information

Type
Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

Table 1. Examples of summaries generated by CTRLsum and the proposed $BART_{tag}$ for different topics of the same document. Blue and violet highlight indicative tagged words for the topics “Science & Health” and “Neuroscience,” respectively. Orange indicates tagged words common to both topics. Bold red indicates artificially generated content.


Table 2. Examples of different tagging schemes according to different topics for a document from CNN/DailyMail (Hermann et al. 2015).


Figure 1. We encode each word of the input document into the same space with the given context using SBERT. Words of the input document that are semantically close to the context are prepended with the special token [TAG].
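The tagging step described in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the paper uses SBERT sentence embeddings, whereas here a toy letter-frequency embedding and an assumed similarity threshold of 0.5 keep the snippet self-contained and dependency-free.

```python
import math

def toy_embed(text):
    """Stand-in for SBERT: a unit-normalised letter-frequency vector.
    (Assumption for illustration only; the real method encodes words and
    context with a Sentence-BERT model in a shared embedding space.)"""
    counts = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            counts[ord(ch) - ord("a")] += 1
    norm = math.sqrt(sum(c * c for c in counts))
    return [c / norm for c in counts] if norm else counts

def tag_words(document, context, threshold=0.5):
    """Prepend the special token [TAG] to each word of the document whose
    embedding has cosine similarity >= threshold with the context embedding.
    The threshold value is an assumed hyperparameter."""
    ctx = toy_embed(context)
    tagged = []
    for word in document.split():
        vec = toy_embed(word)
        # Dot product equals cosine similarity since both vectors are unit-norm.
        sim = sum(a * b for a, b in zip(vec, ctx))
        tagged.append(f"[TAG] {word}" if sim >= threshold else word)
    return " ".join(tagged)
```

With a real SBERT encoder in place of `toy_embed`, the tagged document would then be fed to the summarizer so that generation focuses on the marked words.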


Figure 2. Training dataset creation. We exploit the ground-truth summaries of existing large-scale summarization datasets to tailor summary generation towards the given context.


Table 3. Dataset statistics. Size is measured in articles for the train, validation, and test sets, while the average length of documents and summaries is measured in tokens.


Table 4. Experimental results on the CNN/DailyMail dataset using different input contexts: a short text (topic), a document (doc.), and a collection of documents (col.). F-1 scores for ROUGE-1 (R1), ROUGE-2 (R2), and ROUGE-L (RL) are reported.


Table 5. Examples of summaries generated by CTRLsum and the proposed $PEGASUS_{tag}$ for different topics of the same document.


Table 6. Experimental results on the MultiNews dataset using different input contexts: a short text (topic), a document (doc.), and a collection of documents (col.). F-1 scores for ROUGE-1 (R1), ROUGE-2 (R2), and ROUGE-L (RL) are reported.


Table 7. Experimental results on the XSum dataset using different input contexts: a short text (topic), a document (doc.), and a collection of documents (col.). F-1 scores for ROUGE-1 (R1), ROUGE-2 (R2), and ROUGE-L (RL) are reported.


Table 8. Experimental results on the MacDoc dataset. F-1 scores for ROUGE-1 (R1), ROUGE-2 (R2), and ROUGE-L (RL) are reported.


Table 9. Experimental results on the NEWTS dataset. F-1 scores for ROUGE-1 (R1), ROUGE-2 (R2), and ROUGE-L (RL) are reported.


Table 10. Experimental results on the Debatepedia dataset with the given query used as the available context. F-1 scores for ROUGE-1 (R1), ROUGE-2 (R2), and ROUGE-L (RL) are reported.


Table 11. Correlation between the REL metric and human ratings.