
Reducing repetition in convolutional abstractive summarization

Published online by Cambridge University Press: 24 November 2021

Yizhu Liu
Affiliation:
Shanghai Jiao Tong University, Shanghai, China
Xinyue Chen
Affiliation:
Carnegie Mellon University, Pennsylvania, USA
Xusheng Luo
Affiliation:
Alibaba Group, Hangzhou, China
Kenny Q. Zhu*
Affiliation:
Shanghai Jiao Tong University, Shanghai, China
*Corresponding author. Email: kzhu@cs.sjtu.edu.cn

Abstract

Convolutional sequence-to-sequence (CNN seq2seq) models have achieved success in abstractive summarization. However, their outputs often contain repetitive word sequences and logical inconsistencies, which limits their practical application. In this paper, we identify the causes of the repetition problem in CNN-based abstractive summarization by observing the attention maps between summaries that contain repetition and their corresponding source documents. To mitigate the problem, we propose an attention filter mechanism (ATTF) and a sentence-level backtracking decoder (SBD), which dynamically redistribute attention over the input sequence as the output sentences are generated. ATTF directly records previously attended locations in the source document and prevents the decoder from attending to them again. SBD prevents the decoder from generating similar sentences more than once by backtracking at test time. The proposed model outperforms the baselines in terms of ROUGE score, repeatedness, and readability. The results show that this approach generates high-quality summaries with minimal repetition and improves the reading experience.
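The two mechanisms described above can be illustrated with a minimal sketch. The function and variable names below are illustrative assumptions, not the authors' implementation: the attention filter is shown as masking already-attended source positions before the softmax, and the backtracking check as a simple word-overlap (Jaccard) test between a new sentence and an earlier one.

```python
import numpy as np

def filtered_attention(scores: np.ndarray, attended: np.ndarray) -> np.ndarray:
    """Attention filter sketch: source positions already covered by earlier
    output sentences (attended == True) are masked out, and the remaining
    scores are renormalized with a softmax."""
    masked = np.where(attended, -np.inf, scores)
    exp = np.exp(masked - np.max(masked[~attended]))  # stable softmax
    return exp / exp.sum()

def too_similar(sent_a: set, sent_b: set, threshold: float = 0.5) -> bool:
    """Backtracking-check sketch: if the word overlap of a new sentence with
    an earlier one exceeds the threshold, the decoder would backtrack and
    regenerate rather than emit the near-duplicate sentence."""
    overlap = len(sent_a & sent_b) / len(sent_a | sent_b)
    return overlap > threshold

# Toy decode step: position 3 was heavily attended while generating the
# previous output sentence, so the filter blocks it for the next sentence.
scores = np.array([2.0, 1.0, 0.5, 3.0])
attended = np.array([False, False, False, True])
weights = filtered_attention(scores, attended)  # weights[3] == 0.0
```

In an actual decoder, the attended mask would be updated after each output sentence from the accumulated attention map, and a failed `too_similar` check would trigger the backtracking step at test time.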

Type
Article
Copyright
© The Author(s), 2021. Published by Cambridge University Press

