Extractive multi-document summarization based on textual entailment and sentence compression via knapsack problem

  • ALI NASERASADI (a1), HAMID KHOSRAVI (a2) and FARAMARZ SADEGHI (a2)
Abstract

As the amount of data on computer networks grows, finding suitable information becomes harder for users. Textual documents are one of the most widespread forms of information on such networks, so exploring them to learn about their content is difficult and sometimes impossible. Multi-document text summarization systems help by producing a summary of fixed, predefined length that covers as much of the content of the input documents as possible. This paper presents a novel method for multi-document extractive summarization based on textual entailment relations and sentence compression, formulating the problem as a knapsack problem. In this approach, the sentences of the documents are ranked according to an extended Tf-Idf method, and entailment scores are then computed for the selected sentences. From these scores, the final score of each sentence is calculated. Finally, after sentence lengths are reduced through sentence compression, the problem is solved with greedy and dynamic programming approaches to the knapsack problem. Experiments on standard summarization datasets, evaluated with the ROUGE system, show that the suggested method, to the best of our knowledge, increases the F-measure of query-based summarization systems by two per cent and the F-measure of general summarization systems by five per cent.
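
To make the knapsack formulation concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes each sentence already carries a final score and a length in words; the tuple layout, the function names `greedy_select` and `dp_select`, and the example values are all hypothetical. Entailment scoring and sentence compression are not modelled; only the final selection step under a length budget is shown.

```python
from typing import List, Tuple

# Hypothetical representation: (text, final score, length in words).
# Lengths are assumed to be positive integers.
Sentence = Tuple[str, float, int]


def greedy_select(sentences: List[Sentence], budget: int) -> List[int]:
    """Greedy heuristic: take sentences by score per word while they fit."""
    order = sorted(
        range(len(sentences)),
        key=lambda i: sentences[i][1] / sentences[i][2],
        reverse=True,
    )
    chosen, used = [], 0
    for i in order:
        length = sentences[i][2]
        if used + length <= budget:
            chosen.append(i)
            used += length
    return sorted(chosen)


def dp_select(sentences: List[Sentence], budget: int) -> List[int]:
    """Exact 0/1 knapsack by dynamic programming over the word budget."""
    n = len(sentences)
    # best[i][w] = highest total score using the first i sentences
    # with at most w words.
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        _, score, length = sentences[i - 1]
        for w in range(budget + 1):
            best[i][w] = best[i - 1][w]
            if length <= w:
                candidate = best[i - 1][w - length] + score
                if candidate > best[i][w]:
                    best[i][w] = candidate
    # Walk the table backwards to recover which sentences were taken.
    chosen, w = [], budget
    for i in range(n, 0, -1):
        if best[i][w] != best[i - 1][w]:
            chosen.append(i - 1)
            w -= sentences[i - 1][2]
    return sorted(chosen)


if __name__ == "__main__":
    example = [("X", 7.0, 6), ("Y", 5.0, 5), ("Z", 5.0, 5)]
    print(greedy_select(example, 10))
    print(dp_select(example, 10))
```

In this toy example with a 10-word budget, the greedy heuristic takes only the first sentence, because it has the best score per word and leaves no room for the others. The dynamic programming version finds the higher-scoring pair of shorter sentences. This also suggests why shortening sentences before selection can help: shorter items leave room for more content within the same budget.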

Natural Language Engineering
  • ISSN: 1351-3249
  • EISSN: 1469-8110
  • URL: /core/journals/natural-language-engineering