
A closed-domain question answering framework using reliable resources to assist students


This paper describes a question answering framework that answers student questions posed in natural language. We propose a methodology that draws only on reliable resources, returns the answer as a multi-document summary for both factoid and open-ended questions, and also produces an answer from foreign-language resources by translating them into the native language. The resources are compiled using a question database in the selected domains, according to reliability and coverage metrics. A question is parsed with a dependency parser, its important parts are extracted by rule-based and statistical methods, the question is converted into a representation, and a query is built. Documents relevant to the query are retrieved from the set of resources. The documents are then summarized, and the answers to the question, together with other relevant information about its topic, are shown to the user. A summary answer is also built from the foreign resources by translating the input question and the retrieved documents. The proposed approach was applied to Turkish and tested in several experiments and a pilot study. The experiments show that the returned summaries include the answer for about 50–60 percent of the questions. The data bank built for factoid and open-ended questions in the two domains covered is publicly available.
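
The flow outlined in the abstract (question, then query, then retrieved documents, then a multi-document summary) can be illustrated with a toy sketch. This is not the authors' implementation: the stopword filter stands in for the dependency-based question analysis, the IDF-weighted term overlap stands in for the retrieval and summarization components, and every name here (`build_query`, `retrieve`, `summarize`, `answer`, `STOPWORDS`) is hypothetical. Translation of foreign resources is left out.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real system would use linguistic analysis instead.
STOPWORDS = {"what", "is", "the", "a", "an", "of", "in", "how",
             "why", "does", "do", "are", "to", "and"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_query(question):
    """Keep content words only (assumed stand-in for question analysis)."""
    return [t for t in tokenize(question) if t not in STOPWORDS]


def split_sentences(text):
    """Naive sentence splitter on ., ! and ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def idf_weights(docs):
    """Smoothed inverse document frequency for every term in the collection."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((n + 1) / (c + 1)) + 1.0 for t, c in df.items()}


def score(query, text, idf):
    """Sum of IDF-weighted counts of query terms occurring in the text."""
    counts = Counter(tokenize(text))
    return sum(counts[t] * idf.get(t, 0.0) for t in query)


def retrieve(query, docs, idf, k=2):
    """Return up to k documents that share at least one term with the query."""
    ranked = sorted(docs, key=lambda d: score(query, d, idf), reverse=True)
    return [d for d in ranked[:k] if score(query, d, idf) > 0]


def summarize(query, docs, idf, max_sentences=2):
    """Extractive summary: the best-scoring sentences across all documents."""
    sentences = [s for d in docs for s in split_sentences(d)]
    ranked = sorted(sentences, key=lambda s: score(query, s, idf), reverse=True)
    return [s for s in ranked[:max_sentences] if score(query, s, idf) > 0]


def answer(question, docs, k=2, max_sentences=2):
    """Run the whole toy pipeline and return the summary sentences."""
    query = build_query(question)
    idf = idf_weights(docs)
    return summarize(query, retrieve(query, docs, idf, k), idf, max_sentences)
```

Calling `answer(question, docs)` with a small list of resource texts returns the sentences that best match the question, or an empty list when no resource shares a term with it. Scoring sentences from several retrieved documents together is what makes the result a multi-document summary in this simplified setting.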


*This work was supported by The Scientific and Technological Research Council of Turkey (TÜBİTAK) under grant number 113E036. We would like to thank Çağıl Uluşahin Sönmez for her contribution to the Google Translate interface used in this research.

Natural Language Engineering
  • ISSN: 1351-3249
  • EISSN: 1469-8110
  • URL: /core/journals/natural-language-engineering