
Lemaza: An Arabic why-question answering system*

  • Aqil M. Azmi and Nouf A. AlShenaifi

Question answering systems retrieve information from documents in response to queries. Most questions are who- and what-type questions that deal with named entities. A less common and more challenging type is the why-question. In this paper, we introduce Lemaza (Arabic for "why"), a system for automatically answering why-questions over Arabic texts. The system is composed of four main components that make use of Rhetorical Structure Theory (RST). To evaluate Lemaza, we prepared a set of why-question–answer pairs whose answers can be found in a corpus that we compiled from the Open Source Arabic Corpora. Lemaza performed best when stop words were not removed, achieving a recall of 72.7%, a precision of 79.2% and a c@1 of 78.7%.
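The abstract reports recall, precision and c@1 without defining them. The sketch below assumes recall = correctly answered questions / all questions and precision = correctly answered questions / answered questions; c@1 follows Peñas and Rodrigo (2011), c@1 = (n_R + n_U × n_R / n) / n, which credits each unanswered question with the system's overall accuracy. The class QAOutcome and the functions recall, precision, c_at_1 and c_at_1_from_rates are hypothetical names, and the counts in the usage lines are illustrative, not Lemaza's data.

```python
from dataclasses import dataclass


@dataclass
class QAOutcome:
    """Hypothetical tally of a QA run over a fixed set of questions."""
    correct: int      # n_R: questions answered correctly
    wrong: int        # questions answered incorrectly
    unanswered: int   # n_U: questions for which no answer was returned

    @property
    def total(self) -> int:
        # n: every question in the test set
        return self.correct + self.wrong + self.unanswered


def recall(o: QAOutcome) -> float:
    # Assumed definition: correct answers over all questions.
    return o.correct / o.total


def precision(o: QAOutcome) -> float:
    # Assumed definition: correct answers over questions that received an answer.
    answered = o.correct + o.wrong
    return o.correct / answered if answered else 0.0


def c_at_1(o: QAOutcome) -> float:
    # c@1 (Peñas and Rodrigo 2011): each unanswered question earns partial
    # credit equal to the accuracy over the whole set, n_R / n.
    n = o.total
    return (o.correct + o.unanswered * o.correct / n) / n


def c_at_1_from_rates(r: float, p: float) -> float:
    # Under the assumed definitions the answered fraction is r / p, so the
    # unanswered fraction is 1 - r / p and c@1 = r + (1 - r / p) * r.
    return r * (2 - r / p)


if __name__ == "__main__":
    run = QAOutcome(correct=6, wrong=2, unanswered=2)  # illustrative counts only
    print(recall(run), precision(run), c_at_1(run))
    print(round(c_at_1_from_rates(0.727, 0.792), 3))  # 0.787
```

Under these assumed definitions, feeding the reported recall (0.727) and precision (0.792) to c_at_1_from_rates gives about 0.787, which agrees with the reported c@1 of 78.7%; this is consistent with, but does not confirm, the definitions assumed above.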


We would like to thank W. Al-Sanie for sharing his RST implementation, and the language specialist for helping us with the why-question–answer pairs. The first author would like to thank Miss Maryam for her assistance in proofreading the manuscript. Special thanks to all three anonymous reviewers for their constructive comments, which helped improve the manuscript. This work was supported by a special fund from the Research Center of the College of Computer & Information Sciences (CCIS) at King Saud University, for which the authors are grateful.

References
Abouenour L., Bouzoubaa K., and Rosso P. 2013. An evaluated semantic query expansion and structure-based approach for enhancing Arabic question/answering. International Journal on Information and Communication Technologies (IJICT) 3 (3): 37–51.
Abouenour L., Bouzoubaa K., and Rosso P. 2008. Improving Q/A using Arabic WordNet. In Proceedings of the 2008 International Arab Conference on Information Technology (ACIT’08), Tunisia.
Akour M., Abufardeh S., Magel K., and Al-Radaideh Q. 2011. QArabPro: a rule based question answering system for reading comprehension tests in Arabic. American Journal of Applied Sciences 8 (6): 652–61.
Al-Kabi M. N., Kazakzeh S. A., Abu Ata B. M., Al-Rababah S. A., and Alsmadi I. M. 2015. A novel root based Arabic stemmer. Journal of King Saud University – Computer and Information Sciences 27 (2): 94–103.
Al-Sanie W. 2005. Towards an Infrastructure for Arabic Text Summarization using Rhetorical Structure Theory. Master’s Thesis, King Saud University, Riyadh, Saudi Arabia.
Asher N., and Lascarides A. 2003. Logics of Conversation. Cambridge: Cambridge University Press.
Azmi A. M., and Al-Thanyyan S. 2012. A text summarizer for Arabic. Computer Speech and Language 26 (4): 260–73.
Azmi A. M., and Aljafari E. A. 2017. Universal web accessibility and the challenge to integrate informal Arabic users: a case study. Universal Access in the Information Society (UAIS), Springer, doi:10.1007/s10209-017-0522-3.
Azmi A. M., and Almajed R. S. 2015. A survey of automatic Arabic diacritization techniques. Natural Language Engineering (NLE) 21 (3): 477–95.
Azmi A. M., and AlShenaifi N. 2014. Handling ‘why’ questions in Arabic. In Proceedings of the 5th International Conference on Arabic Language Processing (CITALA ’14), Oujda, Morocco. Available at
Bateman J., and Delin J. 2006. Rhetorical structure theory. In Brown K. (ed.), Encyclopedia of Language and Linguistics, 2nd ed., pp. 589–97. Amsterdam: Elsevier, BV.
Benajiba Y. 2007. Arabic Question Answering. Master’s Thesis, Universidad Politécnica de Valencia, Spain.
Benajiba Y., Rosso P., and Soriano J. 2007. Adapting the JIRS passage retrieval system to the Arabic language. In Computational Linguistics and Intelligent Text Processing, pp. 530–41. Lecture Notes in Computer Science, vol. 4394. Berlin Heidelberg: Springer.
Bosma W. 2005. Extending answers using discourse structure. In RANLP 2005 Workshop on Crossing Barriers in Text Summarization Research, Borovets, Bulgaria.
Brini W., Ellouze M., Trigui O., Mesfar S., Belguith L. H., and Rosso P. 2009. Factoid and definitional Arabic question answering system. In NOOJ ’09, Tozeur, Tunisia, pp. 243–55.
El-Khair I. A. 2006. Effects of stop words elimination for Arabic information retrieval: a comparative study. International Journal of Computing and Information Sciences 4 (3): 119–33.
Ezzeldin A. M., and Shaheen M. 2012. A survey of Arabic question answering: challenges, tasks, approaches, tools, and future trends. In Proceedings of the 13th International Arab Conference on Information Technology (ACIT’12), pp. 280–7.
Farghaly A., and Shaalan K. 2009. Arabic natural language processing: challenges and solutions. ACM Transactions on Asian Language Information Processing 8 (4): 1–22.
Ferguson C. A. 1959. Diglossia. Word 15 (2): 325–40.
Gaizauskas R., and Humphreys K. 2000. A combined IR/NLP approach to question answering against large text collections. In Proceedings of RIAO 2000: Content-Based Multimedia Information Access, Paris, France, pp. 1288–1304.
Habash N., and Rambow O. 2005. Arabic tokenization, part-of-speech tagging and morphological disambiguation in one fell swoop. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, pp. 573–80.
Habash N., Rambow O., and Roth R. 2009. MADA+TOKAN: a toolkit for Arabic tokenization, diacritization, morphological disambiguation, POS tagging, stemming and lemmatization. In Proceedings of the 2nd International Conference on Arabic Language Resources and Tools (MEDAR), Cairo, Egypt, pp. 102–9.
Hammo B., Abu-Salem H., Lytinen S., and Evens M. 2002. QARAB: a question answering system to support the Arabic language. In Workshop on Computational Approaches to Semitic Languages (ACL ’02). Association for Computational Linguistics, pp. 55–68.
Hammo B., Abuleil S., Lytinen S., and Evens M. 2004. Experimenting with a question answering system for the Arabic language. Computers and the Humanities 38 (4): 397–415.
Higashinaka R., and Isozaki H. 2008. Corpus-based question answering for why-questions. In Proceedings of the 3rd International Joint Conference on Natural Language Processing (IJCNLP 2008), Hyderabad, India, pp. 419–25.
Iruskieta M., da Cunha I., and Taboada M. 2014. A qualitative comparison method for rhetorical structures: identifying different discourse structures in multilingual corpora. Language Resources & Evaluation 49 (2): 263–309.
Kanaan G., Hammouri A., Al-Shalabi R., and Swalha M. 2009. A new question answering system for the Arabic language. American Journal of Applied Sciences 6 (4): 797–805.
Keskes I., Zitoune F. B., and Belguith L. H. 2014. Splitting Arabic texts into elementary discourse units. ACM Transactions on Asian Language Information Processing 13 (2): 9:1–9:23.
Khoja S., and Garside R. 1999. Stemming Arabic text. Technical Report, Computing Department, Lancaster University.
Larkey L. S., Ballesteros L., and Connell M. E. 2002. Improving stemming for Arabic information retrieval: light stemming and co-occurrence analysis. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Tampere, Finland, pp. 275–82.
Mann W. C., and Thompson S. A. 1988. Rhetorical structure theory: toward a functional theory of text organization. Text 8 (3): 243–81.
Manning C. D., Raghavan P., and Schütze H. 2008. Introduction to Information Retrieval. Cambridge: Cambridge University Press.
Marcu D. 1997. The Rhetorical Parsing, Summarization, and Generation of Natural Language Texts. PhD Thesis, University of Toronto, Toronto, Canada.
Marcu D. 1998. Improving summarization through rhetorical parsing tuning. In Proceedings of the 6th Workshop on Very Large Corpora, Montreal QC, Canada, pp. 206–15.
Marcu D. 2000. The Theory and Practice of Discourse Parsing and Summarization. Cambridge, MA: MIT Press.
Nakov P., Màrquez L., Magdy W., Moschitti A., Glass J., and Randeree B. 2015. SemEval-2015 task 3: answer selection in community question answering. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval ’15), Denver, Colorado, pp. 269–81.
Nakov P., Màrquez L., Moschitti A., Magdy W., Mubarak H., Freihat A., Glass J., and Randeree B. 2016. SemEval-2016 task 3: community question answering. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval ’16), San Diego, California.
Nakov P., Hoogeveen D., Màrquez L., Moschitti A., Mubarak H., Baldwin T., and Verspoor K. 2017. SemEval-2017 task 3: community question answering. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval ’17), Vancouver, Canada.
Oh J. H., Torisawa K., Hashimoto C., Kawada T., De Saeger S., Kazama J., and Wang Y. 2012. Why-question answering using sentiment analysis and word classes. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Jeju Island, Korea, pp. 368–78.
Oh J. H., Torisawa K., Hashimoto C., Sano M., De Saeger S., and Ohtake K. 2013. Why-question answering using intra- and inter-sentential causal relations. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013), Sofia, Bulgaria, pp. 1733–43.
Peñas A., and Rodrigo A. 2011. A simple measure to assess non-response. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL–HLT ’11), Portland, Oregon, pp. 1415–24.
Peñas A., Hovy E. H., Forner P., Rodrigo Á., Sutcliffe R. F. E., Sporleder C., Forascu C., Benajiba Y., and Osenova P. 2012. Overview of QA4MRE at CLEF 2012: question answering for machine reading evaluation. In CLEF 2012 Evaluation Labs and Workshop, Online Working Notes, Rome, Italy.
Rosso P., Benajiba Y., and Lyhyaoui A. 2006. Towards an Arabic question answering system. In Proceedings of the 4th Conference on Scientific Research Outlook & Technology Development in the Arab World, Syria, pp. 11–14.
Ryding K. C. 2005. A Reference Grammar of Modern Standard Arabic. Cambridge: Cambridge University Press.
Saad M. K., and Ashour W. 2010. OSAC: open source Arabic corpora. In Proceedings of the 6th International Conference on Electrical and Computer Science (EECS’10), Lefke, North Cyprus, pp. 118–23.
Salem Z., Sadek J., Chakkour F., and Haskkour N. 2010. Automatically finding answers to ‘Why’ and ‘How to’ questions for Arabic language. In Setchi R., Jordanov I., Howlett R., and Jain L. (eds.), Knowledge-Based and Intelligent Information and Engineering Systems, pp. 586–93. Lecture Notes in Computer Science, vol. 6279. Berlin Heidelberg: Springer.
Salton G., Wong A., and Yang C. S. 1975. A vector space model for automatic indexing. Communications of the ACM 18 (11): 613–20.
Scott D. R., and de Souza C. S. 1990. Getting the message across in RST-based text generation. In Dale R., Mellish C., and Zock M. (eds.), Current Research in Natural Language Generation, pp. 47–73. San Diego CA: Academic Press Professional Inc.
Seif A., Mathkour H., and Touir A. 2005. An RST computational tool for the Arabic language. In Proceedings of the 7th International Conference on Information Integration and Web-based Applications & Services (iiWAS’05), Kuala Lumpur, Malaysia, pp. 527–34.
Semmar N., Laib M., and Fluhr C. 2006. Using stemming in morphological analysis to improve Arabic information retrieval. In Traitement automatique des Langues naturelles (TALN 2006), Leuven, Belgium, pp. 317–26.
Severyn A., and Moschitti A. 2012. Structural relationships for large-scale learning of answer reranking. In Proceedings of the 35th Annual ACM SIGIR Conference (SIGIR 2012), Portland, Oregon, pp. 741–50.
Severyn A., and Moschitti A. 2015. Learning to rank short text pairs with convolutional deep neural networks. In Proceedings of the 38th Annual ACM SIGIR Conference (SIGIR 2015), Santiago, Chile, pp. 373–82.
Shaheen M., and Ezzeldin A. M. 2014. Arabic question answering: systems, resources, tools, and future trends. Arabian Journal for Science and Engineering 39 (6): 4541–64.
Silberztein M. 2005. NooJ: a linguistic annotation system for corpus processing. In Proceedings of the Conference on Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP 2005), Vancouver BC, Canada.
Taboada M., and Stede M. 2009. Introduction to RST (Rhetorical Structure Theory). Slides available at
Trigui O., Belguith L. H., and Rosso P. 2010. DefArabicQA: Arabic definition question answering system. In Proceedings of the 7th LREC Workshop on Language Resources and Human Language Technologies for Semitic Languages, Valletta, Malta, pp. 40–5.
Tymoshenko K., and Moschitti A. 2015. Assessing the impact of syntactic and semantic structures for answer passages reranking. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management (CIKM 2015), Melbourne, Australia, pp. 1451–60.
Verberne S. 2010. In Search of the Why. PhD Thesis, University of Nijmegen, The Netherlands.
Verberne S., Boves L., Coppen P.-A., and Oostdijk N. 2007. Discourse-based answering of why-questions. Traitement automatique des Langues (TAL) 47 (2): 21–41. Paris, France: Association pour le traitement automatique des langues (ATALA).
Verberne S., Boves L., Oostdijk N., and Coppen P.-A. 2010. What is not in the bag of words for Why-QA? Computational Linguistics 36 (2): 229–45.
Verberne S., van Halteren H., Theijssen D., Raaijmakers S., and Boves L. 2011. Learning to rank for why-question answering. Information Retrieval 14 (2): 107–32.
Webber B. 2004. D-LTAG: extending lexicalized TAG to discourse. Cognitive Science 28 (5): 751–79.
Zhao Y.-M., Xu Z.-M., Guan Y., and Wang X.-L. 2006. An open domain question answering system based on improved system similarity model. In Proceedings of the 5th International Conference on Machine Learning and Cybernetics, Dalian, China, pp. 4521–6.
Natural Language Engineering
  • ISSN: 1351-3249
  • EISSN: 1469-8110
  • URL: /core/journals/natural-language-engineering