Arabic community question answering

Published online by Cambridge University Press:  19 December 2018

PRESLAV NAKOV
Affiliation:
Arabic Language Technologies, Qatar Computing Research Institute, HBKU, HBKU Research Complex, PO box 5825, Doha, Qatar e-mails: pnakov@hbku.edu.qa, hmubarak@hbku.edu.qa
LLUÍS MÀRQUEZ
Affiliation:
Amazon, Carrer de Tànger 76, 08018, Barcelona, Spain e-mail: lluismv@amazon.com
ALESSANDRO MOSCHITTI
Affiliation:
Amazon, 1240 Rosecrans Ave #120, Manhattan Beach, CA 90266, USA e-mail: amosch@amazon.com
HAMDY MUBARAK
Affiliation:
Arabic Language Technologies, Qatar Computing Research Institute, HBKU, HBKU Research Complex, PO box 5825, Doha, Qatar e-mails: pnakov@hbku.edu.qa, hmubarak@hbku.edu.qa

Abstract

We analyze resources and models for Arabic community Question Answering (cQA). In particular, we focus on CQA-MD, our cQA corpus for Arabic in the domain of medical forums. We describe the corpus and the main challenges it poses due to its mix of informal and formal language and of different Arabic dialects, as well as its medical nature. We further present a shared task on cQA at SemEval, the International Workshop on Semantic Evaluation, based on this corpus. We discuss the features and the machine learning approaches used by the teams that participated in the task, with a focus on the models that exploit syntactic information using convolutional tree kernels and neural word embeddings. We further analyze and extend the outcome of the SemEval challenge by training a meta-classifier that combines the output of several systems, which allows us to compare different features and different learning algorithms indirectly. Finally, we analyze the most frequent errors common to all approaches, categorizing them into prototypical cases and examining how syntactic information in tree kernel approaches can help solve some of the most difficult cases. We believe that our analysis and the lessons learned from corpus creation and from the shared task will be helpful for future research on Arabic cQA.

Information

Type
Article
Copyright
Copyright © Cambridge University Press 2018 
