Improved bidirectional attention flow (BIDAF) model for Arabic machine reading comprehension

Published online by Cambridge University Press:  31 October 2024

Mariam M. Biltawi*
Affiliation:
School of Computing and Informatics, Al Hussein Technical University, Amman, Jordan
Arafat Awajan
Affiliation:
Computer Science Department, Princess Sumaya University for Technology, Amman, Jordan
Sara Tedmori
Affiliation:
Computer Science Department, Princess Sumaya University for Technology, Amman, Jordan
*Corresponding author: Mariam M. Biltawi; Email: mariam.biltawi@htu.edu.jo

Abstract

Machine reading comprehension (MRC) refers to the task of teaching machines to comprehend a provided text and answer questions about it. There are two primary approaches: extracting answers directly from the text or predicting them. Extracting an answer means predicting the span of text that contains it, identified by its start and end indices within the paragraph. Despite the increasing interest in MRC, research on Arabic remains limited by several challenges. A significant impediment is the scarcity of resources for Arabic text, which hinders the development of effective models. Furthermore, the inherent complexity of Arabic, with its classical, modern standard, and colloquial forms, poses distinctive hurdles for language comprehension tasks. This paper proposes an enhanced version of the bidirectional attention flow (BIDAF) model for Arabic MRC, built on the Arabic Span-Extraction-based Reading Comprehension Benchmark (ASER). ASER comprises 10,000 question-answer-passage triples, partitioned into a training set (90% of the data) and a testing set (the remaining 10%). Introducing a new input feature based on parts-of-speech (POS) word embeddings and replacing the bidirectional long short-term memory (bi-LSTM) layers with bidirectional gated recurrent units (bi-GRU) yielded significant improvements. Eight different POS word embeddings were generated using both the Continuous Bag of Words (CBOW) and Skip-gram methods, with varying dimensionalities. Model performance was assessed with exact match (EM) and F1-measure, with emphasis on the latter as the more accurate measure. The proposed enhanced BIDAF model achieved an accuracy of 75.22% on the ASER dataset, demonstrating its efficacy in Arabic MRC tasks. Additionally, a two-tailed paired-samples t-test further validated the findings, supporting the significance of the proposed enhancements for Arabic language processing.
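
To make the span-extraction setting and the two evaluation metrics concrete, the following is a minimal sketch of how EM and a token-level F1 are commonly computed for an extracted span. It is not the authors' evaluation script: the function names, the whitespace tokenization, the absence of any Arabic-specific normalization, and the English toy passage are all assumptions made for illustration.

```python
from collections import Counter


def extract_span(passage_tokens, start, end):
    """Return the answer span given inclusive start and end token indices."""
    return " ".join(passage_tokens[start:end + 1])


def exact_match(prediction, reference):
    """1.0 if the predicted and reference token sequences are identical, else 0.0."""
    return float(prediction.split() == reference.split())


def f1_measure(prediction, reference):
    """Token-level F1: harmonic mean of precision and recall over shared tokens."""
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    if not pred_tokens or not ref_tokens:
        # Two empty answers match; one empty answer scores zero.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often as it
    # appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy example (English tokens for readability).
passage = "Amman is the capital of Jordan".split()
predicted = extract_span(passage, 3, 5)   # "capital of Jordan"
reference = "the capital of Jordan"

em = exact_match(predicted, reference)
f1 = f1_measure(predicted, reference)
print(f"EM = {em:.2f}, F1 = {f1:.4f}")
```

In this toy case the prediction misses one reference token, so EM drops to zero while F1 still credits the three overlapping tokens. This illustrates why the two measures can diverge on near-miss spans.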

Information

Type
Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press
Figure 1. Examples from ASER.

Figure 2. Improved-BIDAF Architecture.

Figure 3. AraBERT BIDAF Architecture.

Figure 4. Example of EM and F1 measures.

Table 1. EM after adding POS word embedding layer to the BIDAF model

Table 2. F1-measure after adding POS word embedding layer to the BIDAF model

Table 3. EM after adding POS word embedding layer and replacing bi-LSTM with bi-GRU

Table 4. F1-measure after adding POS word embedding layer and replacing bi-LSTM with bi-GRU

Table 5. F1-measure for the fine-tuned AraBERT BIDAF model

Table 6. The models selected for comparison with the highest results

Table 7. t-test statistics results

Table 8. EM according to evaluation label

Table 9. F1-measure according to evaluation label

Table 10. EM according to domain

Table 11. F1-measure according to domain

Table 12. EM and F1 measures for one record having different lengths

Table 13. Comparison of improved-BIDAF with other models

Figure 5. Example from ASER: Long passage, long question with long answer.

Figure 6. Example from ASER: Long passage, long question with short answers.

Figure 7. Example from ASER: Long passage, short question with long answer.

Figure 8. Example from ASER: Long passage, short question with short answer.

Figure 9. Example from ASER: Short passage, long question with long answer.

Figure 10. Example from ASER: Short passage, long question with short answer.

Figure 11. Example from ASER: Short passage, short question with short answer.

Figure 12. Example from ASER: Short passage, short question with long answer.