
Maghrebi dialects – Arabic bidirectional translation: an improved transformer with transfer learning

Published online by Cambridge University Press:  10 March 2026

Jihad R’baiti*
Affiliation:
International Artificial Intelligence Center of Morocco, University Mohammed VI Polytechnic, Rabat, Morocco
Youssef Hmamouche
Affiliation:
International Artificial Intelligence Center of Morocco, University Mohammed VI Polytechnic, Rabat, Morocco
Amal El Fallah Seghrouchni
Affiliation:
International Artificial Intelligence Center of Morocco, University Mohammed VI Polytechnic, Rabat, Morocco; LIP6 - UMR 7606 CNRS, Sorbonne University, Paris, France
Corresponding author: Jihad R’baiti; Email: jihad.rbaiti@um6p.ma

Abstract

Neural Machine Translation (NMT), a subfield of Natural Language Processing, has seen significant advances with the emergence of transformer architectures and generative artificial intelligence, demonstrating remarkable performance across many languages. Translating Arabic dialects, however, remains a notable challenge, primarily because of their morphological complexity and their divergence from standardised grammatical rules. In this paper, we present a hybrid approach for translating the Maghrebi dialects into and from Modern Standard Arabic (MSA). The approach combines the strengths of the transformer architecture with the BERT language model for transfer learning of representations: we incorporate BERT embeddings into the encoder and decoder stacks of the transformer. The BERT model we use was trained in a self-supervised manner on Maghrebi dialect and Arabic corpora. On raw data, the approach achieved BLEU/BERTScore/ChrF/METEOR scores of 14.148/79.414/28.885/28.428 in one translation direction and BLEU/ChrF/METEOR scores of 8.961/20.994/19.465 in the other, demonstrating competitive performance compared with the ChatGPT and Gemini Large Language Models (LLMs). Furthermore, we evaluated the approach using an ablation study with fine-tuned NLLB-200 and against three tokeniser techniques used in conjunction with the transformer architecture: the Byte-Pair Encoding (BPE) tokeniser, the WordPiece tokeniser, and the BERT tokeniser. Both evaluations, together with a human evaluation, confirm the efficacy of our method.
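To make the integration concrete, the following PyTorch sketch feeds the contextual hidden states of a pretrained BERT model into the encoder and decoder stacks of a standard transformer, in place of learned token embeddings and sinusoidal positional encodings. This is a minimal illustration under our own assumptions, not the authors' implementation: the checkpoint path, layer counts, and the decision to freeze BERT are placeholders.

```python
# Minimal sketch (not the authors' code): BERT hidden states replace the
# learned token embeddings of a standard transformer encoder-decoder.
# "path/to/dialect-bert" is a hypothetical checkpoint path.
import torch
import torch.nn as nn
from transformers import AutoModel


class BertTransformerSketch(nn.Module):
    def __init__(self, bert_name="path/to/dialect-bert", vocab_size=32000):
        super().__init__()
        self.bert = AutoModel.from_pretrained(bert_name)
        d_model = self.bert.config.hidden_size
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=8,
            num_encoder_layers=6, num_decoder_layers=6,
            batch_first=True,
        )
        # vocab_size must match the tokeniser used to produce input_ids.
        self.generator = nn.Linear(d_model, vocab_size)

    def embed(self, ids, mask):
        # BERT's outputs already encode position, so no separate
        # positional-encoding module is added.
        with torch.no_grad():  # frozen BERT; unfreezing is the other option
            return self.bert(input_ids=ids, attention_mask=mask).last_hidden_state

    def forward(self, src_ids, src_mask, tgt_ids, tgt_mask):
        src = self.embed(src_ids, src_mask)
        tgt = self.embed(tgt_ids, tgt_mask)
        causal = self.transformer.generate_square_subsequent_mask(
            tgt.size(1)
        ).to(tgt.device)
        out = self.transformer(
            src, tgt, tgt_mask=causal,
            src_key_padding_mask=src_mask.eq(0),  # 1 = token, 0 = padding
            tgt_key_padding_mask=tgt_mask.eq(0),
        )
        return self.generator(out)  # per-token logits over the target vocabulary
```

In the paper's ablation (Tables 3-5 below), the BERT embeddings are placed in the encoder only, the decoder only, or both, and trained with and without parameter freezing; the sketch above corresponds to a both-sides, frozen configuration.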

Information

Type
Article
Creative Commons
Creative Commons License - CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2026. Published by Cambridge University Press

Figure 1. The BERT-TransDial architecture: integration of a fine-tuned BERT-based embedding layer into a Transformer model, replacing the default token embedding and positional encoding components.
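Figure 1's caption describes a narrower swap than feeding full BERT hidden states: reusing only BERT's embedding block (word plus position embeddings) in place of the transformer's default token embedding and positional encoding. A hedged sketch of that variant, again with a hypothetical checkpoint path:

```python
# Variant sketch of Figure 1's embedding swap (illustrative only):
# BERT's embedding block stands in for nn.Embedding + positional encoding.
from transformers import AutoModel

bert = AutoModel.from_pretrained("path/to/dialect-bert")  # hypothetical path
embed = bert.embeddings  # word + position (+ token-type) embeddings with LayerNorm


def embed_tokens(input_ids):
    # Positional information is built into BERT's embedding block, so the
    # downstream transformer needs no separate positional-encoding module.
    return embed(input_ids=input_ids)
```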


Table 1. Quantitative results on the test set with preprocessed data using BLEU, $\text{P}_{\text{BERT}}$, ChrF, and METEOR metrics for both translation directions


Table 2. Quantitative results on the test set with raw data using BLEU, $\text{P}_{\text{BERT}}$, ChrF, and METEOR metrics for both translation directions
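For context on the metrics in Tables 1 and 2, scores of this kind can be computed with common open-source packages; the sketch below uses sacrebleu (BLEU, ChrF), nltk (METEOR), and bert-score ($\text{P}_{\text{BERT}}$). The choice of packages and the corpus-level aggregation are our assumptions for illustration, not necessarily the authors' evaluation pipeline.

```python
# Illustrative scoring only; not the paper's evaluation script.
# Requires: pip install sacrebleu nltk bert-score, plus nltk's wordnet data.
from sacrebleu.metrics import BLEU, CHRF
from nltk.translate.meteor_score import meteor_score
from bert_score import score as bert_score

hyps = ["..."]  # system translations (placeholders)
refs = ["..."]  # reference translations

bleu = BLEU().corpus_score(hyps, [refs]).score
chrf = CHRF().corpus_score(hyps, [refs]).score
# nltk's METEOR expects pre-tokenised input (nltk >= 3.6); averaged here.
meteor = sum(meteor_score([r.split()], h.split())
             for r, h in zip(refs, hyps)) / len(hyps)
# BERTScore precision, i.e. the P_BERT column.
p_bert, _, _ = bert_score(hyps, refs, lang="ar")
print(bleu, chrf, meteor, p_bert.mean().item())
```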


Figure 2. Training and validation losses for MSA to Maghrebi dialects (a) and Maghrebi dialects to MSA (b) on raw data.


Table 3. Results of the ablation study (BERT’s Embedding Position) on the test set for preprocessed data


Table 4. Results of the ablation study (BERT’s Embedding Position) on the test set for raw data


Table 5. Results of the ablation study (with and without parameter freezing) on the test set for raw data
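The with/without-freezing comparison in Table 5 amounts to toggling gradient updates on the BERT parameters during training; a minimal illustrative helper (not the authors' code):

```python
def set_bert_frozen(bert, frozen: bool) -> None:
    """Enable or disable gradient updates for all BERT parameters."""
    for p in bert.parameters():
        p.requires_grad = not frozen
```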


Table 6. Qualitative translation outputs from Maghrebi dialects to MSA generated by the proposed BERT-TransDial model and by the ablation variants with BERT in the encoder only and BERT in the decoder only. The table includes source inputs, reference translations, and the corresponding outputs for model comparison


Table 7. Qualitative translation outputs from MSA to Maghrebi dialects generated by the proposed BERT-TransDial model and by the ablation variants with BERT in the encoder only and BERT in the decoder only. The table includes source inputs, reference translations, and the corresponding outputs for model comparison


Table 8. Average results of the pilot rating experiment for BERT-TransDial


Figure 3. Visualisation of the pilot rating experiment: each bar shows a participant's rating, on a scale of 1 to 7, of the translation quality produced by BERT-TransDial for Maghrebi dialect and MSA translations.


Table 9. Human evaluation scores – Ranking experiment


Figure 4. Quantitative metrics as a function of sentence length for (a) translation from Maghrebi dialects to MSA and (b) translation from MSA to Maghrebi dialects, using raw data.
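A per-length breakdown like Figure 4's can be approximated by bucketing sentence pairs by reference length and scoring each bucket separately; the bucket width and the use of sacrebleu below are our assumptions:

```python
from collections import defaultdict

from sacrebleu.metrics import BLEU


def bleu_by_length(hyps, refs, width=5):
    """Corpus BLEU per reference-length bucket (bucket width in tokens)."""
    buckets = defaultdict(lambda: ([], []))
    for h, r in zip(hyps, refs):
        b = (len(r.split()) // width) * width  # e.g. lengths 0-4 -> 0, 5-9 -> 5
        buckets[b][0].append(h)
        buckets[b][1].append(r)
    return {b: BLEU().corpus_score(h, [r]).score
            for b, (h, r) in sorted(buckets.items())}
```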