
Improving neural machine translation by integrating transliteration for low-resource English–Assamese language

Published online by Cambridge University Press:  27 May 2024

Basab Nath
Affiliation:
Department of Computer Science and Engineering, Assam University, Silchar, India
Sunita Sarkar*
Affiliation:
Department of Computer Science and Engineering, Assam University, Silchar, India
Somnath Mukhopadhyay
Affiliation:
Department of Computer Science and Engineering, Assam University, Silchar, India
Arindam Roy
Affiliation:
Department of Computer Science, Assam University, Silchar, India
*
Corresponding author: Sunita Sarkar; Email: sarkarsunita2601@gmail.com

Abstract

In machine translation (MT), one challenging task is translating proper nouns and technical terms from the source language to the target language while preserving the phonetic equivalent of the original term. Machine transliteration, an essential component of MT systems, plays a vital role in handling proper nouns and technical terms. In this paper, a hybrid attention-based encoder–decoder machine transliteration system is proposed for the low-resource English–Assamese language pair. The proposed transliteration system is integrated with a previously published hybrid attention-based encoder–decoder neural MT model to improve the quality of English-to-Assamese translation. The integrated MT system demonstrated good results across various performance metrics, including BLEU, sacreBLEU, METEOR, chrF, RIBES, and TER. Additionally, a human evaluation was conducted to assess translation quality. The integrated system was also compared with two existing systems: the Bing translation service model and the Samanantar Indic translation model.
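As a toy illustration of the integration idea described above (not the paper's neural models): words covered by the translation component pass through it, while out-of-vocabulary tokens such as proper nouns are routed through a transliteration fallback. The lexicon and character map below are hypothetical stand-ins for the trained translation and transliteration models.

```python
# Hypothetical stand-in for a trained translation model's vocabulary.
TRANSLATION_LEXICON = {
    "river": "নদী",   # common word: handled by the translation component
    "flows": "বয়",
}

# Crude character-to-grapheme map standing in for the neural
# encoder–decoder transliteration model for proper nouns.
TRANSLITERATION_MAP = {
    "b": "ব", "r": "ৰ", "a": "া", "h": "হ", "m": "ম",
    "p": "প", "u": "ু", "t": "ত",
}

def transliterate(word: str) -> str:
    """Fallback for out-of-vocabulary tokens: map each character to an
    approximate Assamese grapheme (a stand-in for the neural model)."""
    return "".join(TRANSLITERATION_MAP.get(ch, ch) for ch in word.lower())

def translate(sentence: str) -> str:
    """Route in-vocabulary words through 'translation' and OOV words
    (e.g. proper nouns like 'Brahmaputra') through 'transliteration'."""
    out = []
    for word in sentence.split():
        key = word.lower()
        if key in TRANSLATION_LEXICON:
            out.append(TRANSLATION_LEXICON[key])
        else:
            out.append(transliterate(word))
    return " ".join(out)
```

In the real system both branches are learned models; the routing decision is the part this sketch is meant to convey.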

Information

Type
Article
Creative Commons
Creative Commons License - CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press
Figure 1. Long short-term memory (LSTM) architecture (Hochreiter and Schmidhuber, 1997).

Figure 2. Gated recurrent unit (GRU) architecture (Cho et al., 2014).

Figure 3. Transformer architecture (Vaswani et al., 2017).

Table 1. Comparison of LSTM, GRU, and transformer

Figure 4. Working of BPE solving OOV issue.
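The figure above concerns byte-pair encoding (BPE) resolving out-of-vocabulary (OOV) tokens. As an illustrative sketch only (the paper's actual tokenizer settings and merge counts are not reproduced here), BPE learns frequent symbol-pair merges from training words and then applies them to segment unseen words into known subwords:

```python
from collections import Counter

def learn_bpe(words, num_merges):
    """Learn BPE merges from a word-frequency dict.
    A minimal sketch of the byte-pair-encoding idea."""
    vocab = {tuple(w) + ("</w>",): f for w, f in words.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        new_vocab = {}
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] = freq
        vocab = new_vocab
    return merges

def segment(word, merges):
    """Apply learned merges to a new (possibly OOV) word, so it is
    represented by known subwords instead of an <unk> token."""
    symbols = list(word) + ["</w>"]
    for a, b in merges:
        out, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and symbols[i] == a and symbols[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols
```

For example, merges learned from "low" and "lower" let an unseen word like "lowest" be segmented into the known subword "low" plus character pieces, which is how BPE sidesteps the OOV issue.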

Table 2. Subword tokens

Figure 5. Encoder–decoder architecture for transliteration model.

Figure 6. Proposed integrated translation–transliteration system architecture.

Table 3. Corpus statistics for translation model

Table 4. Corpus statistics for transliteration model

Table 5. Parameters for LSTM and GRU-based translation and transliteration models

Table 6. Parameters for transformer-based translation model

Table 7. Performance of transliteration model

Table 8. Performance of distinct models for English to Assamese translation

Figure 7. Epoch vs loss graph – training progress and error minimization.

Figure 8. Bar chart for performance comparison I.

Table 9. Sample outputs from standalone transliteration model

Table 10. Sample translation by integrated system without transliteration

Table 11. Sample translation by integrated system with transliteration

Table 12. Performance comparison between integrated system and existing models

Table 13. Human evaluation scores for English to Assamese translations

Table 14. Sample output from MSFT system

Table 15. Sample output from Samanantar system

Figure 9. Bar chart for performance comparison II.

Table 16. Sample output from integrated system