Arabic spelling error detection and correction

Published online by Cambridge University Press:  18 March 2015

MOHAMMED ATTIA
Affiliation:
School of Computing, Dublin City University, Ireland, e-mail: mattia@computing.dcu.ie, josef@computing.dcu.ie
Faculty of Engineering and IT, The British University in Dubai, UAE, e-mail: khaled.shaalan@buid.ac.ae
PAVEL PECINA
Affiliation:
Faculty of Mathematics and Physics, Charles University in Prague, Czech Republic e-mail: pecina@ufal.mff.cuni.cz
YOUNES SAMIH
Affiliation:
Department of Linguistics and Information Science, Heinrich-Heine-Universität Düsseldorf, Germany e-mail: samih@phil.uni-duesseldorf.de
KHALED SHAALAN
Affiliation:
Faculty of Engineering and IT, The British University in Dubai, UAE e-mail: khaled.shaalan@buid.ac.ae
JOSEF VAN GENABITH
Affiliation:
School of Computing, Dublin City University, Ireland, e-mail: mattia@computing.dcu.ie, josef@computing.dcu.ie

Abstract

A spelling error detection and correction application is typically based on three main components: a dictionary (or reference word list), an error model and a language model. While most of the attention in the literature has been directed to the language model, we show how improvements in any of the three components can lead to significant cumulative improvements in the overall performance of the system. We develop our dictionary of 9.2 million fully-inflected Arabic words (types) from a morphological transducer and a large corpus, validated and manually revised. We improve the error model by analyzing error types and creating an edit distance re-ranker. We also improve the language model by analyzing the level of noise in different data sources and selecting an optimal subset to train the system on. Testing and evaluation experiments show that our system significantly outperforms Microsoft Word 2013, OpenOffice Ayaspell 3.4 and Google Docs.
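
The three components named in the abstract can be illustrated with a minimal sketch. This is not the authors' system: the word list, the unigram counts, and the names `edit_distance`, `detect` and `correct` are all hypothetical, the error model is plain Levenshtein distance rather than the paper's edit distance re-ranker, and the language model is a bare unigram frequency table. Latin-script toy words are used only to keep the example easy to read.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


# Component 1: dictionary (toy stand-in for a large reference word list).
DICTIONARY = {"letter", "latter", "later", "better", "litter"}

# Component 3: language model (toy unigram counts, invented for illustration).
UNIGRAM_COUNTS = Counter(
    {"letter": 50, "latter": 20, "later": 80, "better": 90, "litter": 5}
)


def detect(word: str) -> bool:
    """Flag a word as a spelling error if it is missing from the dictionary."""
    return word not in DICTIONARY


def correct(word: str, max_distance: int = 2) -> list[str]:
    """Return ranked correction candidates for a word.

    Component 2 (error model): keep dictionary words within max_distance edits.
    Candidates are ordered by edit distance, then by unigram count.
    """
    if not detect(word):
        return [word]
    scored = [(edit_distance(word, cand), cand) for cand in DICTIONARY]
    kept = [(dist, cand) for dist, cand in scored if dist <= max_distance]
    kept.sort(key=lambda dc: (dc[0], -UNIGRAM_COUNTS[dc[1]], dc[1]))
    return [cand for _, cand in kept]


if __name__ == "__main__":
    print(correct("lettr"))
```

In this sketch, a better dictionary changes which words `detect` flags, a better error model changes which candidates survive and how they are ordered, and a better language model changes how ties are broken, which is one way to picture how gains in each component could add up.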

Information

Type
Articles
Copyright
Copyright © Cambridge University Press 2015 
