
Non-native text analysis: A survey

Published online by Cambridge University Press: 07 September 2015

SEAN MASSUNG
CHENGXIANG ZHAI
Department of Computer Science, College of Engineering, University of Illinois at Urbana–Champaign, Urbana, Illinois, USA
e-mail: massung1@illinois.edu, czhai@illinois.edu

Abstract

Non-native speakers of English far outnumber native speakers; English is the main language of books, newspapers, airports, air-traffic control, international business, academic conferences, science, technology, diplomacy, sports, international competitions, pop music, and advertising (British Council 2014). Online education in the form of massive open online courses (MOOCs) is also conducted primarily in English, even in courses that teach English. This produces enormous amounts of text written by non-native speakers, which in turn creates a need for grammar correction and analysis. Even setting MOOCs aside, the number of English learners in Asia alone is in the tens of millions. In this paper, we survey the two main areas of existing work on non-native text analysis, beginning with an overview of the datasets commonly used by researchers and a comparison of their attributes and potential uses. We then introduce native language identification: determining an author's native language from text the author wrote in a second language. This section covers the main techniques and a shared task on this classification problem. Next, we discuss non-native grammatical error correction: finding and modifying text to fix errors or to make it sound more fluent. Again, we describe different methods before examining a relevant shared task. Finally, we present conclusions and potential future directions. Although this survey focuses primarily on detecting and correcting non-native English text, many of the approaches are general and can be applied to any language pairing.

Type: Survey Paper
Copyright © Cambridge University Press 2015
