Utilizing lexical data from a Web-derived corpus to expand productive collocation knowledge

Published online by Cambridge University Press: 01 January 2010

Shaoqun Wu*
Affiliation:
Computer Science Department, University of Waikato, New Zealand (email: shaoqun@cs.waikato.ac.nz; ihw@cs.waikato.ac.nz)
Ian H. Witten*
Affiliation:
Computer Science Department, University of Waikato, New Zealand (email: shaoqun@cs.waikato.ac.nz; ihw@cs.waikato.ac.nz)
Margaret Franken*
Affiliation:
School of Education, University of Waikato, New Zealand (email: franken@waikato.ac.nz)

Abstract

Collocations are of great importance for second language learners, and a learner’s knowledge of them plays a key role in producing language fluently (Nation, 2001: 323). In this article we describe and evaluate an innovative system that uses a Web-derived corpus and digital library software to produce a vast concordance and present it in a way that helps students use collocations more effectively in their writing. Instead of live search we use an off-line corpus of short sequences of words, along with their frequencies. They are preprocessed, filtered, and organized into a searchable digital library collection containing 380 million five-word sequences drawn from a vocabulary of 145,000 words. Although the phrases are short, learners can browse more extended contexts because the system automatically locates sample sentences that contain them, either on the Web or in the British National Corpus. Two evaluations were conducted: an expert user tested the system to see if it could generate suitable alternatives for given text fragments, and students used it for a particular exercise. Both suggest that, even within the constraints of a limited study, the system could and did help students improve their writing.
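The abstract's pipeline (word sequences with frequencies that are filtered, then organized for searching) can be illustrated with a minimal sketch. This is not the authors' implementation, which the abstract says is built on digital library software. The sample records, the vocabulary, the `min_count` threshold, and all function names below are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical (five-word sequence, frequency) records; illustrative only.
NGRAMS = [
    ("make a decision about the", 120),
    ("make a decision on the", 90),
    ("take a decision on the", 30),
    ("reach a decision about the", 15),
    ("make a decsion about the", 2),  # misspelled and rare: should be filtered out
]

# Hypothetical vocabulary standing in for the system's word list.
VOCABULARY = {"make", "take", "reach", "a", "decision", "about", "on", "the"}


def filter_ngrams(ngrams, vocabulary, min_count=5):
    """Keep sequences whose words are all in the vocabulary and whose count is at least min_count."""
    kept = []
    for text, count in ngrams:
        words = text.split()
        if count >= min_count and all(w in vocabulary for w in words):
            kept.append((words, count))
    return kept


def build_index(filtered):
    """Map each word to the sequences that contain it, so lookups avoid a full scan."""
    index = defaultdict(list)
    for words, count in filtered:
        for word in set(words):
            index[word].append((words, count))
    return index


def preceding_words(index, target, offset=2):
    """Sum the frequencies of words found `offset` positions before `target`, most frequent first."""
    totals = defaultdict(int)
    for words, count in index.get(target, []):
        for i, w in enumerate(words):
            if w == target and i >= offset:
                totals[words[i - offset]] += count
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


index = build_index(filter_ngrams(NGRAMS, VOCABULARY))
print(preceding_words(index, "decision"))
```

On this toy data, the lookup ranks "make" ahead of "take" and "reach" as the verb two positions before "decision", which is the kind of frequency-ordered alternative a learner could browse.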

Information

Type
Research Article
Copyright
Copyright © European Association for Computer Assisted Language Learning 2010
