
2 - Planning the construction of a corpus

Published online by Cambridge University Press: 03 December 2009

Charles F. Meyer
Affiliation:
University of Massachusetts, Boston

Summary

Before the texts to be included in a corpus are collected, annotated, and analyzed, it is important to plan the construction of the corpus carefully: what size it will be, what types of texts will be included in it, and what population will be sampled to supply the texts that will make up the corpus. Ultimately, decisions concerning the composition of a corpus will be determined by the planned uses of the corpus. If, for instance, the corpus is to be used primarily for grammatical analysis (e.g. the analysis of relative clauses or the structure of noun phrases), it can consist simply of text excerpts rather than complete texts. On the other hand, if the corpus is intended to permit the study of discourse features, then it will have to contain complete texts.

Deciding how lengthy text samples within a corpus should be is but one of the many methodological considerations that must be addressed before one begins collecting data for inclusion in a corpus. To explore the process of planning a corpus, this chapter will consider the methodological assumptions that guided the compilation of the British National Corpus. Examining the British National Corpus reveals how current corpus planners have overcome the methodological deficiencies of earlier corpora, and raises more general methodological considerations that anyone planning to create a corpus must address.

The British National Corpus

At approximately 100 million words in length, the British National Corpus (BNC) (see table 2.1) is one of the largest corpora ever created.

Type: Chapter
Information: English Corpus Linguistics: An Introduction, pp. 30-54
Publisher: Cambridge University Press
Print publication year: 2002
