As noted in chapter 1, the four languages analyzed in the present study differ in nearly every conceivable way: in their geographic locations, language families, cultural settings, social histories, and characteristic speech events and literacy events. The present chapter provides further details about these differences. While it is not intended as a complete ethnography of the language situations, the chapter does provide sufficient background to interpret the sociolinguistic patterns described in following chapters.
Several of the differences among these four languages can be related to a continuum of institutional development, comprising considerations such as the synchronic range of written registers, the social history of literacy, the range of spoken registers, the number of speakers, and the extent to which the language is associated with a bounded community in a particular geographic location. English is near one extreme of this continuum, having a long history of literacy, a very wide range of both spoken and written registers, a very large number of speakers, and a wide distribution as both a first and second language across many countries. Nukulaelae Tuvaluan is near the opposite extreme on this continuum: it has a relatively short history of literacy, very few written registers, few spoken registers, few speakers, and a speech community that is restricted primarily to a single atoll in the Pacific.
Korean and Somali are intermediate along this continuum, and they also show that these characteristics are not perfectly correlated.
The preceding chapters have shown that registers can be described and compared with respect to their linguistic characteristics, from both synchronic and diachronic perspectives. However, registers are identified and defined in situational rather than in linguistic terms. That is, even though register distinctions have strong linguistic correlates, they are defined on the basis of situational characteristics such as the relations among participants, the production circumstances, and the major purposes and goals of communication (see Ferguson 1983; Biber 1994).
Since registers have a primary situational basis, they are not equally well defined in their linguistic characteristics: there are considerable linguistic differences among texts within some registers, while other registers have quite focused norms with relatively small differences among texts. As discussed in section 5.7.3, a wide range of internal variation does not invalidate a register category; rather, this internal range of variation is simply an important descriptive fact about the register. For this reason, a complete quantitative linguistic description of a register must include measures of the variability among texts within the register (e.g., standard deviation or range) as well as measures of the central tendency (the mean).
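The point about reporting variability alongside central tendency can be illustrated with a minimal Python sketch. The scores below are invented for illustration and are not taken from the study; the function name `describe_register` is likewise hypothetical.

```python
from statistics import mean, stdev

def describe_register(scores):
    """Return central tendency and spread for one register's dimension scores."""
    return {
        "mean": mean(scores),
        "sd": stdev(scores),                 # sample standard deviation
        "range": max(scores) - min(scores),  # distance between extreme texts
    }

# Hypothetical dimension scores for the texts in two registers (invented data).
focused = [10.0, 11.0, 9.0, 10.0]   # texts cluster tightly around the norm
diffuse = [2.0, 18.0, 10.0, 10.0]   # texts differ widely from one another

focused_stats = describe_register(focused)
diffuse_stats = describe_register(diffuse)
```

In this sketch both registers have the same mean, so only the standard deviation and range reveal that one has focused norms and the other a wide range of internal variation.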
Internal variation within English registers
Table 9.1 presents descriptive statistics for thirteen English registers, including both the mean and the standard deviation. The analyses in chapter 6 show that the mean dimension scores are important and statistically significant predictors of the differences among registers.
Requirements of a comprehensive analytical framework for studies of register variation
A comprehensive analysis of register variation in a language must be based on an adequate sampling of registers, texts, and linguistic features:
registers: the full range of the registers in the language should be included, representing the range of situational variation;
texts: a representative sampling of texts from each register should be included;
linguistic features: a wide range of linguistic features should be analyzed in each text, representing multiple underlying parameters of variation.
Although there have been many important register studies, few previous analyses of register variation are comprehensive in this sense.
The restrictions of previous research, and the need for more comprehensive analyses, have been noted by several scholars. Characterizing the state of research in 1974, Hymes notes that ‘the fact that present taxonomic dimensions consist so largely of dichotomies – restricted vs. elaborated codes,… standard vs. non-standard speech, formal vs. informal scenes, literacy vs. illiteracy – shows how preliminary is the stage at which we work’ (1974: 41). This state of affairs was still largely true in the 1980s. Thus, Schafer (1981: 12) finds it ‘frustrating’ that although previous studies ‘are based on texts produced in particular circumstances by only a few subjects… speaking and writing in only one situation, this doesn't prevent researchers from offering their results as accurate generalizations of universal difference between speaking and writing’.
Overview of methodology in the Multi-Dimensional approach
The four languages compared in the present book have each been analyzed using the MD approach, following the same methodological steps:
Texts were collected, transcribed (in the case of spoken texts), and entered into a computer. The situational characteristics of each spoken and written register were noted during data collection.
Grammatical research was conducted to identify the range of linguistic features to be included in the analysis, together with functional associations of individual features.
Computer programs were developed for automated grammatical analysis, to ‘tag’ all relevant linguistic features in texts.
The entire corpus of texts was tagged automatically by computer, and all texts were post-edited interactively to ensure that the linguistic features were accurately identified.
Additional computer programs were developed and run to compute frequency counts of each linguistic feature in each text of the corpus.
The co-occurrence patterns among linguistic features were analyzed, using a factor analysis of the frequency counts.
The ‘factors’ from the factor analysis were interpreted functionally as underlying dimensions of variation.
Dimension scores for each text with respect to each dimension were computed; the mean dimension scores for each register were then compared to analyze the salient linguistic similarities and differences among spoken and written registers.
In the present chapter, I discuss each of these methodological steps for each of the four languages. I first briefly describe the text corpora, linguistic features, and computational/statistical techniques used in the MD analysis of each language.
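The final steps listed above (from frequency counts to register means) can be sketched in Python. This is a minimal illustration under stated assumptions, not the programs used in the study: the per-text rates are invented, the feature groupings `POSITIVE` and `NEGATIVE` are placeholders for the output of a factor analysis (which is not performed here), and the scoring rule (sum of standardized rates for positive features minus the sum for negative features) is an assumed way of computing a dimension score.

```python
from statistics import mean, pstdev

def standardize(rates_by_text):
    """Convert each feature's per-text rates to z-scores across the whole corpus."""
    features = list(next(iter(rates_by_text.values())))
    z = {text: {} for text in rates_by_text}
    for feature in features:
        values = [rates[feature] for rates in rates_by_text.values()]
        mu, sigma = mean(values), pstdev(values)
        for text, rates in rates_by_text.items():
            # A feature with no variation carries no information: score it 0.
            z[text][feature] = (rates[feature] - mu) / sigma if sigma else 0.0
    return z

def dimension_score(z_scores, positive, negative):
    """Assumed scoring rule: positive-feature z-scores minus negative ones."""
    return sum(z_scores[f] for f in positive) - sum(z_scores[f] for f in negative)

def register_means(scores, register_of):
    """Average the per-text dimension scores within each register."""
    grouped = {}
    for text, score in scores.items():
        grouped.setdefault(register_of[text], []).append(score)
    return {register: mean(values) for register, values in grouped.items()}

# Invented rates per 1,000 words; text and register names are hypothetical.
rates = {
    "conv1": {"first_person": 60.0, "nouns": 150.0},
    "conv2": {"first_person": 50.0, "nouns": 170.0},
    "acad1": {"first_person": 5.0, "nouns": 300.0},
    "acad2": {"first_person": 10.0, "nouns": 280.0},
}
register_of = {"conv1": "conversation", "conv2": "conversation",
               "acad1": "academic", "acad2": "academic"}

POSITIVE = ["first_person"]  # placeholder grouping, not a factor-analysis result
NEGATIVE = ["nouns"]         # placeholder grouping, not a factor-analysis result

z = standardize(rates)
scores = {text: dimension_score(z[text], POSITIVE, NEGATIVE) for text in z}
means = register_means(scores, register_of)
```

Standardizing before summing keeps a very frequent feature such as nouns from swamping a rarer one; each feature then contributes in units of its own corpus-wide variation.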
The preceding chapters have shown that the patterns of linguistic variation among texts are strikingly similar across languages. These cross-linguistic similarities hold at several levels:
for the co-occurrence patterns among linguistic features, and the ways in which features function together as underlying dimensions;
for the synchronic relations among registers within a language, including the multidimensional characteristics of particular registers, and the patterns of variation among registers;
for the diachronic patterns of change within and among registers, in response to changing communicative needs and circumstances following the introduction of written registers in a society;
for the text types that are well defined linguistically.
Given the strength of these patterns, it is reasonable to hold out the possibility of cross-linguistic universals governing the patterns of discourse variation across registers and text types. However, considerable additional research is required to determine which of the cross-linguistic generalizations identified in the present study are valid as universals, and to formulate more precise statements of those relationships.
One of the most important issues requiring further investigation is the role of literacy in defining the synchronic and diachronic patterns of register variation. From a synchronic perspective, differences in the repertoire of written registers correspond to different multidimensional profiles across languages: for example, the register and text type analyses of English, Korean, and Somali show that written registers – informational and expository registers as well as fiction and letters – vary systematically along several underlying dimensions.
One of the key issues in cross-linguistic register analyses is choosing the units of analysis and determining to what extent they are actually comparable. This is relevant both to the registers chosen for analysis and to the linguistic characteristics considered. In the present chapter, these issues are illustrated through a detailed comparison of English and Somali registers.
Throughout the analyses in this book, a quantitative, distributional approach is adopted to describe the linguistic characteristics of registers. These distributional patterns are interpreted functionally, based on previous research which shows that the preferred linguistic forms of a register are those that are best suited functionally to the situational demands of the variety (see, e.g., Chafe 1982; Finegan 1982; Ferguson 1983; Janda 1985; Biber 1988). For example, the situational demands of conversation are very different from those of an academic paper, and therefore the characteristic linguistic forms of these registers will be markedly different. Thus, consider the relative frequencies of first-person pronouns, second-person pronouns, and nouns in these two registers, presented in table 4.1.
Both first- and second-person pronouns are tied directly to the communicative situation in their meaning: first-person pronouns are used to refer to an actively ‘involved’ addressor, while second-person pronouns require a specific addressee in order to be felicitous. In contrast, nouns generally refer to third-person entities and can refer to either concrete objects or abstract concepts; nouns are the primary bearers of referential meaning in a text (apart from deictic, pronominal references to the immediate physical context).
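A comparison of relative frequencies of this kind rests on normalizing raw counts to a common text length. The sketch below, offered as an illustration only, counts first- and second-person pronouns and converts them to rates per 1,000 words. The pronoun lists, the tokenizer, the function name `rates_per_thousand`, and the sample sentences are all assumptions made for this example; counting nouns would additionally require a grammatical tagger and is omitted.

```python
import re
from collections import Counter

# Assumed closed sets of English pronoun forms, for illustration only.
FIRST_PERSON = {"i", "me", "my", "mine", "myself",
                "we", "us", "our", "ours", "ourselves"}
SECOND_PERSON = {"you", "your", "yours", "yourself", "yourselves"}

def rates_per_thousand(text):
    """Return pronoun frequencies normalized to a rate per 1,000 words."""
    tokens = re.findall(r"[a-z']+", text.lower())  # crude word tokenizer
    if not tokens:
        raise ValueError("text contains no word tokens")
    counts = Counter()
    for token in tokens:
        if token in FIRST_PERSON:
            counts["first_person"] += 1
        elif token in SECOND_PERSON:
            counts["second_person"] += 1
    return {feature: 1000 * counts[feature] / len(tokens)
            for feature in ("first_person", "second_person")}

# Invented sample sentences standing in for whole texts.
conversation = rates_per_thousand("I think you know what I mean")
academic = rates_per_thousand("The results indicate a significant effect of register")
```

Normalizing to a fixed base makes counts comparable across texts of different lengths, which raw counts are not.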
Variability is inherent in human language: a single speaker will use different linguistic forms on different occasions, and different speakers of a language will say the same thing in different ways. Most of this variation is highly systematic: speakers of a language make choices in pronunciation, morphology, word choice, and grammar depending on a number of non-linguistic factors. These factors include the speaker's purpose in communication, the relationship between speaker and hearer, the production circumstances, and various demographic affiliations that a speaker can have. Analysis of the systematic patterns of variation associated with these factors has led to the recognition of two main kinds of language varieties: registers, referring to situationally defined varieties, and dialects, referring to varieties associated with different groups of speakers.
In the present book, register is used as a cover term for any variety associated with particular situational contexts or purposes. Although register distinctions are defined in non-linguistic terms, there are usually important linguistic differences among registers as well. In many cases, registers are named varieties within a culture, such as novels, letters, editorials, sermons, and debates. Registers can be defined at any level of generality: for example, academic prose is a very general register, while methodology sections in psychology articles are a much more highly specified register.
Chapters 3–6 present the extensive preliminary analyses required for cross-linguistic MD comparisons of register variation. Chapter 3 describes the four language situations considered in the present study, including the situational characteristics of each register and a brief social history of literacy in each language. The main purpose of chapter 4 is to demonstrate the difficulty of cross-linguistic register comparisons based on individual linguistic features, and to thereby establish the need for an alternative approach. The multidimensional approach is then offered as particularly well suited to the analytical requirements of cross-linguistic register comparisons. Chapter 5 provides a methodological description of the MD approach and presents the technical details of the MD analysis for each language. Finally, chapter 6 gives a fairly detailed description of each dimension in each of the languages, including analysis of: the co-occurring linguistic features on the dimension, the distribution of registers along the dimension, the functions of dimension features in particular text samples, and the overall functional basis of the dimension.
Given this background, it is now possible to proceed to the cross-linguistic analyses. In general terms, the analyses presented in chapter 6 show that the linguistic relations among spoken and written registers in all four languages are quite complex, and that a multidimensional analysis is required because no single dimension by itself adequately captures the similarities and differences among registers.
Before undertaking a cross-linguistic comparison of register variation, it is necessary to have a complete description of the distributional patterns within each language. The previous chapter gave the methodological and statistical details of the Multi-Dimensional analyses for English, Nukulaelae Tuvaluan, Korean, and Somali. The present chapter turns to the functional interpretation of the dimensions in each language and analysis of the relations among registers with respect to each of those dimensions.
English
Six basic dimensions of variation in English are identified and interpreted in Biber (1988). The present section will consider each of these dimensions in turn.
English dimension 1: ‘Involved versus Informational Production’
The interpretation of dimension 1 was briefly summarized in section 5.5. As table 6.1 shows, the negative features on this dimension are all associated with an informational focus and a careful integration of information in texts. Nouns are the primary device used to convey referential meaning, and a higher frequency of nouns is one reflection of a greater informational density. Prepositional phrases also integrate information into a text, functioning as postnominal modifiers to explicitly specify and elaborate referential identity (in addition to their function as adverbial modifiers). Word length and type–token ratio also mark a high density of information, by reflecting precise word choice and an exact presentation of informational content. Longer words tend to be rarer and more specific in meaning than shorter words (Zipf 1949).
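Two of the features mentioned here, word length and type–token ratio, are simple to compute once a text has been split into words. The following Python sketch shows one straightforward way to do so; the function names and the sample sentence are hypothetical, and the study's own counting procedures may differ.

```python
def type_token_ratio(tokens):
    """Number of distinct word forms (types) divided by running words (tokens)."""
    return len(set(tokens)) / len(tokens)

def mean_word_length(tokens):
    """Average number of characters per word."""
    return sum(len(token) for token in tokens) / len(tokens)

# Invented sample; a real analysis would use full text samples.
tokens = "the cat saw the dog".split()

ttr = type_token_ratio(tokens)          # 4 distinct forms over 5 words
avg_length = mean_word_length(tokens)   # every word has 3 letters
```

Because the type–token ratio falls as a text gets longer, it is only comparable across samples of the same length.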
A register perspective on historical change and the linguistic correlates of literacy
The studies discussed in chapters 6 and 7 have all been synchronic, identifying the dimensions of variation in four languages and describing the systematic similarities and differences among present-day registers in terms of those dimensions. This same analytical framework can be used to study historical change among registers, using the previously identified dimensions of variation to address two major research issues:
the linguistic development of an individual register – comparing the linguistic characteristics of a register along multiple dimensions across time periods;
historical change in the relative relations among registers – the diachronic patterns of register variation. For example, do spoken and written registers develop in parallel or divergent ways? Over time, do they become more similar or more distinct?
Investigations of this kind are especially important for literacy issues, considering the ways in which a language changes following the introduction of literacy and the early development of written registers. Specific research questions of this type include:
What are the linguistic characteristics of written registers when they are first introduced into a language? What linguistic models were used for written registers at that historical stage? Did written registers evolve from pre-existing spoken registers, from similar written registers in other languages, or from other sources?
How stable are written registers in the early stages of evolution? In what ways do they develop, and what are the motivating factors influencing change?
Do written varieties evolve independently or in shared ways? Do they evolve at a constant or variable rate? Do they evolve over time to become more different from spoken varieties and from one another?