How do you find a linguistic variable? This chapter will discuss the key construct in the variationist paradigm – the linguistic variable. It will set out its definition and explain how to identify it and how to circumscribe it.
Defining the Linguistic Variable
The definition of a linguistic variable is the first and the last step in the analysis of variation. It begins with the simple act of noticing variation: that there are two alternative ways of saying the same thing.
The most fundamental construct in variation analysis is the linguistic variable. The original definition of the linguistic variable is a little more complicated. Labov (1966:49) stated that the linguistic variable must be ‘high in frequency, have a certain immunity from conscious suppression … [be] integral units of larger structures, and … be easily quantified on a linear scale’. Furthermore, the linguistic variable was required to be ‘highly stratified’ and to have ‘an asymmetric distribution over a wide range of age levels or other ordered strata of the society’ (Labov, 1972d:8). In this chapter, I will unpack what all this means. At the outset, however, the simplest definition of the linguistic variable is ‘two or more ways of saying the same thing’ (Labov, 1972d:8; Sankoff, G., 1980b:55).
At the level of lexis, the linguistic variable is relatively straightforward; there are two words, for example pail versus bucket. However, the alternates may differ in the pronunciation of a single vowel sound, for example vayz versus vahz, [e] versus [ɑ], ‘vase’, or may simply differ by an extra phonological feature or two, such as the well-studied variable (t,d) and variable (-ing) of English. Variable (t,d) involves word-final consonant clusters ending in [t] or [d]. Sometimes the final stop of the cluster is realised; sometimes it is not (1a). Variable (-ing) involves word-final -ing. Sometimes it is realised as [ŋ]; sometimes as [n] (1b). Variable (t) involves the pronunciation of word-internal intervocalic /t/. Sometimes it is realised as [t], sometimes as a flap [ɾ] (1c). In these cases, there is little question of semantic equivalence, that is, ‘meaning the same thing’, since the variant forms alternate within the same word.
(1) a. I misse[t] the bus yesterday. vs I miss[Ø] the bus yesterday.
b. shoppi[ŋ] vs shoppi[n]
c. bu[t]er vs bu[ɾ]er
In morphosyntax, however, alternation of forms may involve variable inflections, alternate lexical items or elementary syntactic differences that arise in the course of sentence derivation (2). Is the original definition of the linguistic variable as ‘two ways of saying the same thing’ viable?
(2) a. do it quickØ vs do it quickly
b. the woman who … vs the woman that …
c. he isn’t vs he’s not
The question becomes whether there are ever two different ways of saying the same thing in syntax and semantics. While pure semantic equivalence is likely to be contested, syntacticians have discussed the fact that ‘doublets are in fact, reasonably common in the world’s languages’ and the explanation is sociolinguistic: ‘doublets arise through dialect and language contact and compete in usage until one or the other form wins out’ (Kroch, 1994:6–7). The question is how such alternations can be effectively recognised, interpreted, and explained. Crucial to these questions is the often difficult task of defining the context of meaning, which requires some principled way of dealing with the problematic relationship between linguistic form and linguistic function. Indeed, one of the key preoccupations of variation analysis has been that different forms can have the same meaning. But how can this be? Shouldn’t each form have a different meaning?
From the very beginning, linguistics and sociolinguistics have been opposed in their treatment of ‘meaning’:
two different lexical items or structures can almost always have some usages or contexts in which they have different meanings, or functions, and it is even claimed by some that this difference, though it may be subtle, is always pertinent whenever one of the forms is used.
An early recognition of the form–function problem is found in Weiner and Labov (1983). They demonstrate that generalised active sentences, such as (3a), and agentless passives, such as (3b), are alternative choices of the same syntactic variable.
(3) a. They broke into the liquor closet.
b. The liquor closet was broken into.
To include these two variants in one syntactic variable, the two forms must have the same referential meaning. Such a supposition calls into question the nature of equivalence.
This is where there has been heated debate in the field, which has, in turn, been responsible for an evolution in thinking about variables. Much of this development occurred when analysts started studying linguistic variables ‘above and beyond phonology’ (Sankoff, G., 1973). This development led to analysts becoming much more rigorous and explicit in how they explained their method for studying the data.
In order to study the linguistic variable, a three-step methodological process is required: first, identification of two or more variant expressions of a common underlying form; second, an accountable method for determining all the possible variants and the contexts in which they occur; third, accountable data, representing a diversity of contexts, for example in terms of social characteristics and linguistic environments.
A key principle underlying this method (see also Chapter 1) is ‘the principle of accountability’ (Labov, 1982:30). This principle is fundamental to variation analysis; it dictates that all occurrences of the target variable must be considered, not simply one variant or another: ‘analysts should not select from a text those variants of a variable that tend to confirm their argument, and ignore others that do not’ (Milroy & Gordon, 2003:137). You must include all non-occurrences as well (Labov, 1982:30). Then, the frequency of each variant can be calculated as a proportion of the total number of contexts in which it could have occurred, whether or not it actually did (distributional analysis; see Chapter 8). Similarly, statistical methods can be used to evaluate and compare different contextual effects as well as to detect and measure tendencies over time. Statistical techniques also permit correlations to be made among social and linguistic influences. Still, a critical assumption underlies these procedures – the idea that the variants differ relatively little in terms of their function.
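The proportional calculation described here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a procedure taken from this chapter: the token list is invented, and the labels `velar` and `alveolar` are hypothetical codes for the two variants of variable (-ing). The point it shows is that the denominator is every context of the variable, not only the occurrences of one variant.

```python
from collections import Counter

# Hypothetical coded data for one speaker: one entry per context in which
# variable (-ing) could occur, recording whichever variant was produced.
tokens = [
    "alveolar", "alveolar", "velar", "alveolar", "velar",
    "alveolar", "alveolar", "alveolar", "velar", "alveolar",
]


def variant_proportions(coded_tokens):
    """Return each variant's share of all contexts of the variable."""
    total = len(coded_tokens)
    if total == 0:
        raise ValueError("no tokens: the variable context is empty")
    counts = Counter(coded_tokens)
    # The denominator is every context, not just the occurrences of one variant.
    return {variant: count / total for variant, count in counts.items()}


proportions = variant_proportions(tokens)
print(proportions)  # {'alveolar': 0.7, 'velar': 0.3}
```

In this invented sample the alveolar variant accounts for 7 of 10 contexts. Reporting the raw count of 7 without the 10 contexts in which the variable could have occurred would violate the principle of accountability.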
When the linguistic variable lies beyond phonology, the variants may not be similar at all. They may have entirely different lexical sources as well as different histories in the language. For example, the alternations between the will future and the going to future, variable (fut), (4), have distinct verbs as their source, Old English willan and the motion verb to go. Alternation between was and were, variable (was) (5), derives from a verb with the same etymology, but whose variants were influenced differently by a sound change that operated before the final ‘e’ of were became silent.
(4) I think she’s gonna be cheeky … I think she’ll be cheeky. (YRK, kyoung, 31)
(5) There was always kids that were going missing. (YRK, kdilks, 26)
Such dissimilarities make it impossible to derive the variants from any meaning-preserving grammatical rule. Even the apparently mundane variation between come and came, variable (come), (6), can be traced back to upheaval in the strong verbs of English in which varying vowel sounds within the verb stem produced different pronunciations of ‘come’.
(6) And Laura come in at five pound odd … I came in on the Friday … (YRK, blowe, 62)
In the case of variables functioning at the level of discourse, pragmatics, or style, the notion of semantic equivalence becomes even more problematic. For example, the variable constructions in (7) may be considered semantically distinct or straddle more than one level of grammar: variable (subject drop), (7a), use of like, variable (like_disc) (7b), and post-posing in (7c), variable (post_pos).
(7) a. Ø used to rent a house with er my mother’s sister and cousins. Yeah, so we used to rent this big house … (YRK, tlaxton, 48)
b. Just like little carriages, yes. Yes, just Ø little tiny things, yes. (YRK, vpriestly, 85)
c. I was terrible, really … Very selfish, I was! (YRK, tlaxton, 48)
Such cases are problematic for the original formalism of variable rules, in which variants arise from a common underlying form transformed by some rule of grammar.
In theory, no two forms can have identical semantic meaning, but in practice two different forms can be used interchangeably in some contexts even though they may have distinct referential meanings in other contexts. There are at least two different levels of meaning: (1) comprehensive meaning, which takes into consideration every possible inference; and (2) meaning as it is used in interaction in the speech community. While the first is subject to idiosyncratic interpretation and an infinite range of potential meanings, the second is a consensus that is shared and relatively constant. The claim is that meaning in the latter sense should adhere to a narrower interpretation and be restricted ‘to designate the coupling of a given sentence with a given state of affairs’ (Weiner & Labov, 1983:30). Indeed, defining the linguistic variable may be described as the task of ‘separating out the functionally equivalent from the inferentially possible’ (Weiner & Labov, 1983:33). A foundational task in variation analysis is to ‘circumscribe the variable context’, the painstaking task which requires the analyst to ‘ascertain which structures of forms may be considered variants of each other and in which contexts’ (Sankoff, D., 1982:681).
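Circumscribing the variable context can be partly mechanised, though never fully. The Python sketch below is a hypothetical illustration: it collects candidate (-ing) contexts with a regular expression and then applies an exclusion rule. The sample sentence is constructed, not quoted from a corpus, and the exclusion list (monosyllables such as thing, in which -ing is part of the root rather than an unstressed suffix) is an assumed convention for illustration, not a rule given in this chapter.

```python
import re

# Candidate contexts: a word ending in "ing" (velar variant) or in "in"
# plus an apostrophe, the transcription convention for the alveolar variant.
CANDIDATE = re.compile(r"\b(\w+?)(ing\b|in['\u2019])", re.IGNORECASE)

# Hypothetical exclusion rule: monosyllables whose -ing belongs to the root
# are treated as falling outside the variable context.
EXCLUDED = {"thing", "sing", "ring", "king", "bring", "wing", "sting", "swing"}


def extract_ing_tokens(text):
    """Return (form, variant) pairs for candidate (-ing) contexts, minus exclusions."""
    tokens = []
    for match in CANDIDATE.finditer(text):
        stem, ending = match.group(1), match.group(2)
        if (stem + "ing").lower() in EXCLUDED:
            continue  # outside the variable context under the assumed rule
        variant = "velar" if ending.lower() == "ing" else "alveolar"
        tokens.append((match.group(0), variant))
    return tokens


# A constructed sentence, not a quotation from the corpus.
sample = "Why are you hasslin\u2019 now? I want something on my desk, not a thing to sing about."
tokens = extract_ing_tokens(sample)
# tokens == [("hasslin\u2019", "alveolar"), ("something", "velar")]
```

The pattern over-collects (a possessive such as Robin's would be picked up as a candidate), which is why every automatically extracted token still has to be checked by the analyst before it enters the data set.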
Re-examining the Definition of the Linguistic Variable
When analysts first started analysing morphosyntactic variables, they borrowed the notion of semantic equivalence from the model of transformed and untransformed sentences in the transformational-generative grammar of the late 1960s (see Weiner & Labov, 1983). The problem of working out the common underlying grammatical basis for variants embroils the analyst in decisions about underlying and derived forms, which may differ depending on the theory of grammar current at the time of analysis. Variable rules beyond phonology did not work in this model for two main reasons. First, transformational rules were supposed to be meaning preserving. However, with morphosyntactic variables this could not easily be defended in any theory of grammar, variationist or other. Second, forms which seemed to be equivalent to each other could often not be derived by the same transformational path.
However, these problems are not intrinsic to the nature of the linguistic variable itself but are the result of the formalism in which it is embedded. As Sankoff, D. and Thibault (1981) argued, the method of variation analysis obviates these problems. According to standard methodological procedures, the first step is the observation that two (or more) forms are distributed differentially across a community or within the discourse. The variationist method begins when the analyst is convinced that they are dealing with a bona fide variable. Indeed, the nature of the underlying form, or even its existence, is irrelevant (Sankoff, D. & Thibault, 1981).
You might ask how this can be. It comes back to the distributional facts of language. The advantage of variation analysis is working with real data, often from representative samples of individuals living in communities, and from scrutiny of hundreds and perhaps thousands of instances of the linguistic variable. With this type of data on hand, the distributional facts about language use can be employed for understanding the nature of variation.
In the late 1970s and early 1980s, studies of variation above and beyond phonology were breaking new ground. It is not surprising, then, that the operational definition of the linguistic variable was challenged (Lavandera, 1978, 1982). The analytic method was ripe for renewal and advancement. Next came the important study of weak complementarity (Sankoff, D. & Thibault, 1981:208), which demonstrated that the variants of a linguistic variable need not be semantically equivalent. Instead, discourse equivalence, or functional equivalence, was found to be the relevant criterion. Indeed, Sankoff, D. and Thibault (1981:208) argue that in many cases ‘the most we will be able to say is that the proposed variants can serve one, or more generally, similar discourse functions. We cannot even require that they be identical discourse functions.’
So how is one to recognise a linguistic variable? Even once you think you have found one, how can you be sure it is a good one? I now turn to exemplifying the pursuit of the linguistic variable in practical terms.
Recognising the Linguistic Variable
The linguistic variable can exist at virtually any level of the grammar, ranging from phonetics to discourse, from phonology to syntax, as elaborated in (8) (Wolfram, 1993:195):
(8) a. a structural category, e.g. the definite article, relativisers, complementisers
b. a semantic category, e.g. genitive -s vs of genitive, periphrastic comparative more vs synthetic -er
c. a particular morpheme category, e.g. third person singular present tense suffix, the -ly suffix on adverbs
d. a phoneme, a systematic or classical definition of a unit, e.g. [θ] in English; a natural class of units in a particular linguistic environment, e.g. consonant clusters ending in a stop in word-final position, or Canadian Raising, the process by which the onsets of the diphthongs /ay/ and /aw/ raise to mid-vowels when they precede voiceless obstruents (e.g. /p/, /t/, /k/, /s/, /f/)
e. a syntactic relationship of some type, e.g. negative concord, passive vs active; permutation or placement of items, e.g. adverb placement, particle placement
f. a lexical item, e.g. chesterfield vs couch vs settee
In this way, the linguistic variable is an abstraction. The varying forms must exist in some linguistically meaningful subsystem of the grammar. The linguistic variable must also have another important characteristic: it must co-vary, that is, correlate, with patterns of social and/or linguistic phenomena. While you may not be able to know fully what the correlates are at the outset, be on the lookout for trends. A linguistic variable is more than simply a synonym, and more complex than simply two ways of saying the same thing. It must also have qualities of system and distribution, (9), even if these can be revealed only by analysis:
(9) a. synonymy or near synonymy (weak complementarity)
b. structurally embedded, i.e. implicated in structural relations with other elements of the linguistic system, e.g. the phonemic inventory, phonological space, functional heads, grammatical subsystems, etc.
c. correlation with social and/or linguistic phenomena
In sum, early controversy over the extent to which the linguistic variable could be applied to all levels of grammar was really a developmental phase in variation analysis, when definitions were being refined and the methodology was being improved. Lavandera (1978) correctly pointed out that the linguistic variable, as it had originally been defined, could not be extended to variables above and beyond phonology. However, the research paradigm quickly caught up. Sankoff, G. (1973), Sankoff, G. and Laberge (1980), Sankoff, G. and Thibault (1980, 1981), and Weiner and Labov (1983) demonstrated through detailed methodological argumentation that the linguistic variable need not be confined to cases in which the variants mean precisely the same thing. Instead, the linguistic variable may have weak complementarity across the speech community, that is, functional equivalence in discourse. This malleability points to the role of the linguistic variable in linguistic change (Sankoff, G. & Thibault, 1981; Sankoff, G., 1982:681–685, 1988b:153–155).
Linguistic Variables as Language Change
How can a linguistic variable involve variants that have no structural relationship or one-to-one equivalence? The answer has to do with how language changes. Linguistic change does not always proceed gradually from one closely related form to another. Instead, it may proceed by cataclysmic means: ‘by forcible juxtaposition of grammatically very different constructions whose only underlying property in common is their usage for similar discursive functions’ (Sankoff, D. & Thibault, 1981:207). Consider some examples. Going to and will are variants of future temporal reference in contemporary English, variable (fut), despite their sources in separate lexical verbs. In earlier times (and perhaps even today), the simple present tense varied systematically with the progressive: for example, the kettle boils versus the kettle is boiling, I love it versus I’m loving it. The relativiser that, which is also a complementiser, often varies with who, which is also a pronoun, and sometimes with ‘as’ or ‘what’.
If one form appears to be replacing the other, either in time or along some socioeconomic or demographic dimension in the community (Sankoff, D. & Thibault, 1981:213), then this may be an indication of change in progress. For example, if a variant is correlated with age, this may be evidence of evolution of a subsystem of grammar taking place.
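The age correlation mentioned here can be illustrated with a short Python sketch. Everything in it is hypothetical: the tokens, the variant labels `new` and `old`, and the age bands. It computes the rate of the incoming variant in each band as a share of all tokens in that band, then checks whether the rate rises steadily from the oldest to the youngest speakers.

```python
from collections import defaultdict

# Hypothetical tokens: (speaker age, variant produced) for a single variable.
tokens = [
    (18, "new"), (19, "new"), (22, "old"), (24, "new"),
    (41, "new"), (45, "old"), (47, "old"), (52, "new"),
    (68, "old"), (71, "old"), (75, "new"), (80, "old"),
]


def age_band(age):
    """Assign a speaker to a coarse age group (the boundaries are arbitrary)."""
    if age < 30:
        return "under 30"
    if age < 60:
        return "30-59"
    return "60+"


def rate_by_band(tokens, variant):
    """Proportion of `variant` out of all tokens in each age band."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for age, produced in tokens:
        band = age_band(age)
        totals[band] += 1
        if produced == variant:
            hits[band] += 1
    return {band: hits[band] / totals[band] for band in totals}


rates = rate_by_band(tokens, "new")
youngest_to_oldest = [rates["under 30"], rates["30-59"], rates["60+"]]
# True if the incoming variant is more frequent in each successively younger band.
younger_lead = youngest_to_oldest[0] > youngest_to_oldest[1] > youngest_to_oldest[2]
print(rates, younger_lead)
```

In this invented sample the rates are 0.75, 0.5, and 0.25 from youngest to oldest. Such a pattern is consistent with change in progress, but on its own it cannot rule out age grading, so it is a lead to pursue rather than a conclusion.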
The application of variation analysis to formal models of grammatical change was foreshadowed in research in the early 1980s, long before variation analysis was explicitly applied to grammaticalisation theory per se (e.g. Poplack & Tagliamonte, 1998, 2001). Sankoff, D. and Thibault (1981) argued that when discourse alternatives coexist over time, we may expect this equivalence to eventually become grammaticalised, that is, functional analogues will become syntactic analogues. They speculated that the criterion of weak complementarity could be used as a diagnostic for stages in the development of forms. The progression of such change might be outlined as follows:
1. An innovation is introduced; it takes on the form of a discourse marker having some attentional or accentuation purpose.
2. The form gradually loses some of its original emphatic qualities.
3. Semantic distinctions gradually become neutralised.
4. Forms grammaticalise and take on the conventional characteristics of a linguistic variable.
Such an approach makes important and testable predictions for grammatical change, (10).
(10) Predictions for grammaticalisation
Early stage: semantic constraints
Later stage: neutralisation of semantic constraints
Further research needs to be done in this area. The challenge is to find the right set of circumstances, a diagnostic variable, and then to test the hypotheses of change. Early in the 1990s scholars noted that variation analysis was ripe for research on grammaticalisation: ‘a fuller integration of sociolinguistic and developmental research with research on grammaticalization still remains to be worked out’ (Hopper & Traugott, 1993:30). Since that time a flood of research on grammatical variation and change has emerged.
The next question is how to choose which variable to study.
Selecting a Linguistic Variable for Analysis
Beyond the motivation to study something that interests you, what are the qualities that you should be looking for when choosing a linguistic variable? Wolfram (1993:209) notes that ‘selecting linguistic variables for study involves considerations on different levels, ranging from descriptive linguistic concerns to practical concerns of reliable coding’. These may seem overwhelming at first, but as you get the hang of it these decisions keep the process vibrant and intriguing.
Identify Potential Variables
The first task is to identify potential variables in language. Faced with data, students often ask me, ‘What do I look for?’ This is an entirely practical issue. The place to start is to take a long, hard look at your data with a linguist’s eye. As discussed earlier in Chapter 1, language materials, of any type (e.g. written, spoken, or signed), offer you a wide range of variables for investigation. All you have to do is find them. In the first instance, simply listen, read, or look. What is different? What is interesting? What piques your curiosity? Take notes about the things you observe. In some cases, there may be structures that are not standard, or perhaps structures that are different from what you are familiar with in your own variety of language. In fact, when linguistic variables involve dialectal, informal, or non-standard variants, they are a lot easier to spot. You tend to notice things that are different from your own idiolect. In other cases, you will need to focus intently on the flow of forms and structures in the discourse because the variants will slip by without you even realising they are there. Many linguistic variables in contemporary varieties of English, for example, comprise variants that are acceptable in the language, with little associated stigma or affect. Variation is everywhere; you just have to notice it. Sometimes it is right under our noses, in a popular song, (11).
(11) You got to breathe and have some fun … We must engage and rearrange. (Lenny Kravitz, ‘Are You Gonna Go My Way’, 1993)
A corpus collected using standard sociolinguistic interviewing typically contains one to two hours of speech per individual, which translates to approximately fifty pages of double-spaced text when transcribed. Such materials will typically be replete with potential variables. In (12), examine the transcription of Mel, a forty-year-old man, who was working as a computer software trainer at the time of interview in 1997. The interview is very relaxed, and Mel presents himself as an easy-going person who rejects conventional values. This excerpt tells the story of how he quit one of his previous jobs. It involves a dramatic exchange between himself and the boss. In example (12), underline and italics represent variable (t,d), bold and italics represent variable (-ing), and italics alone highlight potential variables. What I mean by ‘potential’ is that variants occur that the analyst may infer will vary with other forms in the larger context.
(12) York English Corpus, 1997, man, age 40
1 … So … sort of like jus’ sat in Fibbers, havin’ a pint and the phone rang,
2 and it was my boss. … Oh! Oh, it’s- tol’ everybody I’d gone t’pub[ʊ],
3 they knew where to find me if they wanted me, you know. And so the
4 phone rang and it was the boss, you know. And she said, ‘What are you
5 doing?’ So I said, ‘Well I’m havin’ a beer. What do you think?’ ‘What
6 about- … ’ Can’t think of the name of- the guy’s name, ‘What about this
7 guy’s manual?’, you see. So I said, ‘Well I’ll do what I normally do, you
8 know. Said, ‘I’ll do it at ’ome tonight. It’ll be sorted’, you know. I said,
9 ‘Have I ever let you down … before?’ So she said, ‘No.’ So I said,
10 ‘Well, why are you hasslin’ now?’ So she said, ‘Well, I want something
11 on my desk by five-o-clock, you see. Well, ‘You’ve got no chance.’
12 ‘Well, when can I see it?’ So I said, ‘Don’t worry, there’ll be somethin’
13 on your desk by nine o’clock tomorrow.’ Put the phone down. That
14 night was a few of us from work … goin’ out for a drink. So we’re all
15 sat over in the Red Lion and like all these horror stories start comin’
16 about, about you know, how Joanne’s treat[ʔ]ed differen[ʔ] ones of
17 them you know, and shit on them and what have you.’ Cos it was like,
18 there’s two bits. There’s a recruitmen[ʔ] bit and the training bit. And I
19 mean I was sort of like tucked[t] away upstairs by myself, so I didn’t
20 get to see much of what wen[ʔ] on downstairs. And they were like all-
21 we were all sat in the pub and everybody’s bitchin’ about this woman,
22 you know. And I thought, ‘Well I don’t want to work with someone like
23 this you know. And I jus’ said so. I said, ‘That’s it, I’m ’anding my
24 notice in tomorrow.’ And you know they’re all goin’ like, ‘Nah,’ you
25 know, ‘you won’t, you won’t.’ Followin’ mornin’, um, you know I
26 mean I’d told[d] ’em about this phone call, you know. And then when
27 she’d said like everybody had said, oh I thought, ‘Well ’ang on a
28 minute, I’ve said there’d be somethin’ on her desk by nine-o-clock
29 tomorrow mornin’, it will be my notice’ you know. Everybody’s goin’,
30 ‘Oh you won’t, you won’t.’ Followin’ mornin’ I got up, shirt and tie
31 on, suit as normal, tootled[d] around the corner, walked[t] into the
32 office, and I said ‘Joanne, you wanted somethin’ on your desk by nine-
33 o-clock, there’s my time sheet, I quit.’ … And walked[t] out. And you
34 could jus’ see everybody’s face like drop. It’s like … ‘he’s done it’!
Even in this small excerpt, approximately three minutes of a two-hour interview, there are many features that hold promise for investigation. Some of the linguistic variables can be authenticated; that is, there is evidence that they vary. What I mean by this is that the alternatives are both present in the excerpt.
Variable (-ing) and Variable (t,d)
Two variables readily apparent in (12) are variable (-ing) and variable (t,d). Note that this excerpt has been embellished from the transcription file, with an indication of the actual pronunciation of the forms for illustration purposes. In fact, these are two of the most studied variables in the history of variation analysis. Take a closer look at each of the instances of these variables. As noted above, the words in which they occur have been underlined/italicised, bolded/italicised, or italicised only for easy visibility. I have also indicated which of the phonological variants was produced in each case. The words containing variable (-ing) and variable (t,d) are listed in (13) and (14) respectively. Note that the word and was excluded due to its exceptional behaviour (Neu, 1980).
(13) Variable (-ing)
havin’, doing, havin’, hasslin’, something, somethin’, goin’, comin’, training, bitchin’, ’anding, goin’, followin’, mornin’, somethin’, mornin’, goin’, followin’, mornin’, somethin’
(14) Variable (-t,d)
jus’, pint, tol’, different, recruitment, tucked, went, jus’, told, tootled, walked, walked, jus’
How many of each variant occur in each variable set? For variable (-ing) (italics, bold), notice that the standard variant [ŋ] occurs four times. For variable (-t,d) (italics, underline), there are four examples of the non-standard, zero form: the semi-weak verb told (line 2) and the monomorphemic word just (lines 1, 23, and 34) exhibit simplification of the consonant cluster. This individual thus uses mostly the non-standard [n] variant of (-ing) but mostly the standard, realised variant of (-t,d). In the full studies of both these variables, these idiolectal tendencies hold across the broader sample of York English (Tagliamonte, 2004; Tagliamonte & Temple, 2005). Compared to other varieties, there is relatively frequent use of the standard variant of variable (-t,d), that is, realised clusters. In contrast, the standard variant of variable (-ing), that is, the velar variant, which is moderately frequent in other varieties, is rare here.
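The tally for variable (-ing) can be checked mechanically. The Python sketch below codes each token listed in (13) by the transcription convention used in (12), where a final in’ marks the alveolar variant; the labels `velar` and `alveolar` and the function name are illustrative choices, not part of the original analysis. Note that ’anding is coded as velar: its initial apostrophe marks h-dropping, not the (-ing) variant.

```python
from collections import Counter

# The (-ing) tokens exactly as listed in (13). "\u2019" is the apostrophe the
# transcription uses; a final "in\u2019" marks the alveolar [n] variant.
ING_TOKENS = [
    "havin\u2019", "doing", "havin\u2019", "hasslin\u2019", "something",
    "somethin\u2019", "goin\u2019", "comin\u2019", "training", "bitchin\u2019",
    "\u2019anding", "goin\u2019", "followin\u2019", "mornin\u2019", "somethin\u2019",
    "mornin\u2019", "goin\u2019", "followin\u2019", "mornin\u2019", "somethin\u2019",
]


def code_ing(form):
    """Code a token as 'velar' or 'alveolar' from its transcribed ending."""
    if form.endswith("ing"):
        return "velar"
    if form.endswith("in\u2019"):
        return "alveolar"
    raise ValueError(f"unexpected form: {form!r}")


counts = Counter(code_ing(form) for form in ING_TOKENS)
total = sum(counts.values())
print(counts["velar"], "of", total, "tokens are velar")  # 4 of 20 tokens are velar
```

The result, 4 velar tokens out of 20 (a rate of 0.2), matches the count given above, and the same denominator of all 20 contexts is what the principle of accountability requires.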
A multitude of other interesting and potentially variable forms are evident – some phonological (15), others morphological and syntactic (16) – and there are numerous potential discourse pragmatic markers, notably so, well, you know, and you see. Some have been italicised in the excerpt and a few are listed in (15–17) for illustration.
(15) Phonological
a. definite article reduction: gone t’pub, line 2
b. variable (h), dropping: ’ome, line 8; ’anding, line 23; ’ang, line 27
c. variable (t): trea[ʔ]ed, line 16
d. variable (U): pub [pʊb], line 2
(16) Morphological and syntactic
a. of vs ’s genitive: the name of- the guy’s name, line 6
b. agreement: there’s two bits, line 18
c. subject drop: Ø Put the phone down, line 13
d. zero definite article: followin’ mornin’, lines 25, 30
e. possessive have got vs have: you’ve got no chance, line 11
Many discourse/pragmatic features are evident as well (17):
(17) Discourse-pragmatic
a. extension particles: and what have you, line 17
b. quotatives: said, lines 4, 5, 7, 8, 9, 23; thought, line 22; going …, lines 24, 29; it’s like, line 34
c. discourse like: it was like …, line 17; like drop, line 34
d. discourse markers: you know, lines 3, 4, 7–8, 16, 17, 22, 23, 24, 25, 26; I mean, lines 18–19, 25–26; you see, lines 7, 11
e. discourse so: so the phone rang, lines 3–4; so [x] said, lines 7, 9, 10, 12; so we’re all sat, lines 14–15
Of course, in such a small excerpt of material most of these potential variables cannot be authenticated. If only one variant is present, you cannot be sure that the linguistic feature in question is variable in the data as a whole. However, if you know these variants participate in alternation with other forms, then the presence of even one of the variants is a good indication that the other may be present as well. Further examination of a greater portion of the data for this individual would confirm which are variable and which are not. Nevertheless, the sheer number of possible features for study is quite remarkable.
Other features of note are morphosyntactic and lexical features that stand out nationally, regionally, and locally (18).
(18)
a. we’re sat … vs we’re sitting
b. it’ll be sorted … vs it’ll be fixed/worked out, etc.
c. tootled around … vs walked
d. hasslin’ … vs bothering/bugging, etc.
Faced with such a data set, the analyst must decide which variable to tackle for a fully fledged analysis. Which one would you choose?
Notice in (12) that variable (-ing) is relatively frequent, occurring twenty times. Variable (-t,d) occurs fourteen times. It is not surprising that these two variables have been so often studied in the literature on English. They are ubiquitous and easy to spot, both ideal criteria for selecting a linguistic variable.
In fact, some linguistic variables are better candidates for variation analysis than others. Variable items which lack systemic, linguistic foundations, such as variable realisations of words like ‘yes’ (19a), ‘because’ (19b), or performance anomalies (19c–d), may not be ideal for variation analysis.
(19) a. Yes it has, very tiny. … Yeah they’re not- they’re not that big. (YRK, mtoovey, 40)
b. ’Cos the atmosphere up there’s different as well because um everyone’s doing exams. (YRK, ldonald, 15)
c. We just go- really we’d um- we’d just go out … (YRK, mtoovey, 40)
d. The b- – the boys from Brigg were um- ten of their team were- (YRK, ldonald, 15)
Diagnostic criteria can guide the analyst in choosing a ‘good’ linguistic variable for analysis. Ideally, you want to select a variable that is interesting and relevant, both to you and within the field. But, in practice, this goal must be balanced against practical considerations.
Frequency
Linguistic features that are rare, either because of the relative frequency of the structure or because of conscious suppression in an interview, may not be good candidates for analysis. They may be interesting linguistically, dialectally fascinating, and critical for a comprehensive descriptive profile, but if they do not occur with sufficient numbers they can hardly be tabulated in a study of variation. Phonological variables are usually more frequent, while grammatical structures are rarer. Discourse features may be remarkably frequent or virtually absent depending on the variety under investigation, age of the individual, and so on.
Sometimes features occur extremely frequently but cannot be ideal variables because the context of variation is questionable. This arises most obviously in the case of discourse pragmatic features, where only one variant is overt in the discourse. What is its alternative? Where could it have occurred but did not? In contemporary English, features of this type are plentiful, including discourse pragmatic markers such as like, well, so, you know, and you see (as evident in the excerpt in 12). My students always want to study these features. What they do not realise is that studying these forms using variation analysis is a very complex and difficult enterprise. Defining the variable context requires painstaking treatment of the data and advanced knowledge of syntax, because the feature must be defined structurally in order to assess its function in the phrase structure (D’Arcy, 2005; 2017).
It is possible to include interview questions designed to elicit specific types of constructions. For example, talking about past time will increase the occurrence of past tense forms; talking about habitual activities will increase the occurrence of habitual tense/aspect forms; and getting individuals to tell you stories will help you elicit quotatives. However, you may not know in advance of data collection which feature(s) you want to study, or which features may become important to you later on. In sum, not all goals can be achieved in every interview situation. The frequency of different types of variables depends greatly on the type of discourse situation and innumerable other, often uncontrollable, factors.
One of my strategies for finding a good linguistic variable is to compile an index of an interview and look closely at the words that occur most frequently (see Chapter 4). Another strategy is to read prescriptive grammars and find cases where alternate forms are mentioned. Or just attend to complaints about language use on social media platforms. Another more general strategy is to get in the habit of making note of what linguistic variables researchers are talking about and check to see what is happening with those variables in your own data. If it is frequent enough, and the variation is robust enough, it is a good candidate for further investigation.
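The index strategy can be approximated in a few lines of Python. This is a minimal sketch, not the procedure described in Chapter 4: it assumes a plain-text transcript, treats a run of letters (optionally joined by one internal apostrophe, as in we're) as a word, and ignores punctuation. The function name `word_index` is hypothetical.

```python
import re
from collections import Counter

def word_index(transcript: str) -> Counter:
    """Count word forms in a plain-text transcript (a rough, hypothetical index).

    Curly apostrophes are normalised to straight ones and the text is
    lower-cased. A word is a run of letters optionally joined by one
    internal apostrophe, so "we're" stays whole, while a trailing
    apostrophe ("hasslin'") is dropped.
    """
    text = transcript.replace("\u2019", "'").lower()
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text)
    return Counter(words)

excerpt = "I'm like, 'No.' She's like, 'Have you taken business?' I'm like, 'No.'"
for word, count in word_index(excerpt).most_common(3):
    print(word, count)
```

Sorting by frequency only surfaces candidates; whether a frequent form actually alternates with another still has to be checked in the data.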
Robustness
Frequency is not the only criterion for selecting a linguistic variable. A further requirement is adequate variation between forms. Linguistic variables which are frequent but show minimal variation are less suitable for investigation. Although the structures themselves may be interesting, if they are near categorical (either 95–100% or 0–5%), there is little room for quantitative investigation. If variability hovers at very low or very high levels, differences between variants in independent contexts may be too small to reach statistical significance. In this case, you may rely on the constraint ranking of factors for comparative purposes (Poplack & Tagliamonte, 2001:93); however, near categorical variables may not have sufficient numbers for even constraint ranking to be informative. In such cases, one of the possible variants may have such marginal status that the variable itself will be unrevealing. If it is a change in progress, the variable may have either ‘gone to completion’ or still be so incipient that it cannot be reliably modelled using statistical methods.
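The 95–100% / 0–5% band can be checked mechanically once tokens are extracted. The following is a minimal Python sketch, assuming each token is represented simply by its variant label; the function names and the toy counts are hypothetical, chosen only to illustrate the check.

```python
from collections import Counter

def variant_proportions(tokens):
    """Proportion of each variant among all tokens of the variable."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no tokens in the variable context")
    return {variant: n / total for variant, n in counts.items()}

def is_near_categorical(tokens, variant, cutoff=0.05):
    """True if `variant` falls in the 0-5% or 95-100% band."""
    share = variant_proportions(tokens).get(variant, 0.0)
    return share <= cutoff or share >= 1 - cutoff

# Invented toy counts for illustration only.
adverbs = ["-ly"] * 97 + ["zero"] * 3
print(variant_proportions(adverbs))          # {'-ly': 0.97, 'zero': 0.03}
print(is_near_categorical(adverbs, "zero"))  # True
```

A variable flagged this way is not automatically unusable; as the text notes, a marginal variant may still be worth studying for other reasons.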
Sometimes very low-frequency items, precisely because of their limited status in a variety, can be extremely important. Indeed, Trudgill (1999) argues that ‘embryonic’ variants may sometimes blossom into rampant change. Something of this nature has occurred in the contemporary English quotative system, where a new form, be like (20), represented 13% of all quotative verbs among Canadian youth in 1995 (Tagliamonte & Hudson, 1999).
(20)
a. I’m like, ‘You’re kidding? Wow, that’s really cool.’
b. She says, ‘What do you think of him?’
c. I said, ‘Well, yeah, he’s cute.’ (OTT, speaker ‘c’)
Yet by 2007 it had risen to become the dominant quotative at 65% (21) (Tagliamonte & D’Arcy, 2007), a five-fold increase in less than eight years, and by 2013 it had reached a rate of 82.4% (Denis et al., 2019:60).
(21)
a. She’s like, ‘Have you taken accounting?’
b. I’m like, ‘No.’
c. She’s like, ‘Have you taken business?’ (TOR, etimbali, 19)
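As a quick arithmetic check on the figures reported above (ratios of the published percentages, not a new result):

```latex
\frac{65\%}{13\%} = 5.0
\qquad\text{and}\qquad
\frac{82.4\%}{13\%} \approx 6.3
```

That is, be like went from roughly one quotative in eight to about two in three, and then to more than four in five.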
A low-frequency variable which was well worth investigating was pre-verbal do, variable (did), in Somerset English (22) (Jones & Tagliamonte, 2004); another was the zero definite article (23) (Rupp & Tagliamonte, 2019). In the first case, the rate of the obsolescing feature was 6%, and in the second case there were a mere sixteen tokens in the corpus.
(22)
a. We did have an outside toilet, just a brick type of thing, you know.
b. We did have a flush toilet there. (TIV, ibargery, 74)
(23)
a. We used to follow Ø river down. (YRK, mmicheals, 67)
b. I many a time go and sit outside Ø Minster. (YRK, eburrit, 82)
The minimal presence of periphrastic do and zero definite articles amongst the oldest generation, together with the virtual absence of do and the waning presence of zero definite articles amongst the youngest generation, meant that these features were dying out of the varieties. The studies we conducted likely represent the last opportunity to discover their grammar before they disappear for good. Therefore, despite the highly infrequent status of these features, we decided to study them anyway.
Unfortunately, some obsolescent features in contemporary English are so far gone that they cannot be studied quantitatively at all. This was the case for the for to complementiser in British dialects (24). While we attempted to tabulate its frequency and distribution in our data, in the end it was too rare for meaningful patterns of use to be revealed (Tagliamonte et al., 2005a; Tagliamonte, 2013).
(24)
a. So the roads were crowded when it was time for to start. (MPT, gkenway, 74)
b. He’d light a furnace for to wash the clothes. (TIV, rharris, 64)
In sum, there may be extenuating circumstances for selecting a linguistic variable where one of the variants has very low frequency. Under most circumstances, however, variation analysis is best suited for a linguistic variable where the main variants occur robustly. This permits a richer, more complex, and informative analysis.
Implications for (Socio)linguistic Issues
Your choice of a linguistic variable should also be dictated by the extent to which it has the capacity to answer timely and relevant questions. For example, linguistic variables that are undergoing change are excellent targets for analysis since they give insights into the process of change itself. Those that implicate grammatical structures reveal details of the syntactic component of grammar. Those that differentiate dialects highlight parametric differences and so on.
Once you have decided which variable you will study, what next? It is time to extract all instances of the variable from your data according to the principle of accountability.
Circumscription of the Variable Context
Deciding on precisely how and where in the grammatical system a particular linguistic variable occurs is referred to as ‘circumscribing the variable context’ (e.g. Poplack & Tagliamonte, 1989:60). This refers to the multitude of little decisions that must be made to fine-tune precisely where alternates of a linguistic variable are possible.
Detail the procedure for inclusion and exclusion of items explicitly so that your analysis is replicable. If you do not provide this information, you violate the researcher’s obligation to provide enough information for your study to be repeated with reasonable accuracy and hence comparability.
First, identify the contexts in which the variants occur. Do all the variants occur with all individuals? Do certain subgroups use more of one variant than others? These questions guide the analyst in identifying the envelope of variation (Labov, 1972d). The tricky part is that you must count the number of actual occurrences of a particular structure as well as all those cases where the form might have occurred but did not. You have to know ‘what is varying with what’ (Weiner & Labov, 1983:33). In fact, you must know what the alternative variants are, even when one of the variants is nothing at all. But if one of the variants is zero, as is often the case, how do you spot its occurrences?
This is where the task of circumscribing the variable context can present special difficulties. Moreover, depending on the linguistic variable, there will be confounding factors that necessitate the exclusion of some instances, or tokens, of the variable.
Categorical, Near Categorical, and Variable Contexts
There may be a particular context in which one or the other variant never occurs. This is called a ‘categorical context’, which means that a given variant is realised either 0% or 100% of the time. Such a context must necessarily be excluded from variable rule analysis for the simple reason that it is invariant. This is not to say that categorical contexts are not important. They are. In fact, the contrast between categorical and variable contexts is diagnostic of structural differences in the grammar. However, categorical environments should not be included in a variable rule analysis for these reasons:
1. The frequency of application of the rule would appear much lower than it actually is.
2. Several important constraints on the variable contexts would be obscured, since they would appear to apply to a restricted set of cases.
3. The important distinction between variable and categorical behaviour would be lost (Labov, 1969a; 1972d:82).
Consider variation in the presence of periphrastic do in negative declarative sentences in a northern Scots variety (25) (Smith, 2001).
(25)
a. I dinna mine fa taen it. (BCK, a)
b. I na mine fa come in. (BCK, a)
Smith demonstrated that there were two types of contexts: (1) those that never (or rarely) showed do absence, namely third person; and (2) those that were variable, namely first and second person. While the (near) categorical contexts could be explained on syntactic grounds, the variable contexts were conditioned by lexical, frequency, and processing constraints. The divide between these two types of contexts showed the importance of the categorical/variable distinction in the grammar.
How should you treat contexts at the extremes? If a variant occurs 95% of the time or more, or 5% of the time or less, in a given context, that context is also a transparent candidate for exclusion from the variation analysis (Guy, 1988). However, in most analyses there will be a wide range of frequencies across factors. The analyst must be aware of where the variation exhibits extremes at one end of the scale or the other, as these contexts will be critical for explaining the variation.
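A per-context version of this check can be sketched in Python, assuming each token has been coded as a (context, variant) pair. The 0.05 and 0.95 thresholds follow the 95%/5% guideline; the context labels and counts are invented, loosely modelled on the person contrast in Smith's study. The report is meant for inspection rather than automatic deletion, since extreme contexts still need to be explained.

```python
from collections import Counter, defaultdict

def flag_contexts(tokens, variant, low=0.05, high=0.95):
    """Classify each context by the share of `variant` it shows.

    tokens: iterable of (context, variant) pairs.
    Returns {context: (share, n, status)} where status is
    'categorical', 'near-categorical', or 'variable'.
    """
    by_context = defaultdict(Counter)
    for context, observed in tokens:
        by_context[context][observed] += 1
    report = {}
    for context, counts in by_context.items():
        n = sum(counts.values())
        share = counts[variant] / n
        if share in (0.0, 1.0):
            status = "categorical"
        elif share <= low or share >= high:
            status = "near-categorical"
        else:
            status = "variable"
        report[context] = (share, n, status)
    return report

# Invented counts of do absence ("zero") in negatives, by grammatical person.
tokens = ([("3rd", "do")] * 40 + [("1st", "do")] * 12 + [("1st", "zero")] * 8
          + [("2nd", "do")] * 39 + [("2nd", "zero")] * 1)
for context, (share, n, status) in flag_contexts(tokens, "zero").items():
    print(f"{context}: {share:.1%} zero of {n} tokens, {status}")
```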
The questions to ask yourself as you define the envelope of linguistic variation are these: Does this token behave exceptionally? Does it behave like other tokens of the variable? The major part of circumscribing the variable context is to ‘specify where the variable occurs and where it does not’ (Weiner & Labov, 1983:36). In so doing, you must provide an explicit account of which contexts are not part of the variable context.
The decisions that go into circumscribing the variable context affect the results in very important ways. Be sure to make principled decisions at each step in the process. Even the most sophisticated quantitative manipulations will not be able to save the analysis if you do not do this first (Labov, 1969a:728). In the next section I turn to some practical examples.
Do not be afraid to falsify your own procedures. Circumscribing the linguistic variable is a process that unfolds as you go and is continually revised nearly right up to the end of the extraction process. I don’t know how many times I’ve had to go back and include a token type because I found later that it was variable. I’ve also had to go back and exclude tokens that were later found to be invariable. This is all part of the discovery process. Remember to document everything!
Exceptional Distributions
One of the first things to attend to when circumscribing the variable context is whether there are contexts that are exceptional in some way. Exceptional behaviour often becomes obvious only as research evolves. Certain exceptional behaviours are part of the knowledge base existing in the literature. It is the responsibility of the analyst to know what idiosyncratic behaviour has been noted in earlier research and to pay particularly good attention to how the variants of a variable are distributed in the data set under investigation. Are there co-varying nouns, verbs, adjectives, and if so, do they behave comparably to the rest? Are different structures, sentence types, and discourse contexts the same or different? Exceptional distributions may occur for any number of reasons, and these will differ depending on the variable and the data set. This is undoubtedly part of what Labov meant by ‘exploratory maneuvers’ (1969a:728).
Asymmetrical Contexts
It is critical that each linguistic variable be scrutinised for asymmetrical distribution patterns. For example, in a study of verbal -s in Early African American English (Poplack & Tagliamonte, 1989), we knew that one of its salient characteristics was use with non-finite constructions (Labov et al., 1968:165). For this reason, we were looking for cases of verbal -s in these constructions in our data. When we did not find any, it was immediately apparent that we were dealing with a different situation. Similarly, we knew from earlier research that verbal -s tended to appear on certain verbs. Once again, this was a red flag, prompting us to pay attention to the distribution of variants by lexical verb.
Another good illustration of exceptional behaviour that must be considered comes from the study of relative markers in English, exemplified in (26). At the outset, it is extremely important to isolate the restrictive relative clauses. Why? Because in contemporary varieties of English, non-restrictive relative clauses differ on a number of counts from restrictive relatives, and thus cannot be treated in the same analysis. First, non-restrictive relative clauses occur primarily with which and who, but hardly ever with that and zero; second, their semantic function differs; third, non-restrictives are marked off prosodically (as indicated by the commas in 26). Given these characteristics, if non-restrictive relatives were pooled with restrictive relatives, such as the embedded clause that I knew from the Bayhorse in (26), the effect would be to raise the percentage of which/who forms and to lower the percentage of the others (that and zero). Further, the results would not be comparable with other data where only restrictive relative clauses were studied.
(26) Albert, who was one of the guys that I knew from the Bayhorse, got him to do his physics homework for him. (YRK, ocavell, man, 40)
Because non-restrictive relative clauses are nearly categorically marked with wh- forms, they are exceptional when it comes to the presence of relative markers and should not be included in the same analysis as restrictive relatives (see Ball, 1996).
A similar modus operandi led to numerous exclusions in my study of dual form adverbs (Ito & Tagliamonte, 2002:246–248). The variation was restricted to adverbs that could take either -ly or -Ø without a difference in function. Numerous adverbs had to be excluded: those that did not permit alternation of zero and -ly (for example, highly occurs as an adverb but high does not alternate with it) and those whose adjectival form (i.e. the zero form) was not semantically related to the -ly counterpart, as with short/shortly. The example in (27a) illustrates the case of directly, which was excluded because it means ‘immediately’ in this context. However, the token in (27b) was included because in this context direct can alternate with directly, meaning ‘in a direct way without deviation’.
(27)
a. He drove home directly after arriving (= ‘immediately’).
b. ’Cos in those days as well you used to get er milk direct from a – a- dairy on a morning. (YRK, jlowe, 62)
Sometimes you will not know a priori which contexts are variable and which are not. This is particularly true when you have targeted a variable which is undergoing change. Your own intuitions may not match what is happening in the speech community. For example, in my study of dual form adverbs, I adopted a strategy of examining the data itself for evidence of a particular item’s variability. This is because the literature and my own intuitions often failed to make the appropriate judgements about potential variability for the adverb (Ito & Tagliamonte, 2002:247). Indeed, a reviewer of the study criticised us for including certain types (28) that they claimed were not variable. In the rewrite we had to demonstrate that they were, in fact, variable both inter- and intra-speaker and, further, that they were non-negligible in number and diffused across a reasonable proportion of our individuals. We used these distributional facts to justify their inclusion in the analysis.
(28)
a. I was an angel, absolute. (YRK, jlowe, woman, 62)
b. I had years of utter misery, absolutely. (YRK, jlowe, woman, 62)
A variable must be investigated in tremendous detail to determine which contexts permit variation and which do not. List those that do not and the reasons for their exclusion.
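Keeping an explicit record of exclusions is easy to automate once tokens have been hand-coded. The following Python sketch assumes each token carries tags such as "formulaic" or "neutralised"; the `Token` class, the tag names, and the rule list are hypothetical and would need to match your own coding scheme. Each excluded token is stored with the first rule it matched, so the exclusions and their reasons can be reported directly.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Token:
    speaker: str
    text: str
    variant: str
    tags: set = field(default_factory=set)  # hypothetical coding labels

# Ordered (reason, test) pairs; the first matching rule supplies the reason.
# Rule names and tags are illustrative assumptions, not a standard scheme.
EXCLUSION_RULES = [
    ("formulaic or rote", lambda t: "formulaic" in t.tags),
    ("metalinguistic comment", lambda t: "metalinguistic" in t.tags),
    ("discourse marker chunk", lambda t: "chunk" in t.tags),
    ("neutralisation", lambda t: "neutralised" in t.tags),
    ("ambiguous", lambda t: "ambiguous" in t.tags),
]

def apply_exclusions(tokens, rules=EXCLUSION_RULES):
    """Split tokens into (kept, excluded), recording one reason per exclusion."""
    kept, excluded = [], []
    for tok in tokens:
        reason = next((name for name, test in rules if test(tok)), None)
        if reason is None:
            kept.append(tok)
        else:
            excluded.append((tok, reason))
    return kept, excluded

tokens = [
    Token("jlowe", "absolute", "zero"),
    Token("nheath", "you see", "zero", {"chunk"}),
    Token("cspence", "Pop was sat", "unclear", {"neutralised", "ambiguous"}),
]
kept, excluded = apply_exclusions(tokens)
print(len(kept), Counter(reason for _, reason in excluded))
```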
What is the difference between the rate of a variant and the proportion of a variant? Rate involves comparing different quantities, that is, the frequency of a factor relative to some independent variable such as time, social characteristic, or linguistic factor group. A proportion is a number considered in comparative relation to a whole, for example the fraction of negative contexts per decade in a factor group with negative and affirmative distinguished.
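In practice the difference lies in the denominator. A proportion needs the full envelope of variation (every place the variable could have occurred), whereas a rate relates a count to some other quantity. The sketch below assumes, for illustration only, that the rate is normalised per 1,000 words of transcript; the function names and figures are invented.

```python
def proportion(variant_count: int, envelope_total: int) -> float:
    """Share of one variant among all tokens in the variable context."""
    return variant_count / envelope_total

def rate_per(count: int, exposure: float, per: float = 1000.0) -> float:
    """Occurrences per `per` units of some other quantity (words, hours, ...)."""
    return count / exposure * per

# Invented figures: 13 tokens of be like among 100 quotatives,
# in a stretch of 5,000 words of transcript.
print(proportion(13, 100))   # share of the quotative envelope
print(rate_per(13, 5000))    # be like per 1,000 words
```

Only the proportion depends on having counted the places where the form could have occurred but did not.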
Formulaic Utterances
Typical constructions which exhibit exceptional behaviour for linguistic variables are those that have been learned by rote such as songs, religious recitations, or sayings, since these constructions may be imitative, (29).
(29) the bible says that, ‘the asking you shall receive, and seek and you shall find’. (ONT, jlaidlaw, man, 80)
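Metalinguistic commentary is also a context for exclusion. Examples such as (30) tell us a lot about the production and perception of language (Preston, 1989; 1999), but because the speaker is talking about the forms rather than simply using them, such tokens do not reflect ordinary use of the variable and should not be counted.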
(30) Yeah, so when I first moved up here, um, there’s a really big debate on how to pronounce the word ‘poutine’. So like, you know, fries with gravy and cheese. Yeah. So down south, it’s poutine [putin]. So my dad and I were coming up here house shopping, looking for houses, and I ordered a poutine. And the waitress looked at me and she asks me, ‘Like, what are you talking about? What is a poutine [putin]?’ … And then she’s like, ‘Oh you mean a poutine [putɪn]?’ (ONT, eedmonds, woman, 16)
Exceptional distributions also occur in expressions where the individual lexical items have become part of a larger ‘chunk’. In the study of verbal -s (Godfrey & Tagliamonte, 1999:99–100), expressions such as I mean, you know, and I see were excluded because they function as discourse markers, not verbs (31a–b). Similarly, in a study of past tense be (variable was/were), contexts such as (31c) were excluded (Tagliamonte & Smith, 2000:160).
(31)
a. We’d seen the roses, you see. (YRK, nheath, woman, 20)
b. Should have made it a bigger thing, I think. (YRK, nheath, woman, 20)
c. So, I had some friends, as it were, from my own environment. (YRK, acork, woman, 76)
In sum, when the variable under investigation occurs in a context that is anomalous with respect to the variation of forms within it, such tokens are typically removed from the analysis.
Neutralisation
Neutralisation contexts are tokens in which independent processes make reliable identification of the variant under investigation difficult (or nearly impossible). The simplest case of neutralisation comes from variables which are phonologically conditioned. For example, when a noun or verb ending in [s,z] is followed by a word beginning with [s,z], (32), it is impossible to tell whether the sibilant is the final suffix on the noun/verb or the initial segment of the following word (Wolfram, 1993; Poplack & Tagliamonte, 1994).
(32)
a. Pop wa[z] [s]at there rubbing her arm. (YRK, cspence, woman, 70)
b. You get[s] [s]ick of them if you had too many. (DVN, rharris, woman, 64)
Similarly, in studies of (-t,d) deletion, juxtaposition of a word ending in [t,d] and a following word beginning with [t,d], (33), makes it impossible to determine whether the final (t,d) or the initial (t,d) of the following word has been removed.
(33) We were suppose[d] [t]o land on the shore. (YRK, jtweddle, man, 78)
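Contexts of this kind can be pre-flagged before hand-checking. The following Python sketch is a rough heuristic under stated assumptions: it works on spelling rather than phonetic transcription, assumes the target word is written in its standard form, and uses a hypothetical lookup table covering only the two cases above. Every flag still needs to be verified against the recording.

```python
from typing import Optional

# Hypothetical table: for each variable, the letters that stand in for the
# segments that neutralise it across a word boundary. Spelling is only a
# rough proxy for pronunciation (e.g. "sure" begins with [ʃ], not [s]).
NEUTRALISING_SEGMENTS = {
    "s": set("sz"),    # -s suffix before a word-initial sibilant
    "t,d": set("td"),  # final (t,d) before a word-initial alveolar stop
}

def is_neutralised(variable: str, word: str, next_word: Optional[str]) -> bool:
    """Flag a token whose final segment cannot be separated from the next word.

    `word` is assumed to be transcribed in its standard spelling
    (e.g. "supposed" even when the [d] is not pronounced).
    """
    if not next_word:
        return False  # pause or end of utterance: no following segment
    segments = NEUTRALISING_SEGMENTS[variable]
    return word[-1:].lower() in segments and next_word[:1].lower() in segments

print(is_neutralised("s", "was", "sat"))        # True
print(is_neutralised("t,d", "supposed", "to"))  # True
print(is_neutralised("s", "gets", "tired"))     # False
```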
Ambiguity
When a linguistic variable involves a grammatical feature whose varying forms implicate different semantic interpretations, the issue of circumscribing the variable context becomes more difficult. Word-final suffixes such as verbal -s or past tense -ed involve independent processes of consonant cluster simplification which render the surface forms of regular (weak) present and past tense verbs indistinguishable (34).
(34) She liveØ right up yonder. (SAM, E)
Unmarked verbs in past temporal reference contexts are ambiguous. They could be instances of uninflected present tense forms or past tense forms with phonologically deleted [t,d]. Including them will obviously skew the results one way or another: the rate of -s presence in a study of verbal -s, or the rate of past marking in a study of past tense. Only forms whose temporal reference can be firmly established should be included. Past tense readings can often be inferred, for example, from adverbial or other temporal disambiguating constructions (35a), as well as other indicators (35b).
(35)
a. He liveØ with mama thirty, thirty-two years … (ESR)
b. There was a pal liveØ there. (YRK)
Other processes may also make it impossible to tell which function a form is serving. For example, in (36) it is impossible to determine whether the sibilant consonant represents the plural suffix followed by a deleted copula, or a zero plural followed by a contracted copula.
(36) Them thing[z] a bad thing. (NPR)
Some contexts may be inherently ambiguous. For example, in a study of past tense expression, verbs with identical present and past tense forms, such as put, set, and beat, would not be included, because their surface form does not vary between present and past, so past marking cannot be detected (37).
(37)
a. past tense
That was before Tang Hall was built you see, they put in sewerage drain from Heworth, the top water and then they put in- then they got started building. (YRK, rfielding, man, 70 in 1986)
b. present tense
… things what you put your tea in. (YRK, rfielding, man, 70 in 1986)
Another source of ambiguity is when nothing in the context permits an unambiguous interpretation of the form’s function. For example, in (38) you cannot tell whether the noun is plural or singular. Therefore, neither of these tokens should be included in an analysis of plural nouns.
(38)
a. Just behind the tree. (SAM)
b. I ain’t gonna tell no lie. (ESR)
In sum, many contexts may seem to be part of the variable context but are not. Sometimes you may not know they present a problem until much later. This does not matter. It is better to include too many tokens than too few, because it is far easier to include more tokens while you are in the extraction phase than to go back later and retrieve the ones you missed. In fact, excluding certain types of tokens is simple if they have been coded distinctly in the coding system. I will tell you more about this in Chapters 8 and 10.
Ensuring Functional Equivalence
With morphosyntactic variables, following the criterion of ‘functional equivalence’ is often not straightforward. You must be particularly mindful that each variant is an instance of the same function.
The study of tense/aspect features in variation analysis has been particularly helpful in outlining procedures for excluding contexts which do not meet the criterion of functional equivalence. Tense/aspect features are often involved in longitudinal ‘layering’ (Hopper, 1991) of forms in the grammar, where only a subset is implicated in the variation of the linguistic variable under investigation. For example, the study of future temporal reference involves variation in the forms will and going to. However, different forms of will (e.g. won’t, ’d, and ’ll) may also denote other (non-future) temporal, modal, and/or aspectual meanings. Therefore, any study of future time must restrict the variable context to cases of will that make predictions about states or events transpiring after speech time. This involves identifying and excluding all forms with other semantic readings: (1) forms having a modal rather than temporal interpretation (39a); (2) counterfactual conditionals, which are hypothetical rather than temporal (39b); and (3) forms denoting habitual action in the present or past (39c).
(39)
a. And today, I wouldn’t do that for the queen … (GYE)
b. If it was up to me, I’d have fish on Sunday. (NPR)
c. And we would go hitting each other brothers and then we would fight. (NPR)
By strictly circumscribing the contexts to those that are temporal and that refer to future time, the variants included in the analysis are pertinent to the study of grammatical change in the future temporal reference system.
Repetitions
Tokens that immediately repeat a preceding token, as in false starts or performance errors, are typically not included in a variation analysis.
For example, in (40) only the first of each pair of tokens was included in the data file for these variables. Including the second, repeated tokens would add a disproportionate number of instances of the same form.
(40)
a. And then funny enough, funny enough, I think in one year four of us got married. (YRK, csmith, woman, 71)
b. So they’d played one short- they’d played one short. (YRK, mjohnson, man, 43)
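Such repetitions might be pre-filtered automatically, assuming each extracted token records the speaker, the form, and its word position in the transcript. The `Tok` record and the five-word window (`max_gap`) in this Python sketch are hypothetical choices, not part of the original procedure; any flagged repeat should still be checked by hand.

```python
from typing import NamedTuple

class Tok(NamedTuple):
    speaker: str
    form: str
    position: int  # word offset in the transcript (hypothetical field)

def drop_immediate_repeats(tokens, max_gap=5):
    """Keep the first of a run of repeated tokens; drop the repetitions.

    A token counts as a repeat when the same speaker produces the same form
    within `max_gap` words of the immediately preceding token.
    """
    kept, prev = [], None
    for tok in tokens:
        is_repeat = (
            prev is not None
            and tok.speaker == prev.speaker
            and tok.form.lower() == prev.form.lower()
            and tok.position - prev.position <= max_gap
        )
        if not is_repeat:
            kept.append(tok)
        prev = tok
    return kept

tokens = [Tok("mjohnson", "short", 4), Tok("mjohnson", "short", 8),
          Tok("mjohnson", "short", 60)]
print([t.position for t in drop_immediate_repeats(tokens)])  # [4, 60]
```

The window keeps genuinely separate uses of the same form later in the interview.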
Natural Speech Anomalies
As with all naturally occurring speech, accurate interpretation of any part of the discourse may on occasion be impossible. Intrinsic characteristics of oral discourse, like false starts, hesitations, ellipsis, and reformulations (41), often lead to difficulty in interpretation. Any unclear or ambiguous contexts should be excluded from the analysis.
(41)
a. And there’s another new one in this week who- (CMK, bmcgregor, woman, 91)
b. And um, it was very– (YRK, cspence, woman, 70)
Imposing an Analysis
In circumscribing any variable context, be aware that your decision-making process may impose an analysis on the data. A good example of this comes from the study of variable (-t,d) in African American Vernacular English (e.g. Labov et al., 1968; Wolfram, 1969; Fasold, 1972) and then, later, in Guyanese Creole (Bickerton, 1975). Part of the variable context involves suffixal (-t,d) alternating with bare verbs (i.e. no suffix) in contexts of past temporal reference (42a). Another part involves past marking of strong verbs, alternating with their base forms, also in contexts of past temporal reference (42b).
(42)
a. That’s got how many years since they killØ Papita? Yes, since they kilt him. (SAM)
b. I don’t know where they came from, but anyhow they came there, they begin to work. (SAM)
Bickerton criticised the early studies, suggesting that if they had considered creole categories, such as distinctions of aspect, they would have found that the zero-marked verbs resulted not from deletion of English morphemes, but from a pattern of overt and zero marking peculiar to creoles. In these grammatical systems, the zero form actually encodes a different function, a particular aspectual reading.
One way to handle this type of pitfall is to set up your data file so that it allows for different analyses. For example, in Tagliamonte and Poplack (1993) we set up the coding system to test for both a creole and an English underlying grammar. No one analysis can claim to be the most accurate; however, a defensible and replicable analysis provides a sound foundation for future research.
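In practice this means coding every token for the factors each hypothesis predicts will matter, so that the same data file can be cross-tabulated both ways. The following Python sketch is a minimal illustration; the field names (`verb_class`, `aspect`), their categories, and the four tokens are invented stand-ins, not the coding scheme of Tagliamonte and Poplack (1993).

```python
from collections import Counter

def marking_rate_by(tokens, key):
    """Proportion of overtly marked tokens within each category of `key`."""
    marked, total = Counter(), Counter()
    for tok in tokens:
        total[tok[key]] += 1
        if tok["marked"]:
            marked[tok[key]] += 1
    return {category: marked[category] / total[category] for category in total}

# Each past-reference token is coded once, for BOTH hypotheses.
# Field names and categories are hypothetical stand-ins:
#   "verb_class": a factor motivated by an English-based account
#   "aspect":     a factor motivated by a creole-based account
tokens = [
    {"verb": "kill", "marked": True, "verb_class": "weak", "aspect": "punctual"},
    {"verb": "kill", "marked": False, "verb_class": "weak", "aspect": "habitual"},
    {"verb": "come", "marked": True, "verb_class": "strong", "aspect": "punctual"},
    {"verb": "begin", "marked": False, "verb_class": "strong", "aspect": "habitual"},
]
print(marking_rate_by(tokens, "verb_class"))
print(marking_rate_by(tokens, "aspect"))
```

Because both codes sit on the same tokens, neither analysis has to be decided in advance of the results.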
The Type–Token Question
The type–token question is whether to include frequently occurring items every single time they occur, or only a certain number of times (Wolfram, 1969:58). The question is particularly relevant for phonological variation, where the inclusion of frequently occurring words with exceptional distribution patterns may distort the results. The best example I can think of is a recent study of dialect acquisition in young children (Tagliamonte & Molfenter, 2005). The focus of investigation was variable (t), with variation amongst [t], [d], and [ʔ]. In the data, the children, aged 2–5, used the lexical item little extremely frequently (43).
(43) Mum, but we need- little holes. Why do we need little holes in it? Can I put little holes in it? Shaman can I put little little holes in? (KID, tclews)
A standard approach to such a situation is to restrict the number of tokens of frequent items per individual, for example five tokens per hour of recording per child. However, in the study of acquisition, frequency of forms is critical. To model this effect on acquisition, it would be necessary to include all the forms. In this study we hedged our bets by devising a coding schema (see Chapter 6) that allowed us to analyse either five tokens per hour per child or all of them. Time will tell which method supplies a better explanation.
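A coding schema that supports both analyses can be as simple as a flag on each token. This Python sketch assumes the cap is applied per word type, per speaker, per hour of recording; that reading, the function name, and the toy data are assumptions for illustration.

```python
from collections import Counter

def flag_capped_subset(tokens, cap=5):
    """Mark the first `cap` tokens of each word type per speaker per hour.

    tokens: list of (speaker, hour, word) in transcript order.
    Returns a parallel list of booleans. Keeping every token in the data
    file, with this flag alongside, lets the same file be analysed either
    with the cap (keep only True) or with all tokens (ignore the flag).
    """
    seen = Counter()
    flags = []
    for speaker, hour, word in tokens:
        key = (speaker, hour, word.lower())
        seen[key] += 1
        flags.append(seen[key] <= cap)
    return flags

# Invented: one child says "little" seven times and "water" once in hour 1.
tokens = [("tclews", 1, "little")] * 7 + [("tclews", 1, "water")]
flags = flag_capped_subset(tokens)
capped = [tok for tok, keep in zip(tokens, flags) if keep]
print(len(tokens), len(capped))  # 8 6
```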
The type–token question may have varying implications depending on the level of grammar under investigation and/or the particular variable targeted. While restricting the number of lexical items in a phonological analysis of variation may be defensible, the same decision might be less so in a study of syntax, discourse pragmatics, or especially grammatical change. Analysts must make a choice as to how their own study will proceed. Whatever the decision, it should be transparent enough for comparison with earlier research as well as future replications. Procedures for how the type–token question is resolved differ across studies and, unfortunately, in many, the decisions have not been made explicit in published works. To date, the relevance of type–token decisions has not, to my knowledge, been fully explored in the published literature.
Illustrating Linguistic Variables
A requisite component of a variation analysis is to illustrate the linguistic variable. At the outset, it is important to substantiate its crucial characteristics: the equivalence and distribution of the variants, intra-individual and inter-individual variation, and the diffusion of the variants in the community (e.g. how many people use them). It is also important to show how the contexts of use are represented across factor groups. As a first step, find a ‘super token’: an alternation of variants by the same individual in the same stretch of discourse.
Examples of variable (verbal -s) from Samaná English (Poplack & Tagliamonte, 1991b:49) show that both -s and zero occur in the same individual: (44a–b) were both uttered by individual ‘E’.
(44) a. And sometimes she go in the evening and come up in the morning. (SAM, E)
b. She goes to town every morning and comes up in the evening. (SAM, E)
Examples of variable (-ly) from York English (Ito & Tagliamonte, 2002), as in (45a), show that both -ly and zero are used by the same individual in the same stretch of discourse. Intra-individual variation is also found in Ontario, Canada, as in (45b–c) (Tagliamonte, 2018).
(45) a. I mean, you go to Leeds and Castleford, they take it so much more seriously … They really are, they take it so serious. (YRK, gdonald, man, 43)
b. You know they’re coming up slowly, but the roads are narrow. (ONT, gwindsor, man, 84)
c. It seems to go slow until you get to be about twenty. (ONT, gwindsor, man, 84)
Providing examples of intra-individual variation is important because it demonstrates that the linguistic variable under investigation is endemic to individual sample members, not simply the result of amalgamating data from individuals who are categorical one way or another.
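The check described above, distinguishing speakers who alternate from speakers who are categorical and locating candidate ‘super tokens’, can be sketched in Python. This is a minimal illustration under stated assumptions: tokens are already extracted and coded as (speaker, stretch, variant) tuples, where the stretch identifier stands for a hypothetical segmentation of each transcript into stretches of discourse; the function name and toy data are invented for illustration.

```python
from collections import defaultdict


def speaker_profile(tokens):
    """tokens: iterable of (speaker, stretch_id, variant) tuples.
    Returns {speaker: (variants_used, stretches_with_alternation)}."""
    variants = defaultdict(set)    # all variants each speaker uses
    by_stretch = defaultdict(set)  # variants per (speaker, stretch)
    for speaker, stretch, variant in tokens:
        variants[speaker].add(variant)
        by_stretch[(speaker, stretch)].add(variant)
    profile = {}
    for speaker, used in variants.items():
        # Stretches where this speaker uses more than one variant:
        # candidate 'super tokens' to cite as illustrations.
        alternating = sorted(
            stretch for (spk, stretch), vs in by_stretch.items()
            if spk == speaker and len(vs) > 1
        )
        profile[speaker] = (used, alternating)
    return profile


# Invented toy data for variable (verbal -s): variants "s" and "zero".
toy = [
    ("spk1", 1, "s"), ("spk1", 1, "zero"),
    ("spk1", 2, "zero"),
    ("spk2", 1, "s"), ("spk2", 3, "s"),
]
profile = speaker_profile(toy)
categorical = sorted(s for s, (used, _) in profile.items() if len(used) == 1)
print(categorical)          # ['spk2']
print(profile["spk1"][1])   # [1]
```

A speaker who turns up in `categorical` contributes tokens of only one variant; if most speakers did, the pooled rates would reflect amalgamated categorical usage rather than variation within individuals.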
Cross-variety comparisons illustrate that variation exists within individuals and across the communities under investigation. In (46), the same speaker alternates between was and were, in African Nova Scotian English in rural Nova Scotia, Canada (46a), and in Buckie English in rural Scotland (46b) (Tagliamonte & Smith, 2000).
(46) a. And we was the only colour family. We were just surrounded. (GYE, l)
b. We were all thegither … I think we was all thegither. (BCK)
Similarly, (47) illustrates variable (verbal -s) in third-person plural contexts in Samaná English and Devon English (Godfrey & Tagliamonte, 1999).
(47) a. They speak the same English. But you see, the English people talks with grammar. (SAM)
b. Yeah they drives ’em … They help out. (DVN, dcollins, man, 76)
Whenever I use examples, I look for the most interesting, colourful, funny, and informative ones I can find in my data. There are two reasons: (1) to convey a sense of what the variety and its culture are like; and (2) so that, if the audience is bored, they can at least enjoy the language!