Historical Pragmatics

Laurel J. Brinton

doi:10.1017/9781009322904.003

2 - Historical Pragmatics

Scope, Methods, Challenges

Published online by Cambridge University Press: 28 September 2023

Affiliation: University of British Columbia, Vancouver

Summary

Historical pragmatics encompasses the subfields of historical pragmatics (proper), with static focus on pragmatic forms and functions in earlier language stages, and diachronic pragmatics, with dynamic focus on changes over time. Within each subfield, one can focus on the level of expressions (words, phrases, clauses), utterances (speech acts), and discourse (register, genre, style). But the “bad data” problem means that for the past we lack naturally occurring oral conversation, where pragmatic meaning, such as speaker attitude and speaker–hearer interaction, is most obvious. However, from the medieval period, we have records which, while they come down to us in written form, represent authentic (“speech-based”) dialogue (court transcripts, depositions, parliamentary proceedings), constructed or “speech-purposed” dialogue (dramatic and fictional dialogue) or intended for oral delivery (sermons, prayers). “Speech-like” texts are more or less colloquial in nature (personal letters, diaries). Many of these documents are now accessible in multi-genre and specialized, single-genre electronic corpora. Finally, this chapter contemplates the possibility of pragmatic corpus annotation.

Keywords

“bad data”; diachronic corpus pragmatics; corpus annotation; form-to-function; function-to-form; diachronic pragmatics; historical pragmatics (proper); speech-based; speech-like; speech-purposed

Information

Type: Chapter
Information: Pragmatics in the History of English, pp. 18–43

DOI: https://doi.org/10.1017/9781009322904.003

Publisher: Cambridge University Press

Print publication year: 2023

2 Historical Pragmatics: Scope, Methods, Challenges

2.1 Introduction

The field of historical pragmatics is quite wide-ranging, with a number of subfields. After discussing the scope of historical pragmatics and its three traditionally recognized subfields, this chapter moves on to a description of the two common approaches to historical pragmatics, form-to-function and function-to-form. These approaches may be applied to a range of pragmatic units, including expressions, utterances, and genres/domains of discourse. A serious challenge for the discipline of historical pragmatics – what has been called the “bad data” problem – is described. We will see that in large part this challenge has been met through the “digital turn” in pragmatics, or, more specifically, diachronic corpus pragmatics and corpus annotation.

2.2 The Scope of Historical Pragmatics

The field of historical pragmatics was initially divided into three subfields by Jacobs and Jucker in 1995 in the introduction to their ground-breaking volume. While these subfields have been differently named and differently understood over the years, I retain their three-way distinction as it is still useful today. I recognize, however, that the dividing lines among the subfields may be difficult to delimit. As we will see, the first two involve adding a historical or diachronic dimension to pragmatics, while the third involves adding a pragmatic dimension to historical linguistics.

The first subfield is what Jacobs and Jucker call “pragmaphilology”; this is similar to the name “New Philology” originally applied to the entire field. I will call it “historical pragmatics (proper).” It describes the pragmatic aspects of a historical text or period. According to Jacobs and Jucker (1995: 11), these aspects include, among other things, the addressee/addresser and their social and personal relationships, the physical/sociohistorical setting of the text, the aims or communicative intentions of the addresser, and the goals of the text. Studying these aspects requires a thorough knowledge of the (socio)historical contexts in which the text is produced. For this reason, it has been seen as the “macro” aspect of historical pragmatics (Arnovick 1999: 8; Culpeper 2010: 189). Understood somewhat more narrowly, historical pragmatics proper focuses on discourse-pragmatic features at a particular period of time. It is thus “historical” but not “diachronic,” and essentially synchronic in its approach. Discourse-pragmatic features which might be studied here include words, phrases, and clauses of high frequency but low semantic content (i.e., pragmatic markers, interjections, address forms, comment clauses, deictics, topic and focus markers), tense and aspect forms used in “non-grammatical” ways (the “historical present” used for narrative segmentation and internal evaluation, the perfective used for foregrounding, the imperfective used for backgrounding), or distinctive word-order patterns used to mark topic/comment, new information/old information, or background/foreground (see Brinton 2015). Functional categories, such as politeness, speech acts, or speech representation, might also be subject to examination in a particular period, as might genre or register conventions. Specific examples falling under this rubric might be the following:

address terms (thou/you) used by Chaucer or Shakespeare;
speech representation in Old English;
conventions of medical writing in Middle English;
(im)politeness in the Early Modern courtroom; or
apologies in Late Modern English (LModE).

An early study by Brown and Gilman (1989) of the speech act of “directives” (commands) in Shakespeare’s plays takes an approach which could be classified as historical pragmatics proper. They show that Shakespeare utilizes a range of directive strategies. In addition to direct imperatives (which are more polite when accompanied by a second-person pronoun, e.g., Take thou, Retire thee) and verbless forms (e.g., Peace), which are rude and brusque, Brown and Gilman find directives that express the speaker’s sincere wish that the hearer do something (e.g., I pray you, prithee, I entreat you, I beseech you, I would that, I require that) and directives querying whether the hearer is willing, sees fit, or is pleased to do something (e.g., so/if it please you, by your leave). Despite imposing on the hearer to do something, I beseech can also be deferential since it occurs almost always with the formal you rather than the informal thou and is accompanied by an honorific term of address (e.g., sir, madam, lord) 40 percent of the time. This makes it much more deferential than I pray, which occurs only 10 percent of the time with an honorific. Prithee (from pray thee), in contrast, occurs mainly with in-group markers such as good friend or my daughter. Thus, there is evidence of two types of politeness that will be discussed in detail in Chapter 5: politeness recognizing “negative face” (the desire not to be imposed upon) and politeness recognizing “positive face” (the desire to be approved of).
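Co-occurrence percentages of the kind reported above (a directive formula accompanied by an honorific) can in principle be approximated by a simple count. The following is only a minimal sketch under stated assumptions: the function name `honorific_rate`, the four-token search span, and the toy utterances are all invented for illustration and are neither Brown and Gilman's procedure nor quotations from Shakespeare.

```python
import re

# Honorific terms of address named in the text (sir, madam, lord).
HONORIFICS = {"sir", "madam", "lord"}

def honorific_rate(utterances, formula, span=4):
    """Share of occurrences of `formula` that are followed by an
    honorific within `span` tokens (hypothetical helper)."""
    formula_toks = formula.lower().split()
    n = len(formula_toks)
    total = 0
    with_honorific = 0
    for utt in utterances:
        toks = re.findall(r"[a-z]+", utt.lower())
        for i in range(len(toks) - n + 1):
            if toks[i:i + n] == formula_toks:
                total += 1
                following = toks[i + n:i + n + span]
                if any(t in HONORIFICS for t in following):
                    with_honorific += 1
    return with_honorific / total if total else 0.0

# Invented toy utterances, for illustration only.
toy = [
    "I beseech you, sir, hear me",
    "I beseech you, stay",
    "I pray you, come",
    "I pray you, good madam, sit",
]

beseech_rate = honorific_rate(toy, "I beseech")
pray_rate = honorific_rate(toy, "I pray")
```

A count like this only locates candidates. Deciding whether a given sir or lord is actually an address term, and handling historical spelling variation, would still require reading the examples in context.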

The second subfield of “diachronic pragmatics” “focuses on the linguistic [pragmatic] inventory and its communicative use across different historical stages of the language” (Jacobs and Jucker 1995: 13). It involves tracing the development of discourse-pragmatic features, functions, and genre conventions over time. This approach is truly diachronic. It has been seen as “micro.” Examples of studies belonging to this tradition might be the following:

the history of directive speech acts;
the history of compliment speech acts;
changes in the inventory of interjections over time;
the development of the pragmatic marker well; or
the origin of the comment clause I think.

Because there may be change in a form within a particular period or even within one writer’s usage, Culpeper (2010: 190) suggests that we may need to propose a field called “diachronic pragmaphilology,” but such fine-grained distinctions are probably unnecessary.

The form and function of speech representation over time in the news register is the subject of Jucker and Berger’s (2014) study, representing an example of the diachronic pragmatic approach. They focus on changes in speech representation in one broadsheet newspaper, The Times, from 1833 to 1988. As a way to authenticate the news, earlier editions of The Times use indirect speech and sometimes narrative reports to convey the words of reliable sources and important newsmakers. In more recent editions, speech representation becomes an important means of giving a faithful account of official meetings, conferences, reports of rail accidents, or any events in which the words of speakers are important. Jucker and Berger find a number of categories of speech representation in their corpus, with considerable fluctuation over time. However, they demonstrate a trend toward more direct means of representation, that is, direct speech, at the expense of less direct means such as indirect speech. Long passages of indirect speech and free indirect speech are replaced with more selective quotations in mixed forms; these serve to summarize or characterize events. In this regard, Jucker and Berger find that the broadsheet is moving in the direction of the tabloid, where direct speech is “especially pervasive” (83). Their results thus point to changes in the representation of speech over time as well as changing genre conventions in newspapers (both are discussed in more detail in Chapters 4 and 8).

The third subfield is “pragmahistorical linguistics.” It is the study of the discourse-pragmatic factors motivating language change, and thus also a “micro” approach. Increasingly, historical linguistics has turned to pragmatic factors as a means of explaining linguistic (phonological, morphological, syntactic, semantic) change. We see this especially in the concept of inferencing used to explain semantic change (Traugott and Dasher 2002), and its importance to grammaticalization, which we explore in Chapter 3. Word-order change is also often pragmatically motivated, related to pragmatic concepts such as topic and focus marking as well as foregrounding and backgrounding.

An example of a pragmahistorical linguistics study is Los and van Kemenade (2012). This article explains aspects of OE word order and changes in this word order by invoking pragmatic notions such as given and new information. Old English allows for two positions for the subject, before or after þa/þonne ‘then’; the position to the left is “earmarked for discourse-linking” (1481) and is correlated with given information (definite, specific noun phrases with a discourse antecedent). Likewise, the variable position of the object in respect to the verb (object-verb or verb-object order) depends on givenness: typically, given objects precede and new objects follow the verb. However, “[w]hen the syntactic SVO pattern became more and more canonical (as the result of the loss of OV orders in Early Middle English and the decline of verb-second in Late Middle English), the subject appears to have become increasingly reserved for Given information and the object for New information” (1486). At the same time, the needs of information structuring remain and lead to the development of, or increased use of, certain less common syntactic structures, including left dislocation, topicalization, there-insertion, passive, and clefts.

2.3 Form-to-Function and Function-to-Form Approaches

In their early delineation of the subfields of historical pragmatics, Jacobs and Jucker suggest that there are two possible approaches in diachronic pragmatics: form-to-function and function-to-form (1995: 13–25). In fact, these two approaches are possible whether one’s study falls under the historical pragmatics proper, diachronic pragmatics, or pragmahistorical linguistics rubric.

In the form-to-function approach, one begins with the linguistic form (e.g., a pragmatic marker, address term, performative verb, reporting verb, vocative, conversational formula, interjection, exclamation, topic changer) and studies how it functions (either at a point of time or over time). This is a “semasiological” approach, focusing on the discourse-pragmatic function of a form (see Lewis 2012: 903). Synchronically, one explores the pragmatic functioning of a form in a particular period of English. Diachronically, assuming the form remains the same, or undergoes only minor changes, one traces the way in which the pragmatic meaning has arisen and/or changed, what pathways of development the form has followed, and what mechanisms of change have been at work. The case studies of pragmatic markers presented in Chapter 1, §1.4 are prototypical examples of the form-to-function approach. The origin of the forms, the emergence of pragmatic meaning in context, and ongoing changes in their use (including their obsolescence) are all part of such a study. It must be recognized, of course, that establishing the pragmatic meanings of forms in earlier periods can be difficult; one can make use of studies of the form or of comparable forms in Present-day English, but one must recognize that the pragmatic meaning of the form may have changed over time. Thus, careful examination of the contexts in which the form is used is necessary.

An example of a form-to-function approach is Moore’s (2015) study of the history of the reporting verbs quethen, quoth, and quote. These verbs have the pragmatic function of marking a change in speaker and dialogic turns in the narrative, thus contributing to textual organization. Quoth (OE cweþan, ME quethen) is the most common verb introducing direct and indirect speech in Old English. In Middle English it is displaced by seien ‘to say’. Quethen becomes more and more specialized, so that by Early Modern English it is a kind of “invariant quotative marker,” restricted to marking direct speech; it occurs in the past-tense form and with verb–subject order, invariably in a parenthetical reporting clause (quod he/she). It may be abbreviated qd. or qð. The verb declines in the eighteenth century and becomes archaic by the nineteenth; it is now used only for jocular or ironic purposes. Moore believes that the loss of quoth coincides roughly with the spread of quotation marks: “As written language settled on the convention of using punctuation to mark reported speech, the grammaticalized verbs of speaking became a redundant strategy for indicating the shifts in speaker” (2015: 265). Quote, a later addition to this set of speech communication verbs, arises in the eighteenth century, serving a different function. It defers authority of the spoken words to another, insisting that the reported words are a precise copy. Moore calls it a “credentializing pragmatic element.” Moore associates it with a more text-based and literate society in which indicating sources and faithfulness in presentation become important (on reporting verbs, see further Chapter 4).

Other examples of form-to-function studies are Brinton (2014) and Biber (2004), summarized below.

In the function-to-form approach, one begins with the pragmatic function (speech act, speech representation, (im)politeness, genre) and studies how that function is expressed formally (either at a point in time or over time). This is an “onomasiological” approach, focusing on how a discourse-pragmatic function is expressed (Lewis 2012: 903). Synchronically, it involves a search for the inventory of terms that are used to express a functional category. Diachronically, modifications in this inventory of terms as well as changes in the nature of the category itself are both important. Thus, for example, if you are trying to understand the speech act of complimenting in the eighteenth century, you would need to determine how compliments are expressed formally and how they function in this century; both may be very different from what we know about compliments in Present-day English. The function-to-form approach poses considerable challenges, especially for a corpus-based approach (see below, §2.6). We discuss these challenges in more detail in Chapter 6, §6.3.

An example of a function-to-form approach is Landert’s (2019) study of stance. A pilot study reveals that markers of stance (i.e., “speaker’s or writer’s attitude toward the certainty, reliability and source of information of their statements” [173]) are not evenly distributed but cluster in certain passages. She uses twenty lexical items commonly known to express stance: verbs (e.g., believe, seem, think, suppose), adjectives (evident, (un)likely, (im)possible, (im)probable), and adverbs (e.g., certainly, perhaps, surely, truly). She then automatically extracts these items from a corpus of Early Modern English dated from 1460 to 1760 containing speech-related texts, medical writing, pamphlets, and letters. The function-to-form part of her study involves the qualitative analysis of 300-word extracts, all containing a relatively high density of stance markers (9–12 markers) to determine how stance actually works. She finds some previously unidentified stance markers, such as I collect ‘I conclude’ and I credit ‘I believe’ – what have been called “hidden manifestations” (see Chapter 6, §6.3). She also finds that simple quantitative studies may be misleading as multiple stance markers may combine elaboratively in a single stance marking, or markers which individually do not mark stance may combine to mark stance. Her finding that stance is marked in a wide range of contexts but particularly seems to collocate with rhetorical questions and direct discourse suggests that Biber’s (2004) purely quantitative (form-to-function) findings about the overall lower marking of stance in Early Modern English compared to Present-day English (see below) may not be correct.
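The automatic step described above (extracting known stance markers, then locating passages where they cluster) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Landert's actual procedure: the marker set is only the subset of items named in the text, the function names are hypothetical, matching is on exact lowercase word forms with no lemmatization or handling of historical spelling variants, and fixed non-overlapping windows are assumed.

```python
import re

# Illustrative subset of stance markers named in the text; the full
# twenty-item list is not reproduced here.
STANCE_MARKERS = {
    "believe", "seem", "think", "suppose",
    "evident", "likely", "unlikely", "possible", "impossible",
    "probable", "improbable",
    "certainly", "perhaps", "surely", "truly",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def dense_windows(tokens, window=300, min_hits=9):
    """Return (start_index, hit_count) for every non-overlapping window
    of `window` tokens containing at least `min_hits` stance markers."""
    results = []
    for start in range(0, len(tokens), window):
        chunk = tokens[start:start + window]
        hits = sum(1 for tok in chunk if tok in STANCE_MARKERS)
        if hits >= min_hits:
            results.append((start, hits))
    return results

# Tiny invented example with a small window so the behaviour is visible.
demo_tokens = tokenize(
    "Perhaps I think it is likely and the cat sat on the mat"
)
demo_result = dense_windows(demo_tokens, window=6, min_hits=3)
```

The windows returned by such a search are only the starting point. The function-to-form work proper is the qualitative reading of each dense passage, which is where unlisted markers such as I collect would be discovered.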

Other examples of function-to-form approaches are the studies by Brown and Gilman (1989) and Jucker and Berger (2014) summarized above.

2.4 Pragmatic Units

Another way to conceptualize the scope of historical pragmatics is to focus on the size of the linguistic unit studied. While any level of linguistic structure can carry pragmatic meaning, even intonation, spelling, or punctuation (see, for example, Claridge and Kytö 2020), Jucker (2008: 898–902) identifies the levels of expression, utterance, and genre or domain of discourse.

Expressions encompass words, phrases, and clauses serving pragmatic functions. Pragmatic expressions range from single-word pragmatic markers (well, why) and address terms (you, thou, madam, sir) to phrasal conversational formulas (thank you, no problem) and pragmatic markers (by the way, in fact) to (elliptical) clausal reporting structures (he said, she’s like) and comment clauses (you know, I think). You will see detailed examples of pragmatic expressions in this book, especially the chapters on pragmatic markers (Chapter 3), speech representation (Chapter 4), and address terms (Chapter 7). Corpora are often used to identify frequencies and distributions of these expressions, following a form-to-function approach.

An example from fairly recent history of the rise of a pragmatic expression is exclamatory as if! (as in He thinks you’ll be impressed. As if!). Pragmatically, as if is a dismissive or derisive response to some expressed or implied state of affairs. It expresses disbelief that this state of affairs does or will occur (Oxford English Dictionary [OED], s.v. as, adv., and conj., def. P1(c)). In Brinton (2014), I explore the source of this form and its pathway of development. The earliest example cited in the OED dates from 1903, but corpus examples are not frequent until the 1990s. Monoclausal as if and if only clauses (e.g., As if he cared, If only he’d stop drinking), occurring as independent sentences, bear a strong affinity to exclamatory as if and would appear to be a likely source. The earliest examples of as if monoclauses can be dated to the mid- to late sixteenth century, but they are rare. They have a denial sense and are used as part of a rhetorical strategy in a developing argument. In work on what is called “insubordination,” it has been argued that these monoclauses (which look like subordinate clauses but function as independent clauses) arise from full sentences in which the main clauses have been ellipted (e.g., If only he’d stop drinking, I’d be happy) (see Evans 2007). But in this case reconstruction of the ellipted main clause remains elusive. A more plausible source is the construction it be/appear/look/seem with a complement as if-clause (e.g., it seems as if he cared), which appears early; such clauses express negative epistemic stance, a plausible source for the ‘denial/refutation’ meaning. Independence of the as if-clause involves deletion of the semantically depleted main clause (it seems). The final step in the development of exclamatory as if is deletion of the content of the as if-clause, which can typically be inferred from context.

The level of utterances refers to speech acts (e.g., compliments, apologies, insults, promises, greetings, thanks), but can be more broadly understood to encompass conversational routines, (im)politeness, verbal aggression, and speech representation, which denote pragmatic functions typically expressed on the utterance level. Studying these calls for a function-to-form approach.

A study focused on the utterance level is Jucker’s (2011b) study of changes in politeness in the history of English. In contemporary Anglo-American society, we emphasize “negative politeness” by trying not to impose upon others, by offering options and asking indirectly, and we are concerned about “saving face.” In Early Medieval society, however, we do not find evidence of these strategies. Jucker hypothesizes that neither positive nor negative politeness played a role then, since one’s place in the social hierarchy was fixed and relations were based on kin loyalty and mutual obligations. Commands, for example, tended to be expressed directly, in (what looks to us) a face-threatening way. This is what Jucker calls “discernment politeness.” The ME period, as a result of French influence, is a period marked by curteisie ‘courtesy’. This evolves into a system of “deference” or “negative politeness,” where you and thou mark status and distance between speaker and addressee but are not used as a way of saving face. Studies of Shakespeare suggest that by Early Modern English, positive politeness (in-group identity markers, hedges to avoid disagreement, naming of admirable qualities) has come to predominate. The subsequent change to contemporary negative politeness is marked by a number of formal changes, as shown in Table 2.1. The topic of politeness is taken up in detail in Chapter 5 and the question of commands (directives) is treated in Chapter 6, §6.5.

Table 2.1 Some formal correlates to the change from positive to negative (non-imposition) politeness

formal change	positive > negative politeness
thou > you	thou is intimate and marks in-group membership; you is deferential
pray, prithee (beseech, exhort, beg, etc.) > please, can/could/would you	pray/prithee asserts the speaker’s sincere wish for the hearer to do X; please (< if it please you), can/could/would you questions the willingness of the hearer to do X
excuse me/pardon me/forgive me > sorry	the older forms impose upon the hearer to forgive the speaker; the new form expresses regret and is deferential
must > should, need to	must expresses an obligation imposed from outside; should expresses either an internal or an external obligation; need to expresses an internal or subjective obligation

(based on Jucker 2011b)

The third level is the level of discourse or text; this is the level above the sentence, or a “self-contained linguistic unit consisting of utterances” (Jucker 2008: 901). While it is possible to differentiate between spoken “discourse” and written “text,” many do not make this distinction. The terminology here is somewhat confusing, with overlapping terms. But I will consider “register” to be a type of language use that is socially determined, that is, defined by the social use made of the discourse and by a set of contextual and situational features, such as subject matter, nature and role of the participants, and function. Registers may be shaped in important ways by extralinguistic forces. Examples of registers include the discourse of sports, the discourse of science, the discourse of medicine, the discourse of religion, the discourse of literature, the discourse of news, or the discourse of law. Jucker (2008: 901) calls these “discourse domains.”

Within any register there are a variety of “genres.” These are texts that share a conventional structure, a set of stylistic features, and a specific communicative purpose. Within the religious register, for example, we find biblical texts, sermons, prayers, biblical exegesis (commentary), and saints’ lives, all of which are differently structured and serve different purposes. Newspapers encompass genres such as news reports, editorials, opinion pieces, sports reports, business reports, obituaries, movie and book reviews, classified advertisements, recipes, and so on, all of which are shaped by very different textual conventions. Historical pragmaticists seek to determine the genres which constitute each register, focusing on the characteristic features of a genre at a particular historical period or tracing the development of those features over time. Change may involve the rise of new genres and the death of others. Some genres may undergo little change while others change in significant ways. Prayers, for example, are remarkably stable and highly conservative (see Kohnen 2012b). The form of prayers, as we know them, was established in the EModE period. Because they are interactive (between the supplicant and their god) and performative, prayers are characterized by frequent “oral” features, such as first- and second-person pronouns, terms of address, and a small set of speech act types (ordering, thanking, confessing, praising). In contrast, the genre of news reporting, as we know it today, is almost unrecognizable in the earliest newspapers. (Both religious texts and newspapers are discussed in more detail in Chapter 8.)

One example of a historical discourse study is Biber’s (2004) examination of markers of stance in the genres of drama, personal letters, newspaper reportage, and medical prose from 1650 to 1990. Stance – or the marking of personal attitude – may be expressed by a wide variety of forms, including modal auxiliaries (might, should) and semi-modals (ought to), stance adverbials (obviously, wisely, mainly, undoubtedly), that- or to-complement clauses introduced by verbs (prefer, urge), adjectives (it is advisable/convenient), or nouns (opinion, intention). Looking first at contemporary registers, he finds that stance is more frequently expressed in conversation than in the written registers (fiction, news, or academic). The most common forms are (semi-)modals and complement clauses, though the relative frequencies of different types vary in the four registers. Over time, all markers of stance have increased, except for modals, which have declined in frequency, especially in the last fifty years (cf. Leech et al. 2009: Ch. 4). The “popular” genres (i.e., personal letters and drama) have led the way with increased stance marking; newspapers show only modest increases, and medical prose has shown a decline (see Figure 2.1). Different types of stance marking predominate in different genres. Biber concludes that it is not the case that one grammatical system is being replaced by another. Rather, he suggests that there is a change in cultural norms, with speakers being more willing to express stance, whether as a result of deliberate stylistic policies or popular attitudes, especially in the current century.

Figure 2.1 Changes in stance marking from 1650 to 1990 in four genres (frequency per 1,000 words)

(adapted from Biber 2004: 122) (Douglas Biber. 2004. Historical patterns for the grammatical marking of stance. Journal of Historical Pragmatics 5(1). 122. https://jan.ucc.nau.edu/biber/Biber/Biber_2004.pdf. Reprinted with permission.)
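Because corpus samples for different genres and periods differ in size, frequencies such as those plotted in Figure 2.1 are reported as rates per 1,000 words rather than as raw counts. The sketch below shows that normalization only; the function name `per_thousand`, the period labels, and every count and sample size are invented for illustration and are not Biber's data.

```python
def per_thousand(raw_count, total_words):
    """Normalize a raw count to a rate per 1,000 words."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return raw_count * 1000 / total_words

# Hypothetical (raw count, sample size in words) pairs, invented for
# illustration only; these are NOT figures from Biber (2004).
samples = {
    ("personal letters", "period A"): (150, 50_000),
    ("personal letters", "period B"): (240, 60_000),
    ("medical prose", "period A"): (90, 45_000),
    ("medical prose", "period B"): (40, 40_000),
}

# Convert every raw count to a comparable rate.
rates = {
    key: per_thousand(count, size)
    for key, (count, size) in samples.items()
}

for (genre, period), rate in sorted(rates.items()):
    print(f"{genre:18s} {period}: {rate:.1f} per 1,000 words")
```

Normalization makes subcorpora of unequal size comparable, but it does not by itself show whether a difference between two rates is meaningful; that depends on sample size and on how evenly the feature is dispersed across texts.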

2.5 The “Bad Data” Problem

It was famously pointed out by the sociolinguist William Labov (1972: 100, 1994: 11) that the data available for historical linguistic study is “impoverished”: it survives by accident, may not resemble the vernacular of the time, and was perhaps never anyone’s native language. We have no phonetic records before the early part of the twentieth century. For this reason, he sees it as “bad” – “‘bad’ in the sense that it may be fragmentary, corrupted or many times removed from the actual production of native speakers.” For Labov, “[h]istorical linguistics can then be thought of as the art of making the best use of bad data.” Furthermore, “[w]e usually know very little about the social positions of the writers, and not much more about the social structure of the community … we know nothing about what was understood, and we are in no position to perform controlled experiments … we cannot use the knowledge of native speakers” (1994: 11). This problem is especially acute for historical pragmatics, since pragmatics has typically used as its source of data naturally occurring oral conversation or narrative. In some sociolinguistic corpora, there are archival oral narratives where, using the age of the speaker as a proxy for distance in time (the “apparent-time” approach), we can perhaps extend our time frame to the mid-nineteenth century when we assume the speech of the speakers became fixed. But “real-time” approaches, if based on oral data, can only extend back to the advent of speech recording. As Taavitsainen and Fitzmaurice observe, “data problems grow more conspicuous the further back in time we go” (2007: 11).

How can the “bad data” problem be addressed in historical pragmatics? One avenue is to observe that there is never an absolute dichotomy between “written” and “spoken” and that as we go back in time the gap between written and spoken narrows (Culpeper 2010: 191). Medieval texts are widely acknowledged to contain an “oral residue.” As Fleischman notes (1990: 23), “many of the disconcerting properties of medieval vernacular texts – their extraordinary parataxis, mystery particles [i.e., pragmatic markers] … and jarring alternations of tenses, to cite but a few – can find more satisfying explanations if we first of all acknowledge the extent to which our texts structure information the way a spoken language does.”

Another avenue is to recognize that there are a number of types of written texts from the past that are “speech-related” and we can thus assume them to be close to the spoken English of the time. Culpeper and Kytö (2010: 17–18) distinguish three categories of speech-related texts, all of which come down to us in written form, of course:

speech-based: for example, trial transcripts, witness depositions, and parliamentary records; these are based on actual speech events;
speech-purposed: for example, sermons, prayers, dramatic dialogue, dialogue in handbooks, and proclamations; these are read or performed and may be mimetic of actual speech; and
speech-like: for example, personal letters and diaries; these are produced as written texts but contain speech-like (oral, colloquial) features.

In the speech-based category, transcripts of trials and of witness depositions primarily date from Early Modern English, and a record of British parliamentary proceedings dates from the early nineteenth century. For these we cannot, of course, know the degree to which the scribe or recorder has interceded and edited or redacted the text, nor can we know the extent to which the formality of the settings has transposed the language used out of the realm of “real conversation.” Studies have shown that we must take a cautious view of the veracity of the speech in these documents. It is agreed that the transcriber, whether a professional scribe or lay recorder, plays a crucial role in shaping the language in such records. For example, Grund (2007) looks at depositions from the Salem witch trials (see Rosenthal 2013) that have come down to us in multiple copies (i.e., with different recorders). He finds significant variation in the language and sometimes in the content of the different records, and suggests that the records represent reconstructions of the actual speech based on notes. He admits that the records “do seem to approximate spoken language,” albeit not necessarily the language of the speakers of these documents “but rather what constituted spoken language in the eyes of the recorder” (145). One example he cites is the appearance of pragmatic markers, a well-known feature of oral discourse (see Chapter 3). While frequently edited out of written transcripts of oral speech, some do appear in the Salem records, especially well, why, and oh. (See Lutzky’s discussion of the pragmatic marker why in Early Modern English in Chapter 3, §3.3.) But as the markers used vary in the parallel records, he postulates that they might have been added by the recorders in order to make the records more speech-like.
Moore (2002), looking at slander depositions in Early Modern English (mixed language texts in Latin and English), where one would assume that the exact reproduction of the slanderous statement would be crucial, finds that rather than verbatim speech representation, the language is often made to conform to a template of defamatory speeches. Moore still believes that these can be used as speech-based texts for linguistic research “as long as they are used with deference to their limitations” (412). Kytö and Walker (2003) look at the trial transcripts and witness depositions in the Corpus of English Dialogues to determine the extent to which they are reliable written representations of past speech. Trial transcripts are presented in dialogue form and witness depositions in the scribe’s third-person narration with passages of indirect speech; both may have explicit scribal interventions. Kytö and Walker find some evidence that these texts are indeed speech-like. While we expect false starts, hesitations, pauses, and slips of the tongue to be edited out, these do sometimes occur in the records. Further evidence that they are faithful transcripts of speech includes the glossing of dialect terms, the occurrence of identical phrasing in different documents, admissions by the scribe that he has not heard a passage clearly, references by the scribe to his notes, and endorsements of transcripts as “correct” or “official” copies. But there is also evidence of scribal or editorial interference that is either inadvertent or purposeful. Direct speech may be “reconstructed” or speech may be presented in indirect form (which places it within the viewpoint of the recorder; see Chapter 4); texts may be tidied up or corrections introduced to make the text more readable (rather than more faithful); and additions and amendments may be made after the fact, sometimes for religious or political reasons.
Many of these records exist in one or more forms (contemporaneous manuscripts or printed texts), later printed copies, and more recent editions, with the contemporaneous forms obviously being closer to the original speech. Given all this, Kytö and Walker conclude: “In all, we can never claim that the available early speech-related texts are equivalent to actual speech. However … there is evidence to suggest that certain texts may be relatively faithful written records of spoken interaction of the past” (2003: 230).

In the speech-purposed category, sermons and prayers date back to Old English (see Chapter 8, §8.5). We have dramatic dialogue from Middle English; this reflects the authors’ conception of contemporary speech. Its verisimilitude is always a question, since the dialogue is shaped in part by conventions of genre but also in part by the individual author’s talents in capturing real speech. Speech-purposed data also occur in instructional handbooks, such as conversation and language-learning manuals. While all speech-related data only approximate speech, the consensus is that dramatic dialogue comes closest to real speech (Culpeper and Kytö 2000); furthermore, drama has the advantage of supplying motivation, characterization, and action sequences.

Texts in the speech-like category were not produced orally nor typically intended to be performed aloud, but they lie toward the colloquial rather than literate end of the style spectrum. These include, above all, personal letters. Letters are a rich source of sociopragmatic data since we often know who wrote the letter and to whom and in what context. They thus include a rich array of address terms, for example (see Chapter 7, §7.8). Language use in personal letters resisted standardization and remained more speech-like well into the nineteenth century. We also have letters of uneducated speakers unfamiliar with the conventions of writing. Personal letters date back to the late ME period, with the collections of the letters of the Paston and Cely families (see Davis 2009; Hanham 1975). Today, we can see the descendants of letters, namely emails and instant messages, as clearly incorporating oral features. Personal diaries and journals, shaped by the exigencies of time or space and often recorded for one’s own private use, are also a source of colloquial language. Newspapers, in their early form, consisted of “letters” from correspondents abroad, while later newspapers include considerable amounts of quoted speech. Early pamphlets, often addressing a controversial topic, could be framed as a dialogue between contesting viewpoints (see Chapter 8, §8.3). The speech-like category also includes represented speech in narrative fiction and non-fiction (in prose and verse) extending back to Old English (see Chapter 4). But we must always be aware that within any of these genres there is a range from more colloquial (more speech-like) to less colloquial (less speech-like), as in the difference, for example, between private and personal letters. We will see in the next section that the development of genre-specific corpora has facilitated the study of speech-based and speech-like genres.

Perhaps the best avenue to approach the “bad data” problem, however, is to recognize that all texts, either spoken or written, are communicative acts shaped and constrained by pragmatic principles and including pragmatic forms and thus amenable to pragmatic analysis. For example, we find that some pragmatic markers – which are typically associated with oral discourse (see Chapter 3) – are much more common in, and characteristic of, written discourse; these include forms such as notwithstanding, parenthetically, or accordingly. All texts are legitimate objects of study. As Jucker and Taavitsainen conclude, “[B]oth spoken and written language are forms of communication produced by speakers/writers for target audiences with communicative intentions, and language is always produced within situational constraints. Therefore all forms of language that have survived and provide enough information to contextualise the use, are considered potential data for historical pragmatics” (2013: 25).

2.6 Diachronic Corpus Pragmatics

Diachronic (or historical) corpus pragmatics is the use of corpus-linguistic methods in research in pragmatics involving historical data. As Taavitsainen notes (2015: 252), “it is no exaggeration to say that corpus linguistics using large computer-readable language data has established itself as the main methodology in historical pragmatics.” This is the case whether one is undertaking a diachronic pragmatics study (looking at change in a pragmatic phenomenon over time) or a historical pragmatics (proper) study (looking at a pragmatic phenomenon in a historical stage of the language). This has not always been the case. In the three areas that come together here – historical linguistics, historical pragmatics, and pragmatics – historical linguistics was the first to embrace corpus-linguistic methods, with access to large collections of data that were electronically searchable and providing statistically verifiable and replicable quantitative results. Historical pragmatics in its infancy made use of the qualitative methods of philology, dependent upon close readings of texts to extract pragmatic forms, with careful analysis of these forms in context. But very early on, corpus methodology began to be used in historical pragmatics; for example, Brinton (1996) and many of the articles in the early collection by Jucker (1995) make use of corpus analytic techniques. Pragmatics proper has been slowest to incorporate corpus methodology. Neither Anglo-American pragmaticists with their foundation in philosophy and the use of a small set of invented examples, implicit meaning, and intuitive linguistic judgments nor pragmaticists working with small sets of conversational data immediately welcomed the “digital turn” in linguistics.
But as is evidenced, for example, by the handbook on corpus pragmatics by Aijmer and Rühlemann (2015), pragmatics too has changed, perhaps following the lead of historical pragmatics.

The use of corpus methods in historical pragmatics is not without its critics. As Jucker notes, this methodology runs “the risk of losing the philological sophistication of earlier research” (2008: 903). The focus on large sets of quantitative data might lead to “decontextualization,” or the loss of focus on context, which is crucial to pragmatic analysis. Moreover, while one can achieve more objective and empirical results with corpus methodology, it is not clear that the results will be replicable, as pragmatics depends on the interpretation of forms in context and is thus to some degree subjective and variable. Furthermore, corpora tend to use normalized spelling and edited editions that distance the researcher from the original data, often important for pragmatic analysis. This creates a dilemma: “On one hand, scholars want to make use of even larger corpora in order to achieve more solid and statistically valid generalizations, and on the other hand, they realize that they need rich contextualizations in order to grasp the subtleties of language use in all the extracts retrieved from the corpora” (Jucker and Taavitsainen 2014: 12).

These concerns have been met first and foremost by retaining qualitative analysis as an essential part of diachronic historical pragmatics, to serve as a complement to, or sometimes a corrective of, the quantitative results obtained by corpus analysis. The compilation of more sophisticated corpora is also a step in addressing these concerns; these provide access to original spellings and images of original texts, as well as the means to search the corpus despite variation in the original spellings (see §2.7 below on corpus annotation).

For the most part, the approach taken in diachronic corpus pragmatics has been what is called “corpus-based” (or “corpus-aided”). This is a deductive or top-down approach where researchers begin with a feature or set of features about which they have formulated a hypothesis and then extract the relevant data from the corpus. The collected data serve to validate, refute, or refine the initial hypothesis. For example, instances of well might be collected from a corpus of Early Modern English to determine when, or if, it functioned as a pragmatic marker during this period and to attempt to trace its development from an adverb/adjective to a pragmatic marker over time. In contrast, a “corpus-driven” approach, which is bottom-up and inductive, consists of approaching a corpus without prior assumptions and seeing what emerges from the corpus investigation. This approach is less often used in diachronic corpus pragmatics. Here, for example, one might search a corpus of Old English to reveal the resources used for issuing commands (but it would be impossible to be entirely inductive since one would likely have to have some preconceived notion of what forms one is looking for).

Corpus searches are not equally suitable for studying all types of pragmatic phenomena, since such searches require an identifiable and precise search string. Thus, corpus methods are best suited to the form-to-function approach (see above), where one begins with linguistic forms and queries how they function pragmatically. For example, beginning with the pragmatic marker you know (in its variant spellings) one could collect all examples of it in a historical corpus of English. Of course, the collected data would need to be manually examined (qualitatively) in order to rule out cases of you know that are not pragmatic markers (this is dependent on how finely tuned one’s search string is). Corpus methods are less well suited to the function-to-form approach (see above). Studying a pragmatic function, such as a speech act, in a corpus would require determining all the formal exponents of that function; this is not an easy task, especially for earlier periods of the language, where formal marking may differ from that in Present-day English, where the speech act may be performed indirectly, or where the speech act function may itself be different. Scholars have devised a number of “work-arounds” for studying functional pragmatic categories such as speech acts and politeness. We will look at these in detail in Chapters 5 and 6.
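The form-to-function search described above can be sketched in Python. This is only a minimal illustration under stated assumptions, not a description of any particular corpus tool: the spelling variants in the pattern and the sample text are invented for the example, and in practice the variants would be drawn from the corpus itself or from a spelling word list supplied with it. The function name kwic is likewise hypothetical.

```python
import re

# Assumed (invented) spelling variants of "you know"; a real study would
# derive these from the corpus or from its accompanying word list.
YOU_KNOW = re.compile(r"\b(?:you|yow|ye|yee)\s+(?:know|knowe|kno)\b", re.IGNORECASE)


def kwic(text, pattern=YOU_KNOW, width=30):
    """Return (left context, hit, right context) for every match in text."""
    hits = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        hits.append((left, match.group(0), right))
    return hits


# Invented sample text, used only to show the shape of the output.
sample = "Why, you know, I told him so. Do yow knowe the man? Ye kno it well."

for left, hit, right in kwic(sample):
    # Align the hits in a column so that they can be scanned by eye.
    print(f"{left:>30} [{hit}] {right}")
```

The search only retrieves candidates. As the paragraph above notes, each hit still has to be read in context to separate pragmatic-marker uses from literal uses (such as the question in the second sample sentence), and a looser or tighter pattern simply shifts how much of that manual work remains.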

Despite some difficulties posed for historical pragmatic work, corpus linguistics has overall made an important advancement in the field: “we feel able to make claims about earlier generations’ or communities’ discourse practices because we base those claims upon real language use and the quantitative analysis of large databases representing authentic language use make these claims valid” (Taavitsainen and Fitzmaurice 2007: 27).

The development of electronic corpora of English, beginning with the pioneering but (by modern standards) small Helsinki Corpus of English Texts (HC), issued in 1991, has both facilitated the incorporation of corpus linguistics into historical pragmatics and served to address the “bad data” problem. While the “communicative view” (introduced in Chapter 1, §1.3) argues that written texts are as suitable for pragmatic study as spoken texts (since both represent intentional communicative acts involving an addresser and addressee) and that the dichotomy between spoken and written is not at all clear, there may be reasons for a historical pragmatist to prefer spoken or speech-related genres as a source of data. For example, a much more diverse set of pragmatic markers is found in spoken texts at a higher frequency than in written texts (see Chapter 3). Second-person address terms and vocatives are rare in written texts but used frequently in spoken or speech-related texts, such as letters (see Chapter 7); politeness phenomena may be much more extensive in spoken texts (see Chapter 5); and certain kinds of speech acts, such as directives, commissives, and expressives, are much more likely to occur in a spoken interaction than in a written text (see Chapter 6).

Thus, inclusion of speech or speech-related data in available corpora may be important for the pragmaticist. Some of the large multi-genre (“first-generation”) historical corpora contain a substantial proportion of fiction (and thus significant amounts of constructed dialogue). For example, the 475-million-word Corpus of Historical American English (COHA; Davies 2010) (1820–present) contains 47 percent fiction; the size of this corpus provides an unparalleled source for the study of the history of American English. The smaller Corpus of Late Modern English 3.0 (CLMET3.0; 1710–1920), containing historical British English, includes about 46 percent fiction, 4 percent drama, and 7 percent letters out of a total of approximately 34 million words. In the 13.5-million-word Corpus of Early American Literature (CEAL; 1690–1920), 42 percent in the second period and 93 percent in the third period consist of fiction (fiction is less readily available for the first period, 1690–1780). Fiction constitutes the entire contents of the 26-million-word Corpus of English Novels (CEN), covering a later period, 1881–1922, and including examples of British, American, and Canadian English. The 3.3-million-word ARCHER 3.2: A Representative Corpus of Historical English Registers (including British texts from 1600 to 1999 and American texts from 1750 to 1999) includes two speech-related genres (drama, sermons) as well as a number of written genres falling on the colloquial end of the spectrum (fiction, personal letters, journals/diaries, news reportage). Drama constitutes 14 percent of the corpus and fiction 17 percent.

In addition to linguistic corpora, which allow for searches of various kinds (lexical, grammatical), text collections may – with various degrees of ease and success – be used for linguistic study. The massive text collections, Early English Books Online (EEBO), containing 146,000 titles published between 1475 and 1700 in England, Eighteenth Century Collections Online (ECCO), containing over 180,000 titles published during this century, and Evans Early American Imprints, Series I (Evans), containing virtually all books, pamphlets, and broadsides published in America between 1639 and 1800, likely include a significant amount of fiction and drama, though the exact amount is difficult to determine. All of EEBO and Evans and part of ECCO are available in formats usable for linguistic searches (through the Text Creation Partnership). Smaller text collections, such as the Chadwyck-Healey Eighteenth Century Fiction (ECF; 1700–1780), are valuable collections, but difficult to use for linguistic searches. They allow lexical searches but there is no way to easily calculate the frequency of items; moreover, because the collections are image-based, search results are not provided in a readily usable list (a KWIC concordance [key word in context]); each result must be individually checked in the original image of the text. Nonetheless, the Chadwyck-Healey English Drama (ED) collection, containing 3,900 dramatic works from the late thirteenth to early twentieth century, is important for pragmaticists because of the wealth of historical constructed dialogue it contains. Finally, online, fully searchable collections of dramatists, such as the online Shakespeare collections, are an invaluable resource, though typically only lexical searches are possible.

It is the development of typically smaller and more focused genre-specific (“second-generation”) corpora that has provided the richest source of data for historical pragmatics, especially those corpora including speech or speech-related data of the past. Table 2.2 gives a partial listing of such corpora, including corpora devoted to personal letters, witness depositions, trial transcripts, tracts, religious writing, and newspapers. Several of the corpora contain so-called “ego documents” such as letters, diaries, travelogues, and memoirs, which are autobiographical in nature; these are a particularly important source of speech-like data, especially as they tend to be colloquial in nature and record nonstandard varieties.

Table 2.2 Some genre-specific English corpora

	Text types	Dates	Corpus	Corpus size
a.	Authentic and constructed dialogues	1560–1760	A Corpus of English Dialogues (CED)	1.2 million words
b.	Personal letters	1410?–1680	Corpus of Early English Correspondence Sampler (CEECS)	450,000 words
	Personal letters (Scottish)	1540–1750	Corpus of Scottish Correspondence (CSC)	417,000 words
	Personal letters (Richard Orford)	1761–1790	A Corpus of late 18c Prose	300,000 words
	Personal letters (selected 19th c. figures)	1861–1919	A Corpus of late Modern English Prose	100,000 words
c.	Witness depositions	1560–1760	An Electronic Text Edition of Depositions (on CD-ROM)	270,000 words
d.	Trial transcripts	1720–1913	The Old Bailey Corpus, version 2.0 (a subset from The Old Bailey Proceedings Online, 1674–1913)	24.4 million words
e.	Tracts (on religion, politics, economy, science, law, and miscellaneous)	1640–1740	The Lampeter Corpus of Early Modern English Tracts	1.1 million words
f.	Religious prose	1150–1800	Corpus of English Religious Prose (COERP)	1 million words (EModE sampler)
g.	Medical writing, ranging from more academic texts to more popularized and utilitarian texts	1375–1800	Corpus of Early English Medical Writing (CEEM) (on CD-ROMs)	ME: c. 500,000 words EModE: c. 2 million words LModE: c. 2 million words
h.	Newspapers	1661–1791	Zurich English Newspaper Corpus (ZEN) (on CD-ROM or online)	1.6 million words
	Newspapers	1653–1654	Newsbooks at Lancaster	800,000 words
i.	Parliamentary proceedings (British)	1803–2005	The Hansard Corpus	1.6 billion words
j.	Television	1950s–present	The TV Corpus (TV)	325 million words
k.	Movies	1930s–present	The Movie Corpus (Movie)	200 million words

The Corpus of English Dialogues (Table 2.2a) is a compilation of a variety of speech-based and speech-purposed data for the EModE period, including what is called “authentic” dialogue (trial transcripts, witness depositions) and “constructed” dialogue (prose fiction, drama comedy, didactic works).
There are several corpora of letters (Table 2.2b). The Corpus of Early English Correspondence exists in several forms covering the period c.1410–1800, a full version, a parsed version, and an extended version. All of the letters are sociolinguistically annotated with available information about the writer and recipient. There are also several more specialized letter corpora containing Scottish letters, letters written to Richard Orford (steward to Peter Legh the Younger, Lyme Hall, Cheshire), and letters written by a selection of important nineteenth-century figures (e.g., Gertrude Bell, Lord and Lady Amberley).
A collection of witness depositions from across Britain is included in An Electronic Text Edition of Depositions (Table 2.2c). It includes the texts with their original spelling, but makes corpus searches possible by providing a word list giving the variant spellings of all words.
The Old Bailey Corpus (Table 2.2d) is an ongoing project of converting The Old Bailey Proceedings Online, 1674–1913, a record of London’s criminal court (including trial transcripts, indictments, interrogations, witness statements, verdicts, etc.), into a linguistic corpus which is annotated sociolinguistically (age, gender, social class of speaker), pragmatically (role of speaker), and textually. The offense, verdict, and punishment for each trial are also tagged. In Figure 2.2 you will see an example search using this corpus.
Tracts on religion, politics, economy, science, law, and miscellaneous topics are included in the Lampeter Corpus (Table 2.2e). As we will see in Chapter 8 §8.3, tracts are an important precursor to newspapers.
The Corpus of English Religious Prose (Table 2.2f) contains a variety of religious writings, including sermons, catechisms, prayers, religious biographies, prefaces, treatises, and pamphlets (see Chapter 8, §8.5).
In the Corpus of Early English Medical Writing (Table 2.2g) one will find a range from more academic to more popular and utilitarian medical texts. They are given in both original and normalized spelling, which greatly aids corpus searches. The EModE part supplies links to the original (non-normalized spelling) version in the EEBO database. Early medical writings are an important source of recipes, as we will discuss in Chapter 8 §8.6.
There are two corpora of newspapers of different sizes and covering different periods (Table 2.2h). In addition to specialized newspaper corpora, we have available to us historical archives of most of the world’s major newspapers, such as The New York Times (1851–2017) (Chadwyck-Healey “Historical Newspapers”) or The Times (London) (1785–2019) (Gale Cengage “The Times Digital Archive”). For linguistic study these can prove difficult to use as they are usually image-based. (Newspapers are briefly discussed in Chapter 8 §8.3.)
The Hansard Corpus (Table 2.2i) contains a record of nearly every speech given in the British Parliament over a 200-year period. These data represent more scripted, formal speech but are nonetheless a valuable resource.
Two corpora containing transcripts of British, American, Canadian, Australian, and New Zealand television and movies are The TV Corpus and The Movie Corpus (Table 2.2j, k). Rather than naturally occurring speech, these contain informal spoken language as conceived of by writers of television programs and movies (i.e., constructed dialogue) and are an important source of data on recent (twentieth-century) language history.

As an example of a search using one of these more specialized corpora, let’s search for well in The Old Bailey Corpus. While we cannot search by part of speech, the search returns many examples of well as a pragmatic marker (but also, of course, the adjective and adverb well). One example of the pragmatic marker is found in the trial of Vincent Davis for the murder of his wife in 1725, in the deposition of Mary Jeffery, a neighbor of the accused. She quotes the accused using well (see Figure 2.2). Here, the speaker is a woman, though she is quoting a man. The landlady, Mary Tindall, who is also deposed, is described in the corpus metadata as a “working proprietor (catering, lodging or leisure services).” The corpus also connects us to the transcribed text in The Old Bailey Proceedings (Figure 2.3) with a link there to the original page image (Figure 2.4). We can deduce quite a bit here about who uses the pragmatic marker and in what way. The speakers all seem to be of middling rank, though the accused, Vincent Davis, is likely a more lowly ranked worker as he and his wife lodge with Mrs. Tindall. Importantly, we see him using well in much the same way that it is used in Present-day English, as a qualifier that indicates that what follows is not exactly what the hearer expects to hear or is not optimally coherent (see Chapter 3, Exercise 2).

Figure 2.2 Sample search result for well in The Old Bailey Corpus

(https://obc-client.de)

Figure 2.3 Text excerpt from The Old Bailey Proceedings Online, April 1725, Vincent Davis (t17250407-9)

(Tim Hitchcock, Robert Shoemaker, Clive Emsley, Sharon Howard, Jamie McLaughlin et al., The Old Bailey Proceedings Online, 1674–1913. www.oldbaileyonline.org, version 8.0, 2018. Reprinted under Creative Commons Attribution NonCommercial 4.0 International (CC-BY-NC 4.0) license.)

Figure 2.4 Original page image from The Old Bailey Proceedings Online

(Harvard Law School Library, Historical & Special Collections. Reprinted with permission.)

2.7 Corpus Annotation

It is possible to automatically tag corpora for word class and to parse the syntax, but annotating corpora for pragmatic elements is considerably more difficult and appears to have to be done manually. Because manual tagging is time-consuming, expensive, and in some cases subjective, progress in this area has been slow and incomplete. As Weisser notes, “Any type of linguistic annotation is a highly complex and interpretive process, but none more so than pragmatic annotation” (2015: 84). Archer et al. (2008: 637) conclude, somewhat pessimistically, that “[u]nlike grammatical annotation, pragmatic annotation cannot be fully realized.” If individual words or phrases are associated with particular pragmatic meanings, as in the use of please with commands, thanks/thank you with thanking, or BE sorry with apologies, for example, we obviously have inherent meaning that can easily be tagged pragmatically, but such conventionalized phrases are only a small part of pragmatics.

A number of different (semi-)automatic systems have been developed to annotate speech acts (or what is often called dialogue structure) in corpora of Present-Day English. The corpora used are often quite constrained and task-oriented. For example, one such tagging schema uses telephone calls to and from British Rail customer service and tags for forty-one different speech acts (e.g., inform, direct, refuse, suggest, thank, correct, answer) as well as for turn, syntactic form, topic, mode (i.e., semantic categories such as deixis, probability), and polarity (see Leech and Weisser 2003).

But, again, there is much more to pragmatics than speech acts, and the question arises as to how, or whether, this information – including speaker characteristics (age, gender, social class), physical context of the interaction, social context (personal relations of interactants [power, social distance, role]), background or shared knowledge, cultural or societal values – can be tagged in a corpus. While some of this information may be included in the metadata attached to a corpus file, there have been attempts to embed this information in the file itself and attach it to each utterance, thus accounting for the sometimes shifting interactions between interlocutors. For example, The Old Bailey Corpus 2.0 includes sociobiographical information on the speaker (gender, age, occupation, social class) and pragmatic information (speaker role in the courtroom [defendant, judge, victim, witness, lawyer, interpreter]) as well as textual information about the scribe, printer, and publisher, which can be accessed for each text. Using a selection from the Corpus of English Dialogues, Archer and Culpeper (2003) tag each utterance for the identity, sex, role, status, and age of both the speaker and the addressee; role includes activity role, kinship role, social role, and dramatic role. For this reason, they call this a “sociopragmatic corpus” (see Chapter 1, §1.5 on “Related Fields”). This type of tagging accounts not only for static features but also for changing features based on the nature of the interaction. Using the trial transcripts from this corpus and focusing specifically on questions, Archer (2005: 109–134) tags for three additional fields.
The first is the interactional field consisting of “initiation” (typically questions, requests, requirements), “response” (typically answers, replies, acceptances, refusals), “response-initiation” (response–request), “report” (typically statements, explanations), “follow up” (typically comments, evaluations), and “follow up-initiation” (comment–question). This is combined with the force field, which overlaps with traditional speech act categories: counsel, question, and request (i.e., directive), sentence (i.e., declarative), express (i.e., expressive), and inform (i.e., representative). The third field is a grammatical form field (e.g., wh-interrogative, tag question, and so on). Obviously, this sort of manual tagging, which requires nuanced judgments about speaker intent and linguistic form, is very time-consuming and laborious, but it yields substantive results.
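Utterance-level annotation of this kind can be pictured as a small data structure. The Python sketch below is only an illustration under stated assumptions: the class names, field names, and the two sample utterances are invented here, and they do not reproduce the actual markup used in the Corpus of English Dialogues or in Archer's tagging scheme. The sketch simply shows speaker and addressee attributes travelling with each utterance, alongside the interactional, force, and grammatical-form fields described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    """Attributes recorded for a speaker or an addressee (hypothetical names)."""
    identity: str
    sex: Optional[str] = None
    role: Optional[str] = None
    status: Optional[str] = None
    age: Optional[str] = None


@dataclass
class Utterance:
    """One utterance with its participants and three additional tag fields."""
    text: str
    speaker: Participant
    addressee: Participant
    interaction: Optional[str] = None  # e.g. "initiation", "response"
    force: Optional[str] = None        # e.g. "question", "inform"
    form: Optional[str] = None         # e.g. "wh-interrogative"


def select(utterances, **criteria):
    """Return the utterances whose top-level fields equal every criterion."""
    return [
        u for u in utterances
        if all(getattr(u, field) == value for field, value in criteria.items())
    ]


# Invented courtroom exchange, for illustration only.
examiner = Participant("S1", sex="m", role="examiner", status="professional", age="adult")
witness = Participant("S2", sex="f", role="witness", status="commoner", age="adult")

sample = [
    Utterance("What did you see?", examiner, witness,
              interaction="initiation", force="question", form="wh-interrogative"),
    Utterance("I saw him strike her.", witness, examiner,
              interaction="response", force="inform", form="declarative"),
]

for u in select(sample, force="question"):
    print(u.speaker.role, "->", u.addressee.role, ":", u.text)
```

Because the participant attributes are attached to each utterance rather than to the file as a whole, the same person can appear with a different role or addressee from one turn to the next, which is the point made above about shifting interactions.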

Archer (2014) reports on an attempt to use an automatic semantic tagger designed for modern English to study verbal aggression in a ten-year period of The Old Bailey Proceedings (1783–1793), a period associated with William Garrow, a barrister known for his aggressive style. The tagger relied on semantic fields such as good/bad, true/false, angry/violent, calm, polite/impolite, respect/lack of respect; it also tagged for speech acts. The tagging found a relatively low number of aggression tags, suggesting to Archer that impoliteness was indirect, as it is in the modern courtroom, relying on what she calls “metapragmatic framing strategies,” where, for example, requests for information, clarification, or confirmation could actually function as accusations and insinuations. Archer notes, however, that while the tagging produces interesting leads, each case requires manual inspection. For example, the term politely is in one case used to describe the act of pickpocketing, and several other instances of the term are used metalinguistically. For historical study such a tagger would also have to take into consideration semantic change; for example, in the eighteenth century politely meant ‘smoothly, in a polished manner’. She suggests that connecting the tagger to the Historical Thesaurus of English might be a means to address this problem.

Archer and Culpeper (2009) is a study using the sociopragmatically annotated comedy and trial proceedings of the Corpus of English Dialogues combined with “keyness” analysis (key words, key parts of speech, and key semantic fields). Deploying keyness analysis, they identify the statistically based correlations in two dyads (examiner–examinee, master/mistress–servant). Like sociopragmatics (see Chapter 1, §1.5 on “Related Fields”), this approach allows them to study how local contexts (e.g., age, gender, status, role) motivate the use of linguistic forms, but keyness analysis allows context to be approached in a theoretically informed way. They thus propose a third-way approach complementing the form-to-function and function-to-form approaches discussed above, namely a context-to-form and/or -function approach, which they call “sociophilology,” “describing or tracing how historical contexts, including the co-text, genre, social situation and/or culture, shape the functions and forms of language taking place within them” (2009: 287).
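The logic of a keyness comparison can be sketched briefly. The example below assumes the log-likelihood statistic, which is a common choice for keyness in corpus linguistics; the discussion above does not specify which statistic Archer and Culpeper (2009) used, so this should be read as an illustration of the general technique rather than a reconstruction of their procedure. The function names, the threshold of 3.84 (the conventional critical value for p < 0.05 with one degree of freedom), and the toy token lists are all assumptions introduced here.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness score for one item.

    a = frequency in the target sub-corpus, b = frequency in the reference,
    c = size of the target sub-corpus, d = size of the reference (in tokens).
    """
    expected_target = c * (a + b) / (c + d)
    expected_reference = d * (a + b) / (c + d)
    score = 0.0
    if a > 0:
        score += a * math.log(a / expected_target)
    if b > 0:
        score += b * math.log(b / expected_reference)
    return 2 * score


def keywords(target_tokens, reference_tokens, threshold=3.84):
    """Return (item, score) pairs overused in the target, highest score first."""
    target, reference = Counter(target_tokens), Counter(reference_tokens)
    c, d = sum(target.values()), sum(reference.values())
    results = []
    for item, a in target.items():
        b = reference.get(item, 0)
        score = log_likelihood(a, b, c, d)
        # Keep only items that are relatively more frequent in the target.
        if score >= threshold and a / c > b / d:
            results.append((item, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Invented toy data: tokens from one speaker role versus all other speakers.
servant_tokens = ["sir"] * 8 + ["the"] * 12
other_tokens = ["sir"] * 1 + ["the"] * 19
print(keywords(servant_tokens, other_tokens))
```

In a sociopragmatically annotated corpus, the target and reference token lists would be selected by annotation values (for instance, all utterances by servants addressed to masters versus all remaining utterances), and the same comparison could be run over part-of-speech tags or semantic field tags instead of word forms.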

2.8 Chapter Summary

This chapter covered the following topics:

• the scope of historical pragmatics, covering:
  ◦ “historical pragmatics (proper),” a macro approach looking at the pragmatics of a historical text or period (e.g., directives in Shakespeare);
  ◦ “diachronic pragmatics,” a micro approach tracing a discourse-pragmatic form or function as it changes over time (e.g., changing speech representation in the news register over time); and
  ◦ “pragmahistorical linguistics,” a micro approach examining pragmatic factors which influence linguistic forms (e.g., changes in word order from Old to Middle English brought about by the givenness or newness of information);
• two approaches to historical pragmatics:
  ◦ form-to-function, an approach that begins with the linguistic form and studies its function in a historical period or over time (e.g., the changing form and function of quoth as a reporting verb); and
  ◦ function-to-form, an approach that works from discourse-pragmatic functions and examines their formal exponents (e.g., stance in Early Modern English);
• the pragmatic units studied, including expressions (e.g., exclamatory as if!), utterances (e.g., forms of politeness), and discourse or text (e.g., expression of stance in different genres);
• the “bad data” problem, resulting from the lack of naturally occurring oral conversation from the past, now addressed in part by different types of “speech-related” data (e.g., trial records, sermons, dramatic dialogue), which – with significant caveats – can be understood as close to spoken data;
• diachronic corpus pragmatics, the primary methodology used today in historical pragmatics, utilizing large, multi-genre corpora, smaller and genre-specific corpora as well as text collections; and
• corpus annotations, or attempts to pragmatically annotate corpora, which remain in their infancy.

Footnotes

¹ https://cqpweb.lancs.ac.uk/

² https://varieng.helsinki.fi/CoRD/corpora/index.html

Figure 2.1 Changes in stance marking from 1650 to 1900 in four genres (frequency per 1,000 words)

(adapted from Biber 2004: 122) (Douglas Biber. 2004. Historical patterns for the grammatical marking of stance. Journal of Historical Pragmatics 5(1). 122. https://jan.ucc.nau.edu/biber/Biber/Biber_2004.pdf. Reprinted with permission.)

Figure 2.2 Sample search result for well in The Old Bailey Corpus

(https://obc-client.de)

Figure 2.3 Text excerpt from The Old Bailey Proceedings Online, April 1725, Vincent Davis (t17250407-9)

(Tim Hitchcock, Robert Shoemaker, Clive Emsley, Sharon Howard, Jamie McLaughlin et al., The Old Bailey Proceedings Online, 1674–1913. www.oldbaileyonline.org, version 8.0, 2018. Reprinted under Creative Commons Attribution NonCommercial 4.0 International (CC-BY-NC 4.0) license.)

Figure 2.4 Original page image from The Old Bailey Proceedings Online