
Part II - Non-Canonical Syntax in Register-Based Varieties of English

Published online by Cambridge University Press

Sven Leuckert
Affiliation: Technische Universität Dresden
Teresa Pham
Affiliation: Universität Vechta

Type: Chapter
Information: Non-Canonical English Syntax: Concepts, Methods, and Approaches, pp. 137–234
Publisher: Cambridge University Press
Print publication year: 2025
Creative Commons
This content is Open Access and distributed under the terms of the Creative Commons Attribution licence CC-BY-NC-ND 4.0 https://creativecommons.org/cclicenses/


Chapter 7 Introduction: Different Ways of Saying Different Things: Non-Canonical Syntax in Registers of English

7.1 Introduction

Research on registers has to account for two kinds of variation: register-based variation and text-internal variation. In this chapter, non-canonical constructions will be dealt with, first, from the point of view of register-based variation and, second, as text-internal variation due to language processing. Speakers can switch into and out of registers (Schilling-Estes 2004: 375; Dorgeloh & Wanner 2010: 7) depending on the situation of use, whereas choices within texts tend to be motivated by the way a text is structured and its material is presented, that is, by conditions of information structure, complexity, or weight. Choices of structure and form have an impact on language processing, which means that register variation also has a psycholinguistic dimension.

Textual variation poses a challenge for the assumption of equivalence in meaning, as posited in the definition of (non-)canonicity in the Introduction to this volume. While interspeaker variation, such as social or regional variation, at least in principle relates to alternative ways of expressing ‘the same thing’ (Labov 1972: 188), intraspeaker variation automatically extends to ‘different ways of saying different things’ (cf. Halliday 1978: 35). For example, when comparing speech to writing, or literary to everyday, informal language, limiting the analysis to categories that are (nearly) semantically equivalent would mean missing those lexical and grammatical features that result from differences in content or topic.

Within the field of register-based variation, some overlap and potential for confusion between the term ‘register’ and the concepts of ‘genre’ and ‘style’ must be noted. What traditional sociolinguistic research often refers to as ‘speech styles’ (e.g., Jucker 1992), with an interest in factors such as formality, social identity, and social practices (e.g., Eckert & Rickford 2002), is nowadays part of a broader field of study in which ‘register’ has become the more established term. It covers all kinds of functional relationships between linguistic form and situational context. Still, genre and style remain relevant concepts in their own right for explaining textual variation: while the analysis of genres (e.g., Swales 1990) typically highlights conventions and generally considers aspects of the rhetorical organisation of texts, stylistic analysis tends to focus on choices that result from aesthetic, often literary, preferences or from linguistic preferences associated with individual authors (e.g., Jeffries & McIntyre 2010). Genres are concerned less with pervasiveness than with what is typical; for example, a news text may well contain a non-canonical structure, like a verbless clause in a headline, precisely once. Styles, by contrast, are generally seen as aesthetically rather than functionally motivated and therefore cannot be relied on to account for systematic register variation. For example, authors or newspapers may have their own style, but individual authorship will usually cause less variation than the overall situation in which a text is produced. While these approaches can be integrated as different ‘perspectives for analysing text varieties’ (Biber & Conrad 2019: 15), it is mostly registers that capture what is frequent and pervasive in texts, and they thus match the quantitative approach to (non-)canonicity taken in this volume.

We have structured this introduction on textual variation as follows: Section 7.2 will introduce the concept of register and key aspects of register analysis as well as its classic methodology in more detail. Section 7.3 then (re)turns to the issue of (non-canonical) syntax, covering three patterns of possible variation: reduction, expansion, and placement variation. Each of these patterns corresponds to one of the case studies that will follow in this part of the volume. Throughout the chapter, we will take special care to point out methodological issues, in particular the existence of the two approaches that arise from the field being rooted in both variationist linguistics and register studies. In Section 7.4, we will also point out some more recent trends and open questions in the field, including a look at related issues in other fields (e.g., psycholinguistics) and at text varieties such as online registers and AI-generated text.

7.2 Key Aspects of Register Analysis
7.2.1 The Concept of Register

Register is a concept that traditionally sits at the intersection of several linguistic disciplines, such as sociolinguistics, discourse studies, literary and linguistic stylistics, text linguistics, applied linguistics, and the study of language for specific purposes. All these disciplines share an interest in language use, addressing linguistic variation with a focus on the discourse situation, for example its formality, its purpose, preferences or traditions of style, or specific topics and audiences. With the concept of register, linguists thus aim to recognise, for instance, that ‘people speak differently depending on whether they are addressing someone older or younger, of the same or opposite sex, of the same or higher or lower status …; whether they are speaking on a formal occasion or casually, whether they are participating in a religious ritual, a sports event, or a courtroom scene’ (Ferguson 1994: 15).

For more than a decade now, the study of register variation has also been recognised as a discipline in its own right. Aiming at the precise and systematic description of linguistic features associated with different situations of language use, register studies nowadays offer a systematic framework for the linguistic analysis of textual variation. The starting point is a comprehensive set of situational parameters that provides the template for register classification. Following Biber and Conrad’s (2019) Register, Genre, and Style, this set comprises six major situational characteristics: the discourse participants and their relations, the channel, circumstances, and setting (i.e., time and place) of both language production and comprehension, and the purpose and topic of the discourse produced (Biber & Conrad 2019: 40). Within this framework, registers are text varieties with specific linguistic characteristics arising from these core components of the discourse situation. For identifying and classifying registers, the properties of the situation are thus ‘more basic’ than their linguistic characteristics (Biber & Conrad 2019: 9), which are both frequent and pervasive because they are functional in the situation. This understanding of register variation is therefore closely associated with the frequency-based approach to (non-)canonicity laid down in the Introduction to this volume. However, since some functional associations are also described in terms of non-basic patterns (e.g., non-basic word order resulting from information structure), the approach also relies on theory-based assumptions.

Biber and Conrad’s approach highlights that linguistic features, including non-canonical constructions, are conditioned by the context, not the other way round. For example, the real-time production mode of spoken, conversational registers is typically associated with forms of ellipsis. However, as the work by Biber, Wizner, and Reppen in Chapter 8 of this volume shows, structurally reduced clauses also occur in settings that mix characteristics of speech and writing. The news broadcasts they examine are mostly planned and scripted but received in real time, and these properties account for the mix of reduction and complexity features they observe. Situational characteristics are thus logically prior to linguistic features, and register analysis focuses on such ‘functional associations’ (Biber & Conrad 2019: 10).

Registers can be explored at very different levels of specificity, ranging from quite general text varieties, like conversation, news, and academic prose, to more specific sub-registers, such as scripted face-to-face conversation, for instance the dialogue in TV shows (e.g., Quaglio 2009), news in social media (e.g., Liimatta 2019; see also Clarke 2022 or Scheffler et al. 2022), or subtypes of academic discourse, which can be as diverse as introductions to research articles and office hour consultations. Two studies in this part of the volume, Chapter 8 by Biber et al. and Chapter 9 by Pham, deal with more general registers (news, reviews) but ultimately look into more specific sub-registers (television news broadcasts; printed, spoken, and online reviews). Pinning down functional associations often requires attention to a complex interplay of situational factors. Pham’s study of clefts in evaluative language, which looks at reviews from different media, refers to the situational parameter of channel of communication but also suggests that the parameter of purpose is particularly relevant.

It needs to be emphasised that this framework of register analysis does not easily deal with all register constellations. Problems of register definition arise in cases (1) where there is a lot of variation among texts within one register, (2) where texts from different registers share many characteristics, or (3) where texts possibly do not belong to a register at all (cf. Biber & Egbert 2023). An example of the first category that has recently been discussed in the literature is student academic writing (Goulart et al. 2022; Biber & Egbert 2023: 10). The texts discussed in these studies typically involve more than one communicative purpose; they contain an almost equal number of passages that either explain or argue. In a similar vein, the register of conversation typically includes discourse varieties as diverse as joking around, engaging in conflict, or giving advice (Biber et al. 2021). A classic example of case (2), cross-register similarities, is the register of fiction, with novels, in particular, typically containing a mix of narration and speech. Whether fiction is one register or (at least) two has been a matter of both debate and empirical analysis in register studies (Egbert & Mahlberg 2020). As for case (3), the issue of ‘texts-with-no-register’ is still open for discussion. Some scholars find that such texts are ‘more prevalent than we might believe’ (Biber & Egbert 2023: 18; also Biber & Egbert 2018), but, since corpora are usually based on pre-classified registers, corpus-based register analysis tends not to target this issue in particular. However, Biber and Egbert’s (2018) study of web documents finds many ‘hybrid’ texts that users did not identify as belonging to a particular register, for example because they contain an inseparable mix of description and both personal and commercial persuasion. Language without a register context is also typical of experimental studies, in which participants are usually not asked to produce a specific register (corresponding to a real-life situation). The study on particle placement by Günther in this part of the volume shows that the choice between continuous and discontinuous particle verbs results from cognitive complexity, thus highlighting that sentence- or discourse-internal parameters are also relevant for syntactic variation.

7.2.2 Non-Canonical Syntax and Discourse

Registers are text varieties (i.e., units of language use), which turns the syntax within them into utterances rather than mere units of grammar. As an utterance, a (canonical or non-canonical) sentence must be seen as tied to its discourse in two ways. Since discourse is, formally, any unit larger than the sentence and, functionally, language use with a given purpose (Dorgeloh & Wanner 2023: 16), any construction is bound to its discourse both by the surrounding text (the co-text) and by the discourse situation (the context proper).

As for the role of co-text, syntactic choices can be explained by factors such as information packaging, topic, focus, or processing load – all factors in which the text surrounding a construction plays a role. For example, Günther in Chapter 10 of this volume looks at particle placement in sentences presented in isolation. In longer texts, whether the object NP is placed in front of or behind the particle also depends on the co-text, since the co-text affects the information status of the NP (Lohse et al. 2004). In addition, aspects of the context are relevant for particle placement; for example, there is an effect of speakers’ intentions, such as emphasising what is important to them (Dehé 2002). Other non-canonical constructions, for instance that-omission or the passive voice, are often explained primarily by context rather than by co-text. For example, zero-that clauses (I heard you were sick) are more likely in speech than in writing (Biber 2012), while the passive voice, especially the long passive, is more common in writing (Biber et al. 2021: 929). However, properties of the matrix clause, such as a first- or second-person pronoun and certain verbs (e.g., verbs of sensation), also play a role in that-omission (Thompson & Mulac 1991), and the givenness of information in the subject vs. the agent by-phrase is also relevant for the use of the passive voice. According to Biber et al. (2021: 932), 90% of the by-phrases in long passives express new information, while Birner (1996, 2018) shows that English long passives are subject to information-structuring constraints similar to those on, for example, inversion (e.g., Above his head hung a massive seagull with its beak open; Birner 2018: e159). Ultimately, then, we need to consider co-text and context together to understand the use of non-canonical constructions as utterances within discourse.

7.2.3 The Study of Non-Canonical Syntax in the Context of Register

The relevance of discourse, in the form of co-text and context, motivates another core distinction for the study of syntax and the role register plays in it. There are two distinct approaches to exploring this role, which differ fundamentally in how register is conceptualised. One approach is rooted in so-called variationist linguistics and treats syntactic variation as different ways of ‘accomplishing the same function’ (Szmrecsanyi 2019: 277). For example, the variationist approach aims to understand the syntactic choice between that- and that-less complement clauses, or between active and passive voice, by focusing on the ‘constraints’ that govern which variant is chosen (Szmrecsanyi 2019: 78). Register is one crucial factor here, which means that it serves as a predictor variable in this kind of work (Dorgeloh & Wanner 2023: 32–41).

Register has a different role in the so-called text-linguistic approach, which is based more directly on the register analysis framework. As explained in the previous section, this approach looks at the frequency of linguistic characteristics and explains them with reference to their ‘functional correspondences’ with the discourse situation (Biber & Egbert 2023: 4). For example, the passive is relatively more frequent in news and academic texts because these texts focus on events or generalisations rather than on individual agents (Biber et al. 2021: 930). In this way, the approach uses syntactic characteristics to describe registers, which makes register a proper ‘object of investigation’ (Biber 2012). Here, the description of registers is based on counting features, with a focus on features that have a higher rate of occurrence in one register than in others. A non-canonical construction thus becomes a register feature when it has a higher frequency (relative to text length or corpus size) in one specific register than in other registers. Features that do not occur in any other register are register markers (Biber & Conrad 2019: 51). The studies by Biber et al. and Pham in this part of the volume mainly use the text-linguistic approach, presenting (normalised) frequencies across several registers, while Günther’s experimental study exemplifies the variationist approach, comparing the participants’ reactions to different syntactic variants.
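The counting logic of the text-linguistic approach can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any study in this volume: the helper names, the per-1,000-word basis, and all counts are hypothetical, and we assume that a feature counts as a register feature when its normalised rate in the target register exceeds its rate in each of the other registers compared (actual studies typically also test whether such differences are statistically reliable).

```python
# Sketch: normalised frequencies and a simple register-feature check.
# All helper names and counts are hypothetical, for illustration only.


def per_thousand(count: int, words: int) -> float:
    """Normalise a raw count to a rate per 1,000 words of running text."""
    if words <= 0:
        raise ValueError("corpus size must be positive")
    return count * 1000 / words


def classify_feature(rates: dict[str, float], target: str) -> str:
    """Label a feature relative to one target register.

    Assumed criteria (a simplification):
    - 'register marker': occurs in the target register and in no other register
    - 'register feature': more frequent in the target than in each other register
    - 'not distinctive': anything else
    """
    others = [rate for name, rate in rates.items() if name != target]
    if rates[target] > 0 and all(rate == 0 for rate in others):
        return "register marker"
    if all(rates[target] > rate for rate in others):
        return "register feature"
    return "not distinctive"


if __name__ == "__main__":
    # Hypothetical raw counts of one construction and sub-corpus sizes in words.
    counts = {"conversation": 45, "news": 12, "academic": 6}
    sizes = {"conversation": 30_000, "news": 40_000, "academic": 50_000}
    rates = {reg: per_thousand(counts[reg], sizes[reg]) for reg in counts}
    for reg, rate in rates.items():
        print(f"{reg}: {rate:.2f} per 1,000 words")
    print(classify_feature(rates, "conversation"))
```

With these toy numbers, conversation comes out at 1.5 occurrences per 1,000 words against 0.3 in news and 0.12 in academic prose, so the construction is labelled a register feature of conversation; it would only count as a register marker if its rate were zero everywhere else.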

Both approaches require usage-based evidence, typically in the form of corpus-based frequencies. These corpora need to be both representative and comparative (Biber & Conrad 2019: 10): as we emphasised in the discussion of register classification, register differences are often only gradient, since registers are frequently more or less alike rather than categorically different. For example, the study by Biber et al. in this part of the volume, which deals with the complexity of reduced structures in news broadcasts, touches on similarities with spoken discourse (reduction in general), but its focus on multiple reduction also reveals similarities with other written ‘economy’ registers. From a methodological viewpoint, syntactic features become register features if they are found to occur more frequently in the given register than in at least one other (cf. Biber & Conrad 2019: 215).

The need for a comparative approach explains why corpus linguistics is the primary method in register studies. Depending on the specificity of the target registers, researchers either use freely available corpora or design their own to match a specific research question. Corpus size will vary depending on how frequently the features of interest occur. The study by Pham in this part of the volume looks at clefting, which is relatively rare, and uses a corpus of 310,000 words from six registers. By contrast, the TV news broadcast corpus compiled for Biber et al.’s study comprises approximately 50,000 words covering three networks, further split into four text sections for an analysis of sub-registers. This corpus size is sufficient because reduced structures occur throughout the texts, which means they are not only more frequent than elsewhere but also pervasive in that register (cf. Biber & Conrad 2019: 9). As we pointed out earlier, experimental work typically lacks a specific register context and instead usually aims to control the co-text as a variable. In this way, Günther in this part of the volume explores conditions of cognitive complexity and their effect on particle placement using a self-paced reading task and a split rating task.
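Why corpus size depends on feature frequency can be shown with simple arithmetic: the expected number of tokens is the normalised rate multiplied by corpus size. The sketch below is illustrative only; the function name is hypothetical, and the two rates are round figures chosen for illustration rather than results for any corpus mentioned here, although the 50,000-word size matches the broadcast corpus described above.

```python
def expected_tokens(rate: float, per_words: int, corpus_words: int) -> float:
    """Expected number of occurrences of a feature that occurs `rate` times
    per `per_words` words, in a corpus of `corpus_words` words.

    Assumes the feature is spread evenly across the corpus, which real
    features (especially rare, register-specific ones) often are not.
    """
    if per_words <= 0:
        raise ValueError("normalisation basis must be positive")
    return rate * corpus_words / per_words


if __name__ == "__main__":
    corpus = 50_000  # words
    # Hypothetical rates: a common feature vs. a rare one.
    print(expected_tokens(2, 1_000, corpus))   # common feature
    print(expected_tokens(2, 10_000, corpus))  # rare feature
```

A feature occurring twice per 1,000 words yields about 100 tokens in 50,000 words, while one occurring twice per 10,000 words yields only about 10, which is why rare constructions such as clefts call for considerably larger corpora.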

A conceptually and methodologically more complex approach to register analysis is so-called multi-dimensional (MD) analysis, first introduced in Biber (1991). The method originally started out with a set of 67 lexical and grammatical features, which, by way of a computational factor analysis of a large corpus of texts, were reduced to five dimensions of textual variation. These dimensions were subsequently labelled to reflect certain functional properties of the texts, such as being ‘involved’ or ‘informational’, or ‘abstract’ or ‘non-abstract’. For example, to return to some of the syntactic features discussed above, Dimension 1 (involved vs. informational discourse) is marked by a higher density of that-deletion and by a low density of agentless passives and of what is here called ‘deletions’ (Biber 1991: 104–8). The main point of the approach is to provide quantifiable and generalisable descriptions of registers (Biber & Conrad 2019: 216), and it has established a broad research area over the last thirty years. The method goes beyond describing registers in terms of individual features and focuses instead on their patterns of co-occurrence. Within a dimension, syntactic characteristics are always part of a group of positive or negative features: for example, in an MD analysis of university registers, that-omission co-occurs with other ‘oral’ features such as contractions or wh-questions and with a negative dimension score for agentless passives (Biber & Conrad 2019: 228). In this way, the method manages to cover a larger set of linguistic features and to reveal similarities and differences across registers. Biber et al. (Chapter 8 in this volume) also discuss their findings on multiple reduction in the context of other ‘phrasal’ features belonging to the Dimension 1 characteristics of radio broadcasts.
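The step from co-occurring features to a dimension score can be sketched as follows. In MD analysis as it is usually described, feature rates are standardised (converted to z-scores) across the corpus, and a text’s score on a dimension is the sum of the z-scores of the features with positive loadings minus the sum for those with negative loadings. The sketch assumes that procedure in simplified form: the factor analysis that establishes which features group together is not shown, the grouping (that-deletion and contractions against agentless passives) merely echoes the co-occurrence pattern mentioned above, and the three ‘texts’ and their rates are hypothetical.

```python
import statistics

# Hypothetical normalised rates (per 1,000 words) for three toy texts.
TEXTS = {
    "conversation_sample": {"that_deletion": 6.0, "contractions": 40.0, "agentless_passive": 1.0},
    "news_sample": {"that_deletion": 2.0, "contractions": 8.0, "agentless_passive": 8.0},
    "academic_sample": {"that_deletion": 0.5, "contractions": 1.0, "agentless_passive": 12.0},
}

# Assumed feature grouping for one dimension (not derived here by factor analysis).
POSITIVE = ["that_deletion", "contractions"]
NEGATIVE = ["agentless_passive"]


def z_scores(texts: dict, feature: str) -> dict:
    """Standardise one feature across all texts: (rate - mean) / standard deviation."""
    values = [rates[feature] for rates in texts.values()]
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population SD; a sample SD only rescales all scores
    if sd == 0:
        raise ValueError(f"feature {feature!r} does not vary across texts")
    return {name: (rates[feature] - mean) / sd for name, rates in texts.items()}


def dimension_scores(texts: dict, positive: list, negative: list) -> dict:
    """Sum the z-scores of positive features and subtract those of negative features."""
    z = {feature: z_scores(texts, feature) for feature in positive + negative}
    return {
        name: sum(z[f][name] for f in positive) - sum(z[f][name] for f in negative)
        for name in texts
    }


if __name__ == "__main__":
    for name, score in dimension_scores(TEXTS, POSITIVE, NEGATIVE).items():
        print(f"{name}: {score:+.2f}")
```

With these toy values the conversation-like text scores highest and the academic-like text lowest. Because z-scores are centred on the corpus mean, the scores of all texts sum to (approximately) zero, so a dimension score always locates a text relative to the rest of the corpus rather than on an absolute scale.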

7.3 Non-Canonical Syntactic Phenomena Studied

Throughout this volume, it is shown that syntactic constructions that may be judged unusual, questionable, or even ungrammatical in isolation may still be acceptable in, and even characteristic of, specific registers of a language. We will now turn to an overview of non-canonical constructions that have been examined from a register angle. There are different ways in which these could be systematised. For example, one could distinguish sentence-level, phrase-level, and word-level phenomena, or arrange this section by the discourse factors that are conducive to the use of non-canonical syntactic patterns, such as modality (with spoken registers drawing on the speaker’s cognitive resources differently from written registers) or the function of the text. Instead, we will use form as our starting point. With regard to form, a non-canonical clause can deviate from the canonical clause in one of three ways: (a) the non-canonical clause is a shorter, reduced version of the canonical clause (as in cases of ellipsis or deletion), (b) it is an expanded, more complex version (as in cleft constructions), or (c) the word order of the canonical clause is rearranged in the non-canonical clause (as in the case of particle placement, argument alternations, or other non-default form–function mappings, including it-extraposition).¹ In looking at discussions of these non-canonical syntactic patterns and their relationship to register and genre, we will also keep in mind the two perspectives introduced earlier: studies focused on explaining the constructional make-up of a register (text-linguistic approach) and studies focused on explaining the choice between two competing constructions (variationist approach).

7.3.1 Reduction

Grammatical reduction, the simplification of grammatical structure (and with it a reduction in the number of words per utterance), is a pattern characteristic of spoken registers, particularly conversation. Biber et al. (2021: 1037) list pronouns, other proforms, and ellipsis among the most common forms of grammatical reduction in this register. The use of reduction here is mainly due to situational parameters. Firstly, since conversation happens in real time and is interactive, it draws significantly on cognitive resources, and grammatical reduction can be seen as a strategy to reduce the demand on those resources. Secondly, the shared context allows speakers to underspecify what is said explicitly and to rely on their shared experience of the situation (and often also on shared knowledge). As a result, conversation includes many utterances that are non-clausal units, which is one of the reasons why the syntactic make-up of conversation looks very different from that of written registers. In their chapter on ‘the grammar of conversation’, Biber et al. (2021: 1065) found that 38.6% of units in a sample of American and British English conversation (1,000 turns, 5,369 words) were non-clausal. The example below shows how they segment speech into units (separated by double bars, ||) and classify them as clausal (<Cl>) or non-clausal (<NCl>).

(1)

A: || So do you think an alligator would like salt water? <Cl> ||

B: || It would probably kill him wouldn’t it? <Cl> ||

A: || That one on the news has been out in the ocean for a while. <Cl> ||

B: || Really? <NCl> ||

C: || What are you talking about? <Cl> || I didn’t hear. <Cl> ||

A: || The alligator in the ocean. <NCl> || I was asking him how he thought it liked saltwater. <Cl> ||

C: || Oh. <NCl> ||
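The proportion Biber et al. report is a simple ratio of non-clausal units to all units. As a minimal illustration, the sketch below encodes example (1) by hand, one tuple per unit with the <Cl>/<NCl> codes shown above, and computes that ratio; no automatic segmentation or coding is attempted, and the name non_clausal_share is our own.

```python
# Example (1), one entry per speech unit: (speaker, unit, code).
UNITS = [
    ("A", "So do you think an alligator would like salt water?", "Cl"),
    ("B", "It would probably kill him wouldn’t it?", "Cl"),
    ("A", "That one on the news has been out in the ocean for a while.", "Cl"),
    ("B", "Really?", "NCl"),
    ("C", "What are you talking about?", "Cl"),
    ("C", "I didn’t hear.", "Cl"),
    ("A", "The alligator in the ocean.", "NCl"),
    ("A", "I was asking him how he thought it liked saltwater.", "Cl"),
    ("C", "Oh.", "NCl"),
]


def non_clausal_share(units):
    """Proportion of units coded as non-clausal ('NCl')."""
    if not units:
        raise ValueError("no units to count")
    non_clausal = sum(1 for _speaker, _text, code in units if code == "NCl")
    return non_clausal / len(units)


if __name__ == "__main__":
    share = non_clausal_share(UNITS)
    print(f"{share:.1%} of {len(UNITS)} units are non-clausal")
```

For this short excerpt the result is 3 of 9 units, or 33.3%. This is only a worked illustration of the measure and is not comparable to the 38.6% figure, which is based on a sample of 1,000 turns.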

In this example, all non-clausal units are treated the same, but in their contribution to this volume, Biber et al. distinguish between non-clausal units without any syntactic structure, which include discourse markers like speaker C’s last utterance in the example above (Oh), and non-clausal units with internal syntactic structure, such as speaker A’s response The alligator in the ocean. They subsume the latter, as well as units that include a verb, under the category of ‘non-canonical reduced structures’. In their study, they look at such structures specifically in news broadcasts, examining the parameters of the register that favour such expressions. Another example of reduced structures is subject ellipsis, which is characteristic of spoken registers like conversation (Nariyama 2004) and written registers like diary writing (Haegeman 2013), texting (van Dijk et al. 2016), and blog writing (Teddiman & Newman 2007). Other ways to reduce or shorten an utterance include auxiliary contractions (hasn’t, we’ll) and acronyms. Both typically occur in registers that prize short forms due to high interactivity.

In most of these cases, it is clear which syntactic material has been omitted, and the non-canonical, reduced form can be considered a variant of the canonical, non-reduced form, which opens the path for a variationist design. However, as Biber et al. point out in their contribution to this volume, there are also reduced units for which it can be hard to say what the non-reduced form would have been, especially where an utterance consists of a single noun phrase or adjective phrase. Structures reduced in this way do not lend themselves to a variationist approach, since there are no two clearly defined variants to compare. In example (1) above, we cannot say with certainty which canonical sentence B’s utterance Really? is a reduced version of. The variationist research design is best applied when it is clear exactly which elements have been reduced or deleted, as in the case of subject or object ellipsis. Since this does not hold for all of the data discussed here by Biber et al., their analysis follows the text-linguistic design.

7.3.2 Expansion

Non-canonical syntactic constructions that arise from making a canonical sentence longer and/or more complex include, for example, it-extraposition (2), it-clefts (3), wh-clefts (4), and left dislocation (5).

(2)

a. It is bad to have such sharply diverging classes. (Corpus of Contemporary American English, COCA, News)

b. To have such sharply diverging classes is bad.

(3)

a. It was Democrats who killed the DREAM act. (COCA, Spoken)

b. Democrats killed the DREAM act.

(4)

a. What you should not do is prescribe carrot juice. (COCA, Blog)

b. You should not prescribe carrot juice.

(5)

a. This guy, … he is an extra brand of crazy. (COCA, Spoken)

b. This guy is an extra brand of crazy.

The distribution of these constructions differs from that of the reduced structures just discussed. They are found in all types of registers, but their uses are quite register-specific and, seen across registers, they are quite rare. For example, subject it-extraposition, as in (2a), occurs at a rate of about two times per 1,000 words in both academic and popular writing (Zhang 2015), while left dislocation, as in (5a), occurs almost exclusively in conversation (around two times per 10,000 words) and only ‘occasionally’ in fictional dialogue or written prose (Biber et al. 2021: 948). The choice of a longer or more complex form has also been discussed in terms of its effects on processing. For example, psycholinguistic work on reading comprehension shows that readers use it-clefts to anticipate how the upcoming information will be packaged (e.g., Alemán Bañón & Martin 2019).
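The two rates cited for it-extraposition and left dislocation use different normalisation bases (per 1,000 vs. per 10,000 words), so they need to be put on a common basis before they can be compared. A minimal sketch of that conversion, using the approximate figures quoted in the text; the function name rebase is our own.

```python
def rebase(rate: float, from_base: int, to_base: int) -> float:
    """Convert a rate per `from_base` words into a rate per `to_base` words."""
    if from_base <= 0 or to_base <= 0:
        raise ValueError("normalisation bases must be positive")
    return rate * to_base / from_base


if __name__ == "__main__":
    extraposition = 2.0  # approx. per 1,000 words (academic and popular writing)
    left_dislocation = rebase(2.0, 10_000, 1_000)  # approx. 2 per 10,000 words (conversation)
    print(f"left dislocation: {left_dislocation} per 1,000 words")
    print(f"ratio: {extraposition / left_dislocation:.0f}")
```

On these figures, left dislocation in conversation occurs at roughly 0.2 per 1,000 words, about a tenth of the rate of subject it-extraposition in academic and popular writing, which illustrates how rare expansion constructions can be even in the registers where they are characteristic.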

Sentences that lend themselves to it-extraposition (esp. sentences with a subject clause and an adjectival predicate, as in (2b)) occur more often in writing than in spoken registers (Kaltenböck 2005), but, from a variationist angle, it holds for both written and spoken registers that it-extraposition is used in the vast majority of syntactic environments that allow it. The reasons for choosing the construction usually lie in the co-text. Based on 1,701 examples extracted from the British component of the International Corpus of English (ICE), Kaltenböck showed that in almost three out of four occurrences of it-extraposition the extraposed subject clause contains new information, which aligns with general ideas about information packaging.

Register-based research on cleft sentences is usually built on a small set of examples, with researchers looking at the syntactic category and the information status of the foregrounded element (e.g., Hedberg 1990). In Chapter 9 of this volume, Pham takes a different approach, focusing on how the use of cleft sentences may be influenced by situational factors, particularly the communicative purpose of evaluation. She finds that, while most cleft constructions are themselves evaluative (a concept that is not as easy to code as syntactic categories like ‘clausal’ above), clefts do not occur more frequently in texts that clearly have an evaluative purpose. She hypothesises that clefts, which are non-canonical sentences by syntactic criteria, could be considered ‘less non-canonical’ if one looks at the function of the construction. This shows again that a syntactically non-canonical pattern can be the functionally canonical one.

7.3.3 Placement Variation

English is considered a language with strict word order, in which the arguments of the verb (like agent, theme, goal) are mapped predictably onto syntactic positions. Non-canonical sentences include those in which the expected arrangement is not followed. For example, it is generally assumed that a direct object canonically follows the verb directly (Huddleston & Pullum 2002: 247), as in (6a). Sentences in which the direct object occurs after an adjunct, as in (6b), a case of heavy NP-shift, are non-canonical.

(6)

a. He praised me for mastering the trills quickly. (COCA, Fiction)

b. He was by all accounts a prodigy and mastered quickly the hard-earned lessons that most shipwrights spent a lifetime accumulating. (COCA, Magazine)

Passive constructions are another example of placement variation: they break the expected alignment of agent and subject position, resulting either in a sentence in which the agent is not realised at all (short passive) or in a sentence in which a non-agent is placed in subject position and the agent becomes an adjunct inside a by-phrase (long passive). Much of the research on the passive as a non-canonical construction is text-linguistic in nature. The use of passive constructions is known to vary considerably across registers; for example, passives are a frequent and pervasive characteristic of academic writing (Biber et al. 2021). The reasons for this register specificity lie, to some extent, in the purpose and genre conventions of academic writing. Texts tend to be about processes, data, and discoveries rather than about the people who make them. The passive is one way to place a non-agent in the subject position of a sentence, that is, in the syntactic position most clearly linked to the discourse function of topic. Long passives are quite rare, even in academic writing. The main reason is a long tradition, especially in the natural sciences, of presenting research as a disembodied endeavour. The passive provides the option to leave an agent unexpressed, and academic writing is a register in which this option is often considered desirable, especially if the agent is the author of the text, even though modern style manuals explicitly take the position that the use of the first person (I will show instead of It will be shown) is clearer and intellectually more honest. As pointed out above, if passives do occur with by-phrases, the by-phrase tends to express new information or at least information that is less given than the subject (Biber et al. 2021: 933).

Like most syntactic constructions, non-canonical or not, passive constructions can be examined from a variationist or a text-linguistic perspective. Biber et al. (2021), for example, take a text-linguistic approach and measure the use of the passive as the number of tokens per a fixed number of running words in a corpus. Seoane (2006) chooses a different approach and studies the frequency of passives from a variationist perspective, counting the percentage of transitive verbs realised in active vs. passive voice.
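To make the contrast between the two measures concrete, here is a minimal Python sketch. The counts are invented for illustration (they are not data from Biber et al. or Seoane), and the function names `normalised_frequency` and `passive_share` are hypothetical. The variationist share assumes, as a simplification, that every counted active transitive verb could in principle have been passivised.

```python
def normalised_frequency(tokens: int, corpus_words: int, per: int = 1_000) -> float:
    """Text-linguistic measure: occurrences per `per` words of running text."""
    return tokens * per / corpus_words


def passive_share(passive: int, active: int) -> float:
    """Variationist measure: percentage of transitive verb tokens realised in
    the passive, out of all tokens where either voice was an option."""
    return 100 * passive / (passive + active)


# Two hypothetical 50,000-word samples (invented counts, for illustration only).
samples = {
    "text A": {"passive": 600, "active": 1_400, "words": 50_000},
    "text B": {"passive": 300, "active": 200, "words": 50_000},
}

for name, c in samples.items():
    freq = normalised_frequency(c["passive"], c["words"])
    share = passive_share(c["passive"], c["active"])
    print(f"{name}: {freq:.1f} passives per 1,000 words, {share:.1f}% of transitive verbs")
```

In this made-up case, text B has half as many passives per 1,000 words as text A (6.0 vs. 12.0) but twice the passive share (60.0% vs. 30.0%), because it contains far fewer transitive verbs overall. The two perspectives can therefore rank the same texts differently.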

Like reductions, placement alternations and the preference for a non-canonical pattern in a given discourse situation are also explained from a cognitive angle. Following the influential work by Hawkins (1994, 2004), it is generally assumed that speakers, when given a choice, prefer constructions that can be processed more efficiently. Hawkins’ principle of ‘minimising domains’ allows him to calculate which structures are more efficient in that regard. Simply put, the question is how much syntactic material (the ‘domain’) must be processed before the overall structure of the verb phrase is determined. Changing the word order in a sentence can be a way to minimise that domain.

Building on Hawkins’ work, Günther (Chapter 10 in this volume) looks at word order variation in particle verbs (look up the information/ look the information up). Generally, the continuous word order (verb-particle) is considered the canonical pattern, the one that minimises the ‘domain’ that must be processed to figure out the meaning of the verb. If that is the case, it is not immediately clear why the non-canonical pattern, which is supposed to be less efficient to process, is more common in spoken language, a register that is supposed to favour the minimising of processing burdens, and why the canonical form cannot be chosen at all if the object of the verb is a pronoun (*look up it). Günther makes the important observation that the cognitive load for language production is not necessarily the same as for language perception. She uses experimental data to explore the difference between minimising one’s own cognitive load as a speaker and reducing the cognitive load for the hearer.
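The intuition behind domain minimisation can be illustrated with a deliberately simplified count, sketched in Python below. Here the ‘domain’ is assumed to be the number of words from the verb up to and including the particle; this is a toy simplification, not Hawkins’ own metric, and `vp_domain` is a hypothetical helper.

```python
def vp_domain(words: list[str], verb: str, particle: str) -> int:
    """Simplified domain size: the number of words from the verb up to and
    including the particle, i.e. the material a hearer must process before
    the verb-particle combination (and hence the verb's meaning) is known.
    Illustrative only; not Hawkins' exact metric."""
    start = words.index(verb)
    end = words.index(particle, start)  # first particle after the verb
    return end - start + 1


examples = [
    "look up the information",                  # continuous order
    "look the information up",                  # discontinuous order
    "look the information about the court up",  # discontinuous, heavier object
    "look it up",                               # pronoun object: discontinuous only
]
for sentence in examples:
    print(f"{sentence!r}: domain = {vp_domain(sentence.split(), 'look', 'up')}")
```

Under this toy measure the continuous order (domain 2) always beats the discontinuous one (4, and 7 with the heavier object), and the gap grows with the length of the object. The measure would even prefer the ungrammatical *look up it (2) over look it up (3), which shows why the frequency of the discontinuous order, and its obligatoriness with pronouns, are puzzling from a purely domain-based, hearer-oriented perspective.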

7.4 Trends and Open Questions

As we see throughout this part of the volume, register studies is a dynamic field that reflects trends in the development of registers as well as in linguistics as a discipline. Register is increasingly considered one of the core factors that regulate syntactic variation (e.g., Szmrecsanyi 2019) and language change (e.g., Biber et al. 2016). With respect to the former, the categorisation and analysis of digital modes of communication, which never really fit into the written/ spoken dichotomy and are increasingly multimodal, have fuelled new branches of research into registers (for an overview, see Biber & Egbert 2018 and Page et al. 2022). Such studies examine both how digital modes have expanded the inventory of linguistic forms and how they provide an ecosystem for new registers or for the transformation of existing ones. For example, Zappavigna (2018) has offered a classification of hashtags that includes the function of register-specific topic markers, Bohmann (2016) has examined posts on Twitter, now known as ‘X’, as the potential origin of a new, non-canonical use of because, and Zhang (2023) has shown that the linguistic profile of printed news is influenced by the now dominant register of digital news, a case that illustrates how a change in extra-linguistic behaviour (people increasingly consuming news in digital environments) affects a register.

On the methodological side, the go-to approach to studying register, from both the text-linguistic and the variationist perspective, is still corpus-based, but there is growing awareness that corpus data are mainly production data, and that any analysis that looks into cognitive factors like processing cost as a criterion for explaining the use of non-canonical constructions should ideally also include data from language processing (as done by Günther in this part of the volume). It is no surprise, therefore, that there is a growing trend to rely on converging evidence, that is, to combine corpus data and experimental data.

An interesting situation for register-based work arises when criteria from different domains for what we consider ‘canonical’ do not align. In the case of particle verbs, for example, based on structural criteria, the discontinuous variant is considered to be non-canonical (see Günther, this volume), but data from language acquisition show that the vast majority of particle verb constructions produced by young children exemplify the discontinuous order (Diessel & Tomasello 2005), a fact that, at first sight, is not easy to reconcile with the assumption that canonical constructions occur more often in the input and are easier to acquire. In a similar vein, it is not immediately obvious why it-extraposition, a non-canonical sentence pattern by structural criteria, when looked at from a variationist perspective, is used much more frequently than its canonical, non-extraposed competitor, both in speech and writing (Kaltenböck 2005).

Additionally, statistical analyses have become more sophisticated and have moved from monofactorial to complex multifactorial models. Examples are Gries’ (2003) groundbreaking study on particle verbs and Grafmiller’s (2014) study of the realisation of the genitive in English in terms of an interaction of processing-related factors with register conditions across modality and genre.

As we move into a future in which we will increasingly encounter texts generated by artificial intelligence tools, we predict that one topic the field of register studies will have to wrestle with is the production of texts not conditioned by human communicative needs or processing constraints. Language models that power tools like ChatGPT (launched in late 2022) are trained on ‘large, uncurated, static datasets from the Web’ (Bender et al. 2021: 615). Such models do not use language to encode meaning, and they also tend to ‘encode hegemonic views that are harmful to marginalized populations’ (Bender et al. 2021: 615). Texts generated by such models may sound plausible and formally in line with register expectations, but since they were not generated to express meaning in a specific speech situation, it is not immediately clear whether they fall within the purview of register as a variable as discussed in this part of the volume. If, without any outward markers, a sizeable number of texts are generated by ‘stochastic parrots’ (Bender et al. 2021: 617) rather than by human speakers with human communicative needs, a discipline that relies on large databases of text samples may have to rethink from the ground up what it means to analyse syntactic variation by speech situation.

Chapter 8 The President wide awake at 3:14 AM tweeting about CNN: Informational Non-Canonical Reduced Structures in TV News Broadcasts

8.1 Introduction

In Chapter 1 of this volume, Pham and Leuckert distinguish between two major conceptualisations of non-canonical syntax: constructions that represent a departure from ‘basic’ grammar, and constructions that represent a departure from typical or normal use. The first conceptualisation builds on a theoretical foundation of what constitutes ‘basic’ grammar in a given language. For example, the canonical or basic structure of English clauses is Subject-Verb-(Object or Complement). However, the second conceptualisation can provide a completely complementary perspective. For example, the typical/normal minimal response in English conversation consists of a single word like ok, yeah, oh. Such utterances are non-canonical in that they depart from ‘basic’ structures, but they are canonical in that they are the normal way for a speaker to respond to a previous utterance.

The present chapter documents a case where both perspectives are important: the use of Non-Canonical Reduced Structures (NCRSs) in TV news broadcasts (TVNBs). These are long, elaborated utterances with no main finite verb, and therefore they represent a striking departure from the rules of basic/canonical grammar. Such structures are also non-canonical in that they are rare or virtually unattested in most other registers – both spoken registers (including conversation) and written registers. But, at the same time, we show how a heavy reliance on NCRSs is becoming the norm in certain types of TV news broadcasts, and thus in that sense, these structures are becoming canonical in that register.

One of the major types of non-canonical grammar, in the sense of a departure from basic grammatical structure, involves structural reduction. Previous research has shown that such structures are especially prevalent in face-to-face conversation. The Grammar of Spoken and Written English (GSWE; Biber et al. 2021: 1031–120) groups these features into three major categories: Non-Syntactic Non-Clausal Units, Syntactic Non-Clausal Units, and Structurally Reduced Clausal Units.

The first of these categories – Non-Syntactic Non-Clausal Units – can be regarded as pragmatic rather than grammatical features. Non-Syntactic Non-Clausal Units do not have internal syntactic organisation, but they serve a wide range of pragmatic and interactional functions in conversation. Common types of Non-Syntactic Non-Clausal Units include vocatives (e.g., hey you), expletives (e.g., damn, shit), greetings (e.g., hi, bye), discourse markers (e.g., well, ok, right), or other utterance launchers (e.g., oh yeah, hey, there again) (see the survey in Biber et al. 2021: 1063–93).

In contrast, Syntactic Non-Clausal Units and Structurally Reduced Clausal Units have internal syntactic organisation. Syntactic Non-Clausal Units are phrases that can be described using the normal framework of syntactic analysis, but they are usually structurally incomplete because they lack a main verb (see Biber et al. 2021: 1093–9). These phrases can stand alone as a complete utterance, usually in response to some previous utterance as in examples (1) and (2).

(1)

Stand-alone noun phrase:

A: Is Nicki giving a lecture?

B: No, a training session

(2)

Stand-alone adjective phrase:

A: All of those houses have big square white things

B: Yeah, very solid inside

Structurally Reduced Clausal Units are main clauses where one or more of the semantically essential elements have been ellipted (see Biber et al. 2021: 1099–102). For example, subject noun phrases and auxiliary/copular verbs are commonly omitted in conversational utterances (marked by --), as in (3).

(3)

A: Are they happy?

B: -- depends on what you mean by that.

A: Well, -- -- at least content?

In practice, it is difficult to maintain the distinction between Syntactic Non-Clausal Units and Structurally Reduced Clausal Units in analyses of extended discourse. In the first place, dependent clause structures fail to fit tidily into one or the other category, because they are ‘clausal’ but often clearly not a main clause (cf. (4)).

(4)

A: I’m going to go back and get some more of those.

B: Yeah, because they’re a lot higher at the other store.

The bigger problem, though, is that identification of a reduced clausal unit requires speculative interpretation to reconstruct the underlying non-reduced form. For example, the stand-alone noun phrase in (1) and adjective phrase in (2) might have been interpreted as reduced clausal units with the subject noun phrase and copula be omitted. We show below that this problem is exacerbated in the analysis of TVNBs, which employ extremely complex sequences of reduced structures. For these reasons, the following analyses are based solely on the forms that are actually employed in discourse, with no attempt to reconstruct an underlying ‘original’ form. As a result, we do not distinguish between Syntactic Non-Clausal Units and Structurally Reduced Clausal Units; rather, these are both treated as instances of Non-Canonical Reduced Structures (NCRSs).

The reliance on NCRSs in conversation can be interpreted in functional terms, associated with the pressures of real-time language production, and the fact that speakers co-construct discourse and share the same time/place, resulting in a frequent use of context-dependent linguistic structures. NCRSs have also been described as a salient characteristic of ‘simplified registers’, which have been categorised according to their functional motivations as ‘handicap registers’ and ‘economy registers’. Ferguson (1971), focusing on the ‘handicap’ functions, coined the term ‘simplified register’ to refer to the ‘modified varieties of speech typically addressed to listeners not believed to be fully competent in the language’, such as ‘baby talk’ or ‘foreigner talk’ (Ferguson 1982: 49). Janda (1985) extended the analysis of simplified registers to include those that are functionally motivated by the need for efficiency and economy of expression (like note-taking), rather than a perceived listener handicap. Interestingly, though, Janda shows how simplified economy registers employ many of the same NCRSs as simplified handicap registers, such as the omission of subject noun phrases, omission of definite articles in noun phrases, and omission of copula/auxiliary be.

The use of NCRSs in note-taking can be attributed to the time pressure of recording language produced in a real-time situation. However, texts in other written simplified registers – such as newspaper headlines and classified advertisements – have been carefully pre-planned and edited. In these cases, the motivation for economy of expression relates to publication costs, resulting in linguistic forms that express maximal content in as few words as possible (see the detailed discussion in Bruthiaux 1996). As a result, these written economy registers often employ extended sequences of NCRSs, assuming that the intended readers will have the background knowledge required for understanding. Thus consider example (5) from a classified advertisement (from Bruthiaux 1996: 85):

(5)

88 ACURA LEGEND, 5 speed, 4 dr. Red ext w/grey leather int. Loaded, with car phone. Low mileage. Mint condition. Steal = $12,000 obo…

The use of extended sequences of phrasal NCRSs in written economy registers is perhaps not especially surprising, given the fact that writers can take as much time as they want to manipulate the final form of the language that they produce. In contrast, we would never expect a speaker in conversation to produce an extended sequence of phrasal NCRSs like in (5). Rather, corpus research has shown repeatedly that speakers in conversation rely heavily on clausal structures, even though there is also a dense use of reduced structures. That is, NCRSs in conversation tend to occur as responses to a previous utterance, rather than as a sequence of multiple reduced utterances. In addition, any single utterance in conversation is likely to include only a single simple NCRS, rather than a structure consisting of multiple phrases or reduced clauses.

In all of the registers discussed above, the use of NCRSs is motivated by functional considerations relating to the challenges of the production (or perceived comprehension) circumstances, or by the desire to achieve more economical linguistic expression. However, Ferguson (1983) noticed that Sports Announcer Talk (SAT) is a spoken simplified register that employs many of these same types of NCRSs, but in some cases without time pressure, handicap, or economy motivations. In particular, Ferguson studied the discourse of baseball game radio broadcasts, a speech event intended to inform listeners, where fast-action events are usually separated by considerable amounts of time, giving the announcer extensive opportunity to plan their speech and produce as much language as they want. Despite those situational characteristics, the discourse of SAT frequently employs many of the same kinds of NCRSs as conversation and other simplified registers. Ferguson (1983: 159) interprets these as instances of clausal structures with a deleted subject pronoun, a deleted copula, or a deleted subject plus copula, as in examples (6)–(10).

(6)

-- had six homeruns last season.

(7)

-- pops it up.

(8)

Klutz -- in close at third.

(9)

McCatty -- in difficulty.

(10)

-- -- fastball. -- -- strike. -- -- one and one.

The functional motivation of such structures in SAT is less clear than with handicap or economy registers. One possibility suggested by Ferguson (1983: 168) is that these NCRSs might have been initially used in SAT modelled on the pattern of news headlines, and as ‘a way of sounding exciting’. Support for that interpretation comes from other studies that describe the characteristics of radio broadcasts of events in progress. For example, the Multi-Dimensional analysis of register variation in Biber (1988: 128) shows that radio broadcasts are surprisingly phrasal, in contrast to all other spoken registers. Detailed analysis of that pattern shows that radio broadcasts are actually structurally reduced: ‘there are relatively few verbs, because many of the verbs are deleted due to time constraints, or to give the impression of action that moves so fast that there is no time for a full description’ (1988: 135). As a result, this register has a higher density of nouns and prepositional phrases than other spoken registers. The radio broadcasts included in the 1988 study were recordings of events in progress (taken from the London-Lund Corpus). Some of these events were live sports competitions, while others were less dynamic events like a funeral procession or a state wedding. Surprisingly, though, radio broadcasts of all of these events relied on a dense use of NCRSs. For example, the excerpt in (11) reporting on a state funeral procession includes numerous phrases and non-finite (dependent) clauses, but no main finite verbs at all.

(11)

flanked [pause]

by its escort of the Royal Air Force [pause]

the gun carriage [pause]

bearing the coffin [longer pause]

draped with the Union Jack [longer pause]

on it [pause]

the gold [pause]

and enamel [pause]

of the insignia of the Garter

(London-Lund Corpus 10.5; see also Biber 1988: 134)

Similar to reportage of a baseball game, there is virtually no time pressure on language production in this case. But the broadcaster chooses to produce discourse that relies heavily on NCRSs, giving the impression of action-packed reportage.

The present study focuses on the use of NCRSs in another spoken register produced in circumstances with no time pressure: TVNBs. TVNBs differ from radio broadcasts of events in progress in that they usually report on past events. However, as we show below, TVNBs further differ from all simplified registers previously studied in three major respects:

  • They use NCRSs with much higher frequencies.

  • They employ a wider inventory of different grammatical types of NCRSs.

  • They combine NCRSs in much more complex structural combinations.

Similar to radio broadcasts, the use of NCRSs in this register is motivated by neither handicap nor economy considerations. However, since TVNBs normally do not report on events in progress, the functional motivation for these patterns differs from that of radio broadcasts of live events. As we discuss in Section 8.4.3 below, one major clue to this functional motivation comes from a comparison of the TVNBs offered by different networks, which vary widely in their use of NCRSs in line with their differing emphases on ‘hard news’ versus human interest stories.

8.2 Situational Characteristics of Television Network News Broadcasts

TVNBs are composed mostly of scripted language, combined with some real-time speech produced by reporters in the field or people who are the subject of a news story. The fact that anchors and reporters read pre-scripted reports seems to be a major factor influencing the types and frequencies of NCRSs found in TVNBs (see below). Interestingly, though, the audience must comprehend the discourse in the spoken mode, in real time. We return to this consideration in the conclusion.

TVNBs are organised as a sequence of news stories or segments. The opening segment functions in a similar way to the headlines found in a newspaper, announcing the topics of the subsequent stories, and providing hints of why those stories are interesting. Then, a typical broadcast will include a series of stories/segments that present the news. Some of these will consist of the anchor in the studio presenting a story; some stories will consist of a reporter interviewing other people; and some stories will involve a reporter covering a live event outside of the studio. News stories can also be characterised by whether they cover ‘hard news’ (e.g., stories relating to national or international concerns) or ‘soft news’ (also known as ‘human interest stories’, which focus more on entertainment purposes than informational purposes). In our analyses below, we compare the patterns of language use across four general types of news segments: the opening segment (referred to as the ‘headlines’), the lead story (i.e., the first major story covered in the broadcast), a later story that was presented in the studio, and a later story that was presented from a location out of the studio.

TVNBs can be regarded as a register with mixed characteristics of speech and writing. Similar to conversation, TVNBs are produced in the spoken mode. However, most language produced by the announcers and reporters has been previously scripted in the written mode and was then read out loud during the broadcast. In contrast, recorded language spoken by other people (either interviewees or people involved in events/situations outside the news studio) has usually been produced in real time as the speaker is deciding what to say.

In the United States, there are several different types of TVNBs, including broadcasts offered by major commercial networks like NBC and CBS; broadcasts offered by non-profit or semi-governmental agencies like PBS; and broadcasts offered by 24-hour news channels like CNN and Fox News. The present study focuses on the evening news broadcasts offered by the three major commercial networks: ABC, CBS, and NBC.

It is easy to assume that the main communicative purpose of TVNBs is to present information about current and past events, similar to newspaper reportage. However, this turns out to be a simplification. In fact, media researchers describe the main communicative purpose of TVNBs as presenting the news stories that will attract the largest audiences possible. As a result, TVNBs have evolved to focus more on entertainment than information (see the extended discussions in Postman and Powers 2008 and Montgomery 2007). This shift began in the 1970s and 1980s, as TV networks realised that TVNBs could be extremely profitable (because of the advertising revenue) if they succeeded in attracting large audiences. Unlike a newspaper, a TVNB has a limited time duration and can therefore cover only a limited number of stories. For these reasons, TVNBs have evolved to be different from newspapers in their focus on stories that have high audience appeal/interest.

It turns out that the three networks included in the present study differ in this regard. ABC has widely publicised its attempts to humanise the news and thus mostly broadcasts ‘soft news’. In contrast, CBS (and to some extent NBC) focuses much more on the coverage of ‘hard news’ (see Moos 2011; Stelter 2012). These characterisations are based on a survey of the topics covered in the respective broadcasts. However, as we show below, these differences also have a major impact on the linguistic style of the broadcasts, especially in relation to their use of NCRSs.

8.3 A Taxonomy of Non-Canonical Reduced Structures Found in TV News Broadcasts

In the simplest cases, NCRSs in news broadcasts occur as an utterance consisting of a single phrase or partial clause. These may be, for example, a noun phrase (NP; e.g., A terrifying situation), an adjective phrase (AdjP; e.g., Yes, very close), a prepositional phrase (PP; e.g., In my career, yes), an ‑ing-clause (e.g., And videotaping their crimes), an ‑ed-clause (e.g., Taken down), a wh-word (e.g., Why not?), or a wh-clause (e.g., What you will never see again).

It turns out, though, that such examples are quite rare in TVNBs. Instead, what we normally find are utterances consisting of an NCRS made up of multiple constituents. In some cases, these structures can clearly be analysed as a top-level constituent with an embedded constituent, as in (12)–(13).

(12)

[NP [PP]]: A sharp fall on Wall Street

(13)

[AdjP [PP]]: So, close to a ten percent correction

However, as we tried to apply such analyses to coding extended texts, we found that clear-cut examples like those above were rare. Rather, NCRSs in TVNBs normally consist of multiple constituents that have an unclear syntactic relationship to one another. For example, the NCRS in (14) consists of a prepositional phrase followed by a noun phrase.

(14)

In Portland, a hidden world

We interpreted this structure as consisting primarily of a noun phrase, with the prepositional phrase serving an adverbial function. However, there are no explicit signals of syntactic function here, and so many structures of this type are open to multiple interpretations.

Noun phrases followed by non-finite clauses also usually have unclear syntactic structure. For example, the title of our chapter illustrates an NCRS with an embedded -ing-clause (cf. (15)).

(15)

The President wide awake at 3:14 AM tweeting about CNN

In this case, the utterance can be analysed as a reduced finite main clause with two coordinated main verbs (i.e., ‘The President was awake and was tweeting …’). In other cases, like (16), though, similar structures are more plausibly interpreted as a noun phrase modified by a non-finite relative clause.

(16)

The major change tonight involving the Miss America pageant.

[compare: the major change which involved …, versus *the major change was involving …]

However, many structures of this type are not readily interpretable as deriving from either a reduced finite main clause or from a non-finite relative clause. For example, consider the second utterance in the announcement in (17).

(17)

We also have new reporting coming in right now involving a troubling security breach at a US Air Force base. A driver breaching the main gate at Travis Air Force Base, then crashing, the SUV exploding.

In this NCRS, the -ing-clauses do not function to specify the identity of head nouns, and thus a non-finite relative clause interpretation is not plausible. That is, the goal is not to identify a particular ‘driver’ or a particular ‘SUV’. Rather, the -ing-clauses function to tell us what the ‘driver’ did, and what happened to the ‘SUV’. At the same time, though, these structures cannot be readily interpreted as reduced progressive aspect clauses. That is, the intended meaning is not that ‘the driver was breaching’ or that ‘the SUV was exploding’. Rather, the more likely canonical forms would have been ‘A driver breached …’ and ‘the SUV exploded’.

Reflecting such uncertainties, our analysis here focuses on the sequences of grammatical structures employed in NCRSs, rather than trying to force a specific analysis of syntactic relations. For the quantitative analyses in Section 8.4 below, we made an interpretation of the top-level constituent in each sequence of structures, and counted each NCRS as a token of that category. As a result, the overwhelming majority of tokens in our corpus analysis are categorised as top-level noun phrase structures combined with other secondary structures. However, many of these instances are like example (17), discussed above, which could have been analysed as top-level clausal structures with a secondary noun phrase.
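The counting procedure just described can be sketched as follows. This is a minimal Python illustration under stated assumptions: the `Utterance` record and `count_ncrs_by_top_level` are hypothetical, and the codings given to examples (12), (13), (14), and (17) are illustrative guesses, not the authors’ actual annotations. Each utterance without a finite main verb counts once, under the analyst’s top-level category; the secondary structures are recorded but play no role in the count.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Utterance:
    text: str
    has_finite_main_verb: bool
    structures: list[str]  # phrases and dependent clauses, in surface order
    top_level: str         # analyst's interpretation of the top-level constituent


def count_ncrs_by_top_level(utterances: list[Utterance]) -> Counter:
    """One token per NCRS (no finite main verb), filed under its top-level
    category; the secondary structures do not affect the count."""
    return Counter(u.top_level for u in utterances if not u.has_finite_main_verb)


# Hypothetical codings of examples (12), (13), (14) and (17) from this chapter.
coded = [
    Utterance("A sharp fall on Wall Street", False, ["NP", "PP"], "NP"),
    Utterance("So, close to a ten percent correction", False, ["AdjP", "PP"], "AdjP"),
    Utterance("In Portland, a hidden world", False, ["PP", "NP"], "NP"),
    Utterance("We also have new reporting coming in right now ...", True,
              ["finite clause"], "finite clause"),
    Utterance("A driver breaching the main gate ..., then crashing, the SUV exploding.",
              False, ["NP", "ing-clause", "ing-clause", "NP", "ing-clause"], "NP"),
]

print(count_ncrs_by_top_level(coded).most_common())  # [('NP', 3), ('AdjP', 1)]
```

Because only the top-level label is counted, the five-structure utterance from (17) weighs exactly as much as the two-structure utterance in (12), which is one way of seeing why such a classification poorly reflects the complexity of these NCRSs.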

It turns out that the specific classification of NCRS tokens has little bearing on the overall patterns found in our corpus analysis. Rather, the main finding about the nature of NCRSs in TVNBs is that they are incredibly long and complex, in contrast to the NCRSs documented in previous research. As a result, any attempt at classification is a very poor reflection of the huge range of structural and syntactic variation and complexity actually found in these utterances. Thus, consider examples (18)–(21), which are typical of the NCRSs found in this register.

(18)

That controversial tweet one of eighteen in five days about Russian meddling following the indictments of thirteen Russian officials by Special Counsel Robert Mueller, including blaming his predecessor.

(19)

Also tonight, after President Trump told millions that the government is fully prepared for this hurricane, the President tweeting today, questioning how many really died in Puerto Rico, saying 3,000 people didn’t die after hurricanes hit Puerto Rico.

(20)

On the eve of Michael Flynn’s sentencing, former FBI director James Comey pushing back on Flynn’s claim the FBI never warned him of the consequences about lying about his contacts with the Russian ambassador.

(21)

New tonight, what the judge is now saying involving Paul Manafort after Robert Mueller’s team accused him of tampering with witnesses while he was under house arrest, allegedly trying to get them to lie.

These utterances all lack a finite main verb, making them instances of NCRSs. The utterances also share the characteristic that they are composed of multiple grammatical structures (phrases and dependent clauses) that have complex (and often indeterminate) syntactic relations to one another. Beyond those similarities, however, the four examples are strikingly different in terms of the particular structures that are combined and in terms of the syntactic relations among those structures. As such, they clearly illustrate the inadequacies of any attempt to exhaustively classify tokens of NCRSs in TVNBs. Rather, the main pattern that emerges from the corpus-based analysis of these structures is their extreme diversity and complexity, in contrast to the types of NCRSs previously described in other registers. Those corpus findings are discussed in the following section.

8.4 Distribution of NCRS Types in TV News Broadcasts
8.4.1 Corpus and Methods

Our quantitative-linguistic description of TVNBs is based on analysis of a corpus of 144 texts sampled from the evening news broadcasts of the three major US television networks for the year 2018. We began by selecting one broadcast for each month of the year, and then sampled four specific segments/stories from each broadcast: the ‘headlines’, the lead story, a story presented in the studio, and a story presented outside of the studio (see Table 8.1). All texts were collected from the major evening news shows of the three networks (i.e., ABC’s World News Tonight, CBS’s Evening News, and NBC’s Nightly News). The entire 30-minute broadcast was downloaded for each selected date. Most of the language in these broadcasts had been previously scripted and then read out loud by the announcers or reporters. However, broadcasts also include some spontaneous spoken language produced by interviewees or by other people who appear in video clips.

Table 8.1 Summary of the 2018 TVNB Corpus

Network / segment       Texts      Words
ABC
  Headlines                12      2,569
  Lead story               12      7,940
  In-studio                12      3,930
  On-scene                 12      4,120
CBS
  Headlines                12      2,504
  Lead story               12      4,842
  In-studio                12      4,396
  On-scene                 12      4,266
NBC
  Headlines                12      2,183
  Lead story               12      6,021
  In-studio                12      4,159
  On-scene                 12      4,122
Total                     144     51,052
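As a quick consistency check on Table 8.1, the minimal Python sketch below (our own illustration, not part of the original analysis) stores the cell values and re-derives the network subtotals, segment totals, and corpus size implied by the sampling design of 12 monthly broadcasts × 4 segments × 3 networks. The dictionary layout and function names are hypothetical.

```python
# Word counts from Table 8.1 (2018 TVNB corpus). Each cell holds 12 texts,
# one per sampled monthly broadcast.
WORDS = {
    "ABC": {"headlines": 2569, "lead story": 7940, "in-studio": 3930, "on-scene": 4120},
    "CBS": {"headlines": 2504, "lead story": 4842, "in-studio": 4396, "on-scene": 4266},
    "NBC": {"headlines": 2183, "lead story": 6021, "in-studio": 4159, "on-scene": 4122},
}
TEXTS_PER_CELL = 12  # one text per month for each network/segment pair


def network_totals(words):
    """Return the total number of words per network."""
    return {net: sum(segs.values()) for net, segs in words.items()}


def segment_totals(words):
    """Return the total number of words per segment, pooled across networks."""
    totals = {}
    for segs in words.values():
        for seg, n in segs.items():
            totals[seg] = totals.get(seg, 0) + n
    return totals


def corpus_size(words):
    """Return (number of texts, number of words) for the whole corpus."""
    n_cells = sum(len(segs) for segs in words.values())
    return n_cells * TEXTS_PER_CELL, sum(network_totals(words).values())


if __name__ == "__main__":
    print(network_totals(WORDS))
    print(segment_totals(WORDS))
    print(corpus_size(WORDS))
```

Run as a script, it reports 18,559 words for ABC, 16,008 for CBS, and 16,485 for NBC, which sum to the 51,052 words and 144 texts given in the table's total row.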

Texts were downloaded from the Access World News database, and the transcriptions were taken at face value. We relied on the punctuation in the transcripts to determine utterance boundaries. An informal comparison of actual video broadcasts to the transcribed texts indicates that the transcriptions are accurate representations of the words produced in speech, and that the clausal punctuation conventions generally corresponded to our own perceptions of utterance boundaries. As noted in Section 8.2, we distinguish among the four major segments of a TVNB (headlines, lead story, in-studio reportage, on-scene reportage) for our analyses of NCRSs.

The texts included in our corpus were hand-coded to identify all tokens of NCRSs, operationally defined as an utterance that does not include a finite main verb. NCRSs were then further hand-coded for their structural characteristics, including the top-level structure (e.g., noun phrase, prepositional phrase, -ing-clause), the structural type of any adjacent secondary structure, the presence of multiple secondary structures, and the presence of a deictic time/place adverbial. The frequency of each major combination of structures was counted in each text, and then converted to rates of occurrence (per 1,000 words).
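The coding and normalisation steps just described can be sketched in Python as follows. This is a hypothetical illustration, assuming one record per hand-coded NCRS with the four coded properties; the class, field names, and structure labels are our own, and the actual coding was done by hand. Counts of each top-level + secondary combination are divided by the text's word count and multiplied by 1,000, so that texts of different lengths can be compared.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NCRSToken:
    """One hand-coded NCRS (hypothetical field names)."""
    top_level: str             # e.g. "NP", "PP", "ing-clause"
    secondary: Optional[str]   # type of the adjacent secondary structure, if any
    multiple_secondary: bool   # more than one secondary structure present
    deictic_adverbial: bool    # deictic time/place adverbial present (e.g. "tonight")


def combination(token):
    """Label a token by its structural combination, e.g. 'NP + ing-clause'."""
    if token.secondary is None:
        return token.top_level
    return f"{token.top_level} + {token.secondary}"


def rates_per_1000(tokens, n_words):
    """Count each combination in one text and normalise to a rate per 1,000 words."""
    if n_words <= 0:
        raise ValueError("text must contain at least one word")
    counts = Counter(combination(t) for t in tokens)
    return {combo: count * 1000 / n_words for combo, count in counts.items()}


if __name__ == "__main__":
    # Hypothetical 250-word text containing three coded NCRSs.
    text_tokens = [
        NCRSToken("NP", "ing-clause", False, True),
        NCRSToken("NP", "ing-clause", True, False),
        NCRSToken("NP", None, False, False),
    ]
    print(rates_per_1000(text_tokens, 250))  # {'NP + ing-clause': 8.0, 'NP': 4.0}
```

Whether rates for a whole segment or network are obtained by averaging per-text rates or by pooling counts and words before normalising is a separate design choice that this sketch leaves open.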

8.4.2 Comparison of NCRSs in TV News Broadcasts versus Conversation

To establish a baseline for comparison, we carried out the same analyses on a small corpus of 10 AmE conversations (from the LSWE Corpus), totalling roughly 11,100 words. The results, summarised in Figure 8.1, confirm the qualitative descriptions in the GSWE (Biber et al. 2021: chapter 14), showing that most NCRSs in conversation are simple structures. Three structural types predominate in conversation: simple noun phrases, wh-questions with no verb, and simple verb phrases (i.e., with no subject and/or with an omitted auxiliary verb).

Figure 8.1 NCRS types in TV news broadcasts versus conversation (rate per 1,000 words)

Figure 8.1 long description: a bar chart (rate per 1,000 words, 0 to 30) comparing news broadcasts with conversation for each NCRS type. Approximate values (news / conversation): total NP 25 / 4; simple PP 1 / 0; simple VP 0 / 1; simple wh-structure 1 / 1; simple NP 2 / 3; NP + -ed 2 / 0; NP + -ing 8 / 0; NP + PP 7 / 0; NP + other 3 / 0; total complex 16 / 0.

In most cases, all three of these structural types occur in adjacency pairs with a preceding utterance, where the NCRS provides a response, asks for clarification, or signals active engagement (often by directly repeating part of the preceding utterance). Noun phrases occur most frequently with these functions, as in examples (22)–(27):

(22)

A: And her, her brother once played with uh.

B: Oh. Duke Ellington?

A: Yeah Duke Ellington.

(23)

A: Were they like throwing things at you? Like shoes and stuff?

(24)

A: The oysters were this big, I’ve never seen oysters that big.

B: I know you said that.

A: Six huge, huge oysters.

(25)

A: They didn’t even bring us forks.

B: Yeah no forks, no, no plates.

(26)

A: Also our friend is very interested because he’s the one that started the, the little uh, the saints.

B: The Young Saints?

(27)

A: It says, what?

B: Dinner at five-thirty, reception to follow

However, wh-questions with no verb (cf. (28)–(30)) and simple verb phrases (with omitted subject and/or omitted auxiliary verb; cf. (31)–(32)) are also found.

(28)

A: Of course, if she’s late, then

B: Who?

(29)

A: I think you’ll need a fork for this.

C: Mm.

A: Nancy, what about you?

(30)

A: Can we take out the middle part?

B: What middle part?

C: Of what?

A: The table.

(31)

A: We saw pictures of you doing that. -- Doing your act.

(32)

A: Well then I’ll go at four.

B: -- you going to take your car?

In contrast, Figure 8.1 shows that NCRSs in TVNBs (1) are much more frequent overall, but (2) are rarely simple structures. Simple verb phrase structures rarely occur in news broadcasts, but simple noun phrases and wh-structures do occur, with roughly the same frequencies as in conversation. However, because the discourse of TVNBs is usually not dialogic, these structures are usually not part of an adjacency pair. Rather, simple noun phrases and wh-structures are usually used as topicalisation devices, emphasising the topic of the following discourse. Such structures are especially common in the headline segments of a TVNB, as in (33).

(33)

And, a holiday jobs bonanza. Companies so desperate for workers, they’re offering job perks and higher pay. How you can cash in.

In example (33), the first NCRS is a simple noun phrase (a holiday jobs bonanza), introducing the topic. The last NCRS is a wh-clause – a dependent clause with no accompanying main clause. The middle NCRS, which is actually much more typical of TVNBs, is a highly complex noun phrase, followed by a postnominal comparative construction, which itself consists of multiple phrases and a dependent clause.

Figure 8.1 also shows that simple prepositional phrases occasionally occur in TVNBs. Like the simple NP and wh-NCRSs, these simple PPs serve topicalisation functions, as in (34).

(34)

First, to the Mueller interview itself.

The most dramatic pattern shown in Figure 8.1, however, is the much higher frequencies of complex NCRSs in TVNBs compared to conversation. In fact, complex NCRSs are much more frequent than simple NCRSs in this register. The overwhelming majority of these are noun phrases combined in a sequence with other phrases and dependent clauses. Three major types of combination, illustrated in (35)–(38), are especially prevalent: NP + PP (+PP), NP + -ing-clause, and NP + -ed-clause.

(35)

NP + PP (+PP)

A sharp fall on Wall Street.

<(Bell tolling)>

The Dow drops more than eleven hundred points as a sell-off continues.

Also, tonight, the security plans for an Olympics just fifty miles from North Korea. […]

And the leader of the pack of underdogs.

(36)

NP + -ing-clause

The lead personal attorney John Dowd resigning. This leaves a big vacancy. The other headline breaking at this hour, General H.R. McMaster now out, a new national security adviser coming.

(37)

NP + -ed-clause

In central Pennsylvania, this morning, a seven-year-old killed in a hit and run.

(38)

NP + -ed-clause

Our team on Syria tonight, the horror, new airstrikes. Families, young children seen screaming. More than 200 killed.

It actually turns out that it is difficult to find relatively non-complex examples like those above. Rather, the overwhelming majority of NCRSs in these TVNBs are extremely complex constructions, involving multiple phrases and dependent clauses. The result shown in Figure 8.1 for ‘Total complex’ (i.e., NCRSs that include more than three structures) shows that these long, complex structures are extremely common in news broadcasts, occurring over 15 times per 1,000 words. In contrast, such structures are virtually non-existent in conversation. We have already presented several examples of such structures in TVNBs; other examples include (39)–(41):

(39)

A new tropical threat closing in tonight with fears of mudslides and landslides and life threatening flash floods.

(40)

Plus, NBC News obtaining this photo showing Kavanaugh and his second accuser, amid a new effort to get information to the FBI and the third accuser tonight.

(41)

With a potentially dramatic and contentious showdown set for Monday, Kavanaugh spotted this morning en route to the White House, said to be eager to testify.

These examples illustrate a major functional characteristic of NCRSs in TVNBs: that they use phrases and non-finite clauses to compress a lot of information into relatively few words. Phrasal modifiers with similar functions are common in the prose of academic research writing and other informational written registers (see Biber & Gray 2016; Biber et al. 2022), but they are rare in most spoken registers. Thus, the extremely frequent use of such structures in TVNBs is especially noteworthy. We return to a fuller discussion of this pattern in the conclusion.

8.4.3 Use of NCRSs across the Segments of TV News Broadcasts

As discussed in Section 8.2, the segments of TVNBs have quite different situational characteristics. We therefore expected NCRSs to be used differently in the four discourse types.

Figure 8.2 shows that this expectation is realised, with NCRSs being much more frequent in the headline segment than in any other segment of TVNBs. The overwhelming majority of those structures are headed by a noun phrase.

Figure 8.2 NCRS types across segments of TV news broadcasts (rate per 1,000 words)

Figure 8.2 long description: a bar chart (rate per 1,000 words, 0 to 60) showing total NP, PP, wh-structure, -ing-clause, and other NCRSs for each of the four segments (headlines, lead story, on-scene, in-studio). Total NP NCRSs have by far the highest rates; rates decline across the PP, wh-structure, and -ing-clause categories and rise slightly again for other types (around 4).

Figure 8.3 (which contrasts headlines with all other segments combined) provides further details about the NP NCRS constructions, showing that most of these NP-headed structures are complex rather than simple noun phrases. A closer look at these results shows that an extremely high proportion of all discourse in the TVNB headline segment consists of complex NCRSs. Figure 8.3 presents rates of occurrence per 1,000 words of text, but a typical complex NCRS is longer than 20 words. As a result, the finding that there are approximately 34 complex NCRSs per 1,000 words in headlines means that almost 70% of the words in headlines belong to complex NCRSs (i.e., 34 NCRSs × 20 words each = 680 of every 1,000 words). The extended example in (42) illustrates the dense concentration of complex NCRSs in a headline segment.
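The estimate above is simple arithmetic: a rate per 1,000 words multiplied by the average length of one occurrence gives the number of running words that the structure occupies per 1,000 words. The sketch below is a hypothetical helper, not the authors' code; because a typical complex NCRS is longer than 20 words, plugging in 20 words makes the resulting 68% a conservative estimate.

```python
def word_coverage(rate_per_1000, mean_length_words):
    """Approximate share of running words taken up by a structure.

    rate_per_1000: occurrences per 1,000 words of text
    mean_length_words: assumed average length of one occurrence, in words
    """
    share = rate_per_1000 * mean_length_words / 1000
    return min(share, 1.0)  # a share of running words cannot exceed 100%


if __name__ == "__main__":
    # Figures from the text: about 34 complex NCRSs per 1,000 words in
    # headlines, each assumed to be about 20 words long.
    print(f"{word_coverage(34, 20):.0%}")  # 68%
```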

(42)

Opening segment with the ‘headlines’

<ANCHOR> Tonight, the fiery hearing on Capitol Hill.

<GRAPHICS: BREAKING> The FBI agent in the hot seat after that agent sent text messages about then-candidate Donald Trump, what he wrote. And, tonight, he’s now firing back at allegations of personal bias within the FBI as they investigated Russia, the Trump campaign and Hillary Clinton’s emails. At times, today, the hearing turning ugly and personal.

<VIDEO with CONGRESSMAN talking> Mr. Chairman, this is outrageous.

<GRAPHICS: TRUMP DECLARES> Also tonight, President Trump declaring victory at NATO, saying other world leaders had just agreed to pay more. But what really happened? The French president, Emmanuel Macron, then saying there was no such agreement.

<GRAPHICS: DEADLY RIP> Deadly rip currents up and down the east coast tonight. The body of a swimmer pulled from the water. And, tonight, the teenager surviving nearly 10 hours after being pulled out to sea.

<GRAPHICS: DEADLY BOULDER> The tragedy on an American roadway, the driver under arrest. The 800-pound boulder falling out of the back of his truck, killing a mother and daughter.

<GRAPHICS: PAPA JOHN’S> The fallout tonight involving Papa John’s and its founder after using a racial slur during a conference call. What’s happened now?

<GRAPHICS: TOY STORE> The American toy store and its promotion growing out of control. After telling parents they could pay their child’s age, they could not handle the lines forming across America.

<GRAPHICS: MALL> And the shopping mall collapsing tonight. Worried shoppers had just been evacuated.

In some ways, these structures are reminiscent of the titles of newspaper articles. For example, consider the titles of articles from the New York Times given in (43)–(47), which all illustrate NCRSs constructed with noun phrases, prepositional phrases, or wh-clauses.

(43)

The Man Who May Challenge Putin for Power

(44)

The Revolutionary Power of a Skein of Yarn

(45)

On Trump’s Social Network: Ads for Miracle Cures, Scams and Fake Merchandise

(46)

Why the New Obesity Guidelines for Kids Terrify Me

(47)

How Barr’s Quest to Find Flaws in the Russia Inquiry Unraveled

Such examples show that the NCRSs in TVNBs versus newspapers are similar in their basic grammatical building blocks. However, NCRSs in TVNBs are noteworthy in three other respects: (1) the extreme complexity of individual NCRSs; (2) the extreme density of NCRSs over an extended stretch of discourse; and (3) the fact that these NCRSs are intended to be comprehended in real time by listeners. Newspaper titles are usually between 5 and 10 words long with a few embedded structures, in contrast to the NCRSs in TVNBs, which are frequently longer than 20 words with numerous embedded structures. The two are further different in that the newspaper title is immediately followed by the actual prose story, using standard canonical structures. The title functions to announce the general topic. In contrast, the opening segment of a TVNB consists of a barrage of long, complex NCRSs in sequence. These function as mini-synopses of the content of news stories, rather than as simple announcements of topics. The fact that listeners are able to comprehend discourse of this type in real time requires explanation; we return to that issue in the conclusion.

Figure 8.3 NP NCRS types in headlines versus other segments (rate per 1,000 words)

Figure 8.3 long description: a bar chart (rate per 1,000 words, 0 to 40) comparing headlines with all other segments. Approximate values (headlines / other segments): simple NP 5 / 2; NP + -ed 5 / 2; NP + -ing 15 / 6; NP + PP 18 / 4; NP + other 4 / 3; total complex 34 / 11.

The extremely dense use of complex NCRSs in the ‘headlines’ opening segment might overshadow their use in other segments of TVNBs. However, Figures 8.2 and 8.3 show that NCRSs, including complex structures, are also quite common in the other segments of news broadcasts. This pattern shows that NCRSs are not restricted to ‘headline’ functions. Rather, they are frequently integrated into the normal discourse of regular news stories, as in the story in (48) about then-President Trump’s recent actions. Notice in particular the contrast between discourse produced in real time (by the people talking in the recorded videos) and the discourse produced by the news anchor: all utterances in this story produced by the news anchor incorporate complex NCRSs, probably reflecting the fact that this language was pre-scripted. In contrast, the recorded discourse produced by Trump and Sanders is probably not pre-scripted, and it shows no instances of NCRSs.

(48)

In-studio news story:

<ANCHOR> Tonight, President Trump vowing to take action after the Florida school massacre, which White House sources say impacted him personally.

<VIDEO with TRUMP> We must do more to protect our children. We have to do more to protect our children.

<ANCHOR> The President asking for regulations banning bump stocks, the devices allowing some guns to shoot hundreds of rounds per minute. The President had already called for a review of bump stocks after they were used by the Las Vegas shooter.

<VIDEO with TRUMP> I expect that these critical regulations will be finalized, Jeff, very soon.

<ANCHOR> Under fire from victims to do more--[incomplete]

<VIDEO with CROWD > Shame on you. Shame on you.

<ANCHOR> The White House now considering legislation that would strengthen background checks, but neither of those efforts would have stopped the Florida shooter.

<VIDEO with SANDERS> The President is trying to do everything that he can.

<ANCHOR> White House Press Secretary Sarah Sanders also facing questions tonight about the President’s tweet blaming the FBI for missing warning signs about the shooter because, quote, “They are spending too much time trying to prove Russian collusion with the Trump campaign. There is no collusion.”

<VIDEO with SANDERS> We would like our FBI agencies to -- to not be focused on something that is clearly a hoax in terms of investigating the Trump campaign and its involvement.

<ANCHOR> That controversial tweet one of eighteen in five days about Russian meddling following the indictments of thirteen Russian officials by Special Counsel Robert Mueller, including blaming his predecessor.

Examples like (48) show that complex NCRSs can actually be the normal strategy for constructing certain types of news stories, rather than a specialised device restricted to the opening segment that introduces the stories included in the news broadcast.

8.4.4 Use of NCRSs across TV Networks

The style of discourse illustrated in (42) and (48) above is a recent historical innovation, in addition to being peculiar in comparison to other spoken registers. Even today, some news stories in TVNBs make minimal use of NCRSs, as in (49).

(49)

<ANCHOR> We begin tonight with a deadly collision on an interstate highway in New Mexico. A tractor-trailer truck hit a Greyhound bus head-on. Hospitals report at least four people were killed and at least thirty-five injured, at least three critically. The bus was carrying forty-seven people from Albuquerque to Phoenix. Mireya Villarreal has late details of this developing story.

<REPORTER with VIDEO> A horrific scene sprawled out over New Mexico’s Interstate 40 near the Arizona state line. This Greyhound bus’s front end ripped off, debris scattered across the highway as emergency crews desperately tried to reach victims.

<WOMAN talking > Oh, God.

< REPORTER with VIDEO> New Mexico state police confirm multiple deaths. The bus was carrying forty-seven passengers and heading to Phoenix, Arizona, from St. Louis. Many of the seriously injured were taken to area hospitals. One trauma center says it has received six patients, three in critical condition. Investigators believe a tractor-trailer crossed the median and hit the bus head-on. The tractor-trailer lost most of its haul, and another vehicle was left a mangled mess.

<REPORTER> Witnesses say it took a while to get some of the passengers out of the bus. Bystanders were actually wrapping children in blankets as they sat along the highway. And at least two hospitals sent their choppers in to pick up the most critical patients. John, clearly these victims are the top priority, but troopers will quickly move to the investigation to try and figure out exactly how this happened.

Our informal observations of news broadcasts suggest that their differential reliance on NCRSs is strongly associated with the preferred discourse styles of different networks (e.g., commercial versus public; all-news networks versus general networks). The present study focuses only on news broadcasts offered by the three major commercial networks in the United States. However, Figure 8.4 shows that there are major differences in preferred style even within that restricted sample. ABC relies much more heavily on complex NCRSs than the other networks, while CBS uses these structures least often. Excerpts (42) and (48) above illustrate the style of discourse typical of ABC broadcasts, while (49) illustrates the more traditional and conservative style found in CBS broadcasts. NBC is intermediate between the two: (48) above is in fact a story from an NBC broadcast, exhibiting an extremely frequent use of complex NCRSs, but many other NBC stories are much more similar in style to (49).

Figure 8.4 NCRS types across networks (rate per 1,000 words)

Figure 8.4 long description: a bar chart (rate per 1,000 words, 0 to 45) comparing ABC, NBC, and CBS for total NP, wh-structure, -ing-clause, other, simple NP, NP + -ed, NP + -ing, NP + PP, NP + other, total complex, and deictic adverbs. ABC shows the highest rate in every category, peaking for total NP; NBC and CBS show lower rates throughout.

It is not merely a coincidence that these linguistic differences correspond to the advertised emphases of the news networks, with ABC emphasising ‘soft news’ and humanising the news, versus CBS emphasising coverage of ‘hard news’. Previous descriptions of these differences have focused on differences in the topics of news stories. For example, hard news stories tend to cover major national and international events and issues, while a soft news story might be about an alligator wandering into a swimming pool in Florida. However, the results here suggest a much more basic and pervasive difference between the emphases, with stories on essentially the same topic being presented with fundamentally different linguistic styles. Apparently, the extreme reliance on complex NCRSs in ABC broadcasts is intended to convey high human interest, perhaps by implying a sense of urgency that precludes production of complete canonical structures. In contrast, the reliance on canonical structures in CBS broadcasts conveys a no-nonsense reporting of ‘just-the-facts’, associated with the emphasis on hard news. Future research is required to explore the differing motivations and effects of these contrasting discourse styles. However, the results here clearly show not only that the dense use of complex NCRSs is unique to TVNBs, but also that it is further restricted to particular kinds of broadcasts, apparently those aiming to achieve specific audience effects.

8.5 Conclusion

NCRSs in TVNBs are interesting because they are ‘non-canonical’ in both of the two major senses introduced in Chapter 1 of this volume: they fail to conform to the minimally grammatical clause structures recognised by grammatical theory, and they represent types of grammatical structures that are extremely rare in most other registers. NCRSs are especially interesting because they are rare in face-to-face conversation, making it unlikely that they have a functional motivation relating to constraints of the production circumstances. Rather, it turns out that these are often extremely complex constructions, both structurally and syntactically, which could not normally be produced in spontaneous speech. Thus, their functional motivation seems related to creating a perception of urgency and excitement, coupled with compressing a lot of information into a single utterance.

NCRSs are further interesting because they are arguably becoming the ‘canonical’ form in certain types of TVNBs. In particular, news broadcasts from ABC make extensive use of NCRSs, in all segments of the broadcast, to the extent that reliance on these structures has become the unmarked style of discourse. Thus, from the perspective of language use, NCRSs could be portrayed as the canonical form of utterances in ABC news broadcasts.

The findings here are also interesting because they complement previous studies of grammatical complexity carried out from a Register-Functional perspective (see especially Biber et al. 2022). In particular, the use of NCRSs in TVNBs is noteworthy because:

(1) These non-canonical structures are more grammatically complex than those found in most other registers, and they occur with much higher frequencies.

(2) These non-canonical structures result in a type of text complexity that has not been found in other spoken registers – that is, a discourse style relying primarily on phrasal rather than clausal structures.

In the preceding sections, we have focused on the first of these two considerations. The corpus findings presented above document both the grammatical characteristics of NCRS types that had not been previously described and the characteristics of a register and discourse style that had not been previously noticed.

However, the theoretical implications of the second consideration are equally important. The findings here appear to contradict one of the major generalisations of previous research on text complexity carried out from a register-functional perspective: that all spoken registers, regardless of their communicative purpose, rely on clauses to construct text (including an extensive use of dependent clauses), in contrast to certain written registers, which rely on phrasal complexity (see the detailed descriptions of complexity variation in Biber 1992, Biber & Gray 2016, Biber et al. 2021, Biber et al. 2022, and Biber et al. 2023, 2024). This generalisation dates back to the Biber (1988) multi-dimensional study of spoken/written register variation, which concluded that the patterns of register variation within the spoken mode are fundamentally different from those within the written mode:

There is a difference between speech and writing in the range of forms that are produced in each mode. That is, there seems to be a cognitive ceiling on the frequency of certain syntactic constructions in speech, so that there is a difference in the potential forms of the two modes. … [This difference …] seems to be related primarily to the processing constraints of speech – to the fact that even the most carefully planned and informational spoken [registers] are produced and comprehended in real-time, setting a cognitive ceiling for the syntactic and lexical complexity ….

(Biber 1988: 163; emphasis in the original)

In short, spoken discourse is produced in real time, while written discourse is produced in situations that allow for extensive planning and revision. Writers have the possibility of taking as much time as they want to plan exactly what they want to write, and if they write something unintended, they can delete, add, revise, or edit the language of the text.¹ As a result, other factors – especially communicative purpose – can have a major influence on the linguistic characteristics of written texts. In contrast, all spontaneous spoken registers are produced in real time, which constrains the extent to which the speaker can vary linguistic characteristics, regardless of the communicative purpose.

This difference in production circumstances is especially relevant for the kinds of complexity features used in speech versus writing. Popular written registers (like fiction or blogs) employ many clauses, including finite dependent clauses. But informational written registers (like academic research articles) use relatively few verbs, and instead employ many phrases (noun phrases, prepositional phrases, adjective phrases), functioning mostly as modifiers of other noun phrases. In contrast, all spoken registers – regardless of their informational or interpersonal focus – have been found to rely on a clausal style of discourse (see Biber et al. 2022). Thus, the extreme reliance on phrasal text complexity is found only in written informational registers, apparently enabled by the extended production opportunities provided by the written mode.

One previously noted exception to this general pattern is the complexity profile of radio broadcasts documented in Biber (1992), which was intermediate between other spoken registers and informational written registers in its use of phrasal complexity features (see Biber 1992: 155, figure 3). The findings presented in the preceding sections of the present study indicate that TVNBs are an even more dramatic exception to this general pattern: they represent a spoken register with an extremely frequent use of phrases and non-finite clauses, contrasted with a comparatively rare use of main verbs and finite clauses.

While we do not have space for a full discussion of this finding here, three major considerations should be noted. First, the phrasal NCRSs in TVNBs are for the most part produced in writing. That is, they are originally scripted in writing, even though they are delivered and comprehended in the spoken mode. Thus, they do not represent an exception to the general claim that the dense use of phrasal complexity features is not normally feasible in spoken production. In this regard, it is interesting to compare the previously scripted discourse spoken by the anchor in (48), which makes dense use of phrasal NCRSs, with the recorded spoken utterances produced spontaneously by others, which never employ NCRSs.

This finding indicates that there is a major difference between the production constraints and the comprehension constraints of the spoken mode. That is, the characteristics of TVNBs documented here suggest that the dense use of phrasal NCRSs poses few problems for the comprehension of discourse in the spoken mode, even though the findings are consistent with the claim that such structures are difficult to produce in the spoken mode.

Second, there are further differences between the phrasal complexity features in informational written registers and the phrasal NCRSs common in TVNBs. The first difference is that informational written registers rarely use reduced non-canonical structures. Rather, written discourse relies on complete ‘canonical’ main clauses. Phrasal complexity features are common because information is packaged primarily in noun phrases that are modified by other phrases, but the discourse overall consists of complete main clauses with finite verbs.

This difference relates to the differing syntactic functions of phrases and non-finite clauses in TVNBs versus informational writing. Phrases and non-finite clauses usually function as modifiers of other phrases in informational written registers. In contrast, the NCRS phrases and non-finite clauses illustrated in the TVNB excerpts above are more readily interpreted as clause constituents rather than phrasal modifiers. We have deliberately avoided such syntactic interpretations in our analysis, choosing instead to limit ourselves to the structures found on the surface. However, consideration of the likely syntactic functions of constituents in these examples indicates that they are fundamentally different from the typical syntactic functions of phrases in informational written texts.

Finally, it is important to remember that the spoken discourse of TVNBs is almost always supported by videos and other visual images. While we do not have direct evidence of the importance of this characteristic for comprehension, it represents a major difference from most other spoken registers. Thus, it is likely that this characteristic is a major factor enabling the ready comprehension of NCRSs in TVNBs. In future research, we plan to explore these considerations in much more detail, and also undertake historical research on the emergence of complex NCRSs and the evolution of the register of TVNBs.

Chapter 9 What was it about it that you loved? Clefts in Evaluative Language

9.1 Introduction

The term ‘cleft’ is commonly used to subsume several disparate constructions of the English language. What all of these have in common is that they can typically be related to a more basic non-cleft clause which they ‘cleave’ into two parts, of which one is backgrounded in a subordinate clause and one foregrounded in the main clause. Thus, the non-cleft sentence Sigrid loves linguistics can be cleft into Sigrid loves and linguistics. These parts occur in different syntactic functions and orders in the different types of cleft constructions, summarised in Table 9.1.

Table 9.1 Summary of cleft types

(1) What Sigrid loves is linguistics.
Basic wh-cleft (WHCL). S: nominal relative clause; V: specifying BE; CS: highlighted element.

(2) Linguistics is what Sigrid loves.
Reversed wh-cleft (RWHCL). S: highlighted element; V: specifying BE; CS: nominal relative clause.

(3) The thing that Sigrid loves is linguistics.
Paraphrased basic wh-cleft (WHCL). S: general noun + adnominal relative clause; V: specifying BE; CS: highlighted element.

(4) Linguistics is the thing that Sigrid loves.
Paraphrased reversed wh-cleft (RWHCL). S: highlighted element; V: specifying BE; CS: general noun + adnominal relative clause.

(5) It’s linguistics that Sigrid loves.
It-cleft (ITCL). S: expletive it; V: specifying BE; CS: highlighted element + relative-like cleft clause.

In (basic) wh-clefts (WHCLs) like (1), the backgrounded part becomes part of a nominal relative clause in subject (S) position, while the highlighted element functions as a subject complement (CS) after specifying be. Reversed wh-clefts (RWHCLs) like (2) invert this order of subordinate clause and highlighted element. WHCLs and RWHCLs are also understood to comprise so-called paraphrased variants, in which the relativiser of the nominal relative clause is replaced by a general noun and an adnominal relative clause, as in (3) and (4). In an it-cleft (ITCL) like (5), finally, the highlighted constituent occurs as the CS of the main clause, introduced by expletive it (cf. Hedberg 2000: 891; Huddleston & Pullum 2002: 67) and specifying BE. The rest becomes part of ‘a relative-like’ ‘cleft clause’ ‘introduced by that, who/which, or zero’ (Biber et al. 2021: 950).Footnote 1
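The surface contrasts between the three types can be made concrete with a rough pattern-matching heuristic. The Python sketch below is purely illustrative and is not the procedure used in this study, whose clefts were identified by primarily manual analysis. The regular expressions, the function name guess_cleft_type, and the decision to test for ITCLs first are all assumptions; paraphrased variants are mapped to their parent types, as described above.

```python
import re
from typing import Optional

# Hypothetical surface patterns (an illustration, not the study's method).
# ITCL: clause-initial expletive "it" + BE, later followed by that/who/which.
_ITCL = re.compile(r"^it(?:['’]s|\s+is|\s+was)\b.*\b(?:that|who|which)\b", re.IGNORECASE)
# WHCL (basic or paraphrased): clause-initial "what" / "the thing that|which", later BE.
_WHCL = re.compile(r"^(?:what|the\s+thing\s+(?:that|which))\b.*\b(?:is|was)\b", re.IGNORECASE)
# RWHCL (basic or paraphrased): BE immediately followed by a wh-word or "the thing that|which".
_RWHCL = re.compile(
    r"(?:['’]s|\bis|\bwas)\s+(?:what|why|where|when|how|the\s+thing\s+(?:that|which))\b",
    re.IGNORECASE,
)


def guess_cleft_type(sentence: str) -> Optional[str]:
    """Return 'ITCL', 'WHCL', 'RWHCL', or None based on surface patterns only."""
    s = sentence.strip()
    if _ITCL.search(s):
        return "ITCL"
    if _WHCL.search(s):
        return "WHCL"
    if _RWHCL.search(s):
        return "RWHCL"
    return None


if __name__ == "__main__":
    for example in ["What Sigrid loves is linguistics.",
                    "Linguistics is the thing that Sigrid loves.",
                    "It’s linguistics that Sigrid loves."]:
        print(example, guess_cleft_type(example))
```

Such patterns over-generate: an extraposed clause like It’s clear that Sigrid loves linguistics would be labelled an ITCL, and a wh-interrogative like What was it about it that you loved? a WHCL, which is one reason a manual analysis remains necessary.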

English clefts only became the focus of linguistic attention at the beginning of the twentieth century. While much linguistic work on clefts focuses on their syntactic or formal analysis, an early and influential functional description of WHCLs and RWHCLs was provided by Halliday in 1967 in the larger context of his discussions of thematic clause structure in English. Halliday described the two types of clefts as identifying clauses, in which a ‘thing to be identified’, realised as a nominal relative clause, is equated with an ‘identifier’. ‘What is significant is that, whichever of the two occurs in first position, the whole of that element is thematic’ (Halliday 1967: 224, 226). While Halliday’s examinations focused primarily on thematic structure, Prince’s groundbreaking study from 1978 concentrated on information-structural characteristics. It was the first to study clefts in naturally occurring discourse. Although Prince’s corpus, with 37 WHCLs and 186 ITCLs (Prince 1978: 886), was small by twenty-first-century standards, Prince provided descriptions of the information structure and discourse conditions of both constructions and established a taxonomy of ITCLs, all of which are still valid today (cf. Section 9.4). Collins (1991) was the first large-scale corpus investigation of cleft constructions, covering both spoken and written language. In the London-Lund Corpus and the Lancaster-Oslo/Bergen Corpus, he identified 1,785 tokens and analysed them with regard to formal characteristics, information structure, thematic structure, communicative meanings, and register variation.
Prince (1978) and Collins (1991), the primary influences upon this study, and subsequent smaller-scale studies (e.g., Weinert & Miller 1996; Calude 2007; Gast & Levshina 2014) showed that various (interconnected) factors such as the formality of a communicative situation, mode, and register, but also informativity, ‘topicality, presupposition and weight’ (Collins 2006: 1706), influence the use of clefts. For example, ITCLs are particularly frequent in academic prose, while RWHCLs highlighting a demonstrative pronoun (e.g., That’s what she said.) are very common in conversation (Collins 1991: 181–2; Biber et al. 2021: 952).

What has not been tackled so far, however, is the question of how communicative purpose influences the use of cleft constructions. Therefore, this contribution aims to answer the questions of whether the primary textual purpose of evaluating is a factor conditioning the use of clefts and, if so, how the individual types of clefts contribute to the linguistic expression of evaluation. To this end, it applies both a quantitative and a qualitative approach. It studies the expression of evaluation in (primarily) evaluative texts, in comparison to primarily non-evaluative texts, as well as the interplay between evaluation and the above-mentioned factors of informativity, topicality, presupposition, and weight in cleft constructions. This contribution is based on two hypotheses: that the textual communicative purpose of evaluating is indeed a factor conditioning the use of cleft constructions (H1), and that the linguistic expression of evaluation, as well as certain subtypes of evaluation occurring in specific positions within the clause, are more characteristic of some types of clefts than of others (H2).

After a discussion in Section 9.2 of how the interpretations of non-canonicity predominant in linguistics may be applied to the different types of cleft constructions, Section 9.3 will introduce the concept of evaluation. Section 9.4 will then provide an overview of the major characteristics of the different types of clefts, before Section 9.5 introduces the data on which this study is based and the relevant categories of analysis. Then, the results of the quantitative and qualitative analyses will be presented (Section 9.6), and Section 9.7 will provide a discussion and conclusion.

9.2 Clefts and Non-Canonicity

The aim of this section is to discuss how, first, theory-based and then frequency-based approaches to syntactic non-canonicity apply to English clefts. Huddleston and Pullum (2002: 46) explicitly mention ITCLs as an example of a non-canonical construction, related to a ‘syntactically more basic or elementary’ canonical clause. As mentioned in the Introduction to this volume, this description implies a definition of canonical structures as ‘minimally complete’. Consequently, since all types of clefts involve lexical material absent from the corresponding non-cleft (cf. Table 9.1), all of them are, in fact, non-canonical according to this theory-based approach to syntactic non-canonicity. But while WHCLs and RWHCLs can be accounted for by ‘canonical’ syntactic concepts (cf. Table 9.1), the syntactic analysis of ITCLs, especially of the expletive it and the cleft clause, remains controversial. Hence, the latter may be regarded as more non-canonical than the former, making non-canonicity a gradable concept. Furthermore, cleft constructions are now firmly established as a means of information packaging (cf. e.g., Huddleston & Pullum 2002: 67; Ward et al. 2002: 1424–5). As Section 9.4 will show in more detail, each type of cleft construction is characterised by a typical distribution of (relatively) given and new information within the clause. Information-structural approaches, which consider SVX the canonical word order and are thus ultimately both theory- and frequency-based, consequently regard all types of clefts as non-canonical. Finally, with overall frequencies (across registers and modes) of approximately 17 WHCLs, 37 RWHCLs, and 40 ITCLs per 100,000 words (calculation based on Biber et al. 2021: 952),Footnote 2 cleft constructions are clearly rare in comparison to other constructions like relative clauses (more than 1,000 per 100,000 words; cf. Biber et al. 2021: 597). Purely frequency-based approaches would thus clearly regard all types of clefts as non-canonical, WHCLs even more so than RWHCLs and ITCLs. As mentioned in Section 9.1, however, previous studies on clefts showed that factors like formality, mode, and/or register influence the frequency of clefts. ITCLs, for example, occur frequently in academic discourse, while RWHCLs, especially those which highlight a demonstrative, are popular in unplanned spoken discourse (cf. Collins 1991: 181–2; Biber et al. 2021: 952). Consequently, when the scope of these frequency-based approaches is narrowed down to specific styles, modes, or registers, individual types of clefts may well turn out to be less non-canonical, while probably still non-canonical in comparison to other syntactic constructions.

In summary, all types of clefts are clearly non-canonical – both in consideration of what dominant theoretical approaches define as the basic or elementary (i.e., canonical) clause of the English language and in consideration of overall frequencies. But the above discussion also showed that what is so clearly non-canonical may turn out to be less non-canonical if we narrow our perspective to modes, styles, or specific registers. The present study aims to find out whether clefts may be called canonical when the scope is confined to texts with a specific communicative purpose, namely evaluating.

9.3 Evaluation

Although there is a long research tradition of exploring concepts like modality (e.g., Palmer 1986), evidentiality (e.g., Chafe 1986), and hedging (e.g., Hyland 1998), research on assessments, attitudes, and feelings truly gained momentum only in the 1990s. Common umbrella terms now include, amongst others, ‘appraisal’ (Martin & White 2005), ‘affect’ (Ochs & Schieffelin 1989), ‘stance’ (e.g., Biber et al. 2021), and ‘evaluation’ (Thompson & Hunston 2000). Research on evaluation faces the difficulty that evaluation is often ‘context-dependent’ and only ‘implied’ (Hunston 2011: 10, 13). Consequently, it has mostly been perceived as ‘not amenable to large-scale corpus investigations’ (Biber & Zhang 2018: 119) and has primarily been ‘explored in lexical terms’ (Hunston & Sinclair 2000: 74), with studies often focusing on specific word classes or even single lexical items. In various publications, however, Biber and others show that there are ‘stance constructions’, that is, ‘lexico-grammatical devices’ (Biber & Zhang 2018: 104) which feature an overtly evaluative lexeme framing or controlling a grammatical constituent with a proposition (e.g., to like that …, happy that …).Footnote 3 These stance constructions are explicitly evaluative and can easily be identified in large corpora. Furthermore, Hunston and Sinclair (2000: 89) also claim that it is possible to identify syntactic constructions or ‘patterns’ typical of evaluative language, mentioning, amongst others, WHCLs. This suggests that cleft constructions might indeed be an important syntactic means for the expression of evaluation.

For the present purpose, ‘evaluation’ is understood to be the verbal expression of attitude towards or feelings about entities, actions, or propositions (cf. Thompson & Hunston 2000: 5). Following Biber and other linguists (e.g., Biber & Finegan 1989: 94; Biber & Zhang 2018: 104), ‘stance’ is defined as the explicit expression of evaluation in lexico-grammatical constructions. Contrary to Biber and Zhang (2018), however, stance constructions are regarded as a subcategory of evaluation, because these lexico-grammatical devices require an evaluative lexeme.

9.4 Characteristics of Cleft Constructions

Besides the syntactic characteristics mentioned above, the different types of clefts share semantic characteristics: the subordinate clause of all clefts carries a presupposition (cf. Keenan 1971: 45) in the form of an open proposition containing a variable (i.e., ‘Sigrid loves x’), for which the highlighted element specifies a value (i.e., x = ‘linguistics’). All clefts are thus identifying constructions, ‘expressing a relationship between an element that is to be identified (the ‘identified’) and an element that identifies it (the ‘identifier’)’ (Collins 1991: 67; cf. Halliday 1967: 223–4), but they mention these in different orders. It might seem contradictory to state that certain types of clefts (see below) contain brand-new information in their subordinate clauses. However, informativity and presupposition represent two distinct concepts (cf. Collins 2006: 1710). In fact, placing brand-new information in a subordinate clause which contains a presupposition triggers a ‘known fact’ effect (Prince 1978: 904), that is, it marks this information as ‘not-at-issue’, ‘non-negotiable’, ‘non-controversial’ (Collins 1991: 119), or known to the recipient, even though this may, strictly speaking, not be the case (cf. Prince 1978: 903). Furthermore, all clefts trigger an ‘exhaustiveness implicature’ (Huddleston & Pullum 2002: 1416): the assumption that the highlighted element constitutes ‘an exhaustive listing of the entities which satisfy the identified clause’ (Collins 1991: 71).

While all three cleft constructions share these semantic characteristics and have the same propositional content as the corresponding non-cleft, they are by no means interchangeable. First, there are restrictions as to which relativisers can occur and which formal and functional categories may be highlighted.Footnote 4 Furthermore, as is to be expected in view of their syntactic structure, the information principle, and the principles of end-focus and end-weight (Biber et al. 2021: 888–90), the three cleft constructions differ as to the topicality and typical (relative) information status of their constituents. In WHCLs, the nominal relative clause is thematic, that is, ‘what the sentence is primarily about’ (Collins 2006: 1707). Only given or inferable information, ‘which the coöperative speaker may assume is appropriately in the hearer’s consciousness’ (Prince 1978: 903), is acceptable, while brand-new information is (usually) out of place. The subordinate clause is thus ‘very low in communicative dynamism’ (Collins 2006: 1714). The rhematic highlighted element, by contrast, typically conveys new information (Collins 1991: 117), or at least information which ‘must not be assumed to be more prominent in the hearer’s consciousness than the information in the wh-clause’ (Prince 1978: 892; emphasis in the original). This mapping of a constituent of low information status ‘on to the semantic function of “identified”, the textual function of theme, and the logico-semantic function of presupposition’ gives WHCLs ‘a characteristically “interpersonal” flavour’ (Collins 1991: 133). This is illustrated in (6), where the evaluation as fascinating is inferable, while the CS is brand-new.

(6)

What I found fascinating is the complexities of mental illness.

(EVLA-23-GR)Footnote 5

In typical RWHCLs, by contrast, the thematic highlighted element contains given information. The rhematic nominal relative clause ‘is more likely to contain new information than that’ in WHCLs (Collins 1991: 145), but even where new, it typically makes only a small contribution to the development of the discourse, as in generalisations or clichés (e.g., That’s the way it goes; cf. Collins 1991: 146). This limited communicative dynamism of their clausal CS and the overall shortness of RWHCLs suggest that their primary function is that of text-structuring or ‘internal referencing’ (Collins 1991: 146). In (7), the demonstrative that is an extended anaphoric reference (i.e., textually evoked), while the nominal relative clause contains inferable information.

(7)

So that’s why I love them so much.

(CC-522)

For ITCLs, Prince’s (1978) study established the existence of two basic types. In the most frequent type, renamed ‘old-presupposition it-clefts’ by Collins (2006), the thematic foregrounded CS conveys new (or given, but contrastive; cf. Collins 1991: 168) information in combination with inferable or evoked information in the cleft clause. In (8), the positive evaluation in the cleft clause is textually evoked in the preceding sentence, but the foregrounded element lists brand-new arguments for this evaluation.

(8)

it was the small things like homemade scones, nice teas, and coffee, ham, cheese and milk in the fridge […] that made this apartment special.

(EVLA-151-ABB)

‘New-presupposition it-clefts’, by contrast, contain brand-new (anchored) information in the cleft clause, which they mark as non-negotiable or non-controversial. More precisely, ‘not only is the hearer not expected to be thinking about the information in the that-clause, but s/he is not expected even to know it’ (Prince 1978: 898). Collins distinguishes two subtypes of new-presupposition ITCLs. In the first, the highlighted element is also brand-new, albeit of low communicative dynamism, since it has ‘a “circumstantial” or “scene setting” role’ (Collins 2006: 1710). This is the case in (9), from the beginning of a restaurant review, where neither the temporal setting nor the proposition expressed in the cleft clause has been textually evoked. The second subtype of new-presupposition ITCLs highlights an element representing given or inferable information, as in (10), where the personal pronoun is clearly an anaphoric reference.

(9)

It was little more than a year after opening in 2010 that TV chef Tom Kitchin’s second Edinburgh restaurant was awarded a Michelin star.

(EVLA-117-LP)
(10)

he it wǎs ⌴ who built Saint Paul’s chùrch ⌴ in Stoke róad himsélf ⌴ at his own expénse

(cf. Collins 2006: 1710)

Finally, Collins (1991: 84), following Halliday (1967: 236–7), outlines one last difference between the types of clefts: the theme in ITCLs is given ‘textual prominence’ by virtue of being predicated, while the theme in WHCLs is given cognitive or ‘ideational prominence’ as one of the two members of an equative relationship.

9.5 Data and Method

This study of clefts in evaluative language is based on the Corpus of Evaluative Language, henceforth the ‘EVLA-Corpus’, a corpus of approx. 310,000 words of English texts from six different primarily evaluative registers, represented in roughly equal shares (cf. Table 9.2). To cover as wide a range of linguistic uses as possible within the scope of this study, the subcorpora represent (1) written language published in print, (2) written language published online, and (3) transcripts of spoken language recorded in videos and published online. While subcorpora (1) and (2) share the mode of transmission, subcorpora (2) and (3) share the medium of publication and a lower degree of formality, related to a lower degree of planning and editing. Each subcorpus combines texts of two different registers evaluating (a) books and (b) food and accommodation, respectively. These were chosen because they represent entities commonly and publicly evaluated or reviewed in everyday life. This enabled the compilation of a representative corpus and ensured that the analysed linguistic features are entrenched as part of established registers. Subcorpus (1) contains 20 academic book reviews (from ten journals representing five humanities disciplines) as well as 917 reviews of eating and drinking places and accommodations from three Lonely Planet guidebooks. Subcorpus (2) contains 840 non-academic book reviews (of 28 books) from the social cataloguing website Goodreads.com and 450 reviews of eating and drinking places and accommodations from each of Tripadvisor and Airbnb. All three online platforms permit users to publish reviews and thus offer reading, travel, or accommodation guidance, respectively. The spoken subcorpus was sampled from the video sharing platform YouTube, more precisely from its sub-registers BookTube and Mukbang.
The 16 BookTube videos (by four different booktubers) selected for the present purpose were all book reviews and did not include other types of BookTube videos like bookshelf tours or book haul videos.Footnote 6 Mukbang, finally, originated in South Korea in the late 2000s. In its asynchronous Westernised form, Mukbang consists of audiovisual recordings shared on platforms like YouTube in which a so-called mukbanger consumes and comments on a meal.Footnote 7 The 20 mukbang videos (by five different mukbangers) transcribed for the present purpose are all reviews of dishes (some by fast-food chains). As far as (author-/user-)names, places of residence, and language use permitted, only texts by writers/speakers with English as an L1 were included. Doubtful cases were excluded from the corpus. Finally, this EVLA-Corpus was complemented by a primarily non-evaluative Control Corpus (CC) of approx. 60,000 words, structured into the same subcorpora (1)–(3).

Table 9.2 Composition of the EVLA-Corpus and word count

(1) Written, print: (a) Academic reviews (AC); (b) Lonely Planet reviews (LP). Word count: 95,806
(2) Written, online: (a) Goodreads reviews (GR); (b) Tripadvisor and Airbnb reviews (TA / ABB). Word count: 97,326
(3) Spoken, online: (a) BookTube videos (BT); (b) Mukbang videos (MUK). Word count: 118,353

The primarily manual analysis of these corpora yielded a total of 532 clefts: 433 in the EVLA-Corpus (120 WHCLs, 268 RWHCLs, 45 ITCLs) and 99 in the CC (25 WHCLs, 61 RWHCLs, 13 ITCLs). These were extracted with a context of five preceding and five subsequent C-units (cf. Biber et al. 2021: 1060–103) and then annotated for various variables related to clefts and evaluation, summarised in Table 9.3 and Table 9.4.
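Because the EVLA-Corpus and the CC differ considerably in size, raw cleft counts cannot be compared directly. A minimal sketch of one common remedy, normalisation to a rate per 100,000 words (the unit used in Section 9.2), is given below in Python. The EVLA word count is assumed to be the sum of the subcorpus figures in Table 9.2; the CC figure of 60,000 words is only the approximate size stated above, so the CC rates are approximate too. The names per_100k and rates are hypothetical, and the sketch implies nothing about how the study itself compared the two corpora.

```python
# Cleft counts as reported for the EVLA-Corpus and the Control Corpus (CC).
EVLA_WORDS = 95_806 + 97_326 + 118_353  # sum of the subcorpus word counts in Table 9.2
CC_WORDS = 60_000  # approximate CC size; rates based on it are approximate

CLEFT_COUNTS = {
    "EVLA": {"WHCL": 120, "RWHCL": 268, "ITCL": 45},
    "CC": {"WHCL": 25, "RWHCL": 61, "ITCL": 13},
}
CORPUS_WORDS = {"EVLA": EVLA_WORDS, "CC": CC_WORDS}


def per_100k(count: int, corpus_words: int) -> float:
    """Normalise a raw count to occurrences per 100,000 words."""
    return count / corpus_words * 100_000


def rates(corpus: str) -> dict[str, float]:
    """Per-100,000-word rate for each cleft type in one corpus."""
    words = CORPUS_WORDS[corpus]
    return {cleft: per_100k(n, words) for cleft, n in CLEFT_COUNTS[corpus].items()}


if __name__ == "__main__":
    for corpus in CLEFT_COUNTS:
        print(corpus, {cleft: round(rate, 1) for cleft, rate in rates(corpus).items()})
```

The same function puts figures such as those cited in Section 9.2 (e.g., 17 WHCLs per 100,000 words) on a common scale with corpus-specific counts.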

Table 9.3 Cleft-related variables and levels used in data annotation

  • Type of cleft: WHCL, RWHCL, ITCL

Highlighted constituent:

  • Form: noun phrase (NP), adverb phrase, adjective phrase, prepositional phrase, finite clause, non-finite clause, verbless clause

  • Syntactic function in corresponding non-cleft: subject, direct object (Od), prepositional object, subject complement (CS), adverbial, predication, premodifier, prepositional complement

  • Weight: 1 to 10 orthographic units

  • Information status: new, inferable, evoked

Subordinate clause:

  • Relativiser: what, who, why, where, when, how (plus adjective/adverb), that, zero, non-finite clause

  • Weight: 1 to 10 orthographic units

  • Information status: new, inferable, evoked

Table 9.4 Evaluation-related variables and levels used in data annotation

  • Syntactic function of the evaluative constituent: subject, subject complement (CS), cleft clause, adverbial, predicate (plus superordinate, subordinate, conjoin; see note a)

  • Semantic category: epistemic modality, deontic modality, affect, judgment, appreciation, style of speaking stance, graduation

  • Degree of directness: direct, indirect

  • Information status of evaluation: new, inferable, evoked

a The levels ‘superordinate’, ‘subordinate’, and ‘conjoin’ were used as classifications of the syntactic function of evaluations which occurred in the same sentence as, but outside, the cleft constructions themselves.

The variables related to clefts are those which previous studies have shown to be predictors of the differences between, as well as within, the different types of clefts (cf. Prince 1978; Quirk et al. 1985; Collins 1991; Biber et al. 2021). For analyses of the form and function of constituents, Quirk et al. (1985) was used as a reference grammar. Weight was measured in number of orthographic units (OUs; cf. Biber et al. 2021). For determining the information status of constituents (as well as of evaluations, see below), this study drew on Prince’s influential taxonomy (1981), which distinguishes seven degrees of assumed familiarity (‘brand-new’, ‘brand-new anchored’, ‘unused’, ‘inferable’, ‘containing inferable’, ‘textually evoked’, and ‘situationally evoked’) and groups these into three principal degrees of assumed familiarity: ‘new’, ‘inferable’, and ‘evoked’ (or ‘given’). It is important to note that ‘in those constructions sensitive to discourse-old status, inferrable information consistently patterns with discourse-old information’ (Ward & Birner 2004: 156).
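Prince’s grouping and Ward and Birner’s observation can be expressed as a simple lookup. The Python sketch below assumes Prince’s standard grouping, in which ‘brand-new’, ‘brand-new anchored’, and ‘unused’ count as new, the two inferable categories as inferable, and the two evoked categories as evoked; the names Familiarity, principal_degree, and patterns_as_discourse_old are hypothetical.

```python
from enum import Enum


class Familiarity(Enum):
    """Prince's three principal degrees of assumed familiarity."""
    NEW = "new"
    INFERABLE = "inferable"
    EVOKED = "evoked"


# Assumed grouping of the seven fine-grained categories into three principal degrees.
PRINCE_GROUPS = {
    "brand-new": Familiarity.NEW,
    "brand-new anchored": Familiarity.NEW,
    "unused": Familiarity.NEW,
    "inferable": Familiarity.INFERABLE,
    "containing inferable": Familiarity.INFERABLE,
    "textually evoked": Familiarity.EVOKED,
    "situationally evoked": Familiarity.EVOKED,
}


def principal_degree(category: str) -> Familiarity:
    """Map a fine-grained category to its principal degree."""
    return PRINCE_GROUPS[category]


def patterns_as_discourse_old(category: str) -> bool:
    """Inferable information patterns with evoked (discourse-old) information,
    following the Ward and Birner (2004) observation quoted above."""
    return principal_degree(category) is not Familiarity.NEW
```

Under this grouping, ‘unused’ entities, although known to the hearer, remain discourse-new and therefore do not pattern with discourse-old information.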

Example (11) illustrates the cleft-related variables: It is an ITCL which highlights and negates the lexically headed noun phrase a flash Edwardian-era villa (4 OUs), which in the non-cleft would function as the prepositional complement of in. The cleft clause (4 OUs) is introduced by that. While the CS represents brand-new information, the subordinate clause contains information inferable in this context.

(11)

It’s not a flash Edwardian-era villa that you’re staying in. (EVLA-123-LP)
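Because weight is operationalised as a count of orthographic units, the OU counts given for example (11) can be reproduced mechanically. The Python sketch below assumes, for illustration only, that an OU is any whitespace-delimited word form, so that a hyphenated compound (Edwardian-era) and a contraction (you’re) each count as one unit; the chapter does not spell out its tokenisation rules, and the function name count_ous is hypothetical.

```python
def count_ous(text: str) -> int:
    """Count orthographic units (OUs).

    Assumption (hypothetical operationalisation): an OU is any
    whitespace-delimited word form, so hyphenated compounds and
    contractions each count as a single unit.
    """
    return len(text.split())


# Highlighted element and cleft clause of example (11)
highlighted = "a flash Edwardian-era villa"
cleft_clause = "that you’re staying in"

print(count_ous(highlighted))   # 4
print(count_ous(cleft_clause))  # 4
```

Counting this way makes weight comparisons between constituents (e.g., for end-weight) straightforward, but a different tokenisation, for instance one that splits contractions, would yield different counts.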

The operationalisation of evaluation is naturally difficult, since evaluation is inherently subjective and elusive. Following the above definition, sentences were classified as evaluative when they expressed an attitude or feeling.⁹ The classification used in this study primarily follows Biber et al., who distinguish Epistemic stance (expressions of ‘certainty (or doubt), actuality, precision, or limitation’, Biber et al. 2021: 964), Attitudinal stance (expressions of ‘personal attitudes or feelings’, Biber et al. 2021: 966), and Style of Speaking stance (‘writer comments on the communication itself’, Biber et al. 2021: 967) as the three principal semantic categories. For a subcategorisation of Attitude, however, it relies on Martin and White (2005: 35–6), who distinguish Affect (expressions of feelings or emotions), Judgment (assessments of behaviour or actions), and Appreciation (evaluations of entities, individuals, or facts). Finally, this study adds the traditional category of Deontic modality, that is, expressions of ‘permission, obligation, or volition (or intention)’, which Biber et al. (2021: 483, 967) subsume under Attitudinal stance, and Martin and White’s (2005: 37) category of Graduation, which includes, for example, intensifiers, downtoners, and adjustments of category boundaries and is not treated as a stance marker by Biber et al. (2021). Consequently, this study distinguishes seven semantic categories of evaluation (see Table 9.4).

These semantic categories are illustrated by explicitly evaluative examples from the EVLA-Corpus: (12) shows two instances of Epistemic modality (would and definitely), while (13) contains instances of Deontic modality (have to), Affect (fall in love), and Appreciation (amazing journey). Honestly in (14) expresses Style of Speaking stance, while the prepositional phrase to no end intensifies (Graduation) the expression of Affect (intrigues). Finally, (15) positively evaluates the author’s creation of the plot from little pieces (Judgment) and intensifies this evaluation by so (Graduation).

(12)

this is something that I would definitely recommend, erm, for you to try out […].

(EVLA-276-BT)
(13)

That’s all you have to know to pick it up and fall in love with it and just like go on this amazing journey.

(EVLA-175-BT)
(14)

It’s […] something that just honestly intrigues me to no end every time I turn the page.

(EVLA-292-BT)
(15)

And one thing she does so well is that […].

(EVLA-302-BT)

Following Martin (2000), the present study further distinguishes two degrees of explicitness of evaluation: direct (or explicit) evaluations include those expressed by lexemes with an evaluative denotation, but also evaluations expressed by metonymies, metaphors, or idioms, or through entailments. For example, the mukbanger in (16) uses a metaphor to judge a certain behaviour as immoral or problematic. By contrast, evaluations which require an inference or a conversational implicature, or which presuppose knowledge of specific (social) norms or values, are regarded as indirect (or implicit). Thus, when the reviewer in (17) states that a LOT of fudging may be involved in what claims to be a scientific approach, any reader acquainted with scientific standards will be able to infer that the approach of the reviewed book is evaluated as unreliable. For reasons of clarity and simplicity, the following discussion will rely primarily on examples of direct evaluations.

(16)

So what I’m saying is that that’s a red flag.

(EVLA-409-MUK)
(17)

here’s where a LOT of fudging can come in […].

(EVLA-19-GR)

Owing to the difficulty of operationalising evaluation and the potential subjectivity of categorisations relating to evaluation and information status, the author and a specially trained research assistant coded a random selection of 411 clefts from both corpora, achieving an agreement rate of 82.5%. Discrepancies in coding were resolved through discussion.
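The chapter reports inter-coder reliability as a single percentage. Assuming, for illustration, that this is simple percentage agreement (the share of items that both coders labelled identically), the computation can be sketched in Python as follows; the function name and the four category labels are hypothetical and are not taken from the study’s data. Unlike chance-corrected measures such as Cohen’s kappa, this figure does not adjust for agreement that would arise by chance.

```python
def percent_agreement(coder_a: list[str], coder_b: list[str]) -> float:
    """Simple percentage agreement: share of items coded identically."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same, non-empty set of items")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100 * matches / len(coder_a)


# Hypothetical semantic-category codes for four evaluations
author = ["Affect", "Appreciation", "Graduation", "Judgment"]
assistant = ["Affect", "Appreciation", "Graduation", "Appreciation"]
print(percent_agreement(author, assistant))  # 75.0
```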

9.6 Analysis and Results

Figure 9.1 gives an overview of the normalised frequencies (per 100,000 words) of clefts in the subcorpora of the EVLA-Corpus and the Control Corpus. Contrary to expectations, a two-sided binomial probability test shows that the frequency of all cleft constructions is significantly lower in the EVLA-Corpus and its subcorpora than in the Control Corpus and its subcorpora (total: p = 0.007; subcorpus (1): p = 0.005; subcorpus (2): p = 0.011; subcorpus (3): p < 0.001). At first sight, this seems to suggest a correlation between the primary textual communicative purpose and frequency of use. However, while ITCLs are less frequent, WHCLs and RWHCLs are considerably more frequent in both the EVLA and the Control Corpus than in other corpora containing texts with mixed communicative purposes (cf. Collins 1991; Biber et al. 2021),¹⁰ which disproves hypothesis H1. As expected from previous studies such as Biber et al. (2021: 953), in both the EVLA and the Control Corpus, ITCLs are significantly more common in formal written discourse than in the other subcorpora (p < 0.001), and particularly frequent in academic writing (EVLA subcorpus 1(a): 40.0 per 100,000 words). RWHCLs, by contrast, occur significantly more frequently in spoken language than in the other subcorpora (p < 0.001), while register and medium differences are less drastic for WHCLs.
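The chapter reports neither the corpus sizes nor the exact set-up of the binomial test. One common way to run such a comparison, assumed here purely for illustration, is to condition on the combined number of hits in both corpora: if a construction were equally frequent in both, each hit would fall into corpus A with a probability equal to A’s share of the total word count. The Python sketch below implements this exact two-sided test with the standard library only (in practice, `scipy.stats.binomtest` provides the same test); all function names and counts are hypothetical.

```python
from math import exp, lgamma, log


def per_100k(hits: int, words: int) -> float:
    """Normalised frequency per 100,000 words."""
    return hits * 100_000 / words


def binom_logpmf(k: int, n: int, p: float) -> float:
    """Natural log of the binomial probability of k hits in n trials (0 < p < 1)."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))


def binom_test_two_sided(k: int, n: int, p: float) -> float:
    """Exact two-sided p-value: the summed probability of every outcome
    that is no more likely than the observed one."""
    observed = binom_logpmf(k, n, p)
    log_probs = [binom_logpmf(i, n, p) for i in range(n + 1)]
    tol = 1e-7  # tolerance (in log space) so that floating-point ties still count
    return min(1.0, sum(exp(lp) for lp in log_probs if lp <= observed + tol))


def compare_corpora(hits_a: int, words_a: int, hits_b: int, words_b: int) -> float:
    """Hypothetical set-up: under H0, each of the hits_a + hits_b occurrences
    falls into corpus A with probability words_a / (words_a + words_b)."""
    n = hits_a + hits_b
    p = words_a / (words_a + words_b)
    return binom_test_two_sided(hits_a, n, p)


# Hypothetical counts, not figures from the study
print(per_100k(120, 100_000), per_100k(160, 100_000))  # 120.0 160.0
print(compare_corpora(120, 100_000, 160, 100_000))
```

Working in log space avoids overflow in the binomial coefficient when the combined counts run into the thousands, which is common for frequent constructions in large corpora.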


Figure 9.1 Cleft constructions in the subcorpora of the EVLA-Corpus and the Control Corpus (normalised frequencies per 100,000 words)

Figure 9.1 Long description

The vertical axis shows the rate per 100,000 words, ranging from 0 to 300 in increments of 50. The horizontal axis shows the two corpora, EVLA (the corpus of evaluative language) and CC (the Control Corpus), for each of four subcorpora:

  1. Written, print

  2. Written, online

  3. Spoken, online

  4. Total

Each bar is divided into three sections, shaded from dark to light, representing ITCLs, RWHCLs, and WHCLs. RWHCLs form the largest section in every bar except EVLA written, print, and peak in the spoken, online subcorpus; ITCLs are most frequent in the written, print subcorpus. A data table below the graph has eight columns, giving the EVLA and CC values side by side for each of the four subcorpora in the order listed above. The values, from left to right, are as follows:

  • ITCL: 23.0, 29.9, 15.4, 5.1, 6.8, 25.7, 14.4, and 20.7.

  • RWHCL: 15.7, 51.2, 43.2, 75.9, 178.3, 174.5, 86.0, and 97.3.

  • WHCL: 28.2, 12.8, 34.9, 40.5, 49.9, 71.9, 38.5, and 39.9.

While the following paragraphs will focus on aspects directly relevant to answering the research questions above, suffice it to say that the results of the quantitative and qualitative analyses of the clefts in both the EVLA and the Control Corpus largely confirm the results of previous analyses (Collins 1991; Biber et al. 2021) concerning the relativisers which can occur in clefts, the forms and functions of highlighted elements, and the information status of constituents. This shows not only that the clefts in the present study behave like clefts in general, but also that there is no considerable difference between the clefts used in primarily evaluative texts and those used in primarily non-evaluative texts as far as formal, functional, and informational characteristics are concerned.

The analysis of informativity, however, revealed one unexpected result: typical RWHCLs, as illustrated by (12), have a given thematic highlighted element and a nominal relative clause of comparatively newer or the same information status. Its being part of the presupposed subordinate clause imbues this new information with a ‘non-controversial flavour’ (Collins 1991: 119). While the majority of RWHCLs in the present study (EVLA: 94.0%, CC: 85.2%) conform to this distribution, there are 25 clauses which feature a new highlighted element and a communicatively less salient nominal relative clause. Of these, seven highlight a cataphoric or interrogative element. The remaining 18 clauses, however, are genuine exceptions to the typical information structure of RWHCLs. Example (18), for instance, features brand-new anchored information as the highlighted constituent and a given nominal relative clause. Note that the formal definiteness of the highlighted element is due to the presence of an individuating post-modification (cf. Birner & Ward 1998: 134). This RWHCL resembles new-presupposition ITCLs, subtype 1, in that the highlighted element has a circumstantial meaning and, despite its newness, low communicative salience. Example (19) shows an RWHCL with a brand-new foregrounded constituent and a brand-new anchored subordinate clause. Beautiful wooden villas have not been mentioned before and are also not situationally evoked. The noun phrase (NP) is therefore informationally indefinite (Prince 1992: 300) or brand-new, its newness attenuated by its topicality. Its formal definiteness, however, seems to mark the existence of these villas as common ground. These RWHCLs with an exceptional information structure are not restricted to the EVLA-Corpus, since four comparable clauses can also be found in the Control Corpus. The effect of (possibly strategically) placing these brand-new NPs in sentence-initial position is similar to the ‘known fact’ effect achieved by placing brand-new information in the subordinate clauses of ITCLs and RWHCLs. At the same time, the exclusiveness implicature characteristic of clefts gives these subjects prominence by exclusion (cf. Collins 1991: 156) and suggests that it is these villas and nothing else that are under consideration.

(18)

[I just kept listening to it.]¹¹ The last time that I was home at my family’s was when I was listening to Ninth House.

(EVLA-167-BT)
(19)

These beautiful wooden villas are what Auckland’s inner suburbs are all about.

(EVLA-122-LP)

In the following analyses, only evaluations contained in the cleft constructions themselves are included (EVLA: 756 evaluations, CC: 185 evaluations). Figure 9.2 clearly shows that the majority of clefts of all types in the two corpora are used to express evaluations and that the overall probability of a cleft construction being used evaluatively is roughly the same regardless of whether the primary textual purpose is to evaluate or not. While WHCLs are most frequently evaluative, relating the absolute numbers of evaluations in the different types of cleft constructions to their different average lengths reveals that RWHCLs, which are comparatively short on average (9.8 words vs. WHCLs: 19.2 words; ITCLs: 15.7 words), are the most densely evaluative across both corpora (2.0 evaluations per 10 words vs. 1.3 evaluations per 10 words each for WHCLs and ITCLs).
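The density measure relates the number of evaluations to the number of words in the clefts of a given type. A plausible reading, assumed here because the chapter gives no explicit formula, is: evaluations per 10 words = 10 × evaluations / (number of clefts × mean cleft length). Inverting this formula with the rounded figures reported above gives a rough estimate of how many evaluations an average cleft of each type contains. The Python sketch below performs both calculations; the function names are hypothetical.

```python
def evaluations_per_10_words(n_evaluations: int, n_clefts: int,
                             mean_length: float) -> float:
    """Evaluation density: evaluations per 10 words of cleft text."""
    return 10 * n_evaluations / (n_clefts * mean_length)


def evaluations_per_cleft(density_per_10: float, mean_length: float) -> float:
    """Invert the density to estimate the average number of evaluations per cleft."""
    return density_per_10 * mean_length / 10


# Rounded figures reported in the text: (density per 10 words, mean length in words)
reported = {"RWHCL": (2.0, 9.8), "WHCL": (1.3, 19.2), "ITCL": (1.3, 15.7)}
for cleft, (density, length) in reported.items():
    # prints: RWHCL 1.96, then WHCL 2.5, then ITCL 2.04
    print(cleft, round(evaluations_per_cleft(density, length), 2))
```

Under these assumptions, an average WHCL would contain somewhat more evaluations than an average RWHCL; RWHCLs emerge as most densely evaluative because they are roughly half as long. Since the inputs are rounded, the per-cleft values are only approximations.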


Figure 9.2 Evaluative and non-evaluative cleft constructions in the EVLA-Corpus and the Control Corpus (relative frequencies)

Figure 9.2 Long description

The vertical axis shows relative frequencies from 0 to 100% in increments of 20%. The horizontal axis shows the two corpora, EVLA (the corpus of evaluative language) and CC (the Control Corpus), for each of the three cleft types and the total:

  1. WHCL

  2. RWHCL

  3. ITCL

  4. Total

Each bar is divided into two sections, non-evaluative (dark) and evaluative (light). The evaluative section is the larger one in every bar and is largest for WHCLs in the CC. A data table below the graph has eight columns, giving the EVLA and CC values side by side for each of the four categories in the order listed above. The values, from left to right, are as follows:

  • Non-evaluative: 17.5%, 4.0%, 28.0%, 32.8%, 24.4%, 14.4%, 24.7%, and 23.2%.

  • Evaluative: 82.5%, 96.0%, 72.0%, 67.2%, 75.6%, 85.6%, 75.3%, and 76.8%.

As far as the different semantic categories of evaluation are concerned, what seems most interesting, and what confirms hypothesis H2, is that the different types of clefts behave largely similarly in both corpora: across both corpora, Epistemic modality and Affect are markedly less frequent in ITCLs (7.6% and 9.5%) than in WHCLs (15.5% and 19.7%) and RWHCLs (20.2% and 21.2%), while Appreciation is more frequent in ITCLs (48.6%) than in WHCLs (28.5%) and RWHCLs (15.8%). The remaining four semantic categories of evaluation are represented roughly equally in the different cleft constructions, with expressions of Deontic modality, Judgment, and Style of Speaking stance being generally infrequent and Graduation being among the most frequent semantic categories (22.4–32.8%). Moreover, the vast majority of evaluations contained in the clefts of this study are direct across both corpora (EVLA: 94.7%; CC: 93.0%) and all three cleft constructions (WHCLs: 92.7%, RWHCLs: 94.9%, ITCLs: 97.1%). Since this study focuses on evaluative texts, that is, texts with the manifest purpose of assessing entities (e.g., books, chapters, hotels, food, prices) or actions (e.g., writing, cooking, decorating), the examples will primarily illustrate direct Appreciation and Judgment from the EVLA-Corpus.

If we consider the syntactic positions in which evaluations are expressed, we find that there are considerable differences between the three cleft constructions but that, again, characteristics of clefts are mostly stable across the two corpora (cf. Figure 9.3): in both corpora, most evaluations are contained in those constituents which are either foregrounded or backgrounded by the cleft constructions. Unsurprisingly, evaluations contained in non-clausal adverbials are most frequently of the semantic categories Epistemic modality (e.g., probably, definitely) and Graduation (e.g., pretty much, kind of), while predicates typically express Epistemic or Deontic modality (e.g., would or have to). Also rather unsurprisingly, in all types of clefts and in both corpora, the information status of the evaluations typically matches the information status of the constituents in which these evaluations are contained. To illustrate their particular use for the expression of evaluation, the following paragraphs will discuss a selection of typical examples of each type of cleft, also considering the availability of agnate alternatives and the effect of a potential substitution by these alternatives on the expression of evaluation(s) and on the aforementioned pragmatic meanings of clefts (the presupposition, the ‘known fact’ effect, and the exclusiveness implicature).


Figure 9.3 Syntactic positions of evaluations in the EVLA-Corpus and the Control Corpus (relative frequencies)

Figure 9.3 Long description

The vertical axis shows relative frequencies from 0 to 100% in increments of 20%. The horizontal axis shows the two corpora, EVLA (the corpus of evaluative language) and CC (the Control Corpus), for each of three cleft types:

  1. WHCL

  2. RWHCL

  3. ITCL

Each bar is divided into five sections, shaded from dark to light, representing the cleft clause, subject complement, predicate, non-clausal adverbial, and subject. The distribution of evaluations across these positions differs markedly between the three cleft types. A data table below the graph has six columns, giving the EVLA and CC values side by side for each cleft type in the order listed above. The values, from left to right, are as follows:

  • Cleft clause: 0%, 0%, 0%, 0%, 40.3%, and 69.7%.

  • Subject complement: 50.2%, 44.3%, 84.8%, 80.2%, 40.3%, and 30.3%.

  • Predicate: 0.4%, 1.6%, 0.5%, 1.1%, 1.4%, and 0%.

  • Non-clausal adverbial: 4.1%, 9.8%, 10.4%, 9.9%, 0%, and 6.9%.

  • Subject: 45.3%, 44.3%, 4.3%, 8.8%, 0%, and 0%.

Figure 9.3 shows that, in WHCLs, evaluations occur nearly as frequently in subjects as in subject complements. Across both corpora, evaluations contained in subject nominal clauses most frequently express Affect (30.2%) or Appreciation (21.5%).

(20)

the thing that I loved the most about this book was the characters.

(EVLA-179-BT)

The paraphrased WHCL in (20) highlights the NP the characters, the direct object (Od) of loved. The proposition ‘I loved x the most about this book’ is inferable from the previous passionate introduction of the reviewed book. That the book features characters is also inferable but certainly more communicatively dynamic than the information contained in the subject. The subject expresses Affect (loved) and Graduation (the most). Interestingly, the expression of personal feelings is thus backgrounded and becomes less obtrusive and less conspicuous than in a main clause. Both the agnate RWHCL (?The characters were the thing/what I loved …) and ITCL (It was the characters that I loved …) maintain the backgrounding of Affect but make the characters the theme. This decrease in communicative dynamism in the course of the clause is inappropriate in RWHCLs but is, in fact, characteristic of old-presupposition ITCLs. Both the ITCL and the original WHCL contain the same presupposition and thus also present the evaluation as non-controversial. Its thematic position in the original WHCL additionally presents the presupposition as given in the hearer’s consciousness. This ‘interpersonal’ flavour (Collins 1991: 133) is absent in the corresponding ITCL, where information is presented simply as known. Furthermore, while in the ITCL the theme is given textual prominence by being predicated, highlighting the identity of the characters, the original WHCL gives ideational prominence (cf. Halliday 1967: 236–7) to the characters, representing them as participants in the process described in the relative clause. Finally, while the agnate non-cleft (I loved the characters the most about this book.) has the same proposition, there are considerable differences: end-focus falls on the final adverbial, the original backgrounding of Affect and foregrounding of the object are lost, and so are the exclusiveness implicature, the presupposition, and the ‘known fact’ effect. Note that these pragmatic meanings and the backgrounding effect associated with the subject clauses of WHCLs may be particularly helpful in those few clauses whose subjects express brand-new (anchored) evaluations (EVLA: 18, CC: 3).

(21)

So for those that dont know how to eat this, what you do is you just pour this delicious kind of smelly […] sauce all over your food.

(EVLA-385-MUK)

Evaluations contained in the highlighted CS of WHCLs, by contrast, most frequently express Appreciation (35.8%) or Graduation (29.6%), as in (21) (delicious, smelly, and kind of). This WHCL highlights a brand-new anchored predication. In the subordinate clause, it uses the substitute verb do. In the context of a discussion about how to eat a specific dish, this reference to an unspecified action is inferable and can be assumed to be in the hearer’s consciousness. The evaluations occur in the foregrounded constituent, but in an inconspicuous position, as premodifiers in the Od of the predication. Neither RWHCLs nor ITCLs can foreground a predication (cf. Quirk et al. 1985: 1386). The corresponding non-cleft (You just pour …) is information-structurally less appropriate than the original WHCL since it does not show the same increase in communicative dynamism: it does not presuppose ‘you do x’, much less present this information as being in the hearer’s consciousness.

(22)

[And they feel like friends.] And that’s what I really like, that he does with stories.

(EVLA-250-BT)

In RWHCLs, most evaluations occur in nominal relative clauses in subject complements (see Figure 9.3). This is unsurprising, since foregrounded subjects are frequently demonstrative pronouns. Across both corpora, evaluations contained in nominal relative clauses of RWHCLs can most frequently be assigned to the semantic categories of Graduation (33.4%) and Affect (24.7%): in (22), the nominal relative clause emphatically (Graduation) states that the fact that fictitious characters feel like friends (anaphoric text reference) – and only this fact – is what the BookTuber likes (Affect). Strikingly, these evaluations are again backgrounded as part of the subordinate clause. What has been foregrounded is the raised Od of does from the object clause. Since the BookTuber’s love for this author’s characters is inferable from the preceding context, we may conclude that it is ‘the equative relationship itself [in combination with the exclusiveness implicature] which may provide the primary informational contribution of the construction to the discourse’ (Collins 1991: 146). This stage-ending function makes the RWHCL particularly appropriate here. The agnate WHCL (?What I really like that … is that.) not only runs counter to the principle of end-weight but is also thematically less appropriate due to the givenness of the foregrounded constituent and the resulting decrease in communicative dynamism. The corresponding (old-presupposition) ITCL (?It’s that that I really like …) with highlighted given information suggests an inappropriate contrastive reading of that. Finally, the agnate non-cleft (I really like that he does that …) is textually less coherent since it places the anaphoric and thematic demonstrative in subordinate position late in the sentence. It also lacks the exclusiveness implicature, the presupposition, the ‘known fact’ effect, and the summarising function, and it elevates the BookTuber’s emphatic expression of Affect to the main clause. It thus evaluates more obtrusively than the original RWHCL.

(23)

This writing talent is possibly the only reason I gave it three stars over two.

(EVLA-32-GR)

Non-pronominal subjects in RWHCLs contain a further 26 evaluations (Appreciation: 53.9%). In (23), the highlighted assessment that the author of the reviewed book is talented (Appreciation) is inferable from the preceding sentence. The CS, which reports that the reviewer rated the reviewed book as acceptable (Appreciation), is situationally given. Therefore, the corresponding WHCL (?The only reason I gave it … is …) and non-cleft (?I gave it … because of …) both seem thematically less appropriate than the original RWHCL. Furthermore, the non-cleft lacks the presupposition, the ‘known fact’ effect, and the exclusiveness implicature and emphasises the evaluation of the author by end-focus. Finally, the agnate ITCL highlighting a prepositional phrase (?It’s possibly because of … that I gave it …) would have to be interpreted as an old-presupposition it-cleft with an inappropriate contrastive focus. The original RWHCL with very low overall communicative dynamism thus achieves what none of the agnates could: it summarises a previous argument by highlighting an inferable, thematic, non-contrastive, and impersonally evaluative constituent, while backgrounding a rhematic but situationally given evaluation, which is explicitly ascribed to the reviewer by the use of the first-person pronoun. Moreover, the fact that the author is talented is not directly asserted but conveyed only as part of the existential presupposition triggered by the definite NP. The effect of placing brand-new information in non-pronominal subjects of RWHCLs, comparable to the ‘known fact’ effect, has already been discussed for examples (18) and (19).

(24)

It wasn’t until it was assigned in one of my 11th grade courses that I realized she was just a dumb old biddy and she was clearly missing out on one of the most perfect books ever to be written.

(EVLA-9-GR)

Finally, despite generally low numbers, there are clear tendencies as to where the different types of ITCLs typically feature evaluations: while across both corpora old-presupposition ITCLs contain evaluations slightly more frequently in the highlighted constituent (48.4%) than in the cleft clause (45.2%), new-presupposition ITCLs of both types favour evaluations placed in the cleft clause (type 1: 87.5%; type 2: 75.8%). Evaluations in all types of ITCLs belong most frequently to the semantic categories of Appreciation (43.0%) and Graduation (22.8%). The ITCL in (24) contains brand-new anchored information in both the foregrounded constituent and the cleft clause. The circumstantial role of the former, however, makes the cleft clause the primary contribution to the discourse. As is typical of new-presupposition ITCLs, the evaluations – of the former teacher as a dumb old biddy (Appreciation), of the reviewed novel as the most perfect book (Appreciation and Graduation), and of the certainty of the truth of the proposition (clearly; Epistemic modality) – are contained in the cleft clause where they are backgrounded as part of the presupposition. Despite their newness they are presented as non-negotiable, as known facts, possibly unknown just to the reader, not as the reviewer’s personal opinion. These evaluations may thus have been strategically positioned to make this section of the review more impersonal and objective and potentially more convincing. While there are no WHCL and RWHCL alternatives, the circumstantial clause receives end-focus and has more communicative salience in the corresponding non-cleft (I realized … when it was assigned …), in which the ‘known fact’ effect is lost.

This discussion shows that a substitution of an ITCL by a WHCL or RWHCL is indeed not always possible (cf. Collins 2006: 1716) but may be impeded by factors such as the form or function of the foregrounded constituent. It also shows that each type of cleft can be used to thematise or rhematise and to background or foreground chunks of given or new information, depending on the writer’s or speaker’s communicative intentions. The cluster of cleft constructions thus presents itself as a flexible tool that is particularly useful for the expression of evaluation.

9.7 Discussion and Outlook

Based on the EVLA-Corpus, a corpus of primarily evaluative language, and a Control Corpus of primarily non-evaluative language, this study set out to analyse the interaction between the primary communicative purpose and the use of the different cleft constructions. It permitted a number of important conclusions.

To begin with, hypothesis H1 had to be rejected. This conclusion was based on the comparison of the frequencies and the syntactic, semantic, and information-structural characteristics of the three types of clefts in the EVLA- and the Control Corpus as well as the comparison with other corpora containing texts with mixed communicative purposes (Collins 1991; Biber et al. 2021). This revealed that the use of cleft constructions is not directly correlated with the primary textual communicative purpose of evaluating but, rather, as established in previous studies (e.g., Biber et al. 2021), with mode, style, and other register characteristics (and possibly also with the operationalisation of the three types of clefts).

Second, the consideration of the frequencies of each type of cleft in the two corpora and in specific registers and the comparison with other constructions like relative clauses clearly demonstrated that frequency-based judgments of the (non‑)canonicity of clefts confirm the traditional classification of clefts: Although WHCLs and RWHCLs are slightly more frequent (i.e., less non-canonical) than ITCLs in overall language use, all clefts are comparatively infrequent (i.e., non-canonical) both in general language use and in specific registers, such as spoken language, academic writing, or evaluative texts. In other words, against previous expectations, it could not be shown that, based on their frequency, clefts are canonical in texts with the primary communicative purpose of evaluating.

Third and most impressively, the quantitative analyses showed that the majority of examples of each of the three cleft constructions is evaluative and, in fact, directly evaluative, irrespective of whether they occur in the primarily evaluative or the primarily non-evaluative corpus. Moreover, RWHCLs are most densely evaluative, featuring the highest number of evaluations in relation to their average overall length, thus confirming hypothesis H2. This means that, regardless of the primary overall textual communicative purpose, all clefts, and especially RWHCLs, are constructions which are very closely associated with the direct expression of evaluation (i.e., with an immediate intention to evaluate). In other words, since clefts are virtually always used to evaluate directly, it is possible to conclude – even without a comprehensive study of the multifaceted phenomenon of the language of evaluation – that cleft constructions are, in fact, an expectable and thus canonical choice in communicative situations in which the speaker or writer intends to express an evaluation. Consequently, while the overall textual communicative purpose of evaluating did not turn out to be a factor increasing the frequency or canonicity of cleft constructions, the immediate communicative intention to evaluate did so, most clearly for RWHCLs.

Fourth, this important finding permits an even more far-reaching conclusion, namely that clefts are not only indeed a syntactic ‘pattern’ typical of evaluative language (Hunston & Sinclair 2000: 89) but, even more importantly, can be regarded as belonging to an extended set of overtly evaluative lexico-grammatical stance constructions. Clefts differ from the stance constructions discussed, for example, by Biber and Zhang (2018: 106) primarily in that the overtly evaluative lexeme(s) do(es) not control a syntactic constituent with a proposition but may occur in different syntactic constituents of the cleft itself. Consequently, in an empirical study of such an extended set of lexico-grammatical stance constructions, clefts can be expected to present themselves as a frequent and thus canonical syntactic choice.

Last but not least, further qualitative and quantitative analyses and comparisons between the three constructions demonstrated that the various combinations of given/new material in thematic/rhematic and foregrounded/backgrounded positions make the cluster of cleft constructions a flexible tool, especially when it comes to the expression of evaluation, which might indeed be one explanation for their frequent evaluative use. Thus, the individual cleft constructions have specific preferences as to where they feature evaluations most frequently and as to which semantic types of evaluation they express, again confirming H2. These preferences are also influenced by factors such as the weight, informativity, and thematicity of their constituents, which in many contexts make the three cleft constructions non-interchangeable. Further, all clefts carry an exclusiveness implicature; their subordinate clauses trigger presuppositions and the ‘known fact’ effect, that is, even brand-new information is presented as uncontroversial and non-negotiable. An effect similar to the ‘known fact’ effect was demonstrated for a subtype of RWHCLs which, to the author’s best knowledge, has not been discussed in the literature so far. These RWHCLs foreground brand-new information in the thematic subject, presenting it as information that should already be in the hearer’s consciousness and only needs a casual reminder. It is these pragmatic meanings of clefts, absent from corresponding non-clefts, that permit the speaker or writer either to put particular emphasis on an evaluation or to make an evaluation seem more non-negotiable, objective, impersonal, and/or unobtrusive. Thus, cleft constructions may even leave the recipient with the impression of being manipulated (cf. Collins 2006: 1712).

In conclusion, the present study has important implications for the theory and analysis both of linguistic evaluation and of syntactic (non‑)canonicity. On the one hand, a corroboration of the assumption that cleft constructions belong to an extended set of explicitly evaluative lexico-grammatical stance constructions could make the analysis of linguistic evaluation in large-scale corpora more viable. The fact that individual cleft constructions occur particularly frequently in certain registers calls for future research to focus on the particular positions and forms of evaluations in clefts in different registers. Thus, since evaluations can occur in different syntactic constituents of clefts, it would be worthwhile for future research to investigate whether clefts might be a means to support both the explicit expression of evaluation in opinionated registers and the less explicit expression of evaluation in other registers (cf. Biber & Zhang 2018). Finally, the findings also call for the analysis of other non-canonical syntactic constructions that typically contain evaluative lexemes, such as tough-constructions, to test whether these might also belong to this extended set of explicitly evaluative lexico-grammatical stance constructions.

On the other hand, the present study showed that the immediate communicative purpose of evaluating may indeed be a factor influencing the use of cleft constructions. It illustrated the particular value of a flexible frequency-based approach to syntactic (non-)canonicity, which permits us to narrow down the scope of our data to specific communicative situations and redefine cleft constructions as a canonical choice when it comes to explicitly expressing evaluations with the help of stance constructions.

Chapter 10 Cognitive Complexity and Non-Canonicity: Zooming in on Particle Placement

10.1 Introduction

Even though English has a rather rigid word order, it allows for some discontinuous forms, such as a relative clause that is extraposed and thus separated from the head it modifies, or a noun phrase (NP) that is fronted without the preposition that takes it as its complement, resulting in a stranded preposition. Discontinuous forms can also occur at the word level in multi-word expressions, such as the transitive particle verb. As illustrated in (1), particle verbs consist of two elements, a verb and a particle. In the joined order, shown in (1a), the particle immediately follows the verb, so the multi-word verb is presented as one unit. However, it is also possible for the direct object to intervene between the verb and its particle, resulting in a split multi-word expression. This variant is illustrated in (1b).

(1)

    a. Jane picked up the book. (joined variant)
    b. Jane picked the book up. (split variant)

Which of the two variants is the canonical one, and which is the non-canonical one? The answer to this question might be less straightforward than for information-packaging structures such as object fronting, where the departure from the basic SVO order is motivated by particular discourse functions. As will become evident in what follows, the default is much more difficult to determine for split multi-word expressions, because ‘theory-based’ and ‘frequency-based’ definitions make different predictions for this particular case of syntactic variation. First of all, one could argue that discontinuous structures in general are non-canonical because they depart from a basic SVO order: even though the verb precedes the object, a second part of the complex verb follows it. What is more, they violate Behaghel’s first law, which says that elements that belong together will also occupy adjacent positions in the structure (Behaghel 1932). There is another factor which has been widely discussed in the literature on syntactic variation but has received less attention in accounts of non-canonical syntax: the cognitive load associated with the variants. If we take a higher degree of cognitive complexity to translate into non-canonicity, the split particle verb, again, turns out to be the non-canonical variant because it is associated with a higher processing load than the joined verb. This is because of the distance between two dependent elements, the verb and its particle (Gries 2002a, 2003; Lohse et al. 2004). This distance also explains why the discontinuous variant is not an option if further factors contribute to cognitive complexity, such as a long or structurally complex direct object (see (2)), as shown by previous corpus studies (e.g., Fraser 1976; Chen 1986; Gries 2002a, 2003; Lohse et al. 2004).

(2)

    a. Fred picked up the book John had bought him while he was in Europe.
    b. *Fred picked the book John had bought him while he was in Europe up. (Gries 2003: 14)

This raises the question of why speakers would ever use the non-canonical variant: it departs from the default order and is said to be more difficult to process. In contrast to other discontinuous orders such as preposition stranding, however, the split particle verb can be linked more clearly to a particular discourse function, which might motivate its existence: in the joined variant, the head noun of the direct object NP is in end-focus position; in the split variant, it is the particle (see Gries 2003). Example (3) demonstrates that the split position after the object is connected to contrastive stress on the particle (see also Dehé 2002).

(3)

So, that’s another reason why the market for oil just seems to drive the price up and not down.

(COCA, 1990 SPOK)

This function brings about an interesting effect: with unstressed pronouns, speakers do not have a choice, as the verb has to be split (Jane picked it up, but not *Jane picked up it); that is, the discontinuous variant is not just the more frequent option, it is the only option. This reveals a divergence between theory-based and frequency-based approaches to (non-)canonicity, as the non-basic order is the default in certain syntactic contexts. Interestingly, even with lexical heads, the split option is more frequent in spoken English (60%, Biber et al. 1999: 932). In written English, in contrast, the joined variant is much more frequent (90%, Biber et al. 1999: 932).

Against this backdrop, it is not clear which of the two variants is the non-canonical order. Is the split particle verb a departure from the default that can be motivated by information-structuring functions? If it serves more particular functions, however, we would probably expect it to be less frequent. So, could we argue that the joined particle verb is the non-canonical form, with the reduction of cognitive complexity as a further factor motivating non-canonicity?

Cognitive complexity has been argued to be a determinant in many syntactic variation phenomena, such as the ordering of postverbal prepositional phrases (Hawkins 2000; Wasow & Arnold 2003), the placement of adpositions (Berlage 2009, 2014), preposition stranding (Gries 2002b; Hoffmann 2011), adjectival comparison (Mondorf 2003), and the use of zero forms (Hawkins 2003; Rohdenburg 2003). The general idea underlying the explanations in these studies can be summarised as follows: whenever they have the choice, speakers will choose the variant that results in a lower processing load.

As intuitive as psycholinguistic explanations for these phenomena might seem, they have to be taken with a grain of salt: most of them are based on corpus data, and these data provide only indirect evidence for processing-based choices. Psycholinguistic studies have shown that a greater distance between dependent elements can even facilitate processing. Konieczny (2000), for example, has identified these so-called ‘anti-locality effects’ in the processing of verbs following relative clauses in German: a sentence-final verb is read faster when preceded by a complex direct object containing a relative clause than when preceded by a two-word NP. Konieczny (2000) has also shown that the type of data matters: while the offline data from his experiments are aligned with corpus findings, the online data are not. This indicates that corpus studies do not necessarily provide support for processing-based assumptions, even though they certainly yield highly interesting results for the study of syntactic variation. In order to determine whether the reduction of cognitive complexity can be considered a factor motivating the use of non-canonical constructions, this chapter investigates the complexity-based explanations that have been put forth for particle placement in English (cf. Gries 2002a, 2003; Lohse et al. 2004) from an experimental perspective. The aim is to provide the missing experimental data for a phenomenon that is well researched from a corpus-linguistic perspective. To this end, a self-paced reading study was conducted. This experiment was complemented by a split rating task (cf. Bresnan & Ford 2010). It will be shown that, as reported by Konieczny (2000), the offline data are aligned with the corpus findings but the online data are not, suggesting that a discontinuous structure is not necessarily associated with a higher cognitive load.

The chapter is structured as follows. Section 10.2 discusses the above-mentioned corpus studies on particle placement, which relate the phenomenon to processing. Section 10.3 reports the self-paced reading study; Section 10.4 reports the split rating task. The mismatch between the online and offline results and its implications are discussed in Section 10.5.

10.2 Cognitive Complexity as Determinant of Particle Placement

There are two corpus studies that make explicit reference to processing complexity as a determinant of particle placement, Lohse et al. (2004) and Gries (2002a, 2003), which is why they are discussed in the following. Lohse et al. (2004) relate two factors, direct object complexity (operationalised as NP length in words) and verb idiomaticity (see Footnote 1), to Hawkins’ ‘Minimise domains’ (MiD) as a single underlying principle. According to this principle, domains in which syntactic or lexical dependencies hold should be kept as short as possible (Hawkins 2014). Lohse et al.’s (2004) corpus study finds effects of length that are predicted by MiD: while the ratio of joined and split constructions is almost balanced for one-word NPs (47% split constructions), there is a sharp decline at an NP length of three words (18% split constructions) and a second one for NPs that comprise five or more words (5% splits). This is accounted for as follows: the phrasal combination domain of the verb phrase (VP PCD) comprises the elements that must be processed for the construction of the VP, that is, the verb, the particle, and the ‘first constructing word in the object NP’ (Lohse et al. 2004: 240), the determiner. Example (4) illustrates the effects of minimising VP PCDs for particle placement.

(4)

    a. Joe [VP looked up [NP the number of the ticket]]
       VP PCD: looked(1) up(2) the(3)
    b. Joe [VP looked [NP the number of the ticket] up]
       VP PCD: looked(1) the(2) number(3) of(4) the(5) ticket(6) up(7)

In the joined variant (4a), the VP PCD comprises only three words – looked, up, and the definite article. In the split variant (4b), in contrast, the domain comprises seven words. This makes the role of complexity evident: the longer the NP, the longer the VP PCD, which – according to MiD – should be minimal.
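The word counts in (4) can be reproduced with a small function. This is a minimal sketch under simplifying assumptions, not Lohse et al.’s own procedure: the NP is given as a list of words whose first word is its constructing determiner, and the domain is counted as in (4), i.e. verb, particle, and determiner in the joined order, and everything from the verb up to the particle in the split order. The function name `vp_pcd` is hypothetical.

```python
def vp_pcd(verb, particle, np_words, order):
    """Return the words in the VP phrasal combination domain (PCD), counted as in (4).

    joined: verb, particle, and the first constructing word of the NP (its determiner)
    split:  verb, every word of the NP, and the particle that closes the VP
    """
    if not np_words:
        raise ValueError("the direct object needs at least one word")
    if order == "joined":
        return [verb, particle, np_words[0]]
    if order == "split":
        return [verb, *np_words, particle]
    raise ValueError(f"unknown order: {order!r}")


np_words = "the number of the ticket".split()
for order in ("joined", "split"):
    domain = vp_pcd("looked", "up", np_words, order)
    print(f"{order:6} {len(domain)} {domain}")

# The joined PCD stays at three words, while the split PCD grows with the NP.
for n in range(1, 7):
    placeholder_np = ["w"] * n  # hypothetical NP of n words
    joined_len = len(vp_pcd("looked", "up", placeholder_np, "joined"))
    split_len = len(vp_pcd("looked", "up", placeholder_np, "split"))
    print(n, joined_len, split_len)
```

Under this counting, a one-word NP yields a three-word domain in both orders, which is at least consistent with the almost balanced split rate for one-word NPs; every further NP word lengthens only the split domain.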

The second relation of interest here is idiomaticity, a lexical dependency between the verb and the particle. For dependent particles (i.e., particles that lack a literal interpretation), Lohse et al. (2004) predict the joined variant to be more frequent. This is because the lexical dependency domain (LDD) of a particle that depends on the verb (Pt_d) is increased in the split order, as shown in (5a) and (5b). In the latter, the verb look is separated from its particle. The longer the NP, the greater the domain, as evident in (5c).

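(5)

    a. look up_d [NP the number]
       Pt_d V LDD: look(1) up(2)
    b. look [NP the number] up_d
       Pt_d V LDD: look(1) the(2) number(3) up(4)
    c. look [NP the number of the hotel] up_d
       Pt_d V LDD: look(1) the(2) number(3) of(4) the(5) hotel(6) up(7)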

Lohse et al.’s (2004) corpus data confirm what is predicted on the basis of minimal LDDs and PCDs – joined particle verbs are more frequent than split ones with dependent particles, and this preference for the joined construction is enhanced by an increase in length of the direct object. In other words, with idiomatic particle verbs the cut-off point for NP length is reached earlier than for transparent ones.
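Why the cut-off should come earlier with idiomatic verbs can be illustrated by combining the PCD counts of (4) with the LDD counts of (5). The sketch below rests on explicit assumptions: domains are counted in words as in those examples, and the two domain sizes are simply summed, loosely in the spirit of Hawkins’ total domain differentials rather than as a formula taken from Lohse et al. All function names are hypothetical.

```python
def pcd_size(np_len, order):
    """VP PCD size in words, counted as in (4); assumes np_len >= 1."""
    return 3 if order == "joined" else np_len + 2


def ldd_size(np_len, order):
    """Size of the verb-particle lexical dependency domain, counted as in (5)."""
    return 2 if order == "joined" else np_len + 2


def total_domain(np_len, order, idiomatic):
    """Sum of the domains to be processed; only dependent particles add an LDD."""
    total = pcd_size(np_len, order)
    if idiomatic:
        total += ldd_size(np_len, order)
    return total


def joined_advantage(np_len, idiomatic):
    """How many fewer words the joined order requires than the split order."""
    split_total = total_domain(np_len, "split", idiomatic)
    joined_total = total_domain(np_len, "joined", idiomatic)
    return split_total - joined_total


for np_len in (1, 2, 3, 5):
    print(np_len, joined_advantage(np_len, idiomatic=False), joined_advantage(np_len, idiomatic=True))
```

Under these assumptions, the advantage of the joined order grows by one word per additional NP word for transparent verbs but by two for idiomatic ones, which matches the direction (though not the size) of the corpus effect.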

Gries (2002a, 2003) also reports a corpus study on particle placement. He discusses a range of previously identified determinants and links all of them to his so-called ‘Processing Hypothesis’, which reads as follows:

By choosing one of the two constructions for an utterance U a speaker S subordinates to different processing requirements of both constructions in that he formulates U in such a way as to communicate the intended message with as little processing effort as possible. More specifically, for most variables at least, this means that construction0 [the joined variant] will be preferred for verb particle constructions with DOs requiring a lot of processing effort – construction1 [the split variant] will be preferred for verb-particle constructions with DOs requiring little processing effort.

According to Gries, the joined variant is easier to process from both the speaker’s and the hearer’s perspective: in the split construction, the speaker has to hold the particle in memory until after the direct object is uttered, while ‘the hearer has to wait longer for assigning the correct parse to the incoming expression, namely until some yet unknown particle completes … the verb’ (2003: 58). An object that requires more processing effort will add to the difficulties induced by the split construction, and hence the joined order is the preferred option. With idiomatic verbs, the semantic dependencies between verb and particle make the joined order more economical in terms of processing and hence the preferred option.

The subsequent sections report an online and an offline study on particle placement in English, which put these processing-based hypotheses to the test. The central research question is whether the non-canonical (split) variant is more difficult to process. If corpus frequencies reflect processing difficulties, this should be the case, since the joined variant is more frequent overall in the corpus studies. Follow-up questions target the factors that are said to add to processing difficulties in particle placement: the complexity of the direct object and the idiomaticity of the verb-particle combination. A rating task addresses the question of whether the easier-to-process variant is also the preferred one.

10.3 Experiment 1: Self-Paced Reading Study

This section reports a self-paced reading study (SPRT) on particle placement. The idea behind this is that reading times are a window to cognition and that processing difficulties are reflected in reading latencies (e.g., Just & Carpenter 1980).

10.3.1 Factors

As pointed out above, the factors Order, Complexity, and Idiomaticity were integrated into the analysis. Order has two levels, split and joined. Since NP length has been shown to be an accurate measure of nominal complexity (see Berlage 2014), it was chosen to operationalise the complexity of the direct object. The factor Complexity has two levels, simple and complex. Lohse et al.’s (2004: 243) corpus study showed that there is still quite a high ratio of splits (258/647 = 40%) for two-word NPs, and it revealed a clear cut-off point at five or more words (14/461 = 3% splits). This is why a simple direct object is operationalised as a two-word NP and a complex direct object as a five-word NP (see (6) and (7) below).
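The split rates that motivated the choice of two-word and five-word NPs follow directly from the counts cited from Lohse et al. (2004: 243); the short check below simply re-does that arithmetic. The helper name `split_share` is hypothetical.

```python
def split_share(splits, total):
    """Proportion of split particle verbs among all particle-verb tokens."""
    return splits / total


# Counts as cited from Lohse et al. (2004: 243)
counts = {
    "two-word NPs": (258, 647),
    "five or more words": (14, 461),
}
for label, (splits, total) in counts.items():
    print(f"{label}: {splits}/{total} = {split_share(splits, total):.0%} splits")
```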

For the lexical dependency between the verb and the particle, a binary distinction was adopted, that is, the factor Idiomaticity has two levels, idiomatic and transparent. ‘Transparent’ denotes fully compositional structures, which corresponds to the combination of independent verb + independent particle in Lohse et al.’s (2004) study. ‘Idiomatic’ denotes the opposite end of the scale, non-compositional verb-particle combinations, corresponding to dependent verb + dependent particle. Dependency was determined with the help of an entailment test applied by Lohse et al. (2004) (see Footnote 2).

10.3.2 Materials and Design

The factors Order, Complexity, and Idiomaticity were crossed. Seventeen idiomatic (opaque) and 15 transparent verb-particle combinations were used. Example (6) illustrates the four conditions with an idiomatic verb, and (7) shows the four conditions for a transparent one.

(6)

    Idiomatic
    a. Harold looked up the address before the trip. (simple, joined)
    b. Harold looked the address up before the trip. (simple, split)
    c. Harold looked up the address of the hotel before the trip. (complex, joined)
    d. Harold looked the address of the hotel up before the trip. (complex, split)

(7)

    Transparent
    a. Steven typed in the password before the crash. (simple, joined)
    b. Steven typed the password in before the crash. (simple, split)
    c. Steven typed in the password for the account before the crash. (complex, joined)
    d. Steven typed the password for the account in before the crash. (complex, split)
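The 2 × 2 × 2 crossing and the way each item’s four sentences are built from the same parts can be sketched as follows. Idiomaticity is a property of the verb, so each item (one verb-particle combination) appears in the four Complexity × Order conditions. The function `realise` and its parameters are hypothetical, and how items were distributed over participants is not described here, so the sketch only generates the sentence frames.

```python
from itertools import product

ORDERS = ("joined", "split")
COMPLEXITIES = ("simple", "complex")
IDIOMATICITY = ("idiomatic", "transparent")

# 2 x 2 x 2 = 8 cells of the design
CELLS = list(product(IDIOMATICITY, COMPLEXITIES, ORDERS))


def realise(subject, verb, particle, head_np, np_modifier, adverbial, complexity, order):
    """Build one stimulus sentence for a given Complexity and Order condition."""
    obj = head_np if complexity == "simple" else f"{head_np} {np_modifier}"
    if order == "joined":
        vp = f"{verb} {particle} {obj}"
    else:
        vp = f"{verb} {obj} {particle}"
    return f"{subject} {vp} {adverbial}."


for complexity, order in product(COMPLEXITIES, ORDERS):
    sentence = realise("Harold", "looked", "up", "the address", "of the hotel",
                       "before the trip", complexity, order)
    print(f"{complexity:7} {order:6} {sentence}")
```

The printed sentences reproduce (6a) to (6d) in that order.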

Word order variation brings about a complication for reaction time experiments. In Harold looked the address up, for instance, the particle occurs in the final position, while the joined counterpart Harold looked up the address hosts a lexical noun in that position. In order to provide a more uniform point of measurement for each condition, a temporal adverbial was added as a clause-final constituent. A temporal PP was chosen to keep potential attachment ambiguities minimal: as the head nouns of the direct objects are non-event nouns, an interpretation of the temporal PP as a nominal modifier is implausible. The number of syllables of each word was kept constant across items and, apart from the temporal prepositions, each lexical item occurred only once in the trial.

The experimental material contained 32 target stimuli, 24 stimuli for another experiment, and 46 fillers. The fillers were followed by a yes/no question in order to make sure that participants stayed focused. The experiment used a mixed design, and each participant received the stimuli in a different, pseudo-randomised order.

10.3.3 Predictions

If corpus frequencies reflect processing effort, the following effects are expected. The non-canonical variant should overall be more difficult to process due to the distance between verb and particle. A complex NP and an idiomatic verb should enhance this difficulty.

H1 Order. There are longer reaction times for the split than for the joined variant.

H2 Interaction of Complexity and Order. In the split order, there are longer reaction times with complex NPs than with simple ones.

H3 Interaction of Idiomaticity and Order. In the split order, there are longer reaction times in idiomatic verb-particle combinations than in transparent ones.

H4 Interaction of Order, Idiomaticity, and Complexity. The relative advantage of the joined over the split order should be lowest with transparent particle verbs and simple NPs and highest for idiomatic particle verbs and complex NPs.

10.3.4 Participants and Procedure

The experiment was conducted at the University of Edinburgh. Fifty-one students (L1 speakers of British English) participated. They were paid for their participation. Participants received oral and written instructions. They were given practice items to familiarise themselves with the procedure before they started the experiment. The experiment was run using E-Prime 2.0.

10.3.5 Data and Analysis

One participant was excluded from the final data set because they had misunderstood the task and read the stimuli aloud. Nine participants were excluded because they gave eight or more wrong answers to the comprehension questions, which reduced the set to 41 participants. Table 10.1 illustrates the different points of measurement for the reaction times in the simple-NP condition.

Table 10.1 Points of measurement in self-paced reading experiment (simple-NP condition)

Order | Preceding words | Precritical | Critical | Preposition | Article | Noun
Joined | looked up | the | address | before | the | trip
Split | looked the | address | up | before | the | trip

Since the particle assumes different positions (it either follows or precedes the direct object), the main focus is on the material following the critical region. The following reaction times will be analysed in the subsequent sections: PrepRT, NounRT, and SentenceRT. PrepRT is the reaction time on the preposition following the critical region (i.e., before in Table 10.1). This point is used to see whether processing difficulties spill over to the following material. The final element of a clause has been shown to have longer reaction times (cf. Mitchell & Green 1978), an effect that is enhanced by any further processing difficulty. NounRT, the reaction time on the clause-final noun (trip in Table 10.1), serves to measure this potential wrap-up effect. SentenceRT is the cumulative RT of the entire clause.
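Because the temporal adverbial always closes the sentence, the five measured regions of Table 10.1 can be read off the last five words of a stimulus, whatever the order. The sketch assumes a whitespace-tokenised sentence ending in a three-word PP such as before the trip; `measurement_points` and `sentence_rt` are hypothetical helpers, and SentenceRT is taken as the plain sum of the word RTs.

```python
REGIONS = ("precritical", "critical", "preposition", "article", "noun")


def measurement_points(sentence):
    """Label the last five words of a stimulus with the regions of Table 10.1."""
    words = sentence.rstrip(".").split()
    if len(words) < len(REGIONS):
        raise ValueError("stimulus is too short")
    return dict(zip(REGIONS, words[-len(REGIONS):]))


def sentence_rt(word_rts):
    """SentenceRT: cumulative reading time over all words of the clause."""
    return sum(word_rts)


print(measurement_points("Harold looked up the address before the trip."))
print(measurement_points("Harold looked the address up before the trip."))
```

In the joined order the critical region is the head noun (address); in the split order it is the particle (up), while the preposition, article, and noun regions stay identical.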

For each dependent variable, data points that were more than 2.5 standard deviations away from the mean (by Item and by Participant) were removed. This resulted in a loss of 2–3.1% of the observations.
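The trimming step can be sketched with Python’s standard library. The chapter does not say whether the by-Item and by-Participant criteria were applied one after the other or jointly. This sketch therefore assumes they are applied jointly to the untrimmed data, so a point is dropped if it is extreme for its participant or for its item, and it uses the sample standard deviation. The field names `participant`, `item`, and `rt`, as well as the toy data, are invented for illustration.

```python
from statistics import mean, stdev


def group_stats(rows, key, value):
    """Mean and sample SD of `value` for each level of `key` (groups of 2+ rows only)."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row[value])
    return {g: (mean(v), stdev(v)) for g, v in groups.items() if len(v) > 1}


def is_extreme(x, stats, cutoff):
    """True if x lies more than `cutoff` SDs away from its group mean."""
    if stats is None:  # singleton group: no SD, so no criterion applies
        return False
    m, sd = stats
    return sd > 0 and abs(x - m) > cutoff * sd


def trim(rows, value="rt", keys=("participant", "item"), cutoff=2.5):
    """Drop rows that are extreme with respect to any of the grouping keys."""
    stats = {k: group_stats(rows, k, value) for k in keys}
    return [
        row for row in rows
        if not any(is_extreme(row[value], stats[k].get(row[k]), cutoff) for k in keys)
    ]


# Toy data: nine 300 ms readings and one 3000 ms reading from one participant
rows = [{"participant": "P1", "item": f"i{k}", "rt": 300} for k in range(9)]
rows.append({"participant": "P1", "item": "i9", "rt": 3000})
kept = trim(rows)
print(len(rows), "->", len(kept))  # 10 -> 9
```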

10.3.6 Statistical Models and Effects

As the experiment had a mixed design with repeated-measures elements, linear mixed-effects models (Baayen et al. 2008) were fitted, using the lme4 (Bates et al. 2015) and lmerTest (Kuznetsova et al. 2017) packages for R (version 4.2.2). The null model contained all fixed effects (to be discussed below) and a maximal random-effects structure with varying intercepts for Item and Participant, as well as varying slopes of Order, Complexity, Idiomaticity, and Trial for each Participant and Item. First, the random-effects structure was simplified stepwise using a principal components analysis, removing random effects that accounted for less than 5% of the variance. The fixed-effects structure was then reduced step by step, removing non-significant predictors one at a time, starting with the one with the highest p-value (Gries 2021). The final models hence contain a simpler random-effects structure and only significant fixed effects. To reduce non-normality in the residuals, the models were subjected to model criticism (cf. Baayen & Milin 2010): observations with absolute standardised residuals exceeding 2.5 standard deviations were removed. This resulted in an additional loss of 1.6–2.6% of the data points. Since many of the numeric variables showed a skewed distribution, they were transformed using base-2 logarithms (‘Log’) or Box-Cox transformations (‘Bcn’). To avoid convergence problems in model fitting arising from different scales, some of the variables were scaled (‘Sc’). An overview of the effects is presented in what follows.
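The variable transformations and the model-criticism criterion can be illustrated in Python, although the chapter itself used R. This is a sketch, not the chapter’s code. The Box-Cox λ is passed in by hand because the estimated values are not reported. `scale` mirrors R’s `scale()` by centring on the mean and dividing by the sample SD, and the residual criterion mirrors the common `abs(scale(resid(model))) < 2.5` idiom. Function names and example values are hypothetical.

```python
import math
from statistics import mean, stdev


def log2_transform(xs):
    """'Log' variables: logarithm to base 2 (values must be positive)."""
    return [math.log2(x) for x in xs]


def box_cox(xs, lam):
    """'Bcn' variables: Box-Cox transform of positive values for a given lambda."""
    if lam == 0:
        return [math.log(x) for x in xs]
    return [(x ** lam - 1) / lam for x in xs]


def scale(xs):
    """'Sc' variables: centre on the mean and divide by the sample SD."""
    m, sd = mean(xs), stdev(xs)
    return [(x - m) / sd for x in xs]


def keep_after_criticism(residuals, cutoff=2.5):
    """Model criticism: True for observations whose standardised residual is within the cutoff."""
    return [abs(z) <= cutoff for z in scale(residuals)]


rts = [412, 389, 455, 1210, 398]  # invented critical-region RTs in ms
crit_rt_bcn_sc = scale(box_cox(rts, lam=-0.5))  # lambda chosen for illustration only
print([round(v, 2) for v in crit_rt_bcn_sc])
```

After criticism the model would be refitted to the retained observations; that refitting step is not shown.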

  • Order The particle verb can be joined or split.

  • Complexity The direct object can be simple or complex.

  • Idiomaticity The particle verb can be transparent or idiomatic.

  • Order*Complexity*Idiomaticity Interaction term.

  • Trial and TrialSc The position of an item in the trial (and as scaled variable).

  • CritRTBcnSc The transformed and scaled reaction times in the critical region (i.e., the element preceding the temporal preposition).

  • PreCritRTBcnSc The transformed and scaled reaction times to the element preceding the critical region.

  • ArticleRTSc The scaled reaction times to the article (i.e., the penultimate element in the sentence).

  • SurprisalPrep The degree of surprisal of the preposition.

  • Age and AgeSc The age of the participant (and as scaled variable).

  • Gender The gender of the participant.

The variables CritRTBcnSc and PreCritRTBcnSc as well as ArticleRTSc and SurprisalPrep were added as controls. As the difference in ordering across conditions results in different lexical items in the critical and precritical regions, the reaction times at these previous points of measurement were included to control for a potential spillover onto PrepRT (see Bartek et al. 2001). The same rationale applies to the sentence-final point of measurement, NounRT; however, since the material preceding it is more uniform across conditions, only the RT of the immediately preceding word was included.

Since reaction times can also be influenced by the predictability of words (e.g., Levy 2008), the surprisal of the preposition was included as a control. It was computed as the negative binary logarithm of the bigram frequency of the preceding element and the preposition plus 1, divided by the frequency of the preceding element, i.e., −log2((f(w1 w2) + 1) / f(w1)) (see, e.g., Rühlemann & Gries 2020; see also Footnote 3). The frequency data were obtained from the British National Corpus (BNC), accessed via English Corpora. The fixed-effects structure of each final model is summarised in the Appendix (Tables 10.4, 10.5, 10.6, and 10.7).
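The surprisal control amounts to a one-line function. This sketch assumes that the ‘plus 1’ in the description above applies to the bigram count only; the corpus counts in the example are invented round numbers, not BNC frequencies, and `bigram_surprisal` is a hypothetical name.

```python
import math


def bigram_surprisal(bigram_freq, first_word_freq):
    """Surprisal (in bits) of the second word of a bigram given the first word.

    Computed as -log2((f(w1 w2) + 1) / f(w1)); the +1 keeps unattested
    bigrams from yielding an infinite surprisal.
    """
    if first_word_freq <= 0:
        raise ValueError("the first word must occur in the corpus")
    return -math.log2((bigram_freq + 1) / first_word_freq)


# Invented counts: f("address before") = 7, f("address") = 1024
print(bigram_surprisal(7, 1024))  # 7.0
print(bigram_surprisal(0, 1024))  # unattested bigram: 10.0
```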

10.3.7 Results

Table 10.2 summarises the raw mean reaction times for the three points of measurement.

Table 10.2 Mean reaction times in milliseconds and standard deviations (SD)

Condition | Preposition mean | Preposition SD | Noun mean | Noun SD | Sentence mean | Sentence SD
simple, transparent, joined | 250 | 112 | 450 | 231 | 3242 | 1142
simple, transparent, split | 232 | 110 | 417 | 214 | 3139 | 1151
complex, transparent, joined | 259 | 108 | 421 | 196 | 4419 | 1501
complex, transparent, split | 235 | 103 | 435 | 204 | 4294 | 1381
simple, idiomatic, joined | 247 | 119 | 424 | 228 | 3255 | 1245
simple, idiomatic, split | 237 | 109 | 424 | 227 | 3155 | 1057
complex, idiomatic, joined | 251 | 107 | 440 | 241 | 4402 | 1533
complex, idiomatic, split | 246 | 127 | 439 | 219 | 4396 | 1513
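The direction of the Order effect at the preposition is already visible in the raw means of Table 10.2. The short script below only subtracts the split mean from the joined mean in each cell; it does not replace the mixed-effects analysis, which additionally controls for spillover, surprisal, and trial position. `MEANS` and `joined_minus_split` are illustrative names.

```python
# Raw mean RTs (ms) from Table 10.2 as (joined, split) pairs per region
MEANS = {
    ("simple", "transparent"): {"prep": (250, 232), "noun": (450, 417), "sentence": (3242, 3139)},
    ("complex", "transparent"): {"prep": (259, 235), "noun": (421, 435), "sentence": (4419, 4294)},
    ("simple", "idiomatic"): {"prep": (247, 237), "noun": (424, 424), "sentence": (3255, 3155)},
    ("complex", "idiomatic"): {"prep": (251, 246), "noun": (440, 439), "sentence": (4402, 4396)},
}


def joined_minus_split(region):
    """Positive values mean the joined order was read more slowly than the split order."""
    diffs = {}
    for cell, regions in MEANS.items():
        joined, split = regions[region]
        diffs[cell] = joined - split
    return diffs


for region in ("prep", "noun", "sentence"):
    print(region, joined_minus_split(region))
```

At the preposition, the joined order is slower in all four cells (by 5 to 24 ms), in line with the modelled Order effect; at the noun, the raw differences show no consistent direction.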

The null model for PrepRTSc, the scaled reaction times to the preposition, contained Order, Idiomaticity, and Complexity as a three-way interaction, SurprisalPrep, the scaled and transformed reaction times on the two words preceding it (i.e., CritRTBcnSc and PreCritRTBcnSc), as well as Age and Gender. The random-effects structure was simplified to random intercepts for Item and Participant as well as a random slope of Complexity for each Participant. Order had a significant effect (t = 4.624, p = 0.000042), but it did not interact with Complexity and Idiomaticity. The latter two variables did not have a significant effect; neither did SurprisalPrep. Trial and the reaction times to the two previous elements had significant effects, as expected. Interestingly, the effect of Order is not as predicted: reaction times are higher for joined than for split, as illustrated in Figure 10.1. That means that the split verb facilitates processing in the spillover region. Hypotheses H1–H4 are thus not confirmed.

Figure 10.1 Predicted scaled reaction times (RTs) in the spillover region across orders (line graph; x-axis: verb–particle order, split vs. joined; y-axis: predicted scaled RTs, about -0.07 for split and 0.07 for joined, with error bars of roughly ±0.1)

As pointed out before, only the reaction time to the immediately preceding element was included as a control in the wrap-up region. The final model for NounRT has random intercepts for Participant and Item, and a random slope of Complexity for each Participant. It contains a significant effect of TrialSc (t = 8.783, p < 0.0001), the interaction of Complexity, Idiomaticity, and Order (t = -2.237, p = 0.025), as well as AgeSc (t = 3.018, p = 0.005). While the first two effects are expected, the last is not. However, the effect of Age – an increase in reaction time with increasing AgeSc – is caused by an outlier: while all other participants were aged 18 to 25, one participant was aged 39. In a subsequent model fitted to data excluding this participant, AgeSc was no longer significant.

Figure 10.2 illustrates the three-way interaction of Complexity, Idiomaticity, and Order. As can be seen, for both orders, complex NPs with idiomatic particle verbs result in a longer reaction time than simple phrases. This difference, however, is not significant. There are slightly longer reaction times for the split than for the joined order in the idiomatic condition, but this difference is not significant either.

Figure 10.2 Predicted logged and scaled reaction times (RTs) in the wrap-up region for the Order*Complexity*Idiomaticity interaction (panels: idiomatic, transparent; x-axis: verb–particle order, joined vs split; y-axis: predicted scaled logged RTs, -0.2 to 0.2; lines: complex vs simple)

For transparent verb-particle combinations, there are longer predicted reaction times for the split with complex NPs than for the joined (t = 2.334, p = 0.0198). For simple phrases with transparent verbs, the opposite effect is displayed: the joined order has longer predicted reaction times, but this difference fails to reach significance (t = 1.897, p = 0.0581). The difference between the complex and the simple condition with transparent verbs is significant for the split (t = 2.096, p = 0.037) but not for the joined order, thus providing support for H2, but only for transparent particle verbs. Figure 10.3 illustrates the same interaction from a different perspective.

Figure 10.3 Predicted logged and scaled reaction times (RTs) in the wrap-up region for the Idiomaticity*Order*Complexity interaction (panels: complex, simple; x-axis: idiomaticity, idiomatic vs transparent; y-axis: predicted scaled logged RTs, -0.2 to 0.2; lines: joined (solid) vs split (broken))

In the simple condition, reaction times are longer for splits with idiomatic than with transparent particle verbs, but this difference is not significant. For complex NPs, reaction times for split verbs are slightly longer in the transparent than in the idiomatic condition. This difference is not significant either. Hence, H3 and H4 do not receive support at the wrap-up position.

Finally, the cumulative reaction times to the entire sentence were analysed. Here, the surprisal variable and preceding reaction times were not included. The final model has a random intercept for Participant and a random slope of Complexity. Again, there is a significant three-way interaction of Order, Complexity, and Idiomaticity. As shown in Figure 10.4, there is a clear effect of Complexity, with much longer predicted reaction times in the complex condition.

Figure 10.4 Predicted logged and scaled reaction times (RTs) of the whole stimulus for the Order*Idiomaticity*Complexity interaction (panels: simple, complex; x-axis: verb–particle order, split vs joined; y-axis: predicted scaled logged RTs, -0.5 to 0.5; lines: idiomatic (broken) vs transparent (solid))

However, this is not surprising, because the complex condition contains three additional words. What is less expected, though, is an advantage of the split over the joined variant in the simple transparent condition (t = 2.717, p = 0.0067). Again, none of the four hypotheses is supported by the data.

10.3.8 Discussion

The reading study revealed several effects. In the spillover region, there is a speed-up in the split condition (i.e., a distance dependency facilitates reading). This effect is modulated neither by the complexity of the object nor by the semantic status of the particle verb. In the wrap-up region, there are effects of Complexity and Idiomaticity in the predicted direction: with idiomatic verbs, the split condition has longer reaction times than the joined, and complex phrases require more reading time. These differences, however, are not significant. The effect of Complexity is also found in the transparent condition: with complex NPs, the split requires more reading time than the joined order, and splits with complex NPs have longer reaction times than splits with simple objects, as expected. With simple objects and transparent verbs, in contrast, the joined verb has longer reaction times than the split. The relative advantage of the split order across Complexity and Idiomaticity conditions in the spillover position and the relative advantage of the joined order later in the sentence, in the simple and complex idiomatic conditions as well as in the complex transparent condition, cancel each other out, so that these differences are no longer significant at sentence level. The (non-significant) advantage of the split in the simple transparent condition in sentence-final position adds to the advantage of the split in the position following the critical region, making the simple transparent split condition the fastest for the whole sentence.

The results are rather unexpected: H1 and H2 are only partially confirmed, and only at one of three points of measurement. What is more, there is an effect that goes against the predicted direction. There are several possible explanations. First, the processing advantage of the split particle verb could be merely apparent, that is, a spillover effect caused by the different elements across conditions in the critical region. This, however, was controlled for by including the preceding reaction times as predictors. Apart from that, the effect persists in the transparent simple condition for the whole clause. A second explanation derives from expectation-based models of syntactic comprehension (Levy 2008). The higher the expectation of an item (and, likewise, the lower its surprisal), the easier it is to process. The more information the reader has about an upcoming element, that is, the more material precedes it, the more strongly they will expect that word, because the preceding words narrow down the number of alternatives. In the present study, the observed advantage of the split in the spillover region could result from a highly predictable particle. Thus, the verb followed by the direct object in looked the address creates a high expectation of the particle up, which could speed up reading. This holds not only for the idiomatic verb-particle combinations but also for the transparent ones (e.g., lift up, shave off), which could explain why Idiomaticity does not have an influence in the spillover region.
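The link between expectation and surprisal can be made concrete with a small calculation. The Python sketch below is an illustration only, assuming that surprisal is estimated as -log2 P(word | previous word) from maximum-likelihood bigram counts; the chapter does not state that SurprisalPrep was computed in exactly this way, and the toy corpus and the function names `bigram_counts` and `surprisal` are hypothetical. An unattested bigram has probability zero and therefore undefined surprisal, which the sketch returns as None.

```python
import math
from collections import Counter

def bigram_counts(tokens):
    """Count each token as the left member of a bigram, and count adjacent bigrams."""
    left = Counter(tokens[:-1])
    pairs = Counter(zip(tokens, tokens[1:]))
    return left, pairs

def surprisal(left, pairs, prev_word, word):
    """Surprisal in bits, -log2 P(word | prev_word), with the probability estimated
    as count(prev_word word) / count(prev_word). Returns None for an unattested
    bigram, since the log of zero is undefined."""
    n_pair = pairs[(prev_word, word)]
    if n_pair == 0:
        return None
    # log2(count(prev) / count(pair)) equals -log2 P(word | prev_word)
    return math.log2(left[prev_word] / n_pair)

# Hypothetical toy corpus, not the study's materials
tokens = ("they looked the address up . they looked the number up . "
          "they checked the address again .").split()
left, pairs = bigram_counts(tokens)
print(surprisal(left, pairs, "address", "up"))    # 1.0
print(surprisal(left, pairs, "number", "up"))     # 0.0
print(surprisal(left, pairs, "address", "over"))  # None
```

Lower values mean a more expected word: in this toy corpus, up is fully predictable after number (0 bits) but less so after address (1 bit), because address is also followed by again.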

It is not just the particle’s predictability that could cause the facilitation effect in the spillover position: if the particle follows the direct object NP, it signals that the multi-word verb and its argument are complete, which might then increase the predictability of a VP-adjunct. Since the particle is not followed by a punctuation mark and hence is not the clause-final element, an upcoming adjunct or a coordinator is the only syntactic option. This is not the case in the joined order: once readers have encountered the nominal head (or the nominal head in the post-head PP), they might expect a (further) post-head dependent of the noun, a dependent of the verb, or a coordinator; that is, there are more options. As Levy (2008) shows, expectations about upcoming constituency influence processing, so readers need not predict the exact identity of a word. This can explain why SurprisalPrep, the variable operationalising the degree of expectation of the preposition, turned out to be non-significant in all models. A higher expectation of a VP-adjunct is a more likely explanation for the speed-up at the preposition in the split condition. It also accounts for the fact that Complexity and Idiomaticity do not play a role at this point of measurement.

Even though the corpus-based predictions do not receive support from the reaction times to the preposition, the longer distance dependency does induce difficulties at a later point. While the reaction times in the wrap-up region are not significantly longer for split constructions than for joined ones, this effect cancels out their relative advantage earlier in the sentence in both the idiomatic (simple and complex) and the complex transparent condition. Mixed results for processing distance dependencies have been reported in the literature. On the basis of conflicting results for English relative clauses, Levy (2008: 1166) hypothesises that word-by-word processing difficulty is modulated by expectation but that ‘the retrieval and integration of a long-distance dependent incurs a substantial processing cost’. Vasishth and Drenhaus (2011: 69) also find both a processing cost and a facilitatory effect of distant elements (i.e., locality and anti-locality effects) in German relative clauses and conclude that expectation-based facilitation and the cost of distance dependencies ‘operate at different stages of processing’. Even though the split construction facilitates reading, possibly due to an increased expectation of a particle and/or a postverbal adjunct, the integration of the particle across a distance could still come at a certain cost, which surfaces later on. A complex NP, which is more difficult to process, could contribute to this difficulty in combination with an idiomatic particle verb, as predicted by Gries (2002a, 2003) and Lohse et al. (2004). This could explain why the relative advantage of the split only shows at sentence level in the simple transparent condition.

To sum up: the processing-based hypotheses from the corpus-linguistic literature are not supported by the present data. Implications will be discussed in detail in Section 10.5, but it should be pointed out here that even though the differences reported are significant, the effects are rather small.

10.4 Experiment 2: Split Rating Task

The split rating task (Bresnan & Ford 2010) is a judgment task that contrasts two alternatives which have to be rated according to naturalness. Participants are asked to compare the two alternatives and distribute 100 points between them to express their rating, for example, 50/50 if there is no difference or 10/90 if the second variant is much more natural than the first. Every combination that adds up to 100 is possible, which means the only option not available is to reject both variants (Bresnan & Ford 2010: 186). What is important to stress here is that this test does not measure the absolute acceptability of certain constructions but rather their relative acceptability.

10.4.1 Factors and Predictions

This experiment tested the same factors as the previous one: Order, Complexity, and Idiomaticity. If corpus frequencies reflect processing difficulties, participants should identify the easier of two variants as the more natural one. Previous corpus studies (Gries 2002a, 2003; Lohse et al. 2004) give rise to the following predictions: the non-canonical variant should receive lower ratings than the canonical, even more so with complex direct objects and/or idiomatic verbs.

H1 Order. There are lower ratings for the split than for the joined variant.

H2 Complexity and Order. In the split order, there are lower ratings with complex NPs than with simple ones.

H3 Idiomaticity and Order. In the split order, there are lower ratings in idiomatic verb-particle combinations than in transparent.

H4 Order, Idiomaticity, and Complexity. The relative preference of the joined over the split order should be lowest with simple NPs and transparent particle verbs and highest with complex NPs and idiomatic particle verbs.

10.4.2 Materials and Design

In the split rating task, the same items as in the self-paced reading experiment were used to establish one-to-one comparability. The comprehension questions were left out. Two variants were presented as minimal pairs, contrasting the two orders.

10.4.3 Participants and Procedure

The split rating task followed the reading experiment (i.e., there were the same participants in both experiments). To minimise potential priming effects on the rating task as well as fatigue effects, the first experiment was followed by a break during which participants filled in a questionnaire inquiring about demographic information. Participants received the same random order as in Experiment 1.

Again, E-Prime 2.0 was used. The procedure of the rating task was also identical to that of the preceding reading experiment, that is, oral and written instructions were followed by a set of practice items and the actual experiment. To avoid miscalculations, participants only had to enter the value they assigned to the first sentence of the pair and confirm this by pressing a button. The remaining points (100 minus this value) were assigned to the second sentence automatically.
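The point allocation can be sketched as follows. This is a hypothetical Python illustration, not the E-Prime 2.0 script used in the study: it assumes whole-number points and introduces a flag, `first_is_joined`, to record which variant was shown first, so that the entered value and the automatically computed remainder can be mapped onto the joined and split orders.

```python
def split_rating(first_points, first_is_joined, total=100):
    """Turn the single value entered for the first sentence of a pair into
    ratings for both variants; the second sentence receives the remainder."""
    # Assumption: points are whole numbers from 0 to total
    if not isinstance(first_points, int) or not 0 <= first_points <= total:
        raise ValueError(f"points must be a whole number from 0 to {total}")
    second_points = total - first_points
    if first_is_joined:
        return {"joined": first_points, "split": second_points}
    return {"joined": second_points, "split": first_points}

# 10/90: the second variant, here the joined one, is judged much more natural
print(split_rating(10, first_is_joined=False))
```

Because the two values in each pair always sum to 100, only relative acceptability is recorded, and rejecting both variants is impossible.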

10.4.4 Data and Statistical Models

As in Experiment 1, linear mixed-effects models were fitted. To establish direct comparability, the same participants as in Experiment 1 were excluded. Ratings more than 2.5 standard deviations away from the mean (by Item) were excluded, which reduced the data set to 1,305 observations. Again, the model was subjected to model criticism, which resulted in an additional 31 observations being removed. The model selection process was as described in Section 10.3.4, but the fixed-effect structure of the null model was simpler, since SurprisalPrep and reaction times were not included as controls. A summary of the final model is provided in the Appendix.
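The by-item exclusion criterion can be sketched in Python as below. This is a minimal sketch, assuming ratings are stored as dictionaries with the hypothetical keys `item` and `rating`, and using the sample standard deviation; the chapter specifies neither which standard deviation estimate nor which software was used for this step.

```python
from collections import defaultdict
from statistics import mean, stdev

def trim_by_item(rows, key="rating", group="item", k=2.5):
    """Drop observations more than k standard deviations from their item's mean."""
    by_item = defaultdict(list)
    for row in rows:
        by_item[row[group]].append(row[key])
    # Per-item mean and sample SD (SD of a single observation is set to 0)
    stats = {
        item: (mean(vals), stdev(vals) if len(vals) > 1 else 0.0)
        for item, vals in by_item.items()
    }
    kept = []
    for row in rows:
        m, sd = stats[row[group]]
        if abs(row[key] - m) <= k * sd:
            kept.append(row)
    return kept

# Hypothetical data: item A has one extreme rating of 100
rows = [{"item": "A", "rating": 50} for _ in range(9)]
rows += [{"item": "A", "rating": 100},
         {"item": "B", "rating": 40}, {"item": "B", "rating": 60}]
kept = trim_by_item(rows)
print(len(kept))  # 11
```

Grouping by item, rather than over the whole data set, means that an item which is rated unusually high or low overall does not lose its ratings; only values atypical for that item are removed.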

10.4.5 Results

Table 10.3 summarises the raw ratings across conditions.

Table 10.3 Mean ratings and standard deviations (SD) across conditions in Experiment 2
Condition                      Mean   SD
Simple, transparent, joined    44.9   17.6
Simple, transparent, split     43.0   17.2
Complex, transparent, joined   48.4   18.6
Complex, transparent, split    39.9   18.5
Simple, idiomatic, joined      51.5   18.5
Simple, idiomatic, split       36.2   16.8
Complex, idiomatic, joined     52.5   21.6
Complex, idiomatic, split      29.7   17.4

The final model has varying intercepts for Participant and Item and varying slopes of Order for each Participant and Item. The fixed effect structure contains three significant two-way-interaction terms: Order*Complexity (t = 5.293, p < 0.0001), Order*Idiomaticity (t = 4.713, p < 0.0001), and Complexity*Idiomaticity (t = 2.104, p = 0.0356). For both Complexity and Idiomaticity conditions, the joined order obtains higher ratings than the split, thus confirming H1. Figure 10.5 illustrates the first interaction.

Figure 10.5 Predicted ratings for the Order*Complexity interaction (x-axis: verb–particle order, joined vs split; y-axis: predicted rating, 30 to 60; lines: complex (solid) vs simple (broken))

In both the complex and the simple condition, the split receives lower ratings than the joined variant. Both differences are significant (simple: t = 3.488, p = 0.0009; complex: t = 6.513, p < 0.0001). As predicted, the difference is more pronounced in the complex condition (difference in predicted ratings: 18.7) than in the simple (difference in predicted ratings: 10.2). This provides support for H2. Figure 10.6 illustrates the effect of Order*Idiomaticity.

Figure 10.6 Predicted ratings for the Order*Idiomaticity interaction (x-axis: verb–particle order, joined vs split; y-axis: predicted rating, 20 to 60; lines: idiomatic (solid) vs transparent (broken))

The joined order has higher ratings than the split for both idiomatic (t = 7.040, p < 0.001) and transparent particle verbs (t = 1.921, p = 0.0596), even though the latter difference fails to reach significance. As predicted by H3, the relative advantage of the joined order is larger for idiomatic (difference in predicted ratings 21.8) than for transparent verbs (difference in predicted ratings 6.1). Figure 10.7 shows the final interaction.

Figure 10.7 Predicted ratings for the Complexity*Idiomaticity interaction (x-axis: complexity of the direct object, complex vs simple; y-axis: predicted rating, 30 to 60; lines: idiomatic (solid) vs transparent (broken))

For idiomatic verb-particle combinations, the simple-NP condition has higher ratings than the complex condition (t = 2.351, p = 0.0189). For transparent verbs, the simple condition receives slightly lower ratings than the complex condition; however, the difference is small and not significant (t = 0.678, p = 0.489). As predicted by H4, the relative preference for the joined order over the split is very low for simple NPs and transparent verb-particle combinations (estimate = 1.919, t = 0.581, p = 0.56307) and highest for complex NPs and idiomatic particle verbs (estimate = 26.043, t = 8.138, p < 0.0001). The remaining two combinations lie in between: for complex NPs and transparent verbs, the relative advantage of the joined order over the split is smaller but still significant (estimate = 10.394, t = 3.145, p = 0.0025). For simple NPs with idiomatic particle verbs, the relative preference for joined over split is higher than for complex NPs with transparent verbs and also significant (estimate = 17.568, t = 5.494, p < 0.0001).
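The ordering predicted by H4 can also be checked against the raw means in Table 10.3. The Python snippet below simply subtracts each split mean from the corresponding joined mean. These raw differences are not the model estimates reported above (1.919, 10.394, 17.568, and 26.043), which come from the fitted mixed-effects model, but they fall in the same rank order.

```python
# Raw mean ratings from Table 10.3: (complexity, idiomaticity) -> (joined, split)
means = {
    ("simple", "transparent"): (44.9, 43.0),
    ("complex", "transparent"): (48.4, 39.9),
    ("simple", "idiomatic"): (51.5, 36.2),
    ("complex", "idiomatic"): (52.5, 29.7),
}

# Joined-minus-split difference per condition, rounded to one decimal place
diffs = {cond: round(joined - split, 1) for cond, (joined, split) in means.items()}

# List conditions from the smallest to the largest preference for the joined order
ranking = sorted(diffs, key=diffs.get)
for cond in ranking:
    print(*cond, diffs[cond])
# simple transparent 1.9
# complex transparent 8.5
# simple idiomatic 15.3
# complex idiomatic 22.8
```

The smallest gap (simple NPs, transparent verbs) and the largest (complex NPs, idiomatic verbs) are exactly the two cells singled out by H4.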

10.4.6 Discussion

Interestingly, all four hypotheses are confirmed in this experiment. Participants prefer joined constructions over split ones. Both a complex direct object and an idiomatic particle verb add to the relative preference for joined constructions. This experiment thus mirrors Gries’ (2002a, 2003) and Lohse et al.’s (2004) corpus findings and stands in stark contrast to the experiment reported in Section 10.3. The implications of this mismatch will be addressed below.

10.5 General Discussion and Conclusion

This chapter has investigated effects of cognitive complexity on the choice between a canonical and a non-canonical verb-particle order, using an online experiment and an offline task. The most interesting finding of this chapter is the discrepancy between different types of data: the offline experiment supports the corpus-based hypotheses, the online task does not. There are several possible explanations. First, since Experiment 1 does not yield the expected results, there could be a flaw in the design. However, since the very same items tested on the same participants yield results as predicted in Experiment 2, this is unlikely. Second, the method might not be suitable: the actual differences in reading times are significant but small. Maybe differences in processing complexity between a two-word and a five-word NP are too subtle to be captured by an experimental method that still requires an active response, a press of a button, and hence is only ‘quasi’-online. A third potential reason also relates to the methods: in an offline task like the split rating task here, participants provide a reaction after processing has been completed, so subtle differences in processing load might not be relevant. A fourth reason relates to the question of whose processing efficiency is at issue, the speaker’s or the hearer’s. Lohse et al. (2004) point out that a minimal domain has advantages for both phrase recognition and production. Gries (2002a, 2003) considers the difference between the speaker’s and the hearer’s processing effort ‘not decisive’ (2002a: 287), making reference to Arnold et al. (2000), who claim that a processing facilitation for the speaker corresponds to a facilitation for the hearer as well.
However, corpus data are production data, whereas a reading experiment collects perception data. The mismatch found here could hence imply that speakers choose particular constructions not so much to reduce the cognitive load for the hearer as to minimise their own processing effort (see Kunter 2017 for a discussion of this matter for the genitive alternation and adjectival comparison). Konieczny (2000: 644), who found a similar mismatch between online and offline data on German relative clauses, considers processing costs of distance dependencies to be ‘primarily a production phenomenon’. This, again, is because the processing-based hypotheses receive support from corpus studies, and corpus data are production data. Konieczny (2000) suggests that judgment tasks, even though they are usually considered to be perception experiments, might comprise production components, because participants compare the sentence they rate against a possible alternative which they have to generate first. In the task used here, the split rating task, the alternative is given, that is, participants do not have to produce the structure. Still, comparing two alternatives to determine which one sounds more natural might involve production strategies. This could explain why the rating task reflects the corpus findings from previous studies.

As pointed out above, the results from Experiment 1 have to be taken with a grain of salt: the differences are significant but rather small. What is more, the experiment tested the phenomenon in one particular syntactic context, simple clauses with a postverbal temporal modifier. For more robust conclusions, further studies are needed. These should test a wider range of syntactic contexts and manipulate the complexity of the direct object to a greater extent than has been done here.

So, in the light of the above, which of the two variants is the non-canonical one? As pointed out in Section 10.2, the split particle verb has information-structural properties that could motivate its existence. The data presented here suggest that there could be an additional function: a split particle verb provides a clear signal of the boundaries of the direct object, which could reduce cognitive complexity, at least up to a certain cut-off point. These two aspects could be argued to motivate the existence of a non-canonical, discontinuous construction. The continuous order, in contrast, could also qualify as the non-canonical variant, whose use is motivated by similar processing factors: corpus studies have shown that the longer the direct object, the less likely the split, as discussed in Section 10.2. Even though this has not been tested in the experiment reported here, it seems plausible that this is because cognitive complexity increases once a distance dependency exceeds a certain length. A frequency-based approach to non-canonicity does not resolve the issue of categorisation either, as the distribution of the variants differs across modes and depends heavily on the nature of the direct object. This may not seem like a satisfying answer, but it highlights the fact that ‘the syntactic canon is elusive’ (Pham & Leuckert, Chapter 1 in this volume) and that we are looking at a moving target here.

Footnotes

Chapter 7 Introduction: Different Ways of Saying Different Things Non-Canonical Syntax in Registers of English

1 Our classification is not quite the same as the one proposed in the Introduction to this volume, which, apart from reordering, additions, and subtractions, proposes two more categories of non-canonicity: one for a lack of default formal marking (e.g., lack of number agreement) and one for the realisation of clause elements by expletives (as with it-clefts or existential clauses).

Chapter 8 The President wide awake at 3:14 AM tweeting about CNN: Informational Non-Canonical Reduced Structures in TV News Broadcasts

1 But see Biber and Conrad (2019: 174–221) and Biber and Egbert (2018) for a detailed discussion of written registers produced when writers do not avail themselves of the opportunities for careful edited production.

Chapter 9 What was it about it that you loved? Clefts in Evaluative Language

a The levels ‘superordinate’, ‘subordinate’, and ‘conjoin’ were used as classifications of the syntactic function of evaluations which occurred in the same sentence as, but outside, the cleft constructions themselves.

1 For a discussion of the differences between cleft clauses and relative clauses, cf., for example, Quirk et al. (1985: 1386–7).

2 For ease of comparison with the present corpus study, frequencies relating to the Longman Spoken and Written English Corpus (Biber et al. 2021) are also given per 100,000 words.

3 Biber and Zhang (2018: 99) distinguish these lexico-grammatical stance constructions from other expressions of ‘an (implicit) attitude or epistemic assessment’, which they call ‘evaluation’. Biber et al. (2021), however, use the term ‘stance’ synonymously with ‘evaluation’ and claim that it can also be expressed by devices other than lexico-grammatical ones.

4 For details on the precise syntactic characteristics of clefts, cf. Biber et al. (2021: 950–4); Quirk et al. (1985: 1383–9); Huddleston & Pullum (2002: 1414–27).

5 Details in brackets indicate the corpus (EVLA-Corpus vs. Control Corpus), then the number of the cleft in the respective corpus and, for the EVLA-Corpus, the register in which this construction occurred (cf. Table 9.2).

6 For more information on BookTube, see Perkins (2017) or Anderson Gold (2020).

7 For more information on Mukbang, see Choe (2019) or Kircaburun et al. (2020).

8 A non-finite clause may replace the adnominal relative clause in paraphrased WHCLs and RWHCLs such as The conservatory is the place to lap up the sun […] (EVLA-120-LP).

9 This approach assumes the existence of non-evaluative language. Alternatively, these clefts could be regarded as expressions of a high degree of commitment to the truth of these propositions (i.e., as cases of Epistemic modality).

10 These differences might be due to the broader range of registers contained in these general corpora or to differences in operationalisation of the variable ‘type of cleft’.

11 Sections given in square brackets provide context from outside the clefts.

Chapter 10 Cognitive Complexity and Non-Canonicity: Zooming in on Particle Placement

1 Length and complexity are closely related – the more embedded phrases, the more words the NP will comprise (cf., e.g., Berlage 2014). Yet, there are studies which suggest that the syntactic category of the nominal dependent does play a role when length is controlled for (e.g., Ferreira 1991; Wasow & Arnold 2003). Gries (2002a, 2003) finds that, for particle placement, both length (word and syllable count) and complexity are predictors.

2 The test was originally developed for classifying dependency relations between verbs and prepositional phrases (Hawkins 2000). If an element is entailed, it is independent. The verb-particle combination lift up as in they lifted up the child, for instance, contains an independent verb and an independent particle because both they lifted the child and the child goes up are entailed. In look up as in they looked up the number, in contrast, both the particle and the verb are dependent because this entails neither that *they looked the number nor that the number is/becomes/comes/goes/stays up.

3 Some of the bigrams were not attested, which would have resulted in undefined values.

References

Alemán Bañón, José & Martin, Clara (2019). Anticipating information structure: An event-related potentials study of focus assignment via the it-cleft. Neuropsychologia, 134, 107203. https://doi.org/10.1016/j.neuropsychologia.2019.107203
Bender, Emily, Gebru, Timnit, McMillan-Major, Angelina, & Shmitchell, Shmargaret (2021). On the dangers of stochastic parrots: Can language models be too big? Proceedings of FAccT 2021, 610–23. https://doi.org/10.1145/3442188.3445922
Biber, Douglas (1991). Variation across speech and writing. Cambridge: Cambridge University Press.
Biber, Douglas (2012). Register as a predictor of linguistic variation. Corpus Linguistics and Linguistic Theory, 8(1), 9–37.
Biber, Douglas & Conrad, Susan (2019). Register, genre, and style. Second edition. Cambridge: Cambridge University Press.
Biber, Douglas, with Egbert, Jesse, Gray, Bethany, Oppliger, Rahel, & Szmrecsanyi, Benedikt (2016). Variationist versus text-linguistic approaches to grammatical change in English: Nominal modifiers of head nouns. In Kytö, Merja & Pahta, Päivi, eds, The Cambridge handbook of English historical linguistics. Cambridge: Cambridge University Press, 351–75.
Biber, Douglas & Egbert, Jesse (2018). Register variation online. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781316388228
Biber, Douglas & Egbert, Jesse (2023). What is a register? Accounting for linguistic and situational variation within – and outside of – textual varieties. Register Studies, 5(1), 1–22. https://doi.org/10.1075/rs.00004.bib
Biber, Douglas, Johansson, Stig, Leech, Geoffrey N., Conrad, Susan, & Finegan, Edward (2021). Grammar of spoken and written English. Amsterdam: John Benjamins. https://doi.org/10.1075/z.232
Birner, Betty J. (2018). On constructions as a pragmatic category. Language, 94(2), e158–e179. https://doi.org/10.1353/lan.2018.0031
Bohmann, Axel (2016). Grammatical change because Twitter? Factors motivating innovative uses of because across the English-speaking Twittersphere. In Squires, Lauren, ed., English in computer-mediated communication: Variation, representation, and change. Berlin: Mouton de Gruyter, 149–78.
Clarke, Isobelle (2022). Register and social media. Register Studies, 4(2), 133–7.
COCA The Corpus of Contemporary American English (520 million words, 1990–present) (2008–). Compiled by Mark Davies. Retrieved from http://corpus.byu.edu/coca/.
Dehé, Nicole (2002). Particle verbs in English: Syntax, information structure, and intonation. Amsterdam: John Benjamins. https://doi.org/10.1075/la.59
Diessel, Holger & Tomasello, Michael (2005). Particle placement in early child language: A multifactorial analysis. Corpus Linguistics and Linguistic Theory, 1(1), 89–112.
Dijk, Chantal N. van, van Witteloostuijn, Merel, Vasić, Nada, Avrutin, Sergey, & Blom, Elma (2016). The influence of texting language on grammar and executive functions in primary school children. PLoS ONE, 11(3), e0152409. https://doi.org/10.1371/journal.pone.0152409
Dorgeloh, Heidrun & Wanner, Anja (2023). Discourse syntax: English grammar beyond the sentence. Cambridge: Cambridge University Press.
Dorgeloh, Heidrun & Wanner, Anja, eds (2010). Syntactic variation and genre. Berlin: Mouton de Gruyter. https://doi.org/10.1515/9783110226485
Eckert, Penelope & Rickford, John R. (2002). Style and sociolinguistic variation. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511613258
Egbert, Jesse & Mahlberg, Michaela (2020). Fiction – one register or two? Speech and narration in novels. Register Studies, 2(1), 72–101. https://doi.org/10.1075/rs.19006.egb
Ferguson, Charles A. (1994). Dialect, register, and genre: Working assumptions about conventionalization. In Biber, Douglas & Finegan, Edward, eds, Sociolinguistic perspectives on register. Oxford: Oxford University Press, 5–30.
Goulart, Larissa, Biber, Douglas, & Reppen, Randi (2022). In this essay, I will …: Examining variation of communicative purpose in university writing. Journal of English for Academic Purposes, 59, 101159. https://doi.org/10.1016/j.jeap.2022.101159
Grafmiller, Jason (2014). Variation in English genitives across modality and genres. English Language and Linguistics, 18(3), 471–96. https://doi.org/10.1017/S1360674314000136
Gries, Stefan Th. (2003). Multifactorial analysis in corpus linguistics: A study of particle placement. New York: Continuum.
Haegeman, Liliane (2013). The syntax of registers: Diary subject omission and the privilege of the root. Lingua, 130, 88–110. https://doi.org/10.1016/j.lingua.2013.01.005
Halliday, Michael A. K. (1978). Language as social semiotic. London: Edward Arnold.
Hawkins, John A. (1994). A performance theory of order and constituency. Cambridge: Cambridge University Press.
Hawkins, John A. (2004). Efficiency and complexity in grammars. Oxford: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199252695.001.0001
Hedberg, Nancy (1990). Discourse pragmatics and cleft sentences in English. PhD thesis, Minneapolis: University of Minnesota Press.
Huddleston, Rodney & Pullum, Geoffrey K., eds (2002). The Cambridge grammar of the English language. Cambridge: Cambridge University Press.
ICE International Corpus of English (1990–). Retrieved from www.ice-corpora.uzh.ch/en.html.
Jeffries, Lesley & McIntyre, Daniel (2010). Stylistics. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511762949
Jucker, Andreas H. (1992). Social stylistics: Syntactic variation in British newspapers. Berlin: Mouton de Gruyter. https://doi.org/10.1515/9783110851151
Kaltenböck, Gunther (2005). It-extraposition in English: A functional view. International Journal of Corpus Linguistics, 10(2), 119–59. https://doi.org/10.1075/ijcl.10.2.02kal
Labov, William (1972). Sociolinguistic patterns. Philadelphia: University of Pennsylvania Press.
Liimatta, Aatu (2019). Exploring register variation on Reddit: A multi-dimensional study of language use on a social media website. Register Studies, 1(2), 269–95. https://doi.org/10.1075/rs.18005.lii
Lohse, Barbara, Hawkins, John A., & Wasow, Thomas (2004). Domain minimization in English verb-particle constructions. Language, 80(2), 238–61. https://doi.org/10.1353/lan.2004.0089
Nariyama, Shigeko (2004). Subject ellipsis in English. Journal of Pragmatics, 36(2), 237–64.
Page, Ruth, Barton, David, Lee, Carmen, Unger, Johann Wolfgang, & Zappavigna, Michele (2022). Researching language and social media: A student guide. London: Routledge. https://doi.org/10.4324/9781003121763
Quaglio, Paulo (2009). Television dialogue: The sitcom Friends vs. natural conversation. Amsterdam: John Benjamins.
Scheffler, Tatjana, Kern, Lesley-Ann, & Seemann, Hannah (2022). The medium is not the message: Individual level register variation in blogs vs. tweets. Register Studies, 4(2), 171–201. https://doi.org/10.1075/rs.22009.sch
Schilling-Estes, Natalie (2004). Investigating stylistic variation. In Chambers, Jack K., Trudgill, Peter, & Schilling-Estes, Natalie, eds, The handbook of language variation and change. Oxford: Blackwell, 375–401. https://doi.org/10.1002/9780470756591.ch15
Seoane, Elena (2006). Changing styles: On the recent evolution of scientific British and American English. In Dalton-Puffer, Christiane, Kastovsky, Dieter, Ritt, Nikolaus, & Schendel, Herbert, eds, Syntax, style and grammatical norms: English from 1500–2000. Bern: Peter Lang, 191–211.
Swales, John (1990). Genre analysis: English in academic and research settings. Cambridge: Cambridge University Press.
Szmrecsanyi, Benedikt (2019). Register in variationist linguistics. Register Studies, 1(1), 76–99. https://doi.org/10.1075/rs.18006.szm
Teddiman, Laura & Newman, John (2007). Subject ellipsis in English: Construction of and findings from a diary corpus. Paper presented at the 26th International Conference on Lexis and Grammar, Bonifacio, France, 2–6 October. Retrieved from https://www.researchgate.net/publication/255608296_Subject_Ellipsis_in_English_Construction_of_and_Findings_from_a_Diary_Corpus.
Thompson, Sandra A. & Mulac, Anthony (1991). The discourse conditions for the use of the complementizer that in conversational English. Journal of Pragmatics, 15(3), 237–51. https://doi.org/10.1016/0378-2166(91)90012-M
Zappavigna, Michele (2018). Searchable talk: Hashtags and social media metadiscourse. Amsterdam: John Benjamins.
Zhang, Difei (Lynn) (2023). Flipping those pages, swiping that screen: A corpus-based analysis of the digital transformation of the news register. PhD thesis, University of Wisconsin-Madison. ProQuest Dissertations Publishing, 30316227.
Zhang, Guiping (2015). It is suggested that … or it is better to …? Forms and meanings of subject it-extraposition in academic and popular writing. Journal of English for Academic Purposes, 20, 1–13.

References

Access World News (2024). Online database by NewsBank Inc.
Biber, Douglas (1988). Variation across speech and writing. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511621024
Biber, Douglas (1992). On the complexity of discourse complexity: A multidimensional analysis. Discourse Processes, 15(2), 133–63. https://doi.org/10.1080/01638539209544806
Biber, Douglas & Gray, Bethany (2016). Grammatical complexity in academic English: Linguistic change in writing. Cambridge: Cambridge University Press.
Biber, Douglas, Gray, Bethany, Staples, Shelley, & Egbert, Jesse (2022). The register-functional approach to grammatical complexity. London: Routledge.
Biber, Douglas & Egbert, Jesse (2018). Register variation online. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781316388228
Biber, Douglas, Johansson, Stig, Leech, Geoffrey N., Conrad, Susan, & Finegan, Edward (2021). Grammar of spoken and written English. Amsterdam: John Benjamins.
Biber, Douglas & Conrad, Susan (2019). Register, genre, and style. Second edition. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781108686136
Biber, Douglas, Larsson, Tove, & Hancock, Gregory R. (2023). The linguistic organization of grammatical text complexity: Comparing the empirical adequacy of theory-based models. Corpus Linguistics and Linguistic Theory, 20(2), 347–73. https://doi.org/10.1515/cllt-2023-0016
Biber, Douglas, Larsson, Tove, & Hancock, Gregory R. (2024). Dimensions of text complexity in the spoken and written modes: A comparison of theory-based models. Journal of English Linguistics, 52(1), 65–94.
Bruthiaux, Paul (1996). The discourse of classified advertising: Exploring the nature of linguistic simplicity. Oxford: Oxford University Press. https://doi.org/10.1093/oso/9780195100327.001.0001
Ferguson, Charles A. (1971). Absence of copula and the notion of simplicity: A study of normal speech, baby talk, foreigner talk and pidgins. In Hymes, Dell, ed., Pidginization and creolization of languages. Cambridge: Cambridge University Press, 141–50.
Ferguson, Charles A. (1982). Simplified registers and linguistic theory. In Obler, Loraine K. & Menn, Lise, eds, Exceptional language and linguistics. New York: Academic Press, 49–66.
Ferguson, Charles A. (1983). Sports announcer talk: Syntactic aspects of register variation. Language in Society, 12(2), 153–72. https://doi.org/10.1017/S0047404500009787
Janda, Richard D. (1985). Note-taking English as a simplified register. Discourse Processes, 8(4), 437–54. https://doi.org/10.1080/01638538509544626
LSWE Corpus Longman Spoken and Written English Corpus (1999). Compiled by Longman for the Longman grammar of spoken and written English. Harlow: Longman.
Montgomery, Martin (2007). The discourse of broadcast news. London: Routledge. https://doi.org/10.4324/9780203006634
Moos, Julie (2011). Tyndall: ABC spends half its evening newscast on ‘soft news’. Retrieved from www.poynter.org/reporting-editing/2011/tyndall-abc-spends-half-its-evening-newscast-on-soft-news/.
Postman, Neil & Powers, Steve (2008). How to watch TV news. New York: Penguin.
Stelter, Brian (2012). Big three newscasts are changing the state of play. Retrieved from www.nytimes.com/2012/01/09/business/media/at-abc-cbs-and-nbc-news-accentuating-the-differences.html.

References

Anderson Gold, Tara (2020). A book club for the 21st century: An ethnographic exploration of BookTube. PhD thesis, University of North Carolina.
Biber, Douglas & Finegan, Edward (1989). Styles of stance in English: Lexical and grammatical marking of evidentiality and affect. Text: Interdisciplinary Journal for the Study of Discourse, 9(1), 93–124.
Biber, Douglas & Zhang, Meixiu (2018). Expressing evaluation without grammatical stance: Informational persuasion on the web. Corpora, 13(1), 97–123. https://doi.org/10.3366/cor.2018.0137
Biber, Douglas, Johansson, Stig, Leech, Geoffrey N., Conrad, Susan, & Finegan, Edward (2021). Grammar of spoken and written English. Amsterdam: John Benjamins. https://doi.org/10.1075/z.232
Birner, Betty J. & Ward, Gregory (1998). Information status and noncanonical word order in English. Amsterdam: John Benjamins. https://doi.org/10.1075/slcs.40
Calude, Andreea S. (2007). Demonstrative clefts in spoken English. PhD thesis, University of Auckland.
Chafe, Wallace L. (1986). Evidentiality in English conversation and academic writing. In Chafe, Wallace & Nichols, Joanna, eds, Evidentiality: The linguistic coding of epistemology. Norwood: Ablex, 261–72.
Choe, Hanwool (2019). Eating together multimodally: Collaborative eating in mukbang, a Korean livestream of eating. Language in Society, 48(2), 171–208. https://doi.org/10.1017/S0047404518001355
Collins, Peter (1991). Cleft and pseudo-cleft constructions in English. London: Routledge.
Collins, Peter (2006). It-clefts and wh-clefts: Prosody and pragmatics. Journal of Pragmatics, 38(10), 1706–20. https://doi.org/10.1016/j.pragma.2005.03.015
Gast, Volker & Levshina, Natalia (2014). Motivating w(h)-Clefts in English and German: A hypothesis-driven parallel corpus study. In De Cesare, Anna-Maria, ed., Frequency, forms and functions of cleft constructions in Romance and Germanic. Berlin: Mouton de Gruyter, 377–414. https://doi.org/10.1515/9783110361872.377
Halliday, Michael A. K. (1967). Notes on transitivity and theme in English. Part 2. Journal of Linguistics, 3(1), 199–244. https://doi.org/10.1017/S0022226700016613
Hedberg, Nancy (2000). The referential status of clefts. Language, 76(4), 891–920. https://doi.org/10.2307/417203
Huddleston, Rodney & Pullum, Geoffrey K., eds (2002). The Cambridge grammar of the English language. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781316423530
Hunston, Susan (2011). Corpus approaches to evaluation: Phraseology and evaluative language. London: Routledge.
Hunston, Susan & Sinclair, John (2000). A local grammar of evaluation. In Hunston, Susan & Thompson, Geoff, eds, Evaluation in text: Authorial stance and the construction of discourse. Oxford: Oxford University Press, 74–101. https://doi.org/10.1093/oso/9780198238546.003.0005
Hyland, Ken (1998). Hedging in scientific research articles. Amsterdam: John Benjamins. https://doi.org/10.1075/pbns.54
Keenan, Edward (1971). Two kinds of presupposition in natural language. In Fillmore, Charles & Langendoen, D. Terence, eds, Studies in linguistic semantics. New York: Holt, 45–54.
Kircaburun, Kagan, Harris, Andrew, Calado, Filipa, & Griffiths, Mark D. (2020). The psychology of Mukbang watching: A scoping review of the academic and non-academic literature. International Journal of Mental Health and Addiction, 19, 1190–213. https://doi.org/10.1007/s11469-019-00211-0
Martin, James R. & White, Peter R. R. (2005). The language of evaluation: Appraisal in English. Houndmills: Palgrave Macmillan. https://doi.org/10.1057/9780230511910
Martin, James R. (2000). Beyond exchange: APPRAISAL systems in English. In Hunston, Susan & Thompson, Geoff, eds, Evaluation in text: Authorial stance and the construction of discourse. Oxford: Oxford University Press, 143–75.
Ochs, Elinor & Schieffelin, Bambi (1989). Language has a heart. Text: Interdisciplinary Journal for the Study of Discourse, 9(1), 7–26.
Palmer, Frank R. (1986). Mood and modality. Cambridge: Cambridge University Press.
Perkins, Kathryn (2017). The boundaries of BookTube. The Serials Librarian, 73(3–4), 352–6.
Prince, Ellen F. (1978). A comparison of wh-clefts and it-clefts in discourse. Language, 54(4), 883–906.
Prince, Ellen F. (1981). Toward a taxonomy of given-new information. In Cole, Peter, ed., Radical pragmatics. New York: Academic Press, 223–55.
Prince, Ellen F. (1992). The ZPG letter: Subjects, definiteness, and information-status. In Mann, William C. & Thompson, Sandra A., eds, Discourse description: Diverse linguistic analyses of a fund-raising text. Amsterdam: John Benjamins, 295–326. https://doi.org/10.1075/pbns.16.12pri
Quirk, Randolph, Greenbaum, Sidney, Leech, Geoffrey N., & Svartvik, Jan (1985). A comprehensive grammar of the English language. London: Longman.
Thompson, Geoff & Hunston, Susan (2000). Evaluation: An introduction. In Hunston, Susan & Thompson, Geoff, eds, Evaluation in text: Authorial stance and the construction of discourse. Oxford: Oxford University Press, 1–27.
Ward, Gregory & Birner, Betty (2004). Information structure and non-canonical syntax. In Horn, Laurence R. & Ward, Gregory, eds, The handbook of pragmatics. Malden: Blackwell, 153–74.
Ward, Gregory, Birner, Betty, & Huddleston, Rodney (2002). Information packaging. In Huddleston, Rodney & Pullum, Geoffrey K., eds, The Cambridge grammar of the English language. Cambridge: Cambridge University Press, 1363–448.
Weinert, Regina & Miller, Jim (1996). Cleft constructions in spoken language. Journal of Pragmatics, 25(2), 173–206. https://doi.org/10.1016/0378-2166(94)00079-4

References

Arnold, Jennifer E., Losongco, Anthony, Wasow, Thomas, & Ginstrom, Ryan (2000). Heaviness vs. newness: The effects of structural complexity and discourse status on constituent ordering. Language, 76(1), 28–55. https://doi.org/10.1353/lan.2000.0045
Baayen, R. Harald & Milin, Petar (2010). Analyzing reaction times. International Journal of Psychological Research, 3(2), 12–28. https://doi.org/10.21500/20112084.807
Baayen, R. Harald, Davidson, Douglas, & Bates, Douglas M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59(4), 390–412. https://doi.org/10.1016/j.jml.2007.12.005
Bartek, Brian, Lewis, Richard L., Vasishth, Shravan, & Smith, Mason R. (2001). In search of on-line locality effects in sentence comprehension. Journal of Experimental Psychology: Learning, Memory, and Cognition, 37(5), 1178–98.
Bates, Douglas, Maechler, Martin, Bolker, Ben, & Walker, Steve (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. https://doi.org/10.18637/jss.v067.i01
Behaghel, Otto (1932). Deutsche Syntax, Volume IV. Heidelberg: Winter.
Berlage, Eva (2009). Prepositions and postpositions. In Rohdenburg, Günter & Schlüter, Julia, eds, One language, two grammars? Differences between British and American English. Cambridge: Cambridge University Press, 130–48.
Berlage, Eva (2014). Noun phrase complexity in English. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9781139057684
Biber, Douglas, Johansson, Stig, Leech, Geoffrey N., Conrad, Susan, & Finegan, Edward (1999). Longman grammar of spoken and written English. Harlow: Longman.
BNC The British National Corpus, CQPweb version compiled by Mark Davies (2004). Distributed by Oxford University Computing Services on behalf of the BNC Consortium. Retrieved from www.natcorp.ox.ac.uk/.
Bresnan, Joan W. & Ford, Marilyn (2010). Predicting syntax: Processing dative constructions in American and Australian varieties of English. Language, 86(1), 186–213. https://doi.org/10.1353/lan.0.0189
Chen, Ping (1986). Discourse and particle movement in English. Studies in Language, 10(1), 79–95. https://doi.org/10.1075/sl.10.1.05che
COCA The Corpus of Contemporary American English (520 million words, 1990–present) (2008–). Compiled by Mark Davies. Retrieved from http://corpus.byu.edu/coca/.
Dehé, Nicole (2002). Particle verbs in English: Syntax, information structure and intonation. Amsterdam: John Benjamins.
Ferreira, Fernanda (1991). Effects of length and syntactic complexity on initiation times for prepared utterances. Journal of Memory and Language, 30(2), 210–33. https://doi.org/10.1016/0749-596X(91)90004-4
Fraser, Bruce (1976). The verb-particle combination in English. New York: Academic Press.
Gries, Stefan Th. (2002a). The influence of processing on syntactic variation: Particle placement in English. In Dehé, Nicole, Jackendoff, Ray, McIntyre, Andrew, & Urban, Silke, eds, Verb-particle explorations. Berlin: Mouton de Gruyter, 269–88.
Gries, Stefan Th. (2002b). Preposition stranding in English: Predicting speakers’ behaviour. In Samiian, Vida, ed., Proceedings of the Western conference on linguistics. Fresno: California State University, 230–4.
Gries, Stefan Th. (2003). Multifactorial analysis in corpus linguistics: A study of particle placement. London: Continuum Press.
Gries, Stefan Th. (2021). (Generalized linear) Mixed-effects modeling: A learner corpus example. Language Learning, 71(3), 757–98. https://doi.org/10.1111/lang.12448
Hawkins, John (2000). The relative order of prepositional phrases in English: Going beyond manner-place-time. Language Variation and Change, 11(3), 231–66.
Hawkins, John (2003). Why are zero-marked phrases close to their heads? In Rohdenburg, Günter & Mondorf, Britta, eds, Determinants of grammatical variation in English. Berlin: Mouton de Gruyter, 175–204. https://doi.org/10.1515/9783110900019.175
Hawkins, John (2014). Cross-linguistic variation and efficiency. Oxford: Oxford University Press.
Hoffmann, Thomas (2011). Preposition placement in English. Cambridge: Cambridge University Press.
Just, Marcel Adam & Carpenter, Patricia (1980). A theory of reading: From eye fixations to comprehension. Psychological Review, 87(4), 329–54. https://doi.org/10.1037/0033-295X.87.4.329
Konieczny, Lars (2000). Locality and parsing complexity. Journal of Psycholinguistic Research, 29(6), 627–45.
Kunter, Gero (2017). Processing complexity and the alternation between analytic and synthetic forms in English. Habilitation thesis, Heinrich-Heine-Universität Düsseldorf.
Kuznetsova, Alexandra, Brockhoff, Per B., & Christensen, Rune H. B. (2017). lmerTest package: Tests in linear mixed effects models. Journal of Statistical Software, 82(13), 1–26. https://doi.org/10.18637/jss.v082.i13
Levy, Roger (2008). Expectation-based syntactic comprehension. Cognition, 106(3), 1126–77. https://doi.org/10.1016/j.cognition.2007.05.006
Lohse, Barbara, Hawkins, John A., & Wasow, Thomas (2004). Domain minimization in English verb-particle constructions. Language, 80(2), 238–61. https://doi.org/10.1353/lan.2004.0089
Mitchell, D. C. & Green, David W. (1978). The effects of context and content on immediate processing in reading. Quarterly Journal of Experimental Psychology, 30(4), 609–36. https://doi.org/10.1080/14640747808400689
Mondorf, Britta (2003). Support for more support. In Rohdenburg, Günter & Mondorf, Britta, eds, Determinants of grammatical variation in English. Berlin: Mouton de Gruyter, 251–304. https://doi.org/10.1515/9783110900019.251
Rohdenburg, Günter (2003). Cognitive complexity and horror aequi as factors determining the use of interrogative clause linkers in English. In Rohdenburg, Günter & Mondorf, Britta, eds, Determinants of grammatical variation in English. Berlin: Mouton de Gruyter, 205–49. https://doi.org/10.1515/9783110900019
Rühlemann, Christoph & Gries, Stefan Th. (2020). Speakers advance-project turn completion by slowing down: A multifactorial corpus analysis. Journal of Phonetics, 80, 100976. https://doi.org/10.1016/j.wocn.2020.100976
Vasishth, Shravan & Drenhaus, Heiner (2011). Locality in German. Dialogue and Discourse, 1(2), 59–82. https://doi.org/10.5087/dad.2011.104
Wasow, Thomas & Arnold, Jennifer (2003). Post-verbal constituent ordering in English. In Rohdenburg, Günter & Mondorf, Britta, eds, Determinants of grammatical variation in English. Berlin: Mouton de Gruyter, 119–54.

Table 8.1 Summary of the 2018 TVNB Corpus
Figure 8.1 NCRS types in TV news broadcasts versus conversation (rate per 1,000 words)
Figure 8.2 NCRS types across segments of TV news broadcasts (rate per 1,000 words)
Figure 8.3 NP NCRS types in headlines versus other segments (rate per 1,000 words)
Figure 8.4 NCRS types across networks (rate per 1,000 words)
Table 9.1 Summary of cleft types
Table 9.2 Composition of the EVLA-Corpus and word count
Table 9.3 Cleft-related variables and levels used in data annotation
Table 9.4 Evaluation-related variables and levels used in data annotation
Figure 9.1 Cleft constructions in the subcorpora of the EVLA-Corpus and the Control Corpus (normalised frequencies per 100,000 words)
Figure 9.2 Evaluative and non-evaluative cleft constructions in the EVLA-Corpus and the Control Corpus (relative frequencies)
Figure 9.3 Syntactic positions of evaluations in the EVLA-Corpus and the Control Corpus (relative frequencies)
Table 10.1 Points of measurement in self-paced reading experiment
Table 10.2 Mean reaction times in milliseconds and standard deviations (SD)
Figure 10.1 Predicted scaled reaction times (RTs) in the spillover region across orders
Figure 10.2 Predicted logged and scaled reaction times (RTs) in the wrap-up region for the Order*Complexity*Idiomaticity interaction
Figure 10.3 Predicted logged and scaled reaction times (RTs) in the wrap-up region for the Idiomaticity*Order*Complexity interaction
Figure 10.4 Predicted logged and scaled reaction times (RTs) of the whole stimulus for the Order*Idiomaticity*Complexity interaction
Table 10.3 Mean ratings and standard deviations (SD) across conditions in Experiment 2
Figure 10.5 Predicted ratings for the Order*Complexity interaction
Figure 10.6 Predicted ratings for the Order*Idiomaticity interaction
Figure 10.7 Predicted ratings for the Complexity*Idiomaticity interaction
