12.1 Introduction
The following quote illustrates a non-canonical sentence pattern that, across registers, stands out for being highly expressive and productive: the introductory-it pattern (henceforth intro-it), also referred to as ‘(it-)extrapositioning’ (Kaltenböck 2004) or as ‘preparatory’ or ‘anticipatory it’ (Biber et al. 1999; Hewings & Hewings 2002).
It was reported to Minivan News that at 5 pm around 20–25 people descended upon the Maldivian High Commission in Colombo with placards demanding the release of the hunger-strikers still held in jail […].
Given its information-structural value, the pattern is common in both the spoken and the written mode (Biber et al. 1999) and has therefore received considerable scholarly attention in English as a Native Language (ENL) and English as a Foreign Language (EFL) contexts. There, it has been shown to offer a wide range of functional and structural options but also to be preferred in particular registers (cf. Section 12.2). For English as a Second Language (ESL) varieties, however, research is surprisingly scarce. This also applies to the Englishes spoken in South Asia, whose speakers in fact make up the largest group of ESL speakers worldwide (Bolton 2008: 6). The history of English in Asia began in the early seventeenth century, when the foundation of the East India Company in 1600 marked the beginning of British trading, and ultimately imperial, activities. Since then, at least six varieties of English at different evolutionary stages (Schneider 2007) have developed in South Asia, influenced and shaped not only by the superstrate language but also by prevalent and persistent linguistic, cultural, and social customs.
In the present chapter, we draw on these six varieties, namely Indian (IndE), Bangladeshi (BgE), Nepali (NpE), Maldivian (MvE), Pakistani (PkE), and Sri Lankan English (SLE), to fill the existing gap in research on the use of the intro-it in outer-circle varieties in general and in newspaper language in particular. Section 12.2 discusses the structural and functional peculiarities of the pattern as well as previous research. Sections 12.3 and 12.4 are dedicated to our own study: Section 12.3 presents our research gaps and questions as well as our database and methodology, and Section 12.4 our quantitative and qualitative results. Section 12.5 offers a discussion of the findings, a conclusion, and an outlook.
12.2 Taking a Closer Look at the Intro-it
12.2.1 Defining the Intro-it
As mentioned in the Introduction to this volume, two major approaches to non-canonicity can be distinguished: the frequency-based approach and the theory-based approach. For the intro-it pattern, this distinction is particularly interesting. From the perspective of the former approach, the fact that the pattern enables adherence to common information-structural principles such as end-weight and end-focus (Behaghel 1909) makes it the cognitively least demanding choice: in comparison with its syntactically canonical counterpart, the intro-it reaches a point of grammatical completeness before the notional subject is added. This benefits production, comprehension, and processing (Biber et al. 1999: 677)Footnote 2 and makes the pattern both the more natural and the more frequent choice.
From a structural or theoretical perspective, the situation is somewhat more complex. In a nutshell, the notion of non-canonicity adopted by the contributors to the present volume entails a deviation from an Sn‑V-X structure, with ‘Sn’ referring to the notional, and thus semantically loaded, subject and ‘X’ serving as a placeholder for whatever follows to make a given construction grammatically complete (see the Introduction to this volume). Quirk et al. (1985: 53), for example, suggest seven basic clause patterns of English, in which S-V is followed by (1) zero, (2) Od-n, (3) Cs, (4) Oi-n-Od-n, (5) Od-n-Co, (6) A, or (7) Od-n-A. Deviations from the default structure are, of course, not random choices but are triggered by complex and versatile (information-)structural mechanisms, a circumstance which, as mentioned above, becomes particularly apparent in the case of the intro-it. Biber et al. (1999: 155) define the pattern as the use of ‘the dummy subject it … in the ordinary subject position, anticipating a finite or non-finite clause in extraposition’, a definition that already alludes to the structural diversity of the pattern. As illustrated below, the expletive it (which, following Callies 2009, excludes time, weather, and distance it) may be followed by either a copular or a lexical verb. Depending on the verbal constituent and on what is to be expressed, the postverbal element may take the form of an adjective phrase (AdjP), a noun phrase (NP), or a prepositional phrase (PP). This is followed by the notional subject clause, which may be a that-clause, a to-infinitive clause, an -ing-clause, a (w)h-clause (e.g., with whether or how), or an if-clause.
| It | + | copular or lexical verb | + | AdjP / NP / PP | + | that-clause / to-inf-clause / -ing-clause / (w)h-clause / if-clause |
However, although the subject is the most frequent position, the role of it as a proxy is not limited to the subject of a construction but may extend to the object as well (Kaltenböck 2004: 65; Gentens 2016). In fact, it may be used to fill grammatically vacant positions in almost all of the seven basic clause patterns discussed above, as illustrated in Table 12.1 (Quirk et al. 1985: 1391–3).
Naturally, the frequencies with which these non-canonical structures are employed vary. While extraposed that-clauses, for example, are generally reported to be the most frequent realisation (Biber et al. 1999), extraposed -ing-clauses are most commonly associated with informal spoken discourse (Quirk et al. 1985: 1393).

Table 12.1
| Pattern | Canonical variant | Non-canonical variant | Example |
|---|---|---|---|
| 1 | S-V-zero | Sg-V-zero-Sn | It doesn’t matter whether you like it. |
| 1 | S-V-zero | Sg-V-pass-zero-Sn | It is claimed that she does not know him. |
| 2 | S-V-Od | Sg-V-Od-Sn | It surprised me to see you. |
| 3 | S-V-Cs | Sg-V-Cs-Sn | It is nice to see you. |
| 5 | S-V-Od-Co | S-V-Od-g-Co-Od-n | He must consider it interesting working here. |
| 6 | S-V-A | Sg-V-A-Sn | It was on the news that he is guilty. |
| 7 | S-V-Od-A | S-V-Od-g-A-Od-n | She put it into his head that he should resign. |
Note. g refers to grammatical constituents while n refers to notional (or semantically loaded) ones.
In addition to its formal diversity, the pattern, broadly speaking, also enables the evaluation or assessment of discourse content (Couper-Kuhlen & Thompson 2008). It is associated with particular semantic or rhetorical domains that can be derived from the nature of the verb and/or verb-complement structure, which, in a way, ‘set(s) the scene’ for the subsequent clausal content. These include the expression of necessity, importance, ease, or difficulty by means of particular lexical bundles, most commonly of four words (e.g., it is clear that, it is possible to, it is difficult to), five words (e.g., it may be possible to, it is interesting to note), or even six words (e.g., it should not be forgotten that, it must have been difficult to) (Biber et al. 1999). Many previous studies focus on the classification of verb and adjective phrases. Examples include Römer (2009) and Hewings and Hewings (2002), the latter suggesting a fourfold distinction between hedging (1), emphasising (2), attributive (3), and attitude-marking functions (4).
(1) It seems reasonable to do some additional testing.
(2) It is essential to take the medication on time.
(3) It was proposed that the results had been forged.
(4) It is surprising to see him here.
A more comprehensive approach was adopted by Larsson (2017), who, unlike Hewings and Hewings (2002), whose focus was mainly on AdjPs and passivised verb phrases, also took postverbal NPs into account. A further advantage of her approach is that, while her classification, too, includes attitude markers, hedges, and emphatics (as well as combinations thereof), the latter two functions are only assigned if there is linguistic ‘evidence’: amplifiers preceding the adjective (e.g., extremely surprising, very disappointing) indicate an emphatic use, while seem or a modal verb of possibility (e.g., could, might) indicates hedging. This makes the analysis less subjective and thus more reliable. Her final category, observations, is reserved for neutral propositional content, which had been largely disregarded prior to her study.
What unites research on the semantic or rhetorical functions of the pattern is that there appear to be population- as well as register-specific preferences (see also Zhang 2015). It thus remains to be determined whether this also holds true for ESL in general and for newspaper language in particular.
12.2.2 Theories and Previous Research
To the best of our knowledge, the first researcher to discuss extraposition is Jespersen (1927), who claims that ‘words in extraposition may be added as a kind of afterthought after the sentence has been completed; they stand outside it and form, as it were, a separate utterance which might even be called a separate sentence’ (1927: §17.12). Although this early perspective also seems to extend to structures such as dislocations (e.g., He is nice, your brother.) and thus covers a broader range of phenomena, it-extrapositioning has received considerable scholarly attention ever since. Research has been dedicated to describing its structural and functional uses as well as to discussing the information-structural mechanisms that may be at play. By postponing clausal subjects or objects to the end of a construction while it occupies the grammatical position, the pattern, for example, operates in accordance with the principle of end-weight (Behaghel 1909). This principle associates the placement of syntactically weighty elements towards the end of a sentence with reduced cognitive effort in both the production and the comprehension of an utterance (see, e.g., Arnold et al. 2000). Its applicability to the intro-it has been confirmed by a number of researchers, including Huddleston and Pullum (2002), Gómez-González (1997), and Kaltenböck (2004). At the same time, the structure conforms to the given-new progression (e.g., Clark & Haviland 1977; Prince 1981) and to the principle of end-focus (Behaghel 1909: 138; Quirk et al. 1985: 1356–61; Biber et al. 1999: 897–9). Both concepts are closely tied to the efficiency of information packaging. Given that the dummy it merely fulfils a grammatical function and does not contribute any semantic value to the message being produced or perceived, attention is placed on the postponed clausal element, which is commonly associated with new and thus focused information. While Herriman (2000) supports the view that the structure serves the principle of end-focus, Kaltenböck (2004) stresses that this is strongly connected to the given-new status: he finds that the majority of discourse-new clausal subjects (roughly 72%) occurred in an intro-it construction. This is supported by Huddleston and Pullum (2002) and Miller (2001), who describe discourse-new clausal subjects in non-canonical position as more felicitous or even mandatory.
As far as other language contexts are concerned, a considerable amount of research has already been devoted to EFL learners. While these accounts report a general overrepresentation of the phenomenon, they also show that its structural and functional realisations may vary. One example is the preference for extraposed to-clauses among some learner populations, including German and Swedish learners of English (Callies 2009; Larsson 2016). Likewise, Hewings and Hewings (2002) and Larsson (2017) have found that learners tend to use the structure to increase the force of a claim rather than for hedging purposes, with some learners’ use of unusually forceful adjectives, such as amazing or stupid, resulting in non-target-like uses (Römer 2009).
In the ESL context, non-canonical patterns, and among these fronting and left dislocation in particular, have been described as common features of ‘New Englishes’ (e.g., Alsagoff & Lick 1998; Sharma 2003, 2012; Bhatt 2008; Mesthrie & Bhatt 2008), with a number of corpus-based studies providing detailed accounts of a variety of syntactic structures (Lange 2012; Winkle 2015; Leuckert 2019). One study, Götz (2017), investigates fronting in the newspaper language of the same varieties considered in the present study and finds interesting variety-specific preferences, thus showing that, contrary to what previous research suggested, fronting is by no means limited to informal spoken data. Research into the intro-it, or into postponement patterns in general, remains very limited. One of the few exceptions is Dubey’s (1989) monograph on Newspaper English in India, in which he briefly mentions that, compared with the movement of participles or parts of NPs, the postponements found in Indian newspapers were almost exclusively limited to clausal subjects, presumably to achieve end-focus (1989: 77).
It remains to be seen to what extent this attested preference for extraposed clausal structures can be extended, both structurally and functionally. The precise nature of the research gaps identified and our research questions are discussed in the following section.
12.3 The Intro-it in South Asian Varieties of English
12.3.1 Research Gaps and Research Questions
In the previous description of the status quo of research into the topic, we have identified two major research gaps which the present chapter intends to fill. To the best of our knowledge, no detailed account of the intro-it has been provided in the South Asian English context. Therefore, we pose the following research questions:
12.3.2 Database and Methodology
In order to avoid overinterpreting features of (spontaneous) speech, and because we are mainly interested in nativised Englishes, we investigated a corpus of acrolectal newspaper language that had been subjected to an editing process. This allows us to assume that all instances of the intro-it we found are deliberate uses. The database used for the present chapter is the South Asian Varieties of English Corpus (SAVE Corpus; Bernaisch et al. 2011), which contains the newspaper language of six ESL varieties in South Asia, drawn from two sources each. It was compiled between 2008 and 2011 and contains approximately 3 million words per subcorpus (i.e., about 18 million words in total). Details on the papers used and the time spans covered can be found in Table 12.2.

Table 12.2
| Country | Newspaper | URL | Time span |
|---|---|---|---|
| Bangladesh | Daily Star | www.thedailystar.net | 2003–2006 |
| Bangladesh | New Age | www.newagebd.com | 2005–2007 |
| India | The Statesman | www.thestatesman.net | 2002–2005 |
| India | The Times of India | http://timesofindia.indiatimes.com | 2002–2005 |
| Maldives | Dhivehi Observer | www.dhivehiobserver.com | 2004–2007/2008 |
| Maldives | Minivan News | www.minivannews.com | 2004–2008 |
| Nepal | Nepali Times | www.nepalitimes.com | 2000–2007/2000 |
| Nepal | The Himalayan Times | www.thehimalayantimes.com | 2002–2008 |
| Pakistan | Daily Times | www.dailytimes.com.pk | 2002–2006 |
| Pakistan | Dawn | www.dawn.com | 2002–2007 |
| Sri Lanka | Daily Mirror | www.dailymirror.lk | 2002–2007 |
| Sri Lanka | Daily News | www.dailynews.lk | 2001–2005 |
With BrE as the historical input variety of the varieties under scrutiny, the news section of the British National Corpus (BNC) was used as a control corpus. The first version of the BNC (i.e., BNC 1994) was compiled between 1991 and 1994 and comprises about 100 million words, 9 million of which belong to the periodicals section used in our database.
We used a modified random sampling technique to extract the articles to be analysed, attempting to ensure that each file of the corpus (each containing a varying number of articles) featured in the analysis. Files and/or articles were disregarded, however, if they were too short, did not contain neutral newspaper language (e.g., birthday greetings, advertisements), or occurred repeatedly in the corpus.
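The sampling procedure is described only in outline. As an illustration, the sketch below shows one possible reading of it, assuming that each corpus file is represented as a list of (text, is_neutral_news) pairs. The function name `sample_articles`, the minimum-length threshold `min_words`, the neutrality flag, and the number of articles drawn per file are all hypothetical and are not taken from the study.

```python
import random

def sample_articles(files, per_file=1, min_words=100, seed=1):
    """Draw up to `per_file` articles from every corpus file.

    `files` maps a file name to a list of (text, is_neutral_news) pairs.
    The threshold `min_words` and the neutrality flag are hypothetical
    stand-ins for the exclusion criteria described in the text.
    """
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    seen = set()               # texts already encountered anywhere in the corpus
    sample = []
    for name in sorted(files):
        eligible = []
        for text, is_neutral_news in files[name]:
            repeated = text in seen
            seen.add(text)
            # Skip non-news material, very short articles, and repeated texts.
            if not is_neutral_news or len(text.split()) < min_words or repeated:
                continue
            eligible.append(text)
        # Sampling within each file ensures every usable file is represented.
        chosen = rng.sample(eligible, min(per_file, len(eligible)))
        sample.extend((name, text) for text in chosen)
    return sample
```

Sampling within rather than across files is what implements the aim that each file feature in the analysis; a single random draw over all articles would tend to over-represent files that contain many articles.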
As Table 12.3 illustrates, roughly 8,000 sentences per subcorpus were extracted for manual annotation. The sentences were copied into an Excel spreadsheet, in which we identified the instances of the intro-it and then annotated them for a number of structural and functional variables, listed in Table 12.4.

Table 12.3
| Country | Corpus | Sentences (total) | Intro-its (total) | Intro-its per 1,000 sentences |
|---|---|---|---|---|
| Great Britain | BNC_News | 8,055 | 143 | 17.75 |
| India | SAVE_IN | 8,289 | 165 | 19.91 |
| Bangladesh | SAVE_BD | 8,034 | 123 | 15.31 |
| Maldives | SAVE_MV | 7,951 | 280 | 35.22 |
| Sri Lanka | SAVE_SL | 8,130 | 256 | 31.49 |
| Nepal | SAVE_NP | 8,245 | 123 | 14.92 |
| Pakistan | SAVE_PK | 8,098 | 185 | 22.85 |
| Total | | 56,802 | 1,275 | 22.45 |
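The last column of Table 12.3 can be reproduced from the two count columns: it normalises the number of intro-its per 1,000 sentences, which makes subcorpora of slightly different sizes comparable. A minimal sketch in Python; the dictionary layout and the function names are ours, and only the counts come from the table.

```python
# Sentence and intro-it counts per subcorpus, as reported in Table 12.3.
COUNTS = {
    "BNC_News": (8055, 143),
    "SAVE_IN": (8289, 165),
    "SAVE_BD": (8034, 123),
    "SAVE_MV": (7951, 280),
    "SAVE_SL": (8130, 256),
    "SAVE_NP": (8245, 123),
    "SAVE_PK": (8098, 185),
}

def per_thousand(sentences: int, hits: int) -> float:
    """Normalised frequency: intro-its per 1,000 sentences, rounded to 2 places."""
    return round(hits / sentences * 1000, 2)

def totals(counts):
    """Sum sentences and intro-its over all subcorpora."""
    sentences = sum(s for s, _ in counts.values())
    hits = sum(h for _, h in counts.values())
    return sentences, hits

if __name__ == "__main__":
    for corpus, (sentences, hits) in COUNTS.items():
        print(f"{corpus:9} {sentences:>6} {hits:>5} {per_thousand(sentences, hits):>6}")
    total_s, total_h = totals(COUNTS)
    print(f"{'Total':9} {total_s:>6} {total_h:>5} {per_thousand(total_s, total_h):>6}")
```

On this normalisation, MvE (35.22) and SLE (31.49) clearly exceed the BNC rate (17.75), which matches the regression results reported in Section 12.4.1.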

Table 12.4
| Variable | Meaning |
|---|---|
| SEMANTIC_FUNCTION | semantic function based on the verb or verb-complement structure |
| S_length | length of the sentence of interest in words |
| INTROIT_TF | whether the given sentence contains an intro-it |
| COPULAR_TF | whether the verb used is copular |
| PASSIVE_TF | whether the verb is used in the passive |
| THAT_CL | whether the postponed constituent is a that-clause |
| ZERO_THAT_CL | whether the postponed constituent is a zero that-clause |
| TO_INF_CL | whether the postponed constituent is a to-inf-clause |
| ING_CL | whether the postponed constituent is an -ing-clause |
| IF_CL | whether the postponed constituent is an if-clause |
| WH_CL | whether the postponed constituent is a wh-clause |
| H_CL | whether the postponed constituent is an h-clause |
| INTROIT_S | whether the intro-it affects the subject |
| INTROIT_O | whether the intro-it affects the object |
Except for the s_length variable, which was determined using an Excel formula, each variable was coded manually by the authors of this chapter. While most variables are based on structural observations, the classification of the semantic function calls for some additional explanation. Given the advantages of her approach outlined above, we adopted Larsson’s (2017) general distinction between attitude markers, expressing ‘the writer’s affective attitude towards what is stated in the clausal subject’; emphatics, used to ‘strengthen the force of the utterance’; hedges, used to weaken the force of an utterance and to indicate a lack of full commitment; and observations, which may express ‘affectively neutral … propositional content’ (Larsson 2017: 61).
Note, however, that we did not code for combinations of the first three categories because, after close consideration of the data at hand, we found that particular categories override others. As the examples below illustrate, both emphatics (5) and hedges (6) dominate the perceived meaning of the attitude markers.Footnote 3
(5) It is extremely sad that she cannot attend the event.
(6) It might be useful to call him again.
In the case of a combination of all three (7), the instance was likewise classified as a hedge: however strongly a claim is expressed, a preceding hedge still weakens its force.
(7) It is perhaps very upsetting to her that she had to cancel the party.
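The precedence rule described above can be made explicit as a small decision procedure. The sketch below is a toy illustration only, not the authors’ annotation procedure (which was manual): the function name `classify` and the cue lists are hypothetical and contain only words from the examples in this section, and the subdivision of the observation category is left out.

```python
import re

# Hypothetical cue lists drawn from the examples above; not a coding manual.
HEDGE_CUES = {"seem", "seems", "seemed", "could", "might", "may", "perhaps", "possibly"}
EMPHATIC_CUES = {"extremely", "very", "highly"}
ATTITUDE_CUES = {"sad", "surprising", "upsetting", "useful", "disappointing"}

def classify(sentence: str) -> str:
    """Assign a single semantic function without combined categories:
    hedges override emphatics, and emphatics override attitude markers."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    if tokens & HEDGE_CUES:
        return "hedge"
    if tokens & EMPHATIC_CUES:
        return "emphatic"
    if tokens & ATTITUDE_CUES:
        return "attitude"
    # Everything else falls back to the (here undivided) observation category.
    return "observation"

if __name__ == "__main__":
    for s in [
        "It is extremely sad that she cannot attend the event.",
        "It might be useful to call him again.",
        "It is perhaps very upsetting to her that she had to cancel the party.",
        "It is surprising to see him here.",
    ]:
        print(classify(s), "|", s)
```

Checking the cues in a fixed order (hedge, then emphatic, then attitude) is what encodes the override relation; a sentence containing cues from all three sets, like (7), therefore ends up as a hedge.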
Finally, given that our database consists of newspaper language, which is generally characterised by neutrality and formality, we considered it reasonable to slightly extend the observation category. We propose a distinction between observation-neutral, observation-attitude, observation-hedging, observation-emphatics, and observation-reporting. While the first of these comprises factual information without any kind of assessment (e.g., It is the government’s responsibility that the laws are enforced.), observation-attitude, observation-emphatics, and observation-hedging, as opposed to the simple attitude, emphatics, and hedging categories, are concerned not with the attitude of the author of the article (or with making their statement more or less forceful) but with that of, for example, the people they are writing about (e.g., It is the manager’s declared belief that he will win the election. [attitude]; It has been criticised immensely that the minister did not attend the event. [emphatics]; The chairman maintained it seemed reasonable to take action. [hedging]). Observation-reporting, lastly, mainly contains verbs in passivised structures that are commonly used in the news sector to report events neutrally (e.g., It is reported that fifty people were wounded during the attack.).
Following the annotation process, all analyses were conducted in R using RStudio. To address research question (1), the data were processed with the R package dplyr and submitted to a regression model that predicts the probability of an intro-it occurring based on the independent variables. To answer research questions (2a) and (2b), we zoomed in on the formal, functional, and semantic contexts of the intro-it by applying conditional inference trees (ctree) to our dataset (Hothorn et al. 2006). In a nutshell, ctree analysis recursively partitions the data by means of univariate splits: at each step, it selects the independent variable most strongly associated with the dependent variable and splits the data on its values, stopping when no significant association remains. As pointed out by Timofeev (2004) and Phelps and Merkle (2008), a classification tree approach offers a number of advantages that are relevant for our dataset: (i) it can handle small datasets; (ii) as a nonparametric approach, it comes with only a limited number of statistical assumptions; (iii) it can be applied to various data structures and is exceptionally efficient in handling categorical predictors; (iv) the output provides rankings of variable importance; and (v) the output is easy to understand and interpret. Note that, in order to reveal the effects of the individual variables on the complementation patterns of the intro-it and thus to provide a comprehensive account of the behaviour of these patterns in the individual corpora, we ran separate trees for each variable. The results of the analyses are presented and discussed in the following.
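The variable-selection step at the core of a conditional inference tree can be sketched as follows. This is a simplification under stated assumptions, not the actual ctree implementation: ctree relies on permutation-test statistics, whereas this sketch substitutes Pearson’s chi-square test for two binary variables (one degree of freedom), applies a Bonferroni correction across predictors, and performs only a single selection step without recursion. The function names are hypothetical, the variable names mirror Table 12.4, and the toy data are invented.

```python
import math

def chi2_pvalue_2x2(x, y):
    """Pearson chi-square test of independence for two binary (0/1) lists, df = 1."""
    n = len(x)
    table = [[0, 0], [0, 0]]
    for a, b in zip(x, y):
        table[a][b] += 1
    rows = [table[0][0] + table[0][1], table[1][0] + table[1][1]]
    cols = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    # Upper tail of the chi-square distribution with 1 df: P(X > stat).
    return math.erfc(math.sqrt(stat / 2))

def select_split_variable(predictors, outcome, alpha=0.05):
    """Return (name, adjusted p) of the predictor most strongly associated with
    the outcome, or None if no Bonferroni-adjusted p-value falls below alpha
    (the point at which a conditional inference tree stops splitting)."""
    k = len(predictors)
    adjusted = {
        name: min(1.0, k * chi2_pvalue_2x2(values, outcome))
        for name, values in predictors.items()
    }
    best = min(adjusted, key=adjusted.get)
    return (best, adjusted[best]) if adjusted[best] < alpha else None

if __name__ == "__main__":
    # Invented toy data: 40 intro-it tokens coded 0/1.
    that_cl = [1] * 18 + [0] * 2 + [0] * 17 + [1] * 3
    predictors = {
        "COPULAR_TF": [1] * 20 + [0] * 20,
        "PASSIVE_TF": [1, 0] * 20,
    }
    print(select_split_variable(predictors, that_cl))
```

With binary predictors, choosing the variable also fixes the split, which is why this sketch omits the split search that ctree performs for variables with more than two values.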
12.4 Results
12.4.1 The Likelihood of Using the Intro-it across SAVEs
To predict the likelihood of an intro-it occurring across SAVEs and BrE, we fitted a generalised linear model with S_Length (logged) and Corpus as predictors. The interaction between the two variables turned out not to be significant, so the final model consists only of the two main effects, S_Length (logged) and Corpus. The model is highly significant and explains 4.4% of the variance in the data (R² = 0.044). Although this may not seem a high value, the model performed significantly better than the baseline model, and visual inspection of the residual plots suggests that it fits reasonably well. The two main effects are illustrated in Figure 12.1.
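Written out, the final model has roughly the following form. This is a sketch under two assumptions not stated explicitly in the text: that the generalised linear model is a binomial model with a logit link (the usual choice for a binary outcome such as INTROIT_TF), and that BrE (BNC) is the reference level of Corpus, as the comparisons with BrE suggest. Corpus_c is a dummy variable that equals 1 if a sentence comes from subcorpus c and 0 otherwise.

```latex
\operatorname{logit} P(\mathrm{INTROIT\_TF} = 1)
  = \beta_0 + \beta_1 \log(\mathrm{S\_Length})
  + \sum_{c \in \{\mathrm{BD},\, \mathrm{IN},\, \mathrm{MV},\, \mathrm{NP},\, \mathrm{PK},\, \mathrm{SL}\}} \gamma_c \, \mathrm{Corpus}_c,
\qquad
P(\mathrm{INTROIT\_TF} = 1)
  = \frac{1}{1 + \exp\!\bigl(-\bigl[\beta_0 + \beta_1 \log(\mathrm{S\_Length}) + \sum_c \gamma_c \, \mathrm{Corpus}_c\bigr]\bigr)}
```

Because the non-significant interaction term was dropped, a single slope β₁ applies to all corpora: on the logit scale, the sentence-length effect is identical in every variety, and the γ_c only shift the intercept. A significantly positive γ_MV or γ_SL corresponds to the higher likelihood reported for MvE and SLE.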

Figure 12.1 Effect plots illustrating the main effects Corpus (panel 1) and S_Length (logged) (panel 2) predicting the likelihood for an intro-it to occur in the SAVE Corpus and the BNC_NEWS
Figure 12.1 panels: (a) Corpus effect plot, showing INTROIT_TF (y-axis) for each corpus (x-axis: BD, BNC, IN, MV, NP, PK, SL) as points with error bars; (b) S_Length effect plot, showing INTROIT_TF (y-axis) against S_Length (logged; x-axis) as a curve with a confidence band and a rug of observed values.
It is immediately apparent that the intro-it is not generally more likely to occur in SAVEs than in BrE. In fact, the likelihood of using the pattern does not differ significantly from BrE in BgE, IndE, NpE, or PkE. The pattern is, however, significantly more likely to occur in SLE and MvE.
The second main predictor that increases the likelihood of an intro-it is S_Length (logged). Since this predictor does not interact with Corpus, sentence length seems to have a robust effect across varieties: the longer a sentence, the more likely it is to contain an intro-it. While greater sentence length naturally raises the chance for any non-canonical pattern to occur (cf., e.g., Götz 2017; Kircili, Chapter 13 in this volume; Kircili forthcoming), the use of this pattern also makes sense from an information-packaging standpoint (e.g., when subjects are very long; see Section 12.2.2). Here, the intro-it enables the writer to complete the main structure of the sentence (i.e., Sg-V-Cs-Sn) first and then to place the main information (contained in the notional subject) in a position that follows the principle of end-weight and is thus more easily accessible to, and processed by, the reader, as in example (8) from BgE.
[It]Sg [is]V indeed [a curious coincidence]Cs [that just as the vice-president of the United States, Dick Cheney, stepped up the rhetoric against Iran, warning during a joint news conference with the Australian prime minister, John Howard, in Sydney that ‘it would be a serious mistake if a nation like Iran were to become a nuclear power’ and refusing to rule out the use of force to keep atomic weapons out of the hands of Tehran, Israel was reported by the conservative British broadsheet The Daily Telegraph to be in negotiations with the US-led forces for an ‘air corridor’ over Iraq should it decide on unilateral action on the theocratic west Asian state.]Sn
While this is not a particularly surprising finding, we can note that the cognitive effect of easing the processing load for the reader (and writer), which has been attested for other weight-sensitive structures, including the placement of subordinate clauses, and which has mainly been researched in ENL varieties of English (e.g., Arnold et al. 2000; Diessel 2005; Junge et al. 2015), also seems to hold for the intro-it in SAVEs.
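Why the sentence-length curve in the effect plot bends upwards can be illustrated with a minimal Python sketch. It assumes a binary logistic regression, which is linear on the log-odds scale but not on the probability scale. The coefficients `B0` and `B1` are hypothetical values chosen only so that the curve spans roughly the plotted range (about 0.00 to 0.10 between -2 and 4). They are not the estimates of the model reported in this chapter.

```python
import math

# Hypothetical coefficients (NOT the chapter's estimates), chosen only so that
# predicted probabilities span roughly 0.00 to 0.10 over S_Length = -2 ... 4.
B0 = -4.25  # intercept on the log-odds scale
B1 = 0.5    # constant slope for the transformed sentence-length predictor


def predicted_probability(s_length: float, b0: float = B0, b1: float = B1) -> float:
    """Inverse logit: map the linear predictor b0 + b1 * x to a probability."""
    eta = b0 + b1 * s_length
    return 1.0 / (1.0 + math.exp(-eta))


if __name__ == "__main__":
    xs = [-2, -1, 0, 1, 2, 3, 4]
    probs = [predicted_probability(x) for x in xs]
    for x, p in zip(xs, probs):
        print(f"S_Length = {x:+d}: p(intro-it) = {p:.3f}")
    # Successive gains in probability grow, although the log-odds slope is constant,
    # because the logistic curve gets steeper as p approaches 0.5 from below.
    steps = [b - a for a, b in zip(probs, probs[1:])]
    print("increments:", ", ".join(f"{s:.4f}" for s in steps))
```

The apparent acceleration is therefore a property of the link function and does not by itself indicate a non-linear term in the model. As long as predicted probabilities stay well below 0.5, each unit increase in the predictor adds more probability than the previous one.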
12.4.2 Comparing the Use of the Intro-it across SAVEs
We will now turn to our second research question and zoom in on the cases in which intro-its were used across SAVEs by taking a look at their overall frequencies, their formal complementation patterns, and the semantic functions in which the construction is used.
Frequency of the Intro-it across SAVEs
Figure 12.2 illustrates the frequencies of the intro-it per thousand sentences (pts) across SAVEs as compared to the BNC_NEWS (see Footnote 4).

Figure 12.2 Normalised frequencies of intro-its per thousand sentences across SAVEs and the BNC_NEWS
Figure 12.2 Long description
The vertical axis shows the frequency of intro-its per thousand sentences (PTS), from 0 to 35 in increments of 5. The horizontal axis (CORPUS) lists the British National Corpus (BNC-NEWS) and the South Asian Varieties of English corpora (SAVE-BD, SAVE-IN, SAVE-MV, SAVE-NP, SAVE-PK, and SAVE-SL), with bars in different shades. SAVE-MV shows the highest frequency; the other corpora range from about 15 to 31 pts.
As Figure 12.2 illustrates, the frequencies of the intro-it vary considerably between the varieties, and this variation is highly significant (χ² = 132.86, df = 6, p < 2.2e-16). The highest frequencies are found in the newspaper language of the Maldives (35.22 pts) and Sri Lanka (31.49 pts), followed by Pakistan (22.85 pts), India (19.91 pts), and Great Britain (17.75 pts). The lowest frequencies are found in Bangladesh (15.31 pts) and Nepal (14.92 pts). These frequencies are very much in line with the predicted probabilities of the pattern (see Section 12.4.1). Compared to BrE, the historical input variety, Bangladesh and Nepal show similar, although slightly lower, frequencies, while India and Pakistan use the pattern somewhat more frequently. Yet only writers from Sri Lanka and the Maldives use the pattern significantly more frequently than British writers.
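A normalised frequency per thousand sentences simply scales the raw count of intro-its by the number of sentences in a corpus. The short Python sketch below shows that computation. Using only the pts values reported above, it then expresses each corpus's rate as a multiple of the BNC_NEWS rate. The function names are illustrative, and since raw counts and sentence totals are not given in this section, the normalisation function is shown without corpus data.

```python
# Normalised frequencies (intro-its per thousand sentences) as reported in Section 12.4.2.
REPORTED_PTS = {
    "BNC_NEWS": 17.75,
    "SAVE-MV": 35.22,
    "SAVE-SL": 31.49,
    "SAVE-PK": 22.85,
    "SAVE-IN": 19.91,
    "SAVE-BD": 15.31,
    "SAVE-NP": 14.92,
}


def per_thousand_sentences(intro_it_count: int, sentence_count: int) -> float:
    """Normalise a raw count to a rate per 1,000 sentences."""
    if sentence_count <= 0:
        raise ValueError("sentence_count must be positive")
    return intro_it_count / sentence_count * 1000


def ratio_to_reference(pts: dict, reference: str = "BNC_NEWS") -> dict:
    """Express each corpus's rate as a multiple of the reference corpus's rate."""
    ref = pts[reference]
    return {corpus: value / ref for corpus, value in pts.items()}


if __name__ == "__main__":
    ratios = ratio_to_reference(REPORTED_PTS)
    for corpus, r in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{corpus:9s} {REPORTED_PTS[corpus]:6.2f} pts  x{r:.2f} relative to BrE")
```

On these figures, Maldivian newspapers use the pattern roughly twice as often as British ones (35.22 / 17.75 ≈ 1.98), and Sri Lankan newspapers about 1.8 times as often, whereas Bangladeshi and Nepali newspapers fall slightly below the British rate. Such ratios are purely descriptive; statements about significance rest on the χ² test and the regression model, not on the ratios.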
Structural Preferences of the Intro-it across SAVEs and British English
Turning to the structural preferences across SAVEs, we will first focus on complementation patterns, that is, on the different clause types that may complement the intro-it. The findings are illustrated in Figure 12.3.

Figure 12.3 Ctree illustrating the clause types used to complement the intro-it ~ Corpus + Passive in the SAVE Corpus and the BNC_NEWS
Note. TH = that-clauses, ZT = zero-that clauses, TO = to-infinitive clauses, IN = infinitive clauses, IF = if-clauses, WH = wh-clauses, HO = how-clauses.
Figure 12.3 Long description
The top node (node 1) splits the data by voice (Passive, TF; p < 0.001). In the left branch (active verbs), a corpus node separates the BNC (n = 115), which shows high proportions of TH and TO, from the SAVE corpora BD, IN, MV, NP, PK, and SL (n = 825), where TO is most frequent, followed by TH. In the right branch (passive verbs), a second corpus node separates BD, IN, MV, PK, and SL (n = 288), which show a clear preference for TH, from BNC and NP (n = 44), which show elevated proportions of both TH and ZT. Each terminal node is displayed as a bar chart of clause-type proportions.
The ctree illustrates that the most important and significant factor determining the complementation pattern across all varieties is the voice of the verb (see node 1; p < 0.001). When we follow the left branch of the tree (i.e., cases in which the main verb is not in the passive), node 2 shows significantly different complementation patterns (p = 0.003) between the BNC on the one hand (see final node 3) and the SAVE data on the other, which all cluster together (see final node 4). Although the two groups differ significantly, their three most frequent complementation patterns follow the same order: the most frequent complementation pattern for active constructions in newspaper language is the to-infinitive, which the SAVEs use in almost 60% of the cases, compared to ca. 45% in the BNC. This is illustrated by example (9) from BrE and example (10) from PkE.
Now [it]Sg [is]V [up to the aid agencies]Cs [to make life as comfortable and easy as possible for the refugees.]Sn
[It]Sg [is]V [difficult]Cs [to quantify the gains and losses]Sn, but unsuspecting investors ought to have lost heavily.
That-clauses are the second most frequent complementation pattern (cf., however, Section 12.2.2), with ca. 40% of all cases in BrE and ca. 36% in the SAVEs, followed by zero-that clauses, which BrE newspapers use in ca. 6% and SAVE newspapers in ca. 4% of the cases. Another noteworthy finding concerns the low-frequency complementation patterns: in BrE, the intro-it in active constructions is complemented by how-clauses (see (11)), which do not occur in the SAVEs at all; the SAVEs, in turn, complement the intro-it with wh-clauses, as in (12), which are not attested in the British data.
But [it]Sg [has been]V [a mystery]Cs [how this and spongiform encephalopathies, such as mad cow, kill brain cells.]Sn
No shots were fired during the raid and [it]Sg [is not known]V [where the men are being hauled up.]Sn
With verbs in the passive voice (the right branch of the tree), there is again a significant split between the varieties (see node 5, p = 0.028), but this time Nepal clusters together with Great Britain (see final node 7), whereas the other SAVEs again form a cluster (see final node 6). Both groups use that-clauses most frequently, but BrE and NpE show a wider range of complementation patterns than the other SAVEs. The latter almost exclusively use that-clauses, namely in over 80% of the cases (see (13)), followed by to-infinitive clauses (ca. 8%, see (14)), zero-that clauses (ca. 4%, see (15)), and very low proportions of the other complementation types (each under 2%). British and Nepalese English speakers also complement the intro-it most frequently with that-clauses (ca. 65%, see (16)), but they show a high proportion of zero-that clauses (over 20%, see (17)) and also use how-clauses (ca. 5%, see (18)). The remaining patterns occur only with very low frequencies, albeit still higher than in the SAVEs.
[It]Sg [was announced]V [that the party would look after the victim’s children.]Sn
During the Oslo meeting on June 26, [it]Sg [was agreed]V [to put pressure on Sri Lanka in this regards as well.]Sn
[It]Sg [was reported]V [ø he may have been manhandled.]Sn
[It]Sg [was agreed]V [that the wife was the chatelaine par excellence.]Sn
[ø It should be able to accumulate enough water by 2050,]Sn [it]Sg [is hoped.]V
[It]Sg [remains to be seen]V [how much Biggs has got left and even Duff has made mistakes.]Sn
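The splits described for Figure 12.3 can be read as a small decision procedure. The Python sketch below is a hand-coded paraphrase of the tree as described in the text, not output of any tree-fitting software (conditional inference trees are commonly fitted with `ctree()` from the R package partykit, but the implementation is not restated here). It routes an intro-it token to its terminal node from its voice and corpus and records the most frequent complementation type reported for that node. The corpus codes follow the figure labels; the names `terminal_node`, `SAVE_CODES`, and `MODAL_TYPE` are hypothetical.

```python
# Hand-coded paraphrase of the Figure 12.3 ctree as described in the text.
# Terminal nodes: 3 = BNC, active; 4 = all SAVEs, active;
#                 6 = SAVEs except NP, passive; 7 = BNC and NP, passive.
SAVE_CODES = {"BD", "IN", "MV", "NP", "PK", "SL"}

# Most frequent complementation type per terminal node, with the approximate
# share given in the text (TO = to-infinitive clause, TH = that-clause).
MODAL_TYPE = {
    3: ("TO", "ca. 45%"),
    4: ("TO", "almost 60%"),
    6: ("TH", "over 80%"),
    7: ("TH", "ca. 65%"),
}


def terminal_node(corpus: str, passive: bool) -> int:
    """Route a token through node 1 (voice), then node 2 or node 5 (corpus)."""
    if corpus != "BNC" and corpus not in SAVE_CODES:
        raise ValueError(f"unknown corpus code: {corpus}")
    if not passive:                                  # left branch of node 1
        return 3 if corpus == "BNC" else 4           # node 2
    return 7 if corpus in {"BNC", "NP"} else 6       # node 5


if __name__ == "__main__":
    for corpus in ["BNC", "IN", "NP"]:
        for passive in (False, True):
            node = terminal_node(corpus, passive)
            clause, share = MODAL_TYPE[node]
            print(f"{corpus:3s} passive={passive!s:5s} -> node {node}: {clause} ({share})")
```

Writing the tree out this way makes one detail explicit: Nepal groups with the other SAVEs in the active branch but with BrE in the passive branch, so a single variety can fall into different clusters depending on the earlier split.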
In addition to voice, the verb type (Verbtype) and the syntactic function of the element referred to by the intro-it (subject or object; Function) also affect the complementation type. This is illustrated in Figure 12.4.

Figure 12.4 Ctree illustrating the clause types used to complement the intro-it ~ Corpus + Verbtype + Function in the SAVE Corpus and the BNC_NEWS
Figure 12.4 Long description
The conditional inference tree starts with a Verbtype node (node 1), which leads to two corpus nodes, one for each branch; below the left corpus node, a further split by function separates objects from subjects. Five terminal nodes are shown as bar charts of clause-type proportions (y-axis from 0 to 1; x-axis TH, ZT, TO, IN, IF, WH, HO): node 3 (n = 27), node 5 (n = 7; to-infinitive clauses most frequent), node 6 (n = 365, the second largest subset; that-clauses most frequent), node 8 (n = 106), and node 9 (n = 767, the largest subset; to-infinitive clauses most frequent, followed by that-clauses). TO has the highest value in all terminal nodes except node 6, where TH is highest.
The ctree in Figure 12.4 illustrates that, across all corpora, complementation patterns in intro-it sentences differ significantly depending on the verb type (see node 1, p < 0.001). If a lexical verb is used, there is another significant split between the corpora (see node 2, p = 0.002): NpE (see node 3) differs significantly from the other SAVEs, which this time cluster together with BrE. While NpE shows an almost even preference for complementing the intro-it of lexical verbs with that-clauses, as in (19), or to-infinitive clauses, as in (20), the other varieties show a further split according to the functional element that is referenced by the intro-it (see node 4, p = 0.04): if the it-pattern refers to an object, these six varieties clearly prefer a to-clause, namely in ca. 70% of the cases (see node 5), as illustrated in (21), whereas subjects are complemented by that-clauses in almost 80% of the cases (see node 6), as in example (22). As far as copular verbs are concerned, the SAVEs behave significantly differently from BrE (see node 7, p = 0.003): BrE writers show an almost even distribution between that- and to-clauses (see node 8), illustrated in (23) and (24), whereas SAVE writers choose to-clauses in over half of the cases (see node 9), as in example (25).
[It]Sg [was carefully instilled]V [in her]A [that all girls grew old before they grew up.]Sn
[It]Sg [took]V [divers from the Gent Fire Brigade]Od [two days]A [to retrieve Prem Prasad’s body.]Sn
But a senior banker, Shaukat Tarin, told Dawn that [it]Sg [was]V [best]Cs [to leave it to shareholders to assess the merits and demerits.]Sn
[It]Sg [saddens]V [me]Od [that educated people in the government could do such things.]Sn
[It]Sg [has become]V [increasingly apparent]Cs [that Thomson belonged to the American experimental tradition.]Sn
[It]Sg [was]V [very difficult]Cs [for a court to say what a reasonable mother would have done in circumstances which were almost hypothetical.]Sn
But [it]Sg [will be]V [difficult]Cs [to say that the dead has been mourned in a befitting way by the nation itself.]Sn
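The three-variable tree in Figure 12.4 can likewise be paraphrased as a routing function. In the Python sketch below, the split order (Verbtype, then Corpus, then Function) and the node numbers follow the text, and the node sizes are those given in the figure description. The labels `"lexical"`, `"copular"`, `"subject"`, and `"object"` and all function and variable names are hypothetical stand-ins for the coding used in the study.

```python
# Hand-coded paraphrase of the Figure 12.4 ctree as described in the text.
# Node sizes (n) are taken from the figure description.
TERMINAL_NODES = {
    3: {"n": 27, "group": "NP, lexical verbs", "modal": "TH/TO (almost even)"},
    5: {"n": 7, "group": "other corpora, lexical, object", "modal": "TO (ca. 70%)"},
    6: {"n": 365, "group": "other corpora, lexical, subject", "modal": "TH (almost 80%)"},
    8: {"n": 106, "group": "BNC, copular verbs", "modal": "TH/TO (almost even)"},
    9: {"n": 767, "group": "SAVEs, copular verbs", "modal": "TO (over half)"},
}


def terminal_node(corpus: str, verbtype: str, function: str) -> int:
    """Route a token: node 1 (Verbtype) -> node 2 or 7 (Corpus) -> node 4 (Function)."""
    if verbtype == "lexical":
        if corpus == "NP":                           # node 2
            return 3
        if function not in ("subject", "object"):
            raise ValueError(f"unknown function: {function}")
        return 5 if function == "object" else 6     # node 4
    if verbtype == "copular":
        return 8 if corpus == "BNC" else 9          # node 7
    raise ValueError(f"unknown verb type: {verbtype}")


if __name__ == "__main__":
    total = sum(node["n"] for node in TERMINAL_NODES.values())
    print("tokens across terminal nodes:", total)  # same total as Figure 12.3's terminal nodes
    print("BNC, lexical, subject -> node", terminal_node("BNC", "lexical", "subject"))
```

The terminal-node sizes sum to 1,272 tokens, the same total as the four terminal nodes of Figure 12.3 (115 + 825 + 288 + 44), which is consistent with both trees being fitted to the same set of intro-it tokens. Node 5 contains only seven tokens, so its share of ca. 70% to-clauses rests on very few cases.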
Having looked in detail at the structural preferences across the SAVEs and BrE, we will now turn to the semantic functions that are associated with the intro-it.
12.4.3 Semantic Function of the Intro-it across SAVEs and the BNC_NEWS
The ctree in Figure 12.5 illustrates a number of variety-specific preferences in the semantic functions in which the intro-it is used. The first significant split separates the British, Nepalese, and Maldivian varieties on the one hand from BgE, IndE, PkE, and SLE on the other (see node 1, p < 0.001). Following the left branch first, the most noteworthy difference is that the British, Nepalese, and Maldivian group shows the highest proportions of the pattern in an attitudinal function. Further down the tree, another significant split clusters the British and Nepalese speakers together (see final node 3) and separates them from the Maldivian speakers (see final node 4). British and Nepalese speakers clearly prefer using the intro-it in an attitudinal function (ca. 36% of the cases, see (25)), followed by almost equal proportions of observation-neutral and observation-attitudinal uses with ca. 14% each (see final node 3), as illustrated in (26) and (27), respectively. In comparison, apart from a similarly high proportion of attitudinal uses (ca. 30% of all cases), MvE (see final node 4) also shows relatively high proportions of intro-its used either in an observation-reporting function (see example (28)) or observation-neutrally (as in (29)).

Figure 12.5 Ctree illustrating the semantic functions in which the intro-it is used by the different SAVEs vs. the BNC_NEWS
Note. A = attitude, E = emphatic, H = hedging, OA = observation-attitudinal, OE = observation-emphatics, OH = observation-hedging, ON = observation-neutral, OR = observation-reporting
Figure 12.5Long description
The tree diagram analyzes the semantic functions of the introductory it across different corpora. The diagram consists of a hierarchical structure of nodes connected by lines, illustrating the decision-making process and categorization of data. Here’s a detailed description of the diagram:
Top Node:
Corpus node: At the top of the tree structure is a node labeled corpus, which contains various statistical p-values indicating the significance of different corpora in the analysis. These values help determine how strongly each corpus influences the results in subsequent branches of the diagram.
Branching Nodes:
The corpus node branches into several additional nodes based on specific corpora, each representing a different set of data sources or variables. The branches are labeled as follows: B N C, M V, N P, S L, B D, I N, and P K.
Each branch represents a corpus or a specific set of data from the given categories. These nodes lead to further breakdowns of the data based on the semantic functions of the introductory it.
Bar Graphs:
Underneath each branch, there are bar graphs labeled node 3 through node 9, representing different subsets of the data, categorized according to the various corpora. These nodes display the distribution of semantic functions across the following categories: A, O A, O E, O H, O N, and O R.
Each bar graph corresponds to a specific node and shows the frequency of each semantic function for that subset of data. The Y-axis of each bar graph ranges from 0 to 1, indicating the relative frequency of each category. The X-axis contains the aforementioned categories, each representing a different semantic function that the introductory it may take.
The highest value is for A in nodes 3 and 4 and O A for all the other nodes. The graphs follow an overall increasing trend except nodes 3 and 4.
It would not be prudent to count too much on any unilateral pre-election cease-fire.
Now it is up to the aid agencies to make life as comfortable and easy as possible for the refugees.
The Himalmedia poll tried to gauge the people’s enthusiasm for general elections, and more than half the respondents felt it was ‘inappropriate’ for the prime minister to call elections during an emergency, while 29.4% agreed with the decision.
It has been pointed out that almost everyone in the broader democracy movement has charges pending against them.
It takes 35 minutes to reach the island from Male’ by launch.
In the right branch of the tree, Sri Lanka, India, Pakistan, and Bangladesh share high proportions of the intro-it in observation-attitudinal and observation-reporting functions compared to the other group. More specifically, SLE splits off from the remaining varieties first (see node 5, p < 0.001), with the highest proportions of observation-attitudinal (see (30)) and observation-reporting (see (31)) functions (see final node 6).
‘It is good to see Mr. Samaraweera and Mr. Sooriyarachchi finally take a solid decision, because things were getting quite hard for them in the government,’ he said.
In fact it has been said that this was one of the root causes of the ethnic problem in Sri Lanka.
Further to the right of the tree, node 7 (p = 0.01) separates IndE from BgE and PkE, with IndE showing the highest proportions of intro-its in observation-reporting (as in (32)), observation-attitudinal (33), and observation-neutral (34) functions (see final node 8). Pakistan and Bangladesh, finally, use the pattern most frequently in an observation-attitudinal function (see (35)), followed by observation-reporting (see (36)) and attitudinal (37) functions (see final node 9).
It was announced that the party would look after the victim’s children.
‘It is hard to I [sic] they are gone forever,’ said an emotional batchmate.
It takes a two-hour gruelling walk to reach Kodiyanpalayam.
‘It is obvious why four-day cricket struggles to attract spectators,’ said Woolmer.
It is expected that virus-less blood is safe blood.
It is difficult to loosen the gripping power of the long held index. (PK_DA_2007-01-21)
The major split thus separates two variety groups with different preferences: the British, Nepalese, and Maldivian speakers favour the intro-it in an attitudinal function, whereas the remaining SAVEs prefer observational functions of different subtypes.
12.5 Discussion, Conclusion, and Outlook
In this chapter, we set out to investigate the use of the intro-it in SAVEs as compared to BrE. First, we wanted to find out whether the construction is a frequent phenomenon in SAVEs (compared to BrE), as has been reported for other non-canonical structures (e.g., Alsagoff & Lick 1998; Sharma 2003, 2012; Bhatt 2008; Mesthrie & Bhatt 2008). We found that the construction is indeed more likely to occur in MvE and SLE than in BrE, but that none of the other SAVEs differs significantly from BrE in this respect. The same holds true for the normalised frequencies of the intro-it per thousand sentences. Thus, neither the likelihood of using the intro-it nor its actual frequency follows the overall trend that has been found for other non-canonical patterns in ESL types of English. What stands out instead are the high frequencies in Sri Lanka and on the Maldives. MvE has previously been shown to behave similarly to BrE with regard to other structural features (e.g., Mukherjee & Gries 2009; Götz 2017), whereas Sri Lanka has often been claimed to behave similarly to India, possibly due to epicentral influences (e.g., Mukherjee 2008). As far as the frequencies of the intro-it in newspaper language are concerned, however, neither of these observations can be corroborated. For Sri Lanka, we could argue that this finding is very much in line with recent literature that situates SLE between phase 3 and phase 4 in terms of its evolutionary status (e.g., Bernaisch 2015; Kircili et al. 2024).
In this phase, we would indeed expect innovative and variety-specific features to emerge in SLE, including a more frequent use of certain structural features such as the construction at hand. For MvE, by contrast, these high frequencies are a rather unexpected finding. Another explanation could therefore be the influence of the indigenous languages spoken in these two speech communities, because a structurally similar feature seems to exist in both Sinhala (Fernando 1973), the most widely spoken language in Sri Lanka, and Dhivehi (Fritz 2004), the most widely spoken language on the Maldives. We might tentatively interpret this finding as an indication that L1 influences may (also) be at play, as documented for other structural features in SAVEs (e.g., Lange 2007 on focus marking in IndE). This hypothesis, however, will require further and much more systematic investigation of databases that include information on the speakers’/writers’ L1s before more robust claims can be made.
Turning to the structural and semantic preferences of the intro-it, we found that the primary factors that trigger different complementation patterns are linguistic ones. More specifically, the properties of the verb have the main influence on the choice of complementation pattern: we found significantly different complementation preferences depending on whether (a) the verb is in the active or the passive voice and (b) the intro-it is used with a lexical or a copular verb. For verbs in the active voice, the preferred complementation pattern across varieties is the to-infinitive clause, whereas for passive constructions that-clauses are used most frequently. Moreover, as far as newspaper language is concerned, the previously reported observation that the intro-it is mainly complemented by that-clauses (Biber et al. 1999 on British and American conversation data) only holds true for lexical verbs when the subject is extraposed. For all other cases (i.e., for lexical verbs when objects are extraposed and for all copular verbs), the most frequent complementation pattern is the to-infinitive clause. Accordingly, extraposed -ing clauses, which have been reported to occur frequently in rather informal spoken discourse (Quirk et al. 1985: 1393), do not feature in the newspaper data at all. The intro-it thus seems to be highly genre-sensitive, which stresses the necessity of taking register variation thoroughly into account when describing World Englishes (see also Mahboob & Liang 2014). This notwithstanding, such clear preferences triggered by linguistic factors across all varieties are particularly noteworthy, because they might be universal across all types of English and might indeed form part of the ‘Common Core’ of English (Quirk et al. 1985: 16).
Below this level, however, we found the complementation patterns of BrE either to differ significantly from those of the SAVEs or to cluster together with those of MvE or NpE. Turning to our second research question, whether we can observe different preferences in the form of the extraposed clauses between the SAVEs and BrE, our findings are very much in line with previous research on other non-canonical features in SAVEs using the same database (e.g., Götz 2017 on fronting). We would explain these similarities by assuming that MvE and NpE have not progressed as far as the other varieties in terms of their evolutionary state and therefore still behave more similarly to the historical input variety, whereas the Englishes of Pakistan, India, Bangladesh, and Sri Lanka seem to have developed furthest and distanced themselves most from BrE. In the case of Nepal, for example, researchers have not even reached consensus on whether NpE is an EFL or ESL type of English (see Götz 2022 for a summary of the discussion). We might therefore assume that innovative uses of the intro-it correlate with the evolutionary state (Schneider 2007) of these New Englishes: the more ‘advanced’ a variety is, the more it deviates from BrE or American English (see also Mukherjee & Gries 2009; Götz 2017). This observation also seems to correlate with the status of the English language within the speech community (see also Koch & Bernaisch 2013).
As far as the semantic functions of the intro-it are concerned, we see more variety-specific diversity than on the structural level: apart from BrE and NpE, and BgE and PkE, forming pairs, each of the other varieties has developed unique semantic preferences in its use of the intro-it. While it is again Nepal and the Maldives that show the most similarities to BrE (presumably for reasons similar to those discussed above), Sri Lanka, India, and Bangladesh together with Pakistan use the intro-it in significantly different semantic functions than the other varieties. This might suggest that cultural differences become most visible in the varieties’ semantic profiles. Alternatively, it might indicate that different writing styles are employed in newspapers in different varieties of English: some speech communities might put more emphasis on expressing attitudes, whereas others might focus more on observational attitudes or observed reports. Since we have examined only one syntactic pattern, however, this interpretation can at most serve as a hypothesis for future research. We would therefore like to emphasise this aspect as a particularly valuable avenue for future research, because so far only few studies have discussed ‘semantic nativisation’ in ESL (see, however, Werner & Mukherjee 2012).
As we hope to have shown with the present study, investigating non-canonical syntactic features in SAVEs from different perspectives can be a worthwhile endeavour, as it has brought to light both features that are shared by different types of English and features that are unique to, and characteristic of, particular varieties. The intro-it in particular has been a rather under-researched phenomenon within the ESL paradigm, and we hope to have put this construction ‘on the research map’ for ESL varieties of English.
Also, extending the scope of intro-it studies to the newspaper genre has brought to light structural complementation patterns that have not yet been described for other text types or modes. The same holds true for the semantic categories that we extended in the present study. In the newspaper genre, reporting seems to be a highly frequent function, so that it seemed necessary (and turned out to be relevant) to extend this category further. It would be worthwhile for future research to investigate different (non-canonical) features in the same (but also in different) database(s) in order to test whether our findings are universally valid for the varieties under scrutiny or whether they might be construction- or even database-specific.








