1. Introduction
Research in usage-based Construction Grammar (CxG) has in many ways followed the more general trends and developments in corpus-linguistic studies. While the early constructional research in the late 1980s and the 1990s largely focused on theory-formation and the synchronic analysis of constructions (e.g. Fillmore et al. 1988; Goldberg 1995), in more recent research constructionist approaches have been successfully applied to diachronic investigations (e.g. Noël 2007; Bergs & Diewald 2008; Fried 2009; Hilpert 2013, 2024; Traugott & Trousdale 2013; Barðdal et al. 2015) and to the study of World Englishes (Hoffmann 2014, 2020; Brunner 2022; Brunner & Hoffmann 2022). Research on World Englishes has been greatly facilitated in the last two decades both by theoretical advances, such as the Dynamic Model of Postcolonial Englishes (Schneider 2007), and by new corpus resources, most notably the 1.9-billion-word Corpus of Global Web-based English (GloWbE; Davies & Fuchs 2015). While the Dynamic Model has allowed researchers to formulate detailed hypotheses on lexico-grammatical variation and innovation based on a rich sociolinguistic model, GloWbE provides plentiful data for the analysis of various low- and medium-frequency constructions that are challenging to study with smaller corpus resources, such as the different components of the International Corpus of English family of corpora (ICE; Greenbaum 1996).
These advances also make it possible to study the productivity and global variation of the construction that is the topic of this article, i.e. the Complex Modifier Construction (CMC). Our specific focus is on left-headed prenominal constructions instantiated by constructs such as easy-to-use, better-than-expected and off-the-charts.
As an explanatory framework, the Dynamic Model connects synchronic observations with historical information about different varieties of English. Based on the socio-political and sociolinguistic situation in the regions where the varieties are spoken, the model assigns each variety to a distinct ‘phase’, which is intended to capture the progress of the postcolonial variety from the language spoken at the time of the early settlers/colonisers all the way to a fully independent variety with its own social and regional dialects (Schneider 2007: 52–5; see section 2.1 for more information). Importantly for the topic of this article, the Dynamic Model provides an interesting testing ground for the study of constructional productivity in World Englishes: a number of recent studies have provided empirical evidence for the hypothesis that constructional productivity correlates with the evolutionary phase of the variety, so that the varieties that are more advanced according to the model exhibit higher productivity than the less advanced ones (e.g. Hoffmann 2014, 2020, 2021; Brunner & Hoffmann 2022). For example, in partially filled constructions, such as the Comparative Correlative Construction (e.g. the more you eat, the fatter you get; Hoffmann 2014) or the V the N_taboo-word out of construction (e.g. Michael Fassbender acted the shit out of this one; Hoffmann 2020), the less advanced varieties typically make use of more substantive constructions that are used with a high frequency in the input variety, whereas the more advanced varieties show greater slot productivity.
According to Hoffmann, this reflects the fact that the abstract schemas (macro-constructions) that license the constructs may not be as well entrenched in the less advanced varieties as in the advanced ones (footnote 1), which means that the speakers of the less advanced varieties make greater use of individual micro-constructions and partially filled macro-constructions (e.g. V the shit out of instead of V the N_taboo-word out of). In other words, productivity in this model is explained by a higher degree of entrenchment of the more schematic constructions (see Hoffmann 2014: 174).
The Dynamic Model also allows us to make predictions about the productivity of the CMC. According to Schneider (2007: 82, 89), L2 varieties of English tend to be structurally simpler when compared to their input varieties due to cognitive processes related to second language acquisition (SLA), and the less advanced the variety, the simpler it is predicted to be in terms of grammatical structure. However, when it comes to the complexity of the noun phrase in particular, previous research has shown that the typological profiles of the substrate languages involved (head-initial versus head-final) also have an effect on the complexity of NP modifiers in the local variety of English (Brunner 2014, 2017; Akinlotan 2018; see section 3). More specifically, head-initial languages generally favour post-modification as opposed to premodification, and as a consequence, premodifiers have been found to be less frequent and less complex in varieties of English with largely head-initial substrate languages when compared to those with head-final substrate languages. By studying the productivity of the CMC in World Englishes, our goal in this paper is to establish whether the productivity in each variety correlates primarily with its evolutionary phase in the Dynamic Model or with the typological profile of the most important substrate languages.
The CMC is particularly well suited for this kind of comparison, as there is evidence that complex left-headed premodifiers have greatly increased in frequency in English in the latter part of the twentieth century (Günther 2019: 663) – a development that we will later show to be led by American English (AmE). In our analysis, we focus on comparing the prediction derived from the Dynamic Model to the one that draws on the influence of typological features of the respective substrate languages in language contact settings. If the evolutionary phase that a certain variety has arrived at can explain the differences in the productivity of the CMC, the construction should have spread more readily from AmE to other phase V varieties of English, such as British English, Canadian English and Australian English, while less advanced varieties, such as Philippine English, Tanzanian English or Hong Kong English (all in phase III; Brunner 2022: 3), should show weaker productivity. However, if the predictions of the language contact hypothesis are supported by the data, the Englishes spoken in regions with predominantly head-final substrate languages (e.g. Singaporean English, Hong Kong English) should show greater productivity when compared to varieties with predominantly head-initial substrate languages (e.g. Tanzanian English, Kenyan English). As the varieties of English spoken in Africa and South-East Asia differ systematically according to this parameter, our analysis focuses mainly on comparing the West and East African varieties (with mostly head-initial substrate languages) to the South-East Asian varieties of English (with mostly head-final substrate languages).
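The logic of the two competing predictions can be made explicit in a short sketch. The following Python fragment is purely illustrative: it encodes only the phase and substrate information mentioned in the paragraph above (values not stated there are left as `None`), and the class, dictionary and function names are hypothetical labels introduced for this example rather than part of the study's methodology.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variety:
    name: str
    phase: Optional[int]      # Dynamic Model phase as stated above; None if not stated
    substrate: Optional[str]  # "head-final" / "head-initial"; None if not stated


VARIETIES = {v.name: v for v in [
    Variety("British English", 5, None),
    Variety("Canadian English", 5, None),
    Variety("Australian English", 5, None),
    Variety("Philippine English", 3, None),
    Variety("Hong Kong English", 3, "head-final"),
    Variety("Tanzanian English", 3, "head-initial"),
    Variety("Singaporean English", None, "head-final"),
    Variety("Kenyan English", None, "head-initial"),
]}

# Under the contact hypothesis, head-final substrates favour complex premodifiers.
SUBSTRATE_RANK = {"head-initial": 0, "head-final": 1}


def _sign(diff: int) -> str:
    return ">" if diff > 0 else "<" if diff < 0 else "="


def predict_by_phase(a: Variety, b: Variety) -> str:
    """Predicted CMC productivity of a relative to b under the Dynamic Model."""
    if a.phase is None or b.phase is None:
        return "?"  # no prediction without phase information
    return _sign(a.phase - b.phase)


def predict_by_substrate(a: Variety, b: Variety) -> str:
    """Predicted CMC productivity of a relative to b under the contact hypothesis."""
    if a.substrate is None or b.substrate is None:
        return "?"  # no prediction without substrate information
    return _sign(SUBSTRATE_RANK[a.substrate] - SUBSTRATE_RANK[b.substrate])


hke = VARIETIES["Hong Kong English"]
tze = VARIETIES["Tanzanian English"]
print(predict_by_phase(hke, tze), predict_by_substrate(hke, tze))  # prints "= >"
```

Pairs such as Hong Kong English and Tanzanian English are the informative ones: both are assigned to phase III, so the phase-based prediction treats them alike, whereas the contact-based prediction expects higher productivity in Hong Kong English.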
After this introduction, we continue in section 2 by providing an overview of the relevant literature on World Englishes research and previous studies on complex premodifiers in English. In section 3, we offer a brief account of the theoretical basis of the constructional analysis employed in this article and discuss the CMC in more detail. The focus of section 4 is on data and methodology; here, we introduce the GloWbE corpus used in the case studies and explain the methods used in our analysis. Section 5 presents the results of our case studies, and section 6 concludes the article with a discussion of our main findings and some suggestions for future research.
2. Background
2.1. Structural complexity in World Englishes research
In research on the development of Englishes around the world, Schneider’s (2007) Dynamic Model of the Evolution of Postcolonial Englishes (PCEs) has become one of the most influential frameworks. The model assumes that emerging varieties of English in postcolonial contexts typically follow an underlying, fundamentally uniform evolutionary process brought about by the social dynamics between the two main groups of speakers involved, i.e. a settler strand and an indigenous population. Similar historical, political and (socio)linguistic factors are thought to be at work in all (post)colonial contact situations, and some synchronically observable differences between PCEs may be regarded as consecutive stages in a diachronic process. At the heart of the model are five evolutionary phases that PCEs go through, from the foundation phase (I), when English is first transplanted to a new colony, through a nativisation phase (III), in which the first structural linguistic innovations at the interface of lexis and grammar occur, and finally to the differentiation phase (V), when regional, social and ethnic dialects develop. Stronger social contact between the two groups leads to greater linguistic interaction, and language contact in general depends heavily on the socio-political conditions.
Language contact thus plays a major role in Schneider’s model. On the one hand, it is essentially linked to social contact which triggers linguistic and cultural interaction and the rewriting and development of identity constructions. On the other, with a view to linguistic accommodation between the two strands, language contact is identified as one of three major clusters of sources and processes of the nativisation of PCEs on a structural level (Schneider 2007: 100), e.g. in phonology, vocabulary, lexico-grammar and syntax. Contact includes the selection and adoption of elements from different, competing systems, especially from the indigenous languages.
Previous studies on NP-complexity in varieties of English have linked the occurrence and complexity of pre- and postnominal modification to two explanatory factors: SLA-induced structural simplification and language contact. As to the first factor, L2 varieties as contact varieties of English in a general sense are likely to exhibit simpler constructions than their input varieties due to the general cognitive processes of SLA (Schneider 2007: 82, 89). The extent of simplification that can be observed in a certain variety is assumed to depend on the respective variety’s evolutionary advancement in the Dynamic Model: more advanced varieties are expected to show higher degrees of complexity because the ratio of native speakers increases and the variety will be used in a broader range of domains, while the range of SLA contexts and the necessity for simplification decrease. As to language contact, the typological features of the major substrate languages are expected to affect the overall incidence of head-final/head-initial NPs and the degree of complexity of pre- and postmodifying structures (see e.g. Schilk & Schaub 2016; Brunner 2017; Akinlotan 2018; Brato 2020).
Several recent studies have examined NP-complexity in African Englishes. For the West African variety of English in Ghana, Brato (2020) finds that about 60 per cent of NPs in the two corpora he examined (the Ghanaian component of ICE and the Historical Corpus of English in Ghana) are not premodified at all, with only 6 per cent showing complex modifiers. Complex premodified NPs were expected to be rare, as the most widely used L1s in Ghana use postmodification only. However, Brato finds that, over time, NP-complexity increases as speakers become more proficient and employ more sophisticated structures. For Nigerian English, Akinlotan (2018) observes that simple, i.e. one-word, premodifiers (typically adjectives) occur much more frequently than complex premodifiers (70 vs 30 per cent). Two- and three-word or longer structures are very rare, which testifies to only minimal syntactic complexity in that variety. Similar findings were obtained by Brunner (2014, 2017) for Kenyan English: premodified NPs occur at a comparatively low rate when compared to postmodified NPs. These findings for African Englishes seem to be affected by the typological profiles of the substrate languages because African languages exhibit ‘a greater tendency to place modifiers after nouns than languages in other parts of the world’ (Dryer 2011: 287).
Quite the opposite has been observed for South-East Asian Englishes, especially English in Singapore. Brunner (2014, 2017) reports a high frequency of premodified NPs (i.e. head-final structures) and a much lower proportion of postmodified NPs (i.e. head-initial structures) for this variety. He links this to the fact that the varieties of Chinese (Mandarin, Cantonese, Hokkien) used as L1s in several South-East Asian English-speaking countries do allow premodification (Brunner 2014: 26). Mazaud’s (2004) study of complex premodifiers, i.e. constructions consisting of two or more words used between a determiner and a head noun, even found the highest token frequency and number of original formations in Singaporean English, which is also the most advanced variety in terms of its evolutionary status in the Dynamic Model in her study.
With a view to the different instantiations and productivity of the CMC examined in the present article, several recent corpus-based studies on World Englishes in the framework of CxG suggest that constructional productivity correlates positively with the phase that a variety has advanced to in the Dynamic Model (Brunner & Hoffmann 2020, 2022; Hoffmann 2020). Concerning the creative use of constructions, more advanced varieties are expected to exhibit greater slot productivity and should, correspondingly, rely less on specific, substantive fillers (Brunner & Hoffmann 2022: 27). Brunner & Hoffmann (2022) conclude that this is also a plausible outcome from a usage-based perspective because it is at the later stages of development of a variety that the number of native speakers, and bilingual speakers in general, increases, and the variety will be used in a broader range of domains. This positively affects the amount of input that becomes available for a constructional pattern, which in turn should lead to higher rates of productivity (Brunner & Hoffmann 2022: 25). In particular, in their study of the way-construction, Brunner & Hoffmann (2020) argue that the overall frequency of the construction and the productivity of its verbal, nominal and prepositional slots correlate positively with a variety’s phase in the Dynamic Model. By contrast, the less entrenched the construction is in a variety, the stronger its tendency to select specific prototypical, or frequent, fillers in the constructional slots. Moreover, less advanced varieties will exhibit a preference for concrete verbs or nouns.
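The contrast between reliance on a few frequent fillers and greater slot productivity can be illustrated with simple counts. The sketch below is an assumption-laden illustration, not the measure used in the studies cited above: the function name `slot_profile`, the choice of statistics (type count, type/token ratio, share of hapax fillers, share of the most frequent filler) and the two toy filler lists for the verb slot of the way-construction are all invented for this example.

```python
from collections import Counter


def slot_profile(fillers):
    """Summarise how varied the fillers of one constructional slot are."""
    counts = Counter(fillers)
    tokens = sum(counts.values())
    hapaxes = sum(1 for c in counts.values() if c == 1)  # fillers attested once
    top_filler, top_count = counts.most_common(1)[0]
    return {
        "tokens": tokens,
        "types": len(counts),
        "type_token_ratio": len(counts) / tokens,
        "hapax_share": hapaxes / tokens,
        "top_filler": top_filler,
        "top_filler_share": top_count / tokens,
    }


# Invented toy data: verb-slot fillers in two hypothetical samples of equal size.
sample_a = ["make"] * 7 + ["find"] * 3
sample_b = ["make"] * 4 + ["find"] * 2 + ["fight", "elbow", "talk", "bluff"]

profile_a = slot_profile(sample_a)
profile_b = slot_profile(sample_b)
print(profile_a["types"], profile_a["top_filler_share"])  # 2 0.7
print(profile_b["types"], profile_b["top_filler_share"])  # 6 0.4
```

On these toy figures, sample B would count as showing greater slot productivity: it has more filler types, more one-off formations, and a smaller share of tokens taken up by its most frequent filler.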
In this article, our aim is to contribute to the ongoing discussion about the potential effect of the evolutionary phase of a variety on constructional productivity, on the one hand, and the influence of language contact, on the other. Before presenting our constructional analysis and corpus findings in more detail, however, we provide some general background to the study of complex premodifiers in the context of English language research.
2.2. Previous research on complex premodifiers
Complex premodifiers started to attract scholarly attention in the late 1960s due to their apparent frequency increase in certain genres of English, such as newspapers and magazines (Crystal & Davey 1974 [1969]). The items discussed in Crystal & Davey (1974 [1969]) represent a variety of types, including right-headed participial constructions (e.g. faster-arriving, computer-made), left-headed phrasal participles (hoped-for) and parts of quotative phrasal compounds (Yah, ha-ha-got-it-wrong-again). However, as the authors’ focus is on stylistic matters, they do not provide a systematic account of the different structural realisations of the pattern, nor do they make it clear if there was a form that specifically caught their attention in the first place. Interestingly, Crystal & Davey (1974 [1969]: 186) have a rather negative impression of complex premodifiers, some of which they consider ‘outlandish’, and which in their view are stylistically characteristic of American magazines. These judgments are interesting because they suggest not only that some of the types discussed by the authors represent innovative usage but also that two professional British linguists associated this innovation particularly with American English.
Since these early remarks, studies of complex premodifiers were published sporadically, and they did not have a major impact on the theoretical discussions on word formation and grammatical analysis for quite some time. One indication of this is the fact that the authors who discuss complex premodifiers rarely recognise each other’s work, which leads to somewhat overlapping analyses and a wide range of terminological variation. To illustrate, after being simply labelled ‘adjectival formations’ in Crystal & Davey (1974 [1969]), complex premodifiers have been called, e.g. ‘compound adjectives’ (Meys 1975), ‘string compounds’ (Ogata 1976), ‘complex compounds’ (Carroll 1979), ‘phrase compounds’ (Bauer 1983), ‘extended modifiers’ (Puci 1988), ‘phrasal compounds’ (Lieber 1992), ‘juxtapositions’ (Hohenhaus 1996) and ‘complex modifiers’ (Matthews 1997). As can be seen, some analyses treat complex modifiers as a syntactic phenomenon, while others approach the topic from the perspective of word formation (i.e. compounding). The difficulty of distinguishing phrases from compounds has long been recognised in the literature (e.g. Nattinger & DeCarrico 1992; Sanchez-Stockhammer 2018: 24–35; Bauer 2019), and the various criteria that have been proposed to account for a multi-word item’s compound status have been shown to be inconclusive (e.g. Bauer 2019). In this article, we consider the multi-word items under study (e.g. tough-to-beat, on-the-ground, milder-than-average) to be regularly formed syntactic constructions, but the waters are admittedly muddied by diachronic developments: the better entrenched a multi-word item becomes, the more likely it is that it will be processed as a single chunk and analysed as a compound (cf. Nattinger & DeCarrico 1992; Vartiainen 2016).
From the perspective of productivity, the role of the attributive position as a hub for new and creative complex modifiers has been regularly discussed in the literature (see e.g. Mutt 1967; Mazaud 2004: 235; Goldberg & Shirtz forthcoming). For example, Bauer & Renouf (2001: 114) remark that it is ‘possible to do much more in attributive position than in most other positions’, while Mazaud (2004: 235) proposes that ‘almost any group of words’ can be placed in attribution to modify the head noun of an NP. However, while these statements may accurately describe the situation in Present-day English, historically things have been quite different: table 1 provides examples of some of the items included in our corpus data together with their first attestations in the Oxford English Dictionary (OED). Dates for post-verbal (predicative or adverbial) as well as premodifying uses are provided when available. While some of the items in table 1 are already attested in the nineteenth century, most first attestations come from the (late) twentieth century. Furthermore, there are some items, such as off-the-charts and on-the-fly, that occur regularly in our corpus data but are not represented at all in the OED.
Table 1. First attestations of some frequent micro-constructions in our data: adverbial/predicative use vs attributive use (OED)

The available evidence thus suggests that the use of left-headed premodifiers is a relatively recent phenomenon in the history of English. This conclusion is supported by a diachronic corpus study by Günther (2019) that focused on the frequency of complex premodifiers in the Corpus of Historical American English (COHA). In Günther’s data, the token frequency of three-word premodifiers increased from under 20 pmw to nearly 160 pmw in just over a century (1900s to 2000s) (Günther 2019: 652). Interestingly, Günther also found a substantial increase in the type frequency of a particular left-headed pattern (ADJ-to-VERB) starting in the 1930s (Günther 2019: 663). This finding is directly relevant to the present article, as one of the constructions studied (the Tough-Modifier Construction; see section 5.1) is structurally realised by this pattern.
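For readers less familiar with normalised frequencies, the per-million-words (pmw) figures quoted above can be reproduced from raw counts as in the following minimal sketch. The function name and the raw counts and sub-corpus size are hypothetical and are not taken from Günther's study; only the rounded rates of 20 and 160 pmw echo the figures reported above.

```python
def per_million(count: int, corpus_size_words: int) -> float:
    """Normalise a raw token count to a rate per million words (pmw)."""
    if corpus_size_words <= 0:
        raise ValueError("corpus size must be positive")
    return count * 1_000_000 / corpus_size_words


# Hypothetical raw counts in two equally sized 20-million-word sub-corpora.
early = per_million(400, 20_000_000)   # 20.0 pmw
late = per_million(3_200, 20_000_000)  # 160.0 pmw
print(early, late, late / early)       # 20.0 160.0 8.0
```

Normalising in this way is what makes decades or varieties with differently sized sub-corpora comparable; an increase from roughly 20 to roughly 160 pmw corresponds to an approximately eightfold rise in token frequency.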
Günther’s corpus study is also interesting in light of Crystal & Davey’s negative judgments on complex premodifiers that were discussed above. Indeed, it is plausible that Crystal & Davey’s remarks were based on an acute observation of the premodifiers’ increased frequency in American texts – something that was later more systematically investigated by Günther with data from COHA.
3. Constructions studied
In this section, we provide a general introduction to grammatical analysis in the framework of CxG. We then proceed to offer a more detailed account of the CMC and the constructional change that permitted the licensing of left-headed modifiers. In our analysis, we make use of feature matrices of the kind commonly employed in, for instance, Berkeley Construction Grammar (Fillmore 2013; footnote 2).
3.1. Principles of constructional analysis
Our analysis of the constructions studied in this article follows the general principles of usage-based CxG, which is here understood as a group of related linguistic theories that subscribe to certain fundamental principles pertaining to grammatical organisation and the nature of the linguistic sign (see Östman & Fried 2005a; Hoffmann & Trousdale 2013a). The basic unit of analysis in CxG is a construction, which is understood as a conventionalised pairing of form and meaning/function (e.g. Goldberg 1995; Fried 2015). Constructions are conceptualised as forming an associative network, or the constructicon, where each construction is linked to a number of other constructions in various ways. Inheritance links are particularly important in CxG models of language, as they help describe the relationship between a construction that is structurally and semantically motivated by another construction (Goldberg 1995: 72). A type of inheritance link that is especially relevant to our purposes is an instance link, where a more schematic construction licenses another, more specific or substantive construction. As an example, the two prenominal possessive meso-constructions in English (N-’s N; PRON_poss N; e.g. John’s book; his book) are related through an instance link to the Possessive Construction, an abstract macro-construction that licenses not only both these prenominal constructions but also the Periphrastic Possessive Construction. Via the inheritance link, the general meaning associated with the Possessive Construction is transmitted down to all three meso-constructions, while their structural specifications and semantic constraints are determined individually at the meso-level (footnote 3).
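The way an instance link passes meaning down from a schematic construction can be pictured as a small tree. The following sketch is a deliberately simplified illustration under stated assumptions: the `Construction` class, its fields and the placeholder meaning label are hypothetical, and the form of the Periphrastic Possessive Construction is left unspecified because it is not spelled out above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Construction:
    name: str
    level: str                               # "macro", "meso" or "micro"
    form: Optional[str] = None               # structural specification set at this level
    meaning: Optional[str] = None            # meaning specified at this level, if any
    parent: Optional["Construction"] = None  # instance link to the licensing schema

    def inherited_meaning(self) -> Optional[str]:
        """Walk up the instance links until a meaning specification is found."""
        node: Optional[Construction] = self
        while node is not None:
            if node.meaning is not None:
                return node.meaning
            node = node.parent
        return None

    def licensed_by(self) -> List[str]:
        """Names of all more schematic constructions above this one."""
        chain = []
        node = self.parent
        while node is not None:
            chain.append(node.name)
            node = node.parent
        return chain


# Placeholder meaning label; the actual semantic specification is richer.
possessive = Construction("Possessive Construction", "macro",
                          meaning="general possessive relation")
genitive = Construction("N-'s N", "meso", form="N-'s N", parent=possessive)
pronominal = Construction("PRON_poss N", "meso", form="PRON_poss N", parent=possessive)
periphrastic = Construction("Periphrastic Possessive Construction", "meso",
                            parent=possessive)  # form not spelled out in the text

for cxn in (genitive, pronominal, periphrastic):
    print(cxn.name, "->", cxn.inherited_meaning())
```

Each meso-construction keeps its own form but retrieves the general meaning from the macro-construction, which mirrors the division of labour described above.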
The constructional specifications, and the relationships between the constructions, are in many strands of CxG described in terms of feature matrices, which include the necessary information about the constructions’ structural and semantic properties and the constraints concerning their co-occurrence. The precise formalisms employed in constructional analysis vary from one CxG approach to another, but the underlying principle is the same: the specifications are intended to guarantee that the constructions will only connect, or unify, with those constructions that are semantically and morphosyntactically compatible with them. For instance, the English quantifier much can only unify with common mass nouns that are singular in number and semantically unbounded (e.g. much snow/water/beer). Trying to unify much with proper nouns (e.g. much John) or count and bounded nouns (e.g. much book) results in a failure because of conflicting feature values in the specifications of much and the lexemes in question (see Fried & Östman 2004: 33–4). However, and importantly for the idea that a construction is a combination of form and meaning, a phrase such as much snow is more than just the sum of its parts: while both much and snow are semantically unbounded, together they receive a bounded interpretation. This can be illustrated by the acceptability of aspectually bounded clauses, such as He cleared much snow in a few minutes. According to constructional analysis, this aspectual shift can be explained by the fact that the lexemes unify with a schematic Determination Construction, which imposes a bounded interpretation on all phrases licensed by it. In other words, the unification of much and snow is possible in the first place because of their matching semantic features (including [-bounded]), but the Determination Construction overrides the value inherent in the two lexemes.
This kind of unification mismatch between the internal attributes of the lexemes and the external attribute of the construction is always resolved in favour of the construction – a phenomenon known as the Override Principle (see Fried & Östman 2004: 37–8; Michaelis 2005: 51).
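The much snow example can be restated as a toy unification procedure. The sketch below is a simplification for illustration only: the flat feature dictionaries, the feature names (`proper`, `count`, `number`, `bounded`) and the functions `unify` and `apply_construction` are assumptions made for this example and do not reproduce the attribute-value notation used in the CxG literature.

```python
from typing import Dict, Optional

Features = Dict[str, object]


def unify(a: Features, b: Features) -> Optional[Features]:
    """Merge two feature sets; return None if any shared feature conflicts."""
    result = dict(a)
    for feature, value in b.items():
        if feature in result and result[feature] != value:
            return None  # unification failure
        result[feature] = value
    return result


def apply_construction(external: Features, internal: Features) -> Features:
    """Override Principle: the construction's external values win on conflict."""
    return {**internal, **external}


# What 'much' requires of the noun it combines with (simplified).
MUCH = {"proper": False, "count": False, "number": "sg", "bounded": False}
SNOW = {"proper": False, "count": False, "number": "sg", "bounded": False}
JOHN = {"proper": True}
BOOK = {"proper": False, "count": True, "number": "sg", "bounded": True}

# External specification of a schematic Determination Construction (simplified).
DETERMINATION = {"bounded": True}

much_snow = unify(MUCH, SNOW)
print(much_snow is not None)                                 # True: features match
print(unify(MUCH, JOHN), unify(MUCH, BOOK))                  # None None: conflicts
print(apply_construction(DETERMINATION, much_snow)["bounded"])  # True: overridden
```

The two steps are kept separate on purpose: lexical unification succeeds or fails on matching values, whereas the constructional override applies only after unification has succeeded and replaces the lexemes' [-bounded] value with the bounded reading of the phrase.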
3.2. The Complex Modifier Construction
The constructions studied in this article include three meso-constructions of the CMC which we call in somewhat abbreviated form the Tough-Modifier Construction, the Comparative Modifier Construction, and the Prepositional Modifier Construction, exemplified in (1) to (3), respectively.



According to our analysis, the CMC itself is a meso-construction, which is connected to the Modification Construction through an instance link (footnote 4). The Modification Construction licenses all simple and complex premodifiers in the English noun phrase by permitting words of different word classes to function as heads of modifiers and assigning them a modifying function in relation to the head of the noun phrase. A partial representation of the Modification Construction is provided in figure 1 (footnote 5).

Figure 1. The English Modification Construction
In figure 1, the Modification Construction is described by using a ‘boxes-within-boxes’ diagram. The notation employed is discussed in more detail in, for example, Fried & Östman (2004), so we only offer a brief summary of the relevant features. First, the outside boundary represents the entire construction, which is named in the upper right-hand corner (‘Modification’). The construction is specified in terms of its syntax, semantics, and pragmatics. Syntactically, the construction generates nominal elements, represented by the cat(egory) specification, and the constructs licensed by the construction can be fully formed phrases (e.g. cold snow) or not (e.g. interesting book) (footnote 6). This specification is described by the max(imality) feature, which is here left undetermined (represented by empty brackets). The negative value for the lex(ical) feature, on the other hand, specifies that instead of licensing lexical items, the unification of the modifier and the head results in a phrasal construct.
The meaning of the construction is expressed by the downward pointing arrows in the construction’s sem(antic) specification, which are coindexed with the corresponding information in the two smaller boxes. This is intended to express that all lexical semantic information, here described in terms of Frame Semantics, becomes integrated at the constructional level (see Fried & Östman 2004: 60). The general pragmatic function of the construction is expressed in the prag(matic) specification, which states that all modifier–head constructs licensed by the Modification Construction involve referents that are assigned a property expressed by the modifier and whose reference becomes more constrained as a consequence (Fried 2015: 981).
The smaller boxes include the specifications relevant to the successful unification of the modifier and the head noun. The syntactic category of the head is specified (noun), while the category of the modifier is left unspecified (indicated by the empty square brackets). This captures the fact that modifiers in the English noun phrase can belong to a number of word classes, such as adjectives, nouns and verbs (participles). The positive value for lex in the head indicates that the modified head must be a lexeme, while the unspecified lex value in the modifier’s specification indicates that both single lexemes and phrases can be used to modify the head. The three dots within the brackets in the semantic specification indicate that the frame-semantic information, which comes from the specific lexemes used in the construction, is left unexpressed.
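The syntactic specifications described above can be restated as a nested data structure. This is only a rough sketch under several assumptions: it covers the cat, max and lex features but omits the semantic and pragmatic specifications, it represents unspecified values (the empty brackets) as `None`, and the function names and the category label given to easy-to-use are illustrative choices rather than part of the analysis.

```python
from typing import Dict, Optional

Spec = Dict[str, object]

# Syntactic part of the Modification Construction; None stands for an empty bracket [ ].
MODIFICATION = {
    "external": {"cat": "n", "max": None, "lex": False},  # nominal, phrasal construct
    "modifier": {"cat": None, "lex": None},               # any word class, word or phrase
    "head": {"cat": "n", "lex": True},                    # head must be a noun lexeme
}


def fits(spec: Spec, item: Spec) -> bool:
    """An item fits a slot if it matches every value the slot actually specifies."""
    return all(value is None or item.get(feature) == value
               for feature, value in spec.items())


def license_construct(cxn: Dict[str, Spec], modifier: Spec, head: Spec) -> Optional[Spec]:
    """Return the external specification of the construct, or None if not licensed."""
    if fits(cxn["modifier"], modifier) and fits(cxn["head"], head):
        return dict(cxn["external"])
    return None


cold = {"cat": "adj", "lex": True}
easy_to_use = {"cat": "adj", "lex": False}  # phrasal modifier (category label assumed)
snow = {"cat": "n", "lex": True}
phrasal_head = {"cat": "n", "lex": False}

print(license_construct(MODIFICATION, cold, snow))           # licensed
print(license_construct(MODIFICATION, easy_to_use, snow))    # licensed
print(license_construct(MODIFICATION, cold, phrasal_head))   # None
```

Because the modifier slot leaves both cat and lex unspecified, single words and phrases of any word class pass the check, whereas the head slot rejects anything that is not a noun lexeme.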
However, the description of the Modification Construction in figure 1 does not say much about complex modifiers apart from permitting phrasal modifiers to be licensed by the construction. On this note, we should point out that phrasal modifiers have been grammatical in English since the Old English period, as illustrated by the participial construction in (4).

So, how should constructs like these be analysed from a unification-based CxG perspective? From a semantic perspective, when a complex participle like milk-drinking is unified with a head noun in the Modification Construction, the event described by the verbal construction is no longer temporally grounded; the property reading imposed by the Modification Construction requires that the description is understood as temporally stable (De Smet & Heyvaert 2011) or aspectually open-ended (Vartiainen 2012). From a structural perspective, the construction should specify the left-branching order of constituents (argument/complement-head) for both -ing and -ed participle modifiers as well as the systematic relationship between the verb’s arguments and the structure of the noun phrase (i.e. that the external subject argument of the corresponding finite clause is realised as the head noun). As all this information is much more specific than what is included in the more general Modification Construction, we consider it sensible to posit an intermediate meso-level construction that licenses these kinds of modifiers in the constructicon, which we have already referred to as the Complex Modifier Construction (CMC).
While complex right-headed participial modifiers have been grammatical in English since Old English, we already saw in section 2.2 that many left-headed micro-constructions are relatively recent innovations, which suggests that at some stage there has been a change in the licensing potential of the CMC. We model this change by introducing a new feature into the structural specification of the construction called headedness, which has the following potential values: [right], [left] and [ ] (unspecified). The conventionalisation and entrenchment of left-headed modifiers can thus be economically depicted as a change in the headedness feature from [right] to an unspecified value [ ]. As the CMC inherits its general pragmatic and structural specifications from the Modification Construction, headedness is the only feature that needs to be specified to describe the change at this level of grammatical organisation. Figure 2 illustrates the change in the syntax of the CMC (the arrow indicates the direction of change).

Figure 2. Partial representation of the increased licensing potential of the Complex Modifier Construction
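The licensing change in figure 2 can be paraphrased as a small unification check. The following is only an illustrative sketch: the encoding of the unspecified value [ ] as None, and the names unify, licenses, OLD_CMC and NEW_CMC, are our own assumptions and not part of any existing CxG formalism or implementation.

```python
from typing import Optional

# Hypothetical encoding: None stands for the unspecified value [ ].
Headedness = Optional[str]  # "left", "right" or None


def unify(spec: Headedness, construct: Headedness) -> Headedness:
    """Return the unified headedness value, or raise ValueError on a mismatch."""
    if spec is None:
        return construct
    if construct is None:
        return spec
    if spec == construct:
        return spec
    raise ValueError(f"unification failure: [{spec}] vs [{construct}]")


def licenses(spec: Headedness, construct: Headedness) -> bool:
    """True if the construct's headedness is compatible with the construction's."""
    try:
        unify(spec, construct)
        return True
    except ValueError:
        return False


OLD_CMC = "right"  # earlier specification: only right-headed modifiers
NEW_CMC = None     # later specification: headedness left unspecified

# milk-drinking is right-headed; easy-to-use is left-headed.
print("old CMC:", licenses(OLD_CMC, "right"), licenses(OLD_CMC, "left"))
print("new CMC:", licenses(NEW_CMC, "right"), licenses(NEW_CMC, "left"))
```

Under these assumptions, a left-headed construct fails to unify with the earlier specification but is licensed once the feature is left unspecified, which is the change the arrow in figure 2 depicts.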
From a historical perspective, the change depicted in figure 2 must have taken place at different times, and at varying rates, in the varieties of English around the world. Indeed, this would explain the somewhat critical assessment by Crystal & Davey (1974 [1969]), for instance, who, writing as speakers of British English, rather dismissively associated the usage with the style of American magazines. In the context of World Englishes, it is possible that left-headed modifiers may not be fully acceptable to some speakers even today. In such cases, the speakers will need to rely on the Override Principle to resolve what for them is a genuine unification failure (a mismatch in the headedness value). The central question asked in this article is whether the spread of the innovation has been affected more by the evolutionary phase of the variety of English or by the typological profiles of the substrate languages spoken in the multilingual communities.
4. Data and methods
4.1. Data
The data used in the present analysis are drawn from the Corpus of Global Web-Based English (GloWbE; Davies 2013; Davies & Fuchs 2015). GloWbE is a web-derived corpus composed of 1.9 billion words from 1.8 million webpages in twenty different countries where English is used either as the first language or as a second-language variety (mostly African and Asian countries). The texts for the corpus were collected in December 2012, and they consist of informal blogs (about 60 per cent) and other written texts harvested from the internet, such as newspapers, magazines and company websites (Davies & Fuchs 2015: 3). From a methodological perspective, the corpus has some obvious advantages related to the relative recency and size of the material (especially when compared to the ICE corpora), but there are also several challenges and limitations that concern large web-derived corpora, especially when it comes to the confirmation of authorship. To ensure that the webpages are correctly associated with each of the twenty countries represented in the corpus, and that they thus represent the actual varieties of English used there, the compilers collected the texts for each country separately, using the Google ‘Advanced Search’ facility and limiting the searches by region. While it is not entirely clear how accurately Google has identified websites by country (Davies & Fuchs 2015: 4), there are undoubtedly cases where language data appear on a webpage that has been sorted into a particular country category even though the data were not produced by a writer/speaker who uses the variety in question (e.g. an interview with a US singer on a Nigerian website).
Furthermore, although the URLs with hyperlinks to the original sources are provided in the corpus interface for each of the 1.8 million webpages, it is sometimes difficult to perform background checks because the source websites are no longer online or the website does not include enough information about the author to determine their country of origin.
Keeping these limitations in mind, we queried the corpus for each of the constructions, targeting hyphenated prenominal material (following Günther 2019: 650; see section 5 for the exact corpus queries). After collecting the data, we inspected all concordance lines manually and discarded false positives and duplicates prior to analysis. The different construction types were then counted per variety and subsequently combined into four macro-categories, which were formed on the basis of two principles. First, we took into account the phase of the variety in Schneider’s Dynamic Model by pooling the data from the Inner Circle varieties (American, Canadian, British, Irish, Australian, New Zealand) into a single group. Next, we formed three more categories based on areal proximity and the typological profiles of the major substrate languages: West and East African Englishes (Nigeria, Ghana, Kenya, Tanzania), South-Asian Englishes (India, Pakistan, Sri Lanka, Bangladesh) and South-East Asian Englishes (Singapore, Hong Kong, Malaysia, Philippines). We decided to exclude South African English from the African group because of the country’s unique colonial history and multilingual setting. Jamaican English was also left out from the analysis as it did not fit neatly in any of the macro-categories.
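The pooling step can be made concrete with a short sketch. The group labels follow the abbreviations used in the figures (IC, Afr, SA, SEA), but the country labels, the function name pool and the toy tokens are hypothetical and serve only to illustrate the grouping; they are not drawn from our dataset.

```python
from collections import Counter

# Macro-categories as described in the text; the country labels are assumed spellings.
MACRO_CATEGORIES = {
    "IC": ["United States", "Canada", "Great Britain", "Ireland",
           "Australia", "New Zealand"],
    "Afr": ["Nigeria", "Ghana", "Kenya", "Tanzania"],
    "SA": ["India", "Pakistan", "Sri Lanka", "Bangladesh"],
    "SEA": ["Singapore", "Hong Kong", "Malaysia", "Philippines"],
}
COUNTRY_TO_GROUP = {
    country: group
    for group, countries in MACRO_CATEGORIES.items()
    for country in countries
}


def pool(tokens):
    """Pool (country, construct) pairs into per-group type-frequency counters."""
    pooled = {group: Counter() for group in MACRO_CATEGORIES}
    for country, construct in tokens:
        group = COUNTRY_TO_GROUP.get(country)
        if group is not None:  # excluded varieties (e.g. South Africa, Jamaica) are skipped
            pooled[group][construct] += 1
    return pooled


# Invented toy tokens, for illustration only.
toy_tokens = [
    ("Singapore", "easy-to-use"),
    ("Malaysia", "easy-to-use"),
    ("Kenya", "hard-to-reach"),
    ("South Africa", "easy-to-use"),
    ("Jamaica", "easy-to-use"),
]
pooled = pool(toy_tokens)
for group, counts in pooled.items():
    print(group, dict(counts))
```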
We present our findings in section 5, where our focus is on comparing the South-East Asian Englishes to the West and East African Englishes, as they provide us with the best opportunity to study the two competing hypotheses discussed above. Of the four African and four South-East Asian varieties included in the analysis, only Singaporean English is at phase IV according to the Dynamic Model, while all the rest are at phase III (e.g. Brunner 2022: 3). In other words, we do not expect to see much difference in the productivity of complex premodifiers between the macro-varieties if the evolutionary hypothesis is correct. The language contact hypothesis, by contrast, predicts that the South-East Asian varieties of English should show greater productivity than the African varieties due to the typological configurations of the substrate languages.
4.2. Methods
Conceptually, to gauge the productivity of a construction, we want to consider the number of different types which instantiate the construction; this number can be called its vocabulary size (cf. Baayen 2001: 2). To contrast the productivity of the construction in different varieties, we must then compare the vocabulary sizes of the construction in the varieties of interest. However, there are well-known challenges when comparing type frequencies across subcorpora of different sizes because type frequency depends on corpus size in a non-linear manner (e.g. Baayen 2008: 222–3). That is to say, the probability of introducing a word which does not already exist in the corpus is not constant: the larger the corpus, the higher the probability that each added word has already appeared in the corpus before. The same is true for constructions.
Between subcorpora of (approximately) equal size, it might be possible to simply compare the observed type frequencies, or to use simple calculations such as the type-token ratio (TTR). In practice, however, the twenty national components of GloWbE are not evenly sized. For example, the Indian component consists of 96 million words and the New Zealand component comprises 81 million words, whereas the African and South-East Asian components are much smaller, at around 40 million words each. In contrast, the GB and US sections comprise around 387 million words each. These discrepancies lead to variation in our observed vocabulary sizes. Generally, we observe the largest number of types and tokens of all the constructions in the Inner Circle varieties, in large part because they are the largest subcomponents of GloWbE. Correspondingly, the observed type and token counts are typically much smaller for the varieties which are less advanced according to Schneider’s Dynamic Model, as these varieties also have significantly smaller GloWbE subcomponents. The challenges related to the observed vocabulary sizes are compounded by the fact that even if two subcorpora are of equal size, they may still contain a different number of instances of the construction, which further complicates comparisons made using methods such as TTR.
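The dependence of type counts and TTR on sample size can be illustrated with a small simulation. This is a sketch under stated assumptions: the population of 2,000 hypothetical types with Zipf-like weights (1/rank), the sample sizes and the function names are all invented for illustration and do not correspond to any GloWbE data.

```python
import random


def sample_constructs(n_tokens, n_types=2000, seed=0):
    """Draw n_tokens from a hypothetical population with Zipf-like weights 1/rank."""
    rng = random.Random(seed)
    types = [f"type{rank}" for rank in range(1, n_types + 1)]
    weights = [1.0 / rank for rank in range(1, n_types + 1)]
    return rng.choices(types, weights=weights, k=n_tokens)


def type_token_ratio(tokens):
    """Number of distinct types divided by the number of tokens."""
    return len(set(tokens)) / len(tokens)


# Same underlying distribution, three sample sizes.
for n in (100, 1_000, 10_000):
    sample = sample_constructs(n)
    print(n, len(set(sample)), round(type_token_ratio(sample), 3))
```

Although all three samples come from the same simulated population, the type count rises and the TTR falls as the sample grows, which is why neither measure can be compared directly across subcorpora of different sizes.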
In order to enable the comparison of vocabulary sizes between different subcomponents of GloWbE, we make use of a statistical approach belonging to so-called Large Number of Rare Events (LNRE) models, following, for instance, Brunner & Hoffmann (2020) (see also Baayen 2008: 229). LNRE models are specifically designed for situations where the observed process has a large number of potential outcomes, most of which are rare; in our case, the large number of potential instantiations of a construction. Specifically, we make use of a model which uses as a starting point the observed frequency spectrum of a subcorpus (i.e. the number of types which appear in the data, and how many times each of the types appears; Baayen 2001: 8–12), and based on this observed distribution projects forward the number of different types which we might expect to observe if we were able to sample an increasingly large number of instances of the construction beyond what we actually observe in the data (e.g. Brunner & Hoffmann 2020; Baayen 2001: 51–7; 2008: 229–36). The results of the model are presented below as vocabulary growth curves, which show the projected number of types (vertical axis) as the number of instances increases (horizontal axis). Curves which end higher up than others indicate subcorpora which are predicted to have a higher number of observed types as the number of observations increases.
In practice, we perform the statistical LNRE analysis using the zipfR package by Evert & Baroni (2007). We fit finite Zipf–Mandelbrot models (Evert 2004) for each of the combined categories separately to predict the growth of types within each group. However, since some of the groups have only a relatively small number of instances of some of the constructions studied, the predicted rate of type growth may not be particularly accurate for all groups, as we discuss in more detail in sections 5 and 6. We then plot the type growth curves extrapolated to twice the highest observed token count for the construction.
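To give a feel for how a vocabulary growth curve is derived from a frequency spectrum, the sketch below uses binomial interpolation (cf. Baayen 2001), which estimates the expected number of types at sample sizes up to the observed token count. It should be stressed that this is a simpler, swapped-in technique and not the finite Zipf–Mandelbrot extrapolation performed with zipfR; the two toy spectra and the function name expected_types are invented for illustration.

```python
def expected_types(spectrum, n):
    """Binomial interpolation of the expected vocabulary size at n tokens.

    spectrum maps a token frequency m to V_m, the number of types observed
    exactly m times; n must lie between 0 and the observed token count N.
    """
    total_tokens = sum(m * v_m for m, v_m in spectrum.items())
    if not 0 <= n <= total_tokens:
        raise ValueError("n must lie between 0 and the observed token count")
    keep = n / total_tokens  # share of the observed tokens retained
    # A type seen m times is missing from the subsample with probability (1 - keep) ** m.
    return sum(v_m * (1.0 - (1.0 - keep) ** m) for m, v_m in spectrum.items())


# Invented toy spectra for a larger and a smaller group.
large_group = {1: 40, 2: 10, 3: 5, 10: 2}   # 95 tokens, 57 types
small_group = {1: 12, 2: 3, 5: 1}           # 23 tokens, 16 types

# Compare both groups at the token count of the smaller one.
common_n = sum(m * v_m for m, v_m in small_group.items())
print(round(expected_types(large_group, common_n), 2))
print(round(expected_types(small_group, common_n), 2))
```

Interpolation of this kind only allows comparison at or below the smaller sample size; projecting beyond the observed data, as in the curves reported in section 5, requires a fitted LNRE model.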
For our analysis, we also inspect plots of the frequency spectra of the constructions in our varieties of interest. Frequency spectra are the data which the finite Zipf–Mandelbrot model uses for its predictions in the LNRE analysis, but the spectra themselves can also give insights into the distribution of type frequencies in the data. The spectrum plots show the number of types in the data (vertical axis) which have a specific token frequency (horizontal axis); see figure 4 below for an example. The spectra generally form L-shaped curves, with the high point of the curve on the left caused by the relatively high number of hapax legomena, and the horizontal part of the curve representing the small number of types with significantly higher token frequencies. The exact size of the curve is of course linked to the overall number of observations from the subcorpora in question. However, the overall balance of the points forming the curve can describe the distribution and balance between lower and higher frequency types within the subcorpus. The frequency spectra may thus help shed further light on the vocabulary growth curves predicted by the LNRE model, such as the frequent use of conventionalised or entrenched instantiations of a construction, or conversely, a higher degree of innovativeness in the use of a construction.
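A frequency spectrum is straightforward to compute from a list of construct tokens. The following minimal sketch uses an invented toy list (not corpus data) and a hypothetical function name, frequency_spectrum.

```python
from collections import Counter


def frequency_spectrum(tokens):
    """Map each token frequency m to V_m, the number of types occurring m times."""
    type_frequencies = Counter(tokens)
    return dict(sorted(Counter(type_frequencies.values()).items()))


# Invented toy sample: one frequent type, two mid-frequency types, three hapaxes.
toy_sample = (
    ["easy-to-use"] * 5
    + ["hard-to-reach"] * 2
    + ["easy-to-read"] * 2
    + ["tough-to-beat", "nice-to-have", "hard-to-find"]
)

spectrum = frequency_spectrum(toy_sample)
print(spectrum)              # {1: 3, 2: 2, 5: 1}
print(spectrum.get(1, 0))    # number of hapax legomena: 3
```

Plotting the keys on the horizontal axis against the values on the vertical axis gives the kind of L-shaped curve described above.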
5. Results
5.1. Tough-Modifier Construction
In our first case study, we focus on the Tough-Modifier Construction, which we queried by using a list of adjectival predicates previously used in Günther (2019: 658): bad, cheap, difficult, easy, good, great, hard, impossible, nice, safe and tough. We also searched the corpus for all comparative and superlative forms of the adjectives, which comprise c. 4.6 per cent of all tokens (333/7,303) in our dataset. Following Günther (2019), we made use of hyphenation in the queries in order to locate the relevant constructs (e.g. easy-to-*) and removed all false positives and duplicates from the results prior to analysis. Initially, we examined the type frequencies of all individual varieties represented in GloWbE separately, but ultimately decided to pool the data into larger categories as presented in section 4.1. This was a necessary step to take, as the type frequencies in many individual varieties were simply too low to yield reliable results in the LNRE model. Examples of the Tough-Modifier Construction in the targeted South-East Asian and African Englishes are given in (5) to (10) below, while figure 3 presents the estimated growth curves of the four macro-groups studied. It should be noted that while most of the data consist of three-word strings (e.g. hard-to-reach), there are occasional instances of more complex structures in our data, such as the phrasal verb construct in (7).
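For readers working with plain text rather than the GloWbE interface, the hyphen-based query can be approximated with a regular expression. This is only a sketch: the pattern, the function name find_tough_modifiers and the example sentence are our own assumptions, the GloWbE query syntax is not reproduced here, and comparative and superlative forms would have to be added to the list separately.

```python
import re

# Base forms of the adjectival predicates listed in the text.
ADJECTIVES = [
    "bad", "cheap", "difficult", "easy", "good", "great",
    "hard", "impossible", "nice", "safe", "tough",
]

# adjective-to-verb, optionally followed by further hyphenated material
TOUGH_PATTERN = re.compile(
    r"\b(?:" + "|".join(ADJECTIVES) + r")-to-\w+(?:-\w+)*",
    flags=re.IGNORECASE,
)


def find_tough_modifiers(text):
    """Return all hyphenated adjective-to-verb strings found in text."""
    return TOUGH_PATTERN.findall(text)


# Invented example sentence.
example = "an easy-to-use tool and a hard-to-reach village; too easy to miss"
print(find_tough_modifiers(example))  # ['easy-to-use', 'hard-to-reach']
```

As with the corpus queries, a pattern of this kind over-generates and under-generates, so manual inspection of the hits remains necessary.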







Figure 3. Estimated productivity of the Tough-Modifier Construction in the four macro-categories studied (SEA = South-East Asian, IC = Inner Circle, SA = South-Asian, Afr = African)
As can be seen in figure 3, there are substantial differences in the predicted productivity between the groups. Although we do not present data from the individual varieties, we point out that the productivity of the construction is highest in American and Canadian English. At the group level, however, the model assigns the highest productivity to the South-East Asian Englishes. Considering the global impact of American English, it is possible that the high productivity of the South-East Asian varieties can be explained by American influence (see e.g. Mair 2013; Gonçalves et al. 2018).
When it comes to our two competing hypotheses, our data provide strong support for the hypothesis based on language contact. With mostly head-initial substrate languages, the African varieties (Kenya, Tanzania, Nigeria, Ghana) rank at the bottom of the pack by a large margin. Further support for the contact hypothesis is provided by the frequency spectrum in figure 4, which compares the African varieties to the South-East Asian varieties: the number of hapax legomena is far greater in the South-East Asian dataset when compared to our African data. As the high number of hapaxes has often been proposed to be a reliable measure of high productivity (e.g. Baayen & Renouf 1996; Pierrehumbert & Granell 2018), the Tough-Modifier Construction truly seems to be better entrenched in the South-East Asian varieties of English than in the African varieties.

Figure 4. Frequency spectra of the Tough-Modifier Construction in the African varieties (left) and the South-East Asian varieties (right)
5.2. Comparative Modifier Construction
To retrieve tokens of the Comparative Modifier Construction, we used the search string *-than-*, which targeted three-word constructs (or longer) that were composed of a comparative adjective head and a prepositional complement (than-phrase). The complement of than can be instantiated by an adjective (11), a nominal (12) or a participle in -ed (13).



As we did with the data on the Tough-Modifier Construction, we checked all the concordance lines manually and used the cleaned dataset in an LNRE analysis. According to the LNRE model, the construction is most productive in the Inner Circle varieties, followed closely by the South-Asian and South-East Asian varieties. While the productivity of the Inner Circle and the South-East Asian varieties is not surprising in light of our previous case study, the high productivity of the South-Asian Englishes does stand out. We must leave a more detailed analysis of this for future research due to space constraints, but we should briefly mention that there is evidence in the OED that at least some micro-constructions of the Comparative Modifier Construction date back to the late eighteenth and early nineteenth centuries. Considering the colonial history of the South-Asian countries, it may be that the construction has had time to become better entrenched (when compared to the Tough-Modifier Construction) in these varieties.
Going back to our main groups of interest, the result is once again clear: according to the LNRE model, the South-East Asian varieties of English are far more productive than the African varieties (figure 5).

Figure 5. Estimated productivity of the Comparative Modifier Construction in the four macro-categories studied (SEA = South-East Asian, IC = Inner Circle, SA = South-Asian, Afr = African)
Additional evidence can again be gleaned from the respective frequency spectra (figure 6), where the number of hapax legomena is once again higher in the South-East Asian varieties. This time, the number of frequently used types is also very low in the African varieties, attesting to the limited use of this particular meso-construction in the varieties in question.

Figure 6. Frequency spectra of the Comparative Modifier Construction in the African varieties (left) and the South-East Asian varieties (right)
5.3. Prepositional Modifier Construction
Our final case study focuses on a meso-construction that takes the form of a prepositional phrase. We queried the corpus based on a list of 42 English prepositions listed in Essberger (2009) and focused on items that consist of three words or more (e.g. in-*-*; out-*-*). Examples (14) to (17) illustrate the kinds of constructs targeted by our query.




The results of the LNRE model are depicted in figure 7. As might be expected, the Inner Circle varieties are again in the lead in terms of estimated productivity, but the high productivity predicted for the African varieties comes as a surprise: according to the model, the productivity of the Prepositional Modifier Construction in the African Englishes is comparable to, and even slightly higher than, that in the South-East Asian Englishes.

Figure 7. Estimated productivity of the Prepositional Modifier Construction in the four macro-categories studied (SEA = South-East Asian, IC = Inner Circle, SA = South-Asian, Afr = African)
However, the frequency spectra (figure 8) again show that the number of hapax legomena in the South-East Asian varieties is substantially higher when compared to the African varieties. Indeed, it seems that the LNRE model underestimates the productivity of the construction in the South-East Asian Englishes because in addition to a large number of hapax legomena, the data for these varieties also include a number of micro-constructions that have very high token frequencies. At the same time, the spectra show that it would not be accurate to claim that the South-East Asian varieties mainly resort to a limited number of micro-constructions that are used frequently in the input varieties; rather, they make full use of frequent types while also being highly productive in terms of hapax legomena. We therefore propose that the results yielded by LNRE models should ideally always be complemented by other evidence, such as data provided by frequency spectra, because the model seems to produce rather coarse-grained results in comparisons like this.

Figure 8. Frequency spectra of the Prepositional Modifier Construction in the African varieties (left) and the South-East Asian varieties (right)
6. Discussion and conclusions
In this article, we have studied the productivity of three left-headed meso-constructions of the CMC in World Englishes. According to previous research (Günther 2019), the CMC has become more productive in recent American English in particular, which we interpreted as a change involving the headedness feature in its constructional specification. While there are surely other ways to account for this innovation, such as the introduction of an independent left-headed modifier construction in the constructicon, we propose that a simple change in the specification of the CMC provides a more economical and elegant way of modelling the construction’s increased licensing potential. This analysis also takes into consideration the fact that complex premodifiers are not an innovation in English as such – only the left-headed ones are. Using Traugott & Trousdale’s (2013) terminology, we are dealing with a constructional change that affects the structural pole of the CMC, thus increasing the variety of structures felicitously licensed by the construction.
The relative recency of this development makes the CMC a good candidate for a study of constructional productivity in World Englishes. Previous research has provided interesting evidence that connects the phase of a postcolonial variety of English in the Dynamic Model with the progression of diachronic developments that originated in the Inner Circle varieties, such as the semantic extension of the way-construction (Brunner & Hoffmann 2020) and the increased slot productivity in the V the N_taboo-word out of-construction (Hoffmann 2020). Importantly, however, the focus of these studies was on constructional meaning, on the one hand, and lexical variability within the construction, on the other. In other words, the effect of substrate languages on grammatical structure did not emerge as a topic of investigation in either case. In our study, by contrast, structural considerations are relevant, and the results of our case studies of the West and East African and South-East Asian varieties of English are quite clear: the evolutionary hypothesis receives support from the high productivity of the CMC in the Inner Circle varieties, but not from the comparison of the respective African and South-East Asian Englishes, where the typological profiles of the substrate languages were found to be a more important factor affecting constructional productivity. Therefore, our conclusion is that both hypotheses are relevant, but in language contact settings the structural preferences of the substrate languages may sometimes affect the structural complexity of the local English variety even more than the phase of the variety in the Dynamic Model.
From a methodological perspective, the LNRE model enables the comparison of the productivity of different constructions between samples of different sizes by estimating the growth of the vocabulary size regardless of the observed sample size. In practice, however, small samples present problems for many statistical methods, and the finite Zipf–Mandelbrot model is no exception. One of the most severe problems relates to the discretisation of the distribution: in a small sample, a difference of just one or two observations can lead to a substantial change in the result of the statistical calculation. In the case of the LNRE model, when a dataset that consists of a few hundred (or only a few dozen) observations is projected forwards to thousands of estimated observations, a relatively small and random change in the original observations can lead to substantial differences in the projected vocabulary growth. This introduces a degree of uncertainty, which can be mitigated to some degree by making use of complementary methods, such as frequency spectra.
Consequently, we suggest that LNRE models are best applied to large sample sizes, such as the entire vocabulary of a text or a subcorpus (cf. Baayen 2008: 222–36). However, even in non-optimal circumstances, the potential problems do not necessarily translate into actual problems. For instance, Brunner & Hoffmann (2020: 15) point out that their data for some of the Dynamic Model phases they investigate are limited, and that the results for these phases must therefore be interpreted ‘with great care’. But even with this caveat, their results largely agree with their hypothesis as well as with the overall understanding of the nature of the Dynamic Model and its phases. In our study, the results for the Tough-Modifier Construction and the Comparative Modifier Construction seemed robust, but the investigation of the Prepositional Modifier Construction yielded somewhat conflicting results. We think that this is probably due to the relatively low token frequency of the prepositional type in some of the African Englishes studied, which inflates the number of hapax legomena in these varieties: if the construction is rare in a variety in the first place, hapaxes will be overrepresented, and the model will consequently interpret this as a high productivity of new types. Similarly, the existence of high-frequency micro-constructions in some of the South-East Asian varieties may lead the LNRE model to overestimate their significance and consequently underestimate the predicted productivity of the varieties in question. As an alternative explanation, it is possible that the CMC is not as well entrenched as a more abstract construction in some African Englishes, and the productivity therefore only concerns the individual meso-constructions.
In other words, speakers may categorise individual tokens as instances of, say, the Prepositional Modifier Construction, but not of the more abstract CMC (see also Hilpert 2015: 137–40).
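The overrepresentation of hapax legomena in small samples can be illustrated with a simple simulation. The sketch below rests on invented assumptions: a hypothetical population of 1,000 types with Zipf-like weights (1/rank), two arbitrary sample sizes and the function name hapax_share. It is not a reanalysis of our data.

```python
import random
from collections import Counter


def hapax_share(n_tokens, n_types=1000, seed=1):
    """Proportion of observed types that occur exactly once in a simulated sample."""
    rng = random.Random(seed)
    weights = [1.0 / rank for rank in range(1, n_types + 1)]
    sample = rng.choices(range(n_types), weights=weights, k=n_tokens)
    frequencies = Counter(sample)
    hapaxes = sum(1 for freq in frequencies.values() if freq == 1)
    return hapaxes / len(frequencies)


# Same simulated population, a small and a large sample.
print(round(hapax_share(50), 2))
print(round(hapax_share(5_000), 2))
```

Because the share of hapaxes is considerably higher in the small sample even though the underlying distribution is identical, a model that extrapolates from a small spectrum may overstate the rate at which new types are expected to appear.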
In future studies, the precise effect of well-established types on the productivity of the CMC should also be examined in more detail. While we frequently found micro-constructions in our data that were not represented in the OED, there were also some frequent types, such as easy-to-use and holier-than-thou, which may have an impact on the statistical models produced by the LNRE analysis, and which are certainly also relevant from the perspective of productivity. In our case studies, we decided to include types like these in the analyses due to the generally low frequency of the constructions studied, but in future research we hope to better address their potential effect.
Acknowledgements
We thank the participants of the Workshop on Creativity and Productivity in CxG organised in Helsinki on 9 September 2023. We also thank the two anonymous reviewers for their valuable comments. This work was supported in part by the Research Council of Finland, grant 363720.