1. Introduction
The alternation between the be- (1) and get-passive (2) in English has been widely studied.
In recent years, evidence-based corpus analyses have supplied a robust quantitative perspective on the study of this phenomenon. In this article, we contribute to the study of the passive alternation by investigating the interplay of the linguistic and social factors that condition variant choice, be or get. Our focus is on English in Canada’s most populous province, Ontario. The data upon which this study is based are drawn from informal oral histories of people born and raised in Ontario from the late 1800s to the early 2000s, the first centenary of Canadian English (cf. Chambers 1997), and the time frame of the greatest increase of the get-passive. This increase is evidenced, for example, by Hundt’s (2001) examination of the rise of get-constructions in the ARCHER corpus, a diachronic multi-genre corpus of British and American English.[footnote 1] She found that, after a period of stable frequency, with a normalized rate between 63.99 and 77.45 tokens per 100,000 words between 1650 and 1849, the rate of get-constructions almost tripled by the end of the twentieth century (see figure 1). According to Hundt, this is also the period in which the get-passive became increasingly entrenched: the get-passive was first attested in the eighteenth century, and gradually increased in frequency in the nineteenth and twentieth centuries.
Figure 1. get-constructions in ARCHER (Hundt 2001: 59)

This rise of get presents an interesting dilemma because a central question at the heart of the literature has been whether the two passive variants are truly synonymous. For example, Leech et al. (2009: chapter 7) conducted a corpus study on the Brown family of corpora[footnote 2] and found that despite the rise in the get-passive, it is still mostly restricted to adversative contexts. Adversative contexts are those that express contrariety, opposition, or antithesis, such as (3), as opposed to benefactive contexts, which signal favourable outcomes (4), and neutral contexts, which have neither negative nor positive associations (5).

Similarly, several studies have argued that there is a difference in agent responsibility (e.g. Leech et al. 2009: chapter 7; Mair 2006: chapter 4). According to Leech et al. (2009: 145), more responsibility for the action is assigned to the subject with get-passives (e.g. I ain’t going to get caught, from Leech et al. 2009: 145) than with be-passives (e.g. I wasn’t caught).[footnote 3] They argue that, as a result, get-passives occur less often than be-passives with agents expressed in a by-phrase (see also e.g. Mair 2006: chapter 4 and Hundt 2001). In addition, get-passives typically occur more with animate subjects than with inanimate ones across studies (e.g. Bohmann et al. 2023; Hundt et al. 2024). In these studies, animacy is typically used as a proxy for agent involvement, as ‘inanimate subjects rank low on the agency scale and are therefore more likely to be affected patients and less likely to be responsible for the action’ (Hundt 2001: 75).
Thus, although get and be might be assumed to be semantically differentiated – at least to some extent – something must be changing in the passive system for get to be advancing. Further, there must be contexts in which the two forms alternate, i.e. vary. As Sankoff & Thibault (1981: 213) determined, if ‘one form appears to be replacing the other, either in time or along some socioeconomic or demographic dimension in the speech community’ and ‘this replacement is correlated with speaker’s age’, then ‘this may be evidence of a dynamic process’. To verify whether this is what is happening in the passive system, it is crucial to examine the alternating forms together in the same analysis and to include an individual’s date of birth, alongside social and linguistic factors.
Despite the reported competition between the be- and get-passives, most studies focus on only one of the two alternatives. More recently, several variationist studies have offered tantalizing evidence of the interplay of language-internal constraints and social factors in the variation between be and get: Allen (2022), Fehringer (2022), Bohmann et al. (2023) and Hundt et al. (2024). The following section briefly reviews the key results of these studies, summarizing the internal and external factors that have been shown to have a significant effect on the passive alternation. These results inform our contribution by providing a foundation from which to analyze variation between the be- and get-passives in the Canadian English of Ontario.
2. Effects on the alternation between be and get
Studies modelling the simultaneous impact of competing social and linguistic factors are necessary as they offer crucial insight into how be and get alternate internal to the linguistic system and in different sociolinguistic contexts, i.e. the outside world. One such study is by Fehringer (2022), who analyzed the effect of adversative semantics, subject animacy, aspect, tense, grammatical person, clause and sentence type, polarity of the sentence, presence of a human by-agent, gender and age on the variants in Tyneside English, which is spoken in an area in northeast England. Figure 2 displays the rate of get- and be-passive variants by age group, divided into old (51+), middle (31–50) and young (16–30).
Figure 2. get- vs be-passives (%) by age group in Tyneside English (Fehringer 2022: 352)

Figure 2 demonstrates a marked age effect. The be-passive is used twice as much as get by older (ages 51+) and middle-aged (ages 31–50) individuals. This trend moves in the opposite direction for younger individuals (ages 16–30), who use the get-passive about 15 per cent more than be (Fehringer 2022: 344–52). This pattern suggests a change in apparent time in a corpus of Tyneside English, DECTE,[footnote 4] with an evident acceleration that can be pinpointed to the late 1900s when this youngest cohort was born (Fehringer 2022: 334), making it consistent with the trend for the most recent period of figure 1.[footnote 5] However, Fehringer (2022) also points out that even in the youngest age group, semantic and syntactic constraints influence the choice of form. Present and past tense contexts and telic verbs were found to significantly favour get. In contrast, inanimate subjects and neutral semantic contexts significantly favoured be. The latter result is consistent with the findings of Leech et al. (2009: chapter 7). Distributionally, verbs with adversative semantics preferred get (by ~20%) and neutral verbs greatly preferred be (by ~50%). However, in the case of benefactive semantics, the difference between be and get is small (<10%).
Notably, the effect of these constraints is stable. Fehringer compared the interaction between the language-internal constraints (adversativity, animacy and telicity) and age group in her study but found that none of the interactions in her statistical model reached significance. Although the proportion of get changes over time, alternation between the variants is still constrained by the same factors. However, it is also important to note that the youngest group in the dataset consists primarily of university students. Thus, despite earlier reports in the literature that get is highly infrequent in the speech of highly educated and higher-class speakers (e.g. Macaulay 1991; Givón 1993: 70–1; Givón & Yang 1994: 138–9), it is evidently no longer the case that the use of get is restricted to lower-class or less-educated speakers, at least not in Tyneside.
Allen’s (2022) study on the be-/get-passive alternation in Victoria (Canadian) English[footnote 6] found a significant effect of gender: men favoured get. This effect was not found in Fehringer’s (2022) study. The significant effects that were parallel across the two studies were animacy and adversativity, which both favoured get. Moreover, in Victoria English as well, the passive alternation is undergoing change. Figure 3 illustrates the rate of the get-passive relative to the be-passive by decade of birth and gender in Victoria English.
Figure 3. get-constructions (%) by decade of birth and gender (M = man, W = woman) in Victoria English (Allen 2022: 55)

Although get usage increases across both genders in Victoria English, the change towards get is led by men (in blue) from as early as the 1910s. Allen (2022: 54) argues that while the trend is not linear, in most decades men use get more than women.[footnote 7] This analysis complements Fehringer’s findings that despite (apparent) change over time, both linguistic and social factors affect the use of get. Moreover, the gain in momentum for get among individuals born in the late twentieth century is evident across studies.
Whether and which language-internal constraints play an important role in the diachronic change of the passive continues to take centre stage in Bohmann et al. (2023) and Hundt et al. (2024). In Bohmann and colleagues’ investigation of the COHA corpus,[footnote 8] animate subjects were found to occur more with get than inanimate subjects or body parts, and a mixed-effects logistic regression showed that this effect becomes stronger over time (Bohmann et al. 2023: 42). An effect was also observed for the presence or absence of an explicit agent in a prepositional by-phrase, such as a wave in (6): get occurs less often with agentive by-phrases. However, unlike the animacy effect, the strength of this effect is stable over time.
Likewise, Hundt et al.’s (2024) comparison of different varieties of English[footnote 9] in the ICE corpora[footnote 10] found animacy, informal register and adversativity to be the three strongest predictors favouring get, in that order. Further, these internal factors have a much larger effect on the passive alternation than variety. However, since neither the year of data collection nor the date of birth or age of the individual is available in the ICE corpora, it was impossible to determine whether these effects are diachronically stable.
In sum, studies of the get-passive report that it has increased over time, putting it into increasing competition with the be-passive, and subsequent multi-factor studies have determined that multiple internal and external factors constrain this variation, most consistently subject animacy, the presence of an explicit by-agent, and adversativity. However, the studies that were able to test for diachronic stability do not agree on whether these effects are consistent across time. Besides these three main factors, tense, aspect, mood, negation, and the presence of an intervening adverb have also been considered, but the size of their effects and their diachronic patterning are less clear (Schwarz 2017; Fehringer 2022; Bohmann et al. 2023; Hundt et al. 2024).
The goal of the present study is to combine the diachronic perspective of the Ontario Dialects Project and multiple factors, both internal and external, to determine if any of these effects are stable over time and if so, how. Further, analyzing the passive alternation in a large corpus of vernacular English from a variety of Canadian (specifically Ontario) English that has not yet been studied will offer new insight into how this change is progressing. The following section details the methodology used to investigate this variation.
3. Data and methods
The data in this article come from the Ontario Dialects Project, a large-scale documentation project that comprises oral histories from communities across the province of Ontario, from the largest city, Toronto, to small rural hamlets in the countryside.[footnote 11] The corpus is stratified by date of birth, age at the time of data collection and perceived gender; it comprises a range of educational and occupational backgrounds, making it ideal for tracking the evolution of the passive alternation. In an attempt to tease apart ongoing trends, we focus on three communities from the larger corpus with maximal difference by population and distance from the metropole: (i) Toronto, the largest city and the cultural and economic centre of the province; (ii) Almonte, a small rural town in the east of the province; and (iii) Thunder Bay, one of the largest cities in northwestern Ontario but approximately 900 kilometres from Toronto.
3.1. Defining the variable context
To extract the passives, we pulled all instances of be and get followed by a past participle with AntConc (Anthony 2022). The corpus is not POS tagged, so we used regular expressions to obtain the tokens. The regular expressions allow at most one word to intervene between a form of get or be (including contracted forms like I’m or you’re) and the participle (i.e. a word ending in -ed or an irregular participle), within a sentence. We extracted tokens with regular and irregular participles with separate regular expressions, using the 139 irregular verbs listed in Nelson (2001: 150–7) to obtain a set of irregular participles. Using this procedure, we were able to extract not only tokens for non-negated get (7)–(8) and be (9) but also negated tokens with be (10) and tokens with an intervening adverb (11). The regular expressions are included in appendix 1.
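The logic of this extraction step can be illustrated with a minimal Python sketch. It is not the AntConc expression used in the study (those are given in appendix 1); the pattern, the helper name extract_candidates and the short irregular participle list are assumptions made for illustration only, standing in for the 139 irregular verbs from Nelson.

```python
import re

# Forms of be and get, plus contracted be after a pronoun (I'm, you're, it's).
AUX = r"(?:am|is|are|was|were|be|been|being|get|gets|got|gotten|getting|\w+['’](?:m|re|s))"

# Illustrative subset only; the study used a list of 139 irregular verbs.
IRREGULAR = r"(?:caught|hit|hurt|built|taken|given|broken|done|hidden)"

# A candidate participle is any word ending in -ed or a listed irregular form.
PARTICIPLE = rf"(?:\w+ed|{IRREGULAR})"

# Auxiliary, optional n't, at most one intervening word, then the participle.
# Whitespace-only gaps keep the match inside a single sentence.
PATTERN = re.compile(
    rf"\b(?P<aux>{AUX})(?P<neg>n['’]t)?\s+(?:(?P<mid>\w+)\s+)?(?P<part>{PARTICIPLE})\b",
    re.IGNORECASE,
)


def extract_candidates(text: str) -> list[tuple[str, bool, str, str]]:
    """Return (auxiliary, negated, intervening word or '', participle) tuples."""
    return [
        (
            m.group("aux").lower(),
            bool(m.group("neg")),
            m.group("mid") or "",
            m.group("part").lower(),
        )
        for m in PATTERN.finditer(text)
    ]


for sentence in ["I ain’t going to get caught", "I wasn't caught", "we got another hundred"]:
    print(sentence, "->", extract_candidates(sentence))
```

As in the study, a purely formal pattern of this kind overgenerates: the last sentence is returned as a candidate because hundred ends in -ed, which is why every hit still has to be coded manually.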

However, as the examples show, not every extracted token is an example of a passive construction. On the one hand, some tokens were extracted because they formally correspond to the regular expression, even though they are not passive(-like) tokens. An example is (8), in which get is followed by a word ending in -ed that is not verbal: hundred (after an intervening word, another). Examples like these were manually coded as ‘other’ in our dataset (see table 1 below).
Table 1. Coding procedure for distinguishing between central, semi-, pseudo-passives and ‘other’ constructions

On the other hand, passive-like constructions are also characterized by a complex semantic picture that poses a methodological challenge: although there are many forms that resemble the target forms superficially, they differ in the extent to which the participle is verbal rather than adjectival. Therefore, the first step in studying the passive alternation is categorizing these different passive types. Quirk et al. (1985: 167–71) identify three classes that fall along a ‘passive gradient’: central passives, semi-passives and pseudo-passives. Central passives are the prototypical passives and are defined by whether they ‘can be placed in direct correspondence with a unique active counterpart’ (Quirk et al. 1985: 167). This ‘active counterpart’ can contain an animate (12a) or inanimate (12b) agent, be agentless (12c), or be ambiguous (12d).

Semi-passives are passive-like in the sense that they similarly have an active counterpart (13a, b); however, they are also adjectival, as the participle can be coordinated with an adjective (13c), modified by degree adverbs such as quite or rather (13d), or replaced by a lexical copular verb such as feel (13e) or seem (13f). Semi-passives frequently lack by-phrases, but exceptions occur, so this is not a reliable diagnostic.

Pseudo-passives, on the other hand, despite following the same be/get + past participle structure, have no active counterpart and cannot have an explicit agent (14a). It is also often possible to replace the verb with another non-copular verb (14b). In (14a), demolished has a stative reading, denoting the state that results from the act of demolishing; since Quirk et al. (1985: 168) state that ‘all participial adjectives have a stative meaning’, it is adjectival rather than part of a passive construction. In this case, get is called a ‘resulting copula’, and the sentence cannot be expanded with an agent.

The extreme end of the passive gradient includes obvious adjectival complements such as (15) where the possibility of inserting very confirms the adjectival rather than participial status of tired according to Quirk et al. (1985: 167):
Distinguishing between these three passive-like constructions is crucial in a variationist study, because get and be are not interchangeable in every one of these constructions. For instance, Labov (1975) directly investigates the question of whether get is truly interchangeable with be, i.e. semantically equivalent to it, in the context of explicit and non-explicit causatives. In his study, he conducted an experiment in which he manipulated get and be clauses for these interpretations (16)–(17).


He found that causative get is not identical to be. In the get version, subject he is an agent (16a), whereas the subject is a patient under the be variant (16b). However, this distinction is only discernible when the sentence is explicitly causative. When the context is no longer clearly causative (as in 17), get and be are interchangeable and subject he is interpreted as the patient for both (17a) and (17b).
In this article, we analyze only the central passives, as is common in other variationist work (e.g. Schwarz 2015, 2017; Allen 2022). In her study of the decline of the be-passive in a corpus of American English soap operas, Schwarz (2015) followed Quirk et al.’s (1985) classification system and established explicit criteria for singling out central passives. We follow this coding scheme, distinguishing between central passives, semi-passives, pseudo-passives and ‘other’ constructions. Two research assistants completed the manual annotation using the coding procedure shown in table 1.
Our final dataset comprises 6,200 central passives used by 336 individuals born between 1884 and 1997 and raised in one of the three comparison communities: Almonte, Thunder Bay and Toronto. A total of 4,796 tokens have be, while 1,404 occur with get.
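These token counts fix the overall rate of get and the accuracy of a trivial model that always predicts the majority variant. The short Python check below makes the arithmetic explicit; reading the 77 per cent baseline reported in section 4.1 as this majority-class rate is our assumption, not something the text states.

```python
# Token counts reported for the final dataset.
BE_TOKENS = 4796
GET_TOKENS = 1404

total = BE_TOKENS + GET_TOKENS

# Share of get among all central passives.
get_rate = GET_TOKENS / total

# Accuracy of always guessing the more frequent variant (assumed baseline).
majority_baseline = max(BE_TOKENS, GET_TOKENS) / total

print(f"total tokens: {total}")
print(f"overall get rate: {get_rate:.1%}")
print(f"majority-class baseline: {majority_baseline:.1%}")
```

The two variant counts sum to the 6,200 tokens reported, and the majority-class rate rounds to 77 per cent.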
3.2. Internal and external factors
We included in our study three language-internal and five language-external variables. For the linguistic factors, we first tested subject animacy, distinguishing between animate subjects, including collective subjects (18a), and inanimate subjects (18b). As explained above, animacy is used in many studies as a proxy for subject responsibility, which is hypothesized to be more salient for get-passives. Thus, as expected, get has been found to occur more with animate than inanimate subjects in earlier studies (e.g. Bohmann et al. 2023: 42). We expect to see this pattern replicated.
Second, we also coded for whether there was an explicit agent in the by-phrase (19). Notably, for an overwhelming majority of the data (94 per cent), the agent is not expressed explicitly, even higher than the 80 per cent rate attested by Quirk et al. (1985: 164–5), who also argue that get is more likely to appear in clauses without explicit animate agents.


Third, we considered the degree of adversativity of the main verb. This concerns whether the main verb expresses something that has a negative effect on the subject, as in (20a). Previous research (e.g. Fehringer 2022: 349) found that get is used more in adversative contexts. We adopt the method used by Bohmann et al. (2023),[footnote 14] through which the numerical sentiment value of the past participle is extracted from SentiWordNet,[footnote 15] a computational dictionary that expresses how negative a particular word is. This procedure also takes negation into account, resulting in an adversativity score that ranges from -1 to 1, with 1 indicating a very negative verb. For example, damaged in (20a) has a score of 0.75, illustrating that the sentence expresses something with a high degree of adversativity.[footnote 16] In turn, hidden in (20b) is relatively neutral but still more negative than admired in (20c), which is highly positive.
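A score of this kind can be sketched in a few lines of Python. The sketch is a simplification under stated assumptions and not the procedure of Bohmann et al.: it assumes that the score is the negativity value minus the positivity value of the participle and that negation flips the sign. The lexicon is a hypothetical toy table; only the 0.75 result for damaged is reported in the text, and the values for hidden and admired are invented to reproduce the ordering described above.

```python
# Hypothetical (positivity, negativity) pairs standing in for lexicon lookups.
# Only the resulting 0.75 for "damaged" is reported in the text; the other
# numbers are invented for illustration.
TOY_LEXICON = {
    "damaged": (0.0, 0.75),
    "hidden": (0.0, 0.125),
    "admired": (0.625, 0.0),
}


def adversativity(participle: str, negated: bool = False) -> float:
    """Return a score in [-1, 1]; higher values mean a more adverse event.

    Assumption: score = negativity - positivity, with the sign flipped
    when the clause is negated.
    """
    positivity, negativity = TOY_LEXICON[participle]
    score = negativity - positivity
    return -score if negated else score


for verb in ("damaged", "hidden", "admired"):
    print(verb, adversativity(verb))
```

Under these assumptions a negated adverse verb, as in wasn't damaged, receives a negative score, which is one simple way in which negation could be taken into account.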

For the social factors, we include community, perceived speaker gender (woman or man), education level (with or without higher education), occupation (blue collar, white collar, or student) and year of birth. To examine whether the language-internal constraints have remained stable over time, we also investigate the interaction between year of birth and each of the linguistic constraints. Table 2 provides an overview of the distribution of our data over the predictors mentioned.
Table 2. Data distribution per variable

3.3. Statistical analysis
To carry out a multi-factor analysis, we use two statistical tools available in R (R Core Team 2021): (i) a mixed-effects logistic regression using glmer from the lme4 package (Bates et al. 2015) and (ii) a decision tree using glmertree (Fokkema et al. 2018). The glmer function fits a logistic mixed-effects regression model to the data, i.e. it takes both fixed and random effects into account (cf. Johnson 2009). For our analysis, most of the predictors under investigation are fixed effects (i.e. they are uniformly predictable, constant and replicable across the population), whereas both individual and main verb are treated as random effects. This ensures that any patterns found by the model are not skewed by idiolects of the individuals or unforeseen verbal collocates. While interaction effects between fixed predictors can be included in a glmer model, it can be difficult to deduce the nuances of how factor groups relate to one another.
This is where a decision tree, or ctree from the partykit package (Hothorn et al. 2006; Hothorn & Zeileis 2015), has been used to help supplement the analysis of a mixed-effects model (cf. Strobl et al. 2009; Tagliamonte & Baayen 2012; Gries 2018; Tagliamonte 2025). The greatest advantage of using a ctree is that it can illustrate how multiple variants and predictors operate in tandem by visualizing their hierarchical relationships. These relationships manifest as a multi-branched tree where the higher branches have a more significant effect on the dependent variable than the lower branches. Specifically, each variable is assigned a p-value which indicates its level of significance, its relationship to the other factor groups in the model, and whether the splits in the tree are significantly different from one another. Thus, a ctree provides a comprehensive view of the broad patterns and connections among social and linguistic factors, i.e. how they work together to predict variant choice. This is particularly useful through the lens of apparent-time change, since the model treats date of birth as a continuous factor and significant splits on it indicate key transitions in apparent time. The glmertree extends the decision tree methodology to accommodate random effects and is preferred for data that have nested structures, which is the case in our Ontario data. This allows us to combine decision tree algorithms with generalized linear mixed models, meaning random effects can be considered as well.
We test for model fit with the C Index obtained from somers2 from the Hmisc package (Harrell 2025), for which a value closer to 1 signals a strong model fit.
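For a binary outcome, the C index is the probability that a randomly chosen token of one variant receives a higher predicted probability than a randomly chosen token of the other, with ties counted as one half. The Python sketch below computes it by comparing all such pairs; it illustrates the statistic and is not the Hmisc implementation. The function name c_index and the six predictions are invented for the example.

```python
def c_index(predicted: list[float], observed: list[int]) -> float:
    """Concordance index for a binary outcome (1 = get, 0 = be).

    Compares every get token with every be token: a pair is concordant
    when the get token has the higher predicted probability of get,
    and tied pairs count as one half.
    """
    get_probs = [p for p, y in zip(predicted, observed) if y == 1]
    be_probs = [p for p, y in zip(predicted, observed) if y == 0]
    if not get_probs or not be_probs:
        raise ValueError("both variants must be present")

    concordant = 0.0
    for p_get in get_probs:
        for p_be in be_probs:
            if p_get > p_be:
                concordant += 1.0
            elif p_get == p_be:
                concordant += 0.5
    return concordant / (len(get_probs) * len(be_probs))


# Invented predictions for six tokens, for illustration only.
probs = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(round(c_index(probs, labels), 3))
```

A value of 0.5 corresponds to chance-level discrimination and 1 to perfect separation of the two variants, which is why values close to 1 are read as a strong fit.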
4. Results
This section first presents the results of the culminating mixed-effects model and the five predictors found to have a significant effect on the passive alternation: year of birth, subject animacy, explicit agent, gender and education. To investigate how these factors interact in apparent time, section 4.2 reviews the results of the glmertree analysis and how they supplement the results of section 4.1.
4.1. Mixed-effects logistic regression (glmer)
Mixed-effects modelling was conducted with a binary dependent variable: the be- or get-passive. Social fixed effects included community (Almonte vs Thunder Bay vs Toronto), year of birth (centred), perceived binary speaker gender (woman vs man), education level (no higher education vs some higher education) and occupation type (blue-collar, white-collar or student). The three language-internal predictors were subject animacy, explicit agent and adversativity. As stated above, individual and main verb were included as random intercepts. Multiple model iterations were run, starting with a model that included all social factors, as well as the interaction between year of birth and all language-internal predictors as fixed effects. From each subsequent model, non-significant effects were removed until all predictors reached significance at an alpha level of 0.05.
During this process, community[footnote 17] and occupation were ultimately excluded because they were not found to be significant. For the culminating model, somers2 returned an excellent model fit of C = 0.92, and 87 per cent of predictions were correct (compared to a baseline of 77 per cent). The final mixed-effects model is as follows:

Table 3 shows the results of the mixed effects model. The first column lists the fixed predictors run in the model with the level in parentheses. The N column lists the number of observations per level and rate represents the proportion of get. The log-odds estimate is the coefficient fitted by the model for each effect, and the standard error, z-score and p-value for each effect are provided in the remaining columns.
Table 3. Logistic mixed-effects model results

Table 3 reveals that five predictors were found to have a significant effect on the alternation between be and get: year of birth, subject animacy, explicit agent, gender and education. Adversativity was not found to have a significant effect.[footnote 18] In the rest of this section, we plot the model’s predicted proportions against each significant fixed effect. First and foremost, our data indicate a significant rise in the proportion of the get-passive over time (p < 0.001) in Ontario English. Figure 4 plots the predicted proportion of get on the y-axis (i.e. 1.00 marks 100 per cent get) against year of birth on the x-axis. The light blue band around the plotted line is the confidence interval, whose narrowness shows that the model is highly confident in the predicted values.
Figure 4. Predicted proportion of get by year of birth

Figure 4 shows that the use of get increases among individuals born in later decades of the twentieth century, consistent with the previous findings reported in sections 1 and 2. This result further supports that there is a change in apparent time: younger people use the get-passive more than previous generations.
A second predictor that reaches significance in the model (p < 0.001) is subject animacy. Figure 5 shows the predicted proportion of get on the y-axis, but this time the animacy of the subject is on the x-axis, with each predicted value bracketed by error bars. Animate subjects have a much higher predicted rate for get: the lower bound of their error bar still lies above the upper bound of the error bar for inanimate subjects, demonstrating how robust the effect is.
Figure 5. Predicted proportion of get by subject animacy

Third, the effect of explicit agent also reached significance (p < 0.029). The get-passive is favoured in sentences without an explicit by-phrase, as shown in figure 6.
Figure 6. Predicted proportion of get by explicit agent

The next significant effect (p < 0.001) found by the model was perceived gender, and the result confirms Allen’s (2022) finding that men use significantly more get. Figure 7 illustrates the higher predicted proportion of get found by the model for men than for women. The minimal overlap between error bars demonstrates the high confidence the model has in the gender contrast.
Figure 7. Predicted proportion of get by gender

Finally, education level was found to be significant (p < 0.042), as shown in figure 8: individuals without any post-secondary education are slightly more likely to use the get-passive than individuals with at least some post-secondary education.
Figure 8. Predicted proportion of get by education

Taken together, these results paint a picture of change toward a variant that is favoured by youth, men, and less educated individuals, which is consistent with a non-standard variant in a change from below. From the time of the earliest studies of sociolinguistic variation, linguistic innovations have been documented as spreading upward from working-class sectors of society, where individuals typically have dense social networks and less formal education (e.g. Labov 1963 et seq.; Kroch 1978). While changes led by men are less common than those led by women, men are found to favour the spread of get in this case, probably because women typically conform less than men to sociolinguistic norms that are not overtly prescribed (cf. Labov’s (2001) gender paradox). Further insight can be gleaned from testing how these factors interact as the change progresses. However, interaction effects were not supported by the mixed-effects algorithm, most likely due to the complexity of the patterns in the data. For this reason, we now turn to the results of a decision tree analysis.
4.2. Decision tree analysis (glmertree)
To see how the external and internal factors interact and evolve over time, we conducted a decision tree analysis with glmertree, which enables us to visualize the interactions within the passive alternation in apparent time. As in the regression model, we included random intercepts for individual and verb, and defined the fixed effects structure of the model as follows:
Figure 9 illustrates the full glmertree visualization, which performs very well with a C-value of 0.92. Each node in the tree represents a significant contrast that influences the choice of passive construction, and the splits indicate significant thresholds. The tree shows that the model found a tree depth of five levels and nine significant branches. Initially, we used the standard hyperparameters for the glmertree() function in the glmertree library (Fokkema et al. 2018), which led to a tree with 29 branches. However, the only differences between that model and the one in figure 9 were more fine-grained distinctions within adversativity. Moreover, these nodes all held fewer than 30 tokens, making these distinctions far less reliable. To produce the most robust hierarchy, a limit was set such that each node needed a minimum of 50 tokens, resulting in the culminating glmertree in figure 9.
Figure 9. glmertree for probability of get/be-passives by subject animacy, year of birth (YOB), adversativity, gender and education
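The effect of a minimum node size can be shown with a deliberately simplified Python sketch. It is not the glmertree algorithm, which partitions on the basis of fitted mixed models; it is a toy search for a single year-of-birth threshold that maximizes the difference in get rate between two child nodes, where any split that would leave a node with fewer tokens than the minimum is discarded. The function name best_split and all data are invented for illustration.

```python
def best_split(years, is_get, min_node_size):
    """Return (threshold, rate gap) for the best admissible split, or None.

    A split sends tokens with year <= threshold to the left node and the
    rest to the right node. Splits that leave either node with fewer
    than min_node_size tokens are rejected.
    """
    best = None
    for threshold in sorted(set(years))[:-1]:
        left = [g for y, g in zip(years, is_get) if y <= threshold]
        right = [g for y, g in zip(years, is_get) if y > threshold]
        if len(left) < min_node_size or len(right) < min_node_size:
            continue  # node too small to be reliable
        gap = abs(sum(left) / len(left) - sum(right) / len(right))
        if best is None or gap > best[1]:
            best = (threshold, gap)
    return best


# Invented toy data: four tokens per birth year, 1 = get and 0 = be.
years = [1900] * 4 + [1920] * 4 + [1950] * 4 + [1980] * 4
is_get = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1]

print(best_split(years, is_get, min_node_size=4))
print(best_split(years, is_get, min_node_size=9))
```

With the larger minimum no split is admissible in this toy dataset, which mirrors how raising the threshold to 50 tokens removes the small adversativity nodes from the tree.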

The decision tree analysis begins with subject animacy as the primary factor (node 1) influencing the choice between be and get. Animate subjects branch off on the left side of the tree, where get (the black bars at the bottom) is generally preferred more than it is with inanimate subjects on the right side of the tree. This split reveals that the get-passive is more agentive, consistent with the mixed-effects model in section 4.1 as well as with what has been reported in the literature laid out in sections 1 and 2. Animacy is, of course, well known to be a deep organizing principle of grammatical change with a main contrast of animate vs inanimate (e.g. Silverstein 1976).
Year of birth is the second most important factor in the tree for both animate and inanimate subjects. Focusing on the left side of the tree, the animate subjects only, year of birth is an especially critical factor. With several splits in the tree based on this variable, the results indicate a bona fide generational shift in the use of passive alternates. Younger individuals, especially those born after 1931 (node 2) and after 1982 (node 8), show a higher tendency to use get-passives, indicating a shift in apparent time. The rest of this section focuses on subbranches of figure 9.
Figure 10 shows a snapshot of the leftmost side of the tree, beginning with the year of birth branch (node 2), which comprises only animate subjects among individuals born in or before 1931, the earliest period in the data. In this group, year of birth splits at 1916 (node 3). People born between 1917 and 1931 use the get-passive more, and the tree also shows a nascent effect of adversativity (node 5), revealing that adversativity has begun to condition the use of get in this cohort. However, it is not clear how robust this result is, as the direction of the adversativity trend is not the expected one: get is used more with verbs with low adversativity scores (node 6) than with verbs with higher adversativity scores (node 7), which is in direct opposition to the results reported in much of the literature.
Figure 10. Nodes 3–7 of the glmertree for probability of get/be-passives

There are three possible reasons for this result. First, it could be an artefact of low numbers; there are only 53 tokens in node 7, with nearly all the tokens in node 6 (N = 547). Of those 547 tokens, only six have a negative adversativity score. The rest of the tokens are neutral with adversativity scores of 0. Further, the sentences that had a negative adversativity score were sentences that contain a highly adversative verb whose sign was reversed through the automatic procedure of Bohmann et al. (2023) because a negator was present in the sentence. For example, killed in (16) has an initial adversativity score of 0.5, but because it is negated, the score becomes -0.5.
This suggests that the method of assigning adversativity may have weaknesses; however, there are only six tokens in this sector, so we cannot draw any conclusions on the validity of the method based on these tokens alone. Most tokens produced by people born between 1916 and 1931 have a neutral adversativity score.
Returning to the second half of node 2, shown in figure 11, it is evident that adversativity is not significant for the individuals born between 1931 and 1982. In this cohort, get has developed social meaning, constrained by gender (node 9) and education (node 10). Specifically, it is used more by men (node 13) than by women, and the women who do use it tend not to have any post-secondary education (node 11). This patterning reflects the well-attested gender paradox in language change in progress whereby women conform more closely than men to sociolinguistic norms that are overtly prescribed but conform less than men when they are not (Labov 1990).
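The sign reversal described above can be sketched as follows. This is only an illustration of the general idea (flip the verb's score when a negator occurs in the sentence); the negator list, the function name and the example sentences are hypothetical and do not reproduce the actual procedure of Bohmann et al.

```python
# Hypothetical negator list, for illustration only.
NEGATORS = {"not", "n't", "never", "no"}


def adjust_for_negation(score, tokens):
    """Reverse the sign of a verb's adversativity score if the
    tokenized sentence contains a negator; otherwise return it as is."""
    has_negator = any(tok.lower() in NEGATORS for tok in tokens)
    return -score if has_negator else score


# Invented sentences; 0.5 is the score reported above for "killed".
plain = adjust_for_negation(0.5, ["he", "got", "killed"])
negated = adjust_for_negation(0.5, ["he", "did", "n't", "get", "killed"])
```

A rule of this kind turns a highly adversative verb into a nominally "benefactive" one whenever any negator is present, which is one way the handful of negative scores in node 6 could arise.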
Figure 11. Nodes 8–14 of the glmertree for proportion of get/be-passives

For the very youngest individuals in the data, those born after 1982, the social conditioning of the earlier cohort has disappeared and get is used in all contexts and by all individuals, reaching a proportion of nearly 60 per cent in central passive contexts (node 14). Overall, we can conclude that with animate subjects, get has become increasingly grammaticalized in the late twentieth century but has been influenced by social variables, whereby men and those with less education used it more at earlier stages of its development. This aspect of the shift has evidently changed, adding further support for the interpretation of the process as a change in progress rather than stable variation (see Sankoff & Thibault 1981: 213).
The inanimate subjects are in focus in figure 12, originally at the right side of the tree in figure 9. In this sector, the picture is somewhat different. First, the same change in progress is evident: individuals born before 1928 do not use get with inanimates at all (node 16). Individuals born after 1928 not only use more get, they also exhibit an effect of verb adversativity (node 17): get is favoured in more adversative contexts (node 19), i.e. when the adversativity score for the main verb is higher than 0.Footnote 19 For the youngest individuals we can conclude that with inanimate subjects, get is not fully grammaticalized because it is partitioned by a prominent semantic divide. Although get has increased in frequency, demonstrating a change in apparent time, in this case even younger individuals rarely use get with inanimate subjects. The effect of adversativity appears to have been stable since the generation born after 1928.
Figure 12. Nodes 15–19 of the glmertree for proportion of get/be-passives

Among the inanimate contexts, there is also no significant effect of community. For both models, we considered this factor in detail and found no differences between any of the towns, not even when we condensed the distinction to contrast small towns (Almonte) with cities (Toronto, Thunder Bay). Additionally, the glmertree did not find any significant effects of explicit agent even though the glmer did. However, it did find an effect of adversativity which was not significant in the glmer. The capacity of the decision tree to expose subgroups in the data is thus critical to providing deeper insight into which factors are at play and at which point in an apparent time trajectory.
In sum, the glmertree shows that get steadily increases across the twentieth century with three distinct year of birth watersheds for animate subjects: 1916, 1931 and 1982. In the earliest years, a potential effect of adversativity was found for animate subjects, but for individuals born between 1931 and 1982, social factors intersected in a complicated pattern to exert the highest influence over the rate of the get-passive with animate subjects. In contrast, for individuals born after 1982, no other factors influence these particular passives, signalling that neither social nor linguistic factors play a significant role anymore and the use of the get-passive with animate subjects has grammaticalized (though not to the extent of be).
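To make the apparent-time reading of these watersheds concrete, the sketch below bins tokens into the four cohorts implied by the 1916, 1931 and 1982 splits and computes the proportion of get in each. The cohort labels, function names and toy tokens are assumptions for illustration; the tokens are invented and do not reflect the rates in the study.

```python
from bisect import bisect_left
from collections import defaultdict

# Split points reported for animate subjects; each split separates
# "born in or before the year" from "born after the year".
CUTS = [1916, 1931, 1982]
LABELS = ["1916 or earlier", "1917-1931", "1932-1982", "1983 or later"]


def cohort(yob):
    """Map a year of birth to its cohort label."""
    return LABELS[bisect_left(CUTS, yob)]


def get_rate_by_cohort(tokens):
    """tokens: iterable of (year_of_birth, variant) with variant 'get' or 'be'.
    Returns the proportion of get per cohort."""
    counts = defaultdict(lambda: [0, 0])  # cohort -> [get tokens, all tokens]
    for yob, variant in tokens:
        entry = counts[cohort(yob)]
        entry[0] += variant == "get"
        entry[1] += 1
    return {label: g / n for label, (g, n) in counts.items()}


# Invented tokens, purely to show the mechanics.
toy_tokens = [(1910, "be"), (1920, "be"), (1920, "get"),
              (1950, "get"), (1950, "be"), (1990, "get")]
toy_rates = get_rate_by_cohort(toy_tokens)
```

Unlike this fixed binning, the decision tree finds its split points from the data, which is what allows the watersheds to be reported as results rather than assumed in advance.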
For inanimate subjects, individuals born after 1928 are significantly more likely to use get than those born before 1928. Crucially, inanimate subjects are also subject to an adversativity effect in the speech of individuals born after 1928. However, these rates of get never pass 20 per cent, indicating that speakers are much slower to use inanimate subjects with get than they are with animate subjects. Moreover, because the analysis indicates that there has been no change in the system since the generation born after 1928, the rate of get with inanimate subjects may remain low for the near future.
5. Discussion
In summary, although the get-passive has been an innovative variant for quite some time, it has seen significant reorganization of the social and linguistic factors that condition its use across the twentieth century in Canada. On the one hand, our analyses confirm the results from other recent studies on the passive alternation (e.g. Allen 2022; Fehringer 2022), as they demonstrate that social factors are indeed taking part in this linguistic development. More specifically, there is (i) a date of birth effect such that younger individuals use more get, as also noted in Tyneside English in England (Fehringer 2022);Footnote 20 and (ii) a gender effect such that men, particularly younger men, use more get, as also evidenced in Victoria English in Canada (Allen 2022). Moreover, our study demonstrates that an individual’s level of education is also involved in the rise of the get-passive. The consistency across studies strengthens the claim that the passive alternation has witnessed grammatical change in the twentieth century.
On the other hand, we also find that the reorganization in the system is largely dependent on the prevailing influence of the animacy of the subject in the passive constructions, which is stable across time. With animate subjects, we find evidence for semantic bleaching,Footnote 21 as the data show that the impact of language-internal factors decreases with younger speakers. According to the decision tree, the oldest speakers in our data (born before 1916) rarely use the incoming variant get. However, in the second age group (born between 1916 and 1931), the use of the get-passive increases and adversativity seems to have an effect (though potentially not in the expected direction). This pattern aligns with previous literature that describes the get-passive as a variant that was initially mostly used in contexts with high involvement or responsibility of the agent, as reflected by the fact that it is more frequent with animate subjects. Mair (2006: 114), for example, argues that the get-passive originated as:
a construction with a fairly specific semantics partly determined by related constructions which featured get as an inchoative, causative, or reflexive-causative verb; in the twentieth century this new passive was grammaticalized further, with the semantic and stylistic constraints on its use lessening to the point that it is now a serious rival to the be-passive.
Interestingly, the data for the younger participants in our sample further corroborate Mair’s analysis, at least with animate subjects: for speakers born after 1931, the effect of adversativity has disappeared. Within this group, the get-passive seems to still carry social meaning for the older set of speakers (born between 1931 and 1982), as it is preferred more by men and by women without post-secondary education. For the youngest speakers, get can be used across the board, without any effect of social or language-internal factors. Thus, with animate subjects we witness semantic bleaching (a reduction of the effect of adversativity) and a general ‘normalization’ of the use of the get-passive in all contexts and by any speaker, regardless of their social background.
Overall, then, our analyses show that the rise of the get-passive with animate subjects in Ontario English is a change in progress propelled by younger generations of individuals across the twentieth century. The beginnings of this increase in get are consistent with a story of change from below whereby linguistic change develops in the vernacular and then eventually spreads higher in the social sphere. Because the emerging variant is unconsciously associated with informality (e.g. Quirk et al. 1985: 161: ‘The get-passive is avoided in formal style, and even in informal English it is far less frequent than the be-passive’), it is avoided by the groups who most subscribe to sociolinguistic norms (the highly educated, white-collar workers, women, older individuals) and in turn is advanced most by young speakers who tend to resist conforming to adult institutional practices, as well as men and less-educated individuals who do not value sociolinguistic conformity as highly. Over time, the get variant lost its stigma and gradually became used by the groups who initially used it to a much lesser degree. The fact that no significant effect of community was found by either model, either in terms of population size or distance from metropole, also supports this interpretation as a change from below.
In contrast, with inanimate subjects, the get-passive does not show as stark a rise in frequency. An explanation for this finding may be that the agent responsibility constraint (subject animacy) is stable over time: get remains dispreferred with inanimate subjects across the board, also by younger speakers. In fact, we see only a minor increase in frequency of get-passives with inanimates. More specifically, the decision tree shows that, similarly to what we see with animate subjects, the oldest participants in the dataset (born before 1928) generally avoid the get-passive altogether with inanimate subjects. With younger speakers (born after 1928), the get-passive increases to some extent but only with adversative events. Thus, the picture for inanimate subjects seems more conservative: over time, get-passives mostly rise in frequency in the most prototypical context (i.e. for adversative events). Their lower frequency in the inanimate context may hinder the process of semantic bleaching to a larger extent than in the animate context.
Our study is also a key illustration of why defaulting to a single large model, such as a maximal mixed-effects model, is often not sufficiently informative to provide the full picture of intersecting influences on variation when dealing with sociolinguistic data. The results for the mixed-effects model reported in section 4.1 did not capture the reorganization going on in the system the way the decision tree analysis in section 4.2 was able to. The solution for methodological best practice is to triangulate with multiple complementary statistical tools, in this case mixed-effects models with decision trees.
6. Conclusion
In Ontario, Canada, as elsewhere, there is a change in progress in the passive alternation towards the get-passive. Our most significant finding is the enormous contrast in rates and patterns of use between tokens with animate vs inanimate subjects – a split which dominates the system. With animate subjects, on the one hand, get is emerging in a change from below. Originally spearheaded by men and the less educated, it increases generation by generation among individuals born across the twentieth century, until the 1980s generation. At this transition point, these social factors that were once significant are no longer operational. In contrast, with inanimate subjects – a context traditionally not favourable for the get-passive – there is a notably different trajectory. While get is also on the rise in this sector, the semantic constraints of the construction dominate. Specifically, adversative verb semantics favour the get-passive with inanimate subjects and, crucially, no social factors are significant. The contrasting patterns of semantic interpretation and social influences across different sectors of this variable system require a triangulation of different statistical tools, underscoring the importance of studying the interaction between internal and external factors in vernacular language and the value of complementary analytic methods to uncover meaningful patterns.
Acknowledgements
The first and second authors declare no competing interests. The third author gratefully acknowledges the support of the Social Sciences and Humanities Research Council of Canada (SSHRC) for research grants 2001–present and the Canada Research Chairs Program. We thank the fieldworkers, transcribers and research assistants of the UofT Variationist Sociolinguistic Lab for their help in creating the Ontario Dialects Project corpora. We are grateful to Kaleigh Woolford and Lauren Bigelow for their assistance in coding the data.
Appendix 1: Regular expressions used for data extraction
1. Auxiliary + regular participle

2. Auxiliary + irregular participle
The list of irregular participles comes from Nelson (2001: 150–7).
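The expressions themselves are not reproduced here. Purely as an illustration of the first pattern type (auxiliary + regular participle), a search of this kind might look like the sketch below. The auxiliary list, the pattern and the function name are assumptions and not the expressions actually used in the study; a pattern this simple would overgenerate (e.g. adjectives ending in -ed), so its hits would still need checking.

```python
import re

# Hypothetical list of be/get forms; not the study's actual expression.
AUX = r"(?:be|am|is|are|was|were|been|being|get|gets|got|gotten|getting)"

# Auxiliary followed directly by a word ending in -ed.
PATTERN = re.compile(rf"\b({AUX})\s+(\w+ed)\b", re.IGNORECASE)


def find_candidates(text):
    """Return (auxiliary, participle) pairs for candidate regular passives."""
    return [(m.group(1).lower(), m.group(2).lower())
            for m in PATTERN.finditer(text)]


# Invented sentence; "was stolen" is irregular and is therefore not matched.
hits = find_candidates("He got fired and the car was stolen, but she was hired.")
```

Irregular participles such as stolen cannot be caught by the -ed suffix, which is why the second pattern type relies on an explicit list of participle forms.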










