5.1 Overview
This chapter compares specific linguistic patterns of homeland and heritage varieties, seeking differences among generations of heritage speakers. It considers whether variation patterns are similar in minoritized and majority languages. Further, it illustrates that, in some cases, heritage languages (HLs) develop their own norms, differing from those of the related homeland variety. As was illustrated for Spanish in New York City by Otheguy et al. (2007), distinct speech communities, with their own sets of norms, are developing in Toronto’s HLs. In fact, in some cases, we may consider that we are documenting the development of Canadian varieties of the language that will become increasingly independent of homeland varieties, as explored further in Nagy (2016b).
This chapter reports trends of continuity and occasional divergence within the heritage generations and between each heritage variety and the corresponding homeland variety, noting the degree of similarity between the homeland and heritage varieties in terms of rates of use of innovative forms and of the linguistic and social factors that condition the variability. We demonstrate that there is little difference in production between heritage and homeland grammars in spontaneous speech by applying comparative variationist sociolinguistic methods to examine three dependent variables, (VOT), (CASE), and (PRODROP), in several languages. (VOT) is the variable that identifies variation in duration of aspiration or Voice Onset Time. (CASE) refers to variable case-marking on nouns and pronouns. (PRODROP) is the variation between presence and absence of overt subject pronouns.
Findings for this chapter include revisions of those published in Łyskawa et al. (2016), Łyskawa & Nagy (2019), Nagy (2015), Nagy et al. (2018a), Nagy & Kochetov (2013), Nodari et al. (2019), Tan & Nagy (2017), and Umbal (2023).
5.2 Voice Onset Time (VOT)
In the domain of pronunciation, views vary on whether HL speakers sound just like monolinguals or not, as reviewed in Kupisch (2020). She notes that while Polinsky & Scontras (2020b, p. 10) suggest the existence of a distinctive “heritage accent,” others have suggested that heritage speakers sound like monolinguals (cf. Montrul, 2008; Rothman, 2009). Since those papers, numerous studies of adult heritage speakers’ segmental properties have been published (Amengual, 2012, 2016; Einfeldt et al., 2019; Elias et al., 2017; Kissling, 2018; Mazzaro et al., 2016; Ronquest, 2012).
In this section, we explore VOT variation within seven languages. As described in Chapter 4, VOT is the duration between the release of a consonant and the onset of vocal pulses for the following vowel. VOT is an acoustic cue that distinguishes between series of consonants in some languages but is not an important cue in others. It has been extensively studied in language contact settings owing to this cross-linguistic variability (Lisker & Abramson, 1964).
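To make the definition concrete, the following minimal Python sketch computes VOT from two annotated time points and sorts the result into a short-lag or long-lag category. The time stamps, the token labels, and the 30 msec boundary are illustrative assumptions, not values or settings taken from the HLVC project.

```python
def vot_ms(release_s, voicing_onset_s):
    """Return VOT in msec: onset of vocal pulses minus stop release (inputs in seconds)."""
    return (voicing_onset_s - release_s) * 1000.0


def lag_category(vot, boundary_ms=30.0):
    """Label a voiceless-stop token as short-lag or long-lag.

    boundary_ms is an illustrative assumption, not a project setting.
    """
    return "long-lag" if vot > boundary_ms else "short-lag"


# Hypothetical hand-annotated time stamps (in seconds) for two tokens.
tokens = [
    {"word": "token_a", "release": 1.250, "voicing": 1.268},
    {"word": "token_b", "release": 2.400, "voicing": 2.475},
]

for tok in tokens:
    vot = vot_ms(tok["release"], tok["voicing"])
    print(tok["word"], round(vot, 1), lag_category(vot))
```

In this sketch the first token would count as short-lag and the second as long-lag, the kind of difference that separates the unaspirated stops of many of the HLs from the aspirated stops of English.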
In Toronto, all HLs have contact with English. English aspirates word-initial voiceless stops, producing long-lag VOT. Lisker and Abramson (1964) cite 60, 70, and 80 ms for /p, t, k/ respectively, while Kang et al. (2016) report even longer values (see Table 4.5). We examine VOT of voiceless stops in spontaneous speech in Cantonese, Korean, Italian, Russian, Tagalog, and Ukrainian. As described in Section 4.3.3.5, these languages are characterized as having short-lag VOT in voiceless consonants, except for Cantonese. We focus on word-initial position, as this is where we expect the biggest contrast with English. For Korean, we consider both the lenis and aspirated series, in separate analyses.
For each language, we examine the production of speakers from three heritage generations (two for Cantonese, Korean, and Tagalog, for which not enough Gen3 speakers have been recorded to date). We compare the average VOT to published results (or our own samples, in the case of Cantonese, Korean, Tagalog, and Ukrainian) for the homeland variety and for English (using measurements provided by Michol Hoffman from the Contact in the City corpus; Hoffman & Walker, 2010). The distribution of our tokens is shown in Table 5.1; these counts include some tokens that were later excluded for better comparability across contexts. Actual token counts are noted for each result.
Table 5.1 (VOT) data sample: speaker count and token count, by language and generation
| Language | Total # tokens | Homeland # speakers | Homeland # tokens | Gen1 # speakers | Gen1 # tokens | Gen2 # speakers | Gen2 # tokens | Gen3 # speakers | Gen3 # tokens |
|---|---|---|---|---|---|---|---|---|---|
| Cantonese | 166 | 5 | 45 | 5 | 68 | 8 | 53 | | |
| Italian | 1,342 | | | 6 | 463 | 9 | 434 | 8 | 445 |
| Korean – lenis | 3,166 | 117 | 346 | 6 | 1,494 | 6 | 1,326 | | |
| Korean – aspirated | 590 | 117 | 323 | 6 | 123 | 6 | 144 | | |
| Russian | 729 | | | 4 | 268 | 4 | 271 | 3 | 190 |
| Tagalog | 606 | 12 | 288 | 8 | 162 | 8 | 156 | | |
| Ukrainian | 1,802 | 12 | 873 | 3 | 269 | 4 | 323 | 4 | 337 |
| English | 508 | 23 | 508 | | | | | | |
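As a quick consistency check on Table 5.1, the sketch below sums the per-group token counts for each row and compares them with the row totals and with the overall sample size. The counts are copied from the table; the dictionary layout and variable names are assumptions made for illustration, and the groups are deliberately left unlabeled.

```python
# Per-group token counts from Table 5.1, listed in table order for each row.
group_tokens = {
    "Cantonese": [45, 68, 53],
    "Italian": [463, 434, 445],
    "Korean - lenis": [346, 1494, 1326],
    "Korean - aspirated": [323, 123, 144],
    "Russian": [268, 271, 190],
    "Tagalog": [288, 162, 156],
    "Ukrainian": [873, 269, 323, 337],
    "English": [508],
}

# Row totals as printed in the "Total" column of Table 5.1.
reported_totals = {
    "Cantonese": 166,
    "Italian": 1342,
    "Korean - lenis": 3166,
    "Korean - aspirated": 590,
    "Russian": 729,
    "Tagalog": 606,
    "Ukrainian": 1802,
    "English": 508,
}

for language, counts in group_tokens.items():
    assert sum(counts) == reported_totals[language], language

grand_total = sum(reported_totals.values())
print(grand_total)  # 8909
```

The grand total of 8,909 tokens counts the two Korean series separately and includes the English comparison sample.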
We first provide measurements of VOT in each homeland variety and in Toronto English. These measurements from other sources can serve to calibrate the variability found in the Heritage Language Variation and Change in Toronto (HLVC) data. In the published homeland data, measurements come from controlled elicitation tasks, primarily isolated word or sentence reading tasks. Many report findings from a small number of speakers. Methodological details can be found in the original publications, listed along with mean VOT for each place of articulation in Table 5.2. It is easy to see that English values for voiceless stop aspiration are considerably longer than all other values shown here (except for the Korean aspirated stop series). Further details of the methods are described in Section 4.3.3.5.
Table 5.2 Sample VOT values for homeland varieties and English, in msec
| Language, series | p | t | k | Source | Data type |
|---|---|---|---|---|---|
| CAN plain | 11 | 15 | 34 | (Lisker & Abramson, 1964) | sentence-initial, reading task, 1 speaker |
| CAN aspirated | 58 | 62 | 68 | | |
| ITA Italian | 17 | 16 | 24 | (Sorianello, 1996, p. 134) | pretonic, non-phrase-final, reading task, 3 Cosenza speakers |
| ITA Calabrian dialect | 14 | 7 | 29 | | |
| KOR lenis | 34 | 45 | 51 | (Kang & Han, 2013, p. 67) | phrase-initial, reading task, 1 Seoul speaker |
| KOR aspirated | 79 | 85 | 137 | | |
| RUS | 18 | 20 | 38 | (Ringen & Kulikov, 2012, p. 12) | reading list, 14 St. Petersburg speakers |
| TAG | 15 | 18 | 30 | (Kang et al., 2016) | reading list, 10 Toronto “native” speakers |
| UKR | 24 | 26 | 31 | Unpublished HLVC analysis | conversation, 6 Lviv speakers |
| ENG | 80 | 80 | 85 | (Kang et al., 2016, pp. 199–201) | reading list, 12 Toronto speakers |
Turning to the HLs, Figure 5.1 shows the mean VOT for each generation of each language, averaged across contexts. The homeland average is shown when it comes from the same type of data. For some languages (Russian, Tagalog, Ukrainian), we see drift from the homeland toward the English-like VOT (shown at the far right), though even the latest generations do not reach English-like values. Other languages (Cantonese, Italian) do not show consistent cross-generational drift toward English-like values. Korean shows cross-generational change in both series, but that is in keeping with an established change in progress in Seoul Korean.

Figure 5.1 Raw VOT (in msec) for each language, by generation (n = 8,909)
However, considering measures that are merged across contexts may obscure important details needed to understand the cause of any changes observed. One important conditioning factor is the place of articulation of the consonant. Figure 5.2 shows that English VOT increases /p/ < /t/ < /k/, a finding robustly supported in other studies of English. However, this pattern is not replicated, in our data, in any other language except Tagalog. The effect of Consonant (specifically, its place of articulation) will be examined within each language to see whether and how it contributes evidence of change in the heritage varieties. That is, if later generations show a shift toward a pattern in which stops with the closure further back in the mouth have longer VOT, it suggests a change caused by influence from English.

Figure 5.2 Consonant effect on raw VOT (in msec), across languages (numbers above each bar show token counts, n = 8,909)
In contrast, there is a consistent effect of Vowel height in Italian and both Korean series, as illustrated in Figure 5.3. Tagalog is, again, exceptional, but we see in Section 5.2.6 that a model considering Generation, Sex, Consonant and Vowel height, as well as the interactions between the social and linguistic factors, found no main effect of Vowel height. In the other language samples, (VOT) tokens were not selected before high vowels, so we lack appropriate data to examine the Vowel height effect. Figure 5.3 also shows a lack of Vowel height effect for English. Thus, decrease or loss of a height effect across generations of an HL could indicate influence from English.

Figure 5.3 Vowel height effect on raw VOT (in msec), across languages (n = 8,909)
We now consider differences in how the effects of Consonant (labial versus coronal versus velar) and Vowel height (high versus non-high) behave within each language, comparing across generations, between men and women, and according to ethnic orientation (EO) scores. We use mixed-effects models to determine which factors have significant effects. In order to account for the source of any changes, we report the differences between the fit of models that include generation and those with EO scores. The following sections provide more information related to the consonant series in each language. The description of analysis of Cantonese is most detailed, further illustrating methods of model comparison that are then summarized more briefly for the other languages. R code and all models are available for examination online at http://ngn.artsci.utoronto.ca/HLVC/HLVC_VOT_analysis_3Dec2022.html.
5.2.1 Cantonese
There are “two series of plain stops of Cantonese, usually described as three voiceless unaspirated stops contrasting with three voiceless aspirated stops at the labial, dental and velar places of articulation,” according to Yue-Hashimoto (1972). The series can also be considered as contrasting short-lag and long-lag stops (Clumeck et al., 1981). Examples 1 and 2 show Cantonese words in each series, beginning with sounds at each place of articulation (Tan & Nagy, 2017).
| | Bilabial | Alveolar | Velar |
|---|---|---|---|
| (1) Voiceless unaspirated or short-lag (10–50 msec) | paa4, paa1 爸 | taa2 打 | kaa1 加 |
| | “father” | “to hit” | “to add” |
| (2) Voiceless aspirated or long-lag (48–85 msec) | pʰaa3 怕 | tʰaa 她 | kʰaa1 卡 |
| | “scared of” | “she” | “card” |
Clumeck et al. (1981) analyzed speech from monolingual Cantonese-speaking families in the San Francisco Bay area who had had minimal contact with English. They elicited data from young children by using toys and picture stimuli and from adults by isolated-word reading tasks. For those adults, the values for unaspirated stops fall almost entirely within the short-lag range (<30 msec). Aspirated stops fall within the long-lag range (>30 msec). This replicates Lisker and Abramson’s (1964) report on a single speaker of Cantonese in an isolated task, and confirms the phonological contrast between unaspirated and aspirated (rather than voiced and voiceless stops) described in Chao (1951, p. 20) and Yue-Hashimoto (1972, p. 88). Tan & Nagy (2017) examined the variable (VOT) in both the aspirated and the unaspirated series of Cantonese stops. That pilot analysis of the aspirated series showed values averaging 10 msec and drifting away from English values toward longer values (as reported in Lisker & Abramson, 1964) across all places of articulation. The dataset was prepared and measured by an undergraduate student, Ziwen Tan, working solo on an undergraduate independent study project, and is thus a smaller sample than we have for the other languages.
Here, we model only the aspirated series, for maximal comparability with English. In the aspirated series, the homeland values are very similar to those reported for English. This means that we cannot really see “drift toward English” for this language when we look at VOT values themselves, but we will consider the effects of conditioning factors in our models.
An important additional predictor to consider for Cantonese is Lexical tone. Tse (2012) reported an experimental task in which words were elicited in a carrier phrase. He showed that some, but not all, tone contrasts significantly affect VOT of long-lag stops in Cantonese. He noted that the effect is inversely correlated with the fundamental frequency (F0), but that tone is a stronger predictor of VOT than F0.
We now describe the series of Cantonese model comparisons to illustrate the methodology that is key to our interpretation of sources of variation (identity-marking, contact, internal change).
The first model considered is for data from Homeland, Gen1, and Gen2 speakers. The equation for the model, which is calculated using the R package lmerTest (Kuznetsova et al., 2017), is given in Example 3.
(3) lmer (VOT_ms ~ Generation + Gender + Consonant + Tone + (1|Speaker))
Thus, it simultaneously considers the effects on (VOT) of Generation, Sex (of the speaker; labeled Gender in the formula), Consonant, and the Lexical tone of the syllable, with outlier effects controlled by including Speaker as a random effect. In the Cantonese models, Vowel height is not considered because no high vowels were coded. This model includes data from sixteen speakers, with 145 tokens. Velars are excluded from the analysis because of low token counts. The Residual Maximum Likelihood (REML) criterion, used here as the measure of model fit, is 1,368; p-values for the fixed effects are calculated via t-tests using Satterthwaite’s method (Kuznetsova et al., 2017).
The model, as reported in the lmerTest output, is shown in Table 5.3. Two columns are added to provide the token count and raw average VOT for each level. In addition to the lmerTest output, rows are added to indicate the identity and token count of the reference levels (italicized in the table) for each predictor. The estimates show that there is no significant effect of Generation, although Gen1 VOT averages about 16 msec less than the reference (Homeland) value and Gen2 averages about 15 msec more. Sex also has no significant effect – males and females produce similar VOT. Turning to the linguistic factors, there is no significant difference between /ph/ and /th/, the reference value for Consonant, although /ph/ is slightly shorter. Finally, Tone4 tokens have significantly longer VOT than the reference level (Tone1), but Tone3 tokens do not. (Other tones were too rarely represented to include.)
Table 5.3 (VOT) model for Cantonese, testing Generation and Sex effects (n = 145)
REML criterion at convergence: 1,368

Scaled residuals:

| Min | 1Q | Median | 3Q | Max |
|---|---|---|---|---|
| –2.16 | –0.666 | –0.127 | 0.591 | 3.462 |

Random effects:

| Groups | Name | Variance | Std. Dev. |
|---|---|---|---|
| Speaker | Intercept | 224 | 15 |
| Residual | | 917 | 30.3 |

Number of obs: 145, groups: Speaker, 16

Fixed effects:

| | Estimate | Std. Error | df | t value | Pr(>\|t\|) | | VOT | n |
|---|---|---|---|---|---|---|---|---|
| Intercept | 53 | 11 | 7.93 | 4.79 | 0.00 | ** | | |
| Homeland | | | | | | | 63 | 39 |
| Gen1 | –16 | 13 | 4.78 | –1.25 | 0.27 | | 40 | 60 |
| Gen2 | 15 | 12 | 6.08 | 1.27 | 0.25 | | 64 | 46 |
| Female | | | | | | | 54 | 102 |
| Male | –4 | 10 | 5.83 | –0.38 | 0.72 | | 53 | 43 |
| /th/ | | | | | | | 55 | 77 |
| /ph/ | –6 | 6 | 133.51 | –0.94 | 0.35 | | 52 | 68 |
| Tone1 | | | | | | | 45 | 50 |
| Tone3 | 9 | 8 | 136.95 | 1.22 | 0.22 | | 48 | 60 |
| Tone4 | 20 | 8 | 137.97 | 2.52 | 0.01 | * | 70 | 35 |
In output from the lmerTest package, significance levels are indicated by the asterisks in the column following the p-value. Significance is coded as follows: “***” means p <0.001, “**” means p <0.01, “*” means p <0.05, and a blank cell means a lack of significance. However, only the standard p <0.05 threshold (“*”) for significance is considered in discussions in this book.
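The coding scheme just described can be written out as a small Python function. This is only a sketch of the thresholds as stated in the text; the function name is hypothetical and it is not code from lmerTest itself.

```python
def sig_code(p):
    """Map a p-value to the asterisk code described in the text."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""  # blank cell: not significant


def is_significant(p):
    """Apply the single p < 0.05 threshold used in the discussion."""
    return sig_code(p) != ""


# Hypothetical p-values chosen to fall in each band.
for p in (0.0004, 0.004, 0.04, 0.27):
    print(p, repr(sig_code(p)), is_significant(p))
```

Because the tables report p-values rounded to two decimals, a printed 0.00 or 0.01 cannot always be mapped back to its asterisk code with this function; the codes in the tables reflect the unrounded values.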
After the Intercept row, which tells us that the estimated VOT is 53 msec for tokens at the reference level of each variable, we see three rows reporting on Generation. For Generation, the reference level is Homeland, so the negative estimate for Gen1 indicates a (non-significantly) lower VOT for Gen1 than for Homeland speakers (with all other factors considered at the reference level). The positive estimate for Gen2 indicates a (non-significantly) higher VOT for Gen2 than for Homeland speakers.
Next, the negative estimate for male indicates that males have a slightly and non-significantly lower VOT than the reference level (females), again when all other predictors are set to their reference level. The similarity across generations and sexes in VOT provides preliminary evidence of a lack of change.
The row /ph/ indicates that /ph/ has a non-significantly lower VOT than the reference level /th/, when other factors are considered at their reference levels. Recall that /kh/ tokens were excluded because they were so infrequent in the sample (n = 9). However, taking this lack of Consonant effect at face value, the lack of difference between consonants is further evidence of a lack of influence from English.
Finally, the rows for Tone3 and Tone4 indicate that Tone3 words have non-significantly higher VOT than the reference level, Tone1, while Tone4 words have significantly higher VOT, by about 20 msec. This confirms Tse’s (2012) findings from an experimental elicitation task.
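The way the estimates in Table 5.3 combine can be illustrated with a short worked example. The Python sketch below adds the rounded fixed-effect estimates to the intercept to obtain a predicted VOT for a given combination of levels. The function and variable names are hypothetical, the speaker-level random intercept is ignored, and the rounded estimates make the results approximate.

```python
# Rounded fixed-effect estimates (msec) from Table 5.3; reference levels are 0.
INTERCEPT = 53
EFFECTS = {
    "generation": {"Homeland": 0, "Gen1": -16, "Gen2": 15},
    "sex": {"Female": 0, "Male": -4},
    "consonant": {"/th/": 0, "/ph/": -6},
    "tone": {"Tone1": 0, "Tone3": 9, "Tone4": 20},
}


def predicted_vot(generation="Homeland", sex="Female", consonant="/th/", tone="Tone1"):
    """Sum the intercept and the estimate for each chosen level (no random effect)."""
    return (
        INTERCEPT
        + EFFECTS["generation"][generation]
        + EFFECTS["sex"][sex]
        + EFFECTS["consonant"][consonant]
        + EFFECTS["tone"][tone]
    )


print(predicted_vot())                                  # 53: all reference levels
print(predicted_vot(generation="Gen2", tone="Tone4"))   # 88
print(predicted_vot("Gen1", "Male", "/ph/", "Tone1"))   # 27
```

The first call reproduces the intercept, which is why the intercept is read as the estimate for a Homeland female speaker producing /th/ in a Tone1 word.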
A series of model comparisons further investigates whether the conditioning effects change between groups of speakers. For every language, the REML is better for the models with interactions between social and linguistic factors than without, so discussion of the other languages will begin at that step.
We turn next to a model that included interaction factors for Generation with each linguistic factor and for Sex with each linguistic factor. For Cantonese, this improved the fit (REML = 1,274, lower than for the model in Table 5.3). While no interactions with Sex were significant, there was a significant interaction of Tone with Generation, which is illustrated in Figure 5.4, a plot of raw VOT (rather than model estimates, for ease of interpretation). In this graph we can see an interaction between Gen1 and tone3: Tone3 is higher than both other tones only in Gen1. The second interaction is between Gen2 and tone4: In Gen2, Tone4 has a much higher VOT than anything else in the sample. However, there is still no overall cross-generational (or homeland versus heritage) difference in VOT, and thus we have no evidence of drift toward English values. The increase in the effect size for tone cannot be attributed to contact with English, as English lacks lexical tone.

Figure 5.4 Interaction between Generation and Tone in Cantonese (VOT), in ms. (n = 154)
An interaction between Consonant and Generation, in which later generations had VOT values closer to the /ph/ < /th/ < /kh/ pattern found in English, would have indicated some influence from English, but that was not found. Similarly, any movement away from a Vowel height effect in which high vowels trigger longer VOT would be a sign of convergence with English (which lacks that effect). However, that could not be tested owing to the lack of high vowels in this small sample.
Two models of homeland data only are constructed to indicate the “initial state” of the HL, that is, the input for Gen1, despite the passage of time. One model tests linguistic and social factors as main effects, plus Speaker as a random effect. The second adds interaction factors between the linguistic and social factors. Perhaps because of the small sample size for Homeland Cantonese (n = 39, with data from only five speakers), no effects are significant in either model. This step is mentioned for completeness in illustrating the approach. In all cases, the REML for the model with interactions was smaller than (or the same as) that for the simpler model. Only minor differences in factor effects were found between the two.
Next, the Generation factor was reduced to a binary contrast between homeland and heritage (for Cantonese, this combines Gen1 and Gen2). Since Generation was not previously significant, we expected the fit to improve. However, this reduction in levels did not result in a lower REML (REML = 1,326 for Cantonese) for any of the languages, so models that consider only the binary contrast between homeland and heritage speakers are not discussed further.
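The comparison of fit across the Cantonese all-speaker models can be summarized in a few lines of Python. The REML values are the ones reported in the text; the dictionary labels are descriptive assumptions, and the sketch simply applies the lower-is-better reading of REML used in this chapter.

```python
# REML criterion reported for each Cantonese model that includes all speakers.
reml = {
    "main effects only (Table 5.3)": 1368,
    "with social-by-linguistic interactions": 1274,
    "binary homeland vs. heritage contrast": 1326,
}

# Rank models from best (lowest REML) to worst.
ranked = sorted(reml.items(), key=lambda item: item[1])
for name, value in ranked:
    print(f"{value:>5}  {name}")

best_model = ranked[0][0]
print("Best fit:", best_model)
```

On these values the interaction model has the lowest REML, which is why the discussion of the other languages starts from that model.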
Finally, we turn to a model that includes only heritage speakers so that we can consider the two EO factors. Recall that EO_language and EO_culture are calculated for heritage speakers only and that these two measures are independent of each other (as a result of the Principal Components Analysis data reduction method used to calculate them). However, they are collinear with Generation and so cannot be included in the models given here. For all languages, models that include only heritage speakers and test EO factors have lower REML than models of only heritage speakers that test Generation. The models testing Generation only within heritage speakers are not described here as they do not further illuminate our understanding beyond what we see from the models that are presented.
In the EO model for Cantonese (VOT), Tone remains significant. EO_culture emerges as a significant main effect, while EO_language just misses the threshold. Speakers with a stronger orientation toward their heritage culture have longer VOT. There are interactions between EO_language and Tone3, and between EO_language and the bilabial stop. The latter shows that speakers who use Cantonese more often produce /ph/ longer relative to /th/ than other speakers, a movement away from an English-like pattern. More generally, this suggests that speakers with more self-reported use of Cantonese produce phonetically distinctive VOT compared with those speakers with less self-reported use of Cantonese. There is also a significant interaction between EO_culture and Tone4. Effects of EO_culture suggest construction of identity via VOT: Speakers with a stronger orientation toward their heritage culture are producing phonetic patterns that differ significantly from those with a weaker orientation of this nature (albeit only for words bearing Tone4). Because of the limited data set, we cannot probe deeper to determine whether this might be a lexical effect of a particular word. For now, we note that twelve of the nineteen Tone4 tokens are po4 婆, part of a word meaning “grandma,” and four are tong4 堂 “hall,” used in phrases about classes or lessons. Both words seem likely to bear important cultural connotations.
To summarize, VOT is stable overall in Heritage Cantonese and does not show any direct effect of contact with English, in terms of change in Consonant or Vowel height effects. Some models show effects that might be attributed to greater use of Cantonese than English and of stronger orientation toward Hong Kong than Canadian culture. This was determined by comparing:
models of all speakers that include interaction factors between the social factors (Generation and Sex) and each linguistic factor;
a model of homeland speakers only, to see the “initial state” of the variety;
a model of heritage speakers only, to test the effects of EO scores, both as main effects and in interaction with linguistic factors.
These same steps (as well as the other intermediary ones that were not productive and will not be discussed further) were tested for each language and are reported next.
5.2.2 Italian
The HLVC project has examined (VOT) in Italian in Celata & Nagy (2022), Nagy & Kochetov (2013), and Nodari et al. (2019). Here, we focus only on the word-initial context, with a sample of 1,342 tokens from twenty-three speakers. Homeland Italian has short-lag voiceless stops: The mean VOT is less than 30 ms (Sorianello, 1996). Owing to the availability of that homeland data, we have not examined (VOT) in our own homeland sample, though we recognize that Sorianello’s (1996) read speech data may not be exactly comparable. Furthermore, Sorianello’s measurements (see Table 5.2) are for pre-tonic (not necessarily word-initial), intervocalic, non-phrase-final words, produced by three speakers in a sentence reading task. The speakers were asked to produce Italian and then Calabrian utterances.
All HLVC generations have an average VOT that is slightly longer than the means reported in Sorianello’s (1996, p. 134) study of Homeland Calabrese Italian speech (see Figure 5.1). Previous HLVC publications reported no significant effect for Generation as a main effect in Italian (VOT). Table 5.4 presents the fixed effects from a fitted model with data from three generations of heritage speakers tested, including interactions between Generation and each linguistic factor, as well as between Sex and each linguistic factor.
Table 5.4 (VOT) model for Italian, testing generation and gender effects (n = 1,342)
| | Estimate | Std. Error | df | t value | p | | VOT | n |
|---|---|---|---|---|---|---|---|---|
| Intercept | 28 | 4.63 | 28.69 | 6.00 | 0.00 | *** | | |
| Main effects | | | | | | | | |
| Generation1 | | | | | | | 24 | 463 |
| Generation2 | 2 | 5.02 | 29.24 | 0.40 | 0.69 | | 24 | 434 |
| Generation3 | 15 | 5.12 | 29.54 | 2.95 | 0.01 | ** | 37 | 445 |
| GenderF | | | | | | | 28 | 619 |
| GenderM | 2 | 4.08 | 32.24 | 0.42 | 0.68 | | 29 | 723 |
| height_binaryHigh | | | | | | | 35 | 363 |
| height_binaryNon-high | –11 | 2.06 | 1311.27 | –5.16 | 0.00 | *** | 26 | 979 |
| ConsonantT | | | | | | | 25 | 304 |
| ConsonantP | –1 | 2.26 | 1311.03 | –0.34 | 0.73 | | 26 | 584 |
| ConsonantK | 13 | 2.49 | 1310.42 | 5.18 | 0.00 | *** | 33 | 454 |
| Interaction effects | | | | | | | | |
| Generation2: Non-high | –2 | 2.52 | 1316.62 | –0.93 | 0.35 | | | |
| Generation3: Non-high | –8 | 2.38 | 1313.67 | –3.32 | 0.00 | *** | | |
| Generation2: ConsonantP | 0 | 2.71 | .60 | –0.16 | 0.87 | | | |
| Generation3: ConsonantP | 0 | 2.61 | 1314.24 | –0.15 | 0.88 | | | |
| Generation2: ConsonantK | –2 | 2.98 | 1312.16 | –0.71 | 0.48 | | | |
| Generation3: ConsonantK | 0 | 2.82 | 1311.61 | –0.18 | 0.86 | | | |
| GenderM: Non-high | 1 | 2.03 | 1314.98 | 0.57 | 0.57 | | | |
| GenderM: ConsonantP | 1 | 2.23 | 1314.19 | 0.50 | 0.62 | | | |
| GenderM: ConsonantK | –3 | 2.41 | 1310.58 | –1.36 | 0.17 | | | |
We see a main effect of Generation, but not Sex, in the interaction model in Table 5.4. Gen3 has a VOT that is significantly longer than Gen1, when all factors are set to their reference levels. There is an effect of Consonant: /k/ is significantly longer than /t/, but /p/ is only non-significantly shorter than /t/. This is consistent with Sorianello’s homeland data (see Table 5.2). The usual effect for Vowel height, with following high vowels triggering longer VOT, is significant.
Interestingly, there is a significant interaction between Generation and Vowel height, with the height effect increasing for Gen3 speakers. This is a significant shift away from the English non-effect of height. It is coupled with a lack of interaction between Generation and Consonant, meaning that the non-English-like lack of a /p/-/t/ difference is maintained. There are no significant interactions with Sex.
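The size of the Vowel height effect in each generation can be worked out from the rounded estimates in Table 5.4, as in the Python sketch below. It combines the main effect of Non-high with the relevant Generation-by-height interaction, for a female speaker producing /t/ (the reference levels). The names are hypothetical and the random effect for Speaker is ignored.

```python
# Rounded estimates (msec) from Table 5.4; reference: Gen1, female, /t/, high vowel.
INTERCEPT = 28
GENERATION = {"Gen1": 0, "Gen2": 2, "Gen3": 15}
NON_HIGH = -11
GEN_BY_NON_HIGH = {"Gen1": 0, "Gen2": -2, "Gen3": -8}


def predicted_vot(generation, high_vowel):
    """Predicted VOT for /t/ produced by a female speaker of the given generation."""
    vot = INTERCEPT + GENERATION[generation]
    if not high_vowel:
        vot += NON_HIGH + GEN_BY_NON_HIGH[generation]
    return vot


def height_effect(generation):
    """How much longer VOT is before high than before non-high vowels."""
    return predicted_vot(generation, True) - predicted_vot(generation, False)


for gen in ("Gen1", "Gen2", "Gen3"):
    print(gen, predicted_vot(gen, True), predicted_vot(gen, False), height_effect(gen))
```

With these estimates the height effect is 11 msec for Gen1, 13 msec for Gen2, and 19 msec for Gen3, which is the increase across generations described above.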
We next consider the Italian model with EO scores rather than Generation. This model fits the data considerably better (REML 5,340 versus 11,073 for the generation model) but has no significant interactions. The only significant effect is the main effect of Vowel height, with tokens before non-high vowels again having a mean VOT about 11 msec shorter than tokens before high vowels.
To summarize, for the Italian (VOT) data, we see a slight increase in VOT for Gen3 (15 msec more than Gen1, at reference levels), but that is coupled with an increase in the Vowel height effect, suggesting that the contexts in which this change occurs are causing dissimilation from the English pattern. The lack of Sex and EO score effects means we lack support for any sort of identity-marking role for the change.
5.2.3 Korean
We turn next to Korean (VOT). For Korean, we test Accentual Phrase-initial words beginning with stops from two series, lenis and aspirated. Examples 4–6 provide words illustrating the contrast in each place of articulation.
| | Lenis (slightly aspirated) | Aspirated (heavily aspirated) |
|---|---|---|
| (4) Labial | 불 /pul/ “fire” | 풀 /phul/ “grass” |
| (5) Coronal | 달 /tal/ “moon” | 탈 /thal/ “mask” |
| (6) Dorsal | 근 /kɨn/ “pound” | 큰 /khɨn/ “big” |
We consider both series as each could potentially be becoming more English-like. For Korean (VOT), measurements were extracted using the script function in Praat (Boersma & Weenink, 2022), as part of Kang and Nagy’s (2016) investigation of tonogenesis, an ongoing change in both Seoul and Toronto Korean. These measurements are a subset of the data examined in Kang & Nagy (2016). Here, we do not consider pitch effects, as they were thoroughly explored in that paper. Homeland comparison data comes from the National Institute of the Korean Language (NIKL) (2005) corpus and is continuous, read speech. Each of 117 speakers contributed three tokens for each series. For the heritage data, we lack high vowel tokens, so, as for Cantonese, we cannot consider the Vowel height effect.
We consider first the effects in the lenis series, referred to in the cross-language comparisons below simply as “Korean.” The average raw VOT is 52 msec, the longest in any language examined, though still shorter than English. In the model in which interactions between Generation and Sex, and each linguistic factor, are tested, we find several significant effects. First, there is a main effect of Generation, with Gen2 having significantly longer VOT than Homeland speakers, and Gen1 having a smaller, non-significant increase when compared with Homeland.
As with Italian, /k/ is significantly longer than /t/ (by 16 msec). /p/ is also slightly and non-significantly longer (by about 2 msec). Both heritage generations show significantly shorter VOT for /k/ than the homeland sample. This effect is coupled with a significant increase in VOT for /p/ among male speakers, to such an extent that for males the VOT of /p/ is significantly longer than /t/. All this suggests movement away from the initial English-like place of articulation pattern.
The homeland-only Korean model confirms the significant effect of /k/ > /t/, and also /p/ > /t/, as well as showing a significant positive age correlation with both these differences. For this continuous factor, older speakers exhibit (very slightly) longer VOT for lenis /p/ and /k/, but no Sex effect emerges.
Turning next to the heritage-only model, in order to consider EO effects in Korean, we find no main effect of either EO measure. However, there are two significant interactions: Speakers who report using Korean more have a bigger drop in VOT from /t/ to /p/ than speakers who report using Korean less, while speakers who report more orientation toward Korean culture have a bigger drop in VOT from /t/ to /k/ than less Korean-culture oriented speakers. Together, these small (<5 msec) but significant effects suggest neither movement toward nor away from the homeland pattern. This model fits the data better than the Generation model that indicated a difference between Gen2 and Homeland VOT.
We turn now to the Korean aspirated series, referred to in the cross-language comparisons as “Korean_asp.” For this sample of 590 tokens (homeland and heritage), the mean raw VOT is 69 msec, longer than in the lenis series. In the mixed-effects model with interactions, both Gen1 and Gen2 yield significantly longer VOT than the Homeland speakers, but the difference is bigger for Gen1 (30 msec) than Gen2 (22 msec). /pʰ/ and /kʰ/ are both significantly longer than /tʰ/, with /kʰ/ increased by 12 msec more than /pʰ/. All tested interactions are significant, with differences as illustrated by raw VOT values in Figure 5.5. Each Generation and Sex group replicates a U-shaped pattern where /tʰ/ is shorter than the other consonants, but the size of this effect differs across groups. As the /pʰ/ > /tʰ/ difference is smaller in the heritage speakers than the homeland speakers, we can interpret it as a change toward a pattern that is more consistent with English. However, by the same logic, we must interpret the /kʰ/ > /tʰ/ difference also being smaller in the heritage speakers than the homeland speakers as indicating movement away from an English-like pattern. The same is true when we contrast males and females, leaving us with no overwhelming indication of a contact effect.

Figure 5.5 Interactions between social factors and Consonant in (VOT) of Heritage Korean aspirated series (n = 590)
We next consider the homeland data alone. As with the lenis series, there are significant effects of Consonant: /kʰ/ and /pʰ/ are both significantly longer than /tʰ/ (34 and 24 msec, respectively), again with a positive Age interaction. Males produce longer VOT than females (by 11 msec).
Finally, we consider the model of (VOT) for the Korean aspirated series, with EO scores in place of Generation, which fits the data better than the Generation models. In this model, however, no effects are significant except the 23 msec longer VOT of males than females.
To summarize, for Korean, there is a main effect of Generation in both the lenis and aspirated series, with movement toward more English-like VOT. The fact that this difference is not significant for Gen1 in the lenis series may be because of counteraction by the ongoing change in progress in homeland speech, where older speakers have longer VOT, especially for labial and velar stops. As our Gen1 speakers are older than our Gen2 speakers, this may offset any effect of contact with English. Males consistently produce longer VOT than females, in both homeland and heritage datasets, so we cannot incorporate that effect into any contact-related account but may interpret it as marking identity. Significant differences according to place of articulation do not lend themselves to explanation by contact. Effects of Vowel height were not tested owing to a lack of some vowels in some generations.
5.2.4 Russian
We now turn to the languages in which previous examination (cf. Nagy & Kochetov, Reference Nagy, Kochetov, Siemund, Gogolin, Schulz and Davydova2013) of the main effect of Generation already suggested an effect of contact with English. Like Italian, Homeland Russian has short-lag voiceless stops with mean VOT less than 30 msec. Ringen & Kulikov (Reference Ringen and Kulikov2012) report VOT averages of 18, 20, and 38 msec, according to place of articulation. In this sample of 729 Heritage Russian tokens, we find a slightly longer mean raw VOT of 35 msec, and 32, 30, and 42 msec, according to place of articulation. We have not tested (VOT) in homeland data, as we have not collected any.
In the mixed-effects model that considers Generation both as a main effect and in interaction with place of articulation, Generation has no significant effect. This differs from previous results based on ANOVA comparisons. As we saw for Cantonese and Italian, /k/ is significantly longer than /t/, but /p/ does not differ significantly. There is one significant interaction: The difference between /p/ and /t/ in Gen3 speakers is significantly bigger than in the other generations, a movement away from the English-like /p/ < /t/ pattern. No other effects are significant in this model, which tests interactions of Sex and Generation with Consonant in the heritage sample.
Similarly, in the model with EO scores in place of Generation, the only main effect is /k/ > /t/ by 14 msec. The one significant interaction is for this same consonant contrast, where the difference is smaller for males.
In the overall distribution, there is less resemblance to the English /p/ < /t/ < /k/ effect among females than males and in each successive generation. This is illustrated in Figure 5.6.

Figure 5.6 Interactions between social factors and Consonant in (VOT) of Heritage Russian (n = 729)
5.2.5 Ukrainian
As we were not able to find published studies of (VOT) for Homeland Ukrainian speakers, we analyzed our Homeland Ukrainian speakers and found that, as anticipated by Nagy & Kochetov (Reference Nagy, Kochetov, Siemund, Gogolin, Schulz and Davydova2013), Ukrainian (VOT) is similar to Russian and Italian. Research assistant Christopher Zhu marked and measured seventy-five word-initial tokens of words starting with /p, t, k/ followed by /a/ or /o/, from each of twelve homeland speakers. In the combined homeland and heritage sample of 1,802 tokens, we find an average raw VOT of 32 msec. The model considers interactions of Generation and Sex with Consonant (but, again, does not consider Vowel height because only tokens preceding /a/ and /o/ were measured); there is a main effect of Generation, with Gen2 and Gen3 producing variants that are longer than those produced by Homeland Ukrainian speakers (by 8 and 12 msec, respectively), but Gen1 is non-significantly shorter than Homeland. There is no main effect of Sex. There is a main effect of Consonant, with /k/ significantly longer than /t/, by 8 msec, but /p/ only non-significantly shorter (and by only 1 msec). Several interactions emerge as significant. These are best illustrated by the bar chart in Figure 5.7. Here, females are shown on the left and males on the right. We can see that the females approach English-like values more quickly than the males, while the two sexes have similar rates in the homeland. However, at the same time, females diverge from the English-like /p/ < /t/ < /k/ more quickly. We see the English-like pattern in both sexes in the homeland data, but not in any heritage group. Rather, each group shows a lower value for /t/ than /p/ or /k/. Although only the /t/ versus /k/ difference is significant, we return to the possibility raised in Nagy & Kochetov (Reference Nagy, Kochetov, Siemund, Gogolin, Schulz and Davydova2013, p. 27) that it is easier for speakers to maintain a distinction in the coronal class than in other consonant categories because coronals are realized as dentals in the Slavic languages but as alveolars in English. This is in keeping with the predictions of Flege’s (Reference Flege1987) Equivalence Classification Principle (revisited in Flege & Bohn, Reference Flege, Bohn and Wayland2021).

Figure 5.7 (VOT) effects in Homeland and Heritage Ukrainian (n = 1,802)
The model for Homeland Ukrainian indicates that only the /k/ > /t/ difference is significant and reveals no effect for Sex or Age, indicating homeland stability for (VOT) in Ukrainian.
The EO model again fits the data better than the model with Generation. However, the only significant effect is the interaction between EO_culture and /k/, suggesting that speakers with stronger affiliation with Ukrainian culture produce phonetically distinctive VOT compared with those speakers with weaker orientation. Specifically, their VOT for /k/ diverges more from that of (/p/ and) /t/. This identity-marking effect is clear in Figure 5.8, which divides Heritage Ukrainian speakers into those with low and those with high EO_culture scores. The higher EO_culture score group has a less English-like pattern than the lower EO_culture score group. This difference might be best explained by the lower value of /t/ for the strongly heritage-oriented speakers, in keeping with the earlier discussion regarding the lack of equivalence for some speakers between Ukrainian and English coronals.

Figure 5.8 Effect of EO_culture on Heritage Ukrainian (VOT) (n = 810)
For Ukrainian, then, we see a drift toward English-like VOT in later generations, and more so for females, but this is counteracted by a movement away from the place of articulation effect found in English, meaning that we, again, do not see wholesale adoption of English-like patterns that might argue for contact effects only. Additionally, we see the effect illustrated in Figure 5.8, suggesting an identity-marking pattern.
5.2.6 Tagalog
Recent experimental work has shown that (a different group of) bilingual heritage speakers in Toronto maintained differences between English and Tagalog VOT in the voiceless stop series, although the voiced series showed evidence of some influence of English (Kang et al., Reference Kang, George and Soo2016). Here, we consider a sample from the larger dataset extracted and measured by Pocholo Umbal as part of his 2023 dissertation research. I am grateful for his generosity with the data, and acknowledge that although we have discussed (VOT) patterns in this dataset together, we have made independent decisions on how to model the data.
The overall mean raw VOT is 19 msec for this sample of 606 tokens.
The model of all speakers, considering interactions between Generation and Sex with the two linguistic factors, shows a main effect of Generation, in which Gen2 has 15 msec longer VOT (at reference levels for the other predictors) than Homeland Tagalog speakers; and of Sex, where males have 7 msec longer VOT than females. There is no main effect of Vowel height, but there is an interaction: Gen2 speakers generate a bigger VOT contrast depending on Vowel height than Homeland speakers. Again, this is movement away from the English non-effect of vowel height.
Additionally, there is a main effect in which /k/ is significantly longer than /t/, but /p/ does not differ significantly, as we have seen in most of the languages considered here. Figure 5.9 shows divergence from the “initial state” or homeland pattern in which the English-like place of articulation effect is present. By Gen2, /p/ and /t/ are nearly identical and /k/ is much longer. That is, /k/ has lengthened across generations more than the other stops, amplifying a homeland effect.

Figure 5.9 Interaction between Generation and Consonant in Tagalog (VOT) (n = 606)
In the homeland-only model, there is a significant effect of /k/ > /t/ and a non-significant difference of /p/ < /t/, resembling English. No other effects are significant.
In the EO model, we see only an interaction between EO_culture and /k/ – again, it is the speakers with the stronger orientation to their culture who are carrying the /k/-lengthening effect. As the EO model fits the data better than the Generation model, this is, once again, best interpreted as identity-marking, and not due to English interference. Additionally, it amplifies a pattern already present in Homeland Tagalog.
5.2.7 (VOT) Summary
For all languages, Gen1 speakers maintain the VOT values of the homeland variety. The only languages for which we reported a possible homeland versus Gen1 difference (main effect) are those for which we rely on published read-speech data for homeland values. Therefore, I interpret the two apparent differences between Homeland and Gen1 (in Italian and Russian) as quite possibly methodologically driven. However, in three languages (Korean, in both series; Tagalog; and Ukrainian), Gen2 VOT is significantly longer than homeland VOT, according to interactions revealed by mixed effects models. It is of note, however, that in all languages examined here, even the Gen3 speakers maintain a shorter VOT than that found in Toronto English.
Table 5.5 summarizes what we have learned about (VOT) for each language. Based on details provided in Sections 5.2.1–5.2.6, this table suggests the most likely source of variability in HL (VOT) patterns. Factors are interpreted in terms of movement toward or away from the innovative (English-like) values or patterns. The rows labelled “rate” interpret inter-group differences in VOT. The rows labelled “C” report interactions between the Consonant (place of articulation) factor and the social factors (Generation, Sex, EO_language, EO_culture). The rows labelled “V” report interactions between the Vowel height factor and the social factors (Generation, Sex, EO_language, EO_culture). One additional row, “T,” reports on Lexical tone effects for Cantonese, in interaction with the same social factors.
Table 5.5 Summary of (VOT) effects (“S” indicates stability, “ToE” and “FromE” mark convergence/divergence with English, “I” marks differences attributed to identity-marking or internal change). The “C” row reports effects related to the consonant’s place of articulation, the “V” row reports effects related to vowel height, and the “T” row reports tone effects (for Cantonese only)
| Language | Effect | HOM Age | HOM Sex | HOM v G1 | HOM v G2 | G1 v G2 | HER EO | HER Sex |
|---|---|---|---|---|---|---|---|---|
| CAN | rate | S | S | S | S | S | I; FromE | S |
| | C | S | S | S | S | S | FromE | S |
| | V | | | | | | | |
| | T | S | S | I | I | I | I | S |
| ITA | rate | ToE | S | S | | | | |
| | C | S | S | S | | | | |
| | V | FromE | S | S | | | | |
| KOR | rate | I (+) | S | S | ToE | S | S | S |
| | C | I (+) | S | FromE | FromE | S | I | S |
| | V | | | | | | | |
| KOR_asp | rate | I (+) | S | S | ToE | S | S | I (M>F)¹ |
| | C | I (+) | I (M>F) | I | I | S | S | I |
| | V | | | | | | | |
| RUS | rate | S | S | S | | | | |
| | C | FromE | only in EO model, k~t effect smaller for males | | | | | |
| | V | | | | | | | |
| TAG | rate | S | S | ToE | ToE | S | I (M>F) | |
| | C | S | S | FromE | I | I | I (M>F) | |
| | V | S | S | FromE | FromE | S | S | S |
| UKR | rate | S | S | S | ToE | S | S | S |
| | C | S | S | I | FromE | FromE | FromE | I (M>F) |
| | V | | | | | | | |
¹ This Sex effect is found in the EO model but not the model testing Generation.
The first two columns (after the row labels) summarize the homeland variety. The next three columns report significant intergenerational VOT differences and interactions between Generation and the linguistic predictors. These significant effects indicate differences in the grammar among the generations. The EO column shows what models including EO_language and EO_culture, rather than Generation, suggest about sources of change. The final column notes where either a main effect of Sex or an interaction between Sex and a linguistic factor is observed in the heritage variety.
Cells filled with “S” indicate stability according to the linguistic factor in that row and the cross-group comparison in that column. Patterns of inter-group differences that indicate assimilation toward English are marked “ToE,” while those that indicate divergence away from English are marked “FromE.” In some cases, neither interpretation makes sense, and “I” suggests that the change must be either internal or due to identity-marking.
For the age effects, “+” indicates that older speakers have longer VOT. It is of note that this suggests retrograde motion away from the effect expected due to contact with English, since English has longer VOT than the heritage variety.
Shading indicates that we do not have the data to test a particular effect (e.g., no homeland data was collected or no high vowels were coded in a particular language).
It is of note that, while VOT values themselves suggest either stability or movement from homeland toward English-like values, the constraint hierarchies often tell a different story. As illustrated in Table 5.5, the effect of the consonant’s place of articulation remains stable across languages in many cases. However, whenever the ranking of the levels of place of articulation (“C”) interacts with Generation, we find movement away from, and not toward, the English-like /p/ < /t/ < /k/ pattern. This may be of two types: Either the effect size is shrinking, or the ranking of levels diverges from the English-like pattern, in successive generations.
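The ranking check described above can be sketched in a few lines of Python. This is only an illustration: the function name and the mean VOT values are hypothetical and are not drawn from the HLVC data.

```python
def is_english_like(mean_vot):
    """Return True if mean VOT rises strictly from /p/ to /t/ to /k/."""
    return mean_vot["p"] < mean_vot["t"] < mean_vot["k"]


# Hypothetical mean VOT values in msec, for illustration only.
rising = {"p": 18, "t": 20, "k": 30}      # follows /p/ < /t/ < /k/
u_shaped = {"p": 25, "t": 21, "k": 33}    # /t/ shorter than both /p/ and /k/

print(is_english_like(rising))
print(is_english_like(u_shaped))
```

A group whose /t/ mean falls below its /p/ mean, as in the U-shaped case, fails the check even when /k/ remains the longest stop.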
Similarly, when the Vowel height effect is not stable across generations, its effect always diverges from the non-effect of this predictor in English, increasing across generations. This is the case for Italian and Tagalog; elsewhere the vowel height effect is stable.
In the rate row, we can compare the HOM Sex and HER Sex columns to see whether social factors are successfully transmitted when an HL is established. In the only two cases of a main effect of Sex on (VOT) (Korean aspirated series, Tagalog), it is maintained in the heritage variety. Where there is no Sex effect in homeland speech, there is also no main effect for Sex in the heritage variety.
Finally, we have established that the effect of EO scores on (VOT) produces models that in every case match the raw data better than models that consider Generation. Where it is the EO_culture score that plays a role, it seems that orientation toward the heritage culture may have been masquerading as a generational effect in many previous analyses of (VOT) that compare generations of speakers directly. As it is the case that Gen1 speakers tend to have stronger orientation toward the culture of the country in which they were born (the homeland) than Gen2 speakers, it is easy to confound these effects, but important to tease them apart and attribute patterns of variation to the factor that best models them, giving serious consideration to intra-generational distinctions among speakers.
In the first two columns, we see evidence of (VOT) as stable variation in most homeland varieties, but an Age effect for both Korean series (longer VOT for older speakers), accompanied by a Sex difference in the Korean aspirated series, and a Sex difference in Tagalog. In both Korean aspirated and Tagalog, males have longer VOT than females.
5.3 Variation between Null and Overt Subject Pronouns (PRODROP)
The second variable that is examined across a broad set of HLVC languages is (PRODROP), the alternation between overt and null pronoun surface forms. The label prodrop is not meant to imply that a pronoun was originally present and then deleted during the derivation. We use it simply as shorthand for the variation between overt and null pronoun surface forms, not to motivate either a derivational process or a particular theoretical construct. For this variable, the envelope of variation includes each clause that contains a finite verb but no noun subject. In addition to examining the rates of null versus overt subjects, we analyze the effects of a constellation of factors that have been shown to influence variation in the presence versus absence of subject pronouns.
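As a minimal sketch of how the envelope of variation just described translates into a rate, the following Python fragment keeps only clauses with a finite verb and no noun subject and then computes the proportion of null subjects. The `Clause` record, its field names, and the sample tokens are hypothetical and do not reflect the HLVC coding scheme.

```python
from dataclasses import dataclass


@dataclass
class Clause:
    finite_verb: bool  # does the clause contain a finite verb?
    subject: str       # "noun", "overt_pronoun", or "null"


def in_envelope(clause):
    """A clause is a token if it has a finite verb and no noun subject."""
    return clause.finite_verb and clause.subject != "noun"


def null_rate(clauses):
    """Proportion of null subjects among tokens in the envelope of variation."""
    tokens = [c for c in clauses if in_envelope(c)]
    if not tokens:
        return None
    return sum(c.subject == "null" for c in tokens) / len(tokens)


# Hypothetical clauses, for illustration only.
sample = [
    Clause(True, "null"),
    Clause(True, "overt_pronoun"),
    Clause(True, "noun"),   # excluded: noun subject
    Clause(False, "null"),  # excluded: no finite verb
    Clause(True, "null"),
]
print(null_rate(sample))  # two of the three retained tokens are null
```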
Torres Cacoullos and Travis (Reference Torres Cacoullos and Travis2015, p. 365) remind us of the often-observed difference between contact effects at the phonic and morphosyntactic levels: that phonetics and phonology are more labile than morphosyntax. As with this variable we seemingly switch from the domain of phonetics to morphosyntax, we would expect even less effect on HLs owing to contact with English than we observed in (VOT) (Section 5.2). However, Torres Cacoullos and Travis (Reference Torres Cacoullos, Travis, Rivera-Mills and Villa2010) and Weir (Reference Weir2008, Reference Weir2012) question where to situate the phenomenon of prodrop in English. Because we are interested in the possibility of contact effects, we turn first to consider how (PRODROP) operates in English.
5.3.1 (PRODROP) in English
English, a non-null subject language, permits null subjects only in certain discourse contexts, often described as “diary drop” (cf. Haegeman & Ihsane, Reference Haegeman and Ihsane2001). Early descriptions of prodrop in English (cf. Napoli, Reference Napoli1982, p. 99; Roberts & Holmberg, Reference Roberts, Holmberg, Biberauer, Holmberg, Roberts and Sheehan2010, p. 5) suggest a syntactic explanation: Subject pronouns may be omitted if they are in the initial position of the sentence. Modifying this explanation slightly, Harvie (Reference Harvie1998) shows a significant effect in which tokens in clause-initial position have fewer overt subject pronouns than those found elsewhere. However, Torres Cacoullos and Travis (Reference Torres Cacoullos and Travis2015) note that “sentence” is not clearly defined in these reports, making the claim difficult to test. Torres Cacoullos and Travis (Reference Torres Cacoullos, Travis, Rivera-Mills and Villa2010, p. 378) revise the description of the licit context for prodrop to suggest that, for English, subject pronouns may be null only if they occupy the left edge of the intonational phrase. Weir (Reference Weir2008, Reference Weir2012) shows that this is not, strictly speaking, a constraint restricted to prodrop, as elements besides subject pronouns may also be null in this left-edge position.
Given these restrictions on the distribution of (PRODROP) in English, if English influence accounts for HL variation, we would expect successive generations of heritage speakers to change in two ways. First, we would predict a shrinking of the contexts in which prodrop is permitted and a concomitant drastically lower rate of null subjects. Second, we would predict that the constraint hierarchies of the HLs will increasingly resemble the hierarchy for English.
Several other restrictions have been reported regarding where null subjects may surface in English. For example, null subjects are reported to be strictly ungrammatical in subordinate clauses in English (Weir, Reference Weir2008, p. 13), as well as in several other contexts that do not concern us here as they have not been examined in the HLs. In contrast, they are licit in the second conjunct of coordinated clauses (Torres Cacoullos & Travis, Reference Torres Cacoullos and Travis2014, p. 22 and citations therein). In English, we must consider the possibility that a conjoined clause with a null subject may be the surface representation of either conjoined sentences or conjoined verb phrases (VPs) (that share a single subject pronoun). However, when the two clauses refer to different events (Givón, Reference Givón and Givón1983) or their subjects have different referents, there is no VP-conjunction alternative.
A range of other constraints have been suggested for English (PRODROP). In a variationist analysis of 400 clauses extracted from the Toronto English archive (Tagliamonte & Denis, Reference Tagliamonte and Denis2010), Nagy et al. (Reference Nagy, Aghdasi, Denis and Motut2011) show a significant effect for a factor consisting of the interaction between clause type (simple or conjoined) and switch reference (same or different). This effect is illustrated in Figure 5.10, which shows that when a clause’s subject has the same referent as the subject of the previous clause, it is less likely to be overt. This aspect of the effect has been found cross-linguistically, perhaps universally. Figure 5.10 also shows that an overt pronoun is less likely in a conjoined clause (in the second conjunct) than in a simple clause (or first conjunct), and that these two factors interact. The lowest probability (0.14) of an overt subject occurs when it has the same referent as the subject of the previous clause and when it is also conjoined to the previous clause. If it has the same referent, but it is not conjoined, the probability rises to 0.47. If the referent of the subject changes from that in the previous clause and the clause is conjoined to the previous clause, then it rises further (0.66). And if the subject neither shares a referent with the previous subject nor is conjoined to the previous clause, the probability is highest: 0.79.

Figure 5.10 Overt pronoun (estimated) rates in English
However, because the direction of effect remains consistent (more overt in switch than same referent, no matter the clause type; and more overt in simple than conjoined, no matter the reference type), we can seek similar effects in heritage varieties without models that include interaction effects of this type. A strengthening of the effect of Clause type or Switch reference, from one generation to the next, could be deemed to be owing to contact with English. The causality would be stronger if the effect size changes by Clause type, because switch reference seems to have a universal effect. Torres Cacoullos and Travis (Reference Torres Cacoullos and Travis2014) confirm the effects of clause type, in a different variety of English, but show that the apparent switch reference effect is an epiphenomenon of other factors (which are not yet coded in the HLVC project). The switch reference effect was also found to be significant in Harvie’s (Reference Harvie1998) analysis of Ottawa English.
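The consistency of direction can be checked directly from the four estimated probabilities of an overt subject reported above for Figure 5.10. The sketch below is illustrative only; the dictionary layout and variable names are assumptions, and “simple” stands for a clause that is not conjoined to the previous one.

```python
# Estimated probability of an overt subject pronoun (values reported for Figure 5.10).
p_overt = {
    ("same", "conjoined"): 0.14,
    ("same", "simple"): 0.47,
    ("different", "conjoined"): 0.66,
    ("different", "simple"): 0.79,
}

# Effect of switch reference (different minus same) within each clause type.
reference_effect = {
    clause: p_overt[("different", clause)] - p_overt[("same", clause)]
    for clause in ("conjoined", "simple")
}

# Effect of clause type (simple minus conjoined) within each reference condition.
clause_effect = {
    ref: p_overt[(ref, "simple")] - p_overt[(ref, "conjoined")]
    for ref in ("same", "different")
}

# The direction is consistent if every difference has the same (positive) sign.
consistent_direction = all(d > 0 for d in reference_effect.values()) and all(
    d > 0 for d in clause_effect.values()
)
print(consistent_direction)
```

Both differences are positive in every cell, but their sizes vary across cells, which is what the interaction amounts to: the switch reference difference is larger in conjoined clauses than in simple ones.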
Nagy et al. (Reference Nagy, Aghdasi, Denis and Motut2011) tested the effects of Tense and Grammatical person and Number as well. These were not found to have a significant effect. Torres Cacoullos and Travis (Reference Torres Cacoullos and Travis2014) also tested for the effect of Tense on English (PRODROP) and did not find it. Thus, weakening effects, from one generation to the next, of Tense, Grammatical person and Number could be interpreted as due to contact with English.
With this knowledge of how (PRODROP) operates in Toronto English speech, regarding both the low rate of occurrence of null subjects (2 percent) and its strong conditioning by the degree of connection between one clause and the preceding one, we turn to examining (PRODROP) in the HLs, well-prepared to detect influence from English. As a reminder, we expect any change from generation to generation, away from the homeland variety’s pattern and toward English, to go hand in hand with differences in measures of EO. If the variation is related to identity marking, then people who identify more as Canadian (lower EO_culture scores) might exhibit more innovation that increases similarity to English than people who strongly identify with their heritage culture (higher EO_culture scores). On the other hand, if the variation is related to frequency of use of the HL versus English, then we expect effects of the EO_language scores instead. This dichotomy distinguishes the effects of cultures in contact from those of languages in contact. It is critical to keep in mind that languages may also undergo internal change and that such variability should lack correlation with EO. A third possibility is that the language is not changing (regarding prodrop). Sometimes languages’ features survive intact under contact with languages with which they differ in important ways.
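The possibilities laid out above can be summarized as a small decision procedure. The following Python sketch is a deliberate simplification for illustration: the function name, its boolean arguments, and the returned labels are hypothetical, and actual interpretation rests on the full models rather than on three yes/no answers.

```python
def interpret_pattern(differs_from_homeland, eo_culture_effect, eo_language_effect):
    """Map simplified model outcomes to the interpretations discussed in the text."""
    if not differs_from_homeland:
        # No inter-group difference: the feature is stable under contact.
        return ["stability"]
    readings = []
    if eo_culture_effect:
        readings.append("identity marking (cultures in contact)")
    if eo_language_effect:
        readings.append("frequency of use (languages in contact)")
    if not readings:
        # A difference with no EO correlation points to internal change.
        readings.append("internal change")
    return readings


print(interpret_pattern(True, True, False))
print(interpret_pattern(True, False, False))
print(interpret_pattern(False, False, False))
```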
We want to understand why we sometimes observe stability and why there are contact effects at other times, as well as when and why languages change independent of contact. Consider a language such as Latin, which has no subject pronouns. French developed from Latin and has a complete set of obligatory subject pronouns. However, there are also languages such as Italian and Spanish, with variable presence of subject pronouns, although they also evolved from Latin. During the transition in which these languages changed from having no subject pronouns to variable or obligatory full sets of subject pronouns, these languages must have shown variable presence of subject pronouns. We can hypothesize that this path would be replicated if the presence of subject pronouns in HLs increases under the influence of English. This can be tested by comparing speakers who have less contact with English with people who have more contact with English – for example, earlier and later generations of heritage speakers. It is in this light that we examine (PRODROP) variability in the HLs.
5.3.2 (PRODROP) in Other Languages
Before we can investigate the effects of contact with English on the HLs, it is important to consider what constraints operate on (PRODROP) more broadly. A wealth of literature is available suggesting important factors. Both experimental psycholinguists and variationist linguists have frequently examined subject pronoun variation in HL speakers, primarily for Spanish, since at least the 1980s (cf. White, Reference White1985, Reference White, Moravcsik, Wirth and Eckman1986; Paredes Silva, Reference Paredes Silva1993). Experimental paradigms have reported that HL speakers treat null subjects differently from both monolingual first-language speakers and second-language learners. Sociolinguists have reported systematic patterns of variation in spontaneous speech among HL speakers, as well as monolingual speakers and language learners. The bulk of studies investigated Spanish (cf. Bayley & Pease-Alvarez, Reference Bayley and Pease-Alvarez1997; Erker & Otheguy, Reference Erker and Otheguy2016; Flores-Ferrán, Reference Flores-Ferrán2004; Guy, Reference Guy, Amaral and Carvalho2014; Harvie, Reference Harvie1998; Otheguy et al., Reference Otheguy, Zentella and Livert2007; Raña Risso, Reference Raña Risso2010; Schmitz, Di Venanzio, & Scherger, Reference Schmitz, DiVenanzio and Scherger2016; Silva-Corvalán, Reference Silva-Corvalán1994; Torres Cacoullos & Travis, Reference Torres Cacoullos, Travis, Rivera-Mills and Villa2010, Reference Torres Cacoullos and Travis2011). It is of note that the sociolinguistic studies that apply multivariate variationist methods to conversational speech data do not find evidence of contact effects, in contrast with the experimental studies. Because so many factors contribute to (PRODROP) variation across languages, this variable creates a rich opportunity to investigate which kinds of factors hold cross-linguistically.
The “left-edge” effect that is so important for constraining the envelope of variation for English (PRODROP) has not been shown for other languages. Rather, constraints that are more obviously morphosyntactic have been shown to play a role: phi-features and information status of the subject, tense, aspect, and mood features of the verb, clause type, and clause structure. Kupisch (Reference Kupisch2020, p. 29) notes that a characteristic of HL speakers, more than of monolinguals, is preserving contrasts to avoid ambiguity. While this has been noted particularly for phonetic features, Iannozzi (in prep.) looks at this feature for (PRODROP) in several varieties and finds less evidence of inter-group difference. In the following analysis, we consider the effects of the features noted in this paragraph that have been shown to correlate with the selection of overt versus null pronouns.
5.3.3 (PRODROP) Methods
We have examined (PRODROP) in spontaneous speech in Cantonese, Faetar, Korean, Italian, Polish, Russian, and Ukrainian. As explained in Section 4.3.3.1, these languages are all characterized as prodrop languages. Detailed results have previously been published: Chociej (Reference Chociej2011) for Polish; Heap and Nagy (Reference Heap and Paradis1998), Nagy & Heap (Reference Nagy, Heap, Gruber, Higgins, Olson and Wysocki1998), and Nagy et al. (Reference Nagy, Iannozzi and Heap2018) for Faetar; Nagy et al. (Reference Nagy, Aghdasi, Denis and Motut2011) and Nagy (Reference Nagy2015) for Cantonese, Italian, and Russian; and Pustovalova (Reference Pustovalova2011) for Homeland Russian. Full details will not be provided here for those languages, except Cantonese, for which much additional data has now been coded. Models for all languages, including Korean and Ukrainian, for which the HLVC project has not yet published (PRODROP) findings, are available in the supplementary material (http://ngn.artsci.utoronto.ca/pdf/HLVC/HLVC_PRODROP_analysis_31October2023.html). For each language, we examine the production of speakers from two heritage generations and from homeland speakers. We examine how the HL patterns differ from, and resemble, patterns for (PRODROP) in English. As with (VOT), this is done via a series of comparisons of mixed effects models. Before discussing the models, we describe the constraints tested in them, and how they are coded.
5.3.3.1 Constraints on (PRODROP)
Although previous studies tested a wider range of factors, for this comparative work, we restrict our investigation to seven linguistic constraints that have been coded for (nearly) all the languages. To the greatest extent possible, given the different coding decisions made in the initial analyses by different student research teams, we code them using the same levels in each language.Footnote 2 These factors, with their levels, are listed in Table 5.6. The ones that are most important for comparison to English are listed first, with examples from the corpus. No examples are provided for standard terms such as phi-features and tense (which may or may not be overtly marked), but the levels coded are listed. Examples are from Italian, except where otherwise noted to clarify coding decisions that differ slightly between languages and to complete the paradigms.
Table 5.6 Dependent and independent linguistic variables coded for (PRODROP) models, with examples from the HLVC corpus
| Dependent Variable: Subject form (null or overt) | |
| overtFootnote 1 | |
| magari tu mi stai simpatica e ti frequento (IXF24C 14:19) | |
| but you were nice to me, and I hang out with you | |
| null | |
| magari tu mi stai simpatica e Ø ti frequento (IXF24C 14:19) | |
| but you were nice to me, and I hang out with you | |
| Switch reference | |
| same referent The clause’s subject has the same referent as the previous clause’s subject (which may have been produced by a different interlocutor) | |
| Ø ho vissuto con loro per tanti anni. Poi Ø mi sono sposato (I1M62A, 2:54) | |
| Ø-I lived with them for many years. Then Ø-I got married. | |
| different referent The clause’s subject has a different referent from the previous clause’s subject | |
| e mi ho comprato questa casa e Ø siamo rimasti qui a Toronto (I1M62A, 2:57) | |
| and I bought this house, and Ø-we stayed here in Toronto | |
| Clause type | |
| conjoined clause The clause is conjoined to the previous clause by a conjunction such as and, or, but. | |
| e cos’hanno fatto ma loro l’hanno fatto sicuramente (IXM28, 37:53) | |
| and they did it like that, but they did it securely | |
| simple clause The clause is not conjoined to the previous clause. | |
| Ø sono venuto solo per opportunità per un migliore futuro per la sua famiglia. (I2F44A_IV, 2:21) | |
| They came only for the opportunity for a better future for their family. | |
| subordinate clause The clause is subordinate to a main clause (coded only in Ukrainian) | |
| бо ти є наказанй бо ти в куті (U1F57A, 17:37) | |
| because you are punished because you are in the corner | |
| Pre-verbal element | |
| no The verb is the first element in the clause. | |
| e cos’hanno fatto ma loro l’hanno fatto sicuramente (IXM28, 37:53) | |
| and they did it like that, but they did it securely | |
| yes A direct or indirect object, reflexive pronoun, negative particle, adverb, etc., is present before the verb. | |
| magari tu mi stai simpatica e ti frequento (IXF24C 14:19) | |
| but you were nice to me, and I hang out with you | |
| Grammatical person of the subject (Person) | first |
| second | |
| third | |
| Grammatical number of the subject (Number) | singular |
| plural | |
| Grammatical gender of the subject (Gender) | feminine |
| masculine | |
| neuter | |
| Tense of the verb (Tense) | present |
| non-present | |
| Or, in some languages | past |
| non-past | |
1 Faetar has two series of subject markers, a strong pronoun and a weak pronoun or clitic. In a departure from previous analyses, only tokens with strong pronouns were coded as overt, while tokens with the clitic subject marker only were coded as null.
5.3.3.2 Coding (PRODROP) in ELAN
Tokens are selected from the sociolinguistic interviews (and, if needed to enlarge the sample, from the First Words task), starting about fifteen minutes into the recording. For some languages, all finite clauses were sampled exhaustively, while, in others, the first fifty to a hundred main finite clauses with subjects consisting of overt pronouns or null forms were selected (for Ukrainian, to date we could only code twenty-five tokens per speaker). We excluded verbs that occurred in subordinate clauses (except in Ukrainian), had nouns as subjects, or were part of discourse markers.
Example 7 provides an example of (PRODROP) variation from Cantonese, a language categorized as a radical prodrop language because of the lack of verb morphology indicating phi-features (i.e., person, number, gender). Thus, null subjects might be expected to be rare in discourse. However, Example 7 illustrates the variability in Cantonese. It shows a sentence that was produced with no overt subject but could have included the subject pronoun 我 ngo5 “I.”
(7) Cantonese prodrop sample sentence
| 因為 | Ø / 我 | 冇 | 家人 | 喺 | 度 |
| jan1 wai6 | Ø / ngo5 | mou5 | gaa1 jan4 | hai2 | dou6 |
| because | 1SG. | do not have | relative | at | here |
| Because I do not have any relatives here [C1F50A] | |||||
The factors listed in Table 5.6 are coded for each token within the envelope of variation, annotating the ELAN transcription file to show the level of each variable represented by each token. Figure 4.2 illustrates the process for coding (PRODROP) in Faetar, with each variable coded on a separate tier below the transcription (“speaker”) and translation tiers. The first annotation, [si kwatrá si fata la m bitʃiklɛt] “This boy does it on a bike,” shows a clause that has an overt third person subject pronoun [si]. As a first pass, this was coded as “refl.” (for reflexive) on the dependent variable tier. Later, that was collapsed to “null,” in opposition to nominative pronouns. The presence of the reflexive pronoun was coded as a level of the pre-verbal content predictor. The middle annotation, [e la diŋge diŋgje la koriirə] “and there inside, inside the bus,” is not coded because it lacks a finite verb. The third annotation, [i vɛn apre a la koriər] “he goes after the bus,” is a clause that has a third person subject [i], a subject clitic or weak pronoun. This subject is initially coded as weak, later collapsed to null. In the next annotation tier, “grm. person,” we code for grammatical person, selecting third singular for both examples. The last visible tier indicates whether the referent is the same as the previous clause or a new (different) referent. The first subject’s referent is new (determined by reference to the preceding context, not shown), but the second subject has the same referent as the first. Tiers for other factors are not visible in this illustration.
After coding all tokens for all variables, the codes, examples, and timestamps are extracted from ELAN, as described in Section 4.2.7, and their distributions are calculated and hypotheses tested, as described in Section 4.2.9.
To address the effect of subject continuity, we code every token for whether its referent was the same as the referent of the subject of the previous clause. Cross-linguistically, if the subject of the clause being considered (bolded in Examples 8–10) refers to the same real-world entity as the subject of the previous clause (italicized in the examples below), as in Example 8, it is more likely to be marked with a null pronoun than if it has a new referent. Example 9 illustrates that this generalization is not categorical: The second clause has a first-person referent (from context not shown here) but a null subject, even though it necessarily has a different referent than the first clause, whose overt subject is third person. The different referent example in Table 5.6 shows that the same is true for Italian.
(8) “It has the old red and gold Woolworth’s sign right on the corner, [IV: yeah] Ø had those little creaky wood, hardwood floors.” (EXM37A=fdon)
(9) “It’s about two hundred miles from here and Ø spent a lot of time there.” (AXM68A=#075A)
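The switch-reference coding described above can be sketched as follows. This is a minimal illustration, not the ELAN workflow itself: the function name `code_switch_reference` and the referent identifiers are hypothetical, and each clause's subject referent is assumed to have already been identified by the coder.

```python
from typing import List, Optional


def code_switch_reference(subject_referents: List[str]) -> List[Optional[str]]:
    """Label each clause's subject as 'same' or 'different' relative to the
    subject of the immediately preceding clause.

    The first clause has no preceding clause, so it is labelled None.
    """
    labels: List[Optional[str]] = []
    previous: Optional[str] = None
    for referent in subject_referents:
        if previous is None:
            labels.append(None)
        elif referent == previous:
            labels.append("same")
        else:
            labels.append("different")
        previous = referent
    return labels


# Hypothetical referent identifiers for three consecutive clauses.
clauses = ["speaker", "speaker", "speaker_and_family"]
labels = code_switch_reference(clauses)
```

The label records only the discourse relation between consecutive subjects. Whether the subject is realized as null or overt is coded separately, which is what allows the two to be compared.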
Another factor that often has a strong effect, and is also important in English, is the type of clause containing the subject: main, coordinated (to the previous clause), or subordinate. Although a coordinated clause with a null subject might be analyzed as an instance of VP-coordination, rather than sentence-coordination, we account for all such clauses. This is necessary because of the existence of sentences such as Example 9, which must be sentence- rather than VP-coordination, as the subjects of the two clauses have different referents. In addition, speakers produce overt pronouns that do have the same referent as the previous clause’s subject, as in Example 10, which is another type of structure that cannot be interpreted as VP-coordination, showing this to be a variable context.
(10) “He’s in the army and he goes to England three-or-four times a year.” (EXM44A)
5.3.3.3 (PRODROP) Analysis by Model Comparison
The first model produced for each language is a logistic regression that included, as predictors for the binary dependent variable, all the linguistic factors listed in Table 5.6, plus Generation and Sex of the speaker, as main effects, and Speaker as a random effect. Speakers from all generations are included. For the dependent variable, the application value is overt. That is, every model shows the probability of overt pronouns for each context. Models reveal which factors significantly influence the frequency of overt subjects. The second model constructed was the same as the first but with interaction factors between each linguistic factor and Generation, as well as between each linguistic factor and Sex. These interaction factors show whether the effect of any linguistic factor changes with the sex or generation of the speaker. If there is a significant effect of an interaction factor that includes Generation, we must account for the inter-group difference in the grammar. The Akaike Information Criterion (AIC) values of these two models are compared to determine which fits the data better. The better model is presented as the grammar of that language.
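The AIC comparison can be sketched in a few lines. The helper names `aic` and `best_by_aic` are hypothetical and are not part of the HLVC analysis scripts. The two AIC values are those reported for the Cantonese all-speaker models in Section 5.3.5.1, and the formula is the standard definition, AIC = 2k − 2 ln L, where k is the number of estimated parameters.

```python
from typing import Dict


def aic(log_likelihood: float, n_parameters: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_parameters - 2 * log_likelihood


def best_by_aic(aic_by_model: Dict[str, float]) -> str:
    """Return the name of the model with the lowest AIC."""
    return min(aic_by_model, key=aic_by_model.get)


# AIC values reported for the two Cantonese all-speaker models;
# the dictionary keys are illustrative labels only.
cantonese = {"main_effects_only": 4015.0, "with_interactions": 3985.0}
chosen = best_by_aic(cantonese)
```

Because AIC penalizes each additional parameter, the interaction model is preferred only when its improvement in likelihood outweighs the cost of the extra terms.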
We next compare two models for homeland speakers only, designed as described in the previous paragraph but with Age replacing Generation. If there is a significant main effect of Age, we interpret it as an internal change in the homeland variety. A significant effect of an interaction factor including Age suggests an internal change in the grammar. In both cases, we will then want to see if the heritage variety continues the change.
As with the model of all speakers together, a significant effect of an interaction factor including Sex suggests that males and females treat that linguistic factor differently, and there may be indexical motivation. Again, we select the homeland model with the lowest AIC value to present.
The next step is to compare four models for heritage speakers, all including speaker as a random effect:
linguistic main effects plus Generation and Sex as main effects;
linguistic main effects plus Generation and Sex as main effects and interactions for each combination of linguistic and social factors;
linguistic main effects plus EO_language, EO_culture and Sex as main effects;
linguistic main effects plus EO_language, EO_culture and Sex as main effects, and interactions for each combination of linguistic and social factors.
Of these four models, the one with the lowest AIC is selected to represent the grammar of the heritage speakers. This allows us to determine whether EO_language, EO_culture or Generation best predicts (PRODROP) behaviour. It also shows which linguistic factors affect (PRODROP) as main effects, and, finally, which linguistic factors’ effects may be changing from one generation to the next (suggesting a change in progress) or may differ by sex (suggesting an indexical effect).
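The four candidate heritage models can be written out as formula strings, as in the sketch below. It assumes the five linguistic predictors that appear in the Cantonese formula in Example 11 (other languages may include more), and the function names and dictionary keys are hypothetical. Interactions are spelled as explicit `A:B` terms, which is equivalent to the `A * B` notation when the main effects are already in the formula.

```python
from itertools import product
from typing import Dict, List

# Linguistic predictors as named in the Cantonese formula (Example 11).
LINGUISTIC = ["Person", "Number", "Tense", "Switch_discourse", "Clause_type"]


def build_formula(social: List[str], interactions: bool) -> str:
    """Build an lme4-style formula string with Speaker as a random intercept."""
    terms = LINGUISTIC + social
    if interactions:
        # One interaction term for each linguistic-by-social combination.
        terms = terms + [f"{ling}:{soc}" for ling, soc in product(LINGUISTIC, social)]
    return "Prodrop ~ " + " + ".join(terms) + " + (1 | Speaker)"


def heritage_candidates() -> Dict[str, str]:
    """Return the four candidate models: two sets of social predictors,
    each with and without interactions."""
    social_sets = {
        "generation": ["Generation", "Sex"],
        "eo": ["EO_language", "EO_culture", "Sex"],
    }
    return {
        f"{name}_{'interactions' if inter else 'main'}": build_formula(social, inter)
        for name, social in social_sets.items()
        for inter in (False, True)
    }


candidates = heritage_candidates()
```

Each string would then be fitted as a mixed-effects logistic regression and the four fits compared by AIC.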
Finally, in each language, the best-fitting homeland and best-fitting heritage models are compared to see whether there are differences. If there are none, then we suggest that (PRODROP) is a stable part of the language. If there are differences, then they must be interpreted as representing convergence to or divergence from English, or identity-marking (indexicality), or internal change.
We will illustrate our method of comparison in detail for Cantonese, and then provide summaries of effects for all languages in Tables 5.11, 5.12, and 5.13. We will then discuss how each predictor affects each language.
5.3.3.4 (PRODROP) Dataset
The sources of our tokens are shown in Table 5.7. These counts include some tokens that were later excluded for better comparability across contexts. They also exclude fifteen Homeland Faetar speakers who produce categorically null subjects. Counts of analyzed tokens are noted with each result. The Faetar analysis is based on Nagy et al. (Reference Nagy, Iannozzi and Heap2018), Italian on Nagy (Reference Nagy2017), Polish on Chociej (Reference Chociej2011), Homeland Russian on Pustovalova (Reference Pustovalova2011) and Heritage Russian on Nagy et al. (Reference Nagy, Aghdasi, Denis and Motut2011) and Nagy (Reference Nagy2015); Cantonese, Korean and Ukrainian analyses are novel. As noted earlier, an analysis of 400 English tokens was presented in Nagy et al. (Reference Nagy, Aghdasi, Denis and Motut2011). Reanalysis of all HL data was conducted during preparation of this book to improve comparability across languages.
Table 5.7 (PRODROP) data sample: speaker count and token count, by language and generation (n = 9,190)
| Language | Total # tokens | Homeland # speakers | Homeland # tokens | Gen1 # speakers | Gen1 # tokens | Gen2 # speakers | Gen2 # tokens |
|---|---|---|---|---|---|---|---|
| Cantonese | 3,509 | 8 | 708 | 14 | 1,400 | 14 | 1,401 |
| Faetar | 2,384 | 6 | 1,573 | 8 | 578 | 5 | 233 |
| Italian | 1,793 | 16 | 748 | 4 | 375 | 7 | 670 |
| Korean | 991 | 6 | 376 | 4 | 169 | 6 | 446 |
| Polish | 987 | 2 | 209 | 6 | 392 | 7 | 386 |
| Ukrainian | 300 | 4 | 100 | 4 | 100 | 4 | 100 |
5.3.4 Distributional Results for (PRODROP)
We begin by comparing the rates of subject pronoun expression between homeland and each heritage generation. Figure 5.11 shows the percentage rate of overt subjects by language and generation. The totals are slightly higher than the values noted earlier because categorical speakers are included, as well as Russian speakers, in order to provide the most complete comparison. For each language, the first bar, with stripes, is the raw rate of overt pronouns for homeland speakers. The black bar in the middle is Gen1 and the grey bar is Gen2. As is standard in variationist analysis, significance levels for inter-group differences are not reported here as these values do not take into account the uneven distribution across linguistic contexts (and other social factors) in each group.

Figure 5.11 Cross-linguistic comparison of overt subject pronoun rates, by generation (n = 14,802)
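The raw rates plotted in this kind of figure are simple proportions of overt tokens within each language-by-generation cell. The sketch below shows the computation on invented toy tokens (not HLVC data); the function name `overt_rates` and the tuple layout are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Token = Tuple[str, str, bool]  # (language, generation, is_overt)


def overt_rates(tokens: Iterable[Token]) -> Dict[Tuple[str, str], float]:
    """Percentage of overt subjects for each (language, generation) cell."""
    overt: Dict[Tuple[str, str], int] = defaultdict(int)
    total: Dict[Tuple[str, str], int] = defaultdict(int)
    for language, generation, is_overt in tokens:
        key = (language, generation)
        total[key] += 1
        if is_overt:
            overt[key] += 1
    return {key: 100.0 * overt[key] / total[key] for key in total}


# Invented toy tokens, not HLVC data.
toy = [
    ("LangA", "Homeland", True),
    ("LangA", "Homeland", False),
    ("LangA", "Gen1", True),
    ("LangA", "Gen1", True),
    ("LangA", "Gen1", True),
    ("LangA", "Gen1", False),
]
rates = overt_rates(toy)
```

Such raw rates pool all linguistic contexts together, which is why they are treated only as a first indication and are followed by models that control for the distribution of contexts.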
For most languages, we see stability across generations: There is no sharp increase toward the high rate of overt subjects for English (98 percent, reported in Nagy, Reference Nagy2015 and Nagy et al., Reference Nagy, Aghdasi, Denis and Motut2011) in later generations of heritage speakers. This is the first indication that contact with English is not causing a change in Toronto HLs, with respect to (PRODROP).
There are two exceptions, Russian and Faetar, but there are ready explanations for both. Data from Pustovalova (Reference Pustovalova2011) suggests a change in progress, at least in apparent time, in Homeland Russian, with younger speakers producing more overt pronouns. This suggests that the cross-generational effects in Russian are the continuation of a homeland pattern of change in the heritage varieties, that is, evidence of the transfer of a social factor into the heritage grammar. (The models given here do not include Russian, so we will not see further evidence of the change from Homeland to Heritage Russian.)
The same direction of change is found for Homeland Faetar (where there is a significant Age effect). The ongoing change in Homeland Faetar, coupled with the lack of a generational difference among the Heritage Faetar speakers, means that it is unlikely to be an effect owing to contact with English. Gen2 speakers have more contact with English than Gen1, but do not differ significantly in (PRODROP) rates (or conditioning effects). Rather it would seem that heritage speakers replicate the homeland pattern.
A second important difference exists for the homeland data used for these two languages. The Homeland Russian data comes from the Russian National Corpus (Institute of Russian Language, Russian Academy of Sciences, 2003). This dataset was collected a few years earlier than the HLVC data. Additionally, it captures different types of interactions between speakers (multiple genres, but no long sociolinguistic interviews). The Homeland Faetar speakers were recorded between 1992 and 1994 while the heritage speakers were recorded between 2009 and 2011. Again, applying the apparent time construct and considering possible real-time change in that twenty-year interval, the homeland–heritage difference may be accounted for. Nagy et al. (Reference Nagy, Iannozzi and Heap2018) show that the (PRODROP) rate is changing in parallel (similar slopes and rates) in the homeland and heritage varieties, if one adjusts for the time-gap in data collection times.
Yet another explanation is available for the Faetar effect. The homeland sample has a larger proportion of tokens with generic reference, reporting new information and/or with non-past temporal reference. Both of these are contexts that favour null subjects and go some way towards accounting for the difference in raw rates of overt pronouns. We shall return to this type of comparison after presenting the summaries of heritage models.
In any case, we may safely interpret the differences between homeland and heritage rates for Faetar and for Russian as reflecting consistency, rather than difference, between the varieties, both of which are undergoing internal change. It is important to note that if we had only examined the heritage data, we might attribute the inter-generational differences in Faetar and Russian to contact with English. However, because we see exactly the same trend for the Homeland and Heritage Faetar speakers, although virtually nobody spoke English in Faeto at the time that the homeland speakers were recorded, we cannot support that interpretation. Most of the recorded heritage speakers left Faeto in the 1950s and 1960s to move to Toronto. For at least half a century after this migration, they had little oral communication with those who stayed in Faeto. So, although the language developed independently in the two places, the pattern is identical in both. We therefore conclude that it is an internal change and not a contact-induced change. Our observations support the possibility of internal change that is parallel to what would be expected under influence from contact, at least regarding (PRODROP).
However, stronger evidence is provided by comparing constraint hierarchies. As noted in the (VOT) section, considering measures that are merged across contexts may obscure important details needed to understand the cause of any observed changes.
Russian models are not included because there were too many discrepancies in coding levels to allow useful comparison with other languages; but see Nagy (Reference Nagy2015). We note that the Ukrainian sample is small, 100 tokens per generation (and just coded in 2022), so the lack of effects found in Ukrainian may be an effect of the sample size.
5.3.5 Mixed Effects Models for (PRODROP)
To illustrate the series of model comparisons employed, models for Cantonese (PRODROP) are reported and discussed. These models indicate variability, but little evidence of change and no clear evidence of an effect of contact with English, with one possible exception discussed with reference to Table 5.10. As noted in Section 7.2, English is also used in Hong Kong, and, while we might thus expect some influence from English already in the homeland speakers (and thus the input to the heritage speakers), we know that Gen2 speakers in Toronto have considerably more exposure to English than people who grew up in Hong Kong.
5.3.5.1 Models for Cantonese (PRODROP)
In analyzing the full data set, the model for Cantonese with an interaction factor fits the data better than the model without (AIC of 3,985 versus 4,015). This model is shown in Table 5.8. In all models of (PRODROP) in this book, the application value is overt pronoun.
Table 5.8 Mixed effects model for Cantonese (PRODROP) (n = 3,509, all thirty-six speakers)
| AIC | BIC | logLik | deviance | df.residual | |||
| 3,985 | 4,188 | –1,959 | 3,919 | 3,435 | |||
| Scaled residuals: | |||||||
| Min | 1Q | Median | 3Q | Max | |||
| –7.72 | –0.792 | 0.397 | 0.675 | 2.846 | |||
| Random effects | |||||||
| Groups | Variance | Std.Dev. | |||||
| Speaker intercept | 0.495 | 0.703 | |||||
| Residual | 917 | 30.3 | |||||
| Fixed effects | |||||||
| | Estimate | Std. Error | z value | Pr(>\|z\|) | | % Overt | n |
|---|---|---|---|---|---|---|---|
| Intercept | 0.06 | 0.37 | 0.16 | 0.87 | | 62% | 3,509 |
| Main effects | | | | | | | |
| First person | | | | | | 59% | 2,064 |
| Second person | 1.74 | 0.38 | 4.59 | 0.00 | *** | 83% | 369 |
| Third person | 0.50 | 0.28 | 1.78 | 0.07 | | 61% | 1,035 |
| Singular | | | | | | 65% | 2,541 |
| Plural | –0.40 | 0.29 | –1.38 | 0.17 | | 53% | 927 |
| Present tense | | | | | | 65% | 2,204 |
| Non-present tense | –1.11 | 0.25 | –4.37 | 0.00 | *** | 56% | 1,083 |
| Other tense | –2.57 | 1.22 | –2.11 | 0.04 | * | 59% | 222 |
| Same referent | | | | | | 52% | 2,045 |
| Different referent | 0.99 | 0.23 | 4.22 | 0.00 | *** | 75% | 1,464 |
| Simple clause | | | | | | 61% | 3,134 |
| Conjoined clause | –0.71 | 0.38 | –1.87 | 0.06 | | 70% | 375 |
| Male | | | | | | 56% | 1,497 |
| Female | 0.23 | 0.29 | 0.80 | 0.43 | | 66% | 2,012 |
| Homeland | | | | | | 61% | 708 |
| Generation1 | –0.05 | 0.38 | –0.14 | 0.89 | | 57% | 1,400 |
| Generation2 | 0.14 | 0.37 | 0.37 | 0.71 | | 67% | 1,401 |
| Interaction effects | | | | | | | |
| Person2nd:SexF | –0.59 | 0.33 | –1.78 | 0.08 | | | |
| Person3rd:SexF | 0.28 | 0.20 | 1.42 | 0.15 | | | |
| Person2nd:Gen1 | –0.97 | 0.42 | –2.31 | 0.02 | * | | |
| Person3rd:Gen1 | –0.49 | 0.27 | –1.77 | 0.08 | | | |
| Person2nd:Gen2 | 0.06 | 0.44 | 0.13 | 0.90 | | | |
| Person3rd:Gen2 | –0.11 | 0.29 | –0.38 | 0.70 | | | |
| NumberPl:SexF | –0.45 | 0.21 | –2.13 | 0.03 | * | | |
| NumberPl:Gen1 | –0.12 | 0.29 | –0.40 | 0.69 | | | |
| NumberPl:Gen2 | 0.11 | 0.30 | 0.39 | 0.70 | | | |
| Nonpresent:SexF | 0.62 | 0.20 | 3.18 | 0.00 | ** | | |
| TenseOther:SexF | 0.74 | 0.36 | 2.05 | 0.04 | * | | |
| Nonpresent:Gen1 | 0.52 | 0.26 | 1.99 | 0.05 | * | | |
| TenseOther:Gen1 | 2.13 | 1.22 | 1.74 | 0.08 | | | |
| Nonpresent:Gen2 | 0.43 | 0.26 | 1.63 | 0.10 | | | |
| TenseOther:Gen2 | 2.18 | 1.20 | 1.81 | 0.07 | | | |
| Switch_discDiff:SexF | 0.18 | 0.17 | 1.02 | 0.31 | | | |
| Switch_discDiff:Gen1 | 0.05 | 0.24 | 0.19 | 0.85 | | | |
| Switch_discDiff:Gen2 | –0.18 | 0.24 | –0.73 | 0.46 | | | |
| Conjoined:SexF | 0.03 | 0.32 | 0.08 | 0.94 | | | |
| Conjoined:Gen1 | 1.87 | 0.39 | 4.80 | 0.00 | *** | | |
| Conjoined:Gen2 | 1.34 | 0.36 | 3.70 | 0.00 | *** | | |
The mixed-effects model in Table 5.8 shows that for this sample of Cantonese speakers, there are main effects of three predictors: Second person tokens have significantly more overt pronouns than first person (the reference level); non-present tense and other tense tokens have significantly fewer overt pronouns than present tense; and subjects with a different referent than the previous subject have significantly more overt pronouns.Footnote 3 We note, however, that this semantic effect is smaller than the morphosyntactic effects. Additionally, several significant interactions appear. Figures 5.12 and 5.13 illustrate the effects involving Generation, graphing raw values, for ease of interpretation. Figure 5.12 shows two effects that can be categorized as internal changes as they represent changes in predictors that are not significant in English (Tense, Grammatical person). In contrast, Figure 5.13 shows a cross-generational difference in an effect that is significant in English: Clause type. Later-generation speakers have a pattern less like English, with more overt pronouns in conjoined clauses. This may or may not be due to contact, but it is a divergence from the English pattern illustrated in Figure 5.10.

Figure 5.12 Internal changes in (PRODROP) for Cantonese, for Grammatical person and Tense (n = 3,509)

Figure 5.13 Change in (PRODROP) for Cantonese, for Clause type (n = 3,509)
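Because the models are logistic regressions, the estimates in Table 5.8 are on the log-odds scale. As a rough aid to reading them, the sketch below converts two estimates to predicted probabilities of an overt pronoun with the inverse logit function. It is an illustration only: it applies to a token at the reference level of every other predictor (singular, present tense, same referent, simple clause, male, Homeland), so no interaction terms apply, and it assumes a speaker whose random intercept is zero. The constant names are hypothetical.

```python
import math


def inverse_logit(log_odds: float) -> float:
    """Convert a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Estimates taken from Table 5.8 (log-odds scale).
INTERCEPT = 0.06       # first person, all other predictors at reference level
SECOND_PERSON = 1.74   # shift for second person relative to first person

p_first_person = inverse_logit(INTERCEPT)
p_second_person = inverse_logit(INTERCEPT + SECOND_PERSON)
```

Under these assumptions the predicted probability of an overt pronoun is roughly 0.51 for a first person subject and roughly 0.86 for a second person subject. These are model predictions for one specific context, so they need not match the raw percentages in the table, which pool all contexts.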
We now turn to comparison of constraint effects in homeland and heritage varieties. As a reminder, we report for each group the best-fitting model. In the case of Homeland Cantonese, this is a model with interaction effects, shown in Table 5.9.
Table 5.9 Mixed effects model for (PRODROP) in Homeland Cantonese (n = 708, eight speakers)
| AIC | BIC | logLik | deviance | df.residual | ||||
| 770 | 879 | –361 | 722 | 666 | ||||
| Scaled residuals | ||||||||
| Min | 1Q | Median | 3Q | Max | ||||
| –7.61 | –0.70 | 0.33 | 0.58 | 3.74 | ||||
| Random effects | ||||||||
| Groups | Variance | Std.Dev. | ||||||
| Speaker Intercept | 0.77 | 0.88 | ||||||
| Fixed effects | ||||||||
| | Estimate | Std. Error | z value | Pr(>\|z\|) | | % Overt | n |
|---|---|---|---|---|---|---|---|
| Intercept | 1.66 | 1.04 | 1.59 | 0.11 | | | |
| Main effects | | | | | | | |
| First person | | | | | | 58% | 375 |
| Second person | 1.75 | 0.94 | 1.87 | 0.06 | | 87% | 121 |
| Third person | –0.17 | 0.74 | –0.23 | 0.82 | | 57% | 194 |
| Singular | | | | | | 66% | 548 |
| Plural | 1.01 | 0.71 | 1.43 | 0.15 | | 52% | 142 |
| Present tense | | | | | | 65% | 477 |
| Non-present tense | –1.09 | 0.65 | –1.67 | 0.09 | | 55% | 226 |
| Other tense | –0.05 | 3.91 | –0.01 | 0.99 | | 20% | 5 |
| Same referent | | | | | | 52% | 378 |
| Different referent | –0.12 | 0.59 | –0.21 | 0.84 | | 73% | 330 |
| Simple clause | | | | | | 64% | 599 |
| Conjoined clause | –2.97 | 0.82 | –3.64 | 0.00 | *** | 46% | 109 |
| Male | | | | | | 70% | 197 |
| Female | –0.59 | 0.85 | –0.69 | 0.49 | | 58% | 511 |
| Age | –0.03 | 0.02 | –1.47 | 0.14 | | | |
| Interaction effects | | | | | | | |
| Person2nd:SexF | –1.18 | 0.69 | –1.72 | 0.09 | | | |
| Person3rd:SexF | 0.11 | 0.63 | 0.17 | 0.86 | | | |
| Person2nd:Age | 0.01 | 0.02 | 0.53 | 0.60 | | | |
| Person3rd:Age | 0.02 | 0.01 | 1.97 | 0.05 | * | | |
| NumberPl:SexF | –0.91 | 0.63 | –1.45 | 0.15 | | | |
| NumberPl:Age | –0.03 | 0.01 | –2.41 | 0.02 | * | | |
| nonpresent:SexF | 0.24 | 0.52 | 0.46 | 0.65 | | | |
| nonpresent:Age | 0.00 | 0.01 | 0.26 | 0.79 | | | |
| TenseOther:Age | –0.08 | 0.15 | –0.52 | 0.60 | | | |
| Switch_Diff:SexF | 0.92 | 0.48 | 1.93 | 0.05 | | | |
| Switch_Diff:Age | 0.02 | 0.01 | 1.68 | 0.09 | | | |
| Conjoined:SexF | 2.16 | 0.74 | 2.91 | 0.00 | ** | | |
| Conjoined:Age | 0.02 | 0.02 | 1.13 | 0.26 | | | |
5.3.5.2 Homeland Cantonese (PRODROP) Model
In the Homeland Cantonese (PRODROP) model, the only main effect is for Clause type, with significantly more overt pronouns in simple clauses than in conjoined clauses. This is, again, the opposite of the pattern found for English, establishing this as a conflict site when we examine Heritage Cantonese. Additionally, there are three significant indications of change in progress in the homeland grammar: fewer overt tokens for third person among younger speakers, more for plural among younger speakers, and more overt tokens in conjoined clauses among women. The first two have minute effect sizes, suggesting (small) internal changes away from the English pattern, where these factors play no role. In contrast, the effect of Clause type, the biggest effect in the model, shows that it is women whose grammar least resembles English, in that females have a large positive estimate for conjoined clauses (see Figure 5.10).
5.3.5.3 Heritage Cantonese (PRODROP) Model
The best-fitting model for the Heritage Cantonese speakers is presented next. This is a model that includes EO scores rather than Generation, with both main and interaction effects. Table 5.10 presents only the significant effects for this model, represented by the formula in Example 11.
Table 5.10 Mixed effects model for (PRODROP) in Heritage Cantonese (n = 2,801, twenty-one speakers)
| AIC | BIC | logLik | deviance | df.resid | ||||
| 2,460 | 2,646 | –1,197 | 2,394 | 2,055 | ||||
| Scaled residuals: | ||||||||
| Min | 1Q | Median | 3Q | Max | ||||
| –5.21 | –0.78 | 0.39 | 0.72 | 3.43 | ||||
| Random effects | ||||||||
| Groups | Variance | Std.Dev. | ||||||
| Speaker Intercept | 0.318 | 0.564 | ||||||
| Residual | 917 | 30.3 | ||||||
| Fixed effects | ||||||||
| | Estimate | Std. Error | z value | Pr(>\|z\|) | | % Overt | n |
|---|---|---|---|---|---|---|---|
| Intercept | 0.76 | 0.40 | 1.89 | 0.06 | | | |
| First person | | | | | | 59% | 1,689 |
| Second person | 1.52 | 0.32 | 4.70 | 0.00 | *** | 82% | 248 |
| Third person | 0.44 | 0.18 | 2.42 | 0.02 | * | 62% | 841 |
| Singular | | | | | | 65% | 1,993 |
| Plural | –0.41 | 0.19 | –2.09 | 0.04 | * | 54% | 785 |
| Present tense | | | | | | 65% | 1,727 |
| Non-present tense | –0.80 | 0.17 | –4.73 | 0.00 | *** | 56% | 857 |
| Other tense | –0.64 | 0.29 | –2.25 | 0.02 | * | 60% | 217 |
| Same referent | | | | | | 53% | 1,667 |
| Different referent | 1.06 | 0.15 | 6.82 | 0.00 | *** | 76% | 1,134 |
| Main clause | | | | | | 60% | 2,535 |
| Conjoined clause | 1.01 | 0.35 | 2.92 | 0.00 | ** | 80% | 266 |
| Interaction effects | | | | | | | |
| Person2nd:EO_language | 0.39 | 0.12 | 3.31 | 0.00 | *** | | |
| Person3rd:EO_culture | 0.72 | 0.17 | 4.23 | 0.00 | *** | | |
| NumberPl:EO_language | 0.29 | 0.09 | 3.22 | 0.00 | ** | | |
| NumberPl:EO_culture | –0.52 | 0.18 | –2.89 | 0.00 | ** | | |
| nonpresent:SexF | 0.80 | 0.26 | 3.07 | 0.00 | ** | | |
| TenseOther:SexF | 0.93 | 0.46 | 2.02 | 0.04 | * | | |
Prodrop ~ Person + Number + Tense + Switch_discourse + Clause_type +
Sex + EO_language + EO_culture +
Person * Sex + Person * EO_language + Person * EO_culture +
Number * Sex + Number * EO_language + Number * EO_culture +
Tense * Sex + Tense * EO_language + Tense * EO_culture +
Switch_discourse * Sex + Switch_discourse * EO_language + Switch_discourse * EO_culture +
Clause_type * Sex + Clause_type * EO_language + Clause_type * EO_culture +
(1 | Speaker)
There are significant main effects for Person, Number, and Tense, all factors that did not have a significant effect in the best-fitting homeland model. Thus, this is a divergence away from English (which also lacks effects for these factors) among heritage speakers. In contrast, the Clause type factor is significant in both the homeland and heritage models, but with an opposite direction of effect: In the heritage model, there are more overt pronouns in conjoined clauses, one indication of convergence toward an English-like pattern. We note also that the Switch reference factor is significant, as it was in the model for all data, but not for the much smaller sample of homeland speakers alone.
While there are no main effects of the social factors in this model, there are several significant interactions, involving both EO scores and Sex. EO_language interacts with Person and Number, increasing the probability of an overt pronoun for second person and plural subjects for speakers reporting more Cantonese language use. In contrast, EO_culture, while interacting with the same linguistic factors, increases the probability of an overt pronoun for third person and singular subjects. This is clear evidence of the different roles of these two EO factors. However, compared with the effect sizes for main effects (particularly the second person main effect), these interaction effects are small. We also find an interaction of Tense and Sex, where females have more overt tokens for non-present tense and other tenses. All of these effects are present only in the heritage data, suggesting that there is a more complex grammar for (PRODROP) in the heritage variety than the homeland (as reported in Nagy & Gadanidis, Reference Nagy and Gadanidis2021).
Given the different sizes of the datasets for homeland and heritage speakers, it is critical to keep in mind that some differences between groups may be best attributed to sparse data. Additionally, some factors’ effects could not be calculated in some languages – these are indicated with “NA” in tables comparing across groups (Tables 5.11, 5.12, and 5.13).
Table 5.11 (PRODROP) rates of overt subjects and significant linguistic effects in best-fitting models for all speakers in each language (n = 9,964)
| Language | % Overt | Person | Number | Tense | Switch | Clause | Pre-Verb content |
|---|---|---|---|---|---|---|---|
| Cantonese | 62% | 2>1 | present>other | D>S | NA | ||
| Korean | 21% | 1>2>3 | sg>pl | NA | D>S | conjoined> simple | yes>no |
| Polish | 23% | 3>1 | NA | D>S | |||
| Ukrainian | 80% | D>S | subord.> conjoined | NA | |||
| Faetar | 9% | NA | |||||
| Italian | 23% | 1>2>3; 2nd, 3rd*Gen1+; 3rd*Gen2+ | sg>pl | nonpresent* Gen2-; nonpresent *F+ | D>S | | yes*Gen2 |
Table 5.12 (PRODROP) rates of overt subjects and significant effects in homeland varieties (best-fitting models, n = 3,714)
| Language | % Overt | Person | Number | Tense | Switch | Clause | Pre-V content |
|---|---|---|---|---|---|---|---|
| Cantonese | 61% | 3rd*age+ | sg*age+ | simple>conj; conj *female+ | NA | ||
| Faetar | 3% | 2>1 | NA | NA | NA | yes>no | |
| Italian | 22% | 1>2>3 | sg>pl | Non-present present | D>S | ||
| Korean | 19% | 1>3 | NA | ||||
| Polish | 18% | 3>1 | NA | D>S | |||
| Ukrainian | 83% | NA |
Table 5.13 (PRODROP) rates of overt subjects and significant effects in heritage varieties (best-fitting models, n = 6,250)
| Language | % Overt | Person | Number | Tense | Switch | Clause | Pre-V content |
|---|---|---|---|---|---|---|---|
| Cantonese (*EO) | 62% | 2>3>1; EO_language*2nd +; EO_culture*3rd + | EO_language* pl +; EO_culture*pl - | present>other, non-present; non-present, other* female + | D>S | conjoined> simple | NA |
| Faetar (Gen) | 20% | NA | NA | ||||
| Italian (*EO) | 24% | 3>1 | sg>pl | D>S | |||
| Korean (*EO) | 23% | NA | NA | ||||
| Polish (Gen) | 24% | D>S | |||||
| Ukrainian (EO) | 78% | D>S | NA |
5.3.6 Results for (PRODROP), All Speakers, All Languages
Next, we summarize the effects on (PRODROP) in each language, determined via the process of model comparison that was illustrated for Cantonese in Section 5.3.5. Tables 5.11, 5.12, and 5.13, and the text that follows them, show how each predictor affects each language. The models all show the estimated rates for overt tokens (i.e., overt is the application value). Note that these tables report the raw rate of overt subject pronouns.
As already seen in Figure 5.11, all these languages have rates of overt subject pronouns quite a bit lower than the 98 percent that has been reported for English (Nagy, 2015; Nagy et al., 2011). The additional columns in these three tables report the direction of significant effects. For example, in Table 5.11, the Person effect for Cantonese is noted as “2>1,” meaning there are more overt pronouns in second person than first person, the effect quantified in Table 5.8. “NA” indicates cells where the effect of the relevant predictor could not be determined for a particular language, owing to a small sample or lack of appropriate coding. Grammatical gender is omitted from all tables as it never emerges as significant for (PRODROP) in any language.
This information comes from mixed-effects models where each constraint is tested as a main effect, Age and Sex are also included, and Speaker is included as a random effect. We see that the Person and Number of the subject are significant in several but not all languages. The Tense of the verb is significant only for Cantonese and Italian. These all indicate differences from English, where these factors do not predict (PRODROP).
The next three factors are important for English (PRODROP) (cf. Nagy, 2015; Nagy et al., 2011). For five languages, Switch reference, which codes whether the subject refers to the same referent as the previously uttered subject, has a consistent effect, the same effect that has been reported in many studies of (PRODROP): An overt subject is more likely if the referent is different (“D”) than if it is the same (“S”). This effect emerges for all languages except Faetar, a language for which we will see that the effect of this factor differs between heritage and homeland varieties (see Section 5.3.9).
Clause type and Switch reference are the two most important constraints for this analysis, as they are significant in English. We see more overt subjects in conjoined than main clauses only for Korean, and no effect for most languages. The one significant effect, where levels are comparable with English, is in the opposite direction to English.
Finally, we see an effect of the presence of content before the verb for two languages: more overt subjects in this context for Italian and Korean. As discussed earlier, this is the opposite of English, where null subjects are reported to be possible only when the subject occupies the left-most position in the intonational unit. While we did not code our data according to intonational unit, the pre-verbal content factor helps us to see whether such an effect could be present in other languages. This was tested in five languages. Although intonation unit and sentence are not synonymous, the fact that we find more overt subjects where there is pre-verbal content (other than a subject pronoun) suggests that, if anything, the effect is the opposite to that found for English. That is, null subjects appear less often when the position might not be intonation-unit initial (because there is some sort of other content before the verb). This is the only context where we expect null subjects at all in English.
The main effects discussed here cannot be considered as evidence of sources of variation in heritage varieties, as they are based on combined heritage and homeland data. However, the interaction effects from Table 5.11 indicate change in the size of an effect between homeland (the reference level) and (one or both) heritage generations (for Italian Person, Tense, and Pre-verbal content) or overall sex-related indexical effects (only in the case of Italian pre-verbal content). These are included in the summary of sources of variation in Table 5.15.
5.3.7 Homeland Results for (PRODROP)
Let us now take a closer look at homeland speakers. Again, the rates of overt pronouns range widely, from 3 to 83 percent. Regarding the low (3 percent) rate for Faetar, we must remember that almost 800 tokens, from fifteen homeland speakers, were omitted from the analysis because of categorically null tokens. Including them, the overt pronoun rate is lower still.
Table 5.12 summarizes the effects that emerge as significant in the best-fitting model for each language (with or without interactions). The better fit is for the model without interaction effects, in every case except Cantonese.
Person and Number are significant in several homeland varieties, setting this up as a conflict site for comparison to English, where these phi-features have not been shown to have an effect. The same is true for Tense, though only in Italian.
Switch reference, Clause type, and Pre-verbal content are all significant in fewer homeland varieties (Table 5.12) than in the overall data (Table 5.11). This might be because of the smaller sample sizes. However, where the corresponding effects are found, they are in the same direction as in the overall data.
Age was tested in the homeland samples but not found significant except in Faetar. In Faetar, older speakers have a higher rate of overt subjects, suggesting an effect of divergence from Italian as contact with Italian increases. Contact is increasing because younger speakers are more likely than older speakers to attend school in Italian or work outside of Faeto (most employment outside Faeto requires Italian). Increasing mobility expands contact with Italian speakers in the rest of the country. Sex is significant only in Korean, where females use more overt pronouns than males. Elsewhere, there is no evidence that (PRODROP) has a sex-related indexical effect.
Although we did not re-analyze the Russian data for this chapter, an important fact from previous analyses comes into play here: Russian (PRODROP) data shows a change in progress in the homeland variety, with higher rates of overt pronouns for younger speakers. This effect is replicated in heritage speakers (Hollett, 2010, pp. 67–68; Nagy, 2015, pp. 320–321; Pustovalova, 2011). We thus see a change in progress transferring from the homeland to the heritage variety. Further, this shows that change can be observed in apparent time in (PRODROP).
5.3.8 Heritage Results for (PRODROP)
Here, we discuss how each predictor affects each language. Table 5.13 presents the (raw) overt subject rate and the significant factors from the best-fitting models for the heritage speakers. Where the best-fitting model includes (interaction factors with) either Generation or EO scores, this is indicated in the first column. For Faetar, no EO scores exist, so the comparison was only between a no-interaction model (which was the best fit) and a model that tests interactions with Generation.
Person is significant in only two heritage varieties, but five homeland varieties. The direction of effects differs between homeland and heritage varieties in both languages where Person is significant in both. This change cannot be attributed to contact with English, where there is no Person effect in (PRODROP). For Cantonese, the homeland data shows that a higher rate in third person is more prevalent for older than younger speakers (interaction with Age shown in Table 5.12). For Heritage Cantonese speakers, those with a higher EO_culture score also have a higher rate in third person, suggesting a linguistic effect of their stronger affiliation with Cantonese culture.
Number is significant for Italian, in both homeland and heritage varieties. For Italian the direction of effect is the same in both varieties: more overt pronouns with singular than plural subjects. For Homeland Cantonese, there is no main effect of Number, but older speakers have a higher probability of overt pronouns in singular contexts. For heritage speakers, those who report strong affiliation to Cantonese/Chinese culture (EO_culture) also favour overt pronouns more in singular contexts. However, for those who report using Cantonese more (EO_language), it is the opposite!
Tense has an effect only in Homeland Italian, and in Heritage Cantonese.
While we might consider Cantonese’s loss of an effect for Person, a predictor that is not significant in English, as a sign of convergence toward English, no other evidence for contact effects emerges from comparison of the phi-features and Tense among homeland and heritage varieties.
Switch reference, having a nearly universal effect (reported in much literature), also does not provide strong evidence for sources of variability. However, according to the methods prescribed for this analysis, we might say that the fact that Switch reference is significant in two heritage varieties where it was not significant in the corresponding homeland variety provides evidence of influence from English. It might, however, also be due to smaller sample sizes for Homeland Cantonese and Ukrainian.
Clause type, as already discussed in Section 5.3.5, has opposite effects in Homeland and Heritage Cantonese, with the latter being less English-like.
Pre-verbal content has a significant effect only in Homeland Italian. There, we see a greater likelihood of overt subjects when there is pre-verbal content, that is, when the subject might not be at the left edge of an intonation unit. This effect is absent in Heritage Italian. This suggests that under influence from English, a contradictory effect is lost.
From the linguistic factors, then, we see some evidence of an effect of cultural affiliation on Person and Number effects in Cantonese, suggesting indexical motivation for the effect. We see convergence toward the English pattern for Switch reference in Cantonese and Ukrainian, and for Pre-verbal content and Tense in Italian, but divergence from English for Clause type in Cantonese.
5.3.9 Using Variationist Data to Understand Faetar Grammar
Before closing this chapter with a summary of what we have learned about the sources of variation and change regarding (PRODROP) in this set of HLs, we illustrate one other type of knowledge that emerges from this type of comparative analysis. Like the other HLs, Faetar shows little sign of accommodating to English’s virtually categorical presence of subject pronouns, nor to the conditioning effects found in English. However, work with (PRODROP) has improved our understanding of the grammar of Faetar.
Faetar has two series of subject pronouns (for now, we refer to these as strong and weak pronouns). Sentences may thus surface with no, one, or two overt subject pronouns. These constructions are illustrated by the Faetar sentences in Table 5.14. All options may occur in phrases with or without [+Argument] subjects, though strong pronouns are rare in sentences that also have noun subjects.
Table 5.14 Distribution of types of subject pronouns in conversational Faetar (adapted from Nagy et al., 2018)
| Subject form | Homeland | Heritage |
|---|---|---|
| No overt subject pronoun /poi anda bej a kandʒi lo ʃift/ then, [Ø=they] really changed the shift (F1F79, 1:42) | 902 | 373 |
| Weak pronoun /e i stávo vakánt/ and it was vacant (F1M92A, 9:38) | 627 | 410 |
| Strong + Weak pronoun /vussə vus tənəvandə vint annə/ you you had twenty years [were twenty years old] (F2M58A, 47:45) | 14 | 35 |
| Strong pronoun / dʒi m e vəni l an apre/ I REFL came the year after (F1F79A, 19:37) | 30 | 122 |
| TOTAL | 1,573 | 940 |
Because Faetar has two series of subject markers, we needed to determine which count as overt pronouns before making cross-linguistic comparisons regarding null subjects. That is, we had to decide whether the weak forms, for example, /i/ “he/she,” are pronouns or clitics: If they are pronouns, then a sentence such as /e i stávo vakánt/ in Table 5.14 has an overt subject, but if /i/ is a clitic, that is, a verbal prefix, then that is a null subject sentence. Comparing the behaviour of null subjects in singular, [+human] contexts in better described languages helps select between these two options. In Italian, Polish, and Russian, we find considerably more null than overt subjects in first singular and almost exclusively null subjects in third singular. The same distribution is seen for Faetar only if we consider the weak pronouns as clitics. Otherwise, Faetar has many more overt subjects than the other languages, including Italian, to which it is both related and in frequent contact. In fact, if we do not categorize the light forms as clitics, then this Faetar sample has no null subjects in first and third person, distinguishing it from the other languages. Given the “given” nature of participant pronouns in discourse, this seems an unlikely pattern.
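To make the consequence of this decision concrete, the following sketch recomputes the overt-subject rate from the token counts in Table 5.14 under each analysis. The counts come from the table; the function and variable names are illustrative assumptions, and the calculation is simple arithmetic rather than a reproduction of the models reported in this chapter.

```python
# Token counts transcribed from Table 5.14.
COUNTS = {
    "Homeland": {"null": 902, "weak": 627, "strong+weak": 14, "strong": 30},
    "Heritage": {"null": 373, "weak": 410, "strong+weak": 35, "strong": 122},
}


def overt_rate(counts, weak_is_pronoun):
    """Share of tokens with an overt subject pronoun.

    If weak forms are treated as pronouns, weak-only tokens count as overt.
    If weak forms are treated as clitics, only tokens containing a strong
    pronoun count as overt.
    """
    total = sum(counts.values())
    overt = counts["strong"] + counts["strong+weak"]
    if weak_is_pronoun:
        overt += counts["weak"]
    return overt / total


for variety, counts in COUNTS.items():
    as_pronoun = overt_rate(counts, weak_is_pronoun=True)
    as_clitic = overt_rate(counts, weak_is_pronoun=False)
    print(f"{variety}: weak=pronoun {as_pronoun:.1%}, weak=clitic {as_clitic:.1%}")
```

On these counts, the choice of analysis moves the homeland rate from roughly 43 percent (weak forms as pronouns) to roughly 3 percent (weak forms as clitics), which is why the classification had to be settled before any cross-linguistic comparison.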
Another benefit that emerges if we consider the light forms as clitics is resolution of a previously reported oddity. Earlier reports on Faetar (PRODROP), which considered the weak forms as overt pronouns, reported more overt pronouns with given information than with new information (cf. Nagy et al., 2018; Nagy & Heap, 1998). Such a finding stands out in contradiction to that reported for virtually every other language. Under the “weak pronoun” account, for example, in a sample of 1,702 tokens from homeland speakers, we see a rate of 36 percent overt subject pronouns for the 1,302 tokens with subjects whose referents are new, but 63 percent overt for the 400 tokens with old referents. If we instead consider the light forms as clitics, the rates for the two contexts are nearly identical: 3 percent overt for new information and 2 percent for old. This latter account is not in stark opposition to information status effects for many other languages, and thus serves as a second motivation for interpretation of the “light” pronoun series as clitics. We take the effects of Person and Information status as contributing to a growing body of evidence for the clitic status of this light subject series.
5.3.10 (PRODROP) Summary
The patterns discussed in Section 5.3.6 and Section 5.3.7 are summarized in Table 5.15, using the same indicators as for the (VOT) summary in Table 5.5. In this table, “S” indicates stability, “ToE” and “FromE” mark convergence/divergence with English, “I” marks differences attributed to identity-marking or internal change. Linguistic factors are defined in Table 5.6.
Table 5.15 Summary of (PRODROP) effects
| Language | HOM Age | HOM Sex | HOM v G1 | HOM v G2 | G1 v G2 | EO_language | EO_culture | HER Sex |
|---|---|---|---|---|---|---|---|---|
| CAN rate | S | S | S | S | S | S | S | S |
| Person | S | S | S | S | S | S | I / ToE | S |
| Number | S | S | S | S | S | I | I | S |
| Tense | S | S | S | S | S | S | S | S |
| Switch reference | S | S | ToE | ToE | S | S | S | S |
| Clause type | S | S | FromE | FromE | S | S | S | S |
| Pre-verbal element | ||||||||
| FAE rate | I (+) | S | S | S | S | S | ||
| Person | S | S | S | S | S | S | ||
| Number | S | S | S | S | S | |||
| Tense | ||||||||
| Switch reference | S | S | S | S | ||||
| Clause type | ||||||||
| Pre-verbal element | S | S | S | S | S | S | S | S |
| ITA rate | S | S | S | S | S | FromE | S | S |
| Person | S | S | FromE | FromE | S | S | S | S |
| Number | S | S | S | S | S | S | S | S |
| Tense | S | S | S | ToE | S | S | S | S |
| Switch reference | S | S | S | S | S | S | S | S |
| Clause type | S | S | S | S | S | S | S | S |
| Pre-verbal element | S | S | ToE | ToE | S | S | S | FromE |
| KOR rate | S | I (F>M) | S | S | S | S | ToE | S |
| Person | S | S | S | S | S | S | S | S |
| Number | S | S | S | S | S | S | S | S |
| Tense | ||||||||
| Switch reference | S | S | S | S | S | S | S | S |
| Clause type | S | S | ||||||
| Pre-verbal element | S | S | S | S | S | S | S | S |
| POL rate | S | S | S | S | S | S | S | S |
| Person | S | S | S | S | S | S | S | S |
| Number | S | S | S | S | S | S | S | S |
| Tense | ||||||||
| Switch reference | S | S | S | S | S | S | S | S |
| Clause type | S | S | S | S | S | S | S | S |
| Pre-verbal element | S | S | S | S | S | S | S | S |
| UKR rate | S | S | S | S | S | S | S | S |
| Person | S | S | S | S | S | S | S | S |
| Number | S | S | S | S | S | S | S | S |
| Tense | S | S | S | S | S | S | S | S |
| Switch reference | S | S | ToE | ToE | S | S | S | S |
| Clause type | S | S | S | S | S | S | S | S |
| Pre-verbal element |
Each cell in Table 5.15 is a space where there could be systematic heterogeneity that, if it existed, could be attributed to internal change, indexical marking, or contact effects. For homeland varieties, we see an Age effect for Faetar (older speakers have more overt pronouns) and a Sex effect for Korean (females have more overt pronouns than males). Otherwise, homeland varieties are stable (Age does not interact with any linguistic factor).
Overall, there is vanishingly little evidence of contact with English causing change in the (PRODROP) grammar in HLs. Generation never emerges as a significant main effect in any heritage model. Most cells, with an “S” for stable, show no significant change for that predictor in that language.
In the thirty-six models constructed and compared, we find main effects of EO_language only for Heritage Italian speakers (higher EO scores correspond to fewer overt tokens) in the best-fitting model; and for EO_culture for Korean (higher EO scores correspond to more overt tokens) in a model that is nearly as good a fit as the best model (but does not converge). For Italian, then, there may be a minor contact effect where speakers who use Italian more often retain a more homeland-like (lower) rate of overt tokens. In contrast, for Korean, it is perhaps an indexical effect such that speakers who orient more toward their Korean culture produce a slightly more English-like rate. This set of results, then, stands in contrast to the findings reported by Polinsky and Scontras (2020a, 2020b) and others in that volume, which examined prodrop with different methods and sampled different populations.
Given the different sizes of the datasets for homeland and heritage speakers, however, it is critical to keep in mind that some differences between groups may be best attributed to sparse data as well as the fact that some factors’ effects could not be calculated in some languages.
In each HL, a number of factors govern the variable presence of subject pronouns. The effects of these factors are, in every case, variable, not categorical. All the heritage varieties are resisting the influence of English. We see this in the similar rates of null subjects in homeland and heritage varieties and in the lack of major changes in constraint hierarchies toward the simplistic hierarchy established for English. We are pleased to be able to contribute this set of analyses to the growing body of literature showing systematic behaviour in minoritized and endangered language varieties.
5.4 Case Marking (CASE)
In Slavic languages, case, the syntactic role of a noun phrase (NP), is marked on nouns and pronouns. Additionally, every noun or pronoun must be declined, meaning that nominal forms indicate gender, number, and membership in a particular noun class. Russian, for example, has 87 classes of nouns. The choice of case marker (usually a suffix, but often fusional rather than agglutinative) is determined by syntactic structure (e.g., nominative for most subjects) or assigned lexically by a verb or a preposition (e.g., the Polish verb pomagać “to help” takes a dative argument), and it also depends on the declension. Thus, each noun has many forms. There is syncretism or homophony among some of these contexts, but it does not reduce the set of forms in a regular way.
Because case is not marked on nouns in English, this complex area of grammar is considered a conflict site, that is, a place where, if English grammar influences the grammar of HLs, we will see effects. The most likely possibilities are case being marked in fewer contexts or fewer distinctions being maintained in heritage case systems. Because case is marked in English pronouns, although with a smaller number of distinctions than in Slavic languages, we might also expect different case-marking behaviour on nouns than on pronouns. We investigate these possibilities by looking at how case-marking is applied in conversational speech by speakers in the HLVC corpus. This is supplemented by a small sample of Homeland Russian data from the Russian National Corpus (Institute of Russian Language, Russian Academy of Sciences, 2003). The Homeland Russian speakers were not included in previous analyses. Heritage Russian data is also augmented in this analysis, with almost 3,000 new tokens, compared with the dataset in Łyskawa and Nagy’s (2019) analysis. This section analyzes the larger dataset and compares case-marking in the heritage and homeland samples of the three Slavic languages in the HLVC project.
Discussions about vulnerable areas of grammar often mention case (cf. de Groot, 2005; Leisiö, 2006; Montrul & Bowles, 2009; Polinsky, 2018). However, parallels between established diachronic change and innovation in the heritage varieties suggest that this may not be due to speakers’ difficulties with case, but rather internal change of the same type that has taken place in homeland varieties. This conclusion becomes more robust when we consider that the “vulnerability” is limited to a few specific parts of the case system. For example, during diachronic change, cross-linguistically, pronouns retain more case forms than nouns (Blake, 2001). Our heritage speakers do the same: pronouns favour use of the canonical case more than nouns.
In our corpus, speakers of each language use these six cases: Accusative (ACC), Dative (DAT), Genitive (GEN), Instrumental (INS), Locative (LOC), Nominative (NOM). Prescriptively NOM contexts (e.g., subjects of many verbs) show no variability and are thus not examined. Vocative is elicited too rarely in this dataset to analyze. To support the following description of case systems, Appendix C of Łyskawa and Nagy (2019) includes example sentences in each language, illustrating the functions of each case, based on standard reference grammars such as Gruszczyński and Bralczyk (2002), Press and Pugh (2015), and Timberlake (2004). See https://onlinelibrary.wiley.com/action/downloadSupplement?doi=10.1111%2Flang.12348&file=lang12348-sup-0001-SuppMat.pdf.
Laskowski (2009) investigated Heritage Polish spoken by children in Sweden. As Swedish has a more extensive case system than English, comparing this with studies in majority-English environments can help distinguish language-specific from more general patterns. This work established the implicational scale in Example 12, indicating which cases are retained most often.
(12) (intact) NOM > ACC > GEN > INS > LOC > DAT (virtually gone)
This hierarchy parallels those established for diachronic change and for first-language acquisition. While the literature does not agree on an exact order of cases, generally, NOM is acquired earlier (and retained longer) than GEN and ACC, while oblique cases (LOC and DAT) are acquired last (and lost first). If heritage speakers depart from the canonical case-marker most often in oblique cases, then they are paralleling historic patterns of change.
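One way to check a set of results against the implicational scale in Example 12 is sketched below. The scale order is taken from Example 12; the function name and the rates in `toy_rates` are invented purely for illustration and are not findings from this or any other study.

```python
# Order of retention from Example 12 (most intact first).
SCALE = ["NOM", "ACC", "GEN", "INS", "LOC", "DAT"]


def scale_violations(retention, scale=SCALE):
    """Return adjacent (higher, lower) pairs where the lower-ranked case
    is retained at a higher rate than the higher-ranked one."""
    ranked = [case for case in scale if case in retention]
    return [
        (higher, lower)
        for higher, lower in zip(ranked, ranked[1:])
        if retention[lower] > retention[higher]
    ]


# HYPOTHETICAL rates of canonical case use, made up for this example only.
toy_rates = {"NOM": 1.00, "ACC": 0.97, "GEN": 0.95,
             "INS": 0.96, "LOC": 0.90, "DAT": 0.85}

print(scale_violations(toy_rates))
```

With these made-up numbers, the only departure from the scale is that INS is retained slightly more often than GEN; an empty list would indicate full agreement with the ordering.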
In some contexts, syntactic heads (i.e., verbs or prepositions) select the case for their arguments, while in other contexts, the case is determined by structural position. NOM is the default or citation form. It marks many subjects and complements to a conjunction or copula. When syntax calls for NOM, that is invariably what speakers produce, so NOM contexts are not included in the analyses. GEN marks the complement to a noun, as a possessive or a measure; complement to a numeral greater than five; the negated object of a verb that otherwise would select for an ACC; the subject of some verbs or the object of some verbs or prepositions. ACC marks the object of most verbs and some prepositions (including directional). DAT marks the semantic goal, the object of some prepositions and of some verbs. INS marks the object of the copula “to be,” a semantic instrument and the object of some prepositions. Finally, LOC marks the object of some prepositions (Example 13) (including non-directional). Nouns and personal and question pronouns are case-marked, though there is a distinct set of declensions (which varies by person, gender, and case for personal pronouns and by animacy and case for question pronouns).
We contextualize the heritage varieties by first considering variation in the homeland varieties. Polinsky & Kagan (2007) show that HLs may exhibit variation of the type present in homeland varieties, where the variation is limited to non-standard dialects or child language. For example, in Heritage Russian, prepositional case replaces other oblique cases. Additionally, Homeland Polish shows some variability in case marking, as illustrated by Example 13, where the pronoun bearing the ACC marker would prescriptively bear the GEN marker: tej (this.fem.gen).
| poszukać | tą, | tą, | tą | Gdańska |
| look-for.inf | this.fem.acc | this.fem.acc | this.fem.acc | Gdańsk |
| “to look for the one from Gdańsk” (P0F43A, 37:57) | ||||
Buttler (1976), Glovinskaja et al. (2001), and Muszyńska (2001) note similarities between changes in homeland and heritage varieties.
5.4.1 Methods for the Analysis of Case-Marking
To examine variability that might reflect change in the heritage varieties, we consider both rates of canonical case-marking and contexts in which case mismatch is found most often, again using mixed-effects models to determine which contextual effects play a significant role and which social factors distinguish linguistic behaviour.
Given the discussion in Section 5.4, we hypothesize, as in Łyskawa & Nagy (2019), that these linguistic factors will influence the selection of case marker:
Nominal type (noun or pronoun);
Canonical case for the given context (i.e., the most frequently produced form);
Case assigner (the context, verb or preposition that assigns case to the NP).
In the models, the latter two factors are treated as one complex factor, given their partial interdependence. That is, for example, if a noun is assigned case by an impersonal expression, the only case it can be assigned (canonically) is DAT. Declension class was predicted to have an effect in Łyskawa & Nagy (2019), but as it did not, it is not considered in the analyses here. The large number of classes, which differ across languages, makes it unwieldy for our comparative goals. Thus, models include two linguistic factors (Nominal type, Case assigner * Canonical case) and two or three social factors. For Case assigner * Canonical case, the default level is the context in which GEN case is prescriptively assigned by a possessive, partitive, negative, or “other” construction. This level includes possessives, one of few contexts where English overtly assigns case for both nouns and pronouns (e.g., “her book,” “the girl’s book”). A second context where English overtly assigns case is verbs assigning accusative case to their direct object, in some pronouns (e.g., “I saw her”) but not other pronouns (e.g., “I saw you”), nor nouns. A third context is where a preposition assigns case (e.g., “I showed him to her”). The full set of levels for this variable is given in Table 5.16, distinguishing the contexts that do and do not differ from English in terms of whether case is overtly marked.
Table 5.16 Levels for the predictors Nominal Type and Prescriptive case and Case assigner for (CASE)
| Predictor | English marks case? | Level |
|---|---|---|
| Nominal type | yes (sometimes) | Pronoun |
| | no | Noun |
| Canonical case and Case assigner | yes | GEN.merged |
| | | ACC.verb |
| | | ACC.preposition |
| | no | ACC.number, quantifier |
| | | ACC.other |
| | | ACC.partitive |
| | | ACC.possessive, negative |
| | | DAT.impersonal |
| | | DAT.other |
| | | DAT.possessive, negative |
| | | DAT.preposition |
| | | DAT.verb |
| | | GEN.number, quantifier |
| | | GEN.partitive |
| | | GEN.possessive, negative |
| | | GEN.preposition |
| | | GEN.verb |
| | | INS.number, quantifier |
| | | INS.other |
| | | INS.possessive, negative |
| | | INS.preposition |
| | | INS.verb |
| | | LOC.number, quantifier |
| | | LOC.other |
| | | LOC.partitive |
| | | LOC.possessive, negative |
| | | LOC.preposition |
| | | LOC.verb |
Generation is tested in heritage models, and then models with the two EO scores rather than Generation (with which they are collinear) are tested. In homeland models, Age is tested. Sex is tested in every model. Speaker is always included as a random effect to mitigate outlier effects.
We consider the social factors first as main effects and then, in separate models, as interaction effects with the linguistic predictors. This allows us to identify any significant differences in rate of case-matching among speaker groups, as well as whether the match rate changes more in certain linguistic contexts (changes in the constraint hierarchy). Comparing the models for homeland and heritage allows us to see whether heritage and homeland patterns of variation mirror each other.
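The modelling plan just described can be summarized as a small set of candidate model specifications per speaker group. The sketch below writes them out as lme4-style formula strings. The column names (`Match`, `Nominal_type`, `CaseAssigner_Case`, and the social predictors) are hypothetical labels chosen for this illustration, and crossing every social factor with every linguistic factor is an assumption about how the interaction models were built, not a detail confirmed by the text.

```python
LINGUISTIC = ["Nominal_type", "CaseAssigner_Case"]  # hypothetical column names
RANDOM = "(1 | Speaker)"  # by-speaker random intercept


def social_sets(group):
    """Social predictors per group; Generation and EO scores are never
    entered together because they are collinear."""
    if group == "homeland":
        return [["Age", "Sex"]]
    if group == "heritage":
        return [["Generation", "Sex"], ["EO_language", "EO_culture", "Sex"]]
    raise ValueError(f"unknown group: {group}")


def candidate_formulas(group, response="Match"):
    """For each social set, build a main-effects model and an interaction model."""
    formulas = []
    for social in social_sets(group):
        main = " + ".join(LINGUISTIC + social + [RANDOM])
        formulas.append(f"{response} ~ {main}")
        crossed = [f"{ling} * {soc}" for ling in LINGUISTIC for soc in social]
        formulas.append(f"{response} ~ " + " + ".join(crossed + [RANDOM]))
    return formulas


for group in ("homeland", "heritage"):
    print(group)
    for formula in candidate_formulas(group):
        print("  ", formula)
```

Under these assumptions, each homeland language has two candidate models and each heritage language has four, which are then compared to select the best fit.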
HLs lack defined standards, and thus we have no a priori certitude about their case systems’ structure. Therefore, we operationalize the dependent variable as follows. Tokens are organized by context (based on the factors listed in Table 5.16) and the majority form for each context is determined. In every context, in both heritage and homeland data, we find that the most frequently used form matches the normative form given in the reference grammars cited earlier. For each context, this is considered the “match” form and any other form is coded as “mismatch.” The majority form also matches the intuitions of the native-speaker students who coded the data. The “match” forms are thus equivalent to canonical forms. Example 14 is an example of mismatch: The normative case is genitive (времени) because it is in a time construction, but the speaker produced a NOM form. Example 15 is an instance of match: ACC is canonical and produced.
(14) Mismatch: “We didn’t have time to memorize.” (R2F12A 00:11:38)
у | нас | не | было | время | запомнить |
by | us | not | was | time | to memorize |
(15) Match: “My husband went and bought cards.” (R1F47A 00:10:55)
муж | мой | поехал | купил | карточки |
husband | my | went | bought | cards |
The dependent variable in all (CASE) models is mismatch versus match, with all models reflecting the probability of a match (between observed and canonical forms) in each context. The envelope of variation for this variable is any noun or pronoun normatively marked with ACC, GEN, DAT, INS, or LOC, that is, prescriptively non-NOM NPs. For Russian and Ukrainian, the first 100 instances of case-marked nouns and pronouns produced by each speaker (excluding normatively NOM contexts) were coded, starting ten minutes after the beginning of each recording. For Polish, the first 150 case-marked tokens of any case were initially coded (from the same starting point), and then NOM ones were excluded.
If a noun prescriptively requires a genitive suffix –a and such was observed, it was coded as match even though the same suffix could also mark the accusative. That is, we give the speakers the benefit of the doubt in contexts with syncretic forms. Where two forms compete for a particular lexical item, for example, -u and -a in Polish masculine genitive singular, both were treated as normative or match.
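The match/mismatch coding described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the project's coding script: the function name `code_match`, the context labels, and the toy tokens are all hypothetical, and the benefit-of-the-doubt treatment of syncretic or competing forms is not modelled.

```python
from collections import Counter, defaultdict

def code_match(tokens):
    """Label each (context, observed_form) token as 'match' or 'mismatch'.

    The 'match' form for a context is its most frequently observed form.
    Ties are resolved in favour of the form seen first (a simplification).
    """
    forms_by_context = defaultdict(Counter)
    for context, form in tokens:
        forms_by_context[context][form] += 1
    majority = {ctx: counts.most_common(1)[0][0]
                for ctx, counts in forms_by_context.items()}
    return [(ctx, form, "match" if form == majority[ctx] else "mismatch")
            for ctx, form in tokens]

# Invented toy data: (context label, observed case form)
toy_tokens = [
    ("GEN.negative", "GEN"), ("GEN.negative", "GEN"), ("GEN.negative", "NOM"),
    ("ACC.verb", "ACC"), ("ACC.verb", "ACC"),
]
coded = code_match(toy_tokens)
```

In this toy sample the single NOM form in a prescriptively GEN context is coded as a mismatch, parallel to Example 14.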
This initial data set has 9,666 tokens, across the three languages. Our seventy-eight speakers are distributed by language, generation, and age group, as summarized in Table 5.17. Within each cell, males and females were originally as balanced as possible. Several speakers had to be excluded from statistical analysis owing to their categorical behaviour (100 percent match). Excluded speakers in each generation are reported in Table 5.17. Nine out of twenty-eight homeland speakers are excluded, six of eighteen Gen1, and two of twenty-five Gen2. None of the seven Gen3 speakers exhibited categorical behaviour. So, at this gross level, we see an expected pattern of more variability in later generations. This is illustrated in Figure 5.14, a graph of the distribution across the whole data set, including tokens for categorically matched speakers and contexts. After excluding the seventeen speakers with categorical case-marking, 7,739 tokens remained.
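The exclusion step can be illustrated with a small sketch. The speaker IDs and counts below are invented, and `split_categorical` is a hypothetical helper name rather than part of the original analysis.

```python
def split_categorical(match_counts):
    """Separate speakers with a 100 percent match rate from variable speakers.

    match_counts maps a speaker ID to (matches, total_tokens).
    Returns two dicts mapping speaker ID to match rate.
    """
    variable, categorical = {}, {}
    for speaker, (matches, total) in match_counts.items():
        target = categorical if matches == total else variable
        target[speaker] = matches / total
    return variable, categorical

# Invented counts for three hypothetical speakers
toy_counts = {"S1": (100, 100), "S2": (93, 100), "S3": (88, 100)}
variable, categorical = split_categorical(toy_counts)
```

Only the speakers in `variable` would be passed on to the regression models.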
Table 5.17 (CASE) data sample: speaker count, by language and generation (n = 7,739)
| | Polish | Russian | Ukrainian | Total |
|---|---|---|---|---|
| Homeland | (5 excluded) | (2 excluded) | (2 excluded) | 19 (+ 9 excluded) |
| 60+ years | 2 | 2 | 4 | |
| 39–60 years | 1 | 2?¹ | 3 | 6 |
| 12–38 years | 4 | 2 | 3 | 9 |
| Gen1 | (1 excluded) | (5 excluded) | 12 (+ 6 excluded) | |
| 60+ years | 2 | 3 | 2 | 7 |
| 39–60 years | 0 | 4 | 1 | 5 |
| Gen2 | (2 excluded) | 23 (+ 2 excluded) | ||
| 60+ years | 1 | 1 | 0 | 2 |
| 39–60 years | 1 | 3 | 2 | 6 |
| 12–38 years | 6 | 7 | 2 | 15 |
| Gen3 | | | | 7 |
| 60+ years | 0 | 0 | 2 | 2 |
| 39–60 years | 0 | 1 | 1 | 2 |
| 12–38 years | 0 | 2 | 1 | 3 |
| Total | 17 | 25 | 19 | 61 (+ 17 excluded) |
1 No age data is available for these two speakers from the Russian National Corpus.

Figure 5.14 Match rates for (CASE), by language and generation (n = 9,666)
Table 5.18 reports the rate of match for each combination of canonical case and case-assigner, by language. Greyed-out cells are combinations that do not exist. “X” indicates contexts that are excluded because there is little data in homeland, heritage, or both samples (<5 tokens/cell). Two more contexts are excluded because of categorical match rates: the impersonal dative for Ukrainian and the verb-assigned dative for Polish. Some case-assigner contexts are merged for genitive, to avoid cells with too little data: The partitive, possessive, and negative case-assigners are merged with all “Other,” leaving tokens where GEN is assigned by a verb, preposition, number, or quantifier as separate levels in the variable. Table 5.18 shows that in nearly every context for which there is enough data to examine, in every language, the match rate is well above 90 percent. The exceptions are for GEN assigned by a verb in Polish and Russian, GEN assigned by a possessive or negative in Ukrainian, and INS assigned by a verb in Ukrainian. In total, we have 7,503 tokens in variable contexts. These variable contexts will be analyzed in a series of mixed effects regression models as described earlier.
Table 5.18 Rate of match for (CASE) by context and language
| Case-assigner | ACC | DAT | GEN | INS | LOC | n | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| POL | RUS | UKR | POL | RUS | UKR | POL | RUS | UKR | POL | RUS | UKR | POL | RUS | UKR | ||
| Verb | 94% | 97% | 99% | 100% | 97% | 95% | 62% | 75% | 96% | 79% | 96% | 45% | X | 3,273 | ||
| Preposition | 95% | 96% | 97% | X | 91% | 91% | 98% | 93% | 96% | 94% | 92% | 97% | 95% | 3,070 | ||
| Number, quantifier | X | NA | X | 92% | 95% | X | X | 325 | ||||||||
| impersonal | NA | X | 98% | 100% | NA | NA | NA | 213 | ||||||||
| Partitive | X | NA | merged | NA | X | 88 | ||||||||||
| Possessive, negative | X | X | 94% | 87% | 65% | X | X | 372 | ||||||||
| Other | X | X | X | X | 162 | |||||||||||
| n | 192 | 1,457 | 315 | 36 | 439 | 67 | 236 | 2,761 | 394 | 95 | 357 | 112 | 156 | 678 | 196 | 7,503 |
5.4.2 (CASE) Results, All Speakers Combined
Models testing interactions between the linguistic predictors and the social predictors were constructed. In each case, they fit the data less well than simpler models without interactions. We therefore compare the models that include the linguistic and social predictors as main effects, without interactions, along with Speaker as a random effect. These models, including all variable speakers and all variable contexts, are summarized in Table 5.19. In these models, the estimate for the intercept shows the match rate when all predictors are set to the reference level. The reference level for each predictor is in italics in the first column. The estimates for each other level indicate how much higher (positive estimates) or lower (negative estimates) the probability of producing the canonical case is in that context compared with the reference level.⁵ The number of tokens (n) and the percentage match rate for each level (context) are provided. As a reminder, seventeen categorical speakers and numerous contexts (combinations of canonical case and assigner) were excluded because of categorical behaviour or too few tokens, as listed in Table 5.18. The complete models are available at ngn.artsci.utoronto.ca/pdf/HLVC/HLVC_CASE_analysis_22Jan2023.html.
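To make the scale of the estimates concrete, the sketch below converts two values from Table 5.19 into probabilities. It assumes the estimates are log-odds from a logistic mixed model, which the z values and Pr(>|z|) columns suggest but which is an assumption here. It also ignores the Speaker random effect and holds all other predictors at their reference levels. The function name `inv_logit` is illustrative.

```python
import math

def inv_logit(x):
    """Convert a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Values copied from Table 5.19 (Polish model)
polish_intercept = 5.47   # all predictors at their reference levels
polish_gen2 = -2.54       # Gen2 estimate, relative to Homeland

# Assumed log-odds scale: add the estimate to the intercept, then transform
p_reference = inv_logit(polish_intercept)
p_gen2 = inv_logit(polish_intercept + polish_gen2)
```

Under these assumptions, a negative estimate lowers the predicted probability of a match relative to the reference level, but the predicted probability can still be high when the intercept is large.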
Table 5.19 Mixed effects models for (CASE) in three languages, all variable speakers, all variable contexts (n = 7,313, sixty-one speakers)
| Polish (n = 1590, 17 speakers) | Russian (n = 3965, 25 speakers) | Ukrainian (n = 1762, 19 speakers) | |||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Estimate | Std. Error | z value | Pr(>|z|) | n | % match | Estimate | Std. Error | z value | Pr(>|z|) | n | % match | Estimate | Std. Error | z value | Pr(>|z|) | n | % match | ||||
| Intercept | 5.47 | 1.02 | 5.35 | 0.00 | *** | 4.10 | 0.79 | 5.16 | 0.00 | *** | 3.72 | 0.56 | 6.69 | 0.00 | *** | ||||||
| Generation | |||||||||||||||||||||
| Homeland | 909 | 99% | 371 | 99% | 711 | 98% | |||||||||||||||
| Gen1 | –1.55 | 1.14 | –1.37 | 0.17 | 153 | 97% | –0.77 | 0.80 | –0.96 | 0.34 | 961 | 98% | –1.76 | 0.43 | –4.11 | 0.00 | *** | 287 | 94% | ||
| Gen2 | –2.54 | 0.80 | –3.16 | 0.00 | ** | 528 | 88% | –2.07 | 0.75 | –2.77 | 0.01 | ** | 2122 | 95% | –1.42 | 0.39 | –3.64 | 0.00 | *** | 387 | 95% |
| Gen3 | 0 | –2.57 | 0.83 | –3.12 | 0.00 | ** | 511 | 89% | –2.41 | 0.36 | –6.75 | 0.00 | *** | 377 | 89% | ||||||
| Sex | |||||||||||||||||||||
| Male | 683 | 92% | 1349 | 96% | 838 | 95% | |||||||||||||||
| Female | 0.44 | 0.71 | 0.62 | 0.53 | 907 | 98% | –0.35 | 0.38 | –0.91 | 0.36 | 2616 | 95% | –0.66 | 0.25 | –2.59 | 0.01 | ** | 924 | 94% | ||
| Nominal type | |||||||||||||||||||||
| Pronoun | 256 | 97% | 2283 | 96% | 269 | 96% | |||||||||||||||
| Noun | –0.44 | 0.46 | –0.94 | 0.35 | 1334 | 95% | –0.39 | 0.18 | –2.15 | 0.03 | * | 1682 | 94% | –0.69 | 0.42 | –1.62 | 0.10 | 1493 | 94% | ||
| Canonical case * Case assigner | |||||||||||||||||||||
| GEN.merged | 204 | 97% | 391 | 87% | 125 | 81% | |||||||||||||||
| ACC.preposition | 1.21 | 0.91 | 1.34 | 0.18 | 130 | 99% | 1.42 | 0.31 | 4.54 | 0.00 | *** | 394 | 96% | 2.10 | 0.49 | 4.25 | 0.00 | *** | 197 | 97% | |
| ACC.verb | 1.01 | 0.61 | 1.67 | 0.10 | 409 | 98% | 1.72 | 0.24 | 7.08 | 0.00 | *** | 1030 | 97% | 3.22 | 0.57 | 5.67 | 0.00 | *** | 368 | 99% | |
| DAT.impersonal | 0 | 2.49 | 0.54 | 4.58 | 0.00 | *** | 189 | 98% | 0 | ||||||||||||
| DAT.verb | 0 | 1.58 | 0.49 | 3.21 | 0.00 | ** | 157 | 97% | 1.22 | 0.71 | 1.71 | 0.09 | 48 | 94% | |||||||
| GEN.number, quantifier | 0 | 0.83 | 0.33 | 2.54 | 0.01 | * | 186 | 91% | 2.20 | 0.47 | 4.65 | 0.00 | *** | 214 | 97% | ||||||
| GEN.preposition | 0.10 | 0.60 | 0.17 | 0.87 | 280 | 96% | 2.46 | 0.34 | 7.28 | 0.00 | *** | 583 | 98% | 3.31 | 0.64 | 5.19 | 0.00 | *** | 296 | 99% | |
| GEN.verb | –2.67 | 0.57 | –4.70 | 0.00 | *** | 60 | 68% | –1.30 | 0.54 | –2.42 | 0.02 | * | 28 | 75% | 1.99 | 1.07 | 1.85 | 0.06 | 35 | 97% | |
| INS.preposition | –0.62 | 0.65 | –0.96 | 0.34 | 145 | 96% | 1.60 | 0.37 | 4.30 | 0.00 | *** | 261 | 96% | 1.79 | 0.54 | 3.34 | 0.00 | *** | 101 | 95% | |
| INS.verb | –1.38 | 0.67 | –2.05 | 0.04 | * | 46 | 85% | 1.38 | 0.55 | 2.50 | 0.01 | * | 73 | 95% | –1.09 | 0.37 | –2.98 | 0.00 | ** | 82 | 65% |
| LOC.preposition | 0.35 | 0.57 | 0.62 | 0.54 | 316 | 96% | 1.77 | 0.28 | 6.38 | 0.00 | *** | 673 | 97% | 2.20 | 0.41 | 5.39 | 0.00 | *** | 296 | 96% | |
Significant differences between homeland and heritage speakers emerge for match rate. In Polish, Gen2 speakers have a match rate that is significantly lower than that of Homeland speakers (we have no Polish Gen3 data). For Russian, this is true for Gen2 and Gen3. For Ukrainian, all three heritage generations have lower match rates than Homeland, but Gen1 differs by more than Gen2. In Ukrainian, females have a match rate significantly lower than males. There is no Sex effect in the other languages.
Among the linguistic predictors, there is again variability across the languages. Nominal type is significant only for Russian: Nouns are less likely to be produced with the canonical form than pronouns. A non-significant trend in the same direction exists in the Polish and Ukrainian data. However, the best fitting model is one without interactions – this effect is not notably different across generations.
For Canonical case * Case assigner, ACC.preposition and ACC.verb, two contexts where English also assigns case, have high match rates, significantly higher than GEN.merged in Russian and Ukrainian. However, comparing rates in these contexts in which English overtly marks case to those in which English does not overtly mark case, we find little consistency across languages. The contexts that significantly differ from GEN.merged vary by language. It is noteworthy, however, that GEN.merged has a significantly lower match rate than most other levels (those with positive estimates), even though it includes a context where English marks case (possessives). In sum, while two contexts where English marks case have higher match rates than GEN, so do many other levels.
The models discussed so far include data for all speaker groups together. We will return to the task of teasing out evidence for contact effects once we separate homeland speakers from heritage, to explore whether the contexts where there is more similarity to English behave differently for the homeland versus the heritage speakers.
Overall, speakers exhibit high match rates and cross-generational consistency. The overall accuracy (match) is 95 percent (7,115 of 7,463 tokens), after excluding categorically matched speakers and contexts (and 96 percent before). Although small, the differences between homeland and later heritage generations (Gen2, Gen3) are significant within each language. In Ukrainian, Gen1 also differs from Homeland. However, the overall match rate does not fall below 88 percent in any group (even after excluding categorically matched contexts and speakers).
Before we can compare homeland and heritage patterns in greater detail, we must first determine the best representation of the patterns in each group. To select the best model for each HL, we construct and compare four types of models. The first includes Generation and Sex as main effects, plus the two linguistic predictors (Nominal type and Canonical case * Case assigner). The second adds to those factors the interactions between the linguistic and social factors. The third includes two EO scores (EO_language and EO_culture) and Sex as main effects, plus the two linguistic predictors. The fourth adds to the third the interactions between the linguistic and social factors. In every language, the model that best fitted the data, determined by comparing AIC scores, is the model with EO scores and Sex, but no interactions.
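The comparison by AIC can be illustrated as follows. The log-likelihoods and parameter counts are invented placeholders, not values from the HLVC models, and serve only to show that a lower AIC is preferred and that extra interaction terms are penalized.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2*ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical (log-likelihood, number of parameters) for the four model types
candidates = {
    "Generation + Sex": (-410.0, 16),
    "Generation + Sex + interactions": (-405.0, 40),
    "EO scores + Sex": (-404.0, 16),
    "EO scores + Sex + interactions": (-400.0, 40),
}

scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)
```

In this invented example the interaction models have slightly higher likelihoods, but the penalty for their additional parameters leaves the simpler EO model with the lowest AIC.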
To select the best model for each homeland language, a similar process was followed, comparing models that tested Age and Sex, plus the linguistic predictors, against models with those factors plus interactions between the linguistic and social factors. Again, the best-fitting model had no interactions.
We then compare the heritage model including EO scores and the homeland model including age, for each language. We have considerably more heritage than homeland data, so we must keep in mind the possibility of more effects emerging as significant in the heritage models for that reason alone. The best-fitting MEM for each variety (homeland and heritage, for three languages) is shown in Table 5.20. In this table, homeland and heritage models for each language appear side by side. An estimate of 1.00 is added in each level that was categorically matched, although these contexts could not be included in the model. This allows for clear comparison between homeland and heritage varieties where a level had to be excluded from one group or the other.
Table 5.20 Mixed effects models for (CASE) in homeland and heritage varieties (n = 5,408)
| Estimate | Std. Error | z value | Pr(>|z|) | Estimate | Std. Error | z value | Pr(>|z|) | |||
|---|---|---|---|---|---|---|---|---|---|---|
| Homeland Polish (n = 396, 7 speakers) | Heritage Polish (n = 424, 6 speakers) | |||||||||
| Intercept | 3.77 | 1.35 | 2.79 | 0.01 | ** | 5.52 | 1.00 | 5.51 | 0.00 | *** |
| Age / EO_language | –0.02 | 0.02 | –1.04 | 0.30 | –0.99 | 0.25 | –3.95 | 0.00 | *** | |
| / EO_culture | –1.92 | 0.33 | –5.86 | 0.00 | *** | |||||
| Sex | ||||||||||
| Male | ||||||||||
| Female | 0.86 | 0.98 | 0.88 | 0.38 | ||||||
| Nominal type | ||||||||||
| Pronoun | ||||||||||
| Noun | 1.65 | 1.03 | 1.61 | 0.11 | –0.62 | 0.64 | –0.97 | 0.33 | ||
| Canonical case * Case assigner | ||||||||||
| GEN.merged | ||||||||||
| ACC.verb | 1.00 | 0.50 | 0.78 | 0.64 | 0.52 | |||||
| ACC.preposition | 1.00 | 1.34 | 1.28 | 1.05 | 0.30 | |||||
| DAT.verb | 1.00 | 1.00 | ||||||||
| DAT.impersonal | no heritage tokens | 1.00 | ||||||||
| GEN.verb | –3.51 | 1.40 | –2.51 | 0.01 | * | –3.48 | 0.80 | –4.33 | 0.00 | *** |
| GEN.preposition | 0.46 | 1.43 | 0.32 | 0.75 | –0.48 | 0.80 | –0.60 | 0.55 | ||
| INS.verb | 1.00 | –2.10 | 0.81 | –2.58 | 0.01 | ** | ||||
| INS.preposition | 1.00 | –0.49 | 1.04 | –0.47 | 0.64 | |||||
| LOC.preposition | 1.00 | –0.08 | 0.75 | –0.11 | 0.92 | |||||
| Homeland Russian (n=102, 4 speakers) | Heritage Russian (n = 2967, 18 speakers) | |||||||||
| Intercept | 3.17 | 1.05 | 3.01 | 0.00 | ** | 2.25 | 0.40 | 5.68 | 0.00 | *** |
| Age / EO_language | insufficient information | –0.27 | 0.17 | –1.65 | 0.10 |||||
| / EO_culture | –0.01 | 0.19 | –0.03 | 0.97 | ||||||
| Sex | Cannot include this factor | |||||||||
| Male | 1.00 | |||||||||
| Female | –0.19 | 0.57 | –0.33 | 0.74 | ||||||
| Nominal type | ||||||||||
| Pronoun | ||||||||||
| Noun | –0.92 | 1.31 | –0.70 | 0.48 | –0.38 | 0.19 | –2.02 | 0.04 | * | |
| Canonical case * Case assigner | ||||||||||
| GEN.merged | ||||||||||
| ACC.verb | 1.00 | 1.77 | 0.25 | 7.07 | 0.00 | *** | ||||
| ACC.preposition | 0.88 | 1.50 | 0.59 | 0.56 | 1.59 | 0.33 | 4.81 | 0.00 | *** | |
| DAT.verb | 1.00 | 1.62 | 0.50 | 3.25 | 0.00 | ** | ||||
| DAT.impersonal | 1.00 | 2.57 | 0.55 | 4.71 | 0.00 | *** | ||||
| GEN.verb | 1.00 | –1.62 | 0.60 | –2.69 | 0.01 | ** | ||||
| GEN.preposition | 0.68 | 1.45 | 0.46 | 0.64 | 2.69 | 0.36 | 7.39 | 0.00 | *** | |
| GEN.number, quantifier | 1.00 | 0.98 | 0.35 | 2.83 | 0.00 | ** | ||||
| INS.verb | 1.00 | 1.37 | 0.56 | 2.44 | 0.01 | * | ||||
| INS.preposition | 1.00 | 2.05 | 0.43 | 4.77 | 0.00 | *** | ||||
| LOC.preposition | 1.00 | 1.80 | 0.28 | 6.35 | 0.00 | *** | ||||
| Homeland Ukrainian (n= 579, 8 speakers) | Heritage Ukrainian (n = 940, 10 speakers) | |||||||||
| Intercept | 3.46 | 1.39 | 2.48 | 0.01 | * | 1.66 | 0.54 | 3.05 | 0.00 | ** |
| Age / EO_language | 0.00 | 0.01 | 0.28 | 0.78 | –0.03 | 0.14 | –0.20 | 0.84 | ||
| / EO_culture | –0.12 | 0.25 | –0.49 | 0.62 | ||||||
| Sex | ||||||||||
| Male | ||||||||||
| Female | –0.45 | 0.63 | –0.71 | 0.48 | –1.04 | 0.62 | –1.69 | 0.09 | ||
| Nominal type | ||||||||||
| Pronoun | ||||||||||
| Noun | 0.16 | 1.08 | 0.15 | 0.88 | –0.76 | 0.48 | –1.57 | 0.12 | ||
| Canonical case * Case assigner | ||||||||||
| GEN.merged | ||||||||||
| ACC.verb | 0.93 | 1.01 | 0.92 | 0.36 | 3.99 | 0.77 | 5.19 | 0.00 | *** | |
| ACC.preposition | –0.09 | 0.94 | –0.10 | 0.92 | 2.94 | 0.66 | 4.48 | 0.00 | *** | |
| DAT.verb | –1.42 | 1.59 | –0.89 | 0.37 | 1.78 | 0.84 | 2.10 | 0.04 | * | |
| DAT.impersonal | 1.00 | 1.00 | ||||||||
| GEN.verb | 1.00 | 1.86 | 1.10 | 1.69 | 0.09 | |||||
| GEN.preposition | 1.00 | 3.82 | 0.77 | 4.96 | 0.00 | *** | ||||
| GEN.number, quantifier | 0.89 | 1.25 | 0.72 | 0.47 | 2.33 | 0.51 | 4.54 | 0.00 | *** | |
| INS.verb | –1.16 | 0.95 | –1.21 | 0.23 | –0.98 | 0.44 | –2.23 | 0.03 | * | |
| INS.preposition | 0.06 | 1.25 | 0.05 | 0.96 | 2.13 | 0.60 | 3.55 | 0.00 | *** | |
| LOC.preposition | 1.19 | 1.24 | 0.96 | 0.34 | 2.35 | 0.44 | 5.37 | 0.00 | *** | |
Let us first consider the social predictors. The models with EO scores fit the data better than those with Generation as a predictor, and the models without interactions between social factors and linguistic factors fit the data better than those with interactions. This means that, within the heritage samples, differences are best accounted for by EO scores, and those are only significant in Polish. (In the models that tested Generation, there were significant differences between generations in Russian, but not in the other two languages.) The varieties are otherwise homogeneous in terms of match rates.
The set of models selected as best fitting the data also means that we have little evidence of differences in the grammar – no context where within-model groups treat the predictors differently, either within the heritage or within the homeland samples. Sex never emerges as a significant factor in these best-fitting models, so there is little indication of identity-marking. As Age is never significant in the homeland models, internal change starting in the homeland variety can also be ruled out. Both EO_language and EO_culture are significant main effects for Heritage Polish, but not the other two HLs. In both cases, the estimates are negative. This indicates that the more the speaker orients to Polish culture or reports using Polish, the lower their match rate is. This does not suggest an effect of cultural or linguistic contact.
We turn next to the linguistic factors, looking first at Nominal type. Recall the expectation that heritage speakers retain canonical case more in pronouns than nouns, given the existence of richer case-marking on English pronouns than nouns. However, no differences between homeland and heritage speakers emerge here. In Polish and Ukrainian, this factor is not significant in either heritage or homeland. Interestingly, though, the trend between groups is similar in both languages: The estimate is positive in the homeland model but negative in the heritage model. This suggests that homeland speakers select the canonical case more in nouns than pronouns, while heritage speakers do so more in pronouns than nouns, as predicted (though without a significant effect). In Russian we see a negative estimate for noun in both varieties, though the difference from pronoun is only significant in the heritage variety. This is one shred of evidence supporting English influence.
We next consider the complex factor that includes canonical case and the case assigner. The reference level for this predictor is GEN.merged, so positive estimates indicate contexts with more canonically marked tokens than in GEN.merged.
In Polish, there are six categorical contexts where every homeland speaker always produced the canonical case. In most of these contexts, the heritage speakers also have categorical match (DAT.verb) or have a positive estimate, indicating a high rate of canonical case-marking (ACC.verb, ACC.preposition). One other similarity is that both groups have large negative estimates for GEN.verb, indicating that this is a context with relatively less canonical case-assignment. There is only one context with divergent effects between the groups, out of the eight levels that are comparable: INS.verb is categorically matched by homeland speakers but has a negative estimate for heritage speakers. INS.preposition and LOC.preposition also have categorical matches among homeland speakers but negative estimates for heritage speakers. However, these estimates are small and the effects non-significant.
In Russian, eight contexts had to be excluded from the homeland analysis because of categorical match. Of these, seven have positive significant estimates in the heritage data, indicating a strong match rate there as well. Only for GEN.verb do we see a difference, where homeland speakers categorically match but heritage speakers have significantly lower match rates than for GEN.merged. We note also that every level in the heritage model differs significantly from the reference level, while no level does in the homeland model. To account for this, we might say that the heritage grammar is considerably more complex here, with speakers producing different rates of match across contexts. And we could add that the lack of EO effects indicates that this pattern is not related to language use or cultural orientation. However, it is more likely that the significance levels are affected by the fact that there are nearly 3,000 tokens in the heritage sample but only about 100 in the homeland sample (after eliminating many categorical contexts and speakers).
Ukrainian is similar to the other languages. Categorical contexts in the homeland data correspond, in the heritage data, to either categorical behaviour (for DAT.impersonal) or positive (for GEN.verb) or positive, large, and significant estimates (for GEN.preposition), all indicating high match rates. There are four contexts with the same direction of effect, compared with GEN.merged, in both varieties: positive estimates for GEN.number, INS.preposition, and LOC.preposition, and one negative estimate, for INS.verb. There are only two levels with differently-signed estimates, though the estimates are not significant for the homeland data: ACC.preposition and DAT.verb. As one of these shows a higher match rate in a context where English marks case and the other where English does not, they are considered a wash – no overall change toward or away from English is extractable from these patterns.
Comparing heritage to homeland models, then, we see only a few types and few tokens of systematic distinction. For further investigation, we compare percentage match rather than the estimates, as those depend on the match rate for GEN.merged, which differs considerably across languages (for the heritage varieties: 95 percent match for Polish and 90 percent match for Russian but 65 percent match for Ukrainian). Only five contexts show match rates below 85 percent among heritage speakers: GEN.merged and INS.verb for Ukrainian, INS.verb for Polish, and GEN.verb in Polish and Russian. GEN.verb also has a low match rate in Homeland Polish: 87 percent. The low match rate for GEN.verb in Homeland Polish suggests a change in progress. We must discount this heritage rate from any tally of non-canonical behaviour, given that it may rather be an instance of heritage speakers advancing a change that also exists in Homeland Polish. All other contexts have match rates in the high 90 percents in the homeland and heritage varieties.
Across the languages, as detailed earlier, there are five levels among the linguistic predictors with clear differences between heritage and homeland, out of thirty-three. Thus, the analysis reveals systematicity in the grammar of heritage speakers and illustrates that they operate with the same types of rules as homeland speakers. They neither exhibit a random collection of errors nor do they systematically simplify or apply “default form” rules, such as the simplest or most frequent form.
We can drill still deeper. In a smaller dataset, Łyskawa & Nagy (2019) reported a cross-tabulation of match rates across cases. They showed that, outside the many contexts that are almost categorically realized by the normative form in both homeland and heritage groups, other contexts showed variation, indicating two strategies. The first is default (NOM) replacement and the second is a specific non-NOM case used in place of the normative one. That exercise is repeated in Table 5.21 with the larger data set. Most tokens are observed with their canonical case (bolded numbers). Most non-match tokens are accounted for by the first strategy: We see tokens, for homeland and heritage speakers, in the Observed NOM and ACC columns for all canonical cases except DAT. As noted in Łyskawa & Nagy (2019), the dearth of non-canonical tokens that are prescriptively DAT counters Laskowski’s (1993) implicational hierarchy, in which DAT is the most affected case. Heritage speakers use NOM in place of every other case more often than in place of DAT. However, this may be because many DAT tokens are pronouns, which are more often matched than noun tokens.
Table 5.21 (CASE) crosstab: canonical and observed case assignment (9,661 tokens)
| Homeland (n = 3,329) | Heritage (n = 6,332) | |||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Observed Case | Observed Case | |||||||||||
| Canonical case | ACC | DAT | GEN | INS | LOC | NOM | ACC | DAT | GEN | INS | LOC | NOM |
| ACC | 1112 | 4 | 3 | 2049 | 8 | 10 | 6 | 9 | 34 | |||
| DAT | 229 | 1 | 1 | 3 | 557 | 3 | 5 | |||||
| GEN | 7 | 1 | 1083 | 1 | 66 | 5 | 1844 | 7 | 71 | |||
| INS | 9 | 1 | 315 | 2 | 3 | 4 | 1 | 3 | 538 | 9 | 47 | |
| LOC | 2 | 554 | 1 | 16 | 2 | 8 | 1000 | 27 | ||||
Furthermore, there does not seem to be wholesale substitution of NOM or ACC marking, because we also see forty heritage tokens and seven homeland tokens that are canonically ACC but assigned to another case (the italicized tokens in the ACC row) – most of these from Heritage Russian speakers. We would expect these to bear ACC (or NOM) marking if speakers had selected a preferred default case. Instead, they are produced with a variety of cases, but not frequently enough to allow inferential analysis.
Homeland speakers occasionally produce mismatches, in two contexts. The first is unique to Polish: 9 of 193 prescriptively INS tokens are realized as ACC (~5 percent). The second context is common to the three languages: 7 of 1,092 prescriptively GEN tokens are realized as ACC (~0.6 percent). Heritage speakers replicate the second homeland trend, but not the first, producing 66 of 1,995 GEN tokens as ACC. Of note, there are twenty other mismatches, scattered through 3,329 homeland tokens.
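The proportions in this paragraph follow directly from the counts quoted in the text, as the short calculation below shows. The helper name `percent` and the variable names are illustrative.

```python
def percent(count, total):
    """Return count as a percentage of total."""
    return 100.0 * count / total

# Counts quoted in the paragraph above
homeland_ins_to_acc = percent(9, 193)     # Polish INS realized as ACC
homeland_gen_to_acc = percent(7, 1092)    # GEN realized as ACC, three languages
heritage_gen_to_acc = percent(66, 1995)   # heritage GEN realized as ACC

for label, value in [("Homeland INS>ACC", homeland_ins_to_acc),
                     ("Homeland GEN>ACC", homeland_gen_to_acc),
                     ("Heritage GEN>ACC", heritage_gen_to_acc)]:
    print(f"{label}: {value:.2f} percent")
```

On these counts, the heritage rate of GEN realized as ACC is several times the homeland rate, while both remain small in absolute terms.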
5.4.3 (CASE) Summary
In variable contexts, we found match rates from 88 to 98 percent for the three heritage varieties and 98 to 99 percent for the homeland varieties (see Table 5.19). The small number of contexts (five) for which heritage and homeland varieties differ (see Table 5.20) contradicts claims of vulnerability of the case system in HLs. The biggest difference is that heritage speakers sometimes substitute NOM forms for any case, which is rare among homeland speakers. We have also seen that speakers match more with pronouns than nouns, evidence that heritage speakers fully retain the syntax of case. That is, they are aware of when to use each case, and, for pronouns, with many fewer forms to choose among than nouns, they are quite homeland-like. This replicates the cross-linguistic tendency for case to be retained more on pronouns than nouns (cf. Blake, 2001), including in English (as it has reduced case-marking over the centuries). Focussing on the entire case system (rather than just its most vulnerable areas), we see that heritage speakers use case inflection much like homeland speakers, in spontaneous speech. Table 5.22 summarizes the evidence for stability, internal or identity-marking change, and change toward or away from English, in parallel with what has been shown for (VOT) and (PRODROP). In this table, “S” indicates stability, “ToE” and “FromE” mark convergence with and divergence from English, respectively, and “I” marks differences attributed to identity-marking or internal change. Linguistic variables are defined in Table 5.16. Although the predictor Canonical Case * Case Assigner provides evidence of change toward an English-like pattern, it is important to keep in mind that an effect appears in only one or two levels (out of almost thirty) for each language.
Table 5.22 Summary of (CASE) effects
| Language | HOM age | HOM sex | HOM v G1 | HOM v G2 | G1 v G2 | EO_lang | EO_cult | HER sex | |
|---|---|---|---|---|---|---|---|---|---|
| POL | rate | S | S | S | ToE | S | FromE | FromE | S |
| Nom Type | S | S | S | S | S | S | S | S | |
| CaseAssigner | S | S | ToE | ToE | S | S | S | S | |
| RUS | rate | S | S | S | ToE | FromE | S | S | S |
| Nom Type | S | S | ToE | ToE | S | S | S | S | |
| CaseAssigner | S | S | ToE | ToE | ToE | S | S | S | |
| UKR | rate | S | S | ToE | ToE | S | S | S | S |
| Nom Type | S | S | S | S | S | S | S | S | |
| CaseAssigner | S | S | ToE | ToE | S | S | S | I | |
Two final points noted in Łyskawa and Nagy (2019) bear repeating. First, these findings contrast with earlier findings such as the 87 percent mismatch rate reported by Polinsky (1995) for one type of construction, but align with Anstatt’s (2011, 2013) study showing maintenance in Heritage Russian and Polish case systems for speakers in Germany.
Second, GEN-ACC substitution is a completed historical change in Russian but apparently ongoing in Polish and Ukrainian, suggesting that HLs evolve along parallel lines with majority languages, as noted by Glovinskaja et al. (2001).
5.5 Summary of Cross-Variety Comparisons
What have we learned from comparison of samples of spontaneous speech produced by homeland and several generations of heritage speakers? First, the phonetic variable (VOT), while showing variation across groups, provides little evidence to support interpreting these patterns as contact-induced or as signs of incomplete acquisition of HL patterns. Second, the stability in rates and conditioning effects on (PRODROP) across generations contrasts with findings for this variable in experimental paradigms (cf. Polinsky & Scontras, 2020a, and other articles in the same issue). This variable also suggests considerably more similarity than difference between homeland and heritage speakers. Third, our findings suggest robust stability in most parts of the case-marking system and a general lack of simplification and overgeneralization. Again, data from the (CASE) analysis of three languages stand in contrast to claims in the experimental literature (same issue, especially Montrul & Mason, 2020).













