
Shifting rules across generations: Variable subject expression in the Canberra Vietnamese heritage language

Published online by Cambridge University Press:  02 January 2026

Li Nguyen*
Affiliation:
Linguistics and Multilingual Studies, Nanyang Technological University, Singapore
Jasper Sim
Affiliation:
English Language & Literature, National Institute of Education, Nanyang Technological University, Singapore
*Corresponding author: Li Nguyen; Email: li.nguyen@ntu.edu.sg

Abstract

This paper examines subject expression in heritage Vietnamese, focusing on its variation in a diasporic, cross-generational context, using corpus data from 45 speakers in Canberra, Australia. While subject expression has been widely studied in other languages, little is known about its use in languages like Vietnamese, which has an “open-class” pronominal system. Results show that although the rates of unexpressed subjects remain stable, the linguistic conditions underlying this variable have undergone change: first-generation speakers are least likely to drop second-person subjects, while second-generation speakers are least likely to drop first-person subjects. Both patterns contradict expectations given the pragmatic constraints of pro-drop in Vietnamese. We further interpret this as potentially a form of community bricolage to re-establish a more equal cross-generational relationship in a diaspora setting. Ultimately, we present a case of pragmatic change driving grammatical choices, thereby also highlighting that contrary to the traditional description, Vietnamese subject expression is perhaps not so “radical” after all.

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press.

Introduction

This paper investigates cross-generational variable subject expression (otherwise known as “null subject”) in heritage Vietnamese. Subject pronouns in Vietnamese are not a closed class, but rather are productively derived from a system of kin terms, personal pronouns, and personal names (cf. Song & Nguyen, 2022). These forms can be expressed or unexpressed, as Example (1) illustrates.

Of all the options, kin terms are the most commonly used in speech. While personal pronouns and proper names in Vietnamese have been said to imply “a lack of deference and high degree of arrogance toward the addressee and/or third-party pronominal referent of superior age” (Ngo, 2006:4), kin terms are said to show a “very deep concern for respect and good feeling” among the interlocutors (Clark, 1988:21). As such, younger speakers are obliged to use kin terms rather than proper names and personal pronouns when speaking to or about their seniors. This is somewhat similar to the honorific system in Japanese but marks a striking difference from languages like English or Chinese, where pronouns are neutral.

Our study is motivated by several key observations. First, subject expression in Vietnamese is pragmatically conditioned by factors such as formality, intimacy, and social relationships, dimensions that are often less clearly defined in diaspora contexts than in the homeland (Tuc, 2003:27-28). Second, there is no straightforward correspondence between the pronoun systems of the contact languages, Vietnamese and English, complicating transfer and alignment. Finally, although Vietnamese is commonly classified as a language with “radical null subjects” (Biberauer et al., 2010), the linguistic and social factors underlying this variability remain underexplored. This study, therefore, seeks to address the following research questions:

  1. What are the linguistic and social factors that significantly condition variable subject expression in heritage Vietnamese?

  2. Is there a difference across generations concerning such conditions?

To address these questions, we first sketch out the linguistic and cultural landscape of the community and then review previous work on subject expression in different contexts. We then describe our data and approach before presenting our results and discussion.

The Canberra Vietnamese community

The community of interest here is the Vietnamese migrant community in Canberra, Australia, a young yet well-established community. The subjects of investigation are late bilingual immigrants, whom we refer to as first-generation speakers (Gen1), and early bilinguals raised in Canberra, whom we refer to as second-generation speakers (Gen2). It is worth making it clear that both Gen1 and Gen2 are considered Vietnamese “heritage language speakers” in the present work. Although different researchers still have different definitions of what the “heritage” component entails, most agree that a heritage language is not a result of any linguistic “deficit” but rather a complete system on its own (e.g., Aalberse et al., 2019; Polinsky, 2018). For immigrant early-bilingual heritage language speakers (Gen2) in particular, the sole focus on divergence from a monolingual baseline can be rather meaningless, given that the input for their heritage language acquisition may come from the late bilinguals (Gen1) who are themselves outside their monolingual milieu (Polinsky, 2018:15-17; Polinsky & Scontras, 2020:7; Sim, 2023:2344; Sim & Post, 2024:1290-1293). In this sense, a study of an immigrant heritage language is not just a study of early bilinguals per se, but is in fact an enquiry into the transition from Gen1 to Gen2 speakers.

Studies in language contact have shown that the linguistic landscape of a migrant community is likely to be affected by several distinct characteristics, most notably: the circumstances of arrival; the age at arrival; and the level of integration into the host society. Due to the political tension in Australia following the fall of Saigon in 1975, the Vietnamese community here mainly built their lives by clustering with each other, setting up family businesses and services in the Vietnamese/Chinese-dense suburbs where they did not have to communicate in English on a regular basis. Their need to congregate has been reported to be a result of “on the one hand, experiences of racism and social exclusion in Australia, and on the other, the desire to be close to compatriots and to rebuild a sense of community” (Carruthers, 2008:102). It is thus no surprise that Vietnamese is particularly well-maintained within this diaspora (Ben-Moshe & Pyke, 2012).

After the continuous arrival of Vietnamese refugees, a new generation of Vietnamese migrants began to arrive in Australia in the mid-1990s, primarily made up of international students, entrepreneurs, and specialist workers (Australian Bureau of Statistics, 2022). The Vietnamese language used in Australia is thus a combination of Old Saigon Vietnamese, maintained mostly by refugees, and modern Vietnamese homeland varieties from different sources, primarily spoken by new migrants.

In Canberra, although the population is still largely English-dominant (71.3% of locals speak only English at home), Vietnamese remains the third most widely spoken heritage language at home (n = 4082), after Mandarin Chinese (n = 12,149) and Nepali (n = 5689). Members of the Vietnamese community in Canberra, like many other inhabitants of the city, are relatively young, highly paid, and highly educated. In contrast to the densely populated cities of Sydney or Melbourne, where Vietnamese speakers tend to cluster in neighborhoods and are employed in nonspecialist or family businesses, most Vietnamese speakers in Canberra work in education or the public sector, or have a partner doing so (Australian Bureau of Statistics, 2022).

Members of the community engage in regular interactions with each other at a prominent Vietnamese language school in Dickson, North Canberra. Participation extends beyond children and parents to include teachers and administrative staff, often international students or retired community members. Existing members frequently introduce new Vietnamese speakers, fostering new friendships and expanding the community. These school-based interactions form a strong practical and emotional network that extends beyond the classroom, supporting a range of community-bonding activities such as charity stalls, weekly choir practice, karaoke nights, variety shows, and lễ phát phần thưởng ‘traditional end-of-year award ceremonies.’ These shared experiences play a key role in building and maintaining the social fabric of the group (Milroy, 1987:109-138; Wenger, 1998:72-73).

Although no studies have specifically examined Canberra-Vietnamese attitudes toward the homeland, a national survey of 466 Vietnamese speakers in Australia found that while 88% identified as Vietnamese, only 51.5% felt “close” or “very close” to Vietnam. A significant minority (34%) expressed ambivalence, and a smaller group reported feeling “distant” or “very distant” (Ben-Moshe & Pyke, 2012:30). This emotional distance was echoed in a follow-up study five years later, which reported “very weak ties to Vietnam” across various domains, including travel, political engagement, and remittances (Baldassar et al., 2017:938).

Such emotional detachment may also have linguistic implications. Despite high reported levels of Vietnamese language proficiency (90% of respondents claimed strong skills), the so-called Tiếng Việt Cộng Sản ‘Communist Vietnamese variety’ remains a contentious issue. It was even the focus of three consecutive sessions on the Australian Vietnamese Radio Network (Nguyen, 2012:87). Overall, while Vietnamese identity and language use remain strong in the diaspora, they are maintained as distinct from the homeland.

Null subjects

Subject expression in Vietnamese

Vietnamese was previously classified as a radical null subject language (Biberauer et al., 2010:8), that is, a language that permits the omission of pronominal forms without verbal agreement of any kind. Other languages that also display this behavior include Chinese, Japanese, Korean, and Thai, as well as some others concentrated in Asia and Africa. The key trait that distinguishes Vietnamese from other radical pro-drop languages is that anaphoric reference in Vietnamese can be established not only by reduced pronominal forms but also by kinship terms, as we previously saw in Example (1).

The rationale for treating kin terms as pronouns in Vietnamese rests on two key observations, as outlined in Ngo (2019). First, when used pronominally, kin terms lose their literal kinship meanings and instead index social features such as gender and relative age. For example, in Example (1b), chị does not mean ‘sister’ but signals that speaker B is female and older than speaker A. Second, these pronominal kin terms can appear in bare form or with a demonstrative (e.g., ông ấy, literally ‘that old male person’), where the demonstrative functions anaphorically, referring to a discourse antecedent with the appropriate social attributes. These features set kin terms apart from ordinary lexical noun phrases, which typically lack social indexing and cannot be modified by demonstratives in this way.

Similarly, Vietnamese personal names, when used pronominally, differ from their use in many other languages, where they typically serve as fixed third-person referents akin to lexical NPs. In Vietnamese, however, personal names can flexibly index first, second, or third person, as well as relative age and social status, functioning much like pronominal kin terms. They are also highly productive in discourse, frequently occupying pronominal positions. In contrast, lexical NPs in Vietnamese lack this grammatical flexibility and do not carry the same discourse prominence.

This rich indexicality of pronominal subjects in Vietnamese imposes additional pragmatic constraints on their use in discourse.[Footnote 2] Notably, it is generally considered inappropriate for younger or lower-status speakers to omit first- and second-person pronominal forms (Nguyen, 1997:96). Ton (2018:201) supported this by reporting that 98.5% (n = 208) of subject drops involving terms of address in her study occurred either between speakers of the same generation or from older to younger speakers. Likewise, Le (2011:284) observed in a study of 64 natural utterances that second-person kin terms, used as forms of address, must be overtly expressed to convey proper respect. These Vietnamese-specific pragmatic norms are crucial conditioning factors that need to be accounted for in data analysis.

It is important to note, however, that this pragmatic constraint can be alleviated in several ways. Specifically, in spoken Vietnamese, the politeness markers dạ (utterance initial), vâng (utterance initial, Northern varieties), or ạ (utterance final) are often used to offset first-person pro-drop by younger generations. This practice is demonstrated in Example (2):

Here, the 2SG pronominal form em produced by Speaker A indicates that Speaker B is younger/socially inferior to Speaker A. Although Speaker B dropped the 1SG pronominal form in her response to A, the construction would still be considered appropriate, as the discourse marker dạ offsets the load for politeness. It is also crucial to note that this politeness-offset mechanism only works for first-person pro-drop. Second-person subjects strongly resist being dropped by younger/lower-ranked speakers, even in the presence of politeness markers of all kinds (Nguyen, 1997:211). As such, we can expect that second-person subjects are least likely to be left null by Gen2 speakers.

Previous work on null subjects across generations

Successful transmission of subject pronouns appears to be variety-specific, as shown by contrasting results across different contact varieties (cf. Silva-Corvalán, 1994:163 for Spanish; Margaza & Bel, 2006:421-425 for Greek; Sorace & Filiaci, 2006:353-356 for Italian). For example, Otheguy et al. (2007) found that Spanish speakers who arrived in New York City after age 16 and lived there less than six years produced significantly more null subject pronouns than New York City-born second-generation speakers. They attributed this to widespread bilingualism among the second generation, accompanied by reduced Spanish proficiency and use (Otheguy et al., 2007:795). In contrast, Torres Cacoullos and Travis (2018:127) found continuity rather than change when comparing contemporary Spanish in New Mexico with an earlier stage of the same variety.

Nagy (2015) also highlighted variety-specific patterns by comparing Cantonese, Italian, and Russian speakers across three generations in Toronto, Canada. For Cantonese and Italian, no significant difference in null subject use appeared between homeland-born (Gen1) and Toronto-born (Gen2 + 3) speakers. However, for Russian, Nagy identified two cross-generational changes:

  1. a reordering of grammatical person hierarchy affecting null subject likelihood, from third- > second- > first-person in Gen1 to second- > first- > third-person in Gen2; and

  2. negation emerging as a significant predictor for null subjects in Gen2 but not Gen1.

Unlike those reported by Otheguy et al. (2007), these changes were unrelated to Russian usage frequency or English conditioning factors, suggesting internal cross-generational change independent of language contact (Nagy, 2015:320).

In the Vietnamese context, Tuc (2003:121) noted that while kin terms generally convey solidarity and respect, pronouns like tao ‘I’ and mày ‘you’ can express either hostility or solidarity depending on the relationship between speakers. These relationships are shaped by social networks and structures, which tend to be less clearly defined in diaspora settings than in the homeland. Nguyen (2018) further highlighted this complexity by documenting consistent use of Vietnamese kin terms in code-switching discourse for self- and interlocutor-reference. Crucially, despite some nuanced differences in interpretation, both first- and second-generation speakers regard pronominal kin terms as markers of their “Vietnamese-ness,” using these forms to index their cultural identity. This underscores not only the pragmatic weight carried by Vietnamese pronominal forms but also the speakers’ metalinguistic awareness.

Data and method

The data for this study come from the Canberra Vietnamese-English Corpus (CanVEC; Nguyen & Bryant, 2020), which contains natural speech from 45 Vietnamese-English bilingual speakers (21 men, 24 women) in Canberra, Australia, spanning two generations. First-generation speakers (Gen1) are the first in their families to emigrate, having lived in Vietnam until at least age 18 and residing continuously in Canberra for at least 10 years. Gen1 speakers in the sample range in age from 28 to 67, covering several waves of immigration: some are refugees who fled the Vietnam War, and some are recent economic migrants. Gen2 speakers are the children of Gen1 migrants and were either born in Australia or arrived before the age of five. The age limit for second-generation participants was to ensure early exposure to English-speaking communities before formal schooling and minimal time spent in Vietnam (cf. Hoffman & Walker, 2010:44).[Footnote 3] Table 1 summarizes the demographic make-up of CanVEC.

Table 1. CanVEC speakers’ demographic information

Data collection

The primary principle in collecting data for CanVEC was to capture vernacular speech with “minimum attention paid to speech” (Labov, 1984:29). Participants used their mobile phones to self-record a 30-minute conversation or two 15-minute conversations with Vietnamese-English bilinguals from their social networks. No instructions were given regarding language choice. Interlocutors were people the participants normally spoke to casually, such as friends, colleagues, or family members. This approach is a major strength of the corpus, as it maximizes the likelihood of natural language use, which is especially important given the complex pragmatics of subject expression. This resulted in 23 recordings in total, most of which are intergenerational conversations (n = 16, 70%); both Gen1 (n = 22, 96%) and Gen2 (n = 17, 73%) speakers are present in most recordings.

After submitting their recordings, participants completed a questionnaire designed to collect data on independent variables relevant to variation in their speech. This included self-rated proficiency in Vietnamese and English, as well as details about their social networks, language attitudes, and other factors. The recordings were then manually transcribed, anonymized, and semi-automatically annotated for language identification and part-of-speech tags (Nguyen & Bryant, 2020). The resulting corpus includes monolingual English, monolingual Vietnamese, and code-switched segments produced by the same speakers. This study focuses on the monolingual Vietnamese portion (n = 7508 clauses), where subject expression shows significant variation.

Extracting and coding for subject expression in Vietnamese

Identifying the variable context

Recall that Vietnamese pronominal forms are not a closed class but derive from kin terms, personal pronouns, or speakers’ names. Since all three categories are productively used for self-, interlocutor-, or third-party reference in the corpus, it is essential first to identify the relevant variable contexts.

Lexicalized set phrases with limited variability were excluded, such as thôi kệ ‘just ignore/leave it’; nếu mà nói là ‘if I/you say that’ (n = 5); and common polite phrases like cảm ơn ‘thank you’ and xin lỗi ‘sorry.’ Nonhuman subjects, typically lexical NPs (n = 722, 12.5%), and cases where the subject was ambiguous in person, number, or coreferentiality (n = 24, 0.4%) were also excluded. In instances of subject repetition or repair (n = 45, 0.8%), only the final occurrence was counted.

Additionally, the third-person singular nó can function as a neutral pronoun or a nonobligatory expletive. Since this study focuses on referential subjects, expletive uses of nó (e.g., Nó cứ thấy thế nào ấy ‘It feels a bit odd’; n = 15, 0.3%) were excluded. Similarly, mình can refer to a nonspecific first-person plural (‘we’, similar to English generic ‘you’; n = 15) or a specific first-person plural including speaker and interlocutor (n = 249). Only the specific mình instances were included.

In total, 14.5% of the data (837 out of 5,781 clauses) were excluded based on these criteria. Table 2 presents an overview of the rate of null and expressed subjects by types in the final dataset.
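The reported exclusion shares can be checked with a few lines of arithmetic. The sketch below is only an illustration in Python (not the authors' own tooling); it uses the counts stated in the text, and the variable and function names are hypothetical.

```python
# Counts as reported in the text; 5,781 is the clause total before exclusions.
TOTAL_CLAUSES = 5781
EXCLUDED_TOTAL = 837

# Categories for which the text gives both a count and a percentage.
reported = {
    "nonhuman subjects": 722,
    "ambiguous person/number/coreferentiality": 24,
    "expletive third-person singular": 15,
}


def share(count, total=TOTAL_CLAUSES):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {label: share(n) for label, n in reported.items()}
excluded_share = share(EXCLUDED_TOTAL)        # share of all clauses excluded
retained = TOTAL_CLAUSES - EXCLUDED_TOTAL     # clauses left after exclusions

for label, pct in shares.items():
    print(f"{label}: {pct}%")
print(f"excluded overall: {excluded_share}%, retained: {retained}")
```

The rounded shares agree with the percentages given in the text, and 4,944 clauses remain after the exclusions.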

Table 2. Cross-generational distribution of Vietnamese expressed subjects

As we can see, speakers produce expressed subjects nearly 70% of the time—a striking figure for a radical pro-drop language. We will revisit this finding in the Discussion. For now, the key point is that there is no significant difference in overall rates of expressed subjects between first- and second-generation speakers. However, as noted earlier, while the overall rates remain stable, the conditioning factors influencing subject expression may be changing. To investigate this, we turn to multivariate analyses that examine the combined effects of linguistic and extra-linguistic factors.

Coding linguistic factors

Person-Number: Grammatical person and number have consistently been presented as two of the strongest conditioning factors for subject expression. For instance, first-person is reportedly the most commonly expressed subject pronoun in Spanish (Bayley & Pease-Alvarez, 1997:363), European Portuguese (Barbosa et al., 2005:22), and Mandarin Chinese (Jia & Bayley, 2002:110), while second- and third-person are the most frequent overt pronouns in Russian (Nagy et al., 2011:142), Brazilian Portuguese (Barbosa et al., 2005:22), and Santomean Portuguese (Bouchard, 2018:17). In Vietnamese, however, we have previously seen that such norms vary with interlocutor and their associated status. The presence of these different discourse conventions means that, for grammatical person-number at least, instead of looking for absolute universals, we need to consider variety-specific patterns that are currently in play (Schroter, 2019:29).

Crucially, Vietnamese pronominal forms do not map neatly onto grammatical person-number. This is again largely due to the kinship-based system, where a single form can shift reference depending on the speaker and discourse context—for instance, the pronominal form con in Example (3) may refer to second-person singular when used by Tanner but first-person singular when used by Nina. Note that every CanVEC example presented features a transcript name (e.g., Tanner.Nina.0609) and a timestamp, with the subscript accompanying the speaker name indicating their generation membership (1 = Gen1, 2 = Gen2). English is given in regular print, while all non-English morphemes are given in italics.

The coding of person-number in Vietnamese thus relies entirely on the interpretation of each use within the wider discourse context. Thanks to the rich conversational data in the CanVEC corpus, this interpretation is straightforward. In this study, we code person and number as separate variables to isolate their individual effects.
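To make the coding principle concrete, here is a minimal Python sketch of how person could be derived from discourse roles rather than from the surface form. It is not the authors' procedure: the `Token` record, its field names, and the assumption that each token already carries a manually annotated referent are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    form: str        # surface pronominal form, e.g. a kin term
    speaker: str     # who produced the clause
    addressee: str   # who the clause is addressed to
    referent: str    # who the form picks out (assumed manual annotation)


def code_person(token: Token) -> int:
    """Return grammatical person (1, 2, or 3) from discourse roles, not from the form."""
    if token.referent == token.speaker:
        return 1
    if token.referent == token.addressee:
        return 2
    return 3


# The same form receives different person codes depending on who uses it.
by_tanner = Token(form="con", speaker="Tanner", addressee="Nina", referent="Nina")
by_nina = Token(form="con", speaker="Nina", addressee="Tanner", referent="Nina")
print(code_person(by_tanner), code_person(by_nina))  # prints: 2 1
```

The point of the design is that the form itself never enters the decision: only the relation between referent, speaker, and addressee does, which is why the annotation has to come from the wider discourse context.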

Clause type: Cross-linguistic research also shows that clause type influences subject expression in many languages, including English (Harvie, 1998:21), Spanish (Torres Cacoullos & Travis, 2018:91-94), Russian (Nagy et al., 2011:142), and Chinese (Jia & Bayley, 2002:111; Li et al., 2012:112). In Cantonese, for example, main clauses tend to favor null subjects, whereas conjoined clauses tend to disfavor them (Nagy et al., 2011:141). Given that Vietnamese falls into the same group of “radical pro-drop” as Cantonese, we might expect comparable patterns.

Coreferentiality: This concept relates to the well-established notion of referent accessibility. Since Givón’s (1983) work on topic continuity, numerous studies have shown that more accessible referents require less explicit coding: unexpressed forms represent the lowest coding effort and thus signal high accessibility. Here, accessibility is operationalized as coreferentiality, defined by whether the subject in the current clause shares the same referent as the subject in the preceding clause, regardless of the speakers. This approach reflects the discourse-dependent, co-constructed nature of reference. Examples (4) and (5) illustrate the same and switched referents, respectively.

Finally, cases of partial coreferentiality (e.g., we → I) were simply marked as if they were fully coreferential, mainly because these cases are extremely rare in the corpus (n = 5), and a separate treatment becomes too fine-grained.
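The coreferentiality coding described above can be sketched as a single pass over the clauses in discourse order. The following Python snippet is an illustrative assumption rather than the authors' implementation: the `Clause` record and the labels "same" and "switch" are hypothetical, and subjects are represented as sets of individuals so that partial overlap (e.g., we → I) is treated as coreferential, as the text specifies.

```python
from typing import FrozenSet, List, NamedTuple, Optional


class Clause(NamedTuple):
    speaker: str
    subject: FrozenSet[str]  # individuals the subject refers to (assumed annotation)


def code_coreferentiality(clauses: List[Clause]) -> List[Optional[str]]:
    """Label each clause by whether its subject shares a referent with the
    subject of the immediately preceding clause, regardless of speaker."""
    labels: List[Optional[str]] = []
    previous: Optional[Clause] = None
    for clause in clauses:
        if previous is None:
            labels.append(None)              # no preceding clause to compare with
        elif clause.subject & previous.subject:
            labels.append("same")            # full or partial overlap counts as coreferential
        else:
            labels.append("switch")
        previous = clause
    return labels


conversation = [
    Clause("A", frozenset({"A", "B"})),  # 'we'
    Clause("A", frozenset({"A"})),       # 'I' (partial overlap with 'we')
    Clause("B", frozenset({"A"})),       # B addresses A: same referent, new speaker
    Clause("B", frozenset({"C"})),       # third party: switched referent
]
print(code_coreferentiality(conversation))
# prints: [None, 'same', 'same', 'switch']
```

Because the comparison ignores who is speaking, a second-person subject produced by one speaker counts as coreferential with a preceding first-person subject produced by the other, which reflects the co-constructed view of reference taken here.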

Coding extra-linguistic factors

Data for extra-linguistic factors were extracted from the questionnaire responses containing speakers’ information on age, gender, primary language of the social network, self-assessed proficiency in each language, attitude toward each language, and speakers’ ethnic orientation, all of which have been shown to shape language variation (e.g., Kiesling, 2009; Labov, 1972, 1984; Milroy & Milroy, 1992).

Furthermore, given the honorific indexicality of Vietnamese pronominal forms, pragmatic constraints are likely to have an effect. Although it is not possible to conclusively define situational “respect” or “politeness,” the obvious factors here are the age gap between speakers and their respective social statuses. This pragmatic constraint was thus operationalized as “Interlocutor’s Age” and “Interlocutor’s Generation.” “Interlocutor’s Age” was coded as a binary factor: older/younger (in relation to the speaker themselves).[Footnote 4]

Statistical modeling

Generalized linear mixed-effects (GLMM) modeling was conducted using the lme4 package (Bates et al., 2015) in R (version 4.2.1; R Core Team, 2022). Subject expression was treated as a binary response variable (null/dropped = 0, expressed = 1). The random effect structure was kept maximal for Speaker, as justified by the data. Categorical predictors were weighted effect-coded to account for the unbalanced sample sizes (Darlington & Hayes, 2017:298-300; te Grotenhuis et al., 2017:163-166), while the only continuous predictor, speaker age, was z-standardized.
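For readers unfamiliar with weighted effect coding, the following Python sketch illustrates the idea alongside z-standardization. It is a standalone illustration, not the R code used in the study: the function names and the toy data are hypothetical. Under weighted effect coding, observations in the omitted level receive the negative ratio of the coded level's count to the omitted level's count, so that each coded column sums to zero over the sample even when group sizes are unequal.

```python
from collections import Counter
from statistics import mean, stdev


def weighted_effect_codes(values, omitted):
    """Return one row of contrast codes per observation.

    For each non-omitted level k: 1 if the observation is in level k,
    -n_k / n_omitted if it is in the omitted level, and 0 otherwise.
    """
    counts = Counter(values)
    levels = [lv for lv in sorted(counts) if lv != omitted]
    rows = []
    for v in values:
        row = {}
        for lv in levels:
            if v == lv:
                row[lv] = 1.0
            elif v == omitted:
                row[lv] = -counts[lv] / counts[omitted]
            else:
                row[lv] = 0.0
        rows.append(row)
    return rows


def z_standardize(xs):
    """Center on the mean and scale by the sample standard deviation."""
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]


# Toy, deliberately unbalanced data (hypothetical values).
generation = ["Gen1", "Gen1", "Gen1", "Gen2"]
codes = weighted_effect_codes(generation, omitted="Gen2")
ages = z_standardize([28.0, 45.0, 67.0, 19.0])

print([row["Gen1"] for row in codes])  # prints: [1.0, 1.0, 1.0, -3.0]
```

Because each coded column sums to zero across the sample, coefficients can be read as deviations from the sample-weighted grand mean rather than from the unweighted mean of group means, which is the motivation for using this scheme with unbalanced data.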

To identify significant predictors and to achieve a more parsimonious model, likelihood ratio tests were conducted by comparing the full model (with all predictors included) to separate restricted models, each excluding only one predictor at a time. Only predictors that significantly improved model fit (that is, those that contributed meaningfully to explaining the outcome based on model comparisons) were included in the best-fitting model. Interaction terms were further investigated using the emmeans package (Lenth, 2018). The fixed and random effects included in the models are described in the next section.
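The logic of a likelihood ratio test can be sketched in a few lines. The Python snippet below is an illustration only and does not reproduce the study's models: the log-likelihood values are hypothetical, and the chi-square tail probability is implemented only for 1 and 2 degrees of freedom, where simple closed forms exist.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (closed forms for df = 1, 2)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError("this sketch only covers df = 1 or df = 2")


def likelihood_ratio_test(loglik_full, loglik_restricted, df):
    """Compare a full model with a restricted model that drops one predictor.

    df is the number of parameters removed in the restricted model.
    """
    statistic = 2 * (loglik_full - loglik_restricted)
    return statistic, chi2_sf(statistic, df)


# Hypothetical log-likelihoods for a full model and a model without one
# three-level predictor (two parameters removed).
stat, p = likelihood_ratio_test(loglik_full=-2400.0, loglik_restricted=-2405.0, df=2)
print(f"chi-square = {stat:.1f}, p = {p:.4f}")
```

A small p-value indicates that dropping the predictor significantly worsens the fit, so the predictor is retained; otherwise the simpler model is preferred.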

Multicollinearity was assessed by cross-tabulation, adjusted Generalized Variance Inflation Factor (GVIF), and inspection of regression diagnostics. Due to high collinearity with ethnic orientation, attitudinal and self-assessed proficiency variables for Vietnamese and English were removed. Given the cultural load of subject expression in Vietnamese, ethnic orientation was deemed a more relevant predictor.

Finally, since clause type is expected to correlate to some degree with person (e.g., imperatives and interrogatives are more likely to occur with second-person subjects) and with Generation (e.g., more imperatives are likely to be produced by Gen1 speakers), we examine the distribution of these three factors in the dataset, as shown in Table 3. It is clear from the table that the counts for non-main clauses become relatively low when subdivided by person and generation. Including them in the analysis could introduce issues of non-orthogonality and confound interpretation. We thus proceeded with the analyses focusing on main clauses only.

Table 3. Distribution of Clause Type by Person and Generation (percentages reflect distribution by clause type)

Variable subject expression in the Canberra-Vietnamese community

Focusing on main clauses only, regression analysis was next performed to ascertain whether there are generational differences in subject expression, controlling for the various linguistic and social factors as described. The list of fixed effects included in the maximal model can be found in Table 4. The random effect structure in the maximal and best-fitting models included a random intercept for Speaker and a by-speaker random slope for Person.

Table 4. Fixed effects included in the maximal generalized linear mixed-effect model, including number of observations per level of each categorical variable, contrast weights, and the justifications for the interaction terms

Note: There were only five tokens of 2PL, which were therefore removed from the dataset. For this reason, an interaction term between Number and Person could not be included in the maximal model.

In the best-fitting GLMM (n = 4197, conditional R² = 0.19), the main effects of Number (χ²(1) = 65.6, p < 0.0001) and the two-way interaction between Generation and Person (χ²(2) = 18.1, p < 0.0001) significantly improved model fit. The adjusted GVIF values for all fixed effects in the final model were below 1.02. As Table 5 illustrates, the main effects of Number and Person, and the two-way interaction between Generation and Person, were significant predictors. Compared to the weighted grand mean, null subjects were significantly more likely in singular subjects but less likely in second-person subjects.[Footnote 5]

Table 5. Summary of fixed effects from the best-fitting generalized linear mixed model with Subject expression as response (0 = null, 1 = expressed) and random intercept for Speaker

Note: Syntax of best-fitting model: glmer(Subject ~ Number + Generation*Person + (1 + Person | Speaker)). Values in bold are statistically significant.

We then probed the interaction between Generation and Person. The plot of the marginal effects of this two-way interaction is presented in Figure 1. Pairwise comparisons within generations revealed that for Gen1 speakers, second-person subjects (% null = 20.1) were least likely to be null compared to first-person (% null = 33.4; OR = 2.10, SE = 0.49, z = 3.20, p = 0.004) and third-person subjects (% null = 31.1; OR = 1.87, SE = 0.46, z = 2.54, p = 0.03). Contrastingly, for Gen2 speakers, first-person subjects (% null = 22.6) were least likely to be null compared to third-person (% null = 38.0; OR = 0.34, SE = 0.12, z = -3.20, p = 0.004) and second-person subjects (% null = 23.7; OR = 0.73, SE = 0.33, z = -0.72, p = 0.75).
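As a rough consistency check on the pairwise comparisons above, the z statistics can be approximately recovered from the reported odds ratios and standard errors. The sketch below is illustrative only: it assumes that each SE is reported on the odds-ratio scale, so that the SE of the log odds ratio is SE/OR by the delta method. This assumption is not stated in the text, and the function name `z_from_or` is hypothetical. Because the inputs are rounded to two decimals, the recovered values can only approximate the reported ones.

```python
import math

def z_from_or(odds_ratio: float, se_or: float) -> float:
    """Approximate Wald z from an odds ratio and its SE on the OR scale.

    Assumption (not from the source): SE(log OR) = SE(OR) / OR (delta method).
    """
    se_log_or = se_or / odds_ratio
    return math.log(odds_ratio) / se_log_or

# (pair label, OR, SE, reported z) copied from the pairwise comparisons in the text.
# Labels only name the pair; the direction of each ratio follows the source.
comparisons = [
    ("Gen1, second- vs. first-person", 2.10, 0.49, 3.20),
    ("Gen1, second- vs. third-person", 1.87, 0.46, 2.54),
    ("Gen2, first- vs. third-person", 0.34, 0.12, -3.20),
    ("Gen2, first- vs. second-person", 0.73, 0.33, -0.72),
]

recovered = {label: z_from_or(or_, se) for label, or_, se, _ in comparisons}

for label, or_, se, z_reported in comparisons:
    print(f"{label}: recovered z = {recovered[label]:.2f}, reported z = {z_reported:.2f}")
```

With these rounded inputs, the recovered values fall within about 0.15 of the reported z statistics, which is in line with the assumption made here about how the standard errors are scaled.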

Figure 1. Marginal means of the interaction between Generation and Person for subject expression, with error bars representing 95% confidence intervals.

In summary, while there was no overall difference in the rate of unexpressed subjects, generational differences were observed in the conditioning factors (see Footnote 6):

  • Gen1 speakers were least likely to drop second-person subjects: first ≈ third > second; whereas

  • Gen2 speakers were least likely to drop first-person subjects: third ≈ second > first

In other words, the hierarchy of the Person constraints has been reordered across generations when it comes to unexpressed subjects.
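The reordering can be illustrated descriptively from the null-subject percentages reported in the pairwise comparisons. The sketch below uses raw percentages as a proxy, whereas the ranking in the text rests on model-based odds ratios and derived probabilities (Footnote 6). It also does not test whether adjacent ranks differ significantly, and the function name `rank_by_null_rate` is illustrative.

```python
# Percentage of null subjects by person, as reported in the pairwise comparisons
null_rates = {
    "Gen1": {"first": 33.4, "second": 20.1, "third": 31.1},
    "Gen2": {"first": 22.6, "second": 23.7, "third": 38.0},
}

def rank_by_null_rate(rates: dict) -> list:
    """Order grammatical persons from most to least frequently null."""
    return sorted(rates, key=rates.get, reverse=True)

rankings = {gen: rank_by_null_rate(rates) for gen, rates in null_rates.items()}

for gen, order in rankings.items():
    print(f"{gen}: {' > '.join(order)}")
```

The descriptive ordering places second-person subjects last for Gen1 and first-person subjects last for Gen2, matching the direction of the hierarchies summarized above.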

Discussion: subject expression in heritage Vietnamese

In much of the existing literature, a significant linguistic predictor, especially one with reordered factor rankings, is often taken as evidence of cross-generational change in speakers’ grammar or competence (cf. Nagy et al., 2011). However, in Vietnamese, subject expression carries culturally embedded meaning, making the notion of “grammatical change” less clear-cut. We therefore turn to the role of pragmatic norms in explaining the observed cross-generational variation in subject omission.

The peculiar direction of person effects: pragmatic shifts driving grammatical choices

As our analyses revealed, while the likelihood of subject omission is similar across generations, the types of subjects omitted differ. Specifically, Gen1 speakers were most likely to drop first-person subjects, whereas Gen2 speakers were more likely to drop second-person subjects. This contrast is exemplified in Example (6).

This specific direction of effects runs counter to expectations. In fact, we anticipated that Gen2 speakers, due to age/status differences compared to Gen1, would be more likely to express second-person subjects. Our findings are thus particularly striking for second-person subjects.

To further probe this, we examine the patterning of different expressed forms across generations, as demonstrated in Table 6.

Table 6. Distribution of subject expression by Type, Person, and Generation

As shown, kin terms are widely used for all person subjects across both generations. For second-person subjects in particular, Gen2 speakers not only exhibit a higher rate of null subjects than Gen1 speakers (24% versus 20%), but also show a higher rate of kin term usage (75% versus 52%) and a much lower rate of personal names (2.0% versus 28%). This pattern suggests that the key pragmatic difference lies in the avoidance of personal names.

While this may align with expectations for younger speakers in family settings, it is important to note that we cannot conclusively determine whether the observed behavior is driven by age or by generational status. This is because all instances of second-person forms produced by Gen2 speakers in the corpus occur in conversations with Gen1 interlocutors who are also older, as shown in Figure 2.

Figure 2. Distribution of second-person subject types by Speaker’s Generation, Interlocutor’s Generation, and Interlocutor’s Age.

We can also observe from Figure 2 that, in contrast to the Gen2 pattern of avoiding personal names, Gen1 speakers show a clear tendency to express second-person subjects rather than leave them null. Specifically, Gen1 speakers consistently use overt forms, even when addressing interlocutors who are younger, from a later generation, or both. This pattern may reflect a degree of cultural assimilation into Australian society. Given that most speakers in Canberra are highly educated, they have likely formed more diverse social networks through their jobs and education, providing exposure to broader communicative norms. These experiences may foster the development of distinct linguistic styles (cf. Le Page & Tabouret-Keller, 1985), or what Eckert (2004:42) described as a community bricolage, where individuals draw on a range of resources to create new or reinterpreted meanings. In this context, the use of overt second-person subjects when addressing younger or later-generation speakers may serve to index an identity as “modern” Vietnamese, that is, those who engage more equally with socially “lesser” interlocutors, thereby subtly distancing themselves from traditional homeland norms, shaped in part by past emotional experiences.

Such emotional distance could manifest in linguistic distance in various forms. Since Gen1 speakers also serve as linguistic input for Gen2 speakers, it follows that the pragmatic norms of explicitly expressing second-person subjects toward older interlocutors may not have been properly transmitted and acquired by the second generation. It is also probable that the bricolage was innovated by Gen2 speakers themselves, and by frequently dropping second-person subjects directed at older speakers, these younger speakers are trying to reject the Vietnamese social hierarchy entrenched in the language, thereby establishing a more equal relationship with the older generation (see Footnote 7).

Ultimately, it is important to emphasize that the reduced use of second-person pronominal subjects among Gen2 speakers cannot be explained by an increased reliance on alternative politeness strategies. As previously discussed, while politeness markers may reduce the need for an explicit first-person subject, they do not have the same effect on second-person subject drop. Moreover, such markers are relatively rare in our corpus (n = 55), and most appear in idiomatic expressions that were already excluded from the analysis. Taken together, these observations support the interpretation that the observed pattern of second-person subject drop is primarily driven by a shift in speakers’ pragmatic norms. This interpretation aligns with earlier research in Australia (e.g., Clyne, 2003:1-19), as well as with Sharma’s (2011:484) study of second-generation British Asians, which found that this group’s linguistic practices across various contexts reflect a gradual but systematic move toward the norms of the dominant society.

Lack of Coreferentiality effects

Another result that is worth commenting on is the lack of the Coreferentiality effect in conditioning subject drop in heritage Vietnamese. For avoidance of doubt, we present Tables 7 and 8. Table 7 shows the proportion of coreferential subjects across different subject types, along with pairwise comparisons from a logistic regression model in which Coreferentiality is the response variable (non-coreferential = 0, coreferential = 1) and Subject Type is the fixed effect. Table 8 displays the rates of subject expression in coreferential versus non-coreferential contexts, again restricted to main clauses to control for potential clause-type effects.

Table 7. Distribution of Coreferentiality (no/yes) by Subject Type (top) and pairwise comparisons of Subject Type predicting Coreferentiality (bottom)

Table 8. Distribution of Coreferentiality by Person in Vietnamese main clauses

As Table 7 demonstrates, there is no clear division of labor between null and expressed subjects across subject types, and no particular subject type appears significantly more sensitive to Coreferentiality as a conditioning factor. Likewise, Table 8 demonstrates that the rates of expressed and unexpressed subjects are remarkably similar, regardless of whether the context is coreferential or not. Taken together, these results suggest that the absence of a Coreferentiality effect is not due to collinearity or interactions with Subject Type. Rather, the absence of such an effect appears to be genuine.

Although the absence of a Coreferentiality effect may seem to contradict cross-linguistic trends (e.g., Torres Cacoullos & Travis, 2018; Frascarelli, 2018; Jia & Bayley, 2002; Nagy et al., 2011; Owens et al., 2013), it aligns with Huang’s (1984) account of Mandarin Chinese, where null objects, not subjects, are more closely tied to discourse topics. In fact, later experimental work by Ngo (2019) also found that Vietnamese speakers favor object reference in pronoun resolution, challenging the widely observed subject preference bias (cf. Chafe, 1976). Ngo (2019) also showed that both null and overt pronouns in Vietnamese are equally sensitive to grammatical role and structural parallelism, factors central to determining coreferentiality (see Footnote 8). She further concluded that Vietnamese patterns more like Chinese than like Italian or Spanish, where a clearer division of labor in coreferential functions typically exists (Ngo, 2019:23). Given that variationist studies on subject expression have largely been dominated by languages of the latter type, it is perhaps unsurprising that coreferentiality plays a weaker role in the present work (see also Jia & Bayley, 2002:109 on Mandarin; Nagy et al., 2011:210 on Cantonese).

Has Vietnamese co-evolved with speakers’ English?

Given that we have seen signs of evolution of Canberra Vietnamese unexpressed subjects across generations, an important question to ask is whether this has evolved in tandem with the speakers’ English. Although we do not have homeland data to confirm or refute contact effects (e.g., Torres Cacoullos & Travis, 2018; Nagy, 2015), comparing the patterns between those directly in contact still allows us to gauge the extent to which these languages interact and influence each other.

A close look at the English subset shows that the distribution of expressed subjects is near categorical. In fact, out of more than 2,500 English clauses, there were only 39 instances of unexpressed subjects (Gen1 = 25/1380; Gen2 = 14/1202). All of these are either 2SG within an imperative clause (Example 7) or within a conjoined clause with or without an overt conjunction (Examples 8 and 9, respectively).
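To make the near-categorical distribution concrete, the counts reported above can be converted into rates. The following sketch uses only the figures given in the text; the variable and function names are illustrative.

```python
# Counts of unexpressed subjects in the English subset, as reported in the text:
# generation -> (unexpressed subjects, total English clauses)
english_counts = {
    "Gen1": (25, 1380),
    "Gen2": (14, 1202),
}

def null_rate(null_count: int, total: int) -> float:
    """Percentage of clauses with an unexpressed subject."""
    return 100 * null_count / total

rates = {gen: null_rate(n, total) for gen, (n, total) in english_counts.items()}

total_null = sum(n for n, _ in english_counts.values())
total_clauses = sum(total for _, total in english_counts.values())
overall_rate = null_rate(total_null, total_clauses)

for gen, rate in rates.items():
    print(f"{gen}: {rate:.1f}% unexpressed")
print(f"Overall: {total_null}/{total_clauses} = {overall_rate:.1f}% unexpressed")
```

On these counts, unexpressed subjects account for under 2% of English clauses in each generation.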

Two conclusions can therefore be made:

  1. pronominal subjects are almost always expressed in the speakers’ English; and

  2. when an unexpressed subject occurs, it occurs in the expected environment.

This is in stark contrast with the pattern of unexpressed subjects in the speakers’ Vietnamese, both in terms of frequency and linguistic distribution. Further observations from the code-switching subset of the data also show that subject-drop grammars remain clearly separated within stretches of single-language use. Examples (10) and (11) provide some illustrations.

As we can see, across both generations, Vietnamese subjects are left null largely in Vietnamese stretches. In contrast, English subjects are consistently expressed in English segments. This separation is clear despite the close proximity of the two languages in discourse. The observation is thus consistent with Nagy’s (2015) conclusions for heritage Cantonese, Italian, and Russian in Toronto, as well as Torres Cacoullos and Travis’s (2018) work on New Mexican Spanish. The consensus is that the underlying grammar of subject expression in English and the substrate varieties remains separate, despite the highly bilingual nature of the communities and their sustained contact.

A note on overt subject expression

Before concluding, we would like to return to the observation in Table 2 about the distinction between null and overt forms in the corpus. Specifically, speakers’ preference for expressed forms over unexpressed forms is striking, accounting for at least 70% of all instances in the variable context. This counters the typical assumption of low rates of subject expression in “discourse pro-drop” languages. Although part of this high proportion of expressed subjects might be attributed to the “extended use” of overt forms in contact scenarios (e.g., Montrul, 2002 et seq.; Otheguy et al., 2007; Polinsky, 2018 et seq.; Rothman, 2009; Silva-Corvalán, 1994; Sorace & Filiaci, 2006 et seq.; Tsimpli et al., 2003; i.a.), let us not forget several pragmatic constraints that condition Vietnamese unexpressed subjects even in a monolingual variety. It is worth noting then that, to date, most of the existing work on pro-drop languages has only focused on discourse conditions (e.g., coreferentiality, ambiguity, distance from the previous mention, etc.; see Frascarelli, 2018; Nagy et al., 2011; Owens et al., 2013; i.a.) rather than pragmatic factors that regulate the expression of subjects. This highlights a gap in our understanding of Vietnamese-type pro-drop languages, where established discourse factors such as coreferentiality play less of a role (cf. Ngo, 2019), but pragmatic factors such as politeness, age, and perceived social status are considered more important (cf. Song & Nguyen, 2022; Song et al., 2023).

More broadly, our findings here also corroborate those of Torres Cacoullos and Travis (2018), who previously showed how Polish and European Portuguese (both null-subject languages) are closer to Japanese and Korean (both considered “discourse pro-drop” languages), respectively, in terms of subject expression rates, than they are to each other. The expression (or lack thereof) of subjects in the so-called “radical pro-drop” languages is thus perhaps not so radical after all. This invites further typological inquiry into subject expression, especially in light of recent quantitative studies that have challenged such traditional typologies (cf. Torres Cacoullos & Travis, 2019). The findings also highlight the importance of understanding social context in developing theoretical work.

Conclusion

In this paper, we investigated subject expression across generations in the Vietnamese community living in Canberra, Australia. Cross-generational effects were detected in relation to Person as a conditioning factor: while unexpressed subjects were most favored with first-person subjects for Gen1 speakers, they were most favored with second-person subjects for the Gen2 speakers. Given that Gen2 speakers are socially obliged to overtly realize forms referring to their older interlocutors (which is the case for all but one Gen2 in the corpus), this specific direction of effects runs counter to expectations.

Given the community background and the consistent patterns observed across the corpus, we interpret this finding as a possible reflection of the speakers’ cultural integration into Australian society—more specifically, a form of community bricolage to promote more egalitarian relationships between generations in the diaspora. Although it remains to be seen whether the observed “change” reflects a temporary shift in community norms or a lasting development in the heritage language, the consistent patterns found suggest that this behavior has taken hold within the community. Ultimately, the findings reported here present a case where pragmatic norms can, in fact, drive grammatical changes in a language-contact situation.

Linguistically, we offer two key findings. First, the overwhelmingly high rate of expressed subjects across both generations in this work calls into question the classification of Vietnamese as a “radical” pro-drop language. Second, we find that coreferentiality is not a significant predictor of null subject use in Vietnamese, challenging a widely observed cross-linguistic trend in this area of research. We account for these results by drawing on prior theoretical and experimental work on Vietnamese and Mandarin Chinese as a typologically similar language.

On a broader scale, we emphasize the importance of studying under-described heritage languages and the communities where they are spoken. Heritage language research is still an emerging field, and as Polinsky and Scontras (2020:13) noted, “we have barely scraped the surface” of its rich empirical landscape. The lack of data from a diverse range of communities and language varieties has only exacerbated this problem (Stanford, 2016:528). It is our hope that this study, which explored a lesser-documented heritage language like Vietnamese within an atypical community such as Canberra, represents a step toward addressing this gap.

Acknowledgements

We thank the editors and three anonymous reviewers for their helpful feedback.

Competing interest declarations

The author(s) declare none.

Footnotes

1. We thank Mark Brunelle for providing these examples upon request, drawn from recordings of southern Vietnamese conversations to which he gave us access in 2020. While not part of our corpus, they represent naturally occurring Vietnamese speech and are included here to illustrate how Vietnamese subjects can be either expressed or omitted, independently of our own data. Brunelle’s project is funded by Social Sciences and Humanities Research Council of Canada grant 435-2012-0468.

2. Recent research has also demonstrated that these same constraints shape how pronominal reference is handled in Vietnamese-English code-switching, with measurable effects on machine translation quality (Nguyen et al., 2023).

3. It is important to stress that generation membership is not necessarily age-correlated in the context of this study. The decision to not group younger speakers together as “Gen2” is justified on the basis that, both culturally and linguistically, migrants arriving as adults (as refugees or for economic reasons) have more in common with each other than with those born in Australia or who arrived as young children. Best efforts were made to recruit an equal number of first- and second-generation speakers; however, the nature of the data collection method and its emphasis on natural speech meant that it was very difficult to perfectly balance the dataset.

4. Speaker’s Age was modeled as a continuous variable to avoid arbitrarily drawing a line in the population, but it was not necessary to do so for the “Interlocutor’s Age.” The only element that has pragmatic relevance here is whether the interlocutor is deemed older or younger than the speaker, hence the binary coding older/younger for this factor.

5. A reviewer pointed out that the apparent lack of an effect from the interlocutor’s age may not be genuine, but rather obscured by a generational effect. We thus replaced Generation with Interlocutor Age in the best-fitting model to ascertain if this might be the case. We found that Interlocutor Age as a main effect was not significant in this model, but the two-way interaction between Interlocutor Age and Person was significant. However, pairwise comparisons using estimated marginal means (Bonferroni-adjusted) did not reveal any meaningful findings: when the interlocutors were of the same age, third-person pronouns were more likely to be null compared to second-person pronouns (OR = 15.63, SE = 13.13, z = 3.27, p = 0.003). We also found that younger interlocutors were more likely to express third-person subjects compared to when the interlocutors were of the same age (OR = 0.15, SE = 0.11, z = 2.60, p = 0.03).

6. Ranking was achieved by using odds ratios and derived probabilities as proxy for effect size.

7. Although “Interlocutor’s Generation” did not emerge as a significant predictor in the multivariate analysis, this does not mean that this factor is not linguistically significant. As Torres Cacoullos and Travis (2018:8) pointed out, other than the general susceptibility of statistics to sample size, it is also the case for speech data that overall rates fluctuate due to extra-linguistic factors such as interviewer, topic, or genre. The American Statistical Association (ASA) therefore also cautions that “statistical significance is thus not equivalent to scientific, human, or economic significance” (ASA Statement, Wasserstein & Lazar, 2016).

8. Within CanVEC, we also have results to show that coreferentiality is indeed a strong predictor for null objects, but not null subjects, in heritage Vietnamese (Nguyen & Sim, forthcoming).

References

Aalberse, Suzanne, Backus, Ad, & Muysken, Pieter. (2019). Heritage languages: A language contact approach. Amsterdam: John Benjamins Publishing Company. https://doi.org/10.1075/sibil.58.
Australian Bureau of Statistics. (2022). Census 2021: Australian capital territory. Technical Report. Canberra, ACT: ABS.
Baldassar, Loretta, Pyke, Joanne, & Ben-Moshe, Danny. (2017). The Vietnamese in Australia: Diaspora identity, intra-group tensions, transnational ties and ‘victim’ status. Journal of Ethnic and Migration Studies 43(6):937-955. https://doi.org/10.1080/1369183X.2016.1274565.
Barbosa, Pilar, Duarte, Maria Eugênia L., & Kato, Mary Aizawa. (2005). Null subjects in European and Brazilian Portuguese. Journal of Portuguese Linguistics 4(2):11-52. https://doi.org/10.5334/jpl.158.
Bates, Douglas, Mächler, Martin, Bolker, Ben, & Walker, Steve. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software 67(1):1-48. https://doi.org/10.18637/jss.v067.i01.
Bayley, Robert & Pease-Alvarez, Lucinda. (1997). Null pronoun variation in Mexican-descent children’s narrative discourse. Language Variation and Change 9(3):349-371. https://doi.org/10.1017/S0954394500001964.
Ben-Moshe, Danny & Pyke, Joanne. (2012). The Vietnamese diaspora in Australia: Current and potential links with the homeland. Report of an Australian Research Council Linkage Project. Technical Report. Canberra: The Australian Research Council.
Biberauer, Theresa, Holmberg, Anders, Roberts, Ian, & Sheehan, Michelle. (2010). Parametric variation: Null subjects in minimalist theory. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511770784.
Bouchard, Marie-Eve. (2018). Subject pronoun expression in Santomean Portuguese. Journal of Portuguese Linguistics 17(1):5. https://doi.org/10.5334/jpl.191.
Carruthers, Ashley. (2008). Vietnamese. In The dictionary of Sydney. Sydney. https://dictionaryofsydney.org/entry/vietnamese.
Chafe, Wallace. (1976). Givenness, contrastiveness, definiteness, subjects, topics, and point of view. In Li, C. N. (ed.), Subject and topic. New York: Academic Press. 25-56.
Clark, Marybeth. (1988). Vietnamese language and culture. In Tran, T. N. N., Nguyen, H., & Le, L. (eds.), Vietnamese language and attitudes towards personal relations. South Australia: Vietnamese Community in Australia. 21-25.
Clyne, Michael. (2003). Dynamics of language contact: English and immigrant languages. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511606526.
Darlington, Richard B. & Hayes, Andrew F. (2017). Regression analysis and linear models: Concepts, applications, and implementation. New York: Guilford Press.
Eckert, Penelope. (2004). The meaning of style. In Chiang, W. F., Chun, E., Mahalingappa, L., & Mehus, S. (eds.), Symposium About Language and Society, SALSA 11. Austin: Texas Linguistics Forum 47. 41-53.
Frascarelli, Mara. (2018). The interpretation of pro in consistent and partial null-subject languages: A comparative interface analysis. In Cognola, F. & Casalicchio, J. (eds.), Null subjects in generative grammar: A synchronic and diachronic perspective. Oxford: Oxford University Press. 211-239.
Givón, Talmy. (1983). Topic continuity in discourse: An introduction. In Givón, T. (ed.), Topic continuity in discourse: A quantitative cross-linguistic study. Amsterdam: John Benjamins Publishing Company. 1-41. https://doi.org/10.1075/tsl.3.
Harvie, Dawn. (1998). Null subject in English: Wonder if it exists? Cahiers Linguistiques d’Ottawa 16:15-25.
Hoffman, Michol F. & Walker, James A. (2010). Ethnolects and the city: Ethnic orientation and linguistic variation in Toronto English. Language Variation and Change 22(1):37-67. https://doi.org/10.1017/S0954394509990238.
Huang, C.-T. James. (1984). On the distribution and reference of empty pronouns. Linguistic Inquiry 15(4):531-574.
Jia, Li & Bayley, Robert. (2002). Null pronoun variation in Mandarin Chinese. University of Pennsylvania Working Papers in Linguistics 8(3):103-116.
Kiesling, Scott. (2009). Style as stance: Stance as the explanation for patterns of sociolinguistic variation. In Di Paolo, M. & Yaeger-Dror, M. (eds.), Stance: Sociolinguistic perspectives. Oxford: Oxford University Press. 171-194. https://doi.org/10.1093/acprof:oso/9780195331646.003.0008.
Labov, William. (1972). Some principles of linguistic methodology. Language in Society 1(1):97-120. https://doi.org/10.1017/S0047404500006576.
Labov, William. (1984). Field methods of the project on linguistic change and variation. In Baugh, J. & Sherzer, J. (eds.), Language in use: Readings in sociolinguistics. Englewood Cliffs, NJ: Prentice Hall. 28-53.
Le, Phuc Thien. (2011). Transnational variation in linguistic politeness in Vietnamese: Australia and Vietnam. Doctoral dissertation, Victoria University, Melbourne.
Le Page, Robert Brock & Tabouret-Keller, Andrée. (1985). Acts of identity: Creole-based approaches to language and ethnicity. Cambridge: Cambridge University Press.
Lenth, Russell. (2018). Estimated marginal means, aka least-squares means. R package.
Li, Xiaoshi, Chen, Xiaoqing, & Chen, Wen-Hsin. (2012). Variation of subject pronominal expression in Mandarin Chinese. Sociolinguistic Studies 6(1):91-119. https://doi.org/10.1558/sols.v6i1.91.
Margaza, Panagiota & Bel, Aurora. (2006). Null subjects at the syntax-pragmatics interface: Evidence from Spanish interlanguage of Greek speakers. In M. G. O’Brien, C. Shea, & J. Archibald (eds.), Proceedings of the 8th Generative Approaches to Second Language Acquisition Conference. Somerville, MA: Cascadilla Press. 88-97.
Milroy, Lesley. (1987). Language and social networks. Oxford: Basil Blackwell.
Milroy, Lesley & Milroy, James. (1992). Social network and social class: Toward an integrated sociolinguistic model. Language in Society 21(1):1-26. https://doi.org/10.1017/S0047404500015013.
Montrul, Silvina. (2002). Incomplete acquisition and attrition of Spanish tense/aspect distinctions in adult bilinguals. Bilingualism: Language and Cognition 5(1):39-68. https://doi.org/10.1017/S1366728902000135.
Nagy, Naomi. (2015). A sociolinguistic view of null subjects and VOT in Toronto heritage languages. Lingua 164:309-327. https://doi.org/10.1016/j.lingua.2014.04.012.
Nagy, Naomi, Aghdasi, Nina, Denis, Derek, & Motut, Alexandra. (2011). Null subjects in heritage languages: Contact effects in a cross-linguistic context. University of Pennsylvania Working Papers in Linguistics 17(2):135-144.
Ngo, Binh. (2019). Vietnamese pronouns in discourse. PhD dissertation, University of Southern California.
Ngo, Binh. (2020). Vietnamese: An essential grammar. New York: Routledge. https://doi.org/10.4324/9781315454610.
Ngo, Thanh. (2006). Translation of Vietnamese terms of address and reference. Translation Journal 10(4):Online publishing. https://translationjournal.net/journal/38viet.htm. Accessed May 2020.
Nguyen, Dinh Hoa. (1997). Vietnamese. Amsterdam: John Benjamins Publishing Company. https://doi.org/10.1075/loall.9.
Nguyen, Li. (2018). Borrowing or code-switching? Traces of community norms in Vietnamese-English speech. The Australian Journal of Linguistics 38(4):443-466. https://doi.org/10.1080/07268602.2018.1510727.
Nguyen, Li & Bryant, Christopher. (2020). Canvec – The Canberra Vietnamese-English codeswitching natural speech corpus. In Proceedings of the 2020 International Conference on Language Resources and Evaluation. Marseille, France: Language Resources and Evaluation. 4121-4129.
Nguyen, Li, Mayeux, Oliver, & Yuan, Zheng. (2023). Code-switching input for machine translation: A case study of Vietnamese–English data. International Journal of Multilingualism 21(4):2268-2289. https://doi.org/10.1080/14790718.2023.2224013.
Nguyen, Thy Tan Lan. (2012). Code choice in the Vietnamese community in Sydney. Doctoral dissertation, Australian National University.
Otheguy, Ricardo, Zentella, Ana Celia, & Livert, David. (2007). Language and dialect contact in Spanish in New York: Toward the formation of a speech community. Language 83(4):770-802. https://doi.org/10.1353/lan.2008.0019.
Owens, Jonathan, Dodsworth, Robin, & Kohn, Mary. (2013). Subject expression and discourse embeddedness in Emirati Arabic. Language Variation and Change 25(2):255-285. https://doi.org/10.1017/S0954394513000173.
Polinsky, Maria. (2018). Heritage languages and their speakers. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781107252349.
Polinsky, Maria & Scontras, Gregory. (2020). Understanding heritage languages. Bilingualism: Language and Cognition 23(1):4-20. https://doi.org/10.1017/S1366728919000245.
R Core Team. (2022). R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing. https://www.R-project.org/. Accessed December 2024.
Rothman, Jason. (2009). Understanding the nature and outcomes of early bilingualism: Romance languages as heritage languages. International Journal of Bilingualism 13(2):155-163. https://doi.org/10.1177/1367006909339814.
Schroter, Verena. (2019). Null subjects in Englishes: A comparison of British English and Asian Englishes. Berlin: de Gruyter. https://doi.org/10.1515/9783110649260.
Sharma, Devyani. (2011). Style repertoire and social change in British Asian English. Journal of Sociolinguistics 15:464-492. https://doi.org/10.1111/j.1467-9841.2011.00503.x.
Silva-Corvalán, Carmen. (1994). Language contact and change: Spanish in Los Angeles. Oxford: Clarendon Press.10.1093/oso/9780198242871.001.0001CrossRefGoogle Scholar
Sim, Jasper Hong. (2023). Influence of bilingualism or caregiver input? Variation in VOT in simultaneous bilingual preschoolers in Singapore. In R. Skarnitzl & J. Volín (eds.), Proceedings of the 20th International Congress of Phonetic Sciences. Prague: Guarant International. 2344-2348.
Sim, Jasper Hong & Post, Brechtje. (2024). Influence of caregiver input and language experience on the production of coda laterals by English–Malay bilingual preschoolers in multi-accent Singapore. Journal of Child Language 51(6):1290-1315. https://doi.org/10.1017/S0305000923000375.
Song, Chenchen & Nguyen, Li. (2022). Noncanonical pronouns in Vietnamese and Chinese. Journal of the South-East Asian Linguistics Society Special Publication No. 8:207-232.
Song, Chenchen, Nguyen, Li, & Biberauer, Theresa. (2023). Alternative pronominal items: Noncanonical pronouns in Chinese, Vietnamese, and Afrikaans. In Patterson, L. (ed.), The Routledge handbook of pronouns. New York: Routledge. 148-164. https://doi.org/10.4324/9781003349891-13.
Sorace, Antonella & Filiaci, Francesca. (2006). Anaphora resolution in near-native speakers of Italian. Second Language Research 22(3):339-368. https://doi.org/10.1191/0267658306sr271oa.
Stanford, James. (2016). A call for more diverse sources of data: Variationist approaches in non-English contexts. Journal of Sociolinguistics 20(4):525-541. https://doi.org/10.1111/josl.12190.
te Grotenhuis, Manfred, Pelzer, Ben, Eisinga, Rob, Nieuwenhuis, Rense, Schmidt-Catran, Alexander, & König, Roman. (2017). When size matters: Advantages of weighted effect coding in observational studies. International Journal of Public Health 62(1):163-167. https://doi.org/10.1007/s00038-016-0901-1.
Ton, Thoai Nu-Linh. (2018). Ellipsis of terms of address and reference in casual communication events in Vietnamese. Language and Linguistics 19(1):196-208. https://doi.org/10.1075/lali.00007.ton.
Torres Cacoullos, Rena & Travis, Catherine E. (2018). Bilingualism in the community: Codeswitching and grammars in contact. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781108235259.
Torres Cacoullos, Rena & Travis, Catherine E. (2019). Variationist typology: Shared probabilistic constraints across (non-)null subject languages. Linguistics 57(3):653-692. https://doi.org/10.1515/ling-2019-0011.
Tsimpli, Ianthi, Sorace, Antonella, Heycock, Caroline, & Filiaci, Francesca. (2003). Subjects in L1 attrition: Evidence from Greek and Italian near-native speakers of English. In B. Beachley, A. Brown, & F. Conlin (eds.), Proceedings of the 27th Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press. 787-797.
Tuc, Ho-Dac. (2003). Vietnamese-English bilingualism: Patterns of code-switching. London: Routledge.
Wasserstein, Ronald L. & Lazar, Nicole A. (2016). The ASA’s statement on p-values: Context, process, and purpose. The American Statistician 70(2):129-133. https://doi.org/10.1080/00031305.2016.1154108.
Wenger, Etienne. (1998). Communities of practice: Learning, meaning, and identity. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511803932.
Table 1. CanVEC speakers’ demographic information

Table 2. Cross-generational distribution of Vietnamese expressed subjects

Table 3. Distribution of Clause Type by Person and Generation (percentages reflect distribution by clause type)

Table 4. Fixed effects included in the maximal generalized linear mixed-effect model, including number of observations per level of each categorical variable, contrast weights, and the justifications for the interaction terms

Table 5. Summary of fixed effects from the best-fitting generalized linear mixed model with Subject expression as response (0 = null, 1 = expressed) and random intercept for Speaker

Figure 1. Marginal means of the interaction between Generation and Person for subject expression, with error bars representing 95% confidence intervals.

Table 6. Distribution of subject expression by Type, Person, and Generation

Figure 2. Distribution of second-person subject types by Speaker’s Generation, Interlocutor’s Generation, and Interlocutor’s Age.

Table 7. Distribution of Coreferentiality (no/yes) by Subject Type (top) and pairwise comparisons of Subject Type predicting Coreferentiality (bottom)

Table 8. Distribution of Coreferentiality by Person in Vietnamese main clauses