22.1 Introduction
Early bilinguals are individuals who hear two languages regularly as young children. Some will have exposure to both languages from birth, or shortly thereafter, while others will initially hear only one language (L1) with the second-learned language (L2) introduced at a later stage during childhood. The former are referred to as simultaneous bilinguals, the latter as early sequential (or successive/consecutive) bilinguals. Simultaneous bilinguals hear both languages in the home, while early sequential bilinguals typically hear their L1 in the home and the L2 in the community and/or in educational settings (see Paradis, Genesee, & Crago, 2021). Early bilinguals are raised in a range of social settings, including heritage-language environments, bilingual societies, and language revitalization contexts.
In the first few years of life, children undergo monumental cognitive and biological changes, while increasingly interacting with their physical, social, and linguistic world. This period has been shown to be highly critical for speech development (e.g., DeCasper & Spence, 1986). Thus, infants, whether monolingual or bilingual, start out as language-general perceivers, capable of discriminating both native and non-native sound contrasts. However, over time they gradually attune to the speech patterns of their native language(s), and become selective listeners, thereby losing the ability to discriminate contrasts that do not occur in their environment (Bosch & Sebastián-Gallés, 2003; Werker & Tees, 1984; see also Chapter 12, this volume). There is increasing evidence that the patterns established during this period are deeply ingrained and may be carried over into adulthood (e.g., Au et al., 2008; Choi, Cutler, & Broersma, 2017; Hyltenstam et al., 2009). For example, the accents of bilinguals with early linguistic experience are often similar to, albeit not usually identical with, those of monolingual speakers of the language, while those of late bilinguals, who lack early experience with their L2, are usually noticeably influenced by their L1 (e.g., Au et al., 2008; Flege, Schirru, & MacKay, 2003; Flores & Rato, 2016; Guion, 2003; Kupisch et al., 2014; Lloyd-Smith, Einfeldt, & Kupisch, 2020). Nevertheless, these early patterns are not immutable and can be overridden under certain circumstances.
Thus, even individuals who acquire an L2 only in adolescence and adulthood, and hence have early experience only in their native language, may undergo attrition in their L1 accent, in particular in long-term L2-immersion settings (e.g., de Leeuw, Schmid, & Mennen, 2010; de Leeuw, Tusha, & Schmid, 2017; Kornder & Mennen, 2021a, 2021b; Mayr, Price, & Mennen, 2012; Mayr, Sánchez, & Mennen, 2020; see also Chapter 32, this volume).
This chapter aims to explore the phonetic and phonological characteristics of early bilinguals and to consider explanations for the observed patterns. In so doing, it will be broad in its scope and include methodologically diverse studies on speech perception and production that focus on various segmental and suprasegmental areas of pronunciation, as well as global accent-rating experiments. Moreover, it will review studies from a range of bilingual settings, including minority language contexts in bilingual societies and heritage-language settings. While the chapter will mostly discuss the evidence from adolescent and adult participants, it will also make reference to the child bilingualism literature where appropriate (see also Chapters 12–16, this volume).
In Section 22.2, we will start with a review of studies on the accents of early versus late bilinguals. This will be followed in Section 22.3 by an account of the various explanations for the observed differences. Subsequently, Section 22.4 will consider the significance of early experience with a specific language for an individual’s pronunciation patterns. In this context, we will focus specifically on the literature on childhood overhearers and international adoptees, and the extent to which these individuals are able to access knowledge about the sound patterns of their childhood language. These populations are of particular interest since disruption in their use of the childhood language allows for the role of early linguistic experience to be investigated systematically. Section 22.5 will then shift to studies comparing the pronunciation patterns of two types of early bilinguals, simultaneous and early sequential bilinguals. These individuals differ from each other predominantly in the timing of exposure to their two languages. However, language dominance, continued use, and the role of the environment are also critical for accentual patterns. Section 22.6 will hence review the role of these factors in early bilinguals’ speech. Finally, in Section 22.7 the effect of sociolinguistic factors on bilinguals’ accents will be considered. The chapter will then conclude with an outline of areas to consider for future research (Section 22.8).
22.2 The Speech Patterns of Early versus Late Bilinguals
It is well known that age critically affects bilinguals’ pronunciation development. Thus, individuals who learn a language in adolescence or adulthood, that is, late bilinguals, usually exhibit an influence of their native language in their L2 accent, and as such typically differ noticeably in their pronunciation patterns from monolinguals. This phenomenon is commonly referred to as foreign accent (see Colantoni, Steele, and Escudero [2015] for an overview). However, since the term is inappropriate in some settings, for example in multilingual societies, we will use it in this chapter only to describe methodological issues (e.g., foreign accent ratings). Patterns that are influenced by the L1 are widely attested in late bilinguals’ production and perception of L2 consonants (e.g., Al-Kendi & Khattab, 2021; Amengual, 2021; Flege, Munro, & MacKay, 1995), vowels (e.g., Bohn & Flege, 1992; Mayr & Escudero, 2010), and suprasegmentals (e.g., Francis et al., 2008). Moreover, such patterns can be reliably detected in as little as 30 milliseconds of speech (Flege, 1984) and are even identifiable in distorted speech, such as speech that has been low-pass filtered (Kolly, Leemann, & Dellwo, 2014) or played backwards (Munro, Derwing, & Burgess, 2003).
In contrast, the accents of early bilinguals tend to be closer to those of monolinguals (Baker & Trofimovich, 2005; Flege et al., 1995, 2003; Flores & Rato, 2016; Guion, 2003; Kupisch et al., 2014, 2021; Lloyd-Smith et al., 2020; MacLeod, Stoel-Gammon, & Wassink, 2009; McCarthy, Evans, & Mahon, 2013). For example, MacLeod et al. (2009) showed that early Canadian French-Canadian English bilinguals with an onset age of L2 learning below four years did not differ in their vowel realizations from monolingual speakers of either language. Likewise, the early L1 Quichua-L2 Spanish bilinguals in Guion (2003) produced their vowels distinctly in the two languages, while late bilinguals in the study realized them all with L1 Quichua-like properties, and therefore had not developed separate categories for Spanish vowels. Flege et al. (2003), in turn, found that while late Italian-English bilinguals failed to produce English /eɪ/ with sufficient vowel-inherent spectral change, suggesting assimilation with its Italian monophthongal counterpart /e/, the early bilinguals produced the L2 vowel with exaggerated formant movement compared with monolinguals’ productions, thereby enhancing cross-linguistic contrastivity.
At the same time, however, studies of early bilinguals, whether simultaneous or early sequential, have also shown a tendency for speakers to diverge in their pronunciation patterns from those of monolinguals, at least slightly, and to be perceived as differing from the latter (Casillas & Simonet, 2016; Chang et al., 2011; Kupisch et al., 2014, 2021; Lloyd-Smith et al., 2020; MacLeod et al., 2009; Mayr & Siddika, 2018). In the context of heritage speakers, Polinsky (2018, p. 116) refers to such subtle phonetic differences from monolinguals as a “heritage accent.” Thus, the early Italian-English bilinguals in Flege et al. (2003), despite their greater accuracy on English /eɪ/ compared with late bilinguals, were rated as less accurate than monolingual English speakers in a separate accent-rating experiment. Similarly, most of the early bilinguals from Germany, France, and Italy in Kupisch et al. (2014) were not considered native in their heritage language by monolingual raters despite exposure to it from birth; nor were the Italian heritage speakers in Lloyd-Smith et al. (2020). Together, these studies suggest that although the pronunciation patterns of early bilinguals tend to differ less from those of monolinguals than do those of late bilinguals, they are often not identical to them.
22.3 Explanatory Accounts: Biological and Interactional Factors
A number of explanations have been offered for the observed differences between early and late bilinguals. Some have argued that they are due to maturational changes in cerebral plasticity, postulating a critical or sensitive period for language learning (Lenneberg, 1967). The most influential of these accounts is that of Lenneberg (1967), who argued that the completion of cerebral lateralization at puberty is the decisive factor for age-related differences in language and speech learning. While he initially conceived of a critical period for L1 learning, the concept was subsequently extended to L2 acquisition, including its pronunciation patterns (see, e.g., Scovel, 1988). However, while the degree of perceived foreign accent is strongly correlated with the onset age of L2 learning/age of arrival in a foreign country, no sharp discontinuities around a possible critical period have been observed (Flege et al., 1995). As a result, many researchers have rejected the notion of maturational constraints for speech learning.
Alternatively, accentual differences between monolinguals and late bilinguals may be due to a loss of the motoric abilities required for the acquisition of new articulatory gestures (McLaughlin, 1977). However, although fine-grained motor control may become more difficult with age, there are no drastic changes in this ability in general until well into adulthood (Klein, 1995). Thus, a number of studies have shown that adults can mimic the sounds of an unknown language accurately (e.g., Flege & Hammond, 1982), and hence there do not appear to be any biological barriers to accurate production of L2 phones in late learners. Moreover, pronunciation patterns in the L2 that do not differ from those of monolingual speakers, albeit rare, are possible in post-pubescent learners (Birdsong, 2007; Bongaerts, Mennen, & van der Slik, 2000; Gnevsheva, 2017; but see Abrahamsson & Hyltenstam, 2009). This suggests that speech is malleable, at least to some extent, throughout the life span (Harrington, 2006).
Finally, the interaction hypothesis (Flege, 1999, 2007) maintains that the differences in the speech patterns of early versus late bilinguals are grounded in the way a bilingual’s L1 and L2 interact at different ages, consistent with the Speech Learning Model (Flege, 1995), including its recent revision, the SLM-r (Flege & Bohn, 2021). Specifically, this model is based on the premise that bilinguals have separate, but non-autonomous, sound systems that share a common representational network and constantly interact with each other (Flege et al., 2003; Paradis, 2001). Thus, in early bilinguals the L1 is less well established than in late bilinguals, and as a result it will exert less of an influence on the L2 (Iverson et al., 2003). On the other hand, as L1 categories become more robust with age, the acquisition of new L2 categories gets increasingly difficult, in particular for categories that are cross-linguistically similar (Flege, 1995; Flege & Bohn, 2021). Moreover, the interaction hypothesis can explain not only why the speech patterns of early bilinguals differ from those of late bilinguals but also why they diverge from monolinguals’ accents: the moment children are exposed to two languages, their sound systems mutually influence each other. Nevertheless, cross-linguistic interaction, while a fundamental piece of the jigsaw, cannot fully explain bilingual speech patterns on its own. In Section 22.4, we will consider the significance of early linguistic experience.
22.4 The Significance of Early Linguistic Experience
A particularly interesting piece of evidence for the critical role of early linguistic experience in speech development comes from studies on adults with exposure to a language in childhood and discontinued use thereafter. To begin with, let us consider the most extreme scenario: that of international adoptees. Unlike early bilinguals whose two languages interact from the moment dual language exposure begins, international adoptees have early linguistic experience only in one language, that is, their birth language. As there are usually no opportunities to use the birth language in their adoptive home environment, international adoptees stop using it altogether upon adoption, instead acquiring the language of their new environment. This scenario has led to a debate surrounding the role of early experience and the extent to which these individuals retain some long-term residual knowledge of their birth language, including its sound patterns. On the one hand, there are those who have argued that international adoptees completely replace their birth language with the language of their adoptive home (Pallier et al., 2003; Ventureyra, Pallier, & Yoo, 2004). Thus, Ventureyra et al. (2004) found that native Koreans who were adopted by French-speaking families were unable to perceive Korean voiceless plosives any better than matched native French speakers with no prior experience of Korean. Similarly, Pallier et al. (2003) reported that neither the international adoptees nor the native French speakers showed any responsiveness to Korean stimuli in behavioral tests and during brain imaging via event-related functional magnetic resonance imaging (fMRI). These findings suggest that international adoptees might be best described as successive monolinguals.
In contrast, others argue on the basis of studies showing differences between international adoptees and matched controls (Choi et al., 2017; Hyltenstam et al., 2009; Pierce et al., 2015) that the former can retain residual knowledge of their birth language into adulthood. Specifically, Hyltenstam et al. (2009) found that native Koreans adopted by Swedish families in early childhood outperformed native Swedish late L2 learners of Korean in a Korean voice onset time (VOT) perception task. Moreover, the older the adoptees were at the time of adoption, the more likely they were to be able to access remnants of their birth language upon re-exposure to it. Similarly, Choi et al. (2017) showed that Korean adoptees in the Netherlands with no conscious knowledge of their birth language learned Korean plosive contrasts more rapidly than otherwise matched Dutch speakers without early experience with Korean. Other studies, in turn, have found evidence for brain activation patterns in international adoptees that differ from those of monolingual speakers of their adopted language and reflect prior experience with their birth language (Pierce et al., 2015). Together, these findings suggest that international adoptees experience a (radical) change in language dominance, rather than a complete loss of their birth language (see Section 22.6 for further discussion of the role of language dominance).
In addition to studies on international adoptees, work on childhood overhearers can provide interesting insight into the role of early experience on speech learning (Au et al., 2002, 2008; Knightly et al., 2003; Oh et al., 2003). These individuals were virtually never addressed in the minority language as children; nor were they encouraged to speak it themselves. They only regularly overheard conversations in it by adult native speakers within the first few years of life. In a series of studies, Au and colleagues (Au et al., 2002, 2008; Knightly et al., 2003) showed that this early linguistic exposure can be beneficial for L2 speech learning. Specifically, they showed that when learning Spanish in language classes in adulthood, childhood overhearers were more similar to Spanish monolinguals than matched L2 learners without childhood experience with the language, based on acoustic analyses and accent ratings by native speakers. In contrast, Oh et al. (2003) found no difference between childhood overhearers of Korean and novice learners in a phoneme production task in Korean. However, the former outperformed the latter in a phoneme perception task in the language. Moreover, both groups were outperformed in both these tasks as well as an accent-rating experiment by adults who not only heard but also spoke Korean as children but subsequently stopped using the language altogether until they were re-exposed to it in adulthood. Together, much of the evidence from international adoptees and childhood overhearers suggests not only that early linguistic experience has a critical role in speech learning but also that it can have long-lasting effects.
At the same time, less input in the minority language than in more typical early bilingual scenarios may explain why the speech patterns of international adoptees and childhood overhearers tend to differ comparatively more from those of monolinguals. In Section 22.5, we will consider to what extent simultaneous and early sequential bilinguals, who differ in the extent and timing of their language experience but did not discontinue their exposure to either language, differ in their speech patterns.
22.5 The Speech Patterns of Simultaneous versus Early Sequential Bilinguals
Most children growing up bilingually either are exposed to two languages at the same time (simultaneous bilinguals) or encounter one language before the other (early sequential bilinguals). The former have two native languages, Language A and Language Alpha in De Houwer’s (2009) terminology, and will usually hear both in the home. In contrast, early sequential bilinguals typically hear only one language, their L1, in the home, and the other language, their L2, outside it, notably in educational settings.
There has been some debate about the cut-off point between simultaneous and sequential bilingualism. Some, such as McLaughlin (1984), consider the critical age to be three years. However, as Yip (2013, p. 119) points out, “there will necessarily be developmental differences between a child whose exposure to both languages begins at birth and one whose exposure to one of those languages begins between ages 2 and 3.” Accordingly, others suggest a more restrictive definition, with simultaneous bilingualism requiring exposure to both languages by age one (Paradis et al., 2021) or from birth (De Houwer, 2009). For the purposes of this chapter, we will retain the label applied by the authors in the articles reviewed here; however, where simultaneous bilingualism does not refer, or refers not only, to exposure to two languages from birth, this will be made explicit.
A number of studies have found that the speech patterns in the home language of sequential bilinguals were more like those of monolinguals than the productions of simultaneous bilinguals (Amengual, 2019; Kupisch et al., 2021; Sebastián-Gallés, Echeverría, & Bosch, 2005). For example, Amengual (2019) found that Spanish heritage speakers in the United States from entirely Spanish-speaking homes produced spirantization patterns more similar to those of monolingual Spanish speakers than individuals who also spoke English in the home as children, although both sets of bilinguals had had exposure to the heritage language from birth. Similarly, in Sebastián-Gallés et al.’s (2005) study, sequential Catalan-Spanish bilinguals (with exposure to Spanish around the age of four) outperformed simultaneous Catalan-Spanish bilinguals in a lexical decision task that involved the Catalan-specific /e/-/ɛ/ contrast, which Spanish native speakers find difficult to perceive. However, the simultaneous bilinguals performed better on this contrast than sequential Spanish-Catalan bilinguals (with exposure to Catalan around the age of four). These findings are corroborated by global accent-rating studies of bilingual children. For example, Kupisch et al. (2021) found that Russian heritage preschoolers and primary school children in Germany were perceived more like Russian monolinguals if they were raised in Russian-only homes (sequential bilinguals) than in mixed Russian-German homes (simultaneous bilinguals).
Together, these studies suggest that: (1) sequential bilinguals’ greater cumulative experience with the minority language may be beneficial for the acquisition of language-specific speech patterns; and (2) these early differences in linguistic experience across different types of bilinguals may carry over into individuals’ pronunciation patterns in adulthood.
Other studies, in contrast, have found commensurate speech patterns in simultaneous and sequential bilinguals (MacLeod & Stoel-Gammon, 2010; Mayr et al., 2017; Mennen et al., 2020; Nance, 2020). Note that these studies are all set in bilingual societies or revitalization contexts. Specifically, the simultaneous and sequential Canadian French-Canadian English bilinguals in MacLeod and Stoel-Gammon (2010) did not exhibit any differences in their production of word-initial bilabial and coronal stops, and high vowels. Note, however, that simultaneous bilingualism was defined as exposure to both languages by the age of three years. Hence, some of those classed as simultaneous bilinguals will have encountered one language before the other. However, the authors offer a sociolinguistic explanation for their findings instead: both French and English have high status in the community and are used in most contexts of everyday activities, an aspect that differs from many migration settings in which increased L2 proficiency typically coincides with decreased proficiency in the L1 (Flege et al., 1995). Moreover, the authors argue that the findings are consistent with the participants’ self-identification: both sets defined themselves as bilinguals, rather than as French or English speakers. We will return to the role of sociolinguistic factors in Section 22.7, where socially oriented studies that did not observe differences between simultaneous and early sequential bilinguals are discussed in further detail.
Finally, some work has compared the speech perception and production skills of simultaneous and early sequential bilinguals on novel sounds. Specifically, D’Souza (2018) tested simultaneous Canadian French-Canadian English bilinguals and sequential bilinguals with Canadian French as their L1 and first exposure to English after age seven on the perception and production of sounds and sound contrasts not present in either of their languages, such as the Hindi dental-retroflex stop contrast and the Farsi voiced uvular stop /ɢ/. The results revealed significantly better performance in the discrimination of the Hindi contrast by the simultaneous bilinguals, but no differences between the two types of bilinguals on any of the Hindi and Farsi imitation tasks, as judged by monolingual native speakers of these languages. These findings suggest that simultaneous and sequential bilingualism may affect auditory perception abilities differently, and that simultaneous bilingualism may “confer an advantage later in life for perceiving the sounds of a new foreign language” (D’Souza, 2018, p. 5). In Section 22.6, we will consider the role of language dominance, continued use, and language environment in the speech patterns of early bilinguals.
22.6 The Role of Language Dominance, Continued Use, and Language Environment in Early Bilinguals’ Speech Patterns
While most early bilinguals are highly proficient users of both their languages, their competence is not usually equally distributed. In other words, they tend to have a dominant language. Language dominance, in turn, refers to an imbalance in proficiency and/or use across a bilingual’s two languages (Birdsong, 2014). As such, it constitutes “a by-product of linguistic experience, determined by factors, such as age of acquisition, self-reported frequencies of language use, self-rated proficiency, and even linguistic ideologies” (Amengual & Simonet, 2020, p. 848). Crucially, a bilingual’s orientation toward her two languages may change over time and lead to changes in language dominance (Al-Azami, 2014; Kupisch et al., 2021; McCarthy et al., 2014).
With respect to phonetics and phonology, a substantial body of evidence has shown that language dominance predicts the perception and production of early bilinguals’ speech patterns (see Chapter 29 in this volume; Amengual, 2016a, 2016b; Amengual & Chamorro, 2015; Amengual & Simonet, 2020; Mayr et al., 2019; Ramírez & Simonet, 2018; Simonet, 2010; Simonet & Amengual, 2020; Tomé Lourido & Evans, 2019), but note that dominance is not measured identically across these studies. For example, Ramírez and Simonet (2018) showed that Spanish-dominant bilinguals were less accurate in the discrimination of the Catalan-specific /ʎ/-/ʒ/ contrast than Catalan-dominant ones. Similarly, Simonet’s (2010) study of alveolar lateral production in Catalan and Spanish by early Catalan-dominant and Spanish-dominant bilinguals from Majorca revealed a tendency to transfer phonetic features from the dominant to the nondominant language despite early exposure to both languages. Moreover, it showed that while most early bilinguals differentiated lateral categories cross-linguistically, those that did not tended to be Spanish-dominant females, which the author attributed to socio-indexical factors in that “these speakers may intend to distance themselves from what they may perceive as Catalan-accented Spanish” (Simonet, 2010, p. 676).
Additionally, several studies have shown that Spanish-dominant bilinguals struggle to acquire the language-specific mid vowel categories of Catalan and Galician, since the contrasts between higher-mid and lower-mid vowels do not exist in their dominant language (Amengual, 2016a, 2016b; Amengual & Chamorro, 2015; Tomé Lourido & Evans, 2019). In contrast, Catalan-dominant and Galician-dominant bilinguals exhibited two separate perceptual distributions for the mid vowel contrasts and produced these categories distinctly (but see Mayr et al., 2019). Finally, language dominance effects have even been shown in simultaneous bilinguals. Specifically, Sebastián-Gallés et al. (2005) found that simultaneous Catalan-Spanish bilinguals who heard Catalan from their mothers and Spanish from their fathers were more accurate in a lexical decision task that involved Catalan /e/-/ɛ/ than otherwise matched simultaneous bilinguals with reversed parental languages, suggesting a particular role for maternal language input.
Nevertheless, language dominance does not appear to affect bilinguals’ speech patterns across the board. For example, in a study of cluster acquisition in Welsh-English bilingual children aged 2;6 to 5;0, Mayr, Howells, and Lewis (2015) found that while Welsh-dominant children outperformed English-dominant ones on Welsh clusters, they did not lag behind them on English ones. The authors explained this asymmetry with reference to the sociolinguistic situation in Wales, in which English constitutes the majority language and Welsh the minority language (see also Mayr, Jones, & Mennen, 2014).
Moreover, language dominance does not affect all areas of pronunciation in early bilinguals equally. Thus, Amengual and Simonet (2020) showed that while Catalan-dominant bilinguals from Majorca outperformed Spanish-dominant bilinguals in the production of the Catalan-specific mid vowels /e/-/ɛ/ and /o/-/ɔ/, the two groups of bilinguals did not differ from each other in their production of the Catalan [a]~[ə] alternation, a phonological process affecting stressed versus unstressed vowels. These findings differ from those in studies of phonological processes in late bilinguals, which found cross-linguistic interaction effects. For example, the late German-English bilinguals in Smith et al. (2009) transferred aspects of word-final voicing, which involves devoicing in German, but not in English, from their L1 to their L2. Amengual and Simonet (2020) conclude that in early bilinguals, dominance effects may not surface in phonological processes that are predictable and hence relatively easy to learn, such as the [a]~[ə] alternation in Catalan. In contrast, dominance effects are likely to show up in unpredictable phonemic contrasts, such as the Catalan mid vowels, especially if they have a low functional load. Moreover, they may be mediated by lexical status (Amengual, 2016b) and enhanced language coactivation (Simonet & Amengual, 2020), with cognates and bilingual settings increasing the likelihood of interactions.
Early dominance in a minority language, while critical, is not the only factor that affects the acquisition of language-specific speech patterns. Continued high use of the language also appears to be required (Mora & Nadeu, 2012), in particular in contexts where input is restricted to a small number of speakers (Mayr & Montanari, 2015). The usage patterns of early bilinguals are dynamic and may wax or wane depending on their social circumstances. In extreme cases, this can result in complete abandonment of a childhood language. As discussed in Section 22.4, children under such conditions may retain some residual knowledge of the childhood language and outperform novice learners, but their speech patterns will be substantially different from those of continuous users of the language (Au et al., 2008; Oh et al., 2003).
The effects of a change in usage patterns have also been documented in more typical circumstances, that is, where the use of a language is reduced, rather than stopped altogether (Kupisch et al., 2021; McCarthy et al., 2014; Montanari, Mayr, & Subrahmanyam, 2018). One of the most pervasive events that causes a decrease in the usage patterns of a minority language is the onset of mainstream education, which tends to be restricted to the majority language (De Houwer, 2009). While this may support children’s speech development in the latter (McCarthy et al., 2014), it tends to have a detrimental effect on the pronunciation patterns of the minority language. For example, Kupisch et al. (2021) showed that four to six-year-old preschool children growing up in Russian heritage settings in Germany were perceived more like Russian monolinguals in a global accent-rating experiment than otherwise matched seven to nine-year-old Russian heritage primary school children. However, the reverse was true for the children’s German accent, suggesting a language use-induced shift from the heritage to the majority language. An important implication of such findings is that late-developing phonological properties that are not yet stable at school entry may be particularly vulnerable to cross-linguistic interaction when children begin to receive massive exposure to the societal language through schooling.
Moreover, there is evidence that continuous high use of a minority language not only matters in childhood but may also affect speech patterns in adulthood. For example, Mora and Nadeu (2012) showed that Catalan-dominant bilinguals from Catalan-only homes, who were systematically exposed to Spanish only in mainstream education between four and five years of age, were less native-like in the perception of the Catalan /e/-/ɛ/ contrast and produced these two categories with more Spanish-like acoustic targets if their daily use of Spanish as adults was high rather than low.
Finally, continuous high use of a language may be more likely when it can be used in the wider community, not just in individuals’ homes. For example, Cortés, Lleó, and Benet (2019) showed that the predominant language of the environment, not home language use, was the strongest predictor for acquisition of the Catalan /e/-/ɛ/ contrast by Catalan-Spanish bilingual children. Interestingly, they found that even children from Catalan-speaking homes did not differentiate the contrast if they lived in a predominantly Spanish-speaking neighborhood in Barcelona. The ability to use a minority language in the community has also been shown to be critical in heritage-language settings. For example, Mayr and Siddika (2018) showed incremental changes in the plosive productions of Sylheti heritage speakers in Cardiff across successive generations, with first-generation mothers the most accurate and third-generation children the least (see also Hrycyna et al., 2011; Mayr et al., 2021). The authors argued that this gradual drift away from the heritage language and toward the majority language may have been reinforced by the relatively small size of the Sylheti-speaking community in Cardiff and the lack of new first-generation migrants who could reinforce homeland norms.
This stands in contrast to large, close-knit migrant communities, as documented in studies of Sylheti speakers in London (Kirkham & McCarthy, 2021; McCarthy & de Leeuw, 2021; McCarthy et al., 2013), Spanish speakers in the United States (Montanari et al., 2018, 2020; Ruiz-Felter et al., 2016), and Turkish speakers in Germany (Darcy & Krüger, 2012; Kupisch, Lloyd-Smith, & Stangen, 2020). Such communities not only offer more stable input patterns in the heritage language but may also give rise to contact varieties over time (compare with Heselwood & McChrystal, 2000). In Section 22.7, we will turn in more detail to sociolinguistic factors in the speech patterns of early bilinguals.
22.7 The Role of Sociolinguistic Factors
The work outlined so far has shown that there are a number of social factors that might influence the speech production of both child bilingual learners and adult early bilingual speakers. This section examines work on early bilingual speakers through a variationist sociolinguistic lens. In particular, we focus on sociophonetic and sociophonological studies of long-standing bilingual communities.
Work in this vein considers the wider social characteristics of different groups of speakers, showing how such social characteristics may lead to differences in speech production even for those who have acquired their languages in a similar way. Davidson (2020), for instance, examined /l/-darkening in the Catalan and Spanish of bilingual speakers in Catalonia (as well as Spanish monolinguals from Madrid for comparison). Although /l/-darkening is known to be a property of Catalan and Catalan-influenced Spanish (as opposed to other varieties of Spanish which have light /l/), the degree to which productions are dark in both languages often varies and is influenced by factors such as linguistic background (Simonet, 2010). In the analysis of the bilingual speakers’ Catalan and Spanish data, Davidson (2020) found clear differences among L1-Catalan speakers despite these speakers all acquiring Catalan from birth. More specifically, the L1-Catalan bilinguals from a rural area in Catalonia produced more velarized productions of /l/ in both languages compared to L1-Catalan bilinguals from an urban area.
A similar conclusion was reached in a study of cross-linguistic and intralinguistic variation in the Welsh and the English of sixteen to eighteen-year-old early bilinguals in North Wales (Morris, 2017, 2021, 2022). In this study, the extent to which speaker gender, home language (Welsh or English), and speech style influenced variation in both a Welsh-dominant and an English-dominant community was compared. Morris (2021) found that, in the Welsh-dominant area, gender and speech style were significant predictors of /r/ variation in English and that female speakers were more likely to produce the Welsh alveolar trill in the more formal context (instead of the alveolar approximant traditionally associated with English). In Welsh, however, there was an interaction between home language and speech style, with those from Welsh-speaking homes being more likely to produce the alveolar trill and the alveolar tap in the formal context. The distinction between speech styles was not as apparent in the speech of those from the English-dominant area (regardless of linguistic background) or in the speech of those from English-speaking homes in the same area. These findings demonstrate that variation in bilingual speech patterns can be understood comprehensively only if mediating social factors are carefully considered.
Although there are few variationist studies of bilingual communities which incorporate ethnographic elements (e.g., Gruffydd, 2022; Nance, 2013), it appears that home language differences can be affected by local social structures, such as peer-group membership. Similar to Morris (2021), Morris (2022) found sociolinguistic patterning for fundamental frequency range (FFR), which included interactions between home language and gender in the Welsh-dominant area. In the participating schools in this area, peer-group membership was based primarily on students’ home language, and females from Welsh-speaking homes exhibited a higher pitch level in both languages than females from English-speaking homes. In the English-dominant area, in contrast, peer-group membership was not found to coincide with home language and no home language differences in FFR were found.
In other immersion education contexts, it has been argued that acquisitional differences may be overridden by more homogeneous local social structures and peer-group dynamics. Nance (2020) found that any initial differences in the realization of Gaelic and English laterals and stops between Gaelic-English simultaneous bilingual children and early sequential bilingual children who acquired Gaelic solely via immersion education were leveled out by late primary school. Similarly, Mayr et al. (2017) and Mennen et al. (2020) failed to identify any differences in vowel and lexical stress productions, respectively, between Welsh-English bilinguals in South Wales with Welsh as primary home language and sequential bilinguals from entirely English-speaking homes who learned Welsh solely through immersion education. The authors of these studies argued that the close-knit peer groups with shared values and practices, which encompassed both the simultaneous and the early sequential bilinguals, had had a homogenizing effect on their pronunciation patterns.
Together, these studies suggest that explanations that rely solely on the role of language exposure and early linguistic experience do not go far enough, and that homogeneous peer groups and social context also play a critical role in explaining variation in early bilinguals’ pronunciation patterns. The same claim can be made of speakers’ identity and the extent to which this might contribute to variation in one or both of their languages. Work in heritage-language contexts (e.g., Kupisch et al., 2020) and with late bilinguals (Nance et al., 2016) has shown that novel patterns of variation in the majority language may be a way in which bilingual speakers index their ethnolinguistic identity (Gafter & Horesh, 2020).
22.8 Concluding Remarks and Future Work
This chapter aimed to explore the phonetic and phonological characteristics of early bilinguals and to consider explanations for the observed patterns. In particular, we reviewed the role of the language environment, early linguistic experience, and language use on speech patterns, as well as the significance of social factors. While early bilingual speech is a thriving research area, and we have seen significant improvements in our understanding of it in recent years, there are a number of outstanding questions that need to be explored in future research.
First, an increasing number of studies have started to consider the influence of cognitive and social factors on early bilingual speech development. The influence of childhood language dominance, which refers to a myriad of environmental and attitudinal factors affecting language use, has been shown to shape bilingual speech patterns into adulthood (e.g., Amengual, 2019). Future work is needed to extend this line of enquiry through longitudinal studies, which would shed light on the fluid nature of language dominance and its effect on pronunciation across the life span.
A further avenue of research lies in the application of variationist sociolinguistic methods to bilingual speech communities. The evidence so far suggests either that simultaneous and early sequential bilinguals might show no differences in speech production (or that any differences disappear over time), or that differences between simultaneous and sequential bilinguals may occur in one or both languages in different ways. Both results point to the need to examine social structures in bilingual communities, communities of practice, and social networks, as well as the social meaning of linguistic features in both languages. The role of nonstandard varieties in the perception of early bilinguals’ accents also requires further exploration (see, e.g., Einfeldt, van de Weijer, & Kupisch, 2019).
Finally, most studies to date have examined early bilingual speech in typologically related languages, such as Catalan and Spanish, that often contain substantial numbers of cognates, with a strong focus on English-speaking and European countries. While the effects of cognates on pronunciation patterns have been explored, we know little about the role that language typology plays in early bilinguals’ speech. Future research is needed that elucidates this matter by systematically varying typological proximity and distance. Moreover, more work in a greater diversity of sociolinguistic and sociopolitical settings is needed. Together, such new forms of inquiry will take us a step closer to understanding the complexity of speech patterns in early bilinguals.
23.1 Introduction
Although few studies on the acquisition of L2 phonetics and phonology have explicitly compared classroom learners to learners in other contexts, with the notable exception of study abroad research, it is clear that classroom learners constitute a distinct group. Our knowledge about the factors that play important roles in adult L2 acquisition of phonetics and phonology, such as quality and quantity of L2 input, allows us to predict the distinctive aspects of classroom acquisition (Piske, 2007). Some of these predictions have been directly or indirectly addressed in the research conducted to date. Others remain to be examined. In this chapter, I will outline several such hypothesized directions where classroom learning can be expected to diverge from acquisition in other contexts, primarily contrasting classroom learning with L2-immersed naturalistic L2 acquisition, and review relevant findings of the research conducted thus far. Without attempting an exhaustive review, this chapter will provide an overview of the following major themes and surrounding research:
the role of the quantity and quality of L2 input;
the role of different phonological targets;
the role of explicit pronunciation instruction and corrective feedback;
the extent of individual differences and their role in classroom learning;
L2 phonetic category formation and L1 restructuring in classroom learning.
This review is largely limited to studies conducted on adult (college-age) learners permanently residing in the countries where their L1 is spoken and whose second language experience is current and limited to an instructional setting – a typical context of adult L2 classroom acquisition. This restricted scope ensures consistency among the reviewed studies and limits overlap with other chapters in the present volume. It also reflects the reality that most research on the topic has been conducted with college students as participants. Laboratory training studies are also excluded from this overview. Although training studies have implications for classroom learning, their characteristics, such as short duration, high intensity, and restricted focus in terms of the phonological material, do not match the experience of a typical classroom learner.
Finally, this review prioritizes studies that assessed the phonological and phonetic abilities of L2 learners using the tools of instrumental acoustic analysis or controlled perceptual experiments. Although holistic evaluations of comprehensibility, intelligibility, or accentedness by human raters are frequently used to estimate classroom learners’ oral abilities, such assessments are inevitably influenced by aspects other than phonetics and phonology, such as lexical and grammatical characteristics (Saito et al., 2016; Saito & Plonsky, 2019). Therefore, their efficacy in assessing phonological development to the exclusion of other factors is debatable.
To begin with, I introduce some of the terms used in the chapter and further delineate the scope of the chapter. The term L2 immersion, as used here, implies residence in the country where the L2 is spoken, excepting domestic immersion. Naturalistic acquisition, as opposed to instructed learning, is defined as learning that takes place in the immersion setting and in the absence of any systematic L2 instruction. Of course, it is possible to combine L2 immersion with instructed learning, as is typically the case in study abroad programs, but such categories of learners are not the primary target of this chapter. The overview in this chapter does, however, include studies of learners in domestic immersion programs, as this setting, I believe, does fall within the scope of classroom learning, albeit of a special kind. Domestic immersion here is understood as a context of L2 learning in which learners are engaged in the typical L2 instructional learning in their country of residence, while their use of and exposure to language are limited as much as possible to the target L2, both inside and outside the classroom.
Before proceeding to the subject matter, I briefly outline the state in which this area of research currently finds itself, recent historical developments, and its position within the larger research context. Research on the phonetics and phonology of adult classroom L2 learners lies at the intersection of two broader topics: adult bilingual phonetics and phonology, and instructed L2 acquisition. Within these topics, it occupies only a small niche. In the following paragraphs, I offer several possible explanations for the relatively small, but expanding, size of the field.
In the area of instructed language learning, acquisition of phonetics and phonology is primarily approached under the guise of “pronunciation teaching and learning” and both research and practice in this area have gone through a not-so-distant period of relative neglect. Some scholars (Kramsch, 2003) hypothesized that interest in pronunciation teaching and learning in adult students waned as a result of research on maturational constraints and the critical period in the acquisition of phonology, as well as related conclusions that native-like attainment of L2 phonetics and phonology was impossible beyond the age of six (Long, 1990). Others proposed that the rise of the communicative language teaching approach in the 1980s contributed to this neglect, as this approach was among those advocating against targeting native-like pronunciation in favor of overall communicative competence and comprehensibility (Nagle, Sachs, & Zárate–Sández, 2018). Finally, the perception that L2 phonology is special, exceptionally difficult, subject to great learner variability, and that its mastery requires a special talent, massive amounts of input, and a longer time frame may have discouraged researchers from engaging with the topic. Nevertheless, responding to sustained interest from language learners and instructors, research on learning and teaching L2 pronunciation has been undergoing a revival in recent decades. The currently circulating literature on the topic, including some works in the present overview, dates back only as far as the early 1990s, with the majority of the studies emerging after the turn of the century.
Turning to the broader context of adult bilingual phonetics and phonology, this area of research has been heavily dominated by investigations of immersed learners, at the expense of other populations such as classroom learners. The reasons for this lack of balance are not entirely clear and typically are not made explicit in the literature. Nevertheless, it is possible that practical considerations may have played a role. Typically, only a handful of foreign languages are taught in schools and colleges, some of which draw small numbers of learners. Researchers could be opting for immigrant populations to optimize the recruitment pool. In addition, although not without exception, well-integrated, long-term immigrants can often be expected to possess a certain level of L2 experience and proficiency, while classroom learners may lack the level of L2 knowledge necessary for testing some hypotheses. The implicit assumption that appears to permeate the literature is that purely classroom-based learning has an upper limit for the acquisition of phonetics and phonology (see, for example, Saito & Hanzawa, 2016), and this perception may have played its role in guiding the choice of research population. Nevertheless, in recent years researchers have been repeatedly turning their attention to classroom learners, with novel and sometimes surprising results that enrich and inform the broader discipline.
Despite the recently renewed interest in the acquisition of phonetics and phonology by adult classroom learners, the field thus far is relatively limited in its breadth and diversity. For instance, the existing research is strongly English-centric. Among the language pairings studied, English-speaking learners of Spanish are by far the most targeted population. In addition, certain topics dominate. For instance, segmental acquisition figures prominently, especially the acquisition of voicing contrasts, while the acquisition of phonological processes receives less attention, with a notable exception of inter-sonorant lenition/spirantization in Spanish. Finally, investigations of speech production appear to be more common than those examining speech perception.
23.2 Quantity and Quality of L2 Input and the Development of L2 Phonetics and Phonology in a Classroom Setting
The assumption that input is essential for first and second language development is an uncontroversial one and has been acknowledged by all major theoretical approaches to L2 learning (see, for example, Chapters 7–11, this volume). Oral input, in particular, is necessary for the development of speech production and perception abilities in the L2. Directly measuring the amount of input learners receive over time is difficult, and it has become a common practice to use duration of residence abroad as a way to estimate input for immersed learners and duration of instruction as a way to estimate input for instructed learners.
Virtually every author who has written on the topic of adult classroom acquisition of L2 has commented on the quantity and quality of L2 input as one of the defining characteristics of instructed language learning (for example, see Tyler, 2019). A typical L2 class meets for only a few hours a week, with lengthy breaks for school holidays. The L2 input learners receive within a given semester often comes from a single instructor, who may or may not be a native speaker of the target language and does not always speak the target language exclusively in the classroom, and their classmates, whose L2 speech is likely not representative of typical L2 input in an immersion setting and likely unstable in its use of L2 phonetic and phonological properties. It is quite safe to conclude, as many have done before me, that when L2 input is available only in a classroom context, it is limited in its quantity, quality, and variability, relative to naturalistic settings. Given the slow injection of input and depending on the difficulty of the phonological target, the prediction that emerges is that classroom learners may not progress very quickly, or at all, in their acquisition of L2 phonetics and phonology as they move through semesters.
Some of the available research, including several longitudinal studies, supports this prediction. For example, the acquisition of Spanish spirantization by native speakers of English was shown to plateau in intermediate level learners (second to fourth semester), such that additional input from several weeks to two semesters of instruction did not improve spirant pronunciation, assessed via impressionistic coding or instrumental acoustic analysis (Díaz-Campos, 2004, 2006; Kissling, 2013; Nagle, 2017; Zampini, 1994).
Much research targeted acquisition of voicing in stop consonants, using voice onset time (VOT) as the primary acoustic measure. Several studies found no change or eventual plateauing in the VOT of voiceless or voiced L2 stops produced by classroom learners in the course of instruction lasting from several weeks to several semesters, for a variety of L1–L2 pairs: English-speaking learners of Spanish (González López & Counselman, 2013; Nagle, 2019; Reeder, 1998; Schuhmann & Huffman, 2019), Japanese learners of English (Riney & Takagi, 1999), Catalan-Spanish learners of English (Mora, 2008), and Mandarin learners of Russian (Yang, Chen, & Xiao, 2022). In a number of studies, learners’ L1 and L2 stops were statistically indistinguishable in terms of VOT, indicating a lack of L2 phonetic learning, despite the intermediate classroom placement and the prior classroom experience of the participants (Hutchinson & Dmitrieva, 2021; Yang et al., 2022). Moreover, in Yang et al. (2022) the production patterns mirrored perceptual difficulties, as learners were unable to discriminate L2 categories and assimilated both voiced and voiceless L2 stops to the same L1 category. A different phonological target, English word-initial /ɹ/, also did not improve in the production of Japanese college freshmen learning English over the course of one semester of instruction (Saito, 2019b), assessed acoustically across the three dimensions of second formant frequency (F2), third formant frequency (F3), and duration.
In perceptual investigations, findings also suggest that intermediate level learners with certain L1 backgrounds may continue to encounter perceptual difficulties with some of the fundamental contrastive features of their L2, such as the backness contrast in French and German rounded vowels, singleton-geminate contrast in Japanese, and voicing contrasts in Russian stops (Darcy, Daidone, & Kojima, 2013; Darcy, Dekydtspotter et al., 2012; Hardison & Saigo, 2010; Hayes-Harb & Masuda, 2008; Yang et al., 2022), especially as pertains to lexical encoding of L2 phonological contrasts.
The hypothesis that the quality or quantity of input is at the root of such plateaus in phonological development is supported by findings that learners with additional immersive L2 experience often outperform traditional learners on equivalent L2 production or perception tasks (see, for example, “advanced” learners in Darcy, Daidone, et al., 2013; Darcy, Dekydtspotter et al., 2012). Additionally, even for learners with exclusively classroom L2 experience, individual oral production success was found to be correlated with measures of L2 input, such as the number of hours they spent studying their L2 inside and outside the classroom (Saito & Hanzawa, 2016) and the number of content-based courses taught in their L2 (Saito & Hanzawa, 2018). Ultimately, some scholars concluded that there is an upper limit for improvement of oral ability in instructed settings, and that more intensive exposure and interaction in L2, for example via study abroad, is needed to continue refining students’ L2 phonological abilities (Saito & Hanzawa, 2016).
Summarizing this brief and selective survey, we could conclude that the acquisition of L2 phonetics and phonology in the traditional L2 classroom proceeds slowly and can plateau over time. However, the discussion so far has glossed over the fact that additional classroom learning tools, such as explicit pronunciation instruction, can contribute greatly to accelerating learners’ phonetic and phonological development. In addition, research suggests that the specific phonological L2 targets examined (segmental or suprasegmental features of L2) matter strongly for the gains observed with limited classroom input. These two aspects of classroom acquisition are addressed in the following sections.
23.3 The Effect of Different Phonological Targets on Classroom Learning of L2 Phonetics and Phonology
Different segmental and suprasegmental properties of L2 phonology are not acquired equally readily by L2 learners, independently of learning context. In fact, all theoretical models of L2 speech learning make explicit predictions regarding relative difficulty in the acquisition of L2 sounds based on acoustic and perceptual comparisons with the learners’ L1. For example, the Speech Learning Model (SLM/SLM-r; Flege, 1995, 2003; Flege & Bohn, 2021) postulates that L2 sounds that are perceived to be phonetically similar to L1 counterparts are more difficult to acquire than more dissimilar L2 sounds. The hypothesized reason is “interlingual identification” of phonetically similar L2 sounds as highly acceptable instances of L1 categories by beginner learners, which prevents the mutual discrimination of such sounds both in perception and in production. In a similar vein, the Perceptual Assimilation Model (PAM-L2) predicts that some L2 sounds can be perceptually assimilated to the closest L1 categories, with consequences for their production and perception (Best & Tyler [2007] and Chapter 7, this volume). All models also predict that these acquisitional asymmetries would eventually erode with sufficient input and accumulation of L2 experience.
In the classroom learning context, such acquisitional asymmetries could be especially pronounced, because classroom learners lack the sufficiently prolonged and intensive L2 input that could eventually place all L2 sounds on an equal footing. As the following review shows, even within the same phonological contrast, different members can be acquired at different speeds and with different levels of success. Asymmetric acquisition has been routinely found for L2 voicing categories, although there is no consensus on whether the voiced or voiceless category is easier to acquire, even for the same L1–L2 language pair. For example, some production studies reported greater difficulties in the acquisition of Spanish voiced than voiceless stops by English-speaking learners (Nagle, 2019; Schuhmann & Huffman, 2019; Zampini, 1998), while others found that voiceless stops in Spanish and French were more difficult to acquire for English-speaking learners (Casillas, 2020b; Hutchinson & Dmitrieva, 2021). Such asymmetries in the acquisition of voicing also arise in other language pairs, such as English–Dutch and English–Brazilian Portuguese (e.g., Osborne & Simonet, 2021; Simon, 2009).
Spirantization in Spanish has consistently proved to be a difficult target for English-speaking classroom learners, and even pronunciation instruction interventions and study abroad enrichment remedy the situation only to an extent (Díaz-Campos, 2004, 2006; Face & Menke, 2009; Lord, 2005; Nagle, 2017; Rogers & Alvord, 2014; Shively, 2008; Zampini, 1994). The body of research on Spanish spirantization suggests that noncontrastive, orthographically opaque, and functionally low-load phonological processes in the L2 may be especially difficult to acquire in the classroom. While little is known thus far about the acquisition of noncontrastive allophonic variation in an L2 classroom beyond Spanish spirantization, data presented in Smith and Peterson (2012) and Dmitrieva, Jongman, and Sereno (2020) on the acquisition of final devoicing by English-speaking learners of German and Russian are compatible with this assumption. In addition, unlearning L1 phonological processes in an L2 may also present challenges, as was shown for unstressed vowel reduction by English learners of Spanish (Cobb & Simonet, 2015; Menke & Face, 2010) and for voicing assimilation by Dutch learners of English (Simon, 2009).
Painting a more optimistic picture, stressed monophthongal Spanish vowels presented little difficulty for English-speaking learners (Cobb & Simonet, 2015; Díaz & Simonet, 2015; Menke & Face, 2010). Certain Spanish consonants, such as palatal nasals or nonvelarized final laterals, also demonstrated high levels of accuracy even at the beginner levels of acquisition (Díaz-Campos, 2004; Stefanich & Cabrelli, 2021). Perceptually, even novice learners discriminated certain L2 contrasts well (Darcy, Daidone, et al., 2013; Darcy, Dekydtspotter, et al., 2012; Hayes-Harb & Masuda, 2008) and were shown to retain perceptual gains long term (Mora, 2008).
To conclude, research has demonstrated that in the conditions of limited L2 input in the context of classroom learning, acquisition of more “difficult” L2 phonological targets tends to proceed slowly, sometimes without visible signs of progress over the course of semesters. But even in these challenging settings, some L2 phonological targets are “easy” enough to demonstrate accuracy in production and perception, even at the beginning levels of instruction. Research on classroom acquisition of L2 phonetics and phonology needs to continue taking these differences into account, for example when assessing learning progress over time. Finally, as Section 23.4 demonstrates, explicit pronunciation instruction often provides the boost necessary to jump-start the acquisition of challenging L2 phonology.
23.4 The Role of Explicit Pronunciation Instruction and Corrective Feedback in Classroom Acquisition of L2 Phonetics and Phonology
Access to the tools of explicit instruction and learning is a unique advantage of the classroom acquisition of L2 phonetics and phonology, setting it apart from naturalistic learning, which is believed to proceed in a largely incidental and implicit fashion (DeKeyser, 2003). It can be hypothesized that pronunciation training and feedback can effectively accelerate phonological development in adult classroom learners, who are cognitively well-equipped for explicit learning (Muñoz, 2006, 2014). Indeed, the available work overwhelmingly supports the effectiveness of pronunciation instruction and corrective feedback in the adult classroom learning of L2 phonological targets. The effect of explicit pronunciation instruction on the development of L2 phonetics in classroom learners is a rapidly developing area of research and is reviewed only selectively here. For more comprehensive overviews, see Thomson and Derwing (2015), Lee, Plonsky, and Saito (2020), Saito (2012), Lee, Jang, and Plonsky (2015), as well as Chapter 35, this volume.
Supporting the effectiveness of pronunciation instruction, only two among fifteen intervention studies reviewed in Saito (2012) showed no improvement due to pronunciation teaching. A more extensive meta-analysis of eighty-six studies showed that pronunciation instruction had a large statistical effect size in both within- and between-group designs (Lee et al., 2015). Focusing specifically on studies that conducted detailed impressionistic coding or instrumental acoustic analyses, an equally encouraging picture emerges. For example, for English-speaking learners of Spanish, pronunciation of trills and diphthongs responded well to pronunciation training, which included standard instruction and practice with voice analysis software (Lord, 2005). While Spanish intervocalic spirants proved more resistant to pronunciation training, they too were subject to slight improvements in some assessments (Kissling, 2013; Lord, 2005, 2010). Instrumental measurements of VOT in Spanish voiceless stops indicated significant effects of pronunciation instruction for English-speaking classroom learners, who were assessed in within-group, between-group, and longitudinal designs and received different types of training, such as articulatory phonetics instruction and/or visual feedback with acoustic analysis software (González López & Counselman, 2013; Lord, 2005; Offerman & Olson, 2016; Olson, 2019; Olson & Offerman, 2021; Schuhmann & Huffman, 2019). Accuracy in the realization of Spanish voiced stops by English-speaking learners also increased after intervention, indicating a positive effect of pronunciation training (Schuhmann & Huffman, 2019).
Additional evidence suggests that learners’ productions can be improved by advancing their perceptual skills (Kissling, 2014, 2015).
The role of corrective feedback in the development of L2 phonology in classroom learners has received considerably less attention in empirical work that utilizes objective measures of phonological development. While Lyster, Saito, and Sato (2013, p. 22) suggested that phonological and lexical errors may be particularly amenable to corrective feedback in the classroom, they also noted that “very few empirical studies have actually tested the acquisitional value of corrective feedback in such domains.” Nevertheless, available results are encouraging. For example, Japanese learners of English, who were recruited in Montreal and participated in pronunciation-focused activities in a pseudo-classroom setting, improved their pronunciation of English /ɹ/ and /æ/ (as determined via acoustic analyses) when they received corrective feedback in the form of recasts – corrected renditions of their original utterances (Saito & Lyster, 2012a, 2012b). In a later overview chapter, Saito (2021) concluded that corrective feedback can be especially effective in the classroom when learners have enough conversational experience and phonetic knowledge in the target language, when feedback demonstrates target pronunciation (e.g., recasts instead of prompts), and when the target sound is communicatively important and perceptually salient.
Besides affirming the demonstrable effectiveness of classroom pronunciation and feedback intervention, recent reviews also pointed out the limitations of the current research. These include small sample sizes and overreliance on null hypothesis testing; the lack of diversity in terms of age and the L1 and target languages of the participants (predominantly over thirteen years old with English as either their first or their target language); the need for delayed post-tests and between-group experimental comparisons; the need for more authentic testing materials; and the need for investigations of aptitude-treatment interactions (Lee et al., 2015). Researchers have also called for more valid outcome measures in pronunciation research that isolate the features of phonological development to the exclusion of other aspects of oral proficiency, such as lexical, grammatical, and pragmatic aspects (Saito, 2012).
Finally, it needs to be acknowledged that, although explicit phonetic instruction could counteract the lack of abundant and authentic oral L2 input in a classroom setting, it remains unavailable to most learners, despite repeated calls for its incorporation in the curriculum from the research community (e.g., Colantoni et al., 2021; Darcy, 2018; Darcy, Ewert, & Lidster, 2012; Derwing, 2018). The reasons include the lack of instructional materials and teacher training opportunities, as well as instructors’ beliefs that lexical and morphosyntactic skills are of primary importance for developing successful communicative competence and/or that pronunciation learning can take place without explicit instruction (Derwing & Munro, 2009; Nagle et al., 2018; Rossiter et al., 2010; Thomson & Derwing, 2015).
23.5 The Extent of Individual Differences and Their Role in Classroom Learning of L2 Phonetics and Phonology
It can be hypothesized that in the challenging circumstances of limited L2 oral input, individual differences among classroom learners could play a very important role in determining trajectories and outcomes of learning L2 phonetics and phonology. In fact, hardly any empirical work on classroom phonological acquisition failed to mention substantial individual variability. Nevertheless, few investigations addressed it in detail. Those that did often reported that individual learning profiles ran contrary to group trends. That is, even in studies where, as a group, classroom learners evidenced no phonetic learning of L2, individual learners sometimes achieved native-like values in specific acoustic aspects of the target L2 sounds (Hutchinson & Dmitrieva, 2021; Nagle, 2017). In addition, researchers identified a number of distinct patterns across individuals in acquiring the members of L2 phonological contrasts. For example, there were asymmetric learners, who were on target for one member of a phonological opposition but not the other; symmetric developers, who were on target for both; nonlearners, who did not acquire either member; and near-native learners, who approached a native-like range for at least one member of the opposition (Casillas, 2020b; Hutchinson & Dmitrieva, 2021; Nagle, 2019).
In longitudinal studies, much individual variability in the trajectory of learning gains has been reported (Casillas, 2020b; Nagle, 2019). When pronunciation intervention was provided, individual learners differed on the presence of post-training improvement, on the timing of initial signs of improvement and their magnitude, and on the continuation of the improvement trend post-intervention (Schuhmann & Huffman, 2019).
Given this extent of individual variability, a logical question is what learner characteristics are causing such differences. Motivation, aptitude, learning strategies, and emotional profiles are among the characteristics frequently suggested as influential in predicting individuals’ ability to succeed in classroom L2 learning, including phonological acquisition. Nevertheless, comprehensive investigations of the connections between such individual qualities and phonological acquisition in the classroom are yet to be undertaken. This is one of the areas of classroom L2 research that is ripe for further development.
Available assessments of phonological and phonetic learning in a classroom setting reported that greater individual phonemic coding ability (ability to analyze and remember unfamiliar sounds) was linked to higher pronunciation accuracy in Japanese learners of English, evaluated impressionistically by human raters (Saito, 2017; Saito, Suzukida, & Sun, 2019). Greater phonemic coding ability was also linked to more target-like acoustic properties of L2 sounds, measured instrumentally, for example F2 of English /ɹ/ produced by Japanese learners (Saito, 2019a). Participants with better associative memory demonstrated more target-like performance on transition duration and F3 for English /ɹ/ (Saito, 2019a). In addition, individual differences in explicit “noticing” or “awareness” of the acoustic/articulatory differences between L1 and L2 sounds were linked to pronunciation performance with respect to the durational properties of English /ɹ/ as realized by Japanese classroom learners (Saito, 2019b). For studies reporting on the effects of motivation, effort, and emotions (e.g., enjoyment versus anxiety) on L2 pronunciation assessed via holistic measures of comprehensibility or accentedness, see, for example, Saito, Dewaele, and Hanzawa (2017), Saito et al. (2018), and Nagle (2018).
Perceptual abilities of classroom L2 learners were also shown to correlate with aspects of individual cognitive abilities, most notably memory capacity. For example, greater phonological short-term memory predicted better identification of French nasal vowels by learners with an Australian English L1 background (Inceoglu, 2019) and better identification and discrimination of English vowels by learners with a bilingual Spanish-Catalan L1 background (Aliaga-García, Mora, & Cerviño-Povedano, 2011).
Several scholars noted that different components of aptitude may be important at different stages of acquisition (Hu et al., 2013; Saito et al., 2019). Some specifically proposed that explicit learning abilities may play a more important role at the earlier stages of classroom acquisition, while implicit learning abilities become more important at later stages (Saito, 2017; Saito et al., 2019).
Finally, some research suggested that the role of individual aptitude may be limited by culturally specific circumstances of classroom L2 acquisition. For example, the teaching of English in Vietnam traditionally tends to emphasize grammar translation and production-based techniques, which do not develop reliance on auditory acuity in L2 learning. Possibly for that reason, individual auditory acuity did not predict phonological accuracy of Vietnamese beginner to intermediate classroom learners of English in Saito et al. (2021).
As this brief overview demonstrates, studies that investigated the links between individual characteristics and phonological/phonetic development in classroom learners, where learners’ L2 phonology was assessed in a targeted and objective way, are not abundant. Two aspects of individual variability, which I believe merit special attention in future research, are the motivational profiles of classroom learners and identity-related considerations (Nagle, 2018).
In the classroom, accented L2 speech is expected. Moreover, learners may avoid attempting to adopt authentic-sounding properties of L2 for fear of ridicule if done improperly, because they do not want to “show off,” or in order to signal solidarity with their classmates and compatriots via the common use of accent (see Gatbonton, Trofimovich, & Magid [2005] for evidence of association between strength of accent and ethnic affiliation). Identity-related individual differences may affect L2 phonology in the classroom, in addition to differences in aptitude, but have received little attention in the literature to date (although see Moyer [2017] for the discussion of choice in L2 phonology).
In addition, motivational profiles of classroom learners in general, and those who choose a specific language to study in particular, are likely to be different from motivational profiles of learners in the immersive setting. While it is reasonable to assume that immersed L2 learners, especially immigrants, are likely to have strong integrative and instrumental motivation, as they seek to access educational and employment opportunities and integrate in the culture, classroom learners often have no such pressures. It is only classroom learners who sometimes choose to study a language for purely “examination-driven purposes” (Saito & Hanzawa, 2018) or because it is required for a specific study major, not because they are genuinely interested in the language and culture or have associated professional or personal plans. Such unique motivational profiles may show up especially often for languages perceived as “easy” in a given learner population. Such motivational characteristics, combined with the lack of immediate communicative pressures to sound comprehensible to speakers of L2, may potentially hinder the acquisition of L2 phonology in the classroom.
Further work investigating the individual motivational profiles of classroom learners across language pairs, and comparing them to learners in other contexts, is needed to explore the hypotheses presented here. For now, to summarize this review, we know that great individual variability in the L2 phonology of classroom learners can be partially predicted by individual differences in aptitude, in particular phonemic coding ability, associative memory, and phonological short-term memory.
23.6 L2 Phonetic Category Formation and L1 Restructuring in Classroom Learning
One of the most influential theories of the acquisition of second language speech, the Speech Learning Model (SLM: Flege, 1995, 2003; SLM-r: Flege & Bohn, 2021), assigns critical importance to L2 phonetic category formation. According to the SLM/SLM-r, forming L2 categories that are distinct from the ones in a speaker’s L1 is the single most important precursor to effective perception and production of L2 speech.
While there is no general agreement on what constitutes evidence for L2 phonetic category formation in L2 learning, a clear acoustic difference between comparable L1 and L2 categories in the production of learners can be used as an indication that the L2 category formation process has at least begun. Such evidence would suggest that learners noticed the acoustic-phonetic distinction between similar L1 and L2 phones and are no longer considering them equivalent. Distinct L1 and L2 phonological category boundaries in perception can also serve as evidence of L2 category formation. In identification experiments, the perceptual category boundary is the value of the acoustic parameter (e.g., VOT) at which the perceiver switches from “hearing” one category (e.g., voiced stops) to another one (voiceless stops).
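The notion of a perceptual category boundary can be made concrete with a small sketch. The function below estimates the boundary as the VOT value at which “voiceless” responses cross 50 percent, using linear interpolation between adjacent continuum steps. This is a simplification chosen for brevity; published studies commonly fit a logistic (psychometric) function to the identification data instead. The function name, the continuum steps, and the response proportions are all hypothetical and are not taken from any study cited here.

```python
def category_boundary(vot_steps_ms, prop_voiceless):
    """Estimate the VOT (in ms) at which 'voiceless' responses reach 50%.

    Linearly interpolates between the two adjacent continuum steps whose
    response proportions straddle 0.5. Returns None if there is no crossover.
    """
    points = list(zip(vot_steps_ms, prop_voiceless))
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if p0 < 0.5 <= p1:
            # p1 > p0 is guaranteed here, so the division is safe.
            return x0 + (0.5 - p0) * (x1 - x0) / (p1 - p0)
    return None


# Hypothetical identification data for one listener on a VOT continuum,
# tested once in an English session and once in a Spanish session.
vot = [-30, -20, -10, 0, 10, 20, 30, 40]
english_mode = [0.00, 0.00, 0.05, 0.10, 0.30, 0.70, 0.95, 1.00]
spanish_mode = [0.00, 0.05, 0.30, 0.70, 0.90, 1.00, 1.00, 1.00]

print(category_boundary(vot, english_mode))
print(category_boundary(vot, spanish_mode))
```

With these invented values, the estimated boundary falls at about 15 ms in the English session and about −5 ms in the Spanish session. A difference of this kind between language modes is what the text describes as distinct L1 and L2 category boundaries.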
In the situation of limited L2 input and rare availability of explicit pronunciation instruction, one may hypothesize that classroom learners would not reach the important milestone of L2 category formation until advanced levels of proficiency. Yet, available reports suggest otherwise.
Not every inquiry into classroom acquisition of L2 phonology included speech samples of learners’ L1, but those that did reported that often even beginner and intermediate-level classroom learners were able to implement some degree of expected acoustic difference between L1 and L2 phones. For example, L1 and L2 stops (voiced or voiceless) were shown to be acoustically different from each other in terms of VOT for several groups of beginner to intermediate English-speaking learners of Spanish or Russian (Dmitrieva et al., 2020; González López, 2012; González López & Counselman, 2013; Schuhmann & Huffman, 2019). Occasionally, the new L2 categories learners created were acoustically distinct from L1 categories but not in a way that made them more similar to L2 targets, as was shown for English voiced stops produced by Brazilian learners in Osborne and Simonet (2021). Nevertheless, such findings also suggest that the learners realized the importance of separating the phonetic spaces of their L1 and their L2.
In contrast to the studies reviewed so far, there are also reports of learners with substantial classroom experience who nevertheless produced L1 and L2 stops as acoustically identical, for example American students learning French in Hutchinson and Dmitrieva (2021) and speakers of Mandarin learning Russian in Yang et al. (2022). Given the importance of category formation in second language learning, further research is necessary to understand the reasons for the lack of L2 category formation in these circumstances.
Perceptual studies also suggest that, with sufficient input, L2 category formation can occur in classroom settings. For example, after only seven weeks of domestic immersion or by the end of the fifth semester of traditional college-level Spanish courses, English-speaking learners demonstrated language-specific perceptual boundaries between voiced and voiceless stops (Casillas & Simonet, 2018). Moreover, by the end of the seven-week domestic immersion program, learners categorized voicing in Spanish stops in a manner comparable to Spanish-English bilinguals (Casillas, 2020a).
Given this evidence for L2 category formation in classroom learners, one may wonder whether it leads to restructuring of the L1 phonetic space, as has often been reported for immersed L2 learners (Chang, 2012; Flege, 1987). Such restructuring is evidenced by phonetic changes in L1 sound categories under the effect of acoustically proximal L2 sounds. These changes can be assimilatory or dissimilatory in nature, whereby L1 sounds move toward or away from similar L2 categories.
Perhaps somewhat surprisingly, even in classroom learners and with restricted L2 input, L2 learning can lead to changes in the phonetics of the L1. These changes can be detected by comparing the native speech of learners with more L2 experience to the native speech of learners with less L2 experience. For example, more advanced classroom learners of Spanish produced acoustically more Spanish-like English (L1) voiced stops and vowels when compared to lower proficiency classroom learners in Herd et al. (2015).
Alternatively, a comparison between the native speech of monolinguals and the native speech of those who study a second language can be made. For example, native English voiced and voiceless stops produced by classroom learners of Russian were acoustically different from those produced by English-speaking nonlearners in Dmitrieva et al. (2020). Similarly, classroom learners of English in Brazil produced L1 (Brazilian Portuguese) voiced stops that were significantly different from those produced by nonlearners in Osborne and Simonet (2021). To conclude, the findings from this small group of studies suggest that classroom L2 exposure can trigger at least the beginning stages of L2 phonetic category formation and the corresponding restructuring in the phonetic space of L1.
23.7 Conclusion and Future Directions
In this chapter, I reviewed some of the recent findings pertaining to the phonetics and phonology of adult L2 learners in the classroom. These findings paint a mixed picture. On the one hand, classroom learners demonstrate great variability in the acquisition of L2 sounds and can sometimes stall in the development of their L2 phonetic abilities. Such acquisitional plateaus are especially common with respect to “difficult” aspects of L2 phonology and are likely due to the slow rate with which classroom learners accumulate L2 experience and proficiency. On the other hand, even in these circumstances, learners can notice the distinguishing features of L2 sounds and begin creating L2 categories that are acoustically and perceptually distinct from L1 categories. Moreover, this phonetic learning is profound enough to trigger the partial restructuring of the L1 phonetic space. Finally, the tools of explicit learning available in the classroom setting, specifically explicit pronunciation instruction and corrective feedback, have shown great results in boosting phonological acquisition in classroom learners.
What does the future hold for this line of research? There are many exciting directions that are yet to be fully explored. First, with respect to the role of input, while it is undeniable that more input is better, future research should also focus on exploring the types of input that could be especially beneficial for classroom learners. For example, a recent laboratory study suggests that exposing classroom learners to authentic L2 speech input via television and film could have beneficial effects for the development of L2 speech production (Hutchinson & Dmitrieva, 2022).
Second, acknowledging that not all L2 phonological targets are created equal, future research should survey acquisitional challenges with respect to a greater variety of L2 segmental and suprasegmental properties and especially L2 phonological processes, such as allophony and phonological neutralization. When working with specific L2 phonological targets, it is also important to remember that not all have an equal impact on the perceived comprehensibility of L2 speech. This is especially pertinent for pronunciation training research and practice (see the relevant recommendations in Chapter 35, this volume). To determine what areas of L2 phonology pronunciation training in the classroom should pay special attention to, it is important to continue research identifying links between the acoustic realization of specific aspects of L2 phonology and the resulting effects on comprehensibility, intelligibility, and accentedness (Levis, 2005, 2020).
Third, future research should continue exploring the sources of individual variability in classroom learners, including the possibility that classroom learners of certain languages in specific cultural contexts can have unique motivational and attitudinal profiles, affecting their L2 phonetics and phonology. Individual or group motivational profiles or cultural contexts could be responsible for disparate research findings, for example with respect to L2 category formation in classroom learning. We also need more classroom studies utilizing the tools of instrumental acoustic analysis and perceptual experiments that allow researchers to isolate and evaluate learners’ phonetic ability without conflating it with other oral abilities, such as morpho-syntactic and lexical sophistication. In addition, future research needs to expand the range of language pairs studied, including less commonly taught languages, learners with an L1 background other than English, and learners in diverse cultural settings.
Fourth, as pertains to category formation, it is apparent that collecting L1 samples in studies of L2 classroom acquisition is a useful practice. Even when learners’ L2 productions are profoundly different from their L2 targets, L1–L2 acoustic and perceptual comparisons can reveal that cognitive separation of L1 and L2 phonological systems has begun, preparing ground for further gains in L2 phonology. Similarly, markers of phonetic changes in classroom learners’ native speech can be indicative of phonetic and phonological L2 learning. For example, in Dmitrieva et al. (2020) acoustic indices of assimilatory L1-to-L2 changes were significantly correlated with the degree of L2 phonetic learning. Therefore, phonetic changes in L1 are another possible tool for diagnosing the extent of underlying L2 phonetic and phonological learning in the classroom and elsewhere.
Fifth, although it was not reviewed in the present chapter because of the limited attention it received in previous research, the role that instructors’ characteristics and beliefs play in the classroom acquisition of L2 phonology is a worthy target for future investigations. Similarly, age effects on the classroom learning of L2 phonetics and phonology received relatively little attention, with most investigations restricted to college-aged learners.
Sixth, future research into aspects of classroom learning that set it apart from other contexts of L2 acquisition can explain, at least in part, whether and why outcomes of classroom learning differ from outcomes of naturalistic acquisition. One such example is the effect of orthography on the acquisition of L2 phonology, since classroom L2 learning is notoriously reliant on literacy and written materials (more on this topic in Chapter 31, this volume).
Finally, future research demonstrating the benefit of prior classroom experience in L2 phonological learning beyond the classroom would provide critical demonstration of the efficacy of second language education. Even if there is some truth to the intuition that, short of signing up for countless years of L2 classes, purely classroom-based language learning has an upper limit on the attainment of L2 phonetics and phonology, it is also true that classroom L2 learning often constitutes only an initial step of the journey. This step is likely to be vitally important in preparing learners to take on the challenges of the next steps, for example L2 immersion. As such, it should not be underestimated.
24.1 Introduction
Study abroad is often considered an optimal environment for pronunciation learning because learners have access to a massive amount of second language (L2) input that is rich, varied, and representative of various types of speaking contexts and registers. They can also participate in real-life communicative situations exclusively in the L2, thus using the language in an environment perceived as more authentic than the traditional foreign language classroom. While there is no doubt that study abroad can be beneficial for pronunciation learning, recent research has challenged some of the implicit assumptions researchers and teachers make about study abroad. First and foremost, the fact that learners can immerse themselves and interact in the L2 frequently does not necessarily mean that they always choose to do so. Most learners seem to use a combination of their native language (L1) and the L2 while abroad (Dewey et al., 2014), and the reality is that even communicatively competent L2 learners may continue to use the L1 for certain tasks and/or with certain individuals while abroad. As a result, the L2 may be used in a limited range of settings, such as in the courses students are taking and in short transactions, which are unlikely to provide sufficient linguistic complexity and variability to stimulate development. It is important to bear in mind that a learner’s desire to interact in the L2 in an immersion setting is not static, nor are their patterns of L1 and L2 use over their stay, which is to say that understanding L2 phonetic and phonological learning during study abroad likely hinges on tracking such fluctuations. It is equally important to recognize that the nature of study abroad has changed radically, with many students opting for short-term, summer programs that do not disrupt their coursework during the academic year. If part of the potential of study abroad for pronunciation development rests upon massive exposure and sustained, meaningful L2 contact, then shorter lengths of stay may prove less optimal for L2 pronunciation learning.
The changing dynamics of study abroad are not necessarily positive or negative, but they do demand that stakeholders reconsider the view of study abroad as a panacea for pronunciation development. In fact, it seems more appropriate to think of study abroad as one element of successful pronunciation learning that must be combined with formal instruction in the language both before departure and upon learners’ return to the language classroom.
24.1.1 Brief Historical Overview of the Field
Research on pronunciation learning during study abroad is in many respects still in its infancy or at least early adolescence. One of the first studies to examine pronunciation-related changes after study abroad was Simões (1996). He investigated English-speaking learners of Spanish who spent five weeks in Costa Rica. Through an impressionistic analysis, he noted that two of the five participants improved their production of Spanish vowels, exhibiting less vowel reduction. In the past twenty-five years, there have been many more studies on pronunciation and study abroad, but the scope of this body of work has remained relatively narrow. To date, most studies have examined changes in production, largely ignoring the potential value of study abroad for perception, and production studies have focused on L2 speakers’ acquisition of phonetic aspects of the target language, which means that the current state of knowledge reflects an understanding of how learners move toward a more native-like accent while abroad. To a certain extent, this emphasis reflects the trajectory of L2 pronunciation research at large, insofar as early work concentrated on documenting relationships between age and experience and ultimate attainment in highly experienced bilingual populations (Flege & Liu, 2001), including the degree of foreign accent that L2 speakers have (Piske, MacKay, & Flege, 2001).
Findings for length of residence, amount of L1 and L2 use, and other experiential variables seemed to set expectations for the effect of study abroad on pronunciation learning because researchers have commonly viewed study abroad as another case of language immersion, despite clear contextual differences between immigrants living in an L2 environment and instructed L2 learners participating in a study abroad program. In fact, most study abroad pronunciation research lacks a systematic examination of the quantity and quality of L2 contact while abroad, making it difficult to determine precisely how immersive participants’ study abroad experience actually was. In this chapter, we therefore adopt the perspective that study abroad should be viewed as a case of instructed – as opposed to naturalistic – second language acquisition because someone (usually the study abroad director) manipulates and decides on a set of external variables (e.g., type of accommodation, length of stay, adherence to a language pledge, courses students take while abroad) that affect the type and the amount of L2 learning that takes place. It is our view that as research on phonetic and phonological development during and after study abroad moves forward, researchers will need to take these external variables into account, along with learner-internal factors such as motivation and attitudes, in order to better understand pronunciation learning outcomes.
24.1.2 Theoretical Grounding
It bears mentioning that the popular belief that study abroad could serve as an ideal context for pronunciation learning is theoretically sound, insofar as models of L2 phonetic learning argue that experience plays a fundamental role in helping late L2 learners improve their ability to perceive and produce L2 sounds. According to the Speech Learning Model (SLM; Flege, 1995; Flege & Bohn, 2021), phonetic learning remains possible throughout the life span, but late (i.e., post-pubertal, adult) learners are more likely to equate similar L1 and L2 sounds, hindering their ability to create new L2 categories. However, with increasing L2 experience, learners may notice cross-linguistic differences in phonetic realization, which could lead to the formation of a separate L2 phonetic category. Furthermore, even if L1 and L2 sounds remain linked under a shared category, pronunciation should still improve, though the phonetic characteristics of the L2 sound are likely to fall somewhere between the L1 and the L2, depending on the makeup of the shared category. The L2 Perceptual Assimilation Model (Chapter 7, this volume) and the L2 Linguistic Perception Model (Chapter 8, this volume) also assign an important role to input and interaction. For instance, according to the L2 Linguistic Perception Model, interaction is the engine of pronunciation development. Specifically, perceptual errors trigger a reevaluation and reorganization of the emerging phonological system, with the goal of preventing similar errors in the future.
In summary, then, models of L2 phonetic learning coincide on at least two points: adult L2 learners can improve their perception and production of L2 sounds, and access to rich, varied, and meaningful input and interaction is critical for L2 pronunciation learning. These models posit a strong perception–production link, which means that perceptual learning is the basis for improving production and, to a certain extent, gains in production should feed back into perception (Flege & Bohn, 2021).
Research on high variability phonetic training, a paradigm in which the listener is exposed to tokens spoken by multiple talkers in different phonetic contexts, also provides an empirical rationale for the potential positive effect of study abroad on pronunciation learning. Accumulated work on this technique has shown that it helps listeners establish robust and flexible perceptual categories by encouraging the listener to abstract away from the patterns of a particular speaker toward the phonetic cues that signal the target contrast (Thomson, 2018). It stands to reason, then, that listening to and interacting with a wide variety of L2 speakers, both native and non-native, in an immersion context could stimulate perceptual learning. This, in turn, should percolate up to different levels of pronunciation structure, aiding with word recognition, sentential processing (e.g., connected speech characteristics), and so on. With respect to production, assuming a perception–production link, perceptual gains should contribute to gains in production (Sakai & Moorman, 2018), enabling learners to produce more phonologically and phonetically accurate L2 sounds and prosody. Likewise, frequent opportunities to interact in the L2 should help learners practice and eventually automatize new articulatory routines, leading to more fluent production in increasingly spontaneous contexts of use.
24.2 Current Perspectives
24.2.1 Improvement in Perception as a Result of Study Abroad
If learning to perceive the target language accurately depends on meaningful experience and interaction in the L2, as L2 speech learning models suggest, then it stands to reason that perception should show robust gains after a period abroad. Surprisingly, very few researchers have examined the effect of study abroad on perceptual learning, and the studies that have been conducted have yielded mixed results. In one of the most comprehensive longitudinal studies to date, Mora (2015) examined Spanish-Catalan bilinguals’ perception of English vowels and consonants longitudinally over a four-year period, during which university students took general English language courses and participated in a three-month study abroad program. The longitudinal design of the study allowed for a direct comparison of the effect of general language instruction and study abroad on L2 perception. Results showed that learners made gains after general instruction but not study abroad, though there were substantial individual differences in both initial performance and gains over time. To explain these somewhat surprising results, Mora (2015) suggested that providing pronunciation instruction prior to study abroad could have helped prepare students for the study abroad experience (e.g., by raising their awareness of key cross-linguistic differences), which could have facilitated additional gains in perception beyond those that had been documented after general instruction. This is precisely what Romanelli, Menegotto, and Smyth (2015) found in their study on L1 English speakers who participated in a three-week study abroad program in Argentina. Eight of the fifteen learners received instruction on lexical stress patterns, including explicit descriptions of cross-linguistic differences in stress assignment and realization in Spanish and English, whereas the other seven did not. Although both groups improved their stress perception after studying abroad, gains were greater for the instructed students, suggesting that instruction in some cases may play an important role in preparing learners to notice and process the extensive input they receive while abroad.
An important theme that emerges in this small body of literature is variation, in terms of the gains that individual learners achieve and the patterns that are evident for individual sounds. For instance, Nicholas (2018) investigated English speakers’ perception of French vowels before and after studying abroad in France for a semester. Of the twelve participants included in the study, seven improved their vowel perception. In another study of L2 French sounds, Engstler (2012) tested a group of seventeen English-speaking learners at two, five, and nine months after they had studied in France for four months. Using an AX discrimination task, Engstler found that learners’ perception of consonant contrasts improved more than that of vowels, though these trends varied from learner to learner based on factors such as attitudes toward the language and amount of exposure to French upon returning home.
Overall, present work suggests that learners’ perception can improve after study abroad, though the amount of learning that takes place seems to be linked to other factors, both instructional (e.g., whether participants received instruction prior to departure) and individual (e.g., initial starting point, learner differences). However, much more work is needed in this area before a definitive conclusion can be reached.
24.2.2 Improvement in the Production of Specific Features as a Result of Study Abroad
Compared to research on speech perception and study abroad, there have been far more studies focusing on the impact of study abroad on speech production, especially the production of specific target features such as individual vowels and consonants. Most studies have focused on the acquisition of Spanish pronunciation features during short-term study abroad programs. Thus, results should be interpreted with caution and treated as preliminary until a more diverse and representative body of research is available.
In this area, a common methodology is to compare a group of learners studying abroad with a group of at-home learners who are enrolled in a language course to determine if study abroad confers any gain above and beyond formal instruction. In most cases, the post-test occurs at the end of the study abroad program or immediately after it, which means that study abroad research focuses on immediate gains rather than on the long-term impact of study abroad on pronunciation learning. As with perception research, the logic is that extensive L2 input and opportunities for L2 use should promote gains in phonetic accuracy. Because this logic rests on the assumption that learners are indeed choosing to interact in the L2, studies have also sought to examine the extent to which experiential and social variables such as learners’ self-reported amount of L1 and L2 use, their apparent social networks, and their motivation and attitudes toward the target language community regulate learning. Methodological tools such as the Language Contact Profile have proven useful in accounting for these variables (Dewey, 2017; Freed et al., 2004; Gass, 2017). Some studies have also investigated whether targeted pronunciation instruction prior to or during study abroad catalyzes learning by drawing attention to cross-linguistic differences in pronunciation that might otherwise fly under the radar in a communicative context.
Overall, study abroad seems to play a positive role in the learning of some pronunciation features, but study abroad groups do not always outpace their at-home counterparts. Díaz-Campos (2004) and Bongiovanni et al. (2015) compared study abroad and at-home learners’ production of a range of Spanish consonants before and after a short, summer immersion experience. The authors obtained slightly different results, which, when compared, provide insight into the types of structures that are likely to develop as a result of study abroad. In Díaz-Campos (2004), both at-home and study abroad participants improved their production of word-initial voiceless stops (/p, t, k/) and word-final /l/, but there was little change in the production of word-medial voiced stops (/b, d, g/). In Bongiovanni et al. (2015), both learner groups improved their production of word-initial voiceless stops but showed limited gains on word-medial voiced stops, which aligns with Díaz-Campos’ findings. However, study abroad participants in Bongiovanni et al. (2015) also showed significant improvements in their production of the tap and a trend toward improvement in trill production. Together, these results tentatively suggest that features that occur in salient positions (e.g., word-initial stops) or features that are themselves salient to learners (e.g., the tap and trill) tend to undergo development, whereas features that occur in other contexts (e.g., word-medially, such as the voiced stops; word-finally, such as /l/) may not improve to the same extent. Nagle et al. (2016) obtained similar results: after a short-term study abroad program, English-speaking learners of Spanish showed significant improvements in their production of word-initial voiceless stops, whereas their production of word-medial voiced stops did not change significantly. It bears mentioning that these studies differed in terms of how they evaluated accuracy. Díaz-Campos (2004) used a binary accuracy criterion (e.g., correct versus incorrect variant), whereas Bongiovanni et al. (2015) and Nagle et al. (2016) evaluated accuracy using gradient measures such as voice onset time values for word-initial voiceless stops (/p, t, k/). Methodological differences (i.e., the choice of a binary versus a continuous outcome measure) could therefore account for some of the empirical differences observed across these studies.
Word-medial voiced stops deserve special attention because they have been studied extensively in relation to study abroad. As such, they offer an interesting test case for the type of study abroad experience that is likely to promote the development of a linguistically complex structure. Briefly stated, Spanish voiced stops have stop and fricative or approximant allophones. The stop allophone occurs after a homorganic consonant (nasals and /l/ for /d/), whereas the fricative or approximant occurs in other phonetic contexts. Spanish also allows for resyllabification across word boundaries, which means that word-initial voiced stops are often realized as fricatives or approximants in connected speech (e.g., la bodega, “the wine cellar,” [la.βo.ˈðe.ɣa]), and the degree of weakening that the fricative or approximant allophone exhibits varies as a function of dialect, register, and many other factors (Eddington, 2011). Thus, it could be the case that learners simply need a greater amount of exposure before they can successfully begin to map this system and produce the fricative and approximant allophones in their own speech. Targeted instruction might also help learners improve their production by drawing their attention to this complex alternation. Both hypotheses are borne out in current research findings. Alvord and Christiansen (2012) and Rogers and Alvord (2014) studied learners who had spent two years on a mission trip in a Spanish-speaking country. Approaching the production of the voiced stops from a categorical perspective, Alvord and Christiansen (2012) characterized learner variants as either stops or approximants. Overall, learners produced the approximant allophone 81 percent of the time in intervocalic position, though there was substantial variation in individual rates of suppliance. Rogers and Alvord (2014) adopted a more gradient approach, focusing on the phonetic characteristics of the approximants that learners produced. They found that study abroad participants approximated native-like levels of lenition, or weakening, in intervocalic contexts. Because neither study included a pretest, gains cannot be unequivocally attributed to participants’ extensive study abroad experience, but the findings nevertheless provide evidence that for certain target structures (i.e., linguistically complex, nonsalient, etc.) longer periods abroad may be beneficial. Instruction also seems to play an important role in shaping the gains learners achieve while abroad. Lord (2010) compared students who received pronunciation instruction while abroad to those who did not. Although both groups improved their production significantly, gains were much larger for instructed learners, but results should be interpreted with caution due to the small sample size of the study (n = 8).
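The allophonic distribution of Spanish voiced stops summarized in this section can be made concrete with a short sketch. The Python function below is a hypothetical illustration and not an analysis used in any of the studies cited: the name `realize_voiced_stops`, the simplified one-symbol-per-segment transcription, and the treatment of utterance-initial position as a stop context are all assumptions introduced here. The rule is applied categorically, so the sketch ignores the dialectal and stylistic gradience in weakening, and it looks across word boundaries to mirror resyllabification in connected speech.

```python
# Hypothetical, categorical sketch of Spanish voiced-stop allophony.
# Assumption: input is a broad transcription, one character per segment,
# with spaces marking word boundaries.

APPROXIMANT = {"b": "β", "d": "ð", "g": "ɣ"}  # stop -> weakened allophone
NASALS = {"m", "n", "ɲ", "ŋ"}


def realize_voiced_stops(phonemes: str) -> str:
    """Return the transcription with /b, d, g/ weakened where the rule applies."""
    output = []
    previous = None  # last segment seen; None means utterance-initial
    for segment in phonemes:
        if segment == " ":
            # Word boundaries are kept in the output but do not block the
            # rule, mirroring resyllabification in connected speech.
            output.append(segment)
            continue
        if segment in APPROXIMANT:
            after_nasal = previous in NASALS
            after_lateral = segment == "d" and previous == "l"
            # Assumption (not stated in the text): stops are also kept
            # utterance-initially.
            after_pause = previous is None
            if after_nasal or after_lateral or after_pause:
                output.append(segment)
            else:
                output.append(APPROXIMANT[segment])
        else:
            output.append(segment)
        previous = segment
    return "".join(output)


if __name__ == "__main__":
    for phrase in ["la bodega", "un dedo", "el dedo", "el bote"]:
        print(phrase, "->", realize_voiced_stops(phrase))
```

Under these assumptions, la bodega surfaces with three weakened segments, whereas the first /d/ of un dedo and el dedo remains a stop after the nasal and the lateral. Even in this simplified form, the realization of each segment depends on what precedes it, including material in the previous word, which gives a sense of why learners may need extended exposure or targeted instruction to map the alternation.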
Studies examining vowels paint a different picture, insofar as they suggest limited development after study abroad, even after relatively long-term immersion experiences. For example, Avello and Lara (2015) compared Spanish-Catalan bilinguals who had studied abroad in an English-speaking country for either three or six months. They found no significant gains in the production of the /i/–/ɪ/ and /æ/–/ʌ/ contrasts for the six-month group, whereas the three-month group showed a slight improvement on /æ/–/ʌ/ and slight backsliding on /i/–/ɪ/. They also examined participants’ production of word-initial voiceless stops, finding no change for the three-month group versus improvement for /k/ for the six-month group. Participants’ stop consonant production was relatively target-like at pretest, falling between Spanish and English norms (save /k/ for the six-month group), so a lack of improvement could be due, at least in part, to participants’ initial level of production. However, this was not the case for the vowels making up each contrast, which showed substantial overlap, or neutralization, at pretest. To explain these results, the authors hypothesized that the phonetic features of the target structures may not have been salient to learners in an immersion context where communication is paramount. For that reason, they suggested that pronunciation instruction could have been beneficial, as confirmed by studies such as Romanelli et al. (2015) and Lord (2010). Other studies have shown similar patterns of modest, albeit statistically significant, change in speakers’ vowel production. For instance, Stevens (2011) found that English speakers produced shorter, more target-like Spanish vowels after study abroad, whereas Long, Solon, and Bongiovanni (2018) reported that study abroad learners produced longer, less Spanish-like vowels at post-test. Overall, studies have reported either null findings or statistically significant findings that are modest at best, which indicates that study abroad in and of itself may not have much of an (immediate) impact on L2 vowel production. However, more research on vowel production is needed given the limited work in this area, especially with respect to the target languages sampled (i.e., Spanish and English).
Against this backdrop of studies that found few effects of study abroad on L2 vowel development, O’Brien’s (2003) dissertation stands out as an exception, arguably due to the relatively long duration of the sojourn. The author compared the production of German high vowels /i:, y:, u:/ between a group of thirty-four native speakers of English who spent an academic year in Freiburg, Germany, and a comparison group of twenty-six students who continued German studies in Wisconsin, USA. While learners in both groups frequently produced native-like /i:/ at the end of their respective instructional periods, probably because of its similarity to English /i/, only the study abroad participants approached native-like production of /y:/. Several study abroad learners even seemed to have created a new L2 category for this sound, which the SLM would attribute to this sound’s distinct acoustic properties compared to the L1. In sum, O’Brien’s (2003) study suggests that longer stays might be necessary to acquire certain aspects of L2 phonology.
Finally, a small handful of studies have examined suprasegmental features after study abroad. Studies on intonation indicate that speakers’ production of boundary tones tends to become more target-like after study abroad, but their production of other intonation features, such as prenuclear peak alignment in L2 Spanish, does not undergo the same amount of development (Craft, 2015; Henriksen, Geeslin, & Willis, 2010; Seijas, 2018; Thornberry, 2014). Again, this finding may be due to the saliency of boundary tones relative to other suprasegmental features. Like segmental research, suprasegmental research reflects an emphasis on L2 Spanish, but Kim et al. (2015) is a notable exception. They studied English speakers’ tonal accuracy after studying abroad in China. Twenty-two learners participated in a simulated oral proficiency interview before departure and at the end of their sixteen-week sojourn. Tonal accuracy was judged by four native Chinese listeners. Results showed that participants improved significantly despite a high level of accuracy at pretest (93 percent).
Two other areas that deserve attention in relation to study abroad and speech production are the role of individual differences and the tasks that have been used to elicit speech data, which can shed light on pronunciation use under controlled and spontaneous speaking conditions (Saito & Plonsky, 2019). Turning first to individual differences, research indicates that the gains students make while abroad depend in part on what they do and the attitudes they have during their experience (Solon & Long, 2018). Among experiential factors, watching television in Spanish, using English less frequently, using Spanish less frequently with English speakers, and use of Spanish outside the classroom have been linked to segmental accuracy (Alvord & Christiansen, 2012; Díaz-Campos, 2004; Stevens, 2011). Cultural integration and motivation also appear to play a role (Alvord & Christiansen, 2012). In a study of seventeen English-speaking students learning French in several cities in France, for example, Kennedy Terry (2017) found that participants’ interactions with native speakers, measured twice via a multilevel social network strength scale, significantly predicted native-like variability in the elision of /l/ in third-person subject pronouns. Unlike Kennedy Terry’s (2017) study, however, most research to date has polled learners using relatively simple questionnaires administered at a single point in time. Given recent emphasis on developing methodologically rigorous approaches to tracking variation in language use abroad (e.g., Dewey, 2017), including doing so longitudinally over the length of the program, more work is needed on individual differences and L2 phonetics and phonology after study abroad.
In terms of speech elicitation techniques, Saito and Plonsky (2019) have argued that controlled tasks such as word and passage reading tap into controlled production knowledge, while spontaneous tasks such as picture description provide a window into spontaneous production knowledge, or speakers’ ability to use pronunciation accurately and efficiently while engaged in a communicative task. Study abroad affords learners frequent opportunities for meaningful L2 use, which means that it is perhaps in this context where spontaneous knowledge would be most likely to (begin to) develop. Díaz-Campos (2006) and Rogers and Alvord (2014) compared production on both controlled and spontaneous tasks. In both cases, they found that speakers were more accurate on the conversational task than on the reading task. Interestingly, Díaz-Campos (2006) additionally reported that the task comparison was only statistically significant for study abroad learners, which provides initial evidence that study abroad may enable learners to automatize and use pronunciation features in open-ended tasks.
24.2.3 Improvement in Foreign Accent as a Result of Study Abroad
Study abroad researchers have also examined the development of global pronunciation features, notably foreign accent. Foreign accent represents the overall level of phonetic accuracy that learners have achieved relative to a native variety of the target language. Echoing results for the production of specific features, findings in this area suggest that learners tend to have a more native-like accent (i.e., less foreign accent) after studying abroad, though learner differences in L2 use appear to modulate gains. Two studies of Catalan-Spanish bilinguals who were studying in English-speaking environments illustrate this trend. Avello (2018) examined the differential effects of type of instruction on foreign accent as a result of eighty hours of formal instruction followed by a three-month study abroad program. Modest improvement in learners’ accent – measured before and after each type of instructional period – was found only after the study abroad experience, though the author acknowledged that the holistic ratings used to measure foreign accent failed to capture gains at the segmental level that occurred for some learners throughout instructional contexts. Muñoz and Llanes (2014) in turn compared children and adults studying at home and abroad. Study abroad learners made greater gains than their at-home counterparts. The authors also found statistically significant correlations (medium effect sizes) between foreign accent gain scores and three measures of L2 use: general speaking, speaking with native speakers, and hours of class time. Interestingly, the adults tended to spend more time speaking with other non-native speakers, which was not a significant predictor of gains in foreign accent. These findings underscore the importance of considering both quantity and quality (e.g., type) of L2 use (Moyer, 2011).
Martinsen, Alvord, and Tanner (2014) reported that English speakers who had spent two years in a Spanish-speaking country had a more native-like accent in the L2. In contrast, Martinsen and Alvord (2012) did not find any significant group-level improvement for English speakers who spent six weeks in Argentina, though of the thirty-eight participants, nineteen did improve, and attitudes toward others, an attitudinal individual difference measure, was a statistically significant predictor of pronunciation gains. In summary, then, it seems that longer stays abroad, coupled with substantial interaction in the L2 and a positive attitude toward the L2 community, may be necessary to promote gains in global foreign accent.
24.2.4 Perception and Production of Regional Features as a Result of Study Abroad
As study abroad becomes a more common option, and even a requirement in some cases, for students pursuing language studies, universities are establishing partnerships with a wider range of international universities, thus giving students access to increasingly diverse linguistic communities across the globe. Many Spanish learners, for example, now have access to several destinations in Central and South America in addition to traditional programs in Mexico or Spain. Linguistically, this means that learners are more likely to encounter regional phonetic features that do not often appear in textbooks and other teaching materials. What learners do when exposed to these local phonetic features varies widely as a result of both linguistic and social variables, as recent research in the area has shown.
In terms of perception, studies have found that learners can become attuned to local phonetic features even if these features are considered exceptional or highly marked from the point of view of a standard variety. Reference Schmidt, Collentine, García, Lafford and Marcos MarínSchmidt (2009) found that a group of eleven English-speaking learners of Spanish improved their comprehension of several features associated with Caribbean Spanish after three weeks in the Dominican Republic. The features that improved the most were word-final velarization of /n/ and deletion of /d/, whereas comprehension of word-internal rhotic lateralization (e.g., [pal.ke] for parque, “park,” instead of standard [paɾ.ke]) and deletion of /s/ showed less improvement. As we argued earlier, the phonetic environment – specifically, the salience resulting from word-final positions – may account for the more favorable comprehension of some features. Social factors such as attitudes toward the local community also predict gains in the perception of regional features. This is what Reference NicholasNicholas (2018), for instance, found when assessing American students’ perception of three nasal vowels /ɛ̃, ɑ̃, ɔ̃/ typical of Northern Metropolitan (Parisian) French. Increased contact with the language and more positive attitudes toward French and the local community seemed to drive improvements in perception after a semester-long stay.
Studies on the production of local features also attribute gains, or lack thereof, to social and individual factors such as motivation to learn the language, quantity and quality of interactions in the target language, and attitudes toward the local variety. Schmidt (2020) studied twenty-four American students who stayed in Buenos Aires, Argentina, and examined their production of the pre-palatal [ʃ] and [ʒ] sounds found in River Plate Spanish (Buenos Aires and Uruguay regions) in phonological contexts where other dialects produce the palatal [ʝ]. The author found a moderate yet statistically significant positive correlation between learners’ desire to imitate Argentinian Spanish and their use of [ʃ] and [ʒ] in post-test tasks.
Another example of a regional feature apparently governed by learners’ attitudes is the interdental fricative phoneme /θ/ of Spanish, which is arguably the most widely studied feature in relation to pronunciation learning during study abroad. This phoneme is used exclusively in Spain (consistently in central and northern areas and variably in the south), corresponds to graphemes <z> (e.g., zapato, “shoe”) and <c> followed by <e> or <i> (e.g., cero, “zero”), and forms minimal pairs with /s/, as in /ka.θa/–/ka.sa/ (“hunt”–“house”). The contrast is neutralized in American varieties, which use /s/ in both cases. Every published study on the acquisition of /θ/ during study abroad has reported minimal use of this phoneme after study abroad in Spain (George, 2014; Knouse, 2012; Ringer-Hilfinger, 2012). For example, Knouse (2012) found only 36 productions of /θ/ out of 2,119 possible occurrences (1.7 percent), spread across seven out of fifteen learners who spent six weeks in Salamanca. It is worth noting that all participants in these studies were from the United States and thus might not see the value of learning features of European Spanish because their interactions are more likely to be with speakers of Latin American Spanish. However, even some students with positive attitudes toward this phoneme may never actually produce it, as reported by Ringer-Hilfinger (2012), who collected data on students’ attitudes and motivations. It might be the case, then, that some learners choose not to adopt a phonological contrast of a given dialect simply because they can follow the easier option of neutralizing the contrast and still conform to native standards, albeit of a different variety.
Weakening of /s/ in Spanish appears to be another dialectal feature that is not necessarily constrained by attitudes. More than half of native Spanish speakers produce some type of /s/ weakening, the most common being complete deletion or aspiration (that is, realization as [h]) in coda positions (Reference Lipski and Díaz-CamposLipski, 2011). Unlike /θ/, which is produced categorically by all speakers in dialects that have this phoneme, /s/ weakening is a gradient phonetic process with a high degree of variability, even within the same speaker, and often constrained by sociolinguistic variables such as perceived level of formality. Just like the case of /θ/, however, studies have found that learners normally exhibit very low degrees of /s/ weakening after studying abroad in /s/-weakening areas, as illustrated by L1-English learners who spent six weeks in Argentina (Reference SchmidtSchmidt, 2020) or four months in the Dominican Republic (Reference Linford, Harley and BrownLinford, Harley, & Brown, 2021). A parallel process has been described for /l/ deletion in French, which occurs often in subject pronouns (e.g., il, “he”), variably in certain lexical items, and – as in the case of Spanish /s/ weakening – varies greatly by region and register (deleting the /l/ is linked to an informal style). Reference Howard, Lemée and ReganHoward, Lemée, and Regan (2006) and Reference Regan, Howard and LeméeRegan, Howard, and Lemée (2009) reported that Irish learners of French deleted /l/ substantially more after a year in France, yet still considerably less than native speakers from the area.
Together, the cases of Spanish /θ/, /s/ weakening in Spanish, and /l/ deletion in French illustrate how phonological and phonetic acquisition often operates differently in study abroad, yet this distinction is not always clearly drawn in published research. On the one hand, the phonemic status of /θ/ means that learners will be exposed to this sound predictably and consistently from every speaker in the local community and in every expected phonological context. Whether the sound is actually perceived, and subsequently produced, is dependent on variables related to L1–L2 similarities, articulatory ease, and attitudes, as reviewed earlier. On the other hand, it is very hard to predict to what degree, or if at all, the input learners are exposed to will contain instances of /s/ weakening or /l/ deletion. It is entirely plausible, for example, that these phonetic variants rarely appear in the more formal classroom context where students take classes while abroad. To complicate matters more, even when learners do hear these forms, they need to decipher the complex, sociolinguistic rules that govern these phonetic alternations. Hence, expectations of perception – and future production – of regional variants will vary drastically depending on whether the variant is phonological or phonetic in nature. Researchers should design their studies and interpret their findings with this difference in mind.
Most of the scholarship on regional phonetic and phonological features has focused on segments. A noteworthy exception is Trimble (2013), who studied intonation in declarative utterances produced by a group of English-speaking learners who spent a semester in Venezuela and found that, as in the case of segments, learners can acquire regional suprasegmental features even if they are perceived, in relative terms, as highly exceptional.
24.3 Conclusions and Future Directions
Traditionally, study abroad has been viewed as a capstone developmental experience for language learners, especially with respect to pronunciation. Yet, as reviewed in this chapter, findings have not been particularly convincing in this regard. Instead of large and robust effects, most studies point to either no overall effect, considering group means, or substantial variability in the gains that learners achieve after a study abroad experience. Thus, it seems important to reconceptualize study abroad within a longitudinal developmental framework that considers what happens before learners go abroad and what happens after they return to the L2 classroom. Research suggests that articulating study abroad with pronunciation instruction may hold the key to maximizing pronunciation gains. Studies such as Reference LordLord’s (2010) show that formal instruction on phonetic fundamentals can maximize phonetic learning. As numerous scholars have suggested (Reference Mora and Pérez-VidalMora, 2015; Reference Romanelli, Menegotto and SmythRomanelli et al., 2015), the advantage of pronunciation instruction may stem from its ability not only to provide initial input before the experience but, more importantly, to raise learners’ awareness about phonetic features which may otherwise be difficult to notice. This awareness may aid learners in directing their attention to the feature when exposed to native input during study abroad. Importantly, although general language instruction may help learners improve their perception and production of L2 segmentals and suprasegmentals (Reference Mora and Pérez-VidalMora, 2015), targeted pronunciation instruction may be necessary to stimulate development or help learners continue to develop their pronunciation abroad.
The same logic holds after studying abroad, insofar as targeted pronunciation instruction following a study abroad experience could help learners consolidate gains obtained abroad and continue to improve their pronunciation thereafter. There could also be long-term learning as a result of studying abroad even in the absence of instruction. Yet, if researchers adopt a narrow window of focus, using immediate post-tests only, the long-term effects of study abroad on L2 phonetic learning will remain unknown. Given the number of studies highlighting the importance of individual differences in the quantity and quality of L2 use, language attitudes, and so on for pronunciation learning during study abroad, it seems entirely plausible to imagine a scenario in which patterns of L2 use that emerge during study abroad might continue well beyond the study abroad experience (e.g., Reference EngstlerEngstler, 2012). For instance, a learner who goes abroad and begins interacting more frequently in the L2 might seek out additional opportunities for sustained L2 use upon returning to the L1 environment. Thus, it may not be the study abroad experience in and of itself that drives L2 phonetic learning, but rather the potential that experience offers for catalyzing the types of behaviors that are known to have a positive impact on pronunciation development. Thus, we strongly argue that a holistic and complete approach to understanding L2 phonetic and phonological development during study abroad needs to take into account the before, during, and after of study abroad stays (see, e.g., Reference Huensch and Tracy-VenturaHuensch & Tracy-Ventura, 2017; Reference McManus, Mitchell and Tracy-VenturaMcManus, Mitchell, & Tracy-Ventura, 2021). At the same time, we recognize that robust, longitudinal methodologies such as those we propose here are challenging. 
As a result, researchers may wish to consider multisite collaboration and multicohort studies (collecting data from similar learner samples over consecutive years), both of which could make larger-scale longitudinal research more feasible.
We also recommend expanding both the target constructs considered in this area and the means by which they are measured. First, as outlined previously, there has been very little work on study abroad and speech perception, and all work to date has focused on accuracy. However, study abroad could result in more efficient L2 perception, which could be reflected via faster reaction times or better accuracy in suboptimal listening conditions, such as listening in noise. Similarly, researchers have relied on individual and group means to index gains in production accuracy, even though one sign of development could be the destabilization of previous production patterns, as might be evidenced by greater variability after study abroad. Furthermore, if study abroad helps disrupt nontarget-like production patterns, then targeted pronunciation instruction after study abroad could help learners master the correct ones. We therefore suggest that researchers consider production ranges and standard deviations in addition to production means.
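As a rough illustration of why dispersion measures matter alongside means, the sketch below summarizes one learner's productions before and after a stay abroad. It is a minimal example under stated assumptions, not a procedure drawn from the studies reviewed here: the voice onset time values are invented, and the function name `summarize` is hypothetical.

```python
from statistics import mean, stdev


def summarize(values):
    """Return mean, sample standard deviation, and range for one learner's tokens."""
    if len(values) < 2:
        raise ValueError("need at least two tokens to estimate variability")
    return {
        "mean": mean(values),
        "sd": stdev(values),
        "range": max(values) - min(values),
    }


# Hypothetical voice onset time values (ms) for five tokens per session.
# These numbers are invented for illustration only.
pre = [62, 65, 60, 64, 63]    # stable production before study abroad
post = [58, 30, 71, 25, 66]   # more variable production after study abroad

pre_stats = summarize(pre)
post_stats = summarize(post)

for label, stats in (("pre", pre_stats), ("post", post_stats)):
    print(label, {name: round(value, 1) for name, value in stats.items()})
```

In this invented case the post-stay tokens are far more dispersed than the pre-stay tokens, a change that a comparison of means alone would not show. A wider range and larger standard deviation of this kind could be read as a possible sign of destabilization.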
From a methodological point of view, given that the quantity and the quality of L2 input and use play a pivotal role in L2 phonetic learning, we recommend adopting more methodologically robust and longitudinally sensitive measures of individual differences. That is, instead of polling learners on their patterns of language use once during or after the study abroad experience, we suggest it would be more fruitful to track such variables longitudinally. Doing so will provide a more comprehensive and, as a result, more accurate picture of overall language use during study abroad. It also allows for a range of analyses related to how changes in language use predict gains in L2 phonetic and phonological accuracy. Current research, by default, adopts a static view of these relationships, examining whether language use predicts gains in accuracy. However, a more compelling and dynamic research question is whether changes in language use and attitudes toward the host community are related to changes in pronunciation (see, e.g., Reference Tullock, Sanz and Morales-FrontTullock, 2018). Such an analysis rests on testing the degree of correlation between the slope of language use and the slope of pronunciation change. Triangulating measures would also prove beneficial given that most studies to date have relied on self-reports.
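The slope-based analysis described above can be sketched as follows. This is a minimal illustration rather than a method taken from any of the studies cited. It assumes that each learner has repeated measurements of L2 use (e.g., weekly hours) and of a pronunciation accuracy score at the same time points, fits an ordinary least-squares slope to each series, and then computes a Pearson correlation between the two sets of slopes. All function names, the data layout, and the numbers are hypothetical.

```python
from math import sqrt


def ols_slope(times, values):
    """Least-squares slope of values regressed on times."""
    n = len(times)
    if n != len(values) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable has no variance")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def slope_correlation(learners, times):
    """Correlate per-learner slopes of L2 use with slopes of accuracy."""
    use_slopes = [ols_slope(times, learner["l2_use"]) for learner in learners]
    acc_slopes = [ols_slope(times, learner["accuracy"]) for learner in learners]
    return pearson_r(use_slopes, acc_slopes)


# Hypothetical data: four learners measured at weeks 0, 4, 8, and 12.
weeks = [0, 4, 8, 12]
learners = [
    {"l2_use": [5, 8, 12, 15], "accuracy": [0.50, 0.58, 0.66, 0.72]},
    {"l2_use": [6, 6, 7, 7], "accuracy": [0.55, 0.56, 0.57, 0.58]},
    {"l2_use": [4, 9, 13, 18], "accuracy": [0.45, 0.55, 0.65, 0.74]},
    {"l2_use": [10, 9, 8, 8], "accuracy": [0.60, 0.60, 0.59, 0.60]},
]

r = slope_correlation(learners, weeks)
print(f"correlation between slopes: {r:.2f}")
```

With real data, a design of this kind would need enough learners and time points for the slopes to be estimated reliably. A mixed-effects growth model may be a more appropriate choice than this two-step approach.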
Last but not least, we believe that research on phonetic and phonological learning as a result of studying abroad is in a prime position to add to the theoretical discussion on what variables trigger and sustain L2 linguistic development. As we stated already, existing theories of pronunciation learning such as the SLM were conceptualized for acquisition in naturalistic contexts and thus may fall short of explaining the more nuanced and complex learning in a study abroad context, which we take as a particular case of instructed L2 acquisition. In this sense, models and theories of L2 development will need to look at external variables such as amount and quality of input during study abroad, integrate learner-internal variables such as motivation and attitudes toward the host community, and increase the range and timing of data collection points.
25.1 Introduction
Heritage speakers are a type of early bilingual whose home language differs from the societally dominant language. Heritage speakers’ home language (i.e., heritage language) is a minority language that is acquired naturalistically in a bilingual or multilingual environment. Some examples of heritage languages include diasporic languages spoken by children of immigrants, aboriginal or indigenous languages whose linguistic status has been jeopardized by colonizing languages, and historical minority languages that have coexisted with other standard languages (Montrul & Polinsky, 2021; Rothman, 2009). Due to the minority status of heritage languages, their use is generally limited to interactions with family and/or community members. Outside these social domains, heritage speakers mainly use the majority language of the society. While heritage speakers typically grow up becoming more dominant in the majority language than the heritage language (Montrul & Polinsky, 2021), they are highly heterogeneous in many aspects (e.g., amount of heritage language use, proficiency, access to formal education, speech community size), demonstrating variable degrees of knowledge of their heritage language (Montrul & Polinsky, 2021; Polinsky & Kagan, 2007).
Heritage bilinguals provide unique scenarios that are rarely observed in other bilingual populations (e.g., asymmetry between language exposure and use, dominance shift from heritage to majority language, acquisition of diasporic varieties), which helps expand our understanding of bilingual phonetics and phonology. Over the past few decades, research on heritage language phonetics and phonology has grown tremendously. Although heritage speakers’ phonetics and phonology appear to be quite robust compared to other domains of language (Reference Polinsky and ScontrasPolinsky & Scontras, 2020), heritage speakers are often considered to have an accent that is distinct from homeland varieties and some of their speech and perceptual behaviors differ from those of nonheritage native speakers.
This chapter discusses findings in heritage language phonetics and phonology. The remainder of the chapter is organized as follows. In Section 25.2, I review studies examining heritage speakers’ global accent and factors contributing to perceived heritage accent. Then, I present areas of divergence that have been found in the production (Section 25.3) and perception (Section 25.4) of heritage language segments and prosody. Lastly, in Section 25.5, I synthesize the findings, discussing common patterns observed in heritage language phonetics and phonology, and suggest areas for future research.
25.2 Heritage Accent
Research on heritage accent has consistently shown that heritage speakers sound less native-like than nonheritage native speakers and more native-like than late L2 learners (Au et al., 2002; Au et al., 2008; Flores & Rato, 2016; Knightly et al., 2003; Kupisch et al., 2014; Kupisch, Lloyd-Smith, & Stangen, 2020; Lloyd-Smith, Einfeldt, & Kupisch, 2020; Oh et al., 2003). Moreover, heritage speakers’ speech is evaluated with more variable accent ratings and with higher degrees of uncertainty than the speech of homeland speakers and late L2 learners (Chang & Yao, 2016; Flores & Rato, 2016; Kan, 2021; Kupisch et al., 2014, 2020; Lloyd-Smith et al., 2020; Stangen et al., 2015). Nevertheless, it is undeniable that early exposure to heritage language has a positive effect on heritage speakers’ perceived nativeness later in life.
The role of heritage language use has frequently been examined to account for heritage speakers’ variable perceived nativeness (Reference Au, Oh, Knightly, Jun and RomoAu et al., 2008; Reference Flores and RatoFlores & Rato, 2016; Reference Kupisch, Barton and KlaschikKupisch et al., 2014, Reference Kupisch, Lloyd-Smith, Stangen and Bayran2020; Reference Lloyd-Smith, Einfeldt and KupischLloyd-Smith et al., 2020; Reference Oh, Jun, Knightly and AuOh et al., 2003). Kupisch and colleagues found a positive correlation between the composite score of factors associated with heritage language use (e.g., heritage language use at home, quality of heritage language use, time spent in the homeland) and perceived nativeness in heritage Turkish (Reference Kupisch, Lloyd-Smith, Stangen and BayranKupisch et al., 2020) and heritage Italian in Germany (Reference Lloyd-Smith, Einfeldt and KupischLloyd-Smith et al., 2020). Findings in heritage accent research converge in that heritage speakers’ childhood language experience is key to predicting their accent later in life. Reference Au, Oh, Knightly, Jun and RomoAu et al. (2008) and Reference Oh, Jun, Knightly and AuOh et al. (2003) demonstrated that, apart from being exposed to the heritage language early on, there is an additional benefit of regular heritage language production during childhood. According to Reference Oh, Jun, Knightly and AuOh et al. (2003), it is the quality of spoken heritage language, not just the quantity of it, that predicts perceived nativeness in the heritage language later in life. Indeed, having the experience of living in the homeland during childhood and/or adolescence is found to be a more powerful predictor of heritage speakers’ perceived nativeness than remigrating to the homeland as adults (Reference Flores and RatoFlores & Rato, 2016; Reference Kupisch, Barton and KlaschikKupisch et al., 2014). 
In other words, immersion in the homeland variety is beneficial for heritage speakers’ perceived nativeness, but its impact is much stronger if it is done at an earlier age.
To examine the role of individual phonetic features in the perception of heritage accent, Lein, Kupisch, and van de Weijer (2016) tested the correlation between global accent and the voice onset time (VOT) of voiceless stops in German and French produced by French heritage speakers in Germany and German heritage speakers in France. They did not find any systematic relationship between VOT and perceived nativeness. Similar patterns have been attested in heritage Spanish (Au et al., 2002) and heritage Korean in the USA (Oh et al., 2003). That is, listeners may be attending to features other than VOT when evaluating heritage speakers’ speech.
Focusing on prosody, Kan (2020) examined perceived nativeness in the Cantonese of heritage speakers in the USA. She compared various suprasegmental features (e.g., tonal space and distance, speech rate, pausing behavior) in the speech of child heritage speakers with the highest native accent scores and those with the lowest scores. Results showed that children with low native accent scores produced smaller tonal space and distance between level tones, spoke more slowly, and produced more frequent and longer pauses, which suggests that these suprasegmental variables are associated with perceived heritage accent in Cantonese. Using low-pass filtered passages, Shin (2005) found that heritage speakers are identified as Hankwukin (“Korean”) with higher accuracy when they are presented next to late L2 learners than when they are presented alone. These findings suggest that raters easily distinguish heritage speakers from late learners, but have more difficulty judging the nativeness of heritage speakers’ speech. To the best of my knowledge, Wrembel et al. (2019) is the only study that examined the effects of both segmental and suprasegmental properties on heritage accent. The authors created a diagnostic list of atypical speech patterns of Polish vowels, consonants, and prosody produced by Polish heritage speakers in the UK. Among these categories, only prosody (i.e., incorrect number of syllables, incorrect stress pattern) significantly predicted heritage speakers’ accent in Polish. These findings suggest that prosody has an important role in perceived heritage accent.
25.3 Speech Production
25.3.1 Segments
Heritage accent is perceived when some aspects of heritage speakers’ speech diverge from the homeland variety. Thus, identifying areas of divergence would help characterize heritage accent. In the production of heritage language segments, a great deal of research has examined the phonetic realization of individual phonemes. In some cases, heritage language segments approach perceptually similar sounds in the majority language, as attested in the following examples: (1) front vowel lowering/retraction in Cantonese (Reference TseTse, 2019), Armenian (Reference GodsonGodson, 2004), Farsi (Reference SheikhbahaieSheikhbahaie, 2020), and Spanish (Reference Grijalva, Piccinini and ArvanitiGrijalva, Piccinini, & Arvaniti, 2013) in English-speaking countries; (2) back vowel fronting in Spanish (Reference Grijalva, Piccinini and ArvanitiGrijalva et al., 2013) and Mandarin (Reference Chang, Yao, Haynes and RhodesChang et al., 2011) in the USA; (3) VOT-shortening in voiceless stops and prevoicing in voiced stops in German in the Netherlands (Reference Stoehr, Benders, Van Hell and FikkertStoehr et al., 2018); (4) VOT-lengthening in voiceless stops in Japanese (Reference Harada, Solé, Recasens and RomeroHarada, 2003), Russian, and Ukrainian (Reference NagyNagy, 2015; Reference Hrycyna, Lapinskaya, Kochetov and NagyHrycyna et al., 2011) in English-speaking countries; and (5) approximantization of tap/flap in Norwegian (Reference NatvigNatvig, 2022) and Spanish (Reference Kim and Repiso-PuigdelliuraKim & Repiso-Puigdelliura, 2020) in the USA. Despite extensive evidence of assimilation to the majority language, it is important to note that heritage language segments are rarely replaced with majority language segments (but compare with Reference Trovato, Lopes, de Avelar and CyrinoTrovato [2017] for evidence of English-like labiodental production of /b/ that is orthographically represented as <v> in heritage Spanish in the USA).
Contrary to assimilation, heritage speakers sometimes produce heritage language categories even further away from those of the majority language than nonheritage native speakers would, in order to enhance cross-linguistic phonetic distinction (Quechua: Reference GuionGuion, 2003; Spanish: Reference Cummings Ruiz, Calhoun, Escudero, Tabain and WarrenCummings Ruiz, 2019; Reference KimKim, 2011). Heritage speakers also show variability in the production of articulatorily complex sounds (Dutch: Reference SimonSimon, 2010; Italian: Reference MacKay, Flege, Piske and SchirruMacKay et al., 2001; Spanish: Reference KimKim, 2011; Reference Repiso-Puigdelliura and KimRepiso-Puigdelliura & Kim, 2021) and produce dialectal variants that are present in their input (Reference Mazzaro and González de AndaMazzaro & González de Anda, 2020; Reference O’Rourke and PotowskiO’Rourke & Potowski, 2016).
Variability in heritage language segments is conditioned by multiple between-subjects factors, such as age of exposure to the majority language relative to the heritage language (Reference AmengualAmengual, 2019), heritage language proficiency (Reference Muxika LoitzateMuxika Loitzate, 2021; Reference SheaShea, 2019), heritage language exposure and/or use (Reference Chang, Yao, Haynes and RhodesChang et al., 2011; Reference Kim and Repiso-PuigdelliuraKim & Repiso-Puigdelliura, 2020; Reference SheaShea, 2019), speech community size (Reference MorrisMorris, 2021), and ethnic orientation (Reference Hrycyna, Lapinskaya, Kochetov and NagyHrycyna et al., 2011), as well as within-subjects factors, such as orthography (Reference Repiso-Puigdelliura, Benvenuti and KimRepiso-Puigdelliura, Benvenuti, & Kim, 2021) and cognate status (Reference AmengualAmengual, 2012; Reference Muxika LoitzateMuxika Loitzate, 2021).
With regard to the production of phonologically contrastive segments, heritage speakers are able to use various phonetic cues (e.g., duration, spectral properties) to keep apart contrasts in heritage language vowels (Arabic: Reference SaadahSaadah, 2011; Mandarin: Reference Chang, Haynes, Rhodes and YaoChang et al., 2009, Reference Chang, Yao, Haynes and Rhodes2011; Reference Yang, Fox and JacewiczYang, Fox, & Jacewicz, 2015; Japanese: Reference Nagano, Sperbeck, Mizoguchi and ChoiNagano et al., 2018) and consonants (Arabic: Reference Alkhudidi, Stevenson and RafatAlkhudidi, Stevenson, & Rafat, 2020; Farsi: Reference Rafat, Mohaghegh and SevensonRafat, Mohaghegh, & Sevenson, 2017; Italian: Reference Einfeldt, van de Weijer and KupischEinfeldt, van de Weijer, & Kupisch, 2019; Japanese: Reference Nagano, Sperbeck, Mizoguchi and ChoiNagano et al., 2018; Mandarin: Reference Chang, Haynes, Rhodes and YaoChang et al., 2009, Reference Chang, Yao, Haynes and Rhodes2011; Spanish: Reference AmengualAmengual, 2016; Reference HenriksenHenriksen, 2015; Reference KisslingKissling, 2018), even when the majority language does not exhibit parallel contrasts.
Among various phonemic contrasts, a great deal of research has examined whether stop contrasts are maintained in heritage language settings. Heritage speakers of languages with two stop laryngeal categories (e.g., voiced-voiceless, unaspirated-aspirated) make the distinction primarily by means of VOT, which is the same cue that is used by nonheritage native speakers (Arabic: Reference KhattabKhattab, 2000; Dutch: Reference SimonSimon, 2010; German: Reference Stoehr, Benders, Van Hell and FikkertStoehr et al., 2018; French: Reference Mack and NeldeMack, 1990; Reference Sundara, Polka and BaumSundara, Polka, & Baum, 2006; Mandarin: Reference Chang, Yao, Haynes and RhodesChang et al., 2011; Spanish: Reference KimKim, 2011; Reference Knightly, Jun, Oh and AuKnightly et al., 2003; Reference Magloire and GreenMagloire & Green, 1999; Western Armenian: Reference Kelly and KeshishianKelly & Keshishian, 2021). In situations where the heritage language has a more complex system than the majority language, heritage speakers reliably use VOT and additional cues (e.g., f0 of the following vowel, breathiness) like nonheritage native speakers (Korean: Reference Chang and MandockChang & Mandock, 2019; Reference ChengCheng, 2019; Reference Kang and GuionKang & Guion, 2006; Reference Kang and NagyKang & Nagy, 2016; Sylheti: Reference Mayr and SiddikaMayr & Siddika, 2018). For example, Korean heritage speakers in English-speaking countries successfully differentiate the Korean fortis-lenis-aspirated stop contrast using VOT and f0 of the following vowel (Reference Chang and MandockChang & Mandock, 2019; Reference ChengCheng, 2019; Reference Kang and GuionKang & Guion, 2006; Reference Kang and NagyKang & Nagy, 2016; but compare with Reference Oh, Jun, Knightly and AuOh et al. [2003] in which heritage speakers who did not regularly speak Korean during childhood fail to distinguish tense from plain/aspirated stops). 
Similarly, Reference Mayr and SiddikaMayr and Siddika (2018) found that child and adult heritage speakers of Sylheti in the UK are able to distinguish the Sylheti voiced breathy, voiced nonbreathy, and voiceless unaspirated stops by means of VOT and breathiness, although some heritage speakers, especially third-generation children, do not reliably use breathiness to distinguish voiced breathy stops from nonbreathy stops.
Heritage speakers are also able to keep their two sound systems separate by making both language-internal phonological and cross-linguistic phonetic distinctions. Reference Chang, Haynes, Rhodes and YaoChang et al. (2009, Reference Chang, Yao, Haynes and Rhodes2011) found that Mandarin heritage speakers in the USA are able not only to distinguish the contrasts between front and back high rounded vowels, unaspirated and aspirated stops, and retroflex and alveolo-palatal fricatives in Mandarin but also to differentiate them from the closest English sounds (i.e., back rounded vowel, voiced and voiceless stops, and palato-alveolar fricatives, respectively). Cross-linguistic phonetic distinctions between heritage and majority language phones have been well-attested in various sound pairs, such as unaspirated versus aspirated voiceless stops (Arabic: Reference KhattabKhattab, 2000; Dutch: Reference SimonSimon, 2010; French: Reference Sundara, Polka and BaumSundara et al., 2006; Reference Lein, Kupisch and van de WeijerLein et al., 2016; Reference Mack and NeldeMack, 1990; Japanese: Reference Harada, Solé, Recasens and RomeroHarada, 2003; Spanish: Reference KimKim, 2011; Reference Knightly, Jun, Oh and AuKnightly et al., 2003; Reference Magloire and GreenMagloire & Green, 1999; Reference Muxika LoitzateMuxika Loitzate, 2021; Sylheti: Reference Mayr and SiddikaMayr & Siddika, 2018), more versus less aspirated voiceless stops (Korean: Reference Kang and GuionKang & Guion, 2006), light [l] versus dark [ɫ] (Spanish: Reference AmengualAmengual, 2018; Reference Barlow, Branson and NipBarlow, Branson, & Nip, 2013), and less versus more fronted back vowels (Korean: Reference ChengCheng, 2021; Mandarin: Reference Yang, Fox and JacewiczYang et al., 2015; Spanish: Reference Cummings Ruiz, Calhoun, Escudero, Tabain and WarrenCummings Ruiz, 2019; Reference Grijalva, Piccinini and ArvanitiGrijalva et al., 2013).
Although heritage languages display robust phonemic contrasts, they often diverge from nonheritage native varieties in the phonetic implementation of the contrasts. Heritage speakers tend to produce phonemic contrasts with smaller phonetic distance than nonheritage native varieties, as demonstrated in the short-long vowel contrast in heritage Arabic in the USA (Saadah, 2011) and in the singleton-geminate contrast in heritage Arabic (Alkhudidi et al., 2020) and heritage Farsi in Canada (Rafat et al., 2017, but compare with Einfeldt et al. [2019], which demonstrates a lack of difference from nonheritage native speakers in the production of singleton-geminate contrasts in heritage Italian in Germany).
In some cases, heritage speakers produce one of the phonemes differently from nonheritage native speakers, while maintaining the contrast. For instance, heritage speakers of voicing languages often produce voiced stops with short-lag VOTs as they would in the majority language, instead of with prevoicing (Arabic: Khattab, 2000; Dutch: Simon, 2010; Italian: MacKay et al., 2001; Spanish: Kim, 2011; Sylheti: Mayr & Siddika, 2018; compare with Magloire and Green [1999] and Sundara et al. [2006] for evidence of systematic cross-linguistic distinction between prevoiced and unaspirated voiced stops). Mayr and Siddika (2018) attribute this to aerodynamic challenges in prevoicing, which is accomplished through complex articulatory maneuvers (e.g., tongue root advancement, modification of linguopalatal contact, loosening of the velopharyngeal port) (Cho, Whalen, & Docherty, 2019). Like prevoicing, the Spanish trill is an articulatorily complex sound, and studies have shown that Spanish heritage speakers in the USA differentiate the trill from the tap primarily through segmental duration, rather than the number of apico-alveolar occlusions (Amengual, 2016; Cummings Ruiz & Montrul, 2020; Henriksen, 2015; Kissling, 2018). According to Repiso-Puigdelliura and Kim (2021), the development of the heritage Spanish trill occurs in the order of single occlusion → frication → multiple occlusions, without abandoning the variants of earlier stages. This leads to increased variability in heritage Spanish trill productions, which aligns with Kupisch’s (2020) argument that heritage speakers exploit language-inherent variation to avoid markedness.
Heritage speakers also maintain subphonemic contrasts by applying heritage language phonological rules, which has been attested in vowel pharyngealization after an emphatic element in heritage Arabic in the USA (Saadah, 2011), /o/-fronting and raising in unstressed condition in heritage Russian in Israel (Asherov, Fishman, & Cohen, 2016), /ø/-lowering and retraction before /r/ (Strandberg, Gooskens, & Schüppert, 2021) and vowel length distinction before consonants in heritage Swedish in Finland (Helgason, Ringen, & Suomi, 2013), and coda /s/ voicing assimilation (Boomershine & Stevens, 2021) and voiced stop lenition in heritage Spanish in the USA (Amengual, 2019; Au et al., 2002; Blair & Lease, 2021; Knightly et al., 2003; Rao, 2014, 2015; Rao et al., 2020; Ronquest et al., 2020).
Perhaps the most studied heritage language phonological rule is stop lenition in heritage Spanish in the USA (Amengual, 2019; Au et al., 2002; Blair & Lease, 2021; Knightly et al., 2003; Rao, 2014, 2015; Rao et al., 2020; Ronquest et al., 2020). In Spanish, syllable-initial voiced stops have two allophonic variants (i.e., stops and approximants) in complementary distribution, whose difference mainly lies in the degree of lenition; approximants are more lenited than stops. Studies have shown that heritage speakers consistently apply the Spanish stop lenition rule in the appropriate context (Amengual, 2019; Au et al., 2002; Blair & Lease, 2021; Knightly et al., 2003; Rao, 2014, 2015; Ronquest et al., 2020). While heritage speakers with more English experience display less Spanish lenition (Amengual, 2019; Blair & Lease, 2021; Rao, 2014, 2015; Rao et al., 2020, but compare with Ronquest et al. [2020] in which the heritage speakers are comparable to long-term immigrants), they seldom replace the approximant variant with the stop variant (Au et al., 2002; Blair & Lease, 2021; Knightly et al., 2003; Rao, 2014, 2015), unless they are fourth-generation speakers with little exposure to Spanish growing up (Blair & Lease, 2021). In other words, despite reduced distance between allophonic variants conditioned by multiple factors (e.g., speech style, language experience), a complete loss of allophony is rare for heritage speakers who were regularly exposed to the heritage language during childhood (compare with Asherov et al. [2016] for evidence of successful application of /o/-raising in heritage Russian in Israel in real words, but not in nonce words, which suggests rote-learning or lexicalization, rather than a robust phonological process).
Contrary to these findings, heritage speakers sometimes display enhanced allophonic distinctions. For instance, Łyskawa et al. (2016) found that Polish heritage speakers in Canada, especially those who code-switch more, devoice word-final obstruents more than long-term immigrants from Poland and homeland speakers. Moreover, the highest devoicing rates are found in contexts where devoicing is favored in both Polish and English (i.e., before pauses and voiceless obstruents). As code-switching activates heritage speakers’ two languages, Łyskawa et al. (2016) argued that it provides a context in which convergence occurs between the mechanisms of the two languages.
For phonological rules that are present in the majority language but not in the heritage language, findings suggest that these rules are unlikely to be transposed into the heritage language (Amengual, 2018; Barlow et al., 2013; Kirkham & McCarthy, 2021). Heritage speakers of Spanish in the USA (Amengual, 2018; Barlow et al., 2013) and Sylheti in the UK (Kirkham & McCarthy, 2021) do not velarize coda /l/ in their heritage language as they do in English, although they may demonstrate variability in the phonetic distance between onset and coda /l/ (Amengual, 2018; Barlow et al., 2013). A great deal of research on the influence from majority language phonological rules has been conducted on stress-induced vowel quality variation in heritage Spanish in the USA (Alvord & Rogers, 2014; Elias, McKinnon, & Milla-Muñoz, 2017; Ready, 2020; Ronquest, 2013; Solon, Knarvik, & DeClerck, 2019). These studies demonstrated that heritage speakers reduce unstressed vowels, exhibiting a smaller vowel space compared to stressed vowels (Alvord & Rogers, 2014; Elias et al., 2017; Ready, 2020; Ronquest, 2013; Solon et al., 2019, but compare with Kim [2015], which demonstrates a lack of systematic vowel quality variation in stress minimal pairs). The extent of vowel reduction in heritage Spanish varies by multiple factors, such as language mode (Elias et al., 2017) and cognate status (Ready, 2020), which suggests that more English experience enhances this process. However, in the extent of vowel reduction, Spanish heritage speakers are comparable to long-term immigrants and/or Spanish monolinguals (Alvord & Rogers, 2014; Ready, 2020; Solon et al., 2019). Moreover, given that heritage speakers do not categorically change the vowel quality to the extent of English-like vowel centralization (Alvord & Rogers, 2014; Willis, 2005), English influence, if any, is likely to occur in a gradient fashion.
Independently of the majority language, heritage speakers may show divergent patterns from the homeland varieties due to contact with other heritage language dialects (Williams & Kerswill, 1999). When heritage speakers’ dialect spoken at home differs from the majority heritage language dialect of the community or the dialect of social prestige, they tend to accommodate toward that dialect. For instance, heritage speakers of Caribbean Spanish and Salvadoran Spanish in the USA exhibit positional allophony specific to their home dialect (e.g., coda /s/ reduction, final /n/ velarization) less frequently than homeland speakers or first-generation speakers, particularly recent arrivals (Aaron & Hernández, 2007; Erker & Reffel, 2021; Hernández, 2011; O’Rourke & Potowski, 2016). Interestingly, dialect-specific allophony in heritage languages is observed more in in-group exchanges than when the interlocutors speak a different dialect (Hernández, 2011; O’Rourke & Potowski, 2016). In other words, heritage speakers are able to adjust their speech style depending on the context, which indicates that, despite a potential leveling in the long run as a consequence of contact with other heritage language dialects, dialect-specific phonological patterns persist in heritage grammars (Erker & Reffel, 2021; Hernández, 2011; O’Rourke & Potowski, 2016).
25.3.2 Prosody
Prosody, that is, the rhythm, tone, stress, and intonation of speech, has not received as much attention in heritage language research as segments. Research on heritage language rhythm has explored the variability in the duration of vocalic and consonantal intervals of heritage bilinguals who speak two languages that belong to different rhythmic classes (Carter, 2005; Carter & Wolford, 2016; Robles-Puente, 2019a; Ronquest et al., 2020; Yakel, 2018). According to Grabe and Low (2002), so-called stress-timed languages (e.g., English, Dutch, German) tend to exhibit greater durational variability than syllable-timed (e.g., Spanish, French) and mora-timed languages (e.g., Japanese). Spanish heritage speakers in the USA tend to demonstrate larger differences between the duration of successive pairs of vocalic/consonantal intervals (i.e., greater durational variability) than Spanish monolinguals (Carter, 2005; Carter & Wolford, 2016; Robles-Puente, 2019a; Ronquest et al., 2020; Yakel, 2018), although there is evidence that more monolingual-like patterns are observed among heritage speakers of older immigrant generations (Carter & Wolford, 2016), heritage speakers who are younger in age (Robles-Puente, 2019a), and heritage speakers who are more proficient in Spanish (Yakel, 2018). That is, despite some signs of influence of English on Spanish, heritage speakers are generally able to keep the rhythms of their two languages distinct.
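One common way to quantify the durational variability described above is the Pairwise Variability Index associated with Grabe and Low (2002). The following is a minimal sketch, not the procedure of any study cited here: it assumes that interval durations have already been measured, and the function names and example durations are hypothetical. The normalized index is typically applied to vocalic intervals and the raw index to consonantal intervals.

```python
def raw_pvi(durations):
    """Raw PVI: mean absolute difference between successive interval durations."""
    if len(durations) < 2:
        raise ValueError("at least two intervals are required")
    diffs = [abs(a - b) for a, b in zip(durations, durations[1:])]
    return sum(diffs) / len(diffs)


def normalized_pvi(durations):
    """Normalized PVI: each successive difference is divided by the mean of
    the pair (reducing speech-rate effects), averaged, and scaled by 100."""
    if len(durations) < 2:
        raise ValueError("at least two intervals are required")
    if any(d <= 0 for d in durations):
        raise ValueError("durations must be positive")
    terms = [abs(a - b) / ((a + b) / 2) for a, b in zip(durations, durations[1:])]
    return 100 * sum(terms) / len(terms)


# Hypothetical vocalic interval durations in milliseconds (invented for
# illustration; not data from any cited study).
even_intervals = [80, 85, 78, 82]      # similar successive durations
uneven_intervals = [60, 140, 55, 150]  # alternating short and long intervals

# Greater alternation between successive intervals yields a higher index.
print(normalized_pvi(even_intervals) < normalized_pvi(uneven_intervals))  # True
```

On this measure, a speaker whose successive vocalic intervals alternate more strongly in duration receives a higher score, which is the sense in which heritage speakers are said to show greater durational variability than monolinguals.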
Similarly, heritage speakers of tone languages maintain tonal contrasts in the heritage language, even when tone is not phonologically contrastive in the majority language (Chang & Yao, 2016; Kan, 2020; Lan & Mok, 2020; So, 2000). For instance, Cantonese heritage speakers in Canada display six Cantonese tones with similar pitch contours to those of homeland speakers. However, they produce them with reduced tonal distinctions (Kan, 2020; Lan & Mok, 2020; So, 2000). Reduced tonal space is associated with lower perceived native-likeness (Kan, 2020) and it is more evident in contour tones, particularly the low–high rising tone pair, than in level tones (Lan & Mok, 2020; So, 2000). Chang and Yao (2016) found similar results in heritage Mandarin. They showed that, like long-term immigrants from Mainland China or Taiwan, Mandarin heritage speakers in the USA distinctly produce the four Mandarin tones, but they show high variability in their turning points in the low falling-rising tone. Heritage speakers’ production of this tone exhibits more monolingual-like patterns in multisyllabic than in monosyllabic items and in nonfinal than in final positions. This finding indicates that heritage speakers perform better in connected speech than in isolated and clear speech, which occurs more in formal registers (Chang & Yao, 2016).
With regard to lexical stress, research on heritage Spanish has demonstrated that, like nonheritage native speakers, Spanish heritage speakers in the USA are sensitive to syllable weight when assigning stress in Spanish (Shelton & Grant, 2018) and primarily use duration to mark stressed syllables (Elias et al., 2017; Kim, 2015, 2020; Rodríguez, 2021; Ronquest, 2013; Solon et al., 2019). However, some heritage speakers do not further demonstrate an effect of the type of syllable structure (i.e., heavy syllable with rising versus falling diphthongs) as Spanish monolinguals do (Shelton & Grant, 2018) or sometimes show a large overlap in duration and other suprasegmental cues (e.g., pitch, intensity) between stress contrasts (Kim, 2015, 2020; but compare with Rodríguez [2021], which shows a clear duration-based distinction between paroxytones and oxytones by heritage speakers with more Spanish experience growing up).
For prosody above the word level, heritage speakers use intonation patterns of their heritage language to distinguish different sentence types (Alvord, 2010a, 2010b; Dehé, 2018; Queen, 2001; Rao, 2016; Zuban, Rathcke, & Zerbian, 2020). Queen (2001) showed that Turkish heritage speakers in Germany categorically mark yes-no questions in Turkish by aligning the bitonal falling pitch accent with the syllable preceding the question particle -mi, which is realized as a clear spike toward the upper limit of the speaker’s pitch range, followed by an immediate fall. Zuban et al. (2020) also demonstrated that Russian heritage speakers in Germany and in the USA exhibit language-specific patterns when producing yes-no questions in Russian, such as the bitonal rising nuclear pitch accent on the verb, but they tend to do so with a higher pitch range than Russian monolinguals. Moreover, while the heritage speakers align with the monolinguals in their use of the low boundary tone when producing longer phrases (i.e., subject–verb–object condition), they exhibit different patterns depending on the majority language when producing shorter phrases (i.e., subject–verb condition); the heritage speakers in Germany mainly use the truncated high boundary tone, as expected in both Russian and German, whereas the heritage speakers in the USA mainly use the low boundary tone, possibly due to compression of pitch movement, typically found in English (Zuban et al., 2020). These findings suggest that heritage speakers mark yes-no questions using the same intonation patterns as homeland speakers, but sometimes show divergence in the phonetic implementation of those patterns.
There are also cases in which heritage speakers display a mix of the intonation patterns of their two languages. Dehé (2018) found that Icelandic heritage speakers in Canada not only use falling contours when producing yes-no questions, as expected in Icelandic, but also use English-like rising contours. Heritage speakers of Cuban Spanish (Alvord, 2010a, 2010b) and Mexican Spanish in the USA (Rao, 2016) employ dialect-specific patterns to distinguish yes-no questions from statements, using heightened pitch range and bitonal rising boundary tone, respectively. However, they additionally exhibit patterns not attested in the homeland varieties, possibly due to cross-linguistic/dialectal transfer (Alvord, 2010b) or emergence of innovative configurations resulting from fusion of heritage speakers’ prosodic systems (Rao, 2016).
Mixed patterns are also observed in statements. In the production of high rising terminals in declarative sentences, Spanish heritage speakers in the USA (Kim & Repiso-Puigdelliura, 2021) and Turkish heritage speakers in Germany (Queen, 2001, 2012) demonstrate intonation contours reminiscent of the ones in their two languages and show divergence from homeland speakers in the relative frequency and the phonetic realization of the contours. Likewise, French heritage speakers (Bullock, 2009) and Spanish heritage speakers in the USA (Kim, 2019) prosodically mark focus through prominence in situ with a prosodic boundary, as homeland speakers do, but also produce focus in situ without intonational phrasing, which is mainly observed in English.
With regard to pitch accents in declarative sentences, studies have shown that Spanish heritage speakers in the USA produce English-like high-level prenuclear pitch accents (Colantoni, Cuza, & Mazzaro, 2016; Robles-Puente, 2019b), although in the majority of the cases they produce the rising prenuclear pitch accent with the f0 peak displaced to a post-tonic syllable, which is the main Spanish pitch accent type in this prosodic context (Alvord, 2010a; Colantoni et al., 2016; Kim, 2020; Robles-Puente, 2019b; Zárate-Sández, 2015). Moreover, Spanish heritage speakers tend to display earlier peak alignment when compared to Spanish monolinguals (Kim, 2020; Zárate-Sández, 2015), while they are comparable to long-term immigrants (Alvord, 2010a; Colantoni et al., 2016) only in spontaneous speech. Similarly, Mennen and Chousi (2018) found that Greek heritage speakers in Austria do not produce early prenuclear rises to the extent that Greek monolinguals do, but their onset of prenuclear rises does not differ from that of long-term immigrants.
With regard to heritage language phonotactics, Repiso-Puigdelliura (2021) showed that Spanish heritage speakers in the USA utilize phonotactic strategies of both Spanish (i.e., resyllabification) and English (i.e., glottal stop insertion) to repair word-external empty onsets (e.g., el#oso “the (male) bear” > [e.ˈlo.so]~[el.ˈʔo.so]). However, while they diverge from homeland speakers, they perform similarly to long-term immigrants.
The findings in heritage language speech production exhibit similar patterns in that phonological contrasts are generally well maintained and divergence from the monolingual norms occurs primarily at the phonetic level. However, heritage speakers sometimes present a mix of both heritage language and majority language categories when producing heritage language prosody and phonotactics, which is rarely observed in segments.
25.4 Speech Perception
25.4.1 Segments
Compared to speech production, heritage language speech perception has received much less attention. The main focus in this area has been on heritage speakers’ ability to perceptually distinguish phonologically contrastive speech sounds. In the perception of segments, a great deal of research has examined heritage language stop contrasts. Studies have demonstrated that Spanish heritage speakers in the USA rely on prevoicing as much as nonheritage native speakers to distinguish the Spanish voiced from voiceless stops (Kim, 2011; Mazzaro, Cuza, & Colantoni, 2016), even though production research has demonstrated variability in the implementation of this cue (Kim, 2011; Magloire & Green, 1999). Similarly, Korean heritage speakers in the USA are comparable to nonheritage native speakers at distinguishing the Korean three-way stop laryngeal contrasts (Cheon & Lee, 2013; Oh et al., 2003; Seo, Dmitrieva, & Cuza, 2022) and at identifying the place of articulation of word-final unreleased stops based on coarticulatory cues (Chang, 2016). Interestingly, Chang (2016) found that, when perceiving unreleased stops in English, heritage speakers are even better than English monolinguals, which suggests that heritage language knowledge can be applied to the majority language, if it is found useful. Successful distinction of heritage Korean stop contrasts has been found for heritage speakers with low Korean use (Oh et al., 2003; Seo et al., 2022) and low proficiency (Chang, 2016; Cheon & Lee, 2013, but compare with Seo et al. [2022] in which heritage speakers’ oral fluency was predictive of their perceptual accuracy), implying a lasting effect of early heritage language exposure on the perception of heritage language phonemic contrasts.
However, that is not to say that all phonemic contrasts are equally stable in heritage grammars. Ahn et al. (2017) demonstrated that Korean heritage speakers in the USA successfully distinguish the Korean /n/-/l/ contrast, but they have greater difficulty in distinguishing tense-lax contrasts (i.e., /t/-/t*/ and /s/-/s*/) than nonheritage native speakers, especially those who immigrated to the USA at an earlier age. In other words, with more experience in the majority language, phonemic contrasts that the majority language does not have (i.e., /t/-/t*/ and /s/-/s*/ contrasts) are more vulnerable to change than contrasts that already exist in the majority language (i.e., /n/-/l/ contrast). Perception of heritage language phonemic contrasts may also be more challenging if they are in a position where the distinction is less acoustically salient, which has been demonstrated in Spanish /e/-/i/ and /o/-/u/ contrasts in unstressed syllables (Mazzaro et al., 2016) and in Russian hard-soft consonant contrasts in word-final position (Lukyanchenko & Gor, 2011). Indeed, Blasingame and Bradlow (2020) found that, compared to nonheritage native speakers, Spanish heritage speakers in the USA have more difficulty recognizing the final word of a sentence if it is presented in a higher level of noise, in less clear speech, and in semantically less predictable contexts. According to Lukyanchenko and Gor (2011), heritage speakers’ attenuated sensitivity may occur as a consequence of two distinct processes: readjustment of heritage language phonological representations with massive exposure to the majority language or underspecified phonological representations.
25.4.2 Prosody
The perception of heritage language prosody has been even more understudied than that of segments. Studies on Cantonese tones have demonstrated that Cantonese heritage speakers distinguish tonal contrasts with lower accuracy than homeland speakers, particularly the low–high rising tone pair (Kan & Schmid, 2019; So, 2000), consistent with the findings of the production studies (Lan & Mok, 2020; So, 2000). Kan and Schmid (2019) attribute this to the quality of heritage speakers’ input, which may exhibit the ongoing merger of this tone pair observed in various Cantonese varieties. With regard to lexical stress, despite variability found in the production of stress correlates (Kim, 2015, 2020), perceptual studies have shown that Spanish heritage speakers in the USA are able to distinguish different stress patterns like nonheritage native speakers (Kim, 2015, 2020; Rodríguez, 2021, but see Ortín [2022] in which heritage speakers, particularly those with more English experience, demonstrate lower perceptual accuracy in conditions with higher cognitive demands). Similarly, Zárate-Sández (2015) found that Spanish heritage speakers categorically perceive the distinction between nonemphatic and emphatic speech styles based on the same cut-off point (i.e., prenuclear peak at stressed syllable offset) as nonheritage native speakers, although divergence in production has been found in prenuclear peak alignment (Colantoni et al. [2016] in read speech; Kim, 2020; Zárate-Sández, 2015).
Perceptual studies have also demonstrated heritage speakers’ knowledge of heritage language phonotactics. Beaton (2020a, 2020b) shows that Spanish heritage speakers in the USA have monolingual-like syllabification intuition when identifying Spanish sequences as diphthongs and hiatuses (compare with Shelton, Counselman, and Palma [2017] for evidence supporting potential transfer from English phonotactic patterns among low proficiency heritage speakers). Carlson et al. (2016) also demonstrate that Spanish heritage speakers in the USA detect an illusory vowel /e/ in word-initial /s/-consonant clusters, which is a vowel that is used in Spanish to repair illicit #sC sequences (e.g., snob > esnob). However, English-dominant speakers demonstrate weaker perceptual repair effects than Spanish monolinguals. Given that #sC sequences are licit in English, Carlson et al. (2016, p. 939) argued that “conflicting phonotactic systems can jointly influence bilinguals’ perceptual repair of the acoustic signal in the more restrictive language.”
The findings in heritage language speech perception show that heritage speakers successfully distinguish categories that are phonologically contrastive, as found in production research. However, some contrasts may be more challenging to distinguish than others, depending on heritage speakers’ language learning experience and task difficulty, suggesting that early exposure to heritage language does not guarantee target-like heritage language speech perception.
25.5 General Discussion and Conclusion
Heritage speakers demonstrate an accent that is distinct from that of homeland speakers to varying degrees. There are numerous phonetic properties that could individually or collectively contribute to heritage accent (e.g., vowels, consonants, intonation, speech rate, frequency and length of pauses, length of utterance). Thus, in order to understand the characteristics of heritage accent, it is important to identify the phonetic features that mark heritage accent more strongly than others. While some studies have demonstrated that heritage speakers’ prosody is responsible for their perceived heritage accent (Kan, 2020; Shin, 2005; Wrembel et al., 2019), the relative importance of segments and prosody has been severely understudied. In L2 foreign accent research, the interplay between segments and prosody has been extensively examined using resynthesized stimuli through prosody transplantation, in which native prosody is transplanted into nonnative segments and nonnative prosody is transplanted into native segments (Boula de Mareüil & Vieru-Dimulescu, 2006; Sereno, Lammers, & Jongman, 2016; Ulbrich & Mennen, 2016). To the best of my knowledge, no study has examined the perception of heritage accent using this method. Future research should implement various techniques to tease apart the role of segments and prosody in perceived heritage accent.
Understanding areas of divergence in heritage language speech sounds would also help characterize heritage accent. Findings in heritage language segments indicate that phonemic contrasts are generally stable in heritage languages; heritage speakers reliably use relevant cues to distinguish phonemic contrasts (Chang & Mandock, 2019; Cheng, 2019; Kang & Guion, 2006; Kang & Nagy, 2016; Mayr & Siddika, 2018) and, in some cases, even display cross-linguistic phonetic distinctions (Amengual, 2018; Barlow et al., 2013; Chang et al., 2009, 2011; Cummings Ruiz, 2019; Khattab, 2000; Sundara et al., 2006; Yang et al., 2015). Heritage speakers also apply the phonological rules of their heritage languages in appropriate contexts (Amengual, 2019; Asherov et al., 2016; Boomershine & Stevens, 2021; Helgason et al., 2013; Saadah, 2011; Strandberg et al., 2021) and maintain dialect-specific allophony of their homeland varieties (Hernández, 2011; O’Rourke & Potowski, 2016).
Divergence from nonheritage native speakers mainly occurs in a gradient fashion with reduced/enhanced distance between phonemic contrasts (Reference Alkhudidi, Stevenson and RafatAlkhudidi et al., 2020; Reference SaadahSaadah, 2011; Reference Rafat, Mohaghegh and SevensonRafat et al., 2017) and allophonic variants (Reference AmengualAmengual, 2018; Reference Barlow, Branson and NipBarlow et al., 2013; Reference Helgason, Ringen and SuomiHelgason et al., 2013; Reference Łyskawa, Maddeaux, Melara and NagyŁyskawa et al., 2016; Reference RaoRao, 2014, Reference Rao2015; Reference Rao, Fuchs, Polinsky and ParraRao et al., 2020), or when producing phonological categories that are articulatorily challenging (Reference Mayr and SiddikaMayr & Siddika, 2018; Reference Repiso-Puigdelliura and KimRepiso-Puigdelliura & Kim, 2021). In other words, despite variability in the phonetic realization of heritage language segments, heritage speakers’ core phonological system, which involves phonemic contrasts and allophony, appears to remain stable (Reference NatvigNatvig, 2022), provided that regular heritage language exposure was available during childhood (compare with Reference Asherov, Fishman and CohenAsherov et al., 2016; Reference Blair and LeaseBlair & Lease, 2021; Reference Oh, Jun, Knightly and AuOh et al., 2003).
As with segments, prosodic contrasts are generally stable in heritage languages, and divergence from nonheritage native speakers is mainly observed in the distance between prosodic categories (Kan, 2020; Lan & Mok, 2020; So, 2000) and in the use of relevant acoustic correlates (Chang & Yao, 2016; Kim, 2015, 2020; Mennen & Chousi, 2018; Zárate-Sández, 2015). However, apart from the phonetic realization of prosodic categories, heritage speakers also diverge from nonheritage native speakers by mixing in majority language categories when producing questions (Alvord, 2010a, 2010b; Dehé, 2018; Queen, 2001; Rao, 2016; Zuban et al., 2020) and statements (Colantoni et al., 2016; Kim & Repiso-Puigdelliura, 2021; Queen, 2001, 2012; Robles-Puente, 2019b), and by utilizing the strategies of their two languages for discourse functions (Bullock, 2009; Kim, 2019). Such a mixture of cross-linguistic categories has rarely been observed in heritage language segments. Prosody plays a central role in human communication, given that it is present in every spoken utterance. Thus, heritage speakers’ flexible use of intonation patterns indicates that they extract resources from their two language systems to expand their communicative means (Bullock, 2009; Queen, 2012).
While relatively little research has been conducted in heritage language prosody, compared to segments, it appears that heritage speakers’ prosody exhibits more variability than their segments. However, variability, or divergence from homeland varieties, in a single area does not always lead to perceived heritage accent. In order to understand the characteristics of heritage accent, it is important to employ both bottom-up and top-down approaches. That is, phonetic features contributing to heritage accent can be narrowed down by first identifying areas of divergence and then examining whether divergence in those areas leads to perceived heritage accent (i.e., bottom-up approach), but it can also be done by conducting holistic judgment on resynthesized speech that is designed to tease apart the effects of individual phonetic features (i.e., top-down approach).
In most heritage accent research, heritage speakers’ speech is evaluated by homeland speakers (Flores & Rato, 2016; Kupisch et al., 2014, 2020; Lloyd-Smith et al., 2020; Stangen et al., 2015). With homeland speakers as raters, it is possible to examine whether heritage speakers have an accent that is distinct from homeland varieties. However, having a distinct accent is not equivalent to having a nonnative accent. Production studies that include both homeland speakers and long-term immigrants as baselines have demonstrated that heritage speakers tend to behave more similarly to long-term immigrants than to homeland speakers (Alvord, 2010a; Colantoni et al., 2016; Mennen & Chousi, 2018; Repiso-Puigdelliura, 2021; Ronquest et al., 2020), which suggests that heritage speakers may have acquired a linguistic variety that is distinct from homeland varieties. Thus, beyond the existence of heritage accent, future research should examine the nativeness of heritage accent by including members of heritage speakers’ speech community as raters.
In heritage language research, speech perception is severely understudied, compared to speech production. Given that most research focuses on areas in which heritage speakers diverge from homeland speakers, rather than areas in which they perform similarly (Reference Polinsky and ScontrasPolinsky & Scontras, 2020), the lack of perceptual research may be an indirect sign of better performance in this speech modality. Studies that examined both heritage speakers’ speech and perceptual behaviors (Reference Kim, Willis, Butragueño and ZendejasKim, 2015, Reference Kim2020; Reference Oh, Jun, Knightly and AuOh et al., 2003; Reference Zárate-SándezZárate-Sández, 2015) showed that heritage speakers are closer to the nonheritage native baseline in their perception than in their production of heritage language speech sounds. Target-like speech motor control, especially that which involves complex coordination of multiple articulators, requires extensive amounts of practice. Thus, in heritage language settings where varying degrees of receptive use of heritage languages are found (Reference Hurtado and VegaHurtado & Vega, 2004; Reference PotowskiPotowski, 2004), more variability is likely to occur in the production than in the perception of speech sounds. Nevertheless, it is important to note that perceptual knowledge in heritage languages is also prone to change. Heritage speakers, especially those with limited heritage language experience, tend to have more difficulty in distinguishing phonological contrasts that are less acoustically salient (Reference Kan and SchmidKan & Schmid, 2019; Reference Lukyanchenko, Gor, Danis, Mesh and SungLukyanchenko & Gor, 2011; Reference Mazzaro, Cuza, Colantoni, Tortora, den Dikken, Montoya and O’NeillMazzaro et al., 2016; Reference SoSo, 2000) and more cognitively demanding (Reference Ahn, Chang, DeKeyser and Lee-EllisAhn et al., 2017; Reference OrtínOrtín, 2022). 
These findings imply that heritage speakers’ phonological representations may not be as robust as those of speakers who grew up monolingually in the heritage language (Reference Lukyanchenko, Gor, Danis, Mesh and SungLukyanchenko & Gor, 2011). To better understand the relationship between heritage language speech perception and production, future research should examine heritage speakers’ knowledge of speech sounds of various complexities in both modalities.
Among multiple factors that contribute to heritage language outcomes, heritage language use during childhood has been proposed as the main predictor of heritage speakers’ linguistic behaviors later in life (Au et al., 2008; Flores & Rato, 2016; Kupisch et al., 2014; Oh et al., 2003; Stangen et al., 2015). Unlike those of adults, children’s phonological grammars are still developing, and inhibition of cross-linguistic influence may be more taxing for children than for adults (Lleó, 2018; Repiso-Puigdelliura, 2021). Compared to those of adult heritage speakers, child heritage speakers’ speech and perceptual behaviors are less understood (Kan, 2020; Khattab, 2000; Kirkham & McCarthy, 2021; Lleó, 2018; Mack, 1990; Mayr & Siddika, 2018; Repiso-Puigdelliura, 2021; Rodríguez, 2021; Simon, 2010; Wrembel et al., 2019; Yang et al., 2015), and there are few studies that directly compare children with adults (Mayr & Siddika, 2018; Repiso-Puigdelliura, 2021; Rodríguez, 2021). Thus, future research should examine the development of heritage language speech sounds through longitudinal research and by comparing heritage speakers of different age groups.
In this chapter, I reviewed studies on various heritage languages and suggested areas for future research in heritage language phonetics and phonology. I would like to conclude by emphasizing the importance of examining heritage speakers’ other language(s). Heritage speakers have also been found to diverge from monolingual speakers of the majority language (Carter, López Valdez, & Sims, 2020; Hall-Lew, 2009; Santa Ana & Bayley, 2008), in some cases using these divergent patterns as an additional linguistic resource to mark social meanings, such as affiliation with their ethnic group (Cheshire et al., 2011). Examining heritage speakers’ two languages will help us understand the role of cross-linguistic influence in heritage speakers’ speech and perceptual patterns.
26.1 Introduction
Although various definitions exist, for the purposes of this chapter we understand indigenous languages as predominantly minority languages spoken by linguistically distinct and often vulnerable ethnic groups, autochthonous to a specific region of the world, and found in diglossias with majority international languages resulting from colonization. These can range from relatively well-known indigenous languages, such as Quechua, Basque, or Hawaiian, to lesser-known languages that might be severely endangered and exclusively oral. In this sense, indigenous language bilingualism, as defined here, is usually small-scale and involves speaking at least one minority indigenous language and at least one majority international language, thus being a step towards a seemingly inevitable language shift and in some cases an eventual indigenous language disappearance. As such, languages such as Hindi in India that are also indigenous to the area of the world in which they are primarily spoken are not considered in this chapter as they maintain large numbers of native and monolingual speakers and political power. We acknowledge that many of the effects described in this chapter may also be found among other types of bilingual populations, for example, heritage speakers, non-indigenous regional/minority languages, and so on, and that there may be some overlap among these populations. Nonetheless, a loss or a change of a feature in one of the indigenous languages may result in said feature being lost or changed among every single speaker of that language. Thus, the dynamic and asymmetrical character of indigenous bilingualism stemming from colonization, along with the vast number of language combinations and the speaker community size differences between the members of these language pairs, sets it apart from other types of bilingualism considered in this book. 
As such, the aim of this chapter is not necessarily theoretical but to highlight these bilingual speakers that tend to be socially and scientifically marginalized.
According to Ethnologue (Reference Eberhard, Simons and FennigEberhard, Simons, & Fennig, 2024), there are still more than 7,000 languages in the world. Out of these, more than 2,300 are spoken in Asia (32 percent of the total), 2,100 in Africa (30 percent), 1,300 in the Pacific (18 percent), and about 1,000 in the Americas (15 percent). Although there are fewer than 300 European languages (4 percent), more than 1.7 billion people speak them around the world, out of which 1.5 billion are speakers of English. The sharp contrast between the number of European languages and the number of their speakers worldwide can be attributed, at least in part, to the colonization of numerous world territories, mostly by European countries, which happened within modern history. As a result, the languages of the colonizers became the majority languages of the colonized societies, whereas the local languages turned into minority languages, oftentimes accompanied by rapid decline and disappearance. In these settings, grammatically simplified but phonologically distinct pidgins and creoles have sometimes emerged, as well as grammatically complex mixed languages (a few examples are given in this chapter). However, each situation is different. Therefore, when studying indigenous bilingualism, it is crucial to take into account the historical and current situations of the minority and majority language, since indigenous bilingualism and even diglossia of previous generations of speakers can have a profound effect on the production of today’s bilinguals and even majority language monolinguals (Reference Singer and VaughanSinger & Vaughan, 2018).
In terms of linguistics, these historical and current events have led to an overall deficiency of literature on the indigenous languages examined in this chapter. Although some languages and language families have received attention from linguists, they pale in comparison to the existing literature on global languages such as Spanish and English. Many of the studies that exist indeed do an exceptional job of documenting indigenous languages, but some fail to account for bilingualism. As many indigenous languages spoken today no longer have monolingual speakers, a consideration of their bilingualism is crucial for a better understanding of these situations. Furthermore, although the effects of language contact and bilingualism affect both languages, it is usually the majority language that affects the minority language more (Reference MontrulMontrul, 2012). Nevertheless, as seen in this chapter, there tend to be more studies that document the effect of an indigenous minority language on a majority language than vice versa; however, to primarily focus on majority languages in these bilingual contexts can lead to a fragmented understanding of indigenous language bilingualism (Reference GrosjeanGrosjean, 2008). Thus, whenever possible, we pay special attention to studies in which both languages of the bilinguals are analyzed as they present a more holistic view of bilingualism and a better understanding of how the phonetics and phonology of both languages of these Indigenous language bilinguals may or may not affect each other.
The present chapter reviews available information on the phonetics and phonology of Indigenous language bilinguals published in the last few decades, focusing on both of the bilinguals’ languages and the interplay between their phonological systems as well as the phonetic realizations of the sounds present in their languages. Published research from different fields of linguistics, such as sociolinguistics and psycholinguistics, is considered in this review. This chapter is by no means comprehensive. Instead, we have tried to represent different geographical areas and typologically diverse languages. There are, of course, some indigenous languages that are well documented in terms of phonetics and phonology and other indigenous languages in which other linguistic features of bilingualism with a majority language are relatively well studied. However, as specific research on the phonetics and phonologies of bilinguals who speak those particular indigenous languages may be scarce, such indigenous languages fall outside of the scope of the present chapter (Reference FrancisFrancis, 2006). Figure 26.1 shows the approximate locations where the indigenous languages that are mentioned in the present chapter are spoken and where the studies highlighted here took place. Majority languages are not listed on the map. Studies on segmental phonology are covered in Section 26.2 and suprasegmentals are discussed in Section 26.3. Finally, indigenous languages throughout this chapter are referred to with both their name and the letter from Figure 26.1 that denotes their geographical locations, for example, AmazighⓀ.

Figure 26.1 The approximate geographical location of the indigenous languages (language family) mentioned in this chapter. The letters do not denote any linguistic relation between languages, as distinct languages that are in geographical proximity are represented with the same letter.
26.2 Segmentals
In terms of segmental phonology, this section examines the vowel and consonant inventories that Indigenous bilinguals possess and the ways in which the phonological systems of the majority and the minority languages may interact. Both historically and in present-day bilingual speakers, the phonological systems of the majority languages can relate to those of the indigenous languages in several ways: the indigenous language can influence the phonology of the majority language and vice versa. While this interaction can go both ways at the same time, the influence may also be evident in only one direction. In other words, there is the possibility of phonological system convergence and/or single phonological system expansion, common when speakers of a marginalized language acquire the dominant language (see Myers-Scotton, 2002). Finally, it is also possible that some Indigenous bilinguals have two separate phonological systems, one for the indigenous language and another for the majority language, and that these systems do not interact with each other at either the phonetic or the phonological level, in either production or perception. These scenarios are explored in the following subsections. For definitions of the different types of bilinguals discussed in this chapter (e.g., early, late, simultaneous), see Grosjean (2008).
26.2.1 Vowels
As for the vowel systems of Indigenous language bilinguals, two examples of majority languages are considered in this section: Spanish and English. The vowel systems of Spanish tend to consist of five distinctive phonemes, /a e i o u/, independently of the Spanish dialect spoken. These correspond to three levels of height and three levels of backness. In contrast, the vowel system of English typically contains a larger number of phonemes, the precise number of which depends on the English dialect. For example, there are thirteen monophthong vowels in Standard American English altogether: /i ɪ e ɛ æ ə ɚ ʌ u ʊ o ɔ ɑ/. These include tense and lax vowels (a feature which can translate into phonetic vowel length distinction), as well as full and reduced vowels. On top of these monophthongs, both English and Spanish have several diphthongs. However, here we focus only on the monophthongal vowel sounds of these languages since there are few studies on Indigenous bilinguals’ diphthongs (e.g. Reference Onosson and StewartOnosson & Stewart, 2021; Reference Watson, Maclagan, King, Harlow and KeeganWatson et al., 2016).
As previously stated, cross-language influence on Indigenous bilinguals’ vowels can go both ways: from the majority language to the indigenous language or vice versa, with the acoustic space of the vowel systems either reducing or expanding, the vowels aligning or reorganizing, or the base of articulation (see Footnote 1) changing as a result of bilingualism. Let us exemplify this relationship with several language pairs reported in the literature. QuichuaⒺ (Ecuador) has a vowel system consisting of three vowel phonemes, /a i u/, which has been studied in relation to Spanish. In her seminal study, Guion (2003) recruited Quichua-Spanish bilinguals with different ages of Spanish acquisition and analyzed their vowel production in both languages. The production data suggested that the two phonetic systems of bilinguals can influence each other: not only does the first language influence the second, but the second language can also influence the first, depending on the age of acquisition of the majority language. On the one hand, bilinguals who learned both languages simultaneously from birth produced different vowels for Quichua and for Spanish that were very similar to those of monolingual speakers of these languages. On the other hand, some of the later Quichua-Spanish bilinguals did not acquire the five-vowel system of Spanish but instead used their three-vowel Quichua system for both of their languages; that is to say, they produced both Spanish /i/ and /e/ as Quichua /i/ and both Spanish /u/ and /o/ as Quichua /u/.
Early sequential bilinguals, however, were seen to produce a monolingual-like Spanish vowel system with their native Quichua vowel production influenced by their second language (L2) Spanish vowels, with their Quichua /u/ gravitating towards either Spanish /u/ or /o/ and their Quichua /i/ produced very similarly to Spanish /i/, unlike in simultaneous bilinguals who maintained the Quichua /i/–Spanish /i/ difference. Guion attributed these findings to dispersion theory (Reference Lindblom, Ohala and JaegerLindblom, 1986) and stated that plasticity might be retained for the phonetic systems even after the first language of the bilingual has been well established. Nevertheless, her study is also a prime example of how both languages of Indigenous language bilinguals should always be considered in the analysis of their production as well as in other studied aspects of their bilingual experience, not assuming that either one of the languages will be always identical to the monolingual speakers’ production.
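The late-bilingual pattern reported above, in which five Spanish vowel categories are realized through three Quichua ones, can be made concrete with a minimal Python sketch. It treats vowels purely as category labels and contains no acoustic measurements. The names `LATE_BILINGUAL_MAP`, `realize`, and `merged_contrasts` are hypothetical, introduced only for illustration, and the mapping of Spanish /a/ to Quichua /a/ is our assumption, since the text explicitly describes only the /i e/ and /u o/ mergers.

```python
# Illustrative sketch only: vowel categories as labels, no formant data.
SPANISH_VOWELS = ["a", "e", "i", "o", "u"]
QUICHUA_VOWELS = ["a", "i", "u"]

# Spanish phoneme -> Quichua category used by some late bilinguals.
# The e->i and o->u mappings follow the description in the text;
# a->a is an assumption made here to complete the mapping.
LATE_BILINGUAL_MAP = {"a": "a", "e": "i", "i": "i", "o": "u", "u": "u"}


def realize(spanish_vowels, mapping=LATE_BILINGUAL_MAP):
    """Return the Quichua category used for each Spanish vowel."""
    return [mapping[v] for v in spanish_vowels]


def merged_contrasts(mapping=LATE_BILINGUAL_MAP):
    """Return the Quichua categories that absorb more than one Spanish vowel."""
    groups = {}
    for spanish_vowel, quichua_vowel in mapping.items():
        groups.setdefault(quichua_vowel, []).append(spanish_vowel)
    return {q: sorted(s) for q, s in groups.items() if len(s) > 1}


if __name__ == "__main__":
    print(realize(["e", "o"]))   # ['i', 'u']
    print(merged_contrasts())    # {'i': ['e', 'i'], 'u': ['o', 'u']}
```

In this sketch, the two entries returned by `merged_contrasts` correspond to the two Spanish contrasts (/i/ versus /e/ and /u/ versus /o/) that are neutralized when the three-vowel system is used for both languages. The gradient, bidirectional effects described for early sequential bilinguals cannot be represented with category labels alone and would require acoustic data.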
For Indigenous bilinguals who learn the majority language later in life, like the late Quichua-Spanish bilinguals from Guion’s study, the influence of the first language (L1) on their L2 vowel system is expected, since their L1 phonological system is already well established and influences the acquisition of the L2. For example, O’Rourke (2010) observed differences between the Spanish vowels of early and late QuechuaⒻ-Spanish bilinguals, which also differed from those of monolingual Spanish speakers from the same area of Peru. The author suggests that these speakers have different bases of articulation for their entire vowel system, depending on whether they are bilinguals or monolinguals. Nevertheless, cross-language vowel influence has also been reported for simultaneous bilinguals who are dominant in the majority non-indigenous language, that is, speakers who were raised with both languages from birth but show less proficiency in and/or use of the indigenous language. Baird (2020) examined Spanish-dominant simultaneous bilingual speakers of Spanish and one of two dialects of K’ichee’Ⓓ with phonetically distinct high vowels: /i/ and /u/ are more centralized in Cantel K’ichee’, which has only six vowels, than in Zunil K’ichee’, which has nine vowels. The Spanish vowels produced by Guatemalan Spanish monolinguals were not the same as those produced by the K’ichee’-Spanish bilinguals. Specifically, the two bilingual groups produced Spanish high vowels in the same acoustic space as the dialect-specific high vowels they produced in their K’ichee’. As /a/ was in the same acoustic space for all speakers in this study, it was not proposed that they have different bases of articulation for each language (Baird, 2020).
As for the difference between these simultaneous K’ichee’-Spanish bilinguals and the simultaneous bilinguals from Reference GuionGuion’s (2003) study, Reference Baird and RaoBaird (2020) suggests that, following dispersion theory (Reference Lindblom, Ohala and JaegerLindblom, 1986), it is the absence of mid vowels in the three-vowel system of Quichua that permits the Quichua high vowels /i/ and /u/ to be phonetically lower than in Spanish and still maintain all three phonemic distinctions between Quichua /a/, /i/, and /u/, whereas the vowel systems of the two different varieties of K’ichee’ and Spanish all have larger inventories than Quichua and thus need to be more dispersed.
In light of the historical context of MapudungunⒽ-Spanish contact and bilingualism, Reference SadowskySadowsky (2020) examined Spanish vowels produced by monolingual speakers in Chile, and compared them with those of Mapudungun, an indigenous language with six vowels: /a e i o u ɨ/. Apart from the fact that Chilean Spanish has five vowels, both the quality of the vowels and the size of the vowel system in the acoustic space were very similar between Chilean Spanish and Mapudungun, but differed from those of other varieties of Spanish. Specifically, the Chilean Spanish vowel system occupies a more centralized acoustic space compared to other dialects of Spanish, whose vowels tend to be peripheral and seem to use all the acoustic space that is available. This may be a direct influence from Mapudungun on Chilean Spanish, given the fact that the Mapudungun vowel system occupies a similar, strongly mid-centralized space. Sadowsky suggests that the Chilean Spanish vowel system was likely reorganized under the influence of Mapudungun and that this change most probably occurred during the last few centuries of close language contact, when Chilean children were exposed to the language spoken by their Indigenous mothers and caretakers.
Both Reference Baird and RaoBaird (2020) and Reference SadowskySadowsky (2020) bring us to consider the necessity of taking into account the particular dialect of both the majority language and the indigenous language spoken by Indigenous bilinguals when investigating their abilities in either of their languages: these may already have changed under the influence of the previous generations of speakers who lived in bilingual settings. In other words, marginalized languages can be highly influential on the majority language, creating new dialects of the majority language over time.
After reviewing how indigenous language vowels influence those of majority languages in bilingual production and how the degree of influence may depend on the age of L2 acquisition, we now focus on the situation in which the influence goes from the majority language to the indigenous language. In an intergenerational production study, Watson et al. (2016) examined longitudinal sound change in MāoriⓊ and New Zealand English vowels. Their study combines a historical approach with the study of current bilingualism, since it analyzes both archival and recent recordings of native Māori speakers with L2 English, as well as those of present-day simultaneous bilingual speakers of New Zealand English and Māori. Recordings from nineteen historical elders, twenty-two present-day elders, and twenty-one present-day young speakers were analyzed in order to study their vowel spaces. Māori has a five-vowel system consisting of /a e i o u/, with a historical phonemic length distinction that has mostly been lost for all vowels except /a/ and /aː/ (Harlow et al., 2009; Maclagan et al., 2013). In addition to the loss of the vowel length distinction, the results of Watson et al. (2016) demonstrated that the intergenerational raising of the Māori vowel /e/ correlates highly with the raising of the New Zealand English vowels /e/ and /æ/, and that the fronting of the Māori vowel /u/ correlates highly with that of the English vowel /uː/.
The authors interpreted these results as evidence for the influence that New Zealand English has had on Māori vowels over the last generations of bilingual speakers, basing this assumption on the fact that the /u/-fronting in Māori lags behind the same process in English, and that English is the more-dominant and prevalent language in New Zealand. In addition, the vowel systems of these two languages seem to be closely related in all three generations of speakers, in both L1 Māori-L2 English speakers and simultaneous English-dominant bilinguals. This study not only shows contact-induced sound change but also provides a clear picture of the changes in progress.
There are, of course, more examples in the literature of contact-induced changes on vowels that go from the majority language to the indigenous languages, including indigenous language–based creoles and mixed languages. These examples include the adoption of Spanish mid-vowels in Quichua and the Spanish-Quichua mixed language Media LenguaⒺ in both production and perception via lexical borrowings (Reference StewartStewart, 2014, Reference Stewart2018) or the influence of English pronunciation on Australian Aboriginal languages as a result of indigenous language revitalization programs (Reference Reid, Hobson, Lowe, Poetsch and WalshReid, 2010), among others. What remains to be discussed is how fast this change can occur. Since a rapid generational change in the phonetics and phonology of Indigenous language bilinguals might indeed be a characteristic feature of these populations, this question should be explored in further detail. It is worth mentioning that this mechanism may not be as common, or at least not as rapid, for other types of bilingualism (e.g., additive bilingualism), with the exception of heritage language bilingualism, described in Chapter 25 of this volume.
In a descriptive study, Mulík et al. (2023) analyzed the vowel production of balanced Hñäñho-Spanish bilinguals, native speakers of an OtomiⒸ variety spoken in Central Mexico. These bilinguals produced all ten Hñäñho vowels as distinct phonemes. Additionally, they produced all five Spanish vowel phonemes, and these mostly overlapped with their Hñäñho counterparts. The main exception was the Spanish /a/, which some bilinguals produced as phonetically more similar to the Hñäñho /a/ and others as more similar to the Hñäñho /ɔ/. Importantly, this had no effect on the maintenance of the two Hñäñho vowels as distinct phonemes in their production, as no influence between the two languages was found in this generation of Hñäñho speakers. However, the contrast between the Hñäñho vowels /ɔ/ and /a/ already seems to be lost in the perception of the next generation of Hñäñho speakers, heritage speakers of this language, who categorize the Hñäñho vowels /ɔ/, /a/, and /o/ similarly to monolingual speakers of Mexican Spanish, with both groups perceiving the Hñäñho vowel /ɔ/ mainly as /a/ (Mulík, Carrasco-Ortíz, & Amengual, 2022). Like the vowels produced by Māori-English bilinguals, Hñäñho vowels seem to be changing under the influence of the majority language (Mexican Spanish), with a vowel contrast rapidly disappearing from one generation to the next. Such symptoms of indigenous language loss may be related to the migration of Indigenous populations to cities, since urban spaces tend to be dominated by the majority language and culture (Mulík et al., 2021; Watson et al., 2016). Another possible motivation for this change is that the Hñäñho vowel system has twice as many distinct phonemes as Spanish, some of which might be merging under the effect of Spanish dominance.
The variability in the number of vowel phonemes (i.e., the difference between Hñäñho and Spanish vowel systems in comparison to Quichua and Spanish) is illustrated in Figure 26.2, which shows indigenous vowel systems with either fewer or more vowels than Spanish.

Figure 26.2 Examples of indigenous vowel systems with fewer or more phonemes than the majority language.
26.2.2 Consonants
Similar to vowels, the consonant systems of Indigenous bilinguals can also affect one another in either direction, both historically and in present-day bilingual speakers, and such changes are primarily motivated by differences in the consonantal inventories of these languages. These are known as phonemic conflict sites, defined as conflicting areas of phonological convergence (Stewart, 2015). There is evidence that, over time, the indigenous languages of the Americas have influenced the phonology of Spanish consonants. For example, Central Mexican Spanish presently includes several fricative and affricate consonants that originated in NahuatlⒸ (Avelino, 2018). In particular, many Hispanicized loanwords from Nahuatl include the voiceless alveolar lateral affricate /t͡ɬ/, which is not a part of the Spanish consonant system elsewhere. It occurs in coda position, as in the common Mexican female name Xóchitl /ˈsot͡ʃit͡ɬ/ “flower,” with the phonetic variant [t͡l] occurring in onset position, as in tlacuache /t͡laˈkwat͡ʃe/ “opossum.” Similarly, Mexican and Central American Spanish also include the voiceless alveolar affricate /t͡s/ in words like Quetzalcóatl /ket͡salˈkoat͡l/ “feathered serpent,” in addition to the prototypical Spanish voiceless palatal affricate /t͡ʃ/. While these consonants may not be very productive, they are produced and perceived as distinctive phonemes even by monolingual speakers of Mexican Spanish (Lope Blanch, 2004). On the other hand, monolingual speakers of Ecuadorian Spanish contrast /ʃ/ with /t͡ʃ/ via Quichua borrowings (e.g., /ʃunʃo/ “fool,” /ʃunɡo/ “heart”), and this consonant has made its way into Spanish borrowings not of Quichua origin such as chef /ʃef/ and show /ʃo/.
Although most of the surrounding region exhibits yeísmo (the loss of contrast between /ʎ/ and /ʝ/), Ecuadorian Andean Spanish maintains this phonemic contrast, but its realization has shifted to /j/ and /ʒ/, which may also be due to contact with Quichua (Adelaar & Muysken, 2004).
Conversely, a majority language may have consonants that do not exist in indigenous languages. For example, the voiceless labiodental fricative /f/ is present in Spanish but not in Mayan languages. As such, Spanish loanwords in Mayan languages such as Yucatec MayaⒹ and K’ichee’Ⓓ tend to change this phoneme into the voiceless bilabial stop /p/: /kaˈfe/ → /kaˈpe/ “coffee.” Experimental research among K’ichee’-Spanish bilinguals demonstrates that the fortition of /f/ into [p] also occurs in their Spanish and varies by sociolinguistic and phonological factors. It is more likely to occur among K’ichee’-dominant bilinguals and immediately after a nasal or a pause, phonological contexts in which fortition is most common cross-linguistically (Baird & Regan, in press). Furthermore, /f/ fortition is socially stigmatized in Guatemalan Spanish and is considered a common feature of “Mayan-accented” Spanish (Baird, 2023).
However, the effect of indigenous languages on Spanish consonants is not exclusive to the Americas. In the Basque Country (Spain), Beristain (2022) studied the acoustic realization of the Spanish voiceless apical fricative /s/ in BasqueⒿ-Spanish bilinguals and Spanish monolinguals. The bilingual speakers of the Goizueta Basque variety, which has a phonemic distinction between the apico-alveolar /s̺/ and the lamino-alveolar /s̻/, produced an L2 Spanish /s/ with a more reduced acoustic dispersion than both monolingual Spanish speakers and bilingual Basque-Spanish speakers of the sibilant-merging Basque varieties. The results of this study show an influence of L1 Basque on L2 Spanish /s/ production. The effects of the indigenous language have also been reported on majority languages other than Spanish, such as English in Australia. Currently, the many indigenous languages of Australia, called Aboriginal languagesⓈ, coexist with Australian English(es), English-based creoles, as well as with mixed languages which are not based on one specific language, but rather incorporate features from their source languages. The interplay of these Australian language layers with different characteristics has led to some specific features of Aboriginal English. For instance, Mailhammer, Sherwood, and Stoakes (2020) have shown that phonetic voicing manifests itself in voice termination time (that is, the transition phase between the voicing of a previous segment and the complete closure of a stop consonant), a prevalent and characteristic feature of Aboriginal English on Croker Island (Northern Territory), where the Australian indigenous languages Iwaidja and Kunwinjku are spoken. This feature aligns Aboriginal English on Croker Island with local Aboriginal languages and differentiates it from Standard Australian English.
On the other hand, consonant production and perception in present-day Indigenous bilinguals can be marked by the presence or absence of phonological features of their indigenous native language. Thus, L1 speakers of Wubuy, an indigenous language without a voicing distinction in its consonant phonology, seem to be insensitive to the voice onset time (VOT) feature that underlies the voicing distinctions of English stops and fricatives, even when they are proficient L2 English bilinguals (Bundgaard-Nielsen & Baker, 2019). The typological diversity of these two languages has allowed for the study of Wubuy-English bilinguals to provide valuable information on how phonological dimensions that are not exploited in bilinguals’ L1 are acquired in their L2. Finally, the phonology of mixed languages is typically highly dependent on the indigenous language and borrows only a small subset of sounds from the introduced, majority language (e.g., Stewart, 2015; Stewart et al., 2020). However, multilingual speakers of mixed languages can sometimes benefit from the combination of the phonological features of the various source languages. A production and perception study carried out by Bundgaard-Nielsen and O’Shannessy (2021) illustrates this with first-generation native speakers of Light Warlpiri, a mixed language from Australia, who also speak Warlpiri, Australian English, and, to some extent, Kriol. While Warlpiri lacks fricatives and a voicing contrast for stops, such phonemes are present in both English and, to a lesser extent, Kriol. Figure 26.3 shows the integrated phonemic inventory of consonants of Light Warlpiri according to Bundgaard-Nielsen and O’Shannessy (2021, p. 14).
Results from their perception task indicate that Light Warlpiri speakers can discriminate between several contrasts that do not exist in Warlpiri, confirming the integration of several phonological systems into one.

Figure 26.3 Tentative consonant inventory of Light Warlpiri: gray = phonemes from English and/or Kriol; black = phonemes from English; others = phonemes that exist in Warlpiri, English, and Kriol.
The effects of indigenous bilingualism on the realization of consonants have also been reported in the direction from the majority language toward the indigenous languages. Similar to the effects on consonant voicing in Australian Indigenous bilinguals mentioned earlier, the indigenous languages of the Americas have also been influenced by Spanish, such as the progressive appearance of /b d ɡ/ in Tena QuichuaⒺ from Ecuador, a language that originally did not have a voicing contrast for its set of stops which only consisted of the voiceless series /p t k/ (O’Rourke & Swanson, 2013).
Another example of majority language influence on the production of Indigenous bilingual voiceless stops is their aspiration and pre-aspiration. As a result of contact with English, the plosives of some Polynesian languages, such as MāoriⓊ (New Zealand) and HawaiianⓇ (Hawai’i), have undergone a change in their aspiration; that is, they were originally unaspirated, but are currently realized as aspirated. Maclagan and King (2007) carried out an intergenerational study in which three generations of Māori-English bilinguals show progressively more aspirated Māori /p t k/ stops over a period of 100 years. However, the aspiration of English /p t k/ was consistently present in the bilinguals’ production over this entire time span, suggesting the influence of English stop aspiration on Māori (Maclagan & King, 2007). Recent evidence for this claim also comes from an L1 speaker of Hawaiian, also fluent in English and Hawai’i Creole English, who produces aspirated Hawaiian /p/ and /k/ stops, which are historically considered to be unaspirated (Parker Jones, 2018). These natural changes in Māori and Hawaiian phonetic features, induced by their contact with English, were probably enhanced by the unfavorable historical situation of both of these indigenous languages, which are only recently being revitalized.
In addition to these historical changes in the aspiration of plosives by Indigenous bilinguals, voiceless stops can also be pre-aspirated, as in [ht]. This rare linguistic feature is found in several languages across northwestern Europe, a part of whose population is bilingual in their respective indigenous language and English (Clayton, 2017). In the oral production of Welsh-English bilinguals in Bethesda WelshⒾ (North Wales), the degree to which a speaker of Welsh produces pre-aspirated /p t k/ correlates with the language they speak at home: English as a home language equates to shorter pre-aspiration whereas Welsh equates to longer pre-aspiration in Welsh stops (Morris & Hejná, 2020). Additionally, bilingualism not only affects pre-aspiration of stops in the indigenous language of its current speakers but also has had an impact on the majority language phonetics over time. The pre-aspirated voiceless stops (e.g., in the word heat [hit] pronounced as [hiht]) of Hebrides English (Scotland), as produced by Scottish GaelicⒾ-English bilinguals, are thought to have originated by means of historical transfer from L1 Scottish Gaelic in the previous generations of bilingual speakers (Clayton, 2017).
26.3 Suprasegmentals
Some suprasegmental outcomes among Indigenous language bilinguals may be primarily due to extralinguistic factors. For example, bilinguals of K’ichee’Ⓓ and Spanish tend to speak the language in which they are less dominant at a higher pitch than their more-dominant language (Baird, 2019a). Following Ohala’s (1995) frequency code, this higher pitch may be attributed to the bilinguals being less comfortable speaking their nondominant language. Nonetheless, the majority of the existing literature focuses on specific suprasegmental features among the languages of these populations. Hellmuth (2020) suggests that there are three primary reasons why suprasegmentals are so predisposed to such changes. First, the acoustic parameters of suprasegmentals are part of the linguistic encoding of all spoken languages. Although pitch, duration, and intensity are employed differently cross-linguistically, they are rather malleable and thus able to adapt to different languages’ needs. Second, all spoken languages express utterance-level meaning and, at least perceptually, “the mapping of form to meaning is also readily adaptable” (Hellmuth, 2020, p. 587). Finally, in Hellmuth’s (2020) strongest argument, the form-meaning mapping of acoustic parameters is generally not fixed, at least for suprasegmentals. Specifically, intonation exhibits substantial inter- and intra-speaker and contextual variation (Hellmuth, 2020; see also Cangemi et al., 2016; Cangemi, Grice, & Krüger, 2015; Walker, 2014).
Sorace (2004) states that bilingual grammars are particularly disposed to change with regard to the variability of prosody. Thus, suprasegmental changes in both languages of Indigenous language bilinguals are common, although many cases of these remain either undocumented or anecdotal in the literature.
26.3.1 Intonation
The transfer of intonational contours and pitch accents from an indigenous language to a majority language is perhaps the most commonly attested suprasegmental phenomenon among these populations. Examples include IgboⓃ-like intonational patterns, such as a greater usage of falling tones, in the English of Igbo-English bilinguals (Nigeria; Asadu, Okoro, & Kadiri, 2019). Mailhammer and Caudal (2019) report that bilinguals of IwaidjaⓈ and English (Australia) use a striking intonational contour which they describe as linear lengthening intonation in their English. This contour, which consists of “a prolonged stretch of high pitch, either in a plateau or a rise, concluded by a high boundary tone, typically with lengthening of the final syllable nucleus” (Mailhammer & Caudal, 2019, p. 40), is not found in other varieties of English but is common in Iwaidja, among other Australian Aboriginal languages. Intonational plateaus, or hat patterns, in MapudungunⒽ (Chile) have been transferred into the Spanish of both monolinguals and bilinguals and are used as a means of focus extension in both languages (Rogers, 2020).
In an analysis of the prosodic patterns of different types of questions among Mandarin-YamiⓆ bilinguals (Taiwan), Lai (2018) reports that near-balanced and Yami-dominant bilinguals tend to extend the Yami prosodic pattern for statement questions (questions that echo an earlier statement to express surprise or disbelief), namely a rising boundary tone, to all statement questions in their Mandarin. Conversely, Mandarin-dominant bilinguals used both flat and rising prosodic patterns for different types of statement questions, similar to Mandarin monolinguals in China. Another example concerning question intonation is that of Tunisian Arabic yes/no questions, which Bouchhioua, Hellmuth, and Almbark (2019) report as having a complete rise-fall contour in utterance-final position that is frequently accompanied by a question-marking morpheme: a vowel added at the end of the last word in the utterance. These features differ from the simple rise present in most Arabic dialects (Hellmuth, 2020). Although Tunisian Arabic-Tunisian BerberⓁ (Tunisia) bilingualism is not as common as it once was and there are few studies on intonational phenomena in Tunisian Berber, Zwara BerberⓁ, spoken in nearby Libya, demonstrates a yes/no question-marking clitic, /a/, that is obligatorily accompanied by this same rise-fall contour (Gussenhoven, 2018).
In several varieties of Spanish, prenuclear pitch peaks are often realized in a post-tonic syllable in declarative utterances. However, some studies have demonstrated that, among Indigenous language bilinguals, the alignment of these pitch peaks occurs earlier, often within the tonic syllable (see Figure 26.4). Examples include BasqueⒿ-Spanish bilinguals (Spain; Elordieta, 2003), QuechuaⒻ-Spanish bilinguals (Peru; O’Rourke, 2004), and Yucatec MayaⒹ-Spanish bilinguals (Mexico; Michnowicz & Barnes, 2013). However, this is not to say that earlier peak alignment in Spanish prenuclear pitch peaks is the default in all Indigenous bilingual contexts; it was not found in the Spanish of GuaraníⒼ-Spanish bilinguals (Argentina; Colantoni, 2011). Furthermore, including extralinguistic variables in the analysis of this particular suprasegmental feature provides a clearer picture of this variation. For example, data from simultaneous K’ichee’Ⓓ-Spanish bilinguals (Guatemala) demonstrate that peak alignment is highly correlated with language dominance: bilinguals that are more dominant in K’ichee’ tend to produce more K’ichee’-like early peaks in their Spanish, whereas Spanish-dominant bilinguals tend to produce later peaks that are comparable to those of Spanish monolinguals from Guatemala (Baird, 2015).

Figure 26.4 Sample pitch tracks of the Spanish word /baˈnana/ with (a) a late peak (H) aligned in a post-tonic syllable produced by a Spanish-dominant bilingual and (b) an early peak (H) aligned within the tonic syllable produced by a K’ichee’-dominant bilingual.
There are, of course, cases of majority languages influencing the intonation of indigenous languages, though these are less documented. Nance (2015) notes a structural and generational change among English-Scottish GaelicⒾ bilinguals (Scotland). Younger bilinguals have shifted from a lexical pitch accent to a purely post-lexical system, thus speaking Scottish Gaelic with English-like intonation. In the aforementioned study of Yami-Mandarin bilinguals, Lai (2018) shows that bilinguals have integrated neutral question intonation from Mandarin into Yami. In a perception study of yes/no questions in K’ichee’, K’ichee’-Spanish bilinguals perceived stimuli with rising boundary tones as questions and stimuli with falling boundary tones as statements regardless of the presence of the K’ichee’ yes/no question-marking particle. In other words, these bilinguals ignored the morphosyntactic strategies of yes/no question marking in K’ichee’ and relied solely on intonational patterns that are parallel to those of yes/no questions in Spanish (Baird, 2019b).
Within the pragmatic framework of information structure (Ladd, 2008), several studies in Latin America have analyzed how prosody is used to mark a constituent for focus among bilinguals of Spanish and different indigenous languages. Although other strategies may be employed to mark a constituent for focus in Spanish, the predominant strategy is to emphasize the constituent to a greater prosodic degree, especially when there are no changes in word order, that is, in situ focus (Ladd, 2008). Several of these studies have investigated prosodic focus marking in the Spanish of Indigenous language bilinguals and have used language dominance as a variable. Although the specific acoustic findings differ according to population, the overall finding that each study has demonstrated is that, in Spanish, Spanish-dominant bilinguals tend to mark a constituent for focus to a greater prosodic degree than bilinguals that are more dominant in the indigenous language: Quechua-Spanish bilinguals (O’Rourke, 2012; van Rijswijk & Muntendam, 2014); Yucatec Maya-Spanish bilinguals (Uth, 2016, 2019); K’ichee’-Spanish bilinguals (Baird, 2021b). It has been hypothesized by several scholars that this may be due to the fact that in the specific indigenous languages spoken by these bilinguals, focus is primarily marked via morphosyntactic means, that is, word order changes, focus-marking particles, and so on, and that, in these languages, prosody is at best a secondary cue of focus (O’Rourke, 2012; Baird, 2021b). To further this proposal, some of these authors have also analyzed focus marking in the indigenous languages spoken by these bilinguals. O’Rourke (2008) found that in Quechua, Quechua-Spanish bilinguals only mark focus morphosyntactically and not prosodically.
In K’ichee’, Baird (2018) demonstrated parallel findings to those reported for Spanish among K’ichee’-Spanish bilinguals (Baird, 2021b); in both languages, Spanish-dominant bilinguals mark focus to a greater prosodic degree than K’ichee’-dominant bilinguals. Thus, these bilinguals may be using the focus-marking strategy more common to their dominant language in both of their languages.
Other studies note how a specific focus-marking cue from one language may be transferred to another. In Spanish, a longer duration is a common prosodic focus-marking strategy (Face, 2002). However, in dialects of K’ichee’ that maintain phonemic vowel length, a constituent is not marked for focus via a longer duration and some K’ichee’-Spanish bilinguals from these dialects do not use a longer duration to mark focus in either language (Baird, 2017). The same phenomenon has been found in the Spanish of Yucatec Maya-Spanish bilinguals (Mexico), as Yucatec Maya does not mark focus via a longer duration either (Martínez García & Uth, 2019). Koch (2008) reports that among Nlaka’pamuxtsnⒶ-English bilinguals (Canada), focus is prosodically realized at the leftmost edge of the intonational phrase although the nuclear accent is predominantly located at the rightmost edge in Nlaka’pamuxtsn. It is argued that this shift in the location of the prosodic emphasis follows that of clefted focus marking in English, as there are no special semantics that motivate such changes in Nlaka’pamuxtsn (Koch, 2008, p. ii).
It should be noted that cross-linguistic suprasegmental influences are not always found among Indigenous language bilinguals. For example, McDonough (2002) demonstrates that NavajoⒷ-English bilinguals (United States of America) have distinct intonational patterns in their two languages for yes/no questions, focus marking, and declaratives. Although O’Rourke (2005) notes many cases of transfer and influence already mentioned in this chapter among Quechua-Spanish bilinguals, these speakers maintain language-specific pitch contours for yes/no questions. Velázquez Patiño (2016) also demonstrates that NahuatlⒸ-English bilinguals (Mexico) clearly differentiate the prosodic features of interrogatives across both of their languages.
26.3.2 Stress and Tone
Although limited in comparison to studies on intonation, studies on the word-level stress and word-level prosody of Indigenous language bilinguals have also demonstrated how indigenous languages have influenced different majority languages and vice versa. For example, Moroccan Arabic is typologically different from most other dialects of Arabic in terms of word-level prosody. Specifically, although most varieties of Arabic are considered head-marking languages with prominent word-level stress, Moroccan Arabic is not head-marking and only tonal events are marked at the edges of prosodic phrases (Hellmuth, 2020, p. 591). According to Bruggeman (2018), there are no consistent acoustic cues to lexical stress in Moroccan Arabic and listeners present a “deafness” toward the perception of stress. Hellmuth (2020, p. 591) argues that this variation is likely due to sustained contact and bilingualism with AmazighⓀ and not to contact with French. Although both Amazigh and French are nonhead-marking languages, there are no differences in Amazigh and Moroccan Arabic in word-level prosody as both lack acoustic cues to word-level stress and both monolingual and bilingual speakers of these languages demonstrate stress deafness (Bruggeman, 2018). Furthermore, Moroccan Arabic and Amazigh also share other prosodic features that are not found in French; for example, the shape of the phrase-edge tonal contour in both Moroccan Arabic and Amazigh is a rise-fall, whereas it is a rise in French (Hellmuth, 2020, p. 592).
Word-level stress in multiple Mayan languages primarily occurs in word-final position (England & Baird, 2017). Many Spanish loanwords with paroxytone stress patterns, that is, with stress on the penultimate syllable, that have been adopted into these Mayan languages demonstrate word-final atonic syllable deletion in order to follow this stress pattern: for example, /motoˈsjera/ → /motoˈsjer/ “chainsaw.” This pattern of deletion has also been attested in the Spanish of Mayan-dominant bilinguals in Guatemala, though it is socially stigmatized in this context (Baird, 2021a). A similar example can be seen among Spanish-NahuatlⒸ bilinguals, where bilinguals tend to shift the tonic syllables of Spanish oxytones, in which stress falls on the final syllable, to a more Nahuatl-like paroxytone stress pattern: for example, /luˈɡaɾ/ → /ˈluɡaɾ/ “place” (Hill & Hill, 1986, p. 212).
Several tonal indigenous languages are well documented; however, the number of studies concerning the effects of bilingualism on tonal indigenous languages is also limited. Indeed, many of the studies that do exist focus on how bilinguals navigate loanwords from their atonal majority language into their tonal indigenous language. For example, among Hausa-Arabic and Kanuri-Arabic bilinguals (Nigeria), speakers often interpret the first syllable with a high tone in HausaⓂ or KanuriⓂ words as the stressed syllable in Arabic (e.g., Kanuri /bírnii/ → Nigerian Arabic /ˈbirni/ “city”) (Owens, 2000, p. 285) and Spanish-TriquiⒸ bilinguals (Mexico) have been shown to convert word-level stress in Spanish into the Triqui high tone (e.g., /ˈmakina/ → /ma3kina1/ “machine”) (Scipione, 2011, p. 187). Among some bilinguals, speakers assign a tone that is rare, and thus marked, in their indigenous language to loanwords from their majority language in order to mark the foreignness of these words. This strategy has been reported with the uncommon low tone among monolinguals and bilinguals of ZapotecⒸ and Spanish (e.g., /kɾus/ → /kɾùz/ “cross”) (Mexico; Operstein, 2017, p. 110), and among bilinguals of French and CèmuhîⓉ (e.g., /butɔ̃/ → /bùto/ “bud”) (New Caledonia; Rivierre, 1994, p. 507).
Other studies have demonstrated how the tonal systems of Indigenous language bilinguals have become simplified and are being lost among younger generations of bilingual speakers; that is, these indigenous languages may no longer be tonal in the near future. For example, Hofling (2014) reports that while older generations of Lacandon-Spanish bilinguals (Mexico) maintain a system of tones, younger generations that are increasingly more bilingual in Spanish no longer demonstrate any tonal contrasts in LacandonⒹ. Similar results were found among HakkaⓆ-Mandarin bilinguals (Taiwan) as younger speakers that were more dominant in Mandarin performed significantly worse on production and perception tasks of the Hakka low-level tone than older speakers that were closer to being balanced between Hakka and Mandarin (Yeh & Lu, 2012). Finally, among YorùbáⓃ-English bilinguals (Nigeria), Shittu (2019) finds that older bilinguals are better at both tone production and perception than younger bilinguals. This study goes one step further as it investigates Yorùbá-English bilinguals, both those living in Nigeria and those that have immigrated to Canada. These results demonstrate that those from Nigeria outperformed the speakers in Canada in both the production and the perception of Yorùbá tones.
Conversely, tone may influence the atonal majority language among Indigenous language bilinguals. Bordal Steien and Yakpo (2020) report that both Central African French and Equatorial Guinean Spanish have developed a two-tone system with high and low tones and attribute this to intense bilingualism with tonal languages: Ubangian languagesⓄ in the Central African Republic, Bantu languagesⓅ in Equatorial Guinea, and even some creoles. Evidence of these innovative tonal systems in both French and Spanish is seen in word-tone patterns, where only one high tone is permitted per word, and in tonal minimal pairs, as in, for example, the Central African French /sә́/ “this/that (adj.)” versus /sә̀/ “it/that (pron.)” or the Equatorial Guinean Spanish /ké/ “what” vs. /kè/ “that” (Bordal Steien & Yakpo, 2020, pp. 16, 19).
26.4 Conclusions
Overall, this chapter demonstrates some general considerations concerning the study of the phonetics and phonology of Indigenous language bilinguals. First, as noted in the map in Figure 26.1, this is a field with ample room for growth. There are obviously more indigenous languages in bilingual contexts than those mentioned in this chapter. As such, we must acknowledge our biases as authors of this chapter as indigenous language bilingualism in Latin America is well represented here. That being said, there are still many valuable works on this phenomenon in Latin America that could not be mentioned here due to the constraints of chapter length. While other geographical areas and language families are also discussed here, there are numerous contexts in which there are currently no data on the phonetics and phonology of Indigenous language bilinguals. Additionally, there are many disciplines within phonetics and phonology that are scarcely represented in indigenous language bilingualism. For example, production studies greatly outnumber perception studies, and psycholinguistic and neurolinguistic studies are sparse.
Second, indigenous language bilingualism provides a unique context in which to study the phonetics and phonology of bilingualism, as these contexts reflect the historical and social hierarchies of small-scale bilingualism that have emerged from colonialism. A common finding in the studies presented here is the decline in the indigenous language of these bilinguals, regardless of whether it is their L1 or their L2. As this decline may last over generations or occur rapidly, these contexts offer important insights into language loss, language attrition, and so on. Importantly, the studies that include language dominance as a variable offer a clearer picture of these processes as bilinguals that are more dominant in their indigenous language, even if it is their L2, may demonstrate more influence on the phonetics and phonology of their majority language than vice versa. In this sense, while the L1 does influence the L2 earlier on, especially in sequential bilinguals, it seems to be the dominant language that matters more in the long run (see Chapter 29, this volume, for a more in-depth discussion on language dominance).
In cases where an indigenous language may influence the prestigious majority language, this may contribute to a different accent or even to a new dialect of the majority language. Although there are few sociolinguistic perceptual studies, those that exist tend to demonstrate that the majority language is often socially stigmatized after it takes on phonetic features from the already stigmatized indigenous language. On the other hand, the phonology of the majority language may influence that of indigenous languages, creoles, and mixed languages via lexical borrowings or the transfer of phonetic features through bilingual individuals. The outcomes of mixed languages increase our understanding of bi- and multilingual phonetics and phonology. Such studies aid in our understanding of notions such as which constraints tend to be maintained and which tend to be lost in mixed and integrated phonemic inventories. As many studies cited here do not mention the type of acquisition of these bilinguals, for example naturalistic versus explicit instruction, future comparisons need to consider this important variable. Indeed, many of the frameworks and hypotheses detailed in this volume would be furthered by more research among Indigenous language bilinguals.
Finally, as we stated in the introduction, it is of utmost importance to study both languages of Indigenous language bilinguals for both scientific and ethical reasons. Many Indigenous language bilinguals are on the peripheries of society, and only focusing on their majority language continues this dichotomy. In terms of linguistics, fragmented studies of only one language of these bilinguals may lead to misunderstandings about how their languages work and to falsely attributing different phenomena to bilingualism and the unstudied language. As seen throughout this chapter, the transfer of features goes both ways: indigenous language to majority language and vice versa. In other words, if we study only one language of these bilinguals, we understand only half the picture (Grosjean, 2008).
27.1 Introduction
27.1.1 Bimodal Bilinguals
This volume on bilingual phonetics and phonology focuses on bilinguals who use two spoken languages, known as unimodal bilingualism, where two languages are used in the same modality (in this case, spoken language). In this chapter, we turn the discussion to bimodal bilinguals, whose languages primarily make use of different modalities. Bimodal bilinguals include users of a natural sign language in the visual-manual modality or a protactile language in the haptic-manual modality, along with a second language in another modality such as a spoken language in the auditory-oral modality or even a written version of a spoken language. By “sign languages,” we are referring to the natural sign languages that emerge among groups of deaf people, not the invented systems that are sometimes used to represent a spoken language for educational purposes. Natural sign languages employ the hands, face, and body as articulators to produce linguistic components that are perceived visually (or through touch, in the case of tactile sign languages) and share many of the same underlying organizational principles as natural spoken languages.
In addition to their sign language, some bimodal bilinguals use the spoken language in both a spoken and a written form, while others use the written form only. Deaf or hard-of-hearing (DHH) bimodal bilinguals often prefer to use a written form of the spoken language; while they have frequently been trained in the use of speech, the degree to which they use it spontaneously is a matter of personal choice. On the other hand, bimodal bilinguals with more access to speech may use spoken language more frequently. They include hearing offspring of deaf parents, known as codas (children of deaf adults, for adults) or kodas (kids of deaf adults, for young children) and DHH people who access sound through hearing technology like cochlear implants (CIs) or hearing aids (HAs). When we consider the use of speech by bimodal bilinguals, we also include their co-speech gestures, as these are an integral part of the communication that occurs with spoken languages. Our discussion separates studies that include the spoken versus the written form of a spoken language.
27.1.2 Sign Language Phonology
Phonological analyses of sign languages begin with the meaningless components that are the building blocks for individual signs and signs in sequences (Stokoe, [1960] 2005). For example, sign language researchers often describe individual signed words in terms of the configuration of the hand(s) used to make the sign, the location on the body or in signing space where the sign is produced, and the movements used by the hand(s) in articulating the sign. The sublexical components of signs, often called parameters, combine in rule-governed ways, for example by following constraints on which handshapes can be used in one- versus two-handed signs (Battison, 1978). Although sign phonological research has shown that signed words typically occupy only one timing unit (or syllable) formed with complex simultaneously articulated elements, there have also been studies examining sequential (segmental) phenomena, such as are found in compounds (e.g., Sandler, 1989, 1993).
The manual components of signs are combined with nonmanual aspects, which can include body position, head position, and facial configurations, often referred to as nonmanual markers (NMMs). Nonmanual components are sometimes lexically specified, but more often these elements are part of the prosodic component, which also includes rhythm and timing (Sandler, 2012b). They may mark prosodic components such as junctures, or they may co-occur across sequences of signs contributing to discourse functions.
Sign phonologists have discussed models that account for the representation of signs, constraints on the combinations of features, and questions such as the possible existence of an organizational unit comparable to the syllable. These studies have demonstrated that the phonologies of sign languages have many parallels to those of spoken languages, despite the fact that the primary modalities of these languages are different. For overviews of sign language phonological analyses, see Sandler (2012a), Brentari, Fenlon, and Cormier (2018), and van der Hulst and van der Kooij (2021).
27.1.3 Research with Bimodal Bilinguals
The field of research on bimodal bilingualism is still rather young and fairly limited, especially compared to research on spoken unimodal bilinguals, but the findings emerging so far have already demonstrated the importance of bimodal bilingual research for our understanding of bilingualism more broadly. The first question we address, then, is why the study of sign language phonology, which describes the sublexical organization of the articulatory components making up signs, is relevant to those studying the phonology of spoken languages. While notions like feature hierarchies, constraints on combinations of elements, and syllable structure are common across both signed and spoken language phonologies, it might be expected that the differences between modalities for the surface-level phonological elements (hand configurations versus consonants, say) would make examination of the phonological features of bimodal bilingualism either very boring or so unique as to have limited relevance for understanding unimodal bilingual phonology. In contrast, the overall finding from research so far is that, despite modality differences, signed phonology and spoken phonology do interact for bimodal bilinguals, just as the phonologies of a unimodal bilingual’s two spoken languages do.
27.1.4 Goals of This Chapter
Our review will focus in turn on two aspects of bimodal bilingual phonology. First, in Section 27.2 we examine how a bimodal bilingual’s languages interact in various contexts of language production. We start by considering phonological interactions in fully competent bimodal bilingual adults. Then we turn to phonological development by bimodal bilingual children, followed by a section on the learning of phonology in a second modality by adults. We conclude the first section with a look at code-blending, an integral part of bimodal bilingual language use made possible by partial separation of their two phonological systems. In Section 27.3 we discuss studies of the processing of languages in two modalities, focusing on studies that reveal interaction at the phonological level.
In all these areas of our review, we will see that many aspects of phonology manifest similarly across modalities. For example, when considering both first (L1) and second (L2) language development, the factors that influence production accuracy are parallel for sign and spoken languages. Markedness and articulatory or perceptual complexity play important roles in development in both modalities, for instance. Among sign languages, there is an additional modality-specific effect of iconicity, the motivated link between form and meaning which is heightened in the visual modality, thanks to hands that bear a visual similarity to a referent or movement that mimics carrying out an action. Most striking, however, are the findings that, despite articulatory differences, sign and spoken phonologies interact in many ways.
27.2 Signed and Spoken Phonologies and Their Interaction in Language Production
27.2.1 Interaction of Signed and Spoken Phonologies in Bimodal Bilingual Adults
We are not aware of any studies that specifically examine the segmental phonological grammar in speech or in sign for bimodal bilingual adults. However, one study compares prosodic features across signed narratives of codas, deaf native signers, and second-language learners of American Sign Language (ASL) (Brentari, Nadolske, & Wolford, 2012). A reasonable expectation is that since the phonologies of sign languages and spoken languages use distinct articulators, there might not be any transfer effects between them, or other evidence that phonological experience with the language in one modality affects the language in the other. Nevertheless, Brentari et al. (2012) predicted that their hearing participants’ experience with spoken English and the co-speech gestures accompanying spoken English would cause their ASL prosodic patterns to differ from those of deaf native signers. Indeed, this was the case, at least for sign duration and torso position, two prominent prosodic cues that occur in both co-speech gesture and ASL. Brentari and colleagues found that hearing participants (codas and second modality second language [M2L2] signers) patterned similarly to each other, by lengthening signs at intonation-phrase-final positions (whereas deaf signers showed this effect at initial positions). The hearing participants also produced more torso shifts than deaf signers. We will provide further discussion of the phonology of M2L2 signers in Section 27.2.3.
27.2.2 Phonological Development by Bimodal Bilingual Children
An estimated 80–90 percent of children born to deaf families in the United States are kodas (Mitchell et al., 2006; Singleton & Tittle, 2000), receiving sign language input from one or both parents, and spoken language input from hearing family and community members. Deaf parents also often use spoken language with kodas (Pizer, 2008; van den Bogaerde & Baker, 2005), although their pronunciation and prosody may be different from that of hearing adults. Mayberry (1976) found that the spoken language development of kodas was not affected by their parents’ use of spoken language, but the children’s sign language proficiency was inversely correlated with parental use of spoken language, suggesting that kodas, like other bilinguals, take advantage of multiple input sources when they are available.
A somewhat similar input situation exists for the deaf children of deaf signing parents who access spoken language through an HA, CI, or other hearing amplification; we refer to this group as ‘DDHA’ (deaf children of deaf parents with hearing amplification). Typically, DDHA children are fitted with hearing amplification at varying ages, and CIs may be activated as early as one year of age, but frequently later. This means that the earliest accessible linguistic input comes in the form of their sign language. Extensive rehabilitation is recommended for the development of spoken language with the use of a CI or other hearing technology.
There is a growing acquisition literature focused on these two groups of bimodal bilingual children, but their early phonological development has not yet been extensively studied. Here we will summarize the few existing findings about child bimodal bilingual phonology.
The most basic level of phonological analysis focuses on formational accuracy, or the accuracy with which children produce the phonemes (for spoken language) or the major phonological parameters (for signed language) of their two languages. Here we first summarize findings reported from studies of bimodal bilingual children’s naturalistic spontaneous production. In their spoken language development, kodas and DDHA children behave similarly to their monolingual hearing peers by beginning with less marked phonemes and progressing over time to more marked ones. This developmental pattern is unsurprising, given that unmarked phonemes are easier to both perceive and articulate than marked phonemes. For example, by the age of 3;00 (years;months), 90 percent of English-speaking children master production of phonemes /p/ and /b/, both unmarked by virtue of their articulation at the front of the mouth, making them visible and relatively easy to produce. In contrast, the same level of mastery for marked phonemes /s/ or /z/ is not reached until age 8;00 (Sander, 1972).
Markedness plays a similar role in the development of signed phonology for bimodal bilingual children. As mentioned earlier, signs can be decomposed into three major parameters: hand configuration (or handshape), location, and movement. Research on multiple sign languages around the world reports that signing deaf children master location earliest of the three major sign parameters (Chen Pichler, 2012). Although handshape is mastered later, children generally display early acquisition of the most unmarked handshapes [seven handshape images not reproduced]. Marked handshapes that are more difficult to articulate (and often also more difficult to perceive accurately) [three handshape images not reproduced] tend to be acquired later (Boyes-Braem, 1990). This pattern appears to also apply to bimodal bilingual children’s sign language development, although only a limited number of studies exist to date (Bonvillian & Siedlecki, 2000). Figure 27.1 illustrates handshape substitutions for the ASL signs READ (the target handshape is more marked than the substituted handshape) and WATER (the target handshape is likewise more marked than the substituted handshape). Note that the fact that a child does not supply the target handshape in a given sign does not imply that they will not use it in another sign; although the child pictured in Figure 27.1 fails to produce the target handshape in READ, he uses that same handshape as a less marked substitute for the highly marked target handshape in WATER. Patterns such as this suggest that handshape errors are influenced by more factors (e.g., sign environment, availability of visual feedback, etc.) than simply the articulatory complexity of the handshape itself.
Figure 27.1 Target and child forms of ASL signs READ and WATER showing substitutions of less-marked handshapes: (a) target version of READ; (b) child’s production of READ; (c) child’s production of WATER.
More extensive investigation is needed to examine the cross-linguistic effects of bimodal bilingual children’s developing signed and spoken phonological systems. Because these systems occupy distinct modalities, any interaction between them could provide an interesting window into the role that modality plays in the organization and activation of phonological forms. To date we know of only one study that directly addresses influences of bimodal bilingual children’s spoken language phonology on their sign language phonology; this study focused on cyclicity, the number of repetitions of a movement path within the lexical movement of a sign. Chen Pichler, Quadros, and Lillo-Martin (2009) examined accuracy of cyclicity in two kodas’ signing between roughly 1;06 and 2;00. A large proportion of ASL signs are multicyclic, involving reduplicated movements, but monocyclic signs also exist. Previous studies of very young deaf signers (0;09–1;05) reported higher accuracy in the number of cycles for multicyclic targets than for monocyclic targets, which tend to be reduplicated in infants’ signing (Meier et al., 2008). Chen Pichler et al. (2009) did not find any generalized pattern of increased reduplication (perhaps due to the fact that the koda subjects they studied were older than those studied by Meier et al., 2008), but they reported cyclicity errors that appeared to be influenced by bimodality. Both kodas code-blended the majority of their signs, rhythmically aligning the spoken and the signed components.
For instance, the ASL question BALL WHERE was code-blended with the English words “ball where,” and BALL was produced with a single movement temporally aligned with the English word “ball.” The target form of BALL is multicyclic, but Chen Pichler et al. (2009) suggested that this child’s production of a monocyclic form was a consequence of code-blending with the single-syllable English word “ball.” This pattern may be an early manifestation of the prosodic accommodation produced by adult codas when they use code-blending (see Section 27.2.4.1).
In addition to studies of spontaneous production, bimodal bilingual children’s phonological development has also been examined using more controlled experiments. Initially, researchers focused exclusively on spoken language, reflecting (misplaced) concerns that the signing home environment of deaf families would not support kodas’ spoken language development. While some older studies reported delayed or atypical patterns of spoken English development for kodas (Sachs, Bard, & Johnson, 1981; Schiff & Ventry, 1976), such results were not reproduced even in the early years of this research. Multiple studies found that hearing children with deaf parents acquired their spoken language phonology in parallel with children from homes with hearing parents (Leonard, Newhoff, & Mesalam, 1980; Schiff, 1979; Schiff-Myers, 1982), and that phonological processes such as cluster reduction, postvocalic devoicing, stopping, and final consonant deletion occurred for kodas and hearing monolingual children alike (Schiff-Myers & Klein, 1985; Toohey, 2010).
More recent research has been concerned with the spoken phonological development of DHH children who use both a sign language and hearing technology, but very few such studies include children with deaf, signing parents who receive fluent input in a natural sign language from birth. One such study examined deaf children using CIs speaking Persian/Farsi in Iran, and compared deaf children with deaf parents to those with hearing parents (Hassanzadeh, 2012). This study found that the deaf children with deaf parents performed better than those with hearing parents on speech perception and auditory tests. Another study in the US also found that four- to six-year-old bimodal bilingual children demonstrated largely age-appropriate spoken language development for spoken English (Davidson, Lillo-Martin, & Chen Pichler, 2014; Kozak, 2018), as measured by their performance on the Goldman Fristoe Test of Articulation 2 (Goldman & Fristoe, 2000), a standardized test of spoken phonological aptitude that has commonly been used with DHH children.
Bimodal bilingual children’s spoken and sign language processing has also been tested with various nonword or pseudoword repetition tasks such as the Children’s Test of Nonword Repetition (CNRep; Gathercole & Baddeley, 1996). These tests present children with possible words that follow the phonotactic rules of the target language but are not actual lexical items of that language. Children must repeat these nonwords as accurately as possible, recalling the phonological form without recourse to lexical knowledge. American bimodal bilingual children score in the same range as monolingual English hearing controls on the CNRep (Kozak, 2018), mirroring results reported for bimodal bilinguals learning other spoken languages, including Brazilian Portuguese (Cruz et al., 2014; Kozak et al., 2013).
Nonword repetition tasks have also been developed for a number of sign languages, reflecting the same design principles as the spoken pseudoword repetition tasks described earlier. To date, pseudosign tests have been used with bimodal bilingual children only in ASL and Libras (the sign language used in Brazil), with results showing that these bimodal bilinguals scored within the same range as their deaf peers (Cruz et al., 2014; Gu et al., 2022; Kozak, 2018). In combination, these assessments with bimodal bilingual children indicate that phonological development is largely parallel in their two languages, despite the modality difference between them.
27.2.3 Learning of a Second Phonology by Adults
The vast majority of existing research on bimodal bilingual phonology focuses on hearing adults learning their first sign language as an L2 (M2L2 signers). As learners in a bimodal bilingual context, adult M2L2 signers might be expected to display similarities to the child bimodal bilingual learners discussed in Section 27.2.2. Indeed, M2L2 learners appear to mirror child L1 signers in their general phonological accuracy, being relatively accurate in their (re)production of location, but much less accurate in handshape and movement (Chen Pichler [2011] for ASL; Jissink [2005] for Sign Language of the Netherlands [NGT]; Ortega and Morgan [2015] for British Sign Language [BSL]). Among M2L2 handshape errors, marked forms are generally produced with higher error rates than unmarked forms (Ortega & Morgan, 2015), although some very frequent and unmarked handshapes [two handshape images not reproduced] also display high error rates, often due to incorrect thumb position (Chen Pichler, 2011). Figure 27.2 illustrates such a handshape error for the ASL sign WHERE in which the M2L2 handshape features the thumb in an unopposed position rather than the opposed position of the target.
Figure 27.2 (a) Target form of ASL sign WHERE displaying the target handshape with opposed thumb; (b) M2L2 handshape error due to unopposed thumb position.
Among the movement errors that have been documented for M2L2 signers are proximalization errors, an error pattern also attested in child L1 signing. Figure 27.3(a) shows the target form of the ASL sign RAIN, in which the repeated downward movement originates from bending of the wrists, joints that are fairly distal from the signer’s torso. Figure 27.3(b) reproduces a proximalization error in which an L2 signer (a hearing baby-sign instructor on YouTube) has activated a more proximal joint, the shoulders, in addition to the wrists. For target signs that call for activation of more than one joint, proximalization can also manifest as the omission of the most distal joint.
Figure 27.3 (a) Target form for ASL sign RAIN showing movement from wrists; (b) reproduction of a proximalized form of RAIN showing movement from shoulder and wrist.
The findings of proximalization errors and the inverse relationship between markedness and accuracy for both adult M2L2 and child L1 signing across multiple sign languages indicate that developing an accurate phonology in the signed modality is challenging for adult and child learners alike. Difficulty stems not only from the motor demands of forming signs but also from the perceptual demands of correctly seeing or recalling signs. Rosen (2004), in one of the earliest studies to investigate M2L2 phonology, argued that new signers’ phonological errors could be divided into those caused by lack of manual dexterity and those caused by faulty perception. Rosen (2004) categorized any M2L2 error types that paralleled errors also observed for L1 signing children as stemming from poor dexterity. These included handshape substitutions and inaccurately formed handshapes, but also cases in which handshape changes were not articulated correctly, or two-handed signs with incorrect relative placement and/or movement of the two hands. Perceptual errors included a wide variety of errors related to failure in mental rotation, which Rosen (2004) attributed to the traditional ASL classroom setup in which the instructor faces students while signing, prompting them to “mirror” the sign as they perceive it. This detailed analysis is insightful and offers valuable classroom application, although in reality it can be difficult to determine the relative contribution of dexterity versus perception to sign errors, and it is very likely that both are active to various degrees.
Another, perhaps somewhat surprising influence on M2L2 sign phonology is cross-linguistic transfer. Transfer is a major source of errors in spoken language L2 phonology, but in the case of M2L2 signers, the modality difference between their spoken L1 and their signed L2 largely precludes phonological transfer from speech to sign (although compare with the discussion of cross-modal activation in Section 27.3 and rhythmic alignment in bimodal utterances in Section 27.2.4). However, M2L2 signers’ previous experiences with their spoken L1 include considerable gestural experience and sensitivity to iconicity, which influence their sign language development in interesting ways.
A series of studies with nonsigning hearing adults, meant to reproduce the very beginning stages of M2L2 sign language learning, observed that iconic signs were more easily learned and recalled than noniconic (arbitrary) signs (Campbell, Martin, & White, 1992; Lieberth & Gamble, 1991; Morett, 2015), at least during the initial stages of learning. Iconicity serves as a mnemonic device for these hearing learners, who readily recognize the meaning of highly iconic, transparent signs (Ortega, 2017). But is the relevance of iconicity limited to lexical learning?
At first glance, we might expect that if an M2L2 learner recognizes the meaning of a new sign based on its transparent form, they should then be able to reproduce that form accurately. Surprisingly, the opposite is often true. Ortega and Morgan (2015) observed that new signers are in fact less accurate in their reproduction of highly iconic signs than they are for arbitrary (noniconic) signs. The proximalization error in Figure 27.3 illustrates this error pattern. According to the ASL-LEX database (Caselli et al., 2017), the ASL sign for RAIN is perceived as fairly iconic by both hearing nonsigners (average iconicity rating 5.429 out of 7) and deaf signers (average iconicity rating 5.13 out of 7). Contrary to the target form, the M2L2 form in Figure 27.3 begins with the hands raised above the head, likely reflecting a gestural form that depicts rain falling from the sky. Because the sign for RAIN is iconic, an M2L2 learner can recognize its meaning without having to attend to all of the phonological details of that form, and apparently overlook the fact that the sign actually begins lower in the signing space. Thus, M2L2 learners can represent these signs “based on how they would express the concept in gesture” (Ortega, Schiefner, & Özyürek, 2019, p. 2). Ortega and Morgan (2015) reported that this negative effect of iconicity on phonological accuracy persisted in their group of beginner M2L2 signers even after eleven weeks of BSL instruction.
Beyond iconicity, there are also other perceptual factors that influence M2L2 signers’ phonological accuracy, including their visual discrimination of signed forms in general. We have already seen that both L1 and M2L2 signers are error-prone in their production of marked handshapes, in part because marked handshapes can be difficult to perceive or distinguish from similar handshapes (e.g., one handshape is often misperceived as another, and a further pair of handshapes is likewise difficult to distinguish [four handshape images not reproduced]). Bochner et al. (2011) presented students in college-level ASL courses with short, signed sentence pairs on video, asking them to judge whether the two sentences were the “Same” or “Different.” They reported poor discrimination for these ASL learners, particularly for contrasts in movement and complex morphology (a mixed category involving contrasts in handshape, location, or movement), as well as pairs with no contrast (“Same” pairs). Bochner et al. (2011, p. 1321) concluded that identifying linguistic contrasts poses a significant challenge for new signers that warrants explicit training “in a deliberate and systematic manner.”
27.2.4 Code-Blending
27.2.4.1 Adult Code-Blending
While studies that individually examine bimodal bilinguals’ signed and spoken phonologies are rare, there have been several studies examining how bimodal bilinguals use the phonologies of their two languages when interacting with each other and with monolinguals. Like unimodal bilinguals, bimodal bilinguals sometimes engage in intersentential and intrasentential code-switching, switching between signing and speaking. However, it is much more common for bimodal bilinguals to engage in code-blending, the simultaneous production of (parts of) an utterance in both speech and sign (Emmorey, Borinstein, et al., 2008). Because the phonologies of speech and sign use different articulators for the most part, this simultaneous output is physically possible for bimodal bilinguals. However, it is not the case that these reduced articulatory restrictions mean that any kind of mixing will be well-formed. Code-blending is linguistically constrained, and understanding these constraints can inform us about the nature and organization of bimodal bilingual grammars (e.g., Quadros, Davidson, et al., 2020; Emmorey, Borinstein, et al., 2008; Fung & Tang, 2016).
Figure 27.4 illustrates a simple code-blend consisting of a sentence produced primarily in English, with insertion of one ASL sign (GALLAUDET) that is coarticulated with its translation equivalent in spoken English (“Gallaudet”).Footnote 5 Note that the timing of the coarticulated sign and word is precise: the beginning of the sign corresponds to the beginning of the word, and the sign’s movement to the back of the head along with handshape closing is cotimed with the final, stressed syllable of the spoken word.

Figure 27.4 English question “Have you ever visited Gallaudet/GALLAUDET?” featuring the spoken word “Gallaudet” aligned with the lexical movement of the code-blended sign GALLAUDET.
More complex examples of code-blending might variously involve multiple ASL signs, signs and words that are not exact translation equivalents, or sentence structures in which both sign and speech follow the grammatical rules of ASL rather than English. The example shown here involves ASL as the primary language, with a few words spoken in English.Footnote 6 The lines labeled ASL and Eng are coarticulated according to the timing indicated by the spacing of the words.
| ASL: | IX_1 VISIT ONE FAMILY. IX(family) HAVE | LIVING-ROOM | HAVE | WHAT? |
| Eng: | | living room | have | what |
| ASL: | FOUR | TV | DS_b5(4-items-arrayed-in-2-rows) | |
| Eng: | four | TV | | |
| ‘I visited a family who had four TVs in their living room!’ | | | | |
Emmorey, Petrich, and Gollan (2012, p. 209) have noted that when spoken and signed words are produced, their lexical phonological onsets are aligned in the way we described for Figure 27.4; this “vocal–manual coordination appears to be driven by pressure to synchronize linguistic articulations during production.” The lexical movement of code-blended signs is temporally aligned with the onset of the code-blended spoken word, so any preceding transitional movement occurs before the onset of that spoken word. This pattern, first documented by Emmorey, Borinstein, et al. (2008), is robust and has since been reported by other studies, with implications for the mental architecture underlying phonological production in general.
Code-blending should be distinguished from another type of bimodal production known as SimCom (simultaneous communication), in which speech and sign are also produced simultaneously. The goal of SimCom is to produce equivalent content in both modalities, but in reality the spoken message is generally prioritized, making it an inequitable practice in deaf education, where DHH students may be able to access only the (often impoverished) signed portion of the SimCom message (Scott & Henner, 2020). Furthermore, SimCom displays more prosodic dysfluencies than code-blending (Emmorey, Borinstein, & Thompson, 2005), with more pauses overall and less tight cotiming of spoken and signed words.
While code-blending is primarily a phenomenon of bimodal bilingual interactions, aspects of the sign language, such as nonmanual signals, may “leak out” even during interactions with nonsigners. The nonmanual marking in Figure 27.4, including head tilt and raised brows, is generally associated with yes/no questions; other nonmanual signals mark wh-questions or conditionals. Pyers and Emmorey (2008) recorded ASL-English bimodal bilinguals interacting with nonsigning English speakers and found that some ASL NMMs occurred with their spoken English utterances, despite the fact that they knew their nonsigning interlocutors would not perceive these markers as grammatically relevant. Pyers and Emmorey (2008) interpreted their finding as evidence that the bilingual architecture is integrated quite late in the process of language processing, up to the level of phonological implementation. This conclusion is supported by various studies of code-blending, all of which indicate that speech and sign are closely linked. Code-blending takes advantage of the fact that the articulators allow for coproduction of aspects of speech and sign; this suggests that the multiple spoken languages of unimodal bilinguals are similarly closely linked, but their simultaneous output is blocked due to the two spoken languages competing for the same articulatory channel.
Code-blending primarily describes signed components and spoken components combining to produce a single utterance. There is also a phenomenon known as “coda talk,” or sometimes “deaf voice,” which may be combined with code-blending but is sociolinguistically limited to all-coda contexts (Bishop & Hicks, 2005; Preston, 1995). During coda talk, the phonology of the spoken utterance is modified (with prosodic accommodation) to include features typical of deaf speech, such as nasalization, a distortion of prosody toward the extremes of highs and lows, strong assimilation processes that lead to a loss of syllables, vowel epenthesis at word end, and nonlinguistic vocal gestures (Bishop & Hicks, 2005). Coda talk is regarded as a discourse style that reflects the cultural identity of this specific hearing bimodal bilingual group (Bishop, 2010).
27.2.4.2 Child Code-Blending
Code-blending has been observed for kodas acquiring Langue des Signes Québécoise and French (Petitto et al., 2001); NGT and Dutch (van den Bogaerde & Baker, 2005); Libras and Brazilian Portuguese (Quadros, Lillo-Martin, & Chen Pichler, 2016); ASL and English (Lillo-Martin, Quadros, & Chen Pichler, 2016); and Finnish Sign Language and Finnish (Kanto, Laakso, & Huttunen, 2017). Like adult codas, kodas are much more likely to use code-blending than code-switching. They also produce the same types of code-blends as adults, although their productions involve simpler, age-appropriate structures. Even very young children tend to follow the phonology of adult code-blending by temporally coordinating their signs and speech, but as their coordination is still developing, they sometimes produce mismatches such as the following example from a child at age 2;01. In attempting to coordinate his signed and spoken production, he first speaks the word “snake,” then signs SNAKE, then speaks it again, and finally manages to produce both sign and spoken word simultaneously (Quadros, Lillo-Martin, and Chen Pichler, 2020). As noted earlier, code-blending also displays cross-linguistic influence in production in the phonological phenomenon of cyclicity: children sometimes alter the number of repetitions of a sign to match the spoken words in timing (Chen Pichler et al., 2009).
Bimodal bilingual development provides a unique opportunity to test whether two languages that do not compete in modality interfere with each other in children’s bilingual development. Child language acquisition research so far has revealed very little evidence of any such interference. At the same time, the uniquely bimodal bilingual phenomenon of code-blending indicates that sign and speech are not completely autonomous in bimodal bilingualism, but interact in rule-governed ways.
27.3 Processing of Languages in Two Modalities
One profound finding emerging from research on bilinguals is that even in contexts where only one language is being produced, the other language is always active (Kroll, Bobb, & Hoshino, 2014). Such parallel activation is found ubiquitously in the processing of two spoken languages, that is, unimodal bilingualism. The question then arises with respect to bimodal bilingualism, whether it is still the case that both languages are activated during production and comprehension. Do the two languages compete or are they coordinated? On a par with discoveries from spoken language bilingualism, parallel activation of a sign language and a spoken language has been attested (Morford & Kroll, 2021), demonstrating cross-modal activation. Ample evidence has been found for interaction between sign and spoken language phonology, hence coactivation occurs regardless of the modality of the languages involved. However, research on bimodal bilingual processing has also revealed important modality-specific properties regarding language switch costs and simultaneous use of both languages, which expand our understanding of language processing in general.
27.3.1 Comprehension
Cross-language activation can be seen in unimodal bilinguals when researchers find that one language is “activated” due to some kind of overlap, including phonological overlap, between a target word and its pair in a second language (Marian, Blumenfeld, & Boukrina, 2008). It might be thought that the different modalities of signed and spoken languages would preclude cross-modal activation, but this effect has been found in several studies (see Morford & Kroll [2021] for an overview). As illustrated by the schema in Figure 27.5, signs and spoken language can in principle activate each other. Signs can be activated by the spoken language in either written or spoken form, and vice versa. Below, we summarize the current research on cross-modal activation of phonology in bimodal bilinguals.

Figure 27.5 Cross-modal activation of sign language and spoken language (both orthography and speech) in bimodal bilingualism.
27.3.1.1 Signs Activated by Written Forms
A number of studies have demonstrated that signs may be activated by print/visual words. Morford et al. (2011) asked deaf bilingual adults to judge the semantic relatedness of pairs of printed English words. The ASL translation equivalents for these English word pairs were sometimes phonologically related, sharing at least two phonological parameters, and sometimes not. Participants’ response times were faster when identifying semantically related English words if the ASL translation equivalents of the target words were phonologically related, displaying facilitatory effects of the sign language phonology. However, response times to semantically unrelated English words were slower when the ASL translations were phonologically related, showing an inhibitory effect. A follow-up study with unbalanced bilinguals by Morford et al. (2014) revealed a weaker inhibitory effect and no facilitation effects, similar to the results found by Kubus et al. (2015) for bilinguals using German Sign Language (DGS) and German. Nevertheless, cross-modal activation, especially in the form of inhibitory effects, has been replicated in other sign language users, including bilinguals using NGT and Dutch (Ormel et al., 2012). Moreover, findings from Villwock et al. (2021) for middle school–aged children (mean age: 12.9) suggest that cross-modal activation of Spanish Sign Language (LSE) in hearing bimodal bilinguals occurs irrespective of the age when the sign language was acquired. Therefore, sign activation during visual word comprehension is not necessarily an outcome of years of bilingual experience but rather a characteristic of bilingual development.
In addition to these behavioral studies, an event-related potential (ERP) study by Meade et al. (2017) leads to the same conclusion of covert activation of sign language phonology during visual word recognition.
27.3.1.2 Signs Activated by Speech
Speech can coactivate signs just as printed words can. Employing a visual world paradigm, Shook and Marian (2012) and Giezen et al. (2015) report the results of studies in which stimuli consisted of auditory English words paired with a set of four pictures: the target, a competitor, and two distractors. The ASL sign for the target word and the “competitor” picture overlap in their phonology. Participants (hearing adults) looked more at the competitor picture than at distractors whose signs were phonologically unrelated to the sign corresponding to the target picture. Villameriel et al. (2016) adopted a semantic relatedness paradigm to study hearing adult bilinguals using LSE and Spanish and found that bimodal bilinguals were faster at judging semantically related auditory words when the equivalent signed translations were phonologically related (facilitatory effects), while they were slower at judging semantically unrelated word pairs when the LSE translations were phonologically related (inhibitory effects).
27.3.1.3 Spoken Language Activated by Signs
In addition to signs being activated by speech or print, cross-modal activation is found to be bidirectional in the sense that signs can coactivate spoken language (speech/written). Ormel, Giezen, and Van Hell (2022) found that spoken language overlap interfered with a verification task for hearing adult bilinguals, as indicated by slower reaction times when the sign and the picture had translations that overlapped orthographically and phonologically in Dutch. Coactivation of both written German and DGS in the comprehension of whole DGS sentences was reported for deaf bilingual adults in an ERP study by Hosemann et al. (2020). Another ERP study by Lee et al. (2019) using semantic relatedness judgments by both deaf and hearing adults using ASL and English reported a similar coactivation of English words during ASL recognition. This study also found that deaf and hearing bilinguals differ in their coactivation of words, probably due to differences in language dominance and proficiency.
27.3.2 Production
Despite the different modalities of signed and spoken phonologies, the previous subsection demonstrated that cross-modal activation takes place in bimodal bilinguals. We then ask whether the different phonological systems of sign and speech influence the processing mechanisms seen in language production. In particular, we turn to the phenomenon known as switch cost. Researchers investigating unimodal bilinguals have questioned the relative processing cost of code-switching in terms of potentially separate costs of inhibition (suppressing the language switched out of) and activation (turning on the new target language). Since bimodal bilinguals have the option of code-blending, these two types of processing costs can be studied separately.
As discussed earlier, bimodal bilinguals are more likely to produce code-blending than code-switching (Bishop, 2010; Emmorey, Borinstein, et al., 2008). This preference has been cited as a possible explanation for an early finding that coda adults do not demonstrate the same bilingual advantages in cognitive control reported for unimodal spoken language bilinguals (Emmorey, Luk, et al., 2008). This result is unexpected, given the demonstration from the cross-modal activation literature (summarized in Section 27.3.1) that signed and written/spoken phonologies interact. More recent research has examined a more nuanced picture of switching, however, noting that switching between sign-only and speech-only production is not the only type (or, indeed, a very common type) of switching used by bimodal bilinguals. They are much more likely to switch between bimodal (code-blended) and unimodal production.
These options provide an ideal testing ground to compare the cost of inhibition and activation during production. Using picture-naming tasks in English and/or ASL, both behavioral studies (Emmorey et al., 2012, 2020) and neuroimaging studies (Blanco-Elorrieta, Emmorey, & Pylkkänen, 2018) converge on the conclusion that for hearing bimodal bilinguals, inhibiting a language (i.e., switching out of a code-blend) incurs a cost, whereas releasing a language from inhibition (i.e., switching into a code-blend) is cost-free. Moreover, the same pattern is replicated in unbalanced bimodal bilinguals who are native speakers of German but novice or late learners of DGS (Kaufmann et al., 2018; Kaufmann & Philipp, 2017). Additionally, Dias et al. (2017) reported that hearing bilingual users of Spanish and LSE incurred a greater switch cost and produced more errors (such as responding in the incorrect language or using the incorrect word order) and hesitations when switching from LSE into their dominant language, Spanish, than in the other direction. This finding confirms previous reports that switching into the dominant language incurs a higher processing cost because the dominant language must be strongly inhibited to allow for production of the less dominant language.
27.4 Conclusions
To summarize, our review has shown that bimodal bilinguals have much in common with unimodal bilinguals, even when it comes to phonology, the language level that might be expected to be the most distinct, given the difference in articulators. Despite this difference, the phonologies of bimodal bilinguals do interact. When fluent adults process their sign language or their spoken/written language, even in contexts where only one language is relevant, lexical access and ERP measures show that both languages are activated at the phonological level. Adult bimodal bilinguals also provide evidence that the bilingual language architecture is integrated until a very late level, where differences emerge due to modality of output. Adults use their sign language and their spoken language together in code-blending, tightly timing the production so that coproduced words make use of a common rhythmic pattern. And when adult learners of a second language in a new modality (M2L2) approach their new language, they bring with them knowledge of how manual and facial articulators participate in multimodal expression of language; this knowledge might facilitate their learning, but it also negatively influences the target accuracy of their sign language phonological production.
It is clear then that the study of bimodal bilingualism is crucial for the development of a full picture of the nature of bilingual phonology. Understanding the similarities between unimodal and bimodal phonologies, and their important modality differences, helps improve our understanding of the human capacity for language. Yet there is still much to be learned. We located very few studies on the phonological system of bimodal bilingual adults, other than those focused on code-blending. We also found gaps in studies of childhood bimodal bilingual development specifically focusing on phonology and possible interactions between speech and sign. Most of the extant research appears to assume that bimodal bilinguals’ sign language phonology develops in the manner that has already been reported for monolingual deaf signers, without reference to interactions with spoken language phonology. Similarly, bimodal bilingual development of spoken language phonology is discussed in the general context of whether or not it is delayed for children who are exposed early to a sign language. The more interesting possibilities of linguistic interaction between spoken and sign languages have yet to be seriously addressed.
Another interesting line for future research derives from the observation that bimodal bilinguals learn their sign language as a heritage language (Chen Pichler, Lillo-Martin, & Palmer, 2018), and this status might affect their phonological development. Similarly, there are many processing questions and neurolinguistic questions that have yet to be addressed with data from child bimodal bilinguals. We hope to see new studies addressing these interesting questions in the near future.
One additional area that we only briefly mentioned in this chapter is the nature of literacy development for bimodal bilingual children, especially DHH children. There is a large body of literature discussing methods for teaching DHH children to read (the written version of their spoken language) and overall outcomes in this domain, but, with a few exceptions, it generally does not take the approach of considering such children as bimodal bilinguals. Given the increasing evidence that bilinguals do not completely deactivate one language when they use the other, an increased understanding of one’s sign language phonology could well contribute to efficient reading skills (Clark et al., 2016; McQuarrie & Abbott, 2013). We leave a systematic review of this literature for other papers.
One possible conclusion from our observations about bimodal bilingual linguistic interactions that we would strongly argue against concerns the ways that sign and speech interact in childhood language development. There is a long-standing debate regarding the best educational practices for DHH children. As we have noted, the vast majority of DHH children are born to parents without knowledge of deafness or a natural sign language, who emphasize the use of spoken language and want their DHH children to succeed in the ways of the hearing majority. In the United States, where the hearing majority is also monolingual, the expectation is often that DHH children should aim for development parallel to monolingual hearing children with respect to their spoken language. However, access to spoken language is difficult (if possible at all) for DHH children, and significant time can elapse before a child is fitted with and able to use appropriate hearing technology (Levine et al., 2016). Faced with these hurdles, many parents of and service providers to DHH children argue that using a natural sign language as part of a bilingual approach would detract from, delay, or otherwise harm the development of the spoken language.
We disagree with this conclusion. Educational philosophies that limit linguistic input to spoken language have resulted in a situation known as language deprivation for too many DHH children (Hall, 2017; Hall, Levin, & Anderson, 2017). The use of a natural sign language can do much to ameliorate this condition, even for families without prior knowledge of a sign language (Caselli, Pyers, & Lieberman, 2021; Hall, Hall, & Caselli, 2019; Lillo-Martin, Gale, & Chen Pichler, 2023). Research summarized in Section 27.2.2 has shown that DHH children with full access to both a natural sign language and a spoken language perform on a par with bimodal bilingual hearing children. As bilinguals, all these children can be expected to display typical bilingual effects (Goodwin & Lillo-Martin, 2023). But these potential drawbacks are offset by the many opportunities and benefits that early bilingualism confers (Lillo-Martin et al., 2023; Napoli et al., 2015; Wilkinson & Morford, 2020). In the absence of convincing evidence that early sign language exposure is harmful to spoken language development, we support early bimodal bilingualism as the most equitable approach for DHH children.
Acknowledgments
The preparation of this chapter was supported in part by the National Institute on Deafness and Other Communication Disorders of the National Institutes of Health (NIH) under Award No. R01DC009263, and by the National Science Foundation (NSF) under Grant No. 1734120. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH or the NSF.
28.1 Introduction
As discussed in numerous chapters in this volume, the term “bilingual” is used for a wide range of different types of speakers (see also a list in Wei, 2000, p. 6 f), such as individuals who are raised with two native languages, individuals who have one native language and are in the process of learning their first non-native language, learners who show greater proficiency in one of their two languages, and individuals who, besides their native language, can understand their second language but cannot actively use it in writing or speech. Some authors see bilingualism as a continuum where “any knowledge of another language will make you a bilingual” (de Bot, Lowie, & Verspoor, 2005, p. 5). To complicate matters further, the term bilingual was, and often still is, employed with the meaning of “speaking two or more languages” and thus synonymously with the term “multilingual” (e.g., Bhatia & Ritchie, 2006, p. 5; Gass & Selinker, 2008, p. 515; Grosjean, 1992, p. 51; Macaro, 2010, p. 39; Myers-Scotton, 2006, p. 2; Wei, 2000, p. 7). Conversely, there are researchers who do exactly the opposite by using the term multilingual instead of bilingual for individuals who know only one non-native language besides their native language (e.g., Kemp, 2009, p. 9; Saville-Troike, 2006, p. 8).
This terminological overlap partly stems from the failure to make a distinction between the acquisition of a second (i.e., the first non-native) and any further non-native language. It is still common in second language acquisition (SLA) research to only differentiate between the concepts of “first language acquisition,” referring to the acquisition of the first language/s, and “second language acquisition,” used as an umbrella term for the learning of all subsequent non-native languages (e.g., de Bot et al., 2005; Gass & Glew, 2008, p. 270; Gass & Selinker, 2008; Mitchell & Myles, 1998; Ortega, 2009; Sharwood Smith, 1994, p. 7). For example, Hoffmann (2001, p. 19) claims that “trilinguals have been shown to follow the same patterns and to be subject to influence of the same kind of social and psychological factors as bilinguals.” Likewise, Gass and Glew (2008, p. 270) state that the “term second language speaker refers to a person who speaks a language other than the native language. The term second language speaker can refer to a person who speaks a second, third, fourth, etc. language. The term second simply means any language other than the first and focuses on the chronological order of learning.”
Rising awareness of the complex processes in acquiring more than one non-native language, however, has in recent decades led to the development of a wider perspective in language acquisition research that goes beyond the umbrella term of a “second language.” As a result, we have witnessed a growing number of investigations into the acquisition process in a trilingual or multilingual context. These contexts include diverse scenarios such as the consecutive learning of several foreign languages in a formal context (e.g., monolingually raised L1 Mandarin speakers learning Japanese after English as a foreign language at university), the learning of one or more foreign language(s) by learners who grow up in a more or less stable bilingual environment (e.g., Yoruba-English bilinguals in Nigeria learning French in school), or the learning of one or more foreign language(s) against the backdrop of individual heritage bilingualism (e.g., English as the first foreign language learned by Russian heritage speakers in Germany).
Multilingual language acquisition has become recognized as an independent field, quantitatively and qualitatively different from SLA (e.g., De Angelis, 2007). For example, Cenoz and Genesee (1998, p. 19) propose that “multilinguals possess a configuration of linguistic competencies that is distinct from that of bilinguals and monolinguals.” Specifically, learners of a third, that is, a second non-native language, have been described as possessing knowledge from at least two languages stored in their mind, increased metalinguistic awareness, as well as non-native language learning strategies, all of which distinguish them crucially from learners of a first non-native language (e.g., Clyne, Rossi Hunt, & Isaakidis, 2004; Cook, 1995; Fouser, 2001; Hufeisen, 2001; Ó Laoire, 2005). These specific properties of trilingual and multilingual language acquisition will be discussed in detail in Section 28.2. Based on growing empirical evidence that shows that the learning processes of the first and any further non-native language are qualitatively different, a terminological distinction has been proposed between a learner’s first non-native language (L2) and their further non-native languages (variously referred to as “L3,” “Ln,” or “LX”; see, e.g., De Angelis, 2007, p. 10). In this chapter, we will use the term “L3/Ln” for the second and further non-native languages that are acquired by a speaker. By the same token, we will use the term “trilingual” for speakers who have learned or are currently acquiring exactly three languages (including the native one), and the term “multilingual” for speakers with more than three languages.
Systematic enquiries into L3/Ln phonological acquisition began in the late twentieth century with publications such as Benrabah (1991), Hammarberg and Hammarberg (1993), Hammarberg and Williams (1993), and continued with a steady growth of publications in the early twenty-first century such as Gut, Fuchs, and Wunder (2015), Pyun (2005), Wrembel (2015b), Wrembel and Cabrelli Amaro (2018), and Wrembel, Gut, and Mehlhorn (2010). Their aim was to explore the process of acquiring perceptual and production abilities, on both the segmental and the suprasegmental level, as well as potential factors influencing this process. In these studies, a particular focus lies on the cross-linguistic interactions between the phonologies of the speaker’s L1, L2, L3, and Ln, recognizing that this phenomenon is multidirectional (compare with Cabrelli Amaro & Wrembel, 2016). Consequently, a new best-practice research methodology has emerged in L3 phonological acquisition research that is characterized by data collection and analysis of all of a speaker’s languages, for example their L1, L2, L3, and L4. Typically, the data for each language would be collected on different days and with different researchers who address the learners in the respective target language in order to create a setting for the multilinguals’ different language modes (e.g., Kopečková et al., 2022; see also Chapter 30, this volume) – unless of course the research objective is to create an environment for encouraging maximum cross-linguistic influence. Moreover, incorporating Grosjean’s (1992, p. 55) claim that “a bilingual is NOT the sum of two complete or incomplete monolinguals; rather, he or she has unique and specific linguistic configuration,” trilingual or multilingual speakers are not conceptualized as equaling three or more monolingual speakers. Rather, L3/Ln phonological acquisition research aims at investigating the specific properties of the perception, production, and phonological processing of multilingual speakers.
28.2 Comparing the Phonetic and Phonological Properties of Bilingual and Trilingual Speech
A number of studies have directly compared the perception and production of speech by bilingual and trilingual/multilingual speakers (e.g., Amengual, Meredith, & Panelli, 2019; Antoniou et al., 2015; Domene Moreno, 2021; Enomoto, 1994; Gabriel, Krause, & Dittmers, 2018; Geiss et al., 2021; see also Section 28.2.4). Based on these as well as empirical findings on phonological L2 and L3/Ln acquisition, we will point out the commonalities between the phonetics and phonology of bilingual and of trilingual speakers and describe the ways in which speech perception and production differ across those two groups. As proposed in Section 28.1, the major differences lie in the type and direction of cross-linguistic influence (Section 28.2.1), the speakers’ phonological awareness (Section 28.2.2), and their perceptual sensitivity (Section 28.2.3). In Section 28.2.4, we will discuss findings on the question of whether trilingual speakers have a general advantage over bilingual ones in learning new phonologies. In Section 28.2.5, recent theoretical models of phonological acquisition that incorporate third and multiple language acquisition will be discussed.
28.2.1 Cross-Linguistic Influence
One difference between bilingual and trilingual speakers, and a natural consequence of the different number of languages represented in the minds of these speaker groups, is the larger number of potential directions that cross-linguistic influence (CLI) can take. For bilingual speakers, the speaker’s L1 can influence the perception and production of the L2 and the L2 can influence the speaker’s L1, while for trilingual speakers at least six different directions of CLI are possible: L1 onto L2 and L3, L2 onto L1 and L3, as well as L3 onto L1 and L2.
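The count given above can be checked with a short sketch. It assumes, purely for illustration, that a direction of CLI is an ordered pair of two distinct languages (source onto target), so that a speaker with n languages has n(n − 1) pairwise directions. The function name `cli_directions` is hypothetical and not taken from the literature, and combined CLI from two sources at once is not counted here, which is one reason the text says "at least six".

```python
from itertools import permutations


def cli_directions(languages):
    """Return every ordered (source, target) pair of distinct languages.

    Illustrative assumption: one direction of CLI = one ordered pair.
    """
    return [f"{src} onto {tgt}" for src, tgt in permutations(languages, 2)]


bilingual = cli_directions(["L1", "L2"])
trilingual = cli_directions(["L1", "L2", "L3"])

print(len(bilingual), bilingual)
print(len(trilingual), trilingual)
```

Under the same assumption, a speaker with four languages would have twelve pairwise directions, which illustrates how quickly the space of possible interactions grows with each additional language.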
Another difference that distinguishes trilingual and multilingual from bilingual speakers concerns the type of CLI that can occur. Only in trilingual and multilingual speakers (with two non-native languages) is it possible for two non-native languages to interact and influence each other, which has been referred to as “lateral CLI” (e.g., Jarvis & Pavlenko, 2008, pp. 21–22). This type of CLI has, for example, been found in the production of Ln Swedish by an L1 English and L2 German multilingual at the very beginning of L3/Ln learning (Hammarberg & Williams, 1993, p. 64), where the pronunciation of L2-like sound segments, intonation, and voice quality in the Ln led judges to believe that the speaker was in fact a native speaker of German. Lateral CLI has also been documented for L2-like voice onset time (VOT) values in L3 (Wunder, 2011), stops (Cabrelli & Pichan, 2021), rhotics (Patience, 2018), stress (Ghazali & Bouchhioua, 2003), as well as vowel reduction and speech rhythm (Gabriel, Stahnke, & Thulke, 2015; Gut, 2010), although it tended to be less frequent than CLI from the L1 onto the L3 in those studies.
Moreover, it is only in trilingual and multilingual speakers that combined CLI can occur. This concept refers to the simultaneous influence of two languages of a speaker onto their third/further language (De Angelis, 2007, p. 21). This was, for example, shown in a study by Wrembel, Marecka, and Kopečková (2019), who found that L1 German, L2 English, and L3 Polish trilinguals assimilate L3 vowel sounds to both L1 and L2 categories with a preference for the latter. Combined CLI can also occur from two or more source languages in a sequential fashion. For example, the hybrid form resulting from CLI from the L1 onto the L2 can then in turn influence the L3 and produce further hybrid forms. This was found, for instance, in Llama, Cardoso, and Collins’ (2010) study on the acquisition of aspiration rates in L3 Spanish for L1 Canadian French-L2 Canadian English and L1 Canadian English-L2 Canadian French speakers. The VOT values in their L2 productions differed from monolingual values by exhibiting L1-L2 hybrid values in both groups, pointing to influence from the L1 during L2 acquisition. Moreover, the L3 VOT values were very similar to the hybrid L2 values, suggesting that they constitute the result of combined influence from the L1 and the L2 onto the L3. Hybrid VOT values reflecting the properties of both of the speakers’ background languages were also found for L3 French produced by German-Russian and German-Turkish heritage bilingual speakers (Dittmers et al., 2018).
Unlike models predicting the relationship between the L1 and the L2 in bilingual speakers’ language perception and production (e.g., Best & Tyler, 2007; Flege, 1995; Flege & Bohn, 2021; Tyler, 2019; see also Chapters 7–10, this volume), models predicting the source and type of CLI in trilingual and multilingual speakers have to reflect the complexity of the potential interplay of all of the speakers’ languages. The first models were proposed for the area of morphosyntax and mostly focus on early stages of L3 learning: for example, the L2 Status Factor Model (Bardel & Falk, 2007, 2012) predicts the L2 as the prevailing source of transfer in early L3 learning due to its greater cognitive similarity to the L3 compared to the L1. The Typological Primacy Model (Rothman, 2011, 2015) proposes that the speaker’s language parser (unconsciously) compares the similarity between the target language L3 and the potential source languages L1 and L2 on the basis of lexical, phonological, morphological, and syntactic overlap and then transfers holistically from the source language that has been determined as most similar to the L3. In contrast, the Linguistic Proximity Model (Westergaard et al., 2016) predicts that CLI onto the L3 proceeds property-by-property, depending on the perceived similarity between individual structures. Thus, one structure might have the speaker’s L1 as the source language while another structure might be drawn from the L2. This model furthermore explicitly acknowledges the possibility of a combined transfer from the L1 and the L2 onto the L3.
When these models are applied to the area of phonetics and phonology, none of their predictions regarding the source of CLI receives exclusive empirical support. Rather, many studies show that the source of CLI onto an L3 can be either the L1 or the L2 and that the choice of source language is modulated by a myriad of factors. For example, CLI from the L2 was found to be frequent in various aspects of the pronunciation of Ln Swedish by the L1 English-L2 German speaker studied by Williams and Hammarberg (1998). Moreover, it was the dominant type of transfer reflected in the VOT values of the voiceless stops /t/ and /k/ produced by L1 Spanish-L2 English-L3 French speakers in their L3 French (Llama & López-Morelos, 2016). By contrast, a number of investigations have shown that CLI from the L1 is far more frequent than CLI from the L2 (e.g., Hammarberg, 2001; Kopečková et al., 2022; Pyun, 2005).
One factor that might explain these diverging findings is language dominance or high language proficiency in the L2, which might trigger CLI from the L2 rather than the L1. In Williams and Hammarberg’s (1998) study, the multilingual L1 English speaker had been living in Germany and had predominantly been using German for a long time prior to moving to Sweden, and the L1 Spanish-L2 English-L3 French speakers studied by Llama and López-Morelos (2016) grew up in Canada, where they acquired the community language English at age three and used it as their dominant language. Other factors that have been shown to constrain the choice of source language for CLI in trilingual acquisition of phonetics and phonology include explicit instruction and metalinguistic awareness (e.g., Cabrelli & Pichan, 2021; Gabriel et al., 2015; Gabriel, Kupisch, & Seoudy, 2016), the complexity of the task (Patience & Qian, 2022), and language proficiency. One aspect of phonological proficiency, automatized articulatory routines, might explain many cases of CLI from the L1: a speaker’s automatized neuromotor articulatory gestures of the L1 transfer more frequently to the L3 than the unstable articulatory settings of an L2 spoken with low(er) proficiency (Kopečková et al., 2022; Patience & Qian, 2022).
The diverse and partly contradictory findings of previous studies might in addition be caused by the methodology of these studies, which typically investigate only a single structure in the L3 phonology. Investigations of CLI in trilinguals and multilinguals that include more than a single phonological/phonetic structure of the L3 suggest that a model that predicts structure-by-structure CLI is able to predict and explain empirical findings best: for example, Kopečková et al. (2022) investigated CLI in the production of rhotics, the labiovelar approximant /w/, and final obstruent (de)voicing in L1 German and L1 Polish teenagers who were respectively learning L2 English/L3 Polish and L2 English/L3 German in school. They found that, although across all groups and structures the main source of CLI was the L1, individual learners showed CLI from multiple sources (L1, L2, and combined CLI) and that the structure under investigation was the strongest predictor for the source of CLI. Some L1 German speakers, for instance, drew upon their L1 to produce the rhotic and final devoicing in their L3 Polish, while they relied predominantly on their L2 English to produce the /w/ in L3 Polish. Likewise, Archibald (2022) showed that L1 Arabic-L2 French-L3 English speakers’ CLI from Arabic or French when producing English differs across structures: while the vowels in the L3 English evidenced CLI from the speakers’ French vowel system, it was the Arabic consonants that were transferred into English. In addition, the sentence-level rhythm in the speakers’ L3 English appeared to have been influenced by their L2 French, while word stress in L3 English was influenced by their L1 Arabic. Structure-dependent CLI was also found by Domene Moreno (2021) for adolescent German-Turkish heritage speakers learning L3 English. She investigated a range of phonological structures such as initial consonant clusters, phonemic vowel length, voiced syllable codas, and laterals, and found that the occurrence of CLI was highly divergent across structures. For example, the multilinguals’ perception of vowel length and the laterals as well as the production of voiced coda consonants was influenced by their Turkish, while in the production of initial consonant clusters and vowel length German was the dominant source language for CLI. She thus showed that CLI can apply differently to the various units and structures of phonology and proposed modelling multilingual phonologies as complex, interlaced hierarchical systems (see Section 28.2.5). This idea has also been incorporated by Archibald (2022) into his recently proposed Contrastive Hierarchy Model of feature structure (see Section 28.2.5).
In conclusion, it appears that the main difference between CLI in bilinguals and trilinguals lies not so much in the nature of the interaction of a speaker’s languages but rather in the complexity of the processes of cross-linguistic influence and their constraining factors, which can only surface in speakers/learners of more than two languages. One of the major insights from research into multilingual speakers’ phonologies is thus that all of the background language phonologies are available and active and that all of them can act as sources of CLI depending on a number of factors. Another important insight is that in order to determine the processes and directions of CLI, speech needs to be investigated from all of the speakers’ languages. Kopečková et al. (2022) and other studies have shown that only by measuring the phonological properties of the multilinguals’ background languages can actual processes of CLI, including combined CLI, be traced correctly. Descriptions of CLI in L3, or indeed L2, phonological properties based on investigations of the target language alone remain speculative.
28.2.2 Metaphonological and Phonological Awareness
Another difference between bilingual and trilingual learners is their level of (meta)phonological awareness, which is conceptualized as both tacit and explicit knowledge about the target and background language phonologies. Cognitive processes of awareness, attention, and noticing have been acknowledged to exert influence on foreign language development, and research into phonological awareness relates to how learners attend to their phonological output, reflect upon it, and modify it.
According to several scholars (e.g., Herdina & Jessner, 2002; Jessner, 2006, 2014), the level of metalinguistic awareness in L3/Ln learners is more developed than in L2 learners, due to previous experiences with formal foreign language learning. Consequently, L3/Ln learners are hypothesized to be equipped with an enhanced ability to notice the sound structure, to better monitor their speech production, and to perform conscious phonetic analyses. What is of particular interest for researchers in this area is the multilinguals’ awareness of cross-linguistic differences and interactions between L1, L2, and L3/Ln on the level of phonetics and phonology. Therefore, we expect (meta)phonological awareness to differ between L2 and L3 acquisition, in terms of both its scope and its degree, yet one cannot overlook potential individual learner differences in this respect as well as the impact of the typological proximity/distance in the combination of the languages they speak.
It seems difficult to compare phonological awareness studies conducted from the L2 versus L3 acquisition perspectives, as they have mostly applied different methodologies and focused on different aspects of learners’ performance; thus, no straightforward conclusions can be drawn with respect to their potentially divergent nature. Previous research into phonological awareness in L2 learners has relied on typical offline measures such as diary/journal entries (Kennedy & Trofimovich, 2010) as well as online tasks involving verbal protocols and retrospective think-aloud protocols (TAPs) (Osborne, 2003) or a delayed mimicry paradigm (Mora, Rochdi, & Kivistö-de Souza, 2014). Moreover, some phonological awareness tests typically used in L1 literacy studies, such as phonological blending, segmentation, manipulation, rhyming, and alliteration, have also been applied (e.g., Venkatagiri & Levis, 2007). Similar research conducted from a multilingual perspective has mostly resorted to online methods featuring near-concurrent TAPs (Wrembel, 2015a; Kopečková, 2018) and an extension of the delayed mimicry paradigm to L3 phonological awareness (Kopečková et al., 2021).
Extant studies carried out from the SLA perspective have demonstrated that an enhanced degree of phonological awareness is aligned with the learners’ speech comprehensibility (Kennedy & Trofimovich, 2010; Venkatagiri & Levis, 2007), accentedness, and fluency (Kennedy & Trofimovich, 2010), as well as phonological short-term memory (Venkatagiri & Levis, 2007). The latter findings showed that learners with superior explicit phonological knowledge were perceived to be more intelligible foreign language speakers. Furthermore, Kennedy and Trofimovich (2010) reported a relationship between L2 pronunciation ratings and the number of qualitative language awareness comments, which were related to using pronunciation in order to convey their intended output.
In another L2-oriented study, Saito (2019) found a positive effect of explicit phonological awareness on the pronunciation of the target sound in foreign language classroom settings. The author investigated the link between explicit attention and articulatory knowledge of Japanese learners of L2 English and their production accuracy of the English rhotic sound. The relationship between learners’ awareness and their acquisition accuracy was reported to be partial and limited to some of the investigated acoustic dimensions. In turn, Mora et al. (2014) found positive evidence for implicit phonological awareness of cross-linguistic differences in VOT productions by Spanish learners of L2 English. The participants were reported to modify their native short-lag laryngeal timing patterns by producing longer VOT values in L2 English as well as in their imitations of English-accented Spanish. Nonetheless, there was only partial support for a relationship between phonological awareness and accentedness ratings by English native speakers: while L2 English VOT measures of word tokens were strongly correlated with accentedness ratings, this was not the case for their samples of English-accented Spanish.
Considerably fewer studies have been conducted on phonological awareness in multilingual instructed learners (but see Gabriel, Klinger, & Usanova, 2021; Kopečková, 2018; Kopečková et al., 2021; Wrembel, 2015a, 2015b). For instance, Wrembel (2015a) examined L1 German-L2 English-L3 Polish adult learners with the application of stimulated recall and near-concurrent TAPs. The participants’ task was to modify and comment on their L3 phonetic output after listening to excerpts of their read-text performance. The qualitative and quantitative analysis pointed to a range of manifestations of metaphonological awareness featuring accurately noticed L3 phonetic and phonological features, modifications of mispronunciations through ad hoc self-corrections and post hoc self-repair, as well as awareness of problems in L3 pronunciation or reflective comments on the process of pronunciation learning.
Another comprehensive L3 study investigated metaphonological awareness via introspective and retrospective oral protocols (Wrembel, 2015b). It was carried out on a large cohort of participants (n = 130) divided into four groups with complementary language triads including Polish, English, German, and French in various constellations. Wrembel (2015b) implemented a complex codification system for a qualitative and quantitative analysis and designed a formula for quantifying a composite measure of metaphonological awareness. The findings pointed to the prevalence of implicit forms of awareness at the level of noticing, including paying attention to phonetic features of L3 speech, intentionally focusing on auditory forms and articulatory gestures, noticing specific problems in L3 pronunciation, as well as modifying mispronunciations. Explicit metaphonological knowledge was evidenced, to a lesser extent, through the formulations of phonological rules as well as introspections about perceived influences and interactions between L1, L2, and L3 sound systems. The declared sources of CLI involved both native and non-native language systems, thus attesting L1-to-L3 and L2-to-L3 transfer. However, the study failed to show significant correlations between the generated metaphonological awareness composite score and the perceived pronunciation performance in L3 as reflected in foreign accentedness, comprehensibility, and correctness ratings.
In turn, Kopečková (2018) investigated young L3 Spanish learners with L1 German and L2 English, who evidenced awareness of particularly salient features of Spanish phonology and their mispronunciations at segmental and suprasegmental levels. The reported problem with the applied verbal protocols was that the participants frequently lacked the metalanguage to verbalize their comments and reflections. In yet another L3-oriented study, Kopečková et al. (2021) extended the delayed foreign-accent mimicry paradigm, used previously by Mora et al. (2014), to the multilingual context in order to investigate phonological awareness in L3 learners. The application of this procedure enabled tapping into implicit and explicit phonological knowledge of phonetic features in the participants’ first, second, and third languages. In the study, foreign language (L2/L3) phonological awareness was operationalized as the ability to mimic foreign-accented speech in the participants’ first language and to consciously reflect on the performed phonetic/phonological manipulations. To this end, the design encompassed a delayed mimicry task and foreign-accentedness ratings of the participants’ performance in the first task. The study evidenced prevailingly low levels of L2/L3 phonological awareness at the level of noticing in young multilinguals, no significant changes in the development of phonological awareness over the course of one school year, and differentiated manifestations as a function of the L1 background driven by the phonological distance between their L2 and their L3 with respect to their L1.
For older speakers, by contrast, Gabriel et al. (2021) demonstrated in their TAP-based study, among other things, that German-Turkish learners of French were able to provide more specific statements on the phonetic similarities and differences between French and their background languages than the monolingually raised control group. This result evidenced an enhanced level of phonological awareness on the part of the multilingual learners.
Recapitulating, the overview of extant studies just given demonstrates some differences between L2 and L3 learners in terms of metaphonological awareness, which are reflected in the scope and degree to which these two groups of learners are aware of their speech performance. Although the relationship between phonological awareness and fine-grained speech production was shown to exist already in L2 learners, the L3 learners seem to possess a more complex cross-linguistic awareness that is visible in a broader range of manifestations. When it comes to differences between younger and older multilingual learners, it appears that the former rely more on tacit phonological awareness while the latter are able to verbalize their cross-linguistic reflections on articulatory and auditory interactions between phonological systems using metalanguage. Further research may be necessary, though, to uncover differences in the nature of phonological awareness between heritage speakers and L3 learners who learned both of their non-native languages in school.
All in all, metaphonological awareness has been shown to constitute an important component of multilingual competence, which entails an interaction of metalinguistic awareness and cross-linguistic awareness. While L3 learners have been theorized and shown to outperform L2 learners at the levels of conscious analyses and verbalization (Herdina & Jessner, 2002; Jessner, 2014), some of the L3 studies described here attested a wide range of manifestations of multilinguals’ metalinguistic awareness in the phonetic and phonological domain. However, a major challenge lies in the adequate operationalization of this concept as well as in developing suitable measurements that could combat limitations of previously applied instruments and be applicable to various age groups and acquisition scenarios. What also remains to be further investigated and pedagogically resourced is a potential link between multilinguals’ phonological awareness and their speech perception and production performance. Moreover, there is a pressing need for comparative studies into phonological awareness juxtaposing L2 and L3 learners directly, which would allow us to draw more solid conclusions as to the nature and strength of differences between these populations.
28.2.3 Enhanced Perceptual Sensitivity
Studies examining the effects of having learned a first foreign language on L3 speech perception have shown that L3 learners tend to outperform L2 learners in target language phonetic discrimination and sound production (e.g., Antoniou et al., 2015; Enomoto, 1994; Onishi, 2016), which has been labeled the “bilingual advantage” and refers to the idea that speakers who already speak/learn two languages find it easier to acquire the phonology of a third language. Having compared monolinguals, bilinguals, and multilinguals on the perception of novel phonetic and phonological contrasts, some scholars point to the ease with which the latter groups acquire L3 perceptual categories (Antoniou et al., 2015; Tremblay & Sabourin, 2012).
Enomoto (1994), for example, investigated the perception of Japanese durational contrasts (e.g., /iken/ versus /ikken/) by monolingual L1 English and various trilingual and multilingual speakers who were all beginner learners of L2/L3 Japanese, and found a superior perceptual performance by the tri/multilingual speakers over the L1 English monolinguals. In turn, Kopečková (2014) demonstrated that young Polish-English bilingual learners tend to be less sensitive to the differences between Polish and English vowels than their multilingual peers, whose linguistic repertoires also include additional languages. Also, Onishi (2016) found evidence of general facilitative effects of the prior linguistic experience of her L1 Korean-L2 English learners on the perception of L3 Japanese contrasts, based on the attested positive correlation between the performance on an L2 discrimination task and the performance on an L3 perception task. The study involved both identification and discrimination tasks in Japanese and in English, for which eight contrasts in these languages were selected that were deemed particularly difficult for foreign language learners (e.g., geminate versus singleton stops, long versus short vowels, word-initial voiced versus voiceless stops in Japanese). The findings seem to suggest that L3 learners’ perception may be partially influenced by experience with some specific L2 contrasts. More importantly, Onishi (2016) pointed to the general experience of learning a foreign language, which, as she claimed, offers the so-called global advantage in phonological perception, that is, the more proficient L3 learners were in their L2 phonology, the more sensitive they became in the discrimination of non-native speech.
There is some further evidence that trilinguals exhibit enhanced perceptual sensitivity. Wrembel et al. (2019) investigated the perception of vowels and sibilants in L1 German, L2 English, and L3 Polish trilinguals who had just started learning their L3 as their second foreign language in school. The findings showed that even beginner L3 learners formed new L3 categories, distinguishing between highly similar L3 sibilant pairs that would typically follow the single-category assimilation pattern predicted by PAM-L2 (see Chapter 7, this volume). Thus, in terms of perceptual acquisition, beginner L3 learners were argued to behave similarly to advanced L2 learners in that they were able to discriminate sound contrasts that PAM-L2 predicts to be challenging. It was thus suggested that multilingualism results in enhanced perceptual acuity in the manner that perceptual attunement works in advanced L2 learners. However, the novelty of Wrembel et al.’s (2019) study was to focus on a range of multilingual contexts of acquisition and to investigate the effects of prior linguistic experience for various multilingual backgrounds. Therefore, they focused their investigation on different subgroups of multilinguals and evidenced varied acquisition patterns therein.
Although the studies discussed here point to enhanced auditory awareness in L3 learners, there are some contradictory or mixed results (e.g., Patihis, Oh, & Mogilner, 2015) that fail to demonstrate significant differences between monolinguals and bilinguals in discriminating novel speech sound contrasts. However, the majority of findings indicate that previous language learning facilitates the speech perception of multilinguals, thus suggesting that multilingual learners tend to be more sensitive to novel contrasts and learn them faster (see Kopečková [2016] for a critical review thereof). One as yet unresolved issue is whether these studies indeed suggest a general advantage providing L3 learners with enhanced perceptual sensitivity irrespective of the typological similarity between the languages in their repertoires and/or of the actual features that may have been acquired in the previously learned languages. More empirical evidence is needed to determine whether the findings might be better explained by another type of advantage, the so-called specific advantage related to the particular features (e.g., the tense-lax vowel contrast or the voiceless-voiced contrast) that the L3 learners acquired in their L1 or when learning their previous foreign language (i.e., L2), which results in the facilitated acquisition of this particular feature in the subsequent language. The question of whether bilinguals and multilinguals have a general advantage when acquiring a new phonology when compared to monolingual learners, or rather rely on the specific properties of the previously learned languages, will be discussed in Section 28.2.4.
28.2.4 Do Trilinguals Have an Advantage in Learning New Phonologies?
Research focused explicitly on differences in second and third language phonological acquisition is still scarce to date. However, it has been hypothesized that, through previous speech learning experience and an already enlarged phonetic and phonological repertoire, L3/Ln learners should have a general advantage in acquiring yet another phonological system.
This line of research has been mostly pursued with heritage language speakers, often involving rather small groups and a single phonological feature, and has yielded contradictory evidence. For example, both Dittmers et al. (2018) and Gabriel et al. (2018) investigated the production of VOT in the voiceless stops /p, t, k/ by German-dominant heritage speakers of Turkish and Russian in their L3 French and found that their values were more target-like than those of L1 German speakers who had been raised as monolinguals. The bilingually raised speakers appear to have been able to transfer the short-lag VOT from their Turkish and Russian to their L3 French, while the L1-only-German speakers appear to have transferred the long-lag VOT from German onto their L3 French. Likewise, Geiss et al. (2021) compared the VOT values of monolingual L1 German and L1 Italian learners of L2 English with those of German-dominant heritage speakers of Italian learning English as an L3. They found that the bilingual heritage speakers produced more target-like VOT values in English than the L1 Italian-only control group, indicating that the bilinguals were able to transfer the VOT values from their German, which has similar VOT values to English. Similar advantages were found for Brazilian Portuguese speakers with Pomeranian, a Low German variety, when producing VOT in their L3 English, compared to speakers with L1 Brazilian Portuguese in their L2 English (Tessmann Bandeira & Zimmer, 2012).
For the production of final voiced obstruents in English, both Özaslan and Gabriel (2019) and Domene Moreno (2021) showed that bilingual Turkish/German learners were successful in avoiding negative transfer of the so-called final devoicing rule to their L3 English compared to monolingually raised German learners, who transferred final obstruent devoicing from their L1 onto their L2 English. Again, it can be assumed that the bilingual learners successfully applied their knowledge of voiced obstruent production from Turkish to English. Amengual (2021) examined VOT in English, Japanese, and Spanish /k/ in three different groups: two groups of English-Japanese bilinguals in a mirror L1/L2 design and a trilingual group with L1 Spanish, L2 English, and L3 Japanese. The results demonstrated that both the bilingual and the trilingual participants were able to differentiate VOT durations for velar voiceless plosives in the three languages under investigation, that is, they had acquired language-specific timing properties in English, Japanese, and Spanish. However, the bilinguals’ VOT productions in their L2 converged more on their L1 VOT values, while the trilingual group’s performance was marked by a greater degree of differentiation between their VOT values in L1 Spanish, L2 English, and L3 Japanese. Finally, in terms of speech rhythm, Gabriel and Rusca-Ruths (2015) found some evidence that L1 German speakers with either Mandarin or Turkish as a heritage language perform in a more target-like manner in their L3 English, French, or Spanish than monolingually raised German speakers.
Other studies, by contrast, did not find any advantages for L3 learners compared to L2 learners. Gabriel et al. (2016) investigated the perception and production of French voiceless stops in both monolingually raised L1 German speakers and German-dominant bilinguals with Mandarin as a heritage language. Although the L3 learners of French theoretically could have transferred the shorter VOT values from their heritage language, no difference was found between them and the monolingually raised German learners of French. Nor did the German/Turkish bilinguals studied by Grünke and Gabriel (2022) outperform the monolingually raised German speakers in their production of L3 French intonation, although they were expected to do so due to certain similarities between the intonational systems of French and Turkish. By the same token, monolingually raised German adolescents learning English as an L2 were actually more accurate than Turkish/German bilinguals learning English as an L3 in the perception of the trap vowel, final obstruent voicing, phonemic vowel length, and the voiceless interdental fricative (Domene Moreno, 2021). Comparing the perception and production of various phonological features of English by adult monolingually raised German speakers and adult Turkish/German bilinguals, Domene Moreno (2021) found that the heritage speakers outperformed the monolinguals in the production of the English laterals, while the monolingually raised speakers outperformed the bilinguals in the production of the trap vowel.
It thus appears that the “bilingual advantage” that has been found in some studies might not reflect a general advantage in phonological acquisition but rather show that L3/Ln learners can, but do not always, benefit from the specific phonological properties of their background languages when there is an overlap with the phonology of the target language. This conclusion seems to be supported by Kopečková’s (2016) study on the acquisition of the two Spanish rhotic phonemes, the tap and the trill. She compared monolingually raised German learners of Spanish with bilingual learners speaking different heritage languages and German and found that bilinguals who actively use both of their languages seem to have a general long-term advantage in the production of new sounds, which, however, is constrained by the degree of similarity between the phonologies in the multilinguals’ language repertoire. These findings have been incorporated into the first models of trilingual and multilingual phonological acquisition that are presented in Section 28.2.5. However, more research is necessary, especially involving L3 learners who acquire both of their non-native languages in a formal school setting. It is possible that the raised metaphonological awareness that such a learning scenario can bring with it might turn out to be a factor contributing to the “bilingual advantage.”
28.2.5 Modeling Multilingual Phonological Acquisition
As extant L2 speech models (see, e.g., Chapters 7–10, this volume) mostly draw on the notion of phonetic similarity and do not put forward any claims on how phonology works in the foreign language learner’s mind, they do not seem to fully fit a multilingual context, which requires a more global approach to how multiple phonologies are coactivated and influence one another in multilingual speakers. Recently, the first models that take account of the specifics of multilingual phonologies have been proposed. These include Archibald’s (2022) model for L3 acquisition, Domene Moreno’s (2021) “bit model” for speakers of multiple languages, and the Natural Growth Theory of Acquisition (NGTA; Dziubalska-Kołaczyk & Wrembel, 2022). One basic tenet shared by the three models is that multilinguals possess an integrated phonological system for all of their languages. As a consequence, CLI from the common pool of phonological systems is possible. As Archibald (2022, p. 2) puts it, “L3 learners do not have to choose whether to transfer L1 or L2 because both L1 and L2 reside in a common accessible location.” Based on the empirical findings discussed in Section 28.2.1, CLI is modeled as occurring property-by-property, that is, phonological structures are selected from either of the speaker’s other languages or from a combination of them, depending on a variety of factors.
Archibald (2022) suggests similarity to be the main factor determining the source of phonological transfer. Based on the idea of a contrastive hierarchy (Dresher, 2009), he proposes making comparisons of the target language and the background language phonologies based on the entire phonemic inventories in which contrastive features are specified. In L3 phonological acquisition, the similarity between these feature hierarchies determines, for each phonological structure, which other language(s) act(s) as the source of CLI. Learning is modeled as incremental: as the learner discovers new contrasts, they select appropriate structures to represent those contrasts. In his model, Archibald explicitly includes not only transfer of articulatory routines but also transfer of phonological representations. The task of the learner is not merely to “notice” a particular novel sound in the target language but rather to determine its underlying representation.
The important role that the functional load of contrasts plays in multilingual phonological acquisition has also been noted by Domene Moreno (2021, p. 127). In her model, she also proposes structure-by-structure transfer from either or both of the trilingual’s other languages, which is in turn, and for each structure individually, conditioned by language-specific, universal, and extralinguistic factors. Such factors include phonological markedness and articulatory difficulty. Like Archibald (2022) but unlike many models of L2 phonological acquisition, Domene Moreno (2021) claims that transfer cannot be predicted simply by the presence or absence of a particular phonological structure in one or both of the speaker’s background languages. As in Archibald’s (2022) approach, phonological structures are modeled as rich, complex, and abstract, including acoustic and articulatory properties of sounds, their functions, and their mental representations. Phonological grammars in turn are modeled as highly complex hierarchical structures that develop dynamically (Domene Moreno, 2021, p. 193).
The dynamic aspect of phonological acquisition is also incorporated into the NGTA (Dziubalska-Kołaczyk & Wrembel, 2022), which assumes a holistic perspective of multiple language systems. It is conceptualized as a general theory of language acquisition, which allows us to explain the acquisition of various language domains including phonology, morphology, and syntax, and it does not limit its assumptions to one non-native language. Major tenets of NGTA include a gradual dynamic emergence of non-native language (Ln) phonology, guided by the input from the native language and other non-native languages, a process which is influenced by typology, universal preferences (i.e., preferability generalizations), and context. Stemming from natural phonology and complexity theory, NGTA aims to expand these explanatory frameworks by applying them to the process of multiple language acquisition. The proposed modifications include, among others, dividing the evidence into system-internal and external, allowing for dynamic emergence rather than unsuppressed phonological processes, accepting both inductive and principled deductive explanations, incorporating usage-based frequency-driven explanations, and expanding the range of extralinguistic factors that condition the acquisition process (see Dziubalska-Kołaczyk & Wrembel, 2022). According to its main assumptions, all three linguistic variables (i.e., L1, Ln, universal preferences) have an impact on the process, which is modulated by the configuration of extralinguistic variables in a given acquisition situation (i.e., acquisition of L1 or Ln by an individual or a population, in a formal or natural context, at a given age, with a given proficiency level, etc.).
Guided by principled explanations (e.g., cognitive economy) and inductive data-driven accounts (i.e., post factum interpretations), NGTA proposes some specific scenarios pertaining to the process of multilingual acquisition of phonology. For instance, it is claimed that low proficiency triggers hybrid values based on L1 and L2/Ln, while target values emerge only as proficiency advances. In turn, universal phonetic grounding, automatic in nature, is hypothesized to be present throughout the process of phonological acquisition, irrespective of the language status (L2 or L3). Further, at the initial stages of acquisition of an additional language, the most recent articulatory routines, including but not limited to primary (L1) routines, prevail as the source of CLI in acquisition and only then, at a later stage, does the metalinguistic learning of Ln take place. The authors point out that even a high degree of metaphonological awareness may not ensure that foreign language learners will overcome universal phonetic difficulties, as attested in their data. It is also emphasized that L2 phonology plays an important role as a source of CLI and it is intricately connected with such variables as metalinguistic awareness, recency of use, and the language status (L2 versus L3/Ln). Moreover, NGTA was proposed to account for different acquisition settings, including L2 and L3 acquisition. Following NGTA’s tenets as well as its specific predictions, different scenarios can be further developed to model patterns of acquisition for each of these settings, pointing in the direction of an enhanced complexity of the acquisition situation in the multilingual perspective.
Dziubalska-Kołaczyk and Wrembel (2022) maintain that learners’ phonology will grow following their individual natural paths of acquisition, an open question being to what extent these paths diverge or converge for specific L2/L3/Ln learners.
28.3 Conclusion
This chapter has provided empirical evidence for distinguishing between L2 and L3/Ln learners in both language acquisition theory and research, as there are considerable differences between them related to previous learning experience, types of CLI, their phonetic and phonological repertoire, their level of metalinguistic awareness, and their perceptual sensitivity, which may facilitate the learning of subsequent phonological systems. Further, it was shown that studying L3/Ln phonological acquisition requires a methodology that goes beyond the methods established in L2-speech-oriented investigations. This includes data collection in all of a multilingual’s languages as well as a careful consideration of the impact of the created language mode during data collection (see also Amengual, 2021). Moreover, studies aiming to contribute to our knowledge of CLI between a multilingual’s languages need to include a wide range of both phonological structures and potentially influencing factors.
Research directly comparing L2 and L3 phonological acquisition is still rather limited to date, especially in terms of studies that investigate more than one individual phonological structure, so that we do not have full insight yet into the specificity of the acquisition of phonological categories in a foreign language for bilingual versus trilingual speakers. What have increasingly been demonstrated, though, are differences in their learning process related to the interaction of their languages, metalinguistic awareness, and perceptual abilities. So far, no clear general advantage for multilingual speakers compared to monolingual speakers in acquiring a new phonology has been empirically confirmed, but there is increasing evidence that, given an overlap between the phonological structures in the target language and one of the multilinguals’ background languages, positive transfer can occur and an advantage in acquisition is present.
Future research should expand the number of comparative studies into phonological acquisition, juxtaposing L2 and L3/Ln learners directly in order to inform debates on the nature and strength of similarities and differences between these populations. Furthermore, as shown in Section 28.2.1, it should become standard methodology to investigate more than one phonological structure within the same learner population, as this has proven to be the optimal approach for drawing theoretical conclusions on phonological acquisition and the role of CLI in it.















