10 Phonetic reflexes of code-switching
10.1 Introduction
While there is abundant descriptive and theoretical literature on the morpho-syntactic aspects of code-switching (hereafter CS) in a variety of language pairings, the phonetic and phonological reflexes of CS remain relatively unexplored. The paucity of research on these properties of CS may reflect the widespread assumption that, in contrast to borrowing, CS utterances manifest an abrupt transition between the sound systems of the two languages. When this view is challenged, it is generally on the grounds that it misrepresents the degree of phonological integration that lexical borrowings, but not code-switches, may undergo. The adaptation of loan words has received considerable attention from phonologists, but the relationship between CS and the sound system of a language has not. If borrowing and CS fall along a single continuum, as many linguists have argued, then CS utterances, like borrowings, may manifest some degree of integration or convergence.
This chapter presents an overview of the extant research on the phonetics of CS and attempts to address the types of questions that a full linguistic inquiry into the phonetics/phonology of CS should explore. Much of the current literature on phonetics and CS arises from the field of psycholinguistics, where the focus is on the mechanisms underlying CS in bilinguals (see Kutas et al., Gullberg et al., this volume) rather than on the role of phonetics/phonology in relation to the structural aspects of bilingual CS. By examining both the psycholinguistic and the structural aspects of the phonetics of CS, this chapter demonstrates that many of the controversies that arise in explorations of the morpho-syntax of CS exist in the phonetic domain as well. In this respect, the following broad questions regarding the role of the sound system in CS can be raised:
(1.) Does CS have an effect on phonological/phonetic production and perception?
(2.) Can phonological/phonetic properties be observed to constrain CS production?
Each of these questions has been addressed in the small body of research on the phonetics and phonology of CS, but the findings of these studies are often contradictory. Nevertheless, this chapter will advance tentative answers to these questions and address the many challenges that await future researchers in this field.
As has often been noted, there is a good deal of terminological confusion surrounding the term “code-switching.” This may be particularly true of the literature on CS and phonetics, where “code-switching” may refer not to the alternation of languages within a single utterance but to a bilingual’s performance in one language rather than the other (see Bahr and Frisch 2002 on “code-switching” and voice identification in forensic phonetics; Hazan and Boulakia 1993 on phonetic production). This chapter is limited to the perception and production of bilingual speakers when they are performing in both languages at once, via either alternational or insertional CS (see Muysken 2000). The chapter is organized as follows: § 10.2 examines the use of phonological integration as a metric for distinguishing borrowing from CS. § 10.3 reviews and analyzes psycholinguistic “switching studies” that are largely devoted to bilingual perception and that rely on the notion of a phonetic base language. § 10.4 reviews the findings of a handful of recent linguistic studies on the phonetics and phonology of CS productions that, in part, advance answers to, and introduce new complexities into, the question of whether bilinguals truly switch completely from the phonetic structure of one language to that of the other. The possibility that prosody constrains CS is considered in § 10.5. Finally, § 10.6 concludes with areas to be investigated and challenges for future research on CS and sound structure.
10.2 The phonology and phonetics of contact phenomena
There has been a great deal of debate within the field of contact linguistics over whether borrowing can be distinguished from CS on the basis of phonological structure. It is therefore important to clarify what is meant by phonological, as opposed to phonetic, structure. Phonology is commonly held to be distinct from phonetics: whereas phonological differences are envisioned as categorical, phonetic ones are seen as gradient. For instance, /b/ defines the phonemic category of a voiced stop which, depending on the language and the context, may in fact be only partially voiced. Similarly, L(ow)H(igh) defines the distinctive phonological category of a rising tone, but the slope of the LH rise may be more or less steep depending on the interval between the alignment points of the pitch valley and the pitch peak. Phonological distinctions, such as /b/ or LH, are generally salient to native speakers, whereas the gradient phonetic properties of an utterance, such as more or less voicing or steeper pitch rises, are not.
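The phonetic gradience of the LH rise can be made concrete with a little arithmetic. The Python sketch below uses hypothetical F0 and timing values, not drawn from any study cited here, and computes slope as F0 change per millisecond between the valley and the peak; the helper name `rise_slope` is illustrative only.

```python
def rise_slope(f0_valley_hz, f0_peak_hz, t_valley_ms, t_peak_ms):
    """Slope of an F0 rise in Hz per millisecond, from valley to peak."""
    interval_ms = t_peak_ms - t_valley_ms
    if interval_ms <= 0:
        raise ValueError("the peak must be aligned after the valley")
    return (f0_peak_hz - f0_valley_hz) / interval_ms

# One phonological LH rise (hypothetical values: 120 Hz valley, 180 Hz peak)
# realized with two different valley-to-peak intervals.
steep = rise_slope(120, 180, t_valley_ms=0, t_peak_ms=100)
shallow = rise_slope(120, 180, t_valley_ms=0, t_peak_ms=200)
print(f"{steep:.2f} Hz/ms vs. {shallow:.2f} Hz/ms")  # 0.60 Hz/ms vs. 0.30 Hz/ms
```

Both rises would be transcribed identically as LH; only the gradient measure separates them, which is the kind of difference the text describes as not salient to native speakers.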
In a general way, the division between phonology and phonetics is analogous to the segregation of research strands in borrowing and CS, respectively. Much of the work on borrowing is undertaken at the phonological level, analyzing broadly transcribed data to advance the notion that a borrowing conforms to the sound pattern of its recipient language. Conversely, research on the interaction between CS and sound structure invariably involves examining (or manipulating) the discrete phonetic properties of an utterance since it is assumed that code-switches should manifest only marginal cross-linguistic assimilation or, ideally, none at all. The following sections discuss, in turn, the phonology of borrowing and the phonetics of CS.
10.2.1 Phonology as a metric of lexical borrowing
It is widely accepted that established borrowings tend to show a high degree of phonological integration into the recipient language. This observation has inspired a subdiscipline of theoretical linguistics, the study of loan phonology, which attempts to account for the perceptual, articulatory, and prosodic constraints that map donor-language inputs onto well-formed recipient-language outputs (see Coetsem 1988 for a theory of loan phonology, and Jacobs and Gussenhoven 2000 for a review of loan phonology analyses within Optimality Theory). Established loan words typically manifest an array of common strategies (deletion, epenthesis, sound substitution) that reveal the systematic properties of the recipient language’s phonology. For instance, throughout the Caribbean, Vick’s® VapoRub®, widely used as a cure-all salve, has been adapted into Spanish as vivaporú [biβaporú], manifesting the appropriate distribution of the Spanish labial allophones [b] and [β] as substitutes for /v/, which is absent from the Spanish inventory. Its syllable structure also conforms to Spanish through the deletion of the coda consonants of the English input form.
That vivaporú is a borrowing is hardly in doubt; it is fully integrated into the grammatical system of the recipient language, and Spanish monolinguals and bilinguals alike use it ubiquitously. Yet identifying the status of a donor language lexeme as a borrowing versus a CS is not always so straightforward, even when such criteria as structural integration and high frequency of use are taken as indexes of borrowing. In fact, many researchers agree that CS and borrowing cannot be fully differentiated but, instead, form a continuum from non-assimilated to assimilated forms (Myers-Scotton 1993a; Treffers-Daller 1991). Others find it necessary to distinguish these phenomena (Poplack and Meechan 1995), reflecting the intuition that the processes underlying them are different: CS arises from the ability of bilinguals to alternate between two linguistic systems on-line, whereas borrowing derives from lexical storage. Of the two, only CS is held to be a uniquely bilingual behavior.
In early theoretical works that attempted to distinguish single-lexeme borrowings from CS, phonological integration was held to be an important factor in identifying loan word status. However, many researchers soon objected that borrowings of any vintage (new or established) do not always manifest phonological integration. For instance, even monolingual speakers of English may produce a reasonable phonological approximation of the French culinary term jus [ʒy], despite the fact that established French loan words in English, such as jury, justice, and Julian, show fortition of the word-initial post-alveolar fricative [ʒ] to the affricate [dʒ], in conformity with English phonotactic patterns.
The fact that borrowings are not consistently adapted to the phonology of the recipient language led to the abandonment of phonological integration as a necessary property of loan words. Unassimilated loans are now either classified as “nonce borrowings” (Poplack et al. 1988) or are considered to belong to a continuum that spans from fully integrated borrowings to unassimilated code-switches (Myers-Scotton 1993a). Whether the degree to which a lexeme has assimilated phonetically, as opposed to phonologically, can be used as a diagnostic for situating it along such a purported continuum remains an open question.
Rarely considered in the debate over whether one can distinguish borrowing from CS is the potential objection that switches, as well as borrowings, may manifest phonological adaptation. In a study of Finnish–American English CS, Lehtinen (1966:191) remarked early on that, “The phonological switching point cannot always be established with precision.” In particular, Lehtinen notes that English stem-final consonants preceding Finnish suffixes appear to undergo Finnish consonant gradation while in all other respects the speakers are faithful to the English phonological form of the stem. Such forms, then, are only partially integrated so that the phonetic transition between English and Finnish is obscured. Under one view, these forms would likely be classified as “nonce borrowings” rather than code-switches, but regardless, Lehtinen’s observation about a potential interplay between phonology and CS passed largely unnoticed.
Intuitively, it would seem that bilinguals may show signs of phonological adaptation in CS, since many bilinguals speak with a detectable accent in one, or perhaps both, of their languages. Accents may be attributed to individual factors, such as language dominance or age of acquisition, or to external factors, such as the quality of the ambient input that speakers receive, which, particularly in immigrant settings, may differ substantially from the norms of the monolingual community. In fact, it has been demonstrated that many bilinguals in such situations acquire the ability to calibrate their speech to accommodate phonetically to the non-native pronunciations of their interlocutors, even when they may pass as native speakers in monolingual contexts (see Khattab, this volume). Given that very few bilinguals are equally proficient in both their languages, and that they likely command a variety of socio-phonetic registers that they may be able to consciously control, it is reasonable to expect some degree of cross-linguistic convergence in their speech. Of particular relevance to the study of CS, then, is the question of whether bilinguals alter the sound structure of one or both languages specifically when switching between them. To investigate this question, the level of linguistic analysis must shift from the phonological, where sound alternations are generally salient, to the acoustic-phonetic, where degrees of difference, rather than wholesale sound substitutions, may be revealed.
10.3 The processing of acoustic information in bilingual switching studies
Psycholinguists interested in the mechanisms that underlie bilingualism, such as lexical access, inhibitory control, and selective attention, have conducted a series of studies investigating the acoustic and phonetic properties of language switching. These works largely aim to test proposals similar to those put forth by Macnamara (1967a, 1967b) and Macnamara and Kushnir (1971) that bilinguals’ control of the input (perception) operates independently of their control of the output (production). Under such a view, the input switch is automatic and biased toward the language of the incoming signal. That is, listeners expect the input signal to continue in the same language, and their processing strategies are tuned to that language; a switch to the other language therefore incurs a processing cost. The output switch, by contrast, would operate under the conscious or voluntary control of the speaker. The typical design of a switching study involves the insertion of a “guest” word into what is termed a “base” or “precursor” language that provides the language set for the input. The aim is to determine whether the base language affects the recognition, perception, or production of the guest word.
Gullberg et al. (this volume) define language switching studies as those that induce participants to switch at a predetermined point in an utterance. This is distinct from CS, which is assumed to be voluntary. For the purposes of this chapter, switching studies are further characterized as experiments that examine the insertion of only a single guest word into a base language utterance. From a linguistic point of view, then, this kind of switching may fall closer to the borrowing than to the CS end of the continuum of contact phenomena. However, at least one switching study (Li 1996, discussed below) endeavors to control for these different contact phenomena by manipulating the phonological structure of the guest word.
In the phonetic realm, switching studies normally target bilingual perception and are nearly exclusively dedicated to one phonological parameter: the categorical perception of the voiced /b,d,g/ versus voiceless /p,t,k/ series of stop consonants. One acoustic cue to the voiced–voiceless distinction is voice onset time (VOT), the interval between the release burst of the consonant and the onset of voicing for the following vowel. VOT spans a continuum, and different languages situate the boundary between voiced and voiceless stops at different points along it. In Spanish and French, voiceless stops are produced with very short VOT values and are said to be short-lag stops. In English, by contrast, VOT values for voiceless stops are relatively long, and such stops are produced with a period of aspiration following the consonant burst, as indicated in the waveform in Figure 10.1. The gradient nature of the voicing lag makes it an ideal testing ground for perceptual switching studies, since VOT can be manipulated either through the creation of synthetic stimuli or through edited natural speech tokens. This allows for the establishment of clear endpoints; for instance, a VOT of −60 ms would be perceived as voiced by all listeners and, at the other extreme, a VOT of 60 ms as voiceless. Between the two endpoints lie ambiguous stimuli that could be perceived as either voiced or voiceless. In general, phonetic switching studies have been designed to test whether the language of presentation, the base language, has an effect on the perceptual categorization of these ambiguous inputs. The results have been contradictory, so it is worth considering the relevant experiments in turn.

Figure 10.1 Waveform of English cat showing long voicing lag and accompanying aspiration for initial /k/ between the vertical lines
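The logic of a VOT identification continuum can be sketched with a logistic function, a common way of modeling the S-shaped curve of categorical perception. In this minimal Python sketch the boundary (15 ms) and slope are assumed values chosen for illustration, not estimates from any of the studies discussed. A base-language effect of the kind these experiments test would appear as a change in `boundary_ms`, which moves the steep middle of the curve along the continuum while leaving the unambiguous endpoints in place.

```python
import math

def p_voiceless(vot_ms, boundary_ms, slope=0.3):
    """Probability that a listener labels a stop 'voiceless' (logistic model).

    boundary_ms: VOT at which responses are split 50/50 (hypothetical value).
    slope: steepness of the category transition (hypothetical value).
    """
    return 1.0 / (1.0 + math.exp(-slope * (vot_ms - boundary_ms)))

# A -60 to +60 ms continuum in 20 ms steps, like a synthetic stimulus series.
continuum = list(range(-60, 61, 20))
for vot in continuum:
    print(f"{vot:4d} ms -> P(voiceless) = {p_voiceless(vot, boundary_ms=15):.2f}")
```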
Using synthetically generated nonce syllables, Caramazza et al. (1973), testing French–English bilinguals, and Williams (1977), testing Spanish–English bilinguals, found that listeners were unaffected by the language of the experimental instructions, taken as the precursor language, and that bilinguals appeared to have fixed (i.e. merged) perceptual boundaries for the voicing distinction across their two languages. Elman et al. (1977) directly assessed the effect of the precursor language on bilingual perceptual switching, using natural stimuli embedded in either English (1a) or Spanish (1b) base language contexts, as shown by the translation equivalents in (1).
(1) Elman et al. (1977:972) switching stimuli
In contrast to the previous findings in VOT switching studies, Elman et al. found that bilinguals did shift their perceptual boundary in response to the precursor language. Further, the effect remained when their listeners were divided into groups reflecting different levels of bilingual proficiency, and even the most proficient bilinguals performed differently from the corresponding monolingual groups. The researchers hypothesized that their results differed from those of the previous studies primarily because they used natural rather than synthetic speech tokens.
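A shift in perceptual boundary of the kind Elman et al. report is usually quantified as the VOT at which listeners give 50% voiceless responses. The Python sketch below estimates that crossover by linear interpolation between adjacent continuum steps, a simpler stand-in for the psychometric curve fitting used in practice. The identification proportions are invented for illustration and are not Elman et al.’s data; the function name `crossover` is likewise hypothetical.

```python
def crossover(vots, proportions, criterion=0.5):
    """Estimate a category boundary by linear interpolation.

    vots: continuum steps in ascending order (ms).
    proportions: matching proportions of 'voiceless' responses (0.0-1.0).
    Returns the VOT at which responses first cross the criterion.
    """
    points = list(zip(vots, proportions))
    for (v0, p0), (v1, p1) in zip(points, points[1:]):
        if p0 < criterion <= p1:
            return v0 + (criterion - p0) * (v1 - v0) / (p1 - p0)
    raise ValueError("responses never cross the criterion")

# Hypothetical identification data for the same listeners under two precursors.
vots = [-20, 0, 10, 20, 30, 40, 60]
after_spanish = [0.00, 0.10, 0.40, 0.80, 0.95, 1.00, 1.00]
after_english = [0.00, 0.02, 0.10, 0.30, 0.70, 0.95, 1.00]

shift = crossover(vots, after_english) - crossover(vots, after_spanish)
print(f"boundary shift: {shift:.1f} ms")  # boundary shift: 12.5 ms
```

With these made-up proportions the boundary lies about 12.5 ms later after an English precursor, the direction one would expect if listeners tune their categorization to English long-lag norms.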
A number of subsequent studies confirm the dominance of the base language in the perception of the guest language in CS but acknowledge that numerous factors (structural, contextual, and psychological) may impinge on a listener’s access of a code-switched word (Soares and Grosjean 1984; Grosjean and Soares 1986; Grosjean 1988). With respect to structural factors, Bürki-Cohen et al. (1989) hypothesize that the phonetic structure of the stimuli themselves may have a bearing on bilingual perception during CS. They constructed two different sets of stimuli: one in which the switched tokens could be homophonous across languages (French dé “dice” and English day), and one in which the phonology provides a distinctive cue to the guest language (French ré [ʁe] and English ray [ɹe]). They edited the tokens by splicing French and English productions together to create ambiguous or hybrid stimuli for the perception tasks. As in the Elman et al. study, the stimuli were embedded into base language carrier phrases, as in (2).
(2) Bürki-Cohen et al. (1989:365) switching stimuli
They found that the base language had no effect on listeners’ categorization of the language-neutral series of stimuli: the ambiguous stimuli of this series were identified in the same way regardless of the precursor. However, they found a polarizing effect of the base language on the perception of the language-selective tokens. Here, the hybrid tokens were categorized more often as belonging to the guest language, that is, away from the base language rather than toward it. This implies that any effect of the base language is not necessarily assimilatory.
The use of the distinctive phonetic and phonotactic structure of a guest word as a perceptual cue to a language switch is also investigated by Li (1996). Li uses phonological criteria to distinguish English borrowings from code-switches in a Chinese–English context. For instance, the English word flight is pronounced [faɪ] as a borrowing but, as a CS, it retains the English phonetic and phonotactic structure [flaɪt]. Li shows that the structurally distinctive properties of a CS allow listeners to recognize an English word in a Chinese base language as quickly as monolingual English listeners do, whereas borrowings that are phonologically integrated into Chinese take much longer to recognize. Li uses this evidence as an argument against an automatic language input switch, since the precursor language does not affect the perception of a CS. His results can be seen to affirm those of Bürki-Cohen et al. (1989) in that a significant phonological dissimilarity between languages can apparently facilitate the recognition or perception of CS.
The studies by Bürki-Cohen et al. (1989) and Li (1996) show that the effect of the precursor language on the perception of the guest language is probably not independent of the phonological properties under examination. This may be true at the acoustic level as well. Hazan and Boulakia (1993) examine an additional phonetic cue to the voicing distinction in stops: the frequency of the first formant (F1) at the onset of voicing in the vowel. F1 onset frequency can be a strong perceptual cue to the voicing distinction in English but not in French. Thus, in contrast to the VOT continuum, which serves as a distinctive voicing cue in both English and French, the cue weighting of F1 onset frequency is categorically different across these languages. In their study, tokens of /bɛn/ and /pɛn/, real words in both English and French, were edited to have an identical VOT range but to vary in F1 frequency at the onset of the syllable rhyme. As in previous studies, the test materials were constructed as a base language + guest word series to test the effect of switching, as in (3), and were presented to French–English bilinguals who differed in language dominance.
(3) Hazan and Boulakia (1993:22) switching stimuli
Their results showed only a small effect of the precursor language on phoneme categorization and, for a majority of the bilinguals, the precursor language failed to affect cue-weighting at all. They tentatively conclude that language dominance, defined as the language learned first, determines cue-weighting in bilinguals.
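The notion of cue weighting can be sketched as a weighted sum of acoustic cues passed through a logistic function. In this hypothetical Python model both listener profiles rely equally on VOT, but only the “English-like” profile assigns any weight to F1 onset frequency. The weights, the 400 Hz reference point, and the profile names are all assumptions for illustration, not parameters estimated by Hazan and Boulakia.

```python
import math

def p_voiceless(vot_ms, f1_onset_hz, w_vot, w_f1, bias, f1_ref_hz=400.0):
    """Two-cue logistic model of a listener's voicing decision.

    Each cue is weighted and summed; a larger w_f1 means F1 onset frequency
    counts for more (a higher F1 onset pushes toward 'voiceless').
    All weights and the 400 Hz reference are hypothetical.
    """
    z = w_vot * vot_ms + w_f1 * (f1_onset_hz - f1_ref_hz) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical listener profiles: both use VOT; only one uses F1 onset.
ENGLISH_LIKE = {"w_vot": 0.2, "w_f1": 0.01, "bias": -3.0}
FRENCH_LIKE = {"w_vot": 0.2, "w_f1": 0.0, "bias": -3.0}

# Same ambiguous VOT (15 ms), low vs. high F1 onset, as in an edited /bɛn/-/pɛn/ series.
for name, weights in [("English-like", ENGLISH_LIKE), ("French-like", FRENCH_LIKE)]:
    low = p_voiceless(15, 250, **weights)
    high = p_voiceless(15, 550, **weights)
    print(f"{name}: P(/p/) = {low:.2f} (low F1) vs {high:.2f} (high F1)")
```

For the French-like profile, changing F1 onset leaves the response unchanged, whereas for the English-like profile it moves an ambiguous VOT token from mostly /b/ to mostly /p/ responses.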
Taken together, the results of the perceptual studies offer only tentative evidence that the base or precursor language affects the perception or recognition of the guest word. Soares and Grosjean (1984) enumerate various linguistic and psycho-social factors of CS that might impinge upon bilingual listeners’ performance in these tasks, few of which are ever taken into account in psycholinguistic studies of CS. Nevertheless, given the available evidence, it is unlikely that the base language functions as the phonetic equivalent of the morpho-syntactic matrix language (see Myers-Scotton and Jake, this volume), providing an acoustic frame for the perception of a mixed language utterance.
10.3.1 Production in bilingual switching studies
In the few available switching studies of production, the base language has been found to have no effect on the production of categorically distinct sets of stop phonemes. For instance, Hazan and Boulakia (1993) complemented their perception analysis with a production task administered to their French–English bilinguals. They found that all groups (monolingual French, monolingual English, French dominant bilinguals, and English dominant bilinguals) showed categorical differences between /p/ and /b/ in both English and French (an effect that they refer to, confusingly, as evidence of code-switching). Caramazza et al. (1973:427) reported similar results from their production study and concluded that, “It seems that language switching is easier for production than for perception. In perception, the stimulus itself seems to determine the type of analysis to be performed.” This statement is in line with Macnamara’s (1967a, 1967b) proposal of independent input and output switches.
Grosjean and Miller (1994:201), who declare, perhaps precipitously given the available evidence, that there is a “momentary dominance of base-language units” in the perceptual domain, find that the precursor language has no such effect in production and that the French–English bilinguals in their study switched immediately and completely from the phonetics of one language to that of the other. Notably, this study also tests whether bilinguals anticipate a switch in production by assimilating to the phonetics of the guest language before the switch point. One task required bilinguals to switch into the phonetics of the guest language on the proper names Paul, Tom, and Carl, as shown in (4).
(4) Grosjean and Miller (1994:203) stimuli for production study
(a.) “During the first few days, we’ll tell him to copy Carl constantly.”
(b.) “Pendant les premiers jours, il faudra qu’il copie Carl constamment.”
Constructing the stimuli in this way allowed Grosjean and Miller to measure the VOT values of the initial consonants of the French base language words immediately preceding and following the switch in (4b), copie and constamment, where Carl is pronounced with English phonetics. These values could then be compared with the values for the French phoneme /k/ when it occurs at a switch juncture, as in Carl pronounced with French phonetics in (4a). Again, their results showed a categorical shift between English and French phonetics, irrespective of context.
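Annotating stimuli by position relative to the switch is a mechanical step on which both this design and later CS production studies depend. The following Python sketch assumes a hypothetical token format of (word, language, initial phoneme) and labels each token by its distance from the first language change; the function name `label_sites` is illustrative, and phonemic initials are entered by hand because spelling alone does not identify /k/.

```python
def label_sites(tokens):
    """Label each token by its position relative to the first language change.

    tokens: list of (word, language, initial_phoneme) tuples (hypothetical format).
    Returns (word, site, offset) triples; offset is the distance from the switch.
    """
    base = tokens[0][1]
    switch = next((i for i, (_, lang, _) in enumerate(tokens) if lang != base), None)
    if switch is None:
        raise ValueError("no language change in this utterance")
    labels = []
    for i, (word, _, _) in enumerate(tokens):
        site = "pre-switch" if i < switch else "switch" if i == switch else "post-switch"
        labels.append((word, site, i - switch))
    return labels

# Tail of (4b), with Carl pronounced with English phonetics.
utterance = [
    ("il", "fr", "i"), ("faudra", "fr", "f"), ("qu'il", "fr", "k"),
    ("copie", "fr", "k"), ("Carl", "en", "k"), ("constamment", "fr", "k"),
]
# Keep only the /k/-initial tokens, whose VOT would be measured.
k_tokens = [lab for lab, tok in zip(label_sites(utterance), utterance) if tok[2] == "k"]
print(k_tokens)
```

Because (4b) returns to French after the guest word, "post-switch" here means after the guest word; in a sentence that continues in the new language, the same labels would pick out tokens in the guest language.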
10.3.2 Reconsidering the switching paradigm for production
There is an apparent disparity between the findings of the perceptual studies, where the acoustics of the base language arguably affect the processing of the guest language, and those of the production tasks, where the separation between the phonetics of the base and guest languages is claimed to be complete. This would seem to support the notion that bilinguals have voluntary control over the output, whereas the processing of the input shows an influence from the precursor (unless the phonetics of the guest language provides a salient cue to the language switch). In other words, it would appear from these studies that bilinguals are able to completely suppress or inhibit their non-target language in production, a result that would be entirely at odds with more recent thinking that both languages of a bilingual are simultaneously “on,” although to different degrees of activation. A closer consideration of the switching paradigm may help to resolve this paradox.
Note that the materials for the various switching studies, as illustrated in examples (1) through (4), share a similar design: each consists of a base language carrier phrase into which a single guest language word is inserted. The vast majority of these guest words (with the exception of some of the tokens in (2)) are intentionally selected, or synthesized, to be bilingual homophones. This choice may be appropriate for the perception studies but may have unintended effects on production. While bilingual speakers have been repeatedly shown to produce merged or compromise VOT values relative to monolinguals, they have also demonstrated the opposite tendency; that is, they may exaggerate these same values to maximize the phonetic contrast between their two languages (Flege and Eefting 1987). When faced with a production task that requires them to pronounce isolated homophones in the language opposite to that of the carrier phrase, some speakers may indeed maximize the cross-linguistic contrast while others may assimilate the homophones to the phonetics of the base. Group averages would effectively efface the effect of these different strategies, making it appear as if bilinguals are impervious to the influence of the base language in production.
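The averaging problem is a matter of simple arithmetic. In the hypothetical Python example below (all values invented), one subgroup exaggerates English long-lag VOT and another assimilates toward short-lag base language values; each subgroup departs from the monolingual mean by roughly 18 ms, yet the pooled mean matches it exactly.

```python
from statistics import mean

# Hypothetical English /t/ VOT values (ms) for a guest-word homophone.
MONOLINGUAL_MEAN = 70.0
exaggerators = [85.0, 90.0, 88.0]  # maximize the contrast with the base language
assimilators = [55.0, 50.0, 52.0]  # drift toward short-lag base language values

group = exaggerators + assimilators
for label, values in [("exaggerators", exaggerators),
                      ("assimilators", assimilators),
                      ("whole group", group)]:
    offset = mean(values) - MONOLINGUAL_MEAN
    print(f"{label:>12}: mean {mean(values):5.1f} ms, offset {offset:+5.1f} ms")
```

Reporting individual or subgroup results, rather than the pooled mean alone, would keep such opposing strategies visible.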
10.4 Laboratory research on the phonetics of CS
There are a number of conceptual issues underlying switching studies that limit their usefulness for understanding the phonetics of naturalistic CS. First, the guest language is represented only by a single syllable or word, a structure that is representative of a lexical insertion rather than an intra-sentential CS. As noted above (§ 10.2.1), the status of such items, even when they are real words rather than synthesized ones, is questionable, and bilinguals may interpret them as borrowings (therefore easily assimilated to the base language) rather than switches. Second, switching studies are predicated on the idea that the language you start in affects the language you switch to. Yet if we admit that bilinguals can activate both languages simultaneously, a state that is surely achieved during CS or when accessing interlingual homophones, then we would expect that cross-linguistic interaction may operate bi-directionally (from base to guest or vice versa). Third, switching studies, by their current design, cannot be informative regarding how long before or after a CS any cross-linguistic effect can be detected. In theory, bilinguals may adopt a bilingual production (or perceptual) mode, in which they behave quite differently from when they expect to produce (or hear) only one language. Given that bilinguals should not be assumed to perform to the phonetic norms of monolinguals, it is crucial to investigate the effects of CS relative to their own non-switching norms.
Linguistic studies devoted to describing the phonetic effects of CS, rather than the cognitive mechanisms underlying language switching, are few (Toribio et al. 2005; Bullock et al. 2006; Khattab 2006, this volume). Like switching studies, these have often induced CS in bilinguals in order to ensure that the specific phonetic features under examination appear in the appropriate contexts. The difference is that the materials are intra-sentential CS constructions with grammatically constrained junctures, occurring either at the Subject–Predicate or at the Verb–Object boundary. In this respect, the stimuli resemble natural bilingual CS. In laboratory studies of linguistic CS, researchers have attempted to redress the limitations of the switching paradigm with respect to bilingual production by posing additional questions, such as those listed in (5).
(5) Research questions adapted from Bullock et al. (2006:11)
(a.) Are there within-language differences between bilingual production in monolingual versus code-switched natural speech?
(b.) Is one language affected more than the other?
(c.) Is the speaker’s L1 less permeable to convergence than the L2?
(d.) Does the direction of the switch matter (from L1→L2 or from L2→L1)?
(e.) If an effect of CS occurs, how long does it persist into an utterance?
These research questions are cited here because they pose fundamental issues that any inquiry into bilingual CS should take into consideration. Item (5a) considers the general effect of CS on bilingual production because it is possible, in theory, that bilinguals manifest no difference between modes, or that they adopt compromise or merged phonetic values across a CS utterance relative to a monolingual one. Questions (5b–c) raise the possibility that the effects of CS on phonetic production may be asymmetrical. That is, owing to inherent linguistic differences or to speaker proficiency, among other factors, only one language of the pair may be affected (5b). Additionally, given that L1 phonetic values are assumed to be set early, the language acquired first may be more stable during CS than the L2 (5c). Item (5d) tests the directionality assumption implicit in language switching studies, and (5e) is designed to tease apart the effects of CS from those of language mode by examining whether perturbations to the phonetic system in CS are temporary or global.
Interestingly, the results of phonetic CS studies to date do not converge with those of the production switching studies reviewed above in § 10.3.1. In particular, the study by Bullock et al. (2006) showed a robust effect of CS on phonetic production that would not be predicted by a switching study. They tested the production of Spanish–English bilinguals in both monolingual and bilingual modes, dividing their participants into two groups that differed in proficiency. The L1 Spanish bilinguals were strongly Spanish-dominant, and most had detectable foreign accents in their English. The L1 English speakers, however, were Spanish instructors and, thus, more balanced across their languages. Each group was tested on its productions of /p,t,k/ in separate Spanish and English monolingual sessions. Participants were then tested in a bilingual session in which they read CS sentences in both directions, presented in random order. Embedded in each sentence were counterbalanced tokens of /p,t,k/ at three strategic sites: pre-switch, at the switch juncture, and post-switch, as illustrated in (6).
(6) CS stimuli from Bullock et al. (2006:11)
(a.) Spanish to English

“All my friends talked Spanish as kids.”
(b.) English to Spanish

“The typhoon damaged roofs and walls.”
The results showed that both groups, regardless of mode or switch site, maintained significantly distinct categories for Spanish versus English, a result that confirms the findings from the production switching studies. However, despite the differences between the groups in L1 and in L2 proficiency, both groups showed an identical asymmetric pattern of phonetic shift in CS: the effect of CS on production was manifested only in their English productions. Specifically, their English VOT values merged toward (but did not converge with) Spanish values only when code-switching, whereas their Spanish VOT productions remained constant across modes. The influence of Spanish on English occurred regardless of the direction of the switch. Intriguingly, the phonetic merger was most pronounced before a switch from English to Spanish, rather than in the reverse direction. That is, bilinguals showed the highest degree of phonetic merger in anticipation of CS. When switching from Spanish to English, their English VOT productions at the switch site also merged significantly toward the Spanish values, whereas at the post-switch position they recovered their own monolingual values.
These findings suggest that there is a cross-linguistic effect in CS, but one that is more complex than switching studies anticipate. This effect appears to be local rather than global, as it is concentrated immediately before and directly after the switch. It is also independent of the base language (i.e. the language in which the utterance begins), because it occurred regardless of the direction of the switch. In fact, the English productions of both groups were most Spanish-like when speakers began an utterance in English. Finally, the effect can be asymmetric, affecting only one language of the pair, whether that language is the base or the guest language. Importantly, the convergence between languages is not complete: these bilinguals, regardless of proficiency level, maintained separate voicing categories across their two languages, although not necessarily within the same ranges as monolingual speakers of each language.
The authors of the study speculate that the observed asymmetry may be due to inherent linguistic differences. That is, the VOT range for voiceless stops in English is expansive compared to the relatively compressed range of the short-lag stops of Spanish. This could allow more flexibility in the production of voiceless stops in English, permitting convergence toward (but not confusion with) Spanish during CS.1 By contrast, extending the VOT of Spanish voiceless stops past a certain interval (>30 ms) may push them noticeably out of the Spanish range. This suggests that inherent phonetic differences may condition CS behavior and that, as in the morpho-syntactic domain, the output of CS must respect the phonological constraints of both languages, albeit while allowing for phonetic variability in their expression.
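The asymmetry argument above amounts to comparing "headroom": how far a token can drift toward the other language before leaving its own category. The Python sketch below makes that arithmetic explicit. Apart from the roughly 30 ms ceiling for Spanish short-lag stops mentioned in the text, the category bounds and token values are hypothetical placeholders chosen for illustration, not measurements from Bullock et al. (2006), and the function names are invented for this sketch.

```python
# Hypothetical VOT category ranges in milliseconds (illustrative assumptions only).
# Only the ~30 ms ceiling for Spanish short-lag stops comes from the text; the
# English bounds are placeholders standing in for an "expansive" long-lag range.
VOT_RANGES_MS = {
    "spanish": (0.0, 30.0),    # compressed short-lag range
    "english": (40.0, 120.0),  # expansive long-lag range (hypothetical bounds)
}


def headroom_ms(language: str, vot_ms: float) -> float:
    """Distance a token can move toward the other language and stay in range.

    English tokens converge on Spanish by getting shorter (toward the lower bound);
    Spanish tokens converge on English by getting longer (toward the upper bound).
    """
    low, high = VOT_RANGES_MS[language]
    return vot_ms - low if language == "english" else high - vot_ms


def stays_in_category(language: str, vot_ms: float, shift_ms: float) -> bool:
    """True if shifting the token by shift_ms toward the other language keeps it in range."""
    return shift_ms <= headroom_ms(language, vot_ms)


if __name__ == "__main__":
    # Hypothetical tokens: the same 20 ms shift is absorbed by English but not by Spanish.
    print(headroom_ms("english", 80.0), stays_in_category("english", 80.0, 20.0))
    print(headroom_ms("spanish", 15.0), stays_in_category("spanish", 15.0, 20.0))
```

With these placeholder values, a 20 ms shift toward Spanish leaves the English token well inside its range, while the same shift pushes the Spanish token past the 30 ms ceiling, which is the kind of asymmetry the authors invoke.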
Only one study to date directly examines whether CS can confound phonological distribution. Bullock et al. (2005) investigated whether CS could affect the production of syllable-final lateral allophones among Puerto Rican Spanish (PRS)–American English (AE) bilinguals. Both languages possess phonological processes that affect syllable-final liquids. In AE, a final lateral is produced with a retracted tongue dorsum and realized as a velarized, or dark, l: [ɫ]. A salient (and sociolinguistically stigmatized) property of PRS is the variable application of lambdacization, whereby an underlying rhotic surfaces as a lateral (e.g. vivir “to live” → [biβíl]). In PRS, syllable-final laterals are apico-alveolar, but they may surface as the reflex of either an underlying /l/ or an underlying /r/. This means that both the distributional and the phonetic properties of laterals differ between the two languages. The study was designed to test whether bilinguals would confuse the phonologies of their two languages by producing the allophone of the other language while reading CS sentences such as the one in (7):
(7) The perfume smells suti/l/ pero fuerte.
“The perfume smells subtle but strong.”
Extracting each lateral produced in both monolingual and CS contexts, the researchers measured the degree of velarization of every lateral by reference to the position of the second formant (F2): a velarized lateral shows a significantly lower F2 than an apico-alveolar lateral (i.e. “clear l”). There was a small effect of CS on the Spanish productions of individual speakers: one speaker produced lambdacization of underlying /r/, and another produced significantly velarized variants of underlying /l/, only while code-switching. The researchers suggest that “it may be more difficult to . . . self-monitor pronunciation” while code-switching (Bullock et al. 2005:110). Overall, however, the results showed that these bilinguals, even in CS, maintain separate, correctly distributed allophones across their two languages. That is, they did not confuse their phonologies while engaged in CS.
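The F2 criterion described above can be operationalized in several ways. The Python sketch below uses one simple screen, which is not necessarily the procedure used in the study: it flags code-switched laterals whose F2 falls more than k standard deviations below the mean of the same speaker's monolingual clear-l tokens. The function name, the threshold k = 2, and the per-speaker cutoff itself are assumptions for illustration; a full analysis would more likely rely on inferential statistics.

```python
from statistics import mean, stdev


def velarized_tokens(baseline_f2, cs_f2, k=2.0):
    """Flag CS laterals whose F2 is unusually low for this speaker.

    baseline_f2: F2 values (Hz) of the speaker's own clear-l tokens from a
        monolingual Spanish session.
    cs_f2: F2 values (Hz) of laterals produced in code-switched sentences.
    k: hypothetical cutoff, in baseline standard deviations below the baseline mean.

    A lower F2 indicates a more velarized (darker) lateral, so only tokens
    below the cutoff are returned.
    """
    if len(baseline_f2) < 2:
        raise ValueError("need at least two baseline tokens to estimate spread")
    cutoff = mean(baseline_f2) - k * stdev(baseline_f2)
    return [f2 for f2 in cs_f2 if f2 < cutoff]
```

Anchoring the cutoff to each speaker's own monolingual productions, rather than to pooled group norms, keeps individual differences visible, which matters given how variable bilingual phonetic performance can be.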
In sum, laboratory studies investigating the effect of CS on production demonstrate that cross-linguistic influence is present at the phonetic level even though bilinguals succeed in maintaining separate phonological categories across languages. However, they also show that the interplay between CS and phonetics is complex and may, in part, be determined by the specific phonetic properties under investigation.
10.4.1 The phonetics of naturalistic CS
An objection that can be raised to the studies of CS reviewed so far is that the participants are induced to code-switch, which may not reflect the natural behavior of bilinguals. This is a valid concern: the motivations underlying a speaker’s choice to code-switch are complicated, and we cannot simply assume that a speaker’s CS productions are invariant across the different conversational contexts in which they use both languages together. Laboratory findings on the consequences of CS for phonetic structure, then, need to be weighed against findings from bilinguals engaged in natural CS.
Khattab (2002a, 2002b, in press, this volume) provides insight into the phonetic properties of naturalistic CS through her investigations of the phonetic productions of Arabic–English bilingual children. She demonstrates that the children under study often engage in CS with their bilingual (Arabic-dominant) parents and that, when they do so, their English productions display Arabic phonetic features that are absent when they speak English in monolingual settings. Khattab reasons that in their CS speech the children are accommodating to the non-native productions of their parents. Importantly, she argues that the apparent “interference” of Arabic in their English productions may not be accidental at all; rather, the children are capable of fine phonetic control, displaying evidence of an expanded and sophisticated phonetic repertoire relative to monolinguals.
The findings from naturalistic studies thus confirm those of the laboratory studies: CS has an effect on the phonetic production of bilinguals. However, the phonetic convergence revealed by these two study paradigms may arise for entirely different reasons. The naturalistic data, unlike the laboratory data, suggest that bilinguals can intentionally enhance linguistic crossover between their two linguistic systems while code-switching. This implies that the laboratory studies may actually present a conservative picture of the potential effects of CS on phonetic production, and we can hypothesize that spontaneous bilingual interactions will show even more dramatic phonetic overlap during CS. Whether this prediction is borne out awaits future study.
10.5 Can phonology constrain CS?
Up to this point we have considered only whether CS affects phonological/phonetic structure. The issue can also be viewed the other way around: can phonological/phonetic structure affect CS? This question is the natural corollary of the syntactic-theoretical literature devoted to CS, yet a role for phonology has only rarely been acknowledged in the search for linguistic constraints on CS. The few proposals that exist view phonology as facilitating, not constraining, CS, and at a lexical rather than a phrasal level (Clyne 2003). The idea behind facilitation, as envisioned by Clyne (2003), is that certain lexical items can act as triggers for CS in bilingual speech. Because a trigger word generally needs to have a similar surface form in the component languages, facilitation is more likely to arise in closely related languages, although it is also attested in typologically distinct languages.
According to Clyne, certain types of words – bilingual homophones, unassimilated lexical transfers (i.e. nonce borrowings), and proper names – may facilitate a shift in language, as illustrated in (8).
(8) Dutch–English CS triggered by a bilingual homophone
En we reckoned Holland was too smal vor uns. Het was te benauwd allemaal.
“And we reckoned Holland was too narrow/small for us. It was too oppressive altogether.”
The bilingual homophone smal (Dutch “narrow”) has converged phonetically for the speaker cited in (8), who pronounces it identically in Dutch and English: [smɑl]. This coincidence of surface form across the two languages triggers a switch in an unlikely syntactic context (between a modifier and an adjective). This implies that facilitation (triggering) can contravene syntactic constraints.
Facilitation has also been reported at the prosodic level in Vietnamese–English CS (Tuc 2003). Standard Vietnamese has a repertoire of six distinctive tones, each designated by a name and represented orthographically by a diacritic (or by the absence of a diacritic for the “neutral” tone ngang), as given in (9).
(9) Vietnamese tones
- sắc: high (or mid) rising
- ngang: mid level (neutral)
- huyền: mid falling
- ngã: rising contour, constricted
- hỏi: dipping-rising contour
- nặng: low, constricted
Tuc (2003) shows that, of these six tones, the last three, characterized by contours, by glottalization, or by a combination of both, are virtually excluded from the position immediately before a switch into English.2 The remaining tones have a relatively high or mid pitch, which Tuc argues facilitates switching into English because Vietnamese speakers establish a perceptual equivalence between the high and mid Vietnamese tones and the stressed and unstressed syllables of English, respectively. Thus, CS into English overwhelmingly occurs in the tonal range that is most appropriate for both languages. Zheng (1997, cited in Clyne 2003) finds that switching between Mandarin and English is similarly restricted to a particular tonal range that is perceived to be compatible with both languages.
On another interpretation of these data, one could argue that the tonal properties of Vietnamese (and perhaps Mandarin) do more than facilitate CS; they appear to constrain it. It is not simply the case that lexemes bearing particular tones trigger CS; rather, CS is virtually blocked unless certain tones appear at the switch juncture. This can be seen when a particle with no syntactic function in the utterance is inserted before a switch, as in (10), where the determiner đó “that” has been inserted.
(10) Vietnamese–English
Những gì nó nói mày phải đó recall lại hết
pl what he say you must det recall again final particle
“You have to recall whatever he said.”
As Tuc shows, the CS sentence would be fully grammatical without đó. In fact, the corresponding monolingual phrase would be ungrammatical if the determiner were to precede the Vietnamese verb equivalent to “recall.” However, the presence of the dummy determiner, to which Tuc assigns the pragmatic function of signaling CS, can be understood as prosodically motivated. Without it, the sentence might be grammatical, but the switch would likely be ill-formed, because it would be directly preceded by a contour tone (the hỏi tone of phải “must”) rather than by a mid or high tone (đó bears the high tone, sắc). This implies that the particle is inserted not simply to facilitate CS but to allow it.
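Tuc's generalization lends itself to a simple check: read the tone of the syllable immediately before the switch point from its diacritic and test whether it belongs to the disfavoured set. The Python sketch below does this using Unicode canonical decomposition (NFD), under which Vietnamese tone marks surface as combining characters (acute for sắc, grave for huyền, tilde for ngã, hook above for hỏi, dot below for nặng). It is a simplification under stated assumptions: it treats a "virtually excluded" tendency as a categorical rule, assumes the input is already split into syllables, and the function names are hypothetical.

```python
import unicodedata

# Combining characters (after NFD) that mark the five non-neutral tones.
# Vowel-quality marks such as the circumflex, breve, and horn are ignored.
TONE_MARKS = {
    "\u0301": "sắc",    # combining acute accent
    "\u0300": "huyền",  # combining grave accent
    "\u0303": "ngã",    # combining tilde
    "\u0309": "hỏi",    # combining hook above
    "\u0323": "nặng",   # combining dot below
}

# Tones reported as virtually excluded immediately before a switch into English,
# treated here (as a simplification) as a hard constraint.
DISFAVOURED_BEFORE_SWITCH = {"ngã", "hỏi", "nặng"}


def tone_of(syllable: str) -> str:
    """Return the tone name of a Vietnamese syllable from its diacritics."""
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            return TONE_MARKS[ch]
    return "ngang"  # no tone mark: the level ("neutral") tone


def switch_site_ok(tokens: list[str], switch_index: int) -> bool:
    """True if the syllable just before tokens[switch_index] licenses a switch."""
    if switch_index == 0:
        return True  # nothing precedes the switch
    return tone_of(tokens[switch_index - 1]) not in DISFAVOURED_BEFORE_SWITCH


if __name__ == "__main__":
    without_det = ["mày", "phải", "recall", "lại", "hết"]
    with_det = ["mày", "phải", "đó", "recall", "lại", "hết"]
    print(tone_of("phải"), switch_site_ok(without_det, 2))  # hỏi False
    print(tone_of("đó"), switch_site_ok(with_det, 3))       # sắc True
```

Applied to (10), the check rejects the switch directly after phải and accepts it once đó is inserted, which mirrors Tuc's account of the particle.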
In sum, the data summarized in this section provide empirical support for the notion that CS can be conditioned by phonological structure. The Vietnamese data, in particular, strongly suggest that CS may be subject to prosodic constraints. Although the search for structural constraints on CS has largely been confined to the morpho-syntactic component of grammar, it may be time to expand the quest to consider the role of prosody in CS. This is but one of a number of topics that await future study.
10.6 Conclusion: challenges for future research
This chapter began by laying out three general questions concerning the role of phonology and phonetics in CS. Here, we consider them in turn in an attempt to advance some conclusions.
(1.) Does CS have an effect on phonological/phonetic production and perception?
This question probes whether two languages may overlap or influence one another in CS. As we have seen, there is clear evidence of crossover between languages in the production domain, but only at the phonetic level; phonological categories do not appear to overlap in CS. In the perceptual domain, there is also reported evidence of cross-linguistic influence on the processing of acoustic stimuli. In particular, phonological dissimilarity between languages has been shown to have a facilitative effect on both perception and word recognition. The answer to question (1.), then, depends on the degree and type of overlap concerned, but, by and large, we do find effects of CS on both production and perception. Indeed, this is the expected result under models that assume that both of a bilingual’s languages are simultaneously activated.
(2.) Can phonological/phonetic properties be observed to constrain CS production?
Although constraints on CS have been the main preoccupation of syntacticians interested in bilingualism, this issue has been only cursorily addressed in the phonological literature. Results from studies of CS between languages with typologically distinct prosodic systems suggest that the answer to this question is affirmative. But, clearly, this is an area that merits much more consideration.
(3.) Is there a phonetic base or matrix language in CS?
If the base language is construed to be the language that initiates a CS utterance then the answer to this question is negative. Phonetic overlap can occur irrespective of the direction of CS. However, there remains the possibility that some bilinguals may show a greater influence of one language over the other only in their CS pronunciations.
The answers to the questions posed in this chapter are only tentative, and the phonetic reflexes of CS remain very much an open field of inquiry. Clearly, the factors underlying phonetic production and perception in CS are complicated and present significant challenges to future researchers. Data collection alone is an obstacle: future work must undertake rigorous acoustic-phonetic analyses of CS speech, because impressionistic transcriptions are not detailed enough to detect the myriad cues that may be present. The addition of data from spontaneous CS corpora is an absolute necessity, but here researchers will be hampered by the difficulty of collecting a sufficient number of target tokens in the appropriate contexts.
A significant challenge to understanding the phonetics of CS arises from the fact that bilingual phonology, in general, is much understudied. Not all bilinguals are “accent free” in both languages and it is quite likely that even those who pass as monolinguals differ substantially from true monolinguals at the phonetic level. Thus, it is imperative that researchers examine bilinguals’ CS behavior in relation to the participants’ own monolingual performance. In addition, researchers repeatedly underscore the highly variable nature of bilingual phonetic performance; even bilinguals of virtually identical sociolinguistic profiles can behave quite differently at the phonetic level. Thus, group results must be treated with caution as they tend to efface the often dramatic differences that individuals may manifest in their speech production.
Finally, although there are many avenues of bilingual phonology to investigate in relation to CS, this chapter will conclude with one in particular. Neglected in much of the discussion regarding CS is the fact that phonology and syntax interface in bilingual performance. In this respect, a fruitful area for research in CS will likely be found at the prosodic level where pitch contours range across an utterance and, in many languages, are used for various discourse-pragmatic purposes. Prosodic or accentual boundaries are not necessarily isomorphic to syntactic or lexical ones. For instance, pitch peaks in many languages, like Spanish, are not bound to the stressed syllable of a lexical item but may, at times, be aligned to a syllable in the following word. Does this affect CS? How do bilinguals jointly reconcile the syntactic constraints of their component languages with the prosodic ones? These are unanswered questions but it seems quite likely that the interface between prosody and syntax, and not merely syntax alone, may play a role in circumscribing the domain of CS.
Notes
1. This observation, if sustained, could explain why Grosjean and Miller (1994) failed to find any phonetic effect of preplanning during CS on VOT values in their production study, since they examined these values only for French, which patterns like Spanish with respect to VOT.
2. It is worth noting that the contour tones are absent, as well, from French borrowings in Vietnamese where only the mid tones occur by default. In these borrowings, the high tone, sắc, and the constricted low tone, nặng, replace the mid level tones when a voiceless stop appears in the coda of the syllable (Hoi Doan, p.c.).


