Exposure to a second language in infancy alters speech production

Abstract We evaluated the impact of exposure to a second language on infants’ emerging speech production skills. We compared speech produced by three groups of 12-month-old infants while they interacted with interlocutors who spoke to them in Spanish and English: monolingual English-learning infants who had previously received 5 hours of exposure to a second language (Spanish), English- and Spanish-learning simultaneous bilinguals, and monolingual English-learning infants without any exposure to Spanish. Our results showed that the monolingual English-learning infants with short-term exposure to Spanish and the bilingual infants, but not the monolingual English-learning infants without exposure to Spanish, flexibly matched the prosody of their babbling to that of a Spanish- or English-speaking interlocutor. Our findings demonstrate the nature and extent of benefits for language learning from early exposure to two languages. We discuss the implications of these findings for language organization in infants learning two languages.

Acquiring a language involves not only becoming a native listener, but also becoming a native speaker. How and when does infants' speech production become language-specific? It is uncontroversial that native language experience alters speech production at or after the one-word stage. Thus, the earliest words of children speaking English or German (e.g., Kehoe, 2015) are shorter, with more coda consonants, than those of children speaking Spanish (e.g., Roark & Demuth, 2000; Lleó, 2006), Italian (Ingram, 1981) or Farsi (Keshavarz & Ingram, 2002), reflecting the tendencies in these languages.
In contrast, prior to the one-word stage, infants' babbling has traditionally been thought to exhibit features that are independent of specific language experience (Oller, 2000; see Buder, Warlaumont & Oller, 2013 for a review). Thus, infants exposed to many different languages produce quasi-vowels and glottal stops right after birth. Between 1 and 4 months of age, infants' cooing typically involves the production of back consonants with vowels that are partially resonant. Subsequently, infants begin to produce fully resonant vowels between 3 and 8 months. At this point, infants begin to combine vowels with consonants, first with slow transitions, producing marginal syllables, then with faster transitions, producing canonical syllables (a repetitious string of syllables, as in "bababa") between 5 and 10 months.
More compelling evidence that babbling in the first year of life is affected by language experience comes from research assessing the supra-segmental characteristics of infant babbling. Several researchers have suggested that infants reproduce the prosodic characteristics of their ambient language before its segmental patterns (e.g., Crystal, 1979). Cross-linguistic research shows that monolingual infants between 8 and 12 months of age begin to produce the characteristic intonation (Whalen, Levitt & Wang, 1991), syllable, and word-form shapes (Levitt & Utman, 1992; Lleó, Prinz, El Mogharbel & Maldonado, 1996) of the specific language to which they are exposed. Perhaps this is not surprising given that the human fetus responds to the supra-segmental properties of speech input by about 30 weeks of gestation (Kisilevsky, Hains, Lee, Xie, Huang, Ye, Zhang & Wang, 2003), and even newborns' cries reflect the prosodic characteristics of their mother's language (Mampe, Friederici, Christophe & Wermke, 2009). Even in the absence of such prenatal experience, the rhythm of manual babbling by 7-month-olds learning sign language is altered by their ambient language experience (Petitto, Holowka, Sergio, Levy & Ostry, 2004; Petitto, Holowka, Sergio & Ostry, 2001).
In sum, it is controversial how early infants exposed to a specific language begin to produce speech patterns characteristic of that language. Researchers comparing babbling of monolingual infants must contend with considerable variation that comes from comparing behaviors across different infants. By comparing the babbling of infants with exposure to two languages, researchers can treat each infant as a control for herself, allowing for more nuanced comparisons. Thus, ultimately, the most compelling evidence for whether language experience affects infant babbling is likely to come from bilingual infants.
In this paper we present infant babbling data from bilingual English- and Spanish-learning infants (Experiment 1), and monolingual English-learning infants with and without short-term exposure to Spanish (Experiments 3 and 2, respectively). With data from these three groups we show that, by 12 months, infants can alter their babbling to match the prosody of English and Spanish interlocutors, but only if they have had at least some prior exposure to both languages.

Experiment 1
There have been only a few studies investigating whether babbling by bilingual infants has language-specific characteristics. In one longitudinal single-subject investigation, a French-English bilingual 10-month-old was reported to produce more multisyllabic utterances and fewer sounds per syllable when interacting with a French-speaking interlocutor than with an English-speaking one (Maneva & Genesee, 2002). Similarly, a Spanish-English bilingual 12- to 13-month-old was reported to produce fewer coda consonants in Spanish contexts than in English contexts (Andruski, Cassielles & Nathan, 2014). However, a study with a larger group of English- and French-learning bilingual 13.5-month-olds failed to find differences in the production of consonants between the two language contexts (Poulin-Dubois & Goodz, 2001; see also Zlatić, MacNeilage, Matyear & Davis, 1997 for a twin sibling study where the two language contexts were not separated). One possible reason for these differences across studies is that younger infants have more nuanced control over the production of prosody than over the segmental characteristics of the ambient language, given constraints on their developing anatomy and motor control. This could explain why Maneva and Genesee (2002) and Andruski et al. (2014) found differences in the prosodic features of babbling, but Poulin-Dubois and Goodz (2001) did not find differences in the segmental features of babbling. However, given that Maneva and Genesee (2002) and Andruski et al. (2014) each tested only one infant, their findings may not generalize to a larger population.
Besides being the strongest test case for language-specific effects on speech production, an investigation of babbling by infants exposed to more than one language can also shed light on a second controversy: how are the two languages of bilinguals represented? There is now a consensus that infants growing up bilingual do not start out with a fused representation of their two languages (e.g., Genesee, 1989; De Houwer, 1990), although the extent to which the two languages develop autonomously or interdependently continues to be debated (e.g., Hammer, Hoff, Uchikoshi, Gillanders, Castro & Sandilos, 2014). Early differentiation of the two languages is supported by research showing that newborns, whether monolingual (Byers-Heinlein et al., 2010; Mehler, Jusczyk, Lambertz, Halstead, Bertoncini & Amiel-Tison, 1988; Moon, Panneton-Cooper & Fifer, 1993; Nazzi, Bertoncini & Mehler, 1998) or bilingual (Byers-Heinlein et al., 2010), are able to distinguish between prosodically dissimilar languages. By 4 to 5 months, monolingual and bilingual infants are also able to differentiate their native language from a prosodically similar language (Bahrick & Pickens, 1988; Bosch & Sebastián-Gallés, 1997, 2001; Nazzi, Jusczyk & Johnson, 2000). This early ability to discriminate languages is likely to support the separation of the two native languages of the bilingual infant.
Empirical evidence from older bilingual children is also consistent with a differentiated representation of the two languages early in verbal development. There is evidence for differentiated systems in bilingual children's productions at the emerging sociopragmatic (e.g., Genesee, Nicoladis & Paradis, 1995), syntactic (e.g., Meisel, 1990; Paradis & Genesee, 1996), semantic (e.g., Quay, 1995), as well as word level (e.g., Ingram, 1981; Lleó, 2002, 2006; Lleó, Kuchenbrandt, Kehoe & Trujillo, 2003). Moreover, at least some bilingual infants differentiate the shapes of their earliest words across the two languages by their first birthday (Vihman, 2016). These abilities continue to develop such that, by their second year, bilingual infants are well able to differentiate the prosodic shapes of their utterances (e.g., Lleó, 2002) as well as vary the number of closed syllables across their two languages (Ingram, 1981; Kehoe, 2015; Lleó et al., 2003).
In contrast, research on the developing sound system of bilinguals, even beyond the first word stage, has produced mixed results. Some studies have reported that bilingual children older than 2 do not produce language-specific differences in vowels and consonants (e.g., Kehoe, Lleó & Rakow, 2004). Others have found evidence that bilingual children produce language-specific differences in some, but not all segments (e.g., Johnson & Wilson, 2002; Kehoe et al., 2004). Still others have found that children produce differences in the segments of their two languages at all ages (e.g., Ingram, 1981; Johnson & Lancaster, 1998; Khattab, 2003; Paradis, 2001). In sum, language-specific influences are more likely to emerge in the prosodic than in the segmental characteristics of the early speech of both monolingual and bilingual infants.
In Experiment 1, we compared the prosodic properties of babbling produced by bilingual Spanish- and English-learning 12-month-olds while they interacted with a Spanish- or an English-speaking interlocutor, in order to establish the early precursors of bilingual speech production. Babbling in the two sessions was characterized using two measures (proportion of multisyllabic utterances and proportion of utterances produced with closed syllables) because English and Spanish differ in these prosodic properties. Conversational English is predominantly monosyllabic: about 80% of words in English have one syllable (Cutler & Carter, 1987). In contrast, 80% of words in Spanish have more than one syllable (Roark & Demuth, 2000). English and Spanish also differ in how often syllables end in a consonant: in English, roughly 60% of syllables end in consonants compared to only 25% in Spanish (Roark & Demuth, 2000). We expected that, if babbling at 12 months is language-specific, bilingual infants should produce longer utterances with more open syllables when interacting with a Spanish-speaking interlocutor compared to an English-speaking interlocutor. This would also provide evidence that bilingual infants differentiate between their two languages in speech production.

Subjects
Ten bilingual 12-month-old infants (average age: 368 days, range: 353-394 days; 5 girls) participated in the study. All infants were reported by their parents to be full-term (38-42 weeks gestation) and healthy on the date of testing with no history of ear infections or speech or hearing difficulties. Infants were included in the bilingual group only if they were learning both languages at home and their daily language input was at least 20%, but no more than 80%, in Spanish (average: 41%; range: 20-80%), based on a detailed language questionnaire administered to parents (Bosch & Sebastián-Gallés, 2001; Sundara & Scutellaro, 2011). English and Spanish short-form versions of the MacArthur-Bates Communicative Development Inventory (CDI) (Fenson, Pethick, Renda, Cox, Dale & Reznick, 2000; Jackson-Maldonado, Marchman & Fernald, 2013) were administered to the parents of the infants to measure early language and gestural communication skills. These results are summarized in Appendix A.

Design and Procedure
All infants came to the lab for one visit and participated in two consecutive 30-minute recording sessions; one of the sessions was conducted exclusively in Spanish and the other in English. If necessary, infants were given a break between the two sessions. Recordings were done in a sound-attenuated booth. The first recording session was always with the infant's parent (four in Spanish; six in English), and the second session with a bilingual research assistant (in the other language). The language of the session with the parent was the one habitually used by that parent with the infant.
The parent and the research assistant were in the room for both sessions. This allowed the infant to become comfortable with the lab set-up as well as the research assistant. During the parent's session, the research assistant was instructed not to talk to the infant and to only interact non-verbally when approached by the infant. Similarly, in the research assistant's session, the parent was instructed not to talk to the infant and to interact nonverbally only when approached by the infant. Both adults were asked to cease talking when the infant was vocalizing. The adult interlocutors were provided with a set of quiet toys.
Using the Audio-Technica ATW-T701 wireless microphone system, a stereo recording was made for each babbling session (sampling rate = 44.1 kHz; 16-bit resolution) with the adult interlocutor on one track, and the infant on the other. The microphone was attached to the side of a vest that the infant wore. The adult interlocutor wore a similar microphone attached to his/her clothing. All recordings were made using the Pro-Tools software.

Coding
All acoustic analyses were done in PRAAT, using a combination of waveforms and spectrograms (Boersma & Weenink, 2010). Each of the babbling sessions was first segmented into "utterances." Utterances were defined as a string of infant vocalizations that were separated by at least 700 ms of silence, with no more than 450 ms of silence within the utterance. Next, based on the criteria described in Oller (1986) and Rvachew, Creighton, Feldman, and Sauve (2002), each syllable in every utterance was classified into one of four categories: fully resonant vowel, canonical syllable, marginal syllable, and other (i.e., non-speech sounds).
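As an illustration only, the segmentation rule can be sketched in Python. The segmentation in the study was done by hand in PRAAT; the function name, the input format (onset and offset times in seconds), and the handling of silences between 450 and 700 ms, which the stated rule does not cover, are assumptions of this sketch.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (onset, offset) of one vocalization, in seconds

MAX_WITHIN_GAP = 0.450   # longest silence allowed inside an utterance
MIN_BETWEEN_GAP = 0.700  # shortest silence that separates two utterances


def segment_utterances(vocalizations: List[Interval]):
    """Group vocalization intervals into utterances.

    Returns (utterances, ambiguous_gaps). Silences longer than 450 ms but
    shorter than 700 ms fall outside the stated rule; this sketch splits at
    them (an assumption) and reports them so they can be reviewed by hand.
    """
    utterances: List[List[Interval]] = []
    ambiguous_gaps: List[Interval] = []
    for voc in sorted(vocalizations):
        if not utterances:
            utterances.append([voc])
            continue
        prev_offset = utterances[-1][-1][1]
        gap = voc[0] - prev_offset
        if gap <= MAX_WITHIN_GAP:
            utterances[-1].append(voc)  # short silence: same utterance
        else:
            if gap < MIN_BETWEEN_GAP:
                ambiguous_gaps.append((prev_offset, voc[0]))
            utterances.append([voc])    # long silence: new utterance
    return utterances, ambiguous_gaps


# Hypothetical timings: a 0.2 s gap, a 1.1 s gap, and a 0.6 s gap.
demo = [(0.0, 0.3), (0.5, 0.9), (2.0, 2.4), (3.0, 3.2)]
utts, flagged = segment_utterances(demo)
```

With these made-up timings the first two vocalizations form one utterance, the other two are separate utterances, and the 0.6 s silence is flagged for review.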
Fully resonant vowels were defined as "vowel-like utterances with at least two measurable formants and resonances above 1200 Hz, in addition to resonances in the lower frequency range." Canonical syllables were syllables that contained a nonglottal consonant, with transitions lasting 25-120 ms, and containing a fully resonant vowel. Syllables also had to be between 100 and 500 ms in duration to be classified as canonical. Marginal syllables were syllables with a consonant and fully resonant vowel that failed to meet any one of the criteria for canonical syllables. All coding was consistent with the criteria described by Rvachew et al. (2002).
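The decision rules above can be summarized as a small classifier. This is a hedged sketch, not the coding procedure itself, which relied on trained transcribers inspecting waveforms and spectrograms; the `Syllable` fields and the `classify` function are hypothetical names, and the sketch assumes the relevant acoustic judgments have already been made.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Syllable:
    """Hypothetical record of the judgments a transcriber makes per syllable."""
    has_fully_resonant_vowel: bool
    has_consonant: bool
    consonant_is_glottal: bool = False
    transition_ms: Optional[float] = None  # consonant-vowel transition duration
    duration_ms: float = 0.0               # whole-syllable duration


def classify(s: Syllable) -> str:
    """Assign one of the four categories described in the text."""
    if not s.has_fully_resonant_vowel:
        return "other"  # non-speech or quasi-resonant
    if not s.has_consonant:
        return "fully resonant vowel"
    meets_all_criteria = (
        not s.consonant_is_glottal
        and s.transition_ms is not None
        and 25 <= s.transition_ms <= 120
        and 100 <= s.duration_ms <= 500
    )
    # A consonant plus fully resonant vowel that misses any criterion is marginal.
    return "canonical syllable" if meets_all_criteria else "marginal syllable"
```

For example, a syllable with a nonglottal consonant, a 60 ms transition, and a 250 ms duration would be labeled canonical, whereas the same syllable with a 200 ms transition would be labeled marginal.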
The "non-speech" utterances included quasi-resonant vowels, squeals, cries, whispers, raspberries, and utterances with abnormal phonation. Non-speech utterances were excluded from the analysis. Because we were only assessing the prosodic, not the segmental, characteristics of the babbling, we also excluded the utterances with only fully resonant vowels that lacked consonants. The final dataset included all utterances with canonical and marginal syllables (average duration: 1.12 s). The results are similar for marginal and canonical syllables; hence we do not report them separately.
Utterances were extracted from the recordings, so that transcribers were blind to the language spoken by the interlocutor, and independently coded by two transcribers. A third transcriber, who was also blind to the language spoken by the interlocutor, adjudicated in case of disagreements. For an utterance to be included in the analysis, two out of three transcribers needed to agree that the utterance was speech-like. All speech-like utterances were then coded as mono-or multisyllabic. Monosyllabic utterances were those with only one fully resonant vowel within the utterance. Multisyllabic utterances contained two or more fully resonant vowels separated by consonants. Each syllable was also coded as open (V, CV) or closed (VC, CVC).
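To make the two dependent measures concrete, the following sketch applies the inclusion rule and the utterance-level coding to a handful of made-up utterances. The function names and the representation of an utterance as a list of syllable-shape labels are assumptions introduced for illustration; the data are hypothetical.

```python
from typing import Dict, List

OPEN_SHAPES = {"V", "CV"}
CLOSED_SHAPES = {"VC", "CVC"}


def is_speech_like(votes: List[bool]) -> bool:
    """True when at least two transcribers judged the utterance speech-like."""
    return sum(votes) >= 2


def code_utterance(syllable_shapes: List[str]) -> Dict[str, bool]:
    """Return the two binary outcomes for one utterance."""
    unknown = set(syllable_shapes) - OPEN_SHAPES - CLOSED_SHAPES
    if not syllable_shapes or unknown:
        raise ValueError(f"cannot code utterance: {syllable_shapes!r}")
    return {
        "multisyllabic": len(syllable_shapes) >= 2,
        "closed": any(shape in CLOSED_SHAPES for shape in syllable_shapes),
    }


def session_proportions(utterances: List[List[str]]) -> Dict[str, float]:
    """Proportion of utterances that are multisyllabic / contain a closed syllable."""
    if not utterances:
        raise ValueError("session has no utterances")
    coded = [code_utterance(u) for u in utterances]
    return {
        key: sum(c[key] for c in coded) / len(coded)
        for key in ("multisyllabic", "closed")
    }


# Hypothetical session with four speech-like utterances.
session = [["CV", "CV"], ["CVC"], ["CV", "CVC", "CV"], ["V"]]
props = session_proportions(session)
```

Note that an utterance counts as "closed" if it contains at least one closed syllable, which mirrors the binary coding used in the analysis rather than a count of closed syllables.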

Statistical analysis
As recommended by Jaeger (2008), we used mixed logistic regression models to analyze two binary outcomes: whether or not an utterance (a) was multisyllabic, or (b) had a closed syllable. In principle, bilingual infants could have produced utterances with multiple closed syllables, in which case our coding would underestimate infants' ability to produce closed syllables. In fact, fewer than 1% of the utterances had more than one closed syllable separated by silence (less than 450 ms long) (11 out of 1417, 0.8%); thus, because so few utterances had more than one closed syllable, we could not analyze the number of closed syllables as a continuous dependent variable. The binary coding allowed us to analyze both dependent variables using mixed logistic regression models. We modeled the log odds (logit) of each of the two outcomes, e.g., a bilingual infant producing a multisyllabic utterance, as a function of the language of the interlocutor (Spanish vs. English) weighted by the total number of utterances produced by that child in that specific session. The weighting was included to adequately represent the substantial variation in the speech output across infants (e.g., the number of utterances produced in one session ranged from 9 to 212). With this weighting, individual infants influenced the results in proportion to the quantity of their output.
Additionally, a random intercept for each infant, and a random slope for the language of the interlocutor, were also included. The random intercept allowed us to model variability across infants' anatomical development as well as speech motor control that might influence their overall ability to produce developmentally later-acquired monosyllabic utterances and closed syllables. The random slope was included to allow for differences in the degree to which each bilingual infant was able to alter her speech production in the two languages. This could either be due to absolute differences in the amount of input each infant received in Spanish and English in her daily life, or due to differences in the infants' uptake of that language input.
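One way to write the full model described above is sketched below. The notation is illustrative and is not taken from the original analysis: i indexes infants and j indexes utterances, Lang is an indicator for the language of the interlocutor (coded here, as an assumption, 1 for Spanish and 0 for English), and N is the number of utterances that infant produced in the session containing utterance j, entered as a fixed effect as in the final models reported in the Results.

```latex
\[
\mathrm{logit}\, P(y_{ij} = 1)
  = \beta_0 + \beta_1\,\mathrm{Lang}_{ij} + \beta_2\,N_{ij}
  + u_{0i} + u_{1i}\,\mathrm{Lang}_{ij},
\qquad
(u_{0i}, u_{1i})^{\top} \sim \mathcal{N}(\mathbf{0}, \Sigma)
\]
```

Here y is 1 when the utterance is multisyllabic (or, in the second model, contains a closed syllable), u_{0i} is the by-infant random intercept, and u_{1i} is the by-infant random slope for language of the interlocutor. Dropping the random slope removes two parameters from the 2 × 2 covariance matrix (one variance and one covariance).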
The final model was determined by backward stepwise comparison. Each effect was removed from the model, one at a time, and the log likelihood of the two resulting models that were in a subset relationship were compared using a Likelihood Ratio test. This was done to determine if the inclusion of factors significantly improved model fit. All analyses were implemented in R (R Core Development Team, 2013) using the lme4 package (Bates, Maechler, Bolker & Walker, 2015).
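The comparison of nested models can be illustrated with a short, self-contained sketch. The analyses themselves were run in R with lme4; the Python functions below are hypothetical helpers, the log-likelihood values in the usage example are invented, and the closed-form tail probabilities cover only the 1 and 2 degrees of freedom that arise in this paper.

```python
import math


def chi2_upper_tail(stat: float, df: int) -> float:
    """Upper-tail probability of a chi-square statistic (df = 1 or 2 only)."""
    if stat < 0:
        raise ValueError("chi-square statistic must be non-negative")
    if df == 1:
        # For df = 1, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        # For df = 2 the chi-square is exponential with mean 2.
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only covers df = 1 or 2")


def likelihood_ratio_test(loglik_full: float, loglik_reduced: float, df: int):
    """Return (statistic, p) for two nested models differing by df parameters."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_upper_tail(stat, df)


# Invented log-likelihoods, chosen only so that the statistic is about 3.96.
stat, p = likelihood_ratio_test(-700.00, -701.98, df=2)
```

With these invented values the statistic is about 3.96 on 2 degrees of freedom and p is about 0.14, so the additional parameters would not be retained at the 0.05 level.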

Results
The final sample included 1417 utterances consisting of 3835 syllables (see Table 1 for a breakdown by the language of the interlocutor). Across the two languages, half the multisyllabic utterances had just two syllables. There was no significant difference in the number of utterances bilingual infants produced in the Spanish (Average: 77; Range: 20-212) and the English session (Average: 65; Range: 9-187), t(9) = 1.6, p = 0.14, d = 0.20. There was also no significant difference in the number of utterances bilingual infants produced with their parents (Average: 70; Range: 9-187) and the research assistant (Average: 72; Range: 20-212), t(9) = −0.25, p = 0.8, d = −0.03, indicating that the infants were comfortable in the lab set-up. The results for length of utterance and syllable shape are presented in Figures 1 and 2, respectively.

Length of utterance
Overall, 79% of the utterances in the Spanish session (Range = 64-100) and 70% of the utterances in the English session (Range = 52-89) were multisyllabic. Further, 9 out of 10 bilingual infants produced more multisyllabic utterances with the Spanish-speaking interlocutor than with the English-speaking interlocutor.
To evaluate these differences, a mixed-effects logistic regression model was fitted to predict whether the utterance was multisyllabic. The final model included the fixed effects of the language of the interlocutor and the number of utterances produced in that session (Table 2). The random slope for language of interlocutor was not included in the final model because it did not significantly improve model fit, χ²(2) = 3.96, p = 0.14.
The significant positive intercept indicates that, overall, infants were likely to produce more multisyllabic than monosyllabic utterances. This is unsurprising, given previous research showing that, across languages, infants produce multisyllabic utterances earlier in development than monosyllabic ones (Davis & MacNeilage, 1995; Kern & Davis, 2010). Crucially, the language of the interlocutor significantly predicted the log odds of producing a multisyllabic utterance [χ²(1) = 22.0, p < 0.001]. The positive estimate for language of interlocutor indicates that infants were more likely to produce a multisyllabic utterance with the Spanish-speaking interlocutor.

Syllable shape
Overall, 11% of the utterances produced by the bilingual infants in each of the two language sessions contained closed syllables (Spanish Range = 3-31; English Range = 4-16). Out of 10 subjects, 6 produced more closed syllables with the English-speaking interlocutor than with the Spanish-speaking interlocutor. Although we obtained a detailed parent report of the percent input in Spanish for each child, we did not include it in the model as a fixed effect because the overall difference in proportion of multisyllabic words across the two sessions did not correlate with percent Spanish exposure.
To evaluate these differences, another mixed-effects logistic regression model was fitted to predict whether an utterance had a closed syllable. The final model included the fixed effects of the language of the interlocutor and the number of utterances produced in that session (Table 2). The random slope for language of the interlocutor was included because it significantly improved model fit [χ²(2) = 6.84, p = 0.03].
The significant negative intercept indicates that, overall, infants were more likely to produce utterances with open rather than closed syllables. Again, this is unsurprising, given previous research that infants produce open syllables earlier in development than closed syllables (Davis & MacNeilage, 1995; Kern & Davis, 2010). The language of the interlocutor, however, did not significantly predict the log odds of producing an utterance with a closed syllable [χ²(1) = 0.42, p = 0.51]. The finding that the random slope for the language of the interlocutor, but not the fixed effect of language of the interlocutor, was a significant contributor to model fit shows that there was a large amount of variability in the production of closed syllables in Spanish and English sessions across the bilingual infants. It also suggests that only some infants showed the ability to manipulate syllable shapes across languages.
Figure 2. Distribution of utterances with closed syllables produced by 12-month-old infants in the bilingual (n = 10), monolingual English (n = 10), and monolingual English with short-term exposure to Spanish (n = 10) groups.
To summarize, bilingual 12-month-olds produced more multisyllabic utterances with a Spanish- compared to an English-speaking interlocutor, a difference that is consistent with the prosody of the target language. Thus, the babbling of pre-lexical infants shows language-specific characteristics. Such language-specific differences in the production of the two languages are also consistent with the argument that there are separate representations of the two languages in bilingual infants as young as 12 months of age.
What bilingual 12-month-olds did not do, as a group, was to alter the shape of the syllables (i.e., open or closed) to match that of the language of their interlocutors. We can rule out the possibility that developmental immaturity severely limited the bilingual 12-month-olds' ability to alter the proportion of closed syllables as a function of their ambient language because, in previous research, both monolingual and bilingual infants have been reported to produce closed syllables among their earliest words at the same age (Kehoe, 2015;Lleó et al., 2003). It is, however, possible that specific exposure to Spanish, a language that has few closed syllables, limited the bilingual infants' production of closed syllables in English. We discuss this possibility later in the paper after comparing the proportion of closed syllables produced by bilingual Spanish-English infants and monolingual English infants.

Experiment 2
In Experiment 1, we showed that bilingual 12-month-olds altered the length of their utterances, but not syllable shape, as a function of the language of the interlocutor. In Experiment 2, we tested monolingual English-learning 12-month-olds using the same set-up as in Experiment 1, to determine whether any previous exposure to Spanish is necessary for infants to be able to systematically alter their speech production when interacting with a Spanish-speaking interlocutor.
In recent years, there have been several lines of research showing that phonetic imitation, also called phonetic convergence, is a powerful learning mechanism by which children might become native speakers. Newborn infants have been shown to imitate tongue and lip gestures that form the precursors of early speech (Meltzoff & Moore, 1983, 1997; Kugiumutzakis, 1999). By 4.5 months of age, infants are further able to imitate vowel sounds (Kuhl & Meltzoff, 1996). Even adults, without conscious control, alter their speech production to sound like their interlocutors in an immediate and automatic process (e.g., Babel, 2012; Goldinger, 1998; Nielsen, 2011). Experiment 2 allowed us to test whether 12-month-old monolingual English-learning infants can rapidly converge on the speech of their interlocutors, even in the absence of previous exposure to the language spoken by the interlocutors (in this case, Spanish).

Subjects
Ten English-learning monolingual 12-month-old infants (Average age: 367 days, Range: 354-384 days; four girls) participated in the study. Inclusion criteria for the subjects were identical to those in Experiment 1, with the exception of language input; infants were included only if they received at least 95% exposure to English (Average: 99%, Range: 95-100%). One additional infant was tested but excluded from analysis because she did not produce any canonical syllables in either session. MacArthur-Bates CDI short forms were administered to the parents of these infants as well. These results are summarized in Appendix A.

Design and Procedure
As in Experiment 1, infants came to the lab for one visit, and their babbling was recorded in two consecutive 30-minute sessions. The first session was always with the parent, in English. During this session, the Spanish-speaking research assistant was also present in the room. This allowed the infant to become comfortable with the lab set-up as well as the research assistant. The details of the recording set-up were identical to the one used in Experiment 1.

Coding
Coding of infant babbling was also identical to that in Experiment 1.

Results
The final sample included 705 utterances consisting of 1729 syllables (see Table 1 for a breakdown by the language of the interlocutor). Across the two languages, half the multisyllabic utterances had just two syllables. Monolingual English-learning infants produced significantly fewer utterances in the Spanish session while interacting with the research assistant (Average = 29; Range = 12-61) than in the English session with their parent. Recall that the bilingual infants tested in Experiment 1 were also unfamiliar with the laboratory set-up and the research assistants; however, they produced a comparable number of utterances in English and Spanish. So, this difference is unlikely to be due to the newness of the lab set-up or the research assistant alone.

Length of utterance
Overall, 63% of the utterances infants produced in the Spanish session (Range = 46-82) and 62% of utterances in the English session (Range = 43-79) were multisyllabic. Out of 10 infants, 6 produced more multisyllabic utterances with the Spanish-speaking interlocutor than with the English-speaking interlocutor (Figure 1). As in Experiment 1, the final mixed logit model included the fixed effects of language of the interlocutor and the number of utterances produced in that session (Table 2). The random slope for language of interlocutor was not included because it did not significantly improve model fit [χ²(2) = 0.14, p = 0.93]. The significant positive intercept indicates that, like bilingual infants, monolingual English-learning 12-month-olds were likely to produce more multisyllabic than monosyllabic utterances. Crucially, the language of the interlocutor did not significantly predict the log odds of producing a multisyllabic utterance [χ²(1) = 0.48, p = 0.49]. Thus, monolingual English-learning infants with no exposure to Spanish did not alter the length of their utterances as a function of the language of the interlocutor.

Syllable shape
Overall, 15% of the utterances produced by the monolingual English-learning 12-month-olds in the Spanish session (Range = 0-29) and 13% of the utterances produced in the English session (Range = 0-28) contained closed syllables. Out of the 10 infants, only 4 produced fewer closed syllables in the Spanish session than in the English session (Figure 2).
In the mixed logit model used to test this effect, the random slope for language of interlocutor was not included because it did not significantly improve model fit [χ²(2) = 0.005, p = 0.99]. As was found with the bilingual infants (Experiment 1), the significant negative intercept obtained with the monolingual English-learning infants indicated that, overall, the infants were likely to produce utterances with open rather than closed syllables. Additionally, the significant positive effect of total number of utterances shows that infants who produced more utterances were likely to produce more utterances with closed syllables. Most relevant for our hypothesis, the language of the interlocutor did not significantly predict the log odds of producing an utterance with a closed syllable [χ²(1) = 2.88, p = 0.09].
These results indicate that infants without prior exposure to Spanish did not alter their speech production to match the specific language characteristics of the interlocutor, either in utterance length or in syllable shape. Therefore, imitation alone cannot account for bilingual infants' ability to produce more multisyllabic utterances with the Spanish-speaking interlocutor when compared to the English-speaking interlocutor.

Experiment 3
In Experiment 3 we investigated whether exposure to a second language for a very brief period in infancy, roughly 5 hours, is sufficient to alter the prosodic characteristics of infant babbling. We know that exposure to a novel language for this period of time alters infants' perceptual processing of speech (Kuhl, Tsao & Liu, 2003; Conboy, Brooks, Meltzoff & Kuhl, 2015). In this experiment we asked whether short-term exposure to a new language results in changes in speech production as well.
Monolingual English-learning infants received short-term exposure by interacting with different native speakers of Spanish in twelve 25-minute play sessions over a 4- to 6-week period, for a total of about 5 hours. Infants were exposed to Spanish when they were between 9.5 and 10.5 months of age. Roughly one month after the cessation of exposure to Spanish, we recorded babbling in two free play sessions: first, when infants were interacting with their parent who spoke English; and then, when they were interacting with a Spanish-speaking research assistant.
We used play sessions for language exposure because studies showing benefits of language exposure typically involve social interactions. Social interaction has been previously documented to be crucial for learning vocal behavior, both in infants and in songbirds (see Goldstein & Schwade, 2010, for a review). In naturalistic play sessions, caregivers are more likely to respond to infants' productions of fully resonant vowels, and marginal and canonical syllables, than to their productions of other types of sounds (Gros-Louis, West, Goldstein & King, 2006). In turn, 8- to 10-month-olds who receive contingent responses from mothers produce greater numbers of marginal and canonical syllables (Goldstein, King & West, 2003). Thus, we expected language exposure via social interaction to be most effective in helping infants to develop language-specific production.
Experiment 3 was designed to address three issues. First, we were interested to know if 5 hours of exposure to a second language was enough for infants to alter their speech production as a function of the language of their interlocutor. Second, we were interested in whether sequential exposure to two languages has different consequences for speech production than simultaneous exposure. Third, we were also interested in the beginnings of bilingual acquisition. By comparing the babbling of infants with limited, recent exposure to a second language we can address whether infants differentiate their babbling from the onset of dual language exposure. If short-term exposure to Spanish facilitates language-specific speech production, we expected infants to produce significantly longer, more multisyllabic utterances, with fewer closed syllables in sessions with a Spanish-speaking interlocutor than with an English-speaking interlocutor.

Subjects
Ten 12-month-old infants (Average age: 369 days, Range: 342-381 days; six girls) from monolingual English-speaking homes participated in the study. As per parent report, none of them had exposure to Spanish prior to the experimental Spanish exposure, or during that time outside of the experimental exposure sessions. Five additional infants were tested but excluded from the final sample because they either completed only one babbling session (n = 2), or because they produced fewer than 20 speech-like utterances across both sessions (n = 3). Finally, MacArthur-Bates CDI long forms (Fenson, Marchman, Thal, Dale & Reznick, 2007) were administered to these infants in English. However, as these infants were part of a longitudinal study, the forms were administered about 40 days (Range = 12-50) before the babbling session. These results are also summarized in Appendix A.

Design
All 15 infants (including the 5 that were excluded) were enrolled in the longitudinal study when they were 9.5 months of age. The design of the exposure sessions was modeled after Kuhl et al.'s (2003) study of effects of short-term exposure to a second language on speech perception in infancy. After baseline behavioral and event-related potential (ERP) measures (ERP outcomes are not reported in this paper) the infants participated in 12 exposure sessions over a period of 4-5 weeks. In these exposure sessions, infants interacted with a Spanish-speaking research assistant for 25 minutes. These sessions consisted of the Spanish-speaking research assistant reading books to or playing with toys with the infants. Each infant was exposed to three different Spanish-speaking research assistants, at least three different times. Within two weeks after the end of the exposure sessions, post-exposure behavioral and ERP measures were taken for all infants (see , for details about the participants and the exposure sessions). Within the next two weeks, infants were invited to participate in the production recordings. The details of the recording set-up were identical to those used in Experiment 1, except that an Azden 221LT Dual Lavalier wireless microphone was used to record the infants' speech, and that the data for this experiment were collected in Seattle.

Results
The final sample included 895 utterances consisting of 1771 syllables (see Table 1 for a breakdown by the language of the interlocutor). Across the two sessions, two thirds of the multisyllabic utterances were disyllabic. Monolingual English-learning infants with short-term exposure to Spanish produced a comparable number of utterances when interacting with the Spanish-speaking research assistant (Average: 45; Range: 6-153) and their English-speaking parent (Average: 45; Range: 11-109), t(9) = 0.17, p = 0.87, d = 0.66. This is perhaps not surprising given the familiarity of the infants with the laboratory due to their repeated visits for the exposure sessions as well as their familiarity with Spanish-speaking interlocutors.

Length of utterance
Overall, monolingual English-learning 12-month-olds with short-term exposure to Spanish produced 67% multisyllabic utterances in the Spanish session (Range = 35-86) and 52% multisyllabic utterances in the English session (Range = 21-83). All 10 infants in this group produced more multisyllabic utterances with the Spanish-speaking interlocutor than with the English-speaking parent.
The final mixed logit model (Table 3) did not include the random slope for language of interlocutor because it did not significantly improve model fit [χ²(2) = 0.86, p = 0.65]. Crucially, the language of the interlocutor significantly predicted the log odds of producing a multisyllabic utterance [χ²(1) = 13.8, p = 0.0002]. The positive estimate for language of interlocutor indicates that infants were more likely to produce a multisyllabic utterance with the Spanish research assistant than with their parent. Thus, like the bilingual infants, monolingual infants with only 5 hours of previous social exposure to Spanish were also able to alter the length of the utterance to match that of their interlocutor.
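As a rough illustration of the scale on which a logit model operates, the two pooled percentages reported for this group (67% and 52%) can be converted to log odds. This is only a descriptive sketch: it ignores the random intercept for subject and the covariate for total number of utterances, so the numbers it prints are not the model estimates in Table 3. The helper name `logit` is introduced here for illustration.

```python
import math


def logit(p: float) -> float:
    """Return the log odds of a proportion p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


# Pooled proportions of multisyllabic utterances reported in the text.
p_spanish = 0.67  # session with the Spanish-speaking research assistant
p_english = 0.52  # session with the English-speaking parent

# Difference on the log-odds scale and the corresponding odds ratio.
log_odds_diff = logit(p_spanish) - logit(p_english)
odds_ratio = math.exp(log_odds_diff)

print(f"log-odds difference (Spanish - English): {log_odds_diff:.2f}")
print(f"odds ratio: {odds_ratio:.2f}")
```

A positive difference on this scale corresponds to higher odds of a multisyllabic utterance in the Spanish session, which matches the direction of the effect described for this group.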

Syllable shape
Overall, 15% (Range = 0-27) of the utterances produced by the monolingual English-learning 12-month-olds with short-term exposure to Spanish in the Spanish session, and 21% (Range = 0-61) of their utterances in the English session, contained closed syllables. Out of 10 infants, one did not produce any closed syllables in either session; thus, this subject had to be excluded from the mixed logit analysis. Of the remaining 9, 6 produced fewer closed syllables in the Spanish session than in the English session.
In the final mixed logit model, the random slope for language of interlocutor was not included because it did not significantly improve model fit [χ²(2) = 2.28, p = 0.32]. The language of the interlocutor did not significantly predict the log odds of producing an utterance with a closed syllable, according to an LR test, although there was a trend in this direction [χ²(1) = 3.0, p = 0.08].
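The p-values attached to the likelihood-ratio (LR) statistics in Experiment 3 can be checked by hand, because the upper tail of a chi-squared distribution has a closed form for 1 and 2 degrees of freedom. The sketch below uses only those two closed forms; the function name `chi2_sf` and the labels are assumptions made for illustration, and the code does not refit any model.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-squared variable, for df = 1 or 2 only."""
    if x < 0:
        raise ValueError("x must be non-negative")
    if df == 1:
        # A chi-squared(1) variable is a squared standard normal.
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        # A chi-squared(2) variable is exponential with mean 2.
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


# LR statistics reported for Experiment 3 (statistic, degrees of freedom).
reported = [
    ("length of utterance, random slope", 0.86, 2),
    ("length of utterance, language of interlocutor", 13.8, 1),
    ("syllable shape, random slope", 2.28, 2),
    ("syllable shape, language of interlocutor", 3.0, 1),
]

for label, stat, df in reported:
    print(f"{label}: chi2({df}) = {stat}, p = {chi2_sf(stat, df):.4f}")
```

For other degrees of freedom a statistics library would be needed; the two cases above cover every LR test reported in this section.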

Perceptual evaluation of speech produced by infants with short-term exposure to Spanish
To rule out the possibility that the infants with short-term exposure to Spanish had simply memorized a few words in Spanish, we asked six phonetically-trained adults who were native speakers of either English or Spanish to classify the utterances as Spanish or English. If adults were able to recognize words in their native language, we expected them to perform above chance. However, the judgments of these adults were at chance (Average: 50%, Range: 46-55). Thus, it is unlikely that these results are simply due to infants producing a few known words in the Spanish or English session.
In summary, monolingual English-learning infants with just 5 hours of exposure to Spanish babbled differently with Spanish- versus English-speaking interlocutors. These results confirm that infants altered the prosodic properties of their speech to resemble the language of their interlocutor. Together with the results from Experiment 1, our data demonstrate that 12-month-olds with exposure to two languages differentiate their speech production based on the language spoken by the interlocutor. Moreover, they do so even with as little as 5 hours of exposure to one of those two languages.

Comparing bilingual, monolingual and monolingual infants with short-term exposure to Spanish
In this section we directly compare the babbling of the three groups of 12-month-old infants: bilinguals, monolinguals, and monolinguals with short-term exposure to Spanish. Before presenting the results of the statistical comparison between the groups, we compared the groups on percentile scores based on the number of words understood in English measured by the CDI. Recall that the bilingual and monolingual English-learning infants tested in Experiments 1 and 2 were administered the short forms whereas the monolingual English-learning infants with short-term exposure from Experiment 3 were administered the long forms. Additionally, these long forms were obtained about a month before the babbling sessions. Nonetheless, independent sample t-tests comparing the CDI percentiles found no significant differences between the bilingual infants and the monolingual English-learning infants with (t(18) = −1.4, p = 0.2) or without short-term exposure to Spanish (t(18) = 0.87, p = 0.4); there was also no significant difference in CDI percentiles between the monolingual English-learning infants with and without short-term exposure to Spanish (t(18) = −1.8, p = 0.1). Thus, any differences across groups are not likely to be due to differences in receptive vocabulary. We do not have information about word production for the infants with short-term exposure to Spanish. The monolingual group without exposure to Spanish had an average production raw score of 2 (range 0-8) whereas the bilingual group had an average production score of 1.8 in English (range 0-7) and 1.25 (range 0-5) in Spanish. That is, the production scores on the CDI short forms for the monolingual group without exposure to Spanish and the bilingual group were completely overlapping.
To compare the three groups using a mixed logit model, we included two dummy variables: Bilingual (bilingual = 1; monolingual = 0; monolingual with short-term exposure = 0) and Short-term exposure to Spanish (bilingual = 0; monolingual = 0; monolingual with short-term exposure = 1). This was in addition to the fixed effects of total number of utterances and language of the interlocutor and its interaction with each of the dummy variables. Finally, two random effects, a random intercept for subject and a random slope to account for the variable effect of language of interlocutor on each subject, were also included as predictors in the mixed logit model.
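The dummy coding described above can be sketched as a function that builds the fixed-effect part of one row of the design matrix. Several details here are assumptions made for illustration and are not stated in the text: the function and column names, the coding of language of interlocutor as Spanish = 1 and English = 0, and the inclusion of main-effect columns for the two dummy variables. The random intercept and random slope are not represented.

```python
from typing import Dict

GROUPS = ("bilingual", "monolingual", "short_term_exposure")


def design_row(group: str, spanish_interlocutor: bool, n_utterances: int) -> Dict[str, int]:
    """Build the fixed-effect predictors for one utterance (hypothetical layout)."""
    if group not in GROUPS:
        raise ValueError(f"unknown group: {group!r}")

    # Two dummy variables; monolinguals without exposure are the reference group.
    bilingual = int(group == "bilingual")
    exposure = int(group == "short_term_exposure")

    # Assumed coding for language of interlocutor: Spanish = 1, English = 0.
    language = int(spanish_interlocutor)

    return {
        "intercept": 1,
        "total_utterances": n_utterances,
        "language": language,
        "bilingual": bilingual,
        "exposure": exposure,
        # Interaction terms: nonzero only for that group's Spanish session.
        "bilingual_x_language": bilingual * language,
        "exposure_x_language": exposure * language,
    }


for g in GROUPS:
    print(g, design_row(g, spanish_interlocutor=True, n_utterances=40))
```

Under this coding, the coefficient on `language` describes the reference (monolingual) group, and each interaction coefficient describes how much the effect of the interlocutor's language differs in that group from the monolingual group.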

Length of utterance
The random slope for language of interlocutor was not included in the final mixed logit model (Table 4) because it did not significantly improve model fit [χ²(2) = 2.56, p = 0.28]. Crucially, the interaction of both dummy variables with language of interlocutor significantly improved model fit [Bilingual: χ²(1) = 8.94, p = 0.003; Exposure: χ²(1) = 6.49, p = 0.01]. Thus, the bilingual group and the monolingual group with short-term exposure to Spanish differed from the monolinguals in that only the infants in the two former groups altered the length of their utterance as a function of the language of the interlocutor.
The only difference in utterance length between the bilingual and monolingual group with short-term exposure to Spanish was that bilingual infants produced a greater proportion of multisyllabic utterances overall. This was confirmed by another mixed effect model where the monolingual group with short-term exposure to Spanish was coded as the reference group [Estimate = 0.93, SE = 0.31, z-value = 3.0, p-value = 0.003; χ²(2) = 8.94, p = 0.01].

Syllable shape
In the final mixed logit model predicting syllable shape, the random slope for language of interlocutor was not included because it did not significantly improve model fit [χ²(2) = 4.20, p = 0.12]. As in all previous analyses of syllable shape as a dependent variable, the three groups did not differ in their ability to alter syllable shape as a function of the language of the interlocutor.
A follow-up analysis showed only a marginal difference between the bilingual group and monolingual group with short-term exposure to Spanish: infants in the bilingual group produced a smaller proportion of closed syllables overall. This was confirmed by another mixed effect model where the monolingual group with short-term exposure to Spanish was coded as the reference group [Estimate = −0.85, SE = 0.33, z-value = −2.56, p-value = 0.01; χ²(2) = 5.82, p = 0.05].
In summary, compared to monolingual English-learning 12-month-olds, bilingual infants and infants with short-term exposure to Spanish altered the length of their utterances as a function of the language of the interlocutor. There was also a trend towards altering syllable shape, i.e., producing more closed syllables, as a function of the language of the interlocutor, but only in monolingual 12-month-olds with short-term exposure to Spanish. An explicit comparison between monolingual infants with short-term exposure to Spanish and bilingual infants demonstrated that the latter produced more multisyllabic utterances overall, while the former showed a trend towards a greater number of closed syllables overall. Recall that the groups had comparable receptive vocabularies as measured by CDI percentiles. Thus, qualitative differences in the effects of dual language exposure on speech production, based on the extent and timing of exposure to the second language, are evident even in infancy.

General discussion
In three experiments, we tested whether exposure to a second language is necessary and sufficient to alter speech production of pre-lexical infants. Our results show that bilingual Spanish- and English-learning, as well as monolingual English-learning 12-month-olds with merely 5 hours of exposure to Spanish, altered their babbling as a function of the language of their interlocutor. Specifically, they produced significantly more multisyllabic utterances when interacting with a Spanish-speaking interlocutor compared to an English-speaking interlocutor. Monolingual English-learning infants without this exposure did not alter their babbling to match the prosody of the language spoken by their interlocutor. Thus, exposure to a second language, whether short- or long-term, is necessary and sufficient to alter the speech production of infants in the first year of life. We discuss the implications of these findings on three aspects of language acquisition. First, we show that infants in two groups (bilingual as well as monolingual with short-term exposure to Spanish) were able to differentiate their babbling when interacting with Spanish and English interlocutors. Moreover, these differences were consistent with the prosodic properties of Spanish and English. Our results add to the growing body of research supporting the idea that prosodic properties of babbling reflect the characteristics of the ambient language. Given that infants have been previously reported to alter the prosodic shape of their first words in response to their language input (e.g., Vihman, 2016), our results also support the hypothesis that there is continuity between early babbling and later speech production. These data are compelling because we show this within, rather than across, infants.
Second, not only did infants' babbling reflect the characteristics of the ambient language, the results also showed that infants flexibly navigated between the two language modes depending on the language spoken by the interlocutor. Early differentiation in speech production is consistent with the development of separate representations of the bilinguals' two languages. Given suggestions that there are as many, if not more, children growing up bilingual than monolingual (Crystal, 1997; Tucker, 1998), our results lay the groundwork for characterizing the limits of the language faculty. Specifically, our findings showcase the behavioral plasticity of bilingual infants in speech production.
We showed that, as a function of the language of their interlocutor, infants in the bilingual and short-term exposure group altered the length but not syllable shape of their utterances. As mentioned previously, we can rule out the possibility that the articulatory complexity of closed syllables prevented infants from being able to use syllable shape to differentiate between languages because, in previous reports, monolingual and bilingual infants at about the same age have been reported to manipulate syllable shape while producing their first words (e.g., Lleó et al., 2003). Additionally, recall that monolingual English-learning infants with short-term exposure to Spanish showed a trend towards producing more complex syllables when interacting with an English interlocutor rather than a Spanish one.
One possibility is that perhaps bilingual infants hear more adults who themselves are non-native speakers of English than infants from monolingual English-speaking homes (Bosch & Ramon-Casas, 2011). Because of this, bilingual infants could hear more variable productions of closed syllables in their input compared to their monolingual English-learning peers. In turn, this could result in fewer closed syllables being produced by bilingual infants (11%) compared to monolingual infants with short-term exposure to Spanish (21%). However, this explanation cannot account for why monolingual infants without short-term exposure to Spanish also produced fewer closed syllables (13%) than the monolingual infants with short-term exposure to Spanish. Alternately, it is possible that the infants who were tested in Los Angeles (bilinguals as well as monolinguals without exposure to Spanish) had more experience with non-native (especially, Spanish-speaking) speakers of English, and consequently, less exposure to closed syllables, than infants who were tested in Seattle, and that these previous listening experiences altered the infants' productions of closed syllables.
A more promising proposal is that of Kirk and Demuth (2006), that children produce coda consonants earlier at the ends of monosyllables than disyllables because monosyllables are typically longer, and thus require less rapid movement of articulators (see White & Turk, 2010, for a nuanced discussion of polysyllabic shortening). Consistent with this proposal, bilingual infants who produced the smallest percentage of monosyllables in English (30%) also produced the fewest utterances with closed syllables (11%), followed by monolingual infants with no exposure to Spanish (monosyllables = 38%, closed syllables with English interlocutor = 13%), and finally the monolingual infants with exposure to Spanish (monosyllables = 48%, closed syllables with English interlocutor = 21%).
We can then ask why bilingual infants were likely to produce significantly more multisyllabic utterances compared to monolingual infants with short-term exposure to Spanish. One intriguing possibility is that this difference stems from the extent of interaction between the two language systems of the bilingual and monolingual infants with short-term exposure to Spanish. Mutual systemic influences of the two languages of a bilingual learner have been previously documented and are thought to manifest as differences in rate of development or transfer (Paradis & Genesee, 1996). A predominance of multisyllabic utterances, as seen in the bilingual infants, is consistent with either a difference in the rate of acquisition of monosyllabic utterances in English or a transfer of open, multisyllabic utterances from Spanish. Future research is necessary to distinguish between the two accounts. In either case, the evidence from speech production in pre-lexical bilingual infants is consistent with interdependent development of two language systems (see also Lleó, 2006; Kehoe, Trujillo & Lleó, 2001; Lleó et al., 2003 for similar findings in older children).
In contrast, the monolingual English-learning infants with short-term exposure to Spanish did not show the same evidence of mutual systemic influence of the two languages. Note that these infants had sequential exposure to two languages. In analogous cases of sequential exposure to a second language, adult (and child) learners of a second language have not been shown to form distinct phonological systems for the two languages (for reviews see Best, 1995; Flege, 1995; Guion, 2003). In contrast, monolingual English-learning infants with early, short-term exposure to Spanish did develop two distinct representations for the two languages, as demonstrated by their ability to alter their speech production to match the prosody of the interlocutor. However, these infants did not show any evidence of transfer, i.e., interaction between the two systems. This leads us to an intriguing, hitherto novel hypothesis that bilingual infants might progress through a period (however brief and transitory) when their two language systems are independent, before showing evidence for interactions.
Finally, our results provide compelling evidence for the immediate benefits of early exposure to a second language on speech production. We have known for some time that, compared to novice adult learners of Hindi (Tees & Werker, 1984) or Spanish (Au, Knightly, Jun & Oh, 2002; Au, Oh, Knightly, Jun & Romo, 2008), adults who either heard or spoke the language as a child have better perception and production abilities when relearning that childhood language later in life. Thus, exposure to a language within the first year of life provides sustained benefit to learners, helping infants become native listeners. This is evident most strikingly in cases where that exposure is discontinued (Oh et al., 2010). More recent are findings that benefits of early exposure begin to be observed almost immediately. For example, infants' speech perception abilities change with as little as 5 hours of exposure to a second language (Kuhl et al., 2003). Here we demonstrate that infants' speech production also changes with as little as 5 hours of exposure to a second language in the first year of life, even a month after exposure has been discontinued. In studies showing the effects of short-term exposure on speech perception and production, a supporting social context accompanying the exposure seems essential. When compared explicitly, infants are only able to exploit regularities in the second-language input when multiple speakers interact with them directly, but not when the same speech is presented through audio or DVD recordings (Kuhl et al., 2003). The effects of presenting infants with a few minutes of decontextualized distributions of synthetic stimuli in controlled laboratory settings, devoid of social interaction, as in artificial language experiments (Maye & Weiss, 2003; Maye, Werker & Gerken, 2002), seem to be much more variable (Cristia, 2018).
We know from cross-species research that this is likely because social interactions provide infants with contingent feedback, which they are able to exploit (Goldstein & Schwade, 2010). Further, the form of caregivers' responses has also been shown to influence infants' babbling. When caregivers respond with contingent feedback to infants' productions of fully resonant vowels, the infants' subsequent production of fully resonant vowels increases; when the feedback is provided to infants' CVs instead, infants' production of CVs increases (Goldstein & Schwade, 2008). Perhaps caregivers respond differentially to elements of infant babbling that do and do not conform to their native language, and, in turn, infants selectively increase different forms of babbling in response to their caregivers' contingent feedback. This is a hypothesis that needs to be tested.
What our results show is that infants do not alter the length and syllable shape of their babbling immediately in response to caregivers' feedback. Recall that monolingual English-learning infants without exposure to Spanish did not alter the length of their utterance in response to the Spanish, compared to the English, interlocutor. Our results also show that they do so when they have had about 5 hours of previous exposure to Spanish, even when they are tested one month after the exposure has been discontinued. How much longer the effects of short-term exposure endure is also a question for future research.
In conclusion, we investigated whether infants' speech production exhibits language-specific characteristics reflecting the ambient language of their interlocutor. We did this by comparing babbling data from bilingual Spanish- and English-learning infants (Experiment 1), and monolingual English-learning infants with and without short-term exposure to Spanish (Experiments 3 and 2, respectively) while they were interacting with Spanish- or English-speaking adults. With data from these three groups we show that, by 12 months, infants can alter their babbling to match that of Spanish and English interlocutors, but only if they have had at least some previous social exposure to both languages.