Heritage language exposure impacts voice onset time of Dutch–German simultaneous bilingual preschoolers

This study assesses the effects of age and language exposure on VOT production in 29 simultaneous bilingual children aged 3;7 to 5;11 who speak German as a heritage language in the Netherlands. Dutch and German have a binary voicing contrast, but the contrast is implemented with different VOT values in the two languages. The results suggest that bilingual children produce ‘voiced’ plosives similarly in their two languages, and these productions are not monolingual-like in either language. Bidirectional cross-linguistic influence between Dutch and German can explain these results. Yet, the bilinguals seemingly have two autonomous categories for Dutch and German ‘voiceless’ plosives. In German, the bilinguals’ aspiration is not monolingual-like, but bilinguals with more heritage language exposure produce more target-like aspiration. Importantly, the amount of exposure to German has no effect on the majority language's ‘voiceless’ category. This implies that more heritage language exposure is associated with more language-specific voicing systems.


Introduction
Bilingual children's realization of the voicing contrast has received substantial attention in language acquisition research during the past two decades, and this research has consistently revealed differences from monolingual children's VOT production (Deuchar & Clark, 1996; Fabiano-Smith & Bunta, 2012; Johnson & Wilson, 2002; Kehoe, Lleó & Rakow, 2004; Khattab, 2000; McCarthy, Mahon, Rosen & Evans, 2014). These studies have been conducted on mainly small samples of bilinguals immersed in an aspiration language (i.e., English, except for Kehoe et al., 2004 on German) with a prevoicing language as the minority language. Although these studies used adequate statistical analyses, they were not designed to statistically assess the effects of age or language exposure, which are important factors in monolingual and bilingual language acquisition (Armon-Lotem & Ohana, 2017; Gathercole & Hoff, 2007; Gathercole & Thomas, 2009; Mayr & Siddika, published online 17 October, 2016; Unsworth, 2013; Yu, De Nil & Pang, 2015). To determine to what extent age and language exposure can explain bilinguals' linguistic behaviors, samples of participants must be large enough to allow for association analyses. Furthermore, it is essential to the field of early bilingual phonological acquisition to determine whether previous findings on minority languages acquired in an English-dominant environment extend to other acquisition settings and languages (Kehoe, 2015). The present study is the first to address these outstanding issues in a sample of Dutch-German bilingual preschoolers that is large enough to allow for association analyses relating age and language exposure to bilingual children's speech production.
* We thank Arjan Cuppen, Lottie Gort, Eva Koch, Jana Loh and Rianne Vlaar for their help collecting the data and Franziska Klier, Ann-Katrin Ohlerth and Natascha Roos for their help annotating the data. We are also grateful to three anonymous reviewers for insightful comments and suggestions on an earlier version of this paper. Supplementary material can be found online at http://dx.doi.org/10.1017/S1366728917000116
Cross-linguistic influence (CLI) can cause bilingual speech to be differential or not 'native-like' (see Kupisch & Rothman, published online June 22, 2016 for a critical perspective on terminology), meaning that bilinguals produce speech sounds differently from monolinguals. When a bilingual's speech differs from a monolingual's speech, it may be perceived as foreign-accented (Flege, 1984; Major, 1987; Riney & Takagi, 1999; Sancier & Fowler, 1997; Schoonmaker-Gates, 2015). Such differential bilingual speech can still be 'language-specific' if similar sounds are produced differently in the two languages. Conversely, CLI may have facilitative effects on bilinguals' language development and accelerate their acquisition of certain linguistic structures compared to monolingual acquisition (Grech & Dodd, 2008; Mayr, Howells & Lewis, 2015; Tamburelli, Sanoudaki, Jones & Sowinska, 2015). Acceleration can occur when one of the bilingual's languages contains a difficult and/or infrequent structure that is more frequent in the other language. Practice with such a structure in one language may have facilitative effects in the other language.
Bilinguals acquire two languages in the same amount of time in which a monolingual acquires a single language, resulting in overall less exposure and therefore less experience with each language relative to monolingual acquisition (Gathercole & Thomas, 2005; Unsworth, 2008; Unsworth, Argyri, Cornips, Hulk, Sorace & Tsimpli, 2014). Reduced exposure likely results in slower acquisition of linguistic structures that are distinct between the bilingual child's two languages. As a result of this reduced exposure, bilinguals may reach certain developmental stages later than their age-matched monolingual peers.
To date, there is no framework that specifically targets the speech of young simultaneous bilingual children. However, models of the speech of sequential bilingual adults and monolingual children are available and can be extended to account for CLI and language exposure effects in simultaneous bilingual children. The SPEECH LEARNING MODEL (SLM; Flege, 1995) originally focuses on age of acquisition-related constraints on native-like production of L2 sounds, and can partially account for CLI in simultaneous bilinguals' speech (Fabiano-Smith & Bunta, 2012;Fabiano-Smith & Goldstein, 2010;Gildersleeve-Neumann & Wright, 2010). The SLM assumes that many production errors in the L2 are rooted in sound perception, and puts forward seven hypotheses of the L2 learner's sound perception, sound processing and storage, and sound production. Two of these hypotheses can be extended to the sound production of simultaneous bilingual children.
The first hypothesis, henceforth the 'Age of Acquisition Hypothesis', states that increasing age of acquisition goes hand in hand with a decreasing ability to distinguish L1 and L2 sounds. This hypothesis inversely suggests that an early age of acquisition promotes the ability to discriminate between sounds, resulting in less CLI and more language-specific acquisition of speech sounds. In the case of simultaneous bilingual acquisition, both languages are acquired in parallel from birth, and the Age of Acquisition Hypothesis can be extended to suggest that simultaneous bilingual children may be less prone to CLI and are likely to acquire native-like sounds in both of their languages.
The second hypothesis, henceforth the 'Equivalence Classification Hypothesis' (cf. Flege, 1987), formulates an exception to the Age of Acquisition Hypothesis. Equivalence classification is one form of CLI; the hypothesis proposes that the formation of new phonological categories may be blocked if an L2 sound overlaps with a similar L1 position-sensitive allophone. In the context of simultaneous bilingual acquisition, equivalence classification may cause a bilingual child to acquire only one category for two sounds that she perceives to be alike in the two languages. Such category mergers are natural language change processes that normally unfold over time in language communities (Romaine, 1978; Wells, 1982). In sum, the SLM can account for differential sound production by simultaneous bilinguals as a result of CLI in the perception and category formation of sounds that are perceptually similar between the two languages. This model does not ascribe the bilinguals' differential sound production to differences in language exposure between bilinguals and monolinguals.
The second model that can be extended to the speech of young bilinguals is the A(RTICULATORY)-MAP model (McAllister Byun, Inkelas & Rose, 2016), which explains differences between (monolingual) child and adult speech through anatomical and motor control differences. The model proposes that experience-based information about previous articulator movements and the resulting acoustic outputs is stored in episodic memory. Two grammatical constraints draw on these episodic traces: ACCURACY formalizes the pressure to match adult speech production, while PRECISION formalizes the pressure to produce stable and well-practiced realizations, even if they do not perfectly match the adult-target. Interactions between accuracy, precision, and other relevant constraints determine a child's actual speech production. The A-Map model explicitly predicts that children's speech production becomes increasingly precise with more production experience, leading to a decreasing deviation from the adult-target. Bilingual children necessarily gain less production experience than monolinguals with sounds that occur in only one of their languages.
The A-Map model extended to bilingual children can account for delays in bilinguals' production of articulatorily complex sounds that are limited to one of their languages. The bilinguals' reduced production experience in combination with the precision constraint explains why bilinguals take longer than age-matched monolinguals to reach the adult-target for such sounds. However, bilingual children may gain more production experience than monolinguals with sounds that exist in both of the bilingual's languages, but with differing frequency. In these cases, the bilingual A-Map encompasses more traces of motor-actions and acoustic outcomes in episodic memory than the monolingual A-Map, which may accelerate target-like production of that structure. In sum, the A-Map model extended to simultaneous bilingual children's speech offers a framework that captures how different production experience across two languages delays the acquisition of unshared speech sounds. Linked to production experience, the extended A-Map model can also account for acceleration effects in bilinguals' speech through motor practice accumulated in the other language, which can be interpreted as positive CLI.
Irrespective of these theoretical models, disentangling CLI and language exposure as possible reasons for linguistic differences between bilinguals and monolinguals is inherently difficult because acquiring two languages necessarily reduces the exposure to each language. It is possible, however, to assess language exposure effects by relating linguistic differences within a bilingual population to individual differences in language exposure, provided that the sample is large enough to allow for association analyses. Once the exposure effects have been assessed, one can establish which findings require an additional explanation in terms of CLI. The present study addressed these issues with regard to VOICE ONSET TIME (VOT).

Voice onset time
Voice onset time (VOT) is an acoustic cue that contributes to the phonological distinction between 'voiced' and 'voiceless' plosives, such as /b/ and /p/. VOT is the duration of the interval between the release of a plosive's burst and the onset of vocal fold vibration, and is the most important cue to voicing (Abramson & Lisker, 1973; Cho & Ladefoged, 1999; Van Alphen, 2004; Van Alphen & Smits, 2004). Although many of the world's languages have a two-way contrast between 'voiced' and 'voiceless' plosives, this phonological contrast can have different phonetic implementations. As schematized in Figure 1, the VOT continuum can be divided into three phonetic categories: prevoicing, short lag, and aspiration. Languages like Dutch, Arabic, French, Japanese, Spanish, and Sylheti contrast 'voiced' and 'voiceless' plosives by means of prevoicing vs. short lag VOT. Languages like German and English implement the voicing contrast with short lag VOT vs. aspiration. Language-specific VOT values within these ranges may differ cross-linguistically.
The 0 ms point in a VOT continuum denotes the plosive's burst release. Vocal fold vibration that starts prior to burst release falls into the prevoicing range. Prevoiced plosives are phonologically and phonetically described as 'voiced', and occur for example in Dutch (Deighton-Van Witsen, 1976;Lisker & Abramson, 1964;Van Alphen & Smits, 2004). If the onset of voicing falls between 0 ms and approximately 20-35 ms after the burst release, the plosive falls within the short lag VOT range. Phonetically, such sounds can be described as devoiced, but phonologically, they can be classified as 'voiceless' or 'voiced', depending on the language. In Dutch, plosives produced with short lag VOT are considered the 'voiceless' counterpart of prevoiced plosives. In other languages, like German, short lag plosives represent the majority of 'voiced' plosives. Although not required in German, adults sometimes prevoice even up to around 50% of their 'voiced' plosives (Fischer-Jørgensen, 1976;Hamann & Seinhorst, 2016;Jessen, 1998;Kohler, 1977;Stock, 1971).
If the onset of voicing exceeds the 20-35 ms upper limit of short lag VOT, the plosive falls within the aspiration range on the VOT continuum. These aspirated plosives are always phonologically 'voiceless' and represent the 'voiceless' counterparts to 'voiced' short lag plosives in German. The duration of aspiration typically averages between 45 and 70 ms in adult native speakers of German (Fischer-Jørgensen, 1976; Haag, 1979; Jessen, 1998; Neuhauser, 2011).
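The three-way division of the VOT continuum and its language-specific phonological interpretation can be illustrated with a short sketch. This is not part of the study's analysis. The function names are hypothetical, and the single 30 ms cutoff is an assumption chosen from within the approximate 20-35 ms upper limit of short lag VOT given above.

```python
# Illustrative sketch only: names and the 30 ms cutoff are assumptions,
# picked from within the approximate 20-35 ms short lag upper limit.

# Phonological status of each phonetic category, following the text:
# prevoiced plosives are 'voiced', aspirated plosives are 'voiceless',
# and short lag is 'voiceless' in Dutch but 'voiced' in German.
PHONETIC_TO_PHONOLOGICAL = {
    "Dutch": {"prevoicing": "voiced", "short lag": "voiceless", "aspiration": "voiceless"},
    "German": {"prevoicing": "voiced", "short lag": "voiced", "aspiration": "voiceless"},
}


def classify_vot(vot_ms: float, short_lag_upper_ms: float = 30.0) -> str:
    """Return the phonetic VOT category for a value in ms (0 = burst release)."""
    if vot_ms < 0:
        return "prevoicing"  # voicing starts before the burst release
    if vot_ms <= short_lag_upper_ms:
        return "short lag"
    return "aspiration"


def phonological_status(vot_ms: float, language: str) -> str:
    """Map a VOT value to 'voiced' or 'voiceless' for Dutch or German."""
    return PHONETIC_TO_PHONOLOGICAL[language][classify_vot(vot_ms)]
```

Under these assumptions, a token with a VOT of 15 ms would count as 'voiceless' in Dutch but as 'voiced' in German, which is the ambiguity of the short lag range that bilingual children have to resolve.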
Research on the acquisition of aspiration found that English-speaking children between 0;6 and 4;6 develop a voicing contrast by 2;6, which is similar to the contrast of older children, but not yet adult-like (Kewley-Port & Preston, 1974). Longitudinal data from English-speaking children starting at age 1;6 to just after 2;0 revealed three acquisition stages (Macken & Barton, 1980a): 1) 'voiced' and 'voiceless' plosives have short lag VOT; 2) 'voiced' and 'voiceless' plosives have a covert contrast within the short lag range that is presumably not perceived by adults; and 3) 'voiceless' plosives have adult-like aspiration. Other research found that English-speaking two-year-olds (2;6-3;0) and six-year-olds (6;1-6;11) produce on average shorter aspiration in 'voiceless' plosives than adults despite producing an overt and reliable voicing contrast (Zlatin & Koenigsknecht, 1976). Data on languages other than English are sparse, but one case study showed that a German-speaking child aged 1;0 to 2;2 initially aspirated 50% of 'voiceless' plosives and only reliably aspirated by age 2;0 (Kager et al., 2007). The finding that children commonly produce aspiration values diverging from adults can be related to still-developing control of timing between the plosive's burst release and the onset of vocal fold vibration (Barton & Macken, 1980;Kewley-Port & Preston, 1974;Koenig, 2000;Macken & Barton, 1980a;Menyuk & Klatt, 1975;Whiteside, Dobbin & Henry, 2003;Yu et al., 2015;Zlatin & Koenigsknecht, 1976). In sum, children acquiring an aspiration language overtly distinguish 'voiceless' from 'voiced' plosives by approximately two years of age, although the length of aspiration may still be different from adults.
Research on the acquisition of prevoicing found that Dutch-speaking children aged between 1;0 and 1;2 prevoice only 30% of all 'voiced' plosives. The percentage of prevoiced 'voiced' plosives increases to 60% by the end of their third year of life (Kager et al., 2007). The majority of Italian-speaking children aged between 1;6 and 1;9 do not contrast plosives by voicing and instead produce the majority of plosives within the short lag VOT range (Bortolini et al., 1995). French-speaking children aged between 1;9 and 2;8 generally avoid 'voiced' plosives and prevoice less than 2% of all produced plosives (Allen, 1985). Longitudinal data of Spanish-speaking children aged 1;7 to 2;1 and at 3;10 revealed that even at the age of almost 4, children still do not reliably produce prevoicing for 'voiced' plosives (Macken & Barton, 1980b). Instead, 'voiced' plosives are spirantized (that is, produced as fricatives) to make a voicing distinction. Between 2;6 and 4;6, Canadian French-speaking children acquire a voicing contrast that nevertheless differs phonetically from adult ranges in that they produce prevoicing less reliably than adults (MacLeod, 2016). Arabic-speaking children produce prevoicing inconsistently at 5;4 and even 7;4, but seem to have acquired adult-like prevoicing at 10;3 (Khattab, 2000). In sum, prevoicing poses a challenge to young children and non-target-like production persists in school-aged children. Table 1 summarizes details about the studies on monolingual children's VOT development.

VOT development in bilingual children
Bilingual children who simultaneously acquire a prevoicing language like Dutch and an aspiration language like German have to acquire plosive categories from both languages. They further need to resolve the phonological ambiguity of the short lag VOT range that corresponds to 'voiceless' plosives in Dutch, and to 'voiced' plosives in German. During the last two decades, researchers turned to the question of how children's VOT develops when they grow up with two languages that differ in their implementation of voicing (Deuchar & Clark, 1996; Fabiano-Smith & Bunta, 2012; Johnson & Wilson, 2002; Kehoe et al., 2004; Khattab, 2000; Mayr & Siddika, published online 17 October, 2016; McCarthy et al., 2014; Table 2 provides an overview of the investigated languages, environments and participants). All these studies report on the acquisition of a majority language that has aspiration and a heritage language that has prevoicing, and most report data of the bilinguals' two languages. The results are variable, as will be discussed in more detail below, with a general emergent pattern that aspiration is acquired early and that prevoicing is generally avoided, which resembles the monolingual acquisition pattern. Deuchar and Clark (1996) investigated a bilingual English-Spanish speaking child in England recorded at 1;7, 1;11 and 2;3. During this period, the child acquired the English voicing distinction between short lag VOT and aspiration, but produced only short lag plosives in Spanish, which is similar to monolingual Spanish-learning children of this age. Khattab (2000) reported data from three bilingual English-Arabic speaking children in England aged 5;6, 7;1 and 10;2 and three age-matched monolingual children in each language. Although the children were older than the one in Deuchar and Clark (1996), their VOT pattern was similar. In English, the bilingual children produced VOT values similar to monolinguals.
In Arabic, two of the three bilingual children did not produce prevoicing for 'voiced' plosives, but inconsistent prevoicing was also observed in the five- and seven-year-old Arabic-speaking monolinguals. Johnson and Wilson (2002) recorded two bilingual English-Japanese speaking children in Canada at 2;10 and 3;0 for one child and at 4;8 and 4;11 for the other child. Both children produced aspirated 'voiceless' plosives and short lag 'voiced' plosives in English. Unlike the bilinguals of Deuchar and Clark (1996) and Khattab (2000), the bilinguals contrasted voicing in their heritage language Japanese, but with an English-like contrast between short lag VOT and aspiration. The older child produced longer VOT for /p/ and /t/ in English than in Japanese, but no evidence for language differentiation was observed in the younger child. Similar findings come from Mayr and Siddika (published online 17 October, 2016) who investigated VOT of twenty Sylheti-English speaking bilingual children aged 3;7 to 5;0 in Wales (10 second-generation bilinguals and 10 third-generation bilinguals). In English, both groups of children produced target-like VOT. In Sylheti, both groups produced 'voiceless' plosives with aspiration, and most 'voiced' plosives with short lag VOT. Only the second-generation bilinguals produced some 'voiced' plosives with prevoicing. Yet, the children's Sylheti VOT was not entirely English-like: The second-generation bilinguals produced longer VOT in English /k, ɡ, t/, and the third-generation bilinguals produced longer VOT in English /k/. In a longitudinal study, McCarthy et al. (2014) investigated the acquisition of English VOT in 40 sequential bilingual Sylheti-English speaking children in England and 15 monolingual English-speaking children. At the first time of testing, the bilinguals had been exposed to English for an average of 7 months. Their English VOT in labial and dorsal plosives was tested at about age 4;4 and 5;4.
In line with the findings of Deuchar and Clark (1996), Mayr and Siddika (published online 17 October, 2016), Khattab (2000), and Johnson and Wilson (2002), the bilinguals produced VOT for English 'voiceless' plosives similar to monolinguals in both testing sessions. The bilinguals' VOT for English 'voiced' plosives was significantly shorter than that of monolinguals in the first testing session, but became indistinguishable from monolinguals' VOT in the second testing session. These five studies indicate that the acquisition of aspiration is not problematic in bilingual acquisition when the children are immersed in a country in which the aspiration language is the majority language. CLI from the aspiration of the majority language to the heritage language may occur (Johnson & Wilson, 2002; Mayr & Siddika, published online 17 October, 2016). CLI of prevoicing from the minority language can also play a role, at least in the Sylheti-English speaking sequential bilinguals in McCarthy et al. (2014), and this has similarly been shown for older child L2-learners (Heselwood & McChrystal, 2000). The studies discussed so far originated from English-speaking countries where English was the medium of instruction at daycare and school, while the use of the heritage language was mostly limited to the home context. Only the children in McCarthy et al. (2014) were regularly exposed to their heritage language in the London-Bengali community. The acquisition process is potentially different in an environment in which exposure to both languages is more balanced, with frequent input from multiple speakers and schooling in both languages. Fabiano-Smith and Bunta (2012) evaluated VOT of /p/ and /k/ in eight Spanish-English speaking bilingual children aged 3;0 to 3;11 in a Spanish-speaking immigrant community in the United States, where they attended a bilingual preschool.
Although the children were raised in the United States, their broader environment provided them with frequent language input from multiple speakers in both English and Spanish. The bilinguals' productions were compared to those of eight age-matched monolinguals per language. Interestingly, the bilinguals' VOT pattern was different from the studies described above, in which heritage language exposure was mostly limited to the home context. In English, the bilinguals of Fabiano-Smith and Bunta (2012) produced overall shorter, and thus more Spanish-like, VOT than monolinguals, although this difference was only statistically significant for /k/. In Spanish, no VOT differences were observed between bilinguals and monolinguals. In addition, there was no evidence for VOT differentiation between the bilinguals' two languages. This study suggests that aspiration can be prone to delayed or differential acquisition in bilinguals when the aspiration language does not provide the clear majority of children's input. In addition, CLI from Spanish to English can explain the shorter, more Spanish-like, VOT in English.
Bilingual children can follow different patterns of VOT development even if their acquisition context is similar. Kehoe et al. (2004) investigated VOT production of four bilingual German-Spanish speaking children in Germany and three monolingual German-speaking children. Recordings took place every other week, starting when the children began producing words (1;0 to 1;3) and continuing until approximately 2;6 to 3;0. The four bilingual children reflected three different patterns of VOT development: delay, transfer (CLI), and autonomously developing systems. Two bilingual children showed a delay in their VOT development, as they had not acquired a target-like voicing contrast in German by the end of data collection. One bilingual child showed evidence for bidirectional CLI with instances of prevoicing in German and aspiration in Spanish. Nevertheless, the child maintained a distinction between German and Spanish VOT (cf. Johnson & Wilson, 2002; Mayr & Siddika, published online 17 October, 2016). The fourth bilingual child showed no evidence for CLI. By 2;3 to 2;6, he acquired a voicing opposition between short lag VOT and aspiration in German. Similar to monolingual Spanish acquisition, no voicing opposition had been acquired in Spanish, and instead 'voiced' and 'voiceless' plosives were both produced with short lag VOT (cf. Deuchar & Clark, 1996; Khattab, 2000).
In sum, previous work on the acquisition of VOT in young bilingual children demonstrated that the phonologies of bilinguals often interact in a way that can be interpreted as CLI. However, Khattab (2000) emphasizes that the absence of prevoicing in the heritage language is not necessarily related to CLI from the majority language, but may be due to insufficient heritage language exposure.
The above review also revealed variability in bilingual children's patterns of VOT development in seemingly similar acquisition contexts. A possible reason for these different developmental patterns may be rooted in individual variation in the amount of language exposure (cf. Mayr & Siddika, published online 17 October, 2016). Due to relatively small sample sizes, previous research did not allow for statistical tests of the role of individual differences in language exposure in VOT development. Further, all studies had been conducted in countries where the majority language had aspiration, which raises the question of whether similar acquisition patterns are observed when the prevoicing language is the majority language. The current study is designed to address these still outstanding issues.

The current study
The current study investigates VOT production of Dutch-German speaking simultaneous bilingual children aged 3;7 to 5;11 in the Netherlands who acquired German from one or both parents from birth. This study is the first to investigate effects of age and relative language exposure on VOT production of bilingual children. In contrast to previous research in which the majority language was an aspiration language, the children in this study are immersed in a prevoicing language (Dutch). In addition, Dutch and German monolingual children were tested in the same experimental paradigm. First, we verify the expected VOT production differences between monolingual Dutch and German preschoolers. We then turn to the following three research questions regarding the bilinguals' VOT:
1) Do Dutch-German bilingual children produce language-specific VOT in Dutch and in German and is more exposure to German associated with longer VOT in both languages?
2) Do Dutch-German bilingual children differ from monolingual children in their Dutch and German VOT production?
3) Is VOT associated with age in Dutch-German bilingual and monolingual preschoolers?
If the bilingual children are subject to CLI, their VOT productions should differ from those of monolinguals in at least one language. Given that the bilinguals' majority language is Dutch, an influence from Dutch to German is expected to be more prominent than the influence from German to Dutch. The SLM's Age of Acquisition Hypothesis (Flege, 1995) suggests that bilinguals acquire language-specific categories for 'voiceless' and 'voiced' plosives. By contrast, a prediction that follows from the SLM's Equivalence Classification Hypothesis (Flege, 1987, 1995) is that the 'voiceless' plosives of the two languages may be merged into a single category, and similarly, the 'voiced' plosives of the two languages may be merged into one category. Based on the A-Map model (McAllister Byun et al., 2016), it is expected that bilinguals may not yet have acquired prevoicing in Dutch and aspiration in German to the same degree as their monolingual peers. This is because bilingual children have accumulated less production experience with these articulatorily and aerodynamically complicated sounds in their two languages relative to their monolingual peers. Similarly, bilingual children with more exposure to German, and therefore more heritage language experience, are predicted to be more successful in producing target-like VOT in German, and may consequently be less successful in producing target-like VOT in Dutch than bilingual children with less exposure to German. Finally, because anatomical and motor-control constraints may be decreasing between 3;6 and 6;0 years, older bilingual and monolingual children are expected to produce prevoicing and aspiration more reliably than younger children.
The children were recruited from the participant pools of the Baby Research Center Nijmegen and the University of Amsterdam, or via online and offline classifieds. The bilingual children were tested in different regions of the Netherlands (Gelderland (N = 16), Amsterdam (N = 9), Utrecht (N = 2), Limburg (N = 1), North Brabant (N = 1)).
All monolingual Dutch children were tested in Gelderland in the Central Eastern Netherlands. The monolingual German children were tested in Central Western Germany (N = 27) and Northern Germany (N = 2).

Materials and procedure
The investigated plosives were 'voiceless' /p/, /t/ and /k/ and 'voiced' /b/ and /d/. The 'voiced' dorsal plosive /ɡ/ is not a native phoneme in Dutch, and is therefore not addressed in this study. For each of the five plosives, a total of six target words per language were selected from the Dutch version of the MacArthur-Bates Communicative Development Inventories (Zink & Lejaegere, 2002), and for German from the questionnaire on early child language development (Szagun, Stumper & Schramm, 2009) as well as from the parental questionnaire on early diagnosis of at-risk children (Grimm & Doil, 2000). Tables S1 and S2 in the online supplementary materials (Supplementary Material) provide an overview of the Dutch and German target words, respectively. All target words were picturable plosive-vowel-initial nouns. Due to restrictions in the availability of suitable target words, no match in vocalic contexts between Dutch and German target words could be achieved. We address this issue in Table S3 in the online supplementary materials with descriptive statistics showing how the children's VOT differs by vocalic context. Table S3 is supplemented by an additional analysis supporting that the imbalance of the vocalic context in Dutch and German did not influence the results reported in this study. Testing took place in a quiet room at the children's homes. At the beginning of the session, parents gave informed consent and completed a language background questionnaire. The questionnaire for bilingual children was based on the BiLEC (Unsworth, 2013), and the monolingual version was custom-made and screened for potential exposure to additional languages and foreign accents.
The children named all target words in two different picture-naming tasks to increase the number of tokens produced per child while keeping the children engaged. In the picture-naming story, the experimenter read a story to the child. The target words were replaced by pictures, which the child was prompted to name. Afterwards, a speech perception task was administered for a different subproject. The picture-naming game followed, in which a hand puppet elicited the child's speech from picture cards. When a child produced a target word more than once, every production entered the analysis. The bilinguals were tested by native speakers in two sessions that were scheduled approximately two weeks apart. Half of the children completed the Dutch session before the German session, and the other half started with the German session. Throughout each session, children were rewarded with stickers. At the end of each session, they were compensated with €10 or a book.

Recordings and VOT measurements
Recordings were made with an Olympus Linear PCM Recorder LS-10 with uncompressed 24-bit/96 kHz recording capability. The first author measured VOT for all children in Praat (Boersma & Weenink, 2015), taking into account waveforms and spectrograms viewed at 0-5000 Hz. Burst onset was defined as the onset of abrupt energy release. If there was more than one release burst, VOT was measured from the first visible release burst (Mayr & Siddika, published online 17 October, 2016). Onset of voicing was defined as the first periodic component of the waveform and was measured at the preceding zero-crossing (Francis, Ciocca & Man Ching Yu, 2003). When the amplitude increase of prevoicing was gradual, voicing onset measurements were based on visual characteristics. Figure 2 provides examples of VOT measurements in the prevoicing, short lag, and aspiration ranges, respectively.

Figure 2. Acoustic landmarks from top to bottom: prevoicing, short lag, and aspiration.

Three additional phonetically trained coders measured 25% of the data. Inter-coder reliability indicated 98% agreement. For 'voiceless' plosives, measurements were considered in agreement when they differed by less than 10 ms (Fabiano-Smith & Bunta, 2012). Measurements of 'voiced' plosives were considered in agreement when both coders assigned the token to the same category (prevoiced or devoiced). Across groups and plosives, 11% of the tokens were excluded from the analyses because they could not be unambiguously measured, for example, due to coarticulation, sound overlap, creaky voice, or whispering.
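The two agreement criteria described above can be made concrete with a minimal Python sketch. It is an illustration only, not the procedure actually used for the reliability check. The function names and the example measurements are hypothetical, and treating a VOT of exactly 0 ms as devoiced is an assumption made here for completeness.

```python
def voiceless_agree(vot_a_ms, vot_b_ms, tolerance_ms=10.0):
    """'Voiceless' criterion: the two measurements differ by less than 10 ms."""
    return abs(vot_a_ms - vot_b_ms) < tolerance_ms


def voicing_category(vot_ms):
    """Negative VOT is prevoiced; 0 ms is counted as devoiced (assumption)."""
    return "prevoiced" if vot_ms < 0 else "devoiced"


def voiced_agree(vot_a_ms, vot_b_ms):
    """'Voiced' criterion: both coders place the token in the same category."""
    return voicing_category(vot_a_ms) == voicing_category(vot_b_ms)


def percent_agreement(pairs, agree):
    """Percentage of (coder A, coder B) measurement pairs that agree."""
    hits = sum(1 for a, b in pairs if agree(a, b))
    return 100.0 * hits / len(pairs)


# Invented measurement pairs in ms, for illustration only.
voiceless_pairs = [(62.0, 58.5), (20.0, 31.0), (45.0, 54.0), (30.0, 30.0)]
voiced_pairs = [(-80.0, -95.0), (12.0, -60.0), (15.0, 9.0), (-40.0, -42.0)]

voiceless_score = percent_agreement(voiceless_pairs, voiceless_agree)
voiced_score = percent_agreement(voiced_pairs, voiced_agree)
```

Under the 'voiced' criterion the size of the difference between two measurements does not matter: two prevoiced measurements that differ by 15 ms still count as agreeing, whereas a prevoiced and a devoiced measurement of the same token never do.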

Statistical analyses
Mixed effects models were performed in R (R Core Team, 2013). An alpha level of .05 was adopted throughout. For the 'voiceless' plosives /p/, /t/ and /k/, mixed effects linear regression was performed with VOT as the dependent variable. Initial data screening revealed a bimodal distribution of VOT in the 'voiced' plosives in 59/60 children in Dutch and 46/59 children in German. As presence versus absence of prevoicing rather than duration of prevoicing plays a crucial role in Dutch (Van Alphen & McQueen, 2006), VOT was converted into a categorical variable with the levels 'prevoiced' for negative VOT and 'devoiced' for positive VOT. This categorical dependent variable entered a mixed effects logistic regression.
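As an illustration of the recoding step, the following Python sketch converts VOT values of 'voiced' plosives into the two categories and derives a per-child percentage of prevoicing. It is a sketch under stated assumptions, not the authors' R code: the function names, child identifiers, and VOT values are hypothetical, and a VOT of exactly 0 ms is assigned to 'devoiced' by assumption.

```python
from collections import defaultdict


def dichotomize(vot_ms):
    """Recode VOT: negative values are 'prevoiced', all others 'devoiced'.

    Assigning exactly 0 ms to 'devoiced' is an assumption of this sketch.
    """
    return "prevoiced" if vot_ms < 0 else "devoiced"


def percent_prevoiced(tokens):
    """Map each child to the percentage of 'voiced' tokens that are prevoiced.

    tokens: iterable of (child_id, vot_ms) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # child_id -> [prevoiced, total]
    for child_id, vot_ms in tokens:
        counts[child_id][1] += 1
        if dichotomize(vot_ms) == "prevoiced":
            counts[child_id][0] += 1
    return {child: 100.0 * pre / total for child, (pre, total) in counts.items()}


# Invented tokens (child identifier, VOT in ms), for illustration only.
tokens = [
    ("c01", -85.0), ("c01", 12.0), ("c01", -60.0), ("c01", 9.0),
    ("c02", 15.0), ("c02", 22.0),
]
by_child = percent_prevoiced(tokens)
```

The recoded category, rather than the per-child percentage, is what would enter a logistic model as the dependent variable. The percentage is shown only because it is the descriptive measure reported for each group.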
Several independent variables (IVs) were used in the models. Language (Dutch, German) was the IV of main interest in within-group analyses that compared the bilinguals' two languages, and also in between-group analyses involving the two monolingual groups. Language Background (monolingual, bilingual) was the IV of main interest in the between-group comparisons of bilinguals and monolinguals that were conducted separately for Dutch and German. The IV Age (in months) was included in all analyses, and Percent of Exposure to German was included only in the within-group analyses on the bilinguals. These latter two IVs were centered around zero for each analysis.
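Centering a continuous predictor around zero can be sketched as follows. The sketch assumes that centering means subtracting the mean of the children who enter a given analysis, which is the usual reading but is not spelled out in the text. The function name and all values are hypothetical.

```python
from statistics import mean


def center(values):
    """Subtract the sample mean so that the centered values average zero."""
    sample_mean = mean(values)
    return [value - sample_mean for value in values]


# Invented values for five children (ages in months) and three children
# (percent exposure to German), for illustration only.
ages_months = [44, 50, 56, 62, 68]
exposure_german = [22, 42, 89]

ages_centered = center(ages_months)
exposure_centered = center(exposure_german)
```

Because the mean is recomputed from whichever children enter a model, the same child can receive a different centered value in different analyses, which is consistent with centering "for each analysis". After centering, the intercept and the effects of the other predictors are estimated at the average age and average exposure rather than at zero months or zero percent.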
Three additional IVs were included in the models: Elicitation Task of the item, Place of Articulation of the plosive, and Word Length ('voiceless' plosives only) of the item. These additional IVs were merely included to account for variance in the data, but did not contribute to the main results reported here. Due to space limitations, we do not report simple effects of these IVs. Table 3 provides an overview of the model specifications including fixed effects, interaction terms, random effects, intercepts, and random slopes for each group comparison. All models include interaction terms between the IV of main interest and all secondary IVs. Significant interactions are reported below, and

Results
This section starts with the descriptive statistics before we turn to the statistical effects of Language and Language Background on VOT, taking into account the children's age and, in case of the language comparison within the bilinguals, their exposure to German. For 'voiceless' plosives, monolingual Dutch children produced the shortest and German monolingual children the longest average VOT. The bilinguals' VOT was intermediate to the two monolingual groups. The bilinguals further produced shorter VOT in Dutch than in German (see Table 4 and Figure 3).
For 'voiced' plosives, monolingual Dutch children produced the highest and German monolingual children the lowest percentage of prevoiced plosives. Bilinguals fell in between the monolinguals, with only a slightly higher percentage of prevoicing in Dutch than in German (see Table 5 and Figure 4). These percentages reflect the behavior of the vast majority of children, who prevoiced some of their 'voiced' plosives. Only 13 children (one bilingual speaking Dutch, three bilinguals speaking German, and nine German monolinguals) never produced prevoicing. Conversely, only one child (a bilingual speaking German) produced all 'voiced' plosives with prevoicing. In Dutch, only six monolingual and three bilingual children fell within the adult-like 75-100% range of prevoicing. The devoiced 'voiced' plosives fell on average within the short lag VOT range. All groups produced devoiced /b/ with VOT around 10 ms. For devoiced /d/, the Dutch monolinguals and the bilinguals in both languages produced VOT around 20 ms. The German monolinguals produced shorter VOT with a mean of 13 ms (see Table 6). All groups produced shorter VOT for devoiced 'voiced' plosives than for 'voiceless' plosives, but this difference is very small in the group of Dutch monolingual children (cf. Tables 4 and 6). Figure 5 shows the distribution of VOT across all 'voiced' plosives by group and language.

Discussion
This study examined bilingual preschoolers' VOT development in their majority language Dutch and their heritage language German, in comparison to age-matched monolingual peers. In the following, the findings are summarized and explained in terms of CLI and language exposure. We specifically discuss whether these two more general constructs can be captured by the A-Map model (McAllister Byun et al., 2016) and the Speech Learning Model's Age of Acquisition and Equivalence Classification Hypotheses (Flege, 1995). We first discuss the children's production of 'voiceless' plosives and then turn to the production of 'voiced' plosives.
In sum, the bilingual and monolingual children's production of VOT in 'voiceless' plosives revealed three main findings, and an initial analysis confirmed the expected differences between Dutch and German monolingual preschoolers. The bilingual children's productions provide evidence for language-differentiation between their Dutch and German phonetic systems, and furthermore reveal an effect of language exposure on VOT in the heritage language German, but not on the majority language Dutch (Research Question 1). Moreover, the bilinguals produced VOT differently from their monolingual peers in the heritage language German, but not in the majority language Dutch (Research Question 2). Finally, we did not observe an age-effect on VOT (Research Question 3).
Monolingual Dutch children produced 'voiceless' plosives with short lag VOT whereas monolingual German children produced aspiration, which is in line with Dutch and German adults' VOT production, respectively (Deighton-Van Witsen, 1976;Fischer-Jørgensen, 1976;Haag, 1979;Jessen, 1998;Lisker & Abramson, 1964;Neuhauser, 2011). Equivalent to Dutch and German monolingual children, the bilinguals produced longer VOT in German than in Dutch, suggesting bilingual children have separate phonological categories for Dutch and German 'voiceless' plosives. This finding is in line with the SLM's Age of Acquisition Hypothesis, which suggests that early bilingual acquisition promotes language-specific category formation. Importantly, those bilingual children with more exposure to German produced longer, and therefore more German-like VOT in German, but more exposure to German did not detectably influence their Dutch VOT. Previous research on Welsh-English bilinguals similarly revealed effects of language exposure on the minority language, but not on the majority language (Gathercole & Thomas, 2009). These results indicate that more heritage language exposure is beneficial to the development of the heritage language, but not at the cost of the counterpart category in the majority language. As needs to be confirmed by future research, the bilingual children's Dutch VOT is presumably not perceived as foreign-accented, even when exposure to the heritage language German is high (Flege, 1984;Major, 1987;Riney & Takagi, 1999;Sancier & Fowler, 1997;Schoonmaker-Gates, 2015). Despite the bilinguals' production of aspiration in German, they produced 'voiceless' plosives with shorter VOT than monolingual German children. Differences between bilinguals and monolinguals in absolute VOT duration in German may be related to CLI and differences in exposure to German.
Language exposure was a crucial factor impacting on the German VOT in the bilingual group, suggesting that differences in language exposure between bilingual and monolingual children can similarly account for differences in VOT duration between the two groups.
The A-Map model captures these differences in language exposure within the group of bilinguals and also between the bilinguals and monolinguals. All children in this study are clearly beyond the critical age of 2;0 at which monolingual children start producing aspiration (Kager et al., 2007;Macken & Barton, 1980a), but the bilinguals' exposure to German is limited to 42% of their waking hours on average. Compared to the monolingual A-Map, the bilingual A-Map is therefore based on less experience in the production of aspiration, which can explain why the bilinguals produced more variable and overall shorter aspiration than monolingual children.
The specific A-Maps of bilingual children can further differ between children as a result of individual differences in language experience. More experience with German could increase the urge of bilingual children to reproduce the adult aspiration target accurately, as well as provide them with more practice to reach that target precisely. However, this experience and precision in aspirating in the heritage language German does not result in the children abandoning the fully accurate and precise short lag VOT of 'voiceless' plosives in the majority language Dutch. Individual differences in language experience suggest that the Dutch and German 'voiceless' categories may in fact be separate and autonomous. Note, however, that a lack of surfacing CLI cannot preclude the existence of CLI.
Specific analyses on the bilingual children's production of 'voiced' plosives revealed three main findings, and confirmed the expected production differences between monolingual Dutch and German children. First, we did not observe language-differentiation between the bilinguals' Dutch and German 'voiced' plosives, and a child's language exposure was not detectably associated with her production of 'voiced' plosives (Research Question 1). Second, the bilinguals' productions of 'voiced' plosives differed from monolinguals' productions in the heritage language German and also in the majority language Dutch (Research Question 2). Third, no age-effect on the percentage of prevoiced 'voiced' plosives was observed (Research Question 3).
Monolingual Dutch children prevoiced about 50% of their 'voiced' plosives and devoiced the remaining 50%. This percentage is below the adult-target of 75% to 100% of prevoiced 'voiced' plosives in Dutch (Stoehr et al., published online 3 May, 2017;Van Alphen & Smits, 2004). Previous research on different languages similarly reported devoicing of target prevoiced plosives, possibly lasting into the early school years, and suggests that prevoicing is inherently difficult to produce (Allen, 1985;Bortolini et al., 1995;Kager et al., 2007;Kewley-Port & Preston, 1974;Khattab, 2000;Macken & Barton, 1980b;MacLeod, 2016). The A-Map model can explain the high within-child variation in prevoicing and devoicing of 'voiced' plosives by the monolingual Dutch children as a result of the competing pressures to accurately reproduce the adult-target (i.e., prevoicing) and to achieve a precise production (i.e., short lag) with a still-developing anatomy and motor control. The high variability across the monolingual Dutch children can be accounted for in terms of different rankings of these competing constraints.
Bilingual children prevoiced to a similar extent in Dutch (30%) and German (25%), and their percentages of prevoiced plosives fall between those of the two monolingual groups. According to the A-Map model extended to bilingualism, the bilingual children's low percentage of prevoiced 'voiced' plosives in Dutch suggests that they are more affected by the constraint to achieve a precise production (i.e., short lag) than their monolingual peers. Possibly, less exposure to the 'prevoiced' adult-target makes the urge to reproduce prevoicing accurately relatively less impactful. The ranking of the constraints to achieve a precise production and to accurately match the adult-target may change with increasing language experience. However, within the group of bilinguals, neither age nor their wide range of exposure to Dutch (22-89% of the children's waking hours) was detectably associated with the bilinguals' production of prevoicing in Dutch or German. This also renders it unlikely that differences in exposure to Dutch between bilinguals and monolinguals can account for the groups' different percentages of prevoicing. Hence, the A-Map model cannot entirely account for the bilinguals' differential production of 'voiced' plosives.
Instead, bidirectional CLI can explain the bilinguals' production of 'voiced' plosives. In this case, CLI may be captured through equivalence classification or acceleration. The SLM's Equivalence Classification Hypothesis predicts that CLI results in the formation of a single category for two perceptually close sounds from two languages. Accordingly, Dutch-German bilingual children appear to have only one 'voiced' category for Dutch and German. The bilinguals may be in the process of approaching the prevoiced Dutch adult-target with this merged 'voiced' category, as they produce prevoicing in German, which is articulatorily and aerodynamically complex and unlikely to result from any default behavior (Kewley-Port & Preston, 1974). This merger would effectively take the German 'voiced' category out of the short lag VOT range and eliminate the double phonological function of the short lag VOT range, which otherwise corresponds to 'voiceless' in Dutch and to 'voiced' in German. The hypothesized merger may eventually match the target Dutch phonology, in which prevoicing is crucial for the realization of the voicing opposition, without violating the target German phonology, in which prevoicing occurs as free variation (Fischer-Jørgensen, 1976;Hamann & Seinhorst, 2016;Jessen, 1998;Stock, 1971;Stoehr et al., published online 3 May, 2017).
However, the present data are also compatible with the hypothesis that the bilinguals have two separate 'voiced' categories for Dutch and German that develop indistinguishably at the current developmental stage. In this case, CLI occurs as acceleration from Dutch to German, and can be explained by the A-Map model. Similar acceleration effects in the domain of phonology have previously been reported in bilingual children of different language backgrounds (Grech & Dodd, 2008;Mayr et al., 2015;Tamburelli et al., 2015). The bilinguals prevoiced more frequently in German (25% of all 'voiced' plosives) than monolingual German children (8% of all 'voiced' plosives; cf. Kehoe et al., 2004). German adults prevoice on average up to 50% of 'voiced' plosives, which means that the bilingual children are in fact closer to the adult-target than their monolingual peers (Fischer-Jørgensen, 1976;Hamann & Seinhorst, 2016;Jessen, 1998;Stock, 1971;Stoehr et al., published online 3 May, 2017). The bilinguals' exposure to Dutch leads to more exposure to prevoicing, and more experience producing it. In line with the A-Map model, bilinguals accumulate prevoicing experience in Dutch, and their episodic memory therefore encompasses more traces of the articulator movements associated with prevoicing. This production experience may accelerate the bilinguals' acquisition of this typically late-acquired structure in German. Assuming acceleration in German, the bilingual children's percentage of prevoiced 'voiced' plosives should increase in German until they reach similar variation between prevoicing and short lag VOT as observed in German-speaking adults. The Dutch category should then keep developing to the adult-target of 75%-100% of prevoicing. Speech perception or longitudinal speech production research is needed to identify whether CLI in bilingual children's production of 'voiced' plosives occurs as equivalence classification or acceleration.

Conclusion
This study contributed new insights into the role of heritage language exposure in bilingual children's VOT development. The results extend findings of previous small-scale studies through evidence that inherently difficult prevoicing is not only prone to differential acquisition in a heritage language, as previously reported, but also in a majority language. The bilinguals' similar production of prevoicing in both languages and the observed differences between bilinguals and monolinguals seem to be unrelated to variation in language exposure or age, and may instead result from CLI. Moreover, aspiration can be prone to differential acquisition in a heritage language, especially when the exposure to the heritage language is low. Despite differences from monolingual VOT development, the bilinguals nevertheless seem to have acquired two separate and autonomous categories for Dutch and German 'voiceless' plosives. Importantly, this study revealed a positive effect of more heritage language exposure on the production of 'voiceless' plosives: bilingual children with more heritage language exposure produced more target-like VOT in the heritage language, but not at the cost of the majority language. What surfaces as CLI from Dutch to German in 'voiceless' plosives can be explained by language exposure alone. This novel evidence suggests that more exposure to the heritage language is associated with better-separated language-specific voicing systems.