Vocabulary production in toddlers from low-income immigrant families: evidence from children exposed to Romanian-Italian and Nigerian English-Italian

Abstract The relationship between first and second language in early vocabulary acquisition in bilingual children is still debated in the literature. This study compared the expressive vocabulary of 39 equivalently low-SES two-year-old bilingual children from immigrant families with different heritage languages (Romanian vs. Nigerian English) and the same majority language (Italian). Vocabulary size, vocabulary composition and translation equivalents (TEs) were assessed using the Italian/L1 versions of the CDI. Higher vocabulary in Italian than in the heritage language emerged in both groups. Moreover, Romanian-Italian-speaking children produced higher proportions of TEs than Nigerian English-Italian-speaking children, suggesting that L1-L2 phonological similarity facilitates the acquisition of cross-linguistic synonyms.


Introduction
The relationship between first language (L1) and second language (L2) early vocabulary acquisition in bilingual children is still debated in the literature. The earliest theoretical position, that young bilingual children construct their first vocabulary as a single undifferentiated language (Volterra & Taeschner, 1978), has not been confirmed. More recent research has shown that bilingual first language learners construct two separate lexical systems from the earliest stages of word learning (Bosch & Sebastian-Galles, 2001). However, we need to explore further to what extent L1 and L2 early vocabularies are acquired independently or interdependently.
Results from studies analysing early vocabulary sizes in L1 and L2 diverged when different language measures (comprehension vs. production), different children's age Ramon-Casas, 2014). Recently, Floccia and colleagues (Floccia et al., 2018) found that a higher L1-L2 phonological overlap (such as in the English-Dutch and English-German language combinations) led to higher levels of L1 vocabulary production by 2-year-old bilinguals learning British English as L2. Importantly, linguistic distance contributed unique variance even when other factors, such as exposure to L2 child-directed input, were entered in the same model. The facilitation effect of L1-L2 phonological similarity suggests a relationship between first language and second language vocabulary acquisition.
On the whole, the relationship between L1 and L2 vocabulary acquisition processes and the role of the phonological distance between the two languages need to be investigated further in different language combinations (Hammer, Jia, & Uchikoshi, 2011). Studies on bilingual children acquiring languages other than L1-English (L2) may help us better understand children's early bilingual vocabulary in general and, in particular, identify those aspects of bilingual development that are shared between languages and cultures and those that are not.

The present study
The present study examined the expressive vocabulary of two-year-old bilingual toddlers from two linguistic communities of immigrant families, all attending nursery schools, in both the family's heritage language (L1, Romanian or Nigerian English) and the majority language (L2, Italian).
The study addresses the following research questions (RQs):
1. Are there differences in early expressive vocabulary size between L1 and L2 within each bilingual group, and between the two bilingual groups (Romanian-Italian-speaking children and Nigerian English-Italian-speaking children), all from similarly low-income families and with similar amounts of exposure to Italian?
2. Are there differences in the composition of early expressive L2 vocabulary between the two bilingual groups?
3. Are there differences in the proportions of TEs in total vocabulary between the two bilingual groups?
Because Romanian and Italian are both Romance languages, they share more phonological word-forms than do Italian and Nigerian English, a nativised form of English, which is a Germanic language. On this basis, we expected that Romanian-speaking bilingual children would have greater vocabulary sizes and higher proportions of TEs than Nigerian English-speaking bilingual children. We also expected, based on the hypothesis of a smaller vocabulary size in Nigerian English-speaking bilinguals, that these children would show a higher proportion of nouns and a lower proportion of verbs and closed-class words than Romanian-speaking bilingual children.

Method
Participants
Thirty-nine two-year-old children regularly attending five nursery schools in north-east Italy participated in the study, divided into two groups: 22 children from Romanian immigrant families (14 females) exposed to Italian and Romanian (ROM bilingual group), and 17 children from Nigerian immigrant families (8 females) exposed to Italian and Nigerian English (NEN bilingual group). The participants were selected from those recruited for a larger longitudinal research project on the lexical trajectories of children from low-income families. Only low-income families (those paying the lowest rate for their children's nursery schools, ⩽€130) were eligible for participation in the project. All the families were recruited at the nursery schools. The bilingual group in the larger research project included 74 immigrant families from 9 different countries. Thirteen immigrant families withdrew from the study. The immigrant families included in the current study belong to the two major immigrant groups in the province. The inclusion criteria for children were: having both parents speaking the heritage language at home, being born in Italy, having been exposed to Italian before age 1;6 and producing words in at least one of the two languages at 24 months. All the children were born within two weeks of their due date, were healthy at birth, had normal hearing, and had no certified disabilities or developmental disorders.
All parents were native speakers of Romanian or Nigerian English. Nine parents of the NEN group stated that they also spoke the language of their ethnic group (Edo or Igbo), but never used it with their children. Five immigrant families stated that they did not speak Italian at home (1 in the NEN group, 4 in the ROM group).
The demographic characteristics of the children and their mothers are reported in Table 1. No significant differences were found between the two groups in the children's gender, age, birth order, singleton condition, age of entry at nursery school, age and amount of exposure to Italian. Children in the ROM group spent a significantly higher number of daily hours at nursery school than children in the NEN group. The mothers in the NEN bilingual group had significantly fewer years of formal education, were significantly older, and had been in Italy significantly longer than those in the ROM bilingual group.
All participating nursery schools were state-regulated and funded. All children were cared for in a group with up to 8 children. A total of 24 nursery teachers primarily responsible for the participating children participated in the study and evaluated the children's language abilities through the Italian version of the CDI. All the teachers were female and native Italian speakers. Five teachers held a degree in Education; 19 of them held a high school diploma in Infant Education or Teaching.

Procedure
Parents were asked to complete a consent form and a brief demographic questionnaire. Then, the researcher conducted a semi-structured interview on language exposure with the parents (usually mothers) at the nursery school. Parent interviews were in Italian and, when necessary, were supported by a cultural mediator in the language of the parents' choice. Parents gave information about their child's typical weekdays and weekends and the hours that they (or others in regular contact with the child) used L1 vs L2 with the child at home. Based on this information, an estimate of the weekly hours of exposure to Italian (L2) at home was calculated and added to the hours of weekly attendance at nursery school in order to calculate the total amount of weekly exposure to Italian. Then, this amount was expressed as a percentage of a typical number of weekly waking hours (84), following Onofrio and colleagues (Onofrio, Rinaldi, & Pettenati, 2012).
Note. a Welch's t; b Only one child entered nursery school at nineteen months of age.
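The exposure estimate described above is simple arithmetic, and a minimal sketch may make it concrete. The function name and the example hours are hypothetical; only the 84 weekly waking hours and the sum of home and nursery hours come from the text.

```python
WAKING_HOURS_PER_WEEK = 84  # typical weekly waking hours, as stated in the text


def italian_exposure_percent(home_hours_italian: float, nursery_hours: float) -> float:
    """Weekly exposure to Italian as a percentage of weekly waking hours.

    home_hours_italian: estimated weekly hours of Italian heard at home
    nursery_hours: weekly hours of attendance at nursery school
    """
    total_hours = home_hours_italian + nursery_hours
    return 100.0 * total_hours / WAKING_HOURS_PER_WEEK


# Hypothetical child: 7 h/week of Italian at home, 35 h/week at nursery school.
print(italian_exposure_percent(7, 35))  # 50.0
```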
The children's Italian vocabulary skills were assessed at 24 months using the Italian version of the MacArthur-Bates Communicative Development Inventories (CDI): Words and Sentences-short form (Primo Vocabolario del Bambino-PVB, Caselli, Bello, Rinaldi, Stefanini, & Pasqualetti, 2015; Rinaldi, Pasqualetti, Stefanini, Bello, & Caselli, 2019). The short form of the CDI includes a 100-word expressive vocabulary checklist with lexical categories such as nouns (55), verbs (18), adjectives-adverbs (16) and closed-class words (11). The Italian version of the CDI was completed by the nursery teachers and, after the interview at nursery school, by the parents who were familiar with Italian and the child's use of Italian. Since Romanian and Nigerian English adaptations of the CDI were not available, the L1 vocabulary skills of the bilingual children were assessed using the CDI-short form translated into the two languages by cultural mediators in collaboration with a language development researcher. Because translating any instrument is problematic (Widenfelt, Treffers, de Beurs, Siebelink, & Koudijs, 2005), the cultural mediators were L1 native speakers who could adapt the words of the checklists to the Romanian and Nigerian cultures. One concept in the translated Romanian version and one concept in the translated Nigerian English version were changed because of their low input frequency in the respective languages (for Romanian 'orange soda' was changed to 'milk'; for Nigerian English 'stroller' was changed to 'baby pram').

Vocabulary size
The total number of words checked by parents in L1 served as the outcome measure of vocabulary size in the heritage language. For productive vocabulary in Italian, a composite score was computed by giving a single credit for each word reported by one or both reporters, because of evidence indicating the utility of combining parent and teacher reports (Pua, Lee, & Rickard Liow, 2017; Vagh, Pan, & Mancilla-Martinez, 2009). Composite inventory scores in Italian served as the outcome measure of single-language vocabulary size in Italian. Then, the total vocabulary score was calculated by summing the score in L1 and the composite score in Italian; the total conceptual vocabulary score was calculated with credit being given for knowing a concept, regardless of whether the child knew the word for that meaning in one or both languages.
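The four scores defined above can be sketched with set operations. This is an illustrative sketch, not the authors' scoring procedure: it assumes that each checklist item is keyed by a shared concept label, so that an L1 word and its Italian counterpart carry the same key. The function name and the example words are hypothetical.

```python
def vocabulary_scores(l1_parent, italian_parent, italian_teacher):
    """Compute the vocabulary size measures from checked checklist items.

    Each argument is a collection of concept labels checked on one form.
    Assumption: L1 and Italian items for the same meaning share one label.
    """
    l1 = set(l1_parent)
    # Composite Italian score: one credit per word reported by either reporter.
    italian = set(italian_parent) | set(italian_teacher)
    return {
        "l1": len(l1),
        "italian_composite": len(italian),
        # Total vocabulary: words in L1 plus words in Italian.
        "total": len(l1) + len(italian),
        # Total conceptual vocabulary: one credit per concept, in either language.
        "total_conceptual": len(l1 | italian),
    }


# Hypothetical child.
scores = vocabulary_scores(
    l1_parent={"dog", "milk", "eat"},
    italian_parent={"dog", "ball"},
    italian_teacher={"ball", "sleep", "dog"},
)
print(scores["total"], scores["total_conceptual"])  # 6 5
```

In this example "dog" is known in both languages, so it counts twice in the total vocabulary but once in the total conceptual vocabulary.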

Vocabulary composition
We calculated the numbers of nouns, verbs, adjectives/adverbs, and closed-class words as proportions of all words produced in Italian.
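As a brief illustration of this measure, the proportions can be computed from the lexical category of each Italian word a child produces. The category labels and the example data below are hypothetical.

```python
from collections import Counter

CATEGORIES = ("noun", "verb", "adjective_adverb", "closed_class")


def composition(word_categories):
    """Return each lexical category as a proportion of all words produced.

    word_categories: one category label per Italian word the child produces.
    """
    counts = Counter(word_categories)
    total = sum(counts[c] for c in CATEGORIES)
    if total == 0:
        # A child producing no Italian words has no defined composition.
        return {c: 0.0 for c in CATEGORIES}
    return {c: counts[c] / total for c in CATEGORIES}


# Hypothetical child producing 10 Italian words.
produced = ["noun"] * 6 + ["verb"] * 2 + ["adjective_adverb"] + ["closed_class"]
print(composition(produced)["noun"])  # 0.6
```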

Translation equivalents
First, all the lexical items that each child had in each language for the same concept were identified as TE pairs, based on the words checked in their inventories in L1 and in Italian. We included translation equivalents for content words only (nouns, verbs, adjectives and adverbs) because of the difficulty in assigning translation equivalents for closed-class categories. Second, cognates (TE pairs similar in sound and spelling [e.g. festa-festa 'party']) and semi-cognates (TE pairs similar in sound but with slightly different spelling [e.g. penna-pen]) were counted for each child. Then, following Legacy et al. (2017), two calculations of TEs were performed: one to obtain an estimate of the proportion of TEs that included cognates and semi-cognates (total TEs), and one to obtain the proportion of TEs excluding cognates and semi-cognates. The proportion of TEs including cognates and semi-cognates was calculated by summing the number of identified TE pairs in the inventories and multiplying this score by two. This number was then divided by the child's total vocabulary minus non-equivalents (i.e., words that have no translation on the other inventory form) and onomatopoeias. The proportion of TEs excluding cognates and semi-cognates was then calculated by summing the identified TE pairs in the inventories, subtracting all cognate and semi-cognate pairs, and multiplying by two. This number was then divided by the child's total vocabulary minus cognates, semi-cognates, and non-equivalents. Two independent raters identified the TE pairs in the inventories, and they came to an agreement on which words would be selected as pairs and on which TEs would be classified as cognate and semi-cognate pairs.
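The two TE proportions described above can be sketched as follows. The denominators follow the wording of the text, with one labelled assumption: each cognate or semi-cognate pair is taken to remove two words (one per language) from the second denominator, which the text does not state explicitly. The function name and the example counts are hypothetical.

```python
def te_proportions(total_vocab, te_pairs, cognate_pairs, non_equivalents, onomatopoeias):
    """Return (proportion of total TEs, proportion of TEs excluding cognates).

    total_vocab: words produced in L1 plus words produced in Italian
    te_pairs: concepts for which the child produces a word in both languages
    cognate_pairs: TE pairs classified as cognates or semi-cognates
    non_equivalents: produced words with no counterpart on the other form
    onomatopoeias: produced onomatopoeic items
    """
    # Each TE pair contributes two words, one per language.
    denom_total = total_vocab - non_equivalents - onomatopoeias
    prop_total = 2 * te_pairs / denom_total if denom_total > 0 else 0.0

    # Assumption: a cognate or semi-cognate pair counts as two words here.
    denom_excl = total_vocab - 2 * cognate_pairs - non_equivalents
    prop_excl = 2 * (te_pairs - cognate_pairs) / denom_excl if denom_excl > 0 else 0.0
    return prop_total, prop_excl


# Hypothetical child: 66 words in total, 12 TE pairs, 8 of them (semi-)cognates.
print(te_proportions(66, 12, 8, non_equivalents=10, onomatopoeias=8))  # (0.5, 0.2)
```

In this made-up case half of the eligible vocabulary consists of TEs, but the proportion falls to 0.2 once form-similar pairs are removed, which is the kind of contrast the two calculations are designed to expose.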

Results
First, the normal distribution of the data was assessed using the Kolmogorov-Smirnov test. Most of the measures were found to violate the assumption of normality within groups. We therefore ran all analyses using robust statistical techniques that take into account the violations of the main assumptions required by parametric statistical analyses. Using robust statistics prevents type-I error ('false-positives') inflation (Erceg-Hurn, Wilcox, & Keselman, 2013) and increases the likelihood of discovering genuine differences between groups and associations among variables without increasing the risk of type-II errors ('false-negatives'). The types of corrections usually included in robust techniques are trimmed means and M-estimators. We used recently devised types of M-estimators called MM-estimators (Maechler et al., 2018) for regressions, trimmed means and M-estimators for a robust mixed ANOVA (Mair & Wilcox, 2019), Welch's correction for t-tests, and Winsorized correlation analyses.
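To illustrate two of the corrections named above, the following sketch computes a trimmed mean and a Winsorized correlation in plain Python. It is not the R code used for the analyses, and the 20% trimming proportion is an assumption, since the text does not report the value used. All names and data are hypothetical.

```python
import math


def trimmed_mean(values, trim=0.2):
    """Mean after discarding the lowest and highest `trim` share of values."""
    xs = sorted(values)
    g = math.floor(trim * len(xs))  # observations cut from each tail
    kept = xs[g:len(xs) - g]
    return sum(kept) / len(kept)


def winsorize(values, trim=0.2):
    """Pull extreme values in to the nearest retained value, keeping order."""
    xs = sorted(values)
    g = math.floor(trim * len(xs))
    low, high = xs[g], xs[len(xs) - g - 1]
    return [min(max(v, low), high) for v in values]


def winsorized_correlation(x, y, trim=0.2):
    """Pearson correlation computed on the Winsorized variables."""
    wx, wy = winsorize(x, trim), winsorize(y, trim)
    mx, my = sum(wx) / len(wx), sum(wy) / len(wy)
    cov = sum((a - mx) * (b - my) for a, b in zip(wx, wy))
    var_x = sum((a - mx) ** 2 for a in wx)
    var_y = sum((b - my) ** 2 for b in wy)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical vocabulary scores with one extreme value.
scores = [2, 4, 5, 6, 7, 8, 9, 10, 12, 95]
print(sum(scores) / len(scores))  # 15.8, pulled upwards by the outlier
print(trimmed_mean(scores))       # 7.5
```

The single extreme score doubles the ordinary mean, whereas the trimmed mean stays close to the bulk of the data, which is why such estimators are preferred when normality is violated.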
Preliminary analyses related to maternal education and age/amount of exposure to Italian
Since maternal education is known to impact on children's lexical abilities, and the NEN bilingual group differed significantly from the ROM bilingual group in the number of years of maternal education, we controlled for a possible effect of maternal education. Specifically, we used an MM-estimators robust analysis of covariance (with maternal education as covariate) based on regression (Maechler et al., 2018). Furthermore, because it is well documented that language exposure impacts on the vocabulary skills of bilingual children, Winsorized correlation analyses were run for each bilingual group between vocabulary size in Italian (L2) and age (number of months) and amount of exposure to Italian and, as measures of exposure to Italian spoken by native speakers, age of entry to nursery school and daily attendance there. In the ROM group none of the measures of exposure to Italian was significantly related to Italian vocabulary size [ExpAge r_w(20) = -.322, p = .154; ExpAmount r_w(20) = .028, p = .904; EntrNursery r_w(20) = .113, p = .619; AttNursery r_w(20) = .137, p = .548]. Similar results emerged for the NEN group, although in this group the correlation between age of exposure to Italian and Italian vocabulary size approached significance (p = .061) [ExpAge r_w(15) = -.484, p = .061; ExpAmount r_w(15) = .162, p = .540; EntrNursery r_w(15) = .105, p = .692; AttNursery r_w(15) = .190, p = .473]. Therefore, in the following analyses, we did not control for a possible effect of exposure to Italian.

Vocabulary size (RQ1)
First, a robust mixed ANOVA was computed on the single-language vocabularies of bilingual children with language (L1 vs L2) as within-subject factor and group (ROM vs NEN) as between-subject factor, using the bwtrim function of the WRS2 package in R (Mair & Wilcox, 2019). Then, robust Welch's t-tests were computed to compare the ROM and NEN groups on total vocabulary and total conceptual vocabulary. Descriptive statistics are reported in Table 2. The robust ANOVA showed that L2 vocabulary was significantly larger than L1 vocabulary in both ROM and NEN groups [F(1, 17.45) = 8.38, p = .001, d = .96]. The two groups did not significantly differ in their single-language vocabulary sizes [F(1, 20.67) = 2.04, p = .168, d = .474], and no effect was found for the interaction between group and language [F(1, 20.67) = 0.06, p = .814, d = .079]. The t-tests also showed that the two groups of bilingual children did not significantly differ in their total vocabulary and total conceptual vocabulary sizes [respectively, t(36.78) = 1.17, p = .248, d = .370, and t(36.84) = 1.44, p = .159, d = .380].
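The fractional degrees of freedom reported for the t-tests follow from Welch's correction for unequal variances. The sketch below implements the classic Welch statistic with the Welch-Satterthwaite degrees of freedom; it is an illustration only, and it is not the robust variant used in the analyses, which may additionally rely on trimmed means. The function name and the data are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Classic Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(group_a) / len(group_a)
    se2_b = variance(group_b) / len(group_b)
    t = (mean(group_a) - mean(group_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(group_a) - 1) + se2_b ** 2 / (len(group_b) - 1)
    )
    return t, df


# Hypothetical scores for two small groups with unequal variances.
t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t, 3), round(df, 2))  # -1.897 5.88
```

With equal group sizes and equal variances the degrees of freedom reduce to the usual n1 + n2 - 2; the more the variances differ, the further the value drops below that figure.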

Vocabulary composition (RQ2)
Four Welch's t-tests with group (ROM vs NEN) as an independent factor were performed on the proportions of nouns, adjectives/adverbs, verbs and closed-class words in Italian. Descriptive statistics are reported in Table 2. No significant differences were found in the proportions of any lexical category in Italian vocabulary between the two groups of bilingual children.

Translation equivalents (RQ3)
Robust Welch's t-tests were conducted to compare the proportions of total TEs and TEs minus cognates and semi-cognates between the two bilingual groups. Descriptive statistics are reported in Table 3. The bilingual children in the ROM group produced a significantly higher proportion of total TEs than those in the NEN group [t(32.17) = 2.91, p = .006, d = .964]. No significant differences emerged between the two groups when cognates and semi-cognates were excluded from the count of TEs [t(34.31) = 1.55, p = .131, d = .504].

Discussion
The present study examined the expressive vocabulary of two groups of bilingual toddlers from low-income immigrant families in both the majority language of the country (Italian) and their heritage languages (Romanian and Nigerian English), in order to contribute to the debate on the relationship between vocabularies in L1 and L2.
Significantly higher vocabulary in Italian than in the heritage language emerged in both groups. These results suggest that children orientate towards the majority language, as observed in other immigration contexts (Oller, Jarmulowicz, et al., 2007). Indeed, heritage languages of immigrant families are minority languages which are less likely to be supported by the larger community. Therefore, these children might, regardless of their degree of exposure, have learned from experience that Italian is the language that can be used successfully to communicate in different contexts. At the nursery school Italian was the only language used to communicate, while at home at least one of the two immigrant parents could understand Italian well, and most of the immigrant parents (34 out of 39) said they also spoke Italian at home. In other words, communication could be conducted in Italian in both contexts (home and nursery school), while the heritage language was useful only at home. In this regard, studies comparing L1 and L2 vocabulary size in preschool bilingual children have suggested that early educational settings in which only L2 is spoken may influence the L2 "dominance" in children's expressive vocabulary (Kan & Kohnert, 2005; Sheng, Lu, & Kan, 2011).
Although the Romanian-speaking children showed descriptively larger vocabulary sizes than the Nigerian English-speaking children, especially for the total vocabulary, the difference was not statistically significant, so our first hypothesis was not supported. Consistent with the lack of differences in their vocabulary sizes, the children's vocabulary composition in Italian did not differ between the two bilingual groups.
Finally, the differences found in the proportion of total translation equivalents between the two bilingual groups were in line with our third hypothesis. Romanian-Italian-speaking children produced a significantly higher proportion of total translation equivalents (more than double) than Nigerian English-Italian-speaking children. However, when cognates and semi-cognates were excluded from the calculation of the proportions of translation equivalents, no differences were found between the two bilingual groups. This result indicates that most of the translation equivalents produced by Romanian-speaking children were identical or similar cross-linguistic items. This finding could be interpreted in the light of the L1-L2 phonological similarity between Romanian and Italian. Similar evidence of the role of phonological form proximity on TEs in bilingual toddlers' expressive vocabulary was reported by Bosch and Ramon-Casas (2014) for Spanish-Catalan bilingual children at 18 months of age. They showed that TEs represented 30% of the bilinguals' total vocabulary size and, interestingly, form-identical TEs (cognates) accounted for 28% of the total vocabulary. Dissimilar words, meanwhile, accounted for 2% and were only present in the largest vocabularies (Bosch & Ramon-Casas, 2014). What is the basis for the facilitation effect of form-similarity on the acquisition of cross-linguistic synonyms? As Bosch and Ramon-Casas (2014) suggested, children experience a double exposure to words with very similar phonological forms linked to the same concept; therefore, the form-concept pairing may be facilitated since it is heard in both languages. In addition, the ability to produce a word-form may drive the acquisition of other similar word-forms, as documented for monolingual children (Vihman, 1993).
A recent study on phonological priming in bilingual preschoolers found a facilitating effect on target recognition when L2 primes overlapped phonologically with L1 target words (Von Holzen & Mani, 2012). This result provided evidence of interconnectivity between the lexicons in the first language and in the second language: specifically, a cross-language activation based on the phonological form. Learning a word in one language may therefore facilitate the acquisition of a form-similar word in a second language and, in turn, foster vocabulary acquisition in both languages, as suggested by Floccia and colleagues (Floccia et al., 2018).

Conclusions
The present study makes an original contribution to the literature on the early expressive vocabulary of bilingual children. It extends previous research in two ways. First, few studies examining early bilingual vocabulary development have focused on children speaking Italian as the majority language. Second, we compared bilingual groups with two heritage languages that differ in their phonological distance from Italian.
The small sample size is a major limitation of this study when interpreting the results. In the light of the small sample size, we used robust statistical analyses that avoid the production of false positives and control the probability of producing false negatives. However, future studies need larger samples in order to confirm and extend the findings of the current study. Another limitation is that short versions of a Romanian and a Nigerian English CDI were developed for the present study by translating the short version of the Italian CDI without validating them with a norm group. Therefore, although particular care was taken to develop parallel word-lists and assessment procedures, these measures of Romanian and Nigerian English vocabulary size might not have captured the bilingual children's total L1 vocabulary size.
The study produced evidence of some interconnectivity between vocabulary acquisition in the first language and in the second language, because the phonological form similarity between L1 and L2 emerged as a factor that facilitates the acquisition of cross-linguistic synonyms and supports lexical learning. Moreover, the dominance of the majority language in the expressive vocabulary of young bilingual children from immigrant families regularly speaking their heritage languages suggests that we need to deepen our understanding of the role of educational and cultural contexts in multiple language acquisition.
These results have practical implications. The phonological distance between the two languages should be taken into account in educational contexts, in order to differentiate vocabulary learning opportunities, going beyond the idea that all bilingual children from immigrant families have the same language learning needs.