Interdependence between L1 and L2: The case of Syrian children with refugee backgrounds in Canada and the Netherlands

Abstract
Children who are refugees become bilingual in circumstances that are often challenging and that can vary across national contexts. We investigated the second language (L2) syntactic skills of Syrian children aged 6-12 living in Canada (n = 56) and the Netherlands (n = 47). Our goal was to establish the impact of first language (L1 = Syrian Arabic) skills on L2 (English, Dutch) outcomes and to determine whether L1–L2 interdependence is influenced by the length of L2 exposure. To measure L1 and L2 syntactic skills, cross-linguistic Litmus Sentence Repetition Tasks (Litmus-SRTs) were used. Results showed evidence of L1–L2 interdependence, but interdependence may only surface after sufficient L2 exposure. Maternal education level and refugee camp experiences differed between the two samples. Both variables affected L2 outcomes in the Canadian sample but not in the Dutch sample, demonstrating the importance of examining refugee children’s bilingual language development in different national contexts.

Adopting an ecological approach to refugee experiences (Lustig, 2010), we investigated language outcomes of Syrian refugee children in two national contexts, Canada and the Netherlands. For several years in a row, Syria has been the main country of origin for refugees in Europe and in Canada. Millions of Syrian refugees crossed the Mediterranean Sea to Europe, where the largest numbers went to Germany (532,100), followed by Sweden (109,000), Austria (49,000), and the Netherlands (32,100) (UNHCR, 2019). Tens of thousands have crossed the Atlantic Ocean to North America; Canada has admitted over 70,000 refugees from Syria (Government of Canada, 2020). The specific goals of this study were to determine (1) whether L1 Syrian Arabic abilities predicted emergent L2 English and L2 Dutch abilities, and what role length of L2 exposure plays in this relation, and (2) whether bilingual development patterns for these Syrian refugees were similar in Canada and the Netherlands.

Refugee experiences and language development
It is often assumed that children learn language almost automatically. However, language development, including simultaneous (Hoff, 2018) and sequential bilingual development (Pfenninger & Singleton, 2019), is affected by the environmental conditions under which children learn a language, resulting in individual differences (Armon-Lotem & Meir, 2019; Chondrogianni & Marinis, 2011; Paradis, 2011). Previous studies on child bilingual development have investigated contextual factors in children with migration backgrounds (e.g., Spanish L1 speakers in the USA, Russian L1 speakers in Israel). While these populations share many characteristics with refugee children (e.g., learning and maintaining a minority L1 and majority L2, limited L1 resources, being educated in the L2, biculturalism), refugee children's different pre-migration experiences (e.g., interrupted schooling and traumatic experiences) and, often, post-migration experiences (e.g., resettlement and integration challenges) may place them in a more vulnerable position for learning in general, and for language learning specifically.
For example, among young refugees, the prevalence of posttraumatic stress disorder (PTSD) is high, ranging between 19 and 54%, whereas in non-refugees, the estimates are between 2 and 9% (Bronstein & Montgomery, 2011). PTSD is linked to lower cognitive functioning and may affect learning negatively (Yasik et al., 2007). Parents who are refugees may struggle with mental disorders and settlement issues. Research has demonstrated that children with depressed parents have lower language outcomes (Ahun et al., 2017; Paulson et al., 2009). Less sensitive and responsive parenting by depressed parents, as well as less positive engagement and less reading to their children, may contribute to this effect (Paulson et al., 2009; Sohr-Preston & Scaramella, 2006). Moreover, in the case of refugee children, fewer opportunities for L1 interactions may arise because family structures are often disrupted, potentially affecting L1 acquisition. In addition, L2 acquisition could be hampered because L2 learning opportunities are confined to school hours with few one-on-one interactions. Furthermore, many refugee children experience interrupted schooling before and after arriving in the host country (Browder, 2018; Herzog-Punzenberger et al., 2017), which negatively affects the acquisition of L1 and L2 academic registers (Brown et al., 2006). These examples suggest that findings about bilingual language development from previous child L2 research do not necessarily generalize to refugee children and that targeted research on the language development of refugee children is needed.
Cross-country research could add further insights, as illustrated by the two national contexts in which the current study is situated. In Canada, more than half of the Syrian refugees are government-assisted refugee families (Korntheuer et al., 2017) who are selected based on their vulnerability and whose initial resettlement is entirely supported by the government. In contrast, in the Netherlands, the majority of Syrian refugees enter the country using private funds. In general, the socioeconomic status (SES) of Syrian refugees is thus likely to be lower in Canada than in the Netherlands. Lower SES (often indexed by maternal education) is associated with lower L2 outcomes in both migrant (Golberg et al., 2008; Prevoo et al., 2013; Sorenson Duncan & Paradis, 2020) and refugee children. In addition, Syrian refugee children in Canada are more likely than those in the Netherlands to have experienced adverse conditions in refugee camps, such as violence, discrimination, poverty, and malnutrition (Harrell-Bond, 2000; Krause & Gato, 2019). These conditions could all affect cognitive and socioemotional well-being, which, in turn, could affect children's uptake of linguistic input.
The current study investigated Syrian refugee children in two countries to obtain insight into developmental risk factors for low language outcomes, focusing primarily on the role of the L1. Migrant children are at risk of a lower rate of L1 development, in particular if L2 exposure starts early (Bedore et al., 2016). In the case of Syrian refugee children, low socioemotional well-being has been found to be related to low L1 abilities. A well-developed L1 may, however, give refugee children an important resource that fosters L2 development, an issue to which we turn below.

L1-L2 interdependence
A prominent question in the field of child bilingualism, which has not yet been addressed for refugee children, is whether L2 learning depends on the L1, and whether refugee children transfer abilities. Transfer of abilities is central in the well-known Interdependence Hypothesis (Cummins, 1979), which holds that access to abilities acquired in the L1 facilitates learning in the L2, and vice versa. The Interdependence Hypothesis posits that "on a cognitive level, the languages are not separate but connect with each other by means of a common underlying proficiency" (Cummins, 2017, p. 106). This type of transfer does not refer to the transfer of linguistic structures, which can have positive effects if structural properties of the L1 and L2 overlap or negative effects if they differ (Castilla et al., 2009; Larsen-Freeman & Long, 1991). Rather, it refers to abilities that facilitate learning and that are to some degree independent of language-specific structural properties. Cummins mentions abstract thinking, problem-solving, verbal reasoning, metalinguistic awareness, metacognitive and metalinguistic learning strategies, pragmatic aspects of language use, and conceptual knowledge as examples of transferable underlying proficiencies. Furthermore, this notion of transfer refers to processes by which both languages have access to the same store of knowledge, rather than to processes by which knowledge "moves" from one language to the other (Riches & Genesee, 2006; Rolstad & MacSwan, 2014).
In line with the assumption that the languages connect on a cognitive level, various studies demonstrate that lower-level nonlinguistic literacy skills (e.g., phonological awareness, decoding abilities) may be more likely to be transferred than oral language skills (Lechner & Siemund, 2014; Melby-Lervåg & Lervåg, 2011; Proctor et al., 2017; Riches & Genesee, 2006; Rolstad & MacSwan, 2014; Verhoeven, 1994). For example, Verhoeven (1994), who investigated Turkish L1/Dutch L2 children, reports limited predictive effects for L1 vocabulary and morphology measures. In contrast, L1 phonological awareness and L1 pragmatic knowledge showed moderate and strong relationships, respectively, with L2 phonological awareness and L2 pragmatic knowledge.
Other studies do find support for L1-L2 relationships in oral language. The meta-analysis of Melby-Lervåg and Lervåg (2011) shows a significant, albeit weak, correlation between L1 and L2 oral language based on measures of vocabulary. In addition, Goodrich et al. (2016) found that bilingual Spanish-English children readily acquired translation equivalents of words they knew in one language but not the other, suggesting that they transfer specific concepts. Further support for transfer in the domain of vocabulary comes from a study in which a vocabulary intervention in children's home language/L1 (English) was associated with vocabulary gains in the school language/L2 (Hebrew) (Armon-Lotem et al., 2021). Studies that examined correlations between L1 and L2 grammatical outcomes report more mixed findings (Méndez et al., 2019). For example, Gottardo (2002) did not find significant cross-language relationships for Spanish-English bilinguals, whereas Castilla and colleagues (2009) found that L1 Spanish grammatical abilities predicted L2 English grammatical abilities.
As the two languages of a bilingual may share a conceptual store (Kroll & Stewart, 1994; Kroll et al., 2010), transfer effects in the domain of vocabulary could indeed be expected, as vocabulary items consist of mappings between phonological forms and conceptual representations. In the domain of grammar, transfer effects may be related to metalinguistic awareness, specifically morphological and syntactic awareness. Morphological awareness refers to the ability to combine and decompose words into morphemes, while syntactic awareness refers to knowledge of the rule system that governs how words are combined into larger units. Several studies show that bilingual children transfer morphological awareness (for an overview, see Chen & Schwartz, 2018) and suggest that joint experiences in two languages are associated with heightened morphological and syntactic awareness (Bialystok et al., 2014). Rauch et al. (2012) argue that metalinguistic awareness is closely related to the concept of language aptitude, which is an individual's potential to learn languages. It would thus be expected that children who are proficient in the L1 and have developed metalinguistic awareness in this language will also be relatively proficient in the L2, because they can draw on this metalinguistic awareness when learning the L2.
In sum, findings on L1-L2 interdependence in oral language are discrepant, warranting further research. Moreover, it is unknown to what extent refugee children, as a specific bilingual group, transfer abilities that they have developed in the L1 to promote learning the L2. It is not uncommon for policymakers and educators to view the L1 as an obstacle to the successful L2 development of migrant children rather than as a resource (Cummins, 2000; Van Der Wildt et al., 2017). L1 skills are, however, important for socioemotional development, harmonious family relations, and the overall well-being of migrant children (DeCapua & Wintergerst, 2009; Han, 2010; Liu et al., 2009; Tannenbaum & Berkovich, 2005). If positive relationships exist between the L1 and L2 of refugee children, this would provide additional grounds for promoting L1 maintenance.

Length of L2 exposure as a moderator
Interdependence between the L1 and L2 may depend on, and be moderated by, contextual factors (Verhoeven, 1994). For example, L2 exposure may influence L1-L2 relationships because interdependence might be delayed and not present at the outset of L2 development. The reason for this discontinuity in L1-L2 interdependence could be that a certain degree of L2 experience is required before children can map words in the new language to known concepts, or recruit and apply their metalinguistic abilities to foster L2 learning.
The moderating role of L2 exposure fits in with the Linguistic Threshold Hypothesis (Cummins, 1979), which holds that learners need a sufficient level of bilingualism to benefit from their bilingualism, for example, through improved linguistic skills. Empirical evidence for linguistic thresholds is limited because few studies have researched this topic (Prevoo et al., 2015; Rolstad & MacSwan, 2014). Feinauer et al. (2017) found that among young Spanish-English bilinguals, the rate of transfer depends on oral proficiency in the L2. Prevoo et al. (2015) observed that children's relative use of Turkish L1 and Dutch L2 affected the relation between Turkish and Dutch vocabulary growth. Specifically, in their study, the relationship between growth in the two vocabularies was stronger for children who used relatively more Turkish, suggesting that sufficiently developed L1 skills enable children to use the L1 as a resource. To our knowledge, no previous research has investigated length of L2 exposure as a moderator of positive relationships between refugee children's L1 and L2 abilities or, more generally, between the L1 and L2 abilities of child second language learners. We do so in the present study by employing a sentence repetition task to measure L1 and L2 outcomes.

Present study
Few studies have investigated the language development of refugee children, even though bilingual development in this group of children raises concerns (Kaplan et al., 2016). Accordingly, in this study, we employed different adaptations of the Litmus Sentence Repetition Tasks (Litmus-SRTs) to tap into the L1 and L2 syntactic knowledge of children who are Syrian refugees (for a similar approach, see Hamann et al., 2020). We did so with two main objectives in mind: (1) to investigate risk factors for refugee children's L2 abilities, with a particular focus on the role of L1 abilities and the interaction between L1 abilities and length of L2 exposure, and (2) to compare patterns across two different national contexts, Canada and the Netherlands.
In terms of the first research goal, we asked: To what extent are L2 outcomes of Syrian refugee children living in Canada and the Netherlands, measured with the Litmus-SRT, predicted by L1 outcomes, length of L2 exposure, and the interaction between L1 outcomes and length of L2 exposure? Correlations between L1 and L2 outcomes, and whether L1 outcomes predict L2 outcomes, provide information on L1-L2 interdependence and transfer of linguistic abilities. On the basis of prior research, we expected length of L2 exposure to affect L2 outcomes (Blom et al., 2012; Chondrogianni & Marinis, 2011; Paradis et al., 2017; Roesch & Chondrogianni, 2016; Sorenson Duncan & Paradis, 2020). Crucially, and in line with the Threshold Hypothesis, we predicted that L2 exposure would moderate L1-L2 interdependence.
In order to obtain a cross-national perspective, we conducted the same research in two national contexts: Canada and the Netherlands. Collecting data in two independent samples is important for replication purposes and for determining the robustness of results. It furthermore allows us to detect contextual differences that could be related to refugee policies that vary from country to country. With regard to the second research objective, we asked: To what extent are patterns found for Syrian refugee children in Canada and the Netherlands comparable, and could different patterns be related to maternal education and refugee camp experiences? Maternal education and refugee camp experiences were singled out as potentially relevant variables because Syrian refugees in Canada and the Netherlands are likely to differ on these variables, as discussed above. As argued in the introduction, these variables could affect L2 development, potentially modulating L1-L2 relations differently in the two national contexts. Tentatively, and based on findings suggesting that a higher level and more use of the L1 and L2 promote transfer (Feinauer et al., 2017; Prevoo et al., 2015), we predicted that interdependence between the L1 and L2 could be weaker in Canada than in the Netherlands, because more refugee camp experience and lower SES among the Syrian refugees in Canada may imply lower levels and less use of the L1 and L2.
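To make the predicted moderation concrete: in a logistic model with an interaction term, the effect of the L1 score on the log-odds of a verbatim L2 repetition is not a single number but shifts with L2 exposure. The sketch below is illustrative only; the coefficients `b0` to `b3` are invented (they are not estimates from this study), both predictors are assumed to be standardized, and the random intercepts are set to zero.

```python
import math


def predicted_probability(l1_z, exposure_z, b0, b1, b2, b3):
    """Inverse logit of b0 + b1*L1 + b2*exposure + b3*L1*exposure.

    l1_z and exposure_z are standardized predictors; b0..b3 are hypothetical
    fixed-effect coefficients. Random intercepts are set to zero here.
    """
    eta = b0 + b1 * l1_z + b2 * exposure_z + b3 * l1_z * exposure_z
    return 1.0 / (1.0 + math.exp(-eta))


def l1_simple_slope(exposure_z, b1, b3):
    """Change in log-odds of a verbatim L2 repetition per 1 SD of L1 score."""
    return b1 + b3 * exposure_z


# Invented coefficients chosen only to illustrate a positive interaction.
b0, b1, b2, b3 = 0.0, 0.1, 0.5, 0.4
for exposure_z in (-1.0, 0.0, 1.0):
    slope = l1_simple_slope(exposure_z, b1, b3)
    print(f"exposure {exposure_z:+.0f} SD: L1 slope {slope:.2f} log-odds")
```

With a positive interaction coefficient, the L1 slope is weak or even negative at low exposure and grows with more exposure, which is the pattern a threshold account leads one to expect.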

Participants
Data for the present study were collected within two partly parallel projects in Canada and the Netherlands. The Canadian project is a longitudinal study of children who migrated to Canada; it started in 2018 and continues until 2021. In total, 133 children participated in the first data collection of the Canadian study (2018), and 122 participated in the second (2019); the difference reflects participant attrition (i.e., participants becoming untraceable or moving away). Data from either round 1 or round 2 were selected for 56 children from the Canadian sample so that the two national samples would be comparable in age and L2 exposure (age at testing and at arrival, length of residence, and length of L2 schooling). There were 30 boys and 26 girls. The average age of the children was 8 years and 8 months (SD = 1.52, range = 6.08-12.33). The Dutch project was smaller in scale and was not longitudinal. Data were collected between October 2018 and January 2019 from 52 Syrian children. To achieve optimal matching, data from 5 children who had lived in the Netherlands for 50 months or more were excluded from the present study. The final Dutch sample included 16 boys and 31 girls. The average age of the children was 8 years and 7 months (SD = 1.87, range = 5.67-12.17). Further details on the trimming and matching processes for the two samples are given in the Matching section below. The characteristics of the matched samples are described in the Results section.

Litmus Sentence Repetition Task (Litmus-SRT)
Litmus-SRTs were used to test L1 and L2 skills. SRTs tap into lexical knowledge (Klem et al., 2015) and verbal short-term memory (Alloway & Gathercole, 2005), but they are mostly a measure of syntactic competence (Frizelle & Fletcher, 2014; Polišenská et al., 2015). The Litmus-SRT was conceptualized and developed within COST Action IS0804 (Language Impairment in a Multilingual Society: Linguistic Patterns and the Road to Assessment) in order to create comparable SRTs for a wide range of languages, so that bilingual children's language abilities can be assessed in both languages (Marinis & Armon-Lotem, 2015). An increasing number of studies have used this instrument for research purposes. The Syrian Arabic Litmus-SRT consists of 31 sentences: 6 simple sentences in the active voice (with modals, negation, and adjectives), 3 coordinated sentences, 3 simple sentences in the agentless passive voice, 3 object topicalizations, 6 questions (2 who, 2 what, 2 which), 4 subordinate clauses, and 6 relative clauses. The English Litmus-SRT also consists of 31 sentences: 6 simple sentences in the active voice (with modals and auxiliaries), 3 coordinated sentences, 6 simple sentences in the passive voice (3 short and 3 long passives, i.e., without and with a by-phrase), 6 questions (2 who, 2 what, 2 which), 4 subordinate clauses, and 6 relative clauses. The Dutch Litmus-SRT consists of 30 sentences: 9 simple sentences in the active voice (with modals, negation, and combinations of auxiliaries and modal verbs), 6 simple sentences in the passive voice (3 short and 3 long passives), 6 questions (2 who, 2 what, 2 which), 3 subordinate clauses, and 6 relative clauses. All stimuli are included in Appendix 1. The Litmus-SRTs were designed to be of comparable difficulty in the three languages so that correlations across languages would be meaningful. To this end, the stimuli included structures of similar syntactic complexity across languages.
Whether transfer as defined by Cummins (1997) depends on typological overlap is not well researched. However, any such effect is unlikely to differ between the two national settings: L1 Syrian Arabic is an Afro-Asiatic (Semitic) language, whereas L2 English and L2 Dutch are both West Germanic languages and are typologically closely related to each other.
Alberta Language Environment Questionnaire-4 (ALEQ-4, Paradis et al., 2020)
The ALEQ-4 was completed by the parents and was used to collect information on the participants, their language use, and pre-migration experiences (e.g., time spent in refugee camps, schooling in Arabic). In terms of language use, parents were asked to indicate how much Arabic versus English/Dutch children used with older and younger siblings and with parents and other relatives. This information was collected using 1-5 descriptors (1 = Mainly or only Arabic, 2 = Usually Arabic/L2 sometimes, 3 = Arabic and L2, 4 = Usually L2/Arabic sometimes, 5 = Mainly or only L2). Language input to the child and language output from the child were assessed separately. Parents were asked how often their children engaged in language-rich activities in English/Dutch and in Arabic in a given week, using a 1-5 scale (1 = 0-1 hr, 2 = 1-5 hr, 3 = 5-10 hr, 4 = 10-20 hr, 5 = more than 20 hr). Activities included listening/speaking activities (television, YouTube, WhatsApp, music), reading/writing activities (books, websites, messaging), playing with friends, and extracurricular activities (homework clubs, sports, religious activities). Individual rating scale scores were obtained, and composite scores estimating the richness of the English/Dutch and Arabic environments were calculated by adding the rating scale numbers and dividing by the total number of scales to generate a proportion score. Parents were also asked about their own educational background (including English/Dutch training), the number of children in their family, their self-rated fluency in English/Dutch (using 1-5 descriptors; 1 = Not fluent, 2 = Limited fluency, 3 = Somewhat fluent, 4 = Quite fluent, 5 = Very fluent), and their use of English/Dutch outside the home (also using 1-5 descriptors; 1 = 0-1 hr, 2 = 1-5 hr, 3 = 5-10 hr, 4 = 10-20 hr, 5 = more than 20 hr).
The complete ALEQ-4 is available as online Supplementary Materials in Paradis et al. (2020).
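The composite richness score described above can be sketched as follows. The text says the ratings were summed and divided by "the total number of scales" to yield a proportion; this sketch assumes the divisor is the maximum attainable total (number of scales times 5), which is what keeps the result between 0 and 1, and offers the plain mean rating as an alternative. The function name `richness_composite` and its parameters are hypothetical, not part of the ALEQ-4 materials.

```python
def richness_composite(ratings, max_rating=5, as_proportion=True):
    """Combine 1-to-max_rating activity ratings into one composite score.

    Hypothetical helper (not part of the ALEQ-4 materials). With
    as_proportion=True the summed ratings are divided by the maximum
    attainable total (number of scales * max_rating); otherwise by the
    number of scales, which gives a mean rating instead of a proportion.
    """
    if not ratings:
        raise ValueError("at least one rating scale is required")
    for r in ratings:
        if not 1 <= r <= max_rating:
            raise ValueError(f"rating {r} is outside 1..{max_rating}")
    total = sum(ratings)
    if as_proportion:
        return total / (len(ratings) * max_rating)
    return total / len(ratings)


# Illustrative ratings for four activity types (not data from this study).
arabic_ratings = [4, 2, 3, 1]
print(richness_composite(arabic_ratings))                       # 0.5
print(richness_composite(arabic_ratings, as_proportion=False))  # 2.5
```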
Strengths and Difficulties Questionnaire (SDQ) (Goodman, 1997)
This screening questionnaire is composed of 25 items and can be used to assess emotional and behavioral difficulties in children aged 4-16. The SDQ produces five subscales: hyperactivity, conduct, emotional, peer relationship problems, and prosocial behavior. Scores can be considered independently, amalgamated into externalizing (hyperactivity and conduct) and internalizing (emotional and peer problems) scores, or all combined into one total difficulties score (Goodman et al., 2010). With validated translations into over 30 languages, including Arabic, the SDQ is one of the most widely used questionnaires on child mental health worldwide (Goodman & Scott, 1999). However, prior studies have argued that the cross-cultural validity of certain items within this test may not be completely adequate (Kersten et al., 2016; Thabet et al., 2000).
For this study, parents completed the SDQ for each participant. Both in Canada and the Netherlands, the validated Arabic translation of the SDQ was used.
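The SDQ composites mentioned above are simple sums of subscale scores. The sketch below assumes the five subscale scores (each the sum of five items scored 0-2, so ranging 0-10) have already been computed from the item responses, including any reverse-scored items; the dictionary keys are hypothetical labels of our own.

```python
SUBSCALES = ("emotional", "conduct", "hyperactivity", "peer", "prosocial")


def sdq_composites(scores):
    """Derive SDQ composite scores from five subscale scores (0-10 each).

    `scores` maps the (hypothetical) keys in SUBSCALES to subscale totals.
    Prosocial behavior is reported separately and is not a difficulty.
    """
    for name in SUBSCALES:
        value = scores[name]
        if not 0 <= value <= 10:
            raise ValueError(f"{name} score {value} is outside 0..10")
    externalizing = scores["hyperactivity"] + scores["conduct"]
    internalizing = scores["emotional"] + scores["peer"]
    return {
        "externalizing": externalizing,                       # range 0-20
        "internalizing": internalizing,                       # range 0-20
        "total_difficulties": externalizing + internalizing,  # range 0-40
        "prosocial": scores["prosocial"],
    }


example = {"emotional": 2, "conduct": 1, "hyperactivity": 4, "peer": 3, "prosocial": 8}
print(sdq_composites(example))
```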

Procedures
Recruitment
In Canada, participants were recruited in three English-majority cities, Edmonton, Toronto, and Waterloo, after ethics approval. In the three cities, members of the Arabic community who were involved in the project as research assistants disseminated the call for participants within the community. In Toronto, principals of public schools serving refugees were also asked to share information about the study with students' parents. Finally, some participants were recruited via word of mouth. Interested families who fulfilled the requirements (i.e., were of Syrian origin and had arrived in Canada as refugees) were enrolled. In the Netherlands, participants were recruited in seven cities and towns, Maarssen, Breukelen, Houten, Overvecht, Woerden, Nieuwegein, and Loenen aan de Vecht, all located in the Dutch province of Utrecht. A Syrian research assistant recruited the participants through her network, which included directors of Arabic weekend schools, and through snowball sampling. In addition, social media was used: a text with relevant information for potential participants was posted in Syrian community groups on Facebook, and the same text was shared via WhatsApp within Syrian networks. Parents who were interested contacted the research assistant. Their informed consent was obtained prior to data collection.

Testing
In Canada, children were tested either in their homes or in a quiet room at a school. The Litmus-SRT was part of a larger test battery (see Paradis et al., 2020). Language order was counterbalanced across participants in the full sample, so that half of the participants completed the English tasks first and half the Arabic tasks first. Task order was randomized across participants. Parents completed the ALEQ-4 as an interview delivered in Arabic by a native speaker of the language. Literate parents completed the Arabic version of the SDQ about their child on their own, filling in a paper copy. Parents with low literacy skills completed the SDQ as an interview; parents themselves decided whether their skills were sufficient to complete it on their own. In the Netherlands, most participants were tested in a quiet room in a home. This was not always their own home: in each city or town, a Syrian parent offered her home for testing. A schedule was created to test the children individually and avoid overlap between participants. Some participants were tested in a quiet room at a neighborhood center where the Arabic school is held. Language order was counterbalanced across participants, so that half of the participants started with the Dutch SRT and half with the Arabic SRT. Arabic copies of the ALEQ-4 and the validated Arabic translation of the SDQ were sent to the participants by post. Most parents preferred to fill in the paper copies of the questionnaires while on the phone with the research assistant.

Scoring
SRTs can be scored in different ways depending on the purpose. In both samples, SRTs were initially scored in two ways: verbatim and structural accuracy. The results of the current study are presented for verbatim scoring. In this type of scoring, a participant's response is scored as 1 if the target sentence is repeated entirely verbatim and as 0 if the participant makes one or more changes to it. Disfluencies and phonological errors are not considered changes. Structural accuracy scoring considered only the repetition of the target syntactic structure, disregarding errors that did not affect that structure. In the literature, comparisons of scoring schemes, including verbatim and structural accuracy scoring, have yielded different results (Abed Ibrahim & Fekete, 2019). For this reason, we also provide information on structural accuracy scoring in Appendix 2. Because structural accuracy scoring correlated strongly with verbatim scoring, and analyses using either as the outcome variable yielded largely the same results, we did not retain structural accuracy scoring in the main text, for reasons of focus.
In Canada, seven research assistants who were native speakers of English and four who were native speakers of Arabic transcribed and scored the Litmus-SRTs for the two data collection periods. Twenty-five percent of the recordings in each language in each round were transcribed and scored by a second research assistant. An α coefficient of inter-rater reliability was obtained for each language in each round; all fell between .91 and .94. In the Netherlands, three research assistants who were native speakers of Dutch and two who were native or near-native speakers of Syrian Arabic transcribed and scored the Litmus-SRTs; 25% of the Litmus-SRT data were independently scored a second time. The α coefficients of inter-rater reliability were 1.00 for Dutch and .99 for Arabic, indicating almost perfect agreement (Cohen, 1960).
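Verbatim scoring can be approximated in code once responses have been transcribed. The sketch assumes that transcribers have already written phonological errors as the intended target words (the rule that phonological errors do not count cannot be automated from orthography alone). Disfluencies are approximated by a hypothetical list of filler tokens plus immediate word repetitions; neither rule comes from the Litmus-SRT scoring guidelines, and the function names are our own.

```python
import re

# Hypothetical filler tokens treated as disfluencies (not an official list).
FILLERS = {"uh", "um", "uhm"}


def normalize(utterance):
    """Lowercase, tokenize, drop fillers, and collapse immediate repetitions."""
    tokens = [t for t in re.findall(r"[\w']+", utterance.lower()) if t not in FILLERS]
    collapsed = []
    for token in tokens:
        if not collapsed or collapsed[-1] != token:  # "the the boy" -> "the boy"
            collapsed.append(token)
    return collapsed


def verbatim_score(target, response):
    """1 if the response repeats the target word for word, else 0."""
    return int(normalize(target) == normalize(response))


def proportion_verbatim(pairs):
    """Proportion of verbatim repetitions over (target, response) pairs."""
    return sum(verbatim_score(t, r) for t, r in pairs) / len(pairs)


items = [
    ("The boy is eating an apple.", "the boy uh is eating an apple"),
    ("The boy is eating an apple.", "the the boy is eating an apple"),
    ("The boy is eating an apple.", "the boy eating an apple"),
]
print([verbatim_score(t, r) for t, r in items])  # [1, 1, 0]
print(proportion_verbatim(items))
```

Collapsing repetitions is a blunt rule: a target that legitimately contains a repeated word would also accept a response that omits one copy, so human review of such items would still be needed.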
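The text reports an "α coefficient" without naming the statistic. Because Cohen (1960) is cited for the agreement benchmark, the sketch below computes Cohen's kappa, the chance-corrected agreement statistic introduced in that paper, for two raters' 0/1 item scores. It is a stand-in for illustration and not necessarily the coefficient the authors computed; the function name and example scores are hypothetical.

```python
def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters' categorical (here 0/1) item scores."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    categories = set(rater1) | set(rater2)
    # Agreement expected by chance from each rater's marginal proportions.
    expected = sum((rater1.count(c) / n) * (rater2.count(c) / n) for c in categories)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Illustrative double-scored items (1 = verbatim, 0 = not); not study data.
first = [1, 1, 0, 0, 1, 0, 1, 1]
second = [1, 1, 0, 0, 1, 1, 1, 1]
print(round(cohens_kappa(first, second), 3))  # 0.714
```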

Matching
The two samples were matched by manually trimming the Canadian sample, drawing on participants from both data collection periods. Preliminary analyses of the Canadian sample (n = 133) indicated that participants were significantly older and had a significantly shorter residence in the host country than those in the Dutch sample. These two variables were the only ones considered during the trimming: older participants and participants with shorter residence were removed until the samples no longer differed according to inferential tests. Of the final 56 participants in the Canadian sample, the data for 35 come from the first data collection and the data for 21 from the second, with no participant included at both time points. As mentioned above, 5 participants of the original Dutch sample (n = 52) were also excluded because their length of residence considerably exceeded the Canadian range.
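Because the trimming was done manually and is described only in outline, the sketch below shows one possible automation under stated assumptions. At each step it removes the most extreme participant on whichever variable shows the larger imbalance (measured by Welch's t), in the direction that reduces that imbalance, and it stops once both t statistics fall below a cutoff. The cutoff of 1.96 is a normal-approximation stand-in for "not significant at .05", and `trim_to_match`, the dictionary keys, and the example data are all hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic comparing the means of two independent samples."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))


def trim_to_match(pool, reference, keys=("age", "residence"), crit=1.96):
    """Greedily drop participants from `pool` until no key differs from `reference`.

    Hypothetical automation of a manual procedure: each step removes the most
    extreme participant on the variable with the largest |t|, in the direction
    that reduces the imbalance.
    """
    pool = list(pool)
    while len(pool) > 2:
        t_values = {k: welch_t([p[k] for p in pool], [r[k] for r in reference]) for k in keys}
        worst = max(keys, key=lambda k: abs(t_values[k]))
        if abs(t_values[worst]) < crit:
            break  # no remaining difference exceeds the cutoff
        pick = max if t_values[worst] > 0 else min  # pool mean too high -> drop largest
        pool.remove(pick(pool, key=lambda p: p[worst]))
    return pool


# Invented ages (years) and residence lengths (months), not study data.
reference = [{"age": a, "residence": r} for a, r in
             [(8.0, 30), (8.5, 32), (9.0, 31), (8.5, 30), (8.0, 32), (9.0, 31)]]
pool = reference + [{"age": 12.0, "residence": 5},
                    {"age": 12.5, "residence": 4},
                    {"age": 13.0, "residence": 6}]
matched = trim_to_match(pool, reference)
print(len(matched))  # 8
```

In this toy example the loop stops after a single removal, even though two clearly older, shorter-residence participants remain, because with small samples the remaining differences are no longer significant. This illustrates that non-significance is a fairly lenient matching criterion.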

Data analysis
Descriptive and inferential statistics for this study were computed in R (version 3.6.3; R Core Team, 2020). The analytical approach for each research objective is described below.

Description of the matched samples
In order to compare the two samples and to explore the relations between different variables within each sample, we ran parametric two-tailed independent samples t tests and Pearson correlations using the base package. Cohen's d effect sizes were calculated using the package effsize (version 0.7.8; Torchiano, 2020). In describing the L1-L2 performance of the matched samples on the Litmus-SRT, we employed two-tailed paired samples t tests.
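For reference, the two statistics named above can be written out directly. The sketch computes Cohen's d with the pooled standard deviation (we assume this is the variant used, since no correction such as Hedges' g is mentioned) and the paired-samples t statistic from within-child L1-L2 differences. Converting t to a p-value would additionally require the t distribution, which the Python standard library does not provide. Function names and example scores are hypothetical.

```python
import math
from statistics import mean, stdev, variance


def cohens_d(a, b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def paired_t(x, y):
    """Paired-samples t statistic (df = n - 1) for matched scores x[i], y[i]."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [a - b for a, b in zip(x, y)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


# Invented proportion-correct scores for five children (L1 Arabic vs. L2).
l1 = [0.70, 0.55, 0.80, 0.60, 0.65]
l2 = [0.50, 0.45, 0.75, 0.40, 0.60]
print(round(paired_t(l1, l2), 2))
```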
Association between L2 outcomes and L1 level/L2 exposure
We addressed our first research objective in two ways. First, we used Pearson correlations to investigate the relationship between performance in the two languages and the association of each with L2 exposure. Second, for each sample, we ran a separate generalized mixed-effects regression model with a binomial distribution using the lme4 package (version 1.1-21; Bates et al., 2015). For each sample, the model included L2 verbatim scores as the dependent variable, which could take the value 1 (verbatim repetition) or 0 (non-verbatim repetition). The random-effect structure included random intercepts for Participant and for Item, without random slopes. The initial model contained three fixed effects: L1 verbatim score, length of L2 exposure, and their interaction. The predictors were centered around 0 and standardized to improve the interpretability of the intercept and to prevent issues related to differences in scale. We then applied backward selection to the fixed effects. At each step, the predictor with the highest p-value was eliminated, and the reduced model was compared to the previous model using a likelihood ratio test (with the anova function) and by inspecting the AIC and BIC values. The reduced model was retained when it did not entail a significant loss of model fit. The optimal models for the two samples are discussed in detail in the Results section. The C-index of concordance for these models was computed using the somers2 function of the Hmisc package (version 4.3-1; Harrell, 2020), following Baayen (2008).
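The concordance index reported for the final models can be computed directly from predicted probabilities and observed 0/1 scores. The sketch below counts all (verbatim, non-verbatim) response pairs, which yields the C statistic and Somers' Dxy = 2(C - 0.5), the two quantities that Hmisc's `somers2` reports; it is a plain re-implementation for illustration, not the Hmisc code. Its cost is quadratic in the number of responses, which is manageable at the sample sizes here. The function name and the example probabilities are invented.

```python
from itertools import product


def concordance(probs, outcomes):
    """Return (C, Dxy) for predicted probabilities and 0/1 outcomes.

    C is the share of (outcome 1, outcome 0) pairs in which the 1 received the
    higher predicted probability; tied predictions count as half a pair.
    """
    positives = [p for p, y in zip(probs, outcomes) if y == 1]
    negatives = [p for p, y in zip(probs, outcomes) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome values (0 and 1) must occur")
    score = 0.0
    for p1, p0 in product(positives, negatives):
        if p1 > p0:
            score += 1.0
        elif p1 == p0:
            score += 0.5
    c = score / (len(positives) * len(negatives))
    return c, 2 * (c - 0.5)


# Invented fitted probabilities and observed verbatim scores for six responses.
fitted = [0.9, 0.4, 0.6, 0.1, 0.7, 0.7]
observed = [1, 1, 0, 0, 1, 0]
print(concordance(fitted, observed))
```

A C of 0.5 means the model ranks responses no better than chance, and 1.0 means every verbatim repetition received a higher predicted probability than every non-verbatim one.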

Effect of maternal education and refugee camp experiences on L2 outcomes
The Canadian and Dutch samples were likely to differ in SES/maternal education and refugee experiences. To address our second research goal of exploring the effect of such differences, we performed follow-up analyses. The restricted sample size limited the number of predictors we could include in the models without overfitting them. Thus, we fitted follow-up models in which only one predictor was added to the optimal models for the associations between L1 level/length of L2 exposure and L2 outcomes. Model fit was evaluated using the same procedures as above.

Description of matched samples
Demographic information about the two samples is summarized in Table 1 (see Appendix 3 for a correlation matrix of these variables). The two groups did not differ in age (at testing and at arrival), length of residency in Canada/the Netherlands, or length of schooling in the L2 (English/Dutch). In addition, there were no differences in Arabic schooling, richness of Arabic in the home, or maternal and paternal interaction and fluency in the L2 (English/Dutch). In the Dutch sample, only one child had refugee camp experience. In the Canadian sample, 24 children had refugee camp experiences of […]
Table 1 notes. a Information was obtained for input to the child and output from the child and was subsequently averaged into Language use (Scale: 1 = Mainly or only Arabic, 2 = Usually Arabic/L2 sometimes, 3 = Arabic and L2, 4 = Usually L2/Arabic sometimes, 5 = Mainly or only L2). b Scale is an average of language use with mother and father. c Scale is an average of language use with older and younger siblings.
Age and exposure can be confounded in child L2 learners, complicating the interpretation of these variables (Unsworth & Blom, 2010). In the Canadian sample, length of L2 schooling correlated positively with age at testing, r(54) = .34, p = .01, indicating that older children tended to have had longer L2 schooling (see Appendix 3). This correlation is weak to moderate (r² ≈ .12), showing that age and length of L2 schooling overlap relatively little. Length of L2 schooling did not correlate with age of arrival, r(54) = .07, p = .60. In the Dutch sample, neither the correlation between L2 schooling and age at testing, r(45) = .25, p = .09, nor that between L2 schooling and age at arrival, r(45) = −.16, p = .27, reached significance. In the present study, we use the length of L2 schooling, rather than length of residency in the host country, as a proxy for L2 exposure, since the onset of and sustained exposure to the L2 are more likely to be tied to schooling in these newcomer families.
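The notation r(54) gives the degrees of freedom of the test of a Pearson correlation against zero, df = n − 2, so n = 56 for the Canadian sample. The sketch below converts the reported r into the corresponding t statistic and the shared variance r². It is a worked check only. Computing the exact p-value would additionally require the t distribution, which the Python standard library does not provide.

```python
import math

def r_to_t(r, df):
    """t statistic for testing Pearson's r against zero, with df = n - 2."""
    return r * math.sqrt(df) / math.sqrt(1 - r ** 2)

# Reported for the Canadian sample: L2 schooling vs. age at testing, r(54) = .34
r, df = 0.34, 54
t = r_to_t(r, df)
shared_variance = r ** 2  # proportion of variance the two variables share
print(f"t({df}) = {t:.2f}, r^2 = {shared_variance:.3f}")
```

The resulting t of about 2.66 exceeds the two-tailed .05 critical value for 54 df (roughly 2.0), consistent with the reported p = .01. The r² of about .12 is the basis for describing the overlap between age and L2 schooling as limited.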
In addition to the demographic and linguistic information presented in Table 1, information was collected about the children's socioemotional well-being, specifically the extent to which they showed internalizing and externalizing problem behaviors. These were combined into a total difficulties score, with higher scores indicating more socioemotional difficulties. The mean (SD) total difficulties score was 9.60 (4.00) in the Canadian sample and 8.64 (5.10) in the Dutch sample. This difference was not statistically significant, t(87.165) = 1.0123, p = .31. Note that nine participants from the Canadian sample were excluded from the calculation of the total difficulties score due to missing data.
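The fractional degrees of freedom reported above suggest a Welch (unequal-variance) t test, which is the default of R's t.test; the text itself does not name the variant. The sketch below recomputes Welch's t and the Welch-Satterthwaite df from the summary statistics. It assumes n = 47 per group (56 − 9 Canadian children with complete data, and all 47 Dutch children). Because the means and SDs are rounded, the result only approximates the reported values.

```python
import math

def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic and Welch-Satterthwaite df from summary statistics."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2  # squared standard errors of each mean
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Total difficulties: Canada 9.60 (4.00), n = 47 (assumed); Netherlands 8.64 (5.10), n = 47 (assumed)
t, df = welch_t(9.60, 4.00, 47, 8.64, 5.10, 47)
print(f"t({df:.2f}) = {t:.3f}")
```

The recomputed values, t(87.06) ≈ 1.015, are close to the reported t(87.165) = 1.0123. This supports the Welch reading and the assumed group sizes.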
In both samples, participants obtained higher scores in the L1 than in the L2 (see Table 2). In the Canadian sample, a two-tailed paired samples t test showed a significant difference, t(55) = 7.8018, p < .001 (Cohen's d = 1.27; large effect size). Similarly, in the Dutch sample, a two-tailed paired samples t test showed a significant difference, t(46) = 7.1748, p < .001 (Cohen's d = 0.93; large effect size). These results suggest that most children are (still) dominant in L1 Arabic. This dominance pattern is somewhat stronger in the Canadian than in the Dutch sample, as shown by the larger effect size in the former.

Effect of L1 level and L2 schooling on L2 outcomes
As a first step in investigating the relationships among L1 performance, L2 performance, and L2 schooling, we computed Pearson correlations. The results in Table 3 show that L1 and L2 outcomes are positively correlated. This correlation is significant in the Dutch sample, whereas in the Canadian sample it shows only a trend (see also Table A4 in Appendix 2). L2 outcomes are positively and significantly correlated with the length of L2 schooling in both samples. Overall, very similar patterns arise in both samples, but the relations are considerably stronger in the Dutch than in the Canadian sample.
To fully address our first objective, we ran a mixed-effects logistic regression for each sample (see Data analysis section). For the Canadian sample, the optimal model is shown in Table 4. For guidance on interpreting logistic regression coefficients, which are expressed in log-odds, we direct readers to Winter (2020). In this sample, L1 verbatim scores and length of L2 schooling were both positive predictors. The model with the interaction between these two factors did not fit better than the model without it, χ²(1, 6) = 1.8731, p = .17. The optimal model had a C-index of concordance of .91, indicating outstanding discrimination (Hosmer et al., 2013). Table 5 shows the optimal model for the Dutch sample. In this sample, the interaction between L1 scores and length of L2 schooling was significant; we unpack this interaction below. This model also had outstanding discrimination (C = .91).
In order to understand the relationship between L1 scores and L2 schooling in the Dutch model, we plotted the marginal effects of this interaction (Figure 1). Using visual binning with one cut-point, we also compared subsamples with shorter and longer L2 schooling (Table 6). The interaction depicted in Figure 1 can be broadly interpreted as follows: in the Dutch sample, the effect of the length of L2 schooling is modulated by L1 performance, with L2 schooling having a greater effect on participants with better L1 performance. That is, participants with short L2 schooling are unlikely to repeat the L2 sentence verbatim regardless of their L1 performance (i.e., L1 performance has little effect for participants with short L2 exposure). By contrast, among participants with longer L2 schooling, those with better L1 performance are more likely to repeat the L2 sentence verbatim.
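The interaction pattern described above can be made concrete with a small sketch of a logistic model containing an L1 × L2-schooling term. In log-odds, the slope of L1 equals B_L1 + B_INT × schooling. A positive interaction therefore makes the L1 effect grow with L2 schooling. The coefficients below are hypothetical and are not the estimates in Table 5. Random effects are set to zero, which corresponds to an average participant and item.

```python
import math

# Hypothetical coefficients (log-odds) on standardized predictors; NOT the
# Table 5 estimates, chosen only to mimic the qualitative pattern in the text.
B0, B_L1, B_SCHOOL, B_INT = -0.5, 0.5, 0.8, 0.4

def p_verbatim(l1_z, school_z):
    """Predicted probability of verbatim L2 repetition (random effects set to zero)."""
    eta = B0 + B_L1 * l1_z + B_SCHOOL * school_z + B_INT * l1_z * school_z
    return 1 / (1 + math.exp(-eta))

def l1_effect(school_z):
    """Change in predicted probability from low (-1 SD) to high (+1 SD) L1 score."""
    return p_verbatim(1, school_z) - p_verbatim(-1, school_z)

for school_z, label in [(-1, "short L2 schooling (-1 SD)"), (1, "long L2 schooling (+1 SD)")]:
    print(f"{label}: L1 effect on P(verbatim) = {l1_effect(school_z):.2f}")
```

With these illustrative values, moving from low to high L1 barely changes the predicted probability for children with short L2 schooling. For children with long L2 schooling, the same shift raises it by about 0.4, which is the kind of pattern a marginal-effects plot displays.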
Likewise, Table 6 shows that in the Dutch sample, the correlation between L1 and L2 is considerably stronger for the long L2 exposure subsample than for the short L2 exposure subsample. Table 6 also shows that, although the interaction between L1 level and L2 schooling did not reach significance in the Canadian sample, a similar pattern can be observed there.
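The subsample comparison in Table 6 amounts to splitting children at one cut-point on L2 schooling and computing the L1-L2 Pearson correlation within each group. A minimal sketch follows. The data and the 18-month cut-point are hypothetical illustrations, not the study's values or its chosen cut-point.

```python
import math
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation computed from deviations around the means."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def split_correlations(l1, l2, schooling, cut):
    """L1-L2 Pearson r for children below vs. at/above a schooling cut-point."""
    short = [(a, b) for a, b, s in zip(l1, l2, schooling) if s < cut]
    long_ = [(a, b) for a, b, s in zip(l1, l2, schooling) if s >= cut]
    r_short = pearson_r([a for a, _ in short], [b for _, b in short])
    r_long = pearson_r([a for a, _ in long_], [b for _, b in long_])
    return r_short, r_long

# Hypothetical data for eight children (not study data): proportions verbatim in
# L1 and L2, and months of L2 schooling; the 18-month cut-point is illustrative only.
l1 = [0.5, 0.7, 0.6, 0.8, 0.5, 0.7, 0.6, 0.8]
l2 = [0.2, 0.2, 0.25, 0.2, 0.35, 0.6, 0.45, 0.7]
months = [6, 8, 10, 12, 24, 28, 30, 36]

r_short, r_long = split_correlations(l1, l2, months, cut=18)
print(f"short exposure r = {r_short:.2f}, long exposure r = {r_long:.2f}")
```

With such small groups, per-subsample correlations are unstable. This is one reason the binned comparison serves only to describe the model-based interaction and does not replace it.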

Effect of maternal education and refugee camp experiences on L2 outcomes
The model with maternal education for the Canadian sample, shown in Table 7, had a significantly better fit than the model without this variable (Table 4), χ²(1, 6) = 10.858, p < .001. Higher maternal education was associated with stronger L2 performance. Adding maternal education to the optimal model for the Dutch sample (Table 5) did not improve fit, χ²(1, 7) = 0.4464, p = .50.
Adding refugee camp experiences to the optimal Canadian model (Table 4) resulted in a significantly better fit, χ²(1, 6) = 8.4763, p = .004. As shown in Table 8, time spent in a refugee camp was a negative predictor: longer time spent in refugee camps was associated with lower L2 verbatim scores. We did not fit this model to the Dutch sample, since only one participant had spent time in a refugee camp.
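Each model comparison in this section is a likelihood ratio test between nested models that differ by one parameter. Its statistic is referred to a chi-square distribution with 1 df. This reading assumes the first number in the χ²(1, 6) notation is that df difference. For 1 df, the upper-tail probability has a closed form, P(X > x) = erfc(√(x/2)), so the reported p-values can be checked with the standard library alone.

```python
import math

def chi2_sf_df1(x):
    """Upper-tail p-value of a chi-square statistic with 1 df: erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2))

# Likelihood ratio statistics as reported in the Results
reported = [
    ("Canada: L1 x L2 schooling interaction", 1.8731),
    ("Canada: + maternal education", 10.858),
    ("Netherlands: + maternal education", 0.4464),
    ("Canada: + refugee camp time", 8.4763),
]
pvals = [chi2_sf_df1(x) for _, x in reported]
for (label, x), p in zip(reported, pvals):
    print(f"{label}: chi2 = {x}, p = {p:.4f}")
```

The recomputed values (about .171, .00098, .504 and .0036) match the reported p = .17, p < .001, p = .50 and p = .004.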

Discussion
In this study, we investigated the syntactic abilities of 6- to 12-year-old Syrian refugee children in Canada and the Netherlands using a multilingual assessment instrument, the Litmus-SRT. Our first goal was to determine whether children's L1 Syrian Arabic abilities are related to their L2 English and Dutch abilities, and whether such L1-L2 interdependence is moderated by the length of L2 exposure. Positive relationships between the L1 and the L2 would indicate that the L1 could be a resource for L2 learning rather than an obstacle. Our second goal was to determine whether patterns are the same in the Canadian and Dutch samples, and whether any differences could be due to cross-national differences. Notable differences between the two samples, anticipated on the basis of refugee policies, were lower maternal education and more refugee camp experiences in the Canadian than in the Dutch sample. Other cross-national differences included family size, with Canadian families being significantly larger. Language environments also differed: Dutch participants used a larger proportion of L2 in the home than Canadian participants and had comparatively richer L2 environments. The two samples were comparable in age at arrival and at testing, in socioemotional difficulties, and in length of residence and L2 schooling.
Participants in both samples scored better in L1 Syrian Arabic than in L2 English and Dutch on the Litmus-SRT, indicating that they are still L1-dominant. L1 dominance was slightly more pronounced in the Canadian sample. We hypothesize that this could be related to the lower maternal education in this sample. Different mechanisms could drive the relation between L1 versus L2 development and maternal education. Mothers who are more educated tend to use the L2 more frequently with their children, while less educated mothers use the L1 relatively more often (Prevoo et al., 2013). This link between maternal education and L1 versus L2 use at home is reflected in our study: as mentioned above, Dutch participants were exposed to the L2 slightly more in the home and had higher L2 richness than Canadian participants. More L2 use at home could hypothetically help L2 development. However, this hypothesis is not widely supported for school-aged children (Chondrogianni & Marinis, 2011; Kaltsa et al., 2019; Paradis et al., 2017). It is more likely that a higher degree of L2 use negatively affects children's L1 development and/or that more highly educated mothers have more means to create opportunities for L2-rich activities for their children. In addition, the […]
A common approach to identifying L1-L2 interdependence is to establish the amount of shared variance by investigating cross-language correlations. Previous research on cross-language correlations in the domain of grammar (morphology, syntax) reports discrepant results. This raises the question of whether there is L1-L2 interdependence in grammar and whether, for L1 and L2 learning, a bilingual child has access to the same store of knowledge, as posited by the Interdependence Hypothesis. Our results demonstrate that L1 Syrian Arabic predicts children's L2 English and Dutch outcomes.
These findings tie in with those of Castilla and colleagues (2009), who found that Spanish L1 morphosyntax abilities predicted English L2 morphosyntax. They differ from those of Gottardo (2002) and Verhoeven (1994), who found limited evidence for L1-L2 relations in the domain of grammar. Typological proximity (Larsen-Freeman & Long, 1991) does not explain these divergent findings: Castilla et al. (2009) and Gottardo (2002) both investigated Spanish L1/English L2 children, and our study showed interdependence between L1 and L2 abilities despite typologically distant L1 and L2s. To understand the divergent findings on the interdependence of grammar skills, timing and exposure may be relevant, as the transfer of skills may depend on children's L1 and L2 proficiency levels (Cummins, 1979; Feinauer et al., 2017; Prevoo et al., 2015). Sufficient L1 knowledge to positively boost the L2 may be more likely at older ages (Castilla et al., 2009) and in children who use the L1 frequently (Prevoo et al., 2015). Age does not explain why Castilla et al. (2009), who tested younger children than Gottardo (2002), found more robust evidence of interdependence. Nevertheless, a relatively well-developed and frequently used L1 may have supported this cross-linguistic association in the children we investigated. Both the Canadian and the Dutch samples showed that children with longer L2 exposure performed better in the L2, replicating previous research (Blom et al., 2012; Chondrogianni & Marinis, 2011; Paradis et al., 2017; Roesch & Chondrogianni, 2016; Sorenson Duncan & Paradis, 2018). Length of L2 exposure modulated the effect of L1 on L2 abilities in the Dutch sample, in line with the Threshold Hypothesis (Cummins, 1979). This suggests that children need some L2 experience before they can benefit from their L1 skills. Prevoo and colleagues (2015) also found support for moderated L1-L2 relationships.
Their results highlight the importance of frequent L1 use for the cross-linguistic transfer of skills in bilingual Turkish-Dutch children. Our results, by contrast, demonstrate the importance of L2 experience for bilingual Syrian Arabic-Dutch children. The children in our study have recently arrived in the Netherlands, are L1-dominant, and use the L1 at home most of the time. The children studied by Prevoo et al. (2015), on the other hand, are second- or third-generation migrants in the Netherlands who were exposed to the L2 from birth and are raised in homes where the L2 is frequently used. Thus, for children who have strong L1 skills and are relatively new to the L2, interdependence relies more heavily on sufficient L2 experience. For children whose L1 skills are more under pressure, sufficient L1 experience is more important. In the Canadian sample, the interaction between L1 ability and length of L2 exposure neither reached significance nor improved model fit. However, the L1-L2 correlations in subsamples that differed in length of L2 exposure suggested that, as in the Dutch data, L1-L2 correlations in the Canadian sample were stronger for children with longer L2 exposure.
In sum, findings from both samples reveal significant L1-L2 interdependence. This indicates that 6- to 12-year-old refugee children's L1 abilities form the basis of common underlying proficiencies that promote L2 learning, even though the L1 and L2 are typologically distant. The positive L1-L2 relationships underscore the conclusion that L1 knowledge does not hinder L2 development but might, in fact, be a useful resource for children.

Exploration of modulating factors: maternal education and refugee camp experiences
Due to different refugee policies, the Syrian refugee populations in Canada and the Netherlands differ. We singled out two variables that showed cross-national differences, maternal education and refugee camp experiences, as factors that could weaken the associations between the L1 and the L2 in the Canadian sample specifically. L1-L2 patterns were very similar across the Canadian and Dutch samples, supporting their robustness. However, they were more pronounced in the Dutch sample, in line with our predictions.
Interestingly, the two national contexts differed with respect to the role of maternal education. Both maternal education and time spent in a refugee camp predicted L2 outcomes in the Canadian sample, in addition to L1 outcomes and length of L2 schooling. Maternal education did not contribute to explaining variation in L2 Dutch outcomes, despite a higher mean than in the Canadian sample and a similar spread. In line with various other studies, the Canadian sample showed that more educated mothers had children who scored better in the L2 (Golberg et al., 2008; Sorenson Duncan & Paradis, 2020). These mothers may have more resources and cultural capital to stimulate their children's development in the L2 through, for example, literacy-related activities (Prevoo et al., 2013). However, this does not seem to be the case. As shown in Appendix 3, the correlation between L2 richness and maternal education is not significant in the Canadian sample (r(54) = −.02, p = .87). It is also unlikely that more educated mothers support their children's L2 skills via oral interactions in the Canadian sample, since Syrian Arabic was reported to be used most of the time in the home (Table 1). Overall low L2 use, including by highly educated mothers, could explain why maternal education did not predict L2 outcomes in the Dutch study. It does not explain, however, why maternal education did predict L2 outcomes in the Canadian study.
An alternative, and perhaps the most likely, explanation of why maternal education predicts L2 outcomes in Canada but not in the Netherlands is that English, unlike Dutch, is present in Syria. English is also taught as a subject in Syrian schools, so more educated mothers are more likely to be able to help their children in the L2 from the very beginning. Thus, the onset of using English differs from the onset of using Dutch. This may have helped highly educated Syrian mothers in Canada to support their children with school tasks in the L2, stimulating their children's L2 development, even though they do not tend to use the L2 in conversational interactions with their children. The correlations in Appendix 3 support this hypothesis. The correlation between maternal education and maternal L2 fluency is significant in the Canadian sample (r(54) = .49, p < .001) but not in the Dutch sample (r(45) = .17, p = .25). Conversely, the correlation between maternal L2 fluency and length of residency in the host country is not significant in the Canadian sample (r(54) = .03, p = .81) but is significant in the Dutch sample (r(45) = .60, p < .001). This suggests that the Dutch fluency of Syrian mothers in the Netherlands is linked to their length of residency in the host country. The English fluency of Syrian mothers in Canada, by contrast, is linked to their education in the home country. This pattern is consistent with our hypothesis that Syrian mothers in Canada were better equipped than those in the Netherlands to support their children's L2 development from the beginning of their residency in the host country. That would explain the diverging model fits. This hypothesis remains to be tested in future research.
Refugee camp experiences were not investigated in the Dutch study, as hardly any children had such experiences. Time spent in refugee camps was negatively related to L2 outcomes in the Canadian study. When children live in a state of limbo in refugee camps, with no access to education and faced with a lack of freedom, their development may slow down. In addition, they can face adverse conditions that permanently limit their ability to learn, such as malnutrition (Harrell-Bond, 2000). Such conditions affect not only the uptake of linguistic input, which in our study could be reflected in weaker effects of length of L2 exposure, but also the strength of positive L1-L2 relations. The influence of refugee camp experiences on L2 outcomes is important for clinical practice as a potential exclusionary factor when attributing low language abilities to a congenital language disorder. Overall, the cross-national differences observed across the two samples in this study demonstrate the need to consider factors at the macrosystem level, which may indirectly affect refugee children's bilingual development even after resettlement (Lustig, 2010).

Limitations and future research
To compare the Canadian and Dutch samples, we matched them on age and exposure. Matching increases the comparability of the samples, but it also reduces the sample size. This limited the number of predictors that we could simultaneously evaluate in the statistical models. The second limitation concerns the sampling. The samples were not recruited a priori to be representative of the refugee populations in both countries. Although they reflected what would be expected based on national refugee policies, they may not be fully representative. Third, the correlational data in our study do not provide insight into specific transfer mechanisms. Relatedly, the L1-L2 correlations do not necessarily stem from L1-to-L2 dependencies (Castilla et al., 2009). L1 development shows stable and meaningful individual differences, which are linked to endogenous factors (Kidd & Donnelly, 2020). Some children are thus better language learners than others in both the L1 and the L2. Language learning ability, and specific mechanisms that underlie language learning such as verbal short-term memory, statistical learning, and analytical reasoning, would thus be relevant third variables. Future research could investigate the role of such third variables by examining L1-L2 relations after factoring out the variance shared through general language learning ability.

Conclusions
This study of 6-to-12-year-old bilingual L1 Syrian Arabic children with refugee backgrounds found evidence of L1-L2 interdependence in syntactic skills. However, interdependence may not surface in the earlier phases of L2 development, when children have had only limited L2 exposure. Interdependence between the two languages provides an argument in favor of L1 maintenance to support L2 development. The Canadian and Dutch samples overlapped on many demographic and linguistic variables. There were, however, important differences in maternal education level and refugee camp experiences, as well as distinct patterns of interdependence. Maternal education level impacted L2 outcomes in the Canadian but not in the Dutch sample. Refugee camp experiences predicted L2 outcomes in the Canadian sample, whereas in the Dutch sample hardly any child had refugee camp experiences. It is important to examine bilingual language development in refugee children in different national contexts, as they may fare differently.

They have been riding the bicycle around the backyard.
They are eating the bananas in the park.
The kitten could have bounced the ball down the stairs.
The boy must sweep the floor in the kitchen.
The teacher has been looking at us all day.
Short passive (k=3)
She was stopped at the big red lights.
The children were taken to the office.
He was pushed hard against the ground.
Long passive (k=3)
The cow was kicked in the leg by the donkey.
She was seen by the doctor in the morning.
The mother was followed by the girl.
Question (k=6)
What did the mother cook in the evening?
Who have they seen near the front door?
Which picture did he paint at home yesterday?
What did the father buy last month?
Who did the girl meet in the library yesterday?
Which drink did the neighbour spill in the house?
Coordinated (k=3)
The mother is shopping and the child is studying at home.
The dog barks outside and the child cries inside.
Our neighbor cleans the car and his son plays basketball.

Subordinate (k=4)
If the weather is warm, we can go to the park.
Before the girl eats dinner, she will play with the computer.
The children will get a present if they clean the house.
The child ate breakfast after he washed his face.
Relative (k=6)
The boy that the neighbour helped has lost his way.
They should wash the baby that the mother is feeding.
The horse that the farmer pushed kicked him in the back.
The mother made the meal that the children are eating.
The children enjoyed the candy that they tasted.
The team that my brother cheered for won the race.

ndaefaeʃ bʔuwe ʕaelʔardˤ
PASS-pushed in-hard to-the-ground
'He was pushed hard against the ground.'
txaeːlafet ʕaeliʃaeːra lħaemra
PASS-fined-3SG.F on-the-light the-red
'She got a fine at the red light.'
Topicalization (k=3)
lʔm l e ħʔa e s ˤsˤabi ʕaeʃʃaːreʕ
the-mother followed-her the-boy to-the-street
'As for the mother, the boy followed her to the street.'
lbaʔara lħmaːr dˤarabae barra
the-cow the-donkey hit-her outside
'As for the cow, the donkey hit her outside.'
lʔaeb eddoktɔːr faeħasˤɔ əssɒbəħ
the-father the-doctor examined-him the-morning
'As for the father, the doctor examined him in the morning.'
Question (k=6)
miːn əlli ʃaefətɔ lbənt bəlmaektaebe mbae:reħ
who that saw-3SG.F-him the-girl in-the-library yesterday?
'Who did the girl see [him] in the library yesterday?'

Appendix 2. Structural accuracy scoring
For the structural accuracy scoring, a participant's response was scored as 1 if the child repeated the target syntactic structure, regardless of other changes to the sentence. Otherwise, the response was coded as 0. The criteria for a sentence to be considered structurally accurate depended on the target syntactic structure (Marinis & Armon-Lotem, 2015). For example, for a sentence with a subordinate clause to be considered preserved, participants' responses had to contain two clauses, one main and one subordinated, and a subordinator.
The structural accuracy scoring was carried out simultaneously with the verbatim scoring. Therefore, the same team of scorers as described in the main text completed both types of scoring. For the Canadian sample, full reliability of structural accuracy scoring was ensured by having every sentence scored independently twice by a team of trained scorers, with all disagreements settled by group discussion. In the Netherlands, twenty-five percent of the Litmus-SRT was independently scored a second time. The α coefficients of inter-rater reliability for structural accuracy at this site were .93 for Dutch and .88 for Arabic.
As mentioned in the text, verbatim and structural accuracy scores were strongly correlated, as shown in Table A4 (a more complete version of Table 3).