Receptive and expressive vocabulary development in children learning English as an additional language: Converging evidence from multiple datasets

Children learning English as an additional language (EAL) are a diverse and growing group of pupils in England's schools. Relative to their monolingual (ML) peers, these children tend to show lower receptive and expressive vocabulary knowledge in English, although interpretation of findings is limited by small and heterogeneous samples. In an effort to increase representativeness and power, the present study combined published and unpublished datasets from six cross-sectional and four longitudinal studies investigating the vocabulary development of 434 EAL learners and 342 ML peers (age range: 4;9 to 11;5) in 42 primary schools. Multilevel modelling confirmed previous findings of significantly lower English vocabulary scores of EAL learners and some degree of convergence in receptive but not expressive knowledge by the end of primary school. Evidence for narrowing of the gap in receptive knowledge was found only in datasets spanning a longer developmental period, hinting at the protracted nature of this convergence.


Introduction
A large and diverse vocabulary allows children to understand and interact with peers, caregivers, and other members of their community and to access school curricula. Notwithstanding child-internal factors such as verbal and nonverbal intelligence, rate of vocabulary acquisition is determined to a significant degree by the language learning environment. In particular, those children acquiring a second language (L2) in addition to a minority status first language (L1) may face particular challenges in their L2 vocabulary knowledge due to differing patterns of linguistic exposure. The present study focuses on the growing but understudied population of children learning English as an additional language (EAL) in England. Specifically, we investigate the developmental trajectories of English receptive and expressive vocabulary knowledge in EAL learners and their monolingual peers in primary school by aggregating data from a number of published and unpublished sources, resulting in a larger sample size and longer developmental window than offered by previous studies.

Vocabulary development in bilingual learners
Vocabulary development is a quantitatively and qualitatively different process in bilingualism. We use the terms 'bilingual' and 'bilingualism' here to refer to the simultaneous or sequential acquisition of two or more languages. For bilingual children, a concept and its lexical label may exist in one language but not another, and the depth of this knowledge may also differ across languages such that a child may recognise a label (receptive knowledge) but not be able to produce it (expressive knowledge). Therefore, while conceptual vocabulary is language-independent, total vocabulary size (the sum of lexical labels across both languages) is dependent on patterns of linguistic input in each language. Due to this distributed nature of vocabulary knowledge across languages, mono- and bilingual infants and toddlers are shown to possess similar total vocabulary sizes (Core, Hoff, Rumiche & Señor, 2013; Pearson, Fernández & Oller, 1993) and as a result, any statements concerning the lower vocabulary knowledge of bilingual learners in relation to their monolingual counterparts must take account of the language of assessment.
Bilingual development is typically characterised by a great deal of heterogeneity due to varying sociocultural and educational demands. Much of the international bilingualism literature describes the language development of language minority learners in countries in which the home language is not officially recognised or widely spoken. In such cases, contact with family and members of the community may represent a child's primary sources of input in the home language, while input in the second or additional language is received primarily through the media, school instruction, and ultimately wider society (Scheele, Leseman & Mayo, 2010). Importantly, such heterogeneity poses challenges for the synthesis of individual studies of bilingual development.
Broadly speaking, the same language environment factors that impact vocabulary development in ML children also apply to bilingual children, with studies reporting the significant role of amount of exposure to the L2 in determining L2 vocabulary size (Collins, O'Connor, Suárez-Orozco, Nieto-Castañon & Toppelberg, 2014; Hoff & Ribot, 2017). Note also that L2 input and socioeconomic status (SES) may interact, and that language minority learners are often found in neighbourhoods of relatively high social deprivation (Scheele et al., 2010; Strand, Malmberg & Hall, 2015). In a 3-year longitudinal study, Paradis and Jia (2017) followed a sample of Chinese first language (L1) English language learners (ELLs) in Canada between the ages of 8 and 10. Language environment variables sourced from parental questionnaires explained more variance in children's language development than nonverbal intelligence or memory: specifically, children with longer length of exposure to English and a higher 'richness' of English exposure performed significantly higher on measures of receptive and expressive vocabulary. It is also interesting to note that children's exposure to and use of English increased throughout the period of the study, illustrating the dynamic aspect of bilingual vocabulary development.
A number of studies in the international literature explicitly seek to compare the vocabulary knowledge and development of bilingual children relative to their ML peers. In an attempt to compare receptive vocabulary trajectories across a wide age range, Bialystok, Peets, Yang and Luk (2010) aggregated data for 772 ML and 966 bilingual 3 to 10 year-olds who had taken part in a number of different studies in Canada over a period of five years. Although both groups scored within the average range, results indicated consistently higher age-standardised scores of ML participants on the Peabody Picture Vocabulary Test (Dunn & Dunn, 1997). An interaction between language group and age failed to reach significance, suggesting fairly stable group differences in receptive vocabulary knowledge between the ages of 3 and 10 years.
Despite robust evidence of early ML group advantages in vocabulary knowledge, there is some disagreement in the literature concerning the extent to which bilingual learners 'close the gap' over time in their L2 vocabulary knowledge relative to their ML peers. For instance, although Limbird, Maluch, Rjosk, Stanat and Merkens (2014) found a significant initial advantage of 7-8 year-old German-speaking ML children relative to their Turkish-German bilingual peers in expressive vocabulary, both groups were found to make the same rate of progress over a period of 18 months, serving to maintain the performance gap. Similarly, among a younger sample of 5 to 6 year-old Norwegian ML and Urdu/Punjabi bilingual children in Norway, Karlsen, Lyster and Lervåg (2017) report significant ML advantages in receptive vocabulary breadth and expressive vocabulary depth (definitions), but parallel growth of the two groups over time. On the other hand, evidence of 'catching up' is also found in the literature. For instance, in a 2-year longitudinal study of 6 to 9 year-old Greek ML and Albanian-Greek bilingual children, Simos, Sideridis, Mouzaki, Chatzidaki and Tzevelekou (2014) found significantly faster growth of the bilingual group on a Greek-language version of the PPVT after controlling for parental education and gender. Farnia and Geva's (2011) study of 91 ELLs and 50 of their ML peers between Grades 1 to 6 found that despite their initially lower levels of receptive vocabulary (measured using the PPVT), ELLs did make faster progress particularly in the early grades, which served to significantly reduce the group discrepancy in receptive vocabulary knowledge. Finally, the results of Paradis and Jia (2017) also provide evidence of closing the gap, as Chinese-English bilingual children in that study converged with monolingual norms on measures of English receptive vocabulary knowledge between age 8 and 9; however, the same convergence was not found for expressive vocabulary. 
While the extent to which bilingual learners close the gap may be equivocal, one moderating factor may be methodological; that is, with the exception of Bialystok et al. (2010), studies of longer duration (e.g., two years or more) appear more likely to report convergence over time between the two groups of children, suggesting that such convergence plays out gradually.

Learning English as an additional language in England
In England, the term English as an additional language (EAL) is used to describe a child who is exposed to a language other than English during early development and continues to be exposed to that language in the home or community (Department for Education and Skills, 2007). Thus, it is a term which defines a large and heterogeneous group of children with a variety of language learning experiences (it is estimated that EAL learners in England speak upwards of 300 distinct languages; Centre for Information on Language Teaching and Research, 2005). The term 'EAL' will be used henceforth to refer specifically to this population of children in England.
England has seen a steady increase in the proportion of EAL learners in recent years, from under 1 in 10 pupils at the turn of the century to approximately 1 in 5 in 2020 (Department for Education, 2020). As EAL learners are educated alongside their ML peers in mainstream classrooms, there is an expectation that they will acquire English language proficiency through classroom teaching and peer interaction (Costley, 2014); however, the efficacy of this strategy may be called into question given enduring group discrepancies between EAL and ML learners in reading and writing performance on high-stakes statutory examinations taken between ages 5 and 11 (Strand et al., 2015). While factors such as neighbourhood deprivation, special educational needs, and eligibility for free school meals (FSM) have relatively similar negative effects on achievement for all pupils, other factors such as joining school after Key Stage 1 (age 6-7) and changing school before the end of Key Stage 2 (KS2) are relatively more negative for EAL learners (Strand et al., 2015). One other particularly important risk factor for low educational attainment is English language proficiency. Although no longer a statutory requirement, in 2018 the Department for Education in England introduced a five-stage proficiency in English (PiE) assessment for EAL learners ranging from A (new to English) to E (fully fluent). Research indicates that not only are PiE ratings significantly predictive of future attainment (explaining 18-27% of variance over and above background variables such as FSM status), but that EAL pupils in the 'competent' and 'fluent' categories on average achieve higher academic grades than their ML peers (Demie, 2018; Hessel & Strand, 2021). In summary, EAL learners in England are fully capable of achieving at the same level as or higher than their ML peers in national examinations, though they may face particular challenges in terms of school mobility and English language proficiency.
While generalising about EAL children as a whole is problematic due to considerable heterogeneity in this population, it is important to identify the skills which predict academic success in order to provide tailored and effective support (Murphy, 2021). L2 vocabulary knowledge is one such skill.
Vocabulary knowledge is a key element of oral language proficiency and has been shown to be linked to educational achievement (e.g., Spencer et al., 2017). Specifically, vocabulary is known to be a strong and unique predictor of reading comprehension (Landi & Ryherd, 2017), and children classified as 'poor comprehenders' have been shown to achieve below expected standards in reading and writing examinations at age 11 and beyond (Ricketts, Sperring & Nation, 2014). Consistent with international work, a small number of studies in England report lower English vocabulary knowledge of EAL learners relative to their ML peers throughout the period of compulsory education (age 5 to 16) (Babayiğit, 2014; Burgoyne, Whiteley & Hutchinson, 2011; Cameron, 2002; Hutchinson, Whiteley, Smith & Connors, 2003; Mahon & Crutchley, 2006). Although longitudinal work on EAL development in England is rare, extant studies tend to report similar developmental trajectories such that early ML advantages are maintained over time. For instance, Hutchinson et al. (2003) followed 86 children across primary school years 2 to 4 (ages 6 to 8) who were assessed on a battery of language and literacy skills including the Test of Word Knowledge (Wiig & Secord, 1992). Results revealed significant and enduring weaknesses of EAL learners, amounting to a 2-year vocabulary 'developmental lag' on average relative to their ML peers. In terms of trajectories, both groups made a very similar rate of progress in receptive and expressive vocabulary in this study. In another longitudinal study, Burgoyne et al. (2011) followed 78 children between years 3 and 4 (age 7 to 9 years), assessing vocabulary knowledge with the Receptive and Expressive One-Word Picture Vocabulary Test (Brownell, 2000). Contrary to the results of Hutchinson et al. (2003), the two groups in this study showed trends for convergence in expressive vocabulary over time, but divergence in receptive vocabulary.
Finally, it is noteworthy that relatively lower levels of English vocabulary knowledge have also been attested in secondary school EAL learners with an average of 10 years of English-medium schooling (Cameron, 2002), hinting at the enduring nature of this discrepancy.

The present study
The amount of time required to attain proficiency in a second or additional language is a function of the linguistic skill in question. Specifically, surface-level fluency necessary for day-to-day communication is typically acquired in around two years or less, whereas a deeper level of proficiency necessary for accessing the most challenging parts of school curricula is acquired in a longer time span of four to seven years (Cummins, 1981; Demie, 2013). Vocabulary knowledge plays an important role in this development as pupils are met with increasingly challenging texts over their educational careers. Previous research on the vocabulary developmental trajectories of EAL learners in England is lacking relative to the evidence base available for other bilingual populations such as in the U.S. and Canada. Existing studies in England are characterised by fairly small sample sizes and the utilisation of a diverse range of vocabulary measures, making comparison across studies difficult. Similar to the methodology of Bialystok et al. (2010), the present study combines a number of different datasets on EAL learners in England using the same measures to examine receptive and expressive vocabulary growth over time relative to ML peers. Such data aggregation affords major advantages in an increased sample size and an elongated developmental window. As discussed in Data Analysis below, we employ a multilevel modelling framework to analyse these data in order to account for child- and school-level variance in an effort to derive more robust estimates of growth. Our research questions were the following: (1) To what extent do EAL learners and their ML peers differ in English receptive and expressive vocabulary knowledge during the primary school period of education? (2) To what extent do discrepancies in English receptive and expressive vocabulary knowledge between the two groups converge over time?

Measures
The BPVS-III (Dunn, Dunn & NFER, 2009) is a standardised, norm-referenced assessment of receptive vocabulary breadth in which examinees are presented with four pictures and are asked to identify the picture that fits the target word spoken by the test administrator. BPVS-III norms are based on a nationally representative sample of 3,278 individuals in the UK, of whom 1.39% are EAL learners. As described below, datasets in this study administered both the BPVS-III and the BPVS-II. Both measures are administered in the same way, and both consist of 168 items, of which 54 (32%) are shared. Reliability information is reported in the BPVS manual.

The Clinical Evaluation of Language Fundamentals IV (CELF-IV; Semel, Wiig & Secord, 2006) is a standardised, norm-referenced measure of expressive vocabulary knowledge. Examinees are presented with a series of colour illustrations, including objects, people, and actions, and are asked 'what is this?'. A correct response receives a score of 2, although partial credit may also be awarded (e.g., for 'instrument' instead of 'saxophone'). The CELF-IV is normed on a nationally representative sample of 871 individuals in the UK, of whom 12.4% are EAL learners. Split-half reliability for the expressive vocabulary subtest is .83 across all age groups.

Datasets
Data were sourced from six doctoral theses (Dixon, 2018; Hessel, 2018; Nielsen, 2016; Oxley, 2019; Smith, 2019; Wesierska, 2018), one master's dissertation (Ajjour, 2019), and one undergraduate dissertation (Wilson, 2020) comprising four longitudinal and six cross-sectional studies carried out between 2013 and 2020 (summarised in Table 1). Studies are indexed by 'C' for cross-sectional and 'L' for longitudinal. The final combined dataset represents a convenience sample of EAL researchers based at UK universities who agreed to share their anonymised data. Complete data were available for a total of 776 participants, including 434 EAL learners and 342 ML peers between the ages of 4;9 and 11;5 recruited for the most part from the same schools. Although studies targeted children in different locations and school years, all recruited EAL learners and their ML peers (boys and girls) with the explicit aim of comparing the language and vocabulary knowledge and development of the two groups (with the exception of study C5 which only recruited EAL learners). Data in the combined dataset were collected from 42 different schools in 11 different Local Authority government districts in 5 geographical regions of England (Yorkshire and the Humber, n=34; South East, n=4; North West, n=1; London, n=1; East of England, n=1). Note that Yorkshire and the Humber represented six different Local Authority districts in the data discussed here. One school was shared by studies L1 and L4, and one school was shared by studies C2 and C6.
All studies recruited children characterised as 'typically developing', stipulating that participants should not have any statement of Special Educational Needs (SEN). Study C6 additionally required participants to score no lower than -1SD on the Test of Word Reading Efficiency-2 (Torgesen, Wagner & Rashotte, 1999), and study L1 re-recruited participants from a previous language intervention study, who purposely possessed lower levels of English language proficiency than their same-age ML peers, but no statement of SEN. Six studies further stipulated that participants have a minimum of 1 year of education or residence in the UK. Minimum amount of schooling experience was not applicable for studies C1 and L1 as children were initially assessed at or followed from the very early stages of education (e.g., Reception or Year 1; age 4 or 5).
All ten studies administered custom researcher-designed questionnaires completed by children, their parents, or both to gain information concerning exposure to and use of English and the L1 outside of school. Unfortunately, language questionnaire data from studies C6 and L1 were not available and complete aggregation of language questionnaire data from the remaining studies was not possible due to differences in question wording and response options. Despite this, we were able to combine questionnaire data where questions and response options overlapped (see Table 2). Studies C4, C5, L2, and L3 all administered the same child-administered language background questionnaire, which asked EAL learners to rate the extent to which they spoke English in the home (never, sometimes, most of the time, all of the time). Additionally, study L4 administered a parent questionnaire with similar response options to this question: for the purposes of aggregation, response categories 'rarely' and 'sometimes' in this questionnaire were combined into 'sometimes'. Language questionnaires in studies C1, C3, and L4 used a 3-point response scale, with either children or parents being asked to indicate what was most commonly spoken by the child in the home: English, the home language, or both languages. Finally, dataset C2 scored language use on a scale from 9 to 22 where a lower score indicates more frequent use of English in the home (M=13.6, SD=3.8). Three studies also asked whether children had been born in the UK, with proportions as follows: C3: 57%, C6: 92%, and L4: 92%.
In summary, aggregation of questionnaire data indicated that EAL learners in the sample were indeed being exposed to and using the L1 in the home sometimes or most of the time. Relatively few responses indicated that the L1 was 'never' spoken in the home. Within the combined dataset, EAL learners spoke a total of 143 different languages, with the most common being Punjabi (14.2%), Polish (12.3%), Urdu (12.3%), Arabic (11.1%), and Slovak (8%). This distribution was very similar across cross-sectional and longitudinal datasets and is broadly consistent with national trends for the most common languages (other than English) spoken in England and Wales (Office for National Statistics, 2013).
At the point of data collection in each study, the mean proportion of pupils eligible for FSM (an indicator of SES) among the 42 schools in the pooled dataset was 24.6% (SD = 12.3%; min = 5.3%, max = 58.4%) and therefore above the national average for primary schools in England (fluctuating between 18.1% and 15.7% in the period 2013 to 2020; Department for Education, 2020). Schools also varied as to the proportion of pupils classified as EAL (mean = 45.4%; SD = 28.5%; min = 3%; max = 91.9%): given each study's focus on EAL development, this was somewhat higher than the national average for England (ranging from 18.1% in 2013 to 21.3% in 2020; Department for Education, 2020). Availability of only school-level SES data meant that we were not able to compare EAL and ML groups directly on SES. However, this was mitigated in two ways. Firstly, 66.6% of participants in cross-sectional datasets and 92.5% of participants in longitudinal datasets were recruited from the same schools and therefore from the same sociodemographic catchment areas. Secondly, we included school as a random effect in our statistical models (discussed in Data Analysis) in order to account for variance in vocabulary growth attributable to school-level factors such as differing FSM eligibility. Age-standardised scores for each dataset (presented in Supplementary Material) indicated that in some cases, ML comparison groups scored consistently below average for the norming population (particularly in longitudinal datasets). Therefore, this will be taken into account in the interpretation of findings.
Combined data for receptive vocabulary
Cross-sectional data for receptive vocabulary (studies C1 to C6) comprised 176 EAL and 120 ML children from nine schools (age range: 59-130 months; mean = 97.02 [8;1], SD = 18.7). EAL and ML groups did not differ significantly in age (t(292) = 0.24, p = .809). Note that studies C2 and C6 employed the second edition of the BPVS, and 2 ML children in dataset C1 had missing data for age in months and were excluded. Longitudinal data for receptive vocabulary (studies L1 to L4) comprised 227 EAL and 188 ML children from 27 schools (age range: 57-137 months; mean = 98.95 [8;3], SD = 17.5). EAL and ML groups did not differ significantly in age at the first time point assessed (t(308) = 0.23, p = .819). Of the 415 children in longitudinal datasets, 220 (53.0%) were assessed at two time points, 186 (44.8%) at three time points, and due to attrition, 9 children (2.2%) were assessed at one time point only. All available data were utilised in analyses, including those from participants with incomplete data (linear mixed modelling does not resort to listwise deletion in the face of missing data). All longitudinal studies utilised the third edition of the BPVS.
Longitudinal data for expressive vocabulary (studies L1, L2, and L4) comprised 167 EAL and 138 ML children from 24 schools (age range: 57-125 months; mean = 96.3 [8;0], SD = 17.1). The ages of the two groups were once again similarly distributed at the first time point of assessment (t(198) = -0.56, p = .575). Of the 305 children in the three longitudinal datasets, 110 (36.0%) were assessed at two time points, 186 (61.0%) at three time points, and 9 (3.0%) at one time point only due to attrition.
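Ages in this section are reported both in months and in the years;months notation (e.g., 97.02 months as 8;1). The following is a minimal sketch of that conversion, assuming ages are rounded to the nearest whole month before formatting; the function names are illustrative and are not taken from the original analysis scripts.

```python
def months_to_years_months(age_months: float) -> str:
    """Format an age in months as 'years;months', rounding to the nearest month."""
    whole_months = round(age_months)
    years, months = divmod(whole_months, 12)
    return f"{years};{months}"


def years_months_to_months(label: str) -> int:
    """Convert a 'years;months' label back to a whole number of months."""
    years, months = label.split(";")
    return int(years) * 12 + int(months)


if __name__ == "__main__":
    print(months_to_years_months(97.02))   # 8;1
    print(months_to_years_months(98.95))   # 8;3
    # Overall age range of the combined sample, expressed in months
    print(years_months_to_months("4;9"), years_months_to_months("11;5"))  # 57 137
```

The same helper can be used to check that a bracketed years;months value agrees with the mean in months that it accompanies.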

Data analysis
The aim of the study was to compare the receptive and expressive vocabulary development of EAL learners and their ML peers. The hierarchical structure of the data justified the use of multilevel modelling to account for the clustering of participants in different schools. Multilevel models allow the inclusion of both fixed and random effects, with fixed effects being usual predictor variables, and random effects accounting for non-independence within data, as is the case when participants are nested within schools (Gelman & Hill, 2007).

To test our hypotheses on how language groups differed in their vocabulary knowledge and how these differences may change over time, we entered main effects and interaction of age and language group into models to predict receptive and expressive vocabulary, respectively (BPVS and CELF raw scores). Language group was a two-level factor which was effect-coded (ML = 1; EAL = -1), while age (in months) was a continuous variable that was scaled and centred before being entered into the model. The same fixed effects structure was used in four separate models that were calculated for the cross-sectional and longitudinal datasets described above.
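The predictor coding described above can be illustrated with a short sketch. It assumes that 'scaled and centred' means conversion to z-scores using the sample standard deviation (the default behaviour of R's scale function); the ages and group labels below are invented for illustration and are not study data.

```python
from statistics import mean, stdev

# Effect coding as stated in the text: ML = 1, EAL = -1
GROUP_CODES = {"ML": 1, "EAL": -1}


def scale_and_centre(values):
    """Centre on the mean and divide by the sample SD (n - 1 denominator)."""
    centre = mean(values)
    spread = stdev(values)
    return [(value - centre) / spread for value in values]


def build_predictors(ages_months, groups):
    """Return one row per child with scaled age, group code, and their product."""
    scaled_ages = scale_and_centre(ages_months)
    codes = [GROUP_CODES[group] for group in groups]
    return [
        {"age_z": age, "group": code, "age_x_group": age * code}
        for age, code in zip(scaled_ages, codes)
    ]


if __name__ == "__main__":
    # Hypothetical ages in months and group labels
    ages = [60, 75, 90, 105, 120, 135]
    labels = ["EAL", "ML", "EAL", "ML", "EAL", "ML"]
    for row in build_predictors(ages, labels):
        print(row)
```

With effect coding, the intercept represents the grand mean across the two groups at the mean age, and the group coefficient represents half of the difference between the groups.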
To account for the nested structure of the data, we determined which random effects to include by testing which effects would be identifiable based on the number and distribution of observations in the data. Regarding random intercepts, schools proved to be the only grouping variable with a sufficient number of levels (> 5; Harrison, 2015) and balanced distribution across fixed effects grouping factors, as there were too few levels in our city variable (only 4) and in BPVS version (only 2) to allow inclusion as random effects. Lack of item-level data from the BPVS and CELF precluded a random intercept term for item. Regarding random slopes, we had sufficient observations in longitudinal datasets across language and age groups in all schools to include random slopes for both age (> 3 unique observations in each school) and language group (> 2 observations per language group in each school) within schools, but not for their interaction (as in some schools, only children from one language group had been tested within a certain age range). The resulting random effects structure thus included intercepts for schools (in cross-sectional datasets), and additionally slopes for language group and age by school (in longitudinal datasets). Model syntax is provided for each model in results tables below. In cross-sectional models predicting receptive vocabulary, we additionally added BPVS version as a fixed effect to control for the use of both BPVS-II and BPVS-III.
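The fixed and random effects structure described in the two preceding paragraphs can be written out as follows. This is a sketch under stated assumptions rather than the exact specification used: the symbols are our own notation, the participant-level intercept in the longitudinal model follows the description given with the longitudinal results, and the definitive syntax is that reported in the results tables.

```latex
% Cross-sectional models: child i nested in school j
\begin{align*}
y_{ij} &= \beta_0 + \beta_1\,\mathrm{age}_{ij} + \beta_2\,\mathrm{group}_{ij}
        + \beta_3\,(\mathrm{age}_{ij} \times \mathrm{group}_{ij})
        + u_{0j} + \varepsilon_{ij}
\end{align*}

% Longitudinal models: occasion t, child i, school j
\begin{align*}
y_{tij} &= \beta_0 + \beta_1\,\mathrm{age}_{tij} + \beta_2\,\mathrm{group}_{ij}
         + \beta_3\,(\mathrm{age}_{tij} \times \mathrm{group}_{ij}) \\
        &\quad + u_{0j} + u_{1j}\,\mathrm{age}_{tij} + u_{2j}\,\mathrm{group}_{ij}
         + v_{0ij} + \varepsilon_{tij}
\end{align*}
```

Here u_{0j} is the school intercept, u_{1j} and u_{2j} are the by-school slopes for age and language group, v_{0ij} is the participant intercept, and the residual term is assumed to be normally distributed. The cross-sectional receptive model would carry one further fixed effect for BPVS version.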
All models were fit using the lme4 package (version 1.1-23; Bates, Maechler, Bolker & Walker, 2015) in R (version 4.0.2; R Core Team, 2019) using restricted maximum likelihood estimation. Significance of the fixed effects was tested through Type II model comparisons using the Anova function of the car package (version 3.0-8; Fox, Friendly & Weisberg, 2013). Model fit was assessed with reference to marginal and conditional pseudo-R² computed with the MuMIn package (version 1.43.1; Bartón, 2019). Effect size estimates in descriptive statistics tables were computed with the effsize package (version 0.8.0; Torchiano, 2020). Finally, figures were created using the ggplot2 package (version 3.3.2; Wickham, 2016).
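The effect sizes for group differences are reported as Hedges' g. As a minimal sketch of how such a value is obtained from summary statistics, the function below computes a standardised mean difference with a pooled standard deviation and applies a commonly used approximate small-sample correction. It is written in Python for illustration only, is not the effsize implementation, omits the confidence interval, and uses invented input values.

```python
from math import sqrt


def hedges_g(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Standardised mean difference with an approximate small-sample correction."""
    # Pooled SD weights each group's variance by its degrees of freedom,
    # so unequal group sizes are taken into account.
    pooled_sd = sqrt(
        ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / (n_1 + n_2 - 2)
    )
    cohens_d = (mean_1 - mean_2) / pooled_sd
    # Approximate correction factor, slightly below 1 for finite samples
    correction = 1 - 3 / (4 * (n_1 + n_2) - 9)
    return cohens_d * correction


if __name__ == "__main__":
    # Hypothetical raw-score summaries for two groups (not study data)
    g = hedges_g(mean_1=100.0, sd_1=10.0, n_1=50, mean_2=95.0, sd_2=10.0, n_2=50)
    print(round(g, 3))
```

The correction shrinks the uncorrected estimate only slightly at group sizes like those in the pooled datasets, but it matters more for the smaller individual studies.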

Results
Tables 3 and 4 present descriptive statistics for the cross-sectional and longitudinal datasets, respectively. Both tables provide effect sizes in terms of Hedges' g and its 95% confidence interval (in order to correct for unequal group sizes). Effect sizes for group differences in longitudinal datasets are calculated for each time point separately.

Receptive vocabulary development
Table 5 presents the model for receptive vocabulary development in cross-sectional datasets. This model represented good fit to the data (marginal R² = .25; conditional R² = .56), and residuals were approximately normally distributed with no evidence of non-constant variance. Model fixed effects indicated significantly higher BPVS raw scores with increasing age in months (b_age = 10.91, p < .001), and significantly lower performance of EAL learners relative to ML peers (b_group = -5.90, p < .001). The interaction of age and group did not achieve significance, however (b_group×age = 0.33, p = .763), suggesting no evidence for convergence over time.
Table 6 presents the model for receptive vocabulary development in longitudinal datasets, with a random intercept term for participant and a random slope for age and group within school. Given the dependency in these datasets, the model represented a relatively better fit (marginal R² = .57; conditional R² = .93). Again, residuals were approximately normally distributed with no evidence of non-constant variance. A somewhat similar pattern to Model 1 appeared in fixed effects, with BPVS raw scores significantly increasing with age in months (b_age = 16.14, p < .001), and EAL learners performing significantly below their ML peers (b_group = -6.47, p < .001). However, the model also indicated a statistically significant interaction between age and group (b_group×age = 2.01, p < .001), this time suggesting evidence of convergence.

Expressive vocabulary development
Table 7 presents the model for expressive vocabulary development in cross-sectional datasets. The model again represented good fit to the data (marginal R² = .39; conditional R² = .50), and residuals were approximately normally distributed with no evidence of non-constant variance. Similar to Model 1 for BPVS cross-sectional data, model fixed effects indicated significantly higher CELF raw scores with increasing age in months (b_age = 11.54, p < .001), and significantly lower performance of EAL learners relative to ML peers (b_group = -4.45, p < .001). The interaction of age and group did not reach significance (b_group×age = 1.35, p = .386), again providing no evidence for convergence in vocabulary knowledge.
However, as indicated in the Method, cross-sectional data for expressive vocabulary showed a restricted age range relative to data for receptive vocabulary.
Finally, longitudinal data modelling for expressive vocabulary is presented in Table 8. Random effects included intercepts for participant and school, as well as a random slope for age within school. This model represented good fit to the data (marginal R² = .48; conditional R² = .88) and assumptions were met. Fixed effects indicated significantly higher CELF raw scores associated with increasing age in months (b_age = 6.35, p < .001), and significantly lower performance of EAL learners relative to ML peers (b_group = -2.74, p < .001). In contrast to modelling of receptive vocabulary longitudinal data, this model did not provide evidence for convergence between the two groups over time in expressive vocabulary (b_group×age = 0.52, p = .090).
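To make the meaning of the interaction terms concrete, the sketch below converts effect-coded coefficients into a predicted group difference at a given standardised age. Because the two groups are coded +1 and -1, their predictions differ by twice the sum of the group coefficient and the interaction coefficient multiplied by scaled age. The sketch assumes the reported coefficients are oriented so that a negative group coefficient denotes lower EAL scores, as they are read in the text above; the function name is illustrative.

```python
def group_gap(b_group: float, b_interaction: float, age_z: float) -> float:
    """Predicted difference between the two groups at a standardised age.

    For a +1/-1 effect-coded group factor, the groups' predicted scores
    differ by 2 * (b_group + b_interaction * age_z).
    """
    return 2 * (b_group + b_interaction * age_z)


if __name__ == "__main__":
    # Coefficients reported for the longitudinal receptive vocabulary model
    b_group, b_interaction = -6.47, 2.01
    for age_z in (-1.0, 0.0, 1.0):
        gap = group_gap(b_group, b_interaction, age_z)
        print(f"age_z = {age_z:+.0f}: predicted gap = {gap:.2f} raw-score points")
```

Under these assumptions, the predicted gap in BPVS raw scores shrinks from roughly 17 points at one standard deviation below the mean age to roughly 9 points at one standard deviation above it, which is the pattern described as convergence.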

Discussion
Children learning EAL in England face the challenge of accessing classroom teaching and high-stakes national assessments of reading and writing in a language which they may not have mastered by the end of primary school, placing them at risk of educational underachievement. Despite a growing EAL population in English schools, there exists little empirical research on the language trajectories of these children relative to their ML peers. The present study represents a unique contribution to the EAL research literature in England through its aggregation of several datasets, both published and unpublished, from a sample of researchers in English universities. The aggregation of these studies is justified by their similar focus, recruitment of participants from state-maintained primary schools, and administration of the same standardised assessments of vocabulary. Our first research question asked to what extent EAL learners and their ML peers differ in English receptive and expressive vocabulary knowledge during the primary school period of education. Multilevel models accounting for clustering of children within schools revealed trends similar to those in the extant EAL literature in England (Babayiğit, 2014; Burgoyne et al., 2011; Hutchinson et al., 2003; Mahon & Crutchley, 2006). Specifically, we found significantly lower performance of EAL learners relative to ML peers across the age ranges studied. Aggregated effect sizes from cross-sectional datasets were moderate in magnitude for receptive and expressive vocabulary; similar to effect sizes for individual time points of longitudinal studies, these were broadly within the range reported in the EAL literature.
The aggregation of datasets in the current study allowed insight into the English vocabulary growth of EAL learners and their ML peers which was unprecedented in terms of sample size and developmental span. Our second research question asked to what extent the developmental trajectories of the two groups differ and whether EAL learners 'close the gap' over time. Evidence for an interaction of group and time was found only in the longitudinal modelling of receptive vocabulary knowledge, with none of the three remaining models providing any evidence for convergence or divergence between the groups over time. This pattern of results bears similarity to that of Paradis and Jia (2017), who also found no evidence of convergence in expressive vocabulary. However, this null effect for expressive vocabulary in the present study may have been partly attributable to a ceiling effect in the CELF-IV Expressive Vocabulary subtest, which contains only 27 items.
Evidence for convergence in English receptive vocabulary knowledge should be interpreted with caution, however: as indicated by study L3 with the oldest sample of children, EAL learners were still not scoring on a par with their ML peers around the transition to secondary school at age 11 (equivalent age-standardised scores on the BPVS: EAL: 80; ML: 94; see Supplementary Material), and thus the data do not support a complete closing of the gap in this domain. Nevertheless, the question remains as to why this pattern of convergence was found only in longitudinal datasets. Firstly, this may have been due to the ability of longitudinal data to account for within-subject change (i.e., avoiding cohort effects found in cross-sectional studies), or enhanced statistical power due to repeated measurements. Secondly, as discussed in the introduction, evidence of closing of the gap between bilingual and ML learners tends to be observed in longitudinal studies of longer duration, suggesting that convergence between the groups over time is rather gradual and plays out over a number of years. To some extent this applies to the current study, as the age range in the four longitudinal datasets for receptive vocabulary (4;9 to 11;5, spanning 80 months) was relatively extended compared to that of cross-sectional datasets (4;11 to 10;10, spanning 71 months). Similarly, age ranges were relatively more restricted for cross-sectional (6;4 to 8;5) and longitudinal (4;9 to 10;5) studies of expressive vocabulary, spanning only 25 and 68 months, respectively. Therefore, a relatively restricted age range may not have allowed us a sufficiently long developmental window in which to observe EAL learners catching up to their ML peers in cross-sectional and expressive vocabulary datasets. The possibility also remains that the groups do not converge at all in English expressive vocabulary knowledge during this period.
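The developmental spans quoted above follow directly from the years;months age notation. As a minimal sketch, the Python snippet below converts each age to months and subtracts the youngest from the oldest; the helper names `to_months` and `span_months` and the dictionary `RANGES` are illustrative and not part of the original analysis.

```python
def to_months(age: str) -> int:
    """Convert an age in 'years;months' notation (e.g. '4;9') to months."""
    years, months = age.split(";")
    return int(years) * 12 + int(months)


def span_months(youngest: str, oldest: str) -> int:
    """Width of an age range in months."""
    return to_months(oldest) - to_months(youngest)


# Age ranges as reported in the text for each dataset type.
RANGES = {
    "receptive, longitudinal": ("4;9", "11;5"),
    "receptive, cross-sectional": ("4;11", "10;10"),
    "expressive, cross-sectional": ("6;4", "8;5"),
    "expressive, longitudinal": ("4;9", "10;5"),
}

for name, (youngest, oldest) in RANGES.items():
    print(f"{name}: {span_months(youngest, oldest)} months")
```

The resulting spans (80, 71, 25 and 68 months) match the figures given in the text, and make plain that the receptive longitudinal datasets, the only ones to show convergence, also cover the widest developmental window.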
We turn next to some possible explanatory factors in the developmental trajectories found in this study. To some extent, differing trajectories may be accounted for by the nature and task demands of different vocabulary tests. Particularly, expressive vocabulary measures tap more deeply than receptive measures into depth of word knowledge, and rely on greater encoding speed, mental organisation, and phonological retrieval (Ouellette, 2006). There is some evidence to suggest difficulties of bilingual children in vocabulary depth tasks, and therefore this may be a contributory factor (Booton, Hodgkiss, Mathers & Murphy, 2021; Droop & Verhoeven, 2003; Hessel & Murphy, 2019; Lervåg & Aukrust, 2010; but see Dixon, Thomson & Fricke, 2020 and Vermeer, 2001 for contradictory findings). The possession of two lexical labels across languages for the same concept (i.e., 'doublet' vocabulary) may result in competition between phonological representations for selection, thus increasing the cognitive demands of expressive vocabulary tasks. Indeed, studies have found that expressive vocabulary tests also pose particular challenges for bilingual children. Some work has taken the perspective of lexical access, arguing that bilingual children's lower expressive vocabulary knowledge in English is likely a result of retrieval difficulties as opposed to a smaller stock of word knowledge (Yan & Nicoladis, 2009). However, as reported in Gross, Buac and Kaushanskaya (2014), even when Spanish-English bilingual children are awarded credit for vocabulary knowledge in either language, they nevertheless possess smaller conceptual expressive vocabularies overall than their ML peers. This latter finding suggests that the expressive gap may not be as straightforward as competing phonological representations.
A second explanatory factor in the pattern of vocabulary development found here may concern shifting patterns of language exposure over time. In an ethnographic study of three bilingual families in England, Parke and Drury (2001) discuss the perception of an explicit 'change' from the L1 to English at the onset of formal schooling, with children suddenly finding themselves immersed in an English-language environment. Interestingly, there is emerging evidence that as well as exposure to English, bilingual children's own use of English is positively associated with growth in vocabulary knowledge (Paradis & Jia, 2017; Ribot, Hoff & Burridge, 2018). Indeed, in line with a usage-based approach to language acquisition (i.e., a learner's lexicon develops in accordance with the input they receive; Tomasello, 2003), EAL children's English vocabulary knowledge would continue to develop as their time spent in English-speaking schools accumulates.
Therefore, while increased exposure to and use of English may account for a general tightening of the vocabulary gap between EAL learners and their ML peers over time, the differing nature and task demands of vocabulary tests may result in relatively faster convergence in receptive than expressive vocabulary, such as the pattern found in the present study.
Limitations and future directions
Despite its large sample size, wide developmental window, and robust statistical modelling strategy, the present study was not able to take certain moderating factors into account. In particular, lack of child-level data on language learning background and SES (in the form of eligibility for FSM) precluded us from including these variables in our models. Nevertheless, such data were available in aggregated format and indicated that average eligibility for FSM was higher than the national average in six of the ten studies. This is likely to have had an effect on the vocabulary knowledge and growth of participants; as indicated by age-standardised scores in Supplementary Material, ML children tended to score within the average or low-average range, particularly in terms of receptive vocabulary. Where available, data from home language questionnaires also provided important contextual information regarding the individual samples studied here. While there was significant heterogeneity of the sample in terms of home language spoken, the majority of participants were using and being exposed to English in the home and at school. Conclusions concerning patterns of language exposure among the EAL learners studied here are somewhat limited: while missing data were generally rare, some longitudinal studies did suffer from attrition, and parental questionnaire response rates were typically low. This may have introduced a source of bias into the results, though lack of child-level data precluded further analysis of this. The challenge of aggregating these questionnaire data raises the need for the adoption of a common language exposure questionnaire in future research on this population for greater comparability. Finally, although 143 distinct languages were spoken by participants in the aggregated dataset, it was not possible to analyse vocabulary growth by language group. While some analyses have not found differences in this respect (e.g., between Asian and non-Asian L1s in Bialystok et al., 2010), others do suggest that speakers of particular languages may face challenges in English vocabulary development during primary school (Strand et al., 2015), and this remains a question for future research examining vocabulary growth among EAL learners.
The findings of the present study may be extended and refined by future work. Firstly, although longitudinal modelling revealed evidence for group convergence over time in receptive vocabulary knowledge, it is unknown to what extent such convergence may continue beyond the primary school phase (age 11+). Recent longitudinal work with ML students in England indicates high stability in vocabulary growth during adolescence (Ricketts, Lervåg, Dawson, Taylor & Hulme, 2020), though with the corollary that students with poor vocabulary are likely to maintain their ranking over time. We are aware of only one study reporting low English vocabulary scores among adolescent EAL learners in England (Cameron, 2002), though these scores were not compared with those of a ML comparison group. Secondly, although the BPVS and CELF measures are advantageous for their standardisation and reliability, such single-word vocabulary measures are unable to capture the full extent to which a word is known. Other recent work on EAL learners has examined depth of lexical knowledge through the use of verbal definitions (Dixon et al., 2020), idiomatic/figurative language (Hessel & Murphy, 2019), and polyseme knowledge (Booton et al., 2021). This research suggests that EAL children additionally experience challenges in these aspects of vocabulary, although this domain is little researched among this population. A distinction exists in the vocabulary intervention literature concerning 'shallow but wide' versus 'deep but narrow' approaches to word knowledge instruction (Bowers & Kirby, 2010); further insight into EAL children's developmental trajectories in breadth as well as depth of vocabulary knowledge would serve to inform instructional decisions as to where resources are most effectively placed in order to close gaps with ML peers.
In conclusion, we believe this study to be the largest and longest-spanning investigation of EAL learners' vocabulary development in England to date. In line with previous research, our aggregated data suggest that EAL learners have a significantly smaller English vocabulary than their ML peers throughout primary school. Our results provide partial support for a closing of this vocabulary gap in receptive but not expressive vocabulary knowledge in English, though analyses for expressive knowledge were subject to a relatively limited age range. Our study also identifies opportunities for future work, particularly in the adoption of a common language exposure questionnaire, the examination of depth of word knowledge and idiomatic/figurative language, and the development and reporting of vocabulary knowledge norms for EAL learners in order to promote the early identification of language difficulties.