Theoretical and Empirical Background
According to the latest statistics published by the United Nations, in 2019, 3.5% of the global population, i.e., 272 million people, were international migrants (UN DESA, 2019). Many of these migrants have settled in a country where their first language (L1) is not the majority language and are therefore in the process of acquiring a second language in immersion. This process and its results in terms of the development of language proficiency are mediated by a variety of factors, including individual and contextual differences. In the following, we briefly review four factors: language learning aptitude, age of onset, exposure to the majority language, and anxiety.
Foreign Language Learning Aptitude and Second Language Proficiency Development
Foreign language aptitude has been found to be an important predictive factor of success in developing language proficiency since Carroll's pioneering work in the 1960s on the abilities needed to learn foreign languages in short and intense foreign language training courses (Carroll, 1962, 1973). His research resulted in the definition of foreign language aptitude as consisting of four dimensions: phonetic coding ability (the ability to discriminate and identify new language sounds), grammatical sensitivity (the awareness of the grammatical function of the different elements constituting a sentence), inductive ability (the ability to identify grammatical/meaning patterns in an unknown language sample), and rote memory (the ability to learn a large number of items in a short time). Since this definition was proposed, foreign language aptitude has demonstrated its predictive power in a number of studies aiming to investigate inter-individual differences in foreign language learning in school contexts (among others, Doughty et al., 2010; Erlam, 2005; Kiss & Nikolov, 2005; see also Li, 2015 for a meta-analysis).
The importance of Carroll's four dimensions of aptitude has nevertheless been critiqued both theoretically and empirically over the years. Skehan (2002), for instance, proposed a model of aptitude using three dimensions (auditory processing, language analysis, and memory), while Robinson (2005, 2007, 2012) introduced the notion of aptitude complexes, or sets of abilities activated differently depending on the stage of acquisition the learner is in.
Another critique is directed at Carroll's definition of (rote) memory as the “ability to store information passively” (Erlam, 2005, p. 149). This conceptualization of memory as static and passive has been challenged since Baddeley & Hitch (1974) defined working memory as consisting of three components (the central executive, the phonological loop, and the visuospatial sketchpad) that not only store information but also process it in real time, and serve as a gateway to long-term memory. Several recent studies have found working memory to “exercise consistent and distinctive influences on various aspects of L2 acquisition and processing” (Wen et al., 2016, p. 19). Given the range of findings, the importance of working memory deserves more empirical testing (DeKeyser, 2019; Li, 2015; Sáfár & Kormos, 2008; Singleton, 2014; Wen, 2019; Wen et al., 2016; Wen & Skehan, 2011).
Another issue concerning aptitude is its role in naturalistic settings. Krashen (1981), for instance, claimed that aptitude (or at least its analytic language ability component) is not a relevant factor in naturalistic second language acquisition settings (i.e., in an immigration context) where other factors might play a greater role. Other scholars have, on the contrary, claimed that aptitude could be even more relevant in immersive than in school contexts, as naturalistic learners must find regularities in a great amount of input (see Granena, 2014; Skehan, 1991). Furthermore, in Li's (2015) meta-analysis, where six out of the 33 studies that were considered investigated aptitude in naturalistic settings, it appeared that “the mean effect size associated with naturalistic learning was also significant, which seemed to suggest that aptitude was drawn on in untutored contexts as well as in language classes” (p. 398).
Age of Onset and Second Language Proficiency Development
Aptitude has also been investigated in relation to another factor commonly considered as predicting language proficiency development: biological age of onset. According to Granena & Long (2013b), for instance, age of onset typically explains about 30% of the variance in second language ultimate attainment; younger learners usually achieve a higher level of proficiency than older learners in the long run. Note, however, that this advantage of young starters only shows up after a significant amount of exposure to the language. During the first months of language acquisition (i.e., in terms of rate of acquisition), the tendency goes in the opposite direction: older learners usually develop proficiency in the second language faster than younger learners (Asher & Price, 1967; Krashen et al., 1979; Snow & Hoefnagel-Hohle, 1978).
Over the last decades, several maturational hypotheses have been proposed to explain this (paradoxical) effect of age on second language acquisition (see Lambelet & Berthele, 2015; Singleton, 2005; Singleton & Ryan, 2004 for a discussion). One hypothesis concerns procedural differences between adults and children. According to this view of maturational constraints on language learning, children can acquire languages implicitly, while adults need more explicit teaching and learning to develop proficiency because their implicit learning ability is diminished (DeKeyser, 1994, 2008, 2019; DeKeyser & Larson-Hall, 2005; Ellis, 2005, 2014). Since a greater amount of exposure is needed to acquire regularities without explicit teaching, the use of implicit learning skills could explain the disadvantage of children relative to adults at the beginning stage of L2 acquisition. On the other hand, adults and adolescents would rarely achieve full proficiency in the target language because of the conscious and laborious efforts that explicit learning requires.
Age of Onset, Aptitude, and Second Language Proficiency Development
Even though children usually outperform adults in the long term, some adults achieve a level of proficiency high enough to be considered native-like. From a maturational perspective, these highly proficient late L2 starters would have, for some reason, kept some of their initial language acquisition ability, and therefore would show a higher level of foreign language learning aptitude (Carroll, 1973; Doughty et al., 2010; Ross et al., 2002). In this view, whereas children acquiring a foreign or second language early in life would not need any special ability to develop the language, only gifted adolescents and adults would be able to achieve a high L2 level. The results of several studies on native-like L2 users seem to confirm this hypothesis. For instance, in a study with 56 Hungarians who immigrated to the United States either as adults (late starters) or before the age of 16 (early starters), foreign language aptitude appeared as a predictor of English proficiency in late starters but not in early starters (DeKeyser, 2000). Similarly, in a second study with immigrants in the United States and in Israel, correlations were found between foreign language aptitude and English proficiency scores for participants who began their L2 learning between the ages of 18 and 40, but not for early starters or those who began their learning after age 40 (DeKeyser et al., 2010).
More nuanced results appear in Granena & Long (2013a). They investigated the predictive effect of age, length of stay in the L2 country, and foreign language aptitude on L2 proficiency in 65 Chinese immigrants in Spain. While Granena & Long (2013a) did find the predicted correlations between aptitude and phonology and between aptitude and lexis in the late starter group but not in the early starters, they did not find correlations between aptitude and morphosyntax in any group. As a whole, these results suggest that in immersive settings, aptitude does not play an important role in young learners. In contrast, however, Abrahamsson & Hyltenstam (2008) found an effect of aptitude in both early and late starters in a study on 42 Chilean immigrants in Sweden. One possible explanation for these contradictory results is that the measure of L2 proficiency used by Abrahamsson & Hyltenstam was more discriminating than the measures of L2 proficiency used in the other studies. This could have helped to avoid a ceiling effect in the early starter groups that would have “[made] it harder to show the influence of any moderator variable, including aptitude, and easier to show its effects in adult groups with the greater variability in ultimate attainment that is typical of late starters” (Granena & Long, 2013a, pp. 333–334). If this explanation is correct, the effect of aptitude in early second language learners should appear in the early phases of language acquisition, when variability in language outcome is still present. Research is therefore needed on the effect of aptitude on rate of acquisition, e.g., with adults and children who have recently arrived in a country where the majority language is different from their L1.
Exposure and Other Factors Correlated with Age
Other studies have emphasized factors correlated with age that could also provide an explanation for the long-term advantage of children over adults, such as length of stay, L2 education (especially if children study the L2 at school and learn to read in it while their parents do not), and L1 use (Moyer, 2011). Studies have also highlighted the importance of affective and exposure variables such as frequency of contact with native speakers as predictors of L2 ultimate attainment (Flege & Liu, 2001; Kinsella & Singleton, 2014; Moyer, 2011).
Based on the results of her study on native-like accent in foreign students in US universities, Moyer (2011) argues that quality of exposure (in the sense of exposure where the “learner interacts in functionally significant ways, representing interpersonal as well as instrumental purpose”, p. 195) is more important than quantity of exposure (length of residence, years of instruction, etc.). In the same vein, the fact that children often encounter more diversified input and come into contact with a larger number of target language speakers in a variety of settings (Flege, 1987) could partially explain their superior learning outcomes in the long term.
Recent research on adult immigrants in various contexts has proposed language anxiety as another important factor that mediates both exposure to the language and development of language proficiency (Dewaele & Sevinç, 2017; Garcia de Blakeley et al., 2017; Sevinç, 2018; Sevinç & Backus, 2017), which is consistent with the well-documented effect of foreign language anxiety in the classroom (for a review, see Gkonou et al., 2017). Level of foreign language anxiety in turn depends on age of onset, according to a study with multilingual adults (Dewaele et al., 2008), but research is needed on the interaction between age, anxiety, and rate of acquisition in an immersive context.
As in other subfields of SLA, research on age, aptitude, and other individual differences has measured learning outcomes in many ways. For the present study, the main criteria for the choice of the tests of English proficiency were their suitability for different age groups (see Lambelet & Berthele, 2015, for a discussion of research designs in studies on age effect on L2 learning), and their closeness to real communication skills. In this sense, elicited oral narratives based on picture books or short video sequences allow participants to make use of their linguistic repertoire to resolve the task as they would do in real-life situations, while ensuring comparability across participants thanks to the common thematic frame of the story to be told. Elicited narratives have been used in a wide range of SLA and heritage language studies with adults and children to measure language dominance, narrative complexity, verbal tense production, motion event expressions, or lexical diversity (e.g., De Clercq & Housen, 2019; Sánchez Abchi et al., 2017; Strömqvist & Verhoeven, 2004; Treffers-Daller, 2009). In this article, we focus on the lexical diversity of oral narratives, a measure that gives an insight into the size of the active vocabulary of an L1 or L2 user and reflects learners’ level of language proficiency and the complexity of their developing vocabulary (e.g., Malvern et al., 2004). Lexical diversity has been of interest to researchers since the 1930s, initially in relation to monolingual first language acquisition and then in relation to bilingual acquisition and second language learning.
Over the years, various indices have been developed to measure lexical diversity while avoiding the confounding effect of text length (see Jarvis, 2013, for a discussion). For this contribution, we will use the Guiraud Index of lexical diversity, a measure that has been found to be a good predictor of the human perception of lexical richness (e.g., Vanhove et al., 2019) and is widely used in SLA studies.
The Language Aptitude Outside the Classroom (LAOC) Study
In the present study, we use a longitudinal design to investigate the development of lexical diversity in recently arrived parent-child immigrant pairs. We investigate the effects of age of onset, cognitive abilities (foreign language learning aptitude, working memory), exposure, and anxiety. We aim to answer the following research questions about recently arrived Spanish-speaking immigrants in the United States:
- Research question 1: What factors predict rate of English acquisition, measured as the lexical diversity of oral narratives?
- Research question 2: Are the same factors predictive of rate of English acquisition in adults as in children?
Methodology
All the participants were tested by the principal investigator of the project who met with each dyad three times over a one-year period. Data collections took place either at participants’ homes, in a library, or in a community center. During the first half of the first session (T1), participants completed the LLAMA aptitude tests and two short-term/working memory tests (see below, Tasks). The experimenter gave the instructions orally in Spanish to both adult and child simultaneously and answered any questions the participants had. When she felt it necessary, the experimenter asked the child or the adult to repeat the instructions in their own words to check for misunderstandings. The administration of the six cognitive tests (four aptitude tests and two short-term/working memory tests) took approximately 40 to 50 minutes.
In the second half of the first session, child and adult English proficiency was assessed with three tests (a verbal fluency task, an oral narrative, and a listening comprehension task). While participant 1 (the adult) was answering a listening comprehension task on a laptop computer with headphones, participant 2 (the child) performed the verbal fluency and oral narrative (frog story) tasks with the experimenter. After reversing the tasks (the child using the laptop for the listening comprehension task, the adult with the experimenter for the oral tasks), parent and child answered a short questionnaire in Spanish about their exposure to English, their anxiety when speaking in English and other socio-demographic questions. At T2 and T3, respectively 6 months and 12 months after the first session, participants’ English proficiency was assessed a second and third time with a variation of the same tests. Participants also answered the socio-biographic questionnaire. The first session lasted for 90 to 120 minutes. Sessions 2 and 3 lasted for 45 to 60 minutes. The study was approved by the IRB of the University of Maryland.
Participants
Participants are 38 parent-child dyads of Spanish-speaking immigrants in New York City (N = 76). An additional 13 parent-child dyads were excluded from the analysis because they did not participate in all three data collections or had missing answers (total participant sample = 51 dyads). Table 1 shows the gender and mean age by group of the participants included in the analysis. Participants come from 9 Latin American countries reflecting the current waves of immigration in the United States: Venezuela (N = 28), Dominican Republic (N = 22), Ecuador (N = 8), Honduras (N = 6), Mexico (N = 4), Bolivia (N = 2), El Salvador (N = 2), Peru (N = 2) and Puerto Rico (N = 2). The participants’ answers to a brief socio-biographical questionnaire show that the majority of them arrived in the United States with a very low level of proficiency: 1 adult and 5 children reported not understanding a single word when they entered the country; 27 adults and 22 children reported knowing a few words such as “good morning,” “thank you,” some numbers and some colors; 10 adults and 9 children reported being able to communicate on very basic everyday needs; 1 child but none of the adults reported being able to participate in a conversation on everyday topics; and 1 child (but no adults) reported being able to communicate with certain fluency on a variety of topics.
Table 1. Participants’ gender and age at T1 and at arrival by group.

Note: the age on arrival is estimated based on the participants’ age at T1 (in years) and their length of residence at T1 (in months).
Tasks
Foreign Language Learning Aptitude Tests
Participants’ foreign language learning aptitude was assessed with the four subtests of the LLAMA aptitude tests (Meara, 2005). The LLAMA tests are computer-run and picture-based exercises simulating the learning of an artificial language. They can be used by speakers of any L1 because they do not rely on any specific language system, and their user-friendly interface makes them easily usable for any type of population (including children), which explains their wide use in the field of aptitude, even if their internal validity has been recently questioned (Bokander & Bylund, 2020). The first test, LLAMA_B, simulates vocabulary learning. The participant has 120 seconds to learn the names of a set of invented objects (drawings) in an unknown language (training phase). They are then tested on their learning. LLAMA_D is a sound discrimination and recognition task. It is intended to measure the participant's ability to recognize oral patterns in an unknown language. Participants first hear a series of sounds (training phase) and then must discriminate between new and previously heard items (test phase). LLAMA_E measures sound-symbol association ability. For 120 seconds, participants learn the relationship between 22 sounds and 22 written forms in an artificial orthographic system (training phase). They then hear a word and choose the correct written word between two variants (test phase). LLAMA_F measures inductive learning ability. For 300 seconds, participants infer the grammatical system of an artificial language with a set of visual and written stimuli (training phase). They then choose the grammatically correct variant out of two new stimuli (test phase).
Working Memory and Short-Term Memory Tests
Participants answered two tests of working/short-term memory on two different laptops with headphones. The tests were built on the PEBL environment (Mueller & Piper, 2014). Participants began with the Corsi Block task, which measures visuo-spatial short-term memory. Participants observed sequences of blocks lighting up on their screen, and then repeated the sequences back in order. The sequence increased in length after each successful trial until the participant was unable to remember it. When they made errors, participants were given a second chance with a new sequence of the same length before continuing to the increased lengths. After two missed trials, the task was terminated, and the score saved. The second working memory test was the backward digit span. Participants saw sequences of single digits and were asked to tap the digits in the reverse of the order in which they had appeared on the screen. As in the Corsi Block task, sequences of digits increased in length at each trial. After two missed trials, the score was saved. Corsi block and backward digit span were chosen to avoid confounding memory and language abilities and because they could be administered to adults and children simultaneously.
Oral Narrative
At each session, adults and children were asked to perform an oral narrative task using the Frog Stories books by Mercer Mayer. At T1 and T3, participants told the story illustrated in the picture book Frog, where are you? (Mayer, 1969), while at T2 they told the story illustrated in the picture book A boy, a dog, and a frog (Mayer, 1967). Before telling the story, they were given time to look through the book and prepare mentally. Preparation and performance were untimed, and performance was recorded. Here, we only consider the oral narratives from T1 and T3, since the T2 book depicted a different story and therefore does not allow for direct comparison.
Socio-Biographic Questionnaire
At the end of each session (T1, T2, and T3), participants individually answered a short questionnaire in Spanish. The experimenter helped in reading the questions when necessary. The adult version of the questionnaire contained nine questions on exposure, six questions on anxiety when speaking in English, five questions on motivation and feeling of integration and three questions on education level, employment in the country of origin and employment in the United States. The exposure, anxiety, motivation and feeling of integration questions were adapted for the child questionnaire.
Exposure to English was self-rated in eight domains: at home, with friends, in school/at work, in the neighborhood, in church, when reading, when watching television, and when listening to music. For each of these domains, the participant answered if they were using only English, mainly English, as much English as Spanish, mainly Spanish, or only Spanish. At T2 and T3, anxiety was self-rated in six domains: at home, with friends, in the neighborhood, in school/at work, on the phone, during adult ESL classes (last question only in the adult questionnaire). Participants answered if they were nervous in each context on a five-point scale ranging from not nervous at all to extremely nervous. At T1, the questionnaire contained just one general question on anxiety when speaking in English (same five-point scale).
Data Analysis
Data Preparation
Cognitive Variables
The scores for the LLAMA_B, LLAMA_E, and LLAMA_F were retrieved directly from the software. The possible scores on these three subtests range from 0 to 100. The LLAMA_D score as computed by the software ranges from 0 to 75. Participants’ scores on the LLAMA_D were therefore transformed to correspond to the range of the other subtests (0–100). The Backward Digit and Corsi Block Spans were retrieved directly from PEBL. To run the linear mixed models, each of the cognitive variables was mean-centered and standardized (z-scores computed over the combined age groups for the first model, then separately by age group for the second part of the analysis; see the sections below: Factors Explaining Lexical Diversity Development in the Entire Sample (RQ 1) and Factors Explaining Lexical Diversity Development in Adults and Children Separately (RQ 2)).
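The rescaling and standardization steps can be illustrated with a minimal sketch. It assumes that the LLAMA_D transformation is a simple linear rescaling from 0–75 to 0–100 and that z-scores use the sample standard deviation; neither detail is stated in the text, and the scores below are invented for illustration.

```python
from statistics import mean, stdev

def rescale_llama_d(raw):
    """Map a LLAMA_D score from its native 0-75 range onto 0-100.
    Assumption: a linear rescaling (the exact transformation is not reported)."""
    return raw * 100 / 75

def z_scores(values):
    """Center on the mean and divide by the sample standard deviation.
    Assumption: sample (n - 1) SD; the text only says 'z-scores'."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Invented raw LLAMA_D scores for four hypothetical participants.
raw_llama_d = [15, 30, 45, 60]
rescaled = [rescale_llama_d(x) for x in raw_llama_d]  # [20.0, 40.0, 60.0, 80.0]

# Standardize over the combined sample (first model); for the second part of
# the analysis the same function would be applied to each age group separately.
standardized = z_scores(rescaled)
print(rescaled)
print([round(z, 2) for z in standardized])
```

Standardizing within each age group, as in the second part of the analysis, changes the reference point: a z-score of 0 then corresponds to the group mean and not to the mean of the whole sample.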
Affective and Contextual Variables
The answers to the eight questions of the questionnaire on exposure were coded on a scale from 1 to 5 (1 = only Spanish, 5 = only English), averaged for the three sessions, and transformed to a score of 1 to 100. The same applied to the questions on anxiety. The internal consistency of the Exposure and Anxiety variables was very satisfactory (Exposure to English: Cronbach's alpha = .92, N = 24; Anxiety when speaking in English: Cronbach's alpha = .86, N = 13). To run the linear mixed models, the newly created Exposure and Anxiety variables were centered and standardized (z-scores computed over the combined age groups for the first model, then separately by age group for the second part of the analysis; see the sections below: Factors Explaining Lexical Diversity Development in the Entire Sample (RQ 1) and Factors Explaining Lexical Diversity Development in Adults and Children Separately (RQ 2)).
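As an illustration of the scoring and reliability steps, the following sketch computes Cronbach's alpha from item-level ratings and converts a mean rating to a 1 to 100 score. The conversion is assumed here to be a linear map from the 1–5 coding (the text does not give the formula), and the ratings are invented.

```python
from statistics import mean, variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items table.
    rows: one list of item ratings per respondent, all of equal length."""
    k = len(rows[0])
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def to_1_100(mean_rating):
    """Assumption: linear map of a 1-5 mean rating onto a 1-100 score."""
    return 1 + (mean_rating - 1) * 99 / 4

# Invented ratings: four hypothetical respondents, four items (1-5 coding).
ratings = [
    [1, 2, 1, 2],
    [3, 3, 2, 3],
    [4, 5, 4, 4],
    [5, 5, 5, 4],
]
print(round(cronbach_alpha(ratings), 2))
print([to_1_100(mean(r)) for r in ratings])
```

In the study, the item set would combine the questions from all three sessions for each participant, which is one plausible reading of the N values reported with the alphas.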
Lexical Diversity of the Oral Narratives
Participants’ oral narratives were transcribed by the principal investigator as soon as possible after data collection. All transcriptions were then checked and revised by a research assistant who is a native English speaker. Differences in transcriptions were reviewed by the principal investigator and, if needed, discussed with the assistant. Filled pauses, unintelligible words, and Spanish code-switching were removed from each transcription before the Guiraud Index measure of lexical diversity (number of types divided by the square root of number of tokens) was computed.
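The Guiraud Index defined above (types divided by the square root of tokens) can be sketched as follows. The tokenizer and the list of filled pauses are hypothetical stand-ins for the manual cleaning described in the text; the study does not report an automated procedure.

```python
import math
import re

# Hypothetical list of filled pauses; the study's actual cleaning was manual.
FILLED_PAUSES = {"uh", "um", "er", "eh", "mm"}

def guiraud_index(transcript):
    """Number of types divided by the square root of the number of tokens."""
    words = re.findall(r"[a-z']+", transcript.lower())
    tokens = [w for w in words if w not in FILLED_PAUSES]
    if not tokens:
        raise ValueError("transcript contains no countable tokens")
    return len(set(tokens)) / math.sqrt(len(tokens))

# Invented sample: 11 tokens and 8 types once "um" is removed.
sample = "the boy um looks for the frog and the dog looks too"
print(round(guiraud_index(sample), 2))  # 2.41
```

Dividing by the square root of the token count, and not by the token count itself as in a plain type-token ratio, reduces the penalty that longer narratives would otherwise incur.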
Results
Dependent Variable: Guiraud Index of Lexical Diversity
At T1, the mean Guiraud Index score of the children was 4.17 (SD = 1.5) and the parents’ score was 3.74 (SD = 1.1). One year later (T3), the Guiraud Index score of the children increased to 4.96 (SD = 1.2) and the parents’ score slightly increased to 3.91 (SD = 1.1). A 2 × 2 repeated measures ANOVA revealed a significant main effect of group (F(1, 35) = 16.25, p < .001, ηp² = .32), a significant main effect of time (F(1, 35) = 22.97, p < .001, ηp² = .40) and a significant effect of the interaction between time and group (F(1, 35) = 10.1, p < .001, ηp² = .22). In other words, (a) the children scored higher than their parents in general, (b) the scores of the entire sample increased between T1 and T3, but (c) this increase differed by group: the difference between T1 and T3 is significant in the child group (t(35) = 5.30, p < .001), but not in the adults (t(35) = 1.64, p = .11). The scores of the adults and the children are plotted in Figure 1, below.
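For readers who wish to check the effect sizes, partial eta squared can be recovered from an F statistic and its degrees of freedom with the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The short sketch below applies it to the three F values reported above; the function name is illustrative only.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# F values reported in the text, each with df = (1, 35).
reported_f = {"group": 16.25, "time": 22.97, "time x group": 10.1}

for effect, f_value in reported_f.items():
    print(f"{effect}: {partial_eta_squared(f_value, 1, 35):.2f}")
# group: 0.32
# time: 0.40
# time x group: 0.22
```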

Figure 1. Boxplots: Distribution of the Guiraud Index Scores for Each Group at Each Session (T1 and T3)
Independent Variables
The means and standard deviations for both groups on the nine independent variables are shown in Table 2. Regarding cognitive variables, adults and children differ significantly on the LLAMA_B and LLAMA_F (higher score for the children than for their parents) but not on the other four cognitive tests. They also differ significantly on both exposure and anxiety; children report more exposure to English in their everyday life than their parents do and report less anxiety when speaking in English. The two groups do not differ significantly in terms of length of residence in the United States.
Table 2. Means, standard deviations, and paired t-tests of the scores on the nine independent variables for each group.

Correlation Matrix
Before running linear mixed effects models on the data, we examined the correlations between the Guiraud Index and the cognitive, exposure, and anxiety variables. As shown in Table 3, the Guiraud Index was moderately correlated with the four measures of aptitude, length of residence, exposure and anxiety. Contrary to expectations, no significant correlations appeared between the measures of working/short-term memory and the dependent variable (Corsi: r = −.025, p = .76; Backward Digit: r = .054, p = .51).
Table 3. Pearson correlation matrix between the Guiraud Index (dependent variable) and the nine independent variables to be included in the linear mixed models.

*p < .05, **p < .01, ***p < .001
Factors Explaining Lexical Diversity Development in the Entire Sample (RQ 1)
To answer our first research question, we fitted backward elimination linear mixed effects models to the data using the lmer() function of the lme4 package for R (Bates et al., 2012). The dependent variable was the Guiraud Index. To control for the influence of household, dyad was included with a random intercept and random slope for time. Time, age group, length of residence, aptitude, working memory, exposure to English, and anxiety when speaking in English were included in the analysis as fixed effects. We computed the variance inflation factor using the vif() function of the car package for R (Fox et al., 2012), and found no problem of collinearity among the predictors (the values ranged between 1.3 and 4.7). Time and age group were treated as factors (T1, T3; adults, children) while the other independent variables were mean-centered and standardized (z-scores). At each step of the analysis, the predictor with the largest nonsignificant p-value was removed until the simplest model was found. At each step, the model with the predictor was compared to the model without the predictor to determine the predictor's significance by using the likelihood ratio test in the anova() function of the lme4 package. The explained variance (marginal and conditional R²) of the best fitting model was then computed using the r.squaredGLMM() function of the MuMIn package for R (Barton, 2020).
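The model comparison at each elimination step rests on a likelihood ratio test between two nested models. The sketch below shows the arithmetic of that test for the case where the models differ by a single parameter (one degree of freedom), using only the Python standard library. It is not the authors' R code, and the log-likelihood values are invented.

```python
import math

def lrt_one_df(loglik_full, loglik_reduced):
    """Likelihood ratio test for two nested models differing by one parameter.
    Returns the chi-square statistic and its p-value (df = 1)."""
    stat = max(2 * (loglik_full - loglik_reduced), 0.0)
    # For df = 1, the chi-square survival function equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Invented log-likelihoods: model with the predictor vs. model without it.
stat, p = lrt_one_df(loglik_full=-250.3, loglik_reduced=-253.1)
print(f"chi2(1) = {stat:.2f}, p = {p:.3f}")

# In backward elimination, a predictor whose removal yields a nonsignificant
# test (p above the chosen threshold, e.g. .05) is dropped and the procedure
# is repeated on the reduced model.
```

Likelihood ratio tests on fixed effects are usually run on models fitted with maximum likelihood and not restricted maximum likelihood; whether the original analysis refitted the models this way is not stated in the text.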
The results of the best fitting model (marginal R² = .40, conditional R² = .73) are summarized in Table 4. As the table shows, length of residence, LLAMA_D, LLAMA_E, exposure to English, and the interaction of time by group are significant predictors of lexical diversity. For each increase of one standard deviation in length of residence, the adults score .25 units higher than the adults at T1 (intercept). Also, for each increase of one standard deviation in LLAMA_D and LLAMA_E, their score increases by, respectively, .28 and .22 units. Exposure to English is the most important predictor, with an increase in lexical diversity of .69 units for each increase of one standard deviation in exposure to English. As shown in the model, group and time are not significant predictors of lexical diversity development, but the interaction of time by group is: At T3, the children score .42 units higher than the adults at T1.
Table 4. Best-fitting Model for the Entire Sample

Factors Explaining Lexical Diversity Development in Adults and Children Separately (RQ 2)
To follow up on the significant interaction between age group and time, in a second step of the analysis, we fitted linear mixed effects models separately to the adults’ and children's data. The independent variables were centered and standardized by group (z_score). We followed the same process described above, removing the predictor with the highest p-value at each step and comparing the model with the predictor to the model without the predictor to determine the predictor's significance. We then computed the explained variance of the best-fitting models in each group (Adult group: marginal R² = .33, conditional R² = .76; Child group: marginal R² = .54, conditional R² = .81).
The results of the best-fitting models for both groups are shown in Tables 5 and 6 below. In the adult group, the only significant predictor of lexical diversity is exposure to English, with a .63 unit increase for each increase of one standard deviation in exposure compared to the adult score at T1. In the child group, time, length of residence, LLAMA_D, LLAMA_E, and anxiety are significant predictors of lexical diversity. The most important predictor is time (at T3, children score .80 units higher than at T1), followed by anxiety (a decrease of .51 units of lexical diversity for each increase of one standard deviation in anxiety). Length of residence is also a significant predictor: for each increase of one standard deviation in length of residence, children score .44 units higher. Finally, two dimensions of aptitude have an impact on lexical diversity development in the children: for each increase of one standard deviation in LLAMA_D and LLAMA_E, their scores increase by .38 and .30 units, respectively.
Table 5. Best-fitting Model for the Adult Group.

Table 6. Best-fitting Model for the Child Group.

Discussion
The LAOC study investigates the effect of age of onset, cognitive variables (aptitude and working memory), exposure to English, and anxiety on English rate of acquisition, measured as the lexical diversity of oral narratives. In the following section, we will review and discuss our results for each dimension.
First, regarding cognitive variables, our results show that two subtests of the LLAMA aptitude battery predict lexical diversity when the entire sample is considered: the higher the participants’ scores on the measure of sound discrimination (LLAMA_D) and sound-symbol association (LLAMA_E), the better their lexical diversity development. This effect nevertheless disappears for the adults when they are modeled separately from the children. This last result contradicts earlier studies on age of onset and aptitude in naturalistic settings. As a reminder, aptitude showed a predictive effect on ultimate attainment in the adult immigrants in every prior study (Abrahamsson & Hyltenstam, 2008; DeKeyser, 2000; DeKeyser et al., 2010; Granena & Long, 2013a), and in the younger starters in one of the studies (Abrahamsson & Hyltenstam, 2008). Also, in contrast to earlier studies using working memory as a new measure of aptitude, we did not find any effect of working memory on lexical diversity development. This could result from the measures of working memory themselves, or it could indicate that working memory predicts language learning in a school context but not in naturalistic settings, where exposure to the language varies between individuals.
In fact, in our sample, the effect of exposure is of paramount importance for both groups, and greater than the effect of any other variable in the adult group. Following the distinction made by Moyer (2011), quality of exposure, self-rated in eight domains, seems particularly important in the adults, while length of residence (which Moyer defines as quantity of exposure) predicts development of proficiency in the children. In the adult group, exposure to the majority language in everyday life is even more important than length of residence: recently arrived adults in frequent contact with the majority language make more progress than adults with a longer stay who do not use English in their everyday lives. This result is surprising considering the range of length of residence present in the data (see Table 2), but it confirms the picture emerging from the 2 × 2 ANOVAs and paired t-tests, which showed no significant improvement of proficiency in the adult group over a one-year period (see Figure 1).
A closer look at the exposure to English variable shows that children self-report significantly more exposure to English than their parents in all life domains but church (where both children and adults use predominantly Spanish). But it also appears that there is a wide range of exposure within each group (adult group: M = 45, SD = 13, Min = 23, Max = 79; child group: M = 66, SD = 12, Min = 39, Max = 91). Adult participants who are in contact with English through work, personal relationships and/or leisure activities significantly increase their chance to develop proficiency in comparison to other participants (including children) who find themselves in almost exclusively Spanish-speaking environments.
Interestingly, exposure to English does not correlate with length of stay in the adult group (Pearson's r = .08, p = .52), which explains the lack of improvement of lexical diversity in this group over a one-year period. In comparison, in the child group, length of stay correlates with self-rated exposure to English (Pearson's r = .47, p < .001) and, negatively, with anxiety (Pearson's r = −.35, p = .002). It appears therefore that the longer children stay in the country, the more contact they have with the language, the less anxious they are when speaking it, and, as a consequence, the more they develop their proficiency.
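For readers who wish to reproduce this kind of check on their own data, Pearson's r can be computed as in the following minimal sketch. The function name and the example values are hypothetical; they are not the LAOC data, and the sketch does not compute the associated p-value.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (n >= 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical length of stay (months) and self-rated exposure scores.
    length_of_stay = [3, 8, 14, 20, 26, 31]
    exposure = [41, 55, 48, 66, 70, 64]
    print(round(pearson_r(length_of_stay, exposure), 2))
```

Running the function separately on the adult and the child subsamples mirrors the group-wise comparison reported above.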
The study suffers from several limitations. First, the reliability of the LLAMA aptitude tests has recently been questioned (see Bokander & Bylund, 2020), and they might be less well suited to recent immigrants and low-SES participants than to other categories of participants. For another contribution in preparation, we ran ANOVAs comparing the scores on the LLAMA tests of the LAOC participants to the scores of the participants from 34 other studies using the same test battery. We found that, except on the LLAMA_D, the adults from the LAOC study scored significantly lower than the adults and teenagers from the other studies, and even lower than the children from the other studies on the LLAMA_B. The children from the LAOC study performed similarly to the children of all the other studies on the four subtests.
Another limitation concerns the dependent variable. As mentioned by an anonymous reviewer of this paper, the Guiraud Index does not consider the various dimensions of lexical knowledge, in particular lexical sophistication (examples such as “the boy looked in the hole of the tree” vs. “the boy gazed into the hollow of the tree”, which are identical in terms of lexical diversity, but not in terms of sophistication). Note, however, that the Guiraud Index has proved to be a good predictor of untrained raters’ perception of the lexical richness of short narratives even in comparison to complex models containing a wide range of lexical properties, including lexical sophistication (see Vanhove et al., 2019). Additionally, it is worth mentioning that the analysis of the other measures of proficiency (verbal tense listening comprehension and verbal fluency) shows the same paramount effect of exposure, which appears to be the most stable predictor across all three measures of English proficiency development.
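The reviewer's example can be verified with a short computation. The Guiraud Index is the number of word types divided by the square root of the number of word tokens. The sketch below is a minimal illustration: the simple lowercase tokenization and the function name are assumptions made for the example and do not reproduce the transcription and counting conventions used in the study.

```python
import math
import re


def guiraud_index(text):
    """Guiraud Index: number of types / sqrt(number of tokens)."""
    # Assumed tokenization: lowercase alphabetic word forms, no lemmatization.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        raise ValueError("text contains no word tokens")
    types = set(tokens)
    return len(types) / math.sqrt(len(tokens))


if __name__ == "__main__":
    plain = "the boy looked in the hole of the tree"
    sophisticated = "the boy gazed into the hollow of the tree"
    # Both sentences have 9 tokens and 7 types, so the index is identical.
    print(guiraud_index(plain), guiraud_index(sophisticated))
```

Both sentences contain nine tokens and seven types, so they receive the same score although the second uses less frequent vocabulary, which is precisely the limitation described above.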
Conclusion
With a sample of 38 parent-child dyads of recently arrived immigrants, this study shows that development of lexical diversity over a one-year period is predicted by exposure (and, for the children, anxiety); foreign language learning aptitude has a smaller effect. We did not replicate results from previous studies showing a differential effect of aptitude as a function of age of onset on ultimate attainment. Contrary to expectations, aptitude measures appeared to be predictors of lexical diversity development in the children but not in the adults. This somewhat surprising result indicates that, at the beginning of adult second language acquisition in a naturalistic setting, exposure to the target language is more important than any other individual factor including working memory and aptitude.
Funding
The study was funded by an Advanced Mobility Grant to the author from the Swiss National Science Foundation (SNF).
Acknowledgments
The author thanks Martin Chodorow and Virginia Valian (CUNY Hunter College) for their comments on a previous version of this paper and their support during the analyses, as well as Michelle Antonov (CUNY Hunter College) for her thorough revision of the transcriptions. Special thanks also to Robert DeKeyser (UMD), Kira Gor (UMD) and all the members of the LARC lab (CUNY Hunter College) for their continuous support and insightful feedback throughout the project.
Heartfelt thanks to Angel Diaz and Ella Nimmo (Cabrini Immigrant Services), Judit Criado Fiuza (Mercy Center), Niurka Melendez and Héctor Arguinzones (VIA), Anna Bazán, Carlos Espinoza, Gianina Enriquez, Carlos Varas, and Victor Lagos for helping me reach the immigrant community in New York. And of course, my deepest thanks to the 51 children and parents who agreed to participate in the study.
 
 






