Revisiting the traditional conceptualizations of vocabulary knowledge as predictors of dual language learners’ English reading achievement in a new destination state

Abstract
The unprecedented growth of Spanish-English dual language learners (DLLs) in new destination states (e.g., Georgia, Indiana, South Carolina, Tennessee) calls for better understanding of the relation between their bilingual vocabulary skills and English reading achievement. The current study focused on school-age Spanish-English DLLs (N = 60) in Tennessee and explored how various vocabulary knowledge conceptualizations predict English reading comprehension achievement, controlling for English word reading skills and grade level. Vocabulary knowledge was assessed using monolingual (English-only and Spanish-only) and bilingual (conceptual and total) scoring methods. Results showed that, while DLLs performed below the national mean for English-only and Spanish-only vocabulary, they performed within the average to above-average range for bilingually scored conceptual vocabulary. More uniquely, expressive vocabulary knowledge emerged as a robust predictor of English reading comprehension above and beyond the influence of English word reading skills. Findings suggest the practical and theoretical value of bilingually driven vocabulary assessment approaches. As expected, bilingually scored vocabulary provided a more comprehensive understanding of DLLs’ vocabulary knowledge by accounting for vocabulary knowledge in both Spanish and English, compared to monolingually scored vocabulary. We discuss theoretical and instructional implications, with a focus on asset-driven and scientific assessment understandings for supporting DLLs’ vocabulary and reading achievement in new destination states.

Literacy Foundation, 2018). The challenge is particularly salient for school-age dual language learners (DLLs) who are simultaneously developing English proficiency and acquiring academic content in English (Mancilla-Martinez, 2020). In the U.S., DLLs comprise approximately 23% of the K-12 student population (Kids Count Data Center, 2020), and the majority are from Spanish-speaking homes (García, 2012). To be clear, speaking more than one language itself is not a risk factor for compromised academic outcomes (Mancilla-Martinez, 2020). Of concern, however, school-age Spanish-English DLLs often come from households living in or near poverty (Gennetian et al., 2019; Lopez & Velasco, 2011), and poverty compromises students' academic experiences and trajectories, including RC (Heidlage et al., 2020; Luo et al., 2020). The large and growing presence of Spanish-English DLLs across the nation necessitates that we revisit the traditional conceptualization of a well-known RC contributor, vocabulary knowledge. Even though DLLs are not monolinguals, the traditional conceptualization of DLLs' vocabulary knowledge is based on the assessment of language-specific vocabulary, typically English-only vocabulary.
DLLs are not a new U.S. student population, but certain areas of the country, commonly known as new destination states (e.g., Georgia, Indiana, South Carolina, Tennessee), have experienced rapid, unprecedented growth of DLLs in the last decade (Smolarek, 2020). Mirroring national trends, DLLs in new destination states predominantly come from Spanish-speaking and low-income homes (Gándara & Mordechay, 2017). In other words, new destination states do not stand out as an anomaly relative to national DLL demographic trends. However, compared to traditional destination states (e.g., California, Texas), new destination states do stand out in that, by definition, they have more nascent experience educating DLLs. Consequently, schools and educators in these contexts tend to have comparatively less experience and fewer resources to support DLLs' overall academic achievement (Lee & Hawkins, 2015; Lowenhaupt & Reeves, 2015; Umansky et al., 2018).
In this study, we draw on Hoover and Gough's (1990) Simple View of Reading (SVR) model, a well-known, parsimonious model that frames RC as a product of word reading (i.e., the ability to read the printed words in a text) and language comprehension (i.e., the ability to automatically associate meaning to speech sounds). The SVR has been empirically supported across numerous scientific studies of reading involving both monolinguals and DLLs (e.g., Catts et al., 2015; Hammer et al., 2014). Within the SVR, vocabulary knowledge represents one important aspect of language comprehension (e.g., Braze et al., 2016; Proctor et al., 2009; Tunmer & Chapman, 2012; van Steensel et al., 2016), which is the component of interest in this study. Additionally, we draw on the shared (distributed) asymmetrical model as our theoretical framework for assessing DLLs' vocabulary (Dong et al., 2005).
As U.S. student demographics continue to change and DLLs become less "new" across all contexts, a more nuanced and scientifically based understanding of the relation between Spanish-English DLLs' vocabulary skills (which generally consist of vocabulary knowledge across English and Spanish) and their English RC achievement is both a pressing need and a timely matter of educational equity. To contribute to the literature on English RC among school-age DLLs and particularly students from Spanish-speaking homes, we conducted this exploratory study in Tennessee (i.e., a new immigrant destination state in the American South). Our goal was to compare the relation between monolingual and bilingual conceptualizations of DLLs' vocabulary knowledge and their English RC outcomes. By studying DLLs in a new destination state (i.e., Tennessee), this study can offer theoretical and practical insight for better supporting the English RC of Spanish-English DLLs broadly and specifically for those in new destination states, where access to formal instruction and resources in Spanish tends to be relatively limited compared to traditional DLL-serving states.

Conceptualizing English reading comprehension
Decades of empirical and theoretical studies on RC have revealed a multitude of contributors to successful RC achievement (e.g., Gough & Tunmer, 1986; Kendeou et al., 2013; Kirby & Savage, 2008; Lesaux et al., 2010; Perfetti & Stafura, 2014; RAND, 2002). We frame our exploration of English RC predictors in terms of the SVR (Gough & Tunmer, 1986). The SVR posits that successful RC relies on the interrelation between language comprehension and word reading. Language comprehension encompasses the ability to process language and derive meaning, across word-, sentence-, or discourse-level oral language skills, while word reading refers to students' ability to process printed text. Importantly, both components are multidimensional (Tunmer & Chapman, 2012), comprised of multiple subskills such as various top-down (e.g., vocabulary knowledge, syntactic knowledge, pragmatics) and bottom-up (e.g., sight word recognition, phonemic decoding, morphological awareness) processes. Proficiency in both language comprehension and word reading skills, and their cross-product, is essential to successful RC for monolingual English speakers and DLLs (e.g., Carver & David, 2009; Catts et al., 2015; Florit & Cain, 2011; Mancilla-Martinez & Lesaux, 2017; Nakamoto et al., 2008; Silverman et al., 2020). In fact, Hoover and Gough's (1990) well-known longitudinal SVR study revealed that both language comprehension and word reading made independent, significant contributions to English RC. The two components were not only independent, but also highly inter-dependent, such that simply supporting language comprehension or word reading would be insufficient (Hoover & Gough, 1990). A less well-known fact is that this study was conducted with school-age Spanish-English DLLs.
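The product relation at the core of the SVR can be written compactly. The symbols below are illustrative shorthand (WR for word reading, LC for language comprehension), and the 0-to-1 scaling follows the usual presentation of Gough and Tunmer's (1986) formulation rather than any scoring used in this study:

```latex
\[
  RC = WR \times LC, \qquad 0 \le WR \le 1, \quad 0 \le LC \le 1
\]
```

Because the relation is multiplicative, RC is zero whenever either component is zero; for instance, WR = 0.9 and LC = 0.5 would yield RC = 0.45. This is why strength in one component cannot fully compensate for weakness in the other.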
A wealth of research points to the critical role of word reading for successful RC (Foorman et al., 2018; Lonigan et al., 2018). Specifically, research with Spanish-English DLLs suggests a robust relation between RC and English and Spanish word reading skills (e.g., Cárdenas-Hagan et al., 2007; Durgunoğlu et al., 1993; Melby-Lervåg & Lervåg, 2011). In this study, we examine DLLs' language comprehension skills using DLLs' vocabulary knowledge, which is a typical proxy of language comprehension within the SVR framework. In doing so, it is important to acknowledge that Spanish-English DLLs are not English monolinguals. Although DLLs come from homes where a language other than, or in addition to, English is used, not all DLLs are English learners (i.e., students formally identified and designated as limited English-proficient by their schools). In fact, DLLs' language proficiency can vary widely. For example, some Spanish-English DLLs are more proficient in Spanish than in English, some more proficient in English than in Spanish, and others similarly proficient in both English and Spanish. This raises questions about how vocabulary skills have traditionally been assessed among this heterogeneous and fast-growing group of learners. Indeed, although SVR studies on U.S. Spanish-English DLLs have expanded the field's understanding of the reading process within and across languages (e.g., Cho et al., 2019; Goodrich & Namkung, 2019; Gottardo & Mueller, 2009; Mancilla-Martinez & Lesaux, 2017; Nakamoto et al., 2008; Proctor et al., 2006), research on the contribution of bilingually scored vocabulary knowledge, compared to traditional, monolingually scored (e.g., English-only, Spanish-only) vocabulary knowledge, to Spanish-English DLLs' English RC remains limited.

Conceptualizing Spanish-English DLLs' vocabulary knowledge
Substantial evidence underscores the central role of vocabulary knowledge for DLLs' successful English RC (e.g., August et al., 2005; Lesaux et al., 2010; RAND, 2002; Snow & Kim, 2007). In fact, studies on Spanish-English DLLs' vocabulary knowledge have reported significant effects of Spanish vocabulary knowledge on English RC (e.g., Goodrich & Namkung, 2019; Proctor et al., 2006). Furthermore, it is important to recognize that DLLs' vocabulary knowledge is distributed across multiple languages (Mancilla-Martinez et al., 2011; Oh & Mancilla-Martinez, 2021; Pearson et al., 1995), which necessitates attention to both English and Spanish, for example, to ensure comprehensive vocabulary assessment. Indeed, decades of research have cautioned against expecting DLLs to demonstrate vocabulary knowledge in each language on par with that of monolinguals (e.g., Grosjean, 1989). Notwithstanding, the use of monolingual vocabulary assessments (typically English monolingual in the U.S. context) remains common practice with DLLs (Arias & Friberg, 2017; Caesar & Kohler, 2007).
Reliance on monolingual vocabulary assessments (e.g., English-only vocabulary) when assessing DLLs could lead to a partial and potentially misguided understanding of their word knowledge (Bedore et al., 2005; Hammer et al., 2014; Oh & Mancilla-Martinez, 2021). In fact, studies using monolingual vocabulary assessments have consistently found DLLs to score below their monolingual peers, as monolingual testing taps only a part of what DLLs know and underestimates their comprehensive word knowledge (Ehl et al., 2020; Oh & Mancilla-Martinez, 2021; Oller & Pearson, 2002). Even when Spanish-English DLLs' vocabulary knowledge is assessed using both English monolingual and Spanish monolingual tests, many studies report low vocabulary knowledge in both languages (Gross et al., 2014; Hoff, 2018). A central consideration is that the bulk of vocabulary assessments, whether in English or in Spanish, is designed for and normed on monolinguals. However, DLLs are not monolinguals. Given that high-stakes educational decisions (e.g., diagnoses for language-related difficulties, special education identification) are often made based on standardized assessment results, continued reliance on monolingual vocabulary assessments raises important educational equity concerns (Sugarman & Villegas, 2020). Hence, there is a pressing need to investigate the utility of bilingually scored vocabulary assessments with the growing population of Spanish-English DLLs.
Numerous models have been proposed to explain the bilingual mental lexicon. Given our focus on vocabulary knowledge as a proxy of language comprehension within the SVR framework, we draw on tenets of the shared (distributed) asymmetrical model proposed by Dong and colleagues (2005), as this model incorporates features of the most prominent models of the bilingual mental lexicon (i.e., the distributed model, the revised hierarchical model, the separate storage model, the word-association model, and the conceptual mediation model). As the name implies, according to the shared (distributed) asymmetrical model, there is shared storage for conceptual representations of the bilingual's two vocabularies (in this study, Spanish vocabulary and English vocabulary). However, there are also asymmetrical (i.e., distributed or separate) links between concepts and lexical names in the two languages. The shared (distributed) asymmetrical model thus offers a useful framework for exploring the distributed nature of school-age Spanish-English DLLs' vocabulary knowledge. Specifically, this model suggests that bilingual, rather than monolingual, scoring methods may provide a more appropriate and scientifically grounded approach for assessing DLLs' vocabulary knowledge. In other words, Spanish-English DLLs can be expected to produce lexical names (labels) in only English (e.g., cat), only Spanish (e.g., gato), both languages (e.g., gato and cat), or neither language (i.e., does not know the lexical name for the target concept) for one underlying concept. As such, vocabulary assessments that account for these varied response patterns among DLLs are needed.
The two most common bilingual scoring methods used to characterize DLLs' vocabulary knowledge are conceptual vocabulary and total vocabulary (e.g., Oh & Mancilla-Martinez, 2021; Pearson et al., 1995). Between these two bilingual scoring methods, conceptually scored vocabulary has been more widely studied. It accounts for DLLs' vocabulary in either language (e.g., English or Spanish) by giving credit for known concepts, rather than for correct responses in a single language (as is the case with monolingual assessments). Conceptual vocabulary scores are typically derived by summing all the words a DLL knows in both languages, then subtracting translation equivalents. In other words, if DLLs know both cat and gato, they receive single credit for the overall concept and not double credit for knowing both labels. Conceptually scored vocabulary has been used as an indicator of DLLs' language ability (e.g., Marchman et al., 2010; Pearson et al., 1993, 1995), with growing evidence of its utility for understanding Spanish-English DLLs' linguistic competence (e.g., Anaya et al., 2018; Bedore et al., 2005; Peña & Halle, 2011; Hwang et al., 2020). Further, conceptual scoring mitigates not only score differences compared to monolingual scoring, but also widely observed vocabulary achievement differences between DLLs and their English monolingual peers (Bedore et al., 2005; Core et al., 2013; Gross et al., 2014; Mancilla-Martinez et al., 2011; Pearson et al., 1993; Peña et al., 2015).
Total scoring represents the second, albeit comparatively rarely used, method of bilingual scoring for DLLs. Similar to conceptual scoring, total scoring allows the child to respond using both of their languages (e.g., English and Spanish). However, total scoring does give DLLs double credit. If DLLs know both cat and gato, they would receive double credit, unlike conceptual scoring, which would give single credit. Thus, total vocabulary scores represent the sum of all the words DLLs know in English and Spanish, without subtracting translation equivalents.
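The arithmetic behind the monolingual and bilingual scoring methods can be illustrated with a minimal sketch. The `ItemResponse` class, the `score_vocabulary` function, and the four-item response set are hypothetical and serve only to show how one set of responses yields four different scores; they do not reproduce any published scoring protocol.

```python
from dataclasses import dataclass


@dataclass
class ItemResponse:
    """Whether a child knows the label for one concept in each language."""
    english: bool
    spanish: bool


def score_vocabulary(responses):
    """Return monolingual and bilingual scores for a list of item responses."""
    english_only = sum(r.english for r in responses)
    spanish_only = sum(r.spanish for r in responses)
    # Translation equivalents: concepts labeled correctly in both languages.
    translation_equivalents = sum(r.english and r.spanish for r in responses)
    # Total scoring: every known word counts, so a cat/gato pair earns two points.
    total = english_only + spanish_only
    # Conceptual scoring: each known concept counts once, so the pair earns one point.
    conceptual = total - translation_equivalents
    return {
        "english_only": english_only,
        "spanish_only": spanish_only,
        "conceptual": conceptual,
        "total": total,
    }


# Hypothetical child: one concept known in both languages, one in English only,
# one in Spanish only, and one in neither language.
responses = [
    ItemResponse(english=True, spanish=True),
    ItemResponse(english=True, spanish=False),
    ItemResponse(english=False, spanish=True),
    ItemResponse(english=False, spanish=False),
]
scores = score_vocabulary(responses)
print(scores)
```

In this hypothetical case, each monolingual score credits two of the four concepts, conceptual scoring credits three, and total scoring yields four points, which illustrates why monolingual scoring captures only part of what a DLL knows.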
As such, total vocabulary has been described as a more comprehensive indicator of what DLLs know compared to conceptual vocabulary (Core et al., 2013; Hoff & Core, 2015; O'Toole et al., 2017). Studies to date find DLLs' total vocabulary to be on par with or to exceed that of their monolingual peers (De Houwer et al., 2014; Hoff et al., 2012; Poulin-Dubois et al., 2013). Further, compared to conceptual vocabulary, total vocabulary taps both word form (i.e., phonological representations) and meaning (i.e., concepts) to a greater extent, which are two of the key components of word knowledge (McGregor et al., 2002; Storkel, 2001). Core et al. (2013) assert that, while conceptual vocabulary puts word form at the periphery (i.e., due to its focus on the conceptual aspect of word knowledge), total vocabulary attends to both dimensions by prompting both English and Spanish for each test item. By testing words known in both languages, total scoring helps measure the full scope of DLLs' semantic and phonological knowledge, and thus more elaborate word knowledge (Pearson, 1998), compared to conceptual scoring.
However, most studies on bilingual scoring of DLLs' vocabulary knowledge are concentrated at the early childhood level. Only recently have studies started to use bilingual scoring with school-age Spanish-English DLLs, and these mostly focus on conceptually scored vocabulary (e.g., Hwang et al., 2020). Few studies have directly compared conceptual and total vocabulary scores among DLLs (e.g., Kern et al., 2019; Legacy et al., 2016; Oh & Mancilla-Martinez, 2021). Studies suggest that DLLs have relatively larger total vocabulary than conceptual vocabulary (Kern et al., 2019; Oh & Mancilla-Martinez, 2021) or report DLLs to have significantly larger total vocabulary, but not conceptual vocabulary, compared to their monolingual peers (Legacy et al., 2016). In particular, research on bilingually scored vocabulary among DLLs from Spanish-speaking homes, the most common language background among school-age DLLs in the U.S. (Gándara & Mordechay, 2017), is limited (for exceptions, see Core et al., 2013; Marchman et al., 2010; Oh & Mancilla-Martinez, 2021), especially at the elementary-school level when high-stakes educational decisions (e.g., identification for language-related difficulties) are often made. Therefore, studies that examine elementary-aged DLLs' bilingually scored vocabulary as contributors to English RC are warranted for identifying equitable assessment practices for the growing population of DLLs across the nation.

Current study
The primary aim of this study is to gain an understanding of the extent to which vocabulary assessments designed for monolinguals (i.e., the typical scenario in most U.S. classrooms), compared to those designed for DLLs, differentially predict the English RC of school-age Spanish-English DLLs in a new immigrant destination state in the American South. Our study aims to address this measurement gap by using bilingual vocabulary assessments designed for and normed on Spanish-English DLLs to proxy conceptual and total vocabulary. To our knowledge, studies to date have not compared the predictive roles of conceptual and total vocabulary, the two most common bilingual vocabulary scoring approaches, for English RC among school-age Spanish-English DLLs.
Our study aims to address open questions about the relative utility of these related but distinct conceptualizations of bilingual vocabulary scoring for predicting English RC of the fastest-growing U.S. student population. The following research question guides this study: How do vocabulary assessments designed for use with monolinguals (i.e., English-only and Spanish-only) compare to those designed for use with DLLs (i.e., conceptual and total) in predicting school-age Spanish-English DLLs' English RC, above and beyond the well-known influence of English word reading?

Participants
The dataset used in the current study comes from a larger three-year longitudinal study on DLLs' conceptual vocabulary knowledge development, for which three cohorts (i.e., kindergarten, first grade, or second grade in the first year of the study) of Spanish-English DLLs were recruited from three elementary schools in the same large urban school district in Tennessee. In the participating schools, approximately half of the student population consisted of English learners (i.e., DLLs actively receiving English language services; M = 51.02%; SD = 4.16%), followed by English-proficient DLLs (i.e., DLLs who formerly or never received English language services; M = 25.85%, SD = 5.74%) and native English speakers (M = 22.73%; SD = 2.02%). Further, on average, students were predominantly Hispanic (M = 65.10%; SD = 2.38%), White (M = 14.18%; SD = 2.80%), and Black (M = 13.83%; SD = 2.75%). Other racial and ethnic backgrounds made up less than 10% of the student population (i.e., Asian, Pacific Islander, Native American). As is often the case in classrooms in new destination states, students in our sample were instructed in an English-only context. The analytical sample was drawn from the last year (i.e., third year) of the project, when the kindergarten cohort was in second grade (n = 34) and the first-grade cohort was in fourth grade (n = 26). Specifically, we used student data from the last wave of the project when total vocabulary was assessed. To note, students who were in the second-grade cohort had transitioned into middle school and were no longer followed in the study.
Students were considered DLLs if their parents indicated that Spanish was spoken at home to some extent. Of students' parents who reported their child's language use at home in the first year of the larger study (n = 54), 33% (n = 18) used both English and Spanish, 5% (n = 3) used mostly English, 18% (n = 11) used mostly Spanish, and 35% (n = 21) used only Spanish. In addition, 63% of the students (n = 38) were formally identified as English learners by their schools, and the remaining 37% (n = 22) were English-proficient during the last wave. Only 8% of the students (n = 5) were identified for special education services. Family demographic information was collected as a part of the larger longitudinal project (n = 54), and the vast majority (90%) of DLLs were U.S.-born, and the rest were born outside the U.S. (6% in Honduras, 2% in Mexico, and 2% in Cuba).

Procedure
Students' English RC scores come from a district-wide assessment administered in May, the end of the academic year. Additionally, students' vocabulary and English word reading skills were individually assessed in May by Spanish-fluent graduate and undergraduate research assistants in a quiet area in each school.

English reading comprehension
English RC was measured with the reading test in the Measures of Academic Progress (MAP) Growth (Northwest Evaluation Association [NWEA], 2019). The MAP reading test is a multiple-choice, computer-adaptive assessment (i.e., the assessment presents students with either harder or easier items based on their responses to the previous test items) that the school district of our participating schools used to monitor students' academic progress. MAP reading scores are in the form of Rasch Unit (RIT) scores, which are equal-interval, vertically scaled scores that allow measurement of student improvement (Thum & Kuhfeld, 2020). The types of passages and content structure of the MAP reading assessment vary slightly by grade level (NWEA, 2019). From kindergarten to second grade, the content areas include the following: foundational skills (e.g., word recognition), language and writing (e.g., spelling, grammar), literature and informational (e.g., key ideas of informational and literary texts), and vocabulary use and functions (e.g., vocabulary use, context clues). From second grade to fifth grade, the content areas include the following: informational text (e.g., inference, point of view), literary text (e.g., figurative language, point of view, text structures), and vocabulary acquisition and use (e.g., context clues, word parts, academic vocabulary). The publisher reports marginal reliability coefficients of .96 for second graders and .94 for fourth graders (NWEA, 2011).

Monolingually scored vocabulary
English-only vocabulary. English-only vocabulary was measured both receptively and expressively. Receptive vocabulary was measured with the Peabody Picture Vocabulary Test-4 (PPVT-4; Dunn & Dunn, 2007), a measure of a child's ability to correctly point to a picture that corresponds to a target word provided by the examiner. The PPVT-4 is normed on an English-proficient population in the U.S. The internal consistency reliabilities are .96-.97 for ages 7 and 8 (average years of age for second graders) and .94-.96 for ages 9 and 10 (average years of age for fourth graders). Expressive vocabulary was measured with the Woodcock-Muñoz Language Survey-III (WMLS-III) Picture Vocabulary subtest (Woodcock et al., 2017), a measure of a child's ability to label the target item prompted by an examiner. The publisher reports Cronbach's alphas of .77 to .79 for ages 7 to 10.
Spanish-only vocabulary. Spanish-only vocabulary was also measured both receptively and expressively. Receptive vocabulary was measured with the Test de Vocabulario en Imágenes Peabody (TVIP; Dunn et al., 1986), the parallel test of the PPVT-4 in Spanish. The TVIP is normed on monolingual Spanish speakers in Latin America. For ages 7-8 and 9-10, the publisher reports split-half reliability coefficients of .94 and .91-.94, respectively. Expressive vocabulary was measured with the WMLS-III Vocabulario sobre Dibujos subtest (Woodcock et al., 2017), the Spanish equivalent to the WMLS-III Picture Vocabulary subtest. The publisher reports Cronbach's alphas of .77 to .79 for ages 7 to 10.

Bilingually scored vocabulary
Conceptual vocabulary. Receptive and expressive conceptually scored vocabulary were measured by following the standardized protocol established by the Receptive One-Word Picture Vocabulary Test-4: Spanish-Bilingual Edition (ROWPVT-4: SBE; Martin, 2013a) and the Expressive One-Word Picture Vocabulary Test-4: Spanish-Bilingual Edition (EOWPVT-4: SBE; Martin, 2013b). Both ROWPVT-4: SBE and EOWPVT-4: SBE administration started with practice items. During the receptive test, an examiner provided the target test item in the student's dominant language (i.e., either English or Spanish), based on parent or teacher report. If the student could not respond or responded incorrectly at the first prompt (e.g., in English), the examiner prompted the student again in the other language (e.g., in Spanish). Similarly, during the expressive test, an examiner presented the child with a picture and asked them to label the test item in either English or Spanish (i.e., "What is this?" or "¿Qué es esto?"). Because these are bilingual assessments of vocabulary knowledge, conceptual scoring accepts a response as correct regardless of the language of the response (i.e., English or Spanish) as long as the child accurately identifies the concept of the target item. For both tests, the publisher reports a median internal consistency reliability coefficient of .95.
Total vocabulary. Receptive and expressive total vocabulary scores were researcher-generated using the two bilingual vocabulary assessments, the ROWPVT-4: SBE and the EOWPVT-4: SBE, due to the lack of a standardized protocol and assessment for total vocabulary scoring. To generate total vocabulary scores, we used a nonstandard administration of the ROWPVT-4: SBE and the EOWPVT-4: SBE immediately after the standardized conceptual vocabulary administration using the two measures. In contrast to standardized conceptual scoring that began at the age-recommended start item, total scoring began at item 1 of the ROWPVT-4: SBE and the EOWPVT-4: SBE.
During total vocabulary assessment, an examiner prompted the child in both languages (i.e., English and Spanish) from item 1 to the test ceiling (i.e., where the conceptual testing ended under the standardized protocol). Specifically, the child was prompted in either English or Spanish for items that were not prompted in the other language during the conceptual vocabulary assessment (i.e., an examiner did not prompt the child in the other language if the correct response was given at the first prompting). In other words, not all tested items were administered twice. Rather, total vocabulary assessment ensured that the test items that were not prompted in both or either of the languages (i.e., English and Spanish) during conceptual scoring were probed. By doing so, we were able to record whether the child knew the labels in both languages (i.e., English and Spanish), in only one (i.e., English or Spanish), or in neither language (i.e., they did not know the label in either English or Spanish) for all tested items. If the child provided a correct answer in both English and Spanish for an item, two points were given for the item.
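The follow-up logic described above can be sketched as a small bookkeeping routine. The function names, the `conceptual_log` structure, and the three example items are hypothetical; the sketch only illustrates how the remaining prompts for each item could be derived from what was already administered during conceptual scoring.

```python
LANGUAGES = ("english", "spanish")


def languages_to_probe(already_prompted):
    """Return the languages not yet prompted for an item."""
    return [lang for lang in LANGUAGES if lang not in already_prompted]


def item_points(correct_languages):
    """Total-scoring points for one item: one per language with a correct label."""
    return sum(lang in correct_languages for lang in LANGUAGES)


# Hypothetical log from the conceptual administration:
# item number -> languages already prompted for that item.
conceptual_log = {
    1: set(),                   # below the start item, so never prompted
    2: {"english"},             # correct at the first (English) prompt
    3: {"spanish", "english"},  # first prompt failed, so both were prompted
}

follow_up = {item: languages_to_probe(done) for item, done in conceptual_log.items()}
print(follow_up)
```

Under these assumptions, item 1 would be probed in both languages, item 2 only in Spanish, and item 3 would need no further prompting, so no item is administered twice in the same language.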
Because receptive and expressive total vocabulary scores were researcher-generated, there were no associated norms. Nonetheless, the sample-derived reliability values (i.e., Cronbach's alphas) for receptive and expressive total vocabulary were .81 and .73, respectively. While the values satisfy the acceptable threshold of Cronbach's alpha, we underscore that total vocabulary scoring is not a standardized procedure, as total vocabulary scores were generated in our exploratory effort to understand bilingually driven scoring methods for DLLs' vocabulary knowledge. Given that existing empirical data on total vocabulary is mostly focused on the early childhood level (e.g., Core et al., 2013; Mancilla-Martinez & Vagh, 2013), our exploration of total vocabulary as a contributor to school-age DLLs' English RC achievement may offer valuable insight into bilingual vocabulary knowledge and its relation to reading achievement of this fast-growing student population.
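For readers less familiar with the statistic, the following minimal sketch shows the standard Cronbach's alpha formula applied to a persons-by-items table. The item scores are hypothetical (0, 1, or 2 points per item, as in total scoring), and the sketch is a generic illustration rather than the procedure used to obtain the values reported above.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a persons-by-items table of item scores."""
    k = len(scores[0])  # number of items
    # Sample variance of each item across persons.
    item_variances = [variance([person[i] for person in scores]) for i in range(k)]
    # Sample variance of each person's summed score.
    total_variance = variance([sum(person) for person in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical item scores for five children on four items.
scores = [
    [2, 1, 2, 1],
    [1, 1, 1, 0],
    [2, 2, 2, 2],
    [0, 0, 1, 0],
    [1, 0, 1, 1],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 2))
```

Alpha approaches 1 when items rank children consistently and falls toward 0 when item scores are unrelated to one another.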

English word reading
We used the Test of Word Reading Efficiency-2 (TOWRE-2; Torgesen et al., 2012) to measure students' English word reading. The TOWRE-2 is comprised of two subtests, Sight Word Efficiency and Phonemic Decoding Efficiency, that tap real-word reading and non-word reading, respectively. In the current study, raw scores from the two subtests were combined to gauge DLLs' overall English word reading skills. Reported reliability coefficients range from .95 to .96 for the average age ranges for second and fourth graders. To note, given the English-only instructional context of our participants and the well-documented robust and positive relation between English and Spanish word reading skills (e.g., Cárdenas-Hagan et al., 2007; Melby-Lervåg & Lervåg, 2011), we did not assess students' Spanish word reading skills.

Analytic plan
To answer our research question on the predictive roles of monolingually and bilingually scored vocabulary for Spanish-English DLLs' English RC, we conducted ordinary least squares (OLS) regression analyses using SAS PROC MIXED with full maximum likelihood estimation to account for missing data. We created four OLS regression models, one for each vocabulary conceptualization (i.e., English-only, Spanish-only, conceptual, and total). In addition to grade level, we controlled for English word reading given the SVR framework (Hoover & Gough, 1990; Joshi & Aaron, 2000).
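One way to write out these models is as a single template that is fitted once per vocabulary conceptualization. The notation is illustrative and is a sketch based on the predictors named in the text (grade level, English word reading, and the receptive and expressive vocabulary scores); it is not the exact SAS specification:

```latex
\[
  \mathrm{RC}_i = \beta_0 + \beta_1\,\mathrm{Grade}_i + \beta_2\,\mathrm{WR}_i
    + \beta_3\,\mathrm{RecVocab}^{(c)}_i + \beta_4\,\mathrm{ExpVocab}^{(c)}_i + \varepsilon_i,
  \qquad c \in \{\text{English-only},\ \text{Spanish-only},\ \text{conceptual},\ \text{total}\}
\]
```

Here, WR denotes the combined English word reading score, and the superscript c indicates that only the receptive and expressive vocabulary predictors change from one model to the next, while the outcome and the two controls stay the same.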

Descriptive analyses
Tables 1 and 2 present the descriptive findings of all student-level variables in our study. We identified two outliers in the English-only expressive vocabulary data  (i.e., Picture Vocabulary subtest of the WMLS-III) and the Spanish-only expressive vocabulary data (i.e., Vocabulario de Dibujos subtest of the WMLS-III), which represented less than 4% of the data for both variables. Outliers for each variable were transformed using winsorization (Ludwig-Mayerhofer, 2020). We then checked for normality of distribution for all variables using a Shapiro-Wilk test, which showed that all variables were normally distributed. Table 1 presents the correlations among English RC, vocabulary (i.e., Englishonly, Spanish-only, conceptual, and total), and English word reading. There were positive and moderate to high correlations between English RC and all other measures. There were also positive within-language vocabulary correlations between the receptive and expressive domains. The bilingually scored vocabulary measures revealed positive and moderate to large correlations with the monolingual vocabulary measures, except for conceptually scored vocabulary and Spanish-only expressive vocabulary. Total vocabulary showed positive and moderate to large correlations with other vocabulary measures. Table 2 presents the means and standard deviations of DLLs' English RC, vocabulary, and English word reading scores for the total sample and by grade level. Raw scores are reported for all assessments, except for English RC, and standardized scores (e.g., standard scores, scaled scores, and Rasch unit scores) are also reported, except for total vocabulary. For each grade, DLLs' English RC scores fell below the national average, compared to the MAP Reading national norms (NWEA, 2020). 
Using Cohen's (1992) conventions (i.e., .2 = small effect, .5 = medium effect, greater than .8 = large effect), we found a moderate difference for second graders (n = 34, Cohen's d = .48) and a small difference for fourth graders (n = 25, Cohen's d = .26) compared to the grade-specific national average. Furthermore, DLLs in both grade levels performed below the standard mean of 100 on the monolingual vocabulary assessments (i.e., English-only and Spanish-only). On average, DLLs performed better receptively than expressively. Monolingual receptive vocabulary scores fell approximately one standard deviation below the national average, whereas monolingual expressive vocabulary scores fell two to three standard deviations below the national norms. Specifically, DLLs performed well below the average on Spanish-only expressive vocabulary (i.e., over three standard deviations) and demonstrated slightly better, yet still low, English-only expressive vocabulary (i.e., within two standard deviations). In stark contrast, for both grade levels, DLLs' performance on the receptive and expressive bilingually scored conceptual vocabulary was within the average to above-average range. Finally, on average, English word reading scores fell within the national average.
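The descriptive steps above (winsorizing outliers, expressing a sample-versus-norm difference as Cohen's d, and reading standard scores in standard deviation units) can be sketched as follows. This is an illustrative sketch under stated assumptions: the source does not specify the winsorization rule or which standard deviation entered the effect size, so the rank-based rule and the use of the norm SD here are assumptions, as is the standard-score SD of 15. All function names are hypothetical.

```python
import statistics


def winsorize(values, k=1):
    """Replace the k lowest and k highest values with their nearest retained neighbors.

    Assumption: a simple rank-based rule; the study's exact rule is not stated.
    Requires 2 * k < len(values).
    """
    ordered = sorted(values)
    low, high = ordered[k], ordered[-k - 1]
    return [min(max(v, low), high) for v in values]


def cohens_d_vs_norm(sample, norm_mean, norm_sd):
    """Standardized difference between a sample mean and a published norm mean.

    Assumption: the norm SD is used as the standardizer.
    """
    return (statistics.fmean(sample) - norm_mean) / norm_sd


def sd_units(standard_score, mean=100.0, sd=15.0):
    """Distance of a standard score from the norm mean, in SD units.

    Assumption: standard scores are scaled to mean 100 and SD 15.
    """
    return (standard_score - mean) / sd


# Toy values for illustration only (NOT the study data).
print(winsorize([1, 2, 3, 4, 100]))               # the extreme value 100 is pulled in
print(cohens_d_vs_norm([90, 100, 110], 105, 10))  # negative: sample mean below the norm
print(sd_units(85))                               # a score of 85 is one SD below 100
```

Under these assumptions, a standard score near 85 corresponds to roughly one standard deviation below the norm mean, and a score near 55 to three standard deviations below it.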

Roles of vocabulary conceptualizations
To address our research question on how the various conceptualizations of DLLs' vocabulary knowledge predict their English RC above and beyond overall English word reading, we developed four OLS regression models, one for each vocabulary conceptualization (i.e., English-only, Spanish-only, conceptual, and total; see Table 3). Again, raw scores were used to allow comparisons with the researcher-generated total vocabulary scores, for which normative scores are not available.
In addition to English word reading, grade level was included as a control variable given the significant (and expected) difference in vocabulary achievement between grade levels (see Table 3). All reported coefficients are standardized estimates. That is, the coefficients for grade reflect its effect size on English RC, accounting for receptive vocabulary, expressive vocabulary, and English word reading scores under each vocabulary conceptualization. By accounting for English word reading and grade level across all four models, we aimed to specifically examine the extent to which receptive and expressive knowledge within each vocabulary conceptualization predicts DLLs' English RC.
As shown in Table 3, expressive, but not receptive, vocabulary consistently predicted English RC across Models 1, 3, and 4. The only exception was Model 2, for Spanish-only vocabulary, in which none of the vocabulary predictors (receptive and expressive) were significant. Given the robust role of expressive vocabulary, we then tested subsequent models utilizing only expressive vocabulary as a predictor of English RC (see Table 4, Models 1 and 2). Models 1 and 2 both included monolingually scored expressive vocabulary (i.e., English-only and Spanish-only) as predictors and controlled for English word reading and grade level, but they differed in their separate attention to either conceptual (Model 1) or total (Model 2) expressive vocabulary. This approach allowed us to contrast the predictive roles of the two bilingually scored approaches. English-only expressive vocabulary positively and significantly predicted English RC, for both the conceptual (β = .18, p = .02) and total vocabulary models (β = .17, p = .03), while Spanish-only expressive vocabulary did not.
Interestingly, only total expressive vocabulary emerged as significant (β = .21, p = .03), although with a small effect size, above the influence of English-only expressive vocabulary, while conceptual expressive vocabulary did not (see Table 4, Model 2). Across all models (in both Tables 3 and 4), the R² values ranged from .73 to .79, which indicates that 73% to 79% of the variance in DLLs' English RC was explained by vocabulary, English word reading, and grade level. Including total expressive vocabulary while controlling for English-only expressive vocabulary did not, however, significantly explain additional variance. Given that total expressive vocabulary and English-only expressive vocabulary were moderately correlated (r = .55) and that total vocabulary does tap English vocabulary knowledge, this was not unexpected. Despite the lack of additional variance explained by total expressive vocabulary (Model 2 in Table 4), the finding that both total expressive vocabulary and English-only expressive vocabulary predicted English RC is noteworthy given our small sample of participants.
Note. N = 60 using multiple imputation. OLS = Ordinary least squares. Standard error is in parentheses. *p < .05. **p < .01. ***p < .001.
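One common way to ask whether an added predictor explains additional variance in a nested OLS comparison is the F test for the change in R². The source does not state which test was used, so the formula below is offered only as a standard reference point, with q the number of added predictors, k the number of predictors in the full model, and n the number of cases:

```latex
F = \frac{\left(R^2_{\text{full}} - R^2_{\text{reduced}}\right) / q}
         {\left(1 - R^2_{\text{full}}\right) / (n - k - 1)},
\qquad F \sim F(q,\; n - k - 1) \text{ under } H_0 .
```

For a model with five predictors (English-only, Spanish-only, and total expressive vocabulary, English word reading, and grade) and n = 60, adding one predictor gives q = 1 and n - k - 1 = 54, assuming all 60 cases contribute. Because every model already has an R² between .73 and .79, there is little unexplained variance left for any single added predictor to capture.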

Discussion
The large and growing representation of Spanish-English DLLs in schools across the nation underscores the need for accurate understanding of DLLs' language and reading skills. We compared various conceptualizations of school-age Spanish-English DLLs' vocabulary knowledge, attending to current understandings of the bilingual mental lexicon based on Dong and colleagues' (2005) shared (distributed) asymmetrical model, and explored the utility of bilingually scored vocabulary methods as predictors of English RC within the SVR framework (Gough & Tunmer, 1986). Two key findings emerged. Aligned with growing research evidence (e.g., Goodrich & Namkung, 2019; Hwang et al., 2020), the expressive vocabulary domain emerged as a significant predictor of DLLs' English RC. More uniquely, bilingually scored vocabulary indeed contributed to English RC. We discuss our findings from theoretical and practical perspectives below.
Note. N = 60 using multiple imputation. OLS = Ordinary least squares. Standard error is in parentheses. *p < .05. **p < .001.

Bilingual vocabulary assessment approaches have theoretical and practical value
Our results show that bilingual approaches to vocabulary assessment provide a more comprehensive understanding of DLLs' language skills, compared to monolingual approaches that remain ubiquitous in U.S. schools. Thus, our findings support the tenets of the shared (distributed) asymmetrical model (Dong et al., 2005), which suggests both shared storage for conceptual representations and asymmetrical links.
Our results build on previous work that reveals school-age Spanish-English DLLs' varied response patterns to vocabulary items (e.g., some vocabulary knowledge is shared between Spanish and English while some vocabulary knowledge is unique to each language) depending on whether monolingually or bilingually scored approaches are used (Oh & Mancilla-Martinez, 2021). In fact, to a similar extent as English-only vocabulary scores, bilingually scored vocabulary scores were positive and significant predictors of English RC. This finding is noteworthy and supports the theoretically based rationale for utilizing bilingually scored vocabulary assessment approaches with DLLs. Results based on bilingual scoring of vocabulary knowledge reveal that the linguistic knowledge of DLLs in our study is on par with, and even above, national norms (see Table 2). These results noticeably differ from findings based on monolingual vocabulary measures (i.e., English-only and Spanish-only). The distinct vocabulary profiles from comparing DLLs' monolingually scored versus bilingually scored vocabulary underscore the promise of the latter to describe DLLs' vocabulary skills in a more comprehensive and accurate way. This information, in and of itself, can pave the way for instructional efforts aimed at preventing the watering-down of curriculum, which risks overly simplifying classroom content for DLLs (Kibler et al., 2015). In other words, DLLs' academic capabilities might be underestimated if educators only have access to results from monolingual vocabulary measures. This may lead to reliance on simplified texts with high-frequency vocabulary words and an instructional focus on basic English skills regardless of DLLs' comprehensive vocabulary knowledge (as indicated by conceptual or total vocabulary results) and learning potential.
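The contrast between monolingual and bilingual scoring discussed above can be made concrete with a small sketch. It assumes the usual definitions: a conceptual score credits an item once if the child knows it in either language, while a total score adds up the words known in each language. For simplicity it scores one shared item set under all four methods, whereas the study used separate monolingual and bilingually normed instruments; administration details such as basal and ceiling rules are not modeled, and the function name and data are hypothetical.

```python
def score_vocabulary(responses):
    """Score item responses under four vocabulary conceptualizations.

    responses: list of (known_in_english, known_in_spanish) booleans, one per item.
    """
    english_only = sum(1 for en, es in responses if en)
    spanish_only = sum(1 for en, es in responses if es)
    # Conceptual: one point per concept known in at least one language.
    conceptual = sum(1 for en, es in responses if en or es)
    # Total: one point per word form known, so shared concepts count twice.
    total = english_only + spanish_only
    return {
        "english_only": english_only,
        "spanish_only": spanish_only,
        "conceptual": conceptual,
        "total": total,
    }


# Toy child (NOT study data): one shared word, one English-unique word,
# one Spanish-unique word, and one unknown item.
child = [(True, True), (True, False), (False, True), (False, False)]
print(score_vocabulary(child))
```

In this toy case each monolingual score credits two of four items, while the conceptual score credits three, which is the pattern behind the gap between monolingually and bilingually scored results.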
When working with DLLs, educators must be cognizant of the fact that reliance on monolingual measures will likely reveal only partial information about DLLs' linguistic abilities. However, this does not imply that the instructional language must be changed; such a recommendation is beyond the scope of this study. The predominant profile of the U.S. teacher workforce remains English monolingual (Williams et al., 2016), but our results do suggest the value of accounting for DLLs' linguistic resources in both their home language and English, if feasible. At minimum, our results point to the importance of acknowledging the limitations of relying on monolingual measures. Teachers, including English monolingual teachers, can nonetheless facilitate Spanish-English DLLs' word knowledge by providing English labels for known concepts in the home language, ensuring adequate instructional scaffolding (e.g., modeling language), or engaging students in metalinguistic activities that highlight similarities and/or differences between English and Spanish.
We also note that, although DLLs in this study were developing both English and Spanish, their performance on the Spanish-only vocabulary assessments suggested that their Spanish skills were not at age-based norms. This finding may be partially attributable to their English-only instructional environment, a context that may have limited the extent to which their Spanish skills could influence their English RC outcomes (Proctor et al., 2005). Indeed, DLLs' Spanish-only receptive and expressive vocabulary skills did not predict English RC. This finding not only converges with previous evidence on the inconclusive influence of Spanish-only vocabulary on Spanish-English DLLs' English outcomes (e.g., Mancilla-Martinez & Lesaux, 2017; Proctor et al., 2012) but also underscores the value of bilingually scored assessments. That is, reliance on Spanish-only assessments might have led to the erroneous conclusion that it is unnecessary to account for Spanish-English DLLs' home language. As DLLs' performance on the bilingually driven vocabulary assessments revealed, this was not the case. An important consideration is that Spanish-only vocabulary assessments are designed for and normed on Spanish monolinguals, while Spanish-English DLLs in the U.S. are not Spanish monolinguals. The bilingually scored measures we used, in contrast, were designed for and normed on Spanish-English bilinguals in the U.S.

Expressive vocabulary as a predictor of English reading comprehension
Consistent with the SVR framework (Gough & Tunmer, 1986), both vocabulary and word reading predicted school-age Spanish-English DLLs' English RC. While DLLs' vocabulary knowledge was the focus of our study, as expected, English word reading emerged as a robust predictor of English RC (Mancilla-Martinez & Lesaux, 2017; Vellutino et al., 2007). For vocabulary knowledge, we tested whether one or both vocabulary domains (i.e., receptive, expressive) predict DLLs' English RC. In contrast to previous studies that have found both domains to be predictive (e.g., Catts et al., 2014; Foorman et al., 2018; Ouellette, 2006), we found a differential influence that has implications for informing future studies that seek parsimonious models of English RC for DLLs. Our sample of DLLs performed better receptively than expressively on all vocabulary measures, a finding that is consistent with a recent longitudinal study. However, only expressive vocabulary (except for Spanish-only vocabulary) emerged as a significant predictor of DLLs' English RC, similar to recent research on the predictive role of expressive vocabulary in Spanish-English DLLs' language and literacy outcomes (Ribot et al., 2018). We speculate that when receptive vocabulary knowledge is in competition with expressive vocabulary knowledge in predicting English RC, the former may become a weaker predictor. Expressive vocabulary knowledge requires students to retrieve semantic knowledge (i.e., by accessing the meaning of the target word) and phonological knowledge (i.e., by pronouncing the retrieved target word), which are two elements that have long been identified as key components of word learning and significant predictors of language and literacy outcomes (Carlo et al., 2004; Wise et al., 2007). Existing evidence likewise points to a differential influence of receptive and expressive vocabulary on reading skills (Wise et al., 2007).
Thus, attending to DLLs' expressive vocabulary may have practical value given the time constraints of testing in real school settings and its strong relation to English RC. The prominent role of expressive vocabulary knowledge in DLLs' English RC, despite relatively lower scores in expressive than in receptive vocabulary, underscores the importance of giving DLLs ample explicit opportunities to use language in the classroom.
Finally, our findings based on use of bilingually scored expressive vocabulary knowledge contribute to and expand how the field might conceptualize the language comprehension component of the SVR framework (Hoover & Gough, 1990) for DLL students. The seminal study on the SVR by Hoover and Gough was effectively conducted with a sample of elementary-age Spanish-English DLLs in Texas, which is a lesser-known fact about the beginnings of the SVR framework. Our findings not only support the validity of the SVR as a model for understanding RC contributors among DLLs but also specify how the language comprehension component within the SVR framework could be assessed and conceptualized for this growing student population.
Given previous findings on the predictive role of conceptual vocabulary on Spanish-English DLLs' English RC (e.g., Hwang et al., 2020), it was somewhat unexpected to find that expressive conceptual vocabulary was not a significant predictor of English RC. However, expressive total vocabulary was a significant predictor, even though both bilingually scored approaches accounted for Spanish and English. To understand these intriguing results, we turn to empirical work on word learning and development and offer three potential explanations. First, while both expressive conceptual and total vocabulary reflect the concept (i.e., semantic knowledge) and form (i.e., phonological knowledge) of a given word, expressive total vocabulary does so to a greater extent, by prompting students in both languages for every item of the bilingually normed assessments. Because expressive total vocabulary taps the ability to access both semantic knowledge and phonological representations to a greater extent than expressive conceptual vocabulary does, it may be more strongly related to English RC achievement. Second, the predictive role of expressive total vocabulary highlights the link between word retrieval (i.e., production) and the quality of lexical representations. Existing research suggests that efficient word retrieval (i.e., skilled ability to produce words in response during an assessment) might be associated with a higher quality of linguistic representations that learners have (Newman & German, 2002; Ouellette, 2006). Likewise, Oller et al. (2007) describe receptive vocabulary as a passive skill compared to expressive vocabulary, given that expressive vocabulary knowledge places an increased demand to not only comprehend a word but to comprehend and retrieve it during the assessment. Theoretical work similarly links language production (i.e., expressive domain) to language knowledge construction and consequently RC (Swain, 1985).
This relation between word production and overall language knowledge leads to the last explanation for our findings. Given that vocabulary development involves conceptual development that builds and expands background knowledge (Mancilla-Martinez & McClain, 2018), RC research treats vocabulary knowledge as an adequate proxy for readers' background knowledge (Perfetti, 1998), another well-documented predictor of RC (Kintsch & van Dijk, 1978). Therefore, it is possible that expressive total vocabulary elicits even more background knowledge (i.e., by prompting in both languages for every item) compared to expressive conceptual vocabulary, thus providing a more comprehensive reflection of DLLs' language knowledge.

Limitations and future directions
There are limitations of this study worth noting. First, given the exploratory nature of the study, our sample size was small (N = 60) and bilingual vocabulary scoring was limited to a sample of Spanish-English DLLs using the ROWPVT-4: SBE and the EOWPVT-4: SBE. Due to the small sample size, we could not test for interactions between English word reading and vocabulary knowledge, as hypothesized under the SVR (Gough & Tunmer, 1986). Although we did control for grade level to account for potential developmental differences, a larger sample would help detect more detailed effects given the developmental realities of the student population. Hence, our findings on the predictive role of expressive vocabulary on English RC should be interpreted with recognition of these limitations. Our participants were also heterogeneous in that they were from different grade levels and had varying levels of English proficiency (e.g., some were identified as English learners, some were reclassified as being English-proficient), and a small percentage of them were receiving special education services. Hence, our findings and practical implications should be interpreted with caution, and this study would need to be replicated with larger samples of DLLs.
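For reference, the interaction mentioned above follows from the multiplicative form of the SVR, in which reading comprehension (RC) is the product of decoding (D) and language comprehension (LC). The second and third lines below are only a sketch of how an additive regression of the kind reported here would differ from one with an interaction term; the symbols WR (English word reading), V (a vocabulary score), and the coefficient names are notation introduced here for illustration and are not taken from the source.

```latex
\begin{align*}
RC &= D \times LC
  && \text{(SVR; Gough \& Tunmer, 1986)} \\
RC_i &= \beta_0 + \beta_1\, WR_i + \beta_2\, V_i + \beta_3\, \mathit{Grade}_i + \varepsilon_i
  && \text{(additive model)} \\
RC_i &= \beta_0 + \beta_1\, WR_i + \beta_2\, V_i + \beta_3\, \mathit{Grade}_i
      + \beta_4\, (WR_i \times V_i) + \varepsilon_i
  && \text{(with interaction)}
\end{align*}
```

Estimating the additional coefficient for the product term reliably generally requires more cases than estimating main effects alone, which is why a sample of 60 limits this test.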
Despite limitations regarding our sample, we were able to examine the differential roles of various vocabulary conceptualizations on Spanish-English DLLs' English RC. Our findings suggest that the use of bilingual scoring can be beneficial for understanding DLLs' linguistic knowledge. Second, our study participants were all recruited from public schools with English-only instruction. This likely influenced our results, particularly on students' Spanish-only vocabulary scores and their relation to English RC. Future studies that include a more diverse sample of Spanish-English DLLs in different instructional contexts (e.g., bilingual programs, dual language immersion programs) would be needed to better understand the predictors of English RC for this student population. Third, we employed total scoring using standardized conceptual vocabulary measures because, to our knowledge, there is no standardized vocabulary measure that can be used to assess Spanish-English DLLs' total vocabulary knowledge. Therefore, standard or scaled scores that would allow us to compare our participants' scores to the national norm were not available. Designing and developing a systematic assessment to measure DLLs' bilingual vocabulary knowledge comprehensively and equitably is warranted. This would allow researchers and educators to examine DLLs' total vocabulary knowledge and track lexical growth of DLLs in a way that is less limited by relying on monolingual or conceptual vocabulary assessments.
Despite these limitations, our study contributes to the limited body of research on Spanish-English DLLs in new immigration destination states. Given that school is one of the primary sources of English linguistic input for DLLs (Bowers & Vasilyeva, 2011), our findings underscore the importance of linguistically rich instruction (e.g., opportunities to use language). Finally, results also demonstrate the benefits of bilingual scoring. It helps capture DLLs' comprehensive linguistic knowledge that could be used in classroom instruction and to build asset-driven perspectives towards helping DLLs grow into successful and independent readers.
Conflict of interest. The authors declare none.