Geographic variability in limited English proficiency: A cross-cultural study of cognitive profiles

Abstract Objective: This study was designed to evaluate the effect of limited English proficiency (LEP) on neurocognitive profiles. Method: Romanian (LEP-RO; n = 59) and Arabic (LEP-AR; n = 30) native speakers were compared to Canadian native speakers of English (NSE; n = 24) on a strategically selected battery of neuropsychological tests. Results: As predicted, participants with LEP demonstrated significantly lower performance on tests with high verbal mediation relative to US norms and the NSE sample (large effects). In contrast, several tests with low verbal mediation were robust to LEP. However, clinically relevant deviations from this general pattern were observed. The level of English proficiency varied significantly within the LEP-RO and was associated with a predictable performance pattern on tests with high verbal mediation. Conclusions: The heterogeneity in cognitive profiles among individuals with LEP challenges the notion that LEP status is a unitary construct. The level of verbal mediation is an imperfect predictor of the performance of LEP examinees during neuropsychological testing. Several commonly used measures were identified that are robust to the deleterious effects of LEP. Administering tests in the examinee’s native language may not be the optimal solution to contain the confounding effect of LEP in cognitive evaluations.


Introduction
Most neurocognitive tests have been developed in North America and normed on native English speakers (NSEs). Normative systems typically focus on age, education, gender, or race (Abeare et al., 2019; Heaton et al., 2004, 2009) and tend to ignore variability in language proficiency (Gasquoine et al., 2007; Gasquoine, 1999). Limited English proficiency (LEP) refers to a continuum of deficits in phonology (systematic phoneme substitutions characteristic of foreign accents), lexicon (limited vocabulary and speed of retrieval), and syntax (deviation from grammatical rules) attributable to late-life language acquisition (i.e., outside the sensitive period) in the context of normal verbal skills in the individual's mother tongue. In other words, LEP is a learned deficit reflecting a delay in exposure to English.
Recent research demonstrated that LEP can be a significant confound in test result interpretation even in cognitively high-functioning examinees (Ali, Brantuo, et al., 2022; Erdodi et al., 2017a). Consequently, existing norms may not apply to individuals with LEP (Celik et al., 2020; Funes et al., 2016; Gasquoine & Gonzales, 2012), as they systematically underestimate verbal skills in general, while perhaps providing an accurate measure of English proficiency. As the world grows more diverse due to migration and the percentage of bilinguals increases both in Europe and the USA (Eurostat, 2018; Ryan, 2013), so do the chances of encountering patients with LEP in clinical settings.
Therefore, understanding the impact of LEP on cognitive testing is of immediate practical interest.
Recent reviews (Antoniou, 2019; Celik et al., 2020) have outlined bilinguals' advantages and disadvantages in different cognitive tasks. The tasks' level of verbal mediation further complicates the interpretation of cognitive profiles associated with LEP. Verbal mediation has recently been referenced in LEP research to classify neuropsychological instruments based on the extent to which intact language skills and/or native-level proficiency in the language of administration is required for the test to provide a valid measure of its target construct (Brantuo et al., 2022). Throughout this paper, we refer to "verbal" tests as having high verbal mediation, indicating that verbal skills are central to optimal performance. In contrast, we refer to "non-verbal" tests (i.e., tasks designed to measure visual-perceptual skills; Gasquoine et al., 2007) as having low verbal mediation.
Studies comparing NSE and LEP groups on tests administered in English have yielded contradictory results (Boone et al., 2007; Gasquoine et al., 2007; Kisser et al., 2012). On the one hand, there are reports of NSE performing better on verbal but not nonverbal tasks, such as tests of visuospatial abilities (Boone et al., 2007; Kisser et al., 2012). On the other hand, significant language administration effects in Spanish-English bilingual groups were documented for some (e.g., letter fluency, Stroop Color and Word trials) but not other verbal tasks (i.e., verbal learning, Digit Span; Gasquoine et al., 2007).
Theoretically, nonverbal tests should be immune to LEP. Indeed, NSE norms for certain visuospatial measures can be applied to Spanish-speaking LEP samples without increasing false-positive rates (Gasquoine & Gonzales, 2012; Gasquoine et al., 2007). Similarly, Walker et al. (2010) found no differences between NSE and LEP participants with different English proficiency levels on several tests (e.g., Digit Symbol, Matrix Reasoning). However, Funes et al. (2016) demonstrated that administering tests in English to Spanish-speaking participants may overestimate deficits even on non-verbal tasks (e.g., Digit-Symbol Coding, Block Design).
Such divergent findings raise several questions about the effect of English proficiency on neuropsychological testing: Which cognitive tasks are most affected by LEP? Is the neurocognitive profile associated with LEP more complex than a predictable pattern of deficits based on the level of verbal mediation? Are there meaningful subtypes within LEP? This study was designed to provide tentative answers to these questions. Since most prior research on LEP has been based on US Spanish-English bilinguals, we recruited two geographically and linguistically diverse LEP samples to test the limits of generalizability.
These two bilingual samples (Arabic-dominant students from Canada and Romanian-dominant students from Romania) were recruited to examine the geographic, cultural, and linguistic variability in cognitive profiles associated with LEP. Their main commonality is their non-NSE status. In contrast, the differences between them are significant and multifactorial: native language (Romanian versus Arabic), writing system (26-letter Latin alphabet versus the abjad), direction of writing/reading (left-to-right versus right-to-left), educational system, broader cultural context (Central Europe versus the Middle East), cultural identity (Romanian versus Arabic Canadians), and the relative homogeneity within the groups and immigration status (all Romanian participants were born and raised in Romania and recruited from a single university, whereas the Arabic participants immigrated from various countries). Any of these factors could potentially influence performance on neuropsychological testing. Therefore, comparing the Romanian and Arabic samples provided a robust method for examining whether LEP should be considered a unitary or a heterogeneous construct.
We made the following predictions: (1) All participants with LEP would perform worse than NSEs and below the US normative mean on verbal tests; (2) There would be no difference between NSE and LEP on nonverbal tests; (3) Within participants with LEP, performance on verbal tests would differ as a function of the relative level of English proficiency.

Participants
Data were collected from 113 cognitively healthy university students (98 women; M Age = 22.7; SD = 5.6; M Education = 14.2; SD = 2.0). Participants were recruited from two countries (the Western region of Romania and South-Central Canada) and divided into three samples: Romanian-English bilinguals with LEP (n = 59; LEP-RO), Arabic-English bilinguals with LEP (n = 30; LEP-AR) from Canada, and Canadian NSEs (n = 24). The LEP-RO group was established by default: all participants grew up in a non-English-speaking country and learned English later in life. LEP-AR was psychometrically operationalized: a BNT-15 score of ≤11 was required, a level of performance highly specific to LEP status (Ali, Elliott, et al., 2022; Brantuo et al., 2022). The NSE sample included participants born and raised in an English-speaking part of Canada.
To control for noncredible responding as a confound (Abeare et al., 2021), only participants who passed the first trial of the Test of Memory Malingering (i.e., scored >43 on the TOMM-1; Crișan & Erdodi, 2022; Erdodi, 2022; Jones, 2013; Kulas et al., 2014; Rai & Erdodi, 2021) were included in the study. Six participants from LEP-RO and four from LEP-AR were excluded based on their TOMM-1 scores. All NSEs scored above the cutoff and were retained in the study. No participant reported any neurological or neuropsychological condition associated with cognitive impairment. The three samples were similar in age and gender. LEP-RO participants had higher levels of education than NSEs (Table 1).
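The inclusion rules described above can be summarized in a short sketch. The cutoffs (TOMM-1 > 43; BNT-15 ≤ 11 for LEP-AR) come from the text; the `Participant` record, its field names, and the `is_included` helper are hypothetical and introduced only for illustration. This is not the authors' screening code.

```python
from dataclasses import dataclass

TOMM1_PASS_ABOVE = 43  # participants were retained only if TOMM-1 > 43
BNT15_LEP_MAX = 11     # LEP-AR membership required BNT-15 <= 11


@dataclass
class Participant:
    """Hypothetical record; field names are illustrative assumptions."""
    group: str   # "LEP-RO", "LEP-AR", or "NSE"
    tomm1: int   # TOMM Trial 1 raw score
    bnt15: int   # Boston Naming Test, 15-item short form, raw score


def is_included(p: Participant) -> bool:
    """Apply the credibility screen, then the group-specific LEP criterion."""
    if p.tomm1 <= TOMM1_PASS_ABOVE:
        return False  # failed the TOMM-1 validity cutoff
    if p.group == "LEP-AR":
        return p.bnt15 <= BNT15_LEP_MAX  # psychometric definition of LEP
    return True  # LEP-RO (by default) and NSE have no BNT-15 criterion
```

Note the asymmetry the sketch makes explicit: only the LEP-AR group is filtered on BNT-15, which is the method variance the later re-analysis (Table 6) is meant to control.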
The EWFT instructs examinees to generate as many emotion words as possible within 1 minute. The initial validation study placed the normative output (raw score) in Canadian university students between 10.6 (SD = 3.3) and 11.4 words (SD = 3.3; Abeare et al., 2017). Subsequent research reported slightly higher but more variable performance in cognitively healthy students (M = 13.3, SD = 3.3) and slightly lower scores in clinical patients (M = 9.9, SD = 4.4; Abeare, An, et al., 2022).
Age-corrected scaled scores (ACSSs) for the D-KEFS, HVLT-R, Digit Span, and CD were derived from norms published in the Technical Manuals. Demographically adjusted T-scores for TMT and animal fluency were determined using norms published by Heaton et al. (2004). Although norms developed on and for NSEs in the USA cannot be assumed to be the appropriate reference group for examinees with LEP in the USA, Canada, or other countries, these are the normative data most likely available to clinicians when assessing LEP examinees. Therefore, an empirical evaluation of the extent to which widely used norms may or may not be appropriate for such individuals is directly relevant to North American neuropsychologists.
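Because results are reported on two standard-score metrics, it can help to express both on a common z-score scale. The following is a minimal sketch using the metric definitions given in the note to Table 2 (ACSS: M = 10, SD = 3; T-score: M = 50, SD = 10). The function and dictionary names are illustrative assumptions.

```python
# Metric parameters as defined in the note to Table 2.
SCALES = {
    "ACSS": (10.0, 3.0),  # age-corrected scaled score: M = 10, SD = 3
    "T": (50.0, 10.0),    # T-score: M = 50, SD = 10
}


def to_z(score: float, scale: str) -> float:
    """Convert a standard score to a z-score (SD units from the normative mean)."""
    mean, sd = SCALES[scale]
    return (score - mean) / sd


def from_z(z: float, scale: str) -> float:
    """Convert a z-score back to the requested standard-score metric."""
    mean, sd = SCALES[scale]
    return mean + z * sd
```

For example, `to_z(35.0, "T")` gives -1.5, i.e., a T-score of 35 sits 1.5 SDs below the normative mean, and `to_z(7.0, "ACSS")` gives -1.0.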

Procedure
Participants were recruited as volunteers in a study on cognitive performance and received extra credit for their time. Tests were administered face-to-face individually in quiet rooms by bilingual research assistants with a Bachelor's degree in psychology, relevant coursework in psychometrics, and specialized training and ongoing supervision from the first and last authors in administering and scoring the employed battery. Research assistants in Romania and Canada followed the same standardized procedure developed by test publishers during administration and scoring. All tests were administered in English, following standard protocols. In addition, animal fluency and EWFT were administered in both languages only in the LEP samples to directly evaluate the effect of language of administration (native versus English). All data collection, storage, and processing were done with the approval of relevant institutional authorities regulating research involving human participants, in compliance with the 1964 Helsinki Declaration and its subsequent amendments or comparable ethical standards.

Data analysis
Descriptive statistics (percentage, M, SD) for each group were reported as relevant. The main inferential statistics evaluating the significance of between-group differences were one-way ANOVAs, chi-square tests, and independent (Welch's) and within-sample t-tests (all contrasts were two-tailed). Post hoc contrasts were performed using the Games-Howell test to control the familywise error rate and protect against alpha inflation. Effect size estimates were expressed in Hedges' g (with corresponding 95% CIs) and partial eta squared (ηp²).
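As an illustration of two of the statistics named above, the sketch below computes Welch's t (with the Welch-Satterthwaite degrees of freedom) and Hedges' g from raw scores. It uses the standard textbook formulas and the common approximate small-sample correction for g; the paper does not state which software or correction variant was used, so this is an assumption, and the function names are illustrative. Confidence intervals and p-values are omitted.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's unequal-variance t statistic and Welch-Satterthwaite df."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb  # squared standard errors
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


def hedges_g(a, b):
    """Cohen's d with pooled SD, scaled by the approximate small-sample
    correction J = 1 - 3 / (4 * (na + nb) - 9)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    d = (mean(a) - mean(b)) / math.sqrt(pooled_var)
    return d * (1 - 3 / (4 * (na + nb) - 9))
```

With unequal group sizes such as those in this study (n = 59, 30, and 24), the Welch correction matters because it does not assume equal variances across samples.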

Results
A large main effect on Digit Span ACSS and a medium effect on longest Digit Span backward were driven by the below-average score of LEP-RO. No difference was noted on longest Digit Span forward (Table 2). There was a small-medium main effect on CD caused by the above-average performance of NSE participants. An extremely large effect emerged on TMT-A, driven by the unusually low score of the LEP-RO sample. The performance gap between groups narrowed on the TMT-B but remained significant. A large effect emerged on the TMT B/A raw score ratio, driven by low scores of the LEP-RO sample (indicating better cognitive flexibility relative to visuomotor sequencing speed). A very large main effect was observed on the Color Naming subtest of the D-KEFS, reflecting a linear increase in performance from LEP-RO through LEP-AR to NSE. The contrasts on the Word Reading and Stroop subtests of the D-KEFS were not significant. Figure 1 displays the between-group trends on the three trials of the D-KEFS.
A very large main effect re-emerged on animal fluency in English, driven by the normative performance of the NSE sample relative to the mean score in the impaired and borderline range, respectively, of the LEP samples. The contrast between the two LEP samples on EWFT approached significance (medium effect). When we compared performances of the two LEP samples on animal fluency and EWFT administered in their native language (Romanian and Arabic), extremely large effects emerged for both measures.
Finally, within-sample t-tests revealed a significantly higher performance in raw scores on animal fluency [t(58) = 12.9,
Significant main effects emerged on all three individual acquisition trials of the HVLT-R, although the magnitude of the difference declined gradually with each subsequent trial (from large to medium effects). However, a large effect re-emerged on the sum of Trials 1-3 (Table 3). There was a very large effect on delayed free recall. Although the ANOVA remained significant on recognition performance, the effect size was notably smaller (medium) on raw scores. Once age correction was applied (T-scores), between-group differences disappeared. All contrasts above were driven by the notably lower performance of the LEP-AR sample. Although the main effect on the FCR trial was significant, this likely reflects the mathematical artifact of very low SDs, as all three samples performed near the ceiling (i.e., a score of 12.0). Figure 2 provides a visual summary of the between-group patterns of auditory verbal learning performance.
Given the prominence of North American normative systems, one-sample t-tests were computed for each of the samples against US norms (Table 4). The LEP-RO performed significantly below the normative mean on Digit Span (large effect), TMT A & B (very large and large effects), animals (very large effect), HVLT-R (medium effects), D-KEFS Color Naming (large effect) and Word Reading (small-medium effect), showing no difference on CD and Stroop. The LEP-AR performed significantly below the normative mean on Digit Span (large effect), TMT A & B (large effects), animals (very large effect), HVLT-R (small to very large effects), and the Color Naming (medium effect) subtest, with no difference on Digit Span, CD, and Word Reading or Stroop subtests. The NSE sample performed above the normative mean on CD and Stroop (medium effects) and below the normative mean on the acquisition trials of the HVLT-R (medium effect).
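The comparison of each sample against the normative mean can be sketched as a one-sample t-test, where the reference value is the mean of the score metric (e.g., 50 for T-scores, 10 for ACSSs). The effect size below standardizes the mean difference by the sample SD; this is one common convention and is an assumption here, as the paper does not specify how one-sample effect sizes were computed. The function name is illustrative.

```python
import math
from statistics import mean, stdev


def one_sample_t(scores, norm_mean):
    """Return (t, df, d) for a one-sample t-test against a normative mean.

    d is the mean difference divided by the sample SD (an assumed convention).
    """
    n = len(scores)
    m, sd = mean(scores), stdev(scores)
    t = (m - norm_mean) / (sd / math.sqrt(n))
    d = (m - norm_mean) / sd
    return t, n - 1, d
```

For instance, `one_sample_t(t_scores, 50.0)` on a list of T-scores yields a negative t and d whenever the sample mean falls below the normative mean.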
Since a BNT-15 score ≤ 11 has been proposed as a psychometric marker of LEP (Ali, Elliott, et al., 2022; Brantuo et al., 2022; Erdodi et al., 2017a), whereas a score of 12 has been identified as the low end of intact performance among NSEs (Abeare et al., 2022), two subgroups were created first within the LEP-RO sample along this cutoff. Participants with BNT-15 ≤ 11 scored significantly lower than those with BNT-15 ≥ 12 on animal fluency in both languages (despite smaller effects during the Romanian administration) and the English administration of the EWFT (large effect). Similarly, large effects emerged on the time-to-completion of both the Yes/No and the FCR recognition trials of the HVLT-R (Table 5).
To control for the method variance in selecting participants for the LEP-RO (by default) and the LEP-AR (BNT-15 ≤ 11) samples, the main contrasts were re-computed after Romanian participants with BNT-15 scores >11 were excluded. This change in the composition of the LEP-RO sample ensured that the two groups had comparable levels of English proficiency. The overall pattern of positive and negative findings captured in Tables 2 and 3 was preserved after equalizing the groups (Table 6).
Finally, to investigate whether there is an incremental loss in performance on cognitive tests as a function of decreasing English proficiency, test scores were compared across five BNT-15 score levels (11, 10, 9, 8, and ≤7) using a series of one-way ANOVAs (Table 7). Only two significant main effects emerged: on CD (ηp² = .205, large) and animal fluency T-scores (ηp² = .149, large). Examining the pattern of CD scores revealed that the finding was driven by the combination of an isolated high average range mean associated with BNT-15 = 11 (12.1) compared to a narrow (average) range performance (M = 9.5-9.8) at the other four levels of BNT-15 and low variability (SD = 1.9-2.5). However, a linear decline in animal fluency T-scores was observed, from M = 35.6 at BNT-15 = 11 to M = 24.8 at BNT-15 ≤ 7.
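The effect sizes reported for these ANOVAs can be reproduced from raw scores with a few lines of arithmetic. The sketch below computes the omnibus F and eta squared for a one-way design; in a one-way ANOVA with a single factor, partial eta squared equals eta squared (SS between divided by SS total). The function name and input layout (one list of scores per BNT-15 level) are illustrative assumptions.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within, eta_squared) for a one-way ANOVA.

    groups: a list of lists, one list of scores per level of the factor.
    """
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    eta_sq = ss_between / (ss_between + ss_within)
    return f, df_between, df_within, eta_sq
```

An eta squared of .205, as reported for CD, would mean that roughly a fifth of the score variance is associated with BNT-15 level.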

When developmental history and cognitive profile collide: a case study
Although learning a language outside the sensitive period (age > 15) is commonly considered a developmental marker of LEP (Johnson & Newport, 1989; Lenneberg, 1967; Sakai, 2005), individual variability in language acquisition results in notable exceptions from this principle. To illustrate this, we present psychometric data from a 47-year-old right-handed female patient with 16 years of education referred to the senior author's private practice for assessment following an uncomplicated mild traumatic brain injury. She grew up speaking Russian, immigrated to Canada at age 18, and obtained a bachelor's degree. By history, she would be classified as LEP. However, she had no obvious accent when speaking English and obtained the following scores on verbal neuropsychological tests: BNT-15 = 14 (the mean of the NSE sample in the present study was 14.1 and 13.5 in the most recently published norms; Abeare et al., 2022); Complex Ideational Material = 12 (perfect score); letter and animal fluency T = 61; California Verbal Learning Test acquisition trials raw score = 66/80 (T = 69), long-delay free recall raw score = 4/16 (z-score = 1.0); Similarities ACSS = 16, Vocabulary ACSS = 19 (Verbal Comprehension Index = 150). Based on her cognitive profile, her neuropsychological functioning better matches an NSE's.

Discussion
This study was designed to investigate geographic differences in cognitive profiles associated with LEP and compare them to norms developed on and for NSEs. To this end, two different LEP samples were recruited (Romanian and Arabic Canadian students), and their cognitive profiles were compared to NSE norms and a student sample of Canadian NSEs. We predicted that LEP participants would perform worse than NSEs and below the normative mean on verbal tests; that there would be no difference between NSE and LEP on nonverbal tests; and that performance on verbal tests would differ based on English proficiency levels within the LEP sample. Results generally supported the first hypothesis, with several notable exceptions: the LEP-RO and LEP-AR samples demonstrated a unique pattern of strengths and weaknesses that defies a unifying interpretation. The support for the second hypothesis was mixed due to the divergent performance between the two LEP samples. The third hypothesis was only supported in the verbal fluency tests and the HVLT-R time-to-completion metrics.
Results are broadly consistent with previous research on the deleterious effect of LEP on performance during verbal tasks (Bialystok et al., 2008, 2009; Boone et al., 2007; Coderre et al., 2013; Kisser et al., 2012; Mattys et al., 2017; Rivera Mindt et al., 2008; Walker et al., 2010). Previous reports of the heightened sensitivity of the D-KEFS Color Naming to LEP relative to Word Reading were replicated (Brantuo et al., 2022), with one caveat: LEP-RO continued to improve on the Stroop task, whereas performances of LEP-AR declined. Consistent with existing research (Brantuo et al., 2022; Erdodi et al., 2017a), animal fluency was very sensitive to LEP, as evidenced by a mean performance of 1.5-2 SDs below the normative mean. Consistent with previous reports (Wauters & Marquardt, 2017), the EWFT was less susceptible to the administration language than animal fluency, although both the magnitude and the direction of the effect of native versus English administration were different in LEP-RO from LEP-AR.
The performance of the LEP-RO sample improved during the native language compared to the English administration of the animal fluency test. Applying the demographically adjusted norms by Heaton et al. (2004) to raw scores increased their average scores by almost 1.5 SDs. However, the LEP-AR sample demonstrated the opposite pattern: participants performed better during the English administration, resulting in a 1 SD difference. This pattern complicates the interpretation of the results and precludes clear recommendations to assessors in clinical settings. Findings from the LEP-RO sample indicate that scores during the task's standard English administration underestimate semantic fluency skills that could be obtained in their native language by 1-1.5 SDs. Therefore, adjusting the T-score obtained in English by 10-15 T-score points may provide a more accurate estimate of the true cognitive ability of LEP examinees who could not be tested in their native language.
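The arithmetic behind the 10-15 point figure is simply the 1-1.5 SD gap expressed on the T-score metric (SD = 10). A minimal sketch follows; the function name and default parameters are illustrative assumptions, and the output is a tentative range derived from the LEP-RO sample only, not a validated correction.

```python
T_SCORE_SD = 10.0  # T-score metric: M = 50, SD = 10


def adjusted_t_range(t_english, low_sd=1.0, high_sd=1.5):
    """Return the (lower, upper) T-score range after adding the 1-1.5 SD
    native-language advantage observed in LEP-RO on animal fluency.

    Illustrative only: the LEP-AR sample showed the opposite pattern.
    """
    return (t_english + low_sd * T_SCORE_SD, t_english + high_sd * T_SCORE_SD)
```

For example, an English-administration T-score of 30 would map to a tentative range of 40 to 45 under this assumption.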
However, findings in the LEP-AR sample suggest that such an adjustment is far from universally applicable. Whether the Heaton norms provide a valid normative comparison for individuals with LEP has yet to be established. Known variability in verbal fluency scores as a function of broader cultural and linguistic variables (Ardila, 2020) suggests that the accurate clinical interpretation of test scores may require a deeper understanding of the complex interactions among the various factors influencing performance on cognitive testing.
Similar to the clinical case study, the LEP-RO sample produced an auditory verbal memory profile that was indistinguishable from that of NSEs, whereas the LEP-AR consistently underperformed the NSE sample. The fact that LEP-AR participants were immersed in an English-speaking environment, whereas LEP-RO participants lived in a non-English-speaking country, makes this pattern even more difficult to interpret. The most parsimonious explanation seems to be the inclusion criterion of BNT-15 ≤ 11: although needed to ensure that the English-Arabic bilinguals had LEP, it may have inadvertently resulted in oversampling participants from the lower end of the English proficiency continuum.
However, ANOVAs using five levels of the BNT-15 (11, 10, 9, 8, and ≤7) as the independent variable only found two significant contrasts, indicating that below the LEP cutoff (≤11) BNT-15 scores no longer predict performance on most cognitive tests. Therefore, the unexpectedly high performance of the LEP-RO sample cannot be attributed to 23 of the Romanian participants having scored above this cutoff and, hence, having demonstrated superior English proficiency relative to LEP-AR.
Findings on non-verbal tests are less conclusive: although both LEP samples performed close to the normative mean on CD, consistent with previous research (Walker et al., 2010), NSEs scored above it, suggesting that a mild LEP disadvantage persists even in the absence of frank deficits. The outcome on the TMT is puzzling and contradicts previous reports (Boone et al., 2007; Kisser et al., 2012). The LEP-RO sample performed 2 SDs below the normative mean on TMT-A and one SD below on TMT-B. In the context of intact performance on CD and D-KEFS Stroop, these findings are difficult to interpret and serve as an important reminder of the relevance of population-specific norms (Bezdicek et al., 2012, 2016).
Assuming normative performance in examinees with LEP on nonverbal tests on rational grounds alone increases the risk of significant errors in the clinical interpretation of scores (Celik et al., 2020; Funes et al., 2016; Gasquoine & Gonzales, 2012). In fact, our results challenge the notion of an "LEP profile" as a unitary construct. They suggest that other parameters (geographic location, level of English proficiency, native language, cultural differences in the significance of response speed, etc.) may be equally important factors in understanding the clinical implications of test scores by LEP examinees (Ardila, 2020; Coderre et al., 2013; Durand-Lopez, 2020; Marian et al., 2013; Roselli et al., 2002; Singh & Mishra, 2013; Tse & Altarriba, 2012; Walker et al., 2010).
Separating the LEP-RO sample into high and low English proficiency levels operationalized using BNT-15 scores (Ali, Elliott, et al., 2022; Brantuo et al., 2022) revealed a performance pattern with potential clinical relevance. Although both groups obtained significantly lower scores during the English relative to Romanian administration of animal fluency, participants with BNT-15 ≥ 12 performed consistently better on both administrations. These findings support the use of the BNT-15 as an objective index of English proficiency (Erdodi et al., 2017a) and reveal that BNT-15 scores may tap the broader construct of general verbal skills independent of any specific language, which includes fund of word knowledge and the speed of lexical retrieval. In other words, BNT-15 preserves its original function of measuring cognitive functioning in addition to LEP status.
Finally, a BNT-15 ≤ 11 score was associated with higher time-to-completion on the HVLT-R recognition trials, indicating increased processing demands in participants with lower levels of English proficiency. This finding has implications for both performance validity assessment and academic accommodations for LEP students at English-speaking institutions. Since time-to-completion often serves as an index of response credibility on word recognition tests generally (Cutler et al., 2022; Erdodi & Lichtenstein, 2021; Erdodi et al., 2017b; Kim et al., 2010; Lupu et al., 2018) and the HVLT-R specifically (Cutler et al., 2021), assessors should exercise caution before interpreting slow responding on the HVLT-R as evidence of invalid performance in LEP examinees to protect them against increased false positives. In an academic context, extending the time limit on exams may be construed as a reasonable and necessary accommodation for LEP students (Ali, Brantuo, et al., 2022).
It is widely accepted that translating and norming commonly used neuropsychological tests to all languages is not feasible (Franzen et al., 2021). Administering tests in the examinee's native language is often considered the next best solution for neutralizing the effects of LEP (Franzen et al., 2021; Fujii, 2018). However, our results indicate that such an accommodation can have the opposite (i.e., suppressing rather than enhancing) effect. Indeed, while the Romanian administration significantly improved verbal fluency performance in LEP-RO compared to the English administration, the Arabic administration of these tests produced lower scores in LEP-AR compared to the English administration. This finding suggests that administering psychometric tests in the examinee's native language fails to neutralize LEP as a confound and may even inadvertently magnify distortions within the neurocognitive profile, especially in the absence of appropriate norms for many LEP populations.
Results point towards identifying a list of tests that are robust to the variability in the level of English proficiency as the best pragmatic safeguard against the confounding effect of LEP. Within the present study, three such tests emerged as possible "LEP-resistant" candidates: CD, the Word Reading subtest of the D-KEFS, and the EWFT. Age-corrected T-scores for the Yes/No Recognition Discrimination trial of the HVLT-R were also immune to LEP. However, their utility as an overall measure of auditory verbal learning and memory might be limited, considering that the test's key trials remain vulnerable to LEP.
Results should be interpreted in the context of the study's limitations. The most obvious one is the relatively small samples of convenience. In addition, all participants were recruited from two universities, raising questions about the representativeness of the samples. On the one hand, university students may be cognitively higher functioning than the general population. As such, results may not generalize to clinical populations (Braw, 2021). On the other hand, the significant variability in English proficiency within LEP-RO may have masked general trends relevant to cross-cultural neuropsychology. Additionally, several poorly understood cultural and educational differences between samples might have confounded results, especially on verbal fluency tests (Ardila, 2020). In the absence of appropriate norms for individuals with LEP in general (let alone specific cultural/linguistic communities), the clinical interpretation of cognitive profiles in such populations remains uncertain.
The study also has several strengths. It recruited two LEP samples from different countries (indeed, continents) with linguistically and orthographically dissimilar native languages to empirically investigate the variability in cognitive profiles across different LEP subtypes. Such a design enabled several population- and instrument-specific discoveries with potential clinical and cross-cultural relevance. Participants were screened for noncredible responding, a significant source of error variance in academic research on university students (An et al., 2017; Hurtubise et al., 2020; Roye et al., 2019) and even in normative samples (Erdodi & Lichtenstein, 2017). The battery was selected to include a strategic combination of tests with low and high verbal mediation informed by previous research to further flesh out LEP-specific performance patterns.

Conclusions
Results are broadly consistent with previous research on the deleterious effects of LEP on cognitive profiles, especially on verbal tests. At the same time, findings revealed clinically significant heterogeneity among individuals with LEP, both within and across samples. Therefore, results challenge the notion that LEP status is a unitary construct and emphasize the importance of population-specific research, as findings may not generalize to different groups with LEP (Braw, 2021). Although the BNT-15 proved a valid overall psychometric marker of English proficiency, some of the evidence suggests that it may also capture general verbal/cognitive skills that are not English-specific. Even in the context of high accuracy scores, LEP is associated with slowed processing speed with clear implications for performance validity assessment and eligibility for academic accommodations. Finally, there may be no straightforward definition of LEP status, as individual history of language acquisition and performance-based markers of English proficiency can produce contradictory conclusions (as illustrated by the case study). More research is needed to better understand cognitive profiles associated with LEP and the optimal method for operationalizing the construct itself.

Figure 1. Pattern of performance on various trials of the Delis-Kaplan Executive Function System (D-KEFS) across the three samples. LEP-RO: Romanian Limited English Proficiency sample (n = 59); LEP-AR: Canadian Arabic LEP sample (n = 30); NSE: Canadian native speakers of English (n = 24). Error bars represent the standard error of the mean.

Figure 2. Pattern of performance on various trials of the Hopkins Verbal Learning Test - Revised (HVLT-R) across the three samples. LEP-RO: Romanian Limited English Proficiency sample (n = 59); LEP-AR: Canadian Arabic LEP sample (n = 30); NSE: Canadian native speakers of English (n = 24). T: Trial; DFR: Delayed free recall; RD: Recognition discrimination (true positives minus false positives); FCR: Forced Choice Recognition. Error bars represent the standard error of the mean.

Table 1. Sample characteristics

Table 2. One-way ANOVAs comparing performance across samples on tests of visuomotor speed, attention, and executive function
Note. All tests were administered in English unless marked with * (those tests were administered in the native language of the LEP sample); TMT: Trail Making Test; D-KEFS: Delis-Kaplan Executive Function System; Animals: Category fluency; EWFT: Emotion Word Fluency Test; LDF: Longest digit span forward; LDB: Longest digit span backward; COL: Color Naming; WOR: Word Reading; STR: Stroop; ACSS: Age-corrected scaled score (M = 10, SD = 3); T: T-score (M = 50, SD = 10); LEP-RO: Romanian limited English proficiency sample; LEP-AR: Canadian Arabic LEP sample; NSE: Canadian native speakers of English; η p

Table 3. One-way ANOVAs comparing performance across samples on the HVLT-R

Table 4. One-sample t-tests against the normative mean across samples

Table 5. Performance on cognitive tests within the Romanian LEP sample as a function of BNT-15 score
Note. All tests were administered in English; BNT-15: Boston Naming Test - Short Form; TMT: Trail Making Test