Robust reference group normative data for neuropsychological tests accounting for primary language use in Asian American older adults

Abstract Objective: The present study aimed to develop neuropsychological norms for older Asian Americans with English as a primary or secondary language, using data from the National Alzheimer’s Coordinating Center (NACC). Method: A normative sample of Asian American participants was derived from the NACC database using robust criteria: participants were cognitively unimpaired at baseline (i.e., no MCI or dementia) and remained cognitively unimpaired at 1-year follow-up. Clinical and demographic characteristics were compared between Primary and Secondary English speakers using analyses of variance for continuous measures and chi-square tests for categorical variables. Linear regression models compared neuropsychological performance between the groups, adjusting for demographics (age, sex, and education). Regression models were developed for clinical application to compute demographically adjusted z-scores. Results: Secondary English speakers were younger than Primary English speakers (p < .001). There were significant differences between the groups on measures of mental status (Mini-Mental State Examination, p = .002), attention (Trail Making Test A, Digit Span Forward Total Score, p < .001), language (Boston Naming Test, Animal Fluency, Vegetable Fluency, p < .001), and executive function (Trail Making Test B, p = .02). Conclusions: Separate normative data are needed for Primary vs. Secondary English speakers from Asian American backgrounds. We provide normative data on older Asian Americans to enable clinicians to account for English use in the interpretation of neuropsychological assessment scores.


Introduction
There are race-related differences in cognitive performance in later life (Masel et al., 2010; Sloan & Wang, 2005; Zsembik & Peek, 2001). At times, these disparities are attenuated by accounting for demographic characteristics, such as educational attainment (Barnes & Yaffe, 2011), literacy (Manly et al., 2004), reading level (Byrd et al., 2005), socioeconomic status (Schwartz et al., 2004), or health-related factors (Mungas et al., 2009). Thus, researchers have argued that "race" is merely a proxy for such differences (Dotson et al., 2008; Sisco et al., 2015). In the United States, normative standards derived from Non-Hispanic White, English-speaking populations are sometimes applied to ethnically and linguistically diverse examinees. This has resulted in increased rates of diagnostic errors and low test specificity for such individuals (Byrd and Rivera-Mindt, 2022).
With growing populations of racially and ethnically diverse individuals in the United States, it is a priority in neuropsychology to improve methods for ascertaining the diagnosis of neurocognitive disorders and to reduce rates of over- and underdiagnosis of cognitive impairment. Development of demographically adjusted norms, which account for fundamental sociocultural factors affecting neuropsychological performance, will advance our ability to evaluate individuals across a range of racial and ethnic groups with improved accuracy. Neuropsychological assessments can further benefit from specific norms that take into account demographic factors beyond race alone, as race captures only one facet of an individual and may serve as a proxy for many other disparate factors such as literacy, language use, and education-related factors such as quality of education and educational attainment. In addition, instrument and test bias, including the use of the Latin alphabet and culturally biased terminology in stimulus material, may influence performance on neuropsychological assessments (Barker-Collo, 2001; Fernández & Abe, 2018).
To this end, there have been multiple efforts to test new standardization samples consisting of individuals from diverse backgrounds. Some examples in the United States include Mayo's Older African-American Normative Studies (MOANS; Lucas et al., 2005) and the Neuropsychological Norms for the US-Mexico Border Region in Spanish (NP-NUMBRS) Project (Rivera Mindt et al., 2021). Despite these efforts, there remains a dearth of normative data on many racial and ethnic groups, including Asian Americans, for whom robust norms in particular are lacking. Conventional norms are based on individuals studied at a single timepoint (De Santi et al., 2008; Holtzer et al., 2008); however, one limitation of conventional norming is that normative samples may include individuals who are in the preclinical stages of dementia and perform in the normal range on neuropsychological tests (Sliwinski et al., 1996). In contrast, robust norming utilizes longitudinal assessment to exclude individuals who develop cognitive impairment at follow-up, thereby excluding individuals in the preclinical stages of disease (Holtzer et al., 2008; Koscik et al., 2014; Sliwinski et al., 2003). To date, few studies have established robust norms for neuropsychological tests in Asian American older adults.
In addition, one factor that has been scantly accounted for in the neuropsychological assessment of individuals from underrepresented and understudied groups is their primary language use. According to the 2019 US Census, the number of individuals who speak a language other than English at home has increased from 23.1 million in 1980 to 67.8 million in 2019, representing a 194% increase (Dietrich & Hernandez, 2022). Of these 67.8 million individuals, there were 3.49 million Chinese speakers, 1.76 million Tagalog speakers, 1.57 million Vietnamese speakers, and 1.08 million Korean speakers (Dietrich & Hernandez, 2022).
The present study sought to address the gap in available neuropsychological tools for Asian American assessment by providing normative neuropsychological data on Asian American individuals drawn from the National Alzheimer's Coordinating Center (NACC). Study aims included examination of whether participants' use of English, as a primary or secondary language, is an important factor to consider in normative practices. It was hypothesized that use of English as a primary vs. secondary language would be significantly related to neuropsychological performance. Thus, the study also aimed to create normative data accounting for the type of English use (i.e., primary vs. secondary).

Method
The current study was conducted in accordance with the World Medical Association Declaration of Helsinki.

Study population
This study involves secondary analysis of the National Alzheimer's Coordinating Center (NACC) database, obtained using the request form available at https://www.naccdata.org/. Data in the NACC database were from participants recruited at 33 Alzheimer's Disease Centers (ADCs) across the United States between September 2005 and February 2020. Participants underwent the same assessments and were evaluated for incident MCI and dementia at yearly intervals. Data from the first follow-up visits with these participants through February 2021 were included. The present study included 338 participants (Fig. 1) who met the following inclusion criteria: (1) were aged ≥ 55 years at baseline; (2) self-reported race as Asian or Asian American; (3) had at least one follow-up visit; (4) were diagnosed as cognitively healthy at baseline and at the first follow-up visit. We employed a robust norming approach, whereby all participants were cognitively healthy at at least two timepoints (Holtzer et al., 2008). All contributing ADCs obtained informed consent from their participants and received approval from local institutional review boards.
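The four inclusion criteria above can be read as a simple filter over participant records. The following is a minimal sketch only: the `Participant` and `Visit` classes and their field names are hypothetical and do not correspond to actual NACC variable names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Visit:
    # Hypothetical flag: True if the clinical diagnosis at this visit was
    # normal cognition (no MCI or dementia).
    cognitively_healthy: bool


@dataclass
class Participant:
    age_at_baseline: float
    race: str
    # Visits in chronological order; index 0 is baseline (an assumption of
    # this sketch, not a property of the NACC file layout).
    visits: List[Visit] = field(default_factory=list)


def meets_robust_criteria(p: Participant) -> bool:
    """Apply the four inclusion criteria described in the text."""
    if p.age_at_baseline < 55:                      # (1) aged >= 55 years
        return False
    if p.race not in {"Asian", "Asian American"}:   # (2) self-reported race
        return False
    if len(p.visits) < 2:                           # (3) at least one follow-up
        return False
    # (4) cognitively healthy at baseline and at the first follow-up visit
    return p.visits[0].cognitively_healthy and p.visits[1].cognitively_healthy
```

A participant who is healthy at baseline but impaired at the first follow-up is excluded, which is the feature that distinguishes robust from conventional norming.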

Clinical diagnosis
Cognitive status was established based on neuropsychological testing and Clinical Dementia Rating (CDR) score, diagnosed by a single clinician or consensus panel as outlined in the NACC protocol. Normal cognition was established based on neuropsychological testing within normal range and/or global CDR score of 0. Independence in functional abilities, change in cognition, history and objective cognitive assessment were all considerations in determining diagnosis.

Neuropsychological tests
Neuropsychological tests were drawn from the Uniform Data Set versions 1, 2, and 3, and included the Mini-Mental State Exam (MMSE), Wechsler Memory Scale-Revised Logical Memory Story A Immediate Recall (Logical Memory I) and Delayed Recall (Logical Memory II), Wechsler Adult Intelligence Scale-Revised (WAIS-R) Digit Span Forward, Digit Span Backward, Animal Fluency, Vegetable Fluency, Trail Making Test Parts A and B, WAIS-R Digit Symbol, and the Boston Naming Test 30-item version (BNT-30, odd-numbered items). Between the Uniform Data Set Version 2 and Version 3, the tests included in the battery were changed. We examined the data from all versions if the same tests were administered across all versions, and only examined tests from Versions 1 and 2 if those tests were discontinued in Version 3. Higher scores indicate better performance on all tests except for the Trail Making Test, for which a higher score indicates longer time to completion and therefore worse performance.

Medical history
Body mass index (weight, height), systolic or diastolic blood pressures, and history of hypertension, diabetes, or depression were determined based on clinical evaluation during study visits. Stroke that affected cognition represents any history of stroke that had a relationship with cognitive impairment.

Statistical analysis
Clinical and demographic characteristics of the study sample were compared using t-tests or analyses of variance for continuous measures and chi-square tests for categorical variables. Where applicable, we also conducted the same comparisons using non-parametric tests such as Mann-Whitney. To examine differences in baseline neuropsychological performance between the participants who used English as a primary language ("Primary English speakers") and those who used it as a secondary language ("Secondary English speakers"), analysis of covariance was applied, adjusting for age, sex and education. We utilized partial eta squared ($\eta^2_p$) as a measure of effect size, where 0.01 represents a small effect, 0.06 represents a medium effect and 0.14 represents a large effect. Mean, standard deviation, median and interquartile range for each cognitive test for Primary and Secondary English speakers were also computed to illustrate differences between the two groups. These analyses were done to determine the need, if any, for separate normative data based on English use.
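As a brief illustration of the effect size measure described above, partial eta squared is conventionally computed as the effect sum of squares divided by the sum of the effect and error sums of squares. The sketch below is illustrative only: the function names are hypothetical, the "negligible" label is our own addition, and treating the three benchmarks as lower bounds is an assumption, since the text states only the benchmark values.

```python
def partial_eta_squared(ss_effect: float, ss_error: float) -> float:
    """Partial eta squared: SS_effect / (SS_effect + SS_error)."""
    if ss_effect < 0 or ss_error < 0:
        raise ValueError("sums of squares must be non-negative")
    total = ss_effect + ss_error
    if total == 0:
        raise ValueError("effect and error sums of squares are both zero")
    return ss_effect / total


def effect_size_label(eta_p2: float) -> str:
    """Map a partial eta squared value onto the benchmarks given in the text.

    Assumption: each benchmark (0.01, 0.06, 0.14) is used as the lower bound
    of its category; values below 0.01 are labeled 'negligible'.
    """
    if eta_p2 >= 0.14:
        return "large"
    if eta_p2 >= 0.06:
        return "medium"
    if eta_p2 >= 0.01:
        return "small"
    return "negligible"
```

For an ANCOVA, `ss_effect` would be the sum of squares for the English use factor after adjustment for age, sex and education, and `ss_error` the residual sum of squares from the same model.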
Furthermore, using baseline test scores, multiple regression equations were developed to estimate the effect of English use (0 = secondary, 1 = primary), age (in years), sex (0 = female, 1 = male), and education (in years) for each neuropsychological test in NACC separately. These equations can be used to obtain demographically adjusted z-scores and corresponding percentiles for tests commonly used in the diagnosis of dementia (Clark et al., 2016; De Santi et al., 2008; Shirk et al., 2011a). For any participant i,

$$Y_0 = \beta_{0j} + \beta_1(\text{Primary English Use}) + \beta_2(\text{Age}) + \beta_3(\text{Sex}) + \beta_4(\text{Education}) \qquad (1)$$

where $Y_0$ = the predicted population mean score for any one test, $\beta_{0j}$ = random intercept for each test, and $\beta_1$, $\beta_2$, $\beta_3$ and $\beta_4$ = coefficients corresponding to English use, age, sex, and education. Obtaining z-scores for individual participants will follow the formula:

$$z = \frac{Y - Y_0}{\mathrm{RMSE}} \qquad (2)$$

where z = z-score estimate for any one individual's performance on a neuropsychological test, Y = the raw score obtained by this individual on the test, $Y_0$ = the predicted population mean score, derived from Equation 1 detailed above, and RMSE = root mean square error of the regression equation. The RMSE is the square root of the average squared differences between observed and predicted scores. We also evaluated multicollinearity, normal P-P plots and scatterplots of the residuals for each model to evaluate assumptions. All analyses were performed using R version 3.6.2 software and SPSS for Mac OS X version 21.0 (SPSS, Armonk, NY: IBM Corp.).
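The computation described in this paragraph (a predicted score from an intercept plus coefficient-weighted English use, age, sex and education, followed by a z-score equal to the raw score minus the predicted score, divided by the RMSE) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names and dictionary keys are hypothetical, and the coefficients in `PLACEHOLDER_COEF` are invented placeholders, not values from Table 5.

```python
import math
from typing import Dict, Sequence

# Placeholder coefficients for illustration only; these are NOT the values
# reported in Table 5 and must be replaced with the published coefficients.
PLACEHOLDER_COEF: Dict[str, float] = {
    "intercept": 40.0,
    "english": 2.0,     # Primary English Use: 1 = primary, 0 = secondary
    "age": -0.25,       # per year of age
    "sex": 1.0,         # 0 = female, 1 = male
    "education": 0.25,  # per year of education
}


def predicted_score(coef: Dict[str, float], primary_english: int,
                    age: float, sex: int, education: float) -> float:
    """Predicted mean score: intercept plus coefficient-weighted demographics."""
    return (coef["intercept"]
            + coef["english"] * primary_english
            + coef["age"] * age
            + coef["sex"] * sex
            + coef["education"] * education)


def z_score(raw: float, predicted: float, rmse: float) -> float:
    """Demographically adjusted z-score: (raw - predicted) / RMSE."""
    if rmse <= 0:
        raise ValueError("RMSE must be positive")
    return (raw - predicted) / rmse


def root_mean_square_error(observed: Sequence[float],
                           predicted: Sequence[float]) -> float:
    """Square root of the average squared observed-minus-predicted difference,
    following the definition given in the text."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))
```

Note that for the Trail Making Test a higher raw score reflects worse performance, so a positive z-score from this formula indicates slower-than-predicted completion rather than better performance.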

Clinical and demographic information
Table 1 shows primary language use frequencies in the sample as well as years of education and proportion of males and females among different age categories for each language. Table 2 shows the baseline clinical and demographic characteristics for the NACC-derived Asian American robustly normative sample (i.e., cognitively healthy at baseline and at 1-year follow-up). Secondary English speakers were younger than Primary English speakers (p < .001). However, there were no significant differences in years of education, sex, body mass index scores, systolic or diastolic blood pressures, global CDR scores, or proportions with hypertension, diabetes, or depression within the last 2 years (all p's > .05). The average length of follow-up was 1.25 years (SD = 0.50). Within our sample, only 2 participants identified as having Hispanic/Latino ethnicity. "Other" languages (Table 1) included languages such as Vietnamese, Thai, Tagalog, and Korean.

Neuropsychological performance
As shown in Table 3, Secondary English speakers performed significantly worse than Primary English speakers on measures of mental status, attention, language, and executive function. To further illustrate these differences, summary statistics including mean, standard deviation, median and interquartile range for Primary English and Secondary English speakers are presented in Table 4. Raw mean scores on all assessments at baseline and follow-up for Primary and Secondary English speakers are included in the Supplemental Materials (Supplemental Figures 1-13).
Table 5 presents the coefficients with 95% confidence intervals, as well as RMSE values, for our multiple regression equations. The variance inflation factors were below 2 for all variables in every model. Based on evaluation of normal P-P plots and scatterplots of the residuals for each model, assumptions of linear regression were met and a linear model was deemed most appropriate.
Values from Table 5 can be used for estimating z-scores corresponding to various neuropsychological tests, accounting for English use, age, sex, and years of education. To illustrate the use of Table 5, consider the predicted mean BNT-30 score for a theoretical population of 70-year-old women with 12 years of education who are Secondary English speakers. The following values would be entered into Equation 1 (Primary English Use = 0, Age = 70, Sex = 0, Education = 12) to obtain a predicted BNT-30 total score of 20.65 out of 30 possible points.
To obtain a z-score corresponding to a BNT-30 score of 25 obtained by an individual who is a 70-year-old woman with 12 years of education and who is a Secondary English speaker, we would then enter $Y_0$ = 20.65 and the RMSE for the BNT-30 from Table 5 into Equation 2: $z = (25 - 20.65)/\mathrm{RMSE}_{\text{BNT-30}}$. This z-score can then be looked up in any number of conversion tables for its corresponding percentile, here the 84th percentile.
In contrast, if this same individual were scored using normative data developed largely from Non-Hispanic White, primary English speakers (i.e., NACC norms; Shirk et al., 2011b), they would receive a z-score of -0.724, corresponding to the 23rd percentile.
Excel files to calculate predicted scores and z-scores are included in the supplementary material. In addition, bootstrapped coefficients for each regression model are included in the Supplemental Tables (Supplemental Tables 1-13).
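In place of a printed conversion table, a z-score can be converted to a percentile with the standard normal cumulative distribution function. The sketch below uses Python's standard-library `statistics.NormalDist`; the function name `z_to_percentile` is our own, and the approach assumes that adjusted scores are approximately normally distributed.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
_STANDARD_NORMAL = NormalDist()


def z_to_percentile(z: float) -> float:
    """Return the percentile (0-100) of a z-score under a standard normal."""
    return 100.0 * _STANDARD_NORMAL.cdf(z)


# The z-score of -0.724 quoted in the text maps to roughly the 23rd percentile.
percentile_nacc_norms = z_to_percentile(-0.724)
```

By the same conversion, the 84th percentile reported for the language-adjusted norms corresponds to a z-score of approximately 1.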

Discussion
This study presents normative data for older Asian American individuals using neuropsychological data from the NACC database, which, to our knowledge, have not been published elsewhere. Our analysis included 338 individuals between the ages of 55 and 91 who identified as Asian or Asian American and were cognitively healthy at baseline and at first follow-up visit. Our analyses indicated significant neuropsychological differences between primary and secondary English speakers in a robustly normative sample, which consisted of older Asian Americans who were cognitively unimpaired at baseline and after 1-year follow-up. Differences between primary and secondary English language usage were observed on tests of mental status, attention, language (verbal fluency and naming), and executive function, demonstrating the clear need for normative data to account for how English is used. Given the number of tests and cognitive domains that were influenced by type of English use (primary vs. secondary), regression equations were developed to account for English use, in addition to sex, age, and years of education. These equations may be used by clinicians and researchers who are assessing older Asian Americans to compute standardized scores (e.g., z-scores and percentile ranges) that are easily interpretable.
The regression equations provided by the present study may be of great value to the field. It is noteworthy that neuropsychological testing in older Asian Americans with English as a secondary language may activate multiple languages, in contrast to testing in primary English speakers. Research in bilingualism has elucidated two cognitive mechanisms that cause differences in neuropsychological performance between bilinguals and monolinguals. These mechanisms are (1) interference or competition between languages for use/selection, and (2) lower frequency of language-specific use, since each language is only spoken for some of the time (Rivera Mindt et al., 2008). These mechanisms may explain the robust bilingual disadvantages found on verbal tasks (Bialystok et al., 2008; Gollan et al., 2008; Gollan & Brown, 2006), even when bilinguals are tested solely in their dominant, first-acquired language (Gollan & Acenas, 2004; Ivanova & Costa, 2008). Research has largely shown that both languages in bilinguals are always active. The presence of consistent dual-language activation suggests that bilinguals need to exert a measure of inhibitory control while interacting in, and responding to, only one language (Green, 1998).
Despite the more taxing cognitive processing that is necessitated, the longer amount of time taken is likely to be misinterpreted as slower and therefore poorer performance. The exception would be on measures of cognitive control, in which, unsurprisingly, bilinguals show subtle advantages (Bialystok & Martin, 2004; Bunge et al., 2002). It has been hypothesized that bilingualism may enhance domains such as executive function. However, this remains an area of active study and debate, given that others have argued that the bilingual advantage may not exist (Paap et al., 2015). Attitudes towards time and speed also vary across cultures and may influence performance on timed measures among individuals and cultures who do not prioritize speed or are not familiar with timed assessments (Agranovich et al., 2011). In the worst-case scenario, lower scores on tests among secondary English speakers may be inaccurately perceived as impaired. In other situations, clinicians may simply throw out lower scores that are otherwise uninterpretable given the lack of normative data in this population.
Prior studies have developed norms for Mandarin-speaking and Spanish-speaking older adults (Qi et al., 2022; Stricks et al., 1998); however, no prior study has developed robust normative data accounting for primary language use in Asian American older adults. This study responds to the growing need for normative studies in secondary English speakers. Moreover, while prior norms were developed for specific languages or ethnic populations, the norms developed in this study included an adjustment for primary or secondary English use within a sample of Asian Americans, which may allow for more precise norms within this population.
One study limitation was the homogeneity in terms of years of education, as all our participants had a high school diploma or higher education, with the average participant for both Primary and Secondary English speakers having a college degree. Among individuals with fewer years of education, differences in neuropsychological test scores between primary and secondary English speakers may be more pronounced and may be influenced by whether an individual attended an institution where instruction was in English. Additionally, Secondary English speakers reported the use of many different primary languages, including Mandarin (46%), Cantonese (19%), Japanese (6%) and other languages (28%). These categories were necessarily collapsed into one ("Secondary English speakers") as cell numbers would be too small for analyses otherwise. Moreover, there were no data we could use to account for the degree of acculturation, where participants' main educational experiences took place, age at which a language was learned, level of proficiency and quality of education, which are important factors that may affect neuropsychological performance. This study also utilized self-report to determine primary language as opposed to a formal measure of language proficiency, which is a limitation. In addition, we could not determine practice effects at the follow-up visit. While practice effects may have resulted in improved performance at follow-up, given that cognitive status was determined based on clinical consensus using scores on numerous measures, it is unlikely that they affected diagnosis. Another limitation of this study was that robust norms were established based on normal cognition at two visits; future studies incorporating additional follow-up assessments could further enhance the robustness of these norms. It should also be acknowledged that the term "Asian American" can obfuscate the fact that this is a racially, culturally, and linguistically diverse group. Indeed, the term encompasses individuals with ethnic heritage from Asia (e.g., Chinese, Indians, Filipinos, Japanese, Koreans, Thai, Vietnamese, Cambodians, Hmong, Indonesians, Laotians, Pakistanis) as well as the Pacific Islands (i.e., Polynesia, Micronesia, and Melanesia). While subgroup analyses were underpowered in the current study, clinicians would do well to consider the unique history of individuals from any particular subgroup, as each was influenced differently by immigration policies, patterns, and experiences (Wong, 2000). Readers are also encouraged to review the excellent work by Riccio et al. (2014) and Wong and Fujii (2015) regarding crucial considerations and practical guidelines with regard to neuropsychological assessment of Asian Americans. Moreover, Ardila (2005) illustrates the cultural values underlying cognitive testing and highlights how factors such as the relationship and cultural differences between the examiner and examinee, test instruction interpretation, and the social situation of testing are all culture-dependent. These factors may also play a role in influencing performance on neuropsychological assessments.
Another limitation is that this study only included individuals between the ages of 55 and 91, with education ranging from 6 to 25 years, largely speaking only four primary languages. Therefore, the findings of our study are likely most applicable to those represented in our sample. Moreover, these norms were developed based on a secondary analysis of a large dataset. While this allowed a large sample, the NACC database was not originally intended to be utilized for development of gold standard normative data. Accordingly, we were only able to create regression equations for tests with available data. In addition, cognitive status in this study was determined based on neuropsychological testing. It is possible that due to biases inherent in neuropsychological tests, non-English-speaking individuals may have been over- or underidentified as cognitively healthy. Moreover, the tests administered as part of NACC data collection were not available for certain language groups and testing was conducted in Mandarin or Cantonese for some participants. Therefore, there was some variability in administration of tests for different language groups. Future studies are warranted to improve neuropsychological test stimuli, norms and diagnosis for non-English-speaking individuals and allow standardization of testing procedures in non-English-speaking populations.
Finally, it is important to acknowledge the limitations of race-based norms (Franzen et al., 2022). In this study, we aimed to account for primary language use to acknowledge differences among Asian Americans in language use. However, many neuropsychological measures are biased and may not be adequate for assessment in diverse populations. Screening tools for diverse populations are available in numerous languages and can be administered to better capture cognitive functioning in different populations (Huang et al., 2018; Lim et al., 2021). Until additional research, training and novel instruments are available to enhance neuropsychological assessment for diverse populations, the adjusted norms may allow us to account for differences such as language use among Asian Americans. Moreover, robust norms for other cultural and ethnic populations are also lacking. Additional studies are warranted to develop culturally sensitive tests and robust norms.
Despite the limitations detailed above, the present study represents a significant advance for the field given the paucity of normative data available for older Asian Americans at risk for dementia. The present study benefits from additional strengths. First, the study utilized a robustly normal sample undergoing the NACC neuropsychological battery, which consists of many tests that target the most common presentations of age-related neurodegenerative conditions, including Alzheimer's disease. Second, this is the only study, to our knowledge, that takes into account English usage (as a primary vs. secondary language) in providing normative data for individuals from underrepresented backgrounds. The way in which English is used may be considered a proxy for other sociocultural factors that the present study was not able to evaluate, such as acculturation, noted above.
Further development of normative data for individuals from underrepresented backgrounds will improve our ability to determine a patient's cognitive status more accurately. This, in turn, will have important implications for neuropsychological research and clinical practice in underserved and understudied populations.
MMSE = Mini-Mental State Exam; LM I = Wechsler Memory Scale-Revised (WMS-R) Logical Memory Story A Immediate Recall; LM II = WMS-R Logical Memory Story A Delayed Recall; DS = Digit Span; Animals = Animal Fluency; Vegetables = Vegetable Fluency; TMT-A = Trail Making Test Part A; TMT-B = Trail Making Test Part B; Digit Symbol = Wechsler Adult Intelligence Scale-Revised (WAIS-R) Digit Symbol; BNT-30 = Boston Naming Test 30-item version. *In Equation 1, Primary English Use = 1 for Primary English speakers, and Primary English Use = 0 for Secondary English speakers.

Table 1.
Demographics by primary language use. Data are not presented for cells with N = 1. Testing was conducted in Mandarin or Cantonese for some participants (N = 35).

In Table 3, Secondary English speakers had significantly worse performance than the Primary English speakers on Trail Making Test B (p = .02), MMSE (p = .002), Digit Span Forward Total Score, Animal Fluency, Vegetable Fluency, Trail Making Test A, BNT-30 (all p's < .001), and Digit Span Forward Span (p = .001).

Table 2.
Baseline characteristics of the robustly normative subsample (those with at least 1 follow-up, and who are cognitively healthy at baseline and at 1-year follow-up). BMI = Body Mass Index; CDR = Clinical Dementia Rating scale; SD = Standard Deviation; IQR = Interquartile Range. Note: T-test and chi-square test were utilized for continuous and categorical variables, respectively. Cohen's d and Cramer's V were utilized as effect sizes for continuous and categorical variables, respectively.
(1) Represents percentage of primary English speakers included in the analysis (N = 198). (2) Represents percentage of secondary English speakers included in the analysis (N = 140). The same results were obtained when we utilized non-parametric tests (i.e., Mann-Whitney).

Table 3.
Differences on baseline neuropsychological performance between primary and secondary English speakers in the robust sample. MMSE = Mini-Mental State Exam; LM I = Wechsler Memory Scale-Revised (WMS-R) Logical Memory Story A Immediate Recall; LM II = WMS-R Logical Memory Story A Delayed Recall; DS = Digit Span; Animals = Animal Fluency; Vegetables = Vegetable Fluency; TMT-A = Trail Making Test Part A; TMT-B = Trail Making Test Part B; Digit Symbol = Wechsler Adult Intelligence Scale-Revised (WAIS-R) Digit Symbol; BNT-30 = Boston Naming Test 30-item version. Data are presented as Mean (SD). Higher scores indicate better performance for all tests except for Trail Making Test (Parts A and B), for which higher scores indicate longer time to completion and therefore worse performance. Significant differences between groups are indicated in boldface type. F-test (ANCOVA) controlling for age, sex and education was utilized.

Table 4.
Baseline summary statistics for cognitively healthy participants. MMSE = Mini-Mental State Exam; LM I = Wechsler Memory Scale-Revised (WMS-R) Logical Memory Story A Immediate Recall; LM II = WMS-R Logical Memory Story A Delayed Recall; DS = Digit Span; Animals = Animal Fluency; Vegetables = Vegetable Fluency; TMT-A = Trail Making Test Part A; TMT-B = Trail Making Test Part B; Digit Symbol = Wechsler Adult Intelligence Scale-Revised (WAIS-R) Digit Symbol; BNT-30 = Boston Naming Test 30-item version.

Table 5.
Regression coefficients with 95% confidence intervals and the root mean square error (RMSE) for our multiple regression equations, for estimating z-scores corresponding to various neuropsychological tests.