Statement of Research Significance
Research Question(s) or Topic(s): This study compared the diagnostic accuracy of abbreviated 20-item versions of the Copenhagen Cross-Linguistic Naming Test and Naming Assessment in Multicultural Europe for dementia and mild cognitive impairment in a multicultural memory clinic population and examined the influence of demographic and cultural factors on diagnostic accuracy. Main Findings: Both naming tests demonstrated moderate to high diagnostic accuracy for dementia and limited accuracy for mild cognitive impairment. The diagnostic accuracy of neither the Copenhagen Cross-Linguistic Naming Test nor the Naming Assessment in Multicultural Europe was influenced by immigrant status, acculturation, or administration with an interpreter, indicating little cultural and language bias. Study Contributions: The study supports the validity of the Copenhagen Cross-Linguistic Naming Test and Naming Assessment in Multicultural Europe for assessing anomia in patients with dementia in multicultural populations. Both naming tests appear to be valid time-saving alternatives to their full-length versions.
Introduction
Anomia is a common linguistic impairment seen in various neurological conditions, including stroke, acquired head injury, and several dementia disorders (Kristensson et al., Reference Kristensson, Longoni, Östberg, Rödseth Smith, Åke and Saldert2024; Nørkær et al., Reference Nørkær, Halai, Woollams, Lambon Ralph and Schumacher2024; Strain et al., Reference Strain, Didehbani, Spence, Conover, Bartz, Mansinghani, Jeroudi, Rao, Fields, Kraut, Cullum, Hart and Womack2017; Vogel et al., Reference Vogel, Mellergaard and Frederiksen2025). Thus, it is standard clinical practice to assess anomia in patients referred to memory clinics, which is most frequently done with confrontation naming tests (Georgiou et al., Reference Georgiou, Prapiadou, Thomopoulos, Skondra, Charalampopoulou, Pachi, Anagnostopoulou, Vorvolakos, Perneczky, Politis and Alexopoulos2022).
The Boston Naming Test (BNT) (Kaplan et al., Reference Kaplan, Goodglass and Weintraub2001) is the most widely used confrontation naming test (Maruta et al., Reference Maruta, Guerreiro, De Mendonça, Hort and Scheltens2011; Rabin et al., Reference Rabin, Paolillo and Barr2016). The original version consists of 60 black-and-white line drawings depicting objects of increasing difficulty, but several briefer 30-, 20- or 15-item versions have also been developed (Mack et al., Reference Mack, Freed, Williams and Henderson1992; Williams et al., Reference Williams, Mack and Henderson1989). However, the BNT has faced longstanding criticism for its cultural and linguistic bias, limiting its cross-cultural applicability (Harry & Crowe, Reference Harry and Crowe2014; March et al., Reference March, Worrall and Hickson2000). In addition, studies have shown systematic disparities across ethnic and linguistic groups, with lower scores observed in bilingual and multilingual individuals, and in minoritized populations (Baird et al., Reference Baird, Ford and Podell2007; Boone et al., Reference Boone, Victor, Wen, Razani and Ponton2007; Gollan et al., Reference Gollan, Fennema-Notestine, Montoya and Jernigan2007; Kohnert et al., Reference Kohnert, Hernandez and Bates1998; Roberts et al., Reference Roberts, Garcia, Desrochers and Hernandez2002). Different patterns in BNT performance have also been linked to other factors, such as education or rural versus urban upbringing (Kim et al., Reference Kim, Lee, Bae, Kim, Kim, Kim, Park, Cho and Chang2017).
In response to these concerns, there have been several efforts to develop cross-cultural naming tests, including the Cross-Linguistic Naming Test (CLNT) (Ardila, Reference Ardila2007) and Multilingual Naming Test (MINT) (Gollan et al., Reference Gollan, Weissberger, Runnqvist, Montoya and Cera2012). While these efforts represent meaningful progress, they also highlight the ongoing need for developing brief, cross-cultural confrontation naming tests with high diagnostic accuracy. For instance, while CLNT has shown promising cross-cultural properties, it has low sensitivity for naming impairment associated with dementia due to a ceiling effect (Gálvez-Lara et al., Reference Gálvez-Lara, Moriana, Vilar-López, Fasfous, Hidalgo-Ruzzante and Pérez-García2015), and although MINT has shown high diagnostic accuracy for dementia (Ivanova et al., Reference Ivanova, Salmon and Gollan2013; Stasenko et al., Reference Stasenko, Jacobs, Salmon and Gollan2019), it is influenced by factors such as sex, education, and ethnic background (Stasenko et al., Reference Stasenko, Jacobs, Salmon and Gollan2019) and has been criticized for including items that are culturally unfamiliar for some populations (Li et al., Reference Li, Zeng, Neugroschl, Aloysi, Zhu, Xu, Teresi, Ocepek-Welikson, Ramirez, Joseph, Cai, Grossman, Martin, Sewell, Loizos and Sano2022).
The above criticisms are increasingly relevant with rising global migration. Neuropsychologists have frequently reported language differences as a primary barrier in neuropsychological assessments (Franzen et al., Reference Franzen, Papma, Van Den Berg and Nielsen2020, Reference Franzen, Watermeyer, Pomati, Papma, Nielsen, Narme, Mukadam, Lozano-Ruiz, Ibanez-Casas, Goudsmit, Fasfous, Daugherty, Canevelli, Calia, Van Den Berg and Bekkhus-Wetterberg2022; Nielsen et al., Reference Nielsen, Andersen, Kastrup, Phung and Waldemar2011; Nielsen et al., Reference Nielsen, De Mendonça, Frölich, Engelborghs, Gove, Lamirel, Calia and Waldemar2024), and a recent survey of European clinical dementia centers showed that half of the centers found it more challenging to assess culturally and linguistically diverse patients (Nielsen et al., Reference Nielsen, De Mendonça, Frölich, Engelborghs, Gove, Lamirel, Calia and Waldemar2024). Misdiagnosis in these populations remains a well-documented issue (Hinton et al., Reference Hinton, Tran, Peak, Meyer and Quiñones2024; Lin et al., Reference Lin, Daly, Olchanski, Cohen, Neumann, Faul, Fillit and Freund2021; Nielsen, Andersen, et al., Reference Nielsen, Andersen, Kastrup, Phung and Waldemar2011; Nielsen, Vogel, Phung, et al., Reference Nielsen, Vogel, Phung, Gade and Waldemar2011). At the same time, dementia prevalence continues to rise globally. Dementia is currently estimated to affect approximately 50 million people worldwide, and this number is expected to triple within the next 25 years (World Health Organization, 2021). This underscores the urgent need for appropriate tools to diagnose dementia disorders in a timely and accurate manner, regardless of ethnicity, language, or cultural origin.
Two promising cross-cultural naming tests, the Copenhagen Cross-Linguistic Naming Test (C-CLNT) and the Naming Assessment in Multicultural Europe (NAME), were both developed for use across diverse cultures, languages, and educational backgrounds (Franzen et al., Reference Franzen, Van Den Berg, Ayhan, Satoer, Türkoğlu, Genç Akpulat, Visch-Brink, Scheffers, Kranenburg, Jiskoot, Van Hemmen and Papma2023; Nielsen et al., Reference Nielsen, Grollenberg, Ringkøbing, Özden, Weekes and Waldemar2023). However, given the time constraints often present in clinical practice, longer assessment tools may be impractical due to fatigue, cognitive load, limited personnel resources, etc. (Calero et al., Reference Calero, Arnedo, Navarro, Ruiz-Pedrosa and Carnero2002). As such, abbreviated versions can be particularly valuable for rapid screening, as well as for other purposes such as in brief test batteries used in research and clinical trials. This study aimed to compare the diagnostic accuracy of the abbreviated 20-item versions of the C-CLNT and NAME (C-CLNT20 and NAME20) for dementia and mild cognitive impairment (MCI) in a multicultural memory clinic sample, and to examine the influence of demographic and cultural factors on diagnostic accuracy. We hypothesized that: 1) the C-CLNT20 and NAME20 would have similar diagnostic accuracies for dementia and MCI, and 2) their diagnostic accuracies would be unrelated to cultural factors.
Materials and methods
Participants
Patients were recruited from multidisciplinary memory clinics across five European countries (Denmark, Spain, the Netherlands, France, and the United Kingdom). All patients underwent a comprehensive clinical assessment that included interviews with both the patient and, when possible, a close relative or caregiver. This was followed by neurological, physical, and psychiatric evaluations, incorporating cognitive screening tools such as the Mini-Mental State Examination (MMSE) (Folstein et al., Reference Folstein, Folstein and McHugh1975) or Rowland Universal Dementia Assessment Scale (RUDAS) (Storey et al., Reference Storey, Rowland, Conforti and Dickson2004). Standard laboratory tests, including blood work and electrocardiograms, and structural brain imaging with computed tomography or magnetic resonance imaging were also performed. Additional assessments, such as positron emission tomography scans, cerebrospinal fluid analysis, or in-depth neuropsychological and psychiatric evaluations were conducted when clinically indicated. 
A team of experienced clinicians established diagnoses based on evidence from all clinical and investigational results, except the C-CLNT20 and NAME20, using the 5th edition of the Diagnostic and Statistical Manual of Mental Disorders (American Psychiatric Association, 2013) criteria for major neurocognitive disorder (i.e., dementia), and diagnostic research criteria for specific dementia subtypes (Gorno-Tempini et al., Reference Gorno-Tempini, Hillis, Weintraub, Kertesz, Mendez, Cappa, Ogar, Rohrer, Black, Boeve, Manes, Dronkers, Vandenberghe, Rascovsky, Patterson, Miller, Knopman, Hodges, Mesulam and Grossman2011; McKeith et al., Reference McKeith, Boeve, Dickson, Halliday, Taylor, Weintraub, Aarsland, Galvin, Attems, Ballard, Bayston, Beach, Blanc, Bohnen, Bonanni, Bras, Brundin, Burn, Chen-Plotkin and Kosaka2017; McKhann et al., Reference McKhann, Knopman, Chertkow, Hyman, Jack, Kawas, Klunk, Koroshetz, Manly, Mayeux, Mohs, Morris, Rossor, Scheltens, Carrillo, Thies, Weintraub and Phelps2011; Rascovsky et al., Reference Rascovsky, Hodges, Knopman, Mendez, Kramer, Neuhaus, Van Swieten, Seelaar, Dopper, Onyike, Hillis, Josephs, Boeve, Kertesz, Seeley, Rankin, Johnson, Gorno-Tempini, Rosen and Miller2011; Sachdev et al., Reference Sachdev, Kalaria, O’Brien, Skoog, Alladi, Black, Blacker, Blazer, Chen, Chui, Ganguli, Jellinger, Jeste, Pasquier, Paulsen, Prins, Rockwood, Roman and Scheltens2014), MCI (Winblad et al., Reference Winblad, Palmer, Kivipelto, Jelic, Fratiglioni, Wahlund, Nordberg, Bäckman, Albert, Almkvist, Arai, Basun, Blennow, De Leon, DeCarli, Erkinjuntti, Giacobini, Graff, Hardy and Petersen2004), and subjective cognitive decline (SCD) (Jessen et al., Reference Jessen, Amariglio, van Boxtel, Breteler, Ceccaldi, Chételat, Dubois, Dufouil, Ellis, K., van der Flier, Glodzik, van Harten, de Leon, McHugh, Mielke, Molinuevo, Mosconi, Osorio and Perrotin2014). 
Patients with primary affective or other psychiatric conditions, or with cognitive impairment due to causes other than dementia or MCI, were excluded. Participants with physical impairments likely to interfere with cognitive testing (e.g., significant movement disorders, uncorrected hearing or vision problems) were also excluded.
Cognitively healthy participants were recruited from local community centers, general practice clinics or through social networks of the researchers working at the memory clinics. The exclusion criteria for cognitively healthy participants included severe psychiatric or neurological disorder, substance abuse, scoring <24/30 points on the MMSE or <23/30 points on the RUDAS, or scoring >5/15 points on the two-step 5/15-item Geriatric Depression Scale (GDS-5/15) (Weeks et al., Reference Weeks, McGann, Michaels and Penninx2003).
Participants with immigrant background were defined as first-generation immigrants or refugees residing in the country where the data was collected. European native-born participants were defined as participants without migration background, meaning those who were born in the country of data collection, and typically belonged to the majority ethnic group of that country (e.g., ethnic Danes in Denmark). All participants were included between March 2023 and August 2024.
Procedure
As part of the data collection process, participants completed an assessment of approximately one hour. During this assessment, demographic and medical information was collected, and a brief neuropsychological test battery was administered, including the C-CLNT20 and NAME20. To minimize bias, assessors were generally blinded to participants’ diagnostic classifications, except for the cognitively intact group (as these participants were recruited separately). Participants with immigrant background (n = 116) were assessed in their first language whenever possible, either by multilingual research staff or with the assistance of interpreters (n = 79). However, a subset of these participants (n = 37) was assessed in their second language. The study adhered to the Declaration of Helsinki for research involving human subjects and was assessed and approved by the Scientific Ethics Committees (reference no. 22007675) and Data Protection Agency (reference no. P-2022-444) for the Capital Region of Denmark as well as relevant local ethics and data protection authorities at other sites. All participants provided written consent.
Measures
C-CLNT20
C-CLNT is a newly developed cross-cultural naming test (Nielsen et al., Reference Nielsen, Grollenberg, Ringkøbing, Özden, Weekes and Waldemar2023) that consists of 30 colored drawings, 20 of which depict objects and 10 of which depict actions. One point is given for each correctly named item, and participants are given 20 seconds to respond per item. Semantic cues may be provided when appropriate (e.g., in cases of visual misperception) (Nielsen et al., Reference Nielsen, Grollenberg, Ringkøbing, Özden, Weekes and Waldemar2023), and a correct response following a cue counts as correctly named. The abbreviated 20-item version of C-CLNT (C-CLNT20) was developed by including only the 20 object items, excluding the original test’s 10 action items (Nielsen et al., Reference Nielsen, Grollenberg, Ringkøbing, Özden, Weekes and Waldemar2023). In the context of multilingualism and inherent language mixing, participants are allowed to respond in any language. A correct response in any language is considered correct.
NAME20
NAME represents another novel cross-cultural naming test (Franzen et al., Reference Franzen, Van Den Berg, Ayhan, Satoer, Türkoğlu, Genç Akpulat, Visch-Brink, Scheffers, Kranenburg, Jiskoot, Van Hemmen and Papma2023) that consists of 60 items, including colored photographs of objects, natural phenomena, animals, body parts, colors, occupations, and actions. One point is given for each correctly named item. When administering NAME, no cues are provided and there is no formal time limit. The abbreviated 20-item version of NAME (NAME20) was constructed by selecting the 20 items that best separated patients with Alzheimer’s disease (AD) and mixed dementia (AD/vascular dementia [VaD]) from the remainder of the sample in the original validation study (Franzen et al., Reference Franzen, Van Den Berg, Ayhan, Satoer, Türkoğlu, Genç Akpulat, Visch-Brink, Scheffers, Kranenburg, Jiskoot, Van Hemmen and Papma2023). NAME20 includes items representing all the original categories of NAME. Participants are allowed to respond in any language. A correct response in any language is considered correct. Figure 1 provides examples of items from C-CLNT20 and NAME20.

Figure 1. Examples of items from C-CLNT20 (bone and fly) and NAME20 (butcher and nose). Items are reproduced from the original C-CLNT and NAME papers (Franzen et al., Reference Franzen, Van Den Berg, Ayhan, Satoer, Türkoğlu, Genç Akpulat, Visch-Brink, Scheffers, Kranenburg, Jiskoot, Van Hemmen and Papma2023 and Nielsen et al., Reference Nielsen, Grollenberg, Ringkøbing, Özden, Weekes and Waldemar2023) with permission from the authors.
Other measures
In addition to the C-CLNT20 and NAME20, participants were administered a brief battery of neuropsychological tests. These tests included RUDAS (Storey et al., Reference Storey, Rowland, Conforti and Dickson2004), Category fluency (animals and supermarket items; Lehman, Reference Lehman1970), Clock Reading Test (CRT; Schmidtke & Olbrich, Reference Schmidtke and Olbrich2007), and Interlocking Finger Test (ILFT; Moo, Reference Moo2003).
RUDAS is a cross-cultural cognitive screening tool for dementia. It takes approximately 10 minutes to administer and comprises subtasks covering six different domains: episodic memory, body orientation, visuo-spatial construction, practical coordination, judgement, and language function. Scores range from 0–30 (Storey et al., Reference Storey, Rowland, Conforti and Dickson2004).
In Category fluency, the participant is instructed to name as many words as possible belonging to a specific category within 60 seconds (Wright et al., Reference Wright, De Marco and Venneri2023). In this study, two versions were used: 1) animals, and 2) items found in a supermarket. The score is the number of correct words produced within 60 seconds.
CRT is a brief 12-item visuo-spatial test, where the participant is presented with 12 different clock faces with no digits. All clock faces show different times, and the task is to read and report the time. The score range is 0–12 points (Schmidtke & Olbrich, Reference Schmidtke and Olbrich2007).
In the ILFT, the participant is shown four non-symbolic hand gestures that are to be imitated. The score range is 0–4 points (Moo, Reference Moo2003).
Also, to measure acculturation in participants with immigrant backgrounds, the Brief Acculturation Scale (Norris et al., Reference Norris, Ford and Bova1996) was used, which is a four-item self-report measure focusing on language use. Each item is rated on a five-point Likert scale, with a total score ranging from 4–20 points. Higher scores indicate more acculturation towards the mainstream majority culture (Norris et al., Reference Norris, Ford and Bova1996).
Statistical analyses
To determine the significance of group differences on categorical variables, Pearson’s χ² test or Fisher’s exact test was used, while differences between groups on continuous variables were determined using analysis of variance (ANOVA). All group comparisons were pretested for homogeneity of variances, and when this assumption was not met, Welch’s ANOVA was used. Effect sizes were calculated as partial eta squared (η²), with η² = .01 considered a small effect, η² = .06 a medium effect, and η² = .14 a large effect. Spearman’s rank order correlations were used to determine correlations between performance on C-CLNT20 and NAME20, and other neuropsychological tests, with r = .00–.20 considered a negligible effect, r = .21–.40 a weak effect, r = .41–.60 a moderate effect, r = .61–.80 a strong effect, and r = .81–1.00 a very strong effect.
To assess diagnostic accuracy, receiver operating characteristic (ROC) analysis was conducted to compute area under the curve (AUC), sensitivity, specificity, and positive (LR+) and negative (LR–) likelihood ratios, using the consensus diagnosis provided by a team of experienced clinicians as the reference standard. In these analyses, patients with SCD were grouped with cognitively healthy participants to form a cognitively intact group because, by definition, individuals with SCD have no objective cognitive impairment (Jessen et al., Reference Jessen, Amariglio, van Boxtel, Breteler, Ceccaldi, Chételat, Dubois, Dufouil, Ellis, K., van der Flier, Glodzik, van Harten, de Leon, McHugh, Mielke, Molinuevo, Mosconi, Osorio and Perrotin2014). Youden’s index was used to determine the optimal cut-off values, that is, the values maximizing the sum of sensitivity and specificity. AUCs were compared using the DeLong method (DeLong et al., Reference DeLong, DeLong and Clarke-Pearson1988). Binary logistic regression analyses were used to determine the influence of demographic and cultural variables on classification accuracy. Nagelkerke R² was reported as a measure of the explained variance in diagnostic group status. All analyses were conducted in IBM SPSS Statistics version 29.0.2.0 or with clinical calculators from VassarStats.com. Statistical significance was set at p < .05 (two-tailed).
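The cut-off selection described above can be illustrated with a minimal Python sketch. It is not the authors' procedure (the analyses were run in SPSS), the function names are hypothetical, and the scores are invented for illustration only. The sketch assumes that a score at or below the cut-off counts as a positive test, in line with the "≤" cut-offs reported in the Results, and that both diagnostic groups are non-empty.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    scores: Sequence[int], has_condition: Sequence[bool], cutoff: int
) -> Tuple[float, float]:
    """Sensitivity and specificity when a score <= cutoff counts as a positive test."""
    pairs = list(zip(scores, has_condition))
    tp = sum(1 for s, d in pairs if d and s <= cutoff)
    fn = sum(1 for s, d in pairs if d and s > cutoff)
    tn = sum(1 for s, d in pairs if not d and s > cutoff)
    fp = sum(1 for s, d in pairs if not d and s <= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def youden_optimal_cutoff(
    scores: Sequence[int], has_condition: Sequence[bool]
) -> Tuple[int, float]:
    """Return the cut-off with the largest Youden's J = sensitivity + specificity - 1."""
    best_cutoff, best_j = -1, -1.0
    for cutoff in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, has_condition, cutoff)
        j = sens + spec - 1
        if j > best_j:  # ties keep the lower cut-off
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical 20-item naming scores, NOT study data.
scores = [20, 20, 19, 20, 19, 18, 20, 17, 18, 15, 19, 16]
has_dementia = [False] * 7 + [True] * 5

cutoff, j = youden_optimal_cutoff(scores, has_dementia)
print(f"optimal cut-off: <= {cutoff}, Youden's J = {j:.2f}")
# prints "optimal cut-off: <= 18, Youden's J = 0.66"
```

In this toy example, lowering the cut-off from 19 to 18 loses one true positive but gains two true negatives, which is why J peaks at 18.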
Results
Participant characteristics
A total of 192 participants were recruited for the study. Of these, 22 memory clinic patients were excluded due to being diagnosed with a primary affective disorder, and nine cognitively intact participants were excluded, seven due to scoring <23/30 points on the RUDAS, and two due to scoring >5/15 points on GDS-5/15. The final sample consisted of 161 participants (see Table 1), representing 36 different countries of origin and 30 different languages. A total of 56.5% (n = 91) of the sample had immigrant background, and among them seven originated from Europe, 31 from the Middle East, 13 from Africa, 19 from Asia, and 21 from Latin America.
Table 1. Participant characteristics and test performance (n = 161)

Note: BAS = Brief Acculturation Scale, C-CLNT20 = Copenhagen Cross-Linguistic Naming Test (20 items), CRT = Clock Reading Test, ILFT = Interlocking Finger Test, MCI = Mild Cognitive Impairment, NAME20 = Naming Assessment in Multicultural Europe (20 items), RUDAS = Rowland Universal Dementia Assessment Scale.
* Group comparison is only based on participants with immigrant background.
Among the 86 memory clinic patients, 53 were diagnosed with dementia: 32 AD, four VaD, three mixed dementia (AD/VaD), four frontotemporal dementia (FTD; including two behavioral variant FTD and two primary progressive aphasia [PPA]), one dementia with Lewy bodies (DLB), two other specified dementia (normal pressure hydrocephalus and HIV-associated neurocognitive disorder), and seven unspecified dementia cases. Furthermore, 33 were diagnosed with MCI, including 27 with amnestic MCI and six with non-amnestic MCI.
There were no significant differences in distribution of sex or years of education across the diagnostic groups, but the cognitively intact group was younger (F(2, 157) = 13.22, p < .001) and included a larger proportion of participants with immigrant background (χ²(2, n = 161) = 12.77, p = .002). Analyses further showed significant differences on all neuropsychological tests across diagnostic groups: C-CLNT20 (Welch’s F(2, 70.56) = 18.92, p < .001, η² = .23), NAME20 (Welch’s F(2, 84.87) = 19.02, p < .001, η² = .22), RUDAS (Welch’s F(2, 63.89) = 39.01, p < .001, η² = .37), Category fluency (animals) (Welch’s F(2, 84.46) = 44.22, p < .001, η² = .34), Category fluency (supermarket) (Welch’s F(2, 87.47) = 40.72, p < .001, η² = .33), CRT (Welch’s F(2, 74.95) = 24.28, p < .001, η² = .3) and ILFT (F(2, 158) = 13.87, p < .001, η² = .15). However, no significant differences were found between patients with SCD and cognitively healthy participants on any of the neuropsychological tests.
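For a conventional one-way ANOVA, partial eta squared can be recovered from the F statistic and its degrees of freedom as (F × df1) / (F × df1 + df2). The Python sketch below applies this to the ILFT comparison, the only test above analyzed with a standard rather than a Welch's ANOVA; the identity is not assumed to hold exactly for the Welch statistics. The function names are hypothetical, and the "negligible" label for values below .01 is an assumption, since the Statistical analyses section only names small, medium, and large effects.

```python
def partial_eta_squared(f_value: float, df_effect: float, df_error: float) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def effect_size_label(eta_sq: float) -> str:
    """Label an effect using the benchmarks stated under Statistical analyses."""
    if eta_sq >= 0.14:
        return "large"
    if eta_sq >= 0.06:
        return "medium"
    if eta_sq >= 0.01:
        return "small"
    return "negligible"  # assumption: the paper does not name this band


# ILFT group comparison as reported: F(2, 158) = 13.87
ilft_eta_sq = partial_eta_squared(13.87, 2, 158)
print(round(ilft_eta_sq, 2), effect_size_label(ilft_eta_sq))  # prints "0.15 large"
```

The recovered value agrees with the effect size reported for the ILFT.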
When comparing participants with immigrant background to European native-born participants, participants with immigrant background were significantly younger (67.6 ± 7.5 years vs. 77.5 ± 5.7 years; Welch’s F(1, 157.83) = 89.58, p < .001), but there were no significant differences in years of education, distribution of sex, or performance on C-CLNT20 and NAME20. Out of the 91 participants with immigrant background, 31 were assessed with help from an interpreter, with no significant differences in interpreter use across diagnostic groups.
Construct validity of the abbreviated versions
Correlation analyses showed that C-CLNT20 and NAME20 were strongly correlated with each other (r = .67, p < .001). C-CLNT20 was also moderately correlated with RUDAS (r = .43, p < .001) and category fluency (animals: r = .41, p < .001), weakly to moderately correlated with category fluency (supermarket items: r = .34, p < .001) and CRT (r = .39, p < .001), and weakly correlated with ILFT (r = .25, p = .001). NAME20 correlated moderately to strongly with category fluency (animals) (r = .60, p < .001), moderately with RUDAS (r = .46, p < .001), category fluency (supermarket items) (r = .45, p < .001) and CRT (r = .51, p < .001), and weakly with ILFT (r = .25, p = .001).
Diagnostic accuracy
AUCs were .75 for C-CLNT20 and .82 for NAME20 in discriminating patients with dementia from other diagnostic groups (cognitively intact + MCI). This difference in AUC values was not statistically significant (z = −1.77, p = .076) (see Figure 2). With optimal cut-off values at ≤18/20 for both tests, sensitivity and specificity were .72 and .71 for C-CLNT20 and .76 and .80 for NAME20. Prediction models adjusting for age, sex, years of education, and immigrant status (see below) yielded slightly lower AUCs of .73 for C-CLNT20 and .79 for NAME20. In a sub-comparison between patients with dementia and cognitively intact participants only, both tests demonstrated slightly higher diagnostic accuracy, with marginally higher AUC and increased specificity (see Table 2). When using the tests to discriminate between patients with MCI and cognitively intact participants, the AUC for C-CLNT20 was .64 and the AUC for NAME20 was .62, which again was not a significant difference between the two naming tests (z = .37, p = .712). With optimal cut-off values at ≤19/20 for both tests for detecting MCI, sensitivity and specificity were .73 and .51 for C-CLNT20 and .55 and .67 for NAME20.
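Likelihood ratios follow directly from sensitivity and specificity: LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity. The short Python sketch below applies these standard definitions to the rounded values given in the text for dementia versus cognitively intact + MCI. It is illustrative only; the function name is hypothetical, and because the inputs are rounded to two decimals, the results may differ slightly from the likelihood ratios in Table 2.

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Positive and negative likelihood ratios from sensitivity and specificity."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# Rounded (sensitivity, specificity) pairs as reported in the text at cut-off <= 18/20
reported = {"C-CLNT20": (0.72, 0.71), "NAME20": (0.76, 0.80)}

for test_name, (sens, spec) in reported.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    youden_j = sens + spec - 1
    print(f"{test_name}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}, J = {youden_j:.2f}")
```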

Figure 2. ROC-curves for C-CLNT20 and NAME20 for dementia.
Table 2. Diagnostic accuracy

Note: AUC = Area under the curve, C-CLNT20 = Copenhagen Cross-Linguistic Naming Test (20 items), NAME20 = Naming Assessment in Multicultural Europe (20 items), +LR = positive likelihood ratio, −LR = negative likelihood ratio.
* Optimal cut-off scores were based on Youden’s J.
In a subsample consisting only of participants with immigrant background (n = 91), AUCs for C-CLNT20 and NAME20 were .81 (95% CI: .71–.91) and .86 (95% CI: .78–.95), respectively, in discriminating dementia from other diagnostic groups (cognitively intact + MCI), which was not a significant difference (z = –1.23, p = .220). In a subsample of European native-born participants (n = 70), AUCs were .70 (95% CI: .56–.83) and .77 (95% CI: .66–.89) for C-CLNT20 and NAME20, respectively, which was also not a significant difference (z = −1.2, p = .231). Also, AUC values were not significantly different between participants with and without immigrant background on either of the tests (C-CLNT20: z = 1.25, p = .213; NAME20: z = 1.15, p = .252).
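The comparison of AUCs between two independent subsamples can be approximated from the reported confidence intervals. The Python sketch below is an assumption-laden illustration, not the method used in the study, which is not specified for this particular comparison: it back-calculates each standard error from the 95% CI, treating the interval as symmetric, and forms z = (AUC1 − AUC2) / √(SE1² + SE2²). The function names are hypothetical, and because the inputs are rounded, the output should be expected to come close to, but not reproduce exactly, the z and p values reported above.

```python
import math
from typing import Tuple


def se_from_ci(lower: float, upper: float, z_crit: float = 1.96) -> float:
    """Approximate standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * z_crit)


def compare_independent_aucs(
    auc_a: float, ci_a: Tuple[float, float], auc_b: float, ci_b: Tuple[float, float]
) -> Tuple[float, float]:
    """z statistic and two-tailed p for the difference between two independent AUCs."""
    se_a, se_b = se_from_ci(*ci_a), se_from_ci(*ci_b)
    z = (auc_a - auc_b) / math.sqrt(se_a**2 + se_b**2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed p under the standard normal
    return z, p


# C-CLNT20: immigrant background vs. European native-born, values as reported
z, p = compare_independent_aucs(0.81, (0.71, 0.91), 0.70, (0.56, 0.83))
print(f"z = {z:.2f}, p = {p:.3f}")
```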
Influence of demographic and cultural factors on diagnostic accuracy
Binary logistic regression analyses were conducted to determine the effects of demographic and cultural variables on the diagnostic accuracy of C-CLNT20 and NAME20 (Tables 3 and 4). The model including C-CLNT20, age, sex, years of education, and immigrant status as covariates explained 36.2% (Nagelkerke R²) of the variance in group status (dementia vs. cognitively intact + MCI) and correctly classified 80.5% of all cases. Lower C-CLNT20 score, older age, and female sex were significant predictors of dementia, while years of education and immigrant status were not. The model including NAME20 and the same covariates explained 36.6% of the variance in group status and correctly classified 79.9% of all cases. In this model, lower NAME20 score and older age were significant predictors, while there was a trend for female sex (p = .056).
Table 3. Logistic regression analyses for diagnosis of dementia in the full sample (n = 161)

Note: P.E. = parameter estimate (B-value), S.E. = standard error, OR = odds ratio.
Table 4. Logistic regression analyses for diagnosis of dementia in participants with immigrant background (n = 90)

Note: P.E. = parameter estimate (B-value), S.E. = standard error, OR = odds ratio.
The regression analyses were repeated in a subsample of participants with immigrant background only. The model with C-CLNT20, age, sex, years of education, and acculturation as covariates explained 54.7% of the variance in group status and correctly classified 88.4% of all cases. Significant predictors were C-CLNT20 score (B = –0.9, p < .001, OR: 0.41 [95% CI: 0.26–0.64]), age (B = 0.13, p = .01, OR: 1.14 [95% CI: 1.03–1.26]) and years of education (B = 0.14, p = .035, OR: 1.15 [95% CI: 1.01–1.32]). The model including NAME20 and the same covariates explained 54.9% of the variance in group status and correctly classified 86% of all cases. Significant predictors in this model were NAME20 score (B = −0.96, p < .001, OR: 0.38 [95% CI: 0.23–0.63]) and age (B = 0.11, p = .022, OR: 1.11 [95% CI: 1.02–1.22]). Adding the use of an interpreter to the regression analyses did not significantly influence the diagnostic performance of either C-CLNT20 (p = .191) or NAME20 (p = .948).
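In logistic regression, the odds ratio for a predictor is the exponential of its coefficient, OR = exp(B). The Python sketch below checks this relationship for the coefficients reported above; it is illustrative only, the dictionary keys are descriptive labels rather than variable names from the study, and small discrepancies are expected because the B-values are rounded to two decimals.

```python
import math

# (B, reported OR) pairs taken from the text for the immigrant-background subsample
reported = {
    "C-CLNT20 score": (-0.90, 0.41),
    "Age (C-CLNT20 model)": (0.13, 1.14),
    "Years of education (C-CLNT20 model)": (0.14, 1.15),
    "NAME20 score": (-0.96, 0.38),
    "Age (NAME20 model)": (0.11, 1.11),
}


def odds_ratio(b: float) -> float:
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(b)


for predictor, (b, reported_or) in reported.items():
    print(f"{predictor}: exp({b:+.2f}) = {odds_ratio(b):.2f} (reported {reported_or:.2f})")
```

Read this way, an odds ratio of 0.41 for the C-CLNT20 score means that each additional correctly named item multiplies the odds of a dementia classification by roughly 0.41, with the other covariates held constant.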
Discussion
This study presents a head-to-head comparison of the diagnostic accuracy and cross-cultural applicability of abbreviated 20-item versions of the C-CLNT and NAME. Overall, both tests demonstrated moderate to high diagnostic accuracy for dementia, limited accuracy for MCI, and minimal bias related to cultural and demographic factors. Construct validity of both tests was supported by moderate to strong correlations with other language measures (Category fluency) and weaker correlations with visual measures (ILFT and CRT), indicating good convergent and acceptable divergent validity. Supporting our first hypothesis, ROC curve analyses showed similar AUCs of .75 for C-CLNT20 and .82 for NAME20 for classifying dementia, with acceptable levels of sensitivity and specificity for both tests at their optimal cut-offs of ≤18/20. Diagnostic accuracy for MCI was lower (AUC = .64 and .62, respectively), suggesting that the two confrontation naming tests are not sufficiently sensitive to MCI. Diagnostic accuracy did not significantly differ between European native-born and immigrant participants. Regarding the influence of demographic and cultural variables, diagnostic accuracy of C-CLNT20 and NAME20 in the full sample was influenced by age and sex, and in a sub-analysis restricted to participants with immigrant background, C-CLNT20 was additionally influenced by education. Notably, however, in support of our second hypothesis, neither NAME20 nor C-CLNT20 was influenced by immigrant status, acculturation, or administration with an interpreter, indicating little cultural and language bias.
While the C-CLNT20 and NAME20 offer efficiency, abbreviating a test can affect its psychometric properties. The full versions of C-CLNT and NAME were not included in the present study, which limits the possibility of direct comparisons with their abbreviated counterparts. However, Nielsen et al. (Reference Nielsen, Grollenberg, Ringkøbing, Özden, Weekes and Waldemar2023) conducted a direct comparison of C-CLNT and C-CLNT20 in the original validation sample and found only a minimal reduction in AUC (from .80 to .78), suggesting that C-CLNT20 retained much of the original test’s diagnostic value. No such comparison between NAME and NAME20 has been conducted within the same sample or using the same methodology. Future research should more systematically investigate differences in diagnostic accuracy between full and abbreviated versions within the same samples to better understand potential trade-offs between test length and performance. Differences in sample composition and methodology also apply when comparing the present findings to other studies on abbreviated naming tests, limiting the strength of cross-study comparisons. Nonetheless, with these limitations in mind, the findings of this study are generally consistent with previous research on abbreviated naming tests. A 24-item version of the MINT demonstrated very similar diagnostic accuracy (AUC = .81; sensitivity = .91; specificity = .59) (Vélez-Uribe et al., Reference Vélez-Uribe, Rosselli, Newman, Gonzalez, Gonzalez Pineiro, Barker, Marsiske, Fiala, Lang, Conniff, Ahne, Goytizolo, Loewenstein, Curiel and Duara2024). Regarding the gold standard BNT, Li et al. (Reference Li, Zeng, Neugroschl, Aloysi, Zhu, Xu, Teresi, Ocepek-Welikson, Ramirez, Joseph, Cai, Grossman, Martin, Sewell, Loizos and Sano2022) reported an AUC of .78 for a 30-item version, while Katsumata et al. (Reference Katsumata, Mathews, Abner, Jicha, Caban-Holt, Smith, Nelson, Kryscio, Schmitt and Fardo2015) found AUCs ranging from .85 to .92 for various 15-item versions of the BNT. Notably, the relatively high AUCs reported by Katsumata et al. (Reference Katsumata, Mathews, Abner, Jicha, Caban-Holt, Smith, Nelson, Kryscio, Schmitt and Fardo2015) may reflect that the study used a very demographically homogeneous sample (i.e., 93% White participants with a mean of 16 years of education), which may have inflated diagnostic accuracy. In more diverse populations, AUCs for a 15-item version of the BNT have been reported as low as .59 (Nielsen et al., Reference Nielsen, Grollenberg, Ringkøbing, Özden, Weekes and Waldemar2023), suggesting that diagnostic accuracy of the BNT is lower in more demographically and culturally heterogeneous samples. In this context, both C-CLNT20 and NAME20 appear to perform comparably, or even favorably, relative to other abbreviated confrontation naming tests.
Both C-CLNT20 and NAME20 demonstrated limited diagnostic accuracy for identifying MCI, with AUCs of .64 and .62, respectively. This corresponds with results from Nielsen et al. (2023) on the full-length C-CLNT, which had an AUC of .53 for MCI. The AUC for MCI patients was not formally examined for the original NAME (Franzen et al., 2023). However, the original study reports medians in the MCI group that were closer to those of control participants than to those of patients with AD/mixed dementia, albeit with notable variation in this group. These results regarding detection of MCI are consistent with findings from other abbreviated naming tests. Short forms of the BNT have shown AUCs between .58 and .70 (Katsumata et al., 2015; Li et al., 2022), and the 24-item MINT showed similar performance (AUC = .60), with acceptable sensitivity (.79) but very low specificity (.35) (Vélez-Uribe et al., 2024). The limited accuracy across naming tests may reflect that anomia is not a core symptom of MCI, and that a significant proportion of individuals with MCI do not exhibit any anomia (Joubert et al., 2010).
Additionally, the present study’s MCI group included both amnestic and non-amnestic subtypes, unlike earlier studies that focused almost solely on amnestic MCI (Katsumata et al., 2015; D. Li et al., 2022; Vélez-Uribe et al., 2024). Since naming impairments are more common in the amnestic subtype (Liampas et al., 2023), these broader inclusion criteria for MCI may have slightly reduced classification performance in this sample compared to other samples. These findings indicate that brief naming tests, such as the C-CLNT20 and NAME20, may have limited utility in detecting naming deficits in individuals with MCI. Prior research suggests that naming tests incorporating items with a stronger semantic load, such as names of famous people or culturally significant landmarks, may offer greater sensitivity in identifying such impairments (Vogel et al., 2014). However, because semantic knowledge is inherently culture-specific, the inclusion of these items poses challenges for test adaptation and validity in cross-cultural contexts.
Lastly, C-CLNT20 and NAME20, like their full-length counterparts, demonstrated minimal cultural bias. Unlike other naming tests, their performance was not significantly influenced by immigrant status, level of acculturation, or use of an interpreter. In contrast, previous studies have shown that MINT-24 was affected by education, and the full MINT was also influenced by cultural group (Vélez-Uribe et al., 2024). Similarly, performance on the BNT has been shown to vary with education level, acculturation, and immigration background, among other factors (Boone et al., 2007; Nussbaum et al., 2022; Shaikh et al., 2025). Compared to these findings, both C-CLNT20 and NAME20 appear to be more culturally fair.
Taken together, the head-to-head comparison between C-CLNT20 and NAME20 showed no statistically significant differences in diagnostic accuracy between the two tests across all subgroup comparisons. However, NAME20 consistently showed slight advantages in classifying dementia, with higher sensitivity, specificity, and AUC values. Subtle differences in diagnostic performance may partly stem from the different approaches to item selection. NAME20 items were selected psychometrically as the 20 items that best discriminated patients with dementia from cognitively healthy controls, and include colored photographs of objects, natural phenomena, animals, body parts, colors, occupations, and actions. In contrast, C-CLNT20 selectively included colored drawings of objects, as some literature suggests noun naming may be more affected than verb naming in Alzheimer’s disease (Williamson et al., 1998). These differences in item selection may have affected diagnostic performance of NAME20 and C-CLNT20, as some studies indicate that deficits in noun and verb naming vary based on affected brain regions and specific dementia subtypes (Hillis et al., 2004; Pisoni et al., 2018). For instance, one study found that patients with FTD (including non-fluent PPA) had greater difficulty with verb naming than with noun naming (Hillis et al., 2004). Thus, the differences in diagnostic performance may partly be due to NAME20 being more sensitive to naming impairment associated with FTD. However, this needs to be established in future research. Ideally, such research should examine C-CLNT20 and NAME20 performance separately across well-defined dementia syndromes in larger samples, including patients with clinically documented anomia.
It would also be interesting to examine whether incorrect responses on these tests reflect naming impairment, or whether some errors may be due to impairments in other cognitive functions (e.g., gnosis).
A key strength of this study was the direct head-to-head comparison of C-CLNT20 and NAME20 within the same multicultural sample, reducing methodological disparities and enhancing the validity of comparative findings. Recruitment across five countries and inclusion of 36 different nationalities and 30 languages further supported the generalizability of results across countries and cultural contexts. However, the small sample sizes from some countries (four from France and one from the UK) represent a limitation. Another limitation is the lack of accurate matching across diagnostic groups on variables such as age and immigrant status. Although we tried to correct for this in the analyses, we cannot rule out that it exacerbated some group differences and influenced diagnostic performance of the tests. Furthermore, our dementia and MCI samples were too small to analyze the C-CLNT20 and NAME20 across specific dementia and MCI subtypes. Additionally, while the findings supported construct validity through correlations with category fluency tests, the lack of well-established confrontation naming tests for multicultural populations, including the BNT and MINT, complicates this type of research and limits direct comparability with other studies using these measures. Finally, although interpreter assistance was generally provided when necessary, access to interpreter services and the quality of professional interpreter training vary considerably across European countries (Nielsen et al., 2024). At most participating sites, neuropsychologists collaborated with interpreters who lacked specific training in cognitive assessment. Nonetheless, the study served to cross-validate parts of C-CLNT and NAME in a new, large, and diverse sample, strengthening the findings reported by Franzen et al. (2023) and Nielsen et al. (2023).
In conclusion, this study supports the validity of C-CLNT20 and NAME20 for assessing anomia in patients with dementia in multicultural populations. While both tests showed acceptable diagnostic accuracy for dementia, their sensitivity for MCI was limited, likely due to subtle or absent naming deficits in these patients and potential ceiling effects. In this study, NAME20 showed slight but consistent advantages over C-CLNT20. However, these findings should be confirmed by additional studies before any recommendations are made regarding the choice between the two tests. Both NAME20 and C-CLNT20 appear to be valid time-saving alternatives to their full-length versions and represent meaningful progress towards cross-cultural naming tests.
Funding statement
This research was supported by THE VELUX FOUNDATIONS (grant number 00042578), which had no role in the formulation of research questions, choice of study design, data collection, data analysis or decision to publish. The Danish Dementia Research Centre is supported by the Danish Ministry of Health. Sanne Franzen is supported by grants from the Netherlands Organisation for Health Research and Development (#733050834 and #10510032120004). She also received consulting fees from Biogen in 2022 (unrelated to this work) and receives royalties on the Dutch version of the Five Digit Test and the modified Visual Association Test (published by Hogrefe).
Competing interests
T. Rune Nielsen and Maria Özden are coauthors on the original C-CLNT validation paper and Sanne Franzen is the main author on the original NAME validation paper.



