Statement of Research Significance
Research Question(s) or Topic(s): The study’s aim was to translate and adapt an emotion recognition test originally developed in French to Dutch, Moroccan-Arabic, and Turkish, and to report the procedures and challenges encountered during the translation and adaptation process. These adapted versions were then piloted among Dutch, Moroccan, Turkish, and Surinamese healthy participants and patients, and a cross-national comparison was conducted. Main Findings: This study’s main finding demonstrates the particular challenges of translating and adapting tests of social cognition, given cultural differences in normative ways to express emotion. Study Contributions: This study highlights the linguistic and cultural challenges involved in adapting neuropsychological tools. It shows that translation alone is insufficient; cultural meaning and nuance must be considered to ensure fair assessment. It also demonstrates that the TIE-93’s clinical potential merits further investigation.
Introduction
Despite rising demand, culturally appropriate neuropsychological tests, especially for social cognition domains like emotion recognition, remain limited (Bourdage et al., 2024; Franzen et al., 2021; Li et al., 2025). Globally, the number of elderly migrants nearly doubled between 1990 and 2020, rising from 25.5 to 48.2 million (World Health Organization, 2024). Compounding this, aging migrants face a higher risk of developing dementia, potentially due to increased exposure to “modifiable” risk factors (Livingston et al., 2024; Schmachtenberg et al., 2020; World Health Organization, 2021, 2024). In the Netherlands, individuals with a migration background represent 27.8% of the population, with older adults of Turkish, Moroccan, and Surinamese origin (see Textbox 1) forming the largest groups of first-generation immigrants (European Commission, 2024, 2025).
Textbox 1. The Surinamese population in the Netherlands
The Surinamese migrant population in the Netherlands occupies a distinctive linguistic and cultural junction. Suriname, located in South America, is a former Dutch colony and has had Dutch as its official language since 1667 (Diepeveen & Hüning, 2016). However, Suriname is a multiethnic society comprising several major population groups, including Hindustani, Maroon, Creole, and Javanese communities, whose distinct historical trajectories and cultural traditions contribute to the country’s pronounced linguistic and cultural diversity (Stell, 2018). The Dutch spoken in Suriname, therefore, has been influenced by the country’s multilingual and multicultural histories (Borges, 2014). For example, the populations of Suriname speak over 20 languages (e.g., Surinamese Javanese, Kari’na; Carlin et al., 2015). Accordingly, when many Surinamese people migrated to the Netherlands prior to Suriname’s independence in 1975 (Van Amersfoort, 1984), they brought with them forms of Dutch that are similar to, yet distinct from, native Dutch, and contributed a unique cultural presence to Dutch cities (Sansone, 1994).
In the Netherlands, dementia prevalence among Turkish, Moroccan-Arabic, and some Surinamese communities is significantly higher (14.8%, 12.2%, and 12.6%, respectively) than in the native Dutch population (approximately 3.5%; Parlevliet et al., 2016). Consequently, clinicians are increasingly assessing diverse populations (meaning culturally, linguistically, and educationally diverse; Franzen et al., 2022; Nielsen, 2022). Therefore, accurately assessing social cognition is crucial for diagnosing and differentiating stages of Alzheimer’s disease (AD) dementia from behavioral variant frontotemporal dementia (bvFTD; Barker et al., 2022; Bediou et al., 2012; Bourdage et al., 2024; Franzen et al., 2020; Strijkert et al., 2022).
Despite its clinical relevance, social cognition remains difficult to assess in diverse groups, such as the migrant population of the Netherlands, largely due to a lack of appropriately adapted tools (Bourdage et al., 2024; Cerami et al., 2025; Franzen et al., 2021). Most neuropsychological assessments, including emotion recognition tests, were developed in a “Global North” context shaped by sociopolitical stability and high living standards (Alladi & Hachinski, 2018; Bourdage et al., 2024). These tools therefore often embed cultural and educational biases, disadvantaging individuals from the “Global South” (i.e., countries with historically less access to resources and more sociopolitical instability; Alladi & Hachinski, 2018; Ardila, 1996; Daugherty et al., 2017; Fernández & Evans, 2022; Fujii, 2018).
As highlighted in our systematic review (Bourdage et al., 2024), some progress has been made toward addressing this resource gap. Overall, translations and adaptations were the most frequently used methods for developing such tests, yet these studies received the lowest quality ratings. These findings underscore the need for more rigorous and transparently documented adaptation practices to ensure validity across diverse cultural groups. They may also reflect the difficulty of actually achieving conceptual equivalence or validity across cultures. Jackson et al. (2019) created a lexical map of 2,474 languages outlining cultural differences in the use of emotional language. Their results demonstrated that the meanings of emotion terms differ across languages, even when they are treated as equivalents in translation dictionaries. For example, in Tai-Kadai languages, “anxiety” was most closely linked to “fear,” whereas in Austroasiatic languages, it was more commonly associated with “grief” and “regret.” Similarly, “anger” was connected to “envy” in Nakh-Daghestanian languages but was more often tied to “hate,” “bad,” and “proud” in Austronesian languages. Therefore, although Surinamese migrants in the Netherlands often speak Dutch as a native language, it cannot be assumed that they will understand emotional concepts translated into traditional Dutch in the same way as native Dutch populations, given the influence of the other historical languages and cultures of Suriname. Yet finding conceptual equivalence is essential, as cultural differences in the language and understanding of emotions seem to affect their recognition.
This is summarized in a review by Lindquist (2021), which synthesizes cross-cultural and developmental evidence demonstrating that emotion recognition is shaped by the availability and use of emotion-related vocabulary. This suggests that language guides how individuals perceive, categorize, and make meaning of facial emotional expressions.
Therefore, this study had two aims, addressed in Part I and Part II. Part I covered the primary aim, which was to examine and qualitatively report the process and the challenges of achieving conceptual equivalence in the adaptation of the Test d’Identification des Émotions Faciales (TIE-93), originally developed in French, to Dutch, Turkish, and Moroccan-Arabic for the populations of the Netherlands. The TIE-93 is an emotion recognition test adapted from Ekman’s paradigm (Ekman, 1970) and tailored for diverse groups. For full test construction considerations, see Bourdage et al. (2025a). In Part II of the study, the pilot-testing component, we sought to explore how Dutch, Surinamese, Moroccan, and Turkish participants might perform on the translated and adapted versions of the TIE-93. Performance among healthy controls across cultural groups was compared, and Turkish, Moroccan, and Surinamese healthy controls were also compared with corresponding patient groups to examine whether the TIE-93 could distinguish healthy controls from patients with subjective cognitive decline, mild cognitive impairment, or dementia. Finally, to explore the potential impact of language on test performance, we compared Moroccan healthy controls from France, who completed the TIE-93 in French (Bourdage et al., 2025a), with Moroccan healthy controls from the Netherlands, who completed the test in their native language, as this was the only group well represented in both countries.
Methods
Participants
Fifty-eight healthy controls and 20 patients were recruited to participate in this study. Of the healthy controls, 16 were Turkish (28% of the sample), 15 were Surinamese (26% of the sample), 14 were Moroccan (24% of the sample), and 13 were Dutch (with non-mixed heritages; 22% of the sample). Of the patient group, half were Turkish (n = 10; 50% of the sample), eight were Moroccan (40% of the sample), and two were from Suriname (10% of the sample). All controls were recruited by research assistants or students affiliated with the Erasmus MC University Medical Center in Rotterdam. Students visited local community centers and utilized snowball sampling for recruitment. Although this may have introduced bias in our study, these methods of recruitment are considered good practice when attempting to reach communities that are not reached by traditional recruitment methods (Pardhan et al., 2025), such as those with low (health) literacy levels or those with limited trust in medical research. All Moroccan, Surinamese, and Turkish participants recruited for this study were first-generation immigrants, with a median year of immigration of 1979 (IQR = 12.25) for healthy controls and a median of 1978 (IQR = 12) for patients. The Moroccan healthy controls from France were recruited as part of our previous validation study by students using the same methods (for more information, see Bourdage et al., 2025a).
All patients were recruited from the memory clinic in the Alzheimer Center department of the Erasmus MC and lived in the Netherlands. All patients received a clinical diagnosis after a multidisciplinary team reviewed all collected data, which included a clinical interview, neuropsychological evaluation, and, if applicable, other diagnostic information such as neuroimaging data, blood-based and cerebrospinal fluid biomarkers, genetic testing, and/or assessments of functional impairment by an occupational therapist. Diagnoses were made using current research criteria for dementia subtypes, including those for AD (McKhann et al., 2011), mild cognitive impairment (Albert et al., 2011), subjective cognitive decline (Jessen et al., 2020), Parkinson’s disease (Postuma et al., 2015), and bvFTD (Rascovsky et al., 2011). In total, there were nine patients with subjective cognitive decline, three with mild cognitive impairment, three with vascular dementia, three with frontotemporal lobar degeneration (including one patient with semantic variant primary progressive aphasia and one with corticobasal syndrome), one patient with AD dementia, and one patient with Parkinson’s disease dementia.
Study procedure
Controls who reported no history of neurological or psychiatric challenges underwent several experimental tests, including the TIE-93. Healthy controls also completed the RUDAS (Rowland Universal Dementia Assessment Scale; Storey et al., 2004) and GDS-15 (Geriatric Depression Scale; Sheikh & Yesavage, 1986) as global cognitive and depression screeners, respectively. Participants scoring ≤22 on the RUDAS (n = 2) or ≥5 on the GDS-15 (n = 5) were excluded. Moroccan and Turkish healthy controls also completed an adapted version of the Brief Acculturation Scale for Hispanics (BASH; Norris et al., 1996). The BASH is a language-based proxy for acculturation (defined as the complex process by which attitudes, values, and behaviors of an individual’s culture of origin are modified or influenced as a result of contact with a different culture; Al-Jawahiri & Nielsen, 2020); it measures someone’s use of the host language (in this case Dutch) versus their native language in different social contexts. As all Surinamese healthy controls spoke Dutch, they were not asked to complete the BASH, as it likely would not serve as an appropriate indicator of acculturation in this specific group. All patients underwent a multicultural memory clinic assessment protocol, which included the Cross-Cultural Dementia Screening (Goudsmit et al., 2017) and the Naming Assessment in Multicultural Europe (Franzen et al., 2023), along with the appropriately translated or adapted version of the TIE-93.
One patient was excluded from the study due to an incomplete TIE-93; the neuropsychologist reported that they ran out of time to administer this test during the neuropsychological assessment.
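The screening cut-offs for healthy controls described above can be written as a simple rule. The following is a minimal sketch only; the function and constant names are hypothetical and are not taken from the study materials.

```python
# Cut-offs as stated in the text; the names are illustrative assumptions.
RUDAS_CUTOFF = 22  # scores at or below this value led to exclusion
GDS15_CUTOFF = 5   # scores at or above this value led to exclusion


def is_eligible_control(rudas, gds15):
    """Return True when a healthy control passes both screeners."""
    return rudas > RUDAS_CUTOFF and gds15 < GDS15_CUTOFF


print(is_eligible_control(rudas=27, gds15=2))  # True
print(is_eligible_control(rudas=22, gds15=2))  # False
```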
For all participants, education was measured using the Dutch Verhage scale. The Verhage scale is a classification system, with 0 indicating no education and 7 indicating academic university education (Verhage, 1964). Participants were excluded from the study if they self-reported visual impairments or comprehension difficulties, or if such difficulties became apparent during the assessment. All participants completed the TIE-93 in their native language; Surinamese participants completed the TIE-93 in Dutch. Healthy controls gave their informed consent to participate in the study, which followed the ethical guidelines of the Declaration of Helsinki and was approved by the Erasmus MC University Medical Center Research Ethics Committee (METC number: 2022-0774). For patients, informed consent for use of clinical data was waived by the Erasmus MC University Medical Center Research Ethics Committee.
The TIE-93 test description and administration
TIE-93 stimuli
The stimuli used in the TIE-93 consist of photographs of actors sourced from the F.A.C.E.S. database, which includes an equal number of “Caucasian” male and female individuals across a range of ages (images used with permission from the authors; Ebner et al., 2010; see Figure 1). The photos of the actors, which are numbered and displayed simultaneously, depict six facial expressions: joy, disgust, anger, fear, neutrality, and sadness. Apart from the introductory example panel, the test comprises eight panels in total.
Figure 1. Example stimuli of the TIE-93. Consent was given by the authors of the F.A.C.E.S. database to include these images.

TIE-93 items
In each panel, the participant is asked to identify the image that corresponds to the emotion described by the test administrator using a given context – either by pointing to it or stating the number beneath the image. The six emotional contexts were selected for their potential cross-cultural relevance. In addition to the previously mentioned example, the scenarios include (translated from French): “He just bit into a rotten apple; he feels disgusted,” “He is calm; he feels nothing in particular,” “He is scared; he is very afraid,” “He is happy; he just won the lottery,” and “He is angry; he is upset.”
TIE-93 test administration
The test begins with the administrator providing a clear explanation of the procedure, using an example panel to illustrate the task. During this initial phase, the test-taker may ask questions, and the administrator is permitted to offer guidance or corrections to ensure full comprehension. Once the test-taker has demonstrated an understanding of the instructions, the actual testing phase begins. From that point on, the administrator must refrain from giving any feedback about the correctness of the responses. For healthy control participants, administration typically takes around 8 to 10 minutes. Each of the six emotions can yield a maximum of 8 correct responses, resulting in a total possible score of 48.
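The scoring rule described above (six emotions, eight test panels, maximum total of 48) can be illustrated with a minimal sketch. The data layout, a list of (emotion, is_correct) pairs, and the function name `score_tie93` are hypothetical and are not part of the published test materials.

```python
EMOTIONS = ("joy", "disgust", "anger", "fear", "neutral", "sadness")
N_PANELS = 8  # test panels, not counting the introductory example panel


def score_tie93(responses):
    """Return (sub_scores, total) for one participant.

    `responses` is a hypothetical layout: an iterable of (emotion, is_correct)
    pairs, one pair per item, i.e. six pairs for each of the eight panels.
    """
    sub_scores = {emotion: 0 for emotion in EMOTIONS}
    for emotion, is_correct in responses:
        if emotion not in sub_scores:
            raise ValueError(f"unknown emotion label: {emotion!r}")
        sub_scores[emotion] += 1 if is_correct else 0
    for emotion, score in sub_scores.items():
        if score > N_PANELS:
            raise ValueError(f"more than {N_PANELS} correct items for {emotion}")
    return sub_scores, sum(sub_scores.values())


# Usage: a participant who answers every item correctly.
perfect = [(emotion, True) for _ in range(N_PANELS) for emotion in EMOTIONS]
sub_scores, total = score_tie93(perfect)
print(total)  # 48
```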
Part I – Translation and adaptation of the TIE-93 to Dutch, Turkish and Moroccan-Arabic
Translation procedure of the TIE-93
The Dutch, Moroccan-Arabic, and Turkish translations of the TIE-93 were made based on the International Test Commission Guidelines (International Test Commission, 2017); however, an expert panel was not included, as funding was not available to hire linguistic experts who were native speakers of the target languages. Forward and blind back-translation methods were used (Bracken & Barona, 1991; Cha et al., 2007). For the forward translation procedure, a bilingual speaker translated the test items from the original French into their native target language, while trying to maintain meaning equivalence by assessing content, semantic, and conceptual equivalence (Flaherty et al., 1988; Tsai et al., 2018). Content equivalence refers to whether an item is relevant in each culture. Semantic equivalence refers to whether the literal meaning of an item is the same or similar after translation. Finally, conceptual equivalence refers to whether an item is understood in the same way by each culture (Flaherty et al., 1988; Tsai et al., 2018).
For the blind back-translation procedure, a second individual, native to the target language and unaware of the original TIE-93 version, translated the initial target version back into the source language. This step is considered helpful for assessing semantic equivalence and is commonly recommended in cross-cultural research (Beaton et al., 2000; Tsai et al., 2018). The original and the back-translated versions were then compared to detect discrepancies or inaccurate meanings. All discrepancies were resolved through discussion between the translators and the original authors of the TIE-93. Adaptations were also collaboratively agreed upon by the translators and the original authors of the TIE-93. All translators had experience working clinically with patients with dementia and were familiar with neuropsychological assessments.
TIE-93 Dutch translation
The Dutch version of the TIE-93 was forward translated from French to Dutch by co-author S.F. and back translated by external neuropsychologists. The Dutch forward and back-translation procedure involved three primary considerations for semantic and conceptual equivalence. The disgust item required some modification. Instead of “he feels disgusted” as in the original item, the Dutch item was altered to hij walgt ervan (“he is disgusted by it”) because, due to the grammatical structure of Dutch, someone feels disgust toward or about something, not just as a state in itself. The same had to be done for the anger item. Of note, the original French items at times used two words to describe the same feeling, such as for the anger item. This posed a further challenge, as two semantically equivalent terms had to be found. In addition, s’énerve (meaning to be angrily agitated) is a specific French term that does not translate directly or meaningfully into Dutch. Instead, the terms kwaad (“mad”) and boos (“angry”) were used to maintain conceptual equivalence. For the fear item, the Dutch version employed terms such as bang (“afraid”) or angstig (“anxious”), while back-translation yielded angoissé (“anxious”), suggesting a slight change in meaning and intensity from the French il est vraiment effrayé (“he is very afraid”). This illustrates how words in different languages may suggest different levels of intensity. Finally, for the neutral item, a direct Dutch rendering of the original il ne ressent rien de spécial (“he feels nothing special”), neutraal, is not a common way of talking about feeling neutral. Instead, the word normaal was used, meaning to feel “normal.” For a detailed description of considerations and resolutions, see Figure 2. Despite these differences, all Dutch items were retained, as they preserved overall conceptual equivalence.
Figure 2. Description of considerations and resolutions from the forward and back-translation procedure for the Dutch TIE-93 per meaning equivalence component.

TIE-93 Moroccan-Arabic translation and adaptation
The Moroccan-Arabic version of the TIE-93 was forward and back translated from the approved Dutch version by external neuropsychologists. Several discrepancies were identified and addressed. First, the phrase howa 3ad daba rbe7 lqflous (“won a lot of money”) was used in place of “lottery win” to maintain religious sensitivity, as gambling is forbidden in Islam. Second, the expression ki 7as rasou 3adi (“he feels his head is normal”) was selected as a natural idiom for emotional neutrality, preserving the conceptual meaning of “he feels normal.” Third, the fear item was translated using makhlou3 (“startled/shaken”) and khayef (“afraid”), avoiding the Dutch intensifier echt (“really”), since intensity in Moroccan-Arabic is more often conveyed through tone or delivery than vocabulary. Finally, some sadness items replaced “his mother” (zijn moeder) with either sahbto (“her friend”) or sahbo 3ziz 3lih (“his close friend”) depending on context, to align gender and relational framing without altering the underlying concept of concern for a loved one (for a detailed description of considerations and resolutions, see Figure 3). Given that many individuals in the Dutch Moroccan migrant population (also) speak Tamazight, a Tamazight version was also developed from the Dutch version. This version was reviewed and found comparable to the Moroccan-Arabic version. Of note, Arabizi is used in this study given that Moroccan-Arabic lacks a standardized orthography and is commonly written using Latin-based scripts (Elinson, 2013). Reflecting this, the translators included in this study speak Arabic but cannot read it, highlighting a unique additional aspect to consider when translating and adapting tests for Moroccan-Arabic.
Figure 3. Description of considerations and resolutions from the forward and back-translation procedure for the Moroccan-Arabic TIE-93 per meaning equivalence component.

TIE-93 Turkish translation
The Turkish translation of the TIE-93 was forward translated from Dutch to Turkish by external neuropsychologists. Several translation nuances were noted during the review process. First, for the happy item, the word mutlu (“happy”) was chosen over sevinçli (“joyful”) to reflect clarity and naturalness, although the latter may have carried a more emotionally charged tone. Second, for the anger item, the expression bıktı (“fed up”) was occasionally used instead of sinirlendi (“angry”), particularly for female characters, reflecting a softer tone leaning toward frustration or exhaustion. Finally, in the fear item, korkuyor (“afraid”) was retained without intensifiers, such as gerçekten (“really”) or endişeli (“worried”), to avoid redundancy or excessive emphasis, even if this slightly reduced emotional intensity compared to the Dutch echt bang (for a detailed description of considerations and resolutions, see Figure 4).
Figure 4. Description of considerations and resolutions from the forward and back-translation procedure for the Turkish TIE-93 per meaning equivalence component.

Part II – Piloting
Statistical analyses
All statistics were performed using RStudio version 2024.09.0, with a significance level of .05. Due to deviations in the data from a normal distribution, non-parametric statistical tests were used. To identify outliers among healthy controls, an education-adjusted interquartile range (IQR) method was applied to TIE-93 total scores, based on evidence that higher education is associated with better neuropsychological test performance (Dash et al., 2023). This procedure resulted in the exclusion of two participants. TIE-93 total scores were subsequently summarized using medians and IQRs. Kruskal–Wallis tests were used to compare all healthy control groups on age, education, RUDAS, and GDS-15, while sex differences were assessed using Pearson’s chi-square. Mann–Whitney U tests were then applied to compare demographics between healthy controls and patients, and between Moroccan healthy controls from France and the Netherlands (except for the RUDAS and GDS-15 scores, as these data were not collected in France).
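To make the outlier step more concrete, the sketch below shows one possible reading of an education-adjusted IQR method. The text does not specify how the adjustment was implemented, so two choices here are assumptions: scores are compared within their own education level, and the conventional Tukey fences (1.5 × IQR) are used. All function names and the example data are hypothetical.

```python
from collections import defaultdict
from statistics import quantiles


def iqr_fences(values, k=1.5):
    """Tukey fences (Q1 - k*IQR, Q3 + k*IQR); k = 1.5 is an assumed convention."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(records, k=1.5, min_group_size=4):
    """Flag participants whose score lies outside the fences of their stratum.

    `records` holds (participant_id, education_level, total_score) tuples.
    Stratifying by education level is an assumption about what
    "education-adjusted" means; other adjustments are possible.
    """
    by_level = defaultdict(list)
    for _, level, score in records:
        by_level[level].append(score)
    flagged = []
    for pid, level, score in records:
        scores = by_level[level]
        if len(scores) < min_group_size:  # too few scores to estimate an IQR
            continue
        low, high = iqr_fences(scores, k)
        if score < low or score > high:
            flagged.append(pid)
    return flagged


# Invented example data, not study data.
example = [("p1", 5, 46), ("p2", 5, 47), ("p3", 5, 47),
           ("p4", 5, 48), ("p5", 5, 48), ("p6", 5, 30)]
print(flag_outliers(example))  # ['p6']
```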
A permutation-based analysis of covariance (ANCOVA) using the Freedman–Lane procedure (Freedman & Lane, 1983) was used for three group comparisons: among the healthy control groups; between Surinamese, Moroccan, and Turkish healthy controls and patients; and between Moroccan healthy controls from France and the Netherlands. This was chosen as the most appropriate analysis given that the outcome measure and our selected covariate (education) were both non-normally distributed, the covariate had limited variability, and our group sample sizes were relatively small. The Freedman–Lane approach builds its null distribution by permuting residuals, which does not assume normality, handles small samples without relying on large-sample approximations, and accommodates covariates with limited variability by preserving the exact education–score relationship in each permutation. Education was selected as the covariate given its significant effect on TIE-93 performance, as mentioned above and as found in our previous study (Bourdage et al., 2025b). If education had a significant effect on TIE-93 total score, partial eta squared was calculated as the effect size. If the permutation-based ANCOVA was significant, Wilcoxon rank-sum tests were used to compare performances between groups on TIE-93 total score.
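The logic of the Freedman–Lane procedure can be sketched for the simplest case of two groups and one covariate. This is an illustrative sketch under stated assumptions, not the analysis code used in the study (which was run in R): the function names are hypothetical, the group is coded 0/1, the example data are invented, and the small linear solver assumes that the covariate and group columns are not collinear.

```python
import random


def solve(a, b):
    """Solve the small linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_fit(predictors, y):
    """Fitted values from least squares of y on an intercept plus predictors."""
    rows = [[1.0] + [float(p[i]) for p in predictors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    beta = solve(xtx, xty)
    return [sum(b * x for b, x in zip(beta, r)) for r in rows]


def group_f(y, group, covariate):
    """F statistic (1 numerator df) for the group term, adjusted for the covariate."""
    rss_reduced = sum((a - b) ** 2 for a, b in zip(y, ols_fit([covariate], y)))
    rss_full = sum((a - b) ** 2 for a, b in zip(y, ols_fit([covariate, group], y)))
    return (rss_reduced - rss_full) / (rss_full / (len(y) - 3))


def freedman_lane(y, group, covariate, n_perm=500, seed=0):
    """Return (observed F, permutation p-value) for the group effect."""
    rng = random.Random(seed)
    f_obs = group_f(y, group, covariate)
    fitted = ols_fit([covariate], y)              # reduced model: covariate only
    resid = [a - b for a, b in zip(y, fitted)]
    exceed = 0
    for _ in range(n_perm):
        shuffled = rng.sample(resid, len(resid))  # permute reduced-model residuals
        y_star = [f + e for f, e in zip(fitted, shuffled)]
        if group_f(y_star, group, covariate) >= f_obs:
            exceed += 1
    return f_obs, (exceed + 1) / (n_perm + 1)


# Invented illustration data (not study data): 8 controls (0) and 8 patients (1).
group = [0] * 8 + [1] * 8
education = [3, 4, 5, 6, 3, 4, 5, 6] * 2
scores = [46, 47, 48, 48, 45, 47, 47, 48, 30, 33, 35, 38, 28, 34, 36, 37]
f_obs, p_value = freedman_lane(scores, group, education)
```

Because the covariate-only fit is held fixed and only its residuals are shuffled, each permuted data set keeps the education–score relationship intact while breaking any link between group and score, which is the property the paragraph above relies on.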
Results
The following analyses were conducted to explore how participants performed on the translated and adapted version of the TIE-93. Healthy controls did not report any difficulties with the task or issues with comprehension. Two patients reported that the TIE-93 was difficult and that there were few differences between facial expressions. Two other patients reported feeling tired by the end of the TIE-93. One additional patient also found the task difficult and reported fatigue by the end. Demographic characteristics of all participants are presented in Table 1.
Table 1. Participant characteristics for all groups

Note: BASH = Brief Acculturation Scale for Hispanics. Patient group is composed of Surinamese, Moroccan, and Turkish participants. Age, education, RUDAS, GDS-15, and BASH scores are reported in median (interquartile range). Education is measured using the Dutch Verhage scale. A high score on the BASH indicates that the individual uses their native language more frequently than the host language.
Comparing demographic characteristics
No significant differences were found between healthy control groups for age (χ²(3) = 3.48, p = .32), education (χ²(3) = 6.53, p = .088), RUDAS (χ²(3) = 1.38, p = .71), or GDS-15 (χ²(3) = 3.60, p = .31) scores, nor for sex distribution (χ²(3) = 2.70, p = .44). Significant differences were found when comparing Surinamese, Moroccan, and Turkish healthy controls with patients on education (W = 650, p < .01), age (W = 173, p < .001), RUDAS (W = 587, p < .01), and GDS-15 (W = 155, p < .001), where patients were less educated, older, scored more poorly on the RUDAS, and scored higher on the GDS-15 depression scale. No significant differences were found for sex (χ²(1) = .05, p = .83). When comparing Moroccan healthy controls from the Netherlands and France, a significant difference was only found for education (W = 206, p < .01), where Moroccan healthy controls from France were significantly more educated. No significant differences were found for age (W = 151, p = .71) or sex (χ²(1) = .49, p = .49).
Comparing healthy control groups on the TIE-93
Median and IQR values on TIE-93 total score for all healthy control groups showed ceiling effects except for Turkish healthy controls: Dutch 48 (0.5), Surinamese 48 (0.5), Moroccan 47.5 (2.75), and Turkish 45 (3.5). For an overview of TIE-93 total scores per healthy control group, see Figure 5. When comparing all healthy control groups on TIE-93 test performance, only Turkish healthy controls performed significantly more poorly than all other groups (W = 162, p < .01).
Figure 5. Scatter plots of healthy control performances on TIE-93 total score per cultural group.

Comparing Surinamese, Moroccan, and Turkish patients to healthy controls
The median and IQR on the TIE-93 total score for Surinamese, Moroccan, and Turkish patients were 32 (10). For an overview of TIE-93 total score performances for patients, see Figure 6. When comparing patients and healthy controls on TIE-93 total score, a significant difference was found, F(1, 62) = 76.23, p = .0002, with patients performing more poorly than controls. Education was also a significant predictor of TIE-93 total scores, F(1, 62) = 7.27, p = .0078, ηp² = .10, 95% CI [.02, 1.00].
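As a worked check, partial eta squared can be recovered from an F statistic and its degrees of freedom with the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below assumes this identity applies to the reported education effect; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Education effect reported above: F(1, 62) = 7.27
eta = partial_eta_squared(7.27, 1, 62)
print(round(eta, 2))  # 0.1
```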
Figure 6. TIE-93 total scores for Surinamese, Moroccan, and Turkish patients.

When comparing healthy controls to patients per emotion sub-score, healthy controls performed significantly better on all emotions: happiness (W = 576, p < .05), disgust (W = 769, p < .001), sadness (W = 842, p < .001), neutral (W = 812, p < .001), fear (W = 783, p < .001), and anger (W = 847, p < .001). For an overview of performances of healthy controls and patients per emotion sub-score, see Figure 7a and 7b.
Figure 7. (a) Culturally, linguistically, and educationally diverse healthy control TIE-93 emotion sub-score performances. (b) Culturally, linguistically, and educationally diverse patient TIE-93 emotion sub-score performances.

Comparing Moroccan healthy controls from France and the Netherlands
The median and IQR on the TIE-93 total score for Moroccan healthy controls from France were 41 (10). For an overview of TIE-93 total scores, see Figure 8. When comparing Moroccan healthy controls from France and the Netherlands, analyses revealed a significant difference, F(1, 31) = 8.14, p = .008, with Moroccan healthy controls from the Netherlands performing better than those from France. Education was not a significant predictor, F(1, 31) = 0.06, p = .81.
Figure 8. TIE-93 total score for Moroccan healthy controls from France.

Discussion
This study aimed to translate and adapt the TIE-93 from French into Dutch, Turkish, and Moroccan-Arabic, and to report the procedures and challenges encountered during this process. The adapted versions were then piloted to examine performance in healthy control groups and to compare Moroccan, Turkish, and Surinamese healthy controls with patients. Finally, Moroccan healthy controls from the Netherlands and France were compared in a cross-national analysis to further explore the potential effect of language on test performance.
The translation and adaptation of the TIE-93 highlighted the challenges inherent in translating and adapting social cognition measures, particularly the need to balance natural phrasing with the preservation of emotional meaning. As noted by Jackson et al. (2019), selecting Dutch emotional terms with equivalent literal meaning to the original French required close consultation with translators and the original test authors. Structural differences between languages further complicate this process, especially when adapting emotionally charged language and preserving equivalent intensity (Hendrikx et al., 2017). Shifts in semantic equivalence may also result in challenges to conceptual equivalence, as illustrated by the neutral item in this study, where a semantically appropriate Dutch translation yielded a conceptually different back-translation to French. Similar difficulties were found for the Turkish version of the TIE-93, including cultural considerations regarding appropriate emotion words based on the sex of the actor in the stimuli. Translation into Moroccan-Arabic presented additional complexity due to its layered linguistic history, shaped by contact with Spanish, French, and Classical Arabic, resulting in a distinct and evolving linguistic system (Bullock, 2014; Younes et al., 2020). Although often referred to as a dialect, Moroccan-Arabic has its own grammatical and syntactic structures, necessitating careful consideration during adaptation, including for content potentially incongruent with religious or cultural beliefs.
These challenges underscore both the methodological difficulty of translation and adaptation work and the evolving understanding of best practices. Our previous systematic review (Bourdage et al., 2024) showed that translation procedures are often poorly documented, frequently summarized in a single sentence. The present study reflects a growing recognition of the complexity and rigor required for such work. For example, back-translation alone is now understood to be insufficient for ensuring conceptual equivalence (Behr, 2017), and only recently have guidelines from the International Test Commission specifically for neuropsychological test development been published (Nguyen et al., 2024). The procedure recommended by Nguyen et al. (2024) involves assembling an expert multidisciplinary team (including members of the target culture and experts in the source and target languages), prioritizing conceptual and functional equivalence during translation and adaptation, piloting the new versions qualitatively and quantitatively, and conducting analyses to confirm equivalent psychometric properties. However, although these procedures are ideal and rigorous, the findings of this study demonstrate that translating and adapting the emotional and social concepts found in all social cognition tests, which are intricately linked with culture, remains inherently difficult. In this context, the detailed description of the TIE-93 adaptation process was provided to highlight that semantic equivalence does not necessarily ensure conceptual equivalence.
These findings also point to the need for dedicated funding to support translation and adaptation work, enabling the formation of an expert committee as recommended by Nguyen et al. (2024). Such resources would also allow for larger samples, facilitating analyses requiring substantial power, such as differential item functioning (Ramirez et al., 2006), and improving understanding of whether performance differences reflect linguistic, cultural, or educational factors.
Regarding the piloting component, most healthy control groups performed similarly on the TIE-93, except for Turkish healthy controls. Due to small sample sizes, further analyses of the factors underlying this difference were not possible; however, several variables may have affected test performance. Differences in acculturation not fully captured by the BASH may have played a role, as may differences between test language versions, or differences in participants’ social norms around emotion expression (Elfenbein & Ambady, 2002) or in social determinants of health (Migeot et al., 2022), which were not measured in this study.
Moroccan, Turkish, and Surinamese patients performed significantly more poorly than healthy controls across all emotions, consistent with previous findings demonstrating the TIE-93’s ability to distinguish healthy controls from patients with AD dementia (Bourdage et al., 2025a). However, the study was limited by its small sample size, which did not allow further analyses to disentangle the effects of culture, education, and language, and by the uneven cultural representation in the patient sample, with most patients being from Turkey. Therefore, although the TIE-93 shows clinical potential, its diagnostic accuracy would require further evaluation in larger samples. Moroccan healthy controls from the Netherlands outperformed those from France, possibly reflecting reduced language barriers, but this difference may also relate to cultural, demographic, or cohort effects, or to differences between the French and Moroccan-Arabic versions of the TIE-93. These possibilities warrant further investigation.
This study has several strengths. Despite the absence of a formal expert panel, forward and blind back-translation procedures were conducted using bilingual individuals familiar with neuropsychological testing, in close consultation with the original test authors. In contrast to the limited reporting identified in prior work (Bourdage et al., 2024), this study provides a detailed account of translation decisions and their rationale. However, due to small sample sizes, analyses were exploratory and could not control for other potentially influential variables such as age, acculturation, and country of origin. The extent to which demographic variables such as these affect TIE-93 test performance in the sample is unknown. Other potentially influential demographic data, such as socioeconomic status or degree of exposure to the host culture, were also not collected. Snowball sampling may also have introduced bias, as healthy controls tended to have higher education levels than typically observed in migrant populations (Parlevliet et al., 2016). Intra-group bias could not be assessed, and patient samples were too small to examine diagnostic subgroups.
Taken together, this study illustrates the difficulty of achieving true conceptual equivalence in cross-cultural test adaptation. The complex interplay between a language’s history, structure, and use means that achieving a literal translation does not ensure that the meaning is translated. These challenges raise broader questions regarding who is best positioned to conduct such adaptations and how feasible it is to address resource gaps for diverse populations. Even when adaptation is successful, assessments may still inadequately capture performance across groups sharing a country of origin but differing in acculturation. Future research should therefore replicate these findings in larger, more diverse samples, examine intra-group differences, and further investigate in- and out-group biases in emotion recognition tasks.
In sum, this study underscores the inherent challenges of cross-cultural adaptation in emotion recognition testing, highlighting that while the TIE-93 shows promise, achieving true emotional meaning equivalence remains a complex yet essential goal for equitable neuropsychological assessment.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S1355617726101982.
Acknowledgments
We would like to thank the Erasmus MC University Medical Center students who have assisted in data collection: Ceyda Sekerci, Souhaila Tamali, Sandjai Ramsaran, Roos Lemmen, and Naomi Maas. We would also like to sincerely thank those who assisted in the adaptation of the TIE-93: Najoua Lazaar, Lamia Tamali, Marie-Noelle Witjes-Ane, Ervanur Keceli, and Metehan Bebek.
Funding statement
This work was supported by the IDEX Global Fellowship for author R.B. from Université Paris Cité, the Fondation Médéric Alzheimer, France, and the Prix Chaffoteaux Fondation de France/Société Française de Gériatrie et Gérontologie. S.F. and J.P. are supported by ZonMW (#73305095007, #1051003210004) and Health Holland, Topsector Life Sciences & Health (PPP-allowance, #LSHM20106). S.F. and J.P. receive royalties on two neuropsychological tests (mVAT and FDT, Hogrefe). S.F. served as a consultant to Biogen in 2022.
Open access funding provided by Erasmus MC.
Competing interests
The authors have no competing interests to declare that are relevant to the content of this article.



