The Brief International Cognitive Assessment in Multiple Sclerosis (BICAMS): Validation in Arabic and Lebanese Normative Values

Abstract Objective: Multiple sclerosis (MS) is often associated with cognitive deficits. Accurate evaluation of MS patients’ cognitive performance is essential for diagnosis and treatment recommendation. The Brief International Cognitive Assessment in Multiple Sclerosis (BICAMS), a widely used cognitive testing battery, examines processing speed, verbal and visuospatial learning, and memory. Our study aims to examine the psychometric properties of an Arabic version of the BICAMS and to provide normative values in a Lebanese sample. Method: The BICAMS, comprising the Symbol Digit Modalities Test (SDMT), the Brief Visuospatial Memory Test-Revised (BVMT-R), and a newly developed verbal learning/memory test, the Verbal Memory Arabic Test (VMAT), was administered to healthy subjects and MS patients. The sample consisted of 180 healthy individuals, of whom 63 were retested after 2–3 weeks. Forty-three MS patients matched with 43 healthy subjects based on age, sex, and years of education were assessed. A sample of 10 MS patients was also examined on two occasions. Test–retest reliability and criterion-related validity were examined, and regression-based norms were derived. Results: The test–retest correlations showed good evidence of reliability, with coefficients ranging between 0.64 and 0.73 in the healthy sample and between 0.43 and 0.92 in the MS sample. The BICAMS was able to discriminate between MS patients and matched healthy participants on the SDMT and BVMT-R. Normative data were comparable to other studies. Conclusions: This new Arabic version of the BICAMS shows good initial psychometric properties. While good evidence of the VMAT’s reliability was shown in the healthy participants, lower test–retest reliability for this tool was seen in the MS group, and only partial criterion-related validity was evident. This warrants further examination of the VMAT. We provide regression-based norms for a Lebanese sample and encourage the use of this battery in both research and clinical settings.

Several batteries for assessing cognitive function in MS have been developed and validated (Sumowski et al., 2018). The most commonly used ones are the Brief Repeatable Battery of Neuropsychological Tests (BRB-N) (Rao, 1990), the Minimal Assessment of Cognitive Function in MS (MACFIMS) (Benedict et al., 2002), and the Brief International Cognitive Assessment for MS (BICAMS) (Langdon et al., 2012). The BICAMS has several advantages over other batteries, such as the BRB-N and MACFIMS: non-neuropsychology specialists can administer it, it requires a shorter administration time, and international use was taken into consideration during its development. Moreover, the battery was shown to have ecological validity; it predicted the real-life functional performance of MS patients (Goverover, Chiaravalloti, & DeLuca, 2016). The BICAMS examines the commonly affected cognitive functions in MS using three standardized tests with stable psychometric properties (Langdon et al., 2012): the Symbol Digit Modalities Test (SDMT) (Smith, 1982), the California Verbal Learning Test-Second Edition (CVLT-II) (Delis, 2000), and the Brief Visuospatial Memory Test-Revised Edition (BVMT-R) (Benedict, 1997).
Although the BICAMS scores have shown evidence of their validity across 11 languages (Corfield & Langdon, 2018), such evidence is currently lacking among Arabic-speaking populations. An Arabic BICAMS with evidence of reliability is currently available (Kishk et al., 2017); however, the psychometric validation of the battery has not been completed, and it might not be appropriate for other Arab countries, as noted by Paul, Brown, and Hughes (2019) in their recent systematic review. Their results point to the need for additional translation, adaptation, and validation of standard MS cognitive measures for use in Arabic-speaking populations, although some limitations of BICAMS exist (El Ghoneimy, Hassan, Homos, Farghaly, & Dahshan, 2015; Hamdy et al., 2013; Kishk et al., 2017; Paul et al., 2019). In our study, we examined a new Arabic version of the battery following the BICAMS international standards for validation (aim 1) (Benedict et al., 2012). This version of the BICAMS includes a newly developed Verbal Memory Arabic Test (VMAT) that could be generalized to most Arab cultures (Zeinoun, Farran, Khoury, & Darwish, 2020). Additionally, given the importance of producing normative data relevant to local populations (Smerbeck et al., 2018), we provide normative values for the BICAMS in a Lebanese sample (aim 2).

Design and Methods
This cross-sectional observational study was approved by the American University of Beirut Institutional Review Board, and all participants signed an informed consent form. Human data were obtained in compliance with the Helsinki Declaration. The study was conducted between 2017 and 2019. The methods were primarily based on other studies that provided evidence of BICAMS validity (Filser et al., 2018; Goretti et al., 2014; Ozakbas et al., 2017; Polychroniadou et al., 2016; Sousa et al., 2018).

Sample
For the sample sizes of the healthy and MS groups, we followed the recommendations of Benedict et al. (2012). The authors mentioned that 150 or more healthy persons are needed for data applicable to persons of all ages and diverse ethnicity. Benedict et al. (2012) likewise mentioned that an additional 35 healthy participants could be recruited to round out the normalization sample. The authors also mentioned that ideally 65 healthy individuals should be matched with MS patients. Nevertheless, in our study, data from the MS patients were limited in terms of the number of participants.
Initially, we recruited participants using flyers posted at the hospital, in clinics, and on various social media platforms; we then relied on participants' word of mouth and snowball techniques, which we found to be more efficient. Around 75% of the final sample was recruited through word of mouth and snowballing. We purposefully targeted potential participants from different areas of Lebanon and different age groups. There was no incentive for participating.
The final sample included 180 healthy participants recruited from the community. From this group, a subsample of 63 individuals was retested after 1-3 weeks. We subjected all participants to the same rigorous two-phase inclusion/exclusion screening process. During the first phase of enrollment, healthy participants older than 16 years without a history of neurological disorders, traumatic brain injury, or psychiatric disorders, including alcohol and/or drug dependence, were included in the study. Men who consumed more than 15 drinks/week and women who consumed more than eight drinks/week were considered excessive alcohol consumers and were excluded from the study (McGuire, 2011). Individuals who had been on antidepressants, mood stabilizers, or medications known to affect cognitive performance during the 3 months prior to screening were excluded as well.
Excerpts from the Behavioral Risk Factor Surveillance System (BRFSS) questionnaire were used to complete the screening (Centers for Disease Control and Prevention, 2009). The BRFSS is a screening questionnaire that enquires about the participant's current health status, medical history, physical activity (weekly frequency, type, and duration of activity), smoking habits (current and previous smoking habits), age, years of education, and alcohol consumption (weekly and monthly). The following data were gathered using the BRFSS: age, years of education, marital status, educational attainment, current employment, annual household income, area of residence, primary language, diagnosis with any illness (if yes, indicate illness), current use of medications, current participation in volunteering activities, frequency of physical activity (if any), and presence of any difficulties that limit one's activities, in addition to factors associated with cognitive function such as smoking, including hubble bubble (if yes, amount smoked per day), alcohol intake in the past 30 days (frequency and amount), leisure activities, and cognitive performance in the last 12 months (presence of any difficulties, impact on daily activities). Information about these variables was obtained through participants' self-report.
In the second phase of enrollment, we used the Arabic version of the Hopkins Symptoms Checklist-25 (HSCL-25) to screen for symptoms of depression (during the past week). This phase was administered post-consenting. Participants were also excluded if they scored 3.3 or more on the depression subscale (Fares, Dirani, & Darwish, 2019; Mahfoud et al., 2013; Winokur, Winokur, Rickels, & Cox, 1984).
The Montreal Cognitive Assessment (MoCA) was used post-consenting to screen for cognitive impairment (Nasreddine et al., 2005; Rahman & El Gaafary, 2009). A cutoff score of 26 for individuals below the age of 60 years and 24 for those 60 years or older was used (Carson, Leach, & Murphy, 2018). In other words, individuals below the age of 60 years were excluded if they scored less than 26, while those aged 60 years or older were excluded if they scored below 24.
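The age-dependent MoCA rule can be written as a short check. This is only an illustrative sketch: the function name is hypothetical, and it assumes that a participant aged exactly 60 falls in the older band (cutoff of 24).

```python
def passes_moca_screen(age_years: float, moca_score: int) -> bool:
    """Return True if a healthy volunteer meets the MoCA cutoff for their age band."""
    # Assumption: participants aged exactly 60 are treated as "60 years or older".
    cutoff = 26 if age_years < 60 else 24
    # Scores below the cutoff lead to exclusion; scores at or above it pass.
    return moca_score >= cutoff
```

For instance, under these assumptions a 45-year-old scoring 25 would be excluded, whereas a 72-year-old with the same score would be retained.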
MS patients were recruited from the American University of Beirut Medical Center MS center. MS patients were diagnosed according to the McDonald 2010 criteria by a neurologist (Polman et al., 2011). This information was checked in the patient's medical record. Only patients with a disease duration greater than 1 year were enrolled in this study. This was established by reviewing the patient's date of symptom appearance. Following the same eligibility criteria as for the healthy subjects, 43 MS patients were matched to 43 healthy participants based on age, sex, and years of education on an individual level (Table 1, Part A). The bands used were the following: age ±3 years, 1:1 ratio on sex, and ±2 years on education. We did not exclude MS patients with cognitive deficits or depression, given their high prevalence in MS patients and importance for examining the discriminative abilities of the tests.
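The individual-level matching bands can be expressed as a simple predicate. The sketch below is illustrative only: the `Participant` class and the function name are hypothetical, and it assumes the bands are inclusive (a 3-year age difference still counts as a match).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    age: float
    sex: str
    education_years: float


def within_matching_bands(patient: Participant, control: Participant,
                          age_band: float = 3, education_band: float = 2) -> bool:
    """Check the 1:1 matching criteria: same sex, age within 3 years, education within 2 years."""
    return (
        patient.sex == control.sex
        and abs(patient.age - control.age) <= age_band
        and abs(patient.education_years - control.education_years) <= education_band
    )
```

A 38-year-old woman with 16 years of education would, for example, match a 40-year-old woman with 14 years of education, but not a 42-year-old one.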

Training for data collectors
We trained two data collectors who were either in their senior year or holders of an undergraduate degree in psychology. Training consisted of a 3-hr practical workshop that included a live demonstration of flawed and flawless administrations. Then, the data collectors were observed in three mock administrations, until no errors were detected. Authors PZ and NF supervised the training and ongoing data collection; NF also collected data and was previously trained by PZ. All tests were scored by NF.
All tests were administered in a standardized manner, in a quiet room, using the Lebanese Arabic dialect. Tests were also performed in a fixed order, beginning with screening (including the BRFSS and HSCL-25), followed by the SDMT, the VMAT learning trials and short delay recalls, and then the BVMT-R. The remaining segments of the verbal memory test were performed last (25 min after the VMAT short delay recalls). Hence, the total duration of the BICAMS administration was around 55 min.
The oral version of the SDMT (Smith, 1982) was administered to all participants. Using a test form that contains a key of nine symbol-digit pairs and a sequence of symbols (stimuli), the participant was required to respond by voicing the digit associated with each symbol as quickly as possible. A sequence of 10 symbols was first used for practice. Then the participant was given 90 s to complete as many of the items following the practice items as possible (Smith, 1982). The dependent variable was the number of correct responses in 90 s.
For the BVMT-R (Benedict, 1997), participants were asked to recall a matrix of six simple abstract designs after 10 s of visual exposure. The participants reproduced the designs using paper and pencil as accurately as possible in their correct positions. In total, the test was repeated three times for each participant. Each figure received a score of 0, 1, or 2, based on accuracy and location scoring criteria (Benedict, 1997). The dependent variable was the total score across the three trials.
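As a small illustration of how the BVMT-R dependent variable is assembled, the sketch below sums item scores over the three trials. The function name and input layout are hypothetical; the accuracy and location criteria that determine each 0, 1, or 2 come from the test manual and are not reproduced here.

```python
def bvmt_r_total(trial_scores):
    """Sum BVMT-R item scores over three trials of six designs, each scored 0, 1, or 2."""
    if len(trial_scores) != 3:
        raise ValueError("expected exactly three trials")
    total = 0
    for trial in trial_scores:
        if len(trial) != 6 or any(score not in (0, 1, 2) for score in trial):
            raise ValueError("each trial needs six item scores of 0, 1, or 2")
        total += sum(trial)
    # With six designs worth up to 2 points each, the total ranges from 0 to 36.
    return total
```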
We recently developed and validated the VMAT (Zeinoun et al., 2020), which substituted the CVLT-II in our study. The VMAT was developed indigenously in Arabic using quantitative and qualitative methods. Following a rigorous process, words that are more or less familiar to all Arab regions were selected during the development of the VMAT to facilitate use in other Arab countries. The instrument measures verbal learning, short-term memory, long-term memory, and recognition. Similar to other standardized verbal learning/memory tests, and in line with the recommendations of Benedict et al. (2012), the examinee is presented with 15 words (List A) to be recalled freely across 5 trials and is then presented with another 15 words (List B), which serve as an interference trial. Following the recall of List B, the participant was required to recall List A with and without semantic cues. Following a 25-min delay, the test-taker was required to recall List A with and without cues and then recognize the words from List A from an array of 45 words that includes List A, List B, and additional distractors. Several scores could be derived from the VMAT, such as the number of words recalled per trial and the total number of words recalled on trials 1 to 5, in addition to a recognition discriminability index (i.e., the ability to endorse the 15 target items and reject all 30 distractors) (Zeinoun et al., 2020). However, the regression model and normative data included only the total number of words recalled in trials 1 to 5 as the dependent variable. We chose this variable for the VMAT based on the methods of other studies (Filser et al., 2018; Ozakbas et al., 2017; Polychroniadou et al., 2016; Sousa et al., 2018). Medical charts were also reviewed to collect the MS patients' disease type, disease duration, and Expanded Disability Status Scale score.

Normative Values
Regression-based norms were calculated following the previously described procedure applied for the MACFIMS (Parmenter, Testa, Schretlen, Weinstock-Guttman, & Benedict, 2010), which has recently begun to be utilized for the BICAMS (Goretti et al., 2014). To ensure the normal distribution of the raw test scores of healthy participants, we first retrieved the cumulative frequency distribution of the SDMT, BVMT-R, and VMAT (trials 1 to 5) scores. The resulting distribution was converted into a standard scaled score with a mean (M) of 10 and a standard deviation (SD) of 3 (actual scaled score). Next, regression equations for the predicted scaled scores were modeled; stepwise regression analyses were performed including age, age², sex (1 = male, 2 = female), and years of education. A squared term of age was used to adjust for the nonlinear relationship between age and cognition (Goretti et al., 2014; Parmenter et al., 2010). We used stepwise regression following Parmenter et al. (2010), who applied it to the MACFIMS, and Goretti et al. (2014). We also performed a forced entry analysis to compare the results. The derived equations of the statistically significant models include unstandardized β-coefficients of the predictors and the constant. Both assumptions of homoscedasticity and normality of residuals were evaluated and met.
These regression models were used to generate normative values. First, the predicted scaled score was computed from the equations using the individual's specific demographic information. Next, the predicted scaled score was subtracted from the actual scaled score. The difference was then divided by the residual SD of the healthy group tests. Finally, the derived value could be converted to other standardized scores, such as Z scores, to classify performance (Goretti et al., 2014).
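One common way to turn a cumulative frequency distribution into scaled scores with M = 10 and SD = 3 is sketched below. It is an illustration under stated assumptions, not the exact procedure used for Table 4: the function name is hypothetical, the percentile of each raw score is taken as the mid-point cumulative proportion, and scaled scores are rounded to the nearest integer.

```python
from collections import Counter
from statistics import NormalDist


def raw_to_scaled_table(raw_scores, mean=10.0, sd=3.0):
    """Map each observed raw score to a normalized scaled score (default M = 10, SD = 3)."""
    n = len(raw_scores)
    counts = Counter(raw_scores)
    table = {}
    below = 0  # number of observations strictly below the current raw score
    for score in sorted(counts):
        # Assumption: mid-point cumulative proportion, which stays strictly between 0 and 1.
        proportion = (below + 0.5 * counts[score]) / n
        z = NormalDist().inv_cdf(proportion)  # normal quantile of that percentile
        table[score] = round(mean + sd * z)
        below += counts[score]
    return table
```

Under these assumptions, the median raw score of a symmetric sample maps to a scaled score of 10, and higher raw scores never receive lower scaled scores.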
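In symbols, the steps above amount to the following. The symbol names are introduced here for illustration only, and the T-score line assumes the conventional scaling with a mean of 50 and an SD of 10.

```latex
\[
z = \frac{SS_{\text{actual}} - SS_{\text{predicted}}}{SD_{\text{residual}}},
\qquad
T = 50 + 10\,z
\]
```

Here $SS_{\text{actual}}$ is the scaled score obtained from the raw-to-scaled conversion, $SS_{\text{predicted}}$ is the value given by the regression equation for the individual's demographics, and $SD_{\text{residual}}$ is the residual SD of the healthy sample for that test.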

Statistical Analysis
Descriptive statistics were calculated; mean (M), standard deviation (SD) and median for continuous data, and frequencies and percentages for categorical ones. For the test-retest reliability analysis, Pearson's correlations between test scores on both sessions were calculated (coefficient: r). When data violated the assumption of normality, the nonparametric alternative Spearman's Rho was used.
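For readers who want to reproduce the test-retest coefficients outside SPSS, the two correlations can be computed as follows. This is a dependency-free sketch with hypothetical function names; Spearman's rho is obtained as Pearson's r on average ranks, which handles tied scores.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between paired test and retest scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))
```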
To evaluate whether the BICAMS scores could differentiate between known groups, scores were compared between matched healthy subjects and MS patients using the Mann-Whitney U test, since the data were significantly skewed. Effect sizes were computed for variables that discriminated between the groups.
To derive normative values, stepwise multiple regression analyses for each of the dependent variables, SDMT, BVMT-R, and VMAT (total score on trials 1 to 5), were conducted with age, age², sex, and years of education entered as predictors (see the previous section on "normative values" for more details). Regression analyses using the forced entry method were also run to compare results; similar outcomes were obtained. All analyses were performed in SPSS version 25 using two-tailed tests, and results with p < 0.05 were considered statistically significant.
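The group comparison can be illustrated with a direct computation of the Mann-Whitney U statistic. This sketch is not the SPSS procedure: the function name is hypothetical, no p value is computed, and the rank-biserial correlation is used as the effect size purely as an assumption, since the text does not state which effect size measure was reported.

```python
def mann_whitney_u(group_a, group_b):
    """Return (U for group_a, rank-biserial correlation) by counting pairwise wins."""
    u_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1.0   # group_a observation exceeds group_b observation
            elif a == b:
                u_a += 0.5   # ties count as half a win
    n_pairs = len(group_a) * len(group_b)
    # Rank-biserial correlation: proportion of wins minus proportion of losses.
    rank_biserial = 2.0 * u_a / n_pairs - 1.0
    return u_a, rank_biserial
```

A rank-biserial value of -1 would mean that every observation in the first group lies below every observation in the second, and 0 would mean no systematic difference.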

Descriptives
Two hundred and thirty-four healthy participants from the community were screened for eligibility, of whom 54 were excluded; 2 individuals from the MS group were also excluded (psychiatric illnesses). Figure 1 is a flow chart of participants' recruitment. Table 2 summarizes demographic information of the full healthy sample, in addition to the MS group.

Healthy Individuals (Full Sample)
The average age of healthy individuals (n = 180) was 45.01 ± 19.36 years; the youngest participant was 16 and the oldest was 80 years old. Table 3 reports the scores obtained on the different tests of the BICAMS. The 63 individuals who were retested had a mean age of 33.15 ± 15.98 years, a similar sex distribution (54% males), and high educational attainment (71.4% completed university).

There were no significant differences between the two matched samples (MS patients and healthy participants) on age (t = 0.234, p = 0.816) or education (t = 0.751, p = 0.455). There were also equal numbers of males and females. Nevertheless, the samples differed in symptoms of depression scores (t = -3.56, p = 0.001) and MoCA scores (t = 5.85, p < 0.001).

MS Patients Retested
The mean age of the 10 MS patients who underwent test-retest was 38.71 ± 12.87 years, and most were females (n = 8). Six individuals were diagnosed with relapsing-remitting MS (RRMS), three with secondary progressive MS (SPMS), and one with primary progressive MS (PPMS). Most of this subsample completed university education (n = 7), one completed high school, one obtained vocational education, and one had some high school education.

Criterion-Related Validity (Group Differences)
Forty-three MS patients were matched with 43 healthy individuals. MS patients scored lower than healthy participants on the SDMT, BVMT-R, and VMAT trials 1 to 5, short delay free and cued recall, as well as long delay free and cued recall. The SDMT and BVMT-R discriminated the most between the groups, with a larger effect size for the SDMT (Table 1).
Because the difference in scores between the groups on the VMAT did not reach statistical significance, for further validation, we examined the scores of the MS patients on the verbal memory section of the MoCA (maximum score of 5), and we found a higher than average mean score of 3.31 ± 1.07.

Normative Values
Our data suggest that age and education were the strongest demographic contributors to test performance. Specifically, age or age² and education predicted the SDMT and BVMT-R scores. Age² mainly predicted the VMAT total score on trials 1 to 5. Table 4 reports the raw to scaled scores (M = 10, SD = 3) conversion using the BICAMS cumulative frequency distribution derived from our sample (Part A) and the results of the regression models, which were statistically significant (Part B). The derived equations used to compute predicted scaled scores are also listed in the table, alongside an example of how to apply them. To facilitate the adoption and usage of the proposed norms in the clinical setting, the reader can access an Excel file with built-in formulas through the website https://arabicbicams.wixsite.com/ to calculate the following parameters: actual scaled score, predicted scaled score, Z score, and t score.

DISCUSSION
This study contributes to the international utility of the BICAMS by providing partial evidence of validity and fair evidence of reliability for the Arabic version of the battery. The SDMT and BVMT-R tests mostly showed good psychometric properties. Here, we followed the recommendations and standards of the BICAMS consensus committee (Benedict et al., 2012). We also provide Lebanese normative data. Although the BICAMS does not replace a more comprehensive evaluation of cognitive function in MS, it is valuable as a brief tool that can be integrated into broader assessments.
The availability of BICAMS increases accessibility for cognitive assessments in nonspecialized centers (Langdon et al., 2012). Translation and validation of the BICAMS were performed in many regions of the world and languages (Corfield & Langdon, 2018), and normative data were established for populations in several countries, such as Italy (Goretti et al., 2014) and Greece (Polychroniadou et al., 2016).
In this study, the BICAMS showed good evidence of test-retest reliability, which is essential for longitudinal assessments of cognition in MS. The reliability of the SDMT in particular, for which different test forms were used across the test and retest sessions, was higher than that of the other measures. This result is in line with other studies (Benedict, 2005; Goretti et al., 2014; Sousa et al., 2018).
The VMAT showed fair reliability results (including the total learning trials 1 to 5 score). Regarding this main outcome variable, results are comparable to other BICAMS studies, which provided evidence of validity (Polychroniadou et al., 2016; Sousa et al., 2018). The highest values were on the cued recall trial. It should be noted that while good test-retest results were present for the healthy group, weaker results were found in the MS group in terms of the short delay free and cued recall trials. The use of one form of the VMAT in our study most likely elevated practice effects, thus hindering reliability analyses. We are currently developing alternative forms of the test and recruiting a larger and more diverse sample for additional psychometric validation. The BVMT-R, on the other hand, showed slightly lower evidence of reliability in the healthy sample. This could be due to several participants reaching high ceiling scores on the first testing sessions, while others significantly increased their performance toward the second testing session. Nonetheless, the BVMT-R is known to be adequate for international use (Benedict et al., 2012) and showed good test-retest reliability measures among persons with MS.
The matched MS sample performed lower than the healthy participants on the Arabic BICAMS, but only the SDMT and BVMT-R could significantly discriminate between the samples. We also compared the current results on the VMAT with our prior work using the same tool in a smaller MS sample. In Zeinoun et al. (2020), results indicated that the VMAT could significantly discriminate between MS patients and healthy individuals on various subscores. In our current study, this significant discrimination was absent, although MS patients scored lower than healthy individuals. We partly attribute this discrepancy in results to the different sampling methods and performance of the patients included in the studies. In the current study, 43 patients were matched with 43 healthy individuals, as opposed to the original VMAT study, in which 16 MS patients were matched with 32 healthy individuals. The scores of the healthy participant groups used in both studies were similar but differed for MS patients. In this study, the MS patients' scores on the VMAT were higher; for example, on the long delay free recall, MS patients in the original VMAT study scored 8.62 ± 3.07, while in the current study, they scored 10.83 ± 3.2. This could suggest that the examined MS sample had little or no impairment in verbal memory, which was confirmed when examining their performance on the verbal section of the MoCA. The majority scored above average. The better performance of the MS group in the current study when compared to the original VMAT study could have contributed to the loss of statistical significance when examining the discrimination of the test.
In essence, we encourage the use of the battery in Arabic-speaking MS patients. Nevertheless, caution should be exercised in reference to the VMAT. In particular, while the full battery might be useful during an initial assessment of the patient, we advise administering the battery again following a substantial period of time (such as 6 months). Alternatively, we are in the process of building and examining an alternative form of the VMAT.
In this study, we also provide regression-based norms for a Lebanese sample. This approach has several advantages over conventional norming methods, such as linearly adjusting covariates. Also, it applies to smaller samples (Oosterhuis, van der Ark, & Sijtsma, 2016). Although the scores and the impact of a few demographic variables on test results were similar to other studies (Goretti et al., 2014; Vanotti, Smerbeck, Benedict, & Caceres, 2016), not all variables contributed equally. This is not surprising in light of cultural differences in testing. The current results, with age and education as the main predictors of cognitive performance, are similar to findings from another study in the same country in which we validated and normed a visual memory test (Rey Complex Figure Test) (Darwish, Zeinoun, Farran, & Fares, 2018). Furthermore, the Portuguese BICAMS validation study and regression-based norms reached similar conclusions (Sousa et al., 2018). This highlights the importance of national validation of cognitive tests across cultures, as subtle differences in test performance could be present.
Several limitations need to be taken into account when interpreting our findings. The majority of the MS sample were diagnosed with RRMS, and the sex distribution was not balanced. Further evidence on the validity of the new Arabic version of the BICAMS with a more diverse profile and a larger sample will be pursued in future studies. In a similar vein, examining external validity can support a wider use of the battery, especially in other Arab countries.
Lastly, the frequency of individuals who attained a university degree in this study was higher in younger individuals when compared to older participants. This is in line with the age distribution, literacy, and education rates in Lebanon. The literacy rate is 99.24% for those between the ages of 15 and 24 years, and 60.15% for those 65 years and older. Sixteen percent of the population is between the ages of 15 and 24 years; the largest share, 45.27%, is between the ages of 25 and 54; and 8.3% and 7% are 55-64 and 65 years or older, respectively, with around 11-12 years of school life expectancy (CIA, 2020). Also, in 2017, it was reported that 93% of the population finished primary education, 63-70% secondary education, and 45-49% tertiary education (UNESCO, 2017).
The current study shows encouraging psychometric properties of a new Arabic version of the BICAMS and provides regression-based norms for a Lebanese sample. The findings and data presented can enhance MS-related clinical practice in this region, and we encourage the use of this battery in both research and clinical settings.

Equations derived from Part B to convert MS raw scores to regression-based t scores or any other type of standardized scores: SDMT predicted scaled score = 11.569 - 0.001 (age²) + 0.082 (education).
An example of how to apply the normative values: Consider a 40-year-old female MS patient with 16 years of education who scored 42 on the SDMT. From Part A, we know that her SDMT raw score corresponds to an actual scaled score of 7, and the predicted scaled score is 11.281 based on the formula provided above [11.569 - 0.001 (40²) + 0.082 (16)]. Next, to deduce the Z score, we subtract the predicted scaled score from the patient's actual scaled score and then divide the difference by the residual SD of the healthy participants; the residual SD is provided in Part B. Here, we have (7 - 11.281) ÷ 2.132, which equals approximately -2 (i.e., t score: 30) and suggests that her SDMT performance can be classified as impaired (Z ≤ -2 SD).
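The worked SDMT example can be checked with a few lines of code. The sketch below uses only the coefficients and residual SD quoted in the example; the function names are hypothetical, and the T-score conversion assumes the conventional mean of 50 and SD of 10.

```python
def sdmt_predicted_scaled(age_years: float, education_years: float) -> float:
    """Predicted SDMT scaled score from the equation quoted in the text."""
    return 11.569 - 0.001 * age_years ** 2 + 0.082 * education_years


def norm_scores(actual_scaled: float, predicted_scaled: float, residual_sd: float):
    """Return (Z score, T score) for an actual versus predicted scaled score."""
    z = (actual_scaled - predicted_scaled) / residual_sd
    t = 50 + 10 * z  # assumption: conventional T-score scaling (M = 50, SD = 10)
    return z, t


# 40-year-old patient, 16 years of education, actual scaled score of 7, residual SD 2.132
predicted = sdmt_predicted_scaled(40, 16)
z, t = norm_scores(7, predicted, 2.132)
print(round(predicted, 3), round(z, 2), round(t))  # prints: 11.281 -2.01 30
```

The unrounded Z score is slightly below -2, which is consistent with the classification of impaired performance in the example.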
To facilitate the adoption and usage of the proposed norms in the clinical setting, the corresponding author, upon request, can provide a spreadsheet table with a built-in formula that calculates and yields the above parameters.