Statement of Research Significance
Research Question(s) or Topic(s): This study addressed the need for a Hebrew version of the California Verbal Learning Test, Third Edition (CVLT-IIIHebrew). It examined the test’s adaptation for Hebrew speakers in Israel and established initial norms for this population.
Main Findings: The CVLT-III was successfully translated and adapted into Hebrew. The study established initial norms for this version based on the performance of healthy Hebrew-speaking Israeli adults. Age and education level affected test performance, while sex impacted performance to a lesser degree. The CVLT-IIIHebrew showed high internal reliability.
Study Contributions: This study provides the first formal Hebrew adaptation and initial norms for the CVLT-III in Israel. The study’s findings offer clinicians a valuable tool for evaluating verbal memory while emphasizing the need to expand the norms to include individuals with lower education levels and those belonging to Israeli minorities.
Introduction
The evaluation of memory and learning, commonly impacted by neuropsychiatric disorders, is integral to neuropsychological assessments (Lezak et al., 2012; Reynolds et al., 2021; Sherman et al., 2022). Verbal memory is frequently assessed using word lists, with the California Verbal Learning Test (CVLT; Delis et al., 1987) consistently ranked among the three most popular memory tasks by clinicians (Rabin et al., 2023). Unlike memory tasks that screen for impairment by providing a single outcome score, the CVLT offers an in-depth analysis of memory processes and a wealth of quantitative outcome measures. Notably, the test is unique in its ability to assess learning strategies. More specifically, the CVLT’s primary wordlist includes semantically related words, allowing the examiner to clarify strategy use by comparing the examinee’s performance in the free recall and cued-recall trials (Bair et al., 2023). Thereby, the CVLT differs from tasks based on a presentation of unrelated words, such as the Rey Auditory Verbal Learning Test (RAVLT; Rey, 1964). The CVLT’s reliability and validity, especially those of its core scores, are well established (for reviews, see Delis et al., 2017; Farrer & Drozdick, 2020a; Lezak et al., 2012, pp. 478 – 481; Sherman et al., 2022, pp. 624 – 635).
The CVLT consistently shows robust construct validity across age groups and clinical populations, including neurological and psychiatric disorders (e.g., traumatic brain injury and schizophrenia), and shows correlations with relevant fronto-temporal brain structures (e.g., Keith et al., 2023). Importantly, the CVLT has been shown to detect age-related declines in verbal memory processes such as acquisition, recall, and recognition discrimination, with older adults specifically showing an increase in recall errors and response bias. This sensitivity aids in detecting early signs of neurodegenerative disorders (e.g., Alzheimer’s disease) and tracking the course of these patients’ memory deficits over time. Regarding reliability, the CVLT-III has robust alternate-form reliability for its core scores, adequate test–retest reliability, and excellent internal reliability (Delis et al., 2017, pp. 36 – 44).
The impact of cross-cultural factors on examinees’ performance in cognitive testing has garnered increased research attention in recent years (Fernández & Evans, 2022; Franzen et al., 2022; Merkley et al., 2023; Ramani et al., 2024). These studies, including those undertaken in Israel, stressed the limited number of cross-culturally adapted tests and the importance of using local norms (e.g., Kave et al., 2022; Staios et al., 2023). The CVLT-III’s updated and extensive normative data span ages 16 to 90, with participants demographically matched to the most recent census of the U.S. population (Delis et al., 2017). This represents progress as it minimizes the biases inherent in traditional tests, which often depend on norms established by a homogeneous group of white, English-speaking, middle-class, and highly educated individuals (Ardila, 2020). It is also reassuring that ethnicity explained a negligible percentage (0.3%) of the normative sample’s variance (Delis et al., 2017, p. 35). Concerns regarding the impact of cross-cultural factors, however, have not been sufficiently alleviated to date. Normative data for the CVLT from non-English-speaking examinees and countries other than the U.S. are scarce (for recent publications, see Campos-Magdaleno et al., 2024; Feyzioğlu, 2020; Garcia-Herranz et al., 2022; Lou et al., 2022; Romaszko-Wojtowicz et al., 2023). Moreover, at least some of these studies suggest differences in performance (Kim & Kang, 1999) or differ in the impact of sociodemographic variables such as sex and education level on CVLT performance compared to the normative sample (Chang et al., 2010; Lou et al., 2022). These findings further stress the importance of using local norms and were the impetus for the current research project. More specifically, while Hebrew translations of the CVLT were created for specific research projects (Poreh et al., 2015; Toren et al., 2000), these were ad hoc translations of previous CVLT editions rather than the latest third edition. In addition, these studies did not provide normative data. The current study aimed to meet this need by adapting the CVLT-III to the Israeli Hebrew-speaking population (CVLT-IIIHebrew) and establishing initial normative data among healthy Israeli adults.
Age and education level were hypothesized to significantly impact CVLT-IIIHebrew performance. Consistent with previous research, age was expected to be the strongest sociodemographic predictor of CVLT performance (Delis et al., 2017, pp. 77 – 81; Lezak et al., 2012, pp. 478 – 481; Sherman et al., 2022, pp. 624 – 635). For example, the CVLT-III manual reports that age accounted for 25.9% of the variance in the sum of raw scores for Trials 1 – 5, with education level explaining an additional 4.5% (Delis et al., 2017, p. 35). Age was therefore hypothesized to inversely impact CVLT-IIIHebrew performance, while education level was expected to have a positive, though smaller, impact. The impact of participant sex on CVLT-IIIHebrew performance was also hypothesized, with females expected to perform better than males (Delis et al., 2017, pp. 79 – 80; Hirnstein et al., 2023). However, this hypothesis was made more tentatively as the effect of sex on CVLT performance has not been uniformly identified in cross-cultural research (see Discussion for a comprehensive review). Finally, while non-linear age effects on CVLT performance, particularly a more rapid decline in later years, have been observed, their impact is generally weaker and primarily evident in studies that include geriatric populations (Lou et al., 2022). Given that older adults were not evaluated in the current study, a quadratic effect of age was not hypothesized.
To further validate the adapted test, the author assessed the CVLT-IIIHebrew’s internal reliability, evaluated the impact of sociodemographic characteristics on test performance, and compared the performance of the Israeli sample to that of participants who were tested using other non-English versions of the CVLT and case reports of participants whose performance was analyzed using the CVLT-III’s normative data (Delis et al., 2017). Thereby, the project aimed to add a rigorously researched wordlist memory test with unique characteristics to the tools at the clinician’s disposal.
Method
Participants
Healthy adults participated in the study (N = 249). They were recruited through announcements on Ariel University’s online research platform and social networks between 10/2020 and 10/2023. Inclusion criteria were: (a) Adult age (≥ 18 years and ≤ 65 years). (b) Self-reported fluency in Hebrew. Exclusion criteria were: (a) Major neuropsychiatric disorders. (b) Neurodevelopmental disorders, including learning disabilities and Attention-Deficit/Hyperactivity Disorder (ADHD). (c) Medical conditions that may impair cognition (e.g., diabetes and sleep apnea). (d) Motor and sensory disability precluding cognitive testing (e.g., uncorrected hearing impairment). (e) Alcohol and drug abuse. Inclusion/exclusion criteria were determined based on a written self-report, with any ambiguities resolved by the research staff before the participant signed the study’s informed consent form. Participants were not compensated for study participation.
Ten candidates were excluded from the study due to pre/co-morbid neurological (n = 2), psychiatric (n = 2), and neurodevelopmental disorders (n = 6). Two additional candidates were excluded due to diabetes (n = 2). The data of two participants were excluded from analyses based on an a priori decision rule for detecting poorly motivated participants: a motivation item score < 4 in the study’s debriefing survey (following Berger et al., 2021; Braw et al., 2024). Overall, data from 235 healthy adults were analyzed (age range: 20 – 65, education level range: 9 – 20, n females = 163, n males = 72). Table 1 presents the sociodemographic characteristics of the participants.
Sociodemographic characteristics of the normative sample (per age group and total)

Note. Data of parametric variables are presented as mean ± SD.
The study was performed in accordance with the Helsinki Declaration and approved by Ariel University’s Ethics Committee (approval no.: AU-SOC-YB-20221218). All participants signed a written informed consent form before entering the study.
Tools
California Verbal Learning Test-III (CVLT-III; Delis et al., 2017): The test stimuli comprise two word lists, each containing 16 nouns (lists A and B). List A comprises four categories (i.e., furniture, vegetables, means of transportation, and animals), with four words in each category. List B also comprises four categories, two of which are identical to those of list A. The administration procedure is as follows: (a) Immediate recall trials (trials 1 – 8): These trials include: (1) Learning trials (trials 1 to 5 free recall): An immediate recall of list A, which is repeated five times. (2) Interference trial (list B free recall): An immediate recall of list B. (3) Short Delay Free Recall (SDFR): A free recall of list A. (4) Short Delay Cued Recall (SDCR): A recall of the list A words belonging to each semantic category after being provided with the name of the category. (b) Delayed memory trials (trials 9 – 12): These trials include: (1) Long Delay Free Recall (LDFR; trial 9): A free recall of list A after a 20-minute delay. (2) Long Delay Cued Recall (LDCR; trial 10): A recall of list A after being provided with the names of the categories. (3) Long Delay Yes/No Recognition (trial 11): The participant is read a list of words, either words from list A or foils, and is requested to respond “yes” if the word belongs to list A and “no” if it does not. (4) Forced Choice Recognition (trial 12): This optional trial, aimed at assessing performance validity, was not included in the current study. The CVLT’s outcome measures are presented in the test manual (Delis et al., 2017), as well as the tables that accompany the current study (for additional information, see Farrer & Drozdick, 2020a; Lezak et al., 2012, pp. 478 – 481; Sherman et al., 2022, pp. 624 – 635).
Two measures (Total recall [sum of correct responses across Trials 1 – 5] and Yes/No recognition Hits−False Positives) are not part of the core scores listed in the CVLT-III’s manual (Delis et al., 2017, p. 4). These measures were added following the request of an anonymous reviewer due to their utility and use in clinical practice.
Adaptation procedure
The author directly translated the CVLT-III words from English to Hebrew based on the following criteria: (a) The frequency of each translated word was checked in Linzen’s word frequency database (Linzen, 2009). Low-frequency words (i.e., a frequency of ≤ six appearances per million words, following Kavé et al., 2019) were replaced by more frequently used words belonging to the same semantic category. Concurrently, the four most prototypical words in each category were avoided (following Delis et al., 2017, p. 29). All words belonging to one list B category (“parts of a house”) are low-frequency words in Hebrew. It was, therefore, decided to replace the category with that of “nature” (e.g., mountain). This procedure was similar to that used when the CVLT was adapted to Chinese and the “tools” category was replaced (Chang et al., 2010). (b) When multiple Hebrew words corresponded to a single English word, the original word was replaced with a single Hebrew word belonging to the same semantic category. (c) In several cases, a plural word in English was translated to the singular form of the Hebrew word due to significant differences in the number of syllables. Following current standards (International Test Commission, 2017; Nguyen et al., 2024), the adapted CVLT-III was evaluated and further reviewed by a multidisciplinary team that included two licensed rehabilitation psychologists and a speech therapist. The team members were native Hebrew speakers, familiar with the Israeli culture, and experienced in cognitive testing. The revised edition of the CVLT-IIIHebrew underwent three iterative cycles until reaching its final form.
Each cycle comprised pilot testing of the CVLT-IIIHebrew by graduate students in clinical neuropsychology, followed by further revision of the CVLT. The final items in CVLT-IIIHebrew were deemed familiar to Israelis across age, sex, and socioeconomic class divides. The CVLT-III’s publisher approved the project (Pearson; Master License Agreement No. LSR-620161, April/25/2023).
Procedure
The experimental procedures were conducted in a quiet, well-lit room with the experimenter sitting on the opposite side of a table from the participant. After signing an informed consent form and filling out a demographic-medical questionnaire, the participants performed the CVLT-IIIHebrew’s immediate recall trials. After a 20-minute delay in which they performed non-verbal filler tasks, the participants performed the delayed trials of the CVLT-IIIHebrew (LDFR, LDCR, and Yes/No recognition) and completed a debriefing survey in which they rated their motivation to perform the experimental procedures as instructed (1 – 7 Likert scale; higher scores indicating stronger motivation). Trained graduate students in clinical neuropsychology conducted all experimental procedures.
Data analyses
Statistical analyses generally followed the procedures implemented in the Spanish Multicenter Normative Studies (NEURONORMA), a large-scale project aimed at providing normative data to clinicians in a cross-cultural context (e.g., Pena-Casanova et al., 2009; Pena-Casanova et al., 2012; Perez-Enriquez et al., 2024), which applied procedures that were previously developed as part of the Mayo Older American Normative Studies (MOANS; Ivnik et al., 1992). The analysis comprised these steps:
a. Preliminary statistical procedures: Prior to conducting linear regressions, a series of preliminary statistical procedures were performed to ensure data integrity and model validity. First, descriptive statistics (means, SDs, and ranges) for all variables were calculated to summarize the sample characteristics and identify any unusual patterns in the data. Outlier detection was conducted using both SD criteria (values exceeding ±3 SD from the mean) and the more robust Sn measure, which was evaluated using the R robustbase package (Jones, 2019). The selection of predictors was determined by theoretical relevance and empirical associations. As part of the data preparation, age and education level were mean-centered to reduce bias due to multicollinearity and improve the interpretation of coefficients, and squared terms for these mean-centered variables were also created to explore nonlinear associations (Espenes et al., 2023). Regarding empirical associations among the remaining potential sociodemographic predictors, I examined Pearson product–moment correlations to minimize the impact of multicollinearity as a potential confounder and simplify the main regression models. A criterion of |r| > .8 for pairwise correlations was utilized as an initial indicator of potentially problematic multicollinearity among these predictors.
b. Evaluating the impact of sociodemographic variables on CVLT-IIIHebrew performance: To justify a parsimonious prediction model for the primary analyses, the impact of sociodemographic effects on CVLT-IIIHebrew performance was explored by performing linear regressions in which five sociodemographic variables predicted each of the CVLT-IIIHebrew’s core raw scores: mean-centered age, mean-centered age², mean-centered education level, mean-centered education level², and sex. Predictors were retained if they significantly contributed to the overall model (p < .05) and the unique variance (squared semi-partial correlation, sr²) was at least 5%. The analyses indicated that the non-linear terms had a negligible impact. Only two of the 15 models were significant, and the quadratic age term (mean-centered age²) was not a significant predictor in any model. Although the quadratic education term (mean-centered education level²) significantly predicted total intrusions (p = .043), its contribution of unique variance fell well below the 5% criterion (sr² = .017). Given these findings, the exploration of non-linear effects was not pursued further. Consequently, all main analyses utilized a simplified model that included mean-centered age, mean-centered education level, and sex as predictors.
c. Division of normative data according to age: The sample was stratified by age, a decision informed by the impact of aging on CVLT-IIIHebrew performance (noted in the Introduction) and our observation that Israeli clinicians are more familiar with the traditional normative data presentation (delCacho-Tena et al., 2024). This was done using the overlapping interval strategy (Pauker, 1988), which enabled sample size within each age group to reach the minimum recommended size of 50 to 70 participants per age group, thereby increasing the stability of means and SDs (Bridges & Holler, 2007; Piovesana & Senior, 2018). Next, scaled scores (SS; Mean = 10, SD = 3, range: 2 – 18) per age group were created to approximate the normal distribution upon which linear regressions could be performed. This was done by transforming the CVLT-IIIHebrew core raw scores into percentile ranks (i.e., cumulative percentiles) and then SSs per age group.
d. Norm adjustments based on sociodemographic variables: The overall contribution of sociodemographic variables to the prediction of CVLT-IIIHebrew performance was evaluated using linear regressions in which sociodemographic variables (mean-centered age, mean-centered education level, and sex) predicted each of the 15 CVLT-IIIHebrew core raw scores. Next, the need to adjust core SSs based on sociodemographic variables was evaluated using linear regressions, run per age group, in which each core SS served as the dependent variable and the sociodemographic variables (mean-centered age, mean-centered education level, and sex) served as predictors (Footnote 1). Adjusted scaled scores (SSadj) were calculated using the formula: $\mathrm{SS}_{\mathrm{adj}} = \mathrm{SS} - [B_{\mathrm{age}} \times (\mathrm{age} - 39.5) + B_{\mathrm{education\ level}} \times (\mathrm{education\ level} - 15) + B_{\mathrm{sex}} \times \mathrm{sex}]$, with sex coded as male = 0 and female = 1 (Footnote 2). All adjusted SSs were truncated to the lower whole number. The selection of sociodemographic variables to be included in the formula was based on the earlier-mentioned criteria (i.e., the variable significantly predicted the model and sr² was ≥ 5%).
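Steps c and d can be illustrated with a minimal Python sketch. It is not the study's implementation: the mid-rank percentile convention, the rounding of unadjusted SSs, and the B coefficients in the usage lines are assumptions made for illustration only (the actual coefficients are those reported in Table 10), and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def scaled_scores(raw_scores):
    """Map each distinct raw score to a scaled score (mean 10, SD 3, range 2-18).

    Assumption: percentile ranks use the mid-rank convention, so tied raw
    scores share the midpoint of their cumulative-percentile interval.
    """
    n = len(raw_scores)
    mapping = {}
    for value in sorted(set(raw_scores)):
        below = sum(1 for x in raw_scores if x < value)
        equal = sum(1 for x in raw_scores if x == value)
        percentile = (below + 0.5 * equal) / n   # strictly between 0 and 1
        z = NormalDist().inv_cdf(percentile)     # normal deviate of the percentile
        ss = round(10 + 3 * z)                   # rescale to mean 10, SD 3
        mapping[value] = max(2, min(18, ss))     # clamp to the 2-18 range
    return mapping


def adjust_ss(ss, age, education, sex, b_age=0.0, b_edu=0.0, b_sex=0.0):
    """SSadj = SS - [B_age*(age - 39.5) + B_edu*(education - 15) + B_sex*sex].

    sex is coded male = 0, female = 1; the result is truncated to the lower
    whole number. Coefficients default to 0 (no adjustment for that variable).
    """
    correction = b_age * (age - 39.5) + b_edu * (education - 15) + b_sex * sex
    return math.floor(ss - correction)


# Hypothetical raw scores for one age group and hypothetical B coefficients.
table = scaled_scores([5, 6, 6, 7, 7, 7, 8, 8, 9])
print(table)
print(adjust_ss(10, age=50, education=12, sex=0, b_age=-0.1, b_edu=0.5))
```

Leaving a coefficient at its default of 0 corresponds to a variable that did not meet the retention criteria for that score and age group.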
Comparisons with earlier normative samples and internal reliability
To compare the current study’s normative data with existing published norms, the following statistical comparisons were performed: (a) I applied two different normative standards to the raw core scores of the CVLT-IIIHebrew obtained from the current sample: the newly developed Israeli norms and the original CVLT-III norms (Delis et al., 2017). SSs and index scores, normed using each of the normative data sets, were then compared using paired-samples t-tests. In addition, the CVLT performance of three case reports, presented by Farrer and Drozdick (2020b; Footnote 3), was normed using both normative data sets and then compared using paired-samples t-tests. (b) The current sample’s CVLT performance was compared using one-sample t-tests to representative studies of healthy participants conducted in a cross-cultural context. As the studies differed in reported CVLT scores, the most commonly reported measure (Total recall; sum of correct responses across Trials 1 – 5) was selected (i.e., Spanish, Turkish, Korean, and Chinese versions; references for the studies are presented in the Results section).
Considering the challenges that word list learning tasks pose for estimating internal reliability due to item interdependence, the split-half method was utilized (see discussion in Sherman et al., 2022, p. 627). More specifically, the immediate recall trials were split (trials 1 & 3 vs. trials 2 & 4, and trials 2 & 4 vs. trials 3 & 5), and the Spearman-Brown formula was applied to the average of the correlations (lengthening factor = 2.5).
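The Spearman-Brown step can be sketched as follows. The formula is the standard prophecy formula, r_kk = k·r / (1 + (k − 1)·r); the lengthening factor of 2.5 presumably reflects the ratio of the five learning trials to the two trials in each half, which is an inference rather than something stated in the text. The two split correlations below are hypothetical values, not the study's data.

```python
def spearman_brown(r, k):
    """Predicted reliability when a test is lengthened by a factor of k."""
    return k * r / (1 + (k - 1) * r)


# Hypothetical correlations for the two splits
# (trials 1 & 3 vs. 2 & 4, and trials 2 & 4 vs. 3 & 5).
split_correlations = [0.70, 0.74]
mean_r = sum(split_correlations) / len(split_correlations)

print(f"{spearman_brown(mean_r, 2.5):.2f}")  # prints 0.87
```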
Additional analyses and general remarks
Supplementary Material 1 presents Pearson product–moment correlations between CVLT-IIIHebrew core raw scores and sociodemographic variables (age, education level, and sex). Analyses were conducted using SPSS 27.0, with p < .05 considered statistically significant in all statistical analyses.
Results
Division of normative data into age groups and calculation of unadjusted scaled scores
Six overlapping age groups were created: 18 – 30 years (n = 93), 26 – 38 years (n = 94), 34 – 46 years (n = 50), 40 – 52 years (n = 44), 48 – 60 years (n = 52), and 56 – 65 years (n = 46). Sociodemographic characteristics of the sample and CVLT-IIIHebrew raw core and process scores per age group can be found in Table 1 and Tables 2–3, respectively.
Core raw scores according to age

Note. All data are presented as Mean ± SD.
FP = False positives, LDCR = Long Delay Cued Recall, LDFR = Long Delay Free Recall, SDCR = Short Delay Cued Recall, SDFR = Short Delay Free Recall, T1/2/3/4/5 = Trial 1/2/3/4/5, Total recall (T1–5) = Sum of correct responses across Trials 1–5.
Process raw scores according to age

Note. All data are presented as Mean ± SD.
LDCR = Long Delay Cued Recall, LDFR = Long Delay Free Recall, SDCR = Short Delay Cued Recall, SDFR = Short Delay Free Recall, T1–T5 = Trials 1 to 5.
Unadjusted SSs (Mean = 10, SD = 3, range: 2 – 18) and percentiles for the CVLT-IIIHebrew core raw scores per age group can be found in Tables 4–9.
CVLT-IIIHebrew core SSs; age range = 18 – 30 years, median age = 24, n = 93

Data adjustments based on sociodemographic variables
Linear regressions in which sociodemographic variables (mean-centered age, mean-centered education level, and sex) predicted each of the CVLT-IIIHebrew core raw scores revealed that mean-centered age and mean-centered education level each significantly predicted five, partially overlapping, scores. Semi-partial correlations and coefficients of determination (sr 2) reflecting the associations between sociodemographic variables and CVLT-IIIHebrew core raw scores can be found in Supplementary Material 2.
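The retention rule used throughout (p < .05 and sr² ≥ 5%) can be illustrated with a simplified sketch. To stay dependency-free, it uses the closed-form expression for the squared semi-partial correlation in a two-predictor model, whereas the study's models included three (or, in the exploratory step, five) predictors; the data and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sr_squared(y, x1, x2):
    """Unique variance in y explained by x1 beyond x2 (two-predictor case only)."""
    r_y1, r_y2, r_12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    return (r_y1 - r_y2 * r_12) ** 2 / (1 - r_12 ** 2)


def retain(p_value, sr2, alpha=0.05, min_sr2=0.05):
    """Keep a predictor only if it is significant AND explains >= 5% unique variance."""
    return p_value < alpha and sr2 >= min_sr2


# Hypothetical data: two uncorrelated predictors that fully determine y.
x1 = [1, 2, 3, 4]
x2 = [1, -1, -1, 1]
y = [a + b for a, b in zip(x1, x2)]
print(round(sr_squared(y, x1, x2), 3))

# The quadratic education term reported in the Method (p = .043, sr2 = .017)
# is significant but falls below the 5% criterion, so it is not retained.
print(retain(0.043, 0.017))  # prints False
```

For models with more predictors, sr² for a given predictor is equivalently the drop in R² when that predictor is removed from the full model.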
Regressions in which sociodemographic variables (mean-centered age, mean-centered education level, and sex) predicted SSs per age group indicated the need for 13 age-based adjustments, seven education level-based adjustments, and two sex-based adjustments. Table 10 presents formulas for adjusting CVLT-IIIHebrew core SSs.
Supplementary Material 3 features a calculator designed for conveniently adjusting core SSs, using an examinee’s sociodemographic data and CVLT-IIIHebrew performance. This calculator is also accessible online at https://bit.ly/45rqXFd.
Clinical example
Calculation of SS and SSadj of a 38-year-old female examinee (education level = 8 years) with a T1 correct (i.e., number of correct responses in trial 1) raw score of 7: (a) Determine SS and %tile: Locate the raw score in Table 6 and determine the SS, which is in the left column, and the percentile, which is in the right column (SS = 9 and %tile = 29 – 40, respectively). Note that although both Tables 5 and 6 cover the examinee’s age, Table 6 was selected as the examinee’s age (38 years) is closer to the median age of the group presented in Table 6 than to that of Table 5 (40 vs. 32 years, respectively). (b) Determine whether SS adjustments are mandated: SSs necessitating adjustments are marked using a gray background in Tables 4–9, as is the case for T1 correct in the 34 – 46 years age range (see Table 6). (c) Calculate SS adjustments if mandated: The SS of T1 correct can be adjusted using the regression formulas that are listed in Table 10 or by using the CVLT-IIIHebrew’s norm calculator (i.e., the spreadsheet in Supplementary Material 3 or online at https://bit.ly/45rqXFd). Using the norm calculator, the examinee’s T1 correct SS (= 9) should be adjusted based on the examinee’s sex; this is done by entering the examinee’s sex in the relevant cell (column A, row 8) and the T1 correct SS in the suitable place in the upper right table (column E, row 5). The SSadj is then automatically calculated and presented in the lower table (= 11).
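The table-selection rule in step (a) can be sketched in Python. The age ranges and median ages are taken from the table captions; the assignment of Tables 7–9 to the three oldest groups is inferred from the caption order, and the tie-breaking behavior (the younger group wins when two medians are equally close) is an assumption, since the text does not specify one.

```python
# (table label, minimum age, maximum age, median age), taken from the captions.
AGE_GROUPS = [
    ("Table 4", 18, 30, 24),
    ("Table 5", 26, 38, 32),
    ("Table 6", 34, 46, 40),
    ("Table 7", 40, 52, 46),    # table number inferred from caption order
    ("Table 8", 48, 60, 54),    # table number inferred from caption order
    ("Table 9", 56, 65, 60.5),  # table number inferred from caption order
]


def select_table(age):
    """Return the norm table whose age range covers `age` and whose median is closest."""
    candidates = [g for g in AGE_GROUPS if g[1] <= age <= g[2]]
    if not candidates:
        raise ValueError(f"age {age} is outside the normed range (18-65)")
    # min() keeps the first (younger) group on ties; this tie rule is an assumption.
    return min(candidates, key=lambda g: abs(age - g[3]))[0]


print(select_table(38))  # prints Table 6, as in the clinical example
```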
CVLT-IIIHebrew core SSs; age range = 26 – 38 years, median age = 32, n = 94

CVLT-IIIHebrew core SSs; age range = 34 – 46 years, median age = 40, n = 50

CVLT-IIIHebrew core SSs; age range = 40 – 52 years, median age = 46, n = 44

CVLT-IIIHebrew core SSs; age range = 48 – 60 years, median age = 54, n = 52

CVLT-IIIHebrew core SSs; age range = 56 – 65 years, median age = 60.5, n = 46

Note. FP = False positives, LDCR = Long Delay Cued Recall, LDFR = Long Delay Free Recall, SDCR = Short Delay Cued Recall, SDFR = Short Delay Free Recall, SS = Scaled scores, T1/2/3/4/5 = Trial 1/2/3/4/5, Total recall (T1–T5) = Sum of correct responses across Trials 1–5.
1 Higher raw scores indicate poorer performance.
SSs necessitating adjustments are marked using a gray background.
Supplementary Material 3 includes a spreadsheet for the adjustment of CVLT-IIIHebrew core SSs based on the examinee’s sociodemographic data. The spreadsheet is also available online at the following link: https://bit.ly/45rqXFd.
Adjustments of CVLT-IIIHebrew core SSs based on socio-demographic variables (age, education level, and sex)

Note. Grayscale levels differ according to the variable that is used to adjust the SS (age, education level, or sex).
CVLT-IIIHebrew core SSs can be adjusted using the spreadsheet provided in Supplementary Material 3 (also available online at the following link: https://bit.ly/45rqXFd).
FP = False positives, LDCR = Long Delay Cued Recall, LDFR = Long Delay Free Recall, N.R. = Not relevant (i.e., no adjustments of SS needed). SDCR = Short Delay Cued Recall, SDFR = Short Delay Free Recall, SS = Scaled scores, T1/2/3/4/5 = Trial 1/2/3/4/5, Total recall (T1 – T5) = Sum of correct responses across Trials 1 – 5.
Comparisons with earlier normative samples and internal reliability
Comparisons with earlier normative data sets: (a) Comparisons with the CVLT-III’s normative data: The CVLT-IIIHebrew core scores indicated poorer performance when using the normative data from the current study versus the original CVLT-III norms (Delis et al., 2017), ps < .001. See Supplementary Material 4. Correspondingly, all three case reports presented in Supplementary Material 5 exhibited lower Total recall SSs when using the current study’s norms compared to those derived using the normative sample (Delis et al., 2017); t(10) = 5.68, p < .001, d = 1.71; t(10) = 4.30, p = .002, d = 1.30; t(10) = 8.19, p < .001, d = 2.47 (case reports 1 through 3, respectively). (b) Significantly more words were recalled in trials 1 to 5 (Total recall; sum of correct responses in Trials 1 – 5) by participants in the current study compared to those in studies of healthy participants that were performed in a cross-cultural context (Campos-Magdaleno et al., 2024; Garcia-Herranz et al., 2022; Kim & Kang, 1999; Lou et al., 2022; Footnote 4), ps < .001. A notable exception was Feyzioğlu (2020), in which Turkish healthy adults had a significantly higher Total recall than participants in the current study (p = .046). Additional information about the studies, such as the sociodemographic characteristics of their participants, is available in Supplementary Material 6.
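The effect sizes reported for the three case reports are numerically consistent with the common paired-samples conversion d = t / √n, where n = df + 1 is the number of paired scores. Whether the study computed d this way is an assumption; the sketch below only checks the arithmetic, and the function name is hypothetical.

```python
import math


def cohens_d_from_paired_t(t_value, df):
    """Cohen's d for a paired-samples t-test: d = t / sqrt(n), with n = df + 1."""
    n = df + 1
    return t_value / math.sqrt(n)


# t values reported for case reports 1 through 3, each with df = 10.
for t_value in (5.68, 4.30, 8.19):
    print(f"{cohens_d_from_paired_t(t_value, df=10):.2f}")
# prints 1.71, 1.30, and 2.47
```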
Internal reliability based on the split-half method was very high (reliability coefficient = .90).
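For readers less familiar with split-half reliability, the following minimal Python sketch shows one common way to compute such a coefficient: items are split into odd and even halves, the two half scores are correlated across examinees, and the Spearman-Brown formula steps the correlation up to full test length. The odd/even split, the Spearman-Brown correction, and all function names are assumptions made for illustration; the text above does not state how the halves were formed or whether a correction was applied.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # undefined if either half has zero variance


def spearman_brown(r_half):
    """Step a half-test correlation up to full test length."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """item_scores: one list of item-level scores (e.g., 0/1) per examinee.

    Assumption: halves are formed from odd- and even-numbered items.
    """
    odd_half = [sum(row[0::2]) for row in item_scores]
    even_half = [sum(row[1::2]) for row in item_scores]
    return spearman_brown(pearson_r(odd_half, even_half))
```

Under these assumptions, a half-test correlation of .82 would correspond to a corrected coefficient of about .90, which illustrates why the corrected value is usually higher than the raw correlation between halves.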
Additional analyses
Supplementary Material 6 presents Pearson product–moment correlations between CVLT-IIIHebrew core raw scores and sociodemographic variables (age, education level, and sex).
Discussion
The current study aimed to provide normative data for the CVLT-IIIHebrew, a translation of the CVLT-III to Hebrew, and its adaptation to the Israeli population. To achieve this aim, 235 healthy adults, aged 18 to 65 years, performed the CVLT-IIIHebrew. After ensuring data integrity, the data were stratified by age, a decision informed by the impact of aging on CVLT-IIIHebrew performance and our observation that clinicians in Israel are more familiar with the traditional normative data presentation (delCacho-Tena et al., 2024). This was done using the overlapping interval strategy (Pauker, 1988), which maximizes the sample size for normative data generation. Next, raw scores were transformed into scaled scores (SSs; Mean = 10, SD = 3, range: 2 – 18) per age group, approximating the normal distribution upon which linear regressions exploring the impact of sociodemographic variables could be performed. These regressions evaluated the need for, and extent of, score adjustments in several stages. First, the non-linear effects of sociodemographic factors on CVLT-IIIHebrew performance were explored, with the analyses indicating that quadratic terms (i.e., mean-centered age² and mean-centered education level²) had a negligible impact. This was likely related to the fact that the current study did not include geriatric participants, an age group in which an accelerated decline in verbal memory is expected (Liampas et al., 2023). Given these findings, non-linear effects were not further explored, and all main analyses used a simplified model that included mean-centered age, mean-centered education level, and sex as predictors. Second, the overall impact of sociodemographic factors on CVLT-IIIHebrew performance was investigated. 
This was done using whole-sample regressions in which the sociodemographic variables (i.e., mean-centered age, mean-centered education level, and sex) predicted the raw core CVLT-IIIHebrew scores. These analyses indicated that age and education level each significantly predicted five, partially overlapping, core scores. More specifically, age and education level explained up to 5.1% and 4% of the shared variance, respectively. These findings align with the well-established age-related decline in declarative memory functioning (Lighthall et al., 2019), which is linked to structural changes in medial temporal lobe (MTL) structures, particularly the hippocampus, and its connectivity with other cortical and subcortical regions (Dickerson & Eichenbaum, 2010; Nyberg, 2017). Correspondingly, a decline in CVLT performance with aging was consistently found in earlier studies (Lezak et al., 2012; Sherman et al., 2022, pp. 625 – 626), including those that were performed outside of the U.S. (e.g., Argento et al., 2014; Campos-Magdaleno et al., 2024; Chang et al., 2010; Garcia-Herranz et al., 2022; Kim & Kang, 1999; Lou et al., 2022) (Footnote 5). The impact of education level on CVLT performance found in the current study also corresponds to earlier findings (Lezak et al., 2012; Sherman et al., 2022, pp. 625 – 626). 
Education level, however, was a weaker predictor of CVLT performance than age in the current study, aligning with the decision not to stratify CVLT-III scores by education (Delis et al., 2017). In contrast, examinees’ sex had only a minor impact on CVLT-IIIHebrew performance. More specifically, sex did not significantly predict performance on any CVLT-IIIHebrew measure and accounted for ≤ 1% of the unique variance. This contrasts with the CVLT-III’s normative data, in which sex explained an additional 5.1% of the variance beyond the variance that was explained by age (Delis et al., 2017), corresponding to the small but reliable advantage of females over males in verbal-episodic memory across different ages and task types (see meta-analysis; Hirnstein et al., 2023). At the same time, the extent and consistency of this sex effect across CVLT measures vary in cross-cultural studies. Some studies found sex effects across many measures (Argento et al., 2014; Garcia-Herranz et al., 2022; Kim & Kang, 1999), while others observed effects only in specific measures or described them as not robust (Campos-Magdaleno et al., 2024; Chang et al., 2010; Lou et al., 2022). Regarding the current study, the observation that sex had a limited impact on CVLT-IIIHebrew performance might stem from the relative underrepresentation of older males in our sample (see also the limitations paragraph). 
This is particularly relevant as memory function is known to be influenced by an interaction between age and sex (Asperholm et al., 2019). Enlarging the normative sample for the CVLT-IIIHebrew would allow for a more thorough investigation into how sociodemographic variables contribute to test performance. The highly educated nature of the current study’s sample is a critical factor in this regard, as it may partially account for the performance differences observed between our participants and the original normative sample (Delis et al., 2017). Notably, when the current study’s norms were applied, our participants were generally rated as performing more poorly (approximately one SD) than when the original norms were used. This discrepancy was further confirmed by re-norming three relevant case reports (Farrer & Drozdick, 2020b) using both our current norms and the original CVLT-III norms (see Supplementary Material 5). 
Furthermore, significantly more words were recalled in Trials 1 to 5 (Total recall; sum of correct responses in Trials 1 – 5) by participants in the current study compared to most studies of healthy participants that were performed in a cross-cultural context (Campos-Magdaleno et al., 2024; Garcia-Herranz et al., 2022; Kim & Kang, 1999; Lou et al., 2022) (Footnote 6). Both lines of evidence suggest that the current study’s sample might overrepresent participants with relatively high verbal memory functioning, perhaps due to its high education level, although other sources may be at play (e.g., four of the earlier-mentioned cross-cultural studies included geriatric participants; Campos-Magdaleno et al., 2024; Garcia-Herranz et al., 2022; Kim & Kang, 1999; Lou et al., 2022). The influence of education level on CVLT-IIIHebrew performance is further discussed in the limitations paragraph, along with a call for expanding the normative data of the CVLT-IIIHebrew and thereby minimizing possible sources of bias.
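The raw-score-to-SS conversion mentioned at the start of this Discussion can be illustrated with a short Python sketch. It assumes a percentile-based normalization: the examinee's mid-rank percentile within an age-band reference sample is mapped onto a normal distribution with Mean = 10 and SD = 3, then rounded and limited to the 2 – 18 range. This procedure, the function name `scaled_score`, and the example data are assumptions for illustration only; the exact conversion used to build the CVLT-IIIHebrew norms is not specified in this section.

```python
from statistics import NormalDist


def scaled_score(raw, reference_raws):
    """Convert a raw score to a scaled score (Mean = 10, SD = 3, range 2-18).

    reference_raws: raw scores of the normative participants in the
    examinee's age band (hypothetical data in the example below).
    """
    n = len(reference_raws)
    below = sum(1 for r in reference_raws if r < raw)
    ties = sum(1 for r in reference_raws if r == raw)
    # Mid-rank percentile, so tied scores share the same percentile.
    pct = (below + 0.5 * ties) / n
    # Keep the percentile strictly inside (0, 1) so inv_cdf is defined.
    pct = min(max(pct, 0.5 / n), 1 - 0.5 / n)
    z = NormalDist().inv_cdf(pct)
    return max(2, min(18, round(10 + 3 * z)))


# Hypothetical age-band reference sample of Total recall raw scores.
reference = [38, 42, 45, 47, 50, 52, 55, 57, 60, 63, 66]
print(scaled_score(52, reference))  # median raw score maps to the SS mean
```

Because the conversion is rank-based, skewed raw-score distributions are pulled toward a normal shape, which is the property the regression analyses described above rely on.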
A second set of regressions was conducted to determine the necessity and degree of sociodemographic adjustments. These regressions, with core SSs as the dependent variables for each age group, indicated the need to make 13 adjustments based on age, seven based on education level, and two based on sex. Most age adjustments were needed for the 26 to 38 age range, while education level adjustments primarily applied to participants aged 48 or older. In other words, notable age-related performance changes in younger adults and the comparatively low education level of older adults required additional adjustments to the CVLT-IIIHebrew scores. Sex-based adjustments, on the other hand, were limited in number and showed no clear link to the participants’ age ranges. These sociodemographic adjustments to core SSs can be readily implemented using a norm calculator (refer to the Supplementary Material 3 spreadsheet or the online calculator available at: https://bit.ly/45rqXFd), as exemplified in the clinical example that was presented in the Results section. By applying these standardized normative procedures (converting raw scores to SSs and then adjusting for sociodemographic variables as needed), clinicians can directly compare an examinee’s CVLT-derived core and process scores. This standardization also allows for straightforward comparison of the examinee’s CVLT performance with results from other neuropsychological tests, simplifying the understanding of their unique strengths and weaknesses (Slick & Sherman, 2022). Such comparisons are becoming more common with the increase in cognitive tests at the disposal of the Israeli clinician. 
For example, Kavé and Sapir-Yogev recently developed a story recall task (Kave & Sapir-Yogev, 2020), adding to earlier adaptations of tests such as the well-established Hebrew translation of the RAVLT (Vakil & Blachstein, 1997; Vakil et al., 1998; Vakil et al., 2010). Regarding the latter task, it should be noted that the CVLT-III and RAVLT are not identical despite their many commonalities. More specifically, the CVLT-III’s first list comprises words belonging to four semantic categories, and it includes cued-recall trials that aid in detecting the use of strategies to encode and retrieve the words. The semantic composition of the first list also means that the examinee’s CVLT scores no longer express verbal learning ability per se but rather the interaction between their memory and conceptual functions (Lezak et al., 2012, p. 478). Clinicians are therefore advised to exercise discretion when comparing a patient’s performance in the two tests (e.g., a patient tested using the RAVLT and then retested later using the CVLT). Overall, the provision of initial sociodemographically adjusted CVLT-IIIHebrew norms and the availability of an intuitive score adjustment calculator will markedly increase the tools at the disposal of the Israeli clinician and will hopefully lead to improved diagnostic clarity, more tailored intervention strategies, and precise tracking of cognitive alterations in clinical settings.
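The regression-based adjustment step discussed above can be sketched in Python as follows. The sketch assumes a simple residual-style correction: the part of the SS predicted by the examinee's mean-centered age, mean-centered education level, and sex is subtracted from the age-band SS. The coefficients, the centering means, the sex coding, the final rounding and 2 – 18 limits, and the names `Adjustment` and `adjust_ss` are all hypothetical; they are not the values or formulas implemented in the norm calculator of Supplementary Material 3.

```python
from dataclasses import dataclass


@dataclass
class Adjustment:
    """Hypothetical regression weights for one core score in one age band.

    A weight of 0.0 means that no adjustment is needed for that variable
    (compare 'N.R.' in the normative tables).
    """
    b_age: float = 0.0
    b_edu: float = 0.0
    b_sex: float = 0.0
    mean_age: float = 0.0   # mean used to center age
    mean_edu: float = 0.0   # mean used to center education level


def adjust_ss(ss, age, edu_years, sex, adj):
    """Remove the demographically predicted offset from a scaled score.

    sex is coded 0/1; the coding direction is an assumption of this sketch.
    """
    predicted_offset = (
        adj.b_age * (age - adj.mean_age)
        + adj.b_edu * (edu_years - adj.mean_edu)
        + adj.b_sex * sex
    )
    adjusted = ss - predicted_offset
    return max(2, min(18, round(adjusted)))


# Hypothetical weights: older age and fewer years of education both
# predict a lower SS, so the adjusted score is raised for this examinee.
weights = Adjustment(b_age=-0.1, b_edu=0.5, mean_age=30, mean_edu=14)
print(adjust_ss(ss=10, age=34, edu_years=12, sex=0, adj=weights))
```

Expressing every score on the same adjusted SS metric is what allows the direct comparison between core and process scores, and with other tests, described in the preceding paragraph.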
As part of the current study, I aimed to uphold key requirements for normative studies, including appropriate inclusion and exclusion criteria, use of standardized test administration and scoring, and statistical analyses that are adequate for clarifying the contribution of key sociodemographic variables (Casaletto & Heaton, 2017; Mondini et al., 2023; delCacho-Tena et al., 2024). A key strength of this CVLT adaptation is the CVLT-IIIHebrew’s excellent internal reliability (reliability coefficient = .90), consistent with the split-half reliability (r = .94) reported by Sherman et al. (2022, p. 627). Additionally, the relationships observed between sociodemographic variables and CVLT-IIIHebrew performance were mostly as anticipated (see earlier discussion regarding the impact of sex). However, this study has several important limitations to consider. First, the sizes of two age groups (n = 44 for ages 40 – 52 years; n = 46 for ages 56 – 65 years) fell slightly below the minimum recommended sample size of 50 – 70 participants per age group (Bridges & Holler, 2007; Piovesana & Senior, 2018). Moreover, the skewness of some CVLT-IIIHebrew raw scores (e.g., number of intrusions) likely requires a larger sample size per age group to ensure accurate results, as previous work recommends a minimum of 85 participants per cell when dealing with such skewed data (Piovesana & Senior, 2018). This concern is somewhat alleviated by the transformation of raw scores to SSs, which approximates the normal distribution. However, it is still recommended to further increase the size of the normative sample in the future. 
Cautious interpretation is particularly warranted for the older age cohorts due to their smaller sample size and skewed sex distribution. For example, the 48 – 60 years age group was composed of 44 females and only 8 males. Additionally, clinicians should be cautious when using the normative data from the current study to assess examinees with limited schooling, as the study’s participants were relatively well-educated. This may reflect the fact that the percentage of people with an academic degree in Israel is among the highest in the world (e.g., 57.9% of the Israeli Jewish population has post-secondary education; Israeli Central Bureau of Statistics, 2023). However, it raises concerns when testing examinees with lower educational levels and may also explain, at least partially, why participants in the current study recalled more words in the CVLT-IIIHebrew than participants in most earlier studies that were performed in a cross-cultural context, as noted earlier. Second, Israel has several minorities that differ in religion and a myriad of sociodemographic variables (e.g., quality of schooling). For example, approximately 21% of the Israeli population comprises Arab citizens, encompassing diverse religious and cultural groups (i.e., Muslims, Christians, Druze, and Circassians; Israeli Central Bureau of Statistics, 2022). As the current study provides normative data based on the performance of Jewish Israelis, employing the norms when testing examinees from minority groups should be done cautiously. This also calls for complementing the current study with normative studies of ethnic and religious minorities in Israel. Better stratification of participants according to their place of residence is also called for, considering the underrepresentation of participants from the periphery of Israel in the current study (see Table 1) and the socio-economic disparities associated with place of residence in Israel (Israeli Central Bureau of Statistics, 2024). 
Evaluating the impact of socio-economic status, which was not gathered as part of the current study, is also recommended in these future studies (Farah, 2017). Third, the CVLT-III includes a forced-choice trial (termed Forced Choice Recognition) used as an embedded validity indicator (Axelrod et al., 2021, pp. 132 – 137). This optional trial was not performed as part of the current study, a limitation considering the importance of performance validity determination in neuropsychological assessment (Bush et al., 2005; Sweet et al., 2021). Providing normative data for this trial, analyzing measures that were not gathered as part of the current study (e.g., serial position effects), and adapting the alternate and brief forms of the CVLT-III are therefore warranted and will promote clinical work and empirical research. Finally, as an anonymous reviewer noted, words belonging to one list B category (“parts of house”) were replaced in the CVLT-IIIHebrew with another category (“nature”). Evaluating the impact of this change on examinees’ performance in future studies is also of value.
Summary
The increasing integration of neuropsychological assessments in Israel (Kave et al., 2020; Vakil & Hoofien, 2016) and the demand for additional wordlist learning tests in Hebrew created a need to adapt the CVLT-III to Hebrew. The current study details the adaptation process and provides initial normative data for the CVLT-IIIHebrew. These aims were achieved by deriving age-adjusted SSs for the CVLT-IIIHebrew and then utilizing regression-based adjustments to control the impact of sociodemographic variables. These adjustments can be easily performed using a norm calculator (Supplementary Material 3), which is also available as an online tool at https://bit.ly/45rqXFd. Concurrently, caution is warranted when testing examinees with a low education level and older examinees (i.e., ≥ 48 years), as elaborated earlier. Clinicians should also be aware that the normative data may be biased when evaluating examinees from minority groups in Israel. Enlarging the sample size, evaluating Israeli minorities, and adapting the optional forced-choice recognition memory subtest of the CVLT-III are among the endeavors that await further research and will enhance the coverage of the CVLT-IIIHebrew norms. Given the increasing acknowledgment of the unique challenges of performing neuropsychological assessments in a cross-cultural context, these endeavors are important and will advance the services that neuropsychologists provide to the Hebrew-speaking population.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S1355617725101616.
Acknowledgments
The study’s findings are based on data gathered as part of a graduate course in Rehabilitation psychology titled “Advancement of Neuropsychological Assessment in Israel”, supervised by Prof. Yoram Braw. I thank the following students for aiding in data collection: Tsofiya Ansbacher, Noam Baruch, Liraz Bernstein, Moriah Cohen, Sapir Eliyahu, Yaara Elran, Hatzav Hanoch, Linoy Karni, Mor Nahari, Sarah Oved, Elior Oren, Chen Rashef, Yotam Shuker, Shaked Yeshaiahu, Mai Akiva, Noy Harel, and Daniela Winter.
Funding statement
None.
Competing interests
The authors declare that they have no conflict of interest.









