Cross-cultural evaluation of learning and memory using a consonant-vowel-consonant trigram list

Abstract Objective: Word list-learning tasks are commonly used to evaluate auditory-verbal learning and memory. However, different frequencies of word usage, subtle meaning nuances, unique word phonology, and different preexisting associations among words make translation across languages difficult. We administered lists of consonant-vowel-consonant (CVC) nonword trigrams to independent American and Italian young adult samples. We evaluated whether an auditory list-learning task using CVC nonword trigrams instead of words could be applied cross-culturally to evaluate similar learning and associative memory processes. Participants and Methods: Seventy-five native English-speaking (USA) and 104 native Italian-speaking (Italy) university students were administered 15-item lists of CVC trigrams using the Rey Auditory Verbal Learning Test paradigm with five study-test trials, an interference trial, and short- and long-term delayed recall. Bayesian t tests and mixed-design ANOVAs contrasted the primary learning indexes across the two samples and biological sex. Results: Performance was comparable between nationalities on all primary memory indices except the interference trial (List B), where the Italian group recalled approximately one item more than the American sample. For both nationalities, recall increased across the five learning trials and declined significantly on the postinterference trial, demonstrating susceptibility to retroactive interference. No effects of sex, age, vocabulary, or depressive symptoms were observed. Conclusions: Using lists of unfamiliar nonword CVC trigrams, Italian and American younger adults showed a similar performance pattern across immediate and delayed recall trials. Whereas word list-learning performance is typically affected by cultural, demographic, mood, and cognitive factors, this trigram list-learning task does not show such effects, demonstrating its utility for cross-cultural memory assessment.

Over the last century, numerous memory assessment procedures have been developed to measure and differentiate normal from abnormal memory functioning. Most of these measures are highly face valid and focus on quantifying the amount of to-be-remembered information that can be recalled immediately after presentation and after a specified time delay. Most of these approaches are also highly verbal. Such procedures might evaluate the number of details from prose passages, the number of paired words, or the number of items on a list that can be recalled. Even recall of geometric figures can be susceptible to verbal labeling of shapes, suggesting that some linguistic processing occurs during purportedly visuospatial memory tasks. While most of these approaches effectively differentiate normal from abnormal memory functioning, performance on these measures can be influenced by demographic characteristics, such as sex, age, education, cultural and linguistic factors, and an examinee's familiarity with the to-be-recalled test material. For example, females commonly show at least a slight verbal episodic memory performance advantage over males (Asperholm et al., 2019; Hirnstein et al., 2023). These variables can influence an individual's processing efficiency and may lead to memory scores contaminated by demographic, cultural, and linguistic differences.
Though basic cognitive mechanisms are considered similar cross-culturally (Nell, 1999), the behavioral manifestations of higher-order processes are undeniably influenced by an individual's culture (Fernández & Abe, 2018; Puente & Agranovich, 2004; Rivera Mindt et al., 2010). The influence of culture on cognitive performance typically favors individuals born and raised in the geographical region where a test was developed (Cole, 1998). Moreover, specific cognitive tasks may be more complex and require more cognitive and neural resources in one culture than another (Gutchess et al., 2011). In addition to these culture-specific effects, individual cognitive measures may assess different cognitive abilities depending on the examinee's cultural background (Fasfous et al., 2013). Most cognitive tests are developed, standardized, and normed in predominantly "Western" and industrialized regions, such as the USA and UK, which share similar languages and origins and have high social and economic development rates. Several studies have reported that cross-cultural variability in socioeconomic and health status, as well as inequalities in educational opportunities, are confounds that can strongly influence cognitive test performance (Chin et al., 2012; Ferraro, 2016; Krch et al., 2015; Rosselli & Ardila, 2003; Schwartz et al., 2004; Weuve et al., 2018).
An essential cultural factor related to neuropsychological assessment is the language the examinee speaks. Because of the substantial verbal demands imposed by most memory measures, the words comprising a word list likely have shared and unique meanings to examinees who speak a specific language that do not lend themselves to translation. Additionally, word frequencies may differ substantially across languages. Simple word-for-word translations from one language to another may also change word phonology, such as the number of syllables per word, and subtle differences in semantic meanings of words, which collectively may influence memory performance. These limitations make direct translations of word list items from one language to another flawed and minimally informative.
Conventional word list memory measures usually use highly familiar words or material frequently occurring in the examinee's language. Familiarity with test stimuli could give some examinees an advantage on the task and may produce ceiling effects. Historically, Hermann Ebbinghaus (1885, 1964) made use of consonant-vowel-consonant (CVC) trigrams, a structured way of creating nonsense syllables, as stimuli in his quest to develop material that was devoid of inter-item associations and prior experience (Thorne & Henley, 2001). Nonword CVC trigrams are pronounceable combinations of letters (consonant-vowel-consonant) with no meaning or associations with other nonsense syllables. Nonword repetition depends more on the temporary storage of phonological representations in short-term memory during initial learning due to limited access to long-term lexical models that facilitate the recall of unfamiliar items. This manipulation eliminates the ability to use preexisting meaning, knowledge, or experience to facilitate recall.
To our knowledge, only one group has used CVC material within a list-learning task in two neuropsychological contexts. Bourke et al. (2012) presented the CVC nonsense syllables utilizing the structure of the Auditory Verbal Learning Test (AVLT; Rey, 1958) as part of a larger neuropsychological battery to study neuropsychological differences among persons with major depressive and social anxiety disorders, relative to matched controls. Persons with depression showed lower recall of CVC items across Trials 1-5 and a flatter learning curve than persons with social anxiety and matched controls. Delayed recall and recognition did not differ between groups. In a second study, Vierck et al. (2015) used the same paradigm to evaluate and screen persons for mild cognitive impairment. They reported that the CVC list-learning task showed psychometric characteristics similar to those of traditional list-learning tasks but with a reduced tendency for a ceiling effect. As in the first study, the CVC task was also sensitive to depression. These two studies suggest that using the AVLT paradigm with CVC material can evaluate processes associated with learning and memory, with concomitant reductions in the likelihood of ceiling effects. Importantly, this group does not appear to have investigated this procedure's potential for cross-cultural applications across speakers of different languages.
In this study, we implemented a similar CVC list-learning procedure to that of Bourke et al. (2012) and Vierck et al. (2015) with independent samples of undergraduate students collected in the United States and Italy. Although the United States and Italy are considered "Western" and industrialized (Masuda et al., 2020), the predominant languages used in each country, a fundamental component of their respective cultures, differ phonologically and semantically. CVC trigrams eliminate the semantic aspect of words, and they hold the phonological structure constant across items. Beyond language, other critical cultural differences that could plausibly impact cognitive function exist between the US and Italy, such as access to education and health care (Petrelli et al., 2020), nutritional status (Zhang et al., 2015), climate, and other material, social, and cultural resources (Marks et al., 2006). Thus, we investigated whether similar learning and memory processes would be observed using CVC stimuli in native speakers of two different languages from North America and Italy. If similar learning material across languages can evaluate the same memory processes, such a finding could lead to developing a standardized cross-cultural memory task. We hypothesized that both groups of participants would show a comparable and increasing number of items recalled over the five learning trials, a similar number of items retrieved on the first learning trial and the interference trial, and an equivalent ability to retain CVC material over a 20-min delay. We also hypothesized that the American and Italian samples would perform equally on all performance indexes.

Participants
The American sample included 75 (21 males, 54 females) Wayne State University undergraduate psychology students. The Italian sample consisted of 104 undergraduate psychology students (38 males, 66 females) from the University of Bergamo. Participants were recruited through the online Sona System at Wayne State University or an online Google form at the University of Bergamo. All participants received research credit for participation. Exclusion criteria included reporting any current or previous history of cognitive or neurological deficit, major psychiatric disorders, or any concern that would make participating in the research study challenging. Informed consent was obtained from all participants after fully explaining the research protocols and before starting the experimental session. The individual studies were approved by the Institutional Review Boards of Wayne State University and the University of Bergamo, and the research was completed in accordance with the Helsinki Declaration.

Measures
During the 20-min delay, both groups completed a demographic and health history questionnaire. Salthouse's Synonym and Antonym Test (Salthouse, 1993) and the Center for Epidemiologic Studies Depression Scale (CES-D; Radloff, 1977) were also administered during the delay to examine general intellectual functioning and depressive symptoms, respectively, for the American sample only. No parallel standardized Italian measures were available for the Italian participants, but they were given unstandardized direct translations of the Synonym and Antonym Test and CES-D to fill the 20-min time delay. The Synonym and Antonym Test presents 20 target words with five response options for each word. Examinees must choose the correct synonym for the first ten target words, followed by selecting the correct antonym of the remaining ten target words. The CES-D consists of 20 statements reflecting depressive symptoms, each rated for its frequency of occurrence during the past week in four ordinal categories (Rarely/None of the Time to Most of the Time). We evaluated the possible contribution of these two constructs to performance on the CVC learning task, given that depression has reportedly been associated with performance on the CVC task (Bourke et al., 2012; Vierck et al., 2015) and crystallized intelligence is related to long-term memory (Hundal & Horn, 1977).
The CVC free recall task's structure used the AVLT framework. The AVLT is a multi-trial list-learning approach for assessing immediate and delayed episodic memory, attention, and concentration (Lezak et al., 2012; Magalhães & Hamdan, 2010). Trigrams were presented using the software program PsychoPy (Version V3.0.0b11; Peirce et al., 2019) on a Macintosh 27-inch 2015 iMac computer for the American study and an Apple MacBook Pro (13-inch, Mid 2012) for the Italian study. The learning material consisted of 15 test trigrams (List A) and 15 interference trigrams (List B). Each trigram consisted of three English letters in a consonant-vowel-consonant pattern. Candidate items were randomly generated and selected. Trigrams that were actual words, meaningful acronyms, popular abbreviations, homophones, or challenging to pronounce were excluded. The items were displayed in black, bold Arial font on a white background on the computer screen to enhance illumination.
Because there are only 21 letters in the Italian alphabet (the letters j, k, w, x, and y are not used), some of the CVC trigrams originally developed for the American group were replaced with new items for the Italian sample (10 items for List A and five for List B). Table 1 shows the items used for each list for the two groups.
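The item-generation step described above can be sketched in code. The following is a minimal illustration, not the authors' procedure: the function names, the treatment of y as a consonant, and the tiny `BLOCKLIST` of real words are assumptions made here, and screening for acronyms, homophones, and pronounceability would still need to be done by hand.

```python
import random

VOWELS = "aeiou"
# Assumption: y is treated as a consonant for candidate generation.
CONSONANTS = "bcdfghjklmnpqrstvwxyz"
# Letters not used in the 21-letter Italian alphabet, as stated in the text.
NOT_IN_ITALIAN = frozenset("jkwxy")
# Hypothetical, deliberately tiny blocklist of real words; a real study
# would need a far more complete screen.
BLOCKLIST = frozenset({"cat", "dog", "sun", "pet", "bag", "hot", "fun"})


def candidate_trigrams(exclude_letters=frozenset()):
    """Return every consonant-vowel-consonant string that avoids excluded letters."""
    consonants = [c for c in CONSONANTS if c not in exclude_letters]
    return [c1 + v + c2 for c1 in consonants for v in VOWELS for c2 in consonants]


def draw_lists(n_items=15, exclude_letters=frozenset(), blocklist=BLOCKLIST, seed=None):
    """Randomly draw two non-overlapping lists (List A, List B) of n_items each."""
    rng = random.Random(seed)
    pool = [t for t in candidate_trigrams(exclude_letters) if t not in blocklist]
    drawn = [t.upper() for t in rng.sample(pool, 2 * n_items)]
    return drawn[:n_items], drawn[n_items:]


if __name__ == "__main__":
    list_a, list_b = draw_lists(exclude_letters=NOT_IN_ITALIAN, seed=2019)
    print("List A:", list_a)
    print("List B:", list_b)
```

Passing `exclude_letters=NOT_IN_ITALIAN` restricts the pool to letters shared by both alphabets, which is one way a common item set for both samples might be built.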

Procedure
The American study involved one in-person study visit between May and December of 2019. The Italian study took place during the COVID-19 pandemic via an online Skype video call using screen sharing of the list items. For the American study, participants were seated directly opposite the examiner, with the computer monitor in a room illuminated by overhead fluorescent light. Before the start of the experiment, the experimenter briefed the participant on the study's aims with the study information sheet. Informed consent was obtained, and all participants received a copy of this document. To begin the task, the participant faced the computer screen. Participants were seated about 91 cm (3 ft) from the computer screen. For the Italian online study, the experimenter briefed the participant on the study's aims and presented an informed consent form to the participant. Once participants agreed to participate in the study after reading the informed consent, the study began. Participants were asked to be in a quiet environment and indicated whether the screen was visible before the experiment commenced.
The CVC task used for the American and Italian samples adopted the AVLT paradigm (Rey, 1958), replacing list words with CVC trigram nonwords. A 15-item list of nonword CVC trigrams (List A) was presented in a fixed order on five successive study-test recall trials. Trigrams appeared at a rate of one every 2 s, and participants read each item aloud as it was presented. At the end of each trial, participants were given a fixed period of 60 s to recall as many items as possible, and precisely 60 s separated the presentation of each trial. After the fifth trial, List B (consisting of a new 15-item list of nonword CVC trigrams) was presented, and participants were given 60 s to recall as many items as possible. Immediately after the recall of List B, participants were asked to recall as many trigrams from List A as possible within 60 s (Trial 6). Participants then completed a demographic questionnaire, the Salthouse Synonym and Antonym Test, and the CES-D to fill a 20-min delay period. Next, for the delayed recall trial, participants were asked to recall as many items from List A as possible within 60 s. Performance indices included the number of items recalled on each trial (Trials 1 through 6, List B, and 20-min Delayed Recall), and the sum of items recalled across Trials 1 through 5.
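The performance indices listed above could be computed along the following lines. This is a sketch under stated assumptions rather than the authors' scoring rules: the trial keys (`T1` to `T6`, `B`, `Delay`), the function names, and the choices to count repeated responses once and to ignore intrusions are all introduced here for illustration.

```python
def score_trial(responses, target_list):
    """Count distinct responses that match an item on the target list.

    Assumptions: matching is case-insensitive, repetitions count once,
    and intrusions (items not on the target list) are simply ignored.
    """
    targets = {t.upper() for t in target_list}
    recalled = {r.strip().upper() for r in responses}
    return len(recalled & targets)


def score_protocol(recalls, list_a, list_b):
    """Score one participant.

    `recalls` maps hypothetical trial keys ('T1'..'T6', 'B', 'Delay') to the
    list of responses given on that trial. List B recall is scored against
    List B; every other trial is scored against List A.
    """
    scores = {}
    for key, responses in recalls.items():
        target = list_b if key == "B" else list_a
        scores[key] = score_trial(responses, target)
    # Total learning index: sum of items recalled across Trials 1 through 5.
    scores["Sum_T1_T5"] = sum(scores[f"T{i}"] for i in range(1, 6))
    return scores
```

A call such as `score_protocol(recalls, list_a, list_b)` returns one count per trial plus the Trials 1 through 5 sum, which matches the set of indices named in the paragraph above.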

Data analysis
Bayesian independent group t tests, correlations, and mixed-design analyses of variance (ANOVAs) were used to analyze the data. All analyses were conducted with JASP (JASP Team, 2023), version 0.17.1. A Bayesian analytic approach quantified the population parameter estimates most likely to underlie the observed data. The Bayes factor (BF10) evaluates the degree of support for the alternate hypothesis relative to the null hypothesis given the observed data. BF10 values between 3 and 10 suggest anecdotal evidence favoring the alternate hypothesis, while values greater than 10 strongly support the alternate hypothesis. In contrast, increasing support for the null hypothesis is indicated as BF10 values become smaller than 1/3. As Bayesian effect size indexes for ANOVAs are difficult to compute using available software, frequentist partial η² effect sizes are presented to convey the magnitudes of the main and interaction effect sizes. The JASP default priors were as follows: ANOVAs used the uniform distribution as the prior; contingency table prior concentration was set to 1; t tests used the Cauchy prior scale 0.707; correlations used a stretched beta prior width of 1. Bayesian independent groups t tests report the BF10, the median Standardized Mean Difference (SMD) effect size, and the 95% Credible Interval (CI) of parameter estimates likely to have given rise to the data. Small, medium, and large SMD effect size benchmarks are typically considered to be 0.2, 0.5, and 0.8, respectively. Small, medium, and large partial η² effect size benchmarks are 0.01, 0.06, and 0.14, respectively. Because Bayesian analysis primarily focuses on the posterior distributions that are unaffected by the number of comparisons one wishes to make, and p-values are not used, corrections for multiple comparisons are unnecessary (Kruschke, 2015).
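For readers less familiar with these quantities, their standard textbook definitions can be written as follows. The notation (D for the observed data, H1 and H0 for the alternate and null hypotheses, SS for sums of squares) is introduced here for illustration and does not appear in the source.

```latex
% Bayes factor: ratio of the marginal likelihoods of the data
% under the alternate (H_1) and null (H_0) hypotheses.
\mathrm{BF}_{10} = \frac{p(D \mid H_1)}{p(D \mid H_0)},
\qquad
\mathrm{BF}_{01} = \frac{1}{\mathrm{BF}_{10}}

% Partial eta squared for a given effect in an ANOVA.
\eta_p^2 = \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}}
```

Under this definition, a BF10 of 10 means the data are ten times more likely under the alternate hypothesis than under the null, and a BF10 of 1/3 means they are three times more likely under the null.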

Learning over trials
A 2 (Nationality: Italy, USA) × 2 (Sex: Male, Female) × 5 (Learning Trial: Trial 1 through Trial 5) Bayesian mixed-design ANOVA examined learning over trials. The dependent variable was the number of items correctly recalled on each trial. Of 19 possible models, including each combination of main and/or interaction effects, a model containing only the Learning Trial main effect had the highest BF10 (BF Model = 13.22, partial η² = .56). This finding indicates that there was no evidence for main effects of Sex or Nationality, nor were there any other interaction effects, as illustrated in the model-averaged results table for each main and interaction effect in Table 2. Partial η² values for all other main and interaction effects were 0.01 or less. The BF10 for the model including the Learning Trial main effect was 5.1 × 10^13, which indicates robust evidence in the data for the model that only includes Learning Trial. The BF10 values for the other main and interaction effects were less than 0.28. Notably, there were no appreciable effects of Sex or Nationality on the learning curve.

Comparison of American and Italian samples on memory performance indexes
Table 3 presents the means, SDs, BF10 values, and standardized mean group difference effect sizes from a series of independent groups Bayesian t tests for each of the five learning trials, List B, Trial 6, Delayed Recall, and the sum of items recalled across Trials 1-5 for the American and Italian participants, separately. Figure 1 presents the means and 95% credible intervals for each index by each nationality. Except for performance on List B, no group differences on any other performance index were observed, as indicated by the minimal BF10 values. Italian participants performed slightly better than the American participants on List B by approximately one item. Except for List B, which showed a medium effect size, all performance indexes showed conventionally small or near-zero effect sizes.

Comparison of male and female performance on memory performance indexes
Table 4 presents the descriptive statistics for the AVLT learning and memory indexes separately for men and women, collapsed over nationality. Figure 2 displays the means and 95% credible intervals for each index in graphical form. As can be gleaned from the table, the small BF10 values suggest no group differences in memory performance between men and women. All effect sizes are near zero or in the conventionally small range.

Susceptibility to proactive and retroactive interference
Susceptibility to proactive interference was investigated using a 2 (Nationality: Italy, USA) × 2 (Trial: Trial 1, List B) mixed-design Bayesian ANOVA. The model-averaged results in Table 5 demonstrate a robust main effect of Nationality (inclusion BF10 = 20.8, partial η² = 0.06), whereby the Italian sample performed better than the American sample, averaged over Trial 1 and List B. There was no evidence of susceptibility to proactive interference for the overall sample (Trial main effect inclusion BF10 = 0.1, partial η² = 2.1 × 10^-4) or a different pattern for the two Nationalities (Nationality × Trial interaction effect inclusion BF10 = 0.3, partial η² = 0.02). These results are presented in Figure 3, whereby the Nationality main effect seems to be driven by the previously observed difference between the Italian and American samples, primarily on List B.
Another 2 (Nationality: Italy, USA) × 2 (Trial: Trial 5, Trial 6) Bayesian ANOVA examined possible pre- and postinterference differences in recall that would reflect susceptibility to retroactive interference. Table 6 displays the model-averaged effects suggesting a robust main effect of Trial (inclusion BF10 = 1.4 × 10^14, partial η² = 0.56), no main effect of Nationality (inclusion BF10 = 1.0, partial η² = 4.8 × 10^-4), and anecdotal evidence suggesting an interaction effect (inclusion BF10 = 3.9, partial η² = 0.04). Figure 4 illustrates the means and 95% credible intervals for the two Nationality groups on the preinterference (Trial 5) and postinterference (Trial 6) recall trials. As indicated by the overlapping 95% credible intervals for the two Nationality groups, the evidence favoring an interaction does not appear compelling. In contrast, there does appear to be robust evidence in favor of susceptibility to retroactive interference for both groups, given the substantial performance decline following the presentation of the interference list.
Note (model-averaged results tables): P(incl): prior probability associated with the plausibility of an effect before looking at the data; P(excl): prior probability associated with the implausibility of an effect before looking at the data; P(incl|data): posterior probability associated with the plausibility of an effect given the data; P(excl|data): posterior probability associated with the implausibility of an effect given the data; BF inclu: the change from the prior inclusion odds to the posterior inclusion odds for each effect, averaged by all models including the effect, broadly reflecting the support for the effect given the data.
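The two interference contrasts can also be summarized descriptively as simple difference scores. The sketch below is only an illustration of the trial pairings (Trial 1 versus List B, Trial 5 versus Trial 6); it is not the Bayesian mixed-design ANOVA reported here, and the function names and score keys are assumptions introduced for this example.

```python
from statistics import mean


def interference_contrasts(scores):
    """Difference scores for one participant.

    proactive:   Trial 1 minus List B (positive = List B recalled worse,
                 the direction expected under proactive interference).
    retroactive: Trial 5 minus Trial 6 (positive = recall dropped after
                 the interference list).
    """
    return {
        "proactive": scores["T1"] - scores["B"],
        "retroactive": scores["T5"] - scores["T6"],
    }


def mean_contrasts(sample):
    """Average each contrast over a list of per-participant score dicts."""
    contrasts = [interference_contrasts(s) for s in sample]
    return {k: mean(c[k] for c in contrasts) for k in ("proactive", "retroactive")}
```

Computing `mean_contrasts` separately for each nationality group gives a quick descriptive check on the direction of each effect before fitting any model.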

Discussion
The primary aim of this cross-cultural study was to determine whether a list of nonword CVC trigrams presented in the framework of the AVLT (Rey, 1958) could be used to evaluate learning and memory for native speakers of different languages.
Our results suggested that this approach can assess auditory learning and delayed recall in native English and Italian speakers. No performance differences were observed between the Italian and American participants on any of the five learning trials, the postinterference recall trial, the sum of Trials 1-5, or the Delayed Recall trial. The Italian group performed better than the American participants only on List B by approximately one item. No evidence for susceptibility to proactive interference was observed for either group. In contrast, robust susceptibility to retroactive interference was observed for both groups. Susceptibility to retroactive interference is not commonly observed on the AVLT for healthy individuals in this age group, but it is more commonly seen after age 60 (Vakil et al., 2010). No performance differences were observed between men and women, which was surprising given the typical verbal memory advantage often observed for women (Asperholm et al., 2019; Crossley et al., 1997; Hirnstein et al., 2023).
In addition, in the American sample, none of the memory outcome measures were related to age, depressive symptoms, as indexed by the CES-D, or crystallized intelligence, as measured by the Salthouse Synonym and Antonym Test. Language is the most prominent issue when attempting to establish the cross-cultural equivalence of a standardized assessment. Standardized learning and memory assessments have historically tended to be heavily language-based and were developed primarily in "Western" cultures, thus biasing assessment results (Mushquash & Bova, 2007). For example, the most commonly used learning and memory assessments are auditory verbal list-learning tasks, such as the AVLT paradigm (Rey, 1958). Ideally, simple translations of the assessment items and instructions would yield comparable psychometric properties. However, previous research suggests that several significant considerations must be made during translation, and psychometric equivalence is not ensured (Cromer et al., 2013; Rendu et al., 2012). Simple translations, especially for heavily language-based tasks or complex verbal instructions, can be problematic because word meaning and usage vary as a function of language and culture (Ardila, 2021). As a result, the direct translation of word learning tasks seems to be insufficient due to factors such as preexisting semantic connections, frequency and familiarity, different cultural approaches to testing and conceptualizations of memory, and other cultural nuances (Ardila, 2021; Leger & Gutchess, 2021). These factors become further complicated when considering the role of language functioning in bilingual individuals (Rivera Mindt et al., 2008). Even within "Western" nations, ethnic diversity has dramatically increased in recent decades as international immigration has become more accessible. As a result, neuropsychologists must be prepared to encounter and assess individuals from diverse backgrounds (Franzen et al., 2021; Goudsmit et al., 2017). Thus, there is a pressing need for a more thorough development of standardized assessments of learning and memory that remain psychometrically useful across languages and, thus, cultures.
Our study offers a novel demonstration that performance on a verbal list-learning task of nonword CVC items was functionally equivalent across samples from the USA and Italy. Using nonword CVC items delivered in the AVLT paradigm precludes many issues inherent in language-based tasks, such as individual and cultural differences in preexisting semantic associations and familiarity with items. Thus, using unfamiliar nonwords standardized in form and length (e.g., CVC) reduces language-related factors, such as differences in phonology and semantics across languages, that may bias participant performance. Unlike most word list-learning tasks, participants read each item aloud as it was visually presented in our task. This procedure may have introduced a unique supportive learning process not seen in other measures. However, we still did not observe a ceiling effect. Despite these advantages, relatively little work has been done to modify the AVLT paradigm with nonword CVC items (Bourke et al., 2012; Vierck et al., 2015). The two previous studies using a similar approach to our current work were conducted with samples from New Zealand. With the addition of our research to the limited extant literature, there is evidence from three distinct cultures that using nonword CVC items in the AVLT paradigm is a valid measure of learning and memory, with participants performing similarly across cultures. The current results suggest that this modified paradigm is a suitable cross-cultural measure of auditory-verbal learning and memory, given that similar stimuli evoked roughly equivalent responses across two distinct cultures and languages.
No significant disparities between males and females on learning, interference, or delayed recall trials were observed. This finding was somewhat surprising, as females tend to perform better on verbal memory tasks than males (Crossley et al., 1997; Geffen et al., 1990; Gordon & Clark, 1974; Graves et al., 2017; Kimura & Seal, 2003; Kramer et al., 1988; Norman et al., 2000; Weiss et al., 2006; Woodard, 2006). Bleecker et al. (1988) found that women outperformed men on most AVLT performance indexes. Interestingly, Kimura and Seal (2003) found that females outperformed men in recalling actual words but not nonsense words. One possible reason for the absence of sex differences in performance may be the nonsemantic and nonassociative nature of the stimulus items. Because the CVC trigrams are equally unfamiliar to men and women, neither group had an advantage, resulting in no sex disparity in recall performance. These results imply that female superiority in verbal memory may result from using items that are familiar to the examinee, have preexisting associations, or have a unique semantic salience (e.g., "bird" may be more memorable than "kestrel" due to its greater frequency of use and because "bird" represents a superordinate category). Performance on the CVC trigram memory test was also unrelated to age. This finding is unsurprising due to the restricted age range consisting only of younger adults. Future studies using this approach should consider the possible effects of age in a sample that spans a more extensive age range. More surprising, however, was the absence of a relationship between crystallized intelligence and memory performance. As noted earlier, crystallized intelligence has a demonstrated association with long-term memory (Hundal & Horn, 1977). In a more recent study, Rapport et al. (1997) showed strong relationships between Verbal IQ on the Wechsler Adult Intelligence Scale-Revised (Wechsler, 1981) and learning indexes on the Wechsler Memory Scale-Revised (Wechsler, 1987) and California Verbal Learning Test (Delis et al., 1987) in a young adult sample. The strong relationships between crystallized intelligence and conventional verbal memory scores imply a performance advantage for individuals with larger vocabularies or preexisting familiarity with the to-be-learned material. The absence of associations between crystallized intelligence and all learning indexes on the CVC trigram memory test in the American sample suggests that performance on this task may be a purer index of auditory memory, as the CVC trigrams are devoid of semantic information that could potentially confer any performance advantage.
Depression has been inconsistently related to verbal memory performance. Some studies show relatively intact memory functioning in depression (Egeland et al., 2003; Hammar et al., 2011; Hammar & Årdal, 2013), and others report impaired memory functioning (Chen et al., 2018; Lee et al., 2012; Wang et al., 2022), particularly for individuals with recurrent depression (Basso & Bornstein, 1999). We found no relationships between CES-D score and memory performance on the CVC trigram test in the American sample, suggesting that it may be relatively insensitive to depressive symptoms. However, this finding should be interpreted cautiously, as our sample was relatively young and not a clinical group, which would likely restrict the range of CES-D scores. In addition, the two previous studies using the CVC trigram memory test (Bourke et al., 2012; Vierck et al., 2015) did observe modest but significant relationships with other measures of depression in older clinical and community samples (Bourke et al., 2012: mean age approximately 38 years, range 18-65 years; Vierck et al., 2015: mean age not reported, range 49-51 years), and they included participants meeting criteria for major depressive disorder. Thus, greater depression severity would be expected in those studies compared to our study, where major depressive disorder was an exclusion criterion. The other two studies also used a faster presentation rate than in our study (1 s/item versus 2 s/item), and they provided external auditory presentation of list items instead of having participants read the items aloud. Future cross-cultural research with this measure should evaluate the possible effects of depression on performance in both community and clinical samples with a broad age range.
The single significant nationality difference across individual trials reflected better List B performance for the Italian participants than for the American group. Although the mean difference amounted to only one item, it was associated with a medium effect size. This isolated difference is puzzling. Possible explanations include an advantage conferred by online administration of the task, or slightly better performance associated with the trigram composition of List B for the Italian sample relative to the trigrams used for List B in the American group. A larger working memory capacity in the Italian sample relative to the American sample is a third possibility that could be directly tested in future research.

Note: P(incl): prior probability associated with the plausibility of an effect before looking at the data; P(excl): prior probability associated with the implausibility of an effect before looking at the data; P(incl|data): posterior probability associated with the plausibility of an effect given the data; P(excl|data): posterior probability associated with the implausibility of an effect given the data; BFinclu: the change from the prior inclusion odds to the posterior inclusion odds for each effect, averaged by all models including the effect, broadly reflecting the support for the effect given the data.
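As an illustration of how the inclusion Bayes factor described in the note relates to the prior and posterior inclusion probabilities, the following minimal sketch computes the ratio of posterior inclusion odds to prior inclusion odds. The function name and the numerical values are hypothetical and are not taken from the study's tables; the sketch assumes that P(excl) = 1 - P(incl) and P(excl|data) = 1 - P(incl|data).

```python
def inclusion_bayes_factor(p_incl: float, p_incl_data: float) -> float:
    """Return posterior inclusion odds divided by prior inclusion odds.

    p_incl      : prior inclusion probability, P(incl)
    p_incl_data : posterior inclusion probability, P(incl|data)
    Assumes the exclusion probabilities are the complements of these values.
    """
    if not (0.0 < p_incl < 1.0 and 0.0 < p_incl_data < 1.0):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    prior_odds = p_incl / (1.0 - p_incl)                 # P(incl) / P(excl)
    posterior_odds = p_incl_data / (1.0 - p_incl_data)   # P(incl|data) / P(excl|data)
    return posterior_odds / prior_odds


# Hypothetical values chosen only for illustration.
bf = inclusion_bayes_factor(p_incl=0.6, p_incl_data=0.9)
print(f"Inclusion Bayes factor: {bf:.2f}")
```

With these illustrative inputs the prior odds are 1.5 and the posterior odds are 9, so the data shift the odds in favor of including the effect by a factor of about 6. Values below 1 would indicate that the data favor excluding the effect.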

Limitations and future directions
Though the results of this study provide preliminary evidence supporting the cross-cultural use of CVC trigrams to assess aspects of auditory-verbal learning and memory, several limitations must be considered. First, the administration contexts differed for the two nationality groups: the American participants were tested in person, whereas the Italian participants were tested online due to the COVID-19 pandemic. However, there were no group differences on any learning index apart from List B, and this difference was relatively modest. This finding seems to imply that the administration format did not systematically influence CVC trigram learning task performance. Second, the CVC trigram lists shared many to most of the same items, but they were not identical because the original list of CVC trigrams used with the American sample included trigrams containing certain letters not used in the Italian alphabet. Again, except for List B, no systematic performance differences were observed between the two groups despite their having somewhat different lists, suggesting that the specific composition of the individual trigrams may have minimal to no influence on performance. Nevertheless, future work with the cross-cultural application of the CVC trigram learning task should consider differences in alphabet composition and other phonological differences across languages when constructing a list of trigrams. In doing so, a common list of items could be developed for use across several languages, assuming they use a common set of Latin/Roman alphabetical characters. This approach may be more challenging to apply in languages that use non-Latin/Roman alphabetical characters. Ease of pronunciation of CVC trigrams across languages should also be carefully considered, as some trigrams may be challenging to enunciate for speakers of some languages or dialects. Finally, the trigram memory task was not as limited by a ceiling effect as some word list-learning measures. However, CVC trigrams assess pure phonological verbal memory due to the nonsemantic nature of the items. The extent to which phonological verbal memory is as sensitive to the effects of aging, neurodegenerative conditions, or brain injury as word list learning would be worthwhile to investigate in future research. A direct comparison of this novel task with existing word list-learning measures in the same participants would be helpful to determine similarities and differences between the learning processes tapped by each measure. Because the task removes semantic information but holds item phonology constant, it will be interesting to contrast this measure's sensitivity to preclinical Alzheimer's disease with that of existing word list-learning tasks. Finally, using this procedure with individuals from "non-Western" and less industrialized geographical regions would be an essential next step in further establishing the clinical utility of this approach.

Conclusion
Typical word list-learning measures that require examinees to recall words after the presentation of the word list commonly use semantically associated or nonassociated words or material already familiar to the learner. Because the word list stimuli typically include everyday items or high-frequency words, they may be less difficult to recall than nonsense syllables, especially among younger adults at the peak of their cognitive abilities. Nonword CVC trigrams are nonsemantic, eliminating familiarity with and preexisting associations among items. CVC trigram lists have a particular advantage over word lists in that the same common core set of trigrams can potentially be constructed to assess phonological memory across speakers of many different languages that use the same Latin/Roman alphabetical characters. The present study's findings suggest that using nonword CVC trigrams in a list-learning paradigm can overcome the challenges posed by preexisting familiarity and associations among words used in traditional list-learning measures of memory. This approach can also be used for cross-cultural memory assessment, as we found minimal differences in performance between Italian and American young adults. Moreover, the task appeared to assess learning and memory equally well for males and females, regardless of nationality. Finally, the trigram memory test reduces the likelihood of ceiling effects, minimizes sex differences in verbal memory performance and performance advantages conferred by extensive vocabulary knowledge, and lends itself to creating many different alternate forms.
Note: DR = Delayed Recall; T1-T5 Sum = Sum of words recalled across Trials 1-5; SD = standard deviation; SE = Standard Error; 95% Credible Interval reflects the lower and upper bounds of the 95% most likely mean value for each nationality; 95% Credible Interval for Effect Size reflects the upper and lower bounds for the 95% most likely standardized mean difference effect sizes.

Figure 1. Means and 95% credible intervals for learning and memory indexes by nationality.

Figure 2. Means and 95% credible intervals for learning and memory indexes by sex.

Figure 3. Means and 95% credible intervals for nationality × trial Bayesian ANOVA for susceptibility to proactive interference.

Figure 4. Means and 95% credible intervals for nationality × trial Bayesian ANOVA for susceptibility to retroactive interference.

Table 1. CVC trigrams for List A and List B used for the American and Italian studies.

Table 2. Analysis of effects of trial, nationality, and sex on number of words recalled on each trial for Bayesian mixed-design ANOVA.

Table 3. Means and standard deviations for learning indexes by nationality.

Table 4. Means and standard deviations for learning indexes by sex.

Table 5. Analysis of effects of trial, nationality, and trial × nationality for susceptibility to proactive interference (List A Trial 1 vs. List B Trial).

Table 6. Analysis of effects of trial, nationality, and trial × nationality for susceptibility to retroactive interference (List A Trial 5 vs. List A postinterference Trial 6).