Three-month Practice Effect of the National Institutes of Health Toolbox Cognition Battery in Young Healthy Adults

ABSTRACT: The National Institutes of Health Toolbox Cognition Battery (NIHTB-CB) is a tablet-based cognitive assessment designed for individuals of all ages, including those with neurological diseases. However, NIHTB-CB practice effects (PEs) need clarification if the battery is to be used to track longitudinal change. We explored test–retest PEs on NIHTB-CB performance over a 3-month interval in young healthy adults (n = 22). We examined fully corrected T-scores, which are normalized for demographic factors, and calculated PEs using Cohen's d. There were significant PEs for all NIHTB-CB composite scores and on 4 of 7 subtests. These findings suggest that NIHTB-CB PEs warrant further study, as they may affect the interpretation of studies that incorporate this battery.

The National Institutes of Health Toolbox Cognition Battery (NIHTB-CB) is a brief tablet-based assessment of cognitive functioning designed for use across the lifespan. 1 The NIHTB-CB provides common data elements for clinical research on cognition and has been validated for use in many neurological disorders. 2,3 It comprises seven subtests and produces scores normalized for age, gender, education, and race-ethnicity for Fluid Cognition, Crystallized Cognition, and Total Cognition.
Practice effects (PEs), or improvements in test performance due to repeated exposure to testing materials, have been investigated for the NIHTB-CB in a small body of literature over short (1-5 weeks) 4,5,6 and longer (15 months) 7 test-retest intervals in middle-aged and older adults. These studies did not find consistent evidence of PEs in these cohorts over these intervals. It is unclear whether these findings can be extrapolated to younger adults, who may have greater cognitive reserve in the context of a neurological insult. 8 Recent work from our group demonstrated that the NIHTB-CB detects cognitive deficits in high-functioning young stroke survivors with normal scores on the Montreal Cognitive Assessment. 9 We conducted an exploratory pilot study in the healthy control group of that study to assess possible PEs as a consideration for future research with young adults. We investigated a 3-month test-retest interval because 90-day outcomes are commonly assessed in acute stroke trials; moreover, many stroke survivors experience persisting cognitive impairment at 3 months despite an excellent functional recovery. 10,11 We expected higher retest performance for Fluid and Total Cognition scores and stable retest performance for Crystallized Cognition scores.
We recruited healthy adults aged 18-55 years. Further eligibility criteria included fluency in English, normal use of the dominant hand, no history of neurological or psychiatric disease, no diagnosed learning disability, and no prior exposure to the NIHTB-CB. Participants were recruited by advertisement at a local academic hospital beginning in November 2017. The intended sample was a convenience sample of 50 participants serving as controls for a study of young stroke survivors. 9 A priori, this sample size provided over 80% power, with a two-tailed alpha of 0.05, to detect an estimated effect size of 0.25 based on assumptions from previous test-retest work. 6,12 Recruitment was still open in March 2020, but because of restrictions related to the COVID-19 pandemic, we ended the study early as an exploratory pilot with 22 participants.
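The a priori power statement can be made concrete with a rough calculation. The sketch below is an illustration only, not the calculation the authors performed: it assumes the effect size is a paired standardized mean difference dz (mean change divided by the SD of the change scores) and uses a normal approximation to the paired t-test instead of the exact noncentral t distribution, so it slightly underestimates the required sample size. The helper names `paired_power_normal` and `paired_n_normal` are hypothetical. Because the text does not state which effect-size metric the value of 0.25 refers to, numbers from this sketch should not be compared directly with the reported power; power software often parameterizes repeated-measures designs with Cohen's f and an assumed correlation between measurements, which leads to very different sample sizes.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)


def paired_power_normal(dz: float, n: int, alpha: float = 0.05) -> float:
    """Approximate two-tailed power of a paired test for effect size dz.

    Assumption: dz = mean change / SD of change scores, and the paired
    t-test is approximated by a z-test (normal approximation).
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = dz * sqrt(n)  # expected test statistic under the alternative
    # Probability of landing beyond either critical bound.
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def paired_n_normal(dz: float, power: float = 0.80, alpha: float = 0.05) -> int:
    """Approximate number of pairs needed to reach the target power."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(((z_crit + z_power) / dz) ** 2)
```

For example, `paired_n_normal(0.5)` returns 32 pairs under these assumptions; because the required n scales with 1/dz^2, halving dz roughly quadruples the number of pairs needed.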
The NIHTB-CB has seven subtests measuring five major cognitive domains: language, executive function, episodic memory, processing speed, and working memory. 1 In addition to performance on individual subtests, scores are aggregated into measures of Crystallized Cognition (Picture Vocabulary and Oral Reading Recognition) and Fluid Cognition (Flanker Inhibitory Control and Attention, List Sorting Working Memory, Dimensional Change Card Sort, Pattern Comparison Processing Speed, and Picture Sequence Memory). Performance is adjusted for demographic factors, including age, education, gender, and race-ethnicity. Subtest and composite scores are reported as fully corrected T-scores (mean: 50, SD: 10). 13 A trained research assistant administered the NIHTB-CB in English on a 9.7-inch iPad Pro (Apple, CA) in a quiet, distraction-free room; administration took approximately 30 minutes. Both assessments were completed under the same test conditions with a 3-month (±2 weeks) test-retest interval.
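The subtest-to-composite structure and the T-score metric described above can be summarized in a small data structure. This is a sketch under stated assumptions: the dictionary simply mirrors the grouping given in the text, and the percentile conversion assumes a normal reference distribution with mean 50 and SD 10. It does not reproduce NIH Toolbox composite scoring, which is performed by the NIH Toolbox software using its own normative procedures rather than by averaging subtest T-scores. All names (`COMPOSITES`, `composite_for`, `t_to_z`, `t_to_percentile`) are hypothetical.

```python
from __future__ import annotations

from statistics import NormalDist

# Grouping of the seven subtests into the two composites, as described in
# the text; Total Cognition draws on all seven.
COMPOSITES: dict[str, tuple[str, ...]] = {
    "Crystallized Cognition": (
        "Picture Vocabulary",
        "Oral Reading Recognition",
    ),
    "Fluid Cognition": (
        "Flanker Inhibitory Control and Attention",
        "List Sorting Working Memory",
        "Dimensional Change Card Sort",
        "Pattern Comparison Processing Speed",
        "Picture Sequence Memory",
    ),
}

# Fully corrected T-score metric (assumed normal for percentile conversion).
T_SCALE = NormalDist(mu=50, sigma=10)


def composite_for(subtest: str) -> str:
    """Return the Fluid or Crystallized composite a subtest belongs to."""
    for composite, subtests in COMPOSITES.items():
        if subtest in subtests:
            return composite
    raise KeyError(f"unknown subtest: {subtest}")


def t_to_z(t_score: float) -> float:
    """Express a T-score in SD units relative to the normative mean."""
    return (t_score - T_SCALE.mean) / T_SCALE.stdev


def t_to_percentile(t_score: float) -> float:
    """Percentile rank of a T-score under the normality assumption."""
    return 100 * T_SCALE.cdf(t_score)
```

For instance, a T-score of 60 (one SD above the normative mean) corresponds to roughly the 84th percentile under this assumption.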
The study protocol was approved by the local institutional review boards and the research ethics committee at the University of British Columbia. Written informed consent was obtained from all participants.
Statistical analyses were performed using IBM SPSS Statistics version 26.0 (Armonk, NY). Demographic data are reported as descriptive statistics. Test-retest comparisons used paired t-tests for parametric data and Wilcoxon signed-rank tests for non-parametric data (two-tailed, p ≤ 0.05). Test-retest correlations were computed using Pearson's correlation coefficient (r). PEs (effect sizes) were calculated using Cohen's d for repeated measures (within-subject version) with 95% confidence intervals, using cutoffs of 0.2, 0.5, and 0.8 for small, moderate, and large effects, respectively. 14 All participant data were included in the analysis.
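The statistics named in this paragraph can be sketched in plain Python to show what each one computes. This is not the SPSS procedure used in the study: the text does not specify which within-subject formula for Cohen's d was used, so the sketch shows two common variants, dz (mean change divided by the SD of the change scores) and a repeated-measures d, often written d_rm, that rescales dz by sqrt(2(1 - r)) using the test-retest correlation r. P-values and 95% confidence intervals are omitted; in Python, `scipy.stats.ttest_rel` and `scipy.stats.wilcoxon` provide the corresponding tests. All function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between first- and second-session scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def paired_t(x1, x2):
    """Paired t statistic and degrees of freedom for retest minus baseline."""
    diffs = [b - a for a, b in zip(x1, x2)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


def cohens_dz(x1, x2):
    """Mean change divided by the SD of the change scores."""
    diffs = [b - a for a, b in zip(x1, x2)]
    return mean(diffs) / stdev(diffs)


def cohens_drm(x1, x2):
    """Repeated-measures d: dz rescaled by sqrt(2(1 - r))."""
    r = pearson_r(x1, x2)
    s1, s2 = stdev(x1), stdev(x2)
    sd_diff = sqrt(s1 ** 2 + s2 ** 2 - 2 * r * s1 * s2)  # SD of change scores
    return (mean(x2) - mean(x1)) / sd_diff * sqrt(2 * (1 - r))


def wilcoxon_w(x1, x2):
    """Signed-rank sums (W+, W-); zero differences dropped, tied ranks averaged."""
    diffs = [b - a for a, b in zip(x1, x2) if b != a]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


def effect_label(d):
    """Label |d| using the 0.2 / 0.5 / 0.8 cutoffs."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "negligible"
```

With `x1` holding first-session and `x2` second-session T-scores for the same participants in the same order, positive t, W+ dominance, and positive d values all indicate higher retest scores.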
Mean NIHTB-CB Fluid, Crystallized, and Total Cognition scores were significantly higher at the second administration than at the first (Table 2).
In this small cohort of young, educated, healthy adults, we found significant 3-month PEs across all NIHTB-CB composite cognition scores. Small-to-moderate PEs were also seen across several individual subtests, with the most marked changes in Picture Sequence Memory and Flanker Inhibitory Control.
Our findings differ from previous work examining PEs in the NIHTB-CB, as both Fluid and Crystallized composite scores improved on retesting. Previous work using a short test-retest interval of 3 weeks found small-to-moderate effect sizes for Fluid Cognition (d = 0.42) and Total Cognition (d = 0.29), but no significant PE for Crystallized Cognition. 4 The improvement in Crystallized Cognition was therefore unexpected. Fluid cognition represents dynamic cognitive processes, including working memory and executive function, that are more susceptible to aging or brain injury, whereas crystallized cognition reflects processes relying on language and comprehension, tends to be stable across the lifespan, and is more resilient to brain changes. 3,4 It is thus possible that, despite its emphasis on crystallized cognition, the design of the Picture Vocabulary subtest remains sensitive to learning effects, at least in our cohort and over this interval.
In contrast to 3-week repeated administration of the NIHTB-CB, an extended test-retest period of 15 months in older adults showed no significant PEs on any composite cognition score and only a small effect size for the Dimensional Change Card Sort task. 7 These differences are not unexpected, as PEs are typically more prominent with shorter test-retest intervals and more frequent testing. 15 Our results are in line with PEs reported in older adults over a 3- to 5-week retest interval, where significant PEs were seen for both Fluid and Total Cognition scores, as well as for the Picture Sequence Memory task. 5 However, it is challenging to compare PE trends between our younger sample and older adults, as fluid cognition is known to decline across adulthood in parallel with age-related neurobiological changes. 16 Better characterization of PEs helps to minimize overestimation of cognitive recovery or, conversely, underestimation of cognitive deterioration over time. Our findings suggest that 3-month PEs, even on Crystallized Cognition performance, may need to be considered when interpreting longitudinal changes in NIHTB-CB scores.
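One common way to keep practice effects from being mistaken for recovery at the individual level is a practice-adjusted reliable change index (RCI), which subtracts the mean retest gain observed in controls before judging whether a person's change exceeds measurement error. This technique was not used in the present study; it is shown only to illustrate the point above, and the formula below (one of several published RCI variants) and all input values are assumptions. The function name `practice_adjusted_rci` is hypothetical.

```python
from math import sqrt


def practice_adjusted_rci(baseline, retest, mean_practice_effect, sd_baseline, retest_r):
    """Hypothetical practice-adjusted reliable change index.

    baseline, retest: one person's T-scores at the two sessions.
    mean_practice_effect: mean retest-minus-baseline change in healthy controls.
    sd_baseline: control-group SD at baseline.
    retest_r: control-group test-retest correlation.
    """
    sem = sd_baseline * sqrt(1 - retest_r)  # standard error of measurement
    se_diff = sqrt(2 * sem ** 2)            # standard error of a difference score
    # Remove the expected practice gain before scaling by measurement error.
    return ((retest - baseline) - mean_practice_effect) / se_diff
```

For a hypothetical patient whose T-score rises from 45 to 58, with a control-group mean practice effect of 4 points, a baseline SD of 10, and a retest correlation of 0.8, `practice_adjusted_rci(45, 58, 4, 10, 0.8)` gives about 1.42, compared with about 2.06 when the practice effect is set to 0; only the unadjusted value exceeds the conventional 1.96 threshold.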
There are limitations to our study. Our sample was smaller than anticipated because the study was terminated early owing to the COVID-19 pandemic. Importantly, our participants, who were primarily white with post-secondary education, are not representative of the general population. However, cultural and educational backgrounds are thought to affect crystallized cognitive ability more than fluid cognition, so they would not explain the moderate PEs for the Fluid Cognition score reported here. 4 Additionally, early-life education, as well as education in mid-to-late adulthood, has been shown to improve crystallized cognition but does not influence working memory, executive function, or other fluid cognitive abilities. 17 The homogeneity of our sample is further reflected in the small standard deviations of some NIHTB-CB T-scores, notably for Total Cognition, which also showed a leptokurtic distribution. This may have contributed to the moderate-to-large PEs observed on composite scores, despite nonsignificant changes on many individual subtests. We found similar distributions in a previous study examining 3-week PEs under in-person versus virtual test conditions in healthy controls with similarly high levels of education. 6 We also acknowledge that, although the 3-month interval was chosen to reflect usual timepoints in acute stroke trials, different follow-up intervals will be preferred for studies of other neurological and psychiatric conditions. Finally, we did not account for other factors, such as sleep and stress, that may confound performance.
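The argument that small standard deviations can inflate standardized PEs follows from the form of Cohen's d, which divides a mean change by an SD: the same raw gain looks larger when scores are tightly clustered. The sketch below illustrates this with hypothetical numbers and adds a moment-based excess kurtosis function, where positive values indicate a leptokurtic (peaked, heavy-tailed) distribution. It is an illustration under assumptions, not the analysis performed here; SPSS and similar packages typically report a small-sample-adjusted kurtosis, so their values will differ somewhat from this simple estimator. The function names are hypothetical.

```python
from statistics import mean


def excess_kurtosis(values):
    """Moment-based excess kurtosis; values > 0 suggest a leptokurtic shape."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - m) ** 4 for v in values) / n  # fourth central moment
    if m2 == 0:
        raise ValueError("kurtosis is undefined for constant data")
    return m4 / m2 ** 2 - 3  # subtract 3 so a normal distribution scores 0


def standardized_change(mean_change, sd):
    """Mean change in SD units, the core of any Cohen's d variant."""
    return mean_change / sd
```

For example, a hypothetical 3-point mean gain gives `standardized_change(3, 10)` = 0.3 but `standardized_change(3, 5)` = 0.6, doubling the apparent effect with no change in raw scores.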
Despite the limitations of this pilot study, given that the NIHTB-CB is increasingly used as an outcome measure in studies of neurological and psychiatric disease, we feel it is important to draw attention to potential PEs when the NIHTB-CB is used as a repeated measure. Future confirmatory work with larger and more diverse cohorts is warranted.
Our findings suggest that the NIHTB-CB may have lower test-retest reliability over short-term repeated administrations in young, educated, healthy adults. Although preliminary, this work suggests that PEs may need to be considered in studies repeating the NIHTB-CB at 3-month intervals, and that even Crystallized Cognition subtests may be subject to PEs in some circumstances. This work may inform clinical trials or observational studies using the NIHTB-CB to assess longitudinal cognitive outcomes.