Assessment of Prorated Scoring of an Abbreviated Protocol for the National Institutes of Health Toolbox Cognition Battery

Abstract Objective: To evaluate an abbreviated NIH Toolbox Cognition Battery (NIHTB-CB) protocol that can be administered remotely without any in-person assessments, and explore the agreement between prorated scores from the abbreviated protocol and standard scores from the full protocol. Methods: Participant-level age-corrected NIHTB-CB data were extracted from six studies in individuals with a history of stroke, mild traumatic brain injury (mTBI), treatment-resistant psychosis, and healthy controls, with testing administered under standard conditions. Prorated fluid and total cognition scores were estimated using regression equations that excluded the three fluid cognition NIHTB-CB instruments which cannot be administered remotely. Paired t tests and intraclass correlations (ICCs) were used to compare the standard and prorated scores. Results: Data were available for 245 participants. For fluid cognition, overall prorated scores were higher than standard scores (mean difference = +4.5, SD = 14.3; p < 0.001; ICC = 0.86). For total cognition, overall prorated scores were higher than standard scores (mean difference = +2.7, SD = 8.3; p < 0.001; ICC = 0.88). These differences were significant in the stroke and mTBI groups, but not in the healthy control or psychosis groups. Conclusions: Prorated scores from an abbreviated NIHTB-CB protocol are not a valid replacement for the scores from the standard protocol. Alternative approaches to administering the full protocol, or corrections to scoring of the abbreviated protocol, require further study and validation.


INTRODUCTION
Cognition is an important outcome in research trials and clinical practice (McInnes et al., 2017; Sheffield et al., 2018; Tang et al., 2018). To provide a common metric of cognition in the context of clinical research, the NIH Toolbox Cognition Battery (NIHTB-CB) was introduced. It is a brief, tablet-based cognitive assessment that has been validated for use in healthy populations and those with neurological and psychiatric disease (Carlozzi, Tulsky et al., 2017; Weintraub et al., 2014).
The NIHTB-CB comprises seven instruments: two assessing crystallized cognition (Picture Vocabulary and Oral Reading Recognition) and five assessing fluid cognition (Flanker Inhibitory Control and Attention, List Sorting Working Memory, Dimensional Change Card Sort, Pattern Comparison Processing Speed, and Picture Sequence Memory) (Weintraub et al., 2013). Of these, both instruments that assess crystallized cognition and two that assess fluid cognition (List Sorting Working Memory and Picture Sequence Memory) can be modified for administration without any physical contact between examinee and tablet. The other three fluid cognition instruments are scored based on accuracy and reaction time, and thus require the examinee to enter responses directly into the tablet in person.
In the context of the COVID-19 pandemic, during which strict physical distancing guidelines have been implemented, there is a strong need for remote cognitive assessments (Gostin & Wiley, 2020). Our group has previously developed and validated a protocol for administering the NIHTB-CB using telemedicine to assess participants at remote sites (Rebchuk et al., 2019). However, this protocol still requires in-person conditions for some instruments. Recent guidelines published by the NIHTB-CB developers describe an abbreviated protocol, incorporating only the four instruments that can be administered entirely remotely (HealthMeasures Help Desk, 2020a).
We sought to explore whether a prorated score based on this abbreviated battery could provide a valid substitute for the standard score from the full battery. We assessed the agreement between prorated fluid and total cognition scores from the abbreviated protocol versus standard scores from the full protocol. The equations we applied to estimate prorated scores were derived from published regression equations for NIHTB-CB standard scores (Casaletto et al., 2015; HealthMeasures Help Desk, 2020b). As much ongoing research has been modified to facilitate physical distancing, this work helps to inform the future interpretation of data collected with the abbreviated NIHTB-CB protocol.

Data
We extracted participant-level NIHTB-CB data gathered under standard conditions by trained examiners as part of six previous or ongoing studies in individuals with neurological disease [history of stroke or mild traumatic brain injury (mTBI)] or psychosis (inpatients with treatment-resistant psychosis) and healthy controls (no history of neurological disease, learning disability, or active psychosis). See Supplementary material for details of respective studies.
For all data sets, the NIHTB-CB was administered on an iPad (Apple, California, USA), and Form A of the cognition battery was used. Participant demographic data were captured with written questionnaires. All participants were older than 18 years and provided written informed consent. The experimental protocols for the respective studies were approved previously by the University of British Columbia's Clinical Research Ethics Board, and conformed to the Declaration of Helsinki.

Statistical Analysis
We chose to report standard scores corrected for age (mean = 100, standard deviation = 15) and not for other demographic variables because education levels may not be equivalent across the region where our data were collected (Vancouver, Canada) and the country where the NIHTB-CB was normed (United States) (Chevalier et al., 2016). In addition, several of our participants identified with race(s) that the NIHTB-CB race/ethnicity options failed to capture.
Prorated fluid (Equation 1) and total (Equation 2) cognition scores were derived from the appropriate regression equations provided with the NIHTB-CB (Casaletto et al., 2015; HealthMeasures Help Desk, 2020b). The prorated fluid cognition score included only the instruments (List Sorting Working Memory and Picture Sequence Memory) that can be administered remotely without the examinee having direct access to the tablet. Data were separated into healthy controls and disease-specific groups (stroke, mTBI, and psychosis). Demographic data were compared between groups using one-way analysis of variance for parametric data and chi-square tests for categorical data. Paired t tests were used to compare the standard and prorated fluid cognition scores within each group; data met assumptions of normality (Meyers et al., 2013). Prediction error was calculated as the difference between standard and prorated scores for each participant, and mean prediction error was calculated for each group. A prediction error of zero reflects equal standard and prorated scores. Intraclass correlation (ICC) values between standard and prorated fluid cognition group-level scores were generated using a two-way mixed-effects, absolute-agreement, multiple-measurements model (Koo & Li, 2016). Data met assumptions of normality and equality of variance for ICC analyses. All analyses were repeated for the prorated total cognition scores. We operationalized a clinically meaningful discrepancy as 0.5 standard deviations (or 7.5 standard score points), and calculated the frequency of participants with prorated-standard discrepancies exceeding this magnitude (Silverberg & Millis, 2009). Chi-square tests were used to compare between groups the observed frequencies of participants with clinically significant prediction errors (i.e., exceeding a ±0.5 SD difference between standard and prorated scores). Data met the assumptions of chi-square testing.
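To make the per-participant calculations concrete, the following is a minimal Python sketch of the prediction error, the paired t statistic, and the frequency of clinically meaningful discrepancies. It is an illustration only and not the SPSS procedure used in the study. The function names and the six score pairs are hypothetical, and the sign convention (prorated minus standard, so that a positive value indicates overestimation by the prorated score) is an assumption chosen to match the direction in which the mean differences are reported.

```python
import math
from statistics import mean, stdev

# 0.5 SD on a standard-score scale with SD = 15 (as defined in the text).
THRESHOLD = 0.5 * 15

def prediction_errors(standard, prorated):
    """Per-participant prediction error.

    Assumed sign convention: prorated minus standard, so positive values
    mean the prorated score overestimates the standard score.
    """
    return [p - s for s, p in zip(standard, prorated)]

def paired_t(errors):
    """Paired t statistic and degrees of freedom for mean error vs. zero."""
    n = len(errors)
    standard_error = stdev(errors) / math.sqrt(n)
    return mean(errors) / standard_error, n - 1

def discrepancy_rate(errors, threshold=THRESHOLD):
    """Proportion of participants whose absolute error exceeds the threshold."""
    return sum(abs(e) > threshold for e in errors) / len(errors)

# Hypothetical scores for six participants (not study data).
standard = [95.0, 102.0, 88.0, 110.0, 99.0, 104.0]
prorated = [101.0, 100.0, 97.0, 112.0, 108.0, 103.0]

errors = prediction_errors(standard, prorated)
t_stat, df = paired_t(errors)
print("mean prediction error:", round(mean(errors), 2))
print("paired t:", round(t_stat, 2), "df:", df)
print("discrepancy rate:", round(discrepancy_rate(errors), 2))
```

The p value is omitted because it requires the t distribution, which the Python standard library does not provide; a statistics package would be needed for that step.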
Given the exploratory nature of the study, we did not correct for multiple comparisons. Significance was set a priori at 0.05. Statistical analyses were performed using IBM SPSS Statistics (Version 19.0; IBM Corp., Armonk, NY).

Group-level Comparisons
Overall, fluid cognition prorated scores were higher than standard fluid cognition scores (mean difference = +4.5, SD = 14.3; p < 0.001). These differences were significant in the stroke and mTBI groups, but not in the healthy or psychosis groups. This resulted in overall prorated scores for total cognition also being higher than standard total cognition scores (mean difference = +2.7, SD = 8.3; p < 0.001). Again, these differences were only significant in the stroke and mTBI groups (see Table 1). Overall agreement between prorated and standard scores as per the ICC was moderate-to-good for fluid cognition, and good-to-excellent for total cognition.
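As an illustration of the agreement statistic, the sketch below computes an absolute-agreement, average-measures ICC for two measurements per participant from the two-way ANOVA mean squares, using the formula tabulated by Koo and Li (2016) for this model, and maps a value onto their descriptive bands (below 0.5 poor, 0.5 to 0.75 moderate, 0.75 to 0.9 good, above 0.9 excellent). This is a sketch and not the SPSS output reported here; the function names and example scores are hypothetical. Koo and Li recommend applying the bands to the 95% confidence interval bounds, which this sketch does not compute.

```python
from statistics import mean

def icc_absolute_agreement_average(standard, prorated):
    """Two-way model, absolute agreement, average of k = 2 measurements."""
    rows = list(zip(standard, prorated))
    n, k = len(rows), 2
    grand = mean(v for row in rows for v in row)
    row_means = [mean(row) for row in rows]
    col_means = [mean(standard), mean(prorated)]

    # Sums of squares for participants (rows), methods (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in rows for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # A systematic offset between methods inflates ms_cols and lowers the ICC.
    return (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)

def koo_li_label(icc):
    """Descriptive band for an ICC value, following Koo and Li (2016)."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"

# Hypothetical example: prorated scores are uniformly 10 points higher.
standard = [90.0, 100.0, 110.0]
prorated = [100.0, 110.0, 120.0]
icc = icc_absolute_agreement_average(standard, prorated)
print(round(icc, 3), koo_li_label(icc))
```

In this toy example the two methods rank participants identically, yet the ICC falls below 1 because absolute agreement penalizes the constant offset. The same property makes this ICC form sensitive to the systematic overestimation described above.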

DISCUSSION
The aim of this exploratory study was to assess the validity of a prorated score, based on a proposed abbreviated NIHTB-CB protocol, against the standard score for the usual protocol (HealthMeasures Help Desk, 2020a). Particularly during COVID-19-related physical distancing measures, the potential advantage of an abbreviated protocol is that it can be administered remotely without personnel alongside the examinee. Beyond the COVID-19 pandemic, advantages of a fully remote protocol could include greater participation by those with mobility restrictions or in isolated communities, and fewer losses to follow-up (Berge et al., 2016). Overall, we found that prorated scoring for the abbreviated protocol overestimated fluid and total cognition standard scores. However, results differed between testing groups, with no group-level differences seen between prorated and standard scores in healthy individuals or in those with treatment-resistant psychosis.
It is uncertain whether these significant differences in group-level performance represent true differences related to domain-specific deficits from lesional injuries in the stroke or mTBI participant groups, random error, or insufficient statistical power to detect between-group differences in the healthy control group or, in particular, the psychosis group, which had the fewest participants (McInnes et al., 2017; Nys et al., 2007; O'Brien et al., 2003). The instruments included within our prorated scores include measures of working memory and episodic memory, and fail to capture processing speed, attention, and executive function (Mungas et al., 2014). It may be that anatomic lesions or functional deficits (e.g., frontal lobe injury, motor deficits, and fatigue) in the stroke and mTBI cohorts result in worse performance in executive function and timed tasks, in particular, and hence lead to the overestimation of prorated scores with exclusion of instruments assessing these specific domains. The data were collected as part of six separate studies, and unmeasured confounders specific to study conditions may also play a role.
Although exclusion of processing speed, attention, and executive function tests from prorated scores failed to significantly affect the assessment of healthy controls and psychosis cohorts at the group level, we cannot confidently conclude that prorated scores are equivalent to standard scores in these groups. Amongst healthy controls, 60.0% of prorated scores were overestimated or underestimated by a clinically significant margin, and amongst psychosis patients, the rate was 59.0%. Given the significant variability in patient-level performance, these two methods should not be considered equivalent when considering individual-level data.
Table 1. Standard and prorated age-corrected standard scores (mean, SD) for fluid cognition and total cognition in healthy participants and those with stroke, psychosis, and mTBI. ICCs (95% CIs) between standard and prorated scores are given.
Our study has limitations. Our findings are limited to healthy individuals and those with stroke, mTBI, or treatment-resistant psychosis. Future studies should explore whether there may be groups in which an abbreviated protocol may be appropriate. Additionally, we only reported age-corrected scores, which do not control for sex, education, and ethnicity of participants; these factors may influence NIHTB-CB performance (Casaletto et al., 2015).
At present, we have compared only prorated versus standard scoring of in-person testing, as a step toward considering entirely remote adaptations of the NIHTB-CB protocol. We have not prospectively validated an abbreviated remote protocol, as we are limited by current physical distancing recommendations related to the COVID-19 pandemic.
In conclusion, an abbreviated NIHTB-CB protocol is a pragmatic solution in the context of physical distancing requirements, but does not constitute a valid replacement for the standard protocol. Our preliminary findings suggest that prorated scores excluding the Flanker Inhibitory Control and Attention, Dimensional Change Card Sort, and Pattern Comparison Processing Speed instruments may tend to overestimate Fluid Composite scores. Thus, a fully remote version of the NIHTB-CB should include adapted versions of the timed instruments. We provide empirical evidence in support of newly updated guidelines by the NIHTB developers, which now state that prorated scores may not be comparable to standard scores (Salesforce, 2020). Still, remote administration of the current abbreviated protocol warrants further validation of the nontimed instruments. These individual instruments, administered remotely, may still benefit continuity of research measuring crystallized cognition and working and episodic memory.