Estimating premorbid intelligence in people living with dementia: a systematic review

ABSTRACT Objectives: In diagnosing dementia, estimating premorbid functioning is critical for accurate detection of the presence and severity of cognitive decline. However, which assessments of premorbid intelligence are most suitable for use in clinical practice is not well established. Here, we systematically evaluate the validity of instruments for measuring premorbid intelligence in people living with dementia. Design and setting: In this systematic review, electronic databases (EMBASE, PsycINFO, MEDLINE, CINAHL, and AMED) were searched to identify studies reporting on objective measures of premorbid intelligence in dementia. Participants from included studies were recruited from local communities and clinical settings. Participants: A total of 1082 patients with dementia and 2587 healthy controls were included in the review. Measurements: The literature search resulted in 13 eligible studies describing 19 different instruments. The majority of instruments (n = 14) consisted of language-based measures, with versions of the National Adult Reading Test (NART) being most commonly investigated. Results: Preliminary evidence suggested comparable performance of patients with mild dementia and healthy controls on word reading tasks in English, Portuguese, Swedish, and Japanese. In moderate dementia, however, the performance was significantly impaired on most verbal tasks. There was a lack of reliability and validity testing of available instruments, with only one of the included studies reporting psychometric properties within the patient group. Conclusions: The results demonstrate that there is a wide range of tools available for estimating premorbid intelligence in dementia, with cautious support for the potential of word reading tasks across different languages in individuals with mild dementia. However, the review highlights the urgent need for extensive assessments of the psychometric properties of these tasks in dementia. 
We propose that further longitudinal research and assessments of nonverbal measures are necessary to validate these instruments and enhance diagnostic procedures for people living with dementia worldwide.


Introduction
Dementia is a highly prevalent neurodegenerative disorder and a leading global cause of disability and mortality (Vos et al., 2017;Nichols et al., 2019). It has been estimated that 46.8 million people were living with dementia worldwide in 2015, with numbers expected to rise to 131.5 million in 2050 (Alzheimer's Disease International, 2015). Recent global research prioritization initiatives and policies aiming to reduce the burden of dementia have highlighted the importance of early and accurate diagnosis (Alzheimer's Disease International, 2012;Shah et al., 2016;World Health Organization, 2017). Among other benefits, a timely diagnosis can result in earlier interventions such as prescription of acetylcholinesterase inhibitors to maintain function, enhanced advance care planning for patients and their families, and identification of relevant agencies and support networks (Dubois et al., 2016;Robinson et al., 2015). To establish a diagnosis of dementia, it must be ascertained that a decline in cognition compared to previous levels of functioning has occurred. Determining whether a change in cognitive ability has taken place can be challenging in clinical practice, however, as previously obtained measures of cognition are rarely available at clinical presentation.
To remedy this problem, various assessments have been developed to estimate premorbid intelligence. Three of the most commonly used approaches are demographic regression equations (Barona et al., 1984;Wilson et al., 1978), irregular word reading tasks (Nelson and O'Connell, 1978;Wechsler, 2001;Wechsler, 2011), and lexical decision-making tasks (Baddeley et al., 1993;Yuspeh and Vanderploeg, 2000). The first method computes an estimated intelligence quotient (IQ) based on variables such as education, geographic residence, and occupation. A major advantage of regression equations is that demographic details are independent of current levels of functioning, and therefore are inherently unaffected by cognitive decline due to dementia. In practice, however, reliability may be hampered by difficulties in obtaining accurate information from patients and/or limited access to demographic records. Furthermore, it has been suggested that regression equations tend to provide inaccurate estimates for people with an IQ outside the average range (Goldstein et al., 1986;Veiel and Koopman, 2001;Griffin et al., 2002), and can only predict approximately 50% of the variance in measured intelligence (O'Carroll, 1995).
Irregular word reading and lexical decision-making tasks, on the other hand, rely on current performance rather than self-reported information. Scores on these tasks are strongly correlated with general intelligence assessments in healthy adults (Crawford et al., 1989a;Yuspeh and Vanderploeg, 2000). Their use as a measure of premorbid intelligence is based on the assumption that the ability to pronounce irregularly spelled words or differentiate real words from pseudo-words is relatively resistant to cognitive decline. However, although some early studies supported the view that performance on these tasks is stable in dementia (Nelson and McKenna, 1975;Sharpe and O'Carroll, 1991;Crawford et al., 1988a), others reported significantly lower scores in patients compared with healthy controls (Stebbins et al., 1990;Patterson et al., 1994;O'Carroll et al., 1995;Schmand et al., 1998). In addition, while good test-retest and inter-rater reliability have been demonstrated for several instruments in healthy adults (Crawford et al., 1989a;O'Carroll, 1987;Dykiert and Deary, 2013), it is unclear whether these psychometric properties extend to clinical populations. Furthermore, the use of these measures across cultures and languages has not been systematically evaluated.
Considering their widespread use in clinical settings, a better understanding of the validity of measures for estimating premorbid intelligence in dementia is vital. Accurate estimates are key to correct interpretations of scores on cognitive screening tests, and consequently accurate diagnosis of dementia. The present systematic review focuses on the following questions: (1) what assessments are currently available for the measurement of premorbid intelligence in dementia?; (2) do estimated premorbid intelligence scores on these instruments remain stable in dementia? That is, are task scores similar for healthy adults and patients in cross-sectional comparisons, and/or are patient scores constant over time?; and (3) what are the psychometric properties of the identified tools in people living with dementia? The main objective of this review is to clarify which measures of premorbid intelligence may be most suitable for assessing patients with dementia in global clinical practice.

Search strategy
The predefined protocol for this systematic review was registered with PROSPERO (CRD42019133499) and was based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (Moher et al., 2015). Databases (EMBASE, PsycINFO, MEDLINE, CINAHL, and AMED) were searched from 1999 until May 2019 using the NICE Healthcare Databases Advanced Search. Papers were identified through Boolean operators using keywords for dementia ("dementia" OR "Alzheimer") and premorbid intelligence ("premorbid" AND ["intelligence" OR "intellect"]) with the thesaurus "explode" function. The search was restricted to papers published in the English language. References in the selected journal papers and previous reviews were screened manually to supplement the main search methods.
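The Boolean structure of the keyword search can be illustrated with a short Python sketch. It is not the NICE Healthcare Databases Advanced Search itself: the function name `matches_search` is hypothetical, plain substring matching stands in for the databases' own indexing and thesaurus "explode" function, and joining the two keyword groups with AND is an assumption based on the description above.

```python
def matches_search(text: str) -> bool:
    """Return True if a record's text satisfies the keyword logic.

    Assumed logic: ("dementia" OR "Alzheimer")
                   AND ("premorbid" AND ("intelligence" OR "intellect")).
    Matching is case-insensitive substring matching (an assumption).
    """
    t = text.lower()
    dementia_terms = "dementia" in t or "alzheimer" in t
    premorbid_terms = "premorbid" in t and ("intelligence" in t or "intellect" in t)
    return dementia_terms and premorbid_terms


# Hypothetical record titles, used only to exercise the logic.
print(matches_search("Premorbid intelligence in Alzheimer's disease"))
print(matches_search("Premorbid personality in dementia"))
```

The first hypothetical title satisfies both keyword groups, whereas the second lacks an intelligence term and would not be retrieved under this logic.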

Paper selection
Titles and abstracts and, where appropriate, full text of identified citations were independently screened by two authors (MJO and SL). Any disagreements were resolved by consensus and a third author (TJW) was consulted when needed. The following criteria had to be met for inclusion in the review:
1) Published as a journal paper or letter.
2) Participants had a diagnosis of any type and severity of dementia, except for dementia secondary to acquired brain injury or non-neurological disease.
3) Diagnosis of dementia was determined using standardised criteria (e.g. Diagnostic and Statistical Manual of Mental Disorders, International Classification of Diseases).
4) Performance on an objective assessment of premorbid intelligence was a primary or secondary outcome, and its relation to diagnosis or severity of dementia was investigated with statistical analyses.

Data extraction
The following details were extracted independently by two authors (MJO and SL) from each study using a structured form: study characteristics (study design, sample size, recruitment site, and diagnostic criteria used), participant demographics (mean age, years of education, type and severity of dementia, percentage of female participants), assessment scale of premorbid intelligence (name, type, language, and scores of patient and control groups), psychometric properties in the patient group (test-retest reliability and inter-rater reliability), and results (statistically significant findings at p < 0.05, unless otherwise determined by the authors). [...] The rationale for this exclusion is that mild cognitive impairment (MCI) is a highly heterogeneous syndrome, which in some cases progresses to dementia but can also remain stable or even reverse over time (Gauthier et al., 2006). Where possible, Cohen's d was estimated for the core comparisons by computing the difference between reported group means divided by their pooled standard deviation. The pooled standard deviation was calculated as per Cohen (1988):
$$SD_{pooled} = \sqrt{\frac{(n_1 - 1)\,SD_1^2 + (n_2 - 1)\,SD_2^2}{n_1 + n_2 - 2}}$$
with SD_1 and SD_2 denoting the standard deviations for each group and n_1 and n_2 referring to their respective sample sizes. Effect sizes were interpreted as small (d = 0.2), medium (d = 0.5), or large (d = 0.8).
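The effect size calculation described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' analysis code: the function names and the example group statistics are hypothetical, and treating the benchmarks of 0.2, 0.5, and 0.8 as lower cut-offs for "small", "medium", and "large" is an assumption, since the text gives only the benchmark values.

```python
import math


def pooled_sd(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Pooled standard deviation with n1 + n2 - 2 in the denominator."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def cohens_d(mean1: float, sd1: float, n1: int,
             mean2: float, sd2: float, n2: int) -> float:
    """Difference between group means divided by the pooled standard deviation."""
    return (mean1 - mean2) / pooled_sd(sd1, n1, sd2, n2)


def effect_size_label(d: float) -> str:
    """Label |d| using 0.2 / 0.5 / 0.8 as lower cut-offs (an assumption)."""
    magnitude = abs(d)
    if magnitude >= 0.8:
        return "large"
    if magnitude >= 0.5:
        return "medium"
    if magnitude >= 0.2:
        return "small"
    return "negligible"


# Hypothetical summary statistics: controls (group 1) vs. patients (group 2).
d = cohens_d(mean1=110.0, sd1=10.0, n1=30, mean2=104.0, sd2=10.0, n2=30)
print(round(d, 2), effect_size_label(d))
```

With equal standard deviations of 10 in both hypothetical groups, the pooled standard deviation is also 10, so a mean difference of 6 points gives d = 0.6, which falls in the "medium" band under the assumed cut-offs.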

Quality assessment and data synthesis
Study quality and risk of bias were evaluated according to the AXIS tool (Downes et al., 2016), a checklist comprising 20 items designed for quality assessment of observational studies. Quality was appraised according to the number of items for which a "yes" response was recorded and rated as "high quality" (15-20), "moderate quality" (8-14), or "low quality" (0-7). A list of instruments for the assessment of premorbid intelligence in dementia was generated from the selected papers. The key outcomes of the identified studies and the psychometric properties of the instruments, where available, are presented in a narrative synthesis.
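The quality bands described above amount to a simple mapping from the number of "yes" responses on the 20-item AXIS checklist to a rating. The following Python sketch shows that mapping; the function name `axis_rating` is hypothetical and the AXIS tool itself does not prescribe this code.

```python
def axis_rating(yes_count: int) -> str:
    """Map the number of AXIS items answered "yes" (0-20) to a quality band."""
    if not 0 <= yes_count <= 20:
        raise ValueError("AXIS has 20 items; yes_count must be between 0 and 20")
    if yes_count >= 15:
        return "high quality"      # 15-20 items met
    if yes_count >= 8:
        return "moderate quality"  # 8-14 items met
    return "low quality"           # 0-7 items met


# Hypothetical counts at the band boundaries.
for count in (7, 8, 14, 15):
    print(count, axis_rating(count))
```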

Study selection and characteristics
Titles and abstracts of all identified papers after removal of duplicates (n = 304) were screened, with 13 studies meeting the stipulated eligibility criteria after full-text review. A flow diagram of the identification and attrition of studies is provided in Figure 1. Study design and demographic details of participants were recorded for all studies (see Table 1). The studies were conducted in eight different countries (Australia, Brazil, Germany, Japan, Portugal, Sweden, the UK, and the USA).

Identified instruments
Nineteen objective measures of premorbid intelligence were identified, including revisions, parallel versions, and variants in different languages. The most commonly investigated tools were word reading tasks (47.4%), followed by lexical decision tasks (21.1%), visuospatial reasoning tasks (15.8%), demographic equations (10.5%), and a word description task (5.3%). The majority of assessments were conducted in English (47.4%) and Portuguese (31.6%), with the remaining tasks carried out in German (10.5%), Swedish (5.3%), and Japanese (5.3%). The key outcomes from cross-sectional and longitudinal studies for each of the identified instruments are summarized in Tables 2 and 3, respectively.

Word reading
A total of nine studies investigated the performance of dementia patients on word reading tasks (see Tables 2 and 3). English assessments included the original and revised versions of the National Adult Reading Test (NART and NART-R), the Wechsler Test of Adult Reading (WTAR), the third and revised versions of the Wide Range Achievement Test (WRAT-III and WRAT-R), and the Cambridge Contextual Reading Test (CCRT). In the NART and the WTAR, participants are asked to read a list of 50 words which have irregular grapheme-phoneme correspondences (Nelson, 1982;Nelson and Willison, 1991;Wechsler, 2001). The WRAT differs from these two measures through its inclusion of words following regular spelling rules (Jastak and Wilkinson, 1984). Finally, the CCRT is comprised of the same words as the NART, but provides greater syntactic and semantic context by presenting each word within a sentence (Beardsall and Huppert, 1997). The aim of this adaptation is to facilitate recognition of the word and thereby improve task performance.
In the mild stage of dementia, no significant differences were observed between healthy adults and patients on any of these tasks in two cross-sectional studies (McCarthy et al., 2005;McFarlane et al., 2006). Direct task comparisons indicated that the performance of patients with mild dementia was better on the CCRT than the NART, suggesting that embedding words within a sentence may improve scores in dementia patients (McFarlane et al., 2006). When disease severity was moderate, one study reported similar scores for healthy adults and patients on the WRAT-R and WRAT-III [...] differences in word reading scores was found for the NART by McFarlane et al. (2006), while the remaining studies observed a medium effect size on the NART, NART-R, WTAR, and CCRT.
In a longitudinal study, WRAT-III reading scores were significantly higher for control participants than patients with Alzheimer's disease in baseline assessments, but raw scores did not decline significantly in either patients or controls over a 1-year period (Ashendorf et al., 2009). In studies with a longer follow-up time of 3 years, however, steeper declines in performance were observed for patients than controls on the NART (Cockburn et al., 2000), and lower MMSE scores were systematically associated with a greater decline on the WTAR (Weinborn et al., 2018). In the latter, the authors noted that a large proportion of the recruited patients were lost to follow-up due to death (18.5%) or withdrawal (41.4%). Moreover, only 75 of the remaining 132 patients were able to complete the WTAR on follow-up, with participants who could not carry out the assessment being more cognitively impaired at baseline (Weinborn et al., 2018). The degree of decline on the WTAR may therefore be larger than estimated in the assessed patient group.
In addition to these English tasks, the systematic search identified adaptations of the NART into Portuguese (TeLPI; Alves et al., 2013), Swedish (NART-SWE; Rolstad et al., 2008), and Japanese (JART; Matsuoka et al., 2006). Due to language differences, the NART-SWE items consisted of loan words rather than irregular Swedish words (Rolstad et al., 2008). The JART was based on Kanji characters, an ideographic script which is used to represent lexical morphemes. Many Japanese words are compounds comprised of multiple Kanji, and the pronunciation of an individual character can vary across different words. The authors propose that the JART provides a suitable adaptation of English irregular word reading tasks as it similarly requires word-specific translations from orthography to phonology (Matsuoka et al., 2006). For all three tasks (the TeLPI, NART-SWE, and JART), comparable task performance was observed in patient and healthy control groups (Rolstad et al., 2008;Matsuoka et al., 2006;Alves et al., 2013). It should be noted that all studies focused on patients with mild dementia. Overall, these findings suggest that word reading tasks may have potential as an assessment of premorbid functioning across different languages in early dementia.

Lexical decision-making
The four studies assessing lexical decision-making tasks all focused on the Spot-the-Word (STW) task or adaptations of this instrument. In the original STW task, participants have to select which of a pair of items is a real word [...] (Wechsler, 2011). The assessment comprises 60 pairs of real words and pseudo-words of varying word length and frequency. An equivalent version of this lexical decision-making task (LDT) was developed in Portuguese by Serrao and colleagues (2015). Two German adaptations used the same general principle as the STW, but required participants to identify the real word among four pseudo-words rather than presenting word pairs (Mehrfachwahl-Wortschatz-Test A and B, MWT-A and MWT-B;Binkau et al., 2014;Hessler et al., 2013). In the English STW task, test scores were comparable for controls and patients with both mild and moderate dementia (McFarlane et al., 2006). In the Portuguese LDT, mild dementia was associated with numerically lower scores than healthy adults, but this difference was not significant in general linear models (Serrao et al., 2015). For the German adaptations of the task, on the other hand, scores were significantly lower for patients than controls on both the MWT-A (Binkau et al., 2014) and MWT-B (Hessler et al., 2013). Impaired performance was observed across the entire spectrum of disease severity, from mild to severe dementia. In addition, effect sizes for these differences were estimated to be medium to large.

Demographic regression equations
The Barona Index (Barona et al., 1984) and a demographic regression equation based on Crawford and colleagues' (1989b) work were assessed in two cross-sectional studies (McCarthy et al., 2005;McFarlane et al., 2006). The Barona Index is based on age, sex, race, education, occupation, and geographical residence, whereas the latter equation includes the variables age, total years of education, and social class. In both studies, estimated premorbid IQ scores were similar for the patient and control groups. As would be expected given the task's reliance on stable demographic characteristics as opposed to current performance, results were similar across patients with mild and moderate cognitive impairments.

Other assessments
Three less frequently used assessments of premorbid intelligence were identified in the present review. First, a Portuguese version of the Vocabulary subtask of the Wechsler Adult Intelligence Scale III (WAIS-III) was investigated in two cross-sectional studies (Serrao et al., 2015;De Oliveira et al., 2014). In this task, participants are asked to provide definitions of a list of words. The two studies demonstrated that Vocabulary task performance was similar in patients with mild dementia and controls with either normal (Serrao et al., 2015) or low levels of education (De Oliveira et al., 2014).
Finally, the remaining two assessments focused on performance in the visuospatial domain rather [...] (Wechsler, 1997). In addition, one of these studies assessed people's performance on a Block Design task from the WAIS-III (De Oliveira et al., 2014). Both Matrix Reasoning and Block Design draw on visuospatial problem-solving skills rather than knowledge acquired through past learning. It was found that participants with mild dementia scored significantly lower on all of these tasks compared with healthy individuals (Serrao et al., 2015;De Oliveira et al., 2014). The effect sizes of these group differences were large on all visuospatial tasks. There was thus no evidence to support the use of perceptual problem-solving tasks to estimate premorbid IQ in dementia.

Psychometric properties
Of the 13 studies evaluated here, only one reported reliability measures within the dementia patient group. Ashendorf and colleagues (2009) found that test-retest reliability for the irregular word reading WRAT-III task was .90 in the subgroup with Alzheimer's disease, indicating high stability of test scores across multiple measurements. No further statistics for test-retest or inter-rater reliability were provided for any of the other measures.

Quality assessment and risk of bias
All included studies were deemed to be of moderate (n = 11) to high quality (n = 2) as assessed with the AXIS tool (see Table 4). The criteria least frequently met were justification for the sample size (n = 13), representative participant selection (n = 13), and addressing and categorizing nonresponders (n = 12). These findings suggest that there was a risk of selection and nonresponse bias in the majority of the studies reported here.

Discussion
The main aims of the present review were to identify and evaluate instruments for estimating premorbid intelligence in people living with dementia. Our findings suggest that while a wide range of tools has been assessed for this purpose, evidence for their validity in patients with dementia is rather mixed. Furthermore, the lack of reliability testing across studies highlights the need for further information regarding the psychometric properties of the identified instruments. We will discuss the core findings and their implications for the assessment and diagnosis of dementia in clinical practice, and propose several directions for future research.

Stability of verbal task performance in early dementia
Of the 19 tools for estimating premorbid intelligence evaluated here, the vast majority consisted of verbal assessments. While a number of studies indicated that performance on word reading, lexical decision-making, or vocabulary tasks was unaffected by a diagnosis and/or severity of dementia, others reported significant differences between scores of healthy adults and patient groups or declining scores over time. Word reading tasks were most frequently investigated, with a total of nine different instruments being identified. However, it should be noted that the majority of the findings on word reading performance stem from only two studies, which both examined several different tasks (McCarthy et al., 2005;McFarlane et al., 2006). McCarthy et al. (2005) found that performance on the NART-R was reduced in moderate but not mild dementia, whereas there was no evidence for impairment on the WRAT-III and WRAT-R in either mild or moderate dementia. McFarlane and colleagues (2006) indicated that the performance of patients [...] McFarlane et al. (2006), participants with moderate dementia on average reported significantly fewer years of education than those with mild dementia and healthy controls. Furthermore, the total years of education were numerically higher for patients in the study by McCarthy and colleagues (2005). However, McFarlane et al. (2006) highlight that inclusion of education in the statistical analyses did not alter the pattern of results. The second possible reason for these conflicting findings is that, while some verbal abilities are presumed to be relatively resistant to dementia, it is improbable that they are entirely impervious to the condition. At the more severe end of the spectrum, we might therefore observe greater difficulties in completing verbal tasks.
In line with this hypothesis, high performance on word reading tasks tended to be maintained in mild dementia, whereas patients with moderate cognitive impairments scored lower than controls on the NART, NART-R, WTAR, and CCRT (McCarthy et al., 2005;McFarlane et al., 2006;McGurn et al., 2004). In the two studies investigating Vocabulary tasks, no differences between patients and healthy adults were found (De Oliveira et al., 2014;Serrao et al., 2015). Importantly, only individuals with mild dementia were included in these studies, leaving open the question of whether the performance would be similarly preserved in patients with more severe cognitive impairments. Further testing of verbal tasks across the full range of disease severity is therefore essential to determine whether such assessments are suitable beyond the early stages of dementia.
Taken together, these findings are thus suggestive of an influence of disease severity and specific task demands on verbal premorbid intelligence scores in people living with dementia. In future research, it would be worth testing this hypothesis explicitly by (1) including participants with a wide range of scores on the MMSE or similar screening measures and (2) directly contrasting tasks which are based on a similar approach but may vary in difficulty, such as the NART and the WRAT. Crucially, in studies reporting significant differences between patients and controls on verbal tasks, the effect size tended to be medium to large. This suggests that inappropriate use of these tasks could lead to substantial underestimation of prior cognitive function, which would hamper the interpretation of neuropsychological assessments and consequently accurate diagnosis of dementia. Clarifying the cause of differences in performance across verbal tasks and establishing more firmly whether these measures are valid only in early dementia is therefore a critical next step in optimizing assessments of premorbid intelligence in patient groups.

Impact of language differences
Cultural and linguistic differences between the populations and tasks should also be taken into account when interpreting findings across studies. In this review, translations of English verbal tasks into German, Portuguese, Swedish, and Japanese were identified. For some of the word reading tasks, translating instruments which were originally developed in English was a nontrivial issue. Specifically, an instrument developed in Sweden had to rely on loan words due to an absence of irregular words (which form the basis of the NART, WTAR, and WRAT) in the Swedish language (Rolstad et al., 2008), and a Japanese adaptation used a different writing system (Kanji) (Matsuoka et al., 2006). As a consequence, it is possible that various translations of word reading tasks relied on different cognitive processes compared to the original English versions. For example, Kanji characters are perceptually highly complex and tend to have fewer phonemic factors than English written words. Reading Kanji by guessing is therefore difficult when the reader is not familiar with the word, and may rely on semantic processing to a greater extent than the reading of English irregular words (Matsuoka et al., 2006). Nevertheless, the absence of group differences between patients and healthy adults in any of these adaptations is encouraging, and suggests that such word reading tasks may hold promise as a measure of premorbid intelligence across a range of languages.

Nonverbal measures
Overall, there is thus preliminary evidence that language-based assessments may be suitable for estimating premorbid intelligence in dementia, although further research is needed to clarify the effects of disease severity and specific task differences. In addition, it should be noted that such measures are likely to be of limited use for people presenting with language variants of dementia (e.g. semantic dementia or primary progressive aphasia), as well as learning difficulties such as dyslexia. It is therefore worth considering the use of alternative, nonlinguistic assessments. However, research on such measures to date appears to be very limited. While two studies investigated patients' performance on visuospatial reasoning tasks (De Oliveira et al., 2014;Serrao et al., 2015), the authors highlight that these tasks were specifically included to demonstrate the deterioration of fluid intelligence in dementia compared with "crystallised" abilities such as lexical tasks. It was therefore unsurprising that impaired performance on these tasks was observed in people with dementia. In the future, it may be of interest to explore whether there are visual abilities that tend to be preserved in dementia and are good predictors of intelligence which can be exploited to devise suitable assessments for people with language difficulties. Alternatively, it would be possible to utilize demographic equations, which are entirely independent of people's current performance. Preliminary findings suggest that such equations tend to result in similar estimates of premorbid IQ in healthy adults and people with dementia (McCarthy et al., 2005;McFarlane et al., 2006). However, as only two studies were identified in the present review which used this method, we cannot make any strong claims regarding their global utility in the diagnosis of dementia. 
Furthermore, concerns raised in previous studies regarding limitations of this approach in accurately estimating high and low ranges of IQ (Goldstein et al., 1986;Griffin et al., 2002;Veiel and Koopman, 2001) remain to be addressed.

Implications and recommendations for clinical practice
As studies have rarely performed direct comparisons of the different measures presented here, an outstanding question is which of the various tasks is most suitable for application in people living with dementia. One study which contrasted performance on the NART, WTAR, CCRT, and STW suggested that the lexical decision-making STW task was the only measure on which no significant differences between groups were observed (McFarlane et al., 2006). The word reading tasks were all found to result in lower scores in people with mild Alzheimer's disease compared to healthy controls, although embedding the words within sentences (as in the CCRT) was associated with better performance in the mild patient group than presenting a list of words (as in the NART). However, given that no other studies identified in this review have investigated the English STW, replication is needed to confirm the superiority of the lexical decision-making task over irregular word reading measures in estimating premorbid intelligence. Further comparisons of word reading tasks were conducted by McCarthy and colleagues (2005), who investigated differences in scores on the NART-R, WRAT-R, and WRAT-III as well as estimates derived from the Barona demographic equation. While the authors suggest that all four measures showed similar stability relative to Full-Scale IQ scores obtained from the Wechsler Adult Intelligence Scale-Revised (WAIS-R), no statistical analyses were conducted to directly compare the performance of the individual instruments. Additional studies contrasting the efficacy of different measures are therefore needed to identify the most suitable measure for assessments of dementia.
For clinical practice, the utility of different tools depends not only on their accuracy in estimating cognitive decline, but also on the resources they require in terms of financial costs, time, and expertise. From a pragmatic point of view, versions of the language-based NART have been investigated most extensively and are currently being used by many health professionals, with the NART-R having recently been re-standardized for the WAIS-IV (Bright et al., 2018). The NART-R may thus represent an up-to-date measure which can easily be implemented in clinical settings. However, as some cross-sectional and longitudinal studies have disputed the stability of NART scores in dementia (Cockburn et al., 2000;McFarlane et al., 2006), task performance should be interpreted with caution. In particular, the NART may not be appropriate when assessing patients with moderately impaired cognition, as a number of studies indicated that NART performance may be affected when dementia has progressed beyond the mild stage. As the specific advantages and limitations of the reviewed word reading tasks remain to be established, we propose that other sources of information should ideally be taken into account when assessing premorbid functioning. One potentially promising approach may be to combine several tools in order to increase confidence in estimations of premorbid intelligence. For instance, demographic equations which are independent of current abilities and take little time to complete could likely complement word reading assessments. This method may offer a sensible provisional solution for clinical settings while further evidence for the validity of currently available measures is being acquired.

Methodological considerations
There are several methodological limitations which should be considered in relation to both the present review and the included papers. First, we were unable to carry out a meta-analysis to formally compare findings from the different studies due to the heterogeneity of tasks and populations assessed. In addition, most of the studies evaluated here applied a cross-sectional design, which can only offer limited insight into the presence and rate of decline on specific tasks associated with dementia. Moreover, the longitudinal studies included tended to focus on people who had already been diagnosed with dementia at the time of the first assessment. While such research can provide information regarding performance changes with the progression of dementia, it does not capture performance prior to disease onset. Additional longitudinal studies, particularly those following participants before onset of dementia, would be useful for improving our understanding of the validity of different instruments for measuring premorbid intelligence.
A strength of this review is the inclusion of both English and translated versions of premorbid intelligence assessments, which were investigated in eight different countries. However, nearly all of the included studies were conducted in countries with strong economies and educational systems. It was previously demonstrated that education is highly predictive of word reading, lexical decision, and vocabulary scores (Crawford et al., 1988b; Kosmidis et al., 2006; Starr et al., 1992; Walker et al., 2009), and it has been suggested that reading scores can be used as a proxy for quality of education (Manly et al., 2002). The validity of using verbal tasks to assess premorbid intelligence in dementia, however, has rarely been investigated in countries with fewer socioeconomic or educational resources. Only one study in the present review, which was based in Brazil, focused on participants with low levels of education (De Oliveira et al., 2014). Here, it was found that there were no significant differences between healthy adults and patients with Alzheimer's disease on a Vocabulary task. An untested possibility, however, is that these verbal tasks may underestimate cognitive abilities which are less strongly associated with educational background. Furthermore, it is unclear whether other, more frequently employed task types (e.g. irregular word reading) are useful for estimating premorbid intelligence in individuals with limited access to high-quality education. There is thus a clear need for more extensive testing of premorbid intelligence measures in low-resource countries.
In addition, illiteracy presents a particularly important issue for the use of word reading assessments, as this inherently prevents accurate task performance even if cognition is unimpaired. According to recent estimates, approximately 750 million adults worldwide lack basic reading and writing skills (UNESCO Institute for Statistics, 2017). Exclusively relying on reading tasks as a measure of premorbid intelligence could therefore negatively affect many people across the world. In the broader neuropsychometry literature, some measures focusing on nonverbal skills have specifically been developed to measure intelligence in adults with low literacy (e.g. Ryan et al., 2008). However, such instruments are scarce and have not yet been validated in patients with dementia. As an alternative, it has been suggested that informant-based questionnaires may be useful for estimating premorbid intelligence in individuals with low educational levels (Apolinario et al., 2013). A drawback of this approach is that the estimated abilities are dependent on the accuracy of the information provided by the informant. As well as re-evaluating existing instruments across countries, it would therefore be valuable to develop novel objective measures which are less dependent on education and literacy.
Assessments of study quality and risk of bias indicated that the majority of studies did not satisfy criteria for sample size justification, participant selection, and nonresponse. It is therefore possible that results were influenced by lack of power and/or selection bias. This particularly complicates the interpretation of findings for instruments which were only assessed in one study. Finally, in addition to assessing the validity of premorbid intelligence tasks in dementia, this review set out to collate information regarding the psychometric properties of available instruments. However, only one of the studies reviewed here, which investigated performance on the WRAT-III, reported test-retest reliability within the dementia patient group (Ashendorf et al., 2009). Reliability testing for other frequently employed tasks is therefore urgently needed.

Conclusion
Early detection and treatment of dementia are highly dependent on accurate information regarding premorbid functioning. The studies reviewed here demonstrate that there is a large number of tasks available for estimating premorbid intelligence, which are predominantly language-based. These verbal tasks appear to hold some promise for the assessment of people with mild dementia, but may be unsuitable for individuals presenting with more severe cognitive impairments. Conclusions are limited by the fact that few tools have been investigated across multiple studies and direct comparisons of different instruments are rare. In addition, while there is some evidence supporting the use of verbal assessments across different languages, more extensive testing is needed to determine whether such measures are suitable for use in countries with lower socioeconomic and educational resources. We propose that, in clinical practice, it may be sensible to combine tools based on different mechanisms (e.g. word reading and demographic equations) in order to improve estimates of intelligence. In addition, longitudinal studies contrasting different measures would be valuable to confirm the validity of premorbid intelligence measures, and could thereby contribute to enhancing diagnostic procedures for people living with dementia worldwide.

Conflict of interest
None.

Description of authors' roles
M. Overman designed the study, collected and analyzed the data, and wrote the paper. S. Leeworthy assisted with data collection and analysis. T. Welsh was responsible for the supervision of the study and revising drafts of the paper.