Combining Cognitive Markers to Identify Individuals at Increased Dementia Risk: Influence of Modifying Factors and Time to Diagnosis

Abstract Objective: We investigated the extent to which combining cognitive markers increases the predictive value for future dementia, when compared to individual markers. Furthermore, we examined whether predictivity of markers differed depending on a range of modifying factors and time to diagnosis. Method: Neuropsychological assessment was performed for 2357 participants (60+ years) without dementia from the population-based Swedish National Study on Aging and Care in Kungsholmen. In the main sample analyses, the outcome was dementia at 6 years. In the time-to-diagnosis analyses, a subsample of 407 participants underwent cognitive testing 12, 6, and 3 years before diagnosis, with dementia diagnosis at the 12-year follow-up. Results: Category fluency was the strongest individual predictor of dementia 6 years before diagnosis [area under the curve (AUC) = .903]. The final model included tests of verbal fluency, episodic memory, and perceptual speed (AUC = .913); these three domains were found to be the most predictive across a range of different subgroups. Twelve years before diagnosis, pattern comparison (perceptual speed) was the strongest individual predictor (AUC = .686). However, models 12 years before diagnosis did not show significantly increased predictivity above that of the covariates. Conclusions: This study shows that combining markers from different cognitive domains leads to increased accuracy in predicting future dementia 6 years later. Markers from the verbal fluency, episodic memory, and perceptual speed domains consistently showed high predictivity across subgroups stratified by age, sex, education, apolipoprotein E ϵ4 status, and dementia type. Predictivity increased closer to diagnosis and showed highest accuracy up to 6 years before a dementia diagnosis. (JINS, 2020, 00, 1–13)


INTRODUCTION
Cognitive deficits are a defining feature of preclinical dementia, observable years or even decades before a clinical diagnosis (Boraxbekk et al., 2015; Elias et al., 2000; Rajan, Wilson, Weuve, Barnes, & Evans, 2015). While research has often focused on episodic memory, a broad range of cognitive domains show deficits during the preclinical phase (Bäckman, Jones, Berger, Laukka, & Small, 2005).
The predictivity of cognitive measures may be influenced by various factors. One factor that may modify the patterns of cognitive deficits in the preclinical phase is age. Previous research has shown that during early old age, those who develop dementia tend to show a pattern of cognitive deficits more closely associated with AD (episodic memory deficits), whereas during later old age a broader pattern of deficits, spanning multiple cognitive domains, is seen (Bondi et al., 2003; Stricker et al., 2011). As the older old are more often affected by mixed pathologies (Esiri et al., 2001), these differences may be driven by differences in dementia subtype. If the patterns of cognitive deficits differ between young-old and old-old persons, tailoring the cognitive tests used within dementia prediction models to suit the age group being targeted may be beneficial.
Evidence supports a link between low education and increased dementia risk (Beydoun et al., 2014; Sharp & Gatz, 2011). Furthermore, educational attainment is associated with level of cognitive performance, but not with rate of age-related cognitive decline (Berggren, Nilsson, & Lövdén, 2018). A recent meta-analysis by Opdebeeck, Martyr, and Clare (2016) showed a small-to-moderate effect of education on cognitive performance in later life, with this effect varying little over cognitive domains. How, or if, the level of education would affect the predictive value of tests from different cognitive domains is therefore worth investigating.
Which cognitive tests work best in prediction models may also differ between the sexes. For example, women are more at risk of AD-type dementia (Podcasy & Epperson, 2016; Seshadri et al., 1997) and, as mentioned above, different dementias have different cognitive profiles (Andriuta et al., 2018; Graham et al., 2004). There is also evidence suggesting that risk profiles for dementia are different for men and women (Artero et al., 2008). Within cognitive profiles, it is well documented that women perform better on verbal tasks and men on visuospatial and motor tasks (Li & Singh, 2014), with the female advantage on verbal memory tasks seemingly retained during early stages of dementia (Ferretti et al., 2018; Sundermann et al., 2016). These differences between the sexes may result in different cognitive profiles in the preclinical dementia phase for males and females.
Carrying the ϵ4 allele of the apolipoprotein E (APOE) gene is a strong risk factor for AD (Corder et al., 1993; Raber, Huang, & Ashford, 2004; Roses & Allen, 1996). It has been linked to impairment in global cognition, episodic memory, and executive functioning (Wisdom, Callahan, & Hawkins, 2011) as well as steeper rates of decline in episodic memory and perceptual speed (Knopman, Mosley, Catellier, & Coker, 2009; Salmon et al., 2013). However, disentangling the effects of being an ϵ4 carrier from those of being in a prodromal phase of AD has been difficult. The influence of APOE on specific cognitive domains and on dementia risk indicates that the most predictive domains may differ depending on APOE status, with episodic memory most likely to be a stronger predictor for those with an ϵ4 allele compared to those without.
Longitudinal studies mapping rates of decline suggest that some cognitive domains start to decline earlier and some later in relation to the dementia diagnosis, and they may follow different trajectories (Cloutier, Chertkow, Kergoat, Gauthier, & Belleville, 2015). Thus, different cognitive domains may be more or less useful in predicting future dementia depending on time to diagnosis. The effect of time to diagnosis on dementia prediction models has not been comprehensively explored. Few studies have observed cognitive deficits more than 10 years before diagnosis, and results are mixed regarding which cognitive domain is most affected early in the preclinical phase (Amieva et al., 2008; Elias et al., 2000; Rajan et al., 2015). Amieva et al. (2008) observed significant differences in a test of category fluency between groups of preclinical dementia and cognitively normal participants 12 years before diagnosis. In terms of predictive ability, Elias et al. (2000) found that tests of episodic memory and abstract reasoning were significantly predictive of dementia at least 10 years before diagnosis. Similarly, Rajan et al. (2015) found significant predictive ability of tests of episodic memory and executive function between 10 and 18 years before diagnosis.
In this study, we have access to dementia diagnoses up to 12 years after baseline assessment. We aim to test which cognitive markers are significantly more predictive of future dementia over a model of demographic factors, providing a more stringent test of the models' clinical usefulness. Moreover, we will test the predictive ability of different markers and combinations of markers at different distances to diagnosis within the same individuals, enabling a proper test of the role of time to diagnosis in dementia prediction.

Participants
786 N.M. Payton et al.

Data were collected from participants involved in a longitudinal population-based study, the Swedish National Study on Aging and Care in Kungsholmen (SNAC-K).
Baseline assessment was conducted on 3363 individuals, belonging to specific age cohorts. Older age groups (≥78 years) were re-examined every 3 years and younger age groups (60-72 years) every 6 years. The assessment at each wave consisted of a nurse interview, a medical examination, and neuropsychological testing. Of the original sample, 2848 underwent cognitive testing. Due to exclusion and dropout, follow-up data were available for 2357 participants. Of those, 1733 remained dementia-free, 246 developed dementia (of whom 36 were diagnosed from death certificates and medical records), and 378 died during the 6-year follow-up (Figure 1). The main sample analyses focused on predicting dementia up to 6 years later, as results from the 12-year analysis provided very few significant predictors and could not be used for model building. The 6-year follow-up also contained a larger number of participants and therefore allowed for subsample analyses.

Time-to-diagnosis sample
A select sample of the SNAC-K participants had cognitive data available at baseline, 6-year, and 9-year follow-ups, with dementia diagnosis performed at 12 years. Only those with data at all time points were included. Everyone who developed dementia or died at or before 9 years was excluded, and only those who developed dementia or died between the 9- and 12-year follow-ups were included in the dementia and dead outcome categories, respectively. This subsample included 407 participants, of whom 284 remained dementia-free, 48 were diagnosed with dementia at 12 years (of whom 4 were diagnosed from death certificates and medical records), and 75 died during the same period (Figure 2).

Ethical considerations
All stages of SNAC-K have been approved by the Karolinska Institutet's ethical committee or the regional ethical review board, and written informed consent was collected from all participants. In cases where participants had severe cognitive impairment, a proxy was asked for consent.

Dementia diagnosis
Dementia diagnoses were made according to the Diagnostic and Statistical Manual of Mental Disorders, 4th edition (DSM-IV, 1994). The procedure consisted of three steps. A preliminary diagnosis was made by the examining physician, followed by a secondary diagnosis based on computerized data from the medical examination. In cases of disagreement, a final decision was made by a senior physician. A differential diagnosis of AD was made according to the NINCDS-ADRDA criteria (McKhann et al., 1984). The clinical cognitive assessment used for diagnosis included the Mini-Mental State Examination (MMSE; Folstein, Folstein, & McHugh, 1975), the Clock test (Manos & Wu, 1994), and items regarding memory, executive functioning, problem-solving, orientation, and interpretation of proverbs. The cognitive test battery investigated in this study was not used for diagnostic purposes. For those who died before receiving a dementia diagnosis in SNAC-K, death certificates and medical records were reviewed to identify additional dementia cases.

Cognitive Assessment
Episodic memory was assessed by presenting a word list of 16 unrelated nouns with a new word appearing every 5 s (Laukka et al., 2013). This was immediately followed by a 2-min free-recall task and number of words correctly remembered was recorded. Word recognition was assessed with an untimed list of 32 nouns, including the original words and an equal number of distractors, where recognition reflected number of hits minus number of false alarms. Two semantic memory tasks were administered. The general knowledge task consisted of 10 moderately difficult questions, and participants were asked to pick the correct answer from two alternatives (Dahl, Allwood, & Hagberg, 2009). The vocabulary task involved matching a target word to the correct synonym among five alternatives (Dureman, 1960; Nilsson et al., 1997). In both tasks, semantic memory was measured as number of correct answers.

[Figure: participant flowchart. Sample at baseline (n = 3363). Exclusions at 6 years: dropouts before 6-year follow-up (n = 336) and uncertain dementia diagnosis at 6 years (n = 1). Young-old sample (< 78 years) (n = 1434): no dementia (n = 1278), dementia (n = 38), and dead (n = 118).]

Predicting dementia with cognitive markers 787
Verbal fluency was assessed with letter and category fluency. These tasks involved generating as many words as possible within 60 s, either starting with the letters "F" and "A" (letter fluency) or belonging to the categories "animals" and "professions" (category fluency). The fluency measures were derived by averaging the total number of words produced within each task.
Three tasks were used to assess perceptual speed. Digit cancellation (Zazzo, 1974) comprised 11 rows of random digits and participants were required to mark the target number (4) whenever they encountered it within 30 s. Pattern comparison (Salthouse & Babcock, 1991) consisted of pairs of basic line constructs; 30 s were given to mark the pairs as "same" or "different". The average number of correct answers was calculated from two trials. Trail Making Test (TMT) part A (Lezak, 2004) involved connecting 13 encircled digits in numeric order as fast and accurately as possible. Time to complete the task constituted the TMT-A score, although time was only taken for those who completed the task correctly or with a maximum of one careless connection.
Executive function was measured using TMT-B (Lezak, 2004). In this task, circles with numbers and letters were connected based on numeric and alphabetical order, alternating between the two categories (1-A, 2-B, etc.). Similar to TMT-A, time was only taken for those who completed the task correctly or with a maximum of one careless connection.
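The count-based scores described in this section reduce to simple arithmetic, sketched below. The function names and the example counts are hypothetical, and reading the fluency measure as the mean over the two cues within each task is an assumption about how the averaging was done.

```python
def recognition_score(hits, false_alarms):
    """Word recognition: number of hits minus number of false alarms."""
    return hits - false_alarms

def mean_score(trial_counts):
    """Average count across trials, e.g. two fluency cues or two
    pattern-comparison trials (assumed interpretation of the averaging)."""
    return sum(trial_counts) / len(trial_counts)

# Hypothetical participant.
recognition = recognition_score(hits=14, false_alarms=2)
category_fluency = mean_score([18, 13])    # "animals", "professions"
letter_fluency = mean_score([11, 9])       # "F", "A"
pattern_comparison = mean_score([16, 17])  # two 30-s trials

print(recognition, category_fluency, letter_fluency, pattern_comparison)
```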

Statistical Analyses
All statistical analyses were conducted in IBM SPSS 23. Baseline differences between incident dementia and no-dementia groups were determined using χ² tests for dichotomous variables and ANOVAs for continuous variables.
Multinomial logistic regressions were employed to investigate how well various markers, or combinations of markers, predicted future dementia, with three outcomes possible: no dementia (reference group), incident dementia, and death. The third outcome was included to take into account mortality as a competing event. However, as the outcome of interest was dementia, only data from the reference and incident dementia groups are reported in this study. Age, sex, and education were included as covariates in all models, except during subgroup analyses where the focus was sex, age, or education; all variables were entered simultaneously. To determine which marker or combination of markers best predicted future dementia, the receiver operating characteristics (ROCs) were calculated using the estimated probabilities from the multinomial logistic regressions. The area under the curve (AUC) values thus represent the predictivity of each model, consisting of the covariates and one or several cognitive predictors. The predictive value of individual variables was determined first. Significant individual measures with the highest AUC value within their cognitive domain were entered into subsequent models. This reduced the number of variables and addressed issues of collinearity. The threshold for statistical significance was set to p < .05.
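To make the AUC step concrete, the sketch below computes the area under the ROC curve from estimated dementia probabilities, using the rank-based (Mann-Whitney) form of the AUC. The analyses themselves were run in SPSS; the records here are invented, and restricting the ROC to the no-dementia and dementia groups (setting aside deaths) is an assumption about how the reported AUCs were obtained.

```python
def auc(case_scores, control_scores):
    """AUC as the probability that a randomly chosen case has a higher
    estimated probability than a randomly chosen control (ties count half)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical (outcome, estimated probability of dementia) pairs.
records = [
    ("none", 0.02), ("none", 0.10), ("dementia", 0.35), ("dead", 0.20),
    ("none", 0.30), ("dementia", 0.08), ("dementia", 0.60),
]
cases = [p for outcome, p in records if outcome == "dementia"]
controls = [p for outcome, p in records if outcome == "none"]

model_auc = auc(cases, controls)
print(round(model_auc, 3))
```

An AUC of .5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.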
Models were created by starting with the best cognitive predictor (based on AUC value) and adding a second variable, systematically testing all available combinations. The two-variable model with the highest AUC was then used as the base for testing a possible three-variable model using the same method. When no predictor added further unique variance, this was considered the final model. The statistical significance of AUC changes between models was assessed using DeLong's tests (DeLong, DeLong, & Clarke-Pearson, 1988). DeLong's tests allow increases in predictivity to be evaluated statistically, where a nonsignificant result would indicate that the addition of further tests does not significantly improve predictivity. The Bayesian information criterion (BIC) was used as a measure of model fit; an increase in BIC denotes a worsening of model fit and potential overfitting of the data, leading to artificial increases in predictivity.
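For readers unfamiliar with DeLong's test, the following is a minimal pure-Python sketch of the comparison of two correlated AUCs estimated on the same participants. It is not the software used in the study; the function names and the example probabilities are hypothetical, and both models are assumed to list the same cases and the same controls in the same order.

```python
import math

def _psi(x, y):
    """Placement kernel: 1 if the case outranks the control, 0.5 for a tie."""
    return 1.0 if x > y else 0.5 if x == y else 0.0

def _components(cases, controls):
    """Structural components: per-case (V10) and per-control (V01) placements."""
    v10 = [sum(_psi(x, y) for y in controls) / len(controls) for x in cases]
    v01 = [sum(_psi(x, y) for x in cases) / len(cases) for y in controls]
    return v10, v01

def _cov(a, b):
    """Sample covariance of two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)

def delong_test(cases_a, controls_a, cases_b, controls_b):
    """Return (auc_a, auc_b, z, two-sided p) for two models, A and B."""
    m, n = len(cases_a), len(controls_a)
    v10_a, v01_a = _components(cases_a, controls_a)
    v10_b, v01_b = _components(cases_b, controls_b)
    auc_a, auc_b = sum(v10_a) / m, sum(v10_b) / m
    # Variance of the AUC difference, accounting for the correlation
    # induced by evaluating both models on the same individuals.
    var = ((_cov(v10_a, v10_a) + _cov(v10_b, v10_b) - 2 * _cov(v10_a, v10_b)) / m
           + (_cov(v01_a, v01_a) + _cov(v01_b, v01_b) - 2 * _cov(v01_a, v01_b)) / n)
    if var <= 0:
        raise ValueError("degenerate comparison: variance of the difference is zero")
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value
    return auc_a, auc_b, z, p

# Hypothetical estimated probabilities from two nested models.
auc_a, auc_b, z, p = delong_test(
    cases_a=[0.9, 0.8, 0.4, 0.7], controls_a=[0.1, 0.5, 0.3, 0.2, 0.6],
    cases_b=[0.6, 0.7, 0.3, 0.4], controls_b=[0.2, 0.5, 0.4, 0.1, 0.6],
)
print(round(auc_a, 3), round(auc_b, 3))
```

A nonsignificant p value here would mean that the richer model's gain in AUC cannot be distinguished from sampling variation.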
All continuous variables were standardized and all scores where a higher value was related to a decreased risk were reversed so that odds ratios (ORs) represent increased risk per SD unit change in the predictor.
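The scaling convention can be illustrated as follows. This is a sketch only: the use of the sample standard deviation, the function names, and the example scores are assumptions, not details reported for the SPSS analyses.

```python
import math
import statistics

def standardize(values, reverse=False):
    """Convert raw scores to z scores; with reverse=True the sign is flipped
    so that higher values correspond to worse performance."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (assumed)
    sign = -1.0 if reverse else 1.0
    return [sign * (v - mean) / sd for v in values]

def odds_ratio(beta):
    """Odds ratio per 1-SD change for a logistic regression coefficient."""
    return math.exp(beta)

# Hypothetical word-recall scores: more words recalled means lower risk,
# so the scale is reversed before modeling.
word_recall = [4, 6, 8, 10, 12]
z_scores = standardize(word_recall, reverse=True)
print([round(z, 2) for z in z_scores])
```

With this convention, an OR above 1 for any predictor reads as increased dementia risk per SD of poorer performance.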

Subsample analysis
In the subsample analyses, prediction for dementia was again set to up to 6 years before diagnosis, with further categorization made for age, sex, educational level, APOE ϵ4 status, and AD-type dementia. The "old-old" group was ≥78 years and the "young-old" was <78 years old at baseline. High education was defined as those who had attended high school ("gymnasium") or above, whereas low education included those with maximum 9 years of education. APOE ϵ4 status was a binary subgrouping of carrying at least one ϵ4 allele or no ϵ4 allele. The number of subjects across subgroups is provided in Table 1.
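Two of the subgroup definitions can be written as simple rules, shown below as a sketch. The function names and the allele labels ("e2", "e3", "e4") are hypothetical; only the age cut-off (78 years at baseline) and the carrier definition (at least one ϵ4 allele) come from the text.

```python
def age_group(age_at_baseline):
    """Old-old: 78 years or older at baseline; young-old: younger than 78."""
    return "old-old" if age_at_baseline >= 78 else "young-old"

def is_apoe_e4_carrier(alleles):
    """Carrier status: at least one e4 allele among the two APOE alleles."""
    return "e4" in alleles

print(age_group(78), age_group(72))        # old-old young-old
print(is_apoe_e4_carrier(("e3", "e4")))    # True
print(is_apoe_e4_carrier(("e3", "e3")))    # False
```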

RESULTS
Descriptive statistics across the main sample groups are shown in Table 1. For descriptive statistics of the time-to-diagnosis sample, see Table 2. Persons who developed dementia were older, had fewer years of education, and had lower MMSE scores at baseline compared to the no-dementia group in all samples.

Main sample
Results from multinomial logistic regressions and ROC analyses showed that the category fluency (verbal fluency) task was the strongest individual predictor of future dementia up to 6 years later (Table 3). This was followed by word recall (episodic memory), pattern comparison (perceptual speed), TMT-B (executive function), and vocabulary (semantic memory), respectively, as the best predictors in their specific domains. These tests were then entered into combined models. Adding further variables from other cognitive domains generally increased predictivity (Table 4). The model starting with the strongest individual predictor (category fluency, AUC = .903) was most improved by adding word recall or pattern comparison (AUC = .907), although the increase in predictivity compared to the one-variable model was not significant (DeLong's test, p = .57). The final model included three predictors, that is, category fluency, word recall, and pattern comparison, and represented a significant increase in predictivity from the one- and two-variable models (AUC = .913, p = .002). The final model achieved a sensitivity of 48.6%, specificity of 98.4%, and an accuracy value of 92.9% in predicting future dementia. All models performed significantly better than a model including only the covariates (p < .001).
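The three classification measures reported for the final model follow from a 2 × 2 table of predicted against observed outcomes. The sketch below uses invented counts, not the study's classification table, and the classification threshold applied in the analyses is not stated here; it shows how accuracy can stay high while sensitivity is modest when non-cases greatly outnumber cases.

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # proportion of cases correctly identified
    specificity = tn / (tn + fp)   # proportion of non-cases correctly identified
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Hypothetical counts: 100 future dementia cases and 900 non-cases.
sens, spec, acc = classification_metrics(tp=50, fn=50, tn=880, fp=20)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```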
Repeating the analyses in different subgroups, similar patterns emerged. The strongest predictor among the old-old group was category fluency, with a final model of category fluency, word recall, and TMT-B. For the young-old group, category fluency and digit cancellation were equally predictive, and the final models included both of these variables and word recall (supplementary Table S1). The same strongest predictor (category fluency) and pattern of domains (verbal fluency, episodic memory, and perceptual speed) were apparent in a female-only sample, while for   Table S2). Category fluency was the strongest individual predictor for both high- and low-educated subgroups, with tests of verbal fluency, episodic memory, and perceptual speed present in the final models (supplementary Table S3). For those carrying at least one ϵ4 allele, word recall was the strongest individual predictor, although the most predictive model once again included tests of episodic memory, verbal fluency, and perceptual speed. For APOE ϵ4 noncarriers, episodic memory was less important as an individual predictor, but the final model revealed the same pattern of cognitive tests as all previous analyses (supplementary Table S4). For those who would develop AD-type dementia, category fluency and word recall performed equally well. A final model included tests of category fluency, episodic memory, and perceptual speed (supplementary Table S5). Sensitivity analyses were performed removing individuals diagnosed with depression, using the International Statistical Classification of Diseases and Related Health Problems - Tenth Revision (ICD-10; n = 95), or using anticholinergic drugs (n = 114) at baseline. This did not change the patterns of results (data not shown).

Time-to-diagnosis sample
In the sample with 12 years of follow-up, word recall, vocabulary, general knowledge, category fluency, pattern comparison, and TMT-B were all significant predictors of future dementia 12 years later, with pattern comparison being the most predictive test (AUC = .686). As in the main sample analyses, category fluency was the most predictive individual test (AUC = .733) 6 years before diagnosis. Three years before diagnosis, category fluency was again the strongest predictor (AUC = .781) (supplementary Table S6).
Twelve years before diagnosis, no additional predictors could be added to the model including pattern comparison (AUC = .686). Models 6 years before diagnosis built upon category fluency and TMT-B (AUC = .748) to arrive at a final two-variable model, while 3 years before diagnosis yielded category fluency and TMT-A as a two-variable model (AUC = .794), with the addition of word recall for a final three-variable model (AUC = .814; see Table 5). Twelve years before diagnosis, none of the individual variables were significantly more predictive than a model containing covariates only. However, 6 years before diagnosis, category fluency alone (p = .01) and the two-variable model with category fluency and TMT-B (p < .05) performed better than the covariate model. Three years before diagnosis, all models were significantly more predictive than a model of covariates (p < .001). Comparing the final models, there was no significant difference in predictivity from 12 to 6 years before diagnosis. However, from 12 to 3 years (p = .001), and from 6 to 3 years (p = .021), there was a significant increase in predictivity for the final models.
Further analyses removing those with depression (ICD-10 diagnosis, n = 13) and those using anticholinergic drugs (n = 23) at baseline did not change the patterns of results.

DISCUSSION
The present study demonstrates that the addition of tests from multiple cognitive domains increases predictivity of future dementia within 6 years. The cognitive domains that were found most useful in predicting dementia were verbal fluency, episodic memory, and perceptual speed. Models containing any combination of these domains consistently performed well in predicting future dementia both in general and over a range of modifying factors, such as age, educational level, sex, ϵ4 status, and dementia type. Furthermore, predictivity of the cognitive markers increased closer to diagnosis.

Individual Predictors
While episodic memory is often purported as being an especially good predictor of dementia, this study adds to the literature supporting relative homogeneity among cognitive predictors (Chen et al., 2000). None of the tests included in the modeling differed significantly from one another in this regard. This homogeneity in predictivity reflects deficits over a wide range of cognitive domains, which can be observed in preclinical dementia and AD (Bäckman et al., 2005; Economou, Papageorgiou, Karageorgiou, & Vassilopoulos, 2007). This may stem from wide-ranging brain changes in both the hippocampus and beyond, during the preclinical phase (Twamley et al., 2006). That said, these changes may affect different cognitive domains differently, with episodic memory being more linked to the hippocampus (Burgess, Maguire, & O'Keefe, 2002) and therefore affected to a greater degree by AD pathology, whereas perceptual speed has been more linked to white matter damage and vascular pathology (Penke et al., 2010; Prins & Scheltens, 2015). A possible reason why tests from a range of domains were predictive of future dementia is that the majority of the persons with dementia were likely to be mixed cases (Esiri et al., 2001). Addressing this issue by redoing the analyses in different subsamples lent some support to this hypothesis. Analyzing the subsamples where the etiology suggested AD or where the persons carried at least one ϵ4 allele indicated greater importance of word recall as a predictor of dementia. This is consistent with episodic memory being an early marker of AD (Bäckman, Small, & Fratiglioni, 2001; Elias et al., 2000) and with APOE being a risk factor especially for AD-type dementia (Roses & Allen, 1996). Carrying an ϵ4 allele has been associated with poorer episodic memory performance and faster episodic memory decline (Knopman et al., 2009; Salmon et al., 2013).
The presence of the same three cognitive tests across the final models suggests robustness of the domains of verbal fluency, episodic memory, and perceptual speed, in predicting future dementia.
Despite the relative homogeneity among the cognitive predictors, category fluency was consistently a good predictor. This may be due to the fact that this task has a broad neural base, covering frontal, parietal, and temporal regions (Baldo, Schwartz, Wilkins, & Dronkers, 2006; Gourovitch et al., 2000), and is thus likely to be affected by several aspects of dementia pathology, making it a good predictor of both AD and VaD. As with previous studies (Clark et al., 2009), this study found better predictivity of category fluency over letter fluency. It should be noted that category fluency was identified as the strongest predictor both in the main analyses and in the time-to-diagnosis analyses (6 years before diagnosis). Due to the difference in follow-up length, the time-to-diagnosis sample represents a different but overlapping group of individuals, which may be considered a replication in a semi-independent sample as different individuals are included in the incident dementia group.
Although not statistically significant, the patterns of cognitive impairment across domains suggest a prediction gap between the more predictive domains of verbal fluency, episodic memory, and perceptual speed, and the less predictive domains of semantic memory and executive function. Similar patterns have been observed previously (Belleville et al., 2017). Semantic memory has been shown to be relatively well preserved in the early dementia stages, possibly as impairment runs along a fluid/crystallized spectrum, with fluid abilities being impaired earlier in disease progression and crystallized abilities later (McDonough et al., 2016; Thorvaldsson et al., 2011). As semantic memory showed poor predictivity of future dementia, the results from this study support this idea.
Executive functioning also performed relatively poorly as a predictor in this study, despite belonging to the fluid cognitive domain. The benefit of executive function as a predictive domain has been suggested to be lower than other cognitive domains, such as episodic memory and verbal fluency (Belleville et al., 2017). However, executive function spans a wide range of cognitive abilities and some aspects, for example, updating of representations in working memory, not addressed by the TMT-B test, may be more predictive than others. Due to missing data, TMT-B included fewer participants and, as the AUC is sensitive to sample size, the results are not directly comparable. A subanalysis, including only those who completed all tests, showed that predictivity for executive function was comparable to the other more predictive domains once sample size-related AUC issues were removed (supplementary Table S7). This, alongside its presence in the 6-year final model of the 12-year analysis, suggests that executive function may still be a useful predictor of future dementia.
The similarities in predictivity among the cognitive domains could be extended to subsamples of different age, sex, and educational level. As mixed pathology is most common in the oldest old (Esiri et al., 2001), cognitive domains with a broader neural basis, such as category fluency, might be expected to be affected to a greater degree in the old-old, whereas in the young-old the presence of mixed pathology would be less likely to have an influence. Moreover, female gender and higher educational level have been associated with better cognitive performance in non-demented aging.
In regard to education, cognitive performance may reflect educational level as well as disease-specific pathology (Kawano et al., 2010). However, our results suggest that patterns of predictors did not change due to high or low educational level, and that the same tests were useful as predictors of future dementia across different levels of education.
It has been shown that women are at greater risk of AD-type dementia (Podcasy & Epperson, 2016; Seshadri et al., 1997). However, in the current study, the pattern of cognitive predictors did not vary greatly by type of dementia. This difference may be larger in a sample consisting of more pure dementia types. There was some influence of sex on which cognitive test was found to be the most predictive. This may reflect long-standing differences in cognitive ability between men and women on tasks, such as verbal or spatial abilities (Li & Singh, 2014). However, the final models for both sexes included tasks of verbal fluency, episodic memory, and perceptual speed, suggesting that these domains are good predictors of future dementia regardless of sex.
Due to the sensitivity of the AUC measure to sample size, the results from the subsamples cannot be directly compared. However, the overall pattern suggests that the same cognitive markers that showed the highest predictivity in the full sample worked equally well regardless of these sample characteristics, with domains of verbal fluency, episodic memory, and perceptual speed showing consistently good predictivity.

Combined models
Previous research has shown that models of combined neuropsychological tests can improve predictivity (Belleville, Fouquet et al., 2014; Palmer et al., 2003) and may perform as well as multimodal models that also include biological markers (Gomar et al., 2011; Payton et al., 2018). However, the frequent lack of statistical testing when comparing models in previous studies makes it hard to determine the added value of including additional cognitive markers.
In the present study, we were able to show that combining cognitive tests across multiple domains significantly increased predictivity of future dementia, above models of covariates and individual predictors. One reason why combining across domains may be useful is that cognitive tasks are differentially sensitive to different types of dementia pathology. In line with the results of this study, a task of episodic memory specifically combined with tasks of either verbal fluency (Artero, Tierney, Touchon, & Ritchie, 2003; Small, Herlitz, Fratiglioni, Almkvist, & Bäckman, 1997) or perceptual speed (Chapman et al., 2011; Jungwirth et al., 2009) has been shown to increase predictivity to the largest degree. As noted, executive function was not a high-performing predictor in this study; however, there is a strong research basis to suggest that it is a domain which also performs well together with tasks of episodic memory (Albert et al., 2001; Chen et al., 2000).
While the current data allow for some conclusions regarding which domains and tests may be the best predictors in the early detection of dementia, note that not all cognitive tests within a certain domain are equally useful in predicting dementia. It cannot be discounted that variation among tests can be partially responsible for variations among results between studies. For example, within the episodic memory domain, tasks of word recall are typically better predictors than word recognition (Belleville et al., 2017). Tests which are categorized as the same cognitive domain may also be assessing different aspects of that domain, for example, verbal fluency. Category fluency and letter fluency have been shown to have a different neural basis drawing on semantic and phonologic aspects, respectively (Gourovitch et al., 2000). The exact tests chosen will therefore have an effect on predictivity of the models.
Despite good prediction values, it should be noted that sensitivity of cognitive tests for predicting future dementia was lower than ideal in this study. In line with previous studies, specificity and accuracy of cognitive tests tend to be higher than sensitivity (Belleville et al., 2017). It has been shown that the addition of biological markers to cognitive tests can increase sensitivity, as is the case with cerebrospinal fluid (CSF) (Mazzeo et al., 2016) or MRI (Peters, Villeneuve, & Belleville, 2014) markers. Nevertheless, cognitive testing constitutes a quick and inexpensive way of first identifying at-risk individuals on a large scale before further, more expensive or invasive testing, such as CSF or MRI markers, is used.
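As a minimal illustration of how sensitivity can be modest while specificity and accuracy remain high when most participants stay dementia-free, the following Python sketch computes the three measures from a confusion matrix. The counts are hypothetical and chosen only for illustration; they are not results from this study, and the function name `screening_metrics` is an assumption.

```python
def screening_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp: future dementia cases correctly flagged by the test
    fn: future dementia cases missed by the test
    tn: dementia-free participants correctly not flagged
    fp: dementia-free participants incorrectly flagged
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical cohort of 1000 people, 100 of whom develop dementia.
# These numbers are illustrative assumptions, not data from the study.
sens, spec, acc = screening_metrics(tp=60, fn=40, tn=855, fp=45)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

Because dementia-free participants dominate the hypothetical cohort, accuracy tracks specificity closely even though four in ten future cases are missed.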

Time-to-diagnosis
In the time-to-diagnosis sample, most cognitive tests and combined models were significantly more predictive closer to a diagnosis of dementia. This is consistent with earlier findings which show an increase in predictive value or an increase in magnitude of the effect as the dementia diagnosis approaches (Boraxbekk et al., 2015;Elias et al., 2000;Gomar, Conejero-Goldberg, Davies, & Goldberg, 2014;Rajan et al., 2015). Conceivably, cognitive deficits become more pronounced closer to diagnosis due to a continuing worsening of the underlying disease pathology as the disease progresses. While the results are in keeping with previous studies, these studies have typically used different groups of dementia cases in their time to dementia classification. In contrast, this study followed the same individuals over an extended period with a narrow time window for testing, allowing for a direct comparison within individuals at specific times before diagnosis.
Twelve years before diagnosis, perceptual speed was the most predictive individual test of dementia. While few studies have been conducted covering such an extended time period, there are studies showing the possibility to predict dementia far in advance (Boraxbekk et al., 2015;Elias et al., 2000). Tests of verbal fluency, specifically category fluency (Amieva et al., 2008), episodic memory (Elias et al., 2000;Rajan et al., 2015), and executive function (Rajan et al., 2015) have all been found to predict dementia over a decade before diagnosis. In line with these findings, tests representing these domains were found to predict future dementia 12 years before diagnosis (supplementary Table S6), although the strongest predictor observed in this study was pattern comparison (representing perceptual speed). However, when more stringent testing was applied using DeLong's test, which most previous studies have not applied, none of the individual tests performed significantly better than a model of covariates in predicting future dementia, suggesting that 12 years before diagnosis may be too far to accurately predict dementia in the population. That significantly predictive models can be made 6 years, but not 12 years, before diagnosis suggests that neuropsychological tests can be used to accurately predict dementia during the final 6 years before diagnosis, but that predictions become more uncertain further away from diagnosis.
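For readers unfamiliar with this kind of model comparison, the sketch below shows the general form of a DeLong test for two correlated AUCs estimated on the same individuals. It is a plain-Python illustration of the published method under stated assumptions, not the analysis code used in this study; the function names and the toy data are hypothetical.

```python
import math


def _psi(pos_score, neg_score):
    # Kernel of the Mann-Whitney statistic: 1 if the case outranks the
    # non-case, 0.5 for a tie, 0 otherwise.
    if pos_score > neg_score:
        return 1.0
    if pos_score == neg_score:
        return 0.5
    return 0.0


def _components(pos, neg):
    # Structural components: one value per case (v10) and per non-case (v01).
    v10 = [sum(_psi(p, q) for q in neg) / len(neg) for p in pos]
    v01 = [sum(_psi(p, q) for p in pos) / len(pos) for q in neg]
    return v10, v01


def _cov(a, b):
    # Sample covariance (denominator n - 1).
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def delong_test(labels, scores_a, scores_b):
    """Compare two AUCs from the same sample; labels are 1 (case) or 0.

    Returns (auc_a, auc_b, z, two_sided_p). Needs at least two cases and
    two non-cases. z and p are NaN if the estimated variance is zero.
    """
    pos_a = [s for s, y in zip(scores_a, labels) if y == 1]
    neg_a = [s for s, y in zip(scores_a, labels) if y == 0]
    pos_b = [s for s, y in zip(scores_b, labels) if y == 1]
    neg_b = [s for s, y in zip(scores_b, labels) if y == 0]

    v10_a, v01_a = _components(pos_a, neg_a)
    v10_b, v01_b = _components(pos_b, neg_b)
    auc_a = sum(v10_a) / len(v10_a)
    auc_b = sum(v10_b) / len(v10_b)

    m, n = len(pos_a), len(neg_a)
    # Variance of the AUC difference, accounting for the correlation
    # induced by scoring the same individuals with both models.
    var = (
        (_cov(v10_a, v10_a) + _cov(v10_b, v10_b) - 2 * _cov(v10_a, v10_b)) / m
        + (_cov(v01_a, v01_a) + _cov(v01_b, v01_b) - 2 * _cov(v01_a, v01_b)) / n
    )
    if var <= 0:
        return auc_a, auc_b, float("nan"), float("nan")
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return auc_a, auc_b, z, p


# Toy data (hypothetical): 3 cases, 4 non-cases, two competing risk scores.
labels = [1, 1, 1, 0, 0, 0, 0]
model_a = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3]
model_b = [0.9, 0.4, 0.7, 0.6, 0.5, 0.8, 0.3]
print(delong_test(labels, model_a, model_b))
```

In practice an established implementation would normally be preferred over a hand-written one; the sketch is only meant to make explicit that the test compares AUCs while accounting for both models being evaluated on the same participants.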
Although it was clear that the cognitive markers were better predictors of dementia closer to diagnosis, the relative importance among the predictors did not appear to substantially change over time. It has been shown that the rate of cognitive decline in the preclinical phase differs across cognitive domains (Cloutier et al., 2015;Grober et al., 2008;Thorvaldsson et al., 2011), suggesting that different domains may be more or less useful for predicting future dementia depending on time until diagnosis. However, using the scores from single points in time (as opposed to investigating rate of decline), this study rather points to the same domains being the most predictive of future dementia. While the best individual predictors varied depending on time to diagnosis, one or more tests of verbal fluency, episodic memory, and perceptual speed were once again present in each final model, suggesting a robustness of these domains for predicting future dementia regardless of time until diagnosis.

Strengths and limitations
A major strength of the current study is the population-based sample, making the results generalizable outside a clinical setting. Also, the use of tests to diagnose dementia which were not included as predictors reduces circularity in diagnosis. Alongside this, there was a strict test of time to diagnosis within the same individuals. A potential limitation is that the results from the main and subsamples were not directly comparable, and the time-to-diagnosis sample was smaller and more selective compared to the main sample.
The focus on preclinical cognitive deficits regardless of mild cognitive impairment (MCI) status renders the study more inclusive. However, it also means some of those classified as dementia-free at follow-up may have had MCI, resulting in lower prediction accuracy than may have been achieved if comparing cognitively normal and dementia groups.

Implications
Our results show that combining cognitive tests across multiple domains significantly increases the ability to predict future dementia, and that the same cognitive tests and combinations of tests are predictive of future dementia across a number of subgroups defined by age, educational level, sex, APOE status, or dementia type. Together with the finding that the pattern of the most predictive tests remained stable both further from and closer to a diagnosis of dementia, this suggests that tests of verbal fluency, episodic memory, and perceptual speed can be widely used as screening tools to detect individuals with increased dementia risk in the general population.