Phonological fluency norms for Spanish middle-aged and older adults provided by the SCAND initiative (P, M, & R)

Abstract Objective: Verbal fluency tests are quick and easy to administer neuropsychological measures and are regularly used in neuropsychological assessment. Additionally, phonological fluency is a widely used paradigm that is sensitive to cognitive impairment. This paper offers normative data of phonological verbal fluency (letters P, M, R) for Spanish middle- and older-aged adults, considering sociodemographic factors, and different measures such as the total number of words, errors (perseveration and intrusions), and 15 sec-segmented scores. Method: A total of 1165 cognitively unimpaired participants aged between 50 and 89 years participated in the study. Data for P were obtained for all participants. Letters M and R were also administered to a subsample of participants (852) aged 60 to 89 years. In addition, errors and words produced every 15 seconds were collected in the subsample. To verify the effect of sociodemographic variables, linear regression was used. Adjustments were calculated for variables that explained at least 5% of the variance (R² ≥ .05). Results: Means and standard deviations by age, scaled scores, and percentiles for all tests across different measures are shown. No determination coefficients equal to or greater than .05 were found for sex or age. The need to establish adjustments for the educational level was only found in some of the measures. Conclusions: The current norms provide clinically useful data to evaluate Spanish-speaking natives from Spain aged from 50 to 89 years. Specific patterns of cognitive impairment can be analyzed using these normative data and may be important in neuropsychological assessment.


Introduction
Verbal fluency (VF) involves expressive language abilities, storage of language knowledge, and executive functions. VF tests are typically brief assessment instruments that permit the evaluation of these cognitive processes with simple administration and scoring procedures (Lezak et al., 2012; Pekkala, 2012; Strauss et al., 2006). Due to their high sensitivity to neurological damage, they are widely used in clinical evaluation and research in various areas such as neuropsychology, speech therapy, linguistics, and medicine (see, for example, Catani et al., 2013; Faroqi-Shah & Milman, 2018; Herbert et al., 2014).
Both in clinical practice and in research, local normative data are essential for the correct interpretation of the data obtained by a person in one or several neuropsychological test/s, no matter whether the objective is diagnosis or elaboration of a judgment about their cognitive state. In the clinical setting, a patient is compared with a group of people who have the same sociodemographic characteristics that he/she has. If the main purpose is this and it is necessary to choose some reference standards, the first ones that should be rejected are those that have been developed with the smallest sample size (Mitrushina et al., 2005). Those that were developed perhaps more than a decade ago, or that may be biased by cultural differences, or by other factors (low educational level, etc.) should be rejected too.
Normative studies provide what has been called "clinical comparison data" (Mitrushina et al., 2005), which represents the range of performance on a test of different groups characterized by medical, psychiatric, and/or neurological criteria, who present homogeneous demographic traits.
Previous initiatives have provided norms for phonological fluency (PF) tests in different populations of Spanish-speaking natives from Spain: Peña-Casanova et al. (2009), for community-dwelling cognitively normal adults (n = 346) ranging from 50 to 94 years of age, used the letters P, M, and R; Casals-Coll et al. (2013), for a younger population of adults between 18 and 49 years old (n = 179), used the letters P, M, and R; and Lubrini et al. (2022), who assessed participants from 17 to 100 years old (n = 257), used the letters F, A, and S. Given what has been said above about the choice of reference normative data, the studies carried out with a small sample (Casals-Coll et al. and Lubrini et al.) would be the first to discard. Considering the date of publication, the work by Peña-Casanova et al. is 14 years old, so it could be considered outdated. Furthermore, it is important to note that the Peña-Casanova et al. study used an overlapping strategy to distribute its sample (n = 346) across 10 groups aged from 55 to more than 80 years, thus artificially increasing the number of cases in each age range. Updated norms with larger samples are therefore necessary for both young and older Spanish adult populations.
According to the statistical data of the National Institute of Statistics (INE) of Spain, elderly people now represent approximately 19.5% of the total population. The mean age of the population stands at 43.81 years, whereas in 1970 it was 32.7. According to the INE projection, in 2035 there could be more than 12.8 million older people, 26.5% of the total population (Pérez et al., 2022). The need for valid and accurate normative data is especially important for older people, as this group is at special risk of cognitive impairment or dementia. As a result, it is necessary to develop global programs and increase resources focused on promoting prevention and early diagnosis. In this context, normative studies conducted by the SCAND initiative (www.scandcognition.org), such as the one presented here for PF with a sample of middle-aged and older adults, make sense.
Due to their importance, other precedents in the international sphere should be mentioned. One example is the Neuropsychological Norms for the US-Mexico Border Region in Spanish (NP-NUMBRS) project (Rivera Mindt et al., 2021) and, specifically, for its relation to the present study, the normative data published on VF. Also noteworthy is the Mayo Clinic initiative, which started in the 1990s (Ivnik et al., 1996) and has provided norms for the most important neuropsychological tests in different populations (see, for example, Lucas et al., 2005).
Numerous studies have pointed out the influence of sociodemographic variables as possible moderators of VF, with age, gender, and education being the most studied (Henry & Phillips, 2006;López-Higes et al., 2022;Mathuranath et al., 2003). Some authors have reported that men perform worse than women in PF tasks (Loonstra et al., 2001), whereas many studies have not shown such differences (Costa et al., 2014;Khalil, 2010;Kozora & Cullum, 1995;Peña-Casanova et al., 2009;Tombaugh et al., 1999). Some studies have found the different effects of age when comparing semantic fluency (SF) and PF (Santos Nogueira et al., 2016;Tombaugh et al., 1999), whereas others did not (Khalil, 2010;Loonstra et al., 2001). However, the impact of the educational level on the performance of VF tasks has been widely recognized in neuropsychological research (Lubrini et al., 2022;Peña-Casanova et al., 2009). It has been found that higher education levels are associated with the production of more words (Oberg & Ramírez, 2006). The significant effect of educational level on PF could be related to the fact that these tasks are more demanding than semantic tasks and more sensitive to executive dysfunction (Shores et al., 2006). Formal education may increase vocabulary and consequently greater verbal lexical retrieval capacity (Henry & Phillips, 2006). In fact, education is the major factor that contributes to the performance in PF.
The performance on a PF test is usually evaluated by the total number of correct words given within the time limit (Lezak et al., 2012;Pekkala, 2012;Strauss et al., 2006). However, this score provides little information about the cognitive processes underlying fluency performance, thus some authors have proposed additional measures, such as the error types in PF, which typically include perseverations (repetition of the same correct word) and intrusions (words with another initial letter) (Thiele et al., 2016). Although errors are relatively rare in normative data, investigating the number and types of errors is useful in research and clinical practice as it is not only the decline in response counts that indicates pathology, but also alterations in performance patterns.
Research suggests that different mechanisms probably underlie perseverations and intrusions. Perseverations have been linked to a frontal lobe dysfunction characterized by intellectual rigidity and inability to shift mental sets (Miller & Cohen, 2001) which is common in neurologic disorders. By recording the number of perseveration errors, one might have more information about the status of the central executive component of working memory.
Intrusions on a PF task could be executive errors (e.g., forgetting the rules, losing the set, and using rules from a different fluency trial; McDowd et al., 2011) or phonetic/spelling related (a word with a similar sound but incorrect letter; Rofes et al., 2019). These errors have been studied in different pathologies such as brain injury, Parkinson's disease, or Alzheimer's disease (Smith et al., 2020), suggesting that intrusions are typical of senile dementia of the Alzheimer's type and may help distinguish it from other causes of dementia. Regarding the clinical usefulness of the error patterns analysis, it is important to remember that it is better to interpret them within the context of the whole neuropsychological assessment.
In some studies, time has also been considered. In these cases, the number of words generated during four segments of 15 seconds each has been evaluated. In general, participants produce most of the words in the early stages of the task (first 30 seconds) using a semi-automatic rapid retrieval process. Cognitive demand is not uniform throughout the PF task since, as time progresses, lexical retrieval becomes more difficult, and therefore fewer words are produced in the final moments of the task (Venegas & Mansur, 2011). Furthermore, it has been found that the educational level of the participants has a positive effect on the first-time segments in both PF and SF, while age was not significant (Venegas & Mansur, 2011).
Several studies in languages other than English have aimed to adapt their letter sets so they might pose a similar difficulty to "F, A, and S" in English. For instance, the "P, M, and R" set has been proposed as more appropriate to be used with Spanish speakers for several reasons (Fortuny et al., 1998). First, words beginning with "F" are rare in Spanish. Although words beginning with "A" are common, the starting "HA" (as in "hábito", habit) is also very frequent, which may be disadvantageous for people with low levels of literacy since the letter H is silent in Spanish. Finally, the "S" sound may be confusing in some regions of Spain and Latin America ("C" in sequences "CE/"CI", -as in "celebración", celebration-, and "Z" in sequences "ZA/ZO/ZU", -as in "zapato", shoe-, are pronounced like "S"), which again poses a disadvantage for people with low literacy.
The availability of normative data on PF, adjusted for age, education, or sex, can help in the early detection of cognitive impairment and the measurement of clinically significant changes.
The main purpose of this paper is to offer updated normative data of PF (P, M, R, and P + M + R) for Spanish-speaking middle-aged and older adults native to Spain (over 50 years old), considering sociodemographic factors (age, education, and sex), which have been elaborated with a large sample. Instead of providing only normative data for a single outcome measure (the total of words evoked) as other previous studies did, a novel contribution of the present work is the inclusion of both errors and word production during 15-second segments as additional measures for analyses.

Participants
Participants were selected using the following inclusion criteria: (1) community-dwelling individuals; (2) over 50 years of age; (3) Mini-Mental State Exam (MMSE; Lobo et al., 1999) greater or equal to 24 points; (4) Geriatric Depression Scale of 15 items version (GDS-15;Yesavage et al., 1983) below or equal to 9 points; (5) normal cognitive development, not meeting diagnostic criteria for Mild Cognitive Impairment (MCI) (Petersen, 2004) in at least two previous consecutive assessments; (6) being able to manage an independent life without any severe mental disorder (cognitive or psychiatric) impeding daily functioning; (7) normal or corrected hearing and vision; (8) basic reading comprehension and writing abilities in Spanish; and (9) signed written informed consent.
A total of 1165 Spanish speakers without cognitive impairment aged between 50 and 89 years were recruited. In the study, the sampling was non-probabilistic incidental, most of the participants lived in urban areas (93%) and did not receive any type of remuneration for their participation. All participants were asked to produce words that began with P. Letters M and R were also administered to a subsample of participants (852) aged 60 to 89 years.
Before unifying data from the full 'P' sample with the 'M and R' subsample, we verified that there were no statistically significant differences in performance that could be due to the group. Table 1 shows the distribution by sex and educational level for the total sample and the subsample. Spanish people aged 50 or more completed their studies under the educational law enacted in 1970 or under the law that preceded it, established in 1953 and reformed in 1967. With respect to university studies, a law from 1943 was replaced in 1970 by another that distinguished different levels (diplomado, licenciado, doctor; in English: diplomate, graduate, doctorate). To cope with the heterogeneity of levels or grades (EGB, elemental bachelor, superior bachelor, BUP, etc.) we used the following equivalents, related to international categories. We have considered educational level as an ordinal variable (with values ranging from 0 to 3). That is, 'Without formal education' (0): less than 6 years of schooling; 'Primary studies' (1): between 6 and 11 years of schooling; 'Secondary studies' (2): between 12 and 15 years of schooling; 'Higher studies' (3): more than 15 years of formal education. The level 'Without formal education' corresponded to people who are literate but could only go to school for 2 or 3 years (they only have basic reading and writing skills and simple mathematical calculation). The mean age of the total sample was 72.08 years (SD = 6.46) and that of the subsample was 74.01 years (SD = 3.83). The median educational level was 2 for both the total sample and the subsample. Sex was coded as 0 for females and 1 for males. The percentage of women was 65% in the total sample and 63% in the subsample.
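The coding scheme described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `education_level`, the constant names, and the treatment of fractional years (anything below 12 counts as primary) are assumptions made here, not part of the study's procedure.

```python
# Ordinal education coding as described in the text (0 to 3).
EDUCATION_LABELS = {
    0: "Without formal education",  # less than 6 years of schooling
    1: "Primary studies",           # 6 to 11 years
    2: "Secondary studies",         # 12 to 15 years
    3: "Higher studies",            # more than 15 years
}

# Sex coding as described in the text.
SEX_CODES = {"female": 0, "male": 1}


def education_level(years_of_schooling: float) -> int:
    """Map years of schooling to the ordinal level (0-3) used in the paper."""
    if years_of_schooling < 0:
        raise ValueError("years_of_schooling must be non-negative")
    if years_of_schooling < 6:
        return 0
    if years_of_schooling < 12:  # assumption: fractional years round down
        return 1
    if years_of_schooling <= 15:
        return 2
    return 3


if __name__ == "__main__":
    for years in (3, 8, 12, 17):
        level = education_level(years)
        print(years, level, EDUCATION_LABELS[level])
```

A participant with 3 years of schooling is therefore coded 0, and one with 17 years is coded 3.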
The Spanish Consortium for Ageing Normative Data (SCAND) initiative takes the data from three Spanish cohort studies, Aging Brain Projects of the Complutense University of Madrid, the Vallecas Project, and the Compostela Aging Study. The SCAND initiative was developed with the aim of sharing data on the Spanish middle-aged and old adult population provided that the above-mentioned studies share a set of neuropsychological tests in their evaluation protocols. All participants in the total sample were selected given that they had been involved in different studies about the aging process and were recruited between 2008 and 2019.
All studies complied with the ethical standards of the Declaration of Helsinki and were approved by the local Ethics Committees of the participant institutions.

Instruments
In the present study, three letters were considered (P, M, and R), all of which were part of the comprehensive neuropsychological assessment protocol used by each of the SCAND research groups, which included screening tests, scales, and other tests belonging to different cognitive domains (memory, executive functions, and language). Initially, the PF task with the letter P was included in the protocol. Although this is common among a significant number of clinicians, we also found some cases in which the letters M or R were used in isolation. For this reason, we later began to use the three letters (P, M, and R) with participants. Participants were asked to generate as many words as possible beginning with these initial letters in 60 seconds. For each letter, the number of correct words was registered, excluding intrusions and perseverations. Errors (perseverations and intrusions) and words produced every 15 seconds were recorded only for the subsample.

Procedure
All participants completed a structured interview to collect sociodemographic data, screening tests, and an extensive neuropsychological assessment, including memory, executive functions, and language tests, administered by neuropsychologists well-trained in the use of neuropsychological assessment tools. All participants were informed about the main research aspects, and they signed a written informed consent before performing any study procedure.
According to standard instructions (e.g., Lezak et al., 2012), participants were asked to generate in 60 seconds as many words as possible that began with each letter (presented always in this order and consecutively: P, M, and R), excluding proper names and repetitions of the same word with different endings. Participants started naming words beginning with each letter and then two neuropsychologists recorded correct words, perseverations, and intrusions in the order that they were generated. Two raters were used to ensure the reliability of the scoring procedure. The interrater reliability was near .99 in all cases.

Data analyses
The statistical procedure was as follows. First, the cumulative frequency distribution of the raw scores was generated. Percentile ranges were assigned to the raw scores depending on their place in the distribution. Then the percentile ranges were converted to scaled scores (ss) (range 2 to 19) using the formula ss = 10 + 3Z, where Z is the normalized standard score corresponding to the percentile. This transformation of raw scores produced a normal distribution (M = 10 and SD = 3), which allows the application of linear regressions to test the effect of sociodemographic variables and to calculate the scaled-adjusted scores (ssadjusted). Secondly, the effects of age, education, and sex were verified. For each letter, three univariate regressions were calculated on ss with age, education, and sex as predictors. Corrections were only applied for those sociodemographic variables that yielded a significant regression coefficient (p < .05) and that also explained at least 5% of the variance (Lee, 2014). Finally, adjustments were made according to age, education, and sex on the ss, using a formula in which, given the level of measurement of each variable, the mean is subtracted from age (continuous, ratio level), the median from education (ordinal level), and the mode from sex (nominal level), so that the adjusted scores provide a better-standardized reference. All analyses were performed with SPSS version 25. Table 1 shows the distribution of women and men by educational level and age group.
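The percentile-to-scaled-score conversion described above (ss = 10 + 3Z) can be illustrated with a short Python sketch. The function name `scaled_score`, the rounding to the nearest integer, and the clamping to the 2 to 19 range are assumptions made for illustration; the study itself used SPSS, and its exact rounding rule is not stated.

```python
from statistics import NormalDist

SS_MIN, SS_MAX = 2, 19  # range of scaled scores reported in the text


def scaled_score(percentile: float) -> int:
    """Convert a percentile (strictly between 0 and 100) to a scaled score.

    Z is the standard normal quantile for the percentile, and the
    scaled score is 10 + 3 * Z (mean 10, SD 3).
    """
    if not 0 < percentile < 100:
        raise ValueError("percentile must be strictly between 0 and 100")
    z = NormalDist().inv_cdf(percentile / 100)
    ss = round(10 + 3 * z)  # assumption: round to the nearest integer
    return max(SS_MIN, min(SS_MAX, ss))  # assumption: clamp to 2..19


if __name__ == "__main__":
    for pc in (5, 25, 50, 75, 95):
        print(pc, scaled_score(pc))
```

Under these assumptions the 50th percentile maps to ss = 10 and the 75th percentile to ss = 12, which agrees with the worked example in the Results, where the 69-75 percentile range corresponds to ss = 12.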

Results
Correlations between the individual letter scores and each one with the P + M + R score were high and statistically significant (r(P, M) = .703; r(P, R) = .718; r(M, R) = .747; r(P, PMR) = .896; r(M, PMR) = .902; r(R, PMR) = .910; p ≤ .001 in all cases).

Norms for the total number of correct words produced by participants
Table 2 shows the descriptive statistics of the total number of correct words by sociodemographic variables. Age was categorized only to show its distribution in the tables but was included as continuous in all statistical analyses.
The adjustments by education in P, M, R, and P + M + R indicated that for people without formal education +2 points must be added to their ss score for P, M, and R, and +3 points when the P + M + R score is considered. For individuals with primary education, +1 must be added for P, M, R, or P + M + R, and when the educational level is higher, 2 points must be subtracted from the ss score for P, M, R, or P + M + R. No adjustments are needed for the secondary education level.
To explain how to use the tables to select the correct scaled score given a raw score, and how to apply the correction to unadjusted scaled scores if needed, let us consider an example. If a patient without formal education produced 16 words with the letter P, we first locate the raw score in Table 3, then we read the percentile range at the left (69-75) and the corresponding unadjusted scaled score (ss = 12). To adjust this score for the educational level "without formal education", two points are added (ssadjusted = 14). For a patient with the same raw score but with a higher educational level, the unadjusted scaled score should be corrected by subtracting two points (ssadjusted = 10).
Scaled scores and percentile ranges corresponding to perseveration and intrusion errors across letters are shown in Table 5. None of the sociodemographic variables could explain at least 5% of the variance, so no adjustments are required (R² < .05 in all cases).
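The education adjustments for the total number of correct words can be expressed as a small lookup, sketched below in Python. The names `TOTAL_WORDS_ADJUSTMENT` and `adjust_total_words_ss` are hypothetical, education is coded 0 to 3 as in the Participants section, and the sketch does not clamp the result to the 2 to 19 range because the text does not say whether adjusted scores are bounded.

```python
# Points added to the unadjusted scaled score (ss), by measure and by
# education level (0 = without formal education, 1 = primary,
# 2 = secondary, 3 = higher), as stated in the text.
_SINGLE_LETTER = {0: 2, 1: 1, 2: 0, 3: -2}
TOTAL_WORDS_ADJUSTMENT = {
    "P": _SINGLE_LETTER,
    "M": _SINGLE_LETTER,
    "R": _SINGLE_LETTER,
    "PMR": {0: 3, 1: 1, 2: 0, 3: -2},
}


def adjust_total_words_ss(ss: int, measure: str, education: int) -> int:
    """Return the education-adjusted scaled score for total correct words."""
    if measure not in TOTAL_WORDS_ADJUSTMENT:
        raise ValueError(f"unknown measure: {measure!r}")
    if education not in (0, 1, 2, 3):
        raise ValueError("education must be coded 0, 1, 2, or 3")
    return ss + TOTAL_WORDS_ADJUSTMENT[measure][education]


if __name__ == "__main__":
    # Worked example from the text: 16 words with P gives ss = 12.
    print(adjust_total_words_ss(12, "P", 0))  # without formal education
    print(adjust_total_words_ss(12, "P", 3))  # higher education
```

For the worked example in the text, this gives 14 for a patient without formal education and 10 for a patient with higher education.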

Norms for the total number of words produced every 15 seconds
Descriptive statistics corresponding to the total number of words produced every 15 seconds are shown in Table 6. Table 7 shows unadjusted scaled scores (ss) and percentile ranges corresponding to this measure across letters.
The only variable that explained at least 5% of the variance was education (From 0 to 15 sec.: r = .299, p = .001, R² = .089 for P; r = .379, p = .001, R² = .143 for M; r = .347, p = .001, R² = .121 for R; and r = .417, p < .001, R² = .174 for P + M + R. From 16 to 30 sec.: r = .338, p = .001, R² = .114 for P; r = .348, p = .001, R² = .121 for M; r = .373, p = .001, R² = .139 for R; and r = .456, p < .001, R² = .207 for P + M + R. From 31 to 45 sec.: r = .345, p = .001, R² = .119 for P; r = .340, p = .001, R² = .116 for M; r = .323, p = .001, R² = .104 for R; and r = .420, p < .001, R² = .177 for P + M + R. From 46 to 60 sec.: r = .281, p = .001, R² = .079 for P; r = .306, p = .001, R² = .094 for M; r = .321, p = .001, R² = .104 for R; and r = .424, p < .001, R² = .180 for P + M + R).
Adjustments were therefore only needed for education. For the letter P, ssadjusted is ss + 1 for those without formal education and ss − 1 for those with higher education, in all 15-second segments. For the letter M, in the first segment (0 to 15 sec.) ssadjusted is ss + 2 for patients without formal education, ss + 1 for those with primary education, and ss − 2 for those with higher education; for the rest of the 15-second segments ssadjusted is ss + 1 for those without formal education and ss − 1 for those with higher education. For the letter R, in the second segment (16 to 30 sec.) ssadjusted is ss + 2 for patients without formal education, ss + 1 for those with primary education, and ss − 2 for those with higher education; for the rest of the 15-second segments ssadjusted is ss + 1 for those without formal education and ss − 1 for those with higher education. When the sum P + M + R is considered, ssadjusted is ss + 2 for patients without formal education, ss + 1 for those with primary education, and ss − 1 for those with higher education, regardless of segment.
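Because the segment-wise adjustments differ by letter and by segment, a lookup table may make them easier to apply. The Python sketch below encodes the rules exactly as worded above; the names `segment_adjustment` and `adjust_segment_ss`, and the numbering of segments from 1 (0 to 15 sec.) to 4 (46 to 60 sec.), are assumptions made for illustration.

```python
# Education codes: 0 = without formal education, 1 = primary,
# 2 = secondary, 3 = higher. Values are points added to ss.
_DEFAULT = {0: 1, 1: 0, 2: 0, 3: -1}  # P (all segments); M and R elsewhere
_STRONG = {0: 2, 1: 1, 2: 0, 3: -2}   # M in segment 1, R in segment 2
_SUM = {0: 2, 1: 1, 2: 0, 3: -1}      # P + M + R, every segment


def segment_adjustment(measure: str, segment: int, education: int) -> int:
    """Points to add to the unadjusted ss for one 15-second segment."""
    if measure not in ("P", "M", "R", "PMR"):
        raise ValueError(f"unknown measure: {measure!r}")
    if segment not in (1, 2, 3, 4):
        raise ValueError("segment must be 1, 2, 3, or 4")
    if education not in (0, 1, 2, 3):
        raise ValueError("education must be coded 0, 1, 2, or 3")
    if measure == "PMR":
        table = _SUM
    elif (measure, segment) in (("M", 1), ("R", 2)):
        table = _STRONG
    else:
        table = _DEFAULT
    return table[education]


def adjust_segment_ss(ss: int, measure: str, segment: int, education: int) -> int:
    """Return the education-adjusted scaled score for a 15-second segment."""
    return ss + segment_adjustment(measure, segment, education)


if __name__ == "__main__":
    print(adjust_segment_ss(10, "M", 1, 0))  # letter M, 0-15 sec., no formal education
    print(adjust_segment_ss(10, "R", 2, 3))  # letter R, 16-30 sec., higher education
```

Keeping the three rule sets as separate dictionaries mirrors the structure of the text: one default rule, one stronger rule for two specific letter-segment pairs, and one rule for the summed score.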

Discussion
The present study provides normative data of phonological VF (letters P, M, R) for Spanish middle- and older-aged adults, considering sociodemographic factors, and different measures such as the total number of words, errors (perseveration and intrusions), and 15-second segmented scores.
Regarding the whole sample it is important to note that the percentage of women with a lower educational level was higher than that of men, a fact that reflects the reality of the Spanish population of people over 50 years of age (Pérez et al., 2020). Therefore, the sociodemographic distribution of the participants can be considered representative of the Spanish population, with the sole exception that our group of men has a higher level of education than usual.
All correlations between individual letters and between those and the total P + M + R score are high and similar, which would indicate that a clinician could use a single letter instead of the P + M + R total score if she/he needs brevity in her/his evaluation protocol.
Results regarding the total number of words produced by participants indicate that education is the most important variable, since it explains between 19 and 23% of the total variance in all letters, and 26.6% in the sum P + M + R. The effect of age was also statistically significant, although it explained less than 3% of the variance. Sex did not have any significant effect, in accordance with previous studies (Mathuranath et al., 2003).
Results showing the effect of the education level on the total number of words produced are in accordance with all the studies reviewed. Tombaugh et al. (1999) showed a direct influence of education on PF, accounting for 18.6% of the variance. Mathuranath et al. (2003) reached similar conclusions, indicating a significant influence of education on PF, in which participants with higher educational levels performed better than those with fewer years of schooling. Aziz et al. (2017) showed that educational level significantly influences both phonemic and semantic fluency tasks, with higher educational levels being associated with better performance. This result is a constant across studies, and it appears even with qualitative characteristics of task performance such as clustering and switching (Pereira et al., 2018). Another set of studies has shown significant effects of both educational level and age. Dursun et al. (2002) studied the effects of age and total years of education on vocabulary performance in healthy volunteers. They found that education and age were overall predictors of total scores, but no correlation was found with sex. Peña-Casanova et al. (2009) reported effects of age and education in different letters, but sex was again not significant. A recent study conducted by Marquine et al. (2021) found a small effect of age and a medium effect of educational level on PF scores, thus showing a similar pattern to that of the present study.
Word production is largely based on verbal skills (vocabulary), memory retrieval and recall, and executive control processes (self-initiation and monitoring to inhibit repetitions and intrusions) (Shao et al., 2014). VF tasks are multidimensional as they rely on other executive functioning skills as well, such as processing speed, cognitive flexibility, working memory, and sustained attention (Diamond, 2013). PF requires search and retrieval strategies dependent on accessing the mental lexicon. A higher educational level entails a larger lexicon and greater verbal lexical retrieval capacity, as well as the use of more efficient information retrieval strategies (Federmeier et al., 2010). Thus, many factors can influence educational attainment. Formal education may increase vocabulary knowledge, a strong predictor of PF performance with age (Henry & Phillips, 2006), and provides contents and procedures frequently included in cognitive testing (Ardila et al., 2000). Cognitively stimulating experiences in early life can enhance brain development and impact cognitive ability later in life (Noble et al., 2015). In line with these arguments, some studies have reported that reading level (a proxy related to cognitive reserve) and PF were moderately correlated (Johnson-Selfridge & Zalewski, 2001). The systematic review conducted by Panico et al. (2022) described two studies that found correlations between cognitive reserve (CR) and PF. Moraes et al. (2013) investigated the correlation between CR (expressed by years of formal education and frequency of reading and writing) and scores on VF tasks (phonemic and semantic fluency), among other tests/tasks. Education showed the best predictive value on PF. Roldan-Tapia et al. (2012) reported that a composite index of CR (including education, occupation, and vocabulary knowledge) significantly correlated with scores on PF. In another interesting study, Kraan et al. (2013) concluded that PF in adults was associated with verbal intellectual function and processing speed.
Note: Higher scores indicate more perseverations and intrusions and thus worse performance. Pc = percentile ranges; Rs = raw scores.
While traditional normative data studies have examined the total number of words generated as a measure of VF performance, there is evidence suggesting that task performance analysis (errors or temporal analysis) provides valuable additional information (Abwender et al., 2001; Pakhomov et al., 2018). As far as we know, the present study is the first attempt to provide normative data on older Spanish adults concerning the number of perseveration and intrusion errors. Our results showed a small statistically significant effect of educational level that failed to explain at least 5% of the variance of the errors. Effects of age or sex were not significant. Previous studies considering the effects of education on older and middle-aged adults' errors were not found. As a tentative hypothesis, the fact that education explains a percentage under the 5% criterion might be related to the small range observed in the number of errors. Ranges in raw scores are certainly small, given that the sample is composed of healthy older adults.
Concerning our measures of word generation performance in 15-second time intervals, we likewise did not find previous studies including Spanish normative data on similar measures. Our results indicate that the highest word production occurs in the first 30 seconds of the tests, and a descending curve of word production was observed over the 60-second test period. These results are consistent with those found in the adult population; it has been suggested that production of words is maximal during the initial stages of the task (semi-automatic retrieval) as individuals access their long-term memory store of high-frequency, easy-to-retrieve words. When this store is exhausted, the individual attempts to retrieve words from a larger pool of words, making the search process more time-consuming and more difficult (Crowe, 1998; Jacobs et al., 2021; Raboutet et al., 2010). That is, as time on task increased, production decreased, as did the word frequency of the items produced. In a study conducted by Demetriou and Holtzer (2017), healthy older adults were fast and efficient at initiating search processes and retrieving words from memory, as evidenced by the larger number of words they produced in the first 20 seconds of the task. However, there was a discrepancy between performance in the first time interval and in the subsequent two intervals, which Demetriou and Holtzer explained by assuming that people had to monitor and inhibit responses that had already been given from a large set of evoked words.
Educational level effects appeared in all the 15-second intervals across letters (and in the sum) in the present study. Sex and age had a significant effect on some of the 15-second interval measures, but these effects were very weak, as they explained less than 2% of the variance. Similarly, Venegas and Mansur (2011) found that participants' educational level has a positive effect on the first three quartiles in PF, while age was not significant. As suggested by these authors, the first quartile is dedicated to semi-automatic retrieval, while the other quartiles are implicated in planning, adjusting, and monitoring the performance, to guarantee the generation of items and avoidance of repetitions and intrusions. The effect of educational level on word production in the last 15-second interval is also relevant, given that at this point the task requires the most effortful retrieval processes and people with higher educational attainment should show an advantage given their larger lexicon, greater verbal lexical retrieval capacity, and more efficient information retrieval strategies (Demetriou & Holtzer, 2017). Congruent with this line of reasoning is the study of Sauzéon et al. (2011), which revealed a knowledge compensation mechanism in older adults' letter fluency productions that only occurred during the second period (31-60 sec.) and was related to vocabulary level.
As limitations of this research, we would like to point out that we did not use epidemiological recruitment methods and that we did not assess medical and/or psychological conditions that may interfere with cognition and self-reported mood. We did not recruit illiterate participants because they are very unusual in Spain. Although it makes the sample more representative of the Spanish population, the unequal sex distribution of the sample should be added to the list of the study's limitations.
Also, the reader must consider that these normative data are not generalizable to Spanish speakers outside of Spain. To apply them, the demographic characteristics of the person assessed must be similar to those of the normative sample, and the administration and scoring procedures of the test must match those used here (Mitrushina et al., 2005).

Conclusions and future directions
The present study provides normative data for healthy older people on the PF task for the letters P, M, and R, and considers errors and production by time segments. The influence of education is in line with other previous studies. These data may also be of considerable use for comparisons with other normative studies in Spain and other countries.
There are very few normative studies for VF tasks for people over 60-75 years of age, and even fewer for Spanish speakers. In addition, to our knowledge, this is the first study to present normative data regarding the number of errors (perseverations and intrusions) and the number of words produced in 15-second intervals. Normative data for PF in specific populations are a useful resource for clinical and research studies and may aid in the early detection of cognitive impairment, diagnosis, establishing prognosis, planning treatment, and monitoring clinically significant changes.
It is of interest to further investigate whether a different approach to quantifying performance on PF, such as error testing or time segment generation, is related to prevalent MCI or predictive of its incidence.