Performance difference in verbal fluency in bilingual and monolingual speakers

Research has shown that bilinguals can perform similarly to, better than, or worse than monolinguals on verbal fluency tasks. Verbal fluency data for semantic (animals, fruits and vegetables, and clothing) and letter fluency (F, A, S) were collected in English from 25 Bengali–English bilinguals and 25 English monolinguals. The groups were matched for receptive vocabulary, age, education and non-verbal intelligence. We used a wide range of measures to characterize fluency performance: number of correct responses, fluency difference score, time-course analysis (1st-RT, Sub-RT, initiation parameter, slope), clustering, and switching. Participants completed three executive control measures tapping into inhibitory control, mental-set shifting and working memory. Group differences were significant for measures that place higher demands on executive control, namely the number of correct responses in letter fluency, the fluency difference score, and Sub-RT, slope and cluster size in letter fluency, with bilinguals outperforming monolinguals. Stroop performance correlated positively with the slope only for the bilinguals.


Introduction
The literature is abuzz with arguments for and against linguistic and executive control differences between monolingual and bilingual speakers (Gollan, Montoya, Fennema-Notestine & Morris, 2005; Ivanova & Costa, 2008; Luo, Luk & Bialystok, 2010; Paap, Myuz, Anders, Bockelman, Mikulinsky & Sawi, 2017; Prior & MacWhinney, 2010). Previous studies have shown bilingual disadvantages in various linguistic tasks such as picture naming, verbal fluency (Rosselli, Ardila, Araujo, Weekes, Caracciolo, Padilla & Ostrosky-Solís, 2000), and word identification in noise (Rogers, Lister, Febo, Besing & Abrams, 2006). According to the weaker link hypothesis, the bilingual disadvantage in the linguistic domain arises because bilinguals use each of their languages less often than monolinguals use their single language, which results in weaker lexical links within each language (Michael & Gollan, 2005). A sensorimotor account (Hernandez & Li, 2007) attributes the bilingual disadvantage to the later age of acquisition of the second language. Further, bilinguals face greater lexical competition than monolinguals because both languages are active during language processing (Costa & Caramazza, 1999), and poorer performance in the linguistic domain can be attributed to this increased lexical competition (Inhibitory Control model; Green, 1998).
In contrast to the bilingual disadvantages in the verbal domain, the effect of bilingualism on executive control is hotly debated. Researchers have reported advantages (Bialystok, Craik, Klein & Viswanathan, 2004; Prior & MacWhinney, 2010) as well as no differences across various executive control tasks (Kousaie & Phillips, 2012; Paap & Greenberg, 2013; Paap et al., 2017). For example, while some studies have reported a bilingual advantage on inhibitory control tasks (e.g., the Simon task in Bialystok et al., 2004; the Flanker task in Emmorey, Luk, Pyers & Bialystok, 2008), others have found similar performance in bilinguals and monolinguals (e.g., the Stroop task in Kousaie & Phillips, 2012; the Flanker and Simon tasks in Paap & Greenberg, 2013). Similarly, studies of mental-set shifting using the colour-shape task-switching paradigm have reported divergent findings, ranging from an advantage for bilinguals (Prior & Gollan, 2011; Prior & MacWhinney, 2010) to no differences between the two groups (Paap & Greenberg, 2013; Paap et al., 2017). It therefore remains unresolved whether bilinguals show specific advantages in certain domains of executive control, as group differences may depend on cultural differences, small sample sizes, inappropriate statistical analyses, and the tasks used (Paap, Johnson & Sawi, 2014).
A prevalent approach in the literature has been to use separate measures of language production and executive control: linguistic tasks tapping into language production, and non-linguistic tasks tapping into executive control processes. A better approach to inform the debate on bilingual disadvantages or advantages in language and executive control processes would be to use a task (e.g., the verbal fluency task) that draws on both sets of processes simultaneously. With the exception of a handful of studies (e.g., Bialystok, Craik & Luk, 2008; Friesen, Luo, Luk & Bialystok, 2015), the role of executive control during language production in bilinguals and monolinguals has not been explored.
Researchers have used the verbal fluency task, which measures the ability to produce as many unique words as possible in a fixed amount of time according to a given criterion (e.g., semantic or category; letter or phonemic), to inform the debate on linguistic and executive control differences between monolingual and bilingual speakers (Luo et al., 2010; Paap et al., 2017; Sandoval, Gollan, Ferreira & Salmon, 2010). Performance in the semantic fluency condition resembles our day-to-day language activities: for example, when asked to generate items belonging to the category of clothing, participants may try to recall the items in their wardrobes. Participants can therefore revisit existing links in their mental lexicon related to a concept while generating new words in the semantic fluency condition (Friesen et al., 2015). The letter fluency condition is more challenging, as it requires producing words that start with a given letter or phoneme, which is not commonly practised in everyday life. Successful performance in the letter fluency condition requires generating strategies and suppressing the activation of related semantic concepts (e.g., Friesen et al., 2015; Luo et al., 2010). Thus, the relative contributions of linguistic and executive components differ between the two conditions: letter fluency places higher demands on executive control mechanisms, whereas semantic fluency places greater emphasis on linguistic abilities (Delis, Kaplan & Kramer, 2001; Luo et al., 2010; Paap et al., 2017; Sandoval et al., 2010; Shao, Janse, Visser & Meyer, 2014).
Verbal fluency research comparing bilingual and monolingual performance has produced mixed results (Luo et al., 2010; Paap et al., 2017; Sandoval et al., 2010). In semantic fluency, monolinguals generate more correct responses than bilinguals (Gollan, Montoya & Werner, 2002; Rosselli et al., 2000; Sandoval et al., 2010). However, this bilingual disadvantage disappears when the groups are matched on receptive vocabulary (Luo et al., 2010). For letter fluency, findings have ranged from fewer, to equivalent, to more correct responses by bilinguals (Kormi-Nouri, Moradi, Moradi, Akbari-Zardkhaneh & Zahedian, 2012; Rosselli et al., 2000; Luo et al., 2010; Paap et al., 2017; Sandoval et al., 2010). Luo et al. (2010) found that vocabulary-matched bilinguals outperformed monolinguals on letter fluency and proposed that this reflects better executive control in bilinguals. However, Paap et al. (2017) were unable to replicate these results. They argued strongly that "relatively better performance by a group on letter fluency compared to category fluency cannot be taken as evidence that the group has superior executive functions. Rather such a claim must be backed up by an independent and direct test of EF ability" (Paap et al., 2017, p.108). Importantly, a study relating independent measures of executive control to verbal fluency performance (in monolinguals) did not find a stronger relationship between executive control and performance in letter fluency than in semantic fluency (Shao et al., 2014). Given the limited number of empirical studies and the difficulties with replication, it remains an open question (1) whether bilinguals and monolinguals perform differently in semantic and letter fluency tasks, and (2) whether any performance differences are mediated by specific aspects of executive control.
Moving beyond the number of correct responses, we used a wide range of variables to characterize verbal fluency performance, including time-course, clustering, and switching analyses for both semantic and letter fluency (Luo et al., 2010; Troyer, Moscovitch & Winocur, 1997). Table 1 describes these variables and the components of verbal fluency they are assumed to index. To our knowledge, this is the first study to systematically compare healthy bilinguals and monolinguals on this full range of measures. In addition, we included independent measures of executive processes (i.e., inhibition, shifting and working memory) to compare bilinguals and monolinguals and to relate these measures to verbal fluency performance. This allows us to establish whether bilinguals show larger differences on the verbal fluency parameters that depend more on the executive component of the task, and whether the better letter fluency performance of bilinguals found in some studies can be attributed to differences in executive control.
As the verbal fluency task places a premium on rapid search and retrieval, temporal measures of performance, such as time-course analysis (i.e., production time of each word as a function of its position in the sequence), provide insights into linguistic and executive control strategies (e.g., Crowe, 1998; Luo et al., 2010; Sandoval et al., 2010). In time-course analysis, the words generated over the 60-second interval are grouped into 5-second time bins, and the declining response rate is shown by plotting the number of words produced as a function of time. Four parameters are derived from these data: First-Response Time (1st-RT), Subsequent-Response Time (Sub-RT), the initiation parameter, and the slope (see Table 1 for definitions). Luo et al. (2010) compared semantic and letter fluency performance in a group of young monolinguals and two groups of young bilinguals (high-vocabulary bilinguals matched with the monolinguals, and low-vocabulary bilinguals). In letter fluency, the high-vocabulary bilinguals produced more correct responses, a longer Sub-RT, and a flatter slope than the monolinguals. Friesen et al. (2015) obtained similar results, finding no difference between bilinguals and monolinguals in semantic fluency but more correct responses by bilinguals in letter fluency.
In contrast, Sandoval et al. (2010) found that bilinguals produced a longer Sub-RT along with fewer correct responses than monolinguals in letter fluency. These authors argued that the bilingual disadvantage results from cross-linguistic interference, which slows down word retrieval, as indexed by the longer Sub-RT. Others have argued that, because vocabulary-matched bilinguals produced more correct responses than monolinguals, this retrieval-slowing hypothesis is unlikely to explain the bilingual advantage (Friesen et al., 2015; Luo et al., 2010). Instead, they suggest that bilinguals' better letter fluency performance, in conjunction with their longer Sub-RT, results from superior executive control, which is proposed to be a by-product of the constant cross-linguistic interference that bilinguals face (Abutalebi & Green, 2008; Friesen et al., 2015; Luo et al., 2010).
The Fluency Difference Score (FDS) has been proposed to further capture the role of executive control in the fluency task (Friesen et al., 2015). The FDS is calculated as the difference in the number of correct responses between the semantic and letter fluency conditions, expressed as a proportion of the correct responses in the semantic fluency condition. Individuals who maintain their performance better in the more difficult letter fluency condition therefore obtain a smaller FDS, which is taken to indicate better executive control (Friesen et al., 2015).
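As a worked illustration of this definition, the short Python sketch below computes FDS = (CR_semantic - CR_letter) / CR_semantic. The function name `fluency_difference_score` and the example counts are hypothetical, not study data, and the sketch assumes each condition's CR has already been aggregated over its three trials (e.g., summed or averaged), a step the definition itself does not fix.

```python
def fluency_difference_score(semantic_cr: float, letter_cr: float) -> float:
    """Return FDS = (semantic CR - letter CR) / semantic CR.

    Smaller values mean letter fluency stays closer to semantic fluency,
    which the text interprets as better-maintained performance.
    """
    if semantic_cr <= 0:
        raise ValueError("semantic CR must be positive to compute a proportion")
    return (semantic_cr - letter_cr) / semantic_cr


# Hypothetical participants with the same semantic CR (illustrative only).
print(fluency_difference_score(20, 15))  # 0.25
print(fluency_difference_score(20, 10))  # 0.5
```

Under this reading, the first hypothetical participant (FDS = 0.25) maintains letter fluency better than the second (FDS = 0.5), even though both produce the same number of semantic responses.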
Words in verbal fluency tasks are not produced evenly over time but tend to come in "spurts" or temporal clusters, with short intervals between words within a cluster and longer pauses between clusters (Gruenewald & Lockhead, 1980; Troyer et al., 1997). In semantic fluency, the words within these temporal clusters tend to be semantically related (e.g., first naming farm animals, then switching to pets, then to birds); in letter fluency, they tend to be phonologically related (e.g., words that start with the same first two letters, then words that rhyme, then words with the same ending). This response pattern has led to the suggestion that performance involves two processes: a search for subcategories, which corresponds to the pause between clusters, followed by an output mechanism that produces as many words as possible from each subcategory (Gruenewald & Lockhead, 1980; Tröster, Fields, Testa, Paul, Blanco, Hames, Salmon & Beatty, 1998). Clustering and switching metrics have been proposed to quantify these two processes (Troyer et al., 1997). Specifically, clustering involves accessing and using the word store, and cluster size measures the ability to access words within a subcategory. Switching involves search processes and measures the ability to shift efficiently from one subcategory to another; reduced switching has been attributed to executive difficulty in shifting between subcategories (Troyer, Moscovitch, Winocur, Alexander & Stuss, 1998). Both clustering and switching contribute to the total number of correct responses; however, clustering accounts for more of the variance in the number of correct responses in category fluency, whereas switching accounts for more of this variance in letter fluency (Troyer et al., 1997).
Thus, clustering and switching analyses provide another well-established means of informing the linguistic and executive control debate for bilinguals vs. monolinguals.
To the best of our knowledge, no research has related independent executive control measures to bilingual vs. monolingual differences in verbal fluency performance. Only one study, with healthy monolingual adults, has investigated 60-second verbal fluency performance together with measures of executive control (Shao et al., 2014). Shao et al. assessed older Dutch speakers on both semantic and letter fluency and related their performance to measures of executive control (i.e., updating of working memory, measured with the operation span task; inhibitory control, measured with the stop-signal task). Results revealed that only working memory ability predicted the number of correct responses in both fluency conditions. Shao et al. noted that "there was no evidence that executive control had a stronger effect on performance in the letter than in the category fluency task" (Shao et al., 2014, p. 8). The authors cautioned that the inhibitory control task used in their study (i.e., the stop-signal task) may not capture the inhibitory control required for verbal fluency. The stop-signal task measures how quickly an individual can stop a planned response, whereas in verbal fluency participants need to suppress the activation of competing lexical items (selective inhibition) to produce the target word.
For the present study, we adopted the framework developed by Miyake and colleagues (Miyake, Friedman, Emerson, Witzki, Howerter & Wager, 2000; Miyake & Friedman, 2012) to measure three executive control components. This framework proposes that the three components share a common executive functioning factor: the ability to actively maintain task goals and to use this goal-related information to control lower-level processing (Miyake & Friedman, 2012). Specifically, we measured inhibitory control (the ability to inhibit automatic, dominant, or prepotent responses when required), mental-set shifting (the ability to shift between different tasks, rules, or mental representations), and working memory (constant updating and manipulation of relevant incoming information while replacing old, irrelevant information). We used the Stroop task to measure selective inhibition (Scott & Wilshire, 2010), the colour-shape switch task to measure mental-set shifting (Prior & MacWhinney, 2010), and the backward digit span test to measure working memory (Wechsler, 1997).

Table 1 (continued). Verbal fluency variables and the components of verbal fluency they are assumed to index
1st-RT: Preparation time to initiate the first response.
Sub-RT: Estimate of mean retrieval latency; represents the time point at which half of the total responses have been generated.
Initiation parameter: Measures the initial linguistic resources or vocabulary available to perform the task.
Slope: Reflects how resources are monitored and used over time during the retrieval process; largely determined by executive processes.
Qualitative analysis
Cluster size: Strategic process that helps generate words within a subcategory and draws on the speaker's ability to access words within subcategories.
Number of switches: Strategic process of shifting efficiently to a new subcategory when a subcategory is exhausted.
Research in bilingualism has identified various factors, such as bilinguals' language combination and language proficiency, that can confound results. Studies that include bilinguals with a range of language combinations increase individual variability and can yield a wider range of performance attributable to typological, structural, and cultural differences among the languages (Eng, Vonk, Salzberger & Yoo, 2018; Marian, 2008). Including bilinguals with the same language combination controls for within-group variation due to differences in the second language spoken. Bilinguals' language proficiency has also been shown to contribute significantly to verbal fluency performance (Gollan et al., 2002; Luo et al., 2010). When bilinguals are matched with monolinguals on language proficiency, they either outperform (Luo et al., 2010) or perform on a par with monolinguals (Paap et al., 2017). In contrast, bilinguals with low proficiency perform more poorly than monolinguals (Gollan et al., 2002). It is therefore crucial to match bilinguals to monolinguals on language proficiency. In the present research, we included a group of bilinguals that was homogeneous in language combination and proficiency, which should reduce within-group variability and allow findings to be attributed to the processes being tested.

The current study
We compared verbal fluency performance in two groups of young, healthy participants: 25 Bengali-English bilinguals and 25 English monolinguals. The groups were matched on receptive vocabulary, age, years of education, and non-verbal intelligence. We collected 60-second semantic (animals, fruits and vegetables, clothing) and letter (F, A, S) fluency data in English. We provide a detailed characterization of our bilingual participants on variables relevant to bilingualism: language history and acquisition patterns, usage patterns, proficiency, dominance, and switching habits. Our bilingual participants formed a relatively homogeneous group of balanced bilinguals in terms of language of instruction during education, self-rated language proficiency, and language dominance. All bilingual participants were born in the Bengali-speaking region of India and acquired Bengali as their first language. However, they currently lived in the UK and used English more frequently than Bengali in their everyday life.
We quantified verbal fluency performance with quantitative (number of correct responses; FDS), time-course (1st-RT; Sub-RT; initiation parameter; slope), and qualitative (cluster size; number of switches) measures. Executive control processes were measured using the Stroop task (selective inhibition), the colour-shape switch task (shifting between mental sets), and the backward digit span task (working memory).
We formulated our hypotheses from the theoretical accounts (weaker link hypothesis, inhibitory control model) described earlier. Bilingual participants in the present study were matched in vocabulary with the monolingual group. Further, our bilingual participants used English in their day-to-day life more often than Bengali. We predicted that, with these factors (vocabulary and usage) controlled, bilinguals would perform on a par with monolinguals if they could resolve their increased cross-linguistic competition. Moreover, they might perform better in linguistic conditions that require greater executive control (e.g., the letter fluency condition). The research aims and predictions were as follows:
1. To determine differences in verbal fluency performance (quantitative, time-course, and qualitative analyses) between bilingual and monolingual participants. As the groups were matched on vocabulary, we predicted that bilinguals would perform similarly to monolinguals in the semantic fluency condition, but might produce more words than monolinguals in the letter fluency condition. In a similar vein, we did not expect differences in cluster size. If bilinguals were to show superior executive control, we would expect them to demonstrate a smaller FDS, more switches, a longer Sub-RT, and a flatter slope in letter fluency than monolinguals.
2. To determine which measures of executive control (inhibitory control, mental-set shifting, and working memory) mediate verbal fluency performance differences between the groups. We expected that, if bilinguals were to show an advantage in the letter fluency condition, executive control measures would correlate more strongly with the performance measures that relate to executive control abilities (i.e., FDS, slope, number of switches).

Participants
Twenty-five healthy Bengali-English bilingual adults (mean age = 32.84 years, SD = 4.78) and 25 healthy English monolingual adults (mean age = 30.4 years, SD = 8.2) participated in this study. All participants reported being right-handed, with normal or corrected-to-normal vision, and no history of hearing impairment or neurological illness. All participants resided in the county of Berkshire in the United Kingdom. Demographic details (age, gender, and years of education) and non-verbal IQ scores from Raven's Standard Progressive Matrices Plus (SPM Plus; Raven, 2008) are presented in Table 2. Participants were also assessed on two standardised tests of receptive vocabulary: the Oxford Placement Test (Oxford University Press and Cambridge ESOL, 2001) and the British Picture Vocabulary Scale III (BPVS-III; Dunn, 2009). The groups did not differ in age, gender distribution (bilinguals: 11 females and 14 males; monolinguals: 13 females and 12 males; p = .78), years of education, non-verbal IQ, or receptive vocabulary (see Table 2). Bilingual participants were recruited from the local Bengali community (e.g., the Bengali Cultural Society of Reading). The bilinguals were immigrants who had lived in the UK for between 1 and 15 years (M = 7.48, SD = 3.58). They spoke Bengali and English fluently and had minimal or no knowledge of any other language. Monolingual participants were recruited from the local community and from the university student population; students received course credit for participation. Monolingual participants used only English in their day-to-day life and were functionally fluent only in English. Participants provided written consent and participated voluntarily. The University of Reading Research Ethics Committee approved all experimental procedures.

Measures of bilingualism
Bilinguals were assessed using several measures to characterize their bilingualism. We adapted the questionnaire developed by Muñoz, Marquardt & Copeland (1999), which assessed language acquisition history, language of instruction during education, self-rated language proficiency (speaking, comprehension, reading and writing), and current language usage. Language dominance was measured using the language dominance questionnaire (Dunn & Tree, 2009), and language switching habits were assessed using a language switching questionnaire (Rodriguez-Fornells, Krämer, Lorenzo-Seva, Festman & Münte, 2012). All questionnaires are provided in Appendix S1 (Supplementary Materials).
There was no significant difference between bilinguals' Bengali and English in language of instruction during education, subjective language proficiency ratings (speaking, comprehension, reading, and writing), or language dominance, indicating balanced bilingualism in these domains. However, during childhood, bilinguals had significantly greater exposure to Bengali (M = 14.3, SD = 2.6) than to English (M = 2.5, SD = 2.3). Current language usage was predominantly English, and bilinguals were more likely to switch from Bengali to English than the reverse in day-to-day communication.

Verbal fluency measures
Trials and procedures
Participants completed two verbal fluency conditions, semantic and letter, in English. They were asked to produce as many words as possible in 60 seconds. In the semantic condition, participants produced words in three categories: animals, fruits and vegetables, and clothing items. In the letter condition, participants were asked to produce words starting with the letters F, A, and S. The restrictions for the letter condition were to produce unique words that are neither proper names nor numbers (e.g., Singapore, seven), and not to produce variants of the same word (e.g., shop, shopper, shopping). The order of the fluency conditions was randomized across participants; however, trials were blocked by condition. Each participant was tested individually in a quiet room. After the instructions had been given, the participant started a trial only when the tester said "start", which ensured a definite starting point for each trial. Responses were recorded with a digital voice recorder and later analysed for the following variables.

Data coding and analysis
All responses (including repetitions and errors) were transcribed verbatim. Each correct response was time-stamped using PRAAT (Boersma & Weenink, 2015). Time-stamping allowed us to index the onset of each response relative to the onset of the trial (i.e., "start"), from which the time-course variables were calculated. We measured the following variables for each trial:
1. Number of correct responses (CR): the number of responses produced in one minute, excluding errors. In the semantic condition, errors were repetitions of the same word, words not from the target category (e.g., cat as a response for the clothing category), and cross-linguistic intrusions. In the letter condition, errors were repetitions of the same word, words beginning with a different letter (e.g., pig as a response for letter F), proper names (e.g., France as a response for letter F), the same word with inflectional or derivational suffixes (e.g., fast, faster, fastest were counted as a single CR), and cross-linguistic intrusions.
2. Fluency Difference Score (FDS): the difference in the number of correct responses between the semantic and letter fluency conditions, as a proportion of correct responses in the semantic fluency condition.
3. Time-course analysis (Luo et al., 2010): Based on the time tags, CRs were grouped into 5-second bins over each 60-second trial, resulting in 12 bins. The group mean CR in each of the twelve bins was calculated for each semantic and letter fluency trial. These means were plotted as a line graph (x-variable: bin; y-variable: mean CR), and the plot was fitted with a logarithmic function. An example of such a function is y = 4.39 - 1.41 ln(t), where y is the estimated value of the function at time point t. Two central measures were derived from this fit: the initiation parameter and the slope.
FIRST-RT (1st-RT) is the time interval from the beginning of the trial to the onset of the first response. The first response usually takes longer than subsequent responses, and this delay has been linked to task preparation (Rohrer, Wixted, Salmon & Butters, 1995). SUBSEQUENT-RT (Sub-RT) is the mean of the time intervals from the onset of the first response to the onset of each subsequent response. Sub-RT thus provides a good estimate of mean retrieval latency and represents the time point at which half of the total responses have been generated (Sandoval et al., 2010). A longer mean Sub-RT indicates that performance extends later into the time course, but the interpretation of this variable depends on the total number of correct responses (Luo et al., 2010). If one group produces more correct responses than another and has a longer mean Sub-RT, the interpretation is that the group has superior control (and equivalent or better vocabulary) and could continue generating responses for longer. If one group produces fewer or equivalent correct responses but has a longer mean Sub-RT, the interpretation is that control is more effortful, as it took longer to generate the same or a smaller number of items. In contrast, a shorter mean Sub-RT indicates a faster decline in the rate of retrieval, because a large proportion of the responses were produced early in the trial.
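Both latency measures follow directly from the PRAAT time stamps. The Python sketch below is a minimal illustration, assuming onsets are given in seconds from the "start" cue and listed in production order; the function names and onset values are hypothetical.

```python
from statistics import mean


def first_rt(onsets):
    """1st-RT: seconds from trial start ("start" = 0 s) to the first correct response."""
    return onsets[0]


def sub_rt(onsets):
    """Sub-RT: mean interval from the first response onset to each later onset."""
    if len(onsets) < 2:
        raise ValueError("Sub-RT needs at least two correct responses")
    first = onsets[0]
    return mean(t - first for t in onsets[1:])


# Hypothetical onsets (s) of correct responses in one 60-s trial, in order.
onsets = [1.8, 3.0, 4.2, 7.5, 12.0, 20.4, 33.1, 51.0]
print(first_rt(onsets))          # 1.8
print(round(sub_rt(onsets), 2))  # 16.94
```

Because Sub-RT is measured from the first response rather than from the start cue, it is not inflated by a long 1st-RT; the two measures can therefore vary independently.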
INITIATION PARAMETER is the starting point of the logarithmic function, i.e., the value of y when t = 1, where ln(t) = 0 (e.g., for the logarithmic function above, y = 4.39 - 1.41 ln(1) = 4.39 - 0 = 4.39). The initiation parameter indicates the initial linguistic resources, or breadth of lexical items, available for the initial burst when the trial begins, and is largely determined by vocabulary knowledge.
SLOPE is determined by the shape of the fitted curve and reflects the rate of retrieval output as a function of time over the 60 seconds. For the example above, the slope is 1.41 (the absolute value of the coefficient on ln(t)). It reflects how linguistic resources are monitored and used over time and is largely determined by executive control. A flatter slope indicates that participants were able to maintain their performance across the response period despite greater lexical interference towards the end of the trial (e.g., avoiding repetitions, searching for words in an already exhausted vocabulary source), reflecting better executive control.
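The binning and curve-fitting steps of the time-course analysis can be sketched as follows. This is a minimal Python illustration, not the authors' analysis script: it assumes t is the bin index (1 to 12), which is consistent with the initiation parameter being the value at t = 1, and it fits y = a + c ln(t) by ordinary least squares on ln(t). The function names are hypothetical.

```python
import math


def bin_counts(onsets, bin_width=5.0, n_bins=12):
    """Count correct responses per 5-s bin of a 60-s trial (onsets in seconds)."""
    counts = [0] * n_bins
    for t in onsets:
        idx = int(t // bin_width)
        if 0 <= idx < n_bins:  # ignore anything at or after 60 s
            counts[idx] += 1
    return counts


def group_mean_counts(per_participant_counts):
    """Average the binned counts across participants, bin by bin."""
    return [sum(col) / len(col) for col in zip(*per_participant_counts)]


def fit_log_curve(mean_counts):
    """Least-squares fit of y = a + c*ln(t) with t = 1..n (bin index).

    Returns (initiation, slope): initiation = a, the value at t = 1 where
    ln(t) = 0, and slope = -c, the size of the decline for a falling curve.
    """
    xs = [math.log(t) for t in range(1, len(mean_counts) + 1)]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(mean_counts) / len(mean_counts)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, mean_counts))
    c = sxy / sxx
    a = y_bar - c * x_bar
    return a, -c


# Sanity check: values generated exactly from the example y = 4.39 - 1.41 ln(t)
# should give back the same two parameters.
example = [4.39 - 1.41 * math.log(t) for t in range(1, 13)]
initiation, slope = fit_log_curve(example)
print(round(initiation, 2), round(slope, 2))  # 4.39 1.41
```

In use, each participant's onsets would be passed through `bin_counts`, the per-participant lists averaged with `group_mean_counts`, and the twelve group means fitted separately for each group and trial. Fitting a straight line against ln(t) keeps the estimation to closed-form least squares, with no iterative optimiser needed.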
4. Clustering and switching analyses: We closely followed the methods of Troyer et al. (1997). Repetitions were included in the clustering and switching analyses. Semantic fluency clusters were defined as successively produced words that shared a semantic subcategory. Letter fluency clusters were defined as successively generated words that fulfil any one of the following criteria (Troyer et al., 1997): words that begin with the same first two letters (stop and stone); words that differ only by a vowel sound, regardless of spelling (son and sun); words that rhyme (stool and school); or homonyms (foot: anatomical part of the body, and foot: unit of measure). Two variables were derived from the clustered responses: cluster size and number of switches.
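Of the letter fluency criteria, only the first is mechanical enough to automate reliably. The sketch below is a hypothetical Python helper that groups successive responses sharing their first two letters; the remaining criteria (vowel-only differences, rhymes, homonyms) depend on pronunciation or meaning and are assumed here to be coded by hand.

```python
from itertools import groupby


def first_two_letter_clusters(words):
    """Split a response sequence into runs of successive words that share
    their first two letters (only one of Troyer et al.'s letter criteria)."""
    return [list(run) for _, run in groupby(words, key=lambda w: w.lower()[:2])]


responses = ["fragile", "fraught", "fray", "fan", "fat", "fly", "flower", "flute"]
print(first_two_letter_clusters(responses))
# [['fragile', 'fraught', 'fray'], ['fan', 'fat'], ['fly', 'flower', 'flute']]
```

A rhyming pair such as stool and school is split by this helper because the two words start with different letter pairs, which is why a human rater would still need to merge clusters that meet the phonological criteria.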
CLUSTER SIZE was counted beginning with the second word in each cluster. A single word was given a cluster size of zero (e.g., crocodile); a two-word cluster was given a cluster size of one (e.g., bear, fox, which belong to the North American animal subcategory); a three-word cluster was given a cluster size of two (e.g., rhinoceros, hippopotamus, deer, which belong to the African animal subcategory); and so on. Mean cluster size for a trial was calculated by summing the sizes of all clusters and dividing the total by the number of clusters.
NUMBER OF SWITCHES was the number of transitions between clusters. For example, dog, cat; snake, lizard; horse, cow, goat contains two switches: before snake and before horse. Leopard, cheetah; kangaroo, koala bear; robin, sparrow, crow; chimpanzee, orang-utan, baboon contains three switches: before kangaroo, robin and chimpanzee. Similarly, in letter fluency, fragile, fraught, fray; fan, fat; fly, flower, flute contains two switches: before fan and before fly.
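The clustering rules and the two derived variables can be sketched in Python. This is a minimal illustration, not the authors' scoring procedure: the semantic subcategory table is a hypothetical fragment covering only the example words above, only the first-two-letters criterion is implemented for letter fluency (the vowel-sound, rhyme and homonym criteria need phonological or lexical resources), and a word joins a cluster when it is related to the immediately preceding word, which simplifies Troyer et al.'s (1997) rules. Single words are assumed to count as clusters of size zero, so they enter both the mean and the switch count.

```python
# HYPOTHETICAL subcategory fragment; a real analysis would use full
# subcategory lists. A word may belong to several subcategories.
SUBCATEGORIES = {
    "dog": {"pets"}, "cat": {"pets"},
    "snake": {"reptiles"}, "lizard": {"reptiles"},
    "horse": {"farm"}, "cow": {"farm"}, "goat": {"farm"},
}


def shares_subcategory(a, b):
    """Semantic criterion: the two words share at least one subcategory."""
    return bool(SUBCATEGORIES.get(a, set()) & SUBCATEGORIES.get(b, set()))


def same_first_two_letters(a, b):
    """One of the four letter-fluency criteria (first two letters match)."""
    return a[:2].lower() == b[:2].lower()


def group_into_clusters(words, related):
    """Split a response sequence into clusters of successive related words.

    Repetitions are kept, as in the analysis described above.
    """
    clusters = []
    for word in words:
        if clusters and related(clusters[-1][-1], word):
            clusters[-1].append(word)      # continue the current cluster
        else:
            clusters.append([word])        # start a new cluster
    return clusters


def mean_cluster_size(clusters):
    """Size counts from the second word: 1 word -> 0, 2 words -> 1, ..."""
    if not clusters:
        raise ValueError("no responses to score")
    return sum(len(c) - 1 for c in clusters) / len(clusters)


def number_of_switches(clusters):
    """Transitions between adjacent clusters (single words included)."""
    return max(len(clusters) - 1, 0)
```

For the letter-fluency example above, grouping fragile, fraught, fray, fan, fat, fly, flower, flute with `same_first_two_letters` yields three clusters, hence two switches and a mean cluster size of 5/3.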

Executive control measures
Stroop task (Inhibitory control) The computerized Stroop task used in this study was adapted from Scott and Wilshire (2010). It used six colours and their names: red, green, blue, yellow, orange, and purple. The task was divided into two conditions, neutral and incongruent. In the neutral condition, participants named the colour of coloured rectangles. A series of 50 rectangles, each in one of the six colours, was presented in random order, such that two successive trials never had the same colour. In the incongruent condition, participants named the font colour of colour words. A series of 50 colour words was shown one at a time in random order, each presented in a colour other than the one the word names (e.g., the word red printed in green).
The procedure was the same for both conditions. Participants were instructed to name the colour (of the rectangle or of the word's font) as quickly and as accurately as possible. Each condition began with six practice trials. Both conditions were completed in a single session, with the neutral condition first, followed by the incongruent condition. The onset of each stimulus was accompanied by a beep, which allowed latency measurement. All responses were recorded with a digital voice recorder.
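The two stimulus constraints (no immediate colour repeats in the neutral list; ink never matching the word in the incongruent list) can be sketched as follows. This is a minimal Python illustration assuming uniform random sampling: the section does not say whether colours were balanced across the 50 trials or whether the incongruent list also avoided immediate repeats, and the function names are hypothetical.

```python
import random

COLOURS = ["red", "green", "blue", "yellow", "orange", "purple"]


def neutral_trials(n=50, rng=random):
    """Rectangle colours in which two successive trials never share a colour."""
    seq = []
    for _ in range(n):
        # Exclude the previous colour (if any) from the candidate set.
        choices = [c for c in COLOURS if not seq or c != seq[-1]]
        seq.append(rng.choice(choices))
    return seq


def incongruent_trials(n=50, rng=random):
    """(word, ink) pairs in which the ink colour never matches the word."""
    trials = []
    for _ in range(n):
        word = rng.choice(COLOURS)
        ink = rng.choice([c for c in COLOURS if c != word])
        trials.append((word, ink))
    return trials
```

Passing a seeded `random.Random(seed)` as `rng` makes a list reproducible across participants, if a fixed order is wanted.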

Analysis
Accuracy and response times were obtained. The reaction time (RT) analysis was performed after excluding self-corrected and incorrect responses. Using PRAAT, RT for each trial was measured from the onset of the beep to the onset of the naming response. Outliers (that is, RTs more than 2.5 standard deviations above or below a participant's mean RT, or shorter than 250 ms) were removed prior to calculation of the dependent measures. We calculated the Stroop effect as the difference between the incongruent and neutral conditions (Scott & Wilshire, 2010). However, this difference score does not take overall slowness into account, so two participants can show the same Stroop effect even when their relative interference differs. For example, for participant 1, an RT of 800 ms in the incongruent condition minus an RT of 400 ms in the neutral condition gives a Stroop effect of 400 ms. For participant 2, an RT of 1200 ms in the incongruent condition minus an RT of 800 ms in the neutral condition also gives a Stroop effect of 400 ms. Overall slowness is a crucial factor in assessing Stroop interference (Green, Grogan, Crinion, Ali, Sutton & Price, 2010). To account for overall speed differences, we calculated the Percentage Stroop ratio (%): the Stroop difference (mean incongruent minus mean neutral) divided by the mean of the neutral and incongruent trials, multiplied by 100. In the above example, participant 1 would have a Percentage Stroop ratio of 66.7% (400/600 × 100), whereas participant 2 would have a ratio of 40% (400/1000 × 100).
Mental-set shifting (Colour-shape switch task) We adapted Prior and MacWhinney's (2010) colour-shape switch task. Participants had to switch between colour judgement and shape judgement trials. Target stimuli consisted of a filled red triangle, red circle, green triangle, and green circle. Participants had to judge the colour or shape of the stimuli based on a cue. There were two types of cues: a colour cue (colour gradient) and a shape cue (row of small black shapes).
If the cue was a colour cue, participants judged the colour of the stimulus (red or green); if it was a shape cue, they judged the shape of the stimulus (circle or triangle). The target stimulus appeared at the centre of the screen, and the cue remained on the screen above it. The task was presented via E-Prime (Psychology Software Tools, Pittsburgh, PA). Each trial started with a fixation cross for 500 ms, after which the cue appeared for 250 ms, 2.8° above the fixation cross, followed by a blank screen for about 300 ms. The targets were red or green circles (2.8° × 2.8°) and red or green triangles (2.3° × 2.3°). The cue and target remained on the screen until a response was made or for a maximum of 2000 ms. This was followed by a blank screen for about 1000 ms before the onset of the next trial. Participants pressed the computer key corresponding to red/green colour or triangle/circle shape. Half of the trials were switch trials and the other half non-switch trials. On switch trials, a colour trial preceded a shape trial (colour-to-shape switch) or a shape trial preceded a colour trial (shape-to-colour switch). On non-switch trials, a colour trial followed another colour trial (colour to colour) or a shape trial followed another shape trial (shape to shape). There were 20 practice trials followed by 3 blocks of 48 experimental trials each, giving 72 switch trials and 72 non-switch trials in total. Reaction time and accuracy were measured separately for switch and non-switch trials. We derived three dependent variables: switch cost for reaction time (SC_RT), Percentage switch cost ratio (%), and switch cost for accuracy (SC_ACC). A smaller switch cost means a smaller difference (i.e., more nearly equivalent performance) between the easier condition (non-switch trials) and the more difficult condition (switch trials), which would suggest efficient shifting ability (Prior & MacWhinney, 2010).
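The RT trimming, the Percentage Stroop ratio and the switch-cost variables are simple arithmetic on condition means, sketched below in Python. Assumptions to note: the trimming computes a participant's mean and SD over the RTs passed in (correct, non-self-corrected trials) before applying both cut-offs; the section names the Percentage switch cost ratio and SC_ACC without giving formulas, so the sketch ASSUMES the ratio mirrors the Percentage Stroop ratio and that SC_ACC is non-switch accuracy minus switch accuracy. Function names are hypothetical.

```python
from statistics import mean, stdev


def trim_rts(rts_ms, sd_cutoff=2.5, floor_ms=250):
    """Drop RTs shorter than floor_ms or beyond sd_cutoff SDs of the mean.

    ASSUMES rts_ms already excludes incorrect and self-corrected trials.
    """
    m, s = mean(rts_ms), stdev(rts_ms)
    return [rt for rt in rts_ms
            if rt >= floor_ms and abs(rt - m) <= sd_cutoff * s]


def percentage_ratio(easy_ms, hard_ms):
    """(hard mean - easy mean) / mean of the two condition means * 100."""
    easy, hard = mean(easy_ms), mean(hard_ms)
    return (hard - easy) / ((easy + hard) / 2) * 100


def stroop_measures(neutral_ms, incongruent_ms):
    """Return (Stroop difference in ms, Percentage Stroop ratio in %)."""
    diff = mean(incongruent_ms) - mean(neutral_ms)
    return diff, percentage_ratio(neutral_ms, incongruent_ms)


def switch_costs(switch_ms, nonswitch_ms, switch_correct, nonswitch_correct):
    """SC_RT, Percentage switch cost ratio (ASSUMED formula) and SC_ACC.

    switch_correct / nonswitch_correct hold 1 (correct) or 0 per trial.
    """
    return {
        "SC_RT": mean(switch_ms) - mean(nonswitch_ms),
        "pct_switch_cost": percentage_ratio(nonswitch_ms, switch_ms),
        "SC_ACC": mean(nonswitch_correct) - mean(switch_correct),
    }
```

With the worked Stroop example in the text, `stroop_measures([400], [800])` gives a 400 ms difference and a ratio of about 66.7%, while `stroop_measures([800], [1200])` gives the same 400 ms difference but a ratio of 40%.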

Working memory (backward digit span)
The Wechsler Memory Scale (WMS-III; Wechsler, 1997) was used to measure backward recall of digit sequences, which is thought to reflect working memory performance (Wilde, Strauss & Tulsky, 2004). Participants were verbally presented with increasingly long series of digits (from 2 to 9 digits) and asked to repeat each sequence in reverse order. Digits were presented at a rate of one per second. The test ended when a participant failed two consecutive trials at any one span size or when the maximum span was reached. The backward digit score was the total number of lists reported correctly.
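The scoring and discontinue rule can be sketched as below. This is a minimal Python illustration, not the published WMS-III scoring procedure; it assumes trials are given in administration order and that "two consecutive trials at one span size" means two failures in a row at the same list length. The function name and trial layout are hypothetical.

```python
def score_backward_span(trials):
    """Count lists correctly repeated in reverse, applying a discontinue rule.

    trials: (presented_digits, response_digits) pairs in administration order.
    Testing stops after two consecutive failures at the same span length;
    running out of trials corresponds to reaching the maximum span.
    """
    score = 0
    consecutive_fails = {}  # span length -> current run of failures
    for presented, response in trials:
        span = len(presented)
        if list(response) == list(reversed(presented)):
            score += 1
            consecutive_fails[span] = 0
        else:
            consecutive_fails[span] = consecutive_fails.get(span, 0) + 1
            if consecutive_fails[span] == 2:
                break  # discontinue: later trials are not administered
    return score
```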
As can be seen in Table 4, the two groups differed significantly on Percentage Stroop ratio (%), Percentage switch cost ratio (%), and switch cost accuracy. Although bilinguals were slower overall on the Stroop task, the groups did not differ on the Stroop difference measure. However, when we accounted for overall speed differences, bilinguals demonstrated a smaller Percentage Stroop ratio (%), which is indicative of better inhibitory control. Bilinguals also showed a smaller Percentage switch cost ratio (%) and a smaller switch cost accuracy, suggestive of superior shifting ability.

Statistical analysis
All verbal fluency measures were normally distributed. To arrive at the mean score for each measure, the three trials in each condition were averaged: animals, fruits and vegetables, and clothing for semantic fluency, and F, A, and S for letter fluency. A two-way repeated-measures ANOVA was conducted on the following measures: number of CR, 1st-RT, Sub-RT, cluster size, and number of switches. Group (Bilingual, Monolingual) was treated as a between-subject factor and Condition (Semantic, Letter) as a within-subject factor. Tukey's post-hoc tests were applied to significant interaction effects at p ≤ 0.05. Independent-samples t-tests were performed for FDS, the initiation parameter, and slope for the semantic and letter fluency conditions, with Group as the between-subject factor. To examine the relationship between the executive control measures and verbal fluency measures, correlations were computed separately for each group.
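The trial averaging and the two simpler statistics described above can be sketched in plain Python: averaging the three trials of a condition, a Student's independent-samples t statistic with pooled variance, and a Pearson correlation computed within one group. Whether the authors used pooled-variance or Welch t-tests, and Pearson or another coefficient, is not stated here, so these choices and all function names are assumptions. The repeated-measures ANOVA, Tukey tests and p-values are not reproduced; a statistics package (e.g., `scipy.stats.ttest_ind`, `scipy.stats.pearsonr`) would normally supply p-values.

```python
from math import sqrt
from statistics import mean, stdev


def condition_score(trial_scores):
    """Average the three trials of one condition (e.g., F, A and S)."""
    return mean(trial_scores)


def independent_t(a, b):
    """Student's independent-samples t with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


def pearson_r(x, y):
    """Pearson correlation, computed separately within each group."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)
```

In use, each participant's letter-fluency slope would be paired with that participant's Percentage Stroop ratio and `pearson_r` applied to the bilingual and monolingual lists separately.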

Results
The mean and standard deviation values for the verbal fluency variables by Group (Bilingual, Monolingual) and Condition (Semantic, Letter), averaged across participants, are presented in Table 5 (standard deviations reflect between-subject variation), together with the results of the statistical tests. Findings from the correlation analyses between the executive control measures and verbal fluency variables for each group are presented in Table 6. Group differences are presented first, followed by the relationship between executive control measures and verbal fluency variables. The authors are happy to share anonymized, item-level, timestamped verbal fluency data with interested readers.

Group differences in verbal fluency performance
Differences between bilinguals and monolinguals were observed either as a main effect of Group or as a Group × Condition interaction for CR, FDS, Sub-RT, slope for letter fluency, and cluster size. There were no group differences in 1st-RT, the initiation parameter for either semantic or letter fluency, slope for semantic fluency, or number of switches.

Discussion
This research set out to determine group differences in verbal fluency performance between a relatively homogeneous group of Bengali-English bilinguals and a group of English-speaking monolinguals, and to identify the executive control measures that contribute to the performance difference between them. We used a wide range of measures (CR, FDS, 1st-RT, Sub-RT, initiation, slope, clustering and switching) to characterize the linguistic and executive control components of the participants' verbal fluency performance. These measures are thought to index the linguistic and executive components of the verbal fluency task to different degrees. In addition, we measured executive control in the domains of inhibition, switching, and working memory, and linked verbal fluency performance to the executive measures.
To summarize the main findings, compared to monolinguals, bilinguals showed differences in both the linguistic (letter fluency: number of CR, cluster size) and executive control (FDS, Sub-RT, and slope in letter fluency) domains of the verbal fluency task, as summarized in Table 7. Although there was no significant overall difference between the two groups on CR, there was an interaction with the type of fluency task: bilinguals and monolinguals performed similarly on semantic fluency, whilst bilinguals outperformed monolinguals on letter fluency. The finding that the two vocabulary-matched groups did not differ overall on CR is consistent with the literature (Luo et al., 2010; Paap et al., 2017; Portocarrero, Burright & Donovick, 2007; Rosselli et al., 2000).
Our findings show that bilinguals perform better than monolinguals in the letter fluency task, which is thought to be more demanding of executive control. This is shown in the following key findings: 1) bilinguals demonstrated a significantly smaller FDS than monolinguals, which has been claimed to reflect superior executive control; 2) bilinguals demonstrated a significantly longer Sub-RT together with a higher mean number of correct responses and a flatter slope in letter fluency, which could be attributed to superior executive control. These findings suggest that our bilinguals have superior executive control abilities that help them perform better (in terms of lower FDS and flatter slope) in a difficult fluency condition (i.e., letter fluency). As discussed in the introduction, a longer Sub-RT in bilinguals can be due either to a smaller vocabulary or to superior executive control relative to monolinguals (Luo et al., 2010). Luo et al. (2010) postulated that superior executive control would result in a slower decline in retrieval speed, or a longer Sub-RT, for bilinguals, in combination with a higher or equal number of CR and a flatter slope than monolinguals. Since our groups were matched on vocabulary and we found no significant difference between the two groups on the initiation parameter (a measure of initial linguistic resources), it is reasonable to conclude that the bilinguals' performance is indicative of superior executive control (Friesen et al., 2015; Luo et al., 2010). Overall, equivalent performance on the vocabulary test, longer Sub-RT, and better performance in the letter fluency condition (higher CR, smaller FDS, flatter slope, and larger cluster size) for bilinguals compared to monolinguals suggest a bilingual advantage in the verbal fluency task when the demand for controlled executive processing is higher.
On the qualitative measures, we expected vocabulary-matched bilinguals to produce cluster sizes equal to those of monolinguals, since cluster size draws more on the linguistic components, and a larger number of switches, since switching requires an efficient executive control mechanism. However, bilinguals produced a larger cluster size in the letter fluency condition. This could reflect a strategy to bolster their performance in letter fluency: producing a greater number of CR in larger clusters may have allowed bilinguals to sustain production in the more demanding condition. The lack of a group difference in switching is surprising, as the switching measure is supposed to tap into the executive control components of the verbal fluency task, and we expected bilinguals to switch more than monolinguals. The absence of a difference suggests that bilinguals may not use switching as a strategy to facilitate their performance in the verbal fluency task.
On the executive control measures, bilinguals outperformed monolinguals on the inhibitory control measure (smaller Percentage Stroop ratio) and on the mental set-shifting measures (smaller Percentage switch cost ratio and smaller switch cost accuracy). However, both groups performed similarly on the working memory measure (backward digit span). An inhibitory control advantage for bilingual participants is in line with the literature (Bialystok et al., 2004; Bialystok et al., 2008; Emmorey et al., 2008). For the mental set-shifting task, we measured the Percentage switch cost ratio (%) to account for the overall speed difference between the two groups. The bilingual advantage on this measure is in line with previous findings from task-switching paradigms (Prior & MacWhinney, 2010). However, we did not find any difference between the two groups on the dependent variable most commonly used in the task-switching literature, switch cost in RT. This absence of a difference in RT switch cost on the colour-shape switch task supports the findings of Paap and colleagues (Paap & Greenberg, 2013; Paap et al., 2017). Similarly, the absence of group differences on the working memory measure is in line with the literature (Luo et al., 2010). The current findings suggest that differences between the two groups on executive control measures may depend on the type of task and the type of dependent variable derived from it.
Previous studies have suggested a role for executive control, especially working memory and inhibitory control, in verbal fluency (Luo et al., 2010; Shao et al., 2014). Only one study has directly correlated executive control measures (updating of working memory and inhibitory control) with verbal fluency measures in healthy monolingual adults (Shao et al., 2014). The present study is the first to attempt to establish relationships between various executive control measures and measures of verbal fluency in healthy bilingual and monolingual adults. Our correlation analyses showed that verbal fluency slope correlated with inhibitory control (Percentage Stroop ratio) only for the bilingual group (Blumenfeld & Marian, 2011; Prior & Gollan, 2011; see Table 7 and Figure 3). These results support the notion that an executive control advantage helps bilinguals outperform monolinguals in verbal fluency tasks, especially where executive control demands are higher.
Similar to Shao et al. (2014), we did not find any correlation between working memory and verbal fluency measures, nor did we find any significant correlation between the mental set-shifting measure and verbal fluency measures. As this was the first study to attempt to establish relationships between various executive control and verbal fluency measures, future studies should consider investigating different kinds of tasks within specific domains of executive control to reflect the processes presumed to underpin verbal fluency. Such research would provide greater insight into the relationship between linguistic and executive control processes during word production.
In conclusion, previous studies comparing healthy monolinguals and bilinguals on verbal fluency tasks have shown mixed results, ranging from a bilingual advantage (Luo et al., 2010) to a disadvantage (Gollan et al., 2002; Paap et al., 2017) to no differences (Paap et al., 2017). However, with the exception of Luo et al. (2010), these studies relied only on the number of correct responses as a dependent variable. For example, Paap et al. (2017) did not find any difference between bilinguals and monolinguals in the difficult letter fluency condition. Their results were inconsistent with the notion that bilinguals' enhanced executive control abilities help them outperform monolinguals in the more demanding letter fluency condition. Paap et al. also challenged the claim that letter fluency requires greater executive control than semantic fluency, and suggested that this claim be supported by independent, direct tests of executive control abilities. Similarly, Whiteside, Kealey, Semla, Luu, Rice, Basso and Roper (2016), in an exploratory factor analysis, argued that linguistic processes contribute more to verbal fluency than executive control processes do. They found that the number of correct responses in verbal fluency loaded onto the language factor rather than the executive control factor.