A study on the executive functioning skills of Greek–English bilingual children – a nearest neighbour approach

Abstract
Findings of bilingual participants outperforming their monolingual counterparts in executive functioning tasks have been repeatedly reported in the literature (Bialystok, 2017). However, uncontrolled factors or imperfectly matched samples might affect the reliability of these findings. This study aims to take into account a range of relevant variables in combination with innovative analyses to investigate the performance of one unstudied language group, Greek–English bilingual children in the north of England, compared to monolingual control groups. Our battery of executive function tasks taps into inhibition, updating and shifting. We use k-means nearest neighbour methods to match the groups and factor analysis to determine language proficiency. We find that bilinguals’ accuracy is on a par with their monolingual peers – however, they are faster in inhibition and working memory tasks. Our study provides strong evidence for the presence of a bilingual advantage in these domains, while making important methodological contributions to the field.


Introduction
Many recent studies have focused on childhood bilingualism and executive control, showing that bilingual children outperform their monolingual peers on executive functioning tasks (see Adesope, Lavin, Thompson & Ungerleider, 2010; Bialystok, 2017). This is considered a 'bilingual advantage' in executive functions (Bialystok, 2001; Bialystok, Craik, Klein & Viswanathan, 2004; Bialystok, Craik & Ryan, 2006; Emmorey, Luk, Pyers & Bialystok, 2008) and has been observed in cognitive control tasks such as selective attention (Bialystok, 2001), cognitive flexibility (Poulin-Dubois, Blaye, Coutya & Bialystok, 2011) and working memory (WM) (Morales, Calvo & Bialystok, 2013). However, other studies have tended to show weaker or no effects of bilingualism (e.g., Valian, 2015). The executive functioning system is a domain-general cognitive system, vital for the flexibility and regulation of cognition and goal-directed behaviour (Best & Miller, 2010). It is referred to as the most crucial cognitive achievement in early childhood (Bialystok & Craik, 2010). Children gradually master the ability to control attention, inhibit distraction, monitor sets of stimuli, and shift between tasks, while their working memory develops. More specifically, SHIFTING involves shifting back and forth between multiple tasks, operations, or mental sets (Monsell, 1996, as cited in Miyake et al., 2000). UPDATING includes monitoring and coding task-relevant information and replacing any no longer relevant information held in WM with the new, more relevant information (Morris & Jones, 1990). Lastly, INHIBITION is the ability to knowingly inhibit dominant, automatic, or prepotent information (Miyake et al., 2000).
The advantage in executive functions associated with bilinguals is operationalised as superior performance by bilinguals in tasks that are thought to require executive processing, which is the ability to monitor goal-setting cues, to switch attention to goal-relevant sources of information, and to inhibit those that are irrelevant or competing (Bialystok, 2006; Bialystok, Craik & Luk, 2008; Costa, Hernández & Sebastián-Gallés, 2008). These advantages are thought to be linked to the management of multiple languages and to the continuous monitoring of the appropriate language for each communicative situation (Bialystok, 2009). More specifically, bilinguals need to select the right language for each circumstance, attend to cues in order to select the right language, select the suitable lexical set and at the same time suppress the interference of the other language/s. This process is thought to generate executive functioning advantages (Bialystok, 2017).
There have been several meta-analytic reviews regarding the cognitive outcomes of bilingualism (e.g., Adesope et al., 2010; Hilchey & Klein, 2011; Hilchey, Saint-Aubin & Klein, 2015; Lehtonen, Soveri, Laine, Järvenpää, de Bruin & Antfolk, 2018) reporting mixed results in adults. More specifically, Adesope et al. (2010) analysed data from 63 studies and found positive effects of bilingualism, including increased attention, working memory, metalinguistic awareness, and abstract and symbolic representation skills. However, there was high variability in terms of effect sizes. For inhibition, Hilchey and Klein (2011) found a global bilingual performance advantage, though insufficient evidence was provided for a bilingual effect in inhibition. Hilchey et al. (2015) in their re-analysis of the Hilchey and Klein (2011) study included more recent studies, and this time did not observe a global bilingual performance advantage.
Similar mixed findings are reported in studies examining the executive functioning skills of children. Overviews by Bialystok and colleagues (Bialystok, 2015; Bialystok, Craik & Luk, 2012) suggest that the bilingual advantage can be mostly observed in children and the elderly, possibly due to the fact that these two populations are not at the peak of their executive functioning skills as young adults are. Bialystok and colleagues agree with the idea that this advantage could be more general rather than linked to a specific executive domain such as inhibition (Bialystok, 2015). However, large-scale studies are not in line with this suggestion in other official bilingual settings such as the Basque country and Wales, where limited or no evidence of a bilingual advantage has been found (Antón, Duñabeitia, Estévez, Hernández, Castillo, Fuentes, Davidson & Carreiras, 2014; Duñabeitia, Hernández, Antón, Macizo, Estévez, Fuentes & Carreiras, 2014; Gathercole, Thomas, Kennedy, Prys, Young, Viñas-Guasch, Roberts, Hughes & Jones, 2014).
Inappropriate controlling strategies may also play a role in whether a bilingual advantage is detected (Papastergiou, Pappas & Sanoudaki, 2021). Paap and Greenberg (2013) have highlighted the need to control for an extensive number of variables within this context. Based on the above, in the current study, we aim to answer if Greek-English bilingual children outperform two control groups of monolingual Greek and monolingual English children in executive functioning tasks tapping into inhibition, updating and shifting, by using a near-neighbour approach to control for a range of relevant variables.

Bilingual effect in children
Many studies have repeatedly reported a bilingual effect in executive functions. For example, Bialystok (1999) reports that bilingual children showed better attentional control involving shifting between different task criteria. This study investigated 30 English-Chinese bilingual and English monolingual children aged 3-5 years and 30 English-Chinese bilingual and English monolingual children aged 5-6 years using the Dimensional Change Card Sort (DCCS) task (Zelazo, Frye & Rapus, 1996). Results revealed that bilingual children gave more target responses compared to their monolingual counterparts, indicating higher levels of executive control, and suggesting that bilingualism aids the development of attentional control in task rule shifting. Similar findings were presented by Bialystok and Martin (2004). In another study, 24 bilingual and 24 monolingual 6-year-olds were comparable in identifying a simple shape hidden within drawings of complex objects in the Children's Embedded Figures Task, but the bilingual children were more able to change their interpretation of the two figures (e.g., the duck-rabbit) to acknowledge the other image in an ambiguous figures task (Bialystok & Shapero, 2005). Both tasks required perceptual analysis, but only the ambiguous figures task required inhibiting the original meaning of the stimulus.
In line with the above, Carlson and Meltzoff (2008) aimed to investigate whether there was an advantage in executive functioning, previously observed in other languages, in 6-year-old Spanish-English bilingual children attending second-language immersion and traditional kindergartens. The bilingual children showed an advantage in executive-function tasks that require inhibition of attention to conflicting response options but not in tasks requiring inhibition of a habitual response to a familiar stimulus. Extending this pattern to infants, Kovács and Mehler (2009) investigated 40 preverbal 7-month-olds: 20 infants raised in bilingual homes (14 infants exposed to Italian-Slovenian, 2 to Italian-Spanish, 2 to Italian-English, 1 to Italian-Arabic, 1 to Italian-Danish) and 20 in monolingual Italian homes. The infants brought up in bilingual homes were better able to switch responses after a change in the requirements of the task compared to their monolingual counterparts.
Additionally, Yang, Yang, and Lust (2011), in order to separate language effects and cultural effects, compared 15 Korean-American bilinguals, 13 Korean American (English-speaking) monolinguals, Korean monolinguals, and non-Korean-American (English-speaking) monolinguals, five years of age. Overall, the bilingual group was faster and more accurate compared to the monolinguals on all conditions of the Attentional Network Task (ANT), suggesting a bilingual advantage.
Finally, Poarch and van Hell (2012) found benefits of trilingualism on the Simon task and a bi- and trilingual advantage for the ANT. They investigated four groups of children 5-8 years of age using the Simon task: i) German-speaking monolingual children, ii) German speakers who were learning English as a second language (L2) in school (second language learners), iii) German-English bilingual children, and iv) trilinguals for whom either German or English was a native language along with a different language, and who were learning German or English or both at school. Findings for the Simon task provided evidence of a trilingual advantage compared to monolinguals and a strong trend towards a benefit for bilinguals compared to monolinguals. Bilinguals and trilinguals did not differ, nor did any other pairs. Only the L2 learners, the bilingual children and the trilingual children took part in the ANT, six to eight months after the Simon task (Poarch & van Hell, 2012). Results showed no significant difference between bilingual and trilingual children; however, they both outperformed the L2 learners with regard to incongruent trials. There was no significant difference in response times across all children, irrespective of language status.
Large-scale studies have tended to show weaker or no effects compared to smaller sample studies (Valian, 2015). For example, two recent large-scale studies, presented below, did not report any effects of bilingualism. More specifically, Antón et al. (2014) compared 360 bilingual Spanish and Basque children to Spanish monolingual children on the ANT. The researchers divided the children into three groups: i) children in 2nd and 3rd grade, ii) children in 4th and 5th grade, and iii) children in 6th and 7th grade. The first language of the bilingual children was Spanish and based on parental report the children were more fluent in Spanish compared to Basque. In addition, the bilingual children attended bilingual schools where Spanish and Basque were equally used as the languages of instruction. Their monolingual peers attended monolingual Spanish schools and they did not differ in age, reading and arithmetic skills, non-verbal IQ, and socioeconomic status (SES) compared to the bilinguals. No differences were found between the monolingual and bilingual groups. In their discussion, the authors noted that the absence of a bilingual advantage might be a result of uncontrolled factors, such as conditions associated with design and procedure.
In line with the above findings, Duñabeitia et al. (2014) used a non-verbal and a verbal Stroop task in the Spanish language to compare 504 monolingual Spanish and bilingual Spanish-Basque children. The children were enrolled in the 3rd to 8th grade. The findings suggested that the participants did show a cost of incongruence; however, the two groups of participants had similar performances. Additionally, the distribution of reaction times, overall reaction times and error rates were parallel for both bilinguals and their control group. Finally, in the regression analyses there was no effect of language status, teachers' judgments of children's reading, arithmetic, or attention skills, or IQ scores. In their discussion, the authors stated that they covered factors such as age, scores from teachers regarding reading, mathematics, and attention, general IQ test, and SES. Therefore, their groups differed only in linguistic profile; more specifically, one group of children was immersed in a bilingual (academic) context and the second consisted of purely monolingual children. No evidence of a bilingual advantage was observed (see also Paap & Greenberg, 2013).
Similar to the above study, Goldman, Negen, and Sarnecka (2014) recruited 32 English monolingual children and compared them to 40 bilingual children who were exposed to two languages other than English at home and to 20 bilingual children who were exposed to one extra language in addition to English. The children took part in a numerical discrimination task, tapping inhibitory control. The findings revealed no differences between the groups. In line with the above results, Kapa and Colombo (2013) found no group differences using the Flanker task with early and late Spanish-English bilingual children as well as their English monolingual control group aged 6-15 years.
Additionally, mixed results were presented by Poulin-Dubois et al. (2011). In this study, a partial bilingual advantage was observed in the shape Stroop task, a conflict task, one of the five tasks (two delay and three conflict tasks) used to measure executive functions in 33 bilingual and 30 monolingual two-year-olds. This suggested that a bilingual advantage in executive functions is first expressed in conflict inhibition. A bilingual effect was not found in the other two conflict tasks, possibly due to increased demands of those tasks or to them requiring both inhibitory control and working memory. An advantage in inhibitory control was found in simultaneous 7-month-old bilinguals when readily suppressing the previously learned response and updating their predictions according to the changing requirements of the task, compared to monolinguals (Kovács & Mehler, 2009). Advantages in other executive functions were observed in slightly older children, 3 to 4½ years of age (Bialystok, Barac, Blaye & Poulin-Dubois, 2010), suggesting that more language experience might be necessary to observe a bilingual advantage in switch-tasks, because the experience of infants has been primarily in receptive language rather than expressive language.

Possible reasons underlying contradictory findings
As shown in the previous sections, while there is a large body of research showing bilingual advantages (see Valian, 2015 for an overview), the field has not reached a consensus due to inconsistent findings. Several factors have been found to be relevant to this bilingual effect in executive functions. Some studies show bilingual advantages in particular tasks, conditions of those, or in measures such as accuracy or reaction times, but not both (Valian, 2015). Results seem dependent on types of stimuli (e.g., verbal-nonverbal; Moreno-Stokoe & Damian, 2020). Also, the participants might get different amounts of physical exercise or might have had some other beneficial experience (e.g., musical training; Valian, 2015), or differ in terms of SES. Another very important factor is the actual definition of bilingualism and how this is determined in each study. Bilinguals might differ in many aspects related to age of acquisition, language use, proficiency in each language, medium of education, bilingual experiences, and culture (e.g., Adesope et al., 2010; Antoniou, Grohmann, Kambanaros & Katsos, 2016; Carlson & Meltzoff, 2008; Paap, Johnson & Sawi, 2016). Finally, De Bruin, Treccani, and Della Sala (2015) found a publication bias to report a bilingual effect.

SES
Bilinguals might differ from monolinguals or other bilingual participants in socioeconomic factors, such as education, immigrant status and profession (Paap, Johnson & Sawi, 2015). The observed correlation between SES and executive functions may be due to the link of SES with the provision of emotional and academic resources in childhood (Linver, Brooks-Gunn & Kohen, 2002). Morton and Harper (2007) argued that previous studies did not appropriately match participants on SES, with the consequence that higher-SES bilingual children were being compared with monolingual children from low socioeconomic backgrounds. Some studies matching language groups on SES report a bilingual effect. For example, Engel de Abreu, Cruz-Santos, Tourinho, Martin, and Bialystok (2012) compared 40 Portuguese-Luxembourgish bilinguals and 40 Portuguese monolinguals from low-income immigrant families using flanker interference tasks. In line with Bialystok (1991, 2001, 2009), Engel de Abreu et al. (2012) found that regardless of the low-income background, the continuous use of executive functioning skills to resolve language conflict strengthened these processes in bilinguals. The results suggest that the higher the control demand of the task, the more likely it is that a bilingual effect will emerge. Similarly, Calvo and Bialystok (2014) divided children from eight public schools into four groups based on questionnaire data on SES and on language status: i) working-class monolinguals (n = 20), ii) working-class bilinguals (n = 44), iii) middle-class monolinguals (n = 46), and iv) middle-class bilinguals (n = 65). The children spoke English at school and another language at home. The tasks included an intelligence test, language tests, a working memory task and a flanker task (Calvo & Bialystok, 2014). Middle-class children outperformed working-class children on all measures, and monolingual children outperformed bilingual children on language tests. Bilingual children scored higher than monolingual children on the executive functioning tasks.
Other studies closely matching bilingual and monolingual participants on SES found no bilingual advantage (Morton & Harper, 2007; Noble, Norman & Farah, 2005; Paap et al., 2015). Namazi and Thordardottir (2010) suggested that the way in which bilingualism is defined might vary across studies, making them difficult to compare. Other factors that might yield different findings include the language background of the participants, including language exposure and language use, language of schooling, and proficiency in both languages (e.g., Bialystok & Barac, 2012; Crespo, Gross & Kaushanskaya, 2019; Iluz-Cohen & Armon-Lotem, 2013; Kubota, Chevalier & Sorace, 2020; Kuzyk, Friend, Severdija, Zesiger & Poulin-Dubois, 2020). Language exposure and language use can be linked to the frequency of input and output a child might receive and produce (number of hours in a day, percentage of use of language and in which context). It has been shown that reduction in exposure to the L2 contributed to smaller improvement in monitoring and updating abilities; however, it did not affect the inhibition domain (Kubota et al., 2020).
In terms of language of schooling, Purić, Vuksanović, and Chondrogianni, (2017) compared Serbian children in Year 2 attending a high exposure L2 immersion program (about 5 hours of daily exposure for one year), a low exposure immersion program (about 1.5 hours of daily exposure for one year), and a monolingual control group. The high exposure group outperformed the other two groups in working memory tasks, but there were no group differences for the inhibition and shifting domain. Similarly, initial findings of a recent pilot study based in Wales suggest that children receiving minimal exposure to Welsh for a year are faster than their English monolingual counterparts in a backwards digit recall task tapping on working memory (Papastergiou, Sanoudaki & Collins, 2019). Based on Purić et al. (2017), working memory (updating) may be specifically linked to these early stages of intensive L2 learning.
Biliteracy and attending a bilingual educational setting have also been found to affect performance in cognitive tasks, such as updating and verbal working memory tasks (e.g., Andreou, Dosi, Papadopoulou & Tsimpli, 2020; Dosi & Papadopoulou, 2019; Dosi, Papadopoulou & Tsimpli, 2016). For example, Andreou et al. (2020) find that good levels of biliteracy, established by a bilingual educational setting that equally supports both languages, positively affect linguistic and cognitive skills.
Language proficiency has also been linked to executive functions. Iluz-Cohen and Armon-Lotem (2013) investigated the effect of language proficiency on executive functioning skills. They found that there is a positive relationship between language proficiency and inhibition and shifting abilities, with significantly lower performance among low language proficiency bilinguals. However, Kubota et al. (2020) found that proficiency did not affect the development of executive functioning skills in childhood. Gathercole et al. (2014) propose that it might not be a coincidence that fluent bilinguals within bilingual communities such as Welsh-English bilinguals (Gathercole et al., 2014) and Basque-Spanish bilinguals (Duñabeitia et al., 2014) showed either no or mixed bilingual effects. These bilinguals are brought up with both languages as part of everyday life in their respective bilingual communities in Wales and the Basque country. It has been suggested by Lam and Dijkstra (2010) that these populations have strong between-language links and a great automaticity of the linguistic knowledge in both languages. As a result, the daily switch between both languages might not require the same cognitive effort and control, consequently not leading to bilingual effects in executive functions. However, other studies including participants speaking minority languages within bilingual communities (e.g., Sardinian and Italian; Garraffa, Beveridge & Sorace, 2015) do show advantages, but in most cases only one test was used to tap one executive function, which does not allow general theoretical implications to be drawn.

Publication bias
Finally, a study by De Bruin et al. (2015) examined abstracts from conferences between 1999 and 2012. The authors observed that studies which reported a full bilingual advantage in executive control were most likely to be published, followed by those either supporting or challenging this bilingual advantage. In contrast, those that found no bilingual advantage were the least likely to be published. This did not have any relation to differences in sample size, tests used, or statistical power, thus suggesting the existence of a publication bias. This is in line with Paap et al. (2015), who raised the concern that the literature based on executive control in bilinguals may be influenced by this bias to report a bilingual advantage. As a result, many studies that have not found evidence suggesting a bilingual advantage might not have reached publication, and their hypotheses and methodologies have not enhanced our knowledge on executive functioning.

Current study
It is evident from the previous section that matching bilinguals with a monolingual control group/s has proven challenging, especially due to the variability within bilingual groups. Despite numerous studies investigating the cognitive effects of bilingualism, it is still not clearly understood which factors influence executive functioning and in what way. In the current study, we aim to control for relevant variables using innovative analyses in order to investigate the performance in executive functioning tasks of one unstudied language group of Greek-English bilingual children in the north of England. Our battery of executive function tasks taps into inhibition, updating and shifting, as operationalised by Miyake et al. (2000).
Bearing in mind previous studies on bilingualism and executive functions, we compare our Greek-English bilingual group to two monolingual control groups from both language backgrounds; namely, a control group of monolingual Greek-speaking children and a control group of monolingual English-speaking children. To the best of our knowledge, only one other study has controlled for both languages of the bilingual groups of children (Torregrossa, Andreou, Bongartz & Tsimpli, 2021). Similarly, in our study we control for both languages, Greek and English, using factor analysis (Antoniou et al., 2016) to take as many variables as possible into consideration, such as language proficiency, language use and standardised vocabulary and grammar tasks. The group of bilingual children taking part in the current study attend a Greek complementary language school, a group not studied before in the U.K. for their executive functioning skills linked to language. The majority of these children are predominantly exposed to Greek in the household and English at school (also see .
In combination with this, we use innovative analyses to control for as many variables as possible, a challenging issue in the study of bilinguals, and more specifically bilingual children. As a result, we aim to inform the debate and models of executive functions in relationship to bilingualism. More specifically, we aim to answer the following research question: Do Greek-English bilingual children outperform two control groups of monolingual Greek children and monolingual English children in executive functioning tasks tapping into inhibition, updating and shifting, when closely matched on recently identified relevant variables?
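To make the idea of matching groups on relevant variables concrete, the following is a minimal sketch of greedy one-to-one nearest-neighbour matching. It is an illustration only and not the procedure used in this study: the covariates (age in months, SES percentage, non-verbal raw score), the z-score standardisation, the Euclidean distance and the greedy matching order are all assumptions, and the data are hypothetical.

```python
import math
from statistics import mean, pstdev


def standardise(rows):
    """Z-score each column so that no single covariate dominates the distance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # fall back to 1.0 if a column is constant
    return [[(x - m) / s for x, m, s in zip(r, mus, sds)] for r in rows]


def match_nearest(bilingual, monolingual):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    Returns a list of (bilingual_index, monolingual_index) pairs.
    """
    z = standardise(bilingual + monolingual)
    zb, zm = z[:len(bilingual)], z[len(bilingual):]
    available = set(range(len(zm)))
    pairs = []
    for i, b in enumerate(zb):
        if not available:
            break  # no unmatched monolingual children left
        j = min(available, key=lambda k: math.dist(b, zm[k]))
        available.remove(j)
        pairs.append((i, j))
    return pairs


# Hypothetical covariates: (age in months, SES %, non-verbal raw score)
bilingual = [(70, 80.0, 20), (95, 60.0, 30)]
monolingual = [(96, 62.5, 29), (71, 75.0, 21), (105, 40.0, 35)]
pairs = match_nearest(bilingual, monolingual)
```

In this toy example each bilingual child is paired with the monolingual child closest in age, SES and non-verbal score, and the third monolingual child remains unmatched. Greedy matching depends on the order in which children are processed; optimal matching or matching with a distance caliper are common alternatives.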

Participants
Nineteen Greek-English bilingual children, 15 Greek monolingual children and 25 English monolingual children, aged 63-108 months, took part in this study. Details of the groups are presented in Table 1. The bilingual children were competent in both Greek and English languages to varying degrees. The Greek-English bilingual children lived in England and were recruited if at least one of their parents used Greek with them. The mean age of acquisition was 7 months (SD = 1 year and 3 months) for Greek and 2 years and 1 month (SD = 1 year and 9 months) for English. Four children had one English speaking and one Greek speaking parent and 15 children had only Greek speaking parents. We have excluded any trilingual participants. A further three children took part but were subsequently excluded because they did not meet the language criteria (they were exposed to a third language). Also, children's scores were included in the analysis if their nonverbal intelligence score was within normal range (over 80; Kaufman & Kaufman, 2004). In this case, all children had standardised scores over 80 (M = 100.77, SD = 14.44). Children included had limited or no musical training. Based on parental and teacher reports the children did not have any hearing, behavioural, emotional, or mental impairment.
Bilingual Greek-English children were recruited from a Greek supplementary school in the north-west of England. The school offers a Greek-speaking supplementary program for 2.5 to 3.5 hours a week to enhance the reading, listening, speaking and writing skills in the Greek language and to offer knowledge around Greek culture. This programme is supplementary to the mainstream English education that these children attended. Eight of the bilingual children were born in Greece and had lived in England for at least two years at the time of the study, while the remaining bilingual children were born in England. The English monolingual control group was recruited from an infant school in the north-west of England and all the children were born in England. The Greek monolingual control group participated in Greece and all children in this group were born in Greece.
Note. M = Mean, SD = Standard Deviation, Age = participants' age in months, PWFT = Greek expressive vocabulary score, Adapted PPVT = Greek receptive vocabulary score, CELF-4 = English expressive vocabulary score, BPVS3 = English receptive vocabulary score, K-BIT-2 = non-verbal intelligence standardised score, DVIQ = Greek receptive grammar score, TROG-2 = English receptive grammar score, Language Use = Percentage of language use with 0% being only English and 100% being only Greek (for the English monolingual group, 100% being a language other than English), SES = the average percentage of mother and father education.
Ethical approval was granted by the University's Research Ethics Committee. Information sheets were sent to the head teachers and to parents before the study began in order to obtain informed consent. Teachers, parents, and children were provided enough time to ask any questions about the nature of the study. Parents and children were informed that they could withdraw at any time and were subsequently debriefed after the study.

Materials
Parental questionnaire
The children's language experience was investigated through the Language and Social Background Questionnaire for Children (LSBQ; Luk & Bialystok, 2013). The LSBQ was forward and backward translated into Greek and was completed by the parents in their most convenient language (Greek or English).¹ It consisted of information about the child's age, sex, country of birth, and age of acquisition of each language. Children's SES was measured as the mean of the highest attained educational level of both parents rated on an 8-point scale, which was then converted into percentages (questions 12 and 13). Parental education is the most commonly used index of SES, is highly predictive of other SES indicators (e.g., income, occupation), and is a better predictor of cognitive performance than other SES indicators (see Calvo & Bialystok, 2014).
In Section B, the child's speaking and understanding in Greek, English, or another language were rated by the parent on a 5-point scale ranging from Poor to Excellent (questions 14 and 15). A Greek proficiency parental score was derived from both scores for speaking and understanding in Greek and was included in the analysis. Similarly, both scores for speaking and understanding in English were used as the English proficiency parental score included in the analysis. Section B also included four questions about exposure to Greek and English educational settings, four questions about language acquisition and age of onset, and one question about experience with any musical instrument. Section C of the questionnaire included questions about general language use throughout the child's lifetime with parents, siblings, grandparents, neighbours, friends, and caregivers in various situations, measured on a 7-point scale ranging from 1 (only English) to 7 (only Greek/or other language).
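The questionnaire-derived scores described above can be sketched as follows. The exact conversion formulas are not given in the text, so both mappings are assumptions: SES is taken as the mean parental level divided by the scale maximum, and language use as a linear rescaling of the mean 1-7 rating onto 0-100%. The function names are hypothetical.

```python
def ses_percent(mother_level, father_level, scale_max=8):
    """Mean of both parents' education levels as a percentage of the scale maximum.

    Assumption: the percentage is the mean level divided by the top of the
    8-point scale; the source does not spell out the conversion.
    """
    return (mother_level + father_level) / 2 / scale_max * 100


def language_use_percent(ratings, low=1, high=7):
    """Rescale the mean 1-7 rating to 0% (only English) .. 100% (only Greek/other).

    Assumption: a linear mapping from the 7-point scale to percentages.
    """
    avg = sum(ratings) / len(ratings)
    return (avg - low) / (high - low) * 100


ses = ses_percent(6, 8)                 # mean level 7 on an 8-point scale
use = language_use_percent([7, 4, 1])   # mean rating 4, the scale midpoint
```

Under these assumptions a mean rating at the midpoint of the 7-point scale corresponds to balanced use of the two languages (50%).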

Non-verbal intelligence
Non-verbal intelligence was assessed using the Kaufman Brief Intelligence Test, Second Edition (K-BIT-2; Kaufman & Kaufman, 2004). The test consists of 46 items including a series of abstract images, such as designs and symbols, and visual stimuli, such as pictures of people and objects. Participants are required to understand the relationships between the presented stimuli and complete visual analogies by either pointing to the answer or saying which letter it corresponds to. All items include an option of at least five answers thus reducing chance guessing. The Matrices non-verbal subtest is individually administered, and standardised scores were calculated for the purposes of the screening, while raw scores were used in the analyses.

Language measures
To assess the proficiency of the bilingual children in their languages, receptive and expressive vocabulary measures in each language were administered along with receptive grammar assessments. Raw scores converted to percentages were used in the analysis.

English language measures
The British Picture Vocabulary Scale, Third Edition (BPVS3; Dunn & Dunn, 2009) was used to assess the receptive vocabulary of the bilingual and monolingual children in the English language. It is an individually administered, standardised test of Standard English receptive vocabulary for children ranging from 3 years to 16 years and 11 months. In this task, children are asked to select, out of four coloured items in a 2 by 2 matrix, the picture that best corresponds to an English word read out by the researcher. The assessment consists of 14 sets of 12 words of increasing difficulty (e.g., ball, island, fictional). The administration is discontinued when a minimum of eight errors is produced in a single set.
The Clinical Evaluation of Language Fundamentals, Fourth UK Edition (CELF-4UK; Semel, Wiig & Secord, 2006) is an individually administered standardised language measure which is used for the comprehensive assessment of a student's language skills by combining core subtests with supplementary subtests. The Expressive Vocabulary subtest was used here to assess the participants' expressive vocabulary in English. This measure is designed for children and adolescents ranging from 5 to 16 years of age. Children were asked to look at a picture and name what they saw or what was happening in it (e.g., for a picture of a girl drawing, the child should give the targeted response 'colouring' or 'drawing' to score 2 points, or the response 'doing homework' to score 1 point). The administration is discontinued after seven consecutive zero scores.
The Test for Reception of Grammar, Version 2 (TROG-2; Bishop, 2003) was used to assess receptive grammar. It is an individually administered standardised test for children and adults and it comprises 80 items of increasing difficulty with four picture choices. Children are asked to select the item that corresponds to the target sentence read out by the researcher. For each grammatical element there is a block of four target sentences. A block is considered failed unless the child answers all four of its items correctly. The sentences include simple vocabulary of nouns, verbs and adjectives. If a child fails five consecutive blocks the administration is terminated.

Greek language measures
A standard Modern Greek version of the Peabody Picture Vocabulary Task (PPVT; Dunn & Dunn, 1981) was adapted and used based on the Greek adaptation by Simos, Sideridis, Protopapas and Mouzaki (2011). The children clicked on the image, out of four possible choices, that best corresponded to the target word they heard, such as nouns, verbs, or adjectives. There were 173 items of increasing difficulty. If eight incorrect responses were provided to ten consecutive items, then the task was stopped. The answers were scored as correct (1) or incorrect (0).
The Picture Word Finding Test (PWFT; Vogindroukas, Protopapas & Sideridis, 2009a) is an individually administered standardised measure used to assess standard Modern Greek expressive vocabulary. It is a norm-referenced tool for Greek, adapted from the English Word Finding Vocabulary Test, 4th Edition (Renfrew, 1995). The children are presented with 50 black and white images consisting of nouns in developmental order. The words included originate from objects, categories of objects, television programs and fairy-tales very familiar to children. A score sheet is used to record the responses provided during testing, which are later scored as correct (1) or incorrect (0). The children are asked to name the object they see in each image and, when they are ready, they move on to the next one. The assessment is discontinued after five consecutive wrong replies.
1 Questionnaire in Greek can be accessed here: https://drive.google.com/file/d/1fxvoVhE6JwJApSJqTtn5aXd2HQr2weO5/view?usp=sharing
The Developmental Verbal Intelligence Quotient (DVIQ; Stavrakaki & Tsimpli, 2000) was used to assess standard Modern Greek receptive grammar. It consists of five subtests used to measure children's language abilities in expressive vocabulary, understanding metalinguistic concepts, comprehension and production of morphosyntax, and sentence repetition. For this study, only the subtest measuring comprehension of morphosyntax (e.g., two/three elements, negative, passive voice, comparative) was used for both Greek monolingual and Greek-English bilingual children and it was administered individually. Each child was given a booklet with 31 pages, each including 3 images. The researcher read out a sentence and each child was asked to point to the picture that best represented the situation in the sentence. For example, this might have been "το ψηλότερο δέντρο" (the tallest tree) and the correct answer was the picture of the tallest tree out of three trees. An answer sheet was used to record the child's answers (as A, B, or C) during testing which were later scored as correct (1) or incorrect (0).

Cognitive measures
All tasks were administered on a 15.6-inch laptop screen using the experimental software E-Prime 2.0 (Schneider, Eschman & Zuccolotto, 2002). Accuracy and reaction times (RTs) were calculated automatically through E-Prime.

Attention task
The ATTENTIONAL NETWORK TASK (ANT; Fan, McCandliss, Sommer, Raz & Posner, 2002) was used to evaluate three different attentional networks: i) alerting; ii) orienting, and iii) executive control (Posner & Petersen, 1990). Similar to the flanker task, participants were asked to indicate the direction (left or right) in which the target stimulus (here, the centre fish) pointed. The distance between the child's head and the centre of the screen was approximately 50 cm. The child's task was to press either the right or the left mouse button (with the right or left index finger), corresponding to the direction in which the middle fish was swimming. The child was presented with a training block of 16 trials, followed by 128 trials distributed across four experimental blocks, with breaks between the experimental blocks. The task lasted approximately 20 minutes. Auditory feedback was offered to the child during both the training and experimental blocks.

Working memory tasks
The first working memory task was a COUNTING RECALL TASK, which was an adaptation of the Automated Working Memory Assessment (Alloway, 2007). The children were presented with a varying number, between four and seven, of red circles and blue triangles on the laptop screen. The children were asked to count and memorise the number of red circles in each block of trials. During the recall phase the children typed the number of red circles in each trial of that block. The number of trials, and hence of numbers to recall, increased in each block, up to seven numbers. If the child failed to correctly recall three trials in a block the task stopped. The second working memory task was a BACKWARD DIGIT SPAN TASK (BDST) and it was adapted from Huizinga, Dolan, and van der Molen (2006). The children began with two training trials in order to understand the task and were instructed to type the numbers presented in reverse order. For example, if a child heard the numbers 7 and 4, they should type 4 and 7. The task began with four trials of two numbers, gradually increasing to eight numbers. As in the counting recall task, if the child failed to correctly recall three trials in a block the task stopped.

Inhibition task
The NONVERBAL STROOP TASK was adapted from Lukács, Ladányi, Fazekas, and Kemény (2016) and the stimuli consisted of arrows pointing upwards, downwards, left and right. Three experimental blocks of 60 trials each were presented to the children. The aim was to select the direction that the arrows indicated regardless of their position on the screen. The children used the arrow buttons on the laptop's keyboard. The task began with the control block, where arrows were presented in the middle of the screen. In the second block, which was the congruent block, the direction of the arrows matched their position on the screen (e.g., an arrow indicating upwards was presented at the top of the screen). Finally, the third experimental block was the incongruent block. Here the direction of the arrows was the opposite of their position on the screen (e.g., an arrow indicating upwards was presented at the bottom of the screen).
For accuracy measures, the number of correct answers for the incongruent items was subtracted from the number of correct answers for the congruent items. The difference in RT for congruent and incongruent trials represents the inhibition cost.

Shifting task
All children were also administered one shifting task, the COLOUR-SHAPE TASK (Purić et al., 2017). This task included three blocks, in which children were presented with two shapes (triangle, circle) coloured either red or blue. The same two buttons, one for the left hand and one for the right, each corresponded to one of the choices (circle-triangle, red-blue). In the first two experimental blocks, the children's task was either to recognise the shape of the stimulus and ignore its colour, or the reverse. The shape stimuli were presented in the top half and the colour stimuli in the bottom half of the screen. In the third block children were required to alternate between identifying colour and shape depending on the object's location on the screen. Cues directing the participant to the relevant dimension were presented simultaneously with the stimuli on all trials, in all blocks. The first two blocks contained 32 trials each, while the third block contained 64. The number of shifting and non-shifting sequences within the third block was balanced. The difference in RT between the first two (non-shifting) blocks and the third (shifting) block represents the shifting cost.

Procedure
The children were tested individually in a quiet school classroom setting. The Greek monolingual children completed one session in Greek and the English monolingual children one session in English, each lasting 40 minutes on average. The bilingual children were tested in two separate sessions; the English language session was conducted within one month of the Greek language session. The researcher informed the children that they would play some games. Parents were given the questionnaire (LSBQ) and returned it to the classroom teacher, the school head teacher, or directly to the researcher.

Greek session
The bilingual participants began with the Greek language session. Each child completed the tasks in the following order: i) Greek adapted PPVT, ii) ANT, iii) Picture Word Finding Test, iv) Colour shape task, v) Nonverbal Stroop task, and vi) DVIQ. A pilot study was conducted with four children before the actual data collection. Following the pilot study, the above fixed order of tasks was chosen so that the children would not become tired or lose interest.

English session
The second session for the bilingual participants was the English session. Each child completed the tasks in the following order: i) K-BIT-2, ii) BDST, iii) BPVS, iv) counting recall task, v) CELF-4, and vi) TROG-2. After the end of each session the researcher thanked the child for their participation. All children participated enthusiastically.

Outlier analysis
Response accuracy and RTs were recorded for all the executive function tests. All RTs shorter than 200 ms and all RTs for incorrect trials were excluded from the analysis, so that only RTs from correct responses were analysed (e.g., Purić et al., 2017). Furthermore, in order to prevent extreme RTs from influencing participants' mean scores, we established bounds of ±3 standard deviations both between and within participants. Every value more than 3 standard deviations away from the mean RT was replaced by the corresponding lower or upper bound (see also Miyake et al., 2000). The inhibition cost for the nonverbal Stroop task was calculated as the difference between congruent and incongruent mean RTs. Local shifting costs (LSC) were calculated in the third block as the difference between the average RT for the shift trials and the average RT for the non-shift trials. General shifting costs (GSC) were calculated as the difference between the average RT for the third block and the average RT for the first and second blocks together.
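The RT cleaning and cost calculations described above can be sketched as follows. This is a minimal illustration rather than the procedure actually run on the study data: the function names, the trial format (pairs of RT and correctness), and the sign convention (harder condition minus easier condition) are assumptions, and the bounds are applied within a single participant only.

```python
from statistics import mean, stdev

def clean_rts(trials, min_rt=200.0, n_sd=3.0):
    """Clean one participant's RTs.

    trials: list of (rt_in_ms, is_correct) pairs (hypothetical format).
    Drops incorrect trials and RTs shorter than min_rt, then replaces
    values beyond mean +/- n_sd standard deviations with those bounds.
    """
    rts = [rt for rt, correct in trials if correct and rt >= min_rt]
    if len(rts) < 2:  # a standard deviation needs at least two values
        return rts
    m, sd = mean(rts), stdev(rts)
    lower, upper = m - n_sd * sd, m + n_sd * sd
    return [min(max(rt, lower), upper) for rt in rts]

def cost(rts_harder, rts_easier):
    """Mean RT difference between two conditions, e.g. incongruent vs
    congruent (inhibition cost) or shift vs non-shift trials (LSC)."""
    return mean(rts_harder) - mean(rts_easier)

# Illustrative values only, not data from the study.
congruent = clean_rts([(520, True), (480, True), (150, True), (610, False)])
incongruent = clean_rts([(700, True), (640, True), (690, True)])
inhibition_cost = cost(incongruent, congruent)
```

The same `cost` helper covers the general shifting cost if it is given the RTs from the third block and the pooled RTs from the first two blocks.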

Factor analysis
In order to reduce the number of control variables included in the analysis, Greek and English language measures together with the proficiency scores from the parental questionnaires were submitted to a factor analysis. The analysis was conducted between the two groups of Greek-English bilinguals and Greek monolinguals and between the two groups of Greek-English bilinguals and English monolinguals. For the Greek-English bilinguals and Greek monolinguals the following four independent measures were entered into the analysis: PWFT, DVIQ, adapted PPVT, Greek proficiency parental score. For the Greek-English bilinguals and English monolinguals the following four independent measures were entered into the analysis: BPVS3, TROG-2, CELF-4, English proficiency parental score.
A Maximum Likelihood factor method was applied to the four variables for each of the two cases. Based on the analysis it was observed that participants' scores in the PWFT, DVIQ, adapted PPVT, Greek proficiency score (based on the parental report) and the BPVS3, TROG-2, CELF-4, English proficiency score (based on the parental report) clustered on one component, which represented the proficiency in each language. The analysis showed that the Greek proficiency factor explained 71.27% of the variance and the English proficiency factor 55.31% of the variance. Tables 2 and 3 summarise the Maximum Likelihood results. Table 4 indicates the correlations between the control background variables.

Matching method
For the analysis of the data we applied k:1 nearest neighbour matching (Rubin, 1973). The idea behind matching methods is to compare the outcomes (Y) of subjects that are as similar as possible on a number of covariates (X), with the sole exception of the treatment status. In our case, we would like to compare the executive function accuracy and response time of a monolingual child with those of a bilingual child, provided that they have similar values on the other background scores; namely, Age in months, Sex, K-BIT-2, SES, English proficiency factor, and Greek proficiency factor. Only then can we be confident that any difference in the outcome variable is a consequence of the treatment (here, bilingualism) rather than of a correlation between a background score and the outcome.
For a single covariate, like the PWFT, identifying a pair of comparable children is simple. Adding a second covariate that is binary (e.g., Sex) or categorical (e.g., SES) would require more effort on our behalf and a larger dataset. However, if we want to consider more covariates, particularly if they are continuous (e.g., K-BIT-2), then finding matches becomes a daunting task. To circumvent this problem, a similarity measure or similarity index may be constructed, which quantifies how close two observations (i.e., scores from two children) are. Two well-established methods are the k-means nearest neighbour matching and the propensity score matching.
The k-means nearest neighbour matching calculates the "distance" between pairs of observations with regard to a set of covariates (X's) and then matches each subject to the comparable observations that are closest to it. For example, suppose that a bilingual participant has a PWFT score of 65.7 and we also have information on two monolingual children, A and B, where A has a PWFT score of 55.3 and B of 64.1. Naturally, monolingual B represents a closer match to the bilingual, and B would therefore be selected by the k-means nearest neighbour matching. In this case, the distance is simply d = |65.7 − 64.1| = 1.6, which is also known as the Euclidean distance. If more than one variable is used to match, then the distance statistic that is used is the Mahalanobis, which takes into account the correlation between the covariates and the fact that they may be measured on different scales. The k-means nearest neighbour matching does not use a formal model for either the outcome or the treatment status and this makes it very flexible. However, when matching on more than one continuous covariate, the k-means nearest neighbour estimator must be augmented with a bias-correction term (Abadie & Imbens, 2006, 2011).
The k-means nearest neighbour matching relies on some distance function. For example, initially assume a single covariate, the PWFT score. In the general form we can denote this variable as x. Then the distance between two individuals i, j, where individual i is bilingual and individual j is not, can be given as
$$ d_{ij} = |x_i - x_j| $$
We can generalise this formula for when we have p covariates using matrix algebra. Assume that $x = \{x_1, x_2, \dots, x_p\}$ and that each individual, i, has the following set of covariates $x_i = \{x_{1,i}, x_{2,i}, \dots, x_{p,i}\}$. The distance between individuals i, j is now given as:
$$ \lVert x_i - x_j \rVert_S = \left\{ (x_i - x_j)' \, S^{-1} \, (x_i - x_j) \right\}^{1/2} $$
where S is the variance-covariance matrix of the covariates.
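As an illustration of the Mahalanobis distance described above, the following sketch computes it in plain Python. It is an assumption-laden example, not the software used in the study: it uses equal weights for all observations, the helper names are hypothetical, and the covariate values are made up.

```python
from math import sqrt

def covariance_matrix(rows):
    """Sample variance-covariance matrix S (equal weights, n - 1 divisor)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(p)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
             for b in range(p)] for a in range(p)]

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][c] * z[c] for c in range(i + 1, n))) / M[i][i]
    return z

def mahalanobis(xi, xj, S):
    """sqrt((xi - xj)' S^{-1} (xi - xj)); S must be non-singular."""
    d = [a - b for a, b in zip(xi, xj)]
    z = solve(S, d)  # z = S^{-1} d without forming the inverse explicitly
    return sqrt(sum(a * b for a, b in zip(d, z)))

# Hypothetical covariates per child: (PWFT score, K-BIT-2 raw score).
sample = [[60.0, 20.0], [55.3, 25.0], [64.1, 22.0], [70.2, 30.0]]
S = covariance_matrix(sample)
d_bilingual_to_B = mahalanobis([65.7, 24.0], [64.1, 22.0], S)
```

With a single covariate and S set to the 1 × 1 identity, `mahalanobis` reduces to the absolute difference used in the PWFT example.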
Coming back to observation i, we can define the following nearest-neighbour index set:
$$ \Omega(i) = \left\{ j \;\middle|\; t_j = 1 - t_i,\ \lVert x_i - x_j \rVert_S < \lVert x_i - x_l \rVert_S,\ t_l = 1 - t_i,\ l \neq j \right\} $$
where i is the observation (i.e., the participant) who is bilingual and for whom we want to find a matching monolingual, j denotes the matching monolingual (only one in this case) and l denotes another monolingual candidate. t denotes the treatment status and takes the value of 1 for bilinguals, zero otherwise. $\lVert x_i - x_j \rVert_S$ and $\lVert x_i - x_l \rVert_S$ denote the distance between i, j and i, l respectively, and in the formula above we require that the distance between i, j is smaller than that between i, l (since we select participant j as our match). The notation $t_j = 1 - t_i$ and $t_l = 1 - t_i$ implies that our participant i, who is bilingual (hence $t_i = 1$), needs to be matched with some monolingual participant for whom $t_j = 1 - 1 = 0$ or $t_l = 1 - 1 = 0$. The above can be generalised for m matching participants:
$$ \Omega_m(i) = \left\{ j_1, \dots, j_m \;\middle|\; t_{j_k} = 1 - t_i,\ \lVert x_i - x_{j_k} \rVert_S < \lVert x_i - x_l \rVert_S,\ t_l = 1 - t_i,\ l \notin \{j_1, \dots, j_m\} \right\} $$
The structure of S depends on our initial assumption and can be one of Euclidean, Mahalanobis or inverse variance. Formally,
$$ S = \begin{cases} I_p & \text{(Euclidean)} \\[4pt] \dfrac{(X - 1_n \bar{x}')' \, W \, (X - 1_n \bar{x}')}{\sum_i^n w_i - 1} & \text{(Mahalanobis)} \\[8pt] \operatorname{diag}\left\{ \dfrac{(X - 1_n \bar{x}')' \, W \, (X - 1_n \bar{x}')}{\sum_i^n w_i - 1} \right\} & \text{(inverse variance)} \end{cases} $$
where X is the n × p matrix of covariates, $1_n$ is an n × 1 vector of ones, $I_p$ is the identity matrix of order p (the number of covariates used), $w_i$ is the frequency weight for observation i, $\bar{x} = \sum_i^n w_i x_i / \sum_i^n w_i$ denotes the weighted mean, and W is an n × n diagonal matrix containing the frequency weights. For the prediction of the potential outcomes we use the following notation: $y_{1,i}$ is the potential outcome of individual i who has received the treatment or, in our case, is bilingual (t = 1). Conversely, $y_{0,i}$ is the potential outcome of individual i who has not received the treatment or, in our case, is monolingual (t = 0). As we have discussed, the problem posed by the potential-outcome model is that only $y_{1,i}$ or $y_{0,i}$ is observed, never both. The k-means nearest neighbours can predict the potential outcome for observation i as follows:
$$ \hat{y}_{t,i} = \begin{cases} y_i & \text{if } t_i = t \\[4pt] \dfrac{\sum_{j \in \Omega_m(i)} w_j \, y_j}{\sum_{j \in \Omega_m(i)} w_j} & \text{otherwise} \end{cases} \qquad t \in \{0, 1\} $$
The first is the case where the outcome of the individual ($y_i$) is observed, whether the child is bilingual (t = 1) or monolingual (t = 0). The second case is the counterfactual outcome, which is not observed and is estimated as the outcome of the closest match (or matches).
Once the above are estimated we can define the following quantities of interest; namely, the Average Treatment Effect (ATE) and the Average Treatment Effect on the Treated (ATET). These are defined as
$$ \mathrm{ATE} = E(y_1 - y_0), \qquad \mathrm{ATET} = E(y_1 - y_0 \mid t = 1) $$
and, as $y_{1,i}$ and $y_{0,i}$ are realisations of the random variables $y_1$ and $y_0$ respectively, $\bar{y}_1$ is the average of all $y_{1,i}$, and the equivalent holds for $\bar{y}_0$.
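The matching and estimation steps can be illustrated with a small sketch. It is a simplified example under stated assumptions, not the estimator used for the reported results: it matches on a single covariate with one neighbour (m = 1), matches with replacement, uses equal weights, breaks ties by taking the first candidate, and omits the bias-correction term. The dictionary keys and the outcome values (hypothetical mean RTs in ms) are made up; the PWFT scores are those of the example in the text.

```python
def nearest_match(i, units, dist):
    """Index of the closest unit with the opposite treatment status."""
    candidates = [j for j, u in enumerate(units) if u["t"] != units[i]["t"]]
    return min(candidates, key=lambda j: dist(units[i]["x"], units[j]["x"]))

def potential_outcomes(units, dist):
    """Return one (y1_hat, y0_hat) pair per unit: the observed outcome for
    the unit's own status and the matched unit's outcome as counterfactual."""
    pairs = []
    for i, u in enumerate(units):
        y_match = units[nearest_match(i, units, dist)]["y"]
        pairs.append((u["y"], y_match) if u["t"] == 1 else (y_match, u["y"]))
    return pairs

def ate_atet(units, dist):
    """Sample analogues of ATE and ATET from the predicted outcomes."""
    diffs = [y1 - y0 for y1, y0 in potential_outcomes(units, dist)]
    treated = [d for d, u in zip(diffs, units) if u["t"] == 1]
    return sum(diffs) / len(diffs), sum(treated) / len(treated)

def abs_diff(a, b):
    """Single-covariate distance |x_i - x_j|."""
    return abs(a - b)

# t = 1 bilingual, t = 0 monolingual; x = PWFT score; y = hypothetical RT (ms).
units = [
    {"t": 1, "x": 65.7, "y": 600.0},  # bilingual child
    {"t": 0, "x": 55.3, "y": 700.0},  # monolingual A
    {"t": 0, "x": 64.1, "y": 650.0},  # monolingual B
]
ate, atet = ate_atet(units, abs_diff)
```

Here the bilingual child is matched to monolingual B, so a negative `atet` would indicate faster responses for the bilingual child relative to the matched monolingual. Passing a Mahalanobis distance function instead of `abs_diff`, with `x` holding a vector of covariates, extends the sketch to several matching variables.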

Analyses of executive functions
Tables 5 and 6 report descriptive statistics for the accuracy and RT measures from each executive function task for each group. In the case of accuracy in the two working memory tasks (BDST and Counting Recall tasks) a higher score indicates better performance, whereas for the RT a lower score indicates better performance. Similarly, for the accuracy in attention, switching and inhibition tasks (ANT, Arrow Stroop & Colour-Shape tasks) a higher score indicates better performance, whereas a lower RT score indicates better performance. We performed comparisons between the three groups of children. Table 7 and Table 8 show the results of the monolingual and bilingual groups on the attention and working memory tasks.

Comparison 1
The first comparison was between the bilingual group and the Greek monolingual group. Participants were matched via nearest neighbour matching as described above. The matching variables were Age in months, Sex, K-BIT-2, SES, and Greek proficiency factor. There were no differences between the bilingual group and the Greek monolingual group in RTs on the Arrow Stroop. No group difference was found for the inhibition accuracy scores. Similarly, no significant group differences were found for the remaining tasks, where the groups performed comparably (see Table 7 for p-values).

Comparison 2
The second comparison was between the bilingual group and the English monolingual group. Nearest neighbour matching was again applied to match participants on the corresponding matching variables; namely, Age in months, Sex, K-BIT-2, SES, and English proficiency factor. Differences between the groups in RTs emerged on the inhibition task (Arrow Stroop), where the bilingual group was faster than the monolingual group. In addition, there was a significant Stroop effect (b = 139.728, p = .033). However, no group difference was found for the inhibition accuracy scores. A significant group difference was also found for the BDST, where the bilinguals were faster than their monolingual counterparts (b = -1021.77, p < .001). In the remaining tasks, the groups performed comparably (see Table 8 for p-values).

Discussion
The present study investigated differences in the executive functioning skills of Greek-English bilingual children compared to two groups of Greek and English monolingual children. We investigated the executive functioning scores using a battery of tests assessing inhibition, shifting, and updating, and matching closely for language proficiency, SES, language use, vocabulary and grammar scores, and non-verbal intelligence. Our aim was to see if the Greek-English bilingual children would outperform their monolingual counterparts in line with multiple previous findings (see Bialystok, 2017), once a large number of potentially confounding variables was controlled for using innovative analyses, and therefore to contribute methodologically to the debate on whether a bilingual advantage exists and/or how reliable it is.
To achieve this, the bilingual children were compared to two closely matched monolingual control groups, one consisting of Greek monolinguals and the other of English monolinguals. We used a factor analysis on four indicators of language proficiency to reveal one factor that we interpreted as proficiency in English and Greek and closely matched the participants using the k-means nearest neighbour matching. This close matching, which takes into consideration a large number of relevant variables, gives us greater confidence in the results. The results showed that Greek-English bilinguals were faster than the English monolinguals in two executive function measures in terms of RT; namely, i) the bilingual children were faster in the incongruent inhibition trials and demonstrated a lower inhibition cost in the inhibition task (Stroop); ii) the bilinguals were faster than the English monolinguals in the backward digit span WM task (BDST). In all the other executive function measures the bilingual children were comparable to the English monolingual children. The bilingual children showed no difference in their performance compared to the Greek monolingual control group. These findings support the hypothesis that bilingualism influences the development of executive functions and extend previous research (Blom, Boerma, Bosma, Cornips & Everaert, 2017; Bosma, Hoekstra, Versloot & Blom, 2017; Costa, Hernández, Costa-Faidella & Sebastián-Gallés, 2009; Garraffa et al., 2015; Lauchlan, Parisi & Fadda, 2013). After controlling and closely matching this group of bilinguals to two monolingual control groups on a large number of relevant variables, a bilingual effect was observed in inhibition and working memory.
Note. GSC = Global shifting cost; LSC = Local shifting cost; Inhibition cost = the difference in RT for congruent and incongruent trials; BDST = Backward digit span task; Count Recall = Counting recall task; cong = congruent trials; incong = incongruent trials.
The comparison between the bilingual group and the English monolingual group elicited a bilingual effect only in one working memory task and in the inhibition task. Our study is in line with previous research that has shown mixed findings in executive function tasks (Paap & Greenberg, 2013; Ross & Melinger, 2017).
In contrast to the previous comparison where a bilingual effect was found, the bilingual group was comparable in all the measures to the Greek monolingual group. The fact that there was no significant difference in any task between the bilingual and the Greek monolingual group may be linked to the fact that, due to the Greek educational system, we could not avoid recruiting children in Greece who were exposed to English for at least one hour a week starting in Year 1 and reaching three hours a week in Year 3 (Greek Ministry of Education and Religious Affairs, 2016). This is in combination with after school language classes, where children attend English classes at least two hours a week. It is possible that these few hours of English a week have affected the executive functioning scores. Other studies investigating dual language development and executive functions of bilingual children attending L2 education programs have found advantages in working memory after as little as one year of immersion education, for example in a group of Serbian-speaking second-grade children (Purić et al., 2017). Nicolay and Poncelet (2015) found positive effects after 3 years of immersion education in alerting, auditory selective attention, divided attention, and mental flexibility, in line with Carlson and Meltzoff (2008) who reported a bilingual advantage on a battery of executive function tasks after 6 months of immersion.
In contrast, the bilinguals had faster reaction times in the inhibition and BDST working memory tasks compared to the monolingual English group. Based on Purić et al. (2017), working memory may be specifically linked to these early stages of intensive L2 learning. This finding is in line with previous research showing a bilingual advantage in working memory (Antoniou et al., 2016; Bialystok, 2011; Blom, Küntay, Messer, Verhagen & Leseman, 2014; Purić et al., 2017). The bilingual effect in the inhibition domain is also in line with previous research on bilingualism (Bialystok, 2017). Based on Paap's (2018) Controlled Dose hypothesis, this bilingual advantage might be present due to the fact that the bilinguals are still in the process of learning how to control their languages and are constantly monitoring and inhibiting. The English control group did not have exposure to an L2 whereas the Greek control group did; however, other variables might also play a role, such as differences in aspects of the curriculum across the two school systems, cultural effects between these two control groups (Yang et al., 2011), and other school activities and hobbies, such as playing video games and sports, that have been found to affect executive functions (Paap, Anders-Jefferson, Mason, Alvarado & Zimiga, 2018; Valian, 2015; Vestberg, Gustafson, Maurex, Ingvar & Petrovic, 2012). Future research could take all these additional variables into account.
However, the other tasks, one tapping into working memory (Counting recall task) and one tapping into inhibition (ANT; only the CONFLICT INDEX was analysed here), revealed no significant differences in either accuracy or response times. This might be an issue linked to the reliability and validity of commonly used executive function tasks. The view that these tasks are far from optimal is supported by many researchers in the field (e.g., Paap & Greenberg, 2013; Paap & Sawi, 2014; Soveri, Lehtonen, Karlsson, Lukasik, Antfolk & Laine, 2018). This dissociation between tasks might also be linked to the lack of theory on the bilingual advantage and the lack of clarity in the architecture of executive functions despite the division by Miyake et al. (2000) into three interrelated components (shifting, inhibition, and WM). Even though the above tasks supposedly tap the same domain, that does not mean that they are correlated with each other (Jylkkä, Lehtonen, Lindholm, Kuusakoski & Laine, 2018). Though some researchers have reported that forwards and backwards recall tasks load onto the same factor during factor analysis (e.g., Colom, Abad, Rebollo & Shih, 2005; Engle, Kane & Tuholski, 1999), others state that a reversal of order requires the involvement of executive-attentional resources (e.g., Elliott, Smith & McCulloch, 1997). On the other hand, Costa et al. (2008) and Pelham and Abrams (2014) found a significant bilingual conflict effect using the ANT when testing young adults. This might be linked to the engagement of the monitoring processes during an executive function task, which may depend on several properties of the design, such as different types of stimuli. If, for example, a task involves one type of trial, monitoring processes may not be recruited as much (Costa et al., 2009). As Costa et al. (2009) hypothesise in their study, a bilingual advantage could be linked to a more efficient monitoring processing system that checks which strategy should be applied in a specific trial. They found that in low-monitoring conditions no bilingual advantage was detected, in contrast to the high-monitoring condition where a bilingual conflict effect was observed. Perhaps the child-friendly version of the ANT used in the current study was not challenging enough. Similarly, in Antón et al. (2014) and Carlson and Meltzoff (2008) no difference was found in the children's version of the ANT task between the bilingual and monolingual children. The fact that we only found a significant difference in RT in the Stroop and the BDST tasks might be linked to the fact that a bilingual advantage in monitoring and updating may speed up performance, leading not only to overall faster RTs but also to a smaller conflict effect (Costa et al., 2009).
On the shifting task we did not find any bilingual effect. As Huizinga et al. (2006) stated, various executive function components may develop asynchronously. This is in line with previous research not finding effects of bilingualism in any executive function tasks (Paap & Greenberg, 2013).

Limitations and future directions
Due to practical matters, we used non-standardised tasks to assess Greek receptive vocabulary and grammar skills in Greek monolingual and bilingual children as well as English tests that are not standardised for bilingual children. Future development of tests is needed in Greek and English, which should also include bi-/multilingual children (Babatsouli, 2019; Marinis, Armon-Lotem & Pontikas, 2017). In addition, standardised Greek tests assessing language skills, such as the Action Picture Test for Greek (Vogindroukas, Protopapas & Stavrakaki, 2009b) and more recently the Logometro (Mouzaki, Ralli, Antoniou, Diamanti & Papaioannou, 2017), can be used to assess Greek grammar skills.
Future studies can shed light on the possibility that limited exposure to a second language could enhance executive functions. Pursuing this might clarify the reasons why no differences were identified between the Greek-speaking cohort and the Greek-English bilingual cohort as well as mixed findings in other studies. This finding has important educational implications especially for Greece, where there will be a pilot project of teaching English for two hours a week, as a compulsory topic, in state nurseries from September 2020 (Greek Ministry of Education and Religious Affairs, 2020). Additionally, the European Commission is working together with national governments aiming for all citizens to begin learning foreign languages at an early age (European Commission, 2019). Finally, in Wales similar findings were obtained in a pilot study where children receiving minimal exposure to Welsh for a year were faster in a working memory task than their English monolingual counterparts (Papastergiou et al., 2019). Future longitudinal studies can further investigate these groups with minimal exposure to a second language and how this interacts with executive functions.
The relatively small sample size of this study is one of its limitations. Nevertheless, our findings extend previous research and demonstrate that after controlling and closely matching this group of bilinguals to two monolingual control groups on related factors, a bilingual effect is observed in inhibition and working memory.
Based on these results, and as a further step, the bilingual advantage debate on executive function could be approached holistically, using frontier methodologies that allow information from multiple domains of executive function to be considered jointly (e.g., .

Conclusion
The aim of this study was to examine the differences in the executive functioning skills of Greek-English bilingual children compared to two control groups of Greek and English monolingual children. The contribution of this study to the field is both empirical and methodological: we considered recently identified relevant variables in combination with innovative analyses and one unstudied language group, Greek-English bilingual children from the north of England. More specifically, we used k-means nearest neighbour methods to match bilingual to monolingual children on a wide array of variables, including age, SES, and Greek and English proficiency. We used a factor analysis on four indicators of language proficiency to reveal one factor, which we interpret as proficiency in English and Greek, closely matching on language background information that we obtained from both objective and contextual factors. This close matching gives us greater confidence in the results, which revealed a bilingual advantage in two domains, inhibition and working memory, compared to the English monolingual group, while the Greek monolingual group was comparable to the Greek-English bilingual group. The latter finding might be explained by Greek children's exposure to small amounts of English in Greece due to the nature of the Greek educational system, or it might be clarified by the way executive function is divided and analysed. Our findings extend previous research on the effect of L2 exposure on executive functions.

Matching estimators
Matching estimators are used to evaluate the effect of a treatment on an outcome of interest. Let $W_i$ indicate whether individual $i$ ($i = 1, \dots, N$) is exposed to treatment, with $W_i = 1$ denoting treatment and $W_i = 0$ the control group. For simplicity we assume that $W_i \in \{0, 1\}$ (i.e., only a treated and a control group are present), but extension to multiple treatments is possible. The number of treated individuals is denoted as $N_1 = \sum_{i=1}^{N} W_i$, and thus the control group includes $N_0 = N - N_1$ individuals.
The outcome of interest may be represented as $Y_i$, and we denote as $Y_i(0)$ and $Y_i(1)$ the outcomes without and with treatment, respectively. The treatment effect upon the outcome of interest for individual $i$ is given as $Y_i(1) - Y_i(0)$. Thus, in a fictional setting of parallel universes we would evaluate the average treatment effect as:

$$\tau = \frac{1}{N}\sum_{i=1}^{N}\left[Y_i(1) - Y_i(0)\right]$$

However, for a given individual $i$, only one of the two quantities is observed:

$$Y_i = Y_i(W_i) = \begin{cases} Y_i(0) & \text{if } W_i = 0 \\ Y_i(1) & \text{if } W_i = 1 \end{cases}$$

Hence, for each individual $i$ that has participated in a treatment, we need a counterfactual equivalent of the same participant that would not have participated in the treatment, and vice versa. One way to achieve this is via randomization of the treatment, but this is not always attainable in practice. Another way is to estimate the average expected outcome of a counterfactual participant on each occasion, which leads to regression estimators. Alternatively, we can identify a counterfactual participant and use that participant's outcome, which leads to matching estimators. In both cases, a set of $K$ observed characteristics (i.e., covariates), denoted $X_{i,k}$ ($k = 1, \dots, K$), is used to identify the counterfactual for individual $i$.³
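As a purely illustrative sketch of the potential-outcomes notation above, the following Python fragment uses hypothetical values (the lists `y0`, `y1` and `W` are invented for this example and are not data from the study). It shows that the average treatment effect is computable only when both potential outcomes are known, whereas the observed data contain just $Y_i = Y_i(W_i)$.

```python
from statistics import mean

# Hypothetical potential outcomes for four individuals (illustrative values only).
y0 = [10.0, 12.0, 9.0, 11.0]   # Y_i(0): outcome without treatment
y1 = [13.0, 14.0, 12.0, 15.0]  # Y_i(1): outcome with treatment
W = [1, 0, 1, 0]               # treatment indicator W_i

# "Parallel universes" average treatment effect: both outcomes known for everyone.
tau = mean(b - a for a, b in zip(y0, y1))

# In practice only one outcome per individual is observed: Y_i = Y_i(W_i).
observed = [b if w == 1 else a for a, b, w in zip(y0, y1, W)]

# A naive contrast of observed group means, without any matching.
naive = (mean(y for y, w in zip(observed, W) if w == 1)
         - mean(y for y, w in zip(observed, W) if w == 0))

print(tau, observed, naive)
```

In this toy example the naive contrast of group means differs from the average treatment effect because the two groups differ in their untreated outcomes, which is the situation that regression and matching estimators are meant to address.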

Matching using regression estimators
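In the case of regression, we assume a single covariate for simplicity and that $\hat{m}_w(X)$ is a consistent estimator of $\mu_w(X)$, the conditional mean of the outcome $Y(w)$ given $X$; thus we have:

$$\hat{Y}_i(0) = \begin{cases} Y_i & \text{if } W_i = 0 \\ \hat{m}_0(X_i) & \text{if } W_i = 1 \end{cases} \qquad \hat{Y}_i(1) = \begin{cases} \hat{m}_1(X_i) & \text{if } W_i = 0 \\ Y_i & \text{if } W_i = 1 \end{cases}$$

Therefore, we use $\hat{m}_0(X_i)$ and $\hat{m}_1(X_i)$ to estimate the counterfactual outcomes. Thus, the treatment effect may be estimated as:

$$\hat{\tau}_{reg} = \frac{1}{N}\sum_{i=1}^{N}\left[\hat{Y}_i(1) - \hat{Y}_i(0)\right]$$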
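The regression imputation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: $\hat{m}_w$ is taken to be a simple linear regression fitted separately within each group, and the helper names `fit_line` and `regression_ate`, as well as the toy data, are hypothetical.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x for a single covariate."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return lambda x_new: a + b * x_new

def regression_ate(X, W, Y):
    """Average of Y_hat_i(1) - Y_hat_i(0), imputing the unobserved outcome by regression."""
    m0 = fit_line([x for x, w in zip(X, W) if w == 0],
                  [y for y, w in zip(Y, W) if w == 0])
    m1 = fit_line([x for x, w in zip(X, W) if w == 1],
                  [y for y, w in zip(Y, W) if w == 1])
    effects = []
    for x, w, y in zip(X, W, Y):
        y0_hat = y if w == 0 else m0(x)  # observed if control, imputed otherwise
        y1_hat = y if w == 1 else m1(x)  # observed if treated, imputed otherwise
        effects.append(y1_hat - y0_hat)
    return mean(effects)

# Hypothetical data: outcome is 2 + 0.5 * x, plus a constant effect of 3 if treated.
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
W = [0, 1, 0, 1, 0, 1]
Y = [2 + 0.5 * x + 3 * w for x, w in zip(X, W)]
tau_reg = regression_ate(X, W, Y)
print(tau_reg)
```

Because the toy outcome is exactly linear in the covariate, the imputed counterfactuals are exact and the estimate recovers the constant effect built into the data; with real data the quality of the estimate depends on how well the regression is specified.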

Matching using k-means nearest neighbour estimators
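In the case of k-means nearest neighbour matching estimators, we have:

$$\hat{Y}_i(0) = \begin{cases} Y_i & \text{if } W_i = 0 \\ M^{-1}\sum_{j \in J_m(i)} Y_j & \text{if } W_i = 1 \end{cases} \qquad \hat{Y}_i(1) = \begin{cases} M^{-1}\sum_{j \in J_m(i)} Y_j & \text{if } W_i = 0 \\ Y_i & \text{if } W_i = 1 \end{cases}$$

where $M$ denotes the number of matches to individual $i$, indexed by $m = 1, \dots, M$. If $M = 1$ then $M^{-1}\sum_{j \in J_m(i)} Y_j \equiv Y_j$, that is, only the closest match is used. Thus, the treatment effect using matching estimators may be estimated as:

$$\hat{\tau}_{match} = \frac{1}{N}\sum_{i=1}^{N}\left[\hat{Y}_i(1) - \hat{Y}_i(0)\right]$$

Contrary to regression, matching estimators utilize the observed characteristics ($X_i$) to identify candidate matches for each individual $i$ that has participated in the treatment ($W_i = 1$). The k-means nearest neighbour matching relies on a distance function to measure the distance (i.e., the similarity) between two individuals $i, l$, where individual $i$ is part of the treatment group (i.e., bilingual) and individual $l$ is not. With a single covariate, the distance between these observations may be given as:

$$\|X_i - X_l\| = \left\{(X_i - X_l)^2\right\}^{1/2}$$

We can generalise this formula to $k$ covariates using matrix algebra. Assume that $X = \{X_1, X_2, \dots, X_k\}$ and that each individual $i$ has the following set of observed characteristics $X_i = \{X_{i,1}, X_{i,2}, \dots, X_{i,k}\}$. The distance between individuals $i, l$ is now given as:

$$\|X_i - X_l\|_S = \left\{(X_i - X_l)'\, S^{-1} (X_i - X_l)\right\}^{1/2}$$

where $S$ is the variance-covariance matrix of the observed characteristics. The structure of the variance-covariance matrix can be one of Euclidean, Mahalanobis, or inverse variance, formally:

$$S = \begin{cases} I_k & \text{(Euclidean)} \\[4pt] \dfrac{(X - 1_N \bar{X})'\, W\, (X - 1_N \bar{X})}{\sum_{i=1}^{N} w_i - 1} & \text{(Mahalanobis)} \\[10pt] \operatorname{diag}\left\{\dfrac{(X - 1_N \bar{X})'\, W\, (X - 1_N \bar{X})}{\sum_{i=1}^{N} w_i - 1}\right\} & \text{(inverse variance)} \end{cases}$$

where $X$ is here the $N \times k$ matrix stacking the observed characteristics of all individuals, $1_N$ is an $N \times 1$ vector of ones, $I_k$ is the identity matrix of order $k$ (the number of observed characteristics used), $w_i$ is the frequency weight for individual $i$, $\bar{X} = \sum_{i=1}^{N} w_i X_i / \sum_{i=1}^{N} w_i$ denotes a weighted mean, and $W$ is an $N \times N$ diagonal matrix containing the frequency weights (not to be confused with the treatment indicator $W_i$).

³ A set of assumptions is required here, most notably that of "unconfoundedness", which states that, conditional on the covariates ($X$), the treatment $W$ is independent of the potential outcomes ($Y$), that is, as good as randomised.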
Coming back to individual $i$, we can define the following nearest-neighbour index set:

$$J_1(i) = \left\{\, l \;\middle|\; W_l = 1 - W_i,\ \|X_i - X_l\|_S < \|X_i - X_q\|_S,\ W_q = 1 - W_i\ \ \forall\, q \neq l \,\right\}$$

where $i$ is the treated individual (i.e., bilingual) for whom we want to find a matching control (i.e., monolingual); $l, q$ denote two candidate matching monolinguals; $W$ denotes the treatment indicator and takes the value of 1 for bilinguals, zero otherwise; and $\|X_i - X_l\|_S$, $\|X_i - X_q\|_S$ denote the distances between $i, l$ and $i, q$ respectively. In the formula above we require that the distance between $i, l$ is smaller than that between $i, q$ (since we select individual $l$ as our match). The notation $W_l = 1 - W_i$ and $W_q = 1 - W_i$ implies that individual $i$, who is bilingual (hence $W_i = 1$), needs to be matched with some monolingual individual for whom $W_l = 0$ or $W_q = 0$.
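To illustrate how the choice of $S$ can change which individual is selected as the nearest neighbour, the following Python sketch implements the two diagonal cases (Euclidean and inverse variance) with unit frequency weights. It is an illustration under stated assumptions only: the function names and the toy covariates (age in months and a proficiency factor score) are hypothetical, and the Mahalanobis case, which requires inverting the full matrix, is omitted for brevity.

```python
from math import sqrt
from statistics import variance

def scaled_distance(x_i, x_l, s_diag):
    """Distance ||x_i - x_l||_S for a diagonal S, given as its diagonal entries."""
    return sqrt(sum((a - b) ** 2 / s for a, b, s in zip(x_i, x_l, s_diag)))

def euclidean_scale(rows):
    """Diagonal of S = I_k: every covariate has unit scale."""
    return [1.0] * len(rows[0])

def inverse_variance_scale(rows):
    """Diagonal of S for the inverse variance metric (sample variances, unit weights)."""
    return [variance(col) for col in zip(*rows)]

def nearest_neighbour(i, rows, W, s_diag):
    """Index l of the closest individual with W_l = 1 - W_i."""
    candidates = [l for l in range(len(rows)) if W[l] == 1 - W[i]]
    return min(candidates, key=lambda l: scaled_distance(rows[i], rows[l], s_diag))

# Hypothetical covariates: [age in months, proficiency factor score].
rows = [[60.0, 0.0], [72.0, 1.0], [84.0, 2.0], [62.0, 2.0]]
W = [1, 0, 0, 0]  # individual 0 is bilingual, the others are monolingual

match_euclidean = nearest_neighbour(0, rows, W, euclidean_scale(rows))
match_ivariance = nearest_neighbour(0, rows, W, inverse_variance_scale(rows))
print(match_euclidean, match_ivariance)
```

In this toy data the Euclidean metric is dominated by age, which is measured on a much larger scale, and selects individual 3, whereas the inverse variance metric puts both covariates on a comparable scale and selects individual 1.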
The above can be generalised for $m$ matching individuals, as:

$$J_m(i) = \left\{\, l_1, l_2, \dots, l_m \;\middle|\; W_{l_m} = 1 - W_i,\ \|X_i - X_{l_m}\|_S < \|X_i - X_{q_m}\|_S,\ W_{q_m} = 1 - W_i\ \ \forall\, l_m \neq q_m \,\right\}$$

Hence, for the prediction of outcomes using the k-means nearest neighbour and assuming $m$ matches we have:

$$\hat{Y}_i(w) = \begin{cases} Y_i & \text{if } W_i = w \\ \dfrac{1}{m}\sum_{j \in J_m(i)} Y_j & \text{if } W_i = 1 - w \end{cases}$$

where the first case applies when the outcome is observed, whether the individual is bilingual ($W_i = 1$) or monolingual ($W_i = 0$). The second case is the counterfactual outcome, which is not observed and is estimated as the weighted average outcome of the $m$ closest matches.
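The matching estimator with $m$ matches can be sketched as follows. This is a minimal illustration rather than the procedure used in the study: it assumes the Euclidean metric, matching with replacement, equal weights of $1/m$, and ties broken by order of appearance; the function names and the toy response times are hypothetical.

```python
from math import dist
from statistics import mean

def nearest_matches(i, X, W, m):
    """Indices of the m closest individuals in the opposite group (Euclidean metric)."""
    candidates = [l for l in range(len(X)) if W[l] == 1 - W[i]]
    candidates.sort(key=lambda l: dist(X[i], X[l]))
    return candidates[:m]

def matching_ate(X, W, Y, m=1):
    """Average of Y_hat_i(1) - Y_hat_i(0), imputing the unobserved outcome from matches."""
    effects = []
    for i in range(len(X)):
        matches = nearest_matches(i, X, W, m)
        counterfactual = mean(Y[j] for j in matches)  # equal-weight average of matches
        y1_hat = Y[i] if W[i] == 1 else counterfactual
        y0_hat = Y[i] if W[i] == 0 else counterfactual
        effects.append(y1_hat - y0_hat)
    return mean(effects)

# Hypothetical data: one covariate (age in months) and response times in ms.
X = [[60.0], [61.0], [72.0], [73.0]]
W = [1, 0, 1, 0]                  # 1 = bilingual, 0 = monolingual
Y = [500.0, 550.0, 450.0, 520.0]

tau_match = matching_ate(X, W, Y, m=1)
print(nearest_matches(0, X, W, 1), tau_match)
```

With a single match, each child in this toy sample is paired with the child of the other group closest in age, so the estimate is the mean of the within-pair differences in response time.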

Matching using bias-adjusted k-means nearest neighbour estimators
The two estimators thus far are not asymptotically equivalent; $\hat{Y}_i(W)$ is not a consistent estimator of $\mu_w(X)$ due to the specific choice of matches for the former (see Abadie and Imbens (2006) for more details). Bias-adjusted k-means nearest neighbour matching estimators for continuously distributed characteristics use a regression correction term to ensure consistency of the matching estimator. Assuming the regression used is equal to the true regression function (i.e., no misspecification), this bias-adjustment adds only noise to the matching estimator, without affecting its unbiasedness. Nevertheless, in the presence of misspecification in the regression, which may arise due to omitted variables and/or imprecise measurement, the bias-adjustment ensures that the quantity $M^{-1}\sum_{j \in J_m(i)} Y_j(1)$ converges to $\mu_w(X)$, thus ensuring consistency of the estimator. The bias-adjusted k-means nearest neighbour matching estimator is given as:

$$\tilde{Y}_i(0) = \begin{cases} Y_i & \text{if } W_i = 0 \\ M^{-1}\sum_{j \in J_m(i)} \left[Y_j + \hat{m}_0(X_i) - \hat{m}_0(X_j)\right] & \text{if } W_i = 1 \end{cases}$$

$$\tilde{Y}_i(1) = \begin{cases} M^{-1}\sum_{j \in J_m(i)} \left[Y_j + \hat{m}_1(X_i) - \hat{m}_1(X_j)\right] & \text{if } W_i = 0 \\ Y_i & \text{if } W_i = 1 \end{cases}$$

Thus, the treatment effect using bias-adjusted matching estimators may be estimated as:

$$\hat{\tau}_{bcm} = \frac{1}{N}\sum_{i=1}^{N}\left[\tilde{Y}_i(1) - \tilde{Y}_i(0)\right]$$

In our case we compare the executive function accuracy and response time of a monolingual child with those of a bilingual child, matching on observed characteristics related to: i) Age in months; ii) Sex; iii) K-BIT-2; iv) SES; v) English proficiency factor; vi) Greek proficiency factor.
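The bias adjustment can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions and may differ from the specification used in the study or in Abadie and Imbens (2006): there is a single covariate, the regression functions are simple linear fits estimated separately on each whole group, and ties between equally distant candidates are broken by order of appearance. The function names and toy data are hypothetical.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x for a single covariate."""
    x_bar, y_bar = mean(x), mean(y)
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    a = y_bar - b * x_bar
    return lambda x_new: a + b * x_new

def bias_adjusted_ate(X, W, Y, m=1):
    """Matching estimate in which each matched outcome is corrected by a regression term."""
    # Regression function for each group (assumption: fitted on the whole group).
    reg = {w: fit_line([x for x, wi in zip(X, W) if wi == w],
                       [y for y, wi in zip(Y, W) if wi == w]) for w in (0, 1)}
    effects = []
    for i in range(len(X)):
        other = 1 - W[i]
        candidates = sorted((l for l in range(len(X)) if W[l] == other),
                            key=lambda l: abs(X[i] - X[l]))
        matches = candidates[:m]
        # Matched outcome plus the regression-predicted gap between X_i and X_j.
        adjusted = mean(Y[j] + reg[other](X[i]) - reg[other](X[j]) for j in matches)
        y1 = Y[i] if W[i] == 1 else adjusted
        y0 = Y[i] if W[i] == 0 else adjusted
        effects.append(y1 - y0)
    return mean(effects)

# Hypothetical data: outcome is 2 + 0.5 * x, plus a constant effect of 3 if treated.
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
W = [0, 1, 0, 1, 0, 1]
Y = [2 + 0.5 * x + 3 * w for x, w in zip(X, W)]
tau_bcm = bias_adjusted_ate(X, W, Y, m=1)
print(tau_bcm)
```

In this toy data no individual has an exact match on the covariate, so an unadjusted matched outcome carries the difference in $X$ into the estimate; the regression correction removes that discrepancy, which is the motivation for the adjustment described above.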