Online processing of which-questions in bilingual children: Evidence from eye-tracking

An emergent debate surrounds the nature of language processing in bilingual children as an extension of broader questions about their morphosyntactic development in comparison to monolinguals, with the picture so far being nuanced. This paper adds to this debate by investigating the processing of morphosyntactically complex which-questions (e.g., Which bear is chasing the camel?) using the visual world paradigm and is the first study to examine the online processing of such questions in bilingual children. For both groups, object which-questions were more difficult than subject which-questions, due to an initial misinterpretation that needed to be reanalysed. Both groups were aided by number mismatch between the two nouns in the sentence, especially in object which-questions. Our findings are in line with previous studies that have shown a slower processing speed in bilingual children relative to monolinguals but qualitatively similar patterns.


Introduction
Research on language acquisition in bilingual children has shown that they may perform less well than monolingual children in elicitation tasks tapping morphosyntax. However, most studies showing a gap between bilingual and monolingual children's morphosyntactic abilities have employed production tasks (e.g., Paradis, 2005; Paradis, Rice, Crago & Marquis, 2008; Unsworth, 2007). Over the last decade, studies examining comprehension instead of or in addition to production have enriched the literature and have so far brought varied findings. An emerging consensus is that bilingual children have knowledge of syntax and morphology, and that differences in production are attributable to production costs (Chondrogianni & Marinis, 2012, 2016). Studies employing grammatical violation paradigms to test comprehension have revealed that bilingual children show processing patterns similar to those of monolingual children and are sensitive to grammatical violations (e.g., Chondrogianni & Marinis, 2012, 2016). At the same time, they have shown that bilingual children process language at a slower rate compared to monolingual children, suggesting less efficient processing.
The existing literature on bilingual comprehension has, however, not provided direct insight into how bilingual children interpret sentences in real time. A hallmark of incremental sentence interpretation is that initially assigned interpretations may ultimately turn out to be incorrect. Such misinterpretations require revision and re-interpretation for successful comprehension (Trueswell, Sekerina, Hill & Logrip, 1999), which may be facilitated by various factors. Many existing studies examining online processing in bilingual children have adopted violation paradigms (Chondrogianni & Marinis, 2012, 2016). Whilst such tasks provide insight into sensitivity to different morphosyntactic violations, they do not tap into how a sentence is interpreted during incremental processing. To address this gap, we utilised the visual world eye-tracking paradigm, and temporary ambiguity in wh-questions, to examine incremental interpretation during comprehension. Furthermore, as it has been shown that morphosyntactic information can have a facilitatory effect during processing for adults and monolingual children by aiding disambiguation (e.g., Contemori, Carlson & Marinis, 2018; Schouwenaars, Hendriks & Ruigendijk, 2018), we also test whether this applies to bilingual children. In particular, we manipulated number agreement between the two NPs of a wh-question to assess if and how number agreement is utilised to facilitate processing of wh-questions in bilingual children.
There are several reasons why sentence processing in bilingual children may differ to that of their monolingual counterparts. Firstly, the input they receive in any language will inevitably be different to that of monolinguals in terms of quantity and quality of exposure and age of onset, among others. One possible consequence of this might be that bilingual children process language more slowly during comprehension than monolingual children. Indeed, the available evidence from ungrammaticality detection indicates that bilingual children are slower than monolinguals, although they face a similar slowdown to monolinguals with phrase-level ungrammaticality (Chondrogianni & Marinis, 2016). Secondly, while slower processing does not necessarily implicate qualitatively different processing patterns between monolingual and bilingual children, it is a possibility. For example, bilingual children might not compute a syntactic representation of what they hear as quickly as monolinguals, or they may be unable to integrate the available linguistic/nonlinguistic information quickly enough to guide processing while computing this representation. Slower processing in this sense might have knock-on effects that lead to different processing profiles between monolingual and bilingual children. The available evidence from off-line comprehension studies on how bilingual children process which-questions and utilise linguistic information to facilitate processing has indicated qualitative differences between bilingual and monolingual children at end-stage comprehension, at least for certain linguistic features such as case (Roesch & Chondrogianni, 2016). However, real-time processing of which-questions in bilingual children and its timecourse remain unexplored.

Sentence processing and the use of morphosyntactic information in bilingual children
Research in morphosyntactic processing in bilingual children is scarce. A number of studies have examined grammatical violations in bilingual children using self-paced listening and word-monitoring tasks, and have investigated the processing of tense, definite and indefinite articles, clitic pronouns, and gender agreement (Chondrogianni & Marinis, 2012, 2016; Chondrogianni et al., 2015; Vasić et al., 2012). Most studies indicate nativelike sensitivity to grammatical violations as evidenced by a slowdown at the ungrammatical segments in a sentence for both bilingual and monolingual children. One exception to these findings comes from Vasić et al. (2012), who report a lack of sensitivity to grammatical gender violations in Dutch. This is interpreted as indicating difficulties with lexical knowledge (grammatical gender) rather than difficulties in processing grammatical cues.
One study to test comprehension of German wh-questions as well as the use of morphosyntactic information as a facilitatory cue is Roesch and Chondrogianni (2016), who used a picture selection task with simultaneous and early sequential bilinguals. This study examined the use of case as a disambiguating cue where the position thereof was manipulated (initial, final and both). The impact of case differed across groups: monolinguals made consistent use of the cue and accuracy was higher when case disambiguated whether an NP was a subject or object; the simultaneous bilinguals did so only in the initial position, while the sequential bilinguals did not do so at all. This points to a reduced use of at least certain disambiguating cues, which may be further contingent on early exposure in bilingual children.
The few studies currently available which use eye-tracking to investigate real-time use of morphosyntax in bilinguals have examined the use of grammatical gender marking on determiners to predict the upcoming noun. Lew-Williams (2017) found that school-aged English-Spanish bilingual children were able to utilise number but not grammatical gender to predict upcoming nouns. This study, however, tested children in an immersion context where the L2 was not the majority language. Prediction based on grammatical gender was also investigated by Lemmerth and Hopp (2019) in simultaneous and sequential German-Russian bilinguals of the same age. While simultaneous bilinguals showed consistent effects of gender, as did monolingual controls, sequential bilinguals only showed similar effects for nouns with the same gender in Russian and German.
These results suggest reduced use of some types of morphosyntactic information to guide ambiguity resolution and prediction in some groups of bilingual children. It should be noted, however, that case and gender are fundamentally different morphosyntactic cues to number, which is examined in this study. Gender, for example, is idiosyncratic to particular lexical items and requires acquisition on a largely item-by-item basis, while case functions as a signal to word order and thematic roles. Number, on the other hand, is not lexically idiosyncratic, and is instead tightly related to conceptual information. As such, comparisons between our study on number and existing research on case and gender are only indirect. While bilingual children's use of case and gender has been examined in offline and online tasks, to date we are unaware of any existing study that has examined bilingual children's use of number agreement during real-time sentence processing.
In sum, research in real-time morphosyntactic processing in bilingual children has focused on morphology and has not so far expanded to filler-gap dependences, research on which has been limited to off-line comprehension. Thus, cognitive mechanisms such as incrementality and revision/re-interpretation in bilingual children during sentence processing remain essentially unexplored. Available evidence indicates similar but slower patterns of processing, but it is unclear whether bilingual children can utilise morphosyntactic information in real time in the same way as monolingual children to facilitate comprehension.

The (psycho)linguistics of wh-questions
Wh-questions and, more generally, filler-gap dependences have been widely investigated in first language acquisition and first language processing in adults (e.g., Deevy & Leonard, 2004; Frazier, 1987; Gibson, 1998; Goodluck, 2005; Grodner & Gibson, 2005; Rizzi, 2004). In such dependencies, a link needs to be made between dislocated overt and null elements (fillers and gaps respectively in generativist terminology), which can stretch over a number of words and syntactic constituents. Subject wh-questions are constructions where the dislocated element is the subject of the verb whereas object wh-questions are those where the object of the verb is dislocated, as in (1a) and (1b) respectively.
(1a) Which donkey is carrying the zebra? (subject which-question)
(1b) Which donkey is the zebra carrying? (object which-question)

For English and many other languages, object-extracted wh-questions exhibit greater difficulty than subject-extracted questions, manifested in lower comprehension accuracy and/or slower response times; this has been termed the "subject-object" asymmetry (Frazier, 1987; Stowe, 1989). One reason that object wh-questions cause difficulty, and the one most pertinent to this study, is that the initial noun phrase ('which donkey') is preferentially assumed to be a subject. Due to the absence of overt case morphology, the examples (1a) and (1b) are locally ambiguous until after the auxiliary verb. It is only after the first word after the auxiliary (in which-questions, the second NP) that the ambiguity is resolved. However, the parser does not wait until the second NP and instead will begin to construct a syntactic representation and an interpretation of the sentence immediately and incrementally. Initially, the parser prefers interpreting ambiguous sentences as in (1a) and (1b) as a subject-question. The increased difficulty for sentences such as (1b) arises from the mismatch between the preferred syntactic structure and interpretation and the ultimately correct syntactic representation. For object-questions, this interpretation will need to be revised after the second NP.
This discrepancy can be explained under numerous theoretical accounts, the testing of which is not the primary aim of this paper. For example, the active filler hypothesis (Clifton & Frazier, 1989) predicts that gaps are filled at the first available possibility, in an effort to keep the syntactic structure as simple as possible. Thus, both (1a) and (1b) are predicted to initially be interpreted as subject wh-questions, as the subject gap position becomes available first during incremental processing. Reanalysis will be required in (1b) when the sentence is disambiguated to the correct object gap interpretation. Under alternative probabilistic or experience-based accounts (Hale, 2001; Levy, 2008; Roland, Dick & Elman, 2007), speakers of English are predicted to adopt the heuristic of initially interpreting the first NP as the subject of the wh-question due to their exposure. Such accounts assume that (1a) and (1b) are initially interpreted as subject questions because these are more likely to be encountered. The difficulty hence arises from the mismatch between this expectation and ultimately the correct syntactic representation. Assuming monolingual and bilingual speakers are able to compute the appropriate structure, the active filler hypothesis would predict an initial subject-bias for both monolingual and bilingual children. Under probabilistic/experience-based accounts, a subject-first preference might also emerge in both monolingual and bilingual children, as subject/agent-first structures are more frequent in English than object-first ones.
The tendency towards a subject-bias is well attested for filler-gap dependences in English even in typical adult L1 speakers using a variety of methodologies. This can be evidenced from behavioural measures, such as slower reaction times for object questions (Stowe, 1989; see also Grodner & Gibson, 2005, for relative clauses), but also electrophysiological measures (Phillips, Kazanina & Abada, 2005) as well as eye-tracking measures (Staub, 2010). Similar results have been obtained for other languages, such as German (Schlesewsky, Fanselow, Kliegl & Krems, 2000) and Dutch (Frazier & Flores d'Arcais, 1989).
Difficulty in processing object-extracted wh-questions could also be attributed to locality effects predicted under a specific theoretical account within the generativist framework, namely Relativized Minimality (Rizzi, 1990, 2004; for empirical evidence in children, see Friedmann et al., 2009). Under this account, the syntactic dependency established through A'-movement (in this case, between the fronted wh-phrase and its original position) would be interrupted by the presence of an additional constituent, an intervener (in this case, the subject of the sentence). Crucially, under Relativized Minimality, the A' dependency is interrupted because the intervener and the initial constituent share morphosyntactic features (such as number), and as a consequence, the intervener can act as a potential candidate for the filler in the syntactic relation, thus yielding competition effects. This entails a significant prediction: that the intervening element will have this effect in sentences where the subject and object share the same number, as in (2a) and (2b), but not where they are different, as in (2c) and (2d), where the mismatch in number will function as a cue and aid comprehension.
(2a) Which donkey is the zebra carrying?
(2b) Which donkeys are the zebras carrying?
(2c) Which donkey are the zebras carrying?
(2d) Which donkeys is the zebra carrying?
A secondary aim of this study was to test this prediction in conjunction with the question as to whether bilingual children can utilise cues (in this case, morphosyntactic in nature) to facilitate processing.

Wh-questions in children
There is ample research on the acquisition of wh-questions in children, but the majority has focused on production or offline end-stage comprehension tasks. There is a substantially smaller body of evidence for online language processing. Wh-questions have been shown to be a challenging linguistic structure for children to acquire. Whereas object wh-questions appear early on in child language and around the same time as subject wh-questions in production (Stromswold, 1995), difficulties in the comprehension of object which-questions have been shown to persist until early school years (Deevy & Leonard, 2004; Goodluck, 2005). Similar difficulties with object wh-questions have also been observed across numerous typologically varying languages, such as French (Jakubowicz & Gutierez, 2007), Italian (De Vincenzi, Arduino, Ciccarelli & Job, 1999), Greek (Stavrakaki, 2006), and Hebrew (Friedmann, Belletti & Rizzi, 2009). Omaki, Davidson White, Goro, Lidz and Phillips (2014) examined the comprehension of embedded wh-questions in L1 English and L1 Japanese children as well as adult controls. In both languages, these are ambiguous, as it is not clear which gap position the fronted wh-phrase fills. Using an off-line comprehension task, participants were given stories and were asked a comprehension question, such as (3).
(3a) "Where did Lizzie say that she was gonna catch butterflies?"
(3b) "Where did Lizzie tell someone that she was gonna catch butterflies?"
(3c) "Where did Lizzie say to someone that she was gonna catch butterflies?"
The main clause interpretation for (3), where the wh-phrase was attached to the verb "say/tell" rather than "catch", was more frequent in the responses of both adults and children for English, suggesting a similar bias in adults and children towards placing the filler at the earliest possible gap. For Japanese there was a preference for an embedded clause interpretation of sentences such as (4).
(4) Doko-de Yukiko-chan-wa [kouen-de choucho-o tsukameru to] itteta-no?
where-at Yukiko-DIM-TOP pro park-at butterfly-ACC catch COMP was telling-Q
"Where was Yukiko telling someone that she would catch a butterfly at the park?"

However, due to differences in word order between Japanese and English (Japanese is a head-final language whereas English is head-initial), this mirror image reflects the same mechanism; the embedded clause in Japanese is centre-embedded and is the first clause of the two to have a position available for a gap. Therefore, all groups preferred to insert the filler in the earliest gap position available for globally ambiguous questions. When an additional prepositional phrase was added to specify the location of the embedded clause event, hence making the question locally ambiguous (English example: "Where was Yukiko telling someone that she would catch a butterfly at the park?"), the adults more often gave a main clause interpretation whereas the children did not. This suggests that the children had difficulty re-analysing the sentence. Omaki et al. show incremental processing and difficulties recovering from garden-path effects, but these conclusions are extrapolated on the basis of off-line findings.¹
In the first relevant eye-tracking study, Atkinson, Wagers, Lidz, Phillips and Omaki (2018) investigated locally ambiguous filler-gap dependences such as "Can you tell me what Emily was eating the cake with?" with children and found a bias towards filling the gap as early as possible for adults and 6-year-old children but not for 5-year-olds. The results are consistent with the existing literature which suggests that children process language incrementally.
To our knowledge, the first study to investigate processing of which-questions in children and how they utilise cues to aid processing using online measures is Contemori et al. (2018). The study used the visual world eye-tracking paradigm with L1 English children aged 5-7 years. Participants heard subject and object wh-questions like (1a) and (1b) while looking at a picture and needed to answer a comprehension question by clicking on the picture corresponding to the answer. Analysis of accuracy and gaze data showed a persistent disadvantage for object questions for the children. The gaze data showed similar processing mechanisms for children as for adults. For object which-questions, looks to the picture corresponding to the sentence heard initially decreased below chance and then increased. This change indicates that object-questions were initially interpreted as subject questions and only after the disambiguating second NP did the hearer reanalyse the questions and build a different syntactic representation.
Accuracy for object questions remained below that of subject questions indicating persistent difficulty with reanalysis consistent with Omaki et al. (2014).
Contemori et al. also tested the use of morphosyntactic cues to facilitate processing by manipulating the number of the two noun phrases so that it would either be the same (match) as in (5a) or different (mismatch) as in (5b).
It was expected that (5b) would be easier to process than (5a), as the number mismatch between the auxiliary and the first NP in (5b) provides an early cue for disambiguation before the second NP.² This is also predicted under Relativized Minimality, as the feature mismatch means that the second NP/intervener cannot function as a potential filler to the gap. Contemori et al. found that the mismatch in number resulted in higher comprehension accuracy for object questions. In terms of gaze data, a faster increase in looks to the picture corresponding to the sentence when the two NPs had a different number than when they were the same was taken to reflect the fact that number mismatch functioned as a facilitatory cue during real-time processing.
Finally, Schouwenaars et al. (2018) investigated the role of case and number agreement cues in subject and object which-questions in monolingual German children using the visual world paradigm. They found a similar pattern of looks as in Contemori et al., with object-questions initially misinterpreted and morphosyntactic cues aiding reanalysis. However, the children were slower to revise an initial misinterpretation for object questions where disambiguation was aided by number agreement only (the two NPs differed in number) relative to when there was disambiguation from both number and case.

The current study
The current study builds on the visual-world eye-tracking study from Contemori et al. (2018). Adopting the same research paradigm, we examine the processing of wh-questions in bilingual and monolingual children. To our knowledge, this is the first eye-tracking study on wh-questions in bilingual children. The aims were to examine which-question processing in bilingual children relative to monolingual children and potential differences between subject- and object-questions. We further explore the timecourse of processing to investigate whether there is incrementality in syntactic processing in bilingual children as has been established for monolinguals. Moreover, we examine the impact of number mismatch of two NPs as a facilitatory cue in line with predictions made under Relativized Minimality for bilingual children and for older monolingual English-speaking children.
An additional contribution of this paper is both methodological and theoretical. A limitation of Contemori et al. was the use of an incomplete paradigm, as only which-questions where the number of the first noun phrase was exclusively singular were used (i.e., SG-SG for the match condition, SG-PL for the mismatch condition). Wh-questions with a plural first noun phrase were not included (i.e., PL-PL and PL-SG for match and mismatch respectively). The plural number has been described as the marked number option relative to the singular in linguistics. Marked features have been associated with additional complexity in linguistics or difficulty in acquisition (Harley & Ritter, 2002; Haspelmath, 2006). They have also been associated with increased difficulty in processing as attested by an increased occurrence of attraction errors, albeit in a different type of syntactic dependency, subject-verb agreement (Wagers, Lau & Phillips, 2009). Therefore, it is unclear whether the effect of number mismatch for questions with a plural first NP will be the same as for questions with a singular first NP. To address this, this study also manipulated the number of the first noun phrase to be either singular or plural (henceforth "First NP") across all previous conditions. We tested older children (8-11 years) relative to Contemori et al. in order to extend the results to older monolingual children and to ensure that the experimental paradigm was usable with older bilingual children.
Our research questions are:
1. Do bilingual children differ to monolingual children in their ultimate interpretation of which-questions, i.e., is there evidence that both groups misinterpret object wh-questions?
2. How does number (mis)match influence offline comprehension? Is the effect of number (mis)match modulated by the number of the first NP?
3. Do bilingual children initially misinterpret object wh-questions as subject questions and does the timecourse of recovery differ between monolingual and bilingual children?
4. How does number (mis)match influence real-time comprehension of wh-questions in bilingual children? Is the effect of number mismatch modulated by the number of the first NP?
Research questions 1 and 2 can be addressed based on the end-result accuracy data and reaction times. Research questions 3 and 4 are examined based on the gaze data. Given that previous research suggests bilingual children process sentences more slowly than their monolingual peers (Chondrogianni & Marinis, 2012, 2016; Chondrogianni et al., 2015; Vasić et al., 2012), it is plausible that there will be differences between the two groups in this study in terms of processing. These may emerge for the reaction times and/or the gaze data. Previous research has, however, relied on reaction time data from self-paced listening studies. In this respect the timecourse of processing and bilinguals' real-time interpretation of sentences remain largely unknown. The evidence for slower processing found in previous studies may reflect either an overall slower but qualitatively similar processing mechanism or a slower processing mechanism alongside qualitative differences. Slower but qualitatively similar processing will be evidenced by a similar but delayed trajectory of looks towards the target image for the bilingual children relative to the monolinguals (i.e., same curve shape with a time-delayed overlay). If slower processing results in qualitative differences, bilingual children may not be able to compute a syntactic representation quickly enough in real time and thus not misinterpret it. If bilinguals initially misinterpret object questions as subject questions, looks to target for object questions will drop below chance and increase thereafter as in Contemori et al. This is not expected to be the case for subject questions, where looks will increase from the beginning and will plateau earlier.
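The predicted gaze signature for object questions (a dip below chance followed by recovery) can be made concrete with a small sketch. This is only an illustration under stated assumptions, not the analysis used in this study: the sample format, the 200 ms bin width, the chance level of 0.5 for a two-picture display, and the function names `proportion_looks_to_target` and `dips_below_chance_then_recovers` are all hypothetical.

```python
def proportion_looks_to_target(samples, bin_ms=200):
    """Proportion of looks to target per time bin.

    `samples` is a list of (time_ms, looked_at) pairs, where looked_at is
    "target" or "competitor". The bin width is an illustrative assumption.
    """
    counts = {}
    for time_ms, looked_at in samples:
        bin_start = (time_ms // bin_ms) * bin_ms
        target, total = counts.get(bin_start, (0, 0))
        counts[bin_start] = (target + (looked_at == "target"), total + 1)
    return {b: t / n for b, (t, n) in sorted(counts.items())}


def dips_below_chance_then_recovers(proportions, chance=0.5):
    """True if looks fall below chance in some bin and later rise above it.

    chance=0.5 assumes two pictures on screen (target and competitor).
    """
    values = list(proportions.values())
    below = [i for i, value in enumerate(values) if value < chance]
    if not below:
        return False
    return any(value > chance for value in values[below[0] + 1:])


# Hypothetical object-question trial: early looks go to the competitor,
# later looks go to the target.
object_trial = [(0, "target"), (100, "competitor"),
                (200, "competitor"), (300, "competitor"),
                (400, "target"), (500, "target")]
object_curve = proportion_looks_to_target(object_trial)
```

Under these assumptions, a subject-question trial whose proportions never fall below 0.5 would return `False`, whereas the object-question pattern described above would return `True`.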

Previous work on bilinguals' use of morphological cues has indicated a more nuanced and potentially reduced use of at least some morphosyntactic cues (case and gender), both in offline comprehension (Roesch & Chondrogianni, 2016) and in real-time processing (Lemmerth & Hopp, 2019; Lew-Williams, 2017). However, it is unclear whether bilingual children will be able to utilise number mismatch in the same way as monolingual children. Contemori et al. (2018) showed an effect of number mismatch for disambiguating object questions for both comprehension accuracy and real-time processing, where number mismatch resulted in higher accuracy and a faster increase in looks to target in comparison to object questions where the number matched. If bilingual children have difficulty integrating morphosyntactic information quickly enough during processing, they may be insensitive to number mismatch, unlike monolingual children. However, if bilingual children make use of the number mismatch in accordance with Relativized Minimality, there will be an interaction between structure and number match; in other words, the effect of number (mis)match should be present only in the object questions. The aforementioned models of language processing do not make explicit predictions about the effect of number, but it is expected that the unmarked forms will be easier to process for both groups.

Method
Participants
A total of 68 children from Grades 3-6 participated in this study: 37 monolingual children aged 7;10-11;6 (M = 9;7, SD = 1;1, 16 girls and 21 boys) and 31 bilingual/multilingual children aged 7;4-11;5 (M = 9;6, SD = 1;2, 17 girls and 14 boys), who were recruited from the same schools in the UK. None of the children had a history of language impairment or learning difficulty. All bilingual children had a minimum exposure to English of two years. All children undertook a series of baseline assessments including CELF-4 (Concepts & Following Directions, Word Classes, Formulated Sentences, Recalling Sentences), TROG-2, Renfrew Test of Word Finding, CNRep, and Raven's Coloured Progressive Matrices. All children scored within age-appropriate norms. As a group, the bilingual children underperformed the monolingual children on several measures of language but not on others. The results from the between-group comparisons of CELF-4 composite scores are summarised in Table 1 (for an overview of results from baseline …).

The children's language history was carefully documented through the use of the PABIQ questionnaire and brief semi-structured interviews. Background information about language development and use as well as parental education was collected. In terms of their linguistic background, the multilingual children came from a variety of backgrounds; these are summarised in Tables 2 and 3. The majority of bilingual children were classed as English dominant based on the PABIQ questionnaire and one third were rated as balanced in terms of language proficiency. Almost all children used English more often in community and educational settings, but language use at home was evenly divided between English dominant, L1 dominant and balanced. Sample size did not permit splitting the bilingual children into subgroups. However, measures of exposure to English, language proficiency and bilingual dominance were used individually as covariates in separate models fitted to the bilingual children's data to control for individual variation in performance. These measures were not significant and did not improve model fit.³ All monolingual children were born in the UK except two who were born in Australia and grew up with only English spoken in the home and in their environment. All but two bilingual children spoke L1s which overtly mark plurality in nouns based on the World Atlas of Language Structures (Dryer, 2013).
Ethical approval was granted from the School of Psychology and Clinical Language Sciences Research Ethics Committee. Children were recruited either through mainstream schools in the area of Reading and Southampton (UK) or privately through email or word of mouth. Separate information sheets and consent forms were completed by children and parents.

Design
The study used a visual world eye-tracking task. Participants heard a which-question and looked at two pictures. Both pictures contained two animate entities (animals) with one doing something to the other (e.g., carrying). The two pictures differed in that the thematic structure had been reversed, so the agent in the one picture was the patient in the other and vice versa. In each trial, one picture depicted the event with the argument structure corresponding to the one in the question the participants heard (henceforth "target"); the other depicted the reverse argument structure ("competitor"). After hearing the verbal stimulus and looking at the pictures, participants clicked on the picture that answered the question. The first within-subjects variable was the type of the which-question (subject vs. object). The second within-subjects variable was the number of the two noun phrases, so that the two could be either the same or different (match vs. mismatch), following Contemori et al. (2018). The third within-subjects variable was the number of the first noun phrase (singular first vs. plural first), which was not included in Contemori et al. This gave rise to 8 (2x2x2) conditions, as exemplified in Table 4. The between-subjects variable was language group (monolinguals vs. bilinguals).
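The factorial structure of the design can be sketched as follows. This is a minimal illustration, not the materials-generation procedure actually used: the function names and dictionary keys are hypothetical, the condition labels follow the "Subject SG-SG" pattern used in the Materials section, and the ten lexical sets are represented only by placeholder indices.

```python
from itertools import product

STRUCTURES = ("subject", "object")   # type of which-question
MATCHES = ("match", "mismatch")      # number of the two NPs: same or different
FIRST_NPS = ("SG", "PL")             # number of the first NP


def second_np_number(first_np, match):
    """Number of the second NP implied by the first NP and the match factor."""
    if match == "match":
        return first_np
    return "PL" if first_np == "SG" else "SG"


def build_conditions():
    """Enumerate the 2x2x2 within-subjects conditions."""
    conditions = []
    for structure, match, first_np in product(STRUCTURES, MATCHES, FIRST_NPS):
        second_np = second_np_number(first_np, match)
        conditions.append({
            "structure": structure,
            "match": match,
            "first_np": first_np,
            "label": f"{structure.capitalize()} {first_np}-{second_np}",
        })
    return conditions


conditions = build_conditions()

# Crossing the conditions with ten lexical sets (placeholder indices 1-10)
# gives the item count reported under Materials.
items = [(lexical_set, c["label"])
         for lexical_set in range(1, 11)
         for c in conditions]
```

Crossing eight conditions with ten lexical sets yields 80 items, which agrees with the number of which-questions reported under Materials.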

Materials
For each trial, one which-question and two pictures were used. Eighty which-questions were created by forming which-questions with ten lexical sets across all conditions. Each set of
Use outside home 26 1 3
Total language exposure 22 1 7
Relevant information in the parental questionnaire was not provided by the parents of two participants with regard to AoO and LoE, hence the discrepancy in the numbers. For the calculation of LoE, chronological age was used for children who were exposed to English at birth. For children who were exposed to English after birth, Age of Onset was subtracted from chronological age. For the dominance scores, mean and SD are not calculated as the measure is ordinal. Instead, counts are reported. Children with scores < -2 are considered English dominant, those with scores > 2 L1 dominant, and those with scores between -2 and +2 balanced for the particular measure. These are calculated based on parental ratings.
2018), but additional pictures for the novel conditions in this study were created by copying, to ensure maximal visual similarity. The size and visual features of target and competitor were similar to the greatest degree possible. An example of the visual stimuli is shown in Figure 1, where the picture on the right is the target for Subject SG-SG and competitor for Object SG-SG; the reverse is true for the picture on the left. For the trials with the mismatch condition, the singular and plural entity was the same in both pictures. The position of target and competitor was counterbalanced across conditions. As a result, except for the structure of the sentence heard, there were no cues to adjudicate between target and competitor.
Trials were pseudo-randomised so that each set of nouns occurred at varying intervals from 2 to 19 intervening trials (mean = 9.18, SD = 3.78). Furthermore, no trials were permitted to follow trials of the same condition, although adjacent trials with the same level of a single variable (e.g., a subject question followed by a subject question) were permitted and occurred in around half the trials. A single list was used for this study; therefore, a random intercept of trials was initially allowed to control for effects of a single trial occurring in a fixed place. This, however, was removed as it contributed little to the variance and did not improve the model fit.
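The ordering constraints described above can be illustrated with a minimal Python sketch. It is not the procedure used to build the study's list; it assumes a greedy random construction with restarts, a minimum gap of two intervening trials between repetitions of a lexical set, and hypothetical names (`build_trials`, `pseudo_randomise`, `MIN_SET_GAP`). It does not attempt to reproduce the reported interval statistics.

```python
import itertools
import random

STRUCTURES = ("subject", "object")
MATCH = ("match", "mismatch")
FIRST_NP = ("singular", "plural")
N_SETS = 10       # ten lexical sets, as stated in the text
MIN_SET_GAP = 2   # assumed: at least two intervening trials between repeats of a set


def build_trials():
    """Return all 80 trials as (lexical_set, condition) pairs (10 sets x 8 conditions)."""
    conditions = list(itertools.product(STRUCTURES, MATCH, FIRST_NP))
    return [(s, c) for s in range(N_SETS) for c in conditions]


def is_allowed(order, trial):
    """Check both ordering constraints against the trials placed so far."""
    lexical_set, condition = trial
    if order and order[-1][1] == condition:
        return False  # same full condition on adjacent trials is not permitted
    recent_sets = [s for s, _ in order[-MIN_SET_GAP:]]
    return lexical_set not in recent_sets


def pseudo_randomise(seed=0, max_restarts=10_000):
    """Greedy random construction with restarts (illustrative, not the authors' method)."""
    rng = random.Random(seed)
    for _ in range(max_restarts):
        remaining = build_trials()
        rng.shuffle(remaining)
        order = []
        while remaining:
            candidates = [t for t in remaining if is_allowed(order, t)]
            if not candidates:
                break  # dead end: start again from a fresh shuffle
            choice = rng.choice(candidates)
            order.append(choice)
            remaining.remove(choice)
        if not remaining:
            return order
    raise RuntimeError("no valid order found")
```

Adjacent trials may still share one level of a single variable (e.g., two subject questions in a row), which matches the description above; only a repeat of the full condition is blocked.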

Procedure
The experiment took place in a quiet room with the participants wearing headphones. A Tobii X120 (Tobii Technology AB, Sweden) eye-tracker measured the participants' eye gaze, tracking eye position at a sampling rate of 120 Hz. The eye-movement data reported are an average of both eyes. Stimulus presentation and eye-gaze data collection were conducted using E-Prime (Schneider, Eschman & Zuccolotto, 2002). Testing started with a 5-point calibration procedure. The experimenter (first author) judged the quality of the calibration by examining the calibration plot for the five points. Quality of calibration was judged as adequate when the eye-tracker captured the participant's looks at all 5 points and there was limited drag, in line with the guidance provided in the Tobii X120 manual. Participants sat on a chair at about 60 cm from the screen, although this was adjusted somewhat to facilitate calibration.
During the task, a fixation cross appeared in the centre of the screen before the onset of each trial, which participants needed to fixate upon for 1000 ms for the trial to begin. This also functioned as a calibration check, as the fixation would only register if adequately calibrated. Participants heard a question over a set of headphones and saw two pictures, one on each side of the screen. Following the question, a cursor appeared on the screen. The two pictures were kept constant, and the participants needed to click on a picture on the screen to select the target while looking at the pictures. There was no time limit for participants to select a picture, but the mouse click was not allowed until after the audio file had finished. The order of the stimuli was pseudo-randomised to avoid the same condition in adjacent trials, and the trials were split into two blocks of 40. Total testing time for the children was about 10-15 minutes per block, excluding the time needed to calibrate.

Analyses
Accuracy, reaction time, and gaze data were collected and analysed. As the duration of the trials with subject questions was longer than that of the trials with object questions (mean = 2,727 ms, SD = 126 vs. mean = 2,533 ms, SD = 122), the participants' reaction times were defined as the difference between the time the participant needed to click on the selected picture and the duration of the trial. There were no negative times.
For the gaze data, two areas of interest (AOI) were defined a priori in E-Prime, capturing the left and right half of the screen and corresponding to the picture presented on each side in each trial. Eye-movement data were time-locked to the onset of the auxiliary verb, as in Contemori et al. This timepoint allows one to capture effects of misinterpretation and is the earliest point at which number mismatch can disambiguate. It is expected that looks will initially be approximately equal for both pictures as the participants explore the visual stimuli and will subsequently increase for the picture consistent with a subject-biased interpretation. We did not time-lock to the second NP, as this would miss any effect of misinterpretation occurring at the point of the structural ambiguity and would be less likely to reflect incremental and subconscious processing in the later time bins.
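As a worked illustration of this definition, the sketch below subtracts the stimulus duration from the click time. The function name and the example click time are hypothetical, and treating "duration of the trial" as the duration of the audio file is an assumption.

```python
def reaction_time_ms(click_time_ms, stimulus_duration_ms):
    """Reaction time measured from the end of the spoken question (assumed definition)."""
    rt = click_time_ms - stimulus_duration_ms
    if rt < 0:
        # Clicks were not possible before the audio ended, so this should never occur.
        raise ValueError("click registered before the stimulus finished")
    return rt


# Hypothetical trial: a click 4,200 ms after question onset, for a question
# of average subject-question duration (2,727 ms).
example_rt = reaction_time_ms(4200, 2727)  # 1473 ms
```

Subtracting the duration in this way removes the roughly 200 ms length difference between subject and object questions from the comparison.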
A window of 200 ms was allowed for the time it takes to program a saccadic eye movement (Matin, Shao & Boff, 1993), such that eye movements were analysed for a period of 2 seconds (200-2200 ms post auxiliary onset). Incorrect trials were removed from the analyses, consistent with Contemori et al. and standard practice with this type of data.⁴ This resulted in the loss of around 5% of trials with subject questions and 15% of trials containing object questions, due to lower comprehension accuracy in the latter. The time period examined was divided into ten equal bins of 200 ms. For each bin, the proportion of looks to target relative to competitor was calculated. These proportions were quasi-logit transformed to compute the empirical logit, which better handles cases where the probability is high or low (Barr, 2008).
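The binning and transformation steps can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: it uses the empirical logit and variance formulas given by Barr (2008), works from sample counts per area of interest, and the function names and input format are hypothetical.

```python
import math

BIN_MS = 200
WINDOW_START_MS = 200   # 200 ms after auxiliary onset
N_BINS = 10


def bin_index(t_ms):
    """Map a time (ms after auxiliary onset) to a bin 0..9, or None if outside the window."""
    if not WINDOW_START_MS <= t_ms < WINDOW_START_MS + N_BINS * BIN_MS:
        return None
    return int((t_ms - WINDOW_START_MS) // BIN_MS)


def empirical_logit(target, competitor):
    """Empirical logit and its approximate variance, following Barr (2008)."""
    elog = math.log((target + 0.5) / (competitor + 0.5))
    variance = 1.0 / (target + 0.5) + 1.0 / (competitor + 0.5)
    return elog, variance


def summarise_trial(samples):
    """samples: iterable of (time_ms, aoi), with aoi "target", "competitor" or None."""
    counts = [[0, 0] for _ in range(N_BINS)]
    for t_ms, aoi in samples:
        b = bin_index(t_ms)
        if b is None or aoi is None:
            continue
        counts[b][0 if aoi == "target" else 1] += 1
    rows = []
    for b, (tgt, comp) in enumerate(counts):
        if tgt + comp == 0:
            rows.append((b, None, None))  # no looks to either picture: missing
            continue
        elog, var = empirical_logit(tgt, comp)
        rows.append((b, elog, 1.0 / var))  # weight = reciprocal of the variance
    return rows
```

Adding 0.5 to each count keeps the transformed value finite when all looks in a bin fall on one picture, which is where raw proportions of 0 or 1 would otherwise be problematic.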
The analysis was conducted using logistic mixed effects models for accuracy, linear mixed effects models for reaction time and a growth curve spline function for the gaze data, in line with Contemori et al., with crossed random effects for subjects and lexical items (Baayen, Davidson & Bates, 2008) implemented in the lme4 package in R (Bates, Maechler, Bolker & Walker, 2015, version 3.5-0). To control for the age range and the variability in the children's language skills, age and the Core Language Score (CLS) from the CELF-4 were entered as continuous variables into the model. Interactions with other variables were not included for purposes of model convergence. The reasoning for selecting the CLS is two-fold. Firstly, the CLS is a composite score which best reflects a child's linguistic competence, as it is composed of the scores of several diverse tasks. Secondly, adding scores from numerous baseline assessments as predictors or covariates requires larger datasets for a model to converge and may not be meaningful, as the scores on individual tasks may be correlated where they reflect the same aspect of a child's linguistic competence. As bilingual proficiency may be influenced by length of exposure (henceforth LoE) and language dominance, we fitted a second model to the data for bilingual children only, using length of exposure and two dominance scores calculated based on responses in the parental questionnaire alongside the fixed effects. Dominance was defined as the difference in the proficiency scores in the two languages, and exposure was defined as the difference in composite scores for exposure to each of the two languages across numerous settings (e.g., school, home, friends). For these two measures, a negative score indicated dominance in English and greater exposure to English, respectively. Age and language proficiency or dominance scores were entered only as main effects into the models, as including interaction terms of background measures with the independent variables generally deteriorated the model fit.
For the gaze data, a growth curve model was fitted to looks to the correct picture to capture change as a non-linear function of time (Mirman, Dixon & Magnuson, 2008; Mirman, 2014). Time was coded as a restricted (or natural) cubic spline with 4 equidistant knots, creating three different components⁵ (Harrell, 2001). This type of transformation captures the non-linear change in time as the independent variable is transformed to include a linear, a quadratic and a cubic component. The use of a spline function adds further flexibility to the non-linear modelling of change over time by allowing the function and its parameters to differ across the components. In this type of modelling, significant main effects and interactions on the intercept term signify overall differences irrespective of time, as in more conventional models. Significant main effects and interactions on the spline's components, i.e., those which involve time, signify that the shape of the growth curve varies between the different levels of the independent variable, e.g., faster or slower growth rate. We conservatively focus on those effects that were significant on a minimum of two of the three components of the spline, as these would reflect the most consistent patterns in the participants' behaviour.
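To make the spline coding concrete, the sketch below builds a restricted cubic spline basis in the truncated-power form given by Harrell (2001), in which four knots yield three columns: one linear term and two non-linear terms that are constrained to be linear beyond the outer knots. The knot placement at bins 1, 4, 7 and 10, the scaling constant and the function names are assumptions for illustration; the authors' R implementation may construct the basis differently.

```python
def cube_plus(u):
    """Truncated cubic: u**3 for u > 0, otherwise 0."""
    return u ** 3 if u > 0 else 0.0


def rcs_basis(x, knots):
    """Restricted cubic spline basis row for x: k knots give k - 1 columns."""
    k = len(knots)
    t_last, t_prev = knots[-1], knots[-2]
    scale = (knots[-1] - knots[0]) ** 2   # puts the columns on a comparable scale
    row = [float(x)]                      # first component: linear in time
    for j in range(k - 2):
        t_j = knots[j]
        term = (
            cube_plus(x - t_j)
            - cube_plus(x - t_prev) * (t_last - t_j) / (t_last - t_prev)
            + cube_plus(x - t_last) * (t_prev - t_j) / (t_last - t_prev)
        )
        row.append(term / scale)
    return row


# Ten 200 ms bins with four equidistant knots across the bin range (assumed placement).
bins = list(range(1, 11))
knots = [1.0, 4.0, 7.0, 10.0]
design = [rcs_basis(b, knots) for b in bins]  # 10 rows x 3 spline components
```

Each of the three columns would then enter the model as a time term, so that an effect "on a component of the spline" corresponds to an interaction between a predictor and one of these columns.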
Sum coding (-1, 1) was used for the between-subjects variable (monolingual vs. bilingual) and for the fixed main effects of 'structure' (subject vs. object which-question), 'number matching' (match between the two NPs vs. mismatch) and 'first NP' (singular vs. plural) for all three metrics. Time, defined as 200 ms bin number, was scaled in order to conduct the growth curve analysis. Trials where there were no looks to either target or competitor were not included, as the computed empirical logit value would be undefined (zero divided by zero), and were thus treated as missing data. Weights were added to each observation based on the reciprocal of the variance (i.e., 1/variance).
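The coding scheme can be illustrated with a short sketch. Which level of each factor receives -1 is not stated in the text and is assumed here; the dictionary and function names are hypothetical.

```python
import itertools

# Assumed assignment of levels to -1 / +1; the text specifies only the (-1, 1) scheme.
SUM_CODES = {
    "group": {"monolingual": -1, "bilingual": 1},
    "structure": {"subject": -1, "object": 1},
    "number_match": {"match": -1, "mismatch": 1},
    "first_np": {"singular": -1, "plural": 1},
}


def code_trial(trial):
    """Convert factor labels to sum-coded predictors, plus one example interaction term."""
    coded = {name: levels[trial[name]] for name, levels in SUM_CODES.items()}
    coded["structure_x_match"] = coded["structure"] * coded["number_match"]
    return coded


# Across the balanced 2 x 2 x 2 within-subjects design, every coded column sums to zero,
# so the intercept estimates the grand mean and each main effect is centred on it.
cells = [
    code_trial({"group": "bilingual", "structure": s, "number_match": m, "first_np": f})
    for s, m, f in itertools.product(
        SUM_CODES["structure"], SUM_CODES["number_match"], SUM_CODES["first_np"]
    )
]
column_sums = {
    name: sum(cell[name] for cell in cells)
    for name in ("structure", "number_match", "first_np", "structure_x_match")
}
```

With this coding, a main effect is evaluated at the average of the other factors rather than at a reference level, which is why main effects can be read alongside interactions.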
For all three previously listed metrics, the maximal model permitted by the design that converged was used, with correlation parameters removed (Barr, Levy, Scheepers & Tily, 2013). This included all independent variables and by-subject and by-item random intercepts and slopes for all fixed effects. For the eye-tracking data, a single model was fitted for all data, instead of multiple models for each time bin, with bin (i.e., time) as an additional fixed effect. This resulted in each trial having multiple interdependent data points. Therefore, a third random intercept was allowed, that of trial ID (the unique pairing of subject number and lexical item which defined a trial). When a model failed to converge, the random effects that accounted for the least variance were iteratively removed until the model converged. The raw data and code for each analysis can be found at https://osf.io/4w693/.

Results
To examine RQ1 and RQ2, we analysed the comprehension accuracy and reaction time data. An overview of the results for the accuracy data and the reaction times can be found in Table 5, followed by the results for the models in Table 6.

Accuracy data
There was a significant main effect of syntactic structure, with lower accuracy for object than subject questions. Neither the main effect of group nor that of number match, nor any interactions with group, were significant. There was a marginally significant main effect of the number of the first NP, with higher accuracy for questions with a singular first NP, as well as a significant interaction of structure by number of first NP and an interaction of structure by number match. Overall accuracy also significantly improved with age and language proficiency. Given that subject and object questions have shown differential effects in previous studies (e.g., Contemori et al., 2018) and given the significant structure by number (mis)match interaction, separate analyses were carried out for subject and object questions. A main effect of number match was found for object questions, where questions with a mismatch in number between the NPs resulted in higher accuracy than when there was a match. This was not found for the subject questions (Table 8 in the Appendix). For the bilingual children, accuracy did not improve with length of exposure, quantity of exposure or increased dominance in English, and these measures did not improve the model fit.

Reaction times
There was a significant effect of structure, with slower reaction times for object questions, and an effect of number (mis)match, with faster reaction times when there was a mismatch between the number of the two NPs. Neither the effect of group nor that of number of first NP was significant. Reaction times became faster with age, but there was no significant effect of language proficiency, unlike with the response accuracy. The only significant interaction was the group by number (mis)match by first NP number interaction. As separate models for monolingual and bilingual children yielded no further effects, this interaction is not discussed further (see Table 9 in the Appendices for the output). Length of exposure, language dominance and quantity of exposure to English were not significant predictors of reaction times in the bilingual children.

Gaze data
To examine RQ3 and RQ4, regarding the processing of which-questions in real time, we analysed the gaze data. We first present a visual overview of the data as well as an overview of the model output (Table 7; the full model can be found in Appendix C). Subsequently, we outline the significant effects found. We first report the significant main effects and interactions on the intercept, i.e., those which do not involve time. These reflect overall aggregate differences irrespective of time and do not speak to the trajectory of looks to target. We then report main effects and interactions on the spline. The latter reflect differences in the shape of the curve, i.e., change over time. For effects on the splines, we report up to two-way interactions for reasons of conciseness and as these are the most readily interpretable, although the full list of fixed effects can be found in the appendices (Table 10).

Effects on intercept term
There was a significant effect of structure, with fewer looks to the target picture for object questions than for subject questions overall. There was no significant difference in total looks towards the target between the bilingual and monolingual children, nor was there a significant effect of number (mis)match or first NP number. Similarly, age and language proficiency were not significant predictors of the total amount of looks towards the target. For the bilingual children, length of exposure to English, quantity of English exposure and language dominance were not significant predictors of looks to the correct picture.

Interactions on intercept term
There were significant interactions between structure and number (mis)match, and between structure, number (mis)match and first NP number, neither of which interacted with group. To further explore the significant interactions, models were fitted to subject and object questions separately (Table 11).
Focusing on the effects of match, an effect of number match was found for object questions, but this did not reach significance for the subject questions. This suggests an effect of number mismatch in facilitating processing of object questions but not subject questions.

Effects on spline/growth curves
There was a significant effect of time on looks to the target picture on all three components of the spline, suggesting that the amount of looks to the target changed continuously. There was a significant main effect of group on all three components of the spline. This suggests differences in the trajectory of looks between monolingual and bilingual children, with the bilingual children showing less pronounced increases in looks to target. There was also a significant effect of structure on the first and the third component of the spline. However, the impact of number mismatch was weak, with only a trend for number mismatch in the third component of the spline. There was no effect of first NP number on any components.

Interactions on spline/growth curves
The only significant two-way interactions were an interaction of group by number (mis)match on the first component of the spline and an interaction of structure by number (mis)match, again on the first component of the spline. Visual inspection of the data (Figure 2) suggests an increase in looks to target for subject questions following chance performance prior to the auxiliary verb, and an initial decrease for object questions followed by a slower increase. This reflects an initial misinterpretation of the latter as subject questions and a reanalysis of the structure upon disambiguation. In terms of between-group differences in the trajectory of looks, visual inspection of the data suggests a generally slower increase in looks to target for the bilingual children, i.e., a flatter slope. This results in greater differences in looks between the two groups after about 2,000 ms post onset of the which-question.⁶ For the object questions in particular, looks in the bilingual children do not appear to drop as substantially below chance as they do for the monolinguals, suggesting that the garden-path effects and the immediate re-interpretation of the ambiguous question may be taking place over a more protracted period.

Discussion
The present study is among the first to examine sentence processing in bilingual children using the visual world paradigm. Subject and object which-questions were utilised to examine incrementality and timecourse of processing alongside the utilisation of morphosyntactic cues to aid disambiguation, in this case number mismatch between the two NPs.
The results show that bilingual children did not perform significantly worse than the monolinguals on any of the metrics and that both groups showed increased difficulty with object questions. Unlike previous studies that relied on offline comprehension questions, our use of the visual world paradigm allows us to claim that in both groups this was due to the initial misinterpretation of the ambiguous first NP as a subject NP, thus providing clear evidence of incremental processing in bilingual children. The bilingual children differed from the monolinguals in that they had a more gradual increase in looks to the target picture in the gaze data.
RQ1: Do bilingual children differ to monolingual children in their ultimate interpretation of which-questions?
In both groups, object which-questions were more difficult than subject which-questions. This is reflected in both lower comprehension accuracy and slower reaction times when the question was comprehended correctly. This is in line with a vast body of literature suggesting a subject-object asymmetry in filler-gap dependencies (e.g., Contemori et al., 2018; Deevy & Leonard, 2004; Friedmann, Belletti & Rizzi, 2009; Goodluck, 2005; Grodner & Gibson, 2005; Stavrakaki, 2006; Stowe, 1989). As illustrated in the gaze data, this is due to an initial misinterpretation of object questions as subject questions and the subsequent need to reanalyse them once the first parse proves untenable. However, the bilingual children were neither significantly slower nor significantly less accurate than their monolingual peers in answering the comprehension questions. This suggests that bilingual children show similar performance to monolingual children overall; both groups show a subject bias with ambiguous NPs and both groups were usually able to recover from induced garden-path effects and accurately reanalyse the sentence upon disambiguation. The fact that comprehension accuracy, i.e., the response after hearing the complete question, is lower for object relative to subject questions suggests that reanalysis is challenging for the parser and may not always be successful. This is in line with previous studies with children (Trueswell et al., 1999) and has been attributed to lingering misinterpretations observed even in adults (Slattery et al., 2013). Bilingual children did not have significantly greater difficulty with object questions in comparison to the monolinguals, as the interaction of group and structure was not significant for accuracy or reaction times. The above suggests that the processing mechanisms for sentences are similar in bilingual and monolingual children.
The current study is in line with previous studies on sentence processing in bilingual children that have also shown no differences in comprehension accuracy (e.g., Chondrogianni & Marinis, 2012; Chondrogianni et al., 2015; Vasić et al., 2012). Contrary to these studies, we did not find slower reaction times for the bilinguals. However, as previous studies used self-paced listening, the reaction times obtained there reflect segment-by-segment real-time processing of a sentence, whereas in this study reaction times reflected the time taken to respond to the question after participants had heard the entire sentence.
Accuracy is higher for the monolingual children in this study relative to the children in Contemori et al. (2018). This discrepancy could be due to the fact that the children in this study are older than in Contemori et al. As the parser matures, the child's ability to successfully revise misinterpretations becomes more robust, leading to higher comprehension accuracy once the participant has heard the entire which-question.
RQ2: How does number (mis)match influence offline comprehension?
Number mismatch was found to have a facilitatory effect for object questions but not for subject questions. This was consistent for both bilingual and monolingual children, as evidenced by the absence of group by number (mis)match interactions. These results are similar to those reported by Contemori et al. (2018). This is in line with Relativised Minimality. The reason for the benefit of number mismatch exclusively for object questions is that it is redundant for subject questions; it is only for object questions that the initial interpretation will be erroneous due to the subject bias. Moreover, as with Contemori et al., accuracy for subject questions in this study showed ceiling effects, thus making any additional benefit from morphosyntactic features redundant. Roesch and Chondrogianni (2016) showed that bilingual children could utilise case to facilitate object which-question processing in German. However, they found that this was limited to simultaneous and early sequential bilinguals and not found in late sequential bilinguals. We did not analyse our data by type of exposure due to the small sample sizes that this would entail. However, we contend that the difference between our study and the study by Roesch and Chondrogianni (2016) may relate to the morphosyntactic features tested.
Reaction times showed an overall benefit of number mismatch irrespective of question type. This is unexpected and would not be predicted by Relativised Minimality, nor would it be related to a subject-biased initial interpretation of ambiguous NPs. It cannot be compared to the results in other studies (Contemori et al., 2018; Roesch & Chondrogianni, 2016) as these do not report results from reaction times.
RQ3: Do bilingual children initially misinterpret object wh-questions as subject questions and does the timecourse of recovery differ between monolingual and bilingual children?
The use of eye-tracking is the novel component of this line of research into sentence processing in bilingual children, as it allows us to better understand the cognitive mechanisms involved in processing. The results from this study show that both bilingual and monolingual children initially misinterpret object questions as subject questions. For both groups, looks to the target are initially at around chance, suggesting the parser has not yet committed to a specific interpretation. Shortly after the auxiliary verb, there is a decline in looks to the target for object questions, reflecting an increase in looks to the competitor, which corresponds to a subject reading of the ambiguous NP. This was also clearly found in Contemori et al. (2018) for younger children and adults, and in Schouwenaars et al. (2018) for L1 speakers of German in a comparable age range.
The findings are consistent with both structure-based accounts (e.g., Active Filler hypothesis, Frazier & Clifton, 1989) and probabilistic accounts (e.g., Levy, 2008) of the subject-object asymmetry found in filler-gap dependencies. Although our design did not attempt to tease apart these different accounts, our results indicate that erroneous initial parses and garden-path effects in bilingual children are the result of their sentence processing being incremental, as it is in monolingual children.
Where bilingual and monolingual children differ is in the timecourse of real-time processing of which-questions, as evidenced by the effects of group on all components of the spline. For object questions, the looks to the target remain at a low point for longer than in monolingual children (400-600 ms vs. 200-400 ms, see Figure 2). As the reorientation of looks to the target is taken to signify a re-interpretation of the ambiguous sentence and a recovery, we interpret this discrepancy in the timings of the increase in looks as a form of slower processing. The second difference between the two groups is in the subsequent increase in looks thereafter. In the bilingual children, the increase in looks to the target is less steep than in the monolingual children after they begin to reorient their looks. Visual inspection of the mean proportion of looks and the standard error suggests this results in significantly fewer looks to target for the bilinguals relative to the monolinguals after 2,000 ms after the onset of the question. Differences in steepness of the growth curves are taken to reflect differences in speed of processing, consistent with previous work in growth curve analysis (Mirman, 2014; Mirman et al., 2008). In this sense, the results from this study are conceptually similar to other studies using self-paced listening which show equal accuracy but slower speed of processing in bilingual children (e.g., Chondrogianni & Marinis, 2012; Chondrogianni et al., 2015; Vasić et al., 2012).
Rather than reflecting qualitatively different patterns of processing, a tentative explanation for this difference in speed is that it results from two linguistic systems remaining active during bilingual sentence comprehension. Slower processing has been shown for bilinguals both for lexical processing (e.g., Blumenfeld & Marian, 2007; de Bruin, Della Sala & Bak, 2016) and for sentence processing during production (e.g., Bernolet, Hartsuiker & Pickering, 2007; Desmet & Declercq, 2006; Loebell & Bock, 2003; Hartsuiker, Pickering & Veltkamp, 2004). Alternatively, one could attribute these differences in processing speed to differences in the input, due to the bilingual speakers presumably having less input in English than the monolingual children. However, note that these differences were observed even after language proficiency was controlled for in the analyses. Moreover, measures related to the input bilinguals received in English (LoE and dominance in use/proficiency) did not consistently predict the bilinguals' performance. Further research is needed to tease apart these two potential accounts of the observed differences in processing speed. Importantly, however, the absence of a group by structure interaction indicates no additional processing burden for object questions for the bilingual children relative to their monolingual peers.
RQ4: How does number (mis)match influence real time comprehension of wh-questions in bilingual children?
Evidence for the effect of number match facilitating processing in real time was moderate. There were overall more looks to the target when there was a mismatch in number between the two NPs in the question than when the number matched for the object questions, but no such effect for the subject questions. This indicates that number mismatch had a facilitatory effect on the processing of which-questions, as was also found for the off-line measures from this study and in Contemori et al. (2018) and Roesch and Chondrogianni (2016). This finding is again in line with Relativised Minimality. While number mismatch appears to aid monolinguals more than bilinguals (Figure 2), there was no significant group by number match (by structure) interaction.
Contemori et al. found an effect of number (mis)match on looks to target for object but not subject questions, similarly to the present study, and also a faster increase in looks to target in the children for object questions with number mismatch than for those where the number matched. This was not found for subject questions, nor was it found for the adults. In this sense, the children from the present study, who are older, behaved similarly to the adult controls in the study by Contemori et al. This could be associated with developmental changes in the parser's capacity; the children in the current study (aged 8-11 years) had a more adult-like parser than those in Contemori et al. (aged 5-7 years). Visual inspection of the modelled data in Contemori et al. shows that the children's looks to the target have a noticeably less steep increase relative to both the adult control data and the gaze data from this study. In fact, for the number match condition, looks to target do not rise significantly above chance at any point for the object questions. This is not the case for the adult data in Contemori et al. or for the participants in the current study, where looks to target increase beyond chance for the object questions and show a similar sine-like pattern of decrease and increase.
In German, Schouwenaars et al. (2018) found that children utilised case to disambiguate wh-questions in real time and could also utilise number agreement in the presence of disambiguating case information. However, number agreement alone did not facilitate disambiguation (note that this condition is the one most comparable to the number mismatch manipulation in this study). Schouwenaars et al. argue the advantage for case is due to the fact that it is marked on the first NP, thus acting as an early disambiguating cue (whereas for object questions, number agreement is marked on both the verb and the second NP). Roesch and Chondrogianni (2016) also found an advantage for early cues in disambiguation. Case is not overtly marked on nouns in English and therefore number agreement is the only disambiguating cue available. Therefore, number agreement may have greater functional value as a cue in sentence processing in English than in languages with more complex overt inflectional morphology. This could explain the differences in findings between Schouwenaars et al. and the present study.
The findings from this study differ from other eye-tracking studies on the use of morphosyntactic cues during sentence processing in bilingual children (Lemmerth & Hopp, 2019; Lew-Williams, 2017). These have given a more nuanced picture, with cue utilisation being more contingent on the exposure to the second/additional language or the properties of the language per se. We believe there are two explanations for this. Firstly, the children in this study are mostly simultaneous or early sequential bilinguals, who show more monolingual-like patterns, as is the case with the simultaneous bilinguals in Lemmerth & Hopp. Greater divergence from monolingual patterns is observed in the sequential bilinguals in Lemmerth & Hopp and in Lew-Williams, where the children are L2 learners. The second explanation lies in the nature of what is tested. Lemmerth and Hopp as well as Lew-Williams tested predictive processing and how this can be facilitated through gender marking. The latter is highly lexical, and its acquisition is therefore expected to need a large quantity of input. There is no reason to expect this to be the case for incrementality in sentence processing and the need for revision when experiencing garden-path effects.
The effects of first NP number
One limitation in Contemori et al. is that all trials with number mismatch had a singular first NP and a plural second one. Extending the paradigm from Contemori et al., we manipulated the number of the first NP to be either singular or plural. Our findings suggest that this limitation did not impact the findings from Contemori et al. in terms of the effect of number mismatch as a facilitatory cue during processing. We did observe an effect for off-line comprehension accuracy, where accuracy was lower for trials with a plural first NP. This may be related to markedness; the plural NP is the marked form and may thus be harder. Alternatively, this difficulty may be attributed to the fact that singular which NPs are simply more felicitous, as they require the selection of a single entity from a choice of two rather than a set of two entities from a choice of two sets. We note that this effect is observed only in response accuracy, and not in the eye-movement data or reaction times. Although this may suggest that the effect occurred during the question-answering phase, rather than during online processing of the critical sentences, we are cautious in drawing strong conclusions here given that the effect was observed in only one measure.

Heterogeneity in bilingualism
One potential limitation of the current study is the heterogeneity of the population in terms of age, proficiency in English and linguistic background. To address this variability, we fitted the models with age and language proficiency (Core Language Score from CELF-4) as covariates for both bilingual and monolingual children. Age and language proficiency were significant predictors of performance in the anticipated direction. However, the main effects observed were significant even after age and language proficiency were controlled for. To address variability specifically in the bilingual children, we fitted the models with the bilingual data with length of exposure, English language proficiency dominance and English language exposure as covariates. These were not found to be significant predictors of performance and again did not improve the model fit. This is in contrast to previous studies which have shown that a younger age of onset leads to more nativelike acquisition (e.g., Roesch & Chondrogianni, 2016 for wh-questions; Lemmerth & Hopp, 2019 for predictive processing; Chondrogianni & Marinis, 2011 for vocabulary), as does greater exposure to a language (e.g., Peña, Bedore, Shivabasappa & Niu, 2020; Chondrogianni & Marinis, 2011). An explanation for our findings may be that the bilingual children have received an adequate quantity of input to enable them to acquire and successfully process object which-questions. This would be consistent with the non-linear effects found in Peña et al. (2020). Moreover, under the tentative hypothesis that processing in an additional language is indeed slower due to the bilingual child having two linguistic systems rather than reduced proficiency and/or exposure, factors of bilingual experience may be less significant predictors of the child's processing performance.
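The text above does not state how the improvement in model fit was assessed. One common approach for nested models is a likelihood ratio test, sketched below under stated assumptions: the function names, the log-likelihood values and the count of three added covariates are hypothetical illustrations, not values or procedures taken from this study.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^i / i!.
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus a finite sum over half-integer powers.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) / math.gamma(1.5)  # (x/2)^(1/2) / Gamma(3/2)
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * term
        term *= half / (i + 0.5)  # next term: (x/2)^(i+1/2) / Gamma(i+3/2)
    return total


def likelihood_ratio_test(loglik_reduced, loglik_full, extra_params):
    """Compare a reduced model with a fuller model that nests it.

    Returns the test statistic and its p-value; a large p-value means the
    added covariates do not significantly improve the fit.
    """
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    return stat, chi2_sf(stat, extra_params)


# Hypothetical log-likelihoods: a base model versus the same model with
# three added bilingual-experience covariates (illustrative numbers only).
stat, p_value = likelihood_ratio_test(-1520.4, -1519.1, extra_params=3)
print(f"chi-square = {stat:.2f}, p = {p_value:.3f}")
```

With these illustrative numbers the p-value is well above .05, which corresponds to the situation described above in which adding the covariates does not improve the fit. In practice the log-likelihoods would come from the fitted mixed-effects models themselves.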
A further challenge is that, unlike in other studies on sentence processing in bilingual children (e.g., Russian-German in Lemmerth & Hopp, English-Spanish in Lew-Williams, French-German in Roesch & Chondrogianni), we recruited children with varying language backgrounds. Given the varied nature of our bilingual sample, it was not possible to investigate cross-linguistic effects in our study. However, according to the World Atlas of Language Structures (Dryer, 2013), the bilingual participants' L1s are all similar to English in terms of the linguistic features used in this study, i.e., there is subject-verb agreement, number marking on nominals and wh-question fronting. While we thus do not draw any strong conclusions about how cross-linguistic influence may have affected our results, examining how it may influence processing and comprehension of wh-questions in English would be a useful avenue of further research.

Conclusion
We examined the online processing of which-questions in bilingual children. The results show that bilingual children did not underperform monolingual children in terms of overall accuracy or overall reaction times. Moreover, they looked at the correct picture about as much as the monolingual children over a 2-second period after hearing the auxiliary verb. The difference between the two groups was in the timecourse of processing, with slower processing in the bilingual group. However, these differences were found on a fine-grained timescale, with the end result not being different from that of monolingual children. Qualitatively, the bilingual children did not differ significantly from the monolinguals. They had greater difficulty with object relative to subject questions in the same way as monolingual children, as evidenced by the absence of significant interactions between group and structure. Moreover, the same factors or facilitative features (i.e., number matching between NPs) which have been shown in previous studies to impact language processing in monolingual children had the same impact in bilingual children, suggesting similar processing mechanisms.
a transitive verb and two animals. The transitive verbs were action verbs in the active and were semantically reversible. This way all sentences in the experimental trials were semantically reversible. The verbs used were the same as in Contemori et al. (2018); they were high-frequency verbs with an age of acquisition of five years and under according to the MRC Psycholinguistic Database. Contemori et al. compared the frequencies of the nouns used and found no differences; see Appendix A for a full list of sentences. Stimuli were digitally recorded by a male L1 speaker of British English. The sentences were recorded as a single sentence rather than cross-spliced to preserve natural intonation. Trial sentences were reviewed and those with poor audio quality, clicks and abrupt changes to rhythm and intonation were re-recorded before the task was administered to participants. There were no fillers because the inclusion of fillers would have increased the length of the experiment and would have risked loss of attention. The visual stimuli were derived from Contemori et al. (

Figure 1. Sample visual stimuli for Subject & Object questions with SG-SG NP pairing.

Figure 2. Looks to target as a proportion over time by group and structure (slashed vertical line indicates point of disambiguation).
Subject questions - Number match - Plural First NP
9. Which rats are kissing the rabbits?
10. Which spiders are splashing the squirrels?
Object questions - Number match - Plural First NP
9. Which rats are the rabbits kissing?
10. Which spiders are the squirrels splashing?
Subject questions - Number mismatch - Plural First NP
Object questions - Number mismatch - Plural First NP

Table 1. Group comparisons for baseline language measures administered to children
CLS: Core Language Score; RLI: Receptive Language Index; ELI: Expressive Language Index; LMI: Language Memory Index (composite scores from CELF-4); ANOVAs were used for normally distributed data and Mann-Whitney tests otherwise.
measures, see Appendix B). For comparisons, we used both raw and standard scores where available.

Table 2. Demographics specific to bilingual children

Table 3. Parental report - Bilingual language profile: Age of Onset (AoO), Length of Exposure (LoE) to English for bilingual children and dominance (mean and range in years; SD in months)

Table 4. Sample experimental stimuli by condition

Table 5. Accuracy as a percentage and reaction times by condition for each group (95% bootstrapped CIs in square brackets)

Table 6. Fixed effects for the accuracy and reaction time data

Table 7. Fixed effects for gaze data