Not all wh-dependencies are created equal: processing of multiple wh-questions in Romanian children and adults

Abstract
The aim of this study was to examine the acquisition and processing of multiple who- and which-questions in Romanian that display ordering constraints and involve exhaustivity. Toward that aim, typically developing Romanian children (mean age 8.3) and adults participated in a self-paced listening experiment that simultaneously investigated online processing and offline comprehension of multiple wh-questions. The study manipulated the type of wh-phrase (who/which) and the order in which these elements appear (subject–object [SO]/object–subject [OS]). The response to the comprehension question could address the issue of exhaustivity because we measured whether participants used an exhaustive or a non-exhaustive response. Our findings reveal that both children and adults slow down when processing who- as compared to which-phrases, but only adults show an online sensitivity to ordering constraints in who-questions. Accuracy is higher with multiple who- than which-questions. The latter pose more difficulties for comprehension, particularly in the OS order. We relate this to intervention effects similar to those proposed for single which-questions. The lack of intervention effects in terms of reaction times indicates that these effects occur at a later stage, after participants have heard the whole sentence and when they interpret its meaning.

Crosslinguistic studies on the comprehension of wh-questions, mainly using offline comprehension measures like picture identification and reporting accuracy scores, have found that children already at the age of 4 comprehend subject and object who-questions on a par (for English: Avrutin, 2000; Goodluck, 2005, 2010; Hirsch & Hartman, 2006; for French: Bentea & Durrleman, 2013; for Hebrew: Friedmann et al., 2009; for Romanian: Bentea, 2016; a.o.). However, there are also studies that report differences in performance between subject and object questions, as children seem to be more accurate with subject than with object questions (for English: Tyack & Ingram, 1977; for Italian: De Vincenzi et al., 1999). Online self-paced-reading studies with adults also show increased processing cost for object extraction manifested in longer reading times for object questions than for subject questions (De Vincenzi, 1991; Meng & Bader, 2000; Schlesewsky et al., 2000; see also Stowe, 1986 for contrasting results).
Moreover, the type of wh-element also affects comprehension as children find object which-questions the hardest to interpret correctly. Avrutin (2000) found that English-speaking children (mean age 4.3) give only 48% correct responses for object which-questions (2b) compared to an accuracy rate of 80-86% for the other three types of wh-questions, namely subject and object who-questions like (1a-b) and subject which-questions like (2a). Friedmann et al. (2009) report similar results for Hebrew-speaking children (mean age 4.3) who comprehend object which-questions only 58% of the time and subject which-questions 78% of the time. Children also performed equally well in the comprehension of subject and object who-questions (around 80% correct responses). In a recent study, Contemori et al. (2018) used the visual world paradigm to examine online and offline comprehension of subject and object which-questions in English-speaking children aged 5-7. Their offline results are in line with those reported in Avrutin (2000) for English and show a subject-object (SO) asymmetry in children's comprehension of which-questions: they were at 63% accuracy for object questions and 95% accuracy for subject questions in a picture matching task. The eye-gaze data in Contemori et al.'s study also indicate that children have more difficulties processing object than subject which-questions, and that these difficulties stem from a strong expectation that the first noun encountered will be the subject. In other words, children start with an initial subject preference in object which-questions and find it harder to revise and reorient their looks when this initial interpretation turns out to be incorrect.
Children's comprehension difficulties with object which-questions have been accounted for in terms of similarity in lexical N restriction between the moved element and the intervening subject (Friedmann et al., 2009; Belletti et al., 2012; Friedmann et al., 2017). This similarity gives rise to intervention effects along the lines of those captured by the principle of Relativized Minimality (RM) in adult grammar (Rizzi, 1990, 2004, 2013, 2018; Starke, 2001). RM states that two elements, X and Y, cannot be connected by movement if Z hierarchically intervenes between them, and Z bears the same morphosyntactic features as X. For example, in order to correctly interpret a sentence like (3), the wh-element when (corresponding to the target X) must be related to its trace (Y), but this relation cannot hold because another wh-element who (Z) intervenes in the path between when and its trace. The violation is triggered by the identity relation between the featural specification of the intervener who and that of the moved element when (in this case, the featural specification of the two elements is Q 1):
    X                  Z           Y
3. *When do you wonder who arrived ___ ?
By extending the application of RM to child grammar, Friedmann et al. (2009) postulate that children encounter difficulties with movement structures in which one element containing a lexical restriction (meaning sequences such as "the N" or "which N") intervenes in the movement of another N element. According to the featural intervention account, the correct interpretation and production of object which-N questions, for example, is hindered by the presence of the subject (the dog in [2b] above) because this constituent intervenes between the object which cat and the verb bite and acts as a competitor in resolving the grammatical dependency between the object and the verb. What makes the subject a potential competitor is the inclusion of the N feature (or lexical restriction) present on the intervening subject in the set of features that also characterizes the moved object. On the other hand, children have no comprehension difficulties with object who-questions (2a), as these do not give rise to intervention effects: although the subject the dog intervenes between the moved object who and its trace, there is a disjunction of features between the moved element X and the intervening element Z (the object who is specified with a Q feature, while the intervening subject the dog bears a N feature).
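The three feature configurations discussed so far (identity in [3], inclusion in object which-questions, disjunction in object who-questions) can be made concrete with a small sketch. It is only an illustration: the function name `feature_relation` and the representation of feature bundles as Python sets are assumptions introduced here, not part of the featural intervention account itself, and the labels simply follow the Q/N notation used in the text.

```python
def feature_relation(target, intervener):
    """Classify how the intervener's features relate to the target's.

    Both arguments are sets of feature labels (hypothetical encoding).
    """
    target, intervener = frozenset(target), frozenset(intervener)
    if not target & intervener:
        return "disjunction"   # no shared feature
    if target == intervener:
        return "identity"      # same featural specification
    if intervener < target:
        return "inclusion"     # intervener's features are a proper subset
    return "other overlap"     # configurations not discussed in the text


Q, N = "Q", "N"
examples = {
    "when ... who (3)": feature_relation({Q}, {Q}),
    "which cat ... the dog": feature_relation({Q, N}, {N}),
    "who ... the dog": feature_relation({Q}, {N}),
}
for label, relation in examples.items():
    print(f"{label}: {relation}")
```

On this encoding, only the inclusion case corresponds to the configuration that the account takes to be difficult for children, while identity is the configuration excluded in the adult grammar.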
In the context of these findings for the comprehension of single wh-questions, the present study extends the investigation of children's acquisition of wh-dependencies to structures that have received far less attention in language acquisition, namely multiple wh-questions (i.e., questions containing more than one interrogative word). The goals are (a) to examine how children and adults process multiple wh-questions in a language with multiple wh-movement (Romanian) and (b) to uncover the source of difficulty in the comprehension of who and which multiple wh-questions. Specifically, if children have difficulties with the comprehension of multiple wh-questions, we seek to investigate whether this can be accounted for in terms of featural intervention, that is, in terms of similarity between the features of the moved elements and the features of the intervening constituents.
In the remainder of the introduction, we will present the results of previous studies on the processing and acquisition of multiple wh-questions, as well as the properties of multiple wh-questions in Romanian.

Multiple wh-questions: acquisition and processing
Multiple wh-questions have special syntactic and semantic properties; they involve different dimensions of crosslinguistic variation and therefore lead to additional learning difficulties as compared to single interrogatives illustrated in (1)-(2) above. The examples in (4) to (6) show that some languages may allow only one wh-phrase to be fronted (English), others may require fronting of all wh-words (Romanian and Slavic languages), whereas some languages do not allow multiple wh-questions at all (Italian, Irish) (see Schulz & Roeper, 2011 for a classification of languages according to availability of multiple wh-movement):
4. Who bought what? (English)
5. Cine ce a cumpărat? (Romanian)
   who what has bought
   "Who bought what?"
6. *Chi ha comprato che cosa? (Italian)
   who has bought what thing
   "Who has bought what?"
Another property of multiple wh-questions is that they may or may not obey Superiority (Chomsky, 1973), a condition that imposes a strict ordering on wh-words and states that the superior, or higher, element must move overtly. 2 So examples like (7a) for English, where the object has moved over the structurally superior subject, violate this constraint:
7. a. *What did who buy? (English)
   b. Which book did which child buy?
Such Superiority effects are cancelled when the wh-expressions are complex phrases of the type which N, as illustrated in the example (7b) above (Karttunen, 1977; Pesetsky, 1987; Comorovski, 1989). Which N phrases have been termed D(iscourse)-linked (Pesetsky, 1987, 2000) 3 because these elements, contrary to bare wh-words, prompt an answer chosen from "a set of individuals previously introduced into the discourse" or "from a set that is presumed to be salient to both speaker and hearer" (Pesetsky, 2000, p. 16). Pesetsky (2000) relates the difference in acceptability between (7a) and (7b) to the different movement options available to the two types of wh-phrases. Specifically, in (7b), unlike in (7a), the higher wh-phrase (which child) undergoes wh-feature movement, and it is only the lower wh-phrase (which book) that undergoes overt phrasal movement, one argument being that which or D-linked phrases constitute an exception to the multiple specifier requirement (Pesetsky, 2000, p. 41) whereby the complementizer in multiple questions requires more than one wh-specifier. 
4 Studies with adults on the processing of multiple wh-questions containing Superiority violations (Hofmeister et al., 2013) argue that questions like (8a) violating Superiority constraints, as the object wh-element is moved by crossing over the higher wh-subject, pose significant processing difficulties that are absent in instances like (8d), which contain the same object-subject order. Given that which-phrases carry additional syntactic and semantic features as compared to bare wh-words, processing of a which-phrase involves increased activation and resistance to interference in memory and, as a result, leads to easier retrieval than processing bare wh-words. Indeed, Hofmeister et al. (2013) show that which-elements elicit more efficient processing than who-constituents in English multiple wh-questions. Sentences with two which-phrases receive higher acceptability judgments than minimally different ones with bare wh-words. 5 Acceptability judgments for the which-who and who-which cases in (8b,c) are intermediate between the what-who and the which-which ones. Hofmeister et al. (2013) also observe faster reading times at the verb (and its spillover regions) for which than for who in a self-paced reading task, 6 as well as higher question-answer accuracy for which-which than for who-who questions. Therefore, when acquiring multiple wh-questions, children must also be able to distinguish between contexts in which Superiority effects need to be obeyed, as in the case of bare wh-words, and those which represent apparent violations of Superiority, like with which-constituents. Apart from acquiring the specific syntactic properties of multiple wh-questions, children also need to learn that this type of question involves pairing relations between the two wh-words (Dayal, 2002;Grohmann, 2003). A question like Who bought what? in English requires a pair-list (PL) reading in which the exhaustive sets of possible answers to both who and what are pairwise linked. 
7 The correct answer to (4) above could be "John bought a book and Mary bought a DVD." Two steps are thus necessary to derive a paired list answer: exhausting the question domain and pairing the two wh-elements. Roeper and de Villiers (2000) note that a paired answer also entails special structural relations in syntax, as the subject wh-element must c-command the object or adjunct wh-words (like where, when, how). When no such c-command relationship holds, as in the case of conjoined questions like "Who ate and what?", paired answers are not required (Krifka, 2001). An investigation into children's answers to multiple wh-questions can therefore inform on the type of structure they assign to these multiple wh-dependencies and reveal (a) whether children have difficulties with exhaustivity, in other words, if they give exhaustive PL answers or they answer only with one pair, or (b) whether children have difficulties with pairing, in other words, if they link one wh-word to another or if they answer only one of the wh-words.
Few studies have looked at the acquisition of multiple wh-questions. In two crosslinguistic studies on the production of multiple interrogatives, Grebenyova (2006, 2011) found that 4- to 6-year-old English-speaking children and children speaking Malayalam had acquired the properties of multiple wh-questions, whereas Russian-speaking children manifested some difficulties with the language-specific syntax of these structures, because they also produced questions with one fronted wh-word and one in-situ, an option that is not allowed in adult Russian. In addition, when exploring the frequency of multiple interrogatives in the input, Grebenyova (2006, 2011) concluded that children's exposure to these structures is minimal. Roeper and de Villiers (1991) tested the acquisition of PL readings using a question-with-picture task and reported that 4- to 6-year-old English children gave PL answers 78% of the time, while younger children did so in only 32% of the cases. Roeper et al. (2007) also looked at PL readings in multiple wh-questions in English and German and found that German-speaking children acquire such readings at the age of 5 as compared to the age of 6 for English-speaking children. That German children acquire PL readings earlier than English children has been linked to the presence of exhaustivity markers in German, but not in English. The presence of the exhaustivity marker alles "all" (Wer kommt alles? "Who is coming all?") makes children more likely to give exhaustive answers but also seems to enhance the knowledge that wh-words without alles are likely to be exhaustive in German (see Schulz, 2010; Schulz & Roeper, 2011). 
Schulz and Roeper's (2011) results on the acquisition of PL readings by German-speaking children with developmental language disorder (DLD, formerly termed S[pecific] L[anguage] I[mpairment]) showed that these children struggle with providing appropriate PL answers for multiple wh-questions as compared to typically developing children. However, Schulz and Roeper (2011) show that most of the errors in DLD children consisted of providing subject lists (41% of the incorrect answers) and object lists (20%) and that only 16% of the errors consisted of single-pair answers. This suggests that DLD children have more difficulties with pairing, which seems to emerge "independently and later than exhaustivity" (Schulz & Roeper, 2011, p. 404). Gavarró et al. (2010) investigated the acquisition of multiple wh-questions in 3- to 5-year-old Bulgarian-speaking and Polish-speaking children, through a repetition and a comprehension task. They found that the youngest children correctly repeated multiple interrogatives over 75% of the time and that the number of correct repetitions increased significantly with age. The repetition task also included ill-formed wh-questions with wh-in-situ, Superiority violations, and intervening constituents between the wh-words. The authors report that, in cases of ill-formed questions, children avoided Superiority violations and intervening structures by omitting constituents, and they also raised some in-situ wh-constituents, both in Bulgarian and in Polish. The results of the comprehension task, which included questions with two or three wh-words, show that Bulgarian children give exhaustive answers in 90% of cases even at the age of 3, while at the age of 5, many Polish children still give nonexhaustive answers. The difference reported in Gavarró et al. (2010) between Polish and Bulgarian mirrors the one found for English and German (Roeper et al., 2007): Bulgarian, but not Polish, has plural markers associated with the wh-phrase. 
This appears to facilitate the early acquisition of exhaustivity in Bulgarian children, whereas Polish children lag behind and provide only 40-60% exhaustive answers. Bentea (2010) investigated the comprehension of multiple wh-questions by 4- to 6-year-old English, French, and Romanian children and found that English and French children behaved alike in that they gave more PL readings to multiple wh-questions as compared to their Romanian peers. Romanian children, on the other hand, manifested a strong preference to answer only the second wh-element in the structure, although Romanian obligatorily fronts both wh-words. Măniţă (2017) also looked at the comprehension of multiple wh-questions with two and three interrogative words in 4- to 6-year-old Romanian children. Her findings show that, even at the age of 6, Romanian children give a low percentage of exhaustive answers to multiple interrogatives; however, they show mastery of exhaustivity in single wh-questions already at the age of 5. This developmental pattern is in line with crosslinguistic findings showing that children recognize exhaustivity in multiple wh-questions later than in single wh-questions or wh-alles-questions like in German (Schulz, 2010; Schulz & Roeper, 2011). Similar to Bentea's (2010) study, the children tested in Măniţă (2017) also displayed a preference to answer only one wh-phrase (in this case though, contrary to Bentea, 2010, these answers were mainly to the higher/highest wh-element). Thus, 4- to 6-year-old Romanian children seem to interpret multiple wh-questions as single wh-questions, which can be taken as evidence that they have difficulties interpreting the wh-words as pairs.
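The answer types distinguished in the studies reviewed above (exhaustive pair-list answers, single-pair answers, subject lists, object lists) can be told apart mechanically once an answer is written down as a list of subject-object pairs. The sketch below is a hypothetical coding scheme for illustration only: the function name `classify_answer`, the use of `None` for an argument the child leaves out, and the extra labels "non-exhaustive pair-list" and "other" are assumptions made here and do not reproduce the coding procedures of any of the cited studies.

```python
def classify_answer(expected_pairs, answer_pairs):
    """Label an answer to a multiple wh-question.

    expected_pairs: iterable of (subject, object) tuples that together
        form the exhaustive pair-list answer.
    answer_pairs: list of (subject, object) tuples given by the
        participant; None marks an argument that was not mentioned.
    """
    expected = set(expected_pairs)
    if not answer_pairs:
        return "no answer"
    if all(obj is None for _, obj in answer_pairs):
        return "subject list"          # only the first wh-word is answered
    if all(subj is None for subj, _ in answer_pairs):
        return "object list"           # only the second wh-word is answered
    given = set(answer_pairs)
    if given == expected:
        return "exhaustive pair-list"  # exhaustive and paired
    if given < expected:
        return "single pair" if len(given) == 1 else "non-exhaustive pair-list"
    return "other"


# "Who bought what?" with the answer from the text as the target.
target = [("John", "book"), ("Mary", "DVD")]
print(classify_answer(target, [("John", "book"), ("Mary", "DVD")]))
print(classify_answer(target, [("John", "book")]))
print(classify_answer(target, [("John", None), ("Mary", None)]))
print(classify_answer(target, [(None, "book"), (None, "DVD")]))
```

Separating the list labels from the single-pair label mirrors the distinction drawn above between difficulties with pairing and difficulties with exhaustivity.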
In summary, multiple wh-questions present a number of important syntactic and semantic properties that children need to acquire, while faced with impoverished input. In this study, we address a topic that has been understudied in child Romanian and examine how Romanian-speaking children and adults process multiple wh-questions, and to what extent the type of wh-phrase (who vs. which) can contribute to processing difficulty and overall accuracy in multiple wh-questions.

Properties of Romanian multiple wh-questions
Romanian, like Bulgarian and other Slavic languages, allows for all wh-words to be overtly moved to a clause-initial position. This requirement holds for both bare (9) and D-linked or lexically restricted which-N constituents (10).
9. a. Cine pe cine sărută?
      who.Nom Acc.who kisses
      "Who is kissing whom?"
   b. *Pe cine cine sărută?
      Acc.who who.Nom kisses
      *"Whom is who kissing?"
10. a. Care bunică pe care fată_j o_j sărută?
       which.Nom grandmother Acc.which girl_j her_j kisses
       "Which grandmother is kissing which girl?"
    b. Pe care bunică_j care fată o_j sărută?
       Acc.which grandmother_j which.Nom girl her_j kisses
       "Which grandmother is which girl kissing?"
11. a. Care bunică pe cine sărută?
       which.Nom grandmother Acc.who kisses
       "Which grandmother is kissing whom?"
    b. Pe care bunică_j cine o_j sărută?
       Acc.which grandmother_j who.Nom her_j kisses
       "Which grandmother is who kissing?"
Wh-objects in Romanian are marked for Acc by the preposition "pe" and which-objects are also obligatorily doubled by a clitic pronoun agreeing in gender and number (o "her" in examples [10a,b] and [11b]) (see Dobrovie-Sorin, 1994 for a detailed discussion of the syntax of wh-questions in Romanian). This is an instantiation of the "clitic doubling" phenomenon present in languages like Romanian and Spanish, whereby an accusative or dative clitic pronoun appears together with a co-referential full lexical noun phrase.
While there is a strict ordering among bare wh-elements, since fronting a who-object over a who-subject is ungrammatical (9b), D-linked or lexically restricted which-phrases must always appear clause-initially (10-11), preceding bare phrases, and are known not to show Superiority effects (Comorovski, 1996). This is illustrated by the grammaticality of the b examples in (10) and (11), showing that which-objects can be fronted across both which- and who-subjects. From an interpretive standpoint, which-expressions are associated with the notion of "givenness" and have been analyzed as topics (see Comorovski, 1996 and Alboiu, 2000). Laenzlinger and Soare (2005) and Soare (2009) show that which-phrases in Romanian get attracted to a higher position than the position occupied by bare elements at the left periphery of the clause. Given the topic flavor of which-constituents, the authors suggest that these phrases target a position that also bears a Topic feature and which is above the landing site of bare wh-phrases. 8 Various analyses have been proposed for the order preservation constraint in languages with multiple wh-movement (Richards, 1997; Bošković, 2002; Krapova & Cinque, 2008; a.o.). The analysis put forth in Krapova and Cinque (2008) traces back the ordering contrast between (9a) and (9b) to a version of RM and featural intervention in which only a whole chain, not just the link of a chain, counts as an intervener and where a chain consists of a moved element and its trace (see also Soare, 2009 and Rizzi, 2017 for an application of this analysis to Romanian multiple wh-questions). The derived structures given in (12a) show that only one occurrence of the object pe cine, but not the whole chain, intervenes between cine and its trace (represented in angled brackets), and only one occurrence of cine, but not the whole chain, intervenes between pe cine and its trace. 
While crossing (or intersecting) chains as in (12a) are allowed in cases of featural identity, as both moved elements bear a Q feature, nested chains are excluded. In the ungrammatical example (12b), both occurrences of cine intervene between pe cine and its trace. Under this interpretation, the structure in (12b) is correctly ruled out by RM.
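The contrast between crossing and nested chains can be checked with a short sketch. It rests on several assumptions that are not in the source: each chain is reduced to a pair of linear positions (moved element, trace), linear order stands in for the hierarchical notion of intervention that RM actually refers to, and the positions assigned to the two chains are reconstructed from the prose description of (12a) and (12b). The function names are hypothetical.

```python
def occurrences_between(chain, other):
    """Count the occurrences of `other` (head and trace) that lie
    strictly between the head and the trace of `chain`."""
    head, trace = chain
    return sum(head < position < trace for position in other)


def chain_configuration(chain_a, chain_b):
    """Return 'nested', 'crossing', or 'disjoint' for two chains."""
    a_over_b = occurrences_between(chain_a, chain_b)
    b_over_a = occurrences_between(chain_b, chain_a)
    if a_over_b == 2 or b_over_a == 2:
        return "nested"     # a whole chain intervenes: excluded under identity
    if a_over_b == 1 and b_over_a == 1:
        return "crossing"   # only one occurrence intervenes each way: allowed
    return "disjoint"


# Assumed positions: cine pe cine <cine> <pe cine>   (SO order, as in 12a)
cine_so, pe_cine_so = (0, 2), (1, 3)
# Assumed positions: pe cine cine <cine> <pe cine>   (OS order, as in 12b)
pe_cine_os, cine_os = (0, 3), (1, 2)

print(chain_configuration(cine_so, pe_cine_so))   # crossing
print(chain_configuration(pe_cine_os, cine_os))   # nested
```

Under the analysis summarized above, the nested result is the one that RM rules out when both moved elements carry only a Q feature.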
If we consider the corresponding sentences, but with lexically restricted which-phrases, (13a) is non-problematic because the structure also yields crossing chains. As for (13b), we follow the proposal in Rizzi (2011, 2017) and Villata, Rizzi, and Franck (2016) according to which lexically restricted wh-elements have two available attractors and can target different positions, as compared to bare wh-words which can only be attracted by a Q feature. Applying this analysis to (13b) results in a configuration where the first which-element is attracted by the complex feature conglomerate [Q, N], while the second targets a [Q] position. These structures thus instantiate an inclusion configuration like in the case of single which questions, but this inclusion configuration in multiple wh-questions is created by the [Q] feature. In such cases, even if the whole chain of care fată intervenes between the object pe care bunică and its trace, the feature of the intervening chain is included in that of the target, so that no violation of RM is triggered. According to this analysis, the same inclusion relation holds in examples like (13c) as well, where the first wh-phrase pe care bunică is also attracted by the feature conglomerate [Q, N], while the second element, a bare wh-word, can only target a [Q] position.
The two important points related to the structure of multiple wh-questions in Romanian are (a) that the SO order can be inverted when the object wh-phrase is lexically restricted and (b) that which-elements are fronted to a position distinct from and higher than that of bare wh-elements, such that the order preservation constraint, whatever its implementation mechanism, is not operative. Since these questions involve the displacement of two wh-phrases whose featural specification creates either an identity or an inclusion relation, investigating the comprehension of who and which-multiple questions in Romanian children can shed light on the role of intervention effects in the acquisition of these structures.

The present study
The present study therefore aimed to investigate the acquisition and processing of multiple who and which-questions in Romanian that display ordering constraints and involve exhaustivity. Specifically, we sought to examine (a) how children and adults process multiple who-questions as compared to which-questions, (b) whether they display an online sensitivity to ordering constraints in multiple wh-questions, (c) to what extent the type of wh-phrase (who vs. which) affects overall accuracy in multiple wh-questions, and (d) whether participants provide exhaustive PL answers to multiple wh-questions.
While the other studies that have investigated the comprehension of multiple wh-questions in child Romanian always included an animacy mismatch between wh-words and also contained questions with wh-arguments and wh-adjuncts, in this study we only focus on multiple wh-questions with two wh-arguments and without a mismatch in animacy. In addition, in our study we also manipulate the order of the wh-words, with the wh-subject either preceding or following the wh-object. We adopt a self-paced listening task which offers a segment-by-segment measure of sentence processing, the rationale being that longer listening times at a specific segment in the sentence reflect increased processing difficulties (for children, see Felser et al., 2003;Chondrogianni et al., 2014;Contemori & Marinis, 2014). In addition, because participants must answer the test question verbally, the design not only probes active processing of multiple wh-dependencies based on reaction times (RTs) but also allows us to test whether the online measures correlate with the final interpretation children and adults assign to the sentence.

Method

Participants
Thirty-four Romanian monolingual children between the ages of 6 and 9 took part in the study (17 male, 6.11-9.08, M = 8.3 years old, SD = 11 months). All children were typically developing and had no diagnosed language, hearing, or speech disorders (as reported by the parents). Of the 34 participants, 2 were excluded from further analyses as they did not complete the task. The data of the remaining 32 participants were included in the analyses. Out of these 32 children, there were 14 children aged 6 and 7 (M = 7.3 years, SD = 3 months), eight 8-year-olds (M = 8.7 years, SD = 4 months), and ten 9-year-olds (M = 9.4 years, SD = 3 months). A control group of 20 adults also participated in the study (4 male, M = 24;10 years old, SD = 52.4 months). All participants were recruited and tested in Romania. Prior to the experiment, written parental consent was obtained for all children. The adult participants also gave their informed consent. The study was approved by the Ethics Research Committee of the School of Psychology and Clinical Language Sciences, at the University of Reading.

Stimuli
The experiment simultaneously tested online processing (self-paced listening) and offline comprehension (response accuracy) of who- and which-multiple questions in Romanian children and adults. The participants listened segment-by-segment to embedded questions with two extracted wh-phrases (14). As there are no differences in word order or interpretation between embedded and unembedded multiple wh-questions in Romanian, we opted for the use of embedded questions because of the way we set up the task, namely as a game with Paddington the Bear who wants to find out what is happening in various pictures involving princesses and superheroes. More details on the task are provided in the Procedure section.
The test sentences varied with respect to the order of the wh-elements (subject-object [SO] vs. object-subject [OS]), as well as the type of wh-words used (only who, only which, or which followed by who 9). Each sentence was preceded by a lead-in introducing the characters in the pictures.
14. Examples of test sentences used (the / indicates the segment boundaries; each sentence was divided into 8 segments, Paddington being the first segment in each sentence. The lead-in was presented in one block.)

Lead-in:
This is a picture of Jasmine, Elsa, Anna and three grandmothers/
Test sentence:
Paddington/ wants to know/

Subject-Object with two who-constituents (SO-Who)
a. cine /pe cine /sărută /duios /pe obraz /înainte de culcare
   who /Acc.who /kisses /lovingly /on the cheek /before bedtime
   "who is kissing whom lovingly on the cheek before bedtime."
Object-Subject with two who-constituents (OS-Who)
b. *pe cine /cine /sărută /duios /pe obraz /înainte de culcare
   Acc.who /who /kisses /lovingly /on the cheek /before bedtime
   "whom who is kissing lovingly on the cheek before bedtime."
Subject-Object with two which-constituents (SO-Which)
c. care bunică /pe care prinţesă_j /o_j /sărută /pe obraz /înainte de culcare
   which grandmother /Acc.which princess_j /her_j /kisses /on the cheek /before bedtime
   "which grandmother is kissing which princess on the cheek before bedtime."
Object-Subject with two which-constituents (OS-Which)
d. pe care bunică_j /care prinţesă /o_j /sărută /pe obraz /înainte de culcare
   Acc.which grandmother_j /which princess /her_j /kisses /on the cheek /before bedtime
   "which grandmother which princess is kissing on the cheek before bedtime."
Object-Subject with a which-object and a who-subject (OS-WhichWho)
e. pe care bunică_j /cine /o_j /sărută /pe obraz /înainte de culcare
   Acc.which grandmother_j /who /her_j /kisses /on the cheek /before bedtime
   "which grandmother who is kissing on the cheek before bedtime."
All test sentences contained eight segments, the first two segments always being Paddington/wants to know. Although the eight segments were of different length, there was a maximum number of three words in each segment. The nouns used matched in gender, number, and animacy, while the verbs used were chase, cover, follow, hug, kick, kiss, lift, pat, pinch, pull, punch, push, splash, and tickle. The task included a total of 60 items: 10 for every condition in example (14), as well as 10 fillers. 10 The fillers were grammatical subject questions, half of them consisting of embedded single who-questions and half of single which-questions, illustrated in example (15).
15. Examples of filler sentences used (the / indicates the segment boundaries)
a. This is a picture of a woman, a boy, and a man/
   Paddington/ wants to know/ cine/ ţine/ tortul/ în mână/ seara/ la petrecere
   who/ holds/ cake.the/ in hand/ in the evening/ at the party
   "who is holding the cake at the party in the evening."
b. This is a picture of two bees and a bird.
   Paddington/ wants to know/ care albină/ zboară/ rapid/ dimineaţa/ în grădină/ peste trandafiri
   which bee/ flies/ rapidly/ in the morning/ in the garden/ over the roses
   "which bee is flying rapidly over the roses in the garden in the morning."
Filler questions contained both transitive (eat, cut, hold, read) and intransitive (sleep, fly) verbs and were likewise divided into eight segments. In using single wh-questions as fillers, we wanted to ensure that participants, especially children, can answer questions correctly and do not have difficulties with this particular type of structure or with the task itself. Moreover, the format of the fillers was similar to that of the test items, in terms of both image display and audio stimuli. Given that both the test sentences and the fillers were introduced by the same two segments, Paddington/ wants to know, the range of constructions that could follow this introduction was reduced.
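The design constraints stated for the stimuli (eight segments per sentence, at most three words per segment, and 60 items in total) lend themselves to a quick consistency check. The sketch below is an illustration written for this purpose and is not the software used to build or run the experiment; the helper names `split_segments` and `check_item` are hypothetical, and the example strings are typed in from (14a) and (15b) with the slash as segment boundary.

```python
def split_segments(sentence):
    """Split a slash-delimited sentence into trimmed, non-empty segments."""
    return [seg.strip() for seg in sentence.split("/") if seg.strip()]


def check_item(sentence, n_segments=8, max_words=3):
    """True if the sentence has exactly n_segments segments and no
    segment is longer than max_words words."""
    segments = split_segments(sentence)
    return (len(segments) == n_segments
            and all(len(seg.split()) <= max_words for seg in segments))


so_who = ("Paddington/ wants to know/ cine /pe cine /sărută /duios "
          "/pe obraz /înainte de culcare")
filler_which = ("Paddington/ wants to know/ care albină/ zboară/ rapid/ "
                "dimineaţa/ în grădină/ peste trandafiri")

conditions = ["SO-Who", "OS-Who", "SO-Which", "OS-Which", "OS-WhichWho"]
items_per_condition = 10
n_fillers = 10
total_items = len(conditions) * items_per_condition + n_fillers

print(split_segments(so_who))
print(check_item(so_who), check_item(filler_which))
print(total_items)
```

With five conditions of 10 items each plus 10 fillers, the total comes to the 60 items reported for the task.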
The test and filler sentences were digitally recorded by a native speaker of Romanian in a soundproof booth. The sentences were segmented using the Audacity software. At the end of each sentence, only one image with three pairs of characters as in Figure 1a-c appeared on the screen, and the participants had to answer the question they had just heard by verbally identifying the correct actions and characters.
After hearing a question with two who-constituents, as in (14a) or (14b), participants would see either a picture triad like the one illustrated in Figure 1c containing two actions of the same type and a third different action or a picture triad like in Figure 1a and 1b, in which all the pairs of characters perform the same action, but with reversed Agent-Patient roles. For example, if the participants heard a SO-Who question and then saw the picture triad in Figure 1c, they were expected to identify only the pairs that perform the correct action (e.g., Jasmine is kissing the grandmother and Elsa is kissing the girl.), while ignoring the irrelevant one (e.g., cat chasing Anna). However, we wanted to ensure that participants do not simply rely on verb knowledge when answering these questions and that they can correctly interpret multiple wh-questions as requiring exhaustive PL answers. Therefore, multiple who-questions like in (14a) and (14b) were also associated with picture triads as in Figure 1a or 1b. Here the correct answer consisted of exhaustively identifying all three pairs of characters as they all perform the correct action (i.e., someone kissing someone else). Questions containing which-constituents (14c-e) were associated with picture triads as in Figure 1a or 1b. In this case, two of the actions in the image corresponded to the correct interpretation of the sentence, while one of the actions corresponded to the reversed Agent-Patient interpretation, in line with stimuli that have been used for testing the comprehension of single which-questions. For example, Figure 1a appeared after the SO-Which question in (14c). The correct answer in this case would be "The red grandmother is kissing Anna and the yellow grandmother is kissing Jasmine" (top right picture and bottom picture), while ignoring the reversed role action in which Elsa is kissing the grandmother in green (top left picture).

Procedure
Participants were tested individually in a quiet room. The self-paced listening task was programmed and run on a laptop using the PsychoPy software (version 1.90.3; Peirce et al., 2019; Peirce & MacAskill, 2018). Participants had to press the space key to advance from one segment to the next. PsychoPy recorded the time between each key press, which provided the listening times for each segment. The sentences were administered through headphones. All the participants heard all the items, and the order of item presentation was fully randomized automatically with PsychoPy, such that no participant saw the items in the same order. As a result, potential familiarization effects were reduced among items. Although one test sentence appeared both with a SO and an OS order and was associated with an image containing the same pairs of characters, the position of the three pairs was changed between images, as was the direction of the action being performed (left to right or right to left), again to reduce strategic and familiarization effects.
The task was set up as a game about princesses and superheroes, played with Paddington the Bear, in which Paddington wants to find out what is happening in different pictures. The pictures were always about three princesses (Elsa, Anna, Jasmine) and three superheroes (Batman, Superman, Spiderman) interacting with various animals and people. By using pictures that always involved the same characters presented in the introduction, we wanted to reduce the number of new characters that participants had to identify when answering Paddington's questions. Each of these six characters was introduced to the participants when they heard the instructions for the task (see Appendix A) to ensure the participants were familiar with their names.
The detailed instructions were then followed by a familiarization phase that included a block of five practice items, which were used to familiarize the participants with the task and with pressing the space bar after each segment. The practice items were constructions similar to the ones used in the experiment, but included a mismatch in animacy as well. The computer screen remained black while participants were listening to the test sentences and it was only when pressing the space bar after the final segment in each sentence that one picture like the ones illustrated in Figure 1 above appeared on the screen. At that moment, the participants had to verbally answer the question and identify the pairs of characters performing the correct action. All the answers were recorded and then transcribed for analysis. The experiment was administered in one session for both children and adults and lasted about 20-30 min.
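The way listening times follow from key presses can be illustrated with a minimal sketch. It is written in plain Python and does not use PsychoPy's own API; the function name and the timestamps are hypothetical and serve only to show that each segment's listening time is the interval between two consecutive presses.

```python
def segment_listening_times(press_times_ms):
    """Return the time elapsed between consecutive key presses (in ms).

    Assumption: the first timestamp marks the press that starts segment 1,
    and every later press ends one segment and starts the next.
    """
    return [later - earlier
            for earlier, later in zip(press_times_ms, press_times_ms[1:])]


# Hypothetical timestamps for one eight-segment sentence (nine presses).
presses = [0, 850, 1900, 2600, 3500, 4300, 5200, 6100, 7400]
times = segment_listening_times(presses)
print(times)
```

Nine presses yield eight intervals, one per segment, matching the eight-segment structure of the test and filler sentences.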

Predictions
On the basis of the properties of multiple wh-questions and of previous studies on the comprehension of single who- and which-questions (De Vincenzi et al., 1999; Friedmann et al., 2009; Bentea, 2016; a.o.), we predict that questions with a SO order should be comprehended better than questions in which the wh-object precedes the wh-subject and that the type of wh-element should also affect comprehension, such that multiple who-questions should yield more accurate responses than multiple which-questions. Moreover, if children have difficulties computing inclusion configurations in general, even when this inclusion relation is triggered by an overlap in the Q, but not N feature, then we should see that they struggle more than adults with the comprehension of structures that instantiate such an inclusion configuration (namely, questions with an OS order like in [13b] and [13c]). Previous studies on the acquisition of multiple wh-questions in Romanian (Bentea, 2010; Măniţă, 2017) show that children aged 4-6 give a low percentage of exhaustive responses and that they mainly answer only one of the wh-words. As the children in our experiment are older (6-9), we expect them to give more exhaustive PL answers, in line with crosslinguistic findings that children master exhaustivity in multiple wh-questions around the age of 6.
If asymmetries emerge in online processing as well, then we expect longer RTs at the verb (and possibly its spillover regions) for which-questions than for who-questions. However, if which-phrases pose fewer processing difficulties than who-constituents, based on Hofmeister et al. (2013), then we predict that which-items will be processed faster than who-items, and that we will also observe faster RTs at the verb region in which-questions, as these elements are easier to retrieve from memory due to increased activation and resistance to interference in memory. If participants show an online sensitivity to ordering constraints, then we expect a slowdown in the order-violating condition upon detecting the ungrammaticality. The pattern is predicted to be qualitatively similar in children and adults, but children might show longer RTs than adults.

Analysis
We analyzed the proportion of accurate responses, as well as the RTs of trials that received correct answers. Accuracy data were analyzed with mixed effects logistic regression (Jaeger, 2008), and RTs were analyzed with mixed effects linear models (Baayen et al., 2008). All analyses were conducted using the lme4 package (Bates et al., 2015) in R (R Core Team, 2019), and figures were produced using the package ggplot2 (Wickham, 2016).

Accuracy
We used generalized linear mixed effects regression (GLMER) modeling to analyze the accuracy data in R. The statistical analysis was performed in two stages. The first analysis focused on the comparison between the SO and OS orders in the who and which conditions for both children and adults. The model included WhType (Who vs. Which), WhOrder (SO vs. OS), and Group (Children vs. Adults), as well as their interaction as fixed predictors. The second analysis considered the OS conditions separately to evaluate whether the comprehension of multiple questions with an OS word order was modulated by the effect of WhType (Who vs. WhichWho vs. Which) and whether this effect differed across the two Groups (Adults vs. Children). All the fixed factors in the two analyses were coded using sliding difference or repeated contrasts, which test consecutive factor levels against each other (Schad et al., 2019). The contrasts were coded as follows: for WhType, Who (−1) versus Which (1); for WhOrder, SO (−1) versus OS (1); for Group, Adults (−1) versus Children (1). In the model with OS structures only, we specified the following contrast matrix for WhType: c2 vs. 1, with Who coded as −1, WhichWho as 1, and Which as 0; c3 vs. 2, with Who coded as 0, WhichWho as −1, and Which as 1. For the random effect structure of the models, we followed current guidelines in psycholinguistics and chose parsimonious mixed models (Bates et al., 2018), because these models are more suited for analyzing the typical samples included in psycholinguistic research (Matuschek et al., 2017). The first model included participants and items as random intercepts and by-participant random slopes for WhType; the second model only included participants and items as random effects. The goodness-of-fit of alternative models for the random effects structure was assessed by comparing the Akaike information criterion scores (Akaike, 1974).
A decrease of at least 2 in the Akaike information criterion score was taken to indicate that the inclusion of a factor meaningfully improves the goodness-of-fit of the model. We also specified the bobyqa optimizer in the glmer function to aid model convergence.
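The model-selection rule described above can be sketched as follows. This is only an illustration of the criterion, not the authors' R code: the function names and the log-likelihood values are hypothetical, and the sketch relies on the standard definition AIC = 2k − 2 ln L, where k is the number of estimated parameters.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 * ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def prefers_richer_model(aic_simple, aic_rich, threshold=2.0):
    """Apply the rule of thumb: keep the extra term only if AIC drops by >= threshold."""
    return (aic_simple - aic_rich) >= threshold


# Hypothetical fits: the richer model adds two random-effect parameters.
simple = aic(log_likelihood=-520.0, n_params=9)
rich = aic(log_likelihood=-515.5, n_params=11)
print(simple, rich, prefers_richer_model(simple, rich))
```

In this made-up case the richer model lowers the score by 5 points, so it would be retained; a drop smaller than 2 would favor the simpler random-effects structure.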
The RT data were analyzed with a linear mixed effects regression (LMER) model in R. The RTs for each segment were analyzed separately and the analysis was again performed in two steps, as for the accuracy data. First, we compared the RTs for conditions (a) to (d) in example (14) above, that is, multiple who- and multiple which-questions with a SO and an OS order. Group (Children vs. Adults), WhType (Who vs. Which), and WhOrder (SO vs. OS), as well as their interaction, were specified as fixed predictors. Second, we examined the effect of WhType (Who vs. WhichWho vs. Which) and Group (Children vs. Adults) and their interaction on the RTs for the OS conditions only. As in the analysis for accuracy, we coded the fixed predictors using repeated contrasts and the same contrast matrix. The final maximal models supported by the data included participants and items as random intercepts and by-participant random slopes for WhType. The goodness-of-fit of alternative models for the random effects structure was assessed by comparing the Akaike information criterion scores (Akaike, 1974). p-Values were calculated by Satterthwaite's approximation for denominator degrees of freedom, using the lmerTest package (Kuznetsova et al., 2017).
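To make the repeated contrasts for the three-level WhType factor concrete, the sketch below applies the weights given for c2 vs. 1 and c3 vs. 2 to a set of condition means. It is an illustration under assumptions: the names are hypothetical, the input values are the raw OS accuracy percentages reported in the Results (not model estimates on the logit scale), and the conversion of these weights into the matrix handed to the fitting software is not shown.

```python
LEVELS = ["Who", "WhichWho", "Which"]

# Weights as described in the text: each row compares two consecutive levels.
WEIGHTS = {
    "c2vs1": [-1, 1, 0],   # WhichWho minus Who
    "c3vs2": [0, -1, 1],   # Which minus WhichWho
}


def consecutive_differences(means):
    """Return the weighted sum of condition means for each comparison."""
    return {
        name: sum(w * means[level] for w, level in zip(weights, LEVELS))
        for name, weights in WEIGHTS.items()
    }


# Descriptive OS accuracy (in %) as reported in the Results section.
os_accuracy = {"Who": 94, "WhichWho": 78, "Which": 59}
diffs = consecutive_differences(os_accuracy)
print(diffs)
```

Each row of weights sums to zero, so every comparison isolates the difference between two neighboring levels and ignores the third.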

Results
The accuracy scores inform us about the final interpretation that participants assign to multiple wh-questions. RTs, that is, the listening times at each segment in the question, inform us on how sentences are processed incrementally as they unfold and at which segment(s) in the sentence participants encounter processing difficulties. Figures and averages are shown in untransformed measures for ease of interpretation, but statistical analyses were performed on log-transformed measures. Table 1 indicates accuracy scores for both children and adults in each experimental condition. For each test trial, an answer was coded as accurate when the participants identified in the image all the correct actions corresponding to the question, as the correct interpretation of multiple wh-questions in Romanian requires a PL answer. The correct answer for the fillers required identifying only one out of three characters. The fact that the accuracy rate for the filler trials was very high in both children and adults shows that participants do not have difficulties with the comprehension of single embedded questions. While adults were at ceiling, children's accuracy rate for the filler trials was 95% (96% response accuracy for subject who-questions and 94% response accuracy for subject which-questions).
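The accuracy criterion can be sketched as a set comparison. The sketch assumes that a response counts as accurate only when the named Agent-Patient pairs match the target pairs exactly, so that both incomplete and over-inclusive answers are scored as inaccurate; the function name and the data layout are hypothetical, and the example pairs are taken from the SO-Which item described for Figure 1a.

```python
def is_accurate(response_pairs, target_pairs):
    """True only if the response names exactly the target (agent, patient) pairs."""
    return set(response_pairs) == set(target_pairs)


# Target pairs for the SO-Which example associated with Figure 1a.
target = {("red grandmother", "Anna"), ("yellow grandmother", "Jasmine")}

exhaustive = [("red grandmother", "Anna"), ("yellow grandmother", "Jasmine")]
single_pair = [("red grandmother", "Anna")]
with_reversed_pair = exhaustive + [("Elsa", "green grandmother")]

print(is_accurate(exhaustive, target))          # complete pair-list answer
print(is_accurate(single_pair, target))         # only one pair named
print(is_accurate(with_reversed_pair, target))  # reversed-role pair included
```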

Accuracy
For the analysis, we will first focus on the comparison between multiple who-questions and multiple which-questions with a SO and OS order of constituents. Figure 2 illustrates the distribution of accurate responses (in percentages) in the who and which experimental conditions for both children and adults. The results indicate that both children and adults (a) comprehend multiple who-questions better than multiple which-questions and (b) that there is a difference in accuracy between the SO and the OS conditions in which, but not in who-questions.   Table 2 gives the output for the fixed effects of the final GLMER fit to children's and adults' accuracy scores for SO and OS multiple wh-questions in the conditions with two who and two which constituents.
The model revealed a significant main effect of WhType, showing that multiple which-questions (M = 67%) are answered significantly less accurately than multiple who-questions (M = 93%). The effect of WhOrder was also significant and indicates that multiple wh-questions with an OS order (M = 76%) lead to lower response accuracy than multiple wh-questions with a SO order (M = 83%). The significant effect of Group reveals that children perform significantly less accurately (M = 73%) than adults (M = 90%). In order to explain the direction of the statistically significant interactions in the final model, we nested the pairwise comparisons. The significant interaction between WhType and WhOrder and subsequent pairwise comparisons show that there is no significant difference between multiple SO-Who and multiple OS-Who questions (β = −0.033, SE = 0.310, z = −0.108, p = .914), while multiple SO-Which questions were overall more accurate than multiple OS-Which questions (β = 1.214, SE = 0.186, z = 6.520, p < .001). A significant interaction between WhType and Group and subsequent pairwise comparisons reveal that there is no significant difference in performance between children and adults in the case of multiple who-questions (β = −0.606, SE = 0.635, z = −0.955, p = .339), but that children perform significantly less accurately than adults with multiple which-questions (β = −1.835, SE = 0.334, z = −5.494, p < .001). The significant interaction between WhOrder and Group and subsequent pairwise comparisons indicate that multiple wh-questions with a SO order yield significantly lower accuracy in children as compared to adults (β = −1.892, SE = 0.467, z = −4.051, p < .001), while performance with OS multiple wh-questions does not differ significantly between the two groups (β = −0.548, SE = 0.429, z = −1.277, p = .201).
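For readers less familiar with this kind of output, the relation between the reported estimates, standard errors, and z values can be sketched as follows. The sketch assumes the usual Wald statistic (estimate divided by standard error) with a two-sided normal p-value; the function names are hypothetical, and the input values are those reported above for the SO-Which versus OS-Which comparison.

```python
import math


def wald_z(beta, se):
    """Wald statistic: the estimate divided by its standard error."""
    return beta / se


def two_sided_p(z):
    """Two-sided p-value under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# SO-Which vs. OS-Which comparison as reported in the text.
z = wald_z(1.214, 0.186)
p = two_sided_p(z)
print(round(z, 3), p < 0.001)
```

The recomputed z is close to the reported 6.520; the small discrepancy is presumably due to rounding of the published estimate and standard error.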
We also constructed an additional model to test for the interaction between WhType, WhOrder, and Group, and the results of this model were compared to the model with two-way interactions only by means of the anova function. This revealed no significant difference between the two models, based on the p-value associated with the chi-square-distributed likelihood ratio (p = .120), and thus no significant three-way interaction between these three factors.
Moving on to examine the results obtained for multiple wh-questions with an OS order, we observe from Figure 3 that the OS-Which questions yielded the lowest accuracy scores in both children and adults. Whereas adults comprehended OS-Who and OS-WhichWho questions equally well, children's performance with OS-WhichWho questions was less accurate than their performance with OS-Who questions.
The significant main effect of WhType in the analysis for multiple wh-questions with an OS order (Table 3) reveals that OS-Who questions (M = 94%) are answered significantly more accurately than OS-WhichWho questions (M = 78%), and that the OS-WhichWho conditions are comprehended significantly better than the OS-Which conditions (M = 59%). The significant main effect of Group indicates that, when we also consider the results for OS-WhichWho questions, children are overall less accurate (M = 70%) with OS multiple wh-questions than adults (M = 86%). The significant interaction between OS-Who versus OS-WhichWho and Group and follow-up pairwise comparisons show that, for children, OS-Who questions are significantly more accurate than OS-WhichWho questions (β = 2.030, SE = 0.451, z = 4.501, p < .001), but no significant difference emerges between the two conditions in adults (β = −0.255, SE = 0.614, z = 0.414, p = .909). Moreover, the significant interaction between OS-WhichWho versus OS-Which and Group and subsequent pairwise comparisons indicate that for children, and even more so for adults, OS-WhichWho questions yield significantly higher accuracy than OS-Which questions (children: β = 1.036, SE = 0.407, z = 2.547, p = .029; adults: β = 2.474, SE = 0.523, z = 4.733, p < .001).
Given that the child group covers a large age range (6-9 years old), we ran further analyses on the child data only and included age in months as a continuous variable in the model in order to test whether age plays a role in modulating the comprehension of multiple wh-questions in children. The analysis revealed a significant effect of Age (β = 0.054, SE = 0.022, z = 2.447, p < .01), showing that older children give overall more accurate responses than younger children. The interaction between WhOrder and Age was also significant (β = 0.030, SE = 0.014, z = 2.132, p < .05), indicating that the effect associated with the order of the wh-elements increases with age. The results of a post-hoc analysis (see Appendix B for the full output of each model), aiming to disentangle the effects of Age and WhOrder on multiple who-questions and multiple which-questions, reveal a significant effect of Age in both multiple who-questions (β = 0.098, SE = 0.039, z = 2.466, p < .05) and multiple which-questions (β = 0.036, SE = 0.012, z = 2.949, p < .01). In other words, older children tended to perform better on the task than younger children. However, WhOrder was a significant predictor only in the model that considered multiple which-questions (β = 0.471, SE = 0.171, z = 2.741, p < .01). Specifically, multiple which-questions with a SO order yielded more accurate responses than multiple which-questions with an OS order. For multiple which-questions, we also found a significant interaction between WhOrder and Age (β = 0.034, SE = 0.015, z = 2.235, p < .05). Subsequent pairwise comparisons show that older children comprehend SO-Which questions significantly better than younger children (β = 0.054, SE = 0.015, z = 3.612, p < .001). Although response accuracy for OS-Which questions also increases with age (β = 0.019, SE = 0.014, z = 1.350, p = .177), this increase does not reach significance.
An asymmetry between multiple questions with two who-elements and those containing two which-elements also surfaces when analyzing the errors that children and adults make in answering these questions. The three main types of wrong answers or errors, summarized in Table 4, are (a) over-exhaustive answers (when participants identify all the pairs in the image, even though one of them does not match the action), (b) singleton answers (when participants answer only one wh-word, either the wh-subject or the wh-object, and they exhaustively list all the individuals involved in the corresponding action), and (c) role reversals (when participants reverse the Agent-Patient roles). Children not only make more errors than adults but also make a substantial number of singleton errors, that is, they answer only one of the wh-words. While adults very rarely give such answers (4 out of a total of 97 errors), this is the most common type of error that children make when answering questions with two who-elements and the second most common type of error in questions with two which-phrases and in WhichWho questions. An analysis of children's errors by age group (Table 5) shows that the younger children (6- to 7-year-olds) give more singleton responses consisting of exhaustive subject or object lists than the older children. From the total of 152 such errors, 121 appear in the 6- to 7-year-old group, compared to only 14 for the 8-year-old group and 17 for the 9-year-old group. The most frequent error that both children and adults make in multiple which-questions is role reversal, indicating that both groups have more difficulties mapping the correct argument role onto the two wh-words when both are lexically restricted.
Table 4. Type of errors (percentages and raw numbers out of total number of errors per condition) in children and adults for both SO and OS multiple wh-questions with two who-constituents (Who), two which-constituents (Which), and a which-object and a who-subject (WhichWho)
To summarize, the accuracy data show that children made significantly more errors than adults in their comprehension of multiple which-questions. Furthermore, children also showed lower accuracy in the comprehension of multiple wh-questions with an OS order, both when the questions contained two which-phrases and when the questions contained a which-object phrase and a who-subject. No differences emerged with respect to the comprehension of multiple who-questions.

Reaction times
The RT analyses were performed on residual RTs for accurate trials only. These were calculated by subtracting the total duration of each segment from the participants' raw RTs. Residual RTs were further screened for extreme values and outliers (see Marinis, 2010). Extreme values were defined as RTs below −1000 ms and above 2500 ms on the basis of histograms and were eliminated from the dataset. Outliers were defined as RTs more than 3 standard deviations above or below the mean for each condition, separately per participant and item, and were replaced with the mean RT for each condition per participant and item. Extreme values and outliers comprised 1.6% of the data (2.3% of the data for children and 0.8% of the data for adults). Although each test sentence started with Paddington / wants to know /, we only analyzed the RTs for six segments, Segment 1 being the first wh-word. Figures 4-7 show the RTs in milliseconds for children and adults at each of the segments of interest, starting from the first wh-word that the participants hear. As for the accuracy data, we first analyze the RTs for multiple who-questions and multiple which-questions with a SO and OS order, which allows us to examine how multiple wh-questions are processed in real time and whether differences appear between who- and which-phrases. We only report the significant effects and interactions in the text and we present the data segment-by-segment. Figure 4 reports the RTs for children and Figure 5 shows the RTs for adults.
We found again a significant main effect of Group, which reflects longer RTs in children as compared to adults (β Adults vs. Children = 0.226, SE = 0.079, t = 2.868, p < .01).
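The screening steps for residual RTs described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' script: it assumes that a residual RT is the raw listening time minus the segment duration, that outliers lie more than 3 standard deviations from the mean of their cell, and that the replacement mean is computed over that same cell; all function names and values are hypothetical.

```python
from statistics import mean, stdev


def residual_rt(raw_rt_ms, segment_duration_ms):
    """Assumed convention: raw listening time minus the audio duration of the segment."""
    return raw_rt_ms - segment_duration_ms


def drop_extremes(values, low=-1000, high=2500):
    """Remove residual RTs outside the cut-offs named in the text."""
    return [v for v in values if low <= v <= high]


def replace_outliers(values, n_sd=3):
    """Replace values more than n_sd standard deviations from the cell mean by that mean."""
    if len(values) < 2:
        return list(values)
    cell_mean, cell_sd = mean(values), stdev(values)
    return [cell_mean if abs(v - cell_mean) > n_sd * cell_sd else v for v in values]


# Hypothetical cell: nineteen residual RTs of 100 ms and one of 2000 ms.
cell = [100] * 19 + [2000]
cleaned = replace_outliers(drop_extremes(cell))
print(cleaned[-1])
```

Extreme values are removed before the outlier step so that they do not inflate the standard deviation used for the 3 SD criterion; whether the original analysis applied the steps in this order is an assumption.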
Let us now compare the RTs for the OS conditions only in order to investigate whether the type of wh-element affects children's and adults' sensitivity to ordering constraints in multiple wh-questions. As above, we only report the significant effects and interactions and we present the data segment-by-segment. Figures 6 and 7 show the RTs for the OS conditions for children and adults, respectively.

Segment 1
The analysis for the OS conditions revealed a significant effect of Group at Segment 1 (β Adults vs. Children = 0.281, SE = 0.085, t = 3.281, p < .01), which indicates longer RTs in children as compared to adults. We also found significant differences between the levels of the WhType factor. To recall, because WhType was a three-level factor in the case of the OS conditions, we specified the following contrasts in the model: c2 vs. 1, which tested the differences between the level Who and the level WhichWho, and c3 vs. 2, which tested the differences between the level WhichWho and the level Which. The analysis reflects significantly shorter RTs for the WhichWho than the Who conditions at Segment 1 (β Who vs. WhichWho = −0.325, SE = 0.054, t = −5.975, p < .001). The interaction between WhType Who vs. WhichWho and Group was also significant (β Who vs. WhichWho

Segment 6
The difference between the Who and WhichWho conditions only approaches significance (β Who vs. WhichWho = 0.150, SE = 0.073, t = 2.039, p = .051) but goes in the same direction as in the preceding segments, with the WhichWho condition yielding longer RTs than the Who condition. We also examined whether RTs at each segment vary as a function of Age in the Child group and whether Age modulates the online sensitivity to ordering constraints; however, no significant effect of Age emerged at any segment. In addition, we performed an analysis of RTs for the trials with incorrect responses in the SO and OS-Which conditions. The small number of inaccurate trials for the other Who conditions and for the adult data (see Table 1) did not allow any analyses to be conducted for these conditions and for the adults. Visual inspection of the data, followed up by LMER models at each segment, for the Which conditions only, revealed no significant effect of wh-order. The OS-Which condition yielded faster RTs than the SO-Which condition at the last segment; however, the effect did not reach significance (p = .142).
To summarize, the online reaction time data reveal a slowdown for who- versus which-phrases, as well as longer RTs associated with the clitic region in multiple which-questions in both children and adults. Children also show longer RTs for wh-objects as compared to wh-subjects (irrespective of the type of wh-element). Adults show a slowdown in RTs in the OS-Who as compared to the SO-Who conditions.

Discussion
In this study, we aimed to examine the acquisition and processing of multiple who and which-questions in Romanian that display ordering constraints and involve exhaustivity. The specific goals were to determine (a) how children and adults process multiple who-questions as compared to which-questions, (b) whether they display an online sensitivity to ordering constraints in multiple wh-questions, and (c) the extent to which the type of wh-element affects the comprehension of multiple wh-questions.
We carried out a self-paced listening experiment that simultaneously investigated online processing and offline comprehension of multiple wh-questions in Romanian children and adults. The study manipulated the type of wh-phrase (who vs. which) and the order of these elements (SO vs. OS). Romanian requires all wh-phrases to be fronted and exhibits strict ordering constraints in who-, but not in which-questions: fronting a who-object over a who-subject is ungrammatical, while fronting a which-object over a which-subject or a who-subject is not. Accuracy analyses tested the offline comprehension of multiple wh-questions and allowed us to address (a) the impact of the type of wh-element on response accuracy, as this has been shown to affect the offline comprehension of single wh-questions, and (b) the issue of exhaustivity, because we measured whether children and adults used an exhaustive or a non-exhaustive response, in other words, whether they give the exhaustive sets of possible answers to both wh-elements, which are pairwise linked. RT analyses measured how children and adults process multiple who- and multiple which-questions online and whether they are sensitive to the ordering constraints present in multiple who-, but not in multiple which-questions.
The findings for offline accuracy show that the type of wh-element (who vs. which) modulates the comprehension of multiple wh-questions. Both children and adults comprehend who-questions very well, even when the object fronts over the subject, despite the fact that both the object and the subject in the OS condition share the same Q or interrogative feature. To recall, an analysis of multiple wh-questions in terms of featural intervention and RM postulates that multiple who-questions with an OS order involve a featural identity relation that should be ruled out as ungrammatical because the elements entering this featural relation appear in a nesting chain configuration (Krapova & Cinque, 2008; Rizzi, 2011, 2017). However, the findings reveal that children and adults have no difficulties comprehending the ungrammatical OS-Who sentences. We attribute this result to the specificity of the task: participants are required to listen to the question and then map its meaning onto a correct interpretation in order to select all the pictures that match the described action. In so doing, the participants need to encode back-to-back wh-elements and then, upon reaching the verb, retrieve them and establish the correct thematic relations. Both children and adults are able to map the wh-phrases onto the corresponding thematic relations, while disregarding the ungrammaticality of the sentence. In other words, they can successfully repair the ungrammatical structure in order to arrive at the correct interpretation for the question. Note that these results are in line with those of Hofmeister et al. (2013), as these authors also report high accuracy scores for multiple who-questions with an ungrammatical order (84.4% accuracy), although in their study questions with two bare wh-words are less accurate than questions with two which-phrases.
Over-exhaustive responses constituted the only type of error that the adults in our study made when answering multiple who-questions (Table 4). Thus, on rare occasions, they also identify the pair of referents performing the wrong action. Nonetheless, the fact that they do not give any single pair answers or exhaustive lists of subjects or objects is evidence that they do not have difficulties with either of the two steps necessary to derive a paired list answer: exhausting the question domain and pairing the two wh-elements. Children also give over-exhaustive answers to multiple who-questions, but mostly they make singleton response errors (Table 4), in which they provide exhaustive lists of referents either for the wh-subject or for the wh-object. A closer inspection of error types for each age group (Table 5) reveals that this is the most frequent type of error in the youngest children tested, the 6- to 7-year-olds. This is in line with previous studies on the comprehension of multiple wh-questions in Romanian (Bentea, 2010; Măniţă, 2017) showing that even at the age of 6, Romanian children tend to answer only one of the wh-words in the question. However, when answering only one wh-word, the children in our study give exhaustive lists of subjects and objects, which may indicate that they only exhaust over one wh-phrase, while the other might not be present in their interpretation (see Schulz & Roeper, 2011). Our results therefore reveal that, at the age of 8, children interpret multiple wh-questions as requiring exhaustive PL answers. Younger children, on the other hand, seem to have acquired the exhaustive reading, but they have difficulties linking the two wh-words.
In the case of multiple which-questions, children and adults are more accurate with questions involving a SO order as compared to those in which the object moves across the subject. For children, we found an effect of age showing that younger children give less accurate responses than older children to SO-Which questions. Children and adults have more difficulties assigning a correct interpretation to OS-Which questions. On the other hand, children differ from adults in their comprehension of OS-WhichWho questions, that is, questions with an OS order and in which the object contains a lexical N restriction, while the subject is a bare wh-word (who). Whereas adults comprehended OS-WhichWho questions very well, on a par with OS-Who questions, children struggle more with the comprehension of OS-WhichWho questions. At first sight, this result follows from an account of multiple wh-questions in terms of featural intervention, given that both OS-Which (16) and OS-WhichWho (17) questions instantiate an inclusion relation created by the presence of a Q feature on the two fronted wh-elements: This could indicate that children find it difficult to comprehend or compute inclusion configurations in general and not just those created by a N feature (or lexical restriction) shared between the moved element and the intervening one, as in the case of single which-object questions. Moreover, given an analysis which postulates two potential attractors (Q N and Q) for lexically restricted wh-phrases (Villata et al., 2016; Rizzi, 2017), (17) is predicted to yield similar results to (16) because both give rise to the same inclusion configuration in which the Q feature on the two interveners (the nested chains formed by the moved subjects care fată [which girl] and cine [who] and their traces) is included in the featural specification of the element that gets attracted to a higher Q N position.
The results show, however, that both children and adults comprehend OS-WhichWho questions (17) significantly better than OS-Which questions (16). Therefore, the two cases of inclusion cannot be considered on a par. We postulate that the featural similarity in lexical N restriction between the two wh-arguments in OS-Which questions drives the added complexity in comprehension. The presence of a wh-subject containing a lexical N restriction in OS-Which questions hinders the correct assignment of thematic relations to the two wh-elements, which, in turn, leads participants to misinterpret OS-Which questions as containing an SO order instead. The finding that even SO-Which questions pose more difficulties for younger children than for older children also suggests that a lexical N restriction on both the wh-subject and the wh-object makes which-questions harder to comprehend, despite the presence of case information. One possibility is that children's overall difficulties with multiple which-questions could be related to the presence of two lexically restricted wh-words. This could make parsing the question in relation to the visual cues more taxing, given that the visual cues were presented at the end, after participants heard the whole sentence. However, if this were the case, then we would expect children to comprehend SO-Which and OS-Which questions on a par, as they both contain an additional lexically restricted element. The fact that we find an asymmetry in comprehension, with lower accuracy for structures in which the object precedes the subject, suggests that the difficulties associated with multiple which-questions containing an OS order stem from intervention effects triggered by the presence of a lexical N restriction in the set of features characterizing both the wh-object and the intervening subject chain.
The difficulty with OS-Which questions is reflected in the type of errors that participants make for multiple which-questions, as the most frequent errors are role reversals, meaning that both children and adults interpret the first NP as Agent and the second NP as Patient. Although which-questions are harder for children to comprehend than who-questions, this does not show that they have not acquired exhaustivity in which-questions. The other errors children make include over-exhaustive responses (i.e., children answer by listing all the pairs of characters in the visual display, including the one in which the Agent-Patient roles are reversed) and singleton answers (again, children provide exhaustive lists of subjects or objects). We take these errors to suggest that children show mastery of exhaustivity in multiple which-questions as well. In addition, the most common errors in which-questions were reversals, which we do not consider on a par with single-pair answers; the latter would have been equivalent to answering with only one of the pairs performing the correct action. Rather, children's errors with which-questions reveal that children have difficulties assigning the correct theta-roles when both wh-arguments contain a nominal restriction.
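The response categories discussed above can be made concrete with a small sketch. It is only an illustration of how the categories relate to one another, not the coding procedure used in the study: the function name `classify_response`, the encoding of a response as a set of (agent, patient) tuples or bare referent names, the label strings, and the character names are all assumptions introduced here.

```python
from typing import Iterable, Set, Tuple, Union

Pair = Tuple[str, str]          # (agent, patient); hypothetical encoding
Item = Union[Pair, str]         # a pair, or a bare referent name


def classify_response(response: Iterable[Item], target_pairs: Iterable[Pair]) -> str:
    """Assign a response to one of the categories discussed in the text.

    `target_pairs` are the pairs performing the action asked about.
    All labels and the input encoding are illustrative assumptions.
    """
    target: Set[Pair] = set(target_pairs)
    reversed_target = {(patient, agent) for agent, patient in target}
    items = set(response)
    pairs = {r for r in items if isinstance(r, tuple)}
    singles = {r for r in items if isinstance(r, str)}

    if singles:
        # Singleton answer: an exhaustive list of subjects only or objects only.
        agents = {agent for agent, _ in target}
        patients = {patient for _, patient in target}
        if not pairs and singles in (agents, patients):
            return "singleton"
        return "other"
    if pairs == target:
        return "pair-list"            # exhaustive and correctly paired
    if pairs == reversed_target:
        return "role-reversal"        # Agent and Patient roles swapped
    if target < pairs:
        return "over-exhaustive"      # all correct pairs plus extra ones
    if len(pairs) == 1 and pairs <= target:
        return "single-pair"          # only one of the correct pairs
    return "other"
```

For example, under this encoding a response that lists every correct pair plus the pair with reversed roles counts as over-exhaustive, whereas a response that lists only the agents counts as a singleton answer.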
Intervention effects in comprehension arise despite the fact that which-objects in Romanian are marked for Case by the preposition pe. This is in line with the literature that has tested the effect of Case mismatch on the comprehension of single object which-questions (for Hebrew: Friedmann et al., 2017; for Romanian: Bentea, 2016) showing that the dissimilarity in Case features between the moved object and the intervening subject does not enhance the comprehension of object which-questions and cannot overcome the intervention effects found in these structures. Friedmann et al. (2017), following Belletti et al. (2012), argue that only mismatches in features acting as triggers of syntactic movement, typically inflected on the verb, can facilitate intervention configurations and that Case, although relevant for movement, does not trigger it, and thus is not a relevant feature for modulating intervention effects.¹¹ Hence, the inclusion of an N feature seems to be more penalizing for comprehension than the inclusion of a Q feature alone. Our data thus suggest that an analysis in which the second which-phrase is attracted by a simple Q head does not fully account for the response pattern obtained for OS-Which and OS-WhichWho questions. However, if an N feature is also present on the lower which-element, we are faced with the challenge of accounting for the grammaticality of these structures, since both elements are then specified for the same features (Q N) and the configuration yields nested chains. A syntactic analysis of multiple wh-movement goes beyond the scope of this paper, but we could speculate that one possibility to derive the grammaticality of examples like (16) would be to broaden the class of features to also include a Top(ic) feature on the which-phrases, Top being the feature associated with D(iscourse)-linking. This proposal runs into issues of its own.
If D-linking is determined in the presence of a context of utterance, then even the Who-questions in our study are D-linked and thus specified for a Top feature, since all the questions were preceded by a lead-in introducing a specified set of referents. Another possibility would be to include both Top and a different featural specification (potentially captured in terms of specificity) for clitic-resumed and non-clitic-resumed Topics. This would be in line with Krapova and Cinque (2008, p. 186), who show that, at least in Bulgarian, clitic-resumed which-phrases target a different position than non-clitic-resumed which-phrases.
Another potential explanation for the results obtained comes from cue-based interference models (Lewis & Vasishth, 2005; Lewis et al., 2006; Van Dyke & McElree, 2011), which account for difficulties with long-distance dependency processing in terms of constraints from memory retrieval mechanisms. Under this view, memory retrieval is driven by cues, which identify the features of the element(s) to be retrieved and distinguish them from other irrelevant representations in memory. Specifically, upon encountering a constituent (e.g., the wh-phrase in a question), information about this element is encoded in memory (e.g., syntactic category, animacy, argument, etc.). This constituent then has to be retrieved from memory at the gap position and integrated into the structure. At this point, the previously encoded cues are analyzed and if another constituent shares similar cues with those of the element that needs to be retrieved from memory, this second constituent will interfere with the processing of the initially encoded element. This then results in an increased processing cost for the structure. When the cues of potentially interfering constituents are sufficiently different, the processing cost is reduced, which, in turn, makes the structure easier to comprehend. In the multiple wh-questions tested in this study, not just one but two wh-constituents need to be encoded at the very beginning of the structure, and the encoded information then needs to be maintained in memory until it can be retrieved at the gap positions where these constituents can be successfully integrated into the structure. If the sets of cues of the subject and object in a multiple which-question are sufficiently similar, this will overload memory capacity and the structure will be more costly for comprehension. If the two sets of cues are dissimilar, as in an OS-WhichWho question, memory resources will be less burdened and the structure easier to comprehend.
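The intuition behind cue overlap can be illustrated with a toy calculation. This is a minimal sketch and not the activation-based model of Lewis and Vasishth (2005): the function `cue_overlap`, the simple proportion it computes, and the cue labels assigned to each wh-phrase are all assumptions made here for illustration.

```python
from typing import Iterable


def cue_overlap(target_cues: Iterable[str], competitor_cues: Iterable[str]) -> float:
    """Proportion of the target's cues that the competitor also carries.

    A higher value stands in for more similarity-based interference
    at retrieval; this is a toy measure, not a fitted model.
    """
    target = set(target_cues)
    if not target:
        return 0.0
    return len(target & set(competitor_cues)) / len(target)


# Hypothetical cue sets for the wh-phrases in the two OS conditions.
which_object = {"wh", "lexical_N", "animate", "singular"}
which_subject = {"wh", "lexical_N", "animate", "singular"}
who_subject = {"wh", "animate", "singular"}

# OS-Which: the subject matches every cue of the object.
os_which = cue_overlap(which_object, which_subject)
# OS-WhichWho: the bare subject lacks the lexical N cue.
os_which_who = cue_overlap(which_object, who_subject)
```

With these assumed cue sets, the overlap is higher for OS-Which than for OS-WhichWho, which matches the direction of the accuracy difference discussed above. The sketch says nothing about where in the sentence the cost arises.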
Further research is needed to assess whether children's memory skills interact with their language abilities to modulate the comprehension of complex structures like multiple which-questions. This would require the use of a working memory task, as well as a comparison of different experimental designs, not only designs where the pictures appear after the end of the sentences, like in the present study, but also designs where the pictures are present on screen while the sentences are being processed.
Moving on to the analyses of RTs, these reveal an effect of type of wh-element in both groups of participants, with shorter RTs when processing which- as compared to who-elements. This effect surfaces when participants encode the syntactic and semantic information associated with the wh-fillers, that is, before they reach the verb region where they have to retrieve this information and successfully map the wh-phrases to the thematic structure of the verb. This is in line with the self-paced reading results reported in Hofmeister et al. (2013). Although not directly comparable, as Hofmeister et al. (2013) do not report word-by-word reading times for the whole sentence, their results reveal shorter residual reading times at the word immediately preceding the verb when this is a which-phrase compared to when it is a who-phrase. However, contrary to the predictions based on Hofmeister et al. (2013), we did not find a difference in RTs at the verb (or its spillover regions) between the conditions with two who-elements and those with two which-elements, as participants do not process the verb region faster in the conditions with one or two which-phrases.
Moreover, the online results reveal that adults, but not children, listen longer to the wh-subject in the OS-Who conditions, that is, the conditions that violate Superiority constraints. In the SO conditions, participants heard a wh-subject in Segment 1 followed by a wh-object in Segment 2, whereas in the OS conditions they heard a wh-object in Segment 1, followed by a wh-subject in Segment 2. While a which-object preceding a which-subject or a who-subject is grammatical, a who-object preceding a who-subject leads to Superiority violations and the sentence should be ruled out as ungrammatical. Adults, but not children, show an online sensitivity to ordering constraints in multiple who-questions. The advantage of a processing system that is fully developed in adults could account for the fact that adults, but not children, can detect the ungrammaticality of OS-Who questions. Children, unlike adults, may be unable to recruit this information in real-time processing, or the effect may take much longer to surface in children. There is evidence that 5-year-olds do not actively form filler-gap dependencies in real-time comprehension of wh-questions and that when active dependency formation appears in 6-year-olds, there is a small delay in its execution as compared to adults (Atkinson et al., 2018). Other visual world studies have also found young children to be slow in processing filler-gap dependencies, with effects occurring after the end of the sentence (Adani & Fritzsche, 2015). The children in our study are older; however, the structures tested are more complex as they require encoding and integrating two wh-elements in the structure. Another possibility is that the ordering constraint in children is overridden by the quest for meaning. It could be a task effect, as children first have to listen and get the correct meaning of the questions they are hearing, and then they have to select the correct pictures.
Children can do that by repairing the ungrammaticality of the sentence, given that there is sufficient time between the moment when the wh-dependency is initiated and the moment when it is interpreted. When children are given time to encode the meaning of each scene before giving an answer, they can plausibly understand the sentence and map the wh-phrases onto the correct argument structure of the verb. Further research including finer-grained measures of sentence processing, like the visual world paradigm, as well as a production task, could provide additional evidence on whether children's online insensitivity to ordering constraints is due to a task effect or related to a non-adult-like grammar.¹²
We also found longer RTs associated with the clitic region in multiple which-questions in both children and adults, which indicates that participants have more difficulties processing these elements. Clitic doubling requires extra processing because upon encountering the clitic, one needs to identify the correct antecedent, namely the wh-object. The clitic appears in a derived position preceding the verb (see Coene & Avram, 2012 for analyses of clitic doubling constructions in Romanian) and requires the establishment of an additional syntactic and referential dependency with its antecedent. Children might find this more difficult than adults, although both groups slow down when processing the clitic, which, in turn, results in lower accuracy scores for multiple questions containing a which-object. Moreover, the fact that the two potential antecedents, the wh-subject and the wh-object, match in gender and number features could render the processing of the clitic more costly for children and adults alike.
To conclude, our findings indicate a speed-accuracy trade-off. Children and adults are more accurate with multiple who- than which-questions, but they slow down when they process who- as compared to which-phrases. An intervention effect appears in OS-Which questions but only in accuracy, showing that participants find it harder to establish the correct thematic relations between the moved wh-phrases and the verb in the presence of two which-phrases. We identified the source of this intervention effect as the inclusion of a lexical N restriction in the set of features characterizing both the wh-object and the intervening subject chain. This inclusion of an N feature seems to be more costly for comprehension than the inclusion of a Q feature alone, because children and adults comprehend OS-WhichWho questions better than OS-Which questions. The lack of intervention effects in terms of RTs indicates that such effects occur at a later stage, after children have heard the whole sentence and when they interpret its meaning.
5. In addition, there is evidence from the literature on adult sentence processing that which-elements, compared to who-elements, increase the acceptability of sentences with island violations (for English: Goodall, 2014; Atkinson et al., 2016; for French: Villata et al., 2016).
6. On the other hand, in a self-paced reading study in Dutch with single who- and which-questions with role reversal (1), Donkers, Hoeks, & Stowe (2013) (see also references therein) found that, compared to who, the which N questions showed consistently longer reading times until the final segment of the sentence.
(1) Who/Which servant has the emperor looked for in the cellar?
7. Note that multiple wh-questions containing two which-elements in English (1) can be answered both with (1a), a pair-list answer, and with (1b), a single-pair answer:
(1) Which child bought which book?
(a) John bought Zog and Mary bought The Gruffalo.
(b) John bought Zog.
However, the availability of single-pair answers to questions like (1a) remains an open issue. Some authors (Barss, 2000; Dayal, 2016) find them acceptable, whereas others (Comorovski, 1996) consider multiple which-questions unacceptable on the single-pair reading.
8. Which-elements can also be separated from other bare wh-phrases by fronting them in a matrix clause with bare elements appearing in a lower position (1). This option is ruled out for bare wh-words (2).
9. One reviewer notes that the study did not include sentences with one bare phrase and one which-phrase that violate ordering constraints. Sentences with two who-phrases represent a clear case of ungrammaticality and are also consistently judged as highly degraded in acceptability judgment studies with superiority violations (Hofmeister et al., 2013) and extractions from weak islands (Villata et al., 2016). Sentences where who precedes which, on the other hand, are judged to be significantly better than those with two who-elements and significantly worse than those with two which-elements (Hofmeister et al., 2013). Informal judgments from adult Romanian speakers seem to confirm this pattern for Romanian as well. Given the gradient in judgments associated with ungrammatical sentences in which who precedes which, we have decided not to include them in the present study. However, this paves the way for a follow-up study assessing sensitivity to ordering constraints in questions that contain both a who-phrase and a which-phrase either in subject or in object position.
10. We only used 10 fillers in order to reduce the length of the task itself and of the test sessions. We could only take the children out of their classroom for 45 min at a time and especially younger children found it difficult to concentrate for more than 30 min.
11. However, divergent findings are reported in other studies (see, e.g., Varlokosta, Nerantzini, & Papadopoulou (2015), who looked at the comprehension of movement structures like wh-questions and relative clauses in Greek-speaking children).
12. One anonymous reviewer suggests that the result showing adults, but not children, to be sensitive to the ordering violation in OS-Who questions could be more a question of metalinguistic awareness rather than a focus on meaning. Although this is a plausible interpretation, the current design of the experiment does not allow us to directly address this. This remains for future research. One possibility would be to use a grammaticality judgment task, along the lines of Gavarró (2020), who examined children's and adults' judgments of object relative clauses, long-distance wh-questions, and ungrammatical wh-questions involving RM violations in Catalan and found that children reject sentences containing such violations more often than object relatives or long-distance wh-questions.
13. The total number of answers in the who-conditions is 319 as there is one missing value both for SO-Who and for OS-Who.