1. Background: reconstruction in A′-movement
Reconstruction for Condition C as in (1) has played an important role in linguistic theory as a diagnostic for movement:
The ungrammaticality of (1) follows if the wh-phrase containing the R-expression John is interpreted in its pre-movement position as in (2). Since John is c-commanded by the coreferential pronoun he, a Condition C violation obtains. Example (1) is thus ungrammatical for the same reason as *Heᵢ likes this picture of Johnᵢ.
Two aspects of Condition C reconstruction have played a prominent role in the literature. First, Condition C effects have been claimed to display argument/adjunct asymmetries: only R-expressions inside arguments trigger Condition C effects, while R-expressions inside adjuncts do not; see, e.g., Lebeaux (1991: 211–212):
The asymmetry has been linked to theta theory. While arguments have to be merged cyclically with their predicates to ensure that they receive the proper thematic interpretation, adjuncts can be introduced after movement, i.e. undergo so-called late merger and thus bleed Condition C.
Second, it has been claimed that only R-expressions contained inside predicates obligatorily lead to Condition C violations, while those inside arguments do not always. This has been linked to the fact that only the former reconstruct obligatorily (see, e.g., Huang 1993; Heycock 1995: 558–561). The contrast in (5) thus partially conflicts with the baseline data in (1), see Huang (1993: 110):
Another contrast that has been discussed prominently concerns movement types. Relative clauses are sometimes claimed to display weaker Condition C effects than wh-movement or no Condition C effects whatsoever (Citko 2001: 136):
The absence of Condition C effects in relatives has been accounted for by means of the matching analysis, where either the RC-internal representation of the external head can be deleted without violating recoverability, see (7a) (Citko 2001), or where vehicle change relates the R-expression inside the head of the relative clause to a pronoun inside the relative clause, see (7b) (Sauerland 2003):
While these facts are often cited in the literature, several aspects of Condition C reconstruction are contested. In the following subsections, we will briefly summarize some of the major empirical issues, for both English and German.
1.1 Contested facts about English
Apart from the studies to be discussed in the next section, virtually all of the literature on Condition C reconstruction is based on introspective judgments. Against this background, it is unsurprising that there is disagreement on both the basic facts and their theoretical interpretation.
The major issue concerns the general robustness of Condition C reconstruction. While often taken for granted under A′-movement, there is a sizable list of dissenting voices, see, e.g. Heycock (1995) and Fischer (2002, 2004). The examples in (8) are a small selection from data presented in Safir (1999: 609) that are supposed to show the absence of Condition C effects:
The second controversial issue concerns the argument–adjunct asymmetry. On the one hand, the empirical contrast has been called into question; on the other hand, putative contrasts have been linked to other factors (see Heycock 1995; Lasnik 1998; Fischer 2004: 161–162). Moreover, analytically, it is not really clear what qualifies as an argument in the nominal domain (see, e.g. Fischer 2002, 2004; Donati & Cecchetto 2011; and the references in Bruening & Al Khalaf 2019: 248).
Huang (1993: 110) observed another empirical complication, namely that the strength of Condition C effects with arguments decreases with increasing distance between R-expression and pronoun. Thus, while in the minimal pair in (5), there was an argument/predicate contrast in that the effect was rather weak with arguments, the contrast disappears once the coreferential pronoun is in the matrix clause (and thus closer to the R-expression) as shown in (9); in that configuration both arguments and predicates seem to display a Condition C effect:
1.2 Condition C reconstruction in German
While the issue is somewhat less prominent in the literature, the robustness of Condition C reconstruction seems to be equally contested in German. In the first systematic discussion of Condition C reconstruction, Frey (1993: 143–153) presents evidence for reconstruction of topicalized arguments on the basis of (10):
The fronted direct object is reconstructed to its base position below the indirect object and therefore causes a Condition C violation. Similarly, in Salzmann (2017: 137) it is argued that Condition C effects are robust in wh-movement and topicalization but weak/absent in relativization:
However, as discussed in Fischer (2002: 70–71, 79; 2004: 161–164, 175–177), many of the types of examples that are controversial in English – recall (8) – also don’t seem to display strong Condition C effects in German:
1.3 Intermediate summary and objectives
As this section has shown, while Condition C reconstruction has played a prominent role in syntactic theory, its force is rather unclear given the empirical controversies surrounding it. The goal of this paper is thus to provide an empirically more solid base by investigating Condition C reconstruction from an experimental perspective. Our focus will be on German because German is less studied in this respect, both theoretically and experimentally.
Our paper is structured as follows: In Section 2, we will summarize previous experimental work on Condition C reconstruction in English and German and point out shortcomings related to the use of thresholds and concerning the failure to properly take non-syntactic factors into account. In Sections 3–6, we report our experiments on reconstruction in German A′-movement. These experiments are based on grammatical contrasts and successively neutralize possible non-syntactic factors. Overall, the case for Condition C reconstruction weakens. In the general discussion in Section 7, we conclude that the results argue against a theory that includes both reconstruction and a hard/inviolable Condition C constraint.²
2. Previous experimental work
Despite the empirical controversies, Condition C reconstruction has only recently been subjected to experimental scrutiny.³ There are three studies on English, namely Adger et al. (2017), Bruening & Al Khalaf (2019), and Stockwell, Meltzer-Asscher & Sportiche (2021), and one study on German of our own (Wierzba, Salzmann & Georgi 2021). Apart from Stockwell et al. (2021), which we will address in Section 5 because it involves a different reasoning, we will summarize these studies and their major results as they have influenced the questions and methodologies we pursue in what follows.
The arguments in favor of or against Condition C reconstruction in these works are based on experiments testing the following constellation:
The basic idea is that Condition C is violated in (14a) (in situ condition) and therefore coreference between the pronoun and the R-expression should be unavailable. If a different pattern – i.e. higher availability of coreference – is observed in (14b) (moved condition), this has been interpreted as evidence against reconstruction; and if a similar pattern is observed in (14b) (low availability of coreference), this has been interpreted as evidence in favor of reconstruction.
Adger et al. (2017) report on three experiments that investigate different aspects of Condition C reconstruction, including the difference between predicates (APs) and arguments (DPs), the difference between arguments and adjuncts of nouns (complement clauses versus relative clauses), and the effect of (linear and structural) distance. The participants were presented with matrix wh-questions as in (15): R-expression and pronoun were highlighted and the participants were asked in a forced-choice task (yes/no) whether the two could refer to the same individual.
The AP versus DP-contrast was tested by comparing local wh-movement of APs with local wh-movement of DPs, with the R-expression contained in a complement:
The argument–adjunct asymmetry was tested by contrasting wh-movement of DPs with the R-expression contained in either a complement clause or a relative clause:
The factor distance contained three levels. Short refers to a monoclausal wh-question. Embedded 1 refers to a long-distance question with the coreferential pronoun as the matrix subject. Embedded 2 refers to a long-distance question with the coreferential pronoun as the embedded subject. They are illustrated in (18):
Finally, the experiments contained control items without movement, which also varied the distance between coreferential pronoun and R-expression:
The major results of the experiments can be summarized as follows: There was a clear contrast between predicates and arguments in that non-coreference was robust in the former, while, in the latter, coreference was available to varying degrees. No clear evidence for an argument–adjunct asymmetry (in the sense that only the former reconstruct) was found, even though coreference was more available with adjuncts. Finally, in all experiments, there was an effect of linear distance – coreference becomes more available the larger the distance between R-expression and coreferential pronoun. The results are illustrated in Table 1 (we omit the results of experiment 2, where the availability of coreference with wh-moved DP-PP short was significantly higher, viz., 58.7%; RC = relative clause; CC = complement clause; PP = PP complement).
The authors conclude from these results that only predicates and their (PP) complements reconstruct. With respect to reconstruction of DPs, they argue that all modifiers inside DPs can be deleted in the bottom copy so that DP arguments generally do not cause Condition C effects. This conclusion is based on the high availability of coreference in the conditions with DP movement. The asymmetry between DPs and APs is argued to follow from independent differences in the interpretation of LF-structures between predicates and arguments. Finally, the distance effect is linked to non-syntactic factors.
Bruening & Al Khalaf (2019) present two experiments on Condition C reconstruction in questions. They focus on the argument–adjunct asymmetry and investigate PP modifiers (experiment 1) as well as CP-modifiers (experiment 2), i.e. relative clauses versus complement clauses. The conditions are tested with wh-movement and without. The authors criticize the method of Adger et al. (2017) as inviting the subjects to engage in metalinguistic reasoning. They instead propose a different approach which involves embedded questions with two possible referents for the pronoun: the R-expression in the matrix clause (the subject) and the one within the wh-phrase. The participants then had to answer a forced-choice question and had to decide to which of the two R-expressions the pronoun referred (in the in situ condition, the embedded clause was a simple declarative clause):
The major results of their experiments can be summarized as follows: The authors found a significant contrast between the in situ and moved conditions in that coreference was much more available in the latter. There was no significant contrast between arguments and adjuncts; coreference was more available with CP-modifiers than with PP modifiers. The results are summarized in Table 2.
In their interpretation of the results, the authors primarily capitalize on the difference between the moved and the in situ conditions. They argue that if Condition C is a hard grammatical constraint, one expects there to be no difference between movement and in situ. However, they do find a substantial difference. In experiment 2, coreference is chosen at a rate close to chance (50%) under movement, while in the in situ condition, coreference was chosen at rates close to zero. This is interpreted as evidence against Condition C reconstruction. In experiment 1, the rate of coreference is below chance level, but since it is significantly higher than in the in situ condition, it is argued that this shows that no grammatical constraint on coreference is at play but rather non-syntactic factors. To capture the pattern, the authors argue that dependents of N should be uniformly treated as adjuncts, which is why they need not be present in the bottom copy.
In addition, Bruening & Al Khalaf (2019) present an experiment on English PP fronting. They observe that a fronted adjunct containing an R-expression does not easily allow coreference with a pronominal subject, as in The policeman said that near Dan, he saw a snake, where coreference between Dan and he was only chosen at a rate of 8.6%. The 8.6% is taken to be close enough to zero to indicate a Condition C violation and the authors therefore conclude that these PPs, including their nominal complement, reconstruct.
In Wierzba et al. (2021) we present four experiments on reconstruction for Condition C in German wh-questions. Experiment 1 investigates reconstruction of R-expressions contained in predicates. In experiment 2, reconstruction of R-expressions contained in either PP arguments or PP adjuncts of nouns is investigated. Experiments 3 and 4 investigate the effect of distance by testing reconstruction in long-distance movement. The methodology in these experiments was inspired by that used in Bruening & Al Khalaf (2019) and thus also involved embedded questions with two possible antecedents for the pronoun. The major difference was that instead of asking a forced-choice question, participants were asked two questions after each item and had to decide for each R-expression whether it was a possible antecedent of the pronoun (for discussion of this methodological choice, see Section 3.2). A (translated) sample item with questions is given in (21) (here and in what follows Q1 refers to the question about coreference with the matrix R-expression, while Q2 refers to the question about coreference with the embedded R-expression, which is the one within the wh-phrase in the moved condition):
The major results of this study are the following: Coreference was disfavored with both APs and DPs. This is particularly obvious in the short conditions, where the difference between in situ and moved is rather small. Coreference was more available with adjuncts than with arguments, but the difference is numerically very small. As in the experiments by Adger et al. (2017), there was a distance effect in that coreference with the embedded R-expression becomes more available with increasing distance between wh-phrase and coreferential pronoun. The results are illustrated in Table 3.
We concluded in Wierzba et al. (2021) that these results are compatible with the view that A′-moved constituents (including the PP modifiers) reconstruct, with the caveat that the result is based on a null effect (the lack of a difference between the in situ and moved conditions). Numerically, DPs/arguments showed more positive responses to Q2 than APs/predicates. However, since they were not tested within the same experiment, no firm conclusions could be drawn. The results did not provide conclusive evidence for an argument–adjunct asymmetry and, thus, a late merger approach. There was a small difference in the short condition in the predicted direction, but, even with adjuncts, coreference was available only to a very limited extent. The higher availability of coreference in the long-distance conditions was attributed to processing difficulties.
In a later, as yet unpublished, follow-up experiment, we tested reconstruction of APs and DPs within a single design to find out whether the predicate–argument asymmetry could be confirmed. We will report on this experiment (henceforth: AP/DP experiment) very briefly here as this asymmetry is not the main focus of this paper; more details can be found in our data repository (see the link in Section 3.3). Thirty-six native speakers of German, recruited via prolific.co, took part by way of the platform L-Rex (Starschenko & Wierzba 2020); the same two-question method as in Wierzba et al. (2021) was used. The experiment had a 2 × 2 × 3 design with the factors movement (moved vs. in situ), category (AP vs. DP; sum-coded) and distance (short, embedded 1, embedded 2). With DPs, the R-expression was contained in a PP argument. The main results are the following: As in Wierzba et al. (2021), coreference was disfavored with both APs and DPs, but it was significantly more available with DPs.⁴ Also, as in Wierzba et al. (2021), there was a distance effect in that coreference increases in the embedded 1/2 conditions. The results are listed in Table 4.
The results are in line with the predicate-argument hypothesis in that coreference is less available with APs than with DPs. The distance effect is most likely unrelated to reconstruction since it affects both APs and DPs as well as some of the in situ conditions, and since there is a concomitant decrease in the availability of coreference with the matrix subject in the embedded 1 and 2 conditions. The responses to both Q1 and Q2 are closer to chance level in embedded 1/2; a possible interpretation is that participants found it more difficult to judge the interpretation possibilities in these more complex cases.⁵ It is less clear what the results imply for the reconstruction of DPs. Coreference is clearly higher in the moved condition than in the in situ condition, which may suggest the absence of reconstruction (viz., late merger of modifiers). Compared to the first four experiments by Wierzba et al. (2021), coreference is more available, even though the follow-up experiment was based on the same materials. We will come back to the interpretation of the DP reconstruction data at several points in this paper.
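To make the factorial structure of the AP/DP experiment concrete, the following minimal sketch enumerates the cells of a 2 × 2 × 3 design with the factor levels named above. It is an illustration only, not the authors' analysis code: the ±1 values used for sum coding, and the choice of which level receives +1, are assumptions, as are all variable and function names.

```python
from itertools import product

# Factor levels as named in the text.
MOVEMENT = ["moved", "in situ"]
CATEGORY = ["AP", "DP"]
DISTANCE = ["short", "embedded 1", "embedded 2"]

# Sum (deviation) coding for the two-level factor "category".
# Assumption: AP = +1, DP = -1; the text only says "sum-coded".
SUM_CODE = {"AP": 1, "DP": -1}


def design_cells():
    """Return one dict per cell of the movement x category x distance design."""
    return [
        {
            "movement": movement,
            "category": category,
            "distance": distance,
            "category_sum": SUM_CODE[category],
        }
        for movement, category, distance in product(MOVEMENT, CATEGORY, DISTANCE)
    ]


cells = design_cells()
for cell in cells:
    print(cell)
```

Under sum coding the two levels of the factor are centered around zero, so in a regression model the remaining effects would be estimated as averages over APs and DPs rather than relative to one reference category.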
2.3 Problems of the threshold-based reasoning and non-syntactic factors
As described in (14), experimental research on Condition C reconstruction has relied on a comparison of the availability of coreference in structures with and without movement. Depending on the result, different conclusions are drawn:
In what follows, we want to point out potential problems with both types of conclusions, having to do with the interpretation of the terms ‘low’ and ‘high’.
We will start with the first type of reasoning based on an asymmetry between movement and in situ. Given the significantly higher availability of coreference under movement, Bruening & Al Khalaf (2019) conclude that there is no reconstruction for Condition C (conclusion type (22a)). The difference that Adger et al. (2017) report between in situ and moved DPs is interpreted the same way.
There is, however, a question that remains open with respect to these findings. Bruening & Al Khalaf (2019) predict that the responses in the conditions without a surface violation of Condition C should allow for both interpretations and, thus, responses at chance level (around 50% for each interpretation) are expected in their forced-choice paradigm; however, for DP movement with PP arguments/adjuncts, the observed values are between 22% and 31%. In Adger et al.’s (2017) study, in which there was only one R-expression and participants judged whether coreference is possible, one might expect the proportion of positive responses to approach 100% in the absence of a grammatical violation, but in their first experiment on DPs, the observed values are between 30% and 64%. Bruening & Al Khalaf (2019: 257) argue that the reason for the discrepancy between the expected and observed responses cannot be core-syntactic, because values close to zero would be expected if the violation of a hard grammatical constraint is involved. They suggest that the discrepancy might instead have to do with linear distance: coreference between R-expression and pronoun might be dispreferred because they are very close to each other in these conditions (recall also the effect of linear distance observed in Adger et al. 2017 and Wierzba et al. 2021).
Thus, the argumentation rests on the assumption that coreference values that are not close to zero but also not as high as the expected chance level are compatible with a scenario in which there is no reconstruction and extra-syntactic factors cause a decrease in positive responses (reducing it from ~50% to ~20–30%), but incompatible with a scenario in which there is reconstruction and other factors cause an increase in positive responses (raising it from ~0% to ~20–30%).
In our view, however, there are several conceivable scenarios in which reconstruction does play a role, but nevertheless we do not find complete unavailability of coreference in the experiments. One possibility is that there is inter-speaker variation, with some participants generally employing reconstruction, while others do not. A second possibility is that the argument/adjunct status and, thus, reconstruction behavior of PP modifiers may vary between items and/or participants. A third possibility is that non-syntactic factors interact with the binding principles and in some cases even override them (see below). This could distort the interpretation of the results, even if the in situ and moved versions of the same type of sentence are directly compared – these versions inevitably differ not only with respect to the surface syntactic relations, but also with respect to linear order (anaphoric vs. cataphoric relation – the latter typically being dispreferred; see also Yoshida, Potter & Hunter 2019: 1535–1539) and distance between R-expression and pronoun. Thus, it is conceivable that asymmetries between the two versions are not (necessarily) due to a difference in the syntax but may be caused by surface-oriented extra-syntactic factors.⁶
We now turn to problems with reasoning that is based on symmetry between movement and in situ: In Bruening & Al Khalaf (2019) on PP fronting in English, in experiments 1–4 in Wierzba et al. (2021) on wh-movement of DPs and APs in German as well as the AP/DP experiment, low availability of coreference (‘close to zero’) in sentences with movement is interpreted as evidence for reconstruction for Condition C (conclusion type (22b)). There are at least two problems with this reasoning. First, as mentioned above (and discussed in more detail below), other – non-syntactic – factors might disfavor the relevant reading (viz., coreference with the embedded R-expression) independently and lead to low coreference values even in the absence of reconstruction. Second, a certain amount of random noise is always expected in behavioral data, and it is difficult to define a systematic threshold at which a value is or is not close enough to zero in absolute terms. Thus, in the experiments by Bruening & Al Khalaf (2019), 8.6% (in the PP-fronting experiment) is indeed closer to zero than 22% (PP arguments inside DPs), but what about the 11.8% found for the PP adjuncts inside DPs in Wierzba et al. (2021) reported above, and what if the values were around 15%? Note also that experiments 1–4 in Wierzba et al. (2021) and the AP/DP experiment are based on the same materials, but the coreference values for the reconstruction of DPs (with PP arguments) in the short condition vary between 6.9% (experiment 2), 11.1% (experiment 4), and 20.8% (AP/DP experiment).
Given the threshold logic, one would have to conclude that there is reconstruction of DPs in experiment 2 (where the values are similar to the PP cases in Bruening & Al Khalaf 2019) but probably not in the AP/DP experiment (where the values are close to the DP-cases in Bruening & Al Khalaf 2019). It should be clear that this will quickly lead to contradictions. To a large extent, then, setting a threshold at a certain value will be arbitrary. In fact, Adger et al. (2017) use a different criterion in the interpretation of their first experiment, namely whether coreference is accepted in the majority of cases, viz., above 50% (interpreted as evidence against reconstruction) or less (interpreted as compatible with reconstruction). This criterion would imply reconstruction for experiments 1–4 in Wierzba et al. (2021) and the AP/DP experiment as well as for DP movement with arguments in Bruening & Al Khalaf (2019), in partial conflict with their conclusions.⁷
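The arbitrariness of threshold-based reasoning can be illustrated with a small sketch that applies two cutoffs to the coreference rates quoted above. The rates are taken from the text; the 10% "close to zero" cutoff is a purely hypothetical value chosen for illustration, since no study states such a number, and the function and variable names are likewise invented here. The 50% cutoff mirrors the majority criterion attributed to Adger et al. (2017).

```python
# Coreference rates (in percent) for moved conditions, as quoted in the text.
RATES = {
    "B&AK 2019, PP fronting": 8.6,
    "B&AK 2019, DP with PP argument": 22.0,
    "Wierzba et al. 2021, exp. 2, DP short": 6.9,
    "Wierzba et al. 2021, exp. 4, DP short": 11.1,
    "AP/DP experiment, DP short": 20.8,
}

CLOSE_TO_ZERO = 10.0  # hypothetical cutoff, not proposed in any of the studies
MAJORITY = 50.0       # majority criterion: above 50% counts against reconstruction


def verdict(rate, cutoff):
    """Classify a rate under the threshold logic: at or below the cutoff = reconstruction."""
    return "reconstruction" if rate <= cutoff else "no reconstruction"


def compare(cutoffs):
    """Return, per data set, the verdict reached under each cutoff."""
    return {
        label: {cutoff: verdict(rate, cutoff) for cutoff in cutoffs}
        for label, rate in RATES.items()
    }


table = compare([CLOSE_TO_ZERO, MAJORITY])
for label, verdicts in table.items():
    print(f"{label}: {verdicts}")
```

With these inputs, the same materials yield opposite verdicts across experiments under the hypothetical 10% cutoff, whereas every data set counts as compatible with reconstruction under the majority criterion.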
The issue with thresholds is related to a more general problem that arises in the experimental investigation of reconstruction: when sentences with/without movement are compared, the hypothesis that there is no reconstruction predicts a difference, but the hypothesis that there is reconstruction basically predicts the lack of a difference, i.e. a null effect, which is more difficult to interpret.
In addition to the issues raised by threshold-based reasoning, another shortcoming of previous work is that it does not sufficiently take into account the influence of non-syntactic factors, especially factors that generally govern pronoun resolution and can increase or decrease the availability of coreference (see also Gor 2020). In what follows we will list the factors that have received most attention in the literature and discuss their implications for the current debate.⁸
First, it has been shown that the more prominent an expression is in a certain hierarchy, the more likely it is to act as an antecedent. This can involve prominence with regard to thematic role (agent > patient > other), grammatical function (subject > object > other) or information structure (topics are preferred antecedents), see, e.g. Grosz & Sidner (1986), Brennan (1995), Cowles, Walenski & Kluender (2007), Kaiser (2011), and Schumacher, Dangl & Uzun (2016) for German. Second, there is work showing that if there are two similarly salient antecedents for a personal pronoun, there is a preference for coreference with the linearly closer antecedent, see, e.g. Cunnings, Patterson & Felser (2014). A third factor is plausibility. As Gor & Syrett (2019) and Gor (2020) demonstrate, it can even override Condition C in backward anaphora.
It is likely that these factors affect the judgments in some of the experimental settings. With respect to the design in Bruening & Al Khalaf (2019), Wierzba et al. (2021), the AP/DP experiment, as well as Experiments 1 and 2 below, preferences of pronoun resolution may lead to a preference for the embedded subject pronoun to corefer with the matrix subject. Thus, the availability of coreference with the embedded R-expression is likely to be decreased. Proximity may have the opposite effect in these designs. Given that the R-expression within the wh-phrase is the closer antecedent, this factor could increase the availability of coreference with the embedded R-expression. The factor plausibility can play a role in all experimental designs discussed in this paper. It can either lead to an increase or a decrease in the availability of coreference, depending on the item.⁹
Given the problems with threshold-based reasoning and the probable influence of non-syntactic factors on coreference judgments, we believe that we currently do not fully understand what the data tell us. The aim of this paper is thus to develop an experimental design allowing us to determine more precisely to what extent the observed patterns actually reflect reconstruction for Condition C. We will investigate grammatical contrasts that inform us about Condition C without requiring reference to absolute coreference values. We will also investigate the influence of some non-syntactic factors on coreference judgments. We will focus on those that the syntactic literature and the literature on pronoun resolution have shown to be most influential.
3. Experiments: preliminary remarks
3.1 Motivation and outlook
In the following sections, we present three experiments that investigate Condition C reconstruction in German A′-movement. They are based on grammatical contrasts and do not require reference to absolute coreference values. We attempt to successively neutralize possible non-syntactic factors. We will see that the case for Condition C reconstruction weakens once factors like plausibility and referential accessibility are taken into account. The results clearly argue against a theory that includes both reconstruction and a hard/inviolable Condition C constraint. There remains some residual evidence that the base position matters, the implications of which are explored in the general discussion.
3.2 Methodological remarks
In our experiments, we used the method introduced in Wierzba et al. (Reference Wierzba, Salzmann and Georgi2021). Participants were told they were going to see one sentence and two questions on each page of the questionnaire. They were instructed that the sentence might have more than one interpretation and that they were going to be asked whether certain interpretations of the sentence are possible or not. The task was illustrated using the example Maria hat Anna besucht, weil sie nett ist ‘Mary visited Anna because she is nice’. Participants were explicitly told that this sentence has two interpretations (even if one might be more readily available), and that in an example like this they should answer ‘yes’ to both presented questions (‘Can the sentence be interpreted such that … (i) Mary is nice (ii) Anna is nice’). The instructions also stated that both potential interpretations should be carefully considered and that sometimes one, both, or neither of them might be available. Each following page of the questionnaire looked as follows:
The target sentence is a construction with two possible antecedents for the pronoun, the R-expression in the matrix clause (the matrix subject) and the second R-expression, which is either in the embedded clause (within the wh-phrase in the case of embedded questions) or inside the external head of a relative clause, e.g. as in Boris told us which report on David he ignored/Boris mentioned every report on David that he ignored. In Experiment 3, the design is slightly modified in that the R-expression inside the wh-phrase is already introduced together with the other R-expression in a sentence preceding the indirect question. The participants then have to decide for each referent whether it is a possible antecedent or not by answering two yes/no questions, e.g. Q1 Can this sentence be interpreted such that Boris ignored a report? / Q2 Can this sentence be understood such that David ignored a report? The order of presentation of the two questions was balanced: in half of the stimuli, Q1 appeared above Q2, and in the other half it was the other way round. The R-expressions we used were exclusively common first names.Footnote 10
Our design is a modification of that used in Bruening & Al Khalaf (2019), where a forced-choice question was asked and speakers had to choose between the matrix and the embedded R-expression. The major reason for our modification is to ensure we can determine which coreference options are available and which are not, even in the presence of non-syntactic factors favoring one of the readings.
In the forced-choice task used by Bruening & Al Khalaf (2019), participants have to pick one of the readings, even if neither of the options violates Condition C and thus both should be available from a syntactic point of view. However, non-syntactic factors (which will be discussed in more detail in connection with Experiments 2 and 3) can favor one of the readings: one would thus not necessarily expect a 50% : 50% distribution of answers even in the absence of a Condition C violation, but there might be a preference toward one of the options for independent reasons. The benefit of the forced-choice approach is that if participants do choose the other option in many cases (contra the expected preference) – as found by Bruening & Al Khalaf (2019) in the sentences with wh-movement – a strong argument can be made in favor of the view that both options are in fact available and neither of them violates a grammatical principle. The disadvantage of the method, however, is that if a preference of almost 100% for one of the options was found, it would be difficult to determine whether this is just due to the fact that this is the preferred reading (while the other is grammatical, but dispreferred), or whether the other option is really completely excluded on syntactic grounds. In this case, asking two questions – whether reading A is possible, and whether reading B is possible – can provide the crucial information that would be missing in a forced-choice task. In our view, the two-question method is thus better suited for our purposes: in the German sentences that we aim to investigate, we suspect that non-syntactic factors might play a major role, and it is thus particularly important to choose a method that allows us to determine possible rather than preferred readings.
This reasoning is supported by an experiment that we conducted in order to compare the two methods. We replicated the AP/DP experiment with the forced-choice method (see Appendix A.1 for details). In the replication, we found that coreference with the embedded R-expression was extremely close to zero for both APs and DPs, viz., 0% (APs) and 0.7% (DPs). Under the threshold-based logic, this would imply a Condition C violation and the lack of a predicate–argument asymmetry. The comparison with the AP/DP experiment with the two-question method, where the corresponding value was at 20.8% with DPs, suggests that the forced-choice method is indeed too coarse when strong non-syntactic factors are present. The limits of the forced-choice method become even more visible once subject questions are used, where Condition C is not at stake (John wonders which picture about Bill pleased him). In an exploratory experiment (see Appendix A.3 for details), the two-question method showed a preference for coreference with the matrix subject, but coreference with the embedded R-expression was also highly available (58%) (as in Experiment 2 below). Under the forced-choice method, however, coreference with the embedded R-expression was chosen in only 5.6% of the cases. Interpreting the low value as a Condition C effect would be an obviously wrong conclusion in this case. In our Experiment 3 below we have reduced the bias in favor of the matrix R-expression by introducing the embedded/lower R-expression in the prior linguistic context.Footnote 11
As we will see in the results of the experiments discussed below, the two-question method worked as intended in that participants were willing to give two ‘yes’ or two ‘no’ responses in this type of task (depending on the item/filler); they were thus not biased toward giving exactly one positive response, suggesting that they were evaluating both options and did not end up interpreting the instructions as a forced-choice task after all. We will also see that there can be some interaction between the two questions in that a strong preference for coreference with the matrix subject (high percentage of yes-answers to Q1) can decrease the number of yes-answers to Q2, even if that corresponds to a perfectly grammatical option. But in all cases, the combined percentages clearly exceed 100%. The same pattern can be found in the results for the fillers. We will report on two groups of fillers that served as controls in this respect. The first group involves ambiguous relative clauses (Leyla erzählt, dass die Verwandte, die sie besucht hat, in Budapest wohnt ‘Leyla tells us that the relative [who she visited/who visited her] lives in Budapest’ with the question whether it can be understood such that Leyla/the relative was visited), for which we expected two positive responses. The results are in line with this expectation: in the AP/DP experiment we found 83.3% positive responses to Q1 and 89.8% for Q2. The second selected group are sentences for which we expected two negative responses (Gustav erwähnte, dass Karl und Jonas ihn Bücher einscannen ließen ‘Gustav mentioned that Karl and Jonas had him scan books’ with the question whether Karl/Jonas did the scanning). The results in the AP/DP experiment showed the same proportion of positive responses (7.6%) for both Q1 and Q2.
This shows that the task worked as intended and did not induce a bias to give exactly one positive response.Footnote 12 The method’s reliability is also supported by the fact that the proportions observed in our critical items are similar in Wierzba et al. (2021), the AP/DP experiment and in Experiment 1 for the same conditions.
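The reasoning about response bias can be made concrete with a small arithmetic check. The following Python sketch is purely illustrative and not part of the published analysis scripts: the function name and the tolerance value are assumptions. It takes the filler percentages reported above and tests whether the Q1 and Q2 rates sum to roughly 100%, which is what a strategy of giving exactly one ‘yes’ per page would produce.

```python
def looks_like_forced_choice(q1_yes_pct, q2_yes_pct, tolerance=10.0):
    """Return True if the two yes-rates sum to about 100%.

    If every participant gave exactly one 'yes' per page, the two
    percentages would have to add up to 100. The tolerance is an
    arbitrary, illustrative choice (an assumption of this sketch).
    """
    return abs((q1_yes_pct + q2_yes_pct) - 100.0) <= tolerance


# Filler percentages reported in the text for the AP/DP experiment.
fillers = {
    "ambiguous relative clause (two 'yes' expected)": (83.3, 89.8),
    "two 'no' expected": (7.6, 7.6),
}

for label, (q1, q2) in fillers.items():
    verdict = looks_like_forced_choice(q1, q2)
    print(f"{label}: Q1 + Q2 = {q1 + q2:.1f}%, forced-choice-like: {verdict}")
```

For both filler groups the sum is far from 100% (well above it in one case, well below it in the other), in line with the conclusion that participants did not treat the task as a forced choice.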
3.3 Supplementary materials
The materials, raw results files, and analysis scripts for all experiments reported here can be found on OSF under the following link: https://osf.io/24xh3
4. Experiment 1: wh-questions versus relative clauses
As mentioned above, the major goal of this paper is to investigate Condition C reconstruction without having to refer to absolute coreference values. We instead develop designs that are based on grammatical contrasts from which we can draw conclusions about the reconstruction behavior of A′-movement. In Experiment 1, we compared two constructions involving A′-dependencies: wh-questions (with DP movement) and relative clauses. As discussed in Section 1, it has been proposed in the literature that they differ in their reconstruction behavior. Our aim was to test whether reconstruction for Condition C is indeed less robust in relative clauses in comparison to wh-questions. If this is the case, it can provide an argument in favor of the view that coreference under A′-movement is (also) constrained by grammatical factors, such as by movement type.
4.1 Participants and procedure
A total of 32 native speakers of German, recruited via prolific.co, took part. A web-based questionnaire was set up using SoSciSurvey (Leiner 2018). The basic procedure was as in Wierzba et al. (2021) and the AP/DP experiment, where participants were asked to answer two questions with regard to coreference possibilities.
In addition to the coreference questions, we asked participants to rate the sentence on a 1–7 scale (as in experiments 3–4 in Wierzba et al. 2021). The ratings were collected to check whether any problems were introduced by potentially low acceptability of some of the tested conditions: long-distance movement and, in particular, long relativization are often perceived as degraded in German.
A total of 76 stimuli were presented to each participant (32 critical items, 32 fillers, and 12 exploratory items for additional research questions). For the critical items, 128 data points were collected per condition/question (four from each participant). On average, the questionnaire took about 25 minutes to complete.
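The numbers in this paragraph follow from simple arithmetic, which the Python sketch below spells out. It relies on the 2 × 4 design described in Section 4.2 and on one assumption that the text does not state explicitly: that items were rotated across conditions so that each participant saw each critical item once, in exactly one condition (a Latin-square-style distribution). The variable names are illustrative.

```python
participants = 32
critical_items = 32
fillers = 32
exploratory_items = 12

# 2 x 4 design: dependency (2 levels) x distance (4 levels).
conditions = 2 * 4

# Assumption: each participant sees every critical item once, with items
# spread evenly over the conditions (Latin-square-style rotation).
items_per_condition_per_participant = critical_items // conditions
data_points_per_condition = participants * items_per_condition_per_participant
stimuli_per_participant = critical_items + fillers + exploratory_items

print(items_per_condition_per_participant)  # 4
print(data_points_per_condition)            # 128
print(stimuli_per_participant)              # 76
```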
4.2 Design and materials
Experiment 1 had a 2 × 4 design. The first manipulated factor was dependency (wh-question vs. relative clause). The second factor was distance. We tested the same levels as in Wierzba et al.’s (2021) experiments 3–4 (short, embedded 1, embedded 2, and coordination). We are omitting the level coordination here for presentational reasons: to facilitate visual comparison of the AP/DP experiment and Experiment 1, and to avoid discussing effects that are tangential to the main research questions of this paper and would require a lengthy digression. All data, including all levels of distance, were included in the statistical analyses reported below. The remaining six conditions are illustrated in the sample item in (24).
The corresponding questions Q1 and Q2 were Kann man den Satz so verstehen, dass Mark/Ben eine Bemerkung mitbekommen hat? ‘Can this sentence be interpreted such that Mark/Ben overheard a comment?’ (questions) and Kann man den Satz so verstehen, dass Mark/Ben die Bemerkungen mitbekommen hat? ‘Can this sentence be interpreted such that Mark/Ben overheard the comments?’ (RCs).
All items involved either a wh-question or a relative clause; no in situ versions were included. The relative clause heads were preceded by the universal quantifier jede/jeder ‘every’ to ensure a restrictive reading of the relative clauses. The R-expression was always included in a PP argument to the noun.Footnote 13 We adopted most of the materials from Wierzba et al.’s (2021) experiment 4, but replaced the matrix verb erzählen ‘tell’ with erwähnen ‘mention’ so that it would be compatible with both CP-complements (embedded questions) and DP-complements (relative clauses). We also changed some of the proper names and nouns to ensure that the interpretation of the relative pronoun was unambiguous (with respect to number and gender), i.e. it was only compatible with the head noun.
4.3 Hypotheses and predictions
The matching analysis of relative clauses predicts the absence of Condition C effects since no R-expression is present in the RC-internal copy that is c-commanded by the pronoun (recall (7)). If this hypothesis is correct, we expect a significant effect of the factor dependency with respect to Q2 (the question asking about coreference between the embedded R-expression and the pronoun): the proportion of positive responses to Q2 should be higher for relative clauses than for wh-questions. If a different derivation underlies relative clauses, viz., the raising analysis, where there is a full copy of the external head inside the RC, we expect no asymmetry between questions and relative clauses.
As in the AP/DP experiment, the embedded 1 and embedded 2 conditions were included in order to make the design parallel to the previous studies discussed in Section 2 and to reassess Wierzba et al.’s (2021) proposal that the effect of structural distance can be attributed to non-syntactic factors.
For the analysis of Experiment 1, the factor dependency was sum-coded, while distance was treatment-coded with short as the baseline. This means that with respect to dependency, we will treat both levels – relative (rel) versus wh – symmetrically (comparing each of them to the overall mean). For distance, we will be making the following comparisons: short versus embedded 1, short versus embedded 2. This type of contrast coding means that the model output for dependency will represent a simple effect – the difference between relative clause and wh-question within the baseline level of distance (short) – giving us an impression of the difference between the categories in the basic case. The interaction terms (dependency*distance) indicate if increased distance between R-expression and pronoun changes the basic difference between relative clauses and wh-questions.
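To make the contrast coding concrete, the following Python sketch shows what the coefficients of such a model represent. Everything numeric in it is hypothetical: the cell means are invented log-odds, not the study's data, and the sketch uses only three levels of distance, whereas the actual analysis included all four. It also assumes that sum coding is implemented as +1/−1 (other scalings, such as ±0.5, change the size of the coefficient but not its interpretation), and which level receives +1 is likewise an assumption.

```python
# Hypothetical cell means on the log-odds scale (illustrative only).
cell = {
    ("rel", "short"): 0.25, ("wh", "short"): -1.75,
    ("rel", "emb1"): 0.0,   ("wh", "emb1"): -1.0,
    ("rel", "emb2"): 0.0,   ("wh", "emb2"): -0.5,
}

DEP_SUM = {"rel": 1, "wh": -1}   # sum coding (assumed +1 / -1)
NON_BASELINE = ("emb1", "emb2")  # treatment coding, baseline = "short"


def saturated_coefficients(cell):
    """Coefficients of a saturated model under the coding described above."""
    levels = ("short",) + NON_BASELINE
    diff = {d: cell[("rel", d)] - cell[("wh", d)] for d in levels}
    mean = {d: (cell[("rel", d)] + cell[("wh", d)]) / 2 for d in levels}
    coef = {
        "intercept": mean["short"],         # mean of rel and wh at "short"
        "dependency": diff["short"] / 2,    # simple effect at "short"
    }
    for level in NON_BASELINE:
        coef[level] = mean[level] - mean["short"]
        # Interaction: change in the rel/wh difference relative to "short".
        coef["dependency:" + level] = (diff[level] - diff["short"]) / 2
    return coef


def predict(coef, dep, dist):
    """Reconstruct a cell mean from the coefficients."""
    s = DEP_SUM[dep]
    value = coef["intercept"] + coef["dependency"] * s
    if dist in NON_BASELINE:
        value += coef[dist] + coef["dependency:" + dist] * s
    return value


coef = saturated_coefficients(cell)
for name, value in coef.items():
    print(f"{name}: {value}")
```

In this toy example the dependency coefficient reflects only the difference between the two dependency types at the short level, and the negative interaction terms indicate that this difference shrinks at the embedded levels. The sign of each term depends on which level is coded as +1.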
According to the GLMM, there was a significant simple effect of dependency (wh-movement vs. relativization) at the short baseline level of distance (z = 6.672, p < 0.001) with respect to Q2. There was a significant interaction between dependency and distance at the other levels in comparison to the short baseline (embedded 1: z = 2.919, p = 0.004; embedded 2: z = 4.117, p < 0.001) in the direction of a less pronounced difference between the two dependency types. The model results for the fixed effects are shown in Table 6.
The results lend support to the hypothesis that the two types of dependency differ regarding reconstruction: coreference between the lower R-expression and the pronoun is more available in relative clauses than in wh-questions. The higher availability in relative clauses is compatible with the matching analysis of relativization where there is no full representation of the external head in the RC-internal bottom position. Both versions of the matching analysis that we discussed (based on recoverability/vehicle change; recall (7)) predict the absence of a Condition C effect and thus higher availability of coreference in relative clauses. The raising analysis fails to predict the wh-relativization asymmetry.
With respect to the factor distance, it is notable that wh-questions are more affected, while the percentages in relativization are quite similar in the three conditions (and do not increase monotonically with increasing distance). In addition, as in the AP/DP experiment, visual inspection suggests a decrease in positive answers to Q1 with increasing distance, which affects both dependency types.
In line with similar findings reported by Wierzba et al. (2021), inspection of the acceptability ratings that were collected in Experiment 1 suggests that the effect is independent of how acceptable participants found these structures. Post hoc inspection of the data, in which we divided participants into three groups based on their acceptability rating for the embedded 1/2 conditions, revealed similar patterns in the coreference judgments across all groups. We interpret this as support for the view that coreference is generally more difficult to judge in the more complex structures, especially in embedded 2.
Given that there is no Condition C effect under the matching analysis, our relativization examples are predicted to be fully grammatical. It may thus be surprising that the rate of positive answers to Q2 remains between 48% and 59% rather than approaching 100%. There are two reasons suggesting that rates around 50% for Q2 may be close to the maximum one will obtain for grammatical sentences with this experimental setting. First, the rates for coreference in relativization are not much affected by distance, in contrast to what we observe for wh-movement; this points toward a ceiling effect. Second, we will see in Experiment 2 that even in in situ conditions without a Condition C violation, i.e. examples that are indisputably grammatical, the positive responses for Q2 remain between 56% and 66%.
There are two related remaining questions: First, is the difference between relative clauses and wh-questions really due to a difference in the syntactic structure, viz., the presence/absence of an R-expression in the bottom copy, or could it be caused by other syntactic or non-syntactic factors? As for other syntactic factors, an anonymous reviewer suggests that the fact that the R-expression is contained in an A-position in relative clauses (rather than in an A′-position in questions) could be responsible for the asymmetry. While this is indeed a syntactic difference (and RCs are similar to A-movement with regard to Condition C reconstruction), we are not aware of any syntactic accounts where this difference would translate into a Condition C asymmetry. Note also that this suggestion seems to imply that the reconstruction behavior in RCs would change if the head noun were A′-moved. However, we are not aware of any such effects. As for non-syntactic effects, as far as we can see, it is unlikely that the difference is related to plausibility (we do not see a straightforward reason why interpreting Peter and he as coreferential should be less plausible in ‘John mentioned which statue of Peter he saw’ than in ‘John mentioned every statue of Peter that he saw’). An (inevitable) difference between the conditions is that the R-expression and the pronoun are directly adjacent in the wh-question, whereas one word (the relative pronoun) intervenes in the relative clause. We consider it unlikely that this is responsible for the difference and will come back to the issue of closeness in the discussion of Experiments 2 and 3 where we will see that high coreference values are certainly possible if wh-word and pronoun are adjacent. There is one factor that may indeed be at work here, though, namely the referential accessibility of (the phrase containing) the R-expression.
We will discuss this factor in Experiment 3 and return to the implications for the wh-relativization contrast in the general discussion.
The second question concerns the interpretation of the results for wh-questions. We find values similar to those in Wierzba et al. (2021) and the AP/DP experiment (confirming the reliability of the method): again, the availability of coreference with the lower R-expression is relatively low, but not at floor. Given the contrast with relativization, one a priori possible interpretation is that this indicates that there is reconstruction in wh-questions. However, this only holds as long as the wh-relativization asymmetry is related to a grammatical factor. Once the difference between RCs and questions can be related to a non-syntactic factor, this conclusion can no longer be drawn. We will come back to the interpretation of the wh-movement data in the general discussion.
5. Experiment 2: subjects versus objects (S/O)
Experiment 2 was designed to address crucial questions that the experiments in Wierzba et al. (2021), the AP/DP experiment and Experiment 1 left open: what does it mean that coreference between pronoun and embedded R-expression in German wh-movement of DPs is neither close to 0% nor to 100%? How can we disentangle the effects of grammatical principles and extra-syntactic factors? To tackle these issues, we compare wh-movement of objects with wh-movement of subjects. The crucial conditions differ in the base position of the fronted constituent, whereas the linear order and the distance between R-expression and pronoun are identical:
In (25a) (object movement), Condition C is violated only under the assumption that there is reconstruction. In (25b) (subject movement), Condition C is not violated, irrespective of whether reconstruction is assumed or not.Footnote 15 In all other respects, the sentences are as similar as possible, especially with respect to plausibility, topicality, and linear distance. Note that such near-minimal pairs can only be constructed in a head-final language like German, while in English (as shown by the translations), there would be a difference in the distance between R-expression and pronoun in the two conditions (but see also Note 18). The benefit of this design thus is that the reconstruction hypothesis predicts a difference here (not the absence of a difference, as in the previous designs): coreference between the pronoun and the embedded R-expression would violate Condition C in (25a), but not (25b). Crucially, we then do not have to rely on the problematic interpretation of absolute values (whether the responses are close to 0% or 100%) in this type of design: (25b) will provide us with a baseline that will show us what proportion of positive responses we should expect in the absence of any grammatical violation, purely based on pronoun resolution preferences unrelated to binding. If we find fewer positive responses in (25a) than in (25b), even if they are not at zero, this would lend itself to an explanation in terms of reconstruction (but see the discussion at the end of this section for why this conclusion may be premature).
The design, hypotheses, and planned analysis of Experiment 2 were pre-registered prior to data collection at https://osf.io/mjgpz.
5.1 Participants and procedure
The basic procedure was the same as described above for the AP/DP experiment and Experiment 1. A web-based questionnaire was set up using the platform L-Rex (Starschenko & Wierzba 2020). No acceptability ratings were collected. A total of 32 participants, recruited via prolific.co, were tested and 78 stimuli were presented to each participant (32 critical items, 44 fillers, and 2 items intended for exploratory investigation of an additional research question). For the critical items, 128 data points were collected per condition/question (four from each participant). On average, completing the questionnaire took 24 minutes.
5.2 Design and materials
Experiment 2 had a 2 × 2 × 2 design. The eight conditions are illustrated in (26). The first manipulated factor was movement (in situ/moved). The second factor was phrase: the R-expression was either contained in the subject (in that case, the object was a pronoun) or in the object (in that case, the subject was a pronoun). The third manipulated factor was arg/adj: the R-expression was either contained in an argument PP of the noun or in a PP adjoined to the noun.Footnote 16
The corresponding questions, Q1 and Q2, were Kann man den Satz so verstehen, dass Lisa/Hanna eine Geschichte ärgerlich fand? ‘Can this sentence be interpreted such that Lisa/Hanna found a story upsetting?’ (object conditions) and Kann man den Satz so verstehen, dass eine Geschichte Lisa/Hanna verärgert hat? ‘Can this sentence be interpreted such that a story upset Lisa/Hanna?’ (subject conditions).
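The 2 × 2 × 2 design described above can be enumerated mechanically. The Python sketch below lists the eight conditions and, for each, records whether the pronoun c-commands the phrase containing the R-expression in its surface position and in its base position, following the description of the object and subject conditions given earlier in this section. The function and variable names are illustrative assumptions. For adjuncts, whether the R-expression is actually present in the base position is exactly what the argument/adjunct asymmetry hypothesis is about, so the ‘base’ flag describes the position of the containing phrase only.

```python
from itertools import product

MOVEMENT = ("in situ", "moved")
PHRASE = ("subject", "object")
ARG_ADJ = ("argument", "adjunct")


def pronoun_c_commands_phrase(phrase, movement, position):
    """Does the pronoun c-command the phrase containing the R-expression?

    position is 'surface' or 'base'. In the subject conditions the pronoun
    is an object and does not c-command the subject in either position.
    In the object conditions the pronoun is the subject: it c-commands the
    object's base position, and its surface position only if it is in situ.
    """
    if phrase == "subject":
        return False
    if position == "base":
        return True
    return movement == "in situ"


conditions = list(product(MOVEMENT, PHRASE, ARG_ADJ))
for movement, phrase, arg_adj in conditions:
    surface = pronoun_c_commands_phrase(phrase, movement, "surface")
    base = pronoun_c_commands_phrase(phrase, movement, "base")
    print(f"{phrase} {movement} ({arg_adj}): "
          f"c-commanded at surface={surface}, at base={base}")
```

The only conditions in which the two flags diverge are the ‘object moved’ ones, which is why they are the crucial test case for reconstruction.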
5.3 Hypotheses and predictions
All hypotheses below refer to effects on the proportion of ‘yes’ answers to the question about coreference between the pronoun and the R-expression in the embedded clause (Q2). For the evaluation of hypotheses H1, H2(a), and H2(b), we take into account only the argument conditions. Whether there is a difference between arguments and adjuncts is tested by means of hypothesis H3.
Hypothesis H1 is the premise on which the experimental design relies; only if this holds are the results informative in the intended way. It predicts a simple effect of phrase in the following direction: there should be more positive responses to Q2 in the ‘subject in situ (argument)’ condition (in which there is no Condition C violation) than in the ‘object in situ (argument)’ condition.
The two crucial hypotheses with respect to reconstruction are:
Both hypotheses presuppose that H1 holds. If that is the case, then H2(a) predicts that there should also be a simple effect of phrase in the condition with movement: there should be more positive responses to Q2 in the ‘subject moved (argument)’ condition than in the ‘object moved (argument)’ condition. Crucially, this would be evidence in favor of reconstruction that is not based on the lack of an effect. Hypothesis H2(b) predicts an interaction between movement and phrase: The difference between ‘object moved (argument)’ and ‘subject moved (argument)’ should be smaller than between ‘object in situ (argument)’ and ‘subject in situ (argument)’.
Hypotheses H2(a) and H2(b) are not mutually exclusive. Our design potentially allows us to distinguish between data patterns compatible with exceptionless reconstruction for Condition C (evidence for H2(a), no evidence for H2(b)), fully surface-oriented evaluation of Condition C (no evidence for H2(a), evidence for H2(b)), and patterns in which both the base position and the surface position of the moved phrase play a role (in case we find evidence for both H2(a) and H2(b)).
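The mapping from outcomes to interpretations described in this paragraph amounts to a small decision table, sketched below in Python. The function name and the wording of the returned labels are illustrative. Note that ‘no evidence’ here only means that the predicted effect was not found; the sketch does not treat it as positive evidence for the absence of an effect, and the fourth outcome is not among the three patterns discussed in the text.

```python
def interpret(h2a_supported: bool, h2b_supported: bool) -> str:
    """Map evidence for H2(a)/H2(b) onto the data patterns named in the text."""
    if h2a_supported and h2b_supported:
        return "both base position and surface position play a role"
    if h2a_supported:
        return "compatible with exceptionless reconstruction for Condition C"
    if h2b_supported:
        return "compatible with fully surface-oriented evaluation of Condition C"
    # Not one of the three patterns distinguished in the text.
    return "no evidence for either hypothesis"


for h2a in (True, False):
    for h2b in (True, False):
        print(f"H2(a)={h2a}, H2(b)={h2b}: {interpret(h2a, h2b)}")
```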
In addition, we test the argument/adjunct asymmetry hypothesis:
We consider two predictions of H3. First, if the prediction of the reconstruction hypothesis H2(a) is borne out, then there should be a simple interaction between phrase and arg/adj within the ‘moved’ conditions in the following direction: there should be a smaller difference between ‘subject moved (argument)’ and ‘subject moved (adjunct)’ than between ‘object moved (argument)’ and ‘object moved (adjunct)’. The reasoning behind this prediction is that if there is no reconstruction for adjuncts, then they should always show a high proportion of positive answers to Q2 in the conditions with movement: i.e. they should be more similar to arguments in the subject movement conditions than in the object movement conditions (where H2(a) predicts fewer ‘yes’ answers for arguments). Second, H3 predicts a simple interaction between movement and arg/adj within the ‘object’ condition in the following direction: there should be a larger difference between ‘object in situ (adjunct)’ and ‘object moved (adjunct)’ than between ‘object in situ (argument)’ and ‘object moved (argument)’. This is based on the reasoning that the reconstruction hypothesis predicts a similar pattern in the in situ and moved conditions for arguments. If there is no reconstruction for adjuncts, there should be a difference between the in situ and moved conditions.
Two generalized linear mixed models were fit. The contrast coding was chosen in such a way that it allowed us to test all predictions described above. Thus, in both models, all factors were treatment-coded, with object as the baseline level of phrase and argument as the baseline level of arg/adj. For the factor movement, two different kinds of contrast coding were required in order to test all of the predictions. In Model 1, ‘in situ’ was coded as the baseline level. This means that in the output of this model, phrase will represent a simple effect: the difference between ‘object’ and ‘subject’ within the levels ‘argument’ and ‘in situ’ of the other factors. This will allow for evaluation of the predictions of H1. In Model 2, ‘moved’ was coded as the baseline level for evaluation of H2(a) and the first prediction of H3, which predict simple effects/interactions within this level. For H2(b) and the second prediction of H3, the contrast coding of the factor movement is not relevant, thus, it can be evaluated based on the output of any of the models.
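The effect of switching the baseline level of movement can be illustrated with a toy calculation. The Python sketch below uses invented log-odds for the four argument conditions (they are not the study's data) and derives, for a fully treatment-coded model with object as the baseline of phrase, what the phrase coefficient and the movement × phrase interaction represent under each choice of baseline. The function name is an illustrative assumption.

```python
# Hypothetical cell means (log-odds) for the argument conditions only.
cell = {
    ("in situ", "object"): -2.5, ("in situ", "subject"): 0.5,
    ("moved", "object"): -1.0,   ("moved", "subject"): 0.0,
}


def treatment_coefficients(cell, movement_baseline):
    """Saturated-model coefficients with treatment coding throughout.

    phrase baseline = 'object'; the movement baseline is passed in.
    """
    other = "moved" if movement_baseline == "in situ" else "in situ"
    base_obj = cell[(movement_baseline, "object")]
    phrase = cell[(movement_baseline, "subject")] - base_obj
    movement = cell[(other, "object")] - base_obj
    phrase_other = cell[(other, "subject")] - cell[(other, "object")]
    return {
        "intercept": base_obj,
        "phrase": phrase,                       # simple effect at the baseline
        "movement": movement,
        "movement:phrase": phrase_other - phrase,
    }


model1 = treatment_coefficients(cell, "in situ")  # phrase effect in situ (H1)
model2 = treatment_coefficients(cell, "moved")    # phrase effect moved (H2(a))

print("Model 1:", model1)
print("Model 2:", model2)
```

In this toy example the phrase coefficient changes with the baseline (it is the subject/object difference in situ in one model and under movement in the other), while the interaction has the same magnitude in both models and only flips its sign. This is consistent with the interaction being evaluable from either model.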
According to a generalized linear mixed model, the prediction of H1 (Condition C hypothesis) was confirmed: a simple effect of phrase was found within the levels ‘in situ’, ‘argument’ of the other factors (z = 8.226, p < 0.001 in Model 1). The prediction of H2(a) (reconstruction hypothesis) was confirmed: a simple effect of phrase was also found within the levels ‘moved’ and ‘argument’ of the other factors (z = 2.391, p = 0.017 in Model 2). The prediction of H2(b) (surface hypothesis) was confirmed: a simple interaction between movement and phrase was found within the level ‘argument’ of the remaining factor (| z | = 5.596, p < 0.001 in Models 1/2). Neither of the predictions of H3 (argument/adjunct asymmetry hypothesis) was confirmed: there was no significant simple interaction between phrase and arg/adj within the level ‘moved’ (z = 0.087, p = 0.931 in Model 2), nor a significant simple interaction between movement and arg/adj within the level ‘object’ (| z | = 0.209, p = 0.834 in Models 1/2). The full results of the models are shown in Tables 8 and 9.
In Experiment 2, evidence for both the reconstruction hypothesis and the surface hypothesis was found: the finding that the ‘moved subject’ and ‘moved object’ conditions differ in spite of their similarity at the surface supports the view that the base position of the moved phrase plays a role. However, the difference between the ‘moved object’ and ‘in situ object’ conditions shows that the surface position matters as well. No evidence for an argument/adjunct asymmetry was found.
How can the finding that the base position plays a role (pointing toward reconstruction) be reconciled with the finding that the surface position also matters (speaking against reconstruction)? In other words, how can we interpret intermediate response patterns that neither correspond to the clear 0/100 divide that we would expect if coreference were fully determined by binding principles and reconstruction, nor to the complete absence of a difference between ‘moved subject’ and ‘moved object’ expected if reconstruction did not play a role at all? There are two basic possibilities: First, there is reconstruction but other factors lead to a higher availability of coreference than expected. Second, there is no reconstruction and, despite the fact that the subject/object conditions are near-minimal pairs, there are additional factors causing the asymmetry.
We will first discuss an interpretation in terms of reconstruction. In Section 2.3, we considered three possible scenarios in which the wh-phrase is reconstructed, but coreference in the moved condition is still available to some extent. The first two options had to do with a potential by-subject or by-item split: there might be a group of participants that reconstructs and another that does not (variation between dialects or idiolects); or a fixed group of items which reconstructs and another which does not (not necessarily related to our categorization as an argument/adjunct – otherwise we should have seen a difference between our conditions in this respect – but to some inherent property of the items). However, the idea of a by-subject or by-item split was not supported by post hoc analyses. We take the difference between the positive responses to Q2 in the ‘moved subject’ and ‘moved object’ conditions to be the main indicator of reconstruction. By-subject and by-item analyses of this measure revealed a gradient and unimodal distribution rather than a split between subgroups of speakers or items. Thus, whether coreference is available or not in the moved condition does not seem to vary systematically, based on idiolects or a specific property of the items, but it rather seems to vary individually from case to case.
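The measure used in these post hoc analyses, the difference between the proportion of positive responses to Q2 in the ‘moved subject’ and ‘moved object’ conditions, can be computed per participant (or per item) along the following lines. This Python sketch is an assumption about how such scores might be derived, not a reproduction of the published analysis scripts; the data format, the function name, and the example responses are all hypothetical.

```python
from collections import defaultdict

CONDITIONS = ("moved subject", "moved object")


def difference_scores(responses):
    """Per-unit difference in Q2 yes-rates: moved subject minus moved object.

    responses: iterable of (unit, condition, answered_yes) tuples, where
    unit is a participant or item identifier. Units lacking data in either
    condition are skipped.
    """
    yes = defaultdict(int)
    total = defaultdict(int)
    for unit, condition, answered_yes in responses:
        if condition in CONDITIONS:
            total[(unit, condition)] += 1
            yes[(unit, condition)] += int(answered_yes)

    scores = {}
    for unit in sorted({u for (u, _) in total}):
        if all(total.get((unit, c), 0) > 0 for c in CONDITIONS):
            subj, obj = (yes[(unit, c)] / total[(unit, c)] for c in CONDITIONS)
            scores[unit] = subj - obj
    return scores


# Hypothetical responses for three participants (not the study's data).
example = [
    ("p1", "moved subject", True), ("p1", "moved subject", True),
    ("p1", "moved object", True), ("p1", "moved object", False),
    ("p2", "moved subject", True), ("p2", "moved subject", False),
    ("p2", "moved object", False), ("p2", "moved object", False),
    ("p3", "moved subject", True), ("p3", "moved subject", True),
    ("p3", "moved object", True), ("p3", "moved object", True),
]
print(difference_scores(example))
```

A split between reconstructing and non-reconstructing subgroups would show up as two clusters of such scores, whereas the text reports a gradient, unimodal distribution.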
Another possibility that we considered in Section 2.3 was that even if there is reconstruction (i.e. even if the PP modifier is present in the bottom copy), non-syntactic factors could still influence participants’ judgments in this type of experimental task and lead them to respond with ‘yes’ in spite of a Condition C violation. In order to explain the pattern that we see, these would need to be factors that are likely to raise the availability of coreference in the ‘object moved’ more than in the ‘object in situ’ condition. One such factor could be closeness: coreference with the embedded R-expression could be judged to be possible because it is a very close antecedent in the moved condition. Another factor could be linear order: coreference with the embedded R-expression would yield a (usually preferred) anaphoric instead of a cataphoric relation in the moved condition, potentially raising the number of positive responses. The degree to which participants’ judgments are affected by such factors could plausibly vary on an individual basis, accounting for the gradience. This suggests treating Condition C as a soft factor, a point we return to in the general discussion. The subject–object asymmetry then follows under the assumption that the violation of a soft constraint still comes at a price, viz., reduces the availability of coreference.
Based on our data, one cannot assess conclusively how much these potential factors contribute to the patterns and how the various factors interact exactly. Thus, while there may be plausible explanations for why different response patterns to ‘object in situ’ and ‘object moved’ could emerge even if there is reconstruction, we cannot rule out that the asymmetry is due to non-reconstruction. This in turn leads us to the second possible interpretation of the pattern in this experiment, namely that the difference between ‘object moved’ and ‘subject moved’ is not explained in terms of reconstruction but by means of non-syntactic factors.
The anonymous reviewers suggested the following two possible alternative explanations of the contrast. First, it has been observed in pronoun resolution that parallelism between the function of the antecedent and the pronoun increases the likelihood of coreference (e.g. Stevenson, Nelson & Stenning Reference Stevenson, Alexander and Stenning1995). Thus, a subject pronoun prefers a subject antecedent, while an object pronoun prefers an object antecedent. This could indeed have an effect in the design used in Experiment 2: in the object moved condition, the pronoun is a subject. Consequently, it could be more attracted to the matrix subject; the lower availability of coreference with the embedded R-expression could thus be related to this factor rather than reconstruction (while in the subject moved condition, the pronoun is an object, which would not be equally attracted to the matrix subject).Footnote 17 The second alternative capitalizes on the differential anaphoric availability of grammatical functions: in general, subjects tend to be more prominent antecedents than objects (recall from Section 2.3). This could affect coreference in Experiment 2 as follows: in the object moved condition, the matrix subject is far more salient than the R-expression within the wh-moved object, while in the subject moved condition, the asymmetry is not as substantial since both phrases bear the subject function.
These two explanations of the subject–object contrast based on non-syntactic factors are indeed obvious alternatives, which we address in the next experiment. Note that the role of non-syntactic factors is certainly visible in one part of the data. Although the examples with the subject in situ are uncontroversially grammatical, coreference with the lower R-expression is only available at 56–66% and thus substantially lower than the 100% one might a priori expect. This is clearly related to the fact that the matrix subject is a more salient antecedent than an R-expression within a wh-phrase.Footnote 18
6. Experiment 3: Non-syntactic factors (non-syn)
Given the alternative explanations of the subject–object contrast detected in Experiment 2 that we discussed at the end of the last section, we designed another experiment to tease apart the syntactic explanation (based on reconstruction) from the non-syntactic one (based on preferences in pronoun resolution). While we adopted the basic design of the previous experiment, certain modifications were necessary to test both alternative explanations (parallel function and higher salience of subjects). We added a context sentence before the indirect question in which we introduced two R-expressions, one of which would be repeated as the R-expression within the wh-phrase. In addition, we varied the grammatical function of the two R-expressions in the context sentence (subject vs. object). The context sentence is followed by the matrix clause that introduces the indirect question. Unlike in the previous experiment, no referent is introduced in the matrix clause; rather, an impersonal construction is used. The basic structure of an item would thus be as follows (in (27) with a wh-object):
There are four conditions: The wh-phrase is either a subject or an object and the grammatical function of the R-expression in the context sentence that will be taken up within the wh-phrase is either subject or object. In what follows, R2 is the R-expression contained in the wh-phrase, R1 is the other R-expression that only occurs in the context sentence.
Under the parallel function hypothesis, the expectation is that coreference with the R-expression inside the wh-phrase (R2) will be higher if the referent of this R-expression is introduced (in the context sentence) with the same grammatical function as the pronoun. Thus, the availability of coreference with the lower R-expression (R2) does not depend on the grammatical function of the wh-phrase; rather, it is (indirectly) affected by the relationship between R2’s grammatical function in the context sentence and the grammatical function of the pronoun. Under the subject prominence hypothesis, the expectation is that coreference with the R-expression within the wh-phrase (R2) is higher if R2 is introduced as a subject in the context sentence. Again, the availability of coreference with R2 is indirectly affected, namely by its grammatical function in the context sentence.
6.1 Participants and procedure
The procedure was the same as in Experiment 2; 32 participants were tested and 76 stimuli were presented to each participant (32 critical items, 44 fillers). For the critical items, 256 data points were collected per condition/question (8 from each participant). On average, completing the questionnaire took 26 minutes.
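The data-point counts above follow from simple arithmetic, which the following minimal Python sketch makes explicit. It assumes that each participant sees every critical item in exactly one of the four conditions and that items are distributed evenly across conditions; the function name is hypothetical and is not part of the experimental software.

```python
def data_points_per_condition(participants, critical_items, conditions):
    """Return (items per condition per participant, data points per condition).

    Assumes an even distribution of critical items across conditions,
    with each participant seeing each item in exactly one condition.
    """
    if critical_items % conditions != 0:
        raise ValueError("critical items must divide evenly across conditions")
    per_participant = critical_items // conditions
    return per_participant, participants * per_participant


# Experiment 3: 32 participants, 32 critical items, 4 conditions
per_participant, total = data_points_per_condition(32, 32, 4)
print(per_participant, total)  # 8 256
```

Since both questions (Q1 and Q2) were asked for each item, the same count holds per condition for each question.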
6.2 Design and materials
Experiment 3 had a 2 × 2 design. The four conditions are illustrated in (28). We partially followed the design of Experiment 2: we adopted the factor phrase (R-expression contained in subject/object). We dropped the factors movement and argument/adjunct: R2 was always contained in a PP-argument of a wh-moved DP. Function in context was included as an additional factor: As mentioned above, we added a context sentence before the indirect question in which we varied which of the two proper names was introduced as a subject and which as an object.
The corresponding questions Q1 (coreference with R1) and Q2 (coreference with R2) were (object wh) Kann man den Satz so verstehen, dass Kerstin ein Geschenk entzückend fand/Ilse ein Geschenk entzückend fand? ‘Can this sentence be understood such that Kerstin found a present enrapturing/Ilse found a present enrapturing’ and (subject wh) Kann man den Satz so verstehen, dass Ilse ein Geschenk entzückte/Kerstin ein Geschenk entzückte? ‘Can this sentence be understood such that a present enraptured Ilse/a present enraptured Kerstin?’
The items were based on the indirect questions in Experiment 2; we only added a context sentence and an impersonal matrix clause. Context sentence and indirect question were identical in all items except that we varied the object of the preposition: ‘party’, ‘meeting’, ‘celebration’, ‘festivity’. The fillers were identical to those in Experiment 2, except that we also added a context sentence containing two R-expressions to make the task similar to that of the items.
6.3 Hypotheses and predictions
H1 predicts a main effect of phrase in the direction of more ‘yes’ responses to Q2 in [subject wh-phrase] than in [object wh-phrase].
H2 predicts an interaction between phrase and function in context. The interaction should be toward more ‘yes’ responses to Q2 in [object wh-phrase, R-exp. in wh-phrase introduced as subject] and [subject wh-phrase, R-exp. in wh-phrase introduced as object] than in [object wh-phrase, R-exp. in wh-phrase introduced as object] and [subject wh-phrase, R-exp. in wh-phrase introduced as subject].
H3 predicts a main effect of function in context in the direction of more ‘yes’ responses to Q2 in [R-exp. in wh-phrase introduced as subject] than in [R-exp. in wh-phrase introduced as object].
The results are summarized in Table 10 and illustrated in Figure 3. Generalized linear mixed models were fit. The contrast coding allowed us to test the predictions described above: both factors were sum-coded. The full model results are shown in Table 11. The predictions of H1 (reconstruction hypothesis) were confirmed: there was a significant main effect of phrase in the predicted direction. The predictions of H2 (parallel function) and H3 (subject prominence) were not confirmed: there was neither a significant main effect of function in context nor a significant interaction between the two factors.
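To make the link between the three hypotheses and the terms of a sum-coded 2 × 2 model concrete, the following minimal Python sketch computes the intercept, the two main effects, and the interaction from four cell proportions on the logit scale. It is a simplification under stated assumptions: it ignores the random effects of the mixed models actually fitted, the assignment of +1/−1 to factor levels is assumed (the source does not specify it), and the proportions are invented for illustration and are not the values in Table 10.

```python
import math

# Sum coding (assumed assignment): subject = +1, object = -1 for both factors.
CODE = {"subject": 1, "object": -1}


def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1 - p))


def sum_coded_effects(cells):
    """Return intercept, main effects, and interaction on the logit scale.

    cells maps (phrase, function_in_context) to the proportion of 'yes'
    responses to Q2 and must contain all four cells of the 2 x 2 design.
    """
    effects = {"intercept": 0.0, "phrase": 0.0, "function": 0.0, "interaction": 0.0}
    for (phrase, function), p in cells.items():
        y = logit(p) / len(cells)
        effects["intercept"] += y
        effects["phrase"] += CODE[phrase] * y
        effects["function"] += CODE[function] * y
        effects["interaction"] += CODE[phrase] * CODE[function] * y
    return effects


# Invented proportions showing the pattern H1 predicts (NOT the values in Table 10).
h1_pattern = {
    ("subject", "subject"): 0.6,
    ("subject", "object"): 0.6,
    ("object", "subject"): 0.4,
    ("object", "object"): 0.4,
}
print(sum_coded_effects(h1_pattern))
```

Under this assumed coding, H1 corresponds to a positive coefficient for phrase, H3 to a positive coefficient for function in context, and H2 to a negative interaction coefficient; with the opposite level assignment, the signs would flip accordingly.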
We did not find evidence for the hypotheses H2 and H3: the availability of coreference with R2 was not significantly affected by the grammatical function it bears in the context sentence. As in Experiment 2, there is a contrast between wh-subjects and wh-objects: Coreference with R2 is more available with wh-subjects than with wh-objects, as predicted by H1 (reconstruction hypothesis).
These findings speak against the view that the subject/object contrast that we observed in Experiment 2 can be fully reduced to pronoun resolution preferences: a residual contrast remains even if these factors are controlled for. However, numerically, the subject/object contrast is smaller than in the previous experiment. In Experiment 2, the contrast between wh-objects and wh-subjects was 36% vs. 51%; in Experiment 3, it was smaller (45% vs. 51%), which, given the results we have obtained so far in our experiments, is probably close to the maximum that one can get for Q2 in this experimental design. This difference between Experiments 2 and 3 (with the caveat that this is only a tentative post hoc observation across two separate experiments) would be compatible with the view that while none of the specific parallelism/salience-based hypotheses that we tested was confirmed, there might indeed be another important factor facilitating coreference here, namely the accessibility of the referent. If it has already been introduced in the previous discourse as in Experiment 3, coreference with R2 seems to become more available. This observation recalls the discussion in Bruening & Al Khalaf (Reference Bruening and Al Khalaf2019: 268–269), who argue for an improvement in a similar configuration (without, however, investigating this experimentally). There remains an asymmetry, though, in that coreference with R2 with wh-subjects seems to be unaffected by this. The values for Q2 do not differ between Experiments 2 and 3 (51% for moved arguments in both experiments).
As for the consequences for syntactic theory, the results of Experiment 3 challenge the view that the main factors affecting coreference are reconstruction and a hard Condition C constraint. This would imply a categorical split into grammatical (wh-subjects) and ungrammatical (wh-objects), which seems inadequate in view of the small size of the subject/object contrast that remains once the influence of potential confounds is reduced. Nevertheless, the contrast does not vanish completely, which also needs to be accounted for – we will discuss possible explanations in the next section. From a methodological point of view, Experiment 3 stresses the importance of the context sentences used in the materials and of the development of experimental designs that allow us to also detect subtle contrasts.
7. General discussion and conclusion
Given the results of Experiments 2 and 3, reconstruction and a hard Condition C constraint cannot be considered the main factors governing coreference in German A′-movement. We will now discuss, in turn: the implications for DP movement in Wierzba et al. (Reference Wierzba, Salzmann and Georgi2021), the AP/DP experiment, and Experiment 1; the differences between these experiments and Experiments 2 and 3; the AP/DP and wh-/relativization contrasts; possible differences between German and English; and the status of the residual subject/object asymmetry found in Experiment 3.
First, the data with wh-subjects in Experiments 2 and 3 support our arguments against the threshold-based logic of previous work (including our own) according to which values below a certain threshold close to zero indicate reconstruction, while values above it indicate non-reconstruction. In subject wh-movement where Condition C is not at stake, coreference with R2, which is fully grammatical, remains around 50%. This is clearly related to a non-syntactic factor, namely the preference for coreference with the matrix subject. It is plausible that this also contributes to the low coreference values for R2 that we obtained in Wierzba et al. (Reference Wierzba, Salzmann and Georgi2021), the AP/DP experiment, and Experiment 1 with wh-movement of DP-objects. Together with the results of Experiment 3, which clearly argue against reconstruction and a hard Condition C constraint as the main factors governing coreference in A′-movement, a plausible interpretation of the DP movement data in Wierzba et al. (Reference Wierzba, Salzmann and Georgi2021), the AP/DP experiment, and Experiment 1 is that the low values for Q2 do not result from reconstruction and thus a classical Condition C violation either (our conclusion is thus eventually similar to Bruening & Al Khalaf Reference Bruening and Al Khalaf2019, although their reasoning is based on a different non-syntactic factor).
Second, it is notable that the proportion of positive responses to Q2 in the ‘moved object’ conditions was overall higher in Experiments 2 and 3 (36–46%) than in experiments 2 and 4 of Wierzba et al. (Reference Wierzba, Salzmann and Georgi2021), the AP/DP experiment, and Experiment 1 (7%, 11%, 21%, and 13%, respectively, in the short conditions). We think that this may be due to changes in the materials: in order to construct semantically similar object and subject variants, we mainly chose predicates expressing emotion or evaluation, for which pairs of the type ‘X found Y upsetting’ and ‘Y upset X’ could be constructed. It is possible that this change made the items more uniform and overall increased the plausibility of the coreferential reading. In experiments 2 and 4 of Wierzba et al. (Reference Wierzba, Salzmann and Georgi2021), the AP/DP experiment, and Experiment 1, all kinds of transitive verbs were used, which perhaps introduced a larger amount of variability in plausibility; it might be generally quite easy to imagine having some emotion or attitude toward a book/report/rumor/etc. about oneself, whereas it might vary more whether performing a specific action to it (reading/hiding/falsifying/etc.) is perceived as likely. These substantial differences also provide another argument against threshold-based reasoning as they show how much the values can vary between experiments based on properties of the materials, even though the basic grammatical configuration is the same (object wh-movement).
Third, what remains open at this point is what the reasoning so far implies for the AP/DP-contrast in Wierzba et al. (Reference Wierzba, Salzmann and Georgi2021), the AP/DP experiment, and the wh-/relativization contrast in Experiment 1. Starting with the AP/DP-contrast, while the facts are compatible with a theory where only APs/predicates reconstruct (see, e.g. Adger et al. Reference Adger, Drummond, Hall, van Urk, Lamont and Tetzloff2017; Bruening & Al Khalaf Reference Bruening and Al Khalaf2019), we cannot rule out that the factors that increased coreference with R2 for object wh-movement in Experiments 2 and 3 could also lead to a significant improvement with AP-movement. Thus, perhaps, coreference with APs also becomes more available with experiencer predicates, e.g. in a sentence like Peter erzählt, wie ungerecht gegen Hans ihm das Urteil erscheint ‘Peter tells us how unjust against John the judgment seems to him’. Furthermore, it may be possible to further increase the availability of coreference by introducing the R-expression inside the AP in the previous context as we did for DPs in Experiment 3. The (remaining) AP/DP-asymmetry could then perhaps be related to non-syntactic factors as well (e.g. a difference in plausibility). At this point, any conclusions about AP-reconstruction and the AP/DP-contrast strike us as premature, and we intend to investigate the different possibilities in future work. As for the wh-relativization contrast in Experiment 1, if wh-movement of DPs does not involve a typical Condition C violation, just like relativization, a non-syntactic explanation is required. In Section 4 we already hinted at such a possibility: a difference in referential accessibility. The difference between Experiments 2 and 3 can be interpreted such that coreference with R2 is more available if the referent is established in the prior discourse and thus more accessible. 
Importantly, relative clauses are sometimes analyzed as involving topicalization of some sort. For instance, in cartographic work, relative pronouns have been argued to occupy a topic position (see Bianchi Reference Bianchi1999). Since the pronoun refers back to the head of the relative clause, this could make it and the R-expression contained in it referentially more accessible and thus facilitate coreference. It is not clear at this point, though, whether this is sufficient to account for the asymmetry. Note that the external head in our experiments is headed by a universal quantifier, which does not make an ideal topic. Conversely, the wh-phrase in our experiments is headed by ‘which’, which is normally associated with D-linking. Thus, it is not clear whether the referential asymmetry is large enough to account for the difference in coreference (it would arguably have to be the relative pronoun that makes the difference). Given these limitations, we leave an exploration of this hypothesis for the wh-relativization asymmetry for future work.
The results of the experiments in Wierzba et al. (Reference Wierzba, Salzmann and Georgi2021), the AP/DP experiment, and Experiment 1 deviated from what had been reported for English in that we found coreference with DPs to be much less available. Initially, we investigated whether the differences with regard to coreference could be related to differences in the method or in the materials. However, as described in the Appendix (A.1), an additional experiment based on the materials of the AP/DP experiment but using the forced-choice method of Bruening & Al Khalaf (Reference Bruening and Al Khalaf2019) did not change the results as the availability of coreference remained very low. We also replicated the second experiment of Bruening & Al Khalaf (Reference Bruening and Al Khalaf2019) by translating their items into German (and using their forced-choice method) and still found a substantial difference in the availability of coreference, see the Appendix (A.2). This puzzling asymmetry has changed with Experiments 2 and 3 where the values for wh-objects are closer to the English facts and which invite the same conclusion, namely that reconstruction and, consequently, a hard Condition C constraint are not the main factors affecting coreference in A′-movement. Thus, the picture has become less clear-cut with the evidence for reconstruction in German fading and new evidence for reconstruction in English being introduced into the discussion (as in Stockwell et al. Reference Stockwell, Meltzer-Asscher, Sportiche, Farinelly and Hill2021). Given that crosslinguistic differences in this area are a priori unexpected, a thorough crosslinguistic comparison of Condition C reconstruction remains an important topic for future research. We should add that our results converge with previous work on English with regard to the lack of an argument–adjunct asymmetry (but see Stockwell, Meltzer-Asscher & Sportiche, to appear, for different results).
The final aspect we need to address is the residual subject–object asymmetry in Experiments 2 and, especially, 3. While the coreference values for subject and object wh-movement become very similar in Experiment 3 and a classification into grammatical (wh-subjects) and ungrammatical (wh-objects) seems inadequate, there remains a difference: (i) the small numerical difference is significant in Experiment 3 and (ii) only objects are affected by the design change in Experiment 3 where the R-expression within the wh-phrase is introduced in the previous discourse, while wh-subjects show the same coreference values in Experiments 2 and 3. We can think of two principled possibilities, (i) another non-syntactic factor or (ii) a possibly soft/violable syntactic factor. One possible non-syntactic factor could be the grammatical relation of the wh-phrase that contains the antecedent. Given that subjects are more salient antecedents than objects, if the grammatical function of an XP also affects the accessibility of R-expressions contained in that XP, we expect R-expressions inside subjects to be more accessible than R-expressions inside objects. While not implausible, we are not aware of any independent work supporting this assumption. In addition, the predictions are the same as any syntactic account that relies on the subject–object asymmetry, which is why they cannot easily be teased apart in the case at hand.Footnote 19
The alternative to a non-syntactic factor is a soft/violable grammatical factor. It has long been known that Principle C is sometimes violable. Recent work on backward anaphora by Gor & Syrett (Reference Gor, Syrett, Ronai, Stigliano and Sun2019) and Gor (Reference Gor2020) has shown that while a Condition C-configuration leads to an expectation of obviation (see Safir Reference Safir2004), it can be overridden under certain conditions, including various pragmatic conditions and especially under high plausibility of coreference. Thus, a violable Condition C constraint could interact with other constraints (plausibility, referential accessibility) and cause the weak subject–object asymmetry. Given the variable results we have obtained for wh-objects, it is conceivable that the Condition C effect is visible to various extents, depending on the strength of the other factors. Note that this view requires reconstruction (viz., presence of PP modifiers in the bottom copy) after all and thus something else must be said to account for the in situ/moved contrast. One possibility is that in the moved condition, the initial parse (before reconstruction) would involve forward anaphora rather than backward anaphora. Since forward anaphora is preferred over backward anaphora, this could facilitate coreference.Footnote 20 Our data do not adjudicate between these two views. What they show quite clearly is that a theory which includes both reconstruction and a hard Condition C constraint fails. They are compatible with either a theory without reconstruction (viz., no PP modifiers in the bottom copy), but with a possibly strong Condition C constraint (supplemented by some additional constraint to account for the subject–object asymmetry) or a theory that includes reconstruction (viz., PP modifiers present in the bottom copy) and a soft Condition C constraint (and some additional factor to account for the object moved/object in situ asymmetry).
We will conclude by stressing an important methodological point: we have criticized previous work for relying on differences in coreference between moved/in situ. We do believe that as long as only object wh-movement is investigated, the problems we have identified remain serious: the two structures differ in other respects that may affect coreference judgments (forward/backward anaphora, linear distance). Our design in Experiments 2 and 3 is crucially different in that we don’t have to rely exclusively on the difference between moved and in situ in the object conditions. Rather, we can compare the values for object moved with two reference points: object in situ, which provides the baseline for ungrammatical structures, and the subject conditions, which provide the baseline for grammatical structures. We can then determine whether the values in ‘object moved’ are closer to the ungrammatical baseline or the grammatical baseline and draw conclusions based on that rather than having to rely on thresholds of absolute values. Importantly, the comparison moved/in situ remains important in our design; without it, we would only obtain a subject–object contrast and crucially could not conclude that Condition C is best viewed as a soft constraint.
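The two-baseline logic described above can be illustrated with a small worked example. The following Python sketch is only one hypothetical way of expressing where the ‘object moved’ value falls between the two reference points; it is not a measure used in our analyses, which rely on the mixed models reported above. The function name and the input percentages are invented for illustration.

```python
def relative_position(object_moved, ungrammatical_baseline, grammatical_baseline):
    """Locate 'object moved' between the two baselines.

    Returns 0.0 if the value equals the ungrammatical baseline (object in situ)
    and 1.0 if it equals the grammatical baseline (subject conditions).
    All arguments are percentages of 'yes' responses to Q2.
    """
    span = grammatical_baseline - ungrammatical_baseline
    if span <= 0:
        raise ValueError("grammatical baseline must exceed ungrammatical baseline")
    return (object_moved - ungrammatical_baseline) / span


# Invented percentages, NOT results from Experiments 2 or 3.
print(relative_position(object_moved=40, ungrammatical_baseline=10, grammatical_baseline=50))
```

A value near 1 would place ‘object moved’ close to the grammatical baseline and a value near 0 close to the ungrammatical one, without reference to any absolute threshold.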
A. Further methodological aspects
Given the discrepancies between Wierzba et al. (Reference Wierzba, Salzmann and Georgi2021) and the results reported for English in Adger et al. (Reference Adger, Drummond, Hall, van Urk, Lamont and Tetzloff2017) and Bruening & Al Khalaf (Reference Bruening and Al Khalaf2019), we conducted an experiment testing to what extent the differences between the languages could be related to the method or materials. It was run parallel to the AP/DP experiment, using the same materials, but adopting Bruening & Al Khalaf’s (Reference Bruening and Al Khalaf2019) forced-choice method. The experiment not only contained a replication of the AP/DP experiment, but also a replication of Bruening & Al Khalaf’s (Reference Bruening and Al Khalaf2019) study with the same materials (translated into German)Footnote 21 and an exploratory investigation of subject wh-movement (a pilot for Experiment 2). Participants were recruited via prolific.co and 36 native speakers of German took part. We used the platform L-Rex (Starschenko & Wierzba Reference Starschenko and Wierzba2020) for the web-based questionnaires. The instructions were adopted from Bruening & Al Khalaf (Reference Bruening and Al Khalaf2019). Each participant was presented with 100 stimuli: 48 critical items, 36 fillers, 8 items for the exploratory investigation, and 8 items that were direct translations of Bruening & Al Khalaf’s materials. For the critical items, 144 data points were collected per condition/question (four from each participant). On average, the questionnaires took about 25 minutes to complete.
A.1. Replication of the AP/DP experiment with the forced-choice method
The results of our parallel study to the AP/DP experiment are summarized in Table 12.
Fitting a statistical model to all data from this experiment was impeded by the presence of complete separation (i.e. 100% positive responses and thus 0 variance) in the short AP conditions. This made it impossible to fit a converging model to the complete data set. However, visual inspection of the percentages in the short conditions shows that there is no trend toward less robust reconstruction with DPs (in contrast to the AP/DP experiment with the two-question method): the embedded R-expression was not chosen more frequently as a referent in the moved than in the in situ condition. For statistical analysis of the remaining data (embedded 1 and embedded 2), we decided to use sum-coding for distance (in a post hoc decision) to be able to compare these two levels to each other. The factors movement and category were also sum-coded, as in the AP/DP experiment. Within this subpart of the data, a significant main effect of movement was found (z = 3.48, p < 0.001), but no significant main effect of category nor distance. None of the interactions was significant. The full model output is shown in Table 13.
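The separation problem can be made concrete with a short Python sketch. When a condition receives 0% or 100% positive responses, the binomial variance is zero and the log-odds are infinite, so a logistic model has no finite estimate for that cell. The condition labels and counts below are hypothetical and are not the values in Table 12; the helper names are likewise invented for illustration.

```python
import math


def binomial_variance(yes, total):
    """Variance p(1 - p) of a single binary response with p = yes / total."""
    p = yes / total
    return p * (1 - p)


def empirical_logit(yes, total):
    """Log-odds of a positive response; infinite at 0% or 100%."""
    if yes == 0:
        return -math.inf
    if yes == total:
        return math.inf
    p = yes / total
    return math.log(p / (1 - p))


def separated_conditions(counts):
    """Return the conditions whose responses are all negative or all positive."""
    return [name for name, (yes, total) in counts.items() if yes in (0, total)]


# Hypothetical (yes, total) counts per condition, NOT values from Table 12.
counts = {
    "AP short moved": (144, 144),
    "AP short in situ": (144, 144),
    "DP embedded 1 moved": (90, 144),
}
print(separated_conditions(counts))  # ['AP short moved', 'AP short in situ']
```

Restricting the analysis to the levels without such cells, as done here for embedded 1 and embedded 2, is one common way to obtain a converging model.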
In comparison to the AP/DP experiment, the results of the replication show a divergence: the AP/DP asymmetry found in the AP/DP experiment was not detected in the forced-choice replication and coreference with DP movement was so low that, given a threshold-based logic, it would arguably be interpreted as evidence for a Condition C effect, a conclusion that is less obvious under our two-question method, where coreference with the embedded R-expression was more available.
A.2. Replication of Bruening & Al Khalaf’s (Reference Bruening and Al Khalaf2019) second experiment
The results of our replication are shown in Table 14, in comparison to the original English experiment. Both factors (movement, argument/adjunct) were sum-coded. No main effect of movement (z = 0.37, p = 0.71) nor argument/adjunct (z = –0.18, p = 0.86) was found, nor a significant interaction (z = –0.15, p = 0.89). The results show that coreference is much less available in German under forced choice as well, even though the same method and direct translations of the materials were used here. In the wh-movement conditions, coreference with the embedded R-expression was preferred much more frequently in the English study (arguments: 22%, adjuncts: 31%) than in our German replication (8%/6%).
A.3. Subject questions under forced choice
In four exploratory items, we investigated subject wh-movement, a pilot for our Experiment 2. These items were also part of the AP/DP experiment. The results in both experiments are shown in Table 15. With the two-question method, coreference with the embedded R-expression is available to a substantial extent, as expected given that Condition C is not violated. With the forced-choice method, however, coreference with the embedded R-expression is available only to a very small extent, even though such examples are unquestionably grammatical. Given the threshold-based logic of such approaches, the 5.6% may be considered close enough to zero and thus – wrongly – be taken to indicate ungrammaticality.