19.1 Introduction
One fundamental constraint on natural language is that, while linguistic representations are hierarchical, words have to be produced (and heard) one after the other. Many cognitive scientists argue this requires that speakers and hearers map the hierarchical information associated with sentences and their meanings into (and from) linearized strings of linguistic material while minimizing ambiguity and information loss. This mapping is made possible by the complex interaction of word order, morphology, and prosody. At the prosodic level, duration (the focus of this chapter), together with pitch and intensity, contributes both to speech segmentation (e.g., to identify word boundaries) and to grouping words into hierarchically organized prosodic representations (partially) mapping into underlying syntactic structures and their semantic interpretation.
Our ability to use durational information for speech disambiguation depends in turn on the ability to track the rhythmic properties of speech, as defined in this volume. For example, while speech rate itself does not carry grammatical information, it largely shapes our perception of both segmental and suprasegmental features, leading us to perceive the same speech sounds as being longer in the context of faster speech rate and shorter in the context of slower speech rate.Footnote 1 In other words, speech rate provides a baseline against which we can evaluate the durational properties of segmental and suprasegmental information, by comparing them with their expected duration at that specific rate. At the segmental level, effects of rate normalization have been reported for, among others, voice onset time (Miller and Grosjean, 1981; and others), vowel length (Reinisch and Sjerps, 2013; and others), and word segmentation (Salverda et al., 2003; Reinisch et al., 2011). Similarly, in the domain of phrasal prosody, both the realization of prosodic boundaries in production and their interpretation in parsing have been shown to be context dependent (on the importance of relative versus absolute values in the perception of boundaries, see, for example, Schafer, 1997; Carlson et al., 2001; Clifton Jr et al., 2002; Frazier et al., 2006; and others).
The durational properties of speech, however, are not only determined by structural factors but can also be modeled as a function of predictability: Less predictable words and segments are typically produced more slowly, and more predictable material is produced more quickly. Considerations of predictability and structure often make aligned predictions in relation to duration (e.g., in the domain of focus or in the case of early closure discussed below). To distinguish the relative contribution of these two factors, we identified a case in which they make opposite predictions: the contrast between structural sisterhood (e.g., The horse raced past the barn and fell) and nesting (e.g., The horse raced past the barn fell). It is well established that nested structures are harder to process and understand (at least out of the blue) than sisterhood structures. While the source of the so-called garden path effects triggered by nesting is still a contentious matter, predictability-based accounts have become increasingly prominent in psycholinguistics (Hale, 2001; Levy, 2008).
The main aim of this chapter is to argue that there are well-defined sets of environments for which lower predictability is in fact associated with shorter duration due to prosodic principles taking precedence over predictability in shaping the durational properties of speech. After briefly summarizing the role of prosodic structure and predictability in duration, we report on four recent studies showing that an inverse relation between duration and complexity is observed in the prosodic disambiguation of nested garden path sentences in production and comprehension. Specifically, the results of these studies show that, across different syntactic categories, nested garden path sentences are exceptional in displaying a relatively faster tempo when disambiguated towards less predictable structures. The important conclusion here is that, while predictability plays an important role in modulating duration, when in conflict with prosody, prosodic structure takes precedence.
We see the acceleration of a speech rhythm baseline as a hallmark of structural nesting in garden path sentences, and we suggest an account in which this change of tempo stems from an interaction between two types of principles: principles governing the syntax–prosody mapping, and principles balancing the size of prosodic phrases.Footnote 2 Nesting naturally generates a conflict between these two types of principles: Syntactically and semantically, it creates complex objects that should ideally be prosodified as a single constituent. This, however, would result in longer phrases, which are difficult to reconcile with balancing principles. One solution is to break these long syntactic objects into separate prosodic phrases (as in extraposition; Wagner, 2005, 2010). We argue that nested garden path sentences, however, are unique in preventing nested elements from being prosodically separated from their hosts. In these structures the need for balance is satisfied by increasing the tempo of the long phrase. This solution increases the parsability of nested garden paths while satisfying both types of principles. In essence, the prosodic pattern of nested garden path sentences exemplifies how different levels of representation interact and impact the rhythmic properties of speech. In simpler terms, disrupting the rhythm of speech at a lower level (tempo acceleration) helps create a smoother rhythm at a higher level (balanced prosodic phrases).
19.1.1 Duration and Prosodic Structure
We will follow the common assumption that prosodic structures constitute an independent level of representation. In this perspective, prosody can be understood as the structure that determines and organizes the acoustic realization of an utterance in relation to its phrasing (the chunking of speech) and (lexical and post-lexical) prominence configuration, among other aspects of speech (e.g., voice quality, rate; Selkirk, 1980; Beckman and Pierrehumbert, 1986; Nespor and Vogel, 2007). Prosodic structure is conceived as a grammatical system, made of hierarchically ordered constituents, that is distinct from syntactic structure (but see other approaches to the prosody–syntax interface; Cooper and Paccia-Cooper, 1980; Gee and Grosjean, 1983) and is universal (with the implementation of language-specific adjustments). Figure 19.1 schematizes prominence and phrasing-related constituents of the prosodic structure as posited by well-known phonological accounts of prosody (i.e., the autosegmental-metrical [AM] framework; Pierrehumbert, 1980; Beckman and Pierrehumbert, 1986; Ladd, 1986) and described using notational conventions such as the ToBI (Tones and Break Indices) transcription system (Silverman et al., 1992; Beckman and Ayers, 1997).
Figure 19.1 Prosodic hierarchy.
Schematic representation of the prosodic hierarchy. “IP” stands for intonational phrase (demarcated by boundary tones “T%”) and “ip” for intermediate phrase (demarcated by phrase accents “T-”), grouping words (“ω”) and syllables (“σ”). “T*” stands for pitch accents realized on lexically stressed syllables.

While syntax and prosodic structure are not isomorphic, the relative degree of structural and interpretive integration of linguistic elements is one strong determinant of an utterance’s prosodic structure. Principles governing the interface of prosody and syntax, such as edge alignment (Selkirk, 2000) and wrap (Truckenbrodt, 1995), push for prosodic phrasing to align with syntactic phrases as much as possible.
Edge alignment:
“The right edge of any syntactic phrase (XP) in syntactic structure must be aligned with the right edge of a Major Prosodic Phrase in prosodic structure.” (Selkirk, 2000, p. 232)
Wrap:
“Each syntactic XP must be contained in a phonological phrase.” (Truckenbrodt, 1995, p. 10)
Within the realm of duration, pre-boundary lengthening (for English; Klatt, 1976; Wightman et al., 1992), pauses (Watson and Gibson, 2005; Breen et al., 2011), and domain-initial strengthening (Cho and Keating, 2009) are the main phenomena in which duration determines whether a boundary is perceived and what strength is associated with it (see the discussion in Wagner and Watson, 2010; Turk and Shattuck-Hufnagel, 2014; Dahan, 2015, and references cited therein).
For our purposes, a good illustration of how duration guides prosodic grouping at the phrasal level across both production and comprehension comes from Kjelgaard and Speer’s (1999) work on the disambiguation of garden path sentences such as (1). In (1a) (i.e., the most accessible interpretation in the absence of prosodic disambiguation), the verb leaves is interpreted transitively and forms a single phrase with the following determiner phrase (DP) the house, which is followed by a prosodic break (indicated with “||” here). In (1b) the same verb is interpreted intransitively, and it constitutes the final region of the temporal modifier when John leaves. This region is immediately followed by a prosodic boundary that signals that the following DP the house should not be integrated in the same phrase but should be interpreted as the subject of the following predicate is dark. This parse is strongly dispreferred in the absence of prosodic information or punctuation. The preference for interpreting the verb transitively (as in 1a) can alternatively be interpreted as part of a generalized parsing preference to incorporate incoming words into the phrase being processed whenever grammatically possible (late closure strategy; Frazier, 1979) or as generated by the relatively higher frequency of transitive versus intransitive readings of verbs such as leaves (Tanenhaus et al., 1989; MacDonald et al., 1994).
(1)
a. When John leaves the house || it’s dark. Late closure
b. When John leaves || the house is dark. Early closure
Kjelgaard and Speer show that lengthening of the pre-boundary word leaves in (1b), compared to when the same word occurs in phrase-medial position in the late closure example in (1a), helps listeners avoid a garden path effect, even in the absence of an actual pause between the verb and the following DP. It is important to stress that what is critical for disambiguation is information about relative duration: The presence of a boundary in a given region will be a matter of relative duration of a given segment, not of the absolute durational properties of that segment. For a detailed discussion of how the importance of relative duration and global, rather than local, measures support a role for prosodic representations, see, for example, Schafer (1997); Carlson et al. (2001); Clifton Jr et al. (2002); Frazier et al. (2006); Shatzman and McQueen (2006); Speer and Blodgett (2006).
Constraints favoring the alignment of syntactic and prosodic phrasing, however, operate in tandem with other eurhythmicFootnote 3 or balance constraints, which ensure, for example, that prosodic phrases do not exceed a certain size in production, and are preferably parsed into units of similar length in comprehension (Frazier and Fodor, 1978; Gee and Grosjean, 1983; Ghini, 1993; Fodor, 2002).
Binary minimum:
“A major phrase must consist of at least two minor/accentual phrases.” (Selkirk, 2000)
Binary maximum:
“A major phrase may consist of at most two minor/accentual phrases.” (Selkirk, 2000)
Uniformity:
“A string is ideally parsed into same length units.” (Ghini, 1993)
Conflicts can arise between these different constraints and generate mismatches between syntactic and prosodic phrasing such as the one between (2) and (3) (adapted from Turk and Shattuck-Hufnagel, 2014, p. 2):
(2) Syntactic structure:
[This is [the cat [that ate [the rat [that ate the cheese]]]]]
(3) Prosodic structure:
[This is the cat] [that ate the rat] [that ate the cheese].
The extent of syntax–prosody mismatch is rather controversial (for a recent summary, see Bennett and Elfner, 2019) and outside the immediate scope of this chapter, so we will not devote a long discussion to it. But to appreciate the debate, note that some researchers (e.g., Wagner, 2005, 2010) suggest that true instances of a mismatch might be more limited than standardly assumed, and that cases such as (2) and (3) can be accounted for by extraposition of the relative clauses (RCs), such that they attach to a higher position in the syntactic structure and provide a much tighter alignment between prosodic and syntactic structures. Whatever the correct account for these and other similar contrasts, they serve well to illustrate the tension between syntax–prosody mapping and eurhythmic principles, an issue we will return to below when discussing the prosodic properties of nested garden path sentences.
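The pull of the balance constraints on the phrasing in (3) can be illustrated with a toy calculation. The sketch below is only an approximation under stated assumptions: phrase length is counted in orthographic words rather than in minor/accentual phrases, the five-word size cap is an arbitrary stand-in for a maximum-size constraint, and the function names are hypothetical. It is not an implementation of any of the cited proposals.

```python
# Toy evaluation of candidate prosodic phrasings against two balance measures.
# Assumptions (not from the cited literature): length is counted in words,
# and MAX_WORDS is an arbitrary illustrative size cap.

MAX_WORDS = 5


def phrase_lengths(phrasing):
    """Return the number of words in each prosodic phrase."""
    return [len(phrase.split()) for phrase in phrasing]


def uniformity_violation(phrasing):
    """Difference between the longest and shortest phrase (0 = same-length units)."""
    lengths = phrase_lengths(phrasing)
    return max(lengths) - min(lengths)


def size_violations(phrasing, max_words=MAX_WORDS):
    """Number of phrases that exceed the assumed size cap."""
    return sum(1 for n in phrase_lengths(phrasing) if n > max_words)


candidates = {
    # One phrase wrapping the whole syntactic object, as in (2).
    "single": ["This is the cat that ate the rat that ate the cheese"],
    # The attested phrasing in (3).
    "balanced": ["This is the cat", "that ate the rat", "that ate the cheese"],
    # A hypothetical lopsided phrasing, for comparison.
    "uneven": ["This is", "the cat that ate the rat that ate the cheese"],
}

for name, phrasing in candidates.items():
    print(name, phrase_lengths(phrasing),
          "uniformity:", uniformity_violation(phrasing),
          "oversized:", size_violations(phrasing))
```

On these crude measures only the phrasing in (3) incurs no violation, which is the sense in which balance can override a phrasing that wraps the whole syntactic object.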
19.1.2 Duration and Predictability
Durational properties of speech have also been shown to be largely dependent on predictability. More predictable words/segments are reliably associated with a shorter production duration. Less predictable elements of utterances, on the other hand, are more carefully articulated, and thus produced more slowly (Lieberman, 1963; Aylett, 2000; Jurafsky et al., 2001; Aylett and Turk, 2004; Turk and Shattuck-Hufnagel, 2014; Levy and Jaeger, 2007; and others).
For example, syllables in words that carry new or contrasting information, and are thus less predictable, are typically associated with prosodic prominence, making them louder, longer, and articulated more carefully (Aylett, 2000; Aylett and Turk, 2004; Watson et al., 2008; Turk and Shattuck-Hufnagel, 2014).Footnote 4 Aylett (2000) and Aylett and Turk (2004) suggest an information theoretic account of these observations, the smooth signal redundancy hypothesis, according to which the inverse relation between duration and predictability “provides an efficient way of ensuring that elements with low levels of language redundancy are produced for a longer period of time and perhaps with more salient acoustic characteristics, and will thus be likely to be recognized” (Aylett and Turk, 2004, p. 33). This is illustrated in (4), where the words pizza and John are associated with longer duration when in focus. It is also well known that elements in prosodically prominent positions receive more attention, are remembered better, and are processed faster than non-prominent elements (Cutler, 1976, 2012).Footnote 5
(4)
a. What does John like? John likes PIZZA.
b. Who likes pizza? JOHN likes pizza.
Similar considerations apply to the domain of syntactic phrasing, where more predictable parses are expected to be produced faster than less predictable ones. The early closure garden path sentence discussed above (and repeated in 5) provides a good illustration of this. As mentioned, according to one analysis of early closure, the intransitive reading in (5b) is harder to parse because it is less predictable, for example, because it is encountered less frequently, than the transitive reading in (5a) (Tanenhaus et al., 1989; MacDonald et al., 1994). A predictability account would assume that a longer duration of the ambiguous region (leaves the house) would help listeners to resolve the ambiguity towards the less predictable parse (presumably by positing a boundary between leaves and the house).
(5)
a. When John leaves the house || it’s dark.
b. When John leaves || the house is dark.
As this example illustrates, the same durational difference could in principle be correctly predicted on the basis of both prosodic structure and predictability (i.e., longer duration at leaves for early closure than late closure). In many instances of ambiguity, it would indeed seem that the effects on production of both prosodic structure and predictability are essentially aligned: “when predictability is low, syllables are more likely to be prosodically prominent, and words are more likely to be demarcated using prosodic boundary correlates such as initial- and final-lengthening and pause” (Turk and Shattuck-Hufnagel, 2014, p. 4). However, as Turk and Shattuck-Hufnagel (2014, p. 4) also point out, “the problem with predictability as a factor affecting duration is that it is unclear whether prosodic structure and predictability are both motivated as separate and independent factors affecting duration. This is because prosodic structure and predictability are not independent.”
To dissociate the relative contribution of predictability and prosodic structure, it is important, therefore, to focus on cases in which these two factors make opposite predictions. In recent work (Grillo and Turco, 2016; Grillo et al., 2018, 2019), we have suggested that the contrast between sisterhood and nesting is one such case.
19.2 Prosody and Predictability Make Opposite Predictions for Nested Garden Paths
The structural ambiguity of nesting versus sisterhood is extremely common and has been one of the main testing grounds for different theories of sentence processing. The sentences in (6) are a few examples of well-described cases of locally ambiguous sentences involving this structural ambiguity. As shown by the bracketing, the first example of each pair of sentences illustrates nesting: The ambiguous phrase is embedded within the constituent it modifies and forms a single syntactic phrase with it. The second example of each sentence pair illustrates sisterhood: In these examples the ambiguous phrase is a sister constituent to the preceding constituent. For each set of examples, the nested reading has been shown to generate longer reading times at the disambiguation region and poorer comprehension (at least out of the blue) than the sisterhood reading (see, for example, Pickering and Van Gompel, 2006, for a review).Footnote 6
(6)
a. [[The horse [raced past the barn]] fell] Reduced relative
b. [[The horse] [raced past the barn] and [fell]] Main verb
c. John [told [the man [that he was running with]] [to wait]] RC
d. John [told [the man] [that he was running with Max]] CC
e. [Put [the horse in the barn] [on the truck]] Restrictive PP
f. [Put [the horse] [in the barn]] Goal PP
g. John [saw [the man [with the binoculars]]] Restrictive PP
h. John [saw [the man] [with the binoculars]] Instrumental PP
While the source of this contrast in processing difficulty is still a contentious matter, an increasingly prominent perspective in psycholinguistics is that nested structures are harder to parse because they are less predictable. This approach is supported by corpus studies and computational modeling. Hale (2001), for example, shows that the reduced relative parse in (6a) is seven times less likely to occur in a corpus than the simpler analysis involving the unmodified noun phrase. Similarly, Jurafsky (1996), based on data from Connine et al. (1984), combined the syntactic probability of the main verb and reduced-RC parses (and the related lexical probability of the intransitive versus transitive reading of the verb raced in the horse raced past the barn (and) fell) to estimate “the probability ratio of the two analyses of pre-disambiguation context … as roughly 82:1” (Levy, 2011).
From an information theoretic perspective, therefore, longer reading times observed in processing studies for nested structures are seen as a function of lower predictability. As with production, studies on sentence processing show an inverse relationship between redundancy and duration. Structural analyses entertained at earlier regions of a sentence generate structural expectations about parts of the sentence that are yet to come (Konieczny, 2000; Hale, 2001; Lau et al., 2006; Staub and Clifton, 2006; Levy, 2008; Levy et al., 2012; Traxler, 2014; Kuperberg and Jaeger, 2015). The amount of time spent reading a particular region is a function of the strength of prior expectations.
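To give a sense of scale, the 82:1 ratio quoted above can be converted into surprisal, the negative log probability that predictability-based accounts commonly take to index processing cost. The sketch below is illustrative only: it assumes that the two analyses exhaust the probability mass, which the quoted ratio does not guarantee, and the function names are hypothetical.

```python
import math


def surprisal_bits(probability):
    """Surprisal in bits: -log2 of a probability in the interval (0, 1]."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)


def probabilities_from_odds(favored, disfavored):
    """Turn an odds ratio into two probabilities.

    Assumption: the two analyses are the only candidates, so they sum to 1.
    """
    total = favored + disfavored
    return favored / total, disfavored / total


# Ratio of main verb to reduced-RC analysis, as quoted in the text (82:1).
p_main_verb, p_reduced_rc = probabilities_from_odds(82, 1)

print(f"main verb:  p = {p_main_verb:.4f}, surprisal = {surprisal_bits(p_main_verb):.3f} bits")
print(f"reduced RC: p = {p_reduced_rc:.4f}, surprisal = {surprisal_bits(p_reduced_rc):.3f} bits")
```

Under this assumption the reduced-RC analysis carries several bits more surprisal than the main verb analysis, which is the asymmetry that a predictability account would expect to surface as longer durations.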
If the durational properties of speech are primarily determined by predictability, we should expect the lower predictability of nested garden paths to lead to longer durations in production than string-identical sentences involving more predictable sisterhood structures. This perspective should generate similar predictions for comprehension as well: A longer produced duration of the ambiguous region should make it easier for listeners to parse structural nesting; a shorter produced duration, on the other hand, should aggravate the garden path effect as it would be taken to map onto the more predictable sisterhood structure.
Importantly, considerations of prosodic structure generate the opposite predictions in relation to both production and comprehension of nested garden paths, leading us to expect a shorter duration for nesting than sisterhood in production, which translates into facilitated processing, that is, a reduction in the garden path effect for nesting when comprehending an ambiguous phrase produced with a relatively shorter duration. These predictions are based on the relative degree of structural (and interpretive) integration of nesting and sisterhood and the constraints on prosodic structure introduced above.
Sister constituents are independent phrases and attach higher in the syntactic structure than nested constituents. This syntactic difference makes sister constituents more likely to be produced as separate prosodic phrases than nested constituents, which are contained within the XP they modify, and are thus more likely to be mapped onto a single prosodic phrase (following the principles of edge alignment and wrap). Indeed, it is well established that a higher attachment site correlates with separate phrasing, and given that prosodic phrasing modulates the durational properties of utterances in predictable ways (e.g., pre- and post-boundary lengthening), separate phrasing typically leads to longer durations for attachment to higher positions and shorter durations for more deeply embedded strings (Hirschberg and Avesani, 1997; Clifton Jr et al., 2002; Wagner and Watson, 2010; Poschmann and Wagner, 2015; Grillo and Turco, 2016; among many others).
Given the differences in attachment site of sister and nested constituents, we can predict that sister constituents are more likely to be produced as separate phrases than nested constituents. In turn, this leads us to predict a longer duration for sisterhood structures at the regions preceding and following the phrase boundary, despite their higher predictability.Footnote 7
The example from nested garden paths involving prepositional phrases (PPs) in (6e) versus (6f), repeated below, illustrates these effects clearly. These sentences have been consistently shown to be prosodically disambiguated (Snedeker and Trueswell, 2003; Kraljic and Brennan, 2005; Schafer et al., 2005; Speer et al., 2011). Importantly, a stronger prosodic boundary between the DP (the horse) and the PP (in the barn) leads to a longer duration of both regions in the more predictable and easier to parse sisterhood condition (7a) than in the less predictable nested structure in (7b).
(7)
a. [Put [the horse] [in the barn]].
b. [Put [the horse [in the barn]] [on the truck]].
While this prosodic pattern is well attested, to our knowledge no work to date has attempted to integrate these results within a predictability account of duration. More generally, despite the central role played by nested garden paths in guiding psycholinguistic theories, until recently there has been surprisingly little experimental evidence on how the contrasts in (6) are prosodically disambiguated in production and comprehension.Footnote 8 A comprehensive study of the prosody of sisterhood versus nesting garden path sentences is still lacking. This is probably also due to the widespread assumption that some of these ambiguities (e.g., main verb versus reduced relatives) are not prosodically disambiguated (see, for example, Fodor, 2002; Wagner and Watson, 2010).
19.2.1 Reduced-RC Garden Paths
Recent results from our research group support the prediction that nested garden path sentences are associated with a shorter duration than the sisterhood structure, despite nesting being more difficult to parse out of the blue (and, at least in some cases, less predictable because of their lower frequency). In this section we review these results, and in the following section we will review our theoretical interpretation of these durational effects. In Grillo et al. (2018) we investigated the prosodic disambiguation of classic garden path sentences that are temporarily ambiguous between a main verb and a reduced-RC verb (6a, b). This type of garden path is generated by the ambiguity between a past tense and a passive past participle reading of the verb raced. As mentioned, we predicted that a difference in attachment height of the verb phrase (VP) (raced past the barn) with respect to the subject DP (the horse) should lead to shorter duration for the less predictable reduced-RCs. As explained above, the rationale for these predictions comes from the simple observation that the ambiguous string forms a single constituent in the reduced-RC analysis, but two separate syntactic phrases in the main verb analysis.
We also suggested that prosodic disambiguation of this type of contrast might have previously gone unnoticed because these sentences were typically presented in isolation and started with the critical pre-boundary ambiguous region. This is a problem because, as mentioned above, the absolute durational properties of the ambiguous region are not informative in isolation, but only when evaluated in relation to a baseline tempo provided by linguistic material preceding the ambiguous region itself (Schafer, 1997; Carlson et al., 2001; Clifton Jr et al., 2002; Frazier et al., 2006).
In Grillo et al. (2018) we used the Planned Production paradigm to test these hypotheses. In Planned Production, participants are instructed to silently scan the entire sentence before producing it naturally and fluently at normal speed. All stimulus sentences were preceded by a short introductory main clause (Greg said that in (8)) that was neutral with respect to the relevant disambiguation, but provided a baseline tempo against which the durational properties of the ambiguous region could be evaluated:Footnote 9
(8)
a. Main verb condition:
Greg said that [TP [DP the businessmen] [VP loaned money at low interest] and [VP were told to record their expenses]].
b. Reduced-RC condition:
Greg said that [TP [DP the [NP businessmen [CP loaned money at low interest]]] [VP were told to record their expenses]].
The results showed that the contrast in (8) is prosodically disambiguated, and that this disambiguation can be detected as early as the subject DP (the businessmen). We found the ambiguous region (the businessmen loaned money at low interest) to be shorter in the less predictable RC condition (8b) than in the more predictable main clause condition (8a). This difference was visible both in terms of absolute and relative duration, calculated as the ratio of the ambiguous region duration to that of the intro clause (Greg said that). Speech rate increased significantly more for the less predictable nested condition than for the sisterhood condition. These durational differences cannot be accounted for from a predictability perspective, as they contradict the predictions it would make. Instead, they reflect the distinct structural relation between the DP and the VP in these sentences.
In a follow-up forced-choice comprehension study (Grillo et al., 2022), recordings from one of the participants in the Planned Production study of Grillo et al. (2018) were used to test the extent to which listeners use these prosodic cues to overcome the garden path effect. The disambiguating region was removed from the recordings of both the main verb and the reduced relative conditions. To test the hypothesis that the presence of a baseline tempo is essential to the interpretation of the temporal properties of the ambiguous region (i.e., to the interpretation and disambiguation of the boundary between the businessmen and loaned money at low interest in (8)), we also manipulated whether an introductory clause was present (9a–b) or not (9c–d):
(9)
a. Reduced-RC prosody – baseline:
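The two measures mentioned here, relative duration and change in speech rate, can be sketched as follows. All durations and syllable counts below are invented for illustration and are not data from the studies; expressing rate in syllables per second is likewise an assumption, as are the function names.

```python
def relative_duration(ambiguous_ms, intro_ms):
    """Ratio of the ambiguous region's duration to the intro clause's duration."""
    return ambiguous_ms / intro_ms


def speech_rate(syllables, duration_ms):
    """Speech rate in syllables per second (assumed unit)."""
    return syllables / (duration_ms / 1000.0)


def rate_change(intro, ambiguous):
    """Change in rate from intro clause to ambiguous region.

    Each argument is a (syllables, duration_ms) pair; a positive value
    means the speaker accelerated relative to the baseline.
    """
    return speech_rate(*ambiguous) - speech_rate(*intro)


# Hypothetical measurements (NOT experimental data).
intro = (3, 600.0)               # "Greg said that"
main_verb_region = (12, 2400.0)  # ambiguous region, main verb prosody
reduced_rc_region = (12, 2200.0) # ambiguous region, reduced-RC prosody

for label, region in [("main verb", main_verb_region),
                      ("reduced RC", reduced_rc_region)]:
    print(label,
          "relative duration:", round(relative_duration(region[1], intro[1]), 3),
          "rate change:", round(rate_change(intro, region), 3))
```

With these invented values the reduced-RC region has the smaller relative duration and the larger rate increase over the baseline, mirroring the direction of the reported effect. Because both measures are defined against the intro clause, neither can be computed when that clause is absent.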
Greg said that the businessmen loaned money at low interest …
b. Main verb prosody – baseline:
Greg said that the businessmen loaned money at low interest …
c. Reduced-RC prosody – no baseline:
The businessmen loaned money at low interest …
d. Main verb prosody – no baseline:
The businessmen loaned money at low interest …
The same durational (and more generally prosodic) differences (in absolute terms) are present in both pairs of sentences; that is, identical recordings are used for conditions (9a) and (9c), on the one hand, and for conditions (9b) and (9d), on the other. The only difference within each pair is the presence of an introductory clause, providing a slower baseline tempo, which was removed from the recordings in conditions (9c) and (9d).
Participants listened to the cropped recording and then selected between two alternative continuations (main verb versus reduced-RC) that were written on the screen. In line with previous results from the garden path literature, a strong preference for the main verb continuation was found across all conditions. Main verb-compatible continuations constituted approximately 80% of choices in both full (baseline present) and cut (baseline absent) main verb conditions. The reduced relative continuation, nevertheless, was chosen twice as often when the RC prosody was preceded by an introductory clause to provide a baseline tempo. This is predicted by our approach since without a baseline tempo it should be impossible to decide whether the durational properties of the ambiguous region should be taken to indicate lengthening/lack of lengthening. A much more likely outcome is that they will be taken to reflect grammatically uninformative absolute speech rate. We thus correctly predict a smaller garden path effect when reduced-RCs are embedded within a main clause providing a baseline rate.Footnote 10
19.2.2 CC/RC Ambiguity
More recently, Grillo et al. (2023) showed that speakers similarly use temporal cues to prosodically disambiguate the complement clause (CC) versus RC garden path (10). The complementizer phrase (CP) that he was singing with is temporarily ambiguous between being a modifier (i.e., an RC) of the DP the editor (10a) and being a co-argument (i.e., a CC) of that DP (10b).
(10) a. RC
The kind lyricist [V′ told [DP the [CP [NP editor [C′ that he was singing with]]]] [CP to listen]].
paraphrasable as: what the kind lyricist told the editor that he was singing with was to listen
b. CC
The kind lyricist [VP [V′ told [DP the [NP editor]] [CP that he was singing with Lola]]].
paraphrasable as: what the kind lyricist told the editor was that he was singing with Lola
The results of the Planned Production study showed that the ambiguous region (the editor that he was singing (with)) is associated with faster tempo when nested (the RC in (10a)) than when in a sisterhood relation (the CC in (10b)). In line with what we observed with reduced-RCs, this pattern was not only visible in raw durations (in ms) but also, and more importantly, in the change in speech rate between the intro phrase (the kind lyricist told) and the ambiguous region (the editor that he was singing with). Predictably, this pattern reversed at the final word of the ambiguous region (the preposition with), which precedes a major prosodic boundary in the RC condition.
Note that we do not mean to suggest that tempo modulation is solely responsible for the disambiguation of these structures. As can be seen in Figure 19.2 (C, D), clear differences in pitch and intensity also differentiate the two readings (which is also supported by a ToBI analysis presented in Grillo et al., 2023). What is important for the present discussion is that the prosodic structure of RCs leads to (among other things) shorter duration, despite their lower predictability.
Figure 19.2 Spectrograms of nested garden path sentences.
Example of waveform, spectrogram, and fundamental frequency (F0) track of main verb (A), reduced-RC (B), CC (C), and RC (D) structures, recorded by native British English informants in Grillo et al. (2018, 2023), used for the forced-choice experiments in Grillo et al. (2022, 2023), and segmented word by word. Absolute durations (in ms) of the nested material and the DP it modifies are indicated.
In a follow-up comprehension study (Grillo et al., 2023), we showed that listeners are sensitive to these prosodic differences in guiding their interpretations. In a forced-choice task, participants heard sentence fragments that again excluded the disambiguating region (e.g., The kind lyricist told the editor that he was singing with) and then selected between printed CC (Lola) and RC (to listen) continuations. The target sentences contained a prosodic disambiguation that was consistent with the Planned Production study. The results showed a strong effect of prosody, with the selection of an RC continuation more than doubling following the RC prosody (57.5%) compared to following the CC prosody (25.5%). A post hoc analysis that removed eight outlier participants who only selected a CC continuation showed that the two prosodic forms were equally informative to the choice of continuation; that is, participants chose an RC completion after hearing the RC prosody (70% of target RC responses) in equal proportion to a CC continuation after hearing a CC prosody (70% of target CC responses).
Summarizing thus far, we have presented converging evidence for a durational contrast between sisterhood and nesting. In each instance, temporal modulation (faster tempo in comparison to a baseline earlier in the sentence) has been observed for the more complex and less predictable nested reading. While the realization of these structures also differs along other prosodic dimensions (i.e., tonal), this durational pattern is clear and is observed both in novel studies from our research group (on the main verb/reduced relative ambiguity, the CC/RC ambiguity, the pseudo-relative/RC ambiguity) and in previous results from the literature (restrictive/goal PPs). While more work is ongoing to determine the relative contribution of tempo as opposed to other prosodic dimensions, it is nevertheless striking that listeners are able to converge on the less predictable reading in the presence of increased tempo, which leaves less time to process the ambiguity. This brings about a significant conclusion: Prosodic structure and predictability are not always aligned. Prosodic structure seems to determine the durational properties of speech above and beyond predictability. This does not mean that predictability does not itself independently modulate duration but that when the two are in conflict, prosodic structure takes precedence. In future work, we plan to investigate the modulatory role of predictability on duration by adding different measures of predictability to our model (e.g., relative frequency of past tense versus past participle of different predicates).
19.3 The Interaction of Mapping and Eurhythmic Constraints in Garden Path Prosody
Following syntax–prosody mapping principles, we have proposed that the relative degree of structural (and interpretive) integration of nesting and sisterhood leads to specific predictions about the durational properties of these structures. Sister constituents are by definition independent XPs and are thus more likely to be produced as separate prosodic phrases than nested constituents. Nested constituents, on the other hand, are contained within the XP they modify and are thus more likely to be mapped onto a single prosodic phrase (following the principles of edge alignment and wrap). This would correctly predict the durational differences in the pre- and post-boundary regions, which are dictated by the relative strength of the boundary between sister and nested constituents.
As seen above, however, constraints favoring the alignment of syntactic and prosodic phrasing are not absolute but operate in tandem with other eurhythmic or balancing constraints, which ensure that strings are ideally parsed into units that are of the same length and do not exceed a certain size (Frazier and Fodor, 1978; Gee and Grosjean, 1983; Ghini, 1993; Selkirk, 2000; Fodor, 2002). Balance constraints lead to a rhythmic pattern at the level of prosodic phrasing and might be rooted in more basic neural mechanisms (see Chapter 18). Nesting, and in particular nested garden paths, constitutes a domain of natural tension between mapping and balancing constraints, providing a valuable testing ground for evaluating their interaction. On the one hand, mapping constraints increase the likelihood that nested material is spelled out as a single constituent with the phrase it modifies. On the other hand, nesting increases the size of the host constituent, increasing the chances that the two will be split into separate phrases by balancing constraints. As we have seen above in Examples (1) and (2), this tension can result in separate phrasing in simpler cases, that is, when no garden paths are involved. Whether this happens through extraposition (which would also allow mapping constraints to be satisfied) or otherwise is beyond the scope of this chapter. The crucial question we wish to address is why this does not seem to happen in the domain of nested garden paths. Our answer is that separate phrasing would only worsen the garden path effect. We illustrate this claim through the familiar contrast between main verb structures and CCs (sisterhood), on the one hand, and (reduced-)RCs (nesting), on the other.
Prosodification of main verb structures and CCs in (11a, b) is straightforward: Principles of syntax–prosody mapping will lead to a preference to generate independent, and fairly balanced, phrases for the DP [the horse] and the VP [raced past the barn] and for the DP [the woman] and the CP [that he was running with Max].
(11) a. The horse || raced past the barn || and fell.
b. John told the woman || that he was running with Max.
In the case of nesting, however, it is easy to see that a conflict will arise between these two types of principles. On the one hand, we can expect syntax–prosody mapping principles to push for the nested phrase to be produced as a single prosodic phrase with the head it modifies, as in (12a, b). This phrasing, however, will lead to a nonuniform pattern for the two sister constituents ([the horse raced past the barn] and [fell]; [the woman that he was running with] and [to wait]). Eurhythmic principles will resist this unbalanced (and heavy) prosodification, pushing for an alternative phrasing such as the one in (12c, d).
(12) a. The horse raced past the barn || fell.
b. John told || the woman that he was running with || to wait.
c. The horse || raced past the barn || fell.
d. John told the woman || that he was running with || to wait.
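The sense in which (12c) is better balanced than (12a) can be illustrated with a rough sketch. This is only a toy under stated assumptions: phrase length is approximated by the number of orthographic words, and the imbalance score (longest phrase minus shortest phrase) is a hypothetical measure introduced here for illustration, not one proposed in the works cited.

```python
def phrase_lengths(phrasing):
    """Number of orthographic words in each phrase of a '||'-delimited string."""
    return [len(phrase.split()) for phrase in phrasing.split("||")]


def imbalance(phrasing):
    """Toy imbalance score: longest phrase minus shortest phrase (in words)."""
    lengths = phrase_lengths(phrasing)
    return max(lengths) - min(lengths)


mapped = "The horse raced past the barn || fell."      # phrasing (12a)
balanced = "The horse || raced past the barn || fell."  # phrasing (12c)

for label, phrasing in [("12a", mapped), ("12c", balanced)]:
    lengths = phrase_lengths(phrasing)
    print(label, lengths, "heaviest:", max(lengths), "imbalance:", imbalance(phrasing))
```

On this crude measure, (12c) has a lighter heaviest phrase and a smaller spread than (12a), which is what eurhythmic principles would favor; the measure says nothing about syntax–prosody mapping, which favors (12a).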
The problem with the prosodifications in (12c, d) is that the boundary between the horse and raced (and between the woman and that) actually encourages the incorrect parse, potentially generating an even stronger garden path effect. We follow the rational speaker hypothesis (Clifton Jr et al., 2002), which claims that speakers use prosody in an “internally consistent, rational, fashion, and that the listener assumes such rationality in interpretation” (Frazier et al., 2006, p. 246). Paraphrasing Frazier et al. from this perspective, if a speaker intends a structure where a constituent contains the reduced-RC, she will not insert a prosodic boundary that separates the reduced relative from the rest of its constituent without good reason. We thus expect that (reduced-)RCs in ambiguous environments will be produced as a single prosodic phrase with the DP they modify, that is, as in (12a, b). The same reasoning applies to the other instances of nested garden path ambiguities, and we will not repeat it for space reasons. We thus argue that in the case of nested garden paths, the conflict between syntactic and length constraints will not be resolved in favor of length balance.
Nevertheless, another option might be available for achieving some balance without strongly violating syntactic mapping constraints and going against the rational speaker hypothesis. This alternative, already envisaged in Ghini (1993) as an integral part of his Uniformity Principle, involves reducing the size of the offending phrase by compressing it, that is, increasing its tempo.Footnote 11, Footnote 12
“A string is ideally parsed into units of the same length phrases. The average weight of the phrase depends on tempo: at an average rate of speech (moderato), a phrase contains two phonological words; the number of Ws within a phrase increases or decreases by one by speeding up or slowing down the rate of speech.” (Ghini, 1993, p. 56)
This global effect (global because it applies to the whole ambiguous region and is not just localized at the boundaries) is exactly what we observed in Grillo et al. (2018, 2023). Once again, increased tempo can be grammatically relevant, but only when evaluated relative to a baseline tempo, which explains why reduced relative garden paths appear not to be prosodically disambiguated when presented in isolation. Without a baseline tempo, the increased tempo associated with nesting cannot be interpreted as acceleration, but only as fast speech rate, which (while important in a paralinguistic dimension) is not grammatically relevant.
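The relation between tempo and phrase size in the passage quoted from Ghini can be illustrated with a deliberately simplified sketch. The tempo labels "slow" and "fast", the function name, and the placeholder words are assumptions made here for illustration; the sketch ignores all syntactic constraints and simply groups phonological words into phrases of the size the quotation associates with each tempo.

```python
# Phonological words per phrase, following the quoted statement:
# two at an average rate (moderato), one more or one fewer when
# speeding up or slowing down. The labels are illustrative only.
PHRASE_SIZE = {"slow": 1, "moderato": 2, "fast": 3}


def chunk(words, tempo="moderato"):
    """Group phonological words into consecutive phrases of the tempo's size."""
    size = PHRASE_SIZE[tempo]
    return [words[i:i + size] for i in range(0, len(words), size)]


# Six placeholder phonological words (hypothetical, not a real utterance).
words = ["w1", "w2", "w3", "w4", "w5", "w6"]

for tempo in ("slow", "moderato", "fast"):
    print(tempo, chunk(words, tempo))
```

On this reading, the same six words form three phrases at a moderate rate but only two at a faster rate: speeding up allows more material to fit into one phrase, which is the sense in which compression lets a heavy nested constituent remain a single prosodic unit.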
That some form of compression is essential for successful prosodification (and parsing) of nested structures is also supported by recent results on the processing of multiple center embedding (multiple nesting, to use the present terminology) discussed in Fodor et al. (2018). The groundbreaking work of Fodor et al. shows that the intelligibility in reading (i.e., in the absence of explicit prosody) of famously unparsable sentences with multiple center embeddings (the mouse that the cat that the dog chased bit died) increases considerably (in predictable ways) when the relative weight of each phrase is carefully manipulated to encourage phrasings that optimize the conflicting syntactic and length constraints:
(13) The French woman || the man I met loves || died.
Fodor et al. (2018) show that by lengthening the first DP and shortening the following material, they can encourage the multiple center embedding to be parsed in three optimal phrases, as in (13). The same manipulation of size leads not only to better prosodification in production but also to improved comprehension in silent reading. If our analysis is on the right track, we predict that similar results could also be achieved in less well-balanced sentences (in terms of length), by modulating the relative weight of each phrase through speech rate (i.e., slowing down tempo in the first phrase and accelerating in the second one).
In future work we also aim to investigate the feasibility of extending this account to the better-studied case of polysyllabic shortening, that is, the inverse relationship between the size of a constituent and the duration of its primary stressed syllable (see Lehiste, 1972; Nooteboom, 1997; Davis et al., 2002; Salverda et al., 2003; Shatzman and McQueen, 2006; White and Turk, 2010; and others).
More work is needed to evaluate this account against potential alternative analyses. One possible alternative explanation for these effects, discussed in Santi et al. (2019), is that nested material is produced faster to optimize processing of main clauses by reducing the temporal distance between, for example, the head of a subject (the horse) and the predicate of the main clause (fell). Grillo and Turco (2016), however, show that the same durational contrast between sisterhood and embedding is also observed when the nested material is right-branched and thus does not interfere with the processing of the main clause. Another alternative is that nested material might be produced faster because it involves old, backgrounded, or not-at-issue information. Although this explanation may seem attractive, it is not without its challenges. The first one is that, contrary to this common assumption, modifiers can in fact contain new information (e.g., in my class there’s [a student [who met the president]], where the whole complex nominal, including the modifier [who met the president], arguably carries new information). Another, potentially stronger, argument against this kind of analysis comes from a comparison of restrictive and appositive RCs. Appositive relatives (John, who is a great guy, arrived yesterday) are the textbook case of backgrounded/not-at-issue phrases. Contrary to restrictive relatives, they attach higher in the structure and, crucially, are associated with stronger prosodic boundaries and comma intonation, and are produced more slowly than nested material (Poschmann and Wagner, 2015). Much more work is nevertheless needed to investigate how information structure and constituent structure interact in shaping prosody in nested garden paths; see Guo et al. (2023, 2024a, 2024b, 2025) for some preliminary results in this domain.
19.4 Conclusions
A number of recent comprehension and production studies show that nested versus sisterhood structures are prosodically disambiguated and that this disambiguation generates predictable durational differences. A relatively faster tempo/shorter duration is found for the less predictable (and harder to parse) nested structures, such as (reduced-)RCs, than for the more predictable sisterhood structures, such as a main clause, CC, or pseudo-relative (in Italian). These results strongly suggest that the effects of prosodic structure and predictability on duration are not always aligned. When not aligned, structural factors seem to determine durational properties above and beyond predictability. We have provided a principled account of these effects and argued that, while surprising at first sight, these results are expected to arise from the application of independently motivated principles of prosody.
We do not mean to suggest that the disambiguation of nested structure is achieved solely on the basis of durational information. Prosody varies along multiple dimensions (duration/tonal changes/intensity), and a combination of any or all of these can be and (as our preliminary results show) is used for encoding prosodic structure. More work is thus needed to fully establish the relative contribution of these different factors to the disambiguation of nested garden path sentences and to firmly establish to what extent durational differences are decisive. It is also still very much an open question whether the effects observed here are best explained from a localized perspective (in which the durational differences should be taken to reflect boundary phenomena) or whether global accounts (in which temporal differences are expected beyond boundary regions) should also be invoked. Which aspect of duration is more relevant (lengthening across boundaries or changes in speech rate) needs to be further explored.
We conclude by stressing once more that the global effects on speech rate are compatible with the independently motivated localized pattern of pre- and post-boundary lengthening described above, and in fact the two factors appear to show independent effects at different regions. While more work is needed to properly assess the hypothesis presented here and disentangle the relative contribution of these two factors, we have sketched a principled argument for why garden paths with a temporary clausal attachment ambiguity lead to a shorter duration for the less predictable nested structure.
Summary
Tracking temporal properties of speech is essential to parsing both segmental and suprasegmental information, and changes in tempo can inform syntactic processing. Results from production and comprehension show that nested garden path sentences (e.g., the horse raced past the barn (and) fell) are prosodically disambiguated. This is achieved (also) through tempo modulation. Tempo modulation is a hallmark of structural nesting, generated by the interaction of eurhythmic and syntax–prosody mapping principles.
Implications
Nested garden path sentences are exceptional in displaying relatively faster tempo in the face of lower predictability. Future work should focus on the interplay of tempo and other prosodic variables in nesting and on how these are independently modulated by information structure, predictability, and constituent structure.
Gains
These findings have implications for cognitive science and psycholinguistics, countering the standard simple view that higher predictability maps onto shorter duration. Effects of predictability can be overshadowed by prosodic factors, as an increased tempo encodes the less predictable nested structure. The work is relevant to a growing body of literature on the role of rhythm in the neurobiology of language, showing that changes in rhythm are a key signal from the producer to the comprehender that different structures underlie the same linear word order.




