31.1 Introduction
Models of speech timing must reflect the mechanisms by which speakers communicate linguistic structure to listeners through systematic durational variations (e.g., Klatt, 1976; van Santen and Shih, 2000; White, 2002, 2014). Such models refer to theories of prosodic structure and to some notion of hierarchically organised prosodic constituents, such as syllables, words, phrases, and so on (e.g., Nespor and Vogel, 1986; Selkirk, 1986). Furthermore, some accounts of observed durational patterns specifically propose direct temporal influences between higher and lower prosodic constituents, for example, between syllables and some form of stress-delimited feet (e.g., O’Dell and Nieminen, 1999; Port, 2003), whereby the number of syllables within the higher-level constituent directly influences the duration of the lower-level constituent (e.g., Lehiste, 1972). Critical debates remain, however, over the degree to which prosodic constituents are strictly hierarchical and over the nature of the timing constraints that such hierarchical relations impose on speech production (e.g., Shattuck-Hufnagel and Turk, 1996; Fletcher, 2010; White and Malisz, 2020).
31.2 Coupled Oscillator Models and Temporal Compression Effects
Arguing against isochronous timing principles in (then extant) notions of ‘rhythm class’, Dauer (1983) reported positive relationships between inter-stress interval duration and the number of inter-stress syllables for a range of languages, whether, at the time, categorised as ‘stress-timed’ or ‘syllable-timed’ (see Footnote 1).
Reanalysing Dauer’s (1983) data using linear regression, Eriksson (1991) explicitly modelled inter-stress duration as a function of the number of syllables in an inter-stress interval: I = a + nb, where I is the inter-stress interval duration, a is the intercept, b is the slope of the regression line, and n is the number of syllables in the inter-stress interval. Eriksson reported that the slope, representing the additional duration due to each new syllable in an inter-stress interval, was similar across languages (approximately 100 ms). He also commented on systematic linguistic differences in the intercept of the regression line: this value clustered around 200 ms in English and Thai (then so-called stress-timed languages) and around 100 ms in (‘syllable-timed’) Spanish, Greek, and Italian.
Eriksson (1991) asserted that the ‘natural interpretation’ of the intercept value is that it refers to the extra duration of stressed syllables (relative to unstressed syllables) in the inter-stress interval (what we refer to here as the magnitude of durational stress contrast). However, he also observed that the intercept value does not in itself capture the locus of this additional duration. This raises the possibility that linguistic variation in intercept values could (alternatively) indicate variable compression of syllables somewhere in the inter-stress interval; that is, the residual intercept durational value could be underpinned by inverse relationships between the number of syllables in the interval and their average duration. In such an account, inter-stress interval duration is a function both of the duration added by each new syllable (‘syllable effect’) and of syllabic compression due to the composition of the inter-stress interval (‘inter-stress effect’; but see, for example, van Santen, 1997, and White, 2014, for arguments against the syllabic compression interpretation).
Following Eriksson’s second, syllabic compression interpretation of his regression models of cross-linguistic inter-stress-interval duration, O’Dell and Nieminen (1999) attempted to capture the hypothesised timing influences on these intervals by positing two interacting oscillators, representing two levels of the prosodic hierarchy: the syllabic oscillator and the inter-stress (or stress-foot) oscillator. These oscillators are proposed to have their own natural frequencies, with the syllabic oscillator higher in frequency than the inter-stress oscillator. Importantly for the generation of observed durational patterns, the oscillators are proposed to interact with each other via a coupling function. As a result, the coupled oscillators settle into stable frequency patterns in which the frequency of the faster oscillator is an integer multiple of the frequency of the slower oscillator (see Windmann, 2016). Figure 31.1 shows a schematic representation of a 1:2 ratio of the syllable oscillator to the inter-stress oscillator, representing a stable coupled state. It should also be noted that these oscillators are not associated with neural or physiological processes in O’Dell and Nieminen’s purely mathematical models, but there are obvious parallels with accounts of the synchronisation of the temporal structure of speech to endogenous neural oscillations (see, for example, Chapters 3 and 5).
Figure 31.1 Schematic representation of coupled oscillators.
Stable state between syllabic oscillator (dashed line) and inter-stress oscillator (solid line), where the frequency of the syllabic oscillator is an integer multiple of the frequency of the inter-stress oscillator (here, for illustrative purposes only, a 1:2 ratio).
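The frequency locking described above can be made concrete with a small numerical sketch. The Python code below Euler-integrates a generic pair of phase oscillators with n:1 coupling. This is an illustrative assumption, not O’Dell and Nieminen’s published formulation. The sine coupling of the relative phase ψ = θ_S − nθ_F, the coupling strengths k_F (how strongly the inter-stress oscillator is pushed) and k_S (how strongly the syllabic oscillator is pushed), and the natural periods are all hypothetical. Provided the coupling is strong enough relative to the detuning |ω_S − nω_F|, the oscillators settle into a state in which the syllabic frequency is exactly n times the inter-stress frequency.

```python
import math


def simulate_coupled_oscillators(omega_f, omega_s, k_f, k_s, n,
                                 dt=1e-4, t_total=20.0):
    """Euler-integrate two phase oscillators with a hypothetical n:1 sine coupling.

    theta_f: phase of the inter-stress (foot) oscillator, natural frequency omega_f
    theta_s: phase of the syllabic oscillator, natural frequency omega_s
    The coupling acts on the relative phase psi = theta_s - n * theta_f;
    k_f and k_s set how strongly each oscillator is pushed by it.
    Returns mean angular frequencies (rad/s) over the second half of the run,
    by which time the transient has died away for the parameters used below.
    """
    theta_f = theta_s = 0.0
    steps = int(round(t_total / dt))
    half = steps // 2
    start_f = start_s = 0.0
    for i in range(steps):
        if i == half:
            start_f, start_s = theta_f, theta_s  # start of measurement window
        psi = theta_s - n * theta_f
        d_theta_f = omega_f + k_f * math.sin(psi)  # foot speeds up when psi > 0
        d_theta_s = omega_s - k_s * math.sin(psi)  # syllable slows down when psi > 0
        theta_f += dt * d_theta_f
        theta_s += dt * d_theta_s
    window = (steps - half) * dt
    return (theta_f - start_f) / window, (theta_s - start_s) / window


# Hypothetical natural periods: 0.5 s per foot, 0.2 s per syllable; two syllables per foot.
omega_f = 2 * math.pi / 0.5
omega_s = 2 * math.pi / 0.2
freq_f, freq_s = simulate_coupled_oscillators(omega_f, omega_s, k_f=2.0, k_s=8.0, n=2)
print(f"frequency ratio (syllable/foot): {freq_s / freq_f:.3f}")
print(f"locked inter-stress interval: {2 * math.pi / freq_f:.3f} s")
```

With these values the frequency ratio settles at 2. The locked inter-stress interval falls between the 0.5 s natural foot period and the 0.4 s implied by two syllables at their natural period. For this particular toy coupling, and whenever locking occurs, the locked interval can be derived in closed form as I(n) = 2π(k_S + n·k_F)/(ω_F·k_S + ω_S·k_F). This is linear in n, of the same form as I = a + nb, with a/b = k_S/k_F. In this toy version, therefore, the ratio of intercept to slope is literally a ratio of coupling strengths.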

According to O’Dell and Nieminen (1999), languages differ in which oscillator dominates timing. In so-called stress-timed languages, the inter-stress oscillator would be dominant. Thus, as the number of syllables in a stress group increases, the inter-stress oscillator tends to preserve its natural frequency, imposing frequency (and thus durational) changes on the syllabic oscillator. The opposite would be true in so-called syllable-timed languages.
In O’Dell and Nieminen’s (1999) model, the relative oscillator strength parameter, r, can be estimated as the ratio of the intercept a (which reflects stress-level timing influence) to the slope b (which reflects the duration due to additional syllables in the inter-stress interval); thus: r = a/b. If r > 1, the stress oscillator dominates, whereas if r ≤ 1, the syllabic oscillator dominates. O’Dell and Nieminen (1999) applied the oscillator strength parameter to Dauer’s (1983) data, as reanalysed in Eriksson (1991), with the addition of data from Finnish. The resulting r values classified languages in accordance with Eriksson (1991).
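The procedure described above has two steps: fit Eriksson’s linear model, then take the ratio of its coefficients. It can be sketched in a few lines of Python. The durations below are synthetic values, constructed to lie exactly on lines with intercepts of 200 ms and 100 ms and a slope of 100 ms, echoing the approximate figures reported above. They are not Dauer’s measurements, and the function names are hypothetical.

```python
def fit_interstress_model(n_syllables, durations_ms):
    """Ordinary least-squares fit of I = a + n*b; returns (a, b) in ms."""
    m = len(n_syllables)
    if m < 2 or m != len(durations_ms):
        raise ValueError("need at least two paired observations")
    mean_n = sum(n_syllables) / m
    mean_i = sum(durations_ms) / m
    s_xy = sum((n - mean_n) * (d - mean_i) for n, d in zip(n_syllables, durations_ms))
    s_xx = sum((n - mean_n) ** 2 for n in n_syllables)
    if s_xx == 0:
        raise ValueError("syllable counts must vary")
    b = s_xy / s_xx          # slope: added duration per extra syllable
    a = mean_i - b * mean_n  # intercept
    return a, b


def dominant_oscillator(a, b):
    """Relative strength r = a/b: r > 1 -> inter-stress oscillator dominates."""
    r = a / b
    return r, ("inter-stress" if r > 1 else "syllabic")


n = [1, 2, 3, 4]
stress_timed_like = [300, 400, 500, 600]    # synthetic: 200 + 100n
syllable_timed_like = [200, 300, 400, 500]  # synthetic: 100 + 100n
for label, durations in [("stress-timed-like", stress_timed_like),
                         ("syllable-timed-like", syllable_timed_like)]:
    a, b = fit_interstress_model(n, durations)
    r, dominant = dominant_oscillator(a, b)
    print(f"{label}: a = {a:.0f} ms, b = {b:.0f} ms, r = {r:.2f}, {dominant} dominates")
```

The synthetic ‘syllable-timed’ case sits exactly on the boundary r = 1, which the criterion above assigns to syllabic dominance. With real data, the regression would of course not fit perfectly, and r inherits the uncertainty of both coefficients.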
O’Dell and Nieminen (2009) discuss ‘polysyllabic shortening’, the postulated inverse relationship between the number of syllables in some constituent and the duration of syllables therein (e.g., Lehiste, 1972; see, for example, Guba et al., 2023, for a recent study on Modern Standard Arabic). O’Dell and Nieminen take polysyllabic shortening (across the inter-stress interval) as a reflection of the interaction between syllabic and inter-stress oscillators. Evidence for polysyllabic shortening is reported, for example, in Kim and Cole (2005), where stressed syllable durations were shorter as the size of the stress group increased in English (see also, for example, Lehiste, 1972, regarding word-level polysyllabic shortening). Importantly, however, the coupled oscillators model does not hinge on the assumption of isochronous speech units; rather, compressibility effects only reflect hierarchical nesting, that is, the influence of higher prosodic units on the timing of lower prosodic units and vice versa (see Malisz et al., 2016; White and Malisz, 2020).
Despite some success in coupled oscillator modelling of such timing effects, it has been argued that observed temporal compression, as implied by polysyllabic shortening, may be reinterpreted in terms of localised lengthening effects (e.g., Beckman, 1992; White, 2002; White and Turk, 2010). For example, Port (1981) reported polysyllabic shortening of stressed syllables such as dib in nonsense sequences like dib … dibber … dibberly. It is important to note, however, that all tokens in this study were realised as the only new material in a fixed carrier phrase: ‘I said [target word] again’. In this English-language context, the targets will clearly carry phrasal stress (pitch accent), which causes lengthening of constituents within the stressed word (e.g., Cambier-Langeveld and Turk, 1999; Turk and White, 1999). The degree of phrasal-stress lengthening of lexically stressed syllables has been shown to vary inversely with word length, with some of the additional length being evidenced on unstressed syllables in disyllabic and trisyllabic words (Turk and White, 1999; White and Turk, 2010). Thus, what may appear as polysyllabic shortening can be reinterpreted as the redistribution of phrasal-stress lengthening according to word length (White, 2002, 2014; see Beckman, 1992, for similar observations with regard to polysyllabic shortening and phrase-final lengthening).
Thus, whilst the coupled oscillators model captures hypothesised timing influences between prosodic units, the implied compressibility effects may not be supported by empirical observations. Rather, prosodic influence on speech timing primarily entails lengthening effects at domain heads (i.e., prominent constituents, such as stressed syllables and pitch-accented words) and edges (i.e., boundaries between prosodic constituents), with distribution and magnitude varying according to language-specific characteristics (for reviews, see Fletcher, 2010; White, 2014; White and Malisz, 2020).
There remain, however, aspects of the coupled oscillators model that appear potentially useful in accounting for timing patterns in circumscribed speech contexts, such as in synchronised speech or speech cycling, as discussed below. Next, we consider how phase relations between hierarchical temporal units may offer an account of certain forms of observed timing variation.
31.3 Temporal Coordination between Different Rhythmic Timescales
Our aim in this section is to show that temporal phase relations may modulate the interaction between hierarchical units in synchronised speech or other constrained speech tasks, in particular, speech cycling (Cummins and Port, 1998). Furthermore, cross-linguistic variation in the performance of such tasks may be informative about the localised timing effects that are evident in natural speech.
Regarding synchronised speech, Cummins (2003) showed that when two speakers read a text together, they synchronise their speech very effectively, often with minimal time lag (between 40 and 60 ms). Cummins further showed that synchronisation is not the result of one speaker following the speech rate of the other, as there was no consistent leader. Further work suggested that synchronisation is based on a range of suprasegmental sources of information, including fundamental frequency and amplitude envelope modulation, and is not wholly dependent on speech intelligibility (Cummins, 2009). These findings are interpreted as evidence for acoustically based ‘entrainment’ between speakers talking synchronously. Definitions of entrainment vary, however, and typically involve phase-resetting between coupled systems of oscillators (see Obleser and Kayser, 2019, for a discussion); such systems are not precisely defined in Cummins’ account (see Chapter 14 for a discussion of the nature of entrainment with regard to speech).
Speech cycling represents another case of temporal coordination between a speaker and an external stimulus (e.g., Port et al., 1995). In speech cycling tasks, speakers repeat phrases in coordination with metronome beeps, typically starting each new repetition of a phrase synchronously with a beep. The interval between repetition onsets is called the phrase repetition cycle (PRC). It has been shown that acoustically salient points, namely stressed vowel onsets, tend to lie at certain privileged phases within the PRC (Figure 31.2). These phases typically divide the PRC into simple integer ratios, such as 1:3, 1:2, and 2:3, reflecting metrical structure within the PRC. These simple ‘phase angles’ (‘harmonic phases’) are said to be attractors for prominence in the PRC that emerge from task constraints, specifically, repeating sentences at a constant period. Thus, the organisation of stress beats at privileged time intervals within the PRC reflects a hierarchical structure wherein the lower-level prosodic units, that is, stressed vowel onsets, are nested within a higher-level unit, that is, the PRC (Cummins and Port, 1998).
Figure 31.2 Schematic representation of a speech cycling task.
Interval a, defined as the interval from the first stressed syllable to the final stressed syllable, is divided by interval b – the PRC – to calculate the phase angle of the final stress. Here, the final stress is the second stress of the phrase; in some speech cycling tasks, there are three or more stressed syllables per phrase.
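The phase-angle calculation in Figure 31.2 can be sketched as follows. The code assumes that the first stressed syllable coincides with the start of the cycle, so that dividing interval a by interval b (the PRC) gives the phase of the final stress as a fraction of the cycle. It then finds the nearest of the simple harmonic phases 1/3, 1/2, and 2/3. The onset times and PRC are invented, and the function names are illustrative only.

```python
HARMONIC_PHASES = (1 / 3, 1 / 2, 2 / 3)  # simple-ratio phases discussed above


def phase_angle(first_stress_s, final_stress_s, prc_s):
    """Phase of the final stress: interval a (first to final stress) / interval b (PRC)."""
    if prc_s <= 0:
        raise ValueError("PRC must be positive")
    return (final_stress_s - first_stress_s) / prc_s


def nearest_harmonic(phase, harmonics=HARMONIC_PHASES):
    """Return the closest simple harmonic phase and the signed deviation from it."""
    target = min(harmonics, key=lambda h: abs(phase - h))
    return target, phase - target


# Hypothetical repetitions: (first stress, final stress) onsets in seconds, PRC = 1.2 s.
prc = 1.2
repetitions = [(0.02, 0.63), (0.01, 0.60), (0.03, 0.66)]
for first, final in repetitions:
    phi = phase_angle(first, final, prc)
    target, deviation = nearest_harmonic(phi)
    print(f"phase = {phi:.3f}, nearest harmonic = {target:.3f}, deviation = {deviation:+.3f}")
```

Small, consistent deviations across repetitions and metronome rates correspond to the ‘close and consistent alignment’ described below. Large or drifting deviations correspond to weaker coordination.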

Languages appear to vary in the propensity for speakers to align prominent syllables at metrically important points within the PRC. For example, Cummins (2002) asked English, Spanish, and Italian speakers to read sentences with two stressed syllables, each followed by an unstressed syllable, and to align the first stressed syllable to a high-tone beep and the second stressed syllable to a low-tone beep. English speakers found the task of metrical coordination easy to perform and showed close and consistent alignment with simple harmonic phases. Italian and Spanish speakers, by contrast, found it more difficult, even after more than 30 minutes of practice, and their phase alignment was not close to simple harmonic phase angles. Cummins attributed the easier performance of English speakers to the greater salience of stress feet in English than in Italian and Spanish.
Another (alternative or complementary) explanation for these reported cross-linguistic differences in the propensity for speech cycling (Cummins, 2002) may lie in variation in the magnitude of durational contrast between strong and weak syllables. English is known to have a high durational contrast between stressed and unstressed syllables, in part due to a substantial lengthening of (lexically and phrasally) stressed syllables and vowel reduction of unstressed syllables (e.g., Oller, 1973; Klatt, 1976). On the other hand, the degree of durational contrast between strong and weak syllables in Italian and Spanish appears lower than in English, with lower stress-related lengthening, especially for Spanish, and limited vowel reduction in unstressed syllables (Grabe and Low, 2002; White and Mattys, 2007). The placement of metronome beeps at simple phases – in particular, at the desired phrase onset – leads to the emergence of prominence attractors (Cummins and Port, 1998). Therefore, the close alignment of stressed syllables to regular metronome beats is more natural in a language such as English, where stress contrast is high, than in languages with lower stress contrast (e.g., Spanish and Italian). The lower stress contrast in the latter languages tends to make prominences (stressed syllables) less acoustically salient and thus implies less compelling coordination of prominences with attractors. In other words, strong stress contrast affords temporal coordination with regular metronome beats.
(Note: this interpretation by no means denies the obvious fact that stress per se is highly salient for speakers of languages such as Spanish, which actually has many more stress-based minimal word pairs than English, but simply suggests that there may be less natural affordance for acoustically based temporal coordination of prominences.)
Zawaydeh et al. (2002) compared the speech cycling performance of speakers of American English with that of speakers of Jordanian Arabic. The English speakers tended to align stressed syllables closer to a simple phase of 1:2 than did the Arabic speakers, who tended to have later alignment. The vowel reduction of unstressed syllables in Jordanian Arabic is of lower magnitude than in English; that is, stressed and unstressed syllables have more similar durations in Jordanian Arabic (Vogel et al., 2017). Thus, by the same argument as for Spanish and Italian (above), there is a lower affordance for alignment of stressed syllables to simple phases in Jordanian Arabic.
Similarly, Ghadanfari (2022) found dialectal differences in speech cycling between two varieties of Kuwaiti Arabic, Bedouin and Hadari. Specifically, there were smaller phase alignment differences between heavy and light stressed syllables for Hadari speakers than for Bedouin speakers. Vowel duration analysis showed that Hadari had greater unstressed syllable reduction than Bedouin. Ghadanfari’s interpretation was that unstressed syllable reduction in Hadari leads to stronger stress contrast, which affords more consistent alignment with respect to the PRC.
Ghadanfari (2022) further showed that speech rate mediates temporal coordination: at shorter metronome periods, where speech rate is likely to be faster, stressed syllables were closer to simple phase angles. This potentially reflects changes in relative durations at faster speaking rate – such as the compression of unstressed syllables – promoting the harmonic alignment of stressed syllables (and see below regarding between-language variation in speech rate influences on phase angles).
Another factor that may promote alignment to simple phases in speech cycling is phonetic compressibility: as discussed by Klatt (1973), segments may vary in duration according to context but tend to have a minimum duration below which they may not be compressed. The minimum duration of specific segments will depend not only on the manner and place of articulation but also on perceptual factors, that is, what is required to phonetically distinguish sounds within a specific language’s phonemic inventory (see White, 2014, for discussion). Compressibility may also relate to prominence, in particular, whether languages have a high degree of durational stress contrast, with reduction and shortening, or even deletion, of unstressed vowels or whole syllables (such as in Standard Southern British English: e.g., Beckman, 1996).
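The idea of a compressibility floor can be illustrated with the rule form used in Klatt’s later duration models. In that rule, only the portion of a segment’s inherent duration above its minimum is scaled: DUR = (INHDUR − MINDUR) × PRCNT / 100 + MINDUR. The Python sketch below applies that rule. The inherent and minimum durations are hypothetical values, chosen only to contrast a segment with a low floor against one with a high floor. They are not measurements for any language.

```python
def compressed_duration(inherent_ms, minimum_ms, percent):
    """Klatt-style rule: scale only the compressible part above the minimum duration."""
    if not 0 <= minimum_ms <= inherent_ms:
        raise ValueError("expected 0 <= minimum <= inherent duration")
    if percent < 0:
        raise ValueError("percent must be non-negative")
    return (inherent_ms - minimum_ms) * percent / 100 + minimum_ms


# Hypothetical unstressed vowel with an 80 ms inherent duration, shortened to 50%
# (e.g., at a fast rate), under a low versus a high incompressibility floor.
for label, floor in [("low floor", 40), ("high floor", 70)]:
    dur = compressed_duration(80, floor, 50)
    print(f"{label}: {dur:.0f} ms ({dur / 80:.0%} of inherent duration)")
```

The same nominal shortening removes 20 ms in the first case but only 5 ms in the second. This is one way of picturing why a language or dialect with low floors on unstressed syllables leaves more room to absorb timing perturbations.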
Tajima (1999) examined how phase alignment in English and Japanese was affected by manipulation of metronome rate, from slow to fast. English speakers demonstrated consistent alignment of stressed vowel onsets with simple phase angles across different metronome rates, whilst Japanese speakers showed alignment of prominent syllables to incrementally distinct phase angles as metronome rate increased. The more consistent phase alignment in English may plausibly be facilitated by a relative tolerance of unstressed syllable compression with increasing metronome rate. Note that, regarding the impact of speech rate on temporal coordination, Kuwaiti Arabic dialects behaved more like Japanese than like English, given that phase alignment changed with increased rate (Ghadanfari, 2022). In Kuwaiti Arabic, however, increased rate did not lead to qualitative changes in phase alignment, as it did for Japanese; rather, alignment moved closer to a harmonic phase angle with increasing rate. The compressibility of Kuwaiti unstressed syllables may thus be intermediate between that of Japanese and English. The range of rate variation was, however, wider in the Japanese task: more than 10 metronome rates (Tajima, 1999), versus three metronome rates in the Kuwaiti task (Ghadanfari, 2022).
In summary, speech cycling studies show differences between languages, and between dialects of a particular language, in temporal coordination. These differences are evidenced by a differential propensity to generate simple, consistent phase angles of stressed syllable alignment within the phrase repetition cycle. As discussed, a plausible interpretation of these cross-linguistic and cross-dialectal differences relates to variation in durational stress contrast. From a top-down perspective, more durationally marked (and thus acoustically more salient) stressed syllables (in languages such as English) are more strongly attracted to metrically stable positions in repeated phrases. From a bottom-up perspective, languages or dialects allowing substantial compression of unstressed syllables (e.g., English, in contrast with Spanish or Japanese) provide more scope for consistent phase alignment of the stressed syllables, regardless of perturbations due to the phonetic content of phrases or variation in metronome rates.
31.4 Temporal Coordination in Natural Dialogues
The nature of temporal coordination in speech cycling, that is, the division of the PRC into simple phases, is clearly specific to the constrained task demands. However, the interaction between interlocutors in natural dialogue has also been suggested to reflect patterns of temporal coordination, as the timing characteristics of a dialogue partner’s speech are proposed to influence the timing of turn-taking of the other interlocutors (e.g., Wilson and Wilson, 2005). In this final section, we consider potential commonalities in the timing factors influencing temporal coordination, particularly speech rate and local durational cues, in speech cycling and natural dialogues.
Research on temporal coordination in natural dialogues has focused on the fluent timing of turn transitions between speakers (e.g., Wilson and Zimmerman, 1986; Couper-Kuhlen, 1993; Beňuš, 2009). Gaps between turns are reportedly minimal (suggested to average 200 ms across languages – for example, Stivers et al., 2009; Heldner and Edlund, 2010), and overlaps are relatively rare. Together, these imply adaptation to the current speaker’s rate and anticipation of their utterance termination (Wilson and Wilson, 2005). Moreover, Wilson and Wilson suggested a predictive mechanism in conversational turn-taking by which listeners entrain to the syllable oscillation rate of the speaker; thus, to avoid overlap, listeners coordinate the onset of their turns in anti-phase relation to the speaker’s syllable cycle.
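Wilson and Wilson’s anti-phase proposal can be illustrated with a toy calculation. If a listener’s internal oscillator is entrained to the speaker’s syllable rate, the anti-phase points fall half a syllable period after each syllable beat. The sketch below lists such candidate turn-onset times and checks their phase relative to the speaker’s cycle. The syllable period, the onset time, and the function names are hypothetical, and the sketch ignores how the entrained rate would actually be estimated from running speech.

```python
def antiphase_onsets(last_syllable_onset_s, syllable_period_s, count=3):
    """Candidate turn onsets at anti-phase: half a period after each syllable beat."""
    if syllable_period_s <= 0:
        raise ValueError("syllable period must be positive")
    return [last_syllable_onset_s + (k + 0.5) * syllable_period_s for k in range(count)]


def relative_phase(t_s, syllable_onset_s, syllable_period_s):
    """Phase of time t within the speaker's syllable cycle, as a fraction in [0, 1)."""
    return ((t_s - syllable_onset_s) / syllable_period_s) % 1.0


# Hypothetical speaker producing 5 syllables per second; last syllable onset at 2.0 s.
period = 1 / 5
for t in antiphase_onsets(2.0, period):
    print(f"candidate onset {t:.2f} s, phase {relative_phase(t, 2.0, period):.2f}")
```

Every candidate sits at phase 0.5 of the speaker’s syllable cycle, the point furthest from the next expected syllable beat. This is the property the account relies on to avoid overlap.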
A related, but distinct, perspective on turn-transition timing was provided by studies of overlapped (i.e., interrupted) conversational transitions in corpora of spontaneous American English, German, and French speech (Włodarczak et al., 2012a, 2012b). The authors first observed that the initiation of an interrupting turn showed a bias to occur at the end of a vowel-to-vowel (VTV) interval in the preceding speech. They then used the normalised pairwise variability index (nPVI; Low et al., 2000) of VTV duration. For English dialogues, more regular VTV timing was associated with this predominant pattern of a late interruption point in the VTV interval (Włodarczak et al., 2012a). Reinforcing their conclusions with analyses of French and German dialogues, they interpret this pattern as evidence that coordination of turn-timing is underpinned by temporal entrainment between speakers (Włodarczak et al., 2012b). They additionally argue that such entrainment is, in particular, governed by the salient recurrence of the perceptual centre of syllables (p-centre) (Morton et al., 1976; Marcus, 1981).
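For reference, the normalised pairwise variability index (Low et al., 2000) is computed in three steps. Each absolute difference between successive intervals is divided by the mean of the pair. These normalised differences are averaged. The result is scaled by 100: nPVI = 100 × [Σ_{k=1}^{m−1} |d_k − d_{k+1}| / ((d_k + d_{k+1})/2)] / (m − 1), where d_k is the k-th of m interval durations. Lower values indicate more regular timing. Below is a minimal Python version, applied to hypothetical VTV durations; the durations are invented for illustration.

```python
def npvi(durations):
    """Normalised pairwise variability index of successive interval durations."""
    if len(durations) < 2:
        raise ValueError("need at least two intervals")
    pairs = zip(durations, durations[1:])
    terms = [abs(d1 - d2) / ((d1 + d2) / 2) for d1, d2 in pairs]
    return 100 * sum(terms) / len(terms)


# Hypothetical vowel-to-vowel (VTV) interval durations in ms.
regular_vtv = [180, 175, 185, 180, 178]
irregular_vtv = [250, 120, 260, 110, 240]
print(f"regular: nPVI = {npvi(regular_vtv):.1f}")
print(f"irregular: nPVI = {npvi(irregular_vtv):.1f}")
```

Because each difference is normalised by its local mean, the index is largely insensitive to overall speech rate. This makes it suitable for comparing regularity across speakers who talk at different tempi.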
With regard to durational cues to utterance boundaries, it has long been demonstrated that localised final lengthening may contribute to the salience of phrase or utterance endings (e.g., Price et al., 1991). It appears surprising, therefore, that Hoogland et al. (2023) found that inter-speaker intervals in question-answer sequences in Dutch and English were longer with longer final rhymes. However, they also reported an interaction with the articulation rate of the preceding utterance: specifically, at faster rates, inter-speaker intervals were shorter when the final rhyme was relatively long. Hoogland et al. interpreted this interaction in terms of potential listener entrainment to the foregoing speaking rate. As segments are shorter at faster speaking rates, the relative length of phrase-final segments is boosted (see Dilley and Pitt, 2010; Reinisch et al., 2011; Morrill et al., 2015), thus providing a more salient cue to question termination. Thus, local timing cues to turn-ending may be mediated by listener entrainment to the foregoing utterance, at least insofar as such entrainment is required to develop expectations regarding segment duration.
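The core of this account is that the same absolute final rhyme duration counts as relatively longer when the foregoing speech is faster. This can be shown with simple arithmetic. In the sketch below, the final rhyme is normalised by the mean syllable duration implied by the preceding articulation rate. This is a hypothetical choice made for illustration, not necessarily the measure used by Hoogland et al., and all values are invented.

```python
def relative_final_length(final_rhyme_ms, articulation_rate_syll_per_s):
    """Final rhyme duration relative to the mean syllable duration at the foregoing rate.

    Hypothetical normalisation for illustration only.
    """
    if articulation_rate_syll_per_s <= 0:
        raise ValueError("articulation rate must be positive")
    mean_syllable_ms = 1000 / articulation_rate_syll_per_s
    return final_rhyme_ms / mean_syllable_ms


# The same hypothetical 300 ms final rhyme after slower and faster foregoing speech.
for rate in (5.0, 7.5):
    print(f"{rate} syll/s: relative length = {relative_final_length(300, rate):.2f}")
```

Under this crude normalisation, a 300 ms final rhyme stands out more sharply against faster preceding speech. This is the direction of effect that the entrainment interpretation predicts.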
31.5 Summary, Future Research, and Conclusions
The review presented here considers various influences on speakers’ temporal coordination patterns in artificial tasks, such as speech cycling, and in natural dialogue turn-taking. Speech cycling indicates cross-linguistic variability in speakers’ propensity to coordinate the occurrence of stressed syllables within an external cycle, with some languages’ greater length of stressed syllables and greater compressibility of unstressed syllables both potentially contributing to the more consistent alignment of stresses with simple phase angles of the PRC (e.g., Tajima, 1999; Cummins, 2002; Ghadanfari, 2022).
Regarding natural dialogue turn-taking, existing accounts point to relative consistency of mean turn-transition time between languages (e.g., Stivers et al., 2009). There are relatively few analyses of the factors that influence the variation of turn-transition time around the reported mean. The studies that have examined such factors point to influences of local timing factors towards the ends of utterances, potentially interacting with foregoing articulation rate (e.g., Włodarczak et al., 2012a, 2012b; Hoogland et al., 2023). How cross-linguistic variation in local timing effects serves to mediate fluent turn-timing has been little explored to date. This is not least due to the difficulties of comparing spontaneous corpora of distinct languages, which are elicited using different methods and often for distinct research goals.
This review has suggested that findings from artificial tasks, such as speech cycling, may point to influences on coordination in natural conversation. One potential focus of research relates to cross-linguistic and cross-dialectal differences in structural factors influencing coordination. It has been established that speech rate affects the interpretation of local timing cues that signal structure (e.g., word and phrase boundaries). Reinisch et al. (2011) showed that the perception of stress is modulated by speaking rate, with a faster foregoing utterance rate increasing the likelihood of listeners perceiving stress contrast. As languages and dialects differ in the magnitude of durational stress contrast, it is worth investigating whether speech rate variation differentially influences stress perception between languages and between dialects.
Future cross-linguistic research could likewise benefit from experimentally controlled tasks that probe the influence of specific timing cues on temporal coordination. For example, paradigms requiring coordination of speech with movement (e.g., Allen, 1972a, 1972b; Rathcke et al., 2021) offer a means of limiting cross-linguistic task variation in the interests of discerning how native language experience affects listeners’ temporal coordination behaviour. Likewise, artificial language tasks can manipulate timing cues whilst keeping segmental stimuli consistent between languages (e.g., White et al., 2020).
It is obvious that speakers of all languages are skilled at coordinating the flow of conversation. Artificial speech-based coordination tasks may be an effective means of unpicking the diverse cues that listeners use to achieve such interactional fluency.
Summary
Languages vary in their distribution and realisation of prominent versus less prominent syllables, and in the magnitude of local timing processes such as phrase-final lengthening. These timing differences influence speakers’ performance on temporally constrained artificial tasks such as speech cycling, and may have implications for the coordination of natural conversation.
Implications
It is obvious that speakers of all languages are skilled at coordinating the flow of conversation, but the mechanisms by which this temporal coordination is achieved remain unclear. Artificial speech-based coordination tasks may represent an effective means of unpicking the diverse cues that listeners use to achieve such interactional fluency.
Gains
Making cross-linguistic comparisons of speech timing is challenging given the diversity of structural and realisational differences. The degree to which speakers consistently coordinate prominent syllables within an externally imposed cycle in laboratory tasks is informative about the magnitude of the contrast between stronger and weaker syllables and, potentially, about temporal coordination in conversation.

