Cross-Linguistic Consistency of Speech Rhythms and Pending Questions: Evidence from Bilingual and Second-Language Speakers

doi:10.1017/9781009295888.039

from Section 5 - Rhythm across Languages

Published online by Cambridge University Press: 23 April 2026

Sha Liu

Edited by Lars Meyer and Antje Strauss

Lars Meyer: Max Planck Institute for Human Cognitive and Brain Sciences
Antje Strauss: University of Konstanz

Summary

A long-standing debate among scholars continues concerning the validity of rhythmic classification of the world’s languages. In order to address the remaining questions, it is key to further explore the speech production by bilinguals with 2L1s and second-language speakers. According to the majority of previous studies, results from bilinguals are intermediate between those of the two kinds of monolinguals, and results from second-language speakers are influenced by the rhythms of their first languages, which appear to support the rhythmic classification. However, several questions remain. The first is how to classify languages that exhibit characteristics of multiple rhythmic types. The second is that previous studies generally demonstrate that languages are more or less stress-timed, syllable-timed, or mora-timed, rather than strictly belonging to a single rhythm category. The third is that the proposed rhythmic measures are not comprehensive, and new measures are needed to account for the morphological and syntactic components of languages.

Keywords

rhythmic classification; VarcoΔV; nPVI-V; bilinguals with 2L1s; second-language speakers

Information

Type: Chapter
Information: Rhythms of Speech and Language: Physiology, Cognition, Culture, pp. 597-609

DOI: https://doi.org/10.1017/9781009295888.039

Publisher: Cambridge University Press

Print publication year: 2026
Creative Commons: This content is Open Access and distributed under the terms of the Creative Commons Attribution licence CC-BY-NC 4.0 https://creativecommons.org/cclicenses/

33 Cross-Linguistic Consistency of Speech Rhythms and Pending Questions: Evidence from Bilingual and Second-Language Speakers

33.1 Introduction

Abercrombie (1965, 1967) proposes that languages can be categorized into three rhythmic groups: mora-timed, stress-timed, and syllable-timed languages (but see Roach, 1982; Cummins, 2012; Rathcke et al., 2021, among others, for critical views). Ramus et al. (1999) state that the rhythmic type of a language is associated with its speech segmentation unit. For instance, English, claimed as a representative stress-timed language, involves speech segmentation into feet, whereas Mandarin, a prototypical syllable-timed language, employs syllables for speech segmentation.

Two sets of rhythmic measures have received significant attention: one set proposed by Ramus et al. (1999) and the other by Grabe and Low (2002).¹ Ramus et al. (1999) segment speech into vowels and consonants, calculate the durations of vocalic and intervocalic intervals, and focus mainly on three measures: %V, ΔV, and ΔC. The measure %V is the proportion of a sentence’s total duration taken up by vocalic intervals; ΔV refers to the standard deviation of the duration of vocalic intervals within each sentence; and ΔC is the standard deviation of the duration of intervocalic intervals within each sentence. With reference to eight languages, Ramus et al. (1999) report that %V and ΔC are in line with the notion of rhythmic classes. For example, according to the authors, English has lower %V than French, because English has reduced vowels and French does not. In addition, English has higher ΔC, because English has more complex onset and coda structures than French. The differences between English and French in terms of %V and ΔC are in line with the supposition that English is a typical stress-timed language and French is a representative syllable-timed language.
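The three interval measures defined above can be illustrated with a minimal sketch. It assumes that a sentence has already been segmented into labelled intervals; the function name `ramus_measures`, the input format, and the use of the population standard deviation are assumptions made here for illustration, not details taken from Ramus et al. (1999).

```python
from statistics import pstdev


def ramus_measures(intervals):
    """Compute %V, delta-V and delta-C for one sentence.

    `intervals` is a list of (kind, duration_in_seconds) pairs, where kind is
    "V" for a vocalic interval and "C" for an intervocalic interval.
    This input format is hypothetical.
    """
    vocalic = [d for kind, d in intervals if kind == "V"]
    intervocalic = [d for kind, d in intervals if kind == "C"]
    total = sum(vocalic) + sum(intervocalic)
    return {
        # Share of the sentence's duration that is vocalic, in percent.
        "percent_v": 100.0 * sum(vocalic) / total,
        # Standard deviation of vocalic interval durations
        # (population SD is an assumption).
        "delta_v": pstdev(vocalic),
        # Standard deviation of intervocalic interval durations.
        "delta_c": pstdev(intervocalic),
    }


# Invented durations, for illustration only.
example = [("C", 0.08), ("V", 0.12), ("C", 0.15),
           ("V", 0.06), ("C", 0.10), ("V", 0.14)]
measures = ramus_measures(example)
```

Because ΔV and ΔC are raw standard deviations in seconds, they change when the same sentence is spoken faster or slower.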

One controversial issue not referred to in Ramus et al. (1999) is the speech rate factor. Barry et al. (2003) state that both ΔV and ΔC are inversely related to speech rate. Dellwo (2006) thus uses a normalized metric, VarcoΔC, which is the standard deviation of intervocalic interval duration divided by the mean duration of those intervals. Dellwo (2006) claims that VarcoΔC discriminates better than ΔC between English and French. However, White and Mattys (2007) argue that VarcoΔV appears to be more reliable and discriminative than raw measures, while VarcoΔC seems to remove variation that holds linguistic significance. The pairwise variability index (PVI), proposed by Grabe and Low (2002), is the other set of rhythmic measures that has garnered widespread discussion. Unlike the rhythmic measures in Ramus et al. (1999), the PVI measures capture the sequential variations in intervals. Grabe and Low (2002) measure vowel durations and the durations of intervals between vowels (excluding pauses) in speech, and then calculate the variability between consecutive measurements. They also claim that speech rate should be taken into consideration for the PVI calculation of vocalic intervals, since speech rate may affect their duration. This adjusted metric for vocalic intervals is termed normalized PVI. In contrast, Grabe and Low (2002) argue that normalization is not necessary for intervocalic intervals and use the raw PVI.
The results reported in their study are as expected for Dutch, English, and German, typical stress-timed languages, and as expected for French and Spanish, typical syllable-timed languages. Their results for Japanese, a mora-timed language, are similar to those for syllable-timed languages.
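The rate-normalized and pairwise measures described above can be sketched as follows. The function names are hypothetical. The factor of 100 in `varco` is an assumption, chosen to match the magnitude of the Varco values reported later in the chapter. The PVI formulas follow the usual statement of the index: the raw version averages the absolute differences between successive intervals, and the normalized version divides each difference by the mean of the pair.

```python
from statistics import mean, pstdev


def varco(durations):
    """Standard deviation divided by the mean, times 100 (scaling assumed)."""
    return 100.0 * pstdev(durations) / mean(durations)


def rpvi(durations):
    """Raw PVI: mean absolute difference between successive intervals."""
    if len(durations) < 2:
        raise ValueError("need at least two intervals")
    diffs = [abs(a - b) for a, b in zip(durations, durations[1:])]
    return mean(diffs)


def npvi(durations):
    """Normalized PVI: each difference is divided by the mean of its pair."""
    if len(durations) < 2:
        raise ValueError("need at least two intervals")
    ratios = [abs(a - b) / ((a + b) / 2)
              for a, b in zip(durations, durations[1:])]
    return 100.0 * mean(ratios)


# Invented vocalic durations (seconds) and the same sequence at half the rate.
normal = [0.10, 0.20, 0.10, 0.20]
slow = [2 * d for d in normal]
```

With these invented durations, `rpvi(slow)` is twice `rpvi(normal)`, whereas `npvi` and `varco` return the same value for both sequences. This is the sense in which the normalized measures take speech rate into account.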

The most straightforward way to examine the validity of rhythmic classification is to analyze the speech production by native speakers, especially that of monolingual speakers, by use of the rhythmic measures noted above. Another way to approach rhythmic classification is to examine speech by bilinguals with two first languages (hereafter 2L1s). It seems that results from bilinguals with 2L1s should be intermediate between the results from the respective monolingual speakers of their two languages. Examining second-language speakers, particularly the influence of their first language (L1) on their second language (L2), could also provide valuable insights into rhythmic classification. If rhythmic classification is tenable, the effect of the rhythm of L1 on the rhythm of L2 can be expected.

Bilinguals with 2L1s in this chapter are defined as those who have acquired two languages before the age of three and can produce fluent and effective speech in both languages. To put it differently, both languages are considered their native languages (see, for example, Haugen, 1953; Weinreich, 1953). The definition of a bilingual speaker here is not as strict as that of Bloomfield (1933), who requires a perfect user of two languages in listening, reading, speaking, and writing; however, it is much stricter than that of MacNamara (1967), who includes anyone who has minimal competence in listening, reading, speaking, or writing a language other than his/her native language. L2 speakers in this chapter are those who did not acquire the language under discussion in early childhood and have not lived in a country where it is spoken for a long period, but learned it later through formal instruction or self-study (see, for example, Jenkins, 2000; Mitchell and Myles, 2004; Kormos, 2006).

This chapter is structured as follows. Sections 33.2 and 33.3 review rhythmic results from bilinguals with 2L1s and L2 speakers, respectively. Section 33.4 discusses remaining questions in rhythmic classification. Section 33.5 concludes the chapter.

33.2 Rhythms and Bilinguals with 2L1s

If bilinguals with 2L1s (henceforth bilinguals) show results in terms of rhythmic measures somewhere between the results from the monolinguals of their two languages, it demonstrates that the speech production by bilinguals contains the influence of the rhythm of one language on the other. In other words, the rhythmic differences between the two languages can be witnessed and thus support the validity of rhythmic classification.

33.2.1 Rhythmic Measures for Vowels

Bunta and Ingram (2007) compare the speech production by Spanish-English bilingual adults with that of monolingual peers in both languages, of monolingual children in both languages, and of Spanish-English bilingual children. Spanish is considered a syllable-timed language and English a stress-timed language. Bunta and Ingram (2007) mainly employ the normalized vocalic and intervocalic PVI measures (hereafter nPVI-V and nPVI-C, respectively) and find that the nPVI-V measure is effective in distinguishing speech rhythms while the nPVI-C measure does not seem to be an accurate indicator of speech rhythm. According to Bunta and Ingram (2007), the nPVI-V score for speech production in English from the bilingual adults (74.00) is slightly lower than that from the monolingual English adults (79.68), while the nPVI-V value of 43.00 from the bilingual adults in Spanish is marginally higher than the 39.43 from the Spanish monolingual adults. Since a lower nPVI-V score indicates more syllable-timed speech, the speech production in English by the Spanish-English bilingual adults is slightly more syllable-timed than that of the English monolinguals. Similarly, the speech production in Spanish by these bilingual adults is slightly more stress-timed than that of the Spanish monolinguals. This demonstrates that the Spanish-English bilingual adults show an interaction between the two different rhythms of their two languages (see also Mok, 2011). In a similar vein, Liu and Takeda (2021) focus on the speech production in English by English-Japanese and English-Mandarin bilingual adults and compare their speech production with that of English monolinguals. The proportion of CV (consonant-vowel) syllables in Japanese, a representative mora-timed language, is even higher than that of Mandarin, a typical syllable-timed language.
Therefore, the English-Mandarin bilinguals are expected to be closer to the English monolinguals than the English-Japanese bilinguals. Liu and Takeda (2021) take %V, ΔV, VarcoΔV, PVI-V, and nPVI-V all into consideration and find that the English monolinguals, English-Mandarin bilinguals, and English-Japanese bilinguals have decreasing results as expected for two rhythmic measures: (i) 58.41, 48.20, and 46.06 in terms of VarcoΔV; and (ii) 64.80, 63.62, and 59.07 in terms of nPVI-V. Namely, the speech production in English by the two bilingual groups has shown influences of the rhythms of Mandarin and Japanese, respectively. This supports the claim that English, Mandarin, and Japanese each belong to a different rhythmic type. The results from bilingual adults in terms of rhythmic measures for vowels discussed in this subsection have provided support for rhythmic classification (for similar results from bilingual children, see Grabe et al., 1999a, 1999b; Bunta and Ingram, 2007; Kehoe et al., 2011; Mok, 2011). Section 33.2.2 will turn to rhythmic measures for consonants.
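The reasoning in this subsection rests on two simple comparisons of group means: whether a bilingual value lies between the two monolingual values, and whether a series of group values decreases in the predicted order. The sketch below applies both checks to the figures quoted above. The helper names are hypothetical, and the checks compare reported means only, so they say nothing about statistical significance.

```python
def is_intermediate(value, mono_a, mono_b):
    """True if value lies strictly between the two monolingual values."""
    low, high = sorted((mono_a, mono_b))
    return low < value < high


def strictly_decreasing(values):
    """True if every value is smaller than the one before it."""
    return all(a > b for a, b in zip(values, values[1:]))


# nPVI-V means quoted from Bunta and Ingram (2007).
english_mono, spanish_mono = 79.68, 39.43
bilingual_english, bilingual_spanish = 74.00, 43.00

# Values quoted from Liu and Takeda (2021), in the order English
# monolinguals, English-Mandarin bilinguals, English-Japanese bilinguals.
varco_v = [58.41, 48.20, 46.06]
npvi_v = [64.80, 63.62, 59.07]

bunta_ingram_ok = (
    is_intermediate(bilingual_english, english_mono, spanish_mono)
    and is_intermediate(bilingual_spanish, english_mono, spanish_mono)
)
liu_takeda_ok = strictly_decreasing(varco_v) and strictly_decreasing(npvi_v)
```

Both flags evaluate to true for the quoted figures, which is the pattern the text describes as supporting rhythmic classification.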

33.2.2 Rhythmic Measures for Consonants

As stated in Section 33.2.1, Bunta and Ingram (2007) have also employed the nPVI-C measure. However, they have not found it as effective as nPVI-V. The nPVI-C value for the speech production in English by English monolinguals is comparable to that by Spanish-English bilingual adults (74.35 versus 73.40). The same pattern can be seen between the nPVI-C values for the speech production in Spanish by Spanish monolinguals and Spanish-English bilingual adults (65.25 versus 67.80). In addition, no statistically significant differences have been found in terms of nPVI-C between Spanish and English spoken by Spanish-English bilingual adults. The only notable difference is between the English and Spanish monolinguals (74.35 versus 65.25). After a comparison with previous studies, Bunta and Ingram (2007) point out that both normalized PVI-C (nPVI-C) and raw PVI-C (rPVI-C) measures are subject to more individual variations than the measures themselves: Both nPVI-C and rPVI-C measures may erase significant differences between groups (see also Grabe and Low, 2002; Whitworth, 2002; Knight, 2011; but see Arvaniti, 2012, for a different interpretation). The results in terms of ΔC, VarcoΔC, rPVI-C, and nPVI-C in Liu and Takeda (2021) show an interesting pattern: The English-Japanese bilingual adults have the lowest values in terms of all these four measures among the English-Japanese bilingual adults, English monolinguals, and native Japanese speakers who speak English as an L2.
Liu and Takeda (2021) further examine the speech production in English by English-Mandarin bilingual adults and find that they have the lowest values in terms of ΔC and rPVI-C and the highest values in terms of VarcoΔC and nPVI-C among the English-Mandarin bilingual adults, English monolinguals, and native Mandarin speakers who speak English as an L2. Liu and Takeda (2021) thus conclude that the rhythmic measures for consonants cannot clearly discriminate different rhythmic types. One possible explanation is that the measures of vocalic intervals, especially the nPVI-V measure, reflect rhythm in essence, whereas the measures of intervocalic intervals reflect differences in phonology (Kehoe et al., 2011; Liu and Takeda, 2021). To exemplify, one main characteristic of stress-timed languages is the reduction of unstressed vowels. This seems to partly explain the lack of strong correlation between consonants and rhythmic classes. Another possible explanation is associated with the tendency of compressing the vowel duration to maintain a more regular duration of a syllable (Lehiste, 1970; Lindblom et al., 1981). Munhall et al. (1992) tested three native speakers of English and found that all subjects consistently shortened vowels when the durations of codas were increased, whereas the effects of vowel duration on the coda were less consistent. The lack of a consistent pattern of consonant durations partly explains the seemingly unreliable results from rhythmic measures of consonants.

33.3 Rhythms and L2 Speakers

Most studies on L2 speakers agree on L1-to-L2 transfer, resulting in intermediate rhythmic values in the L2, and rhythmic measures for vowels being more discriminative than rhythmic measures for consonants (Taylor, 1981; Bond and Fokes, 1985; Wenk, 1985; Mochizuki-Sudo and Kiritani, 1989; Ueyama, 1996, 1999; Gut, 2003; Jian, 2004; Carter, 2005; Setter, 2006; Coetzee and Wissing, 2007; Yune, 2018). In addition, research into L2 speakers has covered more diversified languages and taken account of both the production and perception of rhythm.

33.3.1 Effective Rhythmic Measures

Carter (2005) is perhaps the first to turn attention to the speech rhythm of L2 speakers, using the nPVI measure to examine natural speech production in English and Spanish spoken in North Carolina. According to Carter (2005), Hispanic English is not as stress-timed as English: The mean nPVI-V value of 42.64 is much lower than that of the native English-speaking North Carolinians. Robles-Puente (2014) asks participants to read the passage The North Wind and the Sun in its English and Spanish versions and finds that his results are in line with those in Carter (2005): The nPVI-V value of 39.7 for the L2 Hispanic English is lower than that of the English control group (53.3), indicating less stress-timed speech. White and Mattys (2007) focus on L2 Spanish and L2 English. They find that the VarcoΔV and nPVI-V of L2 Spanish spoken by native English speakers are 52 and 51, respectively, while the VarcoΔV and nPVI-V of L1 Spanish are 41 and 36, respectively. As expected, L2 Spanish speakers have higher VarcoΔV and nPVI-V values than L1 Spanish speakers due to the influence from their L1 English. In a similar vein, the VarcoΔV and nPVI-V of L2 English spoken by native Spanish speakers are 54 and 66, respectively, lower than the 64 and 73 of L1 English. In terms of %V, L1 Spanish has a lower value than L2 Spanish spoken by native English speakers (48 versus 52), which is unexpected. The L1 English has a lower %V value than L2 English spoken by native Spanish speakers (38 versus 41) as expected. However, as White and Mattys (2007) report that the %V measure was particularly effective in distinguishing other comparisons in their study, such as L1 Dutch versus L2 Dutch, they conclude that VarcoΔV and nPVI-V seem more reliable and discriminative and that %V was particularly satisfactory.

Similarly, Liu and Takeda (2021) ask participants to read an English text and note that VarcoΔV and nPVI-V are effective in discriminating the two L2 groups: The L2 English speakers of L1 Mandarin have higher values in terms of these two measures than the L2 English speakers of L1 Japanese as expected. On the other hand, Liu and Takeda (2021) also find that the difference in terms of %V between the L2 English speakers whose native languages are Japanese and Mandarin, respectively, is marginal: 40.76 versus 41.41. As the proportion of CV syllables in Japanese is even higher than that of syllable-timed languages (Otake, 1990), the L2 English speakers of L1 Japanese are expected to have a slightly higher %V than the L2 English speakers of L1 Mandarin. Therefore, Liu and Takeda’s (2021) result does not support the claim that %V is the most discriminative measure (but see Lin and Wang, 2005, for a different conclusion).

Mok and Dellwo (2008) investigate the speech rhythms of Cantonese-accented English and Mandarin-accented English by asking L2 English speakers of L1 Cantonese and Mandarin to read the English version of The North Wind and the Sun at a normal speed and comparing their results with the results from native British English speakers reading a short English passage. The result in terms of nPVI-V shows that Mandarin-accented English has an even higher nPVI-V result than the native British English speakers, while VarcoΔV and %V can distinguish different groups as expected. The result should be interpreted with caution: Since results from different groups are obtained from reading different texts, they may not be entirely comparable.

Jian (2004) compares the L2 English of native Taiwanese Mandarin speakers with the L1 English of native American English speakers by use of nPVI-V. The study concludes that the nPVI-V of L2 English is lower than that of L1 English. Setter (2006) takes a different approach by focusing on the syllable duration in L2 English of native Hong Kong Cantonese speakers. In comparison to British English, Hong Kong English has overall longer syllable durations. Additionally, L2 English speakers show less variation in the relative duration of tonic, stressed, unstressed, and weakened syllables than British English speakers. According to Setter (2006), the limited syllable weakening may prevent these L2 English speakers from exhibiting the native-like pattern of stress-timing and may be a factor in the L2’s syllable-timed rhythm. Coetzee and Wissing (2007) compare Afrikaans English with Tswana English by asking native speakers of Afrikaans and Tswana to read the English version of The North Wind and the Sun. Afrikaans is a stress-timed language, while Tswana is a syllable-timed language. Therefore, Afrikaans English is expected to be more stress-timed and Tswana English more syllable-timed. Coetzee and Wissing (2007) find that Afrikaans English patterns with stress-timed languages and Tswana English with syllable-timed languages as expected, and attribute this to the transfer from the L1s of these speakers.

Studies on different languages have generally supported the validity of %V, VarcoΔV, and nPVI-V measures in differentiating L2 rhythms (but see Roach, 1982; Cummins, 2012; Rathcke et al., 2021, for different opinions). This shows the efficacy of these measures in capturing the differences in L2 rhythms. As discussed in Section 33.2, VarcoΔV and nPVI-V are also effective in identifying rhythmic differences between bilinguals with 2L1s and monolinguals. Since both VarcoΔV and nPVI-V are rate-normalized, this suggests that rhythmic measures should take speech rate into consideration. One more note will be made before leaving this subsection. Most studies discussed here have also considered rhythmic measures for consonants. However, similar to the conclusion in Section 33.2, studies on L2 speakers also cast doubt on the validity of rhythmic measures for consonants. To exemplify, Liu and Takeda (2021) report that the L2 English of L1 Japanese and the L2 English of L1 Mandarin groups have results much closer to the monolingual English group than the English-Japanese and English-Mandarin bilingual groups in terms of ΔC, VarcoΔC, rPVI-C, and nPVI-C (see also Lin and Wang, 2005; White and Mattys, 2007; Mok and Dellwo, 2008).

33.3.2 L2 and Perception

A few scholars take a different perspective and explore rhythmic classification from the perception side. Generally speaking, most of them agree that the results show the influence of L1 rhythm on the perception of L2 rhythm (Cutler et al., 1986; Bertinetto and Fowler, 1989; Masuko and Kiritani, 1990; Otake et al., 1993; Cutler and Otake, 1994; Erickson et al., 1999; but see a different conclusion in, for example, Bradley et al., 1993; Fear et al., 1995). For example, Bertinetto and Fowler (1989) use six pairs of Latinate words to examine the sensitivity of Italian and English speakers to artificially modified durations of unstressed vowels. The results from Bertinetto and Fowler’s (1989) study are in line with the classification of English as stress-timed and Italian as syllable-timed, as native English speakers were found to be relatively insensitive to the durational compression of unstressed vowels, while native Italian speakers were more sensitive. Erickson et al. (1999) find that it is difficult for native Japanese speakers to count the syllables in each English word read by native American English speakers, which demonstrates that their L1 has made it difficult for them to understand the fundamental rhythmic components of English.

33.4 Remaining Questions

33.4.1 Non-prototypical Languages

One dilemma is the difficulty of classifying non-prototypical languages. For example, Dauer (1983, 1987), Arvaniti (1991, 2007), Barry and Andreeva (2001), Grabe and Low (2002), and Baltazani (2007) find it challenging to decide whether Greek is unclassifiable, syllable-timed, or of a mixed rhythm. Baltazani (2007) adopts the rPVI-V and rPVI-C measures and notes that Greek has an intermediate rPVI-V value of 45 between the 30 of Spanish and the 60 of German, while it has a higher rPVI-C value of 68 than both the 58 of Spanish and the 55 of German. Since Spanish and German are respectively representative syllable-timed and stress-timed languages, it is difficult to decide which rhythmic type Greek belongs to exactly (see also Balasubramanian, 1980, for a similar conclusion concerning Tamil). Han (1964), Ji (1993), Bond and Stockmal (2002), and Cho (2004) have all debated the classification of Korean’s rhythm as syllable-timed, stress-timed, mora-timed, or somewhere in between the first two categories. Most literature about bilinguals and L2 speakers is mainly limited to a few familiar languages, for example, English, French, German, Japanese, and Mandarin, as discussed in Sections 33.2 and 33.3, and thus cannot shed much light on the understanding of these understudied languages.

33.4.2 Rhythmic Classification: Categorical or Gradient

Some studies on monolinguals have suggested that the distinction between different rhythms appears to be gradient, instead of categorical: Languages appear to be more or less, instead of being absolute, mora-timed, stress-timed, or syllable-timed (Mitchell, 1969; Port et al., 1980; Miller, 1984; Dauer, 1987; Sato, 1993; Minagawa-Kawai, 1999; Grabe and Low, 2002). To exemplify, Mitchell (1969) argues that no language is entirely syllable-timed or stress-timed; rather, all languages exhibit both types of timing, with certain languages favoring one over the other. Dauer (1987) offers a checklist with eight dimensions for the rhythmic classification of languages. A language is more likely to be called stress-timed if it receives more positive points than negative ones, and syllable-timed if it receives more negative points. English is closer to one end of Dauer’s scale and French is closer to the other. As a result, there is a continuum of rhythmic differences between languages, instead of an absolute difference. To give some language examples, Dimitrova (1997) uses the methods in Dauer (1987) and finds that the rhythm of Bulgarian is somewhere between stress-timed and syllable-timed. Grabe and Low (2002) ask their participants to read The North Wind and the Sun in their native languages and demonstrate that English, Dutch, and German are stress-timed and French and Spanish are syllable-timed. However, Malay does not have a statistically significant difference from either representative stress-timed languages or syllable-timed languages in terms of the nPVI-V measure (Grabe and Low, 2002).
The rPVI-C measure provides even smaller differences: No statistically significant difference has emerged between Catalan and Spanish in terms of rPVI-C; Luxembourgish does not have a statistically significant difference from either representative stress-timed languages or syllable-timed languages in terms of rPVI-C (Grabe and Low, 2002). Maddieson (2018) argues from a different perspective: Individual variations within a language can be larger than differences between languages of various rhythms. Similarly, according to Arvaniti (2012), the eight native speakers of German range from about 36% to 43% in terms of the %V measure, showing variations larger than the 3.8% difference between English and Spanish in terms of the same measure.
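The comparison attributed to Arvaniti (2012) above is a comparison of a within-language range with a between-language difference. A minimal sketch of that arithmetic, using only the approximate endpoints quoted in the text, is given below. The function name is hypothetical, and the between-language figure is treated here as a difference in percentage points, which is an assumption about how the quoted 3.8% should be read.

```python
def range_exceeds_difference(speaker_min, speaker_max, between_diff):
    """Return the within-language range and whether it exceeds between_diff."""
    within_range = speaker_max - speaker_min
    return within_range, within_range > between_diff


# Approximate %V endpoints for the eight German speakers, and the reported
# English-Spanish difference, as quoted in the text.
within_range, exceeds = range_exceeds_difference(36.0, 43.0, 3.8)
```

Here the speaker-to-speaker range of about 7 points within German is nearly twice the 3.8 difference between English and Spanish, which is why the measure is said to vary more within a language than between rhythm classes.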

33.4.3 Rhythmic Measures: Beyond Duration

The rhythmic measures discussed in Sections 33.2 and 33.3 assume a direct and straightforward relationship between duration and abstract phonological categories including vowel reduction pattern, vowel weight, syllable structure, and so on. However, the duration of segments may be affected by multiple factors, such as the presence or absence of geminate consonants, the manner of articulation of consonants, vocalic length distinction, the pattern of vowel reduction, syllable structure, syllable position, stress, word type, boundary lengthening, speech rate, accent, intonation, tone, and the syntactic component (Delattre, 1966; Allen, 1973; Klatt, 1975, 1976; Borzone de Manrique and Signorini, 1983; Dauer, 1983, 1987; Jassem et al., 1984; Laeufer, 1992; Nooteboom, 1997; Rietveld et al., 1999; Turk and Shattuck-Hufnagel, 2000). Roach (1982), by comparing six languages, has found little evidence to support that the languages can be differentiated by the timing of inter-stress intervals or syllable durations (see also Wenk and Wioland, 1982, using French as a language sample; Den Os, 1988, using Italian and Dutch; Nooteboom, 1991, using Swedish). This presents a challenge for rhythmic measures: How does one capture all these different factors in different languages and make a parallel and comprehensive comparison between different languages by use of rhythmic measures?
More plainly, it is necessary for rhythmic measures to take not only phonological but also morphological and syntactic components into consideration, for example, morphological processes giving rise to consonant clusters and word order variations in languages, among others. Of the two effective rhythmic measures reviewed in Sections 33.2 and 33.3, VarcoΔV is designed to measure the dispersion of vocalic values and nPVI-V to measure sequential variations. Neither of these two measures can take full account of the phonological components, not to mention the morphological and syntactic components. As long as rhythmic measures remain at their present stage, they cannot provide a full picture of different rhythmic types.

33.5 Conclusion

Although research on bilinguals with 2L1s and L2 speakers has provided support for rhythmic classification, the results reviewed in this chapter cannot be regarded as completely satisfactory. Timing may operate at several levels (the segment, the syllable, and the phrase) and is further shaped by factors such as stress, surrounding context, position, and speaking rate. How to incorporate all of these levels and factors into rhythmic measures is a challenge that rhythmic classification must solve.

Box 33.1 Chapter Overview

Summary

Rhythmic classification has been subject to scrutiny from various angles, including rhythm production, rhythm perception, and factors influencing rhythm. Despite its long history, questions remain about how to capture the nature and structure of rhythm comprehensively, systematically, objectively, and precisely.

Implications

Research on bilinguals with 2L1s and L2 speakers has provided some support for rhythmic classification. Future research should explore underrepresented languages and compare data from speakers with diverse language backgrounds rather than focusing on a few representative languages. In addition, individual variation, the various factors affecting timing, and even perception should all be taken into account in rhythmic classification.

Gains

Disputes concerning rhythmic classification may continue. However, it cannot be denied that research on bilinguals with 2L1s and L2 speakers has provided moderate evidence for different rhythms in languages. It has been proposed that aspects of rhythm across cognitive domains overlap at both neural and cognitive levels. Research on language rhythm may provide insights into the neurological, molecular, and evolutionary underpinnings of human cognition.

Footnotes

¹ For detailed descriptions of rhythmic classification and rhythmic measures, see Chapters 30 and 34, respectively.

References

Abercrombie, D. (1965). Studies in Phonetics and Linguistics. London: Oxford University Press.

Abercrombie, D. (1967). Elements of General Phonetics. Edinburgh: Edinburgh University Press.

Allen, G. D. (1973). Segmental timing control in speech production. Journal of Phonetics, 1, 219–222.

Arvaniti, A. (1991). The phonetics of Modern Greek rhythm and its phonological implications. PhD dissertation, University of Cambridge.

Arvaniti, A. (2007). Greek Phonetics: The state of the art. Journal of Greek Linguistics, 8(1), 97–208. https://doi.org/10.1075/jgl.8.08arv

Arvaniti, A. (2012). The usefulness of metrics in the quantification of speech rhythm. Journal of Phonetics, 40(3), 351–373. https://doi.org/10.1016/j.wocn.2012.02.003

Balasubramanian, T. (1980). Timing in Tamil. Journal of Phonetics, 8, 449–467. https://doi.org/10.1016/S0095-4470(19)31500-1

Baltazani, M. (2007). Prosodic rhythm and the status of vowel reduction in Greek. Selected Papers on Theoretical and Applied Linguistics, 17(1), 31–43.

Barry, W., and Andreeva, B. (2001). Cross-language similarities and differences in spontaneous speech patterns. Journal of the International Phonetic Association, 31, 51–66. https://doi.org/10.1017/S0025100301001050

Barry, W. J., Andreeva, B., Russo, M., Dimitrova, S., and Kostadinova, T. (2003). Do rhythm measures tell us anything about language type? Proceedings of the 15th International Congress of Phonetic Sciences, pp. 2693–2696.

Bertinetto, P. M., and Fowler, C. A. (1989). On sensitivity to durational modifications in Italian and English. Rivista di Linguistica, 1, 69–94.

Bloomfield, L. (1933). Language. New York: Holt, Rinehart & Winston.

Bond, Z. S., and Fokes, J. (1985). Non-native patterns of English syllable timing. Journal of Phonetics, 13, 407–420. https://doi.org/10.1016/S0095-4470(19)30786-7

Bond, Z. S., and Stockmal, V. (2002). Distinguishing samples of spoken Korean from rhythmic and regional competitors. Language Sciences, 24(2), 175–185. https://doi.org/10.1016/S0388-0001(01)00013-4

Borzone de Manrique, A. M., and Signorini, A. (1983). Segmental duration and rhythm in Spanish. Journal of Phonetics, 11, 117–128. https://doi.org/10.1016/S0095-4470(19)30810-1

Bradley, D. C., Sánchez-Casas, R. M., and García-Albea, J. E. (1993). The status of the syllable in the perception of Spanish and English. Language and Cognitive Processes, 8(2), 197–233. https://doi.org/10.1080/01690969308406954

Bunta, F., and Ingram, D. (2007). The acquisition of speech rhythm by bilingual Spanish- and English-speaking 4- and 5-year-old children. Journal of Speech, Language, and Hearing Research, 50(4), 999–1014.

Carter, P. M. (2005). Quantifying rhythmic differences between Spanish, English, and Hispanic English. In Gess, R. and Rubin, E. J., eds., Theoretical and Experimental Approaches to Romance Linguistics: Selected Papers from the 34th Linguistic Symposium on Romance Languages (Current Issues in Linguistic Theory 272), pp. 63–75. Amsterdam, Philadelphia: John Benjamins. https://doi.org/10.1075/cilt.272.05car

Cho, M.-H. (2004). Rhythmic typology of Korean speech. Cognitive Processing, 5, 249–253.

Coetzee, A. W., and Wissing, D. (2007). Global and local durational properties in three varieties of South African English. Linguistic Review, 24, 263–289. https://doi.org/10.1515/TLR.2007.010

Cummins, F. (2012). Looking for rhythm in speech. Empirical Musicology Review, 7(1–2), 28–35. https://doi.org/10.18061/1811/52976

Cutler, A., and Otake, T. (1994). Mora or phoneme? Further evidence for language-specific listening. Journal of Memory and Language, 33(6), 824–844. https://doi.org/10.1006/jmla.1994.1039

Cutler, A., Mehler, J., Norris, D., and Segui, J. (1986). The syllable’s differing role in the segmentation of French and English. Journal of Memory and Language, 25(4), 385–400. https://doi.org/10.1016/0749-596X(86)90033-1

Dauer, R. M. (1983). Stress-timing and syllable-timing reanalyzed. Journal of Phonetics, 11, 51–62. https://doi.org/10.1016/S0095-4470(19)30776-4

Dauer, R. M. (1987). Phonetic and phonological components of language rhythm. Proceedings of the 11th International Congress of Phonetic Sciences, 5, 447–450.

Delattre, P. (1966). A comparison of syllable-length conditioning among languages. International Review of Applied Linguistics in Language Teaching, 7, 295–325.

Dellwo, V. (2006). Rhythm and speech rate: A variation coefficient for delta C. In Karnowski, P. and Szigeti, I., eds., Language and Language Processing: Proceedings of the 38th Linguistic Colloquium, pp. 231–242. Frankfurt: Peter Lang.

Den Os, E. (1988). Rhythm and tempo in Dutch and Italian. PhD dissertation, University of Utrecht.

Dimitrova, S. (1997). Bulgarian speech rhythm: Stress-timed or syllable-timed? Journal of the International Phonetic Association, 27(2), 27–33. https://doi.org/10.1017/S0025100300005399

Erickson, D., Akahane-Yamada, R., Tajima, K., and Matsumoto, K. F. (1999). Syllable counting and mora units in speech perception. Proceedings of the 14th International Congress of Phonetic Sciences, pp. 1479–1482.

Fear, B. D., Cutler, A., and Butterfield, S. (1995). The strong/weak syllable distinction in English. Journal of the Acoustical Society of America, 97(3), 1893–1904.

Grabe, E., and Low, E. L. (2002). Durational variability in speech and the rhythm class hypothesis. In Gussenhoven, C. and Warner, N., eds., Laboratory Phonology 7, pp. 515–546. Berlin: Mouton de Gruyter.

Grabe, E., Gut, U., Post, B., and Watson, I. (1999a). The acquisition of rhythm in English, French, and German. In Barrière, I., Morgan, G., Chiat, S., and Woll, B., eds., Current Research in Language and Communication: Proceedings of the Child Language Seminar, pp. 157–163. London: City University.

Grabe, E., Post, B., and Watson, I. (1999b). The acquisition of rhythm in English and French. Proceedings of the 14th International Congress of Phonetic Sciences, 2, 1201–1204.

Gut, U. (2003). Prosody in second language speech production: The role of the native language. Fremdsprachen Lehren und Lernen, 32, 133–152.

Han, M. S. (1964). Duration of Korean Vowels (Studies in the Phonology of Asian Languages II). Los Angeles: University of Southern California.

Haugen, E. (1953). The Norwegian Language in America: A Study of Bilingual Behavior. Philadelphia: University of Pennsylvania. https://doi.org/10.9783/9781512820522

Jassem, W., Hill, D. R., and Witten, I. H. (1984). Isochrony in English speech: Its statistical validity and linguistic relevance. In Gibbon, D. and Richter, H., eds., Intonation, Accent and Rhythm: Studies in Discourse Phonology, pp. 203–225. Berlin: de Gruyter.

Jenkins, J. (2000). The Phonology of English as an International Language. New York: Oxford University Press.

Ji, M. J. (1993). The duration of sounds (in Korean). New Korean Language, 3, 39–57.

Jian, H. L. (2004). On the syllable timing in Taiwan English. Proceedings of Speech Prosody 2004, pp. 247–250. https://doi.org/10.21437/SpeechProsody.2004-57

Kehoe, M., Lleó, C., and Rakow, M. (2011). Speech rhythm in the pronunciation of German and Spanish monolingual and German-Spanish bilingual 3-year-olds. Linguistische Berichte, 227, 323–352.

Klatt, D. H. (1975). Vowel lengthening is syntactically determined in a connected discourse. Journal of Phonetics, 3, 129–140. https://doi.org/10.1016/S0095-4470(19)31360-9

Klatt, D. H. (1976). Linguistic uses of segmental duration in English: Acoustic and perceptual evidence. Journal of the Acoustical Society of America, 59(5), 1208–1221. https://doi.org/10.1121/1.380986

Knight, R. A. (2011). Assessing the temporal reliability of rhythm metrics. Journal of the International Phonetic Association, 41(3), 271–281. https://doi.org/10.1017/S0025100311000326

Kormos, J. (2006). Speech Production and Second Language Acquisition. Mahwah, NJ: Lawrence Erlbaum Associates.

Laeufer, C. (1992). Patterns of voicing-conditioned vowel duration in French and English. Journal of Phonetics, 20, 411–440. https://doi.org/10.1016/S0095-4470(19)30648-5

Lehiste, I. (1970). Suprasegmentals. Cambridge: MIT Press.

Lin, H., and Wang, Q. (2005). Vowel quantity and consonant variance: A comparison between Chinese and English. Proceedings of Between Stress and Tone Conference, Leiden, the Netherlands, June 2005, pp. 1–24.

Lindblom, B., Lyberg, B., and Holmgren, K. (1981). Durational Patterns of Swedish Phonology: Do They Reflect Short-Term Memory Processes? Bloomington, IN: Indiana University Linguistics Club.

Liu, S., and Takeda, K. (2021). Mora-timed, stress-timed, and syllable-timed rhythm classes: Clues in English speech production by bilingual speakers. Acta Linguistica Academica, 68(3), 350–369.

MacNamara, J. (1967). The linguistic independence of bilinguals. Journal of Verbal Learning and Verbal Behavior, 6(5), 729–736. https://doi.org/10.1016/S0022-5371(67)80078-1

Maddieson, I. (2018). Is phonological typology possible without (universal) categories? In Hyman, L. M. and Plank, F., eds., Phonological Typology, pp. 107–125. Berlin: de Gruyter. https://doi.org/10.1515/9783110451931-004

Masuko, Y., and Kiritani, S. (1990). Perception of mora sounds in Japanese by non-native speakers of Japanese. Annual Bulletin of Research Institute of Logopedics and Phoniatrics, Faculty of Medicine, University of Tokyo, 24, 113–120.

Miller, M. (1984). On the perception of rhythm. Journal of Phonetics, 12(1), 75–83.

Minagawa-Kawai, Y. (1999). Preciseness of temporal compensation in Japanese mora timing. Proceedings of the 14th International Congress of Phonetic Sciences, pp. 365–368.

Mitchell, R., and Myles, F. (2004). Second Language Learning Theories, second edition. London: Hodder Arnold.

Mitchell, T. F. (1969). Review of Abercrombie. Journal of Linguistics, 5, 153–164. https://doi.org/10.1017/S0022226700002164

Mochizuki-Sudo, M., and Kiritani, S. (1989). Production and perception of the rhythmic pattern of English by Japanese learners. Annual Bulletin of Research Institute of Logopedics and Phoniatrics, Faculty of Medicine, University of Tokyo, 23, 75–84.

Mok, P. P. K. (2011). The acquisition of speech rhythm by three-year-old bilingual and monolingual children: Cantonese and English. Bilingualism: Language and Cognition, 14(4), 458–472.

Mok, P. P. K., and Dellwo, V. (2008). Comparing native and non-native speech rhythm using acoustic rhythmic measures: Cantonese, Beijing Mandarin and English. Proceedings of Speech Prosody 2008, pp. 423–426. https://doi.org/10.21437/SpeechProsody.2008-93

Munhall, K., Fowler, C., Hawkins, S., and Saltzman, E. (1992). “Compensatory shortening” in monosyllables of spoken English. Journal of Phonetics, 20(2), 225–239. https://doi.org/10.1016/S0095-4470(19)30624-2

Nooteboom, S. G. (1991). Some observations on the temporal organisation and rhythm of speech. Proceedings of the XIIth International Congress of Phonetic Sciences, pp. 228–237.

Nooteboom, S. G. (1997). The prosody of speech: Melody and rhythm. In Hardcastle, W. and Laver, J., eds., The Handbook of Phonetic Sciences, pp. 640–673. Oxford: Blackwell.

Otake, T. (1990). Rhythmic structure of Japanese and syllable structure. IEICE Technical Report, 89, 55–61.

Otake, T., Hatano, G., Cutler, A., and Mehler, J. (1993). Mora or syllable? Speech segmentation in Japanese. Journal of Memory and Language, 32(2), 258–278. https://doi.org/10.1006/jmla.1993.1014

Port, R. F., Al-Ani, S., and Maeda, S. (1980). Temporal compensation and universal phonetics. Phonetica, 37(4), 235–252. https://doi.org/10.1159/000259994

Ramus, F., Nespor, M., and Mehler, J. (1999). Correlates of linguistic rhythm in the speech signal. Cognition, 72, 1–28.

Rathcke, T., Lin, C. Y., Falk, S., and Dalla Bella, S. (2021). Tapping into linguistic rhythm. Laboratory Phonology, 12(1), 11. https://doi.org/10.5334/labphon.248

Rietveld, T., Kerkhoff, J., and Gussenhoven, C. (1999). Prosodic structure and vowel duration in Dutch. Proceedings of the 14th International Congress of Phonetic Sciences, pp. 463–466.

Roach, P. (1982). On the distinction between stress-timed and syllable-timed languages. In Crystal, D., ed., Linguistic Controversies, pp. 73–79. London: Edward Arnold.

Robles-Puente, S. (2014). Prosody in contact: Spanish in Los Angeles. PhD dissertation, University of Southern California.

Sato, Y. (1993). The durations of syllable-final nasals and the mora hypothesis in Japanese. Phonetica, 50, 44–67.

Setter, J. (2006). Speech rhythm in world Englishes: The case of Hong Kong. TESOL Quarterly, 40(4), 763–782. https://doi.org/10.2307/40264307

Taylor, D. S. (1981). Non-native speakers and the rhythm of English. International Review of Applied Linguistics in Language Teaching, 19(3), 219–226. https://doi.org/10.1515/iral.1981.19.1-4.219

Turk, A. E., and Shattuck-Hufnagel, S. (2000). Word-boundary-related duration patterns in English. Journal of Phonetics, 28(4), 397–440. https://doi.org/10.1006/jpho.2000.0123

Ueyama, M. (1996). Phrase-final lengthening and stress-timed shortening in the speech of native speakers and Japanese learners of English. Proceedings of the Fourth International Conference on Spoken Language Processing, pp. 610–613. https://doi.org/10.21437/ICSLP.1996-154

Ueyama, M. (1999). Durational reduction in L2 English produced by Japanese speakers. Proceedings of the 14th International Congress of Phonetic Sciences, pp. 567–570.

Weinreich, U. (1953). Languages in Contact: Findings and Problems. The Hague: Mouton.

Wenk, B. J. (1985). Speech rhythms in second language acquisition. Language and Speech, 28(2), 157–175. https://doi.org/10.1177/002383098502800205

Wenk, B. J., and Wioland, F. (1982). Is French really syllable-timed? Journal of Phonetics, 10, 193–216. https://doi.org/10.1016/S0095-4470(19)30957-X

White, L., and Mattys, S. L. (2007). Calibrating rhythm: First language and second language studies. Journal of Phonetics, 35, 501–522. https://doi.org/10.1016/j.wocn.2007.02.003

Whitworth, N. (2002). Speech rhythm production in three German-English bilingual families. Leeds Working Papers in Linguistics and Phonetics, 9, 175–205.

Yune, Y. (2018). Native language interference in producing the Korean rhythmic structure: Focusing on Japanese. Phonetics and Speech Sciences, 10(4), 45–52. https://doi.org/10.13064/KSSS.2018.10.4.045

