33.1 Introduction
Abercrombie (Reference Abercrombie1965, Reference Abercrombie1967) proposes that languages can be categorized into three rhythmic groups: mora-timed, stress-timed, and syllable-timed languages (but see Roach, Reference Roach and Crystal1982; Cummins, Reference Cummins2012; Rathcke et al., Reference Rathcke, Lin, Falk and Dalla Bella2021, among others, for critical views). Ramus et al. (Reference Ramus, Nespor and Mehler1999) state that the rhythmic type of a language is associated with its speech segmentation unit. For instance, English, claimed as a representative stress-timed language, involves speech segmentation into feet, whereas Mandarin, a prototypical syllable-timed language, employs syllables for speech segmentation.
Two sets of rhythmic measures have received significant attention: one set proposed by Ramus et al. (Reference Ramus, Nespor and Mehler1999) and the other by Grabe and Low (Reference Grabe, Low, Gussenhoven and Warner2002). Ramus et al. (Reference Ramus, Nespor and Mehler1999) segment speech into vowels and consonants, calculate vocalic and intervocalic intervals, and focus mainly on three measures: %V, ΔV, and ΔC. The measure %V is the proportion of sentence duration taken up by vocalic intervals; ΔV refers to the standard deviation of the duration of vocalic intervals within each sentence; and ΔC is the standard deviation of the duration of intervocalic intervals within each sentence. With reference to eight languages, Ramus et al. (Reference Ramus, Nespor and Mehler1999) report that %V and ΔC are in line with the notion of rhythmic classes. For example, according to the authors, English has lower %V than French, because English has reduced vowels and French does not. In addition, English has higher ΔC, because English has more complex onset and coda structures than French. The differences between English and French in terms of %V and ΔC are in line with the supposition that English is a typical stress-timed language and French is a representative syllable-timed language.
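The three measures defined above can be illustrated with a minimal sketch. It assumes that a sentence has already been segmented into labeled intervals; the function name, the ("V"/"C", duration) input format, and the example durations are hypothetical, and the use of the population standard deviation is an assumption rather than a detail taken from Ramus et al.

```python
from statistics import pstdev


def ramus_measures(intervals):
    """Return (%V, deltaV, deltaC) for one sentence.

    `intervals` is a list of (label, duration_in_seconds) pairs, where the
    label is "V" for a vocalic interval and "C" for an intervocalic one.
    Pauses are assumed to have been removed beforehand.
    """
    vocalic = [d for label, d in intervals if label == "V"]
    intervocalic = [d for label, d in intervals if label == "C"]
    if len(vocalic) < 1 or len(intervocalic) < 1:
        raise ValueError("need at least one vocalic and one intervocalic interval")

    total = sum(vocalic) + sum(intervocalic)
    percent_v = 100.0 * sum(vocalic) / total  # share of duration that is vocalic
    delta_v = pstdev(vocalic)                 # assumption: population SD
    delta_c = pstdev(intervocalic)            # assumption: population SD
    return percent_v, delta_v, delta_c


# Hypothetical segmentation of a short utterance (durations in seconds).
example = [("C", 0.08), ("V", 0.12), ("C", 0.10),
           ("V", 0.06), ("C", 0.14), ("V", 0.10)]
print(ramus_measures(example))
```

Because ΔV and ΔC are raw standard deviations, doubling every duration doubles both values while leaving %V unchanged, which is one way to see why speech rate becomes an issue for these measures.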
One controversial issue not addressed in Ramus et al. (Reference Ramus, Nespor and Mehler1999) is speech rate. Barry et al. (Reference Barry, Andreeva, Russo, Dimitrova and Kostadinova2003) state that both ΔV and ΔC are inversely related to speech rate. Dellwo (Reference Dellwo, Karnowski and Szigeti2006) thus uses a normalized metric VarcoΔC, which is the standard deviation of intervocalic interval duration divided by the mean intervocalic interval duration. Dellwo (Reference Dellwo, Karnowski and Szigeti2006) claims that VarcoΔC discriminates better than ΔC between English and French. However, White and Mattys (Reference White and Mattys2007) argue that VarcoΔV appears to be more reliable and discriminative than raw measures, while VarcoΔC seems to remove variation that holds linguistic significance. The pairwise variability index (PVI), proposed by Grabe and Low (Reference Grabe, Low, Gussenhoven and Warner2002), is the other set of rhythmic measures that has garnered widespread discussion. Unlike the rhythmic measures in Ramus et al. (Reference Ramus, Nespor and Mehler1999), the PVI measures capture the sequential variations in intervals. Grabe and Low (Reference Grabe, Low, Gussenhoven and Warner2002) state that the PVI is based on vowel durations and the durations of intervals between vowels (excluding pauses) in speech, followed by the calculation of variability between consecutive intervals. They also claim that speech rate should be taken into consideration for the PVI calculation of vocalic intervals, since speech rate may affect their duration. This adjusted metric for vocalic intervals is termed normalized PVI. In contrast, Grabe and Low (Reference Grabe, Low, Gussenhoven and Warner2002) argue that normalization is not necessary for intervocalic intervals and use the raw PVI.
The results reported in their study are as expected for Dutch, English, and German, typical stress-timed languages, and as expected for French and Spanish, typical syllable-timed languages. Their results for Japanese, a mora-timed language, are similar to those for syllable-timed languages.
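The rate-normalized and sequential measures discussed above can be sketched as follows. The sketch assumes a list of successive interval durations of one type (all vocalic or all intervocalic) with pauses already excluded; the function names are hypothetical, the multiplication by 100 follows the common reporting convention, and the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def varco(durations):
    """Standard deviation divided by the mean, scaled by 100."""
    if len(durations) < 2:
        raise ValueError("need at least two intervals")
    return 100.0 * pstdev(durations) / mean(durations)


def raw_pvi(durations):
    """Mean absolute difference between successive intervals (rPVI)."""
    if len(durations) < 2:
        raise ValueError("need at least two intervals")
    diffs = [abs(a - b) for a, b in zip(durations, durations[1:])]
    return sum(diffs) / len(diffs)


def normalized_pvi(durations):
    """Like raw_pvi, but each difference is divided by the mean of the
    pair before averaging, then scaled by 100 (nPVI)."""
    if len(durations) < 2:
        raise ValueError("need at least two intervals")
    terms = [abs(a - b) / ((a + b) / 2.0)
             for a, b in zip(durations, durations[1:])]
    return 100.0 * sum(terms) / len(terms)


# Hypothetical vocalic durations in seconds, then the same utterance
# produced at half the speech rate (every interval twice as long).
normal = [0.10, 0.20, 0.10]
slow = [2 * d for d in normal]
print(raw_pvi(normal), raw_pvi(slow))                # raw value scales with rate
print(normalized_pvi(normal), normalized_pvi(slow))  # normalized value does not
print(varco(normal), varco(slow))                    # nor does Varco
```

Under uniform slowing, the raw PVI doubles while the normalized PVI and Varco stay the same. Real changes in speech rate are not uniform across segments, so this illustrates the motivation for normalization rather than its empirical adequacy.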
The most straightforward way to examine the validity of rhythmic classification is to analyze the speech production by native speakers, especially that of monolingual speakers, by use of the rhythmic measures noted above. Another way to approach rhythmic classification is to examine speech by bilinguals with two first languages (hereafter 2L1s). It seems that results from bilinguals with 2L1s should be intermediate between the results from the respective monolingual speakers of their two languages. Examining second-language speakers, particularly the influence of their first language (L1) on their second language (L2), could also provide valuable insights into rhythmic classification. If rhythmic classification is tenable, the effect of the rhythm of L1 on the rhythm of L2 can be expected.
Bilinguals with 2L1s in this chapter are defined as those who have acquired two languages before the age of three and can produce fluent and effective speech in both languages. In other words, both languages are considered their native languages (see, for example, Haugen, Reference Haugen1953; Weinreich, Reference Weinreich1953). The definition of a bilingual speaker here is not as strict as that of Bloomfield (Reference Bloomfield1933), who defines a bilingual as a perfect user of two languages in listening, reading, speaking, and writing; however, it is much stricter than that of MacNamara (Reference MacNamara1967), who includes anyone who has minimal competence in listening, reading, speaking, or writing a language other than their native language. L2 speakers in this chapter are those who did not acquire the language under discussion in early childhood and have not lived for a long period in a country where it is spoken, but learned it later through formal instruction or self-study (see, for example, Jenkins, Reference Jenkins2000; Mitchell and Myles, Reference Mitchell and Myles2004; Kormos, Reference Kormos2006).
This chapter is structured as follows. Sections 33.2 and 33.3 review rhythmic results from bilinguals with 2L1s and L2 speakers, respectively. Section 33.4 discusses remaining questions in rhythmic classification. Section 33.5 concludes the chapter.
33.2 Rhythms and Bilinguals with 2L1s
If bilinguals with 2L1s (henceforth bilinguals) produce rhythmic measures that fall somewhere between those of monolingual speakers of their two languages, this indicates that the rhythm of one language influences the other in their speech production. In other words, the rhythmic differences between the two languages can be observed, which supports the validity of rhythmic classification.
33.2.1 Rhythmic Measures for Vowels
Bunta and Ingram (Reference Bunta and Ingram2007) compare the speech production by Spanish-English bilingual adults with that of monolingual peers in both languages, of monolingual children in both languages, and of Spanish-English bilingual children. Spanish is considered a syllable-timed language and English a stress-timed language. Bunta and Ingram (Reference Bunta and Ingram2007) mainly employ the normalized vocalic and intervocalic PVI measures (hereafter nPVI-V and nPVI-C, respectively) and find that the nPVI-V measure is effective in distinguishing speech rhythms while the nPVI-C measure does not seem to be an accurate indicator of speech rhythm. According to Bunta and Ingram (Reference Bunta and Ingram2007), the nPVI-V score for speech production in English from the bilingual adults (74.00) is slightly lower than that from the monolingual English adults (79.68), while the nPVI-V value of 43.00 from the bilingual adults in Spanish is marginally higher than the 39.43 from the Spanish monolingual adults. Since a lower nPVI-V score indicates more syllable-timed speech, the speech production in English by the Spanish-English bilingual adults is slightly more syllable-timed than that of the English monolinguals. Similarly, the speech production in Spanish by these bilingual adults is moderately more stress-timed than that of the Spanish monolinguals. This demonstrates that the Spanish-English bilingual adults show an interaction between the two different rhythms of their two languages (see also Mok, Reference Mok2011). In a similar vein, Liu and Takeda (Reference Liu and Takeda2021) focus on the speech production in English by English-Japanese and English-Mandarin bilingual adults and compare their speech production with that of English monolinguals. The proportion of CV (consonant-vowel) syllables in Japanese, a representative mora-timed language, is even higher than that of Mandarin, a typical syllable-timed language. 
Therefore, the English-Mandarin bilinguals are expected to be closer to the English monolinguals than the English-Japanese bilinguals are. Liu and Takeda (Reference Liu and Takeda2021) take %V, ΔV, VarcoΔV, PVI-V, and nPVI-V all into consideration and find that the English monolinguals, English-Mandarin bilinguals, and English-Japanese bilinguals show decreasing values, as expected, for two rhythmic measures: (i) 58.41, 48.20, and 46.06 in terms of VarcoΔV; and (ii) 64.80, 63.62, and 59.07 in terms of nPVI-V. That is, the speech production in English by the two bilingual groups shows the influence of the rhythms of Mandarin and Japanese, respectively. This supports the claim that English, Mandarin, and Japanese each belong to a different rhythmic type. The results from bilingual adults in terms of rhythmic measures for vowels discussed in this subsection have provided support for rhythmic classification (for similar results from bilingual children, see Grabe et al., Reference Grabe, Post and Watson1999a, Reference Grabe, Gut, Post, Watson, Barrière, Morgan, Chiat and Woll1999b; Bunta and Ingram, Reference Bunta and Ingram2007; Kehoe et al., Reference Kehoe, Lleó and Rakow2011; Mok, Reference Mok2011). Section 33.2.2 will turn to rhythmic measures for consonants.
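The intermediate-value reasoning in this subsection amounts to checking that group means fall in the predicted order. The sketch below applies that check to the values quoted above; the function and variable names are hypothetical, and the check considers only the ordering of means, not statistical significance or within-group variation.

```python
def is_strictly_decreasing(values):
    """True if every value is larger than the one that follows it."""
    return all(a > b for a, b in zip(values, values[1:]))


# nPVI-V from Bunta and Ingram (2007), ordered as: English monolinguals,
# bilinguals speaking English, bilinguals speaking Spanish, Spanish monolinguals.
bunta_ingram_npvi_v = [79.68, 74.00, 43.00, 39.43]

# Liu and Takeda (2021), ordered as: English monolinguals,
# English-Mandarin bilinguals, English-Japanese bilinguals.
liu_takeda_varco_v = [58.41, 48.20, 46.06]
liu_takeda_npvi_v = [64.80, 63.62, 59.07]

for name, values in [("Bunta and Ingram nPVI-V", bunta_ingram_npvi_v),
                     ("Liu and Takeda VarcoV", liu_takeda_varco_v),
                     ("Liu and Takeda nPVI-V", liu_takeda_npvi_v)]:
    print(name, is_strictly_decreasing(values))
```

All three sequences follow the predicted order, although some adjacent differences are small (for example, 64.80 versus 63.62), which is why the ordering alone is weaker evidence than a significance test on the underlying data.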
33.2.2 Rhythmic Measures for Consonants
As stated in Section 33.2.1, Bunta and Ingram (Reference Bunta and Ingram2007) have also employed the nPVI-C measure. However, they have not found it as effective as nPVI-V. The nPVI-C value for the speech production in English by English monolinguals is comparable to that by Spanish-English bilingual adults (74.35 versus 73.40). The same pattern can be seen between the nPVI-C values for the speech production in Spanish by Spanish monolinguals and Spanish-English bilingual adults (65.25 versus 67.80). In addition, no statistically significant differences have been found in terms of nPVI-C between Spanish and English spoken by Spanish-English bilingual adults. The only notable difference is between the English and Spanish monolinguals (74.35 versus 67.80). After a comparison with previous studies, Bunta and Ingram (Reference Bunta and Ingram2007) point out that both normalized PVI-C (nPVI-C) and raw PVI-C (rPVI-C) measures are subject to more individual variations than the measures themselves: Both nPVI-C and rPVI-C measures may erase significant differences between groups (see also Grabe and Low, Reference Grabe, Low, Gussenhoven and Warner2002; Whitworth, Reference Whitworth2002; Knight, Reference Knight2011; but see Arvaniti, Reference Arvaniti2012, for a different interpretation). The results in terms of ΔC, VarcoΔC, rPVI-C, and nPVI-C in Liu and Takeda (Reference Liu and Takeda2021) show an interesting pattern: The English-Japanese bilingual adults have the lowest values in terms of all these four measures among the English-Japanese bilingual adults, English monolinguals, and native Japanese speakers who speak English as an L2. 
Liu and Takeda (Reference Liu and Takeda2021) further examine the speech production in English by English-Mandarin bilingual adults and find that they have the lowest values in terms of ΔC and rPVI-C and the highest values in terms of VarcoΔC and nPVI-C among the English-Mandarin bilingual adults, English monolinguals, and native Mandarin speakers who speak English as an L2. Liu and Takeda (Reference Liu and Takeda2021) thus conclude that the rhythmic measures for consonants cannot clearly discriminate different rhythmic types. One possible explanation is that the measures of vocalic intervals, especially the nPVI-V measure, reflect rhythm in essence, whereas the measures of intervocalic intervals reflect differences in phonology (Kehoe et al., Reference Kehoe, Lleó and Rakow2011; Liu and Takeda, Reference Liu and Takeda2021). To exemplify, one main characteristic of stress-timed languages is the reduction of unstressed vowels. This seems to partly explain the lack of strong correlation between consonants and rhythmic classes. Another possible explanation is associated with the tendency of compressing the vowel duration to maintain a more regular duration of a syllable (Lehiste, Reference Lehiste1970; Lindblom et al., Reference Lindblom, Lyberg and Holmgren1981). Munhall et al. (Reference Munhall, Fowler, Hawkins and Saltzman1992) tested three native speakers of English and found that all subjects consistently shortened vowels when the durations of codas were increased, whereas the effects of vowel duration on the coda were less consistent. The lack of a consistent pattern of consonant durations partly explains the seemingly unreliable results from rhythmic measures of consonants.
33.3 Rhythms and L2 Speakers
Most studies on L2 speakers agree on L1-to-L2 transfer, resulting in intermediate rhythmic values in the L2, and rhythmic measures for vowels being more discriminative than rhythmic measures for consonants (Taylor, Reference Taylor1981; Bond and Fokes, Reference Bond and Fokes1985; Wenk, Reference Wenk1985; Mochizuki-Sudo and Kiritani, Reference Mochizuki-Sudo and Kiritani1989; Ueyama, Reference Ueyama1996, Reference Ueyama1999; Gut, Reference Gut2003; Jian, Reference Jian2004; Carter, Reference Carter, Gess and Rubin2005; Setter, Reference Setter2006; Coetzee and Wissing, Reference Coetzee and Wissing2007; Yune, Reference Yune2018). In addition, research into L2 speakers has covered more diversified languages and taken account of both the production and perception of rhythm.
33.3.1 Effective Rhythmic Measures
Carter (Reference Carter, Gess and Rubin2005) is perhaps the first to turn attention to the speech rhythm of L2 speakers, using the nPVI measure to examine natural speech production in English and Spanish spoken in North Carolina. According to Carter (Reference Carter, Gess and Rubin2005), Hispanic English is not as stress-timed as English: The mean nPVI-V value of 42.64 is much lower than that of the native English-speaking North Carolinians. Robles-Puente (Reference Robles-Puente2014) asks participants to read the passage The North Wind and the Sun in its English and Spanish versions and finds that his results are in line with those in Carter (Reference Carter, Gess and Rubin2005): The nPVI-V value of 39.7 for the L2 Hispanic English is lower than that of the English control group (53.3), indicating less stress-timed speech. White and Mattys (Reference White and Mattys2007) focus on L2 Spanish and L2 English. They find that the VarcoΔV and nPVI-V of L2 Spanish produced by native English speakers are 52 and 51, respectively, while the VarcoΔV and nPVI-V of L1 Spanish are 41 and 36, respectively. As expected, L2 Spanish speakers have higher VarcoΔV and nPVI-V values than L1 Spanish speakers due to the influence of their L1 English. In a similar vein, the VarcoΔV and nPVI-V of L2 English produced by native Spanish speakers are 54 and 66, respectively, lower than the 64 and 73 of L1 English. In terms of %V, L1 Spanish has a lower value than L2 Spanish produced by native English speakers (48 versus 52), which is unexpected. L1 English has a lower %V value than L2 English produced by native Spanish speakers (38 versus 41), as expected. However, because White and Mattys (Reference White and Mattys2007) report that the %V measure was particularly effective in other comparisons in their study, such as L1 Dutch versus L2 Dutch, they conclude that VarcoΔV and nPVI-V seem more reliable and discriminative and that %V is particularly satisfactory.
Similarly, Liu and Takeda (Reference Liu and Takeda2021) ask participants to read an English text and note that VarcoΔV and nPVI-V are effective in discriminating the two L2 groups: The L2 English speakers of L1 Mandarin have higher values in terms of these two measures than the L2 English speakers of L1 Japanese as expected. On the other hand, Liu and Takeda (Reference Liu and Takeda2021) also find that the difference in terms of %V between the L2 English speakers whose native languages are Japanese and Mandarin, respectively, is marginal: 40.76 versus 41.41. As the proportion of CV syllables in Japanese is even higher than that of syllable-timed languages (Otake, Reference Otake1990), the L2 English speakers of L1 Japanese are expected to have a slightly higher %V than the L2 English speakers of L1 Mandarin. Therefore, Liu and Takeda’s (Reference Liu and Takeda2021) result does not support the claim that %V is the most discriminative measure (but see Lin and Wang, Reference Lin and Wang2005, for a different conclusion).
Mok and Dellwo (Reference Mok and Dellwo2008) investigate the speech rhythms of Cantonese-accented English and Mandarin-accented English by asking L2 English speakers of L1 Cantonese and Mandarin to read the English version of The North Wind and the Sun at a normal speed and comparing their results with those from native British English speakers reading a short English passage. Mandarin-accented English shows an even higher nPVI-V value than the native British English speakers, while VarcoΔV and %V distinguish the groups as expected. These results should be interpreted with caution: Since the results for different groups are obtained from reading different texts, they may not be entirely comparable.
Jian (Reference Jian2004) compares the L2 English of native Taiwanese Mandarin speakers with the L1 English of native American English speakers by use of nPVI-V. The study concludes that the nPVI-V of L2 English is lower than that of L1 English. Setter (Reference Setter2006) takes a different approach by focusing on the syllable duration in L2 English of native Hong Kong Cantonese speakers. In comparison to British English, Hong Kong English has overall longer syllable durations. Additionally, L2 English speakers show less variation in the relative duration of tonic, stressed, unstressed, and weakened syllables than British English speakers. According to Setter (Reference Setter2006), the limited syllable weakening may prevent these L2 English speakers from exhibiting the native-like pattern of stress-timing and may be a factor in the L2’s syllable-timed rhythm. Coetzee and Wissing (Reference Coetzee and Wissing2007) compare Afrikaans English with Tswana English by asking native speakers of Afrikaans and Tswana to read the English version of The North Wind and the Sun. Afrikaans is a stress-timed language, while Tswana is a syllable-timed language. Therefore, Afrikaans English is expected to be more stress-timed and Tswana English more syllable-timed. Coetzee and Wissing (Reference Coetzee and Wissing2007) find that Afrikaans English patterns with stress-timed languages and Tswana English with syllable-timed languages, as expected, and attribute this to transfer from the speakers’ L1s.
Studies on different languages have generally supported the validity of %V, VarcoΔV, and nPVI-V measures in differentiating L2 rhythms (but see Roach, Reference Roach and Crystal1982; Cummins, Reference Cummins2012; Rathcke et al., Reference Rathcke, Lin, Falk and Dalla Bella2021, for different opinions). This shows the efficacy of these measures in capturing the differences in L2 rhythms. As discussed in Section 33.2, VarcoΔV and nPVI-V are also effective in identifying rhythmic differences between bilinguals with 2L1s and monolinguals. Since both VarcoΔV and nPVI-V are rate-normalized, this suggests that rhythmic measures should take speech rate into consideration. One more note is in order before leaving this subsection. Most studies discussed here have also considered rhythmic measures for consonants. However, similar to the conclusion in Section 33.2, studies on L2 speakers also cast doubt on the validity of rhythmic measures for consonants. To exemplify, Liu and Takeda (Reference Liu and Takeda2021) report that the L2 English of L1 Japanese and the L2 English of L1 Mandarin groups have results much closer to the monolingual English group than the English-Japanese and English-Mandarin bilingual groups in terms of ΔC, VarcoΔC, rPVI-C, and nPVI-C (see also Lin and Wang, Reference Lin and Wang2005; White and Mattys, Reference White and Mattys2007; Mok and Dellwo, Reference Mok and Dellwo2008).
33.3.2 L2 and Perception
A few scholars take a different perspective and explore rhythmic classification from the perception side. Generally speaking, most of them agree that the results show the influence of L1 rhythm on the perception of L2 rhythm (Cutler et al., Reference Cutler, Mehler, Norris and Segui1986; Bertinetto and Fowler, Reference Bertinetto and Fowler1989; Masuko and Kiritani, Reference Masuko and Kiritani1990; Otake et al., Reference Otake, Hatano, Cutler and Mehler1993; Cutler and Otake, Reference Cutler and Otake1994; Erickson et al., Reference Erickson, Akahane-Yamada, Tajima and Matsumoto1999; but see a different conclusion in, for example, Bradley et al., Reference Bradley, Sánchez-Casas and García-Albea1993; Fear et al., Reference Fear, Cutler and Butterfield1995). For example, Bertinetto and Fowler (Reference Bertinetto and Fowler1989) use six pairs of Latinate words to examine the sensitivity of Italian and English speakers to artificially modified durations of unstressed vowels. The results from Bertinetto and Fowler’s (Reference Bertinetto and Fowler1989) study are in line with the classification of English as stress-timed and Italian as syllable-timed, as native English speakers were found to be relatively insensitive to the durational compression of unstressed vowels, while native Italian speakers were more sensitive. Erickson et al. (Reference Erickson, Akahane-Yamada, Tajima and Matsumoto1999) find that it is difficult for native Japanese speakers to count the syllables in each English word read by native American English speakers, which demonstrates that their L1 has made it difficult for them to understand the fundamental rhythmic components of English.
33.4 Remaining Questions
33.4.1 Non-prototypical Languages
One difficulty lies in classifying non-prototypical languages. For example, Dauer (Reference Dauer1983, Reference Dauer1987), Arvaniti (Reference Arvaniti1991, Reference Arvaniti2007), Barry and Andreeva (Reference Barry and Andreeva2001), Grabe and Low (Reference Grabe, Low, Gussenhoven and Warner2002), and Baltazani (Reference Baltazani2007) find it challenging to decide whether Greek is unclassifiable, syllable-timed, or of a mixed rhythm. Baltazani (Reference Baltazani2007) adopts the rPVI-V and rPVI-C measures and notes that Greek has an intermediate rPVI-V value of 45 between the 30 of Spanish and the 60 of German, while it has a higher rPVI-C value of 68 than both the 58 of Spanish and the 55 of German. Since Spanish and German are respectively representative syllable-timed and stress-timed languages, it is difficult to decide which rhythmic type Greek belongs to exactly (see also Balasubramanian, Reference Balasubramanian1980, for a similar conclusion concerning Tamil). Han (Reference Han1964), Ji (Reference Ji1993), Bond and Stockmal (Reference Bond and Stockmal2002), and Cho (Reference Cho2004) have all debated the classification of Korean’s rhythm as syllable-timed, stress-timed, mora-timed, or somewhere in between the first two categories. Most literature about bilinguals and L2 speakers is limited to a few familiar languages, for example, English, French, German, Japanese, and Mandarin, as discussed in Sections 33.2 and 33.3, and thus cannot shed much light on these understudied languages.
33.4.2 Rhythmic Classification: Categorical or Gradient
Some studies on monolinguals have suggested that the distinction between different rhythms appears to be gradient, instead of categorical: Languages appear to be more or less, instead of being absolute, mora-timed, stress-timed, or syllable-timed (Mitchell, Reference Mitchell1969; Port et al., Reference Port, Al-Ani and Maeda1980; Miller, Reference Miller1984; Dauer, Reference Dauer1987; Sato, Reference Sato1993; Minagawa-Kawai, Reference Minagawa-Kawai1999; Grabe and Low, Reference Grabe, Low, Gussenhoven and Warner2002). To exemplify, Mitchell (Reference Mitchell1969) argues that no language is entirely syllable-timed or stress-timed; rather, all languages exhibit both types of timing, with certain languages favoring one over the other. Dauer (Reference Dauer1987) offers a checklist with eight dimensions for the rhythmic classification of languages. A language is more likely to be called stress-timed if it receives more positive points than negative ones, and syllable-timed if it receives more negative points. English is closer to one end of Dauer’s scale and French is closer to the other. As a result, there is a continuum of rhythmic differences between languages, instead of an absolute difference. To give some language examples, Dimitrova (Reference Dimitrova1997) uses the methods in Dauer (Reference Dauer1987) and finds that the rhythm of Bulgarian is somewhere between stress-timed and syllable-timed. Grabe and Low (Reference Grabe, Low, Gussenhoven and Warner2002) ask their participants to read The North Wind and the Sun in their native languages and demonstrate that English, Dutch, and German are stress-timed and French and Spanish are syllable-timed. However, Malay does not have a statistically significant difference from either representative stress-timed languages or syllable-timed languages in terms of the nPVI-V measure (Grabe and Low, Reference Grabe, Low, Gussenhoven and Warner2002). 
The rPVI-C measure provides even smaller differences: No statistically significant difference has emerged between Catalan and Spanish in terms of rPVI-C; Luxembourgish does not have a statistically significant difference from either representative stress-timed languages or syllable-timed languages in terms of rPVI-C (Grabe and Low, Reference Grabe, Low, Gussenhoven and Warner2002). Maddieson (Reference Maddieson, Hyman and Plank2018) argues from a different perspective: Individual variations within a language can be larger than differences between languages of various rhythms. Similarly, according to Arvaniti (Reference Arvaniti2012), the eight native speakers of German range from about 36% to 43% in terms of the %V measure, showing variations larger than the 3.8% difference between English and Spanish in terms of the same measure.
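The gradient view associated with Dauer's checklist, described earlier in this subsection, can be made concrete with a small tally. This is only a loose sketch based on the description given here: the dimension labels are placeholders rather than Dauer's actual dimensions, and coding each dimension as +1, -1, or 0 is an assumption made for illustration.

```python
def checklist_tally(scores):
    """Summarize a checklist of dimension scores.

    `scores` maps a dimension label to +1 (a stress-timed-like property),
    -1 (a syllable-timed-like property), or 0 (neither). Returns the
    net score and a verbal tendency.
    """
    positives = sum(1 for s in scores.values() if s > 0)
    negatives = sum(1 for s in scores.values() if s < 0)
    if positives > negatives:
        tendency = "more stress-timed"
    elif negatives > positives:
        tendency = "more syllable-timed"
    else:
        tendency = "intermediate"
    return positives - negatives, tendency


# Hypothetical language scored on eight placeholder dimensions.
hypothetical_language = {
    "dimension_1": +1, "dimension_2": +1, "dimension_3": -1,
    "dimension_4": 0, "dimension_5": +1, "dimension_6": -1,
    "dimension_7": -1, "dimension_8": 0,
}
print(checklist_tally(hypothetical_language))
```

The net score places a language at a point on a scale rather than in one of two boxes, which is the sense in which such a checklist yields a continuum; a language with balanced scores, as in this hypothetical case, falls in the middle.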
33.4.3 Rhythmic Measures: Beyond Duration
The rhythmic measures discussed in Sections 33.2 and 33.3 assume a direct and straightforward relationship between duration and abstract phonological categories including vowel reduction pattern, vowel weight, syllable structure, and so on. However, the duration of segments may be affected by multiple factors, such as the presence or absence of geminate consonants, the manner of articulation of consonants, vocalic length distinction, the pattern of vowel reduction, syllable structure, syllable position, stress, word type, boundary lengthening, speech rate, accent, intonation, tone, and the syntactic component (Delattre, Reference Delattre1966; Allen, Reference Allen1973; Klatt, Reference Klatt1975, Reference Klatt1976; Borzone de Manrique and Signorini, Reference Borzone de Manrique and Signorini1983; Dauer, Reference Dauer1983, Reference Dauer1987; Jassem et al., Reference Jassem, Hill, Witten, Gibbon and Richter1984; Laeufer, Reference Laeufer1992; Nooteboom, Reference Nooteboom, Hardcastle and Laver1997; Rietveld et al., Reference Rietveld, Kerkhoff and Gussenhoven1999; Turk and Shattuck-Hufnagel, Reference Turk and Shattuck-Hufnagel2000). Roach (Reference Roach and Crystal1982), by comparing six languages, has found little evidence that the languages can be differentiated by the timing of inter-stress intervals or syllable durations (see also Wenk and Wioland, Reference Wenk and Wioland1982, using French as a language sample; Den Os, Reference Den Os1988, using Italian and Dutch; Nooteboom, Reference Nooteboom1991, using Swedish). This presents a challenge for rhythmic measures: How does one capture all these different factors in different languages and make a parallel and comprehensive comparison between different languages by use of rhythmic measures?
More plainly, rhythmic measures need to take into consideration not only phonological but also morphological and syntactic components, for example, morphological processes giving rise to consonant clusters and word order variations across languages. Of the two effective rhythmic measures reviewed in Sections 33.2 and 33.3, VarcoΔV is designed to measure the dispersion of vocalic interval durations and nPVI-V to measure their sequential variation. Neither of these two measures can take full account of the phonological components, not to mention the morphological and syntactic components. As long as rhythmic measures remain at the present stage, they cannot provide a full picture of different rhythmic types.
33.5 Conclusion
Although research on bilinguals with 2L1s and L2 speakers has provided support for rhythmic classification, the results reviewed in this chapter cannot be regarded as completely satisfactory. Timing processing may operate at several levels (the segment level, the syllable level, and the phrase level) and is subject to various factors, such as stress, surrounding context, position, speaking rate, and so on. How to capture all of these levels and factors in rhythmic measures is a challenge that research on rhythmic classification must address.
Summary
Rhythmic classification has been subject to scrutiny from various angles, including rhythm production, rhythm perception, and factors influencing rhythm. Despite its long-standing history, questions remain about how to comprehensively, systematically, objectively, and precisely capture all aspects of rhythm in nature and structure.
Implications
Research on bilinguals with 2L1s and L2 speakers has provided some support for rhythmic classification. Future research should explore underrepresented languages and compare data from speakers with diverse language backgrounds, instead of focusing on a few representative languages. Additionally, individual variations, various factors affecting timing processes, and even perception should all be taken into account in rhythmic classification.
Gains
Disputes concerning rhythmic classification may continue. However, it cannot be denied that research on bilinguals with 2L1s and L2 speakers has provided moderate evidence for different rhythms in languages. It has been proposed that aspects of rhythm across cognitive domains overlap at both neural and cognitive levels. Research on language rhythm may provide insights into the neurological, molecular, and evolutionary underpinnings of human cognition.