
Stress and phrasal prominence in tone languages: The case of Southern Vietnamese

Published online by Cambridge University Press:  16 February 2017

Marc Brunelle*
Affiliation:
University of Ottawa, marc.brunelle@uottawa.ca

Abstract

There is no consensus on the nature, or even the existence, of Vietnamese word stress. While some authors have proposed that it is morphosyntactically conditioned (Thompson 1963, Thompson 1965, Cao 2003 [1978], Ngô 1984), others have adopted the view that it is consistently word-final (Trần 1967; Nguyễn & Ingram 2006, 2007b; Phạm 2008; Nguyễn 2010) or that the language lacks stress altogether (Emeneau 1951). This is due to the elusive nature of word prominence in Vietnamese, and to the small number of studies that tackle the issue experimentally. In this paper, acoustic experiments designed to test previous hypotheses and tease apart possible types of prominence are presented. Southern Vietnamese disyllabic words with various morphosyntactic structures were recorded in controlled environments to test for stress and phrasal effects. Their duration, mean intensity, mean f0, f0 range and formants were then measured to assess word prominence. Results suggest that there is little evidence for word stress in Southern Vietnamese and that reports of final stress can be reinterpreted as phrase-final lengthening. Focus-marking strategies bring no additional evidence for the existence of stress, but they seem to be partly speaker- and tone-specific, which supports results obtained in studies of Northern Vietnamese (Michaud 2005).

Type
Research Article
Copyright
Copyright © International Phonetic Association 2017 

1 Introduction

There is a puzzling level of disagreement about the status and nature of word stress in Vietnamese. While some authors analyze the language as stressless, others have proposed that it has unbounded final stress, bounded iambic stress, or even a morphosyntactically-conditioned stress system. This lack of consensus is primarily due to the stereotypically isolating nature of the language: to my knowledge, no morphophonological alternations or phonotactic distributions that could be used to diagnose word stress have ever been reported. Another reason why there is debate about the nature of Vietnamese stress is that different researchers appear to have conflated factors like word stress, focus and phrase-final lengthening. The main goal of this study is therefore to examine the acoustic properties of a number of morphosyntactically controlled disyllables to try to uncover hard evidence about syllable prominence.

The fact that even native linguists fail to agree on the nature of stress in Vietnamese (or even on its location in specific words) further suggests that even if it exists, it may not be very salient. This raises two interesting typological issues: (i) How is stress realized in a language in which every syllable bears a lexical tone? and (ii) Do all languages have some form of word stress?

1.1 Stress, stresslessness, and stress in tone languages

There is a certain conceptual vagueness in the use of the term stress in the literature on Vietnamese. To avoid further confusion, Hyman's (2006) definition of stress-accent will be adopted in this paper. Primary stress (or just stress, for convenience) is an indication of word-level metrical structure meeting the two following criteria:

  a. obligatoriness [. . .]: every lexical word has at least one syllable marked for the highest degree of metrical prominence (primary stress);

  b. culminativity [. . .]: every lexical word has at most one syllable marked for the highest degree of metrical prominence.

    (Hyman 2006: 231).

Words can also bear secondary stress, which typically manifests itself as a prominent initial or final syllable (edge-prominence), or as alternation of stressed and unstressed syllables (bounded or alternating stress). An additional factor that often plays a role in stress systems, weight-sensitivity, can be disregarded in Vietnamese, because it has a minimality requirement forcing all syllables to have at least two moras (i.e. it has no light syllables). Furthermore, even researchers who believe Vietnamese has stress agree that it is not contrastive (for full discussion see Section 1.4).

It is important to distinguish stress from phrasal prominence (or phrasal stress), a cover term for a variety of phenomena that can be misinterpreted as stress if words are recorded in isolation or in inadequate frame sentences. Phrasal prominence not only includes the lengthening effects typically found at phrase edges (Beckman & Edwards 1990, Cutler & Butterfield 1990a, Turk & Shattuck-Hufnagel 2000), but can also result from the realization of boundary tones. Along similar lines, stress can be difficult to distinguish from word-final lengthening (Cutler & Butterfield 1990b, Byrd 1996, Byrd & Krivokapić 2006, Fletcher 2010). Information structure (focus, givenness, topicalization) can also affect phrasal prominence, but our understanding of information structure in Vietnamese is still limited (Michaud & Brunelle 2016).

Word stress (primary or secondary) is phonetically realized by means of a combination of increased duration, intensity and f0, and can also be cued by a reduction of phonological contrast (especially a neutralization of vowel quality) in unstressed syllables (Hayes 1995, van der Hulst 2012). In a variety of languages a longer syllable duration has been shown to be the clearest acoustic correlate of stress (Fry 1955 on English, Sluijter & Van Heuven 1996 and Sluijter, Van Heuven & Pacilly 1997 on Dutch, De Jong & Zawaydeh 1999 on Jordanian Arabic, Arvaniti 2000 on Greek, Remijsen 2002 on Ma'ya, Remijsen & Van Heuven 2005 on Papiamentu, Prieto & Ortega-Llebaria 2006 on Spanish and Catalan). The smallest reported stress-conditioned increase in duration is about 10% in word-initial Spanish syllables (Prieto & Ortega-Llebaria 2006), but this lengthening is typically much greater in other languages. Note however that durational cues to stress are sometimes less direct. For instance, in Welsh, it is the post-stress consonant that is lengthened, rather than the stressed syllable itself (Williams 1985).
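To make the percentage figures above concrete, the following minimal sketch shows one way a stress-conditioned duration increase could be expressed as a percentage. The function name and the durations are hypothetical and chosen only for illustration; they are not taken from any of the studies cited.

```python
def relative_increase(stressed_ms: float, unstressed_ms: float) -> float:
    """Return the duration increase of a stressed syllable over an
    unstressed one, as a percentage of the unstressed duration."""
    if unstressed_ms <= 0:
        raise ValueError("unstressed duration must be positive")
    return (stressed_ms - unstressed_ms) / unstressed_ms * 100.0


# Hypothetical durations (in ms), used only to illustrate the arithmetic:
# a 165 ms stressed syllable against a 150 ms unstressed one.
example_increase = relative_increase(165.0, 150.0)
```

With these illustrative values the increase is about 10%, which is the order of magnitude reported as the lower bound in the passage above.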

Vowel quality and intensity are also generally robust cues, although their exact manifestations and contributions are partly language-specific. In languages that show dramatic reduction or phonological neutralization of vowel contrasts in unstressed syllables, like English, formants are radically centralized. However, even in languages without clear vowel reduction, F1 and F2 are usually affected to some extent (De Jong & Zawaydeh 1999, Remijsen 2002, Remijsen & Van Heuven 2005, Prieto & Ortega-Llebaria 2006). Formant measurements are difficult to compare across studies, but Jordanian Arabic, with a 5% boost in F1 in stressed syllables, seems to have one of the lowest reported differences (De Jong & Zawaydeh 1999). The effect of stress on intensity is less straightforward: mean intensity often correlates with stress, but as the energy increase is not equally distributed in the frequency spectrum, a boosted intensity in higher harmonics has been shown to be an even better indicator of stress in Dutch, Spanish and Catalan unaccented syllables (Sluijter & Van Heuven 1996, Sluijter et al. 1997, Prieto & Ortega-Llebaria 2006). However, no such effect is found in English (Campbell & Beckman 1997), and it seems that, at least in some languages, overall intensity is more correlated with accent (i.e. the association of a melodic target, or pitch-accent, to a stressed syllable) than with stress itself. In fact, it has been proposed that the combined effect of duration and intensity may be a better way of capturing the perceptual prominence of stressed syllables than the two cues taken separately (Beckman 1986, Arvaniti 2000). Finally, although f0 is primarily a phonetic correlate of accent, a residual higher f0 may be maintained on stressed syllables, even when unaccented (De Jong & Zawaydeh 1999, Remijsen 2002).

Although they are superficially competing for some of the same acoustic cues, lexical tone and stress are not incompatible in absolute terms. There are many well-described cases of languages where lexical tone and contrastive stress coexist (Inkelas & Zec 1998 on Serbo-Croatian, Riad 1998 and Lahiri, Wetterlin & Jönsson-Steiner 2005 on Scandinavian, Remijsen 2002 on Ma'ya, Baart 2003 on the tonal languages of Pakistan, Remijsen & Van Heuven 2005 on Papiamentu, Nara 2015 on Punjabi). In Ma'ya for instance, stress and tone rely on the same phonetic properties, but weighted in opposite orders (Remijsen 2002). Perhaps more typically, in many languages with sparse tonal specification as diverse as Swedish (Riad 1998, Lahiri et al. 2005) and Punjabi (Baart 2003, Nara 2015), stress attracts tone and is the primary cue to stressed syllables. On the other hand, we know much less about languages in which tone is densely distributed, especially those in which each syllable has its own lexical tone. In East Asia, the best studied case is Mandarin, a language in which most disyllables have a trochaic structure and where unstressed syllables are segmentally reduced and bear a neutral tone (Norman 1988, Chen 1993, Duanmu 2000). Another language where tone and vowel contrasts are neutralized in unstressed syllables is Burmese (Gruber 2011). In the absence of morphophonological alternations, one may wonder if unstressed syllables in these languages still bear an underlying tone, but tonal reduction without full neutralization is also attested: in Thai, unstressed syllables are shorter and have reduced vowels and tones, but tonal contrasts are still maintained (Potisuk, Gandour & Harper 1994, 1996). What Mandarin, Burmese and Thai have in common is that word stress is realized through a general reduction of unstressed syllables that includes, but is not limited to, tonal reduction. This contrasts with other East Asian tone languages, like Cantonese (Bauer & Benedict 1997) and other southern Chinese languages (Norman 1988), that have complex tone systems, but no stress.

Vietnamese is interesting in this context because it has a complex tone system and has often been described as having word stress, but does not exhibit a clear reduction of unstressed syllables in lexical words.[Footnote 1] There is, however, considerable disagreement about the metrical structure of Vietnamese: even native-speaking linguists have failed to reach a consensual description of its stress system (see Section 1.4 for a discussion). The main goal of this paper is therefore to look at Vietnamese in more detail in order to determine if there is any form of word prominence in this language and, if there is any, to conceptually sort out the various forms of prominence at play.

1.2 Vietnamese syllables and tones

There is considerable phonological and phonetic variation between Vietnamese dialects (see Kirby 2011 for a comprehensive phonetic sketch of Hanoi Vietnamese, Brunelle 2015 for phonetic overviews of the northern and southern dialects). However, all dialects share relatively simple phonotactics: all syllables conform to a C(w)V(C) template and bear a lexical tone. Furthermore, there are, to my knowledge, no mentions of dialectal differences in the placement or realization of stress anywhere in the literature. The results presented here are therefore valid for the southern dialect, but should in theory be generalizable to other dialects. For the purpose of this paper, Southern Vietnamese will refer to the relatively homogeneous varieties spoken in Hồ Chí Minh City and the Mekong Delta.[Footnote 2] There is minor lexical and phonological variation even in that zone, but no salient differences in tone or prosody have been reported. The transcriptions given below correspond to surface forms as pronounced in standard Southern Vietnamese, as it would be spoken in formal contexts by educated native speakers from Hồ Chí Minh City.

Two dialectal features related to tone are relevant to this paper. The first one is that while northern and central varieties have glottalized tones (Nguyễn & Edmondson 1997, Phạm 2003, Michaud 2004), southern dialects make no use of voice quality in tonal contrasts (Brunelle 2009). The second one is that unlike Northern Vietnamese, which has six tones, southern dialects (and many central dialects) have merged the tones C1 (hỏi) and C2 (ngã) and now have a five-tone inventory. This simpler tone system is the reason why this paper focuses on Southern Vietnamese. In citation, the five tones of Southern Vietnamese have the following shapes (Figure 1): tone A1 (ngang) is high-level, tone A2 (huyền) is falling, tone B1 (sắc) is rising, tone B2 (nặng) is low falling-rising and tone C (hỏi-ngã) is high falling-rising. Two additional tones are only found in checked syllables (i.e. syllables closed by voiceless stops) and will not be tested here: D1 (checked sắc) is a rising tone very similar to tone B1 (sắc), while D2 (checked nặng) is a falling tone very similar to tone A2 (huyền).

Figure 1 The five tones of Southern Vietnamese in unchecked syllables (mean speaker z-normalized values obtained from all the words pronounced by the 18 speakers recorded for this study).
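The speaker z-normalization mentioned in the caption of Figure 1 can be sketched as follows. This is only an illustration under stated assumptions: the function name, the data layout (a dictionary mapping a speaker label to a list of f0 values in Hz) and the use of the sample standard deviation are choices made for the sketch, not details reported in the paper.

```python
from statistics import mean, stdev


def z_normalize_by_speaker(f0_by_speaker):
    """Convert each speaker's f0 values to z-scores computed from that
    speaker's own mean and (sample) standard deviation."""
    normalized = {}
    for speaker, values in f0_by_speaker.items():
        mu = mean(values)
        sigma = stdev(values)  # assumption: sample SD; requires >= 2 values
        normalized[speaker] = [(v - mu) / sigma for v in values]
    return normalized


# Hypothetical f0 values (Hz) for two invented speakers with different ranges.
raw_f0 = {"F01": [200.0, 220.0, 240.0], "M01": [100.0, 110.0, 120.0]}
z_f0 = z_normalize_by_speaker(raw_f0)
```

After normalization, the two invented speakers occupy the same scale even though their raw f0 ranges differ, which is what makes pooling values across speakers meaningful.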

1.3 The Vietnamese lexicon

The native Vietnamese lexicon is largely composed of monosyllabic roots, but these roots are often compounded into polysyllabic words. Monomorphemic polysyllabic words also exist, but are limited to loanwords and ideophones. Assessments of the prevalence of polysyllables in the Vietnamese lexicon vary: Nguyễn (1997: 35) reports that 80% of the lexicon is composed of disyllables, while Trần & Vallée (2009: 232) establish the respective proportions of disyllables and trisyllables in a syllabified lexicon at 49% and 1%. The proportion of polysyllables is certainly much lower in spontaneous speech (especially if one looks at token rather than type frequency), but compounds and polysyllables are nonetheless common and well-integrated into the lexicon and phonology. Since the morphosyntactic structure of compounds has been claimed to affect stress, a typological overview of Vietnamese polysyllables is given before addressing the issue of stress proper.

Native Vietnamese compounds can be grouped into two categories: coordinative and subordinative. Coordinative compounds are made up of two roots belonging to the same lexical category, out of which neither can be treated as a syntactic or semantic head. They typically have a generalized meaning which adds up to more than the sum of their parts. Thus, quần áo [wəŋA2 ʔaːw B1] <pants+shirt> means ‘clothes’. Although most coordinative compounds have a default lexicalized order, this order can be reversed without affecting their core meaning. For instance, áo quần, though much less common and more stylistically marked than quần áo, is acceptable and has the same compositional meaning. Subordinative compounds, on the other hand, are composed of a head and a modifier. As they are rigidly left-headed, they are not reversible. An example is cá heo [ka B1 hɛw A1] <fish+pig> ‘dolphin’, in which the syntactic and semantic head ‘fish’ is modified by heo ‘pig’. Previous researchers have pointed out that there is no clear dividing line between Vietnamese compounds and phrases, be it from phonetic (Ingram & Nguyễn 2006, Nguyễn & Ingram 2007a) or morphosyntactic perspectives (Thomas 1962, Noyer 1998). Frequent compounds seem equivalent to lexicalized phrases.

Reduplicates form a large proportion of the Vietnamese polysyllabic lexicon. One can roughly distinguish three forms of reduplicates. Real reduplicates are formed by applying a productive (or at least common) reduplication template to an existing root (Trương 1883, Emeneau 1951, Thompson 1965). For instance, vui [vuj A1] ‘happy’ can be reduplicated into vui vẻ [vui A1 vɛC] ‘very happy’ and mạnh [man B2] ‘strong’ into mạnh mẽ [man B2 mɛC] ‘very strong’ through a regular rhyme change and a predictable tone alternation.[Footnote 3] The few available strategies for creating real reduplicates can result in forms where the base precedes the reduplicant, like in the examples just given, or in forms where the reduplicant precedes the base, as in mành mạnh [man A2 man B2] ‘strong-ish’. There is also a large set of pseudo-reduplicates, which are reduplicated forms where the reduplicant is lexicalized and cannot be derived from the base through an established morphophonological template. An example is nhút nhát [ɲu D1 ɲaːk D1] ‘very timid’, which is clearly related to nhát ‘timid’, even if there is no regular u~a reduplication pattern in Vietnamese. To this, one can also add a class of false reduplicates, which are actually a type of adjectival compound in which an adjective is followed by a verb, like vui thích [vuj A1 tʰɨt D1] <happy+to like> ‘very happy’.

There are three other types of polysyllabic words. The first type is ideophones, a large class of words that are phonologically similar to reduplicates, but do not share their semantic properties (Brunelle & Lê 2013). Words like lung tung [lu A1 tu A1] ‘disordered, without a clear goal’, are composed of syllables that share phonological properties, but are not derived through clear morphophonological processes and do not have a base (neither lung nor tung mean anything in isolation). The second type, Sino-Vietnamese compounds, has an ambiguous status as these compounds are often not transparent to native speakers (Alves 2009). This is because Sino-Vietnamese monosyllabic roots are often semantically opaque, just like Latin or Greek roots in Western European languages, and because Sino-Vietnamese subordinative compounds are right-headed, contrary to their systematically left-headed native correspondents. For instance, a word like tuần lộc [tuːŋA2 lo D2] <docile+deer> ‘elk, reindeer’ is opaque because the meaning of its roots is unknown to native speakers (except sinologists and zoologists), while a word like quốc ngữ [wɔk D1 ŋɨC] <country+language> ‘Vietnamese script’ is probably semantically transparent to all native speakers due to the high frequency of its roots, but is structurally marked as its headedness does not follow the morphosyntactic rules of native Vietnamese. The last type of polysyllabic word is composed of a few hundred monomorphemic polysyllabic loanwords, mostly borrowed from French (ban công [baːŋA1 ko A1] ‘balcony’ < Fr. balcon; sơ cua [səA1 kuəA1] ‘back-up suitor’ < Fr. roue de secours ‘safety wheel’) (Huỳnh 2008), but also from neighboring languages (Sài Gòn [saːj A2 ɡɔŋA2] name of a city < Khmer pr j nokor [Footnote 4] ‘City of the forest’, Phan Rang [faːŋA1 ɾaːŋA1] name of a town < Cham/Sanskrit Panduranga).

This paper focuses on disyllabic words because they are by far the most common type of polysyllables. There is however no theoretical upper limit on the number of syllables in a word. Native coordinative compounds can have more than two syllables (e.g. anh chị em [an A1 ci B2 ɛm A1] <older brother+older sister+younger sibling> ‘siblings’), subordinative compounds/phrases are recursive (nhà máy điện [ɲa A2 maːj B1 ɗiːŋB2] <house+machine+electricity> ‘power station’) and reduplicates can have up to four syllables in some stylistic contexts (Ngô 1984).

1.4 Stress in Vietnamese

To my knowledge, Vietnamese stress has been described in four, largely incompatible, ways. The first is that there is no word stress; as suggested by Emeneau (1951: 25), ‘[t]here are no stress phenomena to be noted – every word in connected utterance . . . has the same degree of energy as every other’.[Footnote 5] The second view is that there is morphosyntactically-defined word stress. This position was first put forward coherently by Cao (2003 [1978]) in a paper in which he made a set of observations about duration and tone reduction in Vietnamese (apparently following observations made in Thompson 1965). Cao did not fully distinguish sentential prominence and word stress, and described processes that seem to go beyond stress, such as phonological reduction of function words and the lexicalization of high-frequency phrases into compounds. However, he did make explicit claims about headedness and stress in coordinative and subordinative compounds, and these were later expanded and formalized by Ngô (1984: 101). According to Ngô, all lexical words are stressed, but (i) coordinative compounds only have heads and therefore receive stress on each of their syllables, (ii) subordinative compounds receive stress on their non-head, which means that native subordinative compounds, being left-headed, should be stressed on their second syllable while Sino-Vietnamese compounds, being right-headed, should be stressed on their first syllable, (iii) reduplicates receive stress on their base, and (iv) non-Sino-Vietnamese loanwords are stressed on their initial syllable. Ngô’s extension of Cao's proposal to Sino-Vietnamese and non-Sino-Vietnamese loanwords is a radical move, but the general idea that stress is related to headedness remains influential among Vietnamese linguists.

The third position is that Vietnamese stress is always word-final. This is the position adopted by Nguyễn & Ingram, in a series of papers that constitute the only experimental evidence about Vietnamese stress (Nguyễn 2010, Nguyễn & Ingram 2007a, b). In a study of disyllabic head-final reduplicates, Nguyễn & Ingram (2007b) found that first syllables are shorter and have more centralized vowels and more reduced f0 contours than second syllables, but that their intensity tends to be stronger; spectral tilt differences independent of tone were not found. Nguyễn & Ingram obtained similar results in a related study of disyllabic coordinative compounds: their second syllable is prominent in terms of F1 and duration, but their first syllable has a stronger intensity unaccompanied by any spectral tilt difference (Nguyễn & Ingram 2007a, Figure 5).[Footnote 6] Based on these results, they proposed that Vietnamese disyllables bear second-syllable stress and conservatively concluded that ‘there was clear acoustic evidence that Vietnamese disyllabic word forms are not symmetrical in terms of accentual prominence, but “right-headed” or biased in weight toward the second element’ (Nguyễn & Ingram 2007a: 1757). However, as they acknowledge, the fact that their target words are always phrase-final makes it difficult to determine if second syllable prominence is really due to accentual prominence or if it is a consequence of phrase-final lengthening. In a more recent study, Nguyễn adopted a more formal analysis and proposed, based on an investigation of nonce place names of up to six syllables (e.g. La-na, La-na-ma, La-na-ma-ra, La-na-ma-ra-ga, La-na-ma-ra-ga-nha), that ‘polysyllabic words in Vietnamese tend to be parsed into bi-syllabic iambic feet with a rightward or retrograde [i.e. alternating] rhythmic pattern’ (Nguyễn 2010: 25). Unfortunately, results suggest that the task may have been too awkward to yield natural speech: quadrisyllabic nonce words, which match the longest attested monomorphemic words of Vietnamese (pê ni xi lin [pe A1 ni A1 si A1 lin A1] ‘penicillin’, phô tô cóp pi [fo A1 to A1 kɔp D1 pi A1] ‘photocopy’), have an abnormally long average duration of about 1200 ms, while hexasyllabic nonce words reach an alarming 2000 ms. Nguyễn's (2010) claim about the alternating nature of Vietnamese stress thus seems bold, but Nguyễn & Ingram's (2006, 2007a, b) more general proposal that stress is word-final or iambic must be considered seriously, especially since it was made independently by other researchers (Trần 1967, Phạm 2008).

The fourth position is that stress is a phrasal phenomenon. This view was adopted, explicitly or not, by several authors who noticed significant phrase-final lengthening in Vietnamese (Thomas 1962; Thompson 1963, 1965; Hoáng & Hoáng 1975; Cao 2003 [1978]). Some of its proponents deem that this final lengthening is accompanied by recursive iambic phrasal stress (Thomas 1962) or additional morphosyntactic stress (Thompson 1963, 1965; Cao 2003 [1978]). A recent version of this proposal has been put forward by Schiering, Bickel & Hildebrandt (2010), based on Thompson's (1965) examples and on a loose reinterpretation of Nguyễn & Ingram's (2006) conclusions. Schiering et al.’s (2010) claim is that Vietnamese has neither prosodic word nor word stress, but rather very short phrases (of an undefined nature) with final stress.

In order to determine which of these accounts is the most appropriate, an experimental investigation of Vietnamese stress was conducted. It was designed to determine if there is any form of prominence in Vietnamese disyllables and to assess if such prominence is caused by phrase-final effects or by word stress. Various types of compounds were recorded to see if their morphosyntactic structure affects prominence.

2 The study

Two rounds of recording sessions were conducted with native Southern Vietnamese speakers. In the first one, a word list designed to determine if the second-syllable prominence found in previous research was due to word stress or phrase-final lengthening was recorded. In order to assess the possible effect of morphosyntactic structure on stress, the word list included several lexical categories, following the typological sketch in Section 1.3 and the predictions reported in Section 1.4 above. The second round was based on two word lists meant to further investigate the role of morphosyntactic structure and to determine if disyllables under focus have any special prominence, and if one of their syllables becomes more prominent than the other, which could be interpreted as indirect evidence for stress.
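The logic of the first round can be illustrated with a small sketch. If second-syllable prominence reflects word stress, the second syllable should be longer than the first in every phrasal position; if it reflects phrase-final lengthening, the asymmetry should appear only phrase-finally. The code below is a simplified illustration of that reasoning, not the analysis carried out in this study: the function name, the token layout and the durations are all hypothetical, and a plain mean of ratios stands in for proper statistical modelling.

```python
from statistics import mean


def mean_duration_ratio_by_position(tokens):
    """Return the mean second-to-first syllable duration ratio for each
    phrasal position.

    tokens: iterable of (position, syll1_ms, syll2_ms) tuples, where
    position is a label such as "medial" or "final".
    """
    ratios = {}
    for position, syll1_ms, syll2_ms in tokens:
        ratios.setdefault(position, []).append(syll2_ms / syll1_ms)
    return {position: mean(values) for position, values in ratios.items()}


# Hypothetical durations (ms), invented to show a pattern compatible
# with phrase-final lengthening and no word stress.
tokens = [
    ("medial", 100.0, 100.0),
    ("medial", 120.0, 120.0),
    ("final", 100.0, 150.0),
    ("final", 100.0, 130.0),
]
ratio_by_position = mean_duration_ratio_by_position(tokens)
```

In this invented data set the ratio is 1 phrase-medially and well above 1 phrase-finally, the pattern one would expect if the second-syllable prominence reported in earlier work were a phrasal effect.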

2.1 Method

2.1.1 Materials

Three word lists containing Vietnamese disyllabic words framed in variable sentences were constructed in collaboration with a linguist in Hồ Chí Minh City. They were then double-checked with two graduate students in linguistics and two naïve speakers (see Appendix A). All collaborators were native speakers of Southern Vietnamese (one was bidialectal and also spoke Northern Vietnamese). The primary aim of the word lists was to present target words in a context where they were not easily identifiable to prevent participants from focalizing or rephrasing them. Special effort was also put into keeping sentences semantically and syntactically as natural as possible to favor a reading style close to spontaneous speech. The target words and frame sentences were not necessarily frequent or preferred in colloquial speech, but they were well-formed and participants were able to read them without hesitating or stumbling. An underlying assumption of this type of data is that a fluent reading style should not exhibit marked differences with spontaneous speech in terms of word stress placement.

The first word list consisted of disyllabic words chosen to test the stress patterns of words with different morphosyntactic structures (Sections 1.3 and 1.4) and the role of phrasal position. Five disyllabic words were chosen for each morphosyntactic category. Examples of each of these categories are given in Table 1.

Table 1 Types of disyllables included in the first word list.

In order to keep tonally-conditioned f0 variation relatively constant, all native subordinative compounds and loanwords (1b and 1c in Table 1) were composed of syllables with identical tones, and each subgroup contained one token of each of the five Southern Vietnamese tones. As it was not possible to find lexicalized coordinative compounds (1a and 1aʹ in Table 1) with perfectly controlled tone combinations, a decision was made to choose a set of five frequent and easily reversible words. Reduplicates (2a, 2b and 2c in Table 1) were derived from five base adjectives with each of the five tones. In order to keep the word list reasonably short, reduplicates were only tested in phrase-medial position.

A second round of recordings was organized to control more systematically for different patterns of headedness in loanwords (category 1c, loanwords, was broken down into Sino-Vietnamese and non-Sino-Vietnamese loanwords) and to determine if any type of prominence arose under focus. To that effect, a second word list was put together (Table 2). Again, each word type included words with each of the five Southern Vietnamese tones (their two syllables bore the same tone), except for the four monomorphemic disyllabic non-Sino-Vietnamese loanwords, which only bore the high-level tone A1 (ngang) and the low-falling tone A2 (huyền) (other tones are rare in such loanwords). The second word list was composed of naturalistic frame sentences similar to those used for the words in Table 1, but target words were only inserted in phrase-medial position.

Table 2 Types of disyllables included in the second word list.

A third word list was built to determine if corrective focus is realized on a particular syllable in phrase-medial disyllables. It comprised two target words for each of the eight categories in Tables 1 and 2, except category 2c (false reduplicates). When available, words bearing the level tone A1 (ngang) and the rising tone B1 (sắc) were selected. Otherwise, words with A2 (huyền) were chosen. In the third word list, target frame sentences were interspersed with contextual prompts designed to elicit natural corrective focus. An example is given in (1).

  1. (1)

Note that, because of the structure of the Vietnamese lexicon, it was impossible to build a word list in which consonants and vowels were perfectly controlled. Segmental effects were therefore dealt with statistically (see description of random effects in Section 2.1.5 below), but a brief analysis of vowel formants is given in Section 2.2.5.

2.1.2 Participants

All participants were native speakers of Southern Vietnamese born and raised in Hồ Chí Minh City or the Mekong Delta. They were all aged between 18 and 26 years at the time of the recordings and were students in fields other than linguistics. In the first round of recordings, eight participants (four female, four male) recorded the first word list. In the second round of recordings, 10 different speakers (five male, five female) recorded the second and third word lists.

2.1.3 Procedure

All recordings were made in a soundproof booth in Hồ Chí Minh City. Sessions were recorded as uncompressed 44.1 kHz wav files with a Neumann TLM-102 condenser microphone. In the two recording sessions, word lists were read five times by each participant.

In order to maximize speech naturalness, participants were given a copy of the relevant randomized word list 10 minutes before recording sessions and were asked to read it once. As a result, recordings were produced fast enough not to sound hyperarticulated, but some sentences were read inaccurately (the most common error being that rare reversed coordinative compounds were sometimes reversed to their more frequent counterparts; e.g. cỏ cây ‘vegetation’ was read as cây cỏ). Tokens affected by serious reading disfluencies were discarded.

In the first round of recordings, participants only read the first word list. In the second round of recordings, participants first read the second word list and then the third one. To ensure that participants produced the intended corrective focus in the third word list, an experimenter (a female native speaker of Southern Vietnamese) read a contextualization prompt before the participants produced each sentence (see (1) above and Appendix A).

2.1.4 Acoustic measures

Acoustic properties known to play a role in stress were measured in each of the two syllables of the target words (using Praat 5.3.61). These properties, which mirrored those measured by Nguyễn & Ingram (2007b), were syllable duration, mean rhyme intensity, mean rhyme f0 (mean of five equidistant sampling points in the rhyme), and the first and second formants at the midpoint of the vowel nucleus.
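The mean rhyme f0 measure can be illustrated with a minimal sketch. It is not the Praat script used in the study: the function names, the representation of an f0 track as (time, Hz) pairs, the linear interpolation between track points, and the inclusion of the rhyme edges among the five sampling points are all assumptions made for illustration.

```python
def equidistant_times(start, end, n=5):
    """Return n equally spaced times from start to end (edges included; an assumption)."""
    return [((n - 1 - i) * start + i * end) / (n - 1) for i in range(n)]


def interpolate(track, t):
    """Linearly interpolate a hypothetical f0 track [(time, hz), ...] at time t."""
    for (t0, f0), (t1, f1) in zip(track, track[1:]):
        if t0 <= t <= t1:
            return f0 + (f1 - f0) * (t - t0) / (t1 - t0)
    raise ValueError("time outside the f0 track")


def mean_rhyme_f0(track, rhyme_start, rhyme_end, n=5):
    """Mean of f0 sampled at n equidistant points within the rhyme."""
    times = equidistant_times(rhyme_start, rhyme_end, n)
    return sum(interpolate(track, t) for t in times) / n


# Hypothetical rising contour over a rhyme lasting from 100 ms to 300 ms.
rising = [(100, 200.0), (300, 240.0)]
print(mean_rhyme_f0(rising, 100, 300))
```

Because the mean is taken over only five points, it summarizes the overall f0 level of the rhyme and not the shape of its contour.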

Spectral tilt is reported in Nguyễn & Ingram (2007b), but was excluded in this study because proper experimental controls for this cue would have required unreasonably long word lists. In any case, as vowel quality and tone, two factors known to have an important effect on spectral tilt, were not controlled for in Nguyễn & Ingram's study, comparison of the results would not have been possible.

If there is stress in Vietnamese, significant asymmetries in duration, intensity and vowel formants (and possibly a marginally higher f0) should be expected, following the cross-linguistic tendencies described in Section 1.1. Based on previous work on stress in East Asian tone languages, a wider f0 range could also be expected in stressed syllables.

2.1.5 Statistical analysis

The data was analyzed using linear mixed models implemented in SPSS Statistics 23. The dependent variables were the acoustic measures just discussed (syllable duration, mean rhyme intensity, mean rhyme f0, F1 and F2 at the midpoint of vowel nuclei, and f0 range). The fixed effects included in at least one of the models presented in the results section, along with their levels, were:

  • Tone: A1 (ngang), A2 (huyền), B1 (sắc), B2 (nặng), C (hỏi-ngã) and D1 (checked sắc)

  • Word type: As listed in Tables 1 and 2 above

  • Position: Phrase-medial vs. phrase-final

  • Syllable position (Syll#): First or second syllable of the word

  • Information structure (IS): Focused or unfocused

  • All relevant interactions

The structure of fixed effects for specific models depended on the subset of the data included in a given analysis. More detail is provided for each model in the results section (Section 2.2). The random effects included in the models were always speaker and item (i.e. syllables). Following the recommendations of Barr et al. (2013), the random effect structure of the models was kept maximal: besides random intercepts for speaker and item, uncorrelated random slopes were included for each fixed effect included in a model, but not for interactions (slopes were uncorrelated because there was no a priori linguistic interpretation of eventual correlations; random slopes of interactions were excluded because models including them could not converge). As item is normally found in a single Word type and, in most models, in a single Syllable position, random slopes for combinations of these fixed and random factors (Wordtype|item, Syll#|item) were excluded.

Maximal models were simplified by dropping fixed effects if doing so yielded a significantly lower Akaike information criterion (AIC) score. Interactions were dropped before main effects, and main effects were not dropped if they contributed to a significant interaction.
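The logic of this simplification step can be sketched as follows. The AIC formula is standard (twice the number of estimated parameters minus twice the maximized log-likelihood), but the threshold `min_drop`, the function names and the example values are hypothetical: the criterion used to decide that an AIC score was "significantly lower" is not specified here, and the value of 2 below is only a common rule of thumb.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def prefer_reduced(full, reduced, min_drop=2.0):
    """Return True if dropping a term lowers AIC by at least min_drop.

    full and reduced are (log_likelihood, n_params) pairs.
    min_drop is an illustrative threshold, not the study's criterion.
    """
    return aic(*full) - aic(*reduced) >= min_drop


# Hypothetical fits: dropping an interaction removes two parameters
# and barely changes the log-likelihood, so the reduced model is kept.
full_model = (-1000.0, 10)
reduced_model = (-1000.5, 8)
print(prefer_reduced(full_model, reduced_model))
```

In a backward procedure of this kind, the comparison would be repeated term by term, interactions first, and a main effect would be left in place whenever it takes part in a retained interaction.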

As discussed in Section 2.2 below, most of the effects tested in the statistical models turned out not to be significant. To ensure that these null results were not due to an inadequate experimental design, post-hoc power analyses were conducted to determine if the experimental samples and statistical models used in this study were able to detect effects of a magnitude typically found in stress systems (Gelman & Hill 2007, Kirby & Sonderegger 2016). Scripts were adapted from Snow (2009) to simulate datasets closely mirroring the data recorded in the first word list, and these simulated data were analyzed with models following the same structure as the six maximal models presented in Sections 2.2.1 and 2.2.2. Footnote 7 The random variance and errors included in the simulation were obtained from the real models. A thousand simulations were conducted for each of the six models, with datasets assuming fixed iambic stress and morphosyntactically-conditioned stress. With an intercept of 215 ms and a final lengthening of 150 ms, duration models were able to detect a 10 ms duration effect at p < .05 more than 99% of the time. Similarly, with an intercept of 66 dB, mean intensity models were able to detect a 1 dB effect at p < .05 more than 99% of the time. Finally, assuming an intercept of 200 Hz, mean rhyme f0 models were able to detect a 5 Hz effect at p < .05 more than 99% of the time. The simulations further suggested that models were robust even when dealing with noisy data: when the random variance and errors inferred from the actual data were doubled, power was still above 0.8 for all models. As the magnitude of the effects tested in the simulations was significantly smaller than that attested in stress systems (recall Section 1.1 above), one can be confident that the null results reported below are not due to insufficient statistical power.
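The principle behind a simulation-based power analysis can be sketched in a few lines. This is a deliberately simplified stand-in, not the adapted scripts described above: it replaces the mixed models with a large-sample z-test on per-token differences between the second and first syllable, and the noise level (`sd`) and number of tokens (`n_tokens`) are hypothetical values chosen for illustration.

```python
import random
from statistics import NormalDist, mean, stdev


def simulate_power(effect, sd, n_tokens, n_sims=1000, alpha=0.05, seed=1):
    """Proportion of simulated datasets in which the effect is detected.

    Each dataset holds n_tokens syllable differences (e.g. in ms) drawn
    from a normal distribution centred on the assumed true effect.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    detected = 0
    for _ in range(n_sims):
        diffs = [rng.gauss(effect, sd) for _ in range(n_tokens)]
        z = mean(diffs) / (stdev(diffs) / n_tokens ** 0.5)
        if abs(z) > z_crit:
            detected += 1
    return detected / n_sims


# Power to detect a 10 ms asymmetry, and false-positive rate with no effect.
print(simulate_power(effect=10.0, sd=20.0, n_tokens=200))
print(simulate_power(effect=0.0, sd=20.0, n_tokens=200))
```

The second call is a sanity check: with no true effect, the detection rate should stay close to the nominal alpha level. Doubling `sd` mimics the robustness check in which the variance inferred from the data was doubled.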

2.2 Results

2.2.1 Acoustic prominence in compounds

As Native Coordinative Compounds and Native Coordinative Reversed Compounds (1a and 1aʹ in Table 1) were composed of the same syllables in opposite order, they offered a naturally controlled environment to look at the effect of factors like the position of a disyllable in the sentence (medial vs. final) and the order of a syllable in a disyllable (first or second). Mixed models were fitted on all tokens of these two types of compounds to look for asymmetries between their syllables (dependent variables: syllable duration, mean rhyme intensity, mean rhyme f0, F1, F2; fixed effects before model simplification: Syllable position, Position, Word type and all interactions).

Since only three fixed effects were significant in the five models, they are not given here but are instead reported in Appendix B (Tables A1–A5). The first significant effect is that the duration of phrase-final syllables (Syll#2*PositionFinal) was longer than that of other syllables (+41%), as shown in the top panel of Figure 2. Phrase-final syllables also had a significantly lower mean rhyme intensity (Syll#2*PositionFinal), but this effect was small (–0.47 dB) and is difficult to disentangle from syllable-specific effects in the mid panel of Figure 2. Finally, F1 was raised (+66 Hz) in phrase-penultimate syllables (PositionFinal), but this last effect is again blurred by syllable-specific effects (Syll#2*PositionFinal) in the bottom panel of Figure 2 (and is further discussed in Section 2.2.5). No significant effects were found for F2 and mean rhyme f0, and the order of syllables and type of coordinative compounds were never significant as main effects. As no differences between Native Coordinative compounds or their reversed counterparts were uncovered, they are merged as a single group in the rest of this section.

Figure 2 Speaker z-normalized duration, mean rhyme intensity and F1 at the midpoint of nuclei for native coordinative and reversed coordinative compounds, in sentence-medial and sentence-final positions and in the first and second position of disyllables. The four compounds are tìm kiếm [tim A2 kiːm B1] [to search+to find] ~ kiếm tìm ‘to look for’, quần áo [wəŋ A2 aːw B1] [pants+shirt] ~ áo quần ‘clothes’, cây cỏ [kɛj A1 kɔ C] [tree+grass] ~ cỏ cây ‘vegetation’ and đói nghèo [ɗɔj B1 ŋɛw A2] [hungry+poor] ~ nghèo đói ‘to live in hardship’. (A fifth pair, bàn ghế [ɓaːŋ A2 ɡe B1] [table+chair] ~ ghế bàn ‘furniture’, had to be excluded: ghế bàn was not recorded due to an error in the word list.)
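Speaker z-normalization, as used for the values plotted in Figure 2, can be sketched as follows. The function name, the input layout and the use of the sample standard deviation are assumptions; the exact normalization procedure is not detailed in the text.

```python
from collections import defaultdict
from statistics import mean, stdev


def z_normalize_by_speaker(tokens):
    """Convert (speaker, value) pairs to z-scores computed within each speaker."""
    by_speaker = defaultdict(list)
    for speaker, value in tokens:
        by_speaker[speaker].append(value)
    # Per-speaker mean and sample standard deviation (an assumption).
    stats = {s: (mean(v), stdev(v)) for s, v in by_speaker.items()}
    return [(value - stats[s][0]) / stats[s][1] for s, value in tokens]


# Hypothetical durations (ms) from two speakers with different speech rates.
tokens = [("S1", 200.0), ("S1", 220.0), ("S1", 240.0),
          ("S2", 100.0), ("S2", 110.0), ("S2", 120.0)]
print(z_normalize_by_speaker(tokens))
```

Expressing each value relative to its speaker's own mean and spread removes overall differences between speakers (for instance in speech rate or pitch register), so that values from different speakers can be pooled in a single plot.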

In order to determine if these results extend to other types of disyllabic words, mixed models were fitted on all non-reduplicate words recorded in the first word list. Models were fitted for syllable duration (Table 3), mean rhyme intensity (Table 4) and mean rhyme f0 (Table 5).Footnote 8 In the maximal models, the fixed factors were the position of the syllable in the word (Syllable position), the position of the word in the sentence (Position), the type of word (Word type), and relevant interactions (Syllable position*Position, Syllable position*Word type). Three types of words (Word type) were included in these models: Native Coordinative Compounds, Native Subordinative Compounds and Loanwords (respectively 1a/1aʹ, 1b and 1c in Table 1).

Table 3 Estimates of fixed effects on syllable duration in non-reduplicates (r² = 0.89). Reference category: First syllable of phrase-medial native coordinative compounds. Bold marks significant fixed effects.

Table 4 Estimates of fixed effects on mean rhyme intensity in non-reduplicates (r² = 0.91). Reference category: First syllable of phrase-medial native coordinative compounds.

Table 5 Estimates of fixed effects on mean rhyme f0 in non-reduplicates (r² = 0.90). Reference category: First syllable of phrase-medial native coordinative compounds.

By and large, these models confirmed what was found in Native Coordinative Compounds. The best model for syllable duration only included the main effects Syllable position, Position and their interaction (Table 3). The only significant effect in this model was an important lengthening (+70%) of the second syllable of disyllabic compounds in phrase-final position (Syll#2*PositionFinal). Otherwise, there was no evidence that either of the two syllables was more prominent than the other. The best models for mean rhyme intensity (Table 4) and mean rhyme f0 (Table 5) had a more complex structure, but no significant fixed factors.

To summarize, results obtained from compounds and loanwords were limited to phrase-final effects. A strong phrase-final lengthening was found in all word types investigated so far, accompanied by a weak phrase-final drop in mean rhyme intensity in native coordinative compounds, but not in native subordinative compounds and loanwords. A possible phrase-penultimate increase in F1 was also uncovered in native coordinative compounds and is investigated more systematically in Section 2.2.5. No effect in mean rhyme f0 or F2 was detected. Importantly, there was no evidence of a phonetic asymmetry attributable to word stress. Furthermore, inspection of random effects revealed no important difference between speakers; for example, in the models presented in Tables 3, 4 and 5, the variance estimates of the random slope Syllable by speaker, which is arguably the most important for detecting word stress, are marginal (0.00005 s, 0.01 dB and 13.16 Hz, respectively).

2.2.2 Acoustic prominence in reduplicates

The behavior of reduplicates was explored to determine if they showed evidence for stress, and if the results obtained in Nguyễn & Ingram (2006, 2007b) could be replicated. This was done by fitting mixed models on all phrase-medial words included in the first word list. The compounds and loanwords analyzed in the previous section (Section 2.2.1), which were not statistically different, were pooled into a single category, Non-Reduplicates, and compared with three types of reduplicates (Word type): Head-Initial Reduplicates, Head-Final Reduplicates and (head-initial) False Reduplicates (2a, 2b and 2c in Table 1).

Models were fitted for syllable duration (Table 6), mean rhyme intensity (Table 7) and mean rhyme f0 (Table 8). In maximal models, the fixed factors were the order of the syllable in the word (Syllable position), the type of word (Word type) and their interaction.

Table 6 Estimates of fixed effects on syllable duration in phrase-medial words (r² = 0.64). Reference category: First syllable of disyllabic non-reduplicates.

Table 7 Estimates of fixed effects on mean rhyme intensity in phrase-medial words (r² = 0.68). Reference category: First syllable of disyllabic non-reduplicates.

Table 8 Estimates of fixed effects on mean rhyme f0 in phrase-medial words (r² = 0.90). Reference category: First syllable of disyllabic non-reduplicates.

The best duration model included the two fixed effects, but not their interaction (Table 6). The lack of significance of the two fixed effects suggests that, as for other word types (Section 2.2.1), there may be no durational prominence in reduplicates.

The best models for mean rhyme intensity (Table 7) and mean rhyme f0 (Table 8) were more complex, but still yielded no statistically significant results. Altogether, this suggests that there is no salient syllable in Southern Vietnamese reduplicates.

2.2.3 Focus

No evidence for word stress was uncovered in Southern Vietnamese native compounds, loanwords and reduplicates. However, there is a possibility that one of the two syllables of disyllables may only reveal its prominence in special conditions, by acting, for instance, as an anchor for focus. The data in the second and third word lists were therefore analyzed to see if there is any asymmetry between syllables when words are recorded under corrective focus. As already explained in Section 2.1.1, the coarse loanword category tested in Sections 2.2.1 and 2.2.2 (1c in Table 1 above) was broken down into three subgroups to further test the hypothesis that stress may be morphosyntactically conditioned; these subgroups were coordinative Sino-Vietnamese compounds, subordinative Sino-Vietnamese compounds and non-Sino-Vietnamese disyllabic loanwords (see Table 2 above). Mixed models were fitted for the syllable duration, mean rhyme intensity and mean rhyme f0 of all the words of the second word list and their focused counterparts in the third word list. The fixed factors included in maximal models were Syllable position, Information structure, Word type and all their interactions.

The best model for syllable duration is given in Table 9. The only significant fixed factor in this model was Information structure, indicating that both syllables of words under focus were longer than corresponding syllables in non-focal condition. The amount of lengthening in focused syllables was limited (10 ms), but inspection of random effects revealed it to be consistent in all speakers; the variance estimate for the random effect of Information structure by speaker was 0.00003 s.

Table 9 Estimates of fixed effects on syllable duration (r² = 0.69). Reference category: First syllable of unfocused disyllabic loanwords. Bold marks significant fixed effects.

The best model for mean rhyme intensity (Table 10) kept its maximal structure, but had no significant fixed effect. There was no evidence of intensity being used to mark focus by any of the speakers, as shown by the small variance estimate for the random effect of Information structure by speaker (0.23 dB).

Table 10 Estimates of fixed effects on mean rhyme intensity (r² = 0.80). Reference category: First syllable of unfocused disyllabic loanwords.

The best model for mean rhyme f0 (Table 11) had two significant fixed effects. First, Sino-Vietnamese subordinative compounds had a lower mean rhyme f0 than other loanwords (WordtypeSSub); Sino-Vietnamese coordinative compounds also approached significance. Second, the same Sino-Vietnamese subordinative compounds had a boosted mean rhyme f0 under focus (ISFocus*WordtypeSSub). There is no reason why a specific category of loanwords would have an overall lower mean f0, or a special behavior under focus: in fact, a close look at the data suggests that these results were an artefact of the experimental design and were caused by unexpected discrepancies in focus realization strategies across speakers and tones.

Table 11 Estimates of fixed effects on mean rhyme f0 (r² = 0.93). Reference category: First syllable of unfocused disyllabic loanwords. Bold marks significant fixed effects.

To understand this, it must first be pointed out that three speakers out of 10 did not use f0 for focus marking, but relied exclusively on lengthening, as shown in Figure 3 below (the large variance estimate for the random effect of Information structure by speaker, 83.59 Hz, confirmed this interspeaker variation). However, what is crucial here is that the seven speakers who did mark corrective focus with an f0 boost did not apply this strategy equally to all tones. As is again shown in Figure 3, while they boosted f0 in the rising tone B1 (sắc), and to some extent, in the high-level tone A1 (ngang), they did not raise the f0 of the falling tone A2 (huyền).

Figure 3 Mean speaker z-normalized f0 of three tones in focused and unfocused conditions, for two groups of speakers exhibiting different behaviors.

This pattern provides an explanation for the two significant effects found in the model in Table 11: they were caused by differences in the distribution of the tones in the word list. Three out of the four non-Sino-Vietnamese loanwords bore the high-level tone A1 (ngang) (mirroring its natural prevalence in these loanwords), while Sino-Vietnamese compounds could bear all five tones. This over-representation of the high-level tone in non-Sino-Vietnamese loanwords resulted in their mean rhyme f0 being higher, explaining the significant effect of Word type. In the subset of words that were focused, on the other hand, the Sino-Vietnamese loanwords all bore the high-level tone A1 (ngang) or the rising tone B1 (sắc), two tones raised under focus. By contrast, the non-Sino-Vietnamese loanwords bore either the high-level tone A1 (ngang) or the falling tone A2 (huyền). Since the latter does not undergo an f0 boost, the non-Sino-Vietnamese loanwords included in the model had on average a lower mean f0 under focus.

The main result of this section is that there was no evidence that different types of loanwords (coordinative and subordinative Sino-Vietnamese, non-Sino-Vietnamese) behave differently, casting more doubt on the existence of morphosyntactically-conditioned stress. Moreover, even if the third word list had distributional limitations, a comparison of words recorded in focused and unfocused conditions showed that under corrective focus, the two syllables of disyllabic words systematically undergo the same acoustic changes. That said, some variability in focus realization strategies was encountered: while syllable lengthening was found in all speakers and all tones, f0 raising was only found in seven speakers out of 10, and failed to apply to the falling tone A2 (huyền).

2.2.4 Possible effect of stress on the f0 of individual tones

The mean rhyme f0 models presented so far rest on the assumption that all tones are similarly raised or lowered when (if) stressed. However, there is a possibility that under stress, individual tones behave differently, as they do under focus (Section 2.2.3). An expansion of the f0 range could for instance be implemented by raising high tones and lowering low tones. This possibility was evaluated by refitting the three mean rhyme f0 models presented above (see Tables 5, 8 and 11) with Tone and its interactions (except Tone*Word type) as additional fixed effects. Refitted models are reported in Appendix C (Tables A6, A7 and A8, respectively). In all of them, the main effect Tone was significant, which merely confirms that different tones have different mean f0s. As expected, the significance of other main effects closely matched that of the original models.

Less trivial is the fact that the large majority of interactions of Tone with other fixed factors were not significant, casting doubt on the existence of tone-specific stress realization strategies. There are two exceptions to this generalization. First, in the model refitted on all non-reduplicates in medial and final positions (Table A6), tone C (hỏi-ngã) had a higher mean rhyme f0 when sentence-final than when sentence-medial (last box of the second row of Figure 4). This can be interpreted as a consequence of phrase-final lengthening: due to the longer duration of the tone-bearing syllable in sentence-final position, the complex falling-rising tone C (hỏi-ngã) had time to rise to its maximum, something that it failed to do sentence-medially. Second, in the refitted focus model (Table A8), tone A2 (huyền) had a lower mean rhyme f0 in the second syllable of a word (second box of the top row of Figure 4). As this effect makes little linguistic sense and has no parallel in the other two models refitted with Tone (Tables A6 and A7), I will not attempt to interpret it here (given the 20 fixed effects tested in the model, this may very well be a spurious effect due to multiple comparisons).
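The multiple-comparisons concern can be made concrete with a short calculation. It assumes that the 20 tests are independent and each carried out at an alpha level of .05, which is a simplification (fixed effects within one model are generally not independent), and the function name is introduced here for illustration only.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive among n independent tests."""
    return 1 - (1 - alpha) ** n_tests


# With 20 fixed effects tested at alpha = .05, the chance that at least
# one reaches significance by accident is about 0.64.
print(round(familywise_error_rate(20), 2))
```

Under these assumptions, a single isolated significant interaction among 20 tested effects is more likely than not to arise by chance, which is consistent with treating the A2 (huyền) effect as spurious.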

Figure 4 Speaker-normalized f0 contours of the five unchecked tones (headers) in the two syllables of disyllables and in two phrasal positions (top: medial, bottom: final). Means of all recorded data.

The near absence of significant interactions of Tone with other independent variables suggests that there is no overall expansion of the f0 range (or any other tone-specific effects) in positions that could be assimilated to a stress environment.

2.2.5 Formants

For reasons discussed in Section 2.1.1, word lists perfectly balanced for vowel quality could not be constructed; vowel quality was therefore not included as a factor in the study, but was instead controlled for by including the random factor item (i.e. the syllable) in mixed models. Nonetheless, there were enough tokens of each vowel to look at formant frequencies in the entire dataset by focusing on factors that were significant in previous models (Syllable position, Position, Information structure) without breaking down the data by Word type, a factor that was not significant so far.

Models were fitted on F1 and F2 measured at the midpoint of the vowel nucleus. The best models included a large number of parameters, in part because of the 12 different vowel nuclei found in the word list /iː i u uː e o ɛ əː ə ɔ aː a/. The best F1 model (r² = 0.767) included all possible main factors and interactions, except Syll#*IS*vowel, and thus had 61 parameters. In the best F2 model (r² = 0.593), no factor or interaction could be dropped, yielding 65 parameters. Rather than trying to present these excessively large models here, results are illustrated by plotting vowel charts of their estimated marginal means. These charts are grouped together in Figure 5.

Figure 5 Estimated Marginal Means for the first and second formants of each vowel found in the corpus, in different conditions. Pairs of vowels surrounded by an ellipse have statistically different F1 at p < .01. Pairs of vowels framed in a box have statistically different F2 at p < .01. Pairs of vowels surrounded by a dotted ellipse have statistically different F1 and F2 at p < .01. Other vowel pairs are not statistically different.

In non-focused phrase-medial position (lower-left panel of Figure 5), four vowels had significantly different mean formants in the first and second syllables: /iː/ and /aː/ had a higher F2 when they were in the second syllable of a disyllabic word than when they were in its first syllable, while /aː/ and /ə/ had a lower F1 in the second syllable. /o/ had a higher F1 in the second syllable, but this is probably a spurious effect: the only word that had /o/ in initial syllable, công ty [ko ti] ‘company’, was systematically produced with a much lower vowel than expected. In short, there were some statistically significant differences between the two syllables, but they could not be interpreted as an expansion or a contraction of the vowel space, and are thus difficult to attribute to stress.

Looking at non-focused words in phrase-final position (lower-right panel), there was, like in Nguyễn & Ingram (2007a), a general tendency for F1 to be lowered in the second syllable that reached statistical significance for the vowels /e ə ɔ/. Nguyễn & Ingram interpreted this F1 lowering as evidence for ‘an articulatory gesture enhancement with larger mouth opening and jaw lowering indicative of stress or prosodic strengthening’ (Nguyễn & Ingram 2007a: 1755). A comparison of second syllable vowels in the lower-right panel with their counterparts in the lower-left panel revealed that the drop in F1 was limited to second syllables in final position, which would favor the prosodic strengthening interpretation over the stress one. Aside from F1 differences, some phrase-final non-focused vowels showed significant, but inconsistent F2 variability: /aː/ and /iː/ had a higher F2 in second syllable, but /i/ and /ə/ went in the opposite direction.

F1 also seemed affected by focus. The vowels of focused words in medial position (top panel) had a significantly lower F1 than their non-focused counterparts (lower-left panel). Furthermore, the significant differences in F1 and F2 between syllables found in unfocused condition (lower-left panel) were not found under focus. The only exception, the significant difference between first and second syllable /o/, could again be attributed to the word công ty ‘company’.

3 Discussion

The acoustic results and statistical analyses presented here yield a relatively simple picture. All other things being equal, none of the acoustic correlates investigated (syllable duration, mean rhyme intensity, mean rhyme f0, F1 and F2, f0 range) show a systematic asymmetry between the first and second syllables of disyllabic words. These null results do not seem to be caused by a lack of statistical power, as the power analyses described in Section 2.1.5 above have shown that the models and samples should be able to detect effects much smaller than those found in previously described stress systems.

One acoustic property that could admittedly be better investigated is spectral tilt, which has been shown to correlate with stress in unaccented syllables in a few languages (Sluijter & Van Heuven 1996, Sluijter et al. 1997, Prieto & Ortega-Llebaria 2006). The word list used in this study does not sufficiently control for segments and tones to allow this type of analysis, but previous studies have found no effect of spectral tilt independent of tone in Vietnamese (Nguyễn & Ingram 2007a, b). Overall, these results strongly argue against the existence of fixed word stress in Southern Vietnamese. As no asymmetry was uncovered in specific word types either, the hypothesis that stress is morphosyntactically-defined should also be reconsidered.

There is, however, a dramatic lengthening of syllables in sentence-final position, and this effect is accompanied by a drop in F1 in some low vowels, possibly because speakers have more time to fully realize jaw opening gestures in longer syllables. Altogether, this suggests that the final prominence that has been previously observed in Vietnamese may exclusively be due to phrase-final lengthening.

How can we reconcile these results with those of Nguyễn & Ingram (2006, 2007a, b), who find a consistent, if weak, second syllable prominence even inside sentences? First of all, in their work on head-final reduplicates and coordinative compounds, duration, formants and f0 are more prominent on the second syllable than on the first, while intensity exhibits the opposite pattern. As these acoustic cues normally correlate in stress systems, this suggests that we are dealing with a form of prominence distinct from stress. As Nguyễn & Ingram (2007a: 1757) carefully put it, ‘it is possible that word stress levels exist only as a phonetic tendency in Vietnamese’. At the same time, they explicitly reject the possibility that the second syllable prominence they find is a pre-boundary effect, on the grounds that second syllables are slightly lengthened, but not their codas (Nguyễn & Ingram 2007a, b). In any case, it is clear that the weak second syllable prominence they uncover, whatever it stems from, does not have the same magnitude as the increased duration found in sentence-final syllables in the current study. As an example, the second syllable of coordinative compounds is about 1.2 times longer than the first in Nguyễn & Ingram (2007a), compared to a ratio of 1.7 in this study.

One possible explanation for the weak final prominence found in both of Nguyễn & Ingram's experiments is that the fixed carrier sentences they used led participants to single out the target word and realize it as its own phrase (square brackets in (2) below). From the examples of carrier sentences given in their papers, it is also conceivable that the words following the target word form a phrase of their own (parentheses in (2)). Both of these hypotheses would make the second syllable of the target phrase-final, but not sentence-final.

  1. (2)

Such a parsing would likely cause some phrase-final lengthening (of hồng in (2)), but it would not be as strong as the dramatic lengthening found in sentence-final environments. This lengthening could in turn account for the increased f0 range and more peripheral formants of second syllables in Nguyễn & Ingram's studies, as increased duration would allow a fuller realization of articulatory targets. If this interpretation were correct, Vietnamese would not have stress, but would exhibit gradient phrase-final strengthening: the right edges of higher prosodic domains would cause more lengthening than the edges of lower domains (Wightman et al. 1992, Yoon, Cole & Hasegawa-Johnson 2007). This obviously requires further investigation.

A last result of this study is that corrective focus seems to be equally marked on the two syllables of disyllabic words. Overall, the strategies used for realizing corrective focus in Vietnamese are similar to those that have been reported in typologically-similar Chinese varieties, where the f0, intensity and duration of the focal syllable are systematically boosted (Xu 1999, Xu, Xu & Sun 2004, Chen, Wang & Xu 2009). While participants did not clearly use intensity for focus marking, they were fairly consistent in realizing focus with an increased duration and a possibly related lower F1 at the midpoint of vowel nuclei. Moreover, seven out of 10 participants marked focus by boosting f0 in the high tones B1 (sắc) and A1 (ngang), while the low falling tone A2 (huyền) remained low. The data collected here is based on too few tones to be easily generalizable and is not entirely comparable with previous work on Northern Vietnamese focus (Michaud 2005, Jannedy 2008, Miller et al. 2015), but it suggests, like previous work, that focus-marking strategies may to some extent be speaker-specific in Vietnamese. This would not be too surprising given that the most common strategies for marking Vietnamese focus are syntactic rather than prosodic (Michaud & Brunelle 2016). It is also worth noting that in the current study, several participants seemed to focalize more than the target word in a non-negligible proportion of their sentences, hinting that prosodic manipulation may not be their most natural focus-marking strategy. In any case, what is crucial here is that in disyllables, focus does not cause one syllable to become more prominent than the other. There is therefore no evidence that a syllable acts as an anchor or an attractor for focal prominence, which could have been interpreted as evidence for covert stress.

Since no evidence in favor of asymmetrical syllable prominence was found, one of the original motivations of this study, finding an East Asian tone language with stress but without syllable reduction, falls flat. Southern Vietnamese appears to be similar to Cantonese and other southern Chinese languages in that it does not have word stress.

4 Conclusion

No evidence for any type of word prominence in Vietnamese was found in this study. When all other factors are kept constant, there is no significant difference in duration, mean intensity, mean f0, formants or f0 range between the two syllables of disyllabic words, irrespective of their morphosyntactic headedness. Furthermore, no syllable gets special prominence when disyllables are under focus. It appears, after all, that the conservative position advocated by Emeneau (1951) may have been the correct one: Vietnamese does not seem to have stress, which would support the claim that some languages have no word stress at all (van der Hulst 2012, Hyman 2014).

On the other hand, the last syllable of a Vietnamese sentence is dramatically lengthened, which agrees with traditional views of Vietnamese sentential rhythm (Emeneau 1951, Hoáng & Hoáng 1975) and reflects a well-documented cross-linguistic tendency. This raises the possibility that the very weak final prominence found in previous studies (Ingram & Nguyễn 2006; Nguyễn & Ingram 2006, 2007a, b) may be caused by the edges of prosodic constituents intermediate between the word and intonational phrase. More work on the nature of such prosodic constituents is needed.

Results also show that corrective focus is systematically realized by means of increased duration and lower F1, but does not cause significant variation in intensity. Focus is also realized by boosting f0 in most, but not in all speakers, and seems to affect higher tones more than others. This partly confirms the possibility that prosodic focus has variable idiosyncratic realizations in Vietnamese (Michaud 2005, Miller et al. 2015).

Acknowledgements

Many thanks to Đào Mục Đích for his help with experimental design and Nguyễn Thụy Nhã Uyên for assisting with recording sessions. I am also grateful to Meng Yang, Mutsumi Oi and Phạm Thị Thu Hà for assisting with data processing. I would also like to thank Hạ Kiều Phương and Martine Grice for stimulating discussions of preliminary results, James Kirby and Alexis Michaud for comments on earlier drafts, as well as three anonymous reviewers and audiences at NAPhC 2014 and SEALS24. This project was funded by the Social Sciences and Humanities Research Council of Canada (SSHRC).

Appendix A. Word lists

First word list

Target words

Reduplicates

Randomized list

  • Nhuộm tóc đã lạc hậu rồi. Bây giờ có một phong trào mới hơn là xăm người.

  • [ɲuːm B2 tɔ D1 ɗa C laːk D2 hɔw B2 ɾoj A2. ɓi A1 jəːA2 kɔB1 mo D2 fɔŋA1 tʃaːw A2 məːj B1 həːŋA1 la A2 sam A1 ŋɨːj A2]

  • Đi Vũng Tàu chán nên nhà Hùng đi chỗ nào vui vui hơn.

  • [ɗi A1 vu C taːw A2 caːŋB1 nen A1 ɲa A2 hu A2 ɗi A1 co C naːw A2 vuj A1 vuj A1 həːŋA1]

  • Ở công viên Tao Đàn có nhiều cây cỏ.

  • [ʔəC ko A1 viːŋA1 taːw A1 ɗaːŋA2 kɔB1 ɲiːw A2 kɛj A1 kɔC]

  • Thời bao cấp nhà Vi nghèo đói lắm.

  • [tʰəːj A2 ɓaːw A1 kəp D1 ɲa A2 vi A1 ŋɛw A2 ɗɔj B1 lam A1]

  • Anh Tuấn nói tiếng Hàn Quốc rành rồi.

  • [an A1 tuːŋB1 nɔj B1 tiːŋB1 haːŋA2 wok D1 ɾan A2 ɾoj A2]

  • Va li quần áo của Giang được xếp gòn gọn hơn va li của Chi.

  • [va A1 li A1 wəŋA2 aːw B1 kuəC jaːŋA1 ɗɨːk D2 sep D1 gɔŋA2 gɔŋB2 həːŋA1 va A1 li A1 kuəC ci A1]

  • Sở thú Sài Gòn có hai con rái cá đực.

  • [səC tʰu B1 saːj A2 ɡɔŋA2 kɔB1 haːj A1 kɔŋA1 ɾaːj B1 ka B1 ɗɨk D2]

  • Trái cây rẻ nhất ở Lạng Sơn là mận hậu.

  • [tʃaːj B1 kɛj A1 rɛC ɲək D1 ʔəC laːŋB2 səːŋA1 la A2 məŋB2 hɔw B2]

  • Va li quần áo của Giang được xếp gọn hơn va li của Chi.

  • [va A1 li A1 wəŋA2 aːw B1 kuəC jaːŋA1 ɗɨːk D2 sep D1ɡɔŋB2 həːŋA1 va A1 li A1 kuəC ci A1]

  • Ở khách sạn đó, phòng nào cũng có ban công rộng.

  • [ʔəC xat D1 saːŋB2 ɗɔB1 fɔ A2 naːw A2 ku C kɔB1 ɓaːŋA1 ko A1 ɾo B2]

  • Nhà Phương không có tiền mua áo quần.

  • [ɲa A2 fɨːŋA1 xo A1 kɔB1 tiːŋA2 muəA1 ʔaːw B1 wəŋA2]

  • Hiện nay xe máy Hàn Quốc có nhiều phụ kiện.

  • [hiːŋB2 naːj A1 xɛA1 maːj B1 haːŋA2 wok D1 kɔB1 ɲiːw A2 fu B2 kiːŋB2]

  • Má Chi nói nho nhỏ hơn con gái.

  • [ma B1 ci A1 nɔj B1 ɲɔA1 ɲɔC həːŋA1 kɔŋA1 ɡaːj B1]

  • Ở công viên Tao Đàn có nhiều cây cỏ cao.

  • [ʔəC ko A1 viːŋA1 taːw A1 ɗaːŋA2 kɔB1 ɲiːw A2 kɛj A1 kɔC kaːw A1]

  • Anh Tuấn nói tiếng Hàn Quốc rành rẽ rồi.

  • [an A1 tuːŋB1 nɔj B1 tiːŋB1 haːŋA2 wok D1 ɾan A2 ɾɛC ɾoj A2]

  • Ở Hà Nội đi chỗ nào cũng phải tìm kiếm.

  • [ʔəC ha A2 noj B2 ɗi A1 co C naːw A2 ku C faːj C tim A2 kiːm B1]

  • Nhuộm tóc đã lạc hậu rồi. Bây giờ có một phong trào mới hơn là xăm người.

  • [ɲuːm B2 tɔ D1 ɗa C laːk D2 hɔw B2 ɾoj A2. ɓi A1 jəA2 kɔB1 mo D2 fɔŋA1 tʃaːw A2 məːj B1 həːŋA1 la A2 sam A1 ŋɨːj A2]

  • Nhà Phương không có tiền mua áo quần cũ.

  • [ɲa A2 fɨːŋA1 xo A1 kɔB1 tiːŋA2 muəA1 ʔaːw B1 wəŋA2 ku C]

  • Đi Vũng Tàu chán nên nhà Hùng đi chỗ nào vui thích hơn.

  • [ɗi A1 vu C taːw A2 caːŋB1 nen A1 ɲa A2 hu A2 ɗi A1 co C naːw A2 vuj A1 tʰɨt D1 həːŋA1]

  • Sở thú Sài Gòn có hai con rái cá.

  • [səC tʰu B1 saːj A2 ɡɔŋA2 kɔB1 haːj A1 kɔŋA1 ɾaːj B1 ka B1]

  • Sau khi về Việt Nam anh Hòa phải làm báo cáo dài.

  • [saːw A1 xi A1 ve A2 viːk D2 naːm A1 an A1 hwa A2 faːj C laːm A2 ɓaːw B1 kaːw B1 jaːj A2]

  • Nhà Phương không có tiền mua áo quần.

  • [ɲa A2 fɨːŋA1 xo A1 kɔB1 tiːŋA2 muəA1 ʔaːw B1 wəŋA2]

  • Ở Mỹ không có cải củ ngon.

  • [ʔəC mi C xo A1 kɔB1 kaːj C ku C ŋɔŋA1]

  • Hiện nay xe máy Hàn Quốc có nhiều phụ kiện tốt.

  • [hiːŋB2 naːj A1 xɛA1 maːj B1 haːŋA2 wok D1 kɔB1 ɲiːw A2 fu B2 kiːŋB2 to D1]

  • Nhuộm tóc đã lạc hậu rồi. Bây giờ có một phong trào mơi mới hơn là xăm người.

  • [ɲuːm B2 tɔ D1 ɗaːC laːk D2 hɔw B2 ɾoj A2. ɓi A1 jəːA2 kɔB1 mo D2 fɔŋA1 tʃaːw A2 məːj A1 məːj B1 həːŋA1 la A2 sam A1 ŋɨːj A2]

  • Thời bao cấp nhà Vi đói nghèo lắm.

  • [tʰəːj A2 ɓaːw A1 kəp D1 ɲa A2 vi A1 ɗɔj B1 ŋɛw A2 lam A1]

  • Va li quần áo của Giang được xếp gọn ghẽ hơn va li của Chi.

  • [va A1 li A1 wəŋA2 aːw B1 kuəC jaːŋA1 ɗɨːk D2 sep D1 ɡɔŋB2 ɡɛC həːŋA1 va A1 li A1 kuəC ci A1]

  • Nhà ba má Tuấn có nhiều bàn ghế.

  • [ɲa A2 ɓa A1 ma B1 tuːŋB1 kɔB1 ɲiːw A2 ɓaːŋA2 ɡe B1]

  • Ổng đi vòng quanh thế giới bằng thuyền buồm cổ.

  • [o A1 ɗi A1 vɔ A2 quan A1 tʰe B1 jəːj B1 ɓaŋA2 tʰɥiːŋA2 ɓuːm A2 ko C]

  • Nhuộm tóc đã lạc hậu rồi, bây giờ có một phong trào mới mẻ hơn là xăm người.

  • [ɲuːm B2 tɔ D1 ɗa C laːk D2 hɔw B2 ɾoj A2. ɓi A1 jəA2 kɔB1 mo D2 fɔŋA1 tʃaːw A2 məːj B1 mɛC həːŋA1 la A2 sam A1 ŋɨːj A2]

  • Má Chi nói nhỏ hơn con gái.

  • [ma B1 ci A1 nɔj B1 ɲɔC həːŋA1 kɔŋA1 ɡaːj B1]

  • Trái cây rẻ nhất ở Lạng Sơn là mận hậu chua.

  • [tʃaːj B1 kɛj A1 rɛC ɲək D1 ʔəC laːŋB2 səːn A1 la A2 məŋB2 hɔw B2 cuəA1]

  • Kết quả thi đại học của Hùng mỹ mãn quá.

  • [ket D1 wa C tʰi A1 ɗaːj B2 hɔ D2 kuəC hu A2 mi C maːŋC wa B1]

  • Má Chi nhỏ con hơn con gái.

  • [ma B1 ci A1 ɲɔC kɔŋA1 həːŋA1 kɔŋA1 ɡaːj B1]

  • Nhà ba má Tuấn có nhiều ghế bàn.

  • [ɲa A2 ɓa A1 ma B1 tuːŋB1 kɔB1 ɲiːw A2 ɡe B1 ɓaːŋA2]

  • Ở Hà Nội đi chỗ nào cũng phải kiếm tìm.

  • [ʔəC ha A2 noj B2 ɗi A1 co C naːw A2 ku C faːj C kiːm B1 tim A2]

  • Nhuộm tóc đã lạc hậu rồi. Bây giờ có một phong trào mơi mới hơn là xăm người.

  • [ɲuːm B2 tɔ D1 ɗa C laːk D2 hɔw B2 ɾoj A2. ɓi A1 jəA2 kɔB1 mo D2 fɔŋA1 tʃaːw A2 məːj A1 məːj B1 həːŋA1 la A2 sam A1 ŋɨːj A2]

  • Ở công viên Tao Đàn có nhiều cỏ cây.

  • [ʔəC ko A1 viːŋA1 taːw A1 ɗaːŋA2 kɔB1 ɲiːw A2 kɔC kɛj A1]

  • Anh Tuấn nói tiếng Hàn Quốc rành rành rồi.

  • [an A1 tuːŋB1 nɔj B1 tiːŋB1 haːŋA2 wok D1 ɾan A2 ɾan A2 ɾoj A2]

  • Ở công viên Tao Đàn có nhiều cỏ cây cao.

  • [ʔəC ko A1 viːŋA1 taːw A1 ɗaːŋA2 kɔB1 ɲiːw A2 kɔC kɛj A1 kaːw A1]

  • Ổng đi vòng quanh thế giới bằng thuyền buồm.

  • [o A1 ɗi A1 vɔ A2 quan A1 tʰe B1 jəːj B1 ɓaŋA2 tʰɥiːŋA2 ɓuːm A2]

  • Bắc Kinh có hai sân bay.

  • [ɓak D1 kɨn A1 kɔB1 haːj A1 səŋA1 ɓaːj A1]

  • Thời bao cấp nhà Vi nghèo đói.

  • [tʰəːj A2 ɓaːw A1 kəp D1 ɲa A2 vi A1 ŋɛw A2 ɗɔj B1]

  • Má Chi nói nhỏ nhẻ hơn con gái.

  • [ma B1 ci A1 nɔj B1 ɲɔC ŋɛC həːŋA1 kɔŋA1 ɡaːj B1]

  • Đi Vũng Tàu chán nên nhà Hùng đi chỗ nào vui hơn.

  • [ɗi A1 vu C taːw A2 caːŋB1 nen A1 ɲa A2 hu A2 ɗi A1 co C naːw A2 vuj A1 həːŋA1]

  • Ở Cali Việt Kiều nhớ cuộc sống của Sài Gòn cũ.

  • [ʔəC ka A1 li A1 viːk D2 kiːw A2 ɲəB1 kuːk D2 so B1 kuəC saːj A2 ɡɔŋA2 ku C]

  • Ở Hà Nội đi chỗ nào cũng phải tìm kiếm kỹ.

  • [ʔəC ha A2 noj B2 ɗi A1 co C naːw A2 ku C faːj C tim A2 kiːm B1 ki C]

  • Kết quả thi đại học của Hùng mỹ mãn.

  • [ket D1 wa C tʰi A1 ɗaːj B2 hɔ D2 kuəC hu A2 mi C maːŋC]

  • Thời bao cấp nhà Vi đói nghèo.

  • [tʰəːj A2 ɓaːw A1 kəp D1 ɲa A2 vi A1 ɗɔj B1 ŋɛw A2]

  • Sau khi về Việt Nam anh Hòa phải làm báo cáo.

  • [saːw A1 xi A1 ve A2 viːk D2 naːm A1 an A1 hwa A2 faːj C laːm A2 ɓaːw B1 kaːw B1]

  • Ở Hà Nội đi chỗ nào cũng phải kiếm tìm kỹ.

  • [ʔəC ha A2 noj B2 ɗi A1 co C naːw A2 ku C faːj C kiːm B1 tim A2 ki C]

  • Nhà Phương không có tiền mua quần áo cũ.

  • [ɲa A2 fɨːŋA1 xo A1 kɔB1 tiːŋA2 muəA1 wəŋA2 ʔaːw B1 ku C]

  • Ở khách sạn đó, phòng nào cũng có ban công.

  • [ʔəC xat D1 saːŋB2 ɗɔB1 fɔ A2 naːw A2 ku C kɔB1 ɓaːŋA1 ko A1]

  • Bắc Kinh có hai sân bay lớn.

  • [ɓak D1 kɨn A1 kɔB1 haːj A1 səŋA1 ɓaːj A1 ləːŋB1]

  • Nhà Phương không có tiền mua quần áo.

  • [ɲa A2 fɨːŋA1 xo A1 kɔB1 tiːŋA2 muəA1 wəŋA2 ʔaːw B1]

  • Nhà ba má Tuấn có nhiều bàn ghế đẹp.

  • [ɲaːA2 ɓaːA1 maːB1 tuːŋB1 kɔB1 ɲiːw A2 ɓaːŋA2 ɡe B1 ɗɛp D2]

  • Ở Mỹ không có cải củ.

  • [ʔəC mi C xo A1 kɔB1 kaːj C ku C]

  • Đi Vũng Tàu chán nên nhà Hùng đi chỗ nào vui vẻ hơn.

  • [ɗi A1 vu C taːw A2 caːŋB1 nen A1 ɲa A2 hu A2 ɗi A1 co C naːw A2 vuj A1 vɛC həːŋA1]

  • Nhuộm tóc đã lạc hậu rồi, bây giờ có một phong trào mới lạ hơn là xăm người.

  • [ɲuːm B2 tɔ D1 ɗa C laːk D2 hɔw B2 ɾoj A2. ɓi A1 jəA2 kɔB1 mo D2 fɔŋA1 tʃaːw A2 məːj B1 la B2 həːŋA1 la A2 sam A1 ŋɨːj A2]

  • Va li quần áo của Giang được xếp gọn nhẹ hơn va li của Chi.

  • [va A1 li A1 wəŋA2 aːw B1 kuəC jaːŋA1 ɗɨːk D2 sep D1 ɡɔŋB2 ŋɛB2 həːŋA1 va A1 li A1 kuəC ci A1]

Second word list

Target words

Randomized word list

  • Hiện nay xe máy Hàn Quốc có nhiều phụ kiện tốt.

  • [hiːŋB2 naːj A1 xɛA1 maːj B1 haːŋA2 wok D1 kɔB1 ɲiːw A2 fu B2 kiːŋB2 to D1]

  • Ở Cam-pu-chia, có chính sách mới về hệ thống giao thông tốt.

  • [ʔəC kaːm A1 pu A1 chiəA1 kɔB1 cɨn B1 sat D1 məːj B1 ve A2 he B2 tʰo B1 jaːw A1 tʰoŋA1 to D1]

  • Chính quyền Lào Cai muốn bà con trồng cao su nên phải tuyên truyền vận đông mạnh.

  • [chɨn B1 ɥiːŋA2 laːw A2 kaːj A1 muːŋB1 ɓa A2 kɔŋA1 tʃo A2 kaːw A1 su A1 nen A1 faːj C tɥiːŋA1 tʃɥiːŋA2 vəŋB2 ɗo A1 man B2]

  • Để phân tích sự nghèo đói, Tổ chức Nông lương Thế giới đưa ra một yếu tố mới.

  • [ɗe C fəŋA1 tɨt D1 sɨB2 ŋɛw A2 ɗɔj B1 to B1 cɨk D1 no A1 lɨːŋA1 tʰe B1 jəːj B1 ɗɨəA1 ɾa A1 mo D2 ʔiːw B1 to B1 məːj B1]

  • Ở I-rắc, rất khó lập lại hoà bình thật.

  • [ʔəC ʔi A1 ɾak D1 ɾək D1 xɔB1 ləp D2 laːj B2 hwa A2 ɓɨn A2 tʰək D2]

  • Nghiên cứu này dựa vào ba tiền đề lớn.

  • [ŋiːŋA1 kɨw B1 naːj A2 jɨəB2 vaːw A2 ɓa A1 tiːŋA2 ɗe A2 ləːŋB1]

  • Kết quả thi đại học của Hùng mỹ mãn quá.

  • [ket D1 wa C tʰi A1 ɗaːj B2 hɔ D2 kuəC hu A2 mi C maːŋC wa B1]

  • Ở khách sạn đó, phòng nào cũng có ban công rộng.

  • [ʔəC xat D1 saːŋB2 ɗɔB1 fɔ A2 naːw A2 ku C kɔB1 ɓaːŋA1 ko A1 ɾo B2]

  • Ở Cali Việt Kiều nhớ cuộc sống của Sài Gòn cũ.

  • [ʔəC ka A1 li A1 viːk D2 kiːw A2 ɲəB1 kuːk D2 so B1 kuəC saːj A2 ɡɔŋA2 ku C]

  • Từ khi thay đổi ranh giới hành chính, Tháp Chàm thuộc về thành phố Phan Rang mới.

  • [tɨA2 xi A1 tʰaːj A1 ɗoj C ɾan A1 jəːj B1 han A2 cɨn B1 tʰaːp D1 caːm A2 tʰuːk D2 ve A2 tʰan A2 fo B1 faːŋA1 raːŋA1 məːj B1]

  • Hồi trước, Bác Hồ thích đi dép cao su, nhưng Bác Giáp thích đi xăng đan da.

  • [hoj A2 tʃɨːk D2 ɓaːk D1 ho A2 tʰɨt D1 ɗi A1 jɛp D1 kaːw A1 su A1 ɲɨŋA1 ɓaːk D1 jaːp D1 tʰɨt D1 ɗi A1 saŋA1 ɗaːŋA1 ja A1]

  • Kinh tế Xinh-ga-po rất phát triển nên có nhiều công ty lớn.

  • [kɨn A1 te B1 sɨn A1 ɡa A1 pɔA1 ɾək D1 faːt D1 tʃiːŋC nen A1 kɔB1 ɲiːw A2 ko A1 ti A1 ləːŋB1]

  • Sau khi về Việt Nam anh Hòa phải làm báo cáo dài.

  • [saːw A1 xi A1 ve A2 viːk D2 naːm A1 an A1 hwa A2 faːj C laːm A2 ɓaːw B1 kaːw B1 jaːj A2]

  • Hiện nay, phụ nữ Nhật rất thích mua mỹ phẩm Pháp.

  • [hiːn B2 naːj A1 fu B2 nɨC ɲək D2 ɾək D1 tʰɨt D1 muəA1 mi C fəm C faːp D1]

Third word list

Target words

Subset of first and second word lists.

Randomized word list

  • Nhuộm tóc đã lạc hậu rồi. Bây giờ có một phong trào lạ lùng hơn là xăm người.

  • [ɲuːm B2 tɔ D1 ɗa C laːk D2 hɔw B2 ɾoj A2. ɓi A1 jəA2 kɔB1 mo D2 fɔŋA1 tʃaːw A2 la B2 lu A2 həːŋA1 la A2 sam A1 ŋɨːj A2]

    • Không phải! Có một phong trào mới mẻ hơn là xăm người.

    • [xo A1 faːj C kɔB1 mo D2 fɔŋA1 tʃaːw A2 məːj B1 mɛC həːŋA1 la A2 sam A1 ŋɨːj A2]

  • Nhà ba má Tuấn có nhiều tủ sách đẹp.

  • [ɲa A2 ɓa A1 ma B1 tuːŋA2 kɔB1 ɲiːw A2 tu C sat B1 ɗɛp D2]

    • Không phải! Họ có nhiều bàn ghế đẹp.

    • [xo A1 faːj C hɔB2 kɔB1 ɲiːw A2 ɓaːŋA2 ɡe B1 ɗɛp D2]

  • Bắc Kinh có hai cảng lớn.

  • [ɓak D1 kɨn A1 kɔB1 haːj A1 kaːŋC ləːŋB1]

    • Không phải! Bắc Kinh có hai sân bay lớn.

    • [xo A1 faːj C ɓak D1 kɨn A1 kɔB1 haːj A1 səŋA1 ɓaːj A1 ləːŋB1]

  • Nhà Phương không có tiền mua tủ lạnh cũ.

  • [ɲa A2 fɨːŋA1 xo A1 kɔB1 tiːŋA2 muəA1 tu C lan B2 ku C]

    • Không phải! Họ không có tiền mua quần áo cũ.

    • [xo A1 faːj C hɔB2 xo A1 kɔB1 tiːŋA2 muəA1 wəŋA2 ʔaːw B1 ku C]

  • Đi Vũng Tàu chán nên nhà Hùng đi chỗ nào đông đông hơn.

  • [ɗi A1 vu C taːw A2 caːŋB1 nen A1 ɲa A2 hu A2 ɗi A1 co C naːw A2 ɗo A1 ɗo A1 həːŋA1]

    • Không phải! Nhà Hùng đi chỗ nào vui vui hơn.

    • [xo A1 faːj C ɲa A2 hu A2 ɗi A1 co C naːw A2 vuj A1 vuj A1 həːŋA1]

  • Ở Cam-pu-chia, có chính sách mới về giáo dục tốt.

  • [ʔəC kaːm A1 pu A1 chiəA1 kɔB1 cɨn B1 sat D1 məːj B1 ve A2 jaːw B1 ju D2 to D1]

    • Không phải! Có chính sách mới về hệ thống giao thông tốt.

    • [xo A1 faːj C kɔB1 cɨn B1 sat D1 məːj B1 ve A2 he B2 tʰo B1 jaːw A1 tʰo A1 to D1]

  • Sở thú Sài Gòn có hai con sư tử đực.

  • [səC tʰu B1 saːj A2 ɡɔŋA2 kɔB1 haːj A1 kɔŋA1 sɨA1 tɨC ɗɨk D2]

    • Không phải! Họ có hai con rái cá đực.

    • [xo A1 faːj C hɔB2 kɔB1 haːj A1 kɔŋA1 ɾaːj B1 ka B1 ɗɨk D2]

  • Kinh tế Xinh-ga-po rất phát triển nên có nhiều tòa nhà lớn.

  • [kɨn A1 te B1 sɨn A1 ɡa A1 pɔA1 ɾək D1 faːt D1 tʃiːŋC nen A1 kɔB1 ɲiːw A2 twa A2 ɲa A2 ləːŋB1]

    • Không phải! Xinh-ga-po có nhiều công ty lớn.

    • [xo A1 faːj C sɨn A1 ɡa A1 pɔA1 kɔB1 ɲiːw A2 ko A1 ti A1 ləːŋB1]

  • Đi Vũng Tàu chán nên nhà Hùng đi chỗ nào đông đúc hơn.

  • [ɗi A1 vu C taːw A2 caːŋB1 nen A1 ɲa A2 hu A2 ɗi A1 co C naːw A2 ɗo A1 ɗɨk D1 həːŋA1]

    • Không phải! Nhà Hùng đi chỗ nào vui vẻ hơn.

    • [xo A1 faːj C ɲa A2 hu A2 ɗi A1 co C naːw A2 vuj A1 vɛC həːŋA1]

  • Sau khi về Việt Nam anh Hòa phải làm tường trình dài.

  • [saːw A1 xi A1 ve A2 viːk D2 naːm A1 an A1 hwa A2 faːj C laːm A2 tɨːŋA2 tʃɨn A2 jaːj A2]

    • Không phải! Ảnh phải làm báo cáo dài.

    • [xo A1 faːj C an A1 faːj C laːm A2 ɓaːw B1 kaːw B1 jaːj A2]

  • Để phân tích sự nghèo đói, Tổ chức Nông lương Thế giới đưa ra một nguyên tắc mới.

  • [ɗe C fəŋA1 tɨt D1 sɨB2 ŋɛw A2 ɗɔj B1 to B1 cɨk D1 no A1 lɨːŋA1 tʰe B1 jəːj B1 ɗɨəA1 ɾa A1 mo D2 ŋɥiːŋA1 tak D1 məːj B1]

    • Không phải! Họ đưa ra một yếu tố mới.

    • [xo A1 faːj C hɔB2 ɗɨəA1 ɾa A1 mo D2 ʔiːw B1 to B1 məːj B1]

  • Nhà Phương không có tiền mua bàn ghế cũ.

  • [ɲa A2 fɨːŋA1 xo A1 kɔB1 tiːŋA2 muəA1 ɓaːŋA2 ɡe B1 ku C]

    • Không phải! Họ không có tiền mua áo quần cũ.

    • [xo A1 faːj C hɔB2 xo A1 kɔB1 tiːŋA2 muəA1 ʔaːw B1 wəŋA2 ku C]

  • Ở khách sạn đó, phòng nào cũng có giường rộng.

  • [ʔəC xat D1 saːŋB2 ɗɔB1 fɔ A2 naːw A2 ku C kɔB1 jɨːŋA2 ɾo B2]

    • Không phải! Phòng nào cũng có ban công rộng.

    • [xo A1 faːj C fɔ A2 naːw A2 ku C kɔB1 ɓaːŋA1 ko A1 ɾo B2]

  • Ở Cali Việt Kiều nhớ cuộc sống của Hà Nội cũ.

  • [ʔəC ka A1 li A1 viːk D2 kiːw A2 ɲəB1 kuːk D2 so B1 kuəC ha A2 noj B2 ku C]

    • Không phải! Họ nhớ cuộc sống của Sài Gòn cũ.

    • [xo A1 faːj C hɔB2 ɲəB1 kuːk D2 so B1 kuəC saːj A2 ɡɔŋA2 ku C]

  • Nhuộm tóc đã lạc hậu rồi. Bây giờ có một phong trào kỳ quặc hơn là xăm người.

  • [ɲuːm B2 tɔ D1 ɗa C laːk D2 hɔw B2 ɾoj A2. ɓi A1 jəA2 kɔB1 mo D2 fɔŋA1 tʃaːw A2 ki A2 wak D2 həːŋA1 la A2 sam A1 ŋɨːj A2]

    • Không phải! Có một phong trào mơi mới hơn là xăm người.

    • [xo A1 faːj C kɔB1 mo D2 fɔŋA1 tʃaːw A2 məːj A1 məːj B1 həːŋA1 la A2 sam A1 ŋɨːj A2]

  • Nhà ba má Tuấn có nhiều chiếc xe đẹp.

  • [ɲa A2 ɓa A1 ma B1 tuːŋA2 kɔB1 ciːk D1 xe A1 ɗɛp D2]

    • Không phải! Họ có nhiều ghế bàn đẹp.

    • [xo A1 faːj C hɔB2 kɔB1 ɲiːw A2 ɡe B1 ɓaːŋA2 ɗɛp D2]

Appendix B. Mixed models for coordinative and reversed coordinative compounds

Table A1 Estimates of fixed effects on syllable duration of Native Coordinative Compounds and Reversed Native Coordinative Compounds (r² = 0.893). Reference category: First syllable of Reversed Native Coordinative Compounds in medial position. Bold marks significant fixed effects.

Table A2 Estimates of fixed effects on mean rhyme intensity of Native Coordinative Compounds and Reversed Native Coordinative Compounds (r² = 0.699). Reference category: First syllable of Reversed Native Coordinative Compounds in medial position. Bold marks significant fixed effects.

Table A3 Estimates of fixed effects on F1 at midpoint of vowel nuclei of Native Coordinative Compounds and Reversed Native Coordinative Compounds (r² = 0.752). Reference category: First syllable of Reversed Native Coordinative Compounds in medial position. Bold marks significant fixed effects.

Table A4 Estimates of fixed effects on F2 at midpoint of vowel nuclei of Native Coordinative Compounds and Reversed Native Coordinative Compounds (r² = 0.521). Reference category: First syllable of Reversed Native Coordinative Compounds in medial position.

Table A5 Estimates of fixed effects on mean rhyme f0 of Native Coordinative Compounds and Reversed Native Coordinative Compounds (r² = 0.924). Reference category: First syllable of Reversed Native Coordinative Compounds in medial position.
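
When reading the tables in this appendix, it may help to recall how a reference category works under treatment (dummy) coding: the intercept estimates the reference cell, and each fixed-effect estimate is an offset from it. The sketch below illustrates this with main effects only. The factor and level names, the helper functions `dummy_code` and `predict`, and all coefficients are hypothetical; this is not the model reported here, and interactions are left out for brevity.

```python
# Reference cell assumed from the table captions: first syllable,
# reversed compound, medial position. Level names are hypothetical.
REFERENCE = {"syllable": "first", "compound": "reversed", "position": "medial"}


def dummy_code(obs, reference=REFERENCE):
    """Treatment-code an observation: 1 where a factor departs from its reference level."""
    return {factor: int(obs[factor] != ref) for factor, ref in reference.items()}


def predict(obs, intercept, effects):
    """Add the main-effect offset of every factor that is off its reference level."""
    coded = dummy_code(obs)
    return intercept + sum(effects[f] * coded[f] for f in coded)


# Made-up coefficients in ms, for illustration only.
effects = {"syllable": 5.0, "compound": -2.0, "position": 120.0}
ref_cell = {"syllable": "first", "compound": "reversed", "position": "medial"}
final_second = {"syllable": "second", "compound": "reversed", "position": "final"}

print(predict(ref_cell, 200.0, effects))      # intercept alone
print(predict(final_second, 200.0, effects))  # intercept + syllable + position offsets
```

Under this coding, an estimate near zero for the syllable factor would mean that the second syllable does not differ from the first in the reference condition.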

Appendix C. Mixed models for mean syllable f0 including tone as a fixed effect

Table A6 Estimates of fixed effects on mean rhyme f0, all phrase-medial words (r² = 0.94). Reference category: First syllable of disyllabic non-reduplicates with tone B1 (sắc). Bold marks significant fixed effects.

Table A7 Estimates of fixed effects on mean rhyme f0, all phrase-medial words (r² = 0.90). Reference category: First syllable of disyllabic non-reduplicates with tone D1 (checked sắc). Bold marks significant fixed effects.

Table A8 Estimates of fixed effects on mean rhyme f0 (r² = 0.95). Reference category: First syllable of unfocused disyllabic loanwords with tone B1 (sắc). Bold marks significant fixed effects.

Footnotes

1 Function words can be reduced (Hoàng & Hoàng 1975), but this is an altogether different issue.

2 In fact, one could argue that this homogeneous dialect zone extends all the way to Khánh Hòa province, in south-central Vietnam.

3 These tone alternations are termed a ‘tone sandhi’ in Nguyễn & Ingram (2007b). While they are the only authors to extend the term ‘sandhi’ to a set of tone changes that are strictly morphologically conditioned, there is no disagreement on the specifics of the alternations at stake.

4 An anonymous reviewer questions this etymology, which is indeed problematic as it involves irregular sound change. In any case, what is relevant here is that Sài Gòn is a polysyllabic word whose syllables cannot be decomposed into semantically transparent morphemes.

5 Emeneau (1951) was working on the dialect of Hà Tĩnh. Stresslessness could in theory be limited to this specific dialect.

6 Note that the apparent difference in f0 in Figure 5 of Nguyễn & Ingram (2007a) is not necessarily meaningful as tone is not controlled for in their word list and not included as a random factor in their statistical analysis.

7 R scripts available upon request.

8 SPSS uses the Satterthwaite estimation to calculate degrees of freedom.

References

Alves, Mark J. 2009. Loanwords in Vietnamese. In Haspelmath, Martin & Tadmor, Uri (eds.), Loanwords in the world's languages: A comparative handbook, 617637. Berlin: Walter de Gruyter.CrossRefGoogle Scholar
Arvaniti, Amalia. 2000. The phonetics of stress in Greek. Journal of Greek Linguistics 1 (1), 939.CrossRefGoogle Scholar
Baart, Joan. 2003. Tonal features in languages of northern Pakistan. In Baart, Joan & Hyder Sindhi, Ghulam (eds.), Pakistani languages and society: Problems and prospects, 132144. Islamabad: National Institute of Pakistan Studies and Summer Institute of Linguistics.Google Scholar
Barr, Dale J., Levy, Roger, Scheepers, Christoph & Tily, Harry J.. 2013. Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language 68 (3), 255278.Google Scholar
Bauer, Robert S. & Benedict, Paul K.. 1997. Modern Cantonese phonology. Berlin: Mouton de Gruyter.CrossRefGoogle Scholar
Beckman, Mary E. 1986. Stress and non-stress accent. Dordrecht & Riverton: Foris.CrossRefGoogle Scholar
Beckman, Mary E. & Edwards, Jan. 1990. Lengthenings and shortenings and the nature of prosodic constituency. In Kingston, John & Beckman, Mary E. (eds.), Papers in Laboratory Phonology I: Between the grammar and physics of speech, 152178. Cambridge: Cambridge University Press.Google Scholar
Brunelle, Marc. 2009. Tone perception in Northern and Southern Vietnamese. Journal of Phonetics 37, 7996.CrossRefGoogle Scholar
Brunelle, Marc. 2015. Vietnamese (Tiếng Việt). In Jenny, Matthias & Sidwell, Paul (eds.), The handbook of Austroasiatic languages, 909954. Leiden & Boston, MA: Brill.Google Scholar
Brunelle, Marc & , Thị Xuyến. 2013. Why is sound symbolism so common in Vietnamese? In Williams, Jeff (ed.), Grammatical aesthetics in Southeast Asia, 8398. Cambridge: Cambridge University Press.Google Scholar
Byrd, Dani. 1996. A phase window framework for articulatory timing. Phonology 13, 139169.CrossRefGoogle Scholar
Byrd, Dani & Krivokapić, Jelena. 2006. How far, how long: On the temporal scope of prosodic boundary effects. The Journal of the Acoustical Society of America 120 (3), 15891590.Google Scholar
Campbell, Nick & Beckman, Mary E.. 1997. Stress, prominence, and spectral tilt. Intonation: Theory, models and applications, 6770. http://www.isca-speech.org/archive_open/int_97/inta_067.html.Google Scholar
Cao, Xuân Hạo. 2003 [1978].Trọng âm và các quan hệ ngữ pháp trong tiếng Việt. Tiếng Việt: mấy vấn đề ngữ âm, ngữ pháp, ngữ nghĩa [Vietnamese: A few phonetic, grammatical and semantic problems], 161–184. Hà Nội: Nhà xuất bản Giáo dục.Google Scholar
Chen, Szu-Wei, Wang, Bei & Xu, Yi. 2009. Closely related languages, different ways of realizing focus. Proceedings of Interspeech 2009 Brighton, 10071010.CrossRefGoogle Scholar
Chen, Xiaonan Susan. 1993. Relative duration as a perceptual cue to stress in Mandarin. Language and Speech 36 (4), 415433.Google Scholar
Cutler, Ann & Butterfield, Sally. 1990a. Durational cues to word boundaries in clear speech. Speech Communication 9, 485495.Google Scholar
Cutler, Ann & Butterfield, Sally. 1990b. Syllabic lengthening as a word boundary cue. In Seidl, Roland (ed.), Proceedings of the 3rd Australian International Conference on Speech Science and Technology, 324328. Canberra: Australian Speech Science and Technology Association.Google Scholar
De Jong, Kenneth & Zawaydeh, Bushra Adnan. 1999. Stress, duration, and intonation in Arabic word-level prosody. Journal of Phonetics 27 (1), 322.Google Scholar
Duanmu, San. 2000. The phonology of Standard Chinese. New York: Oxford University Press.Google Scholar
Emeneau, Murray. 1951. Studies in Vietnamese (Annamese) grammar. Berkeley, CA: University of California Press.Google Scholar
Fletcher, Janet. 2010. The prosody of speech: Timing and rhythm. In Hardcastle, William J., Laver, John & Gibbon, Fiona E. (eds.), The handbook of phonetic science, 2nd edn., 523602. Oxford: Blackwell.Google Scholar
Fry, Dennis B. 1955. Duration and intensity as physical correlates of linguistic stress. The Journal of the Acoustical Society of America 27 (4), 765768.CrossRefGoogle Scholar
Gelman, Andrew & Hill, Jennifer. 2007. Data analysis using regression and multilevel/hierarchical models. New York: Cambridge University Press.Google Scholar
Gruber, James. 2011. An acoustic, articulatory, and auditory study of Burmese tone. Ph.D. dissertation, Georgetown University.Google Scholar
Hayes, Bruce. 1995. Metrical stress theory: Principles and case-studies. Chicago, IL & London: University of Chicago Press.Google Scholar
Hoáng, Tuê & Hoáng, Minh. 1975. Remarques sur la structure phonologique du vietnamien. Études vietnamiennes 40, 6798.Google Scholar
Huỳnh, Sabine. 2008. L'assimilation des mots d'emprunts français à la langue vietnamienne: la question des tons. Cahiers de linguistique – Asie Orientale 37 (2), 223240.Google Scholar
Hyman, Larry. 2006. Word-prosodic typology. Phonology 23, 225257.CrossRefGoogle Scholar
Hyman, Larry. 2014. Do all languages have word accent? In van der Hulst, Harry (ed.), Word stress: Theoretical and typological issues, 5682. Cambridge: Cambridge University Peess.Google Scholar
Ingram, John & Nguyễn, Thị Anh Thư. 2006. Stress, tone and word prosody in Vietnamese compounds. In Warren & Watson (eds.), 193–198.Google Scholar
Inkelas, Sharon & Zec, Draga. 1998. Serbo-Croatian pitch accent: The interaction of tone, stress, and intonation. Language 64 (2), 227248.Google Scholar
Jannedy, Stefanie. 2008. The effect of focus on lexical tones in Vietnamese. In Botinis, Antonis (ed.), Proceedings of ISCA Tutorial and Research Workshop On Experimental Linguistics, 113116. Athens: ISCA and the University of Athens.Google Scholar
Kirby, James. 2011. Vietnamese (Hanoi Vietnamese). Journal of the International Phonetic Association 41 (3), 381392.CrossRefGoogle Scholar
Kirby, James & Sonderegger, Morgan. 2016. Model selection and phonological argumentation. Ms., University of Edinburgh & McGill University.Google Scholar
Lahiri, Aditi, Wetterlin, Allison & Jönsson-Steiner, Elisabet. 2005. Lexical specification of tone in North Germanic. Nordic Journal of Linguistics 28 (1), 6196.CrossRefGoogle Scholar
Michaud, Alexis. 2004. Final consonants and glottalization: New perspectives from Hanoi Vietnamese. Phonetica 61, 119146.Google Scholar
Michaud, Alexis. 2005. Prosodie de langues à tons (naxi et vietnamien), prosodie de l'anglais: éclairages croisés (École doctorale 268, Language et Langues). Ph.D. dissertation, Paris 3 – Sorbonne Nouvelle.Google Scholar
Michaud, Alexis & Brunelle, Marc. 2016. Information structure in Asia: Yongning Na (Sino-Tibetan) and Vietnamese (Austroasiatic). In Féry, Caroline & Ishihara, Shinichiro (eds.), The Oxford handbook of information structure, 774789. Oxford: Oxford University Press.Google Scholar
Miller, Taylor, Athanasopoulou, Angeliki, Pincus, Nadya & Vogel, Irene. 2015. The effect of focus on phonation in Northern Vietnamese tones. Presented at the Linguistic Society of America, Portland.Google Scholar
Nara, Kiranpreet. 2015. Punjabi tone and stress (Doabi dialect). MA memoir, University of Ottawa.Google Scholar
Ngô, Thanh Nhàn. 1984. The syllabeme and patterns of word formation in Vietnamese. Ph.D. dissertation, New York University.Google Scholar
Nguyễn, Đình-Hoà. 1997. Vietnamese. Amsterdam & Philadelphia, PA: John Benjamins.Google Scholar
Nguyễn, Thị Thư, Anh. 2010. Rhythmic pattern and corrective focus in Vietnamese polysyllabic words. Mon-Khmer Studies 39, 128.Google Scholar
Nguyễn, Thị Thư, Anh & Ingram, John. 2006. Reduplication and word stress in Vietnamese. In Warren & Watson (eds.), 187–192.Google Scholar
Nguyễn, Thị Thư, Anh & Ingram, John. 2007a. Acoustic and perceptual cues for compound-phrasal contrasts in Vietnamese. The Journal of the Acoustical Society of America 122 (3), 17461757.Google Scholar
Nguyễn, Thị Thư, Anh & Ingram, John. 2007b. Stress and tone sandhi in Vietnamese reduplications. Mon-Khmer Studies 37, 1539.Google Scholar
Nguyễn, Văn Lợi & Edmondson, Jerold. 1997. Tones and voice quality in modern Northern Vietnamese: Instrumental case studies. Mon-Khmer Studies 28, 118.Google Scholar
Norman, Jerry. 1988. Chinese. Cambridge: Cambridge University Press.Google Scholar
Noyer, Rolph. 1998. Vietnamese 'morphology' and the definition of word. University of Pennsylvania Working Papers in Linguistics 5 (2), 6589.Google Scholar
Phạm, Andrea Hoà. 2003. Vietnamese tones: A new analysis. New York: Routledge.Google Scholar
Phạm, Andrea Hoà. 2008. Is there a prosodic word in Vietnamese? Toronto Working Papers in Linguistics 29, 123.Google Scholar
Potisuk, Siripong, Gandour, Jakson & Harper, Mary. 1994. F0 correlates of stress in Thai. Linguistics of the Tibeto-Burman Area 17 (2), 127.Google Scholar
Potisuk, Siripong, Gandour, Jackson & Harper, Mary. 1996. Acoustic correlates of stress in Thai. Phonetica 53, 200220.Google Scholar
Prieto, Pilar & Ortega-Llebaria, Marta. 2006. Stress and accent in Catalan and Spanish: Patterns of duration, vowel quality, overall intensity, and spectral balance. Proceedings of Speech Prosody, 337340.Google Scholar
Remijsen, Bert. 2002. Lexically contrastive stress accent and lexical tone in Ma'ya. In Gussenhoven, Carlos & Warner, Natasha (eds.), Laboratory Phonology 7, 585614. Berlin & New York: Mouton de Gruyer.Google Scholar
Remijsen, Bert & Van Heuven, Vincent J.. 2005. Stress, tone and discourse prominence in the Curaçao dialect of Papiamentu. Phonology 22 (2), 205235.Google Scholar
Riad, Tomas. 1998. Towards a Scandinavian accent typology. Phonology and Morphology of the Germanic Languages 386, 77–109.
Schiering, René, Bickel, Balthasar & Hildebrandt, Kristine A. 2010. The prosodic word is not universal, but emergent. Journal of Linguistics 46 (3), 657–709.
Sluijter, Agaath M. C. & Heuven, Vincent J. Van. 1996. Spectral balance as an acoustic correlate of linguistic stress. The Journal of the Acoustical Society of America 100 (4), 2471–2485.
Sluijter, Agaath M. C., Heuven, Vincent J. Van & Pacilly, Jos J. A. 1997. Spectral balance as a cue in the perception of linguistic stress. The Journal of the Acoustical Society of America 101 (1), 503–513.
Snow, Gregory L. 2009. Power analysis for multi-level models. https://stat.ethz.ch/pipermail/r-sig-mixed-models/2009q1/001790.html (last accessed 3 June 2016).
Thomas, David. 1962. On defining the 'word' in Vietnamese. Văn-hóa nguyêt-san XI (5), 519–523.
Thompson, Laurence. 1963. The problem of the word in Vietnamese. Word 19, 39–52.
Thompson, Laurence. 1965. Vietnamese reference grammar. Seattle, WA: University of Washington Press.
Trần, Hương Mai. 1967. Tones and intonation in South Vietnamese. In Nguyễn, Đăng Liêm, Trần, Hương Mai & Dellinger, David (eds.), Papers in Southeast Asian Linguistics No.1 (Series A – Occasional Papers #9), 19–34. Canberra: Linguistics Circle of Canberra.
Trần, Thị Hiền, Thúy & Vallée, Nathalie. 2009. An acoustic study of interword consonant sequences in Vietnamese. Journal of the Southeast Asian Linguistics Society (1), 231–249.
Trương, Vĩnh Ký. 1883. Grammaire de la Langue Annamite. Saigon: C. Guillaud et Martinon.
Turk, Alice E. & Shattuck-Hufnagel, Stefanie. 2000. Word-boundary-related duration patterns in English. Journal of Phonetics 28, 397–440.
Van Der Hulst, Harry. 2012. Deconstructing stress. Lingua 122 (13), 1494–1521.
Warren, Paul & Watson, Catherine I. (eds.). 2006. Proceedings of the 11th Australian International Conference on Speech Science & Technology. Auckland: University of Auckland.
Wightman, Colin W., Shattuck-Hufnagel, Stefanie, Ostendorf, Mari & Price, Patti J. 1992. Segmental durations in the vicinity of prosodic phrase boundaries. The Journal of the Acoustical Society of America 91, 1707–1717.
Williams, Briony. 1985. Pitch and duration in Welsh stress perception: The implications for intonation. Journal of Phonetics 13, 381–406.
Xu, Yi. 1999. Effects of tone and focus on the formation and alignment of f0 contours. Journal of Phonetics 27 (1), 55–105.
Xu, Yi, Xu, Ching X. & Sun, Xuejing. 2004. On the temporal domain of focus. Proceedings of Speech Prosody 2004 Nara, 4.
Yoon, Tae-Jin, Cole, Jennifer & Hasegawa-Johnson, Mark. 2007. On the edge: Acoustic cues to layered prosodic domains. Proceedings of the International Congress of Phonetic Science, 1017–1020.
Figure 1 The five tones of Southern Vietnamese in unchecked syllables (mean speaker z-normalized values obtained from all the words pronounced by the 18 speakers recorded for this study).

Table 1 Types of disyllables included in the first word list.

Table 2 Types of disyllables included in the second word list.

Figure 2 Speaker z-normalized duration, mean rhyme intensity and F1 at the midpoint of nuclei for native coordinative and reversed coordinative compounds, in sentence-medial and sentence-final positions and in the first and second position of disyllables. The four compounds are tìm kiếm [timA2kiːmB1] [to search+to find] ~ kiếm tìm ‘to look for’, quần áo [wəŋA2aːwB1] [pants+shirt] ~ áo quần ‘clothes’, cây cỏ [kɛjA1kɔC] [tree+grass] ~ cỏ cây ‘vegetation’ and đói nghèo [ɗɔjB1 ŋɛwA2] [hungry+poor] ~ nghèo đói ‘to live in hardship’. (A fifth pair, bàn ghế [ɓaːŋA2 ɡeB1] [table+chair] ~ ghế bàn ‘furniture’, had to be excluded: ghế bàn was not recorded due to an error in the word list.)

Table 3 Estimates of fixed effects on syllable duration in non-reduplicates (r2 = 0.89). Reference category: First syllable of phrase-medial native coordinative compounds. Bold marks significant fixed effects.

Table 4 Estimates of fixed effects on mean rhyme intensity in non-reduplicates (r2 = 0.91). Reference category: First syllable of phrase-medial native coordinative compounds.

Table 5 Estimates of fixed effects on mean rhyme f0 in non-reduplicates (r2 = 0.90). Reference category: First syllable of phrase-medial native coordinative compounds.

Table 6 Estimates of fixed effects on syllable duration in phrase-medial words (r2 = 0.64). Reference category: First syllable of disyllabic non-reduplicates.

Table 7 Estimates of fixed effects on mean rhyme intensity in phrase-medial words (r2 = 0.68). Reference category: First syllable of disyllabic non-reduplicates.

Table 8 Estimates of fixed effects on mean rhyme f0 in phrase-medial words (r2 = 0.90). Reference category: First syllable of disyllabic non-reduplicates.

Table 9 Estimates of fixed effects on syllable duration (r2 = 0.69). Reference category: First syllable of unfocused disyllabic loanwords. Bold marks significant fixed effects.

Table 10 Estimates of fixed effects on mean rhyme intensity (r2 = 0.80). Reference category: First syllable of unfocused disyllabic loanwords.

Table 11 Estimates of fixed effects on mean rhyme f0 (r2 = 0.93). Reference category: First syllable of unfocused disyllabic loanwords. Bold marks significant fixed effects.

Figure 3 Mean speaker z-normalized f0 of three tones in focused and unfocused conditions, for two groups of speakers exhibiting different behaviors.

Figure 4 Speaker-normalized f0 contours of the five unchecked tones (headers) in the two syllables of disyllables and in two phrasal positions (top: medial, bottom: final). Means of all recorded data.

Figure 5 Estimated Marginal Means for the first and second formants of each vowel found in the corpus, in different conditions. Pairs of vowels surrounded by an ellipse have statistically different F1 at p < .01. Pairs of vowels framed in a box have statistically different F2 at p < .01. Pairs of vowels surrounded by a dotted ellipse have statistically different F1 and F2 at p < .01. Other vowel pairs are not statistically different.

Table A1 Estimates of fixed effects on syllable duration of Native Coordinative Compounds and Reversed Native Coordinative Compounds (r2 = 0.893). Reference category: First syllable of Reversed Native Coordinative Compounds in medial position. Bold marks significant fixed effects.

Table A2 Estimates of fixed effects on mean rhyme intensity of Native Coordinative Compounds and Reversed Native Coordinative Compounds (r2 = 0.699). Reference category: First syllable of Reversed Native Coordinative Compounds in medial position. Bold marks significant fixed effects.

Table A3 Estimates of fixed effects on F1 at midpoint of vowel nuclei of Native Coordinative Compounds and Reversed Native Coordinative Compounds (r2 = 0.752). Reference category: First syllable of Reversed Native Coordinative Compounds in medial position. Bold marks significant fixed effects.

Table A4 Estimates of fixed effects on F2 at midpoint of vowel nuclei of Native Coordinative Compounds and Reversed Native Coordinative Compounds (r2 = 0.521). Reference category: First syllable of Reversed Native Coordinative Compounds in medial position.

Table A5 Estimates of fixed effects on mean rhyme f0 of Native Coordinative Compounds and Reversed Native Coordinative Compounds (r2 = 0.924). Reference category: First syllable of Reversed Native Coordinative Compounds in medial position.

Table A6 Estimates of fixed effects on mean rhyme f0, all phrase-medial words (r2 = 0.94). Reference category: First syllable of disyllabic non-reduplicates with tone B1 (sc). Bold marks significant fixed effects.

Table A7 Estimates of fixed effects on mean rhyme f0, all phrase-medial words (r2 = 0.90). Reference category: First syllable of disyllabic non-reduplicates with tone D1 (checked sc). Bold marks significant fixed effects.

Table A8 Estimates of fixed effects on mean rhyme f0 (r2 = 0.95). Reference category: First syllable of unfocused disyllabic loanwords with tone B1 (sc). Bold marks significant fixed effects.