1 Introduction
This chapter discusses the small pronunciation units into which the segments (i.e. consonants and vowels) of words seem to be organized: moras and syllables. Speakers of English and many other languages have an intuitive understanding of what syllables are, but moras are unfamiliar, and researchers disagree about whether it is appropriate to analyze English as having moras. Speakers of Japanese, in contrast, have an intuitive understanding of what moras are, while syllables are unfamiliar. Researchers who work on Japanese disagree about whether there are syllables distinct from moras.
Problems of definition are taken up in the remainder of this section. Section 2 provides a basic explanation of how the notion of mora has ordinarily been applied to Japanese, and Section 3 does the same for the notion of syllable. Section 4 deals with vowels that are adjacent to each other, with no intervening consonant. Such sequences can be problematic in the sense that it is sometimes uncertain whether or not the second vowel is a “special mora” (i.e. dependent on the immediately preceding mora). Section 5 looks at some generalizations that have usually been stated in terms of syllables and argues that they can also be stated in terms of moras. The brief conclusion in Section 6 suggests that there is an interaction between the Japanese writing system and native speakers’ intuitions about moras.
1.1 Syllables
Most linguists would agree that syllables seem to be basic units of speech production and perception (Abercrombie 1967: 37; Lieberman 1977: 120–121), but syllables are notoriously difficult to define either articulatorily or auditorily (Laver 1994: 113–114; Rogers 2000: 267–268; Zec 2007: 161). In prototypical cases, the number of syllables in an utterance matches the number of peaks of sonority (Goldsmith 2011: 194), where sonority is taken to be a scale of intrinsic audibility on which individual segments can be ranked (Laver 1994: 503–505; Parker 2008). Syllables also seem to be intuitively natural units for ordinary native speakers of most languages, at least in the sense that there is general agreement on how many syllables a word contains, although not necessarily on where the boundary is between one syllable and the next (Clark and Yallop 1990: 97; Steriade 2003: 193; Duanmu 2009: 1–2).
There is also a consensus that phonotactic constraints apply to syllables (Pike 1947: 180–181; Fudge 1969: 254; Zec 2007: 162). For the most part, any string of phonotactically admissible syllables in a language is a phonotactically admissible word in that language, although not all phonotactic generalizations are syllable based (Zec 2007: 192).
Most researchers today who would identify themselves as phonologists analyze Japanese (i.e. modern Tokyo “standard” Japanese) as having both moras and syllables: one-mora light syllables (short syllables), two-mora heavy syllables (long syllables), and even three-mora superheavy syllables (extra-long syllables), although this last category is marginal. There is, however, no colloquial Japanese word that denotes a Japanese syllable in this sense, and ordinary native speakers know how to count moras but not how to count syllables (Vance 2008: 115–116; Labrune 2012b: 116).
1.2 Moras
A mora is understood as a unit of syllable weight (i.e. quantity) in languages that distinguish between light (i.e. short) and heavy (i.e. long) syllables (Trubetzkoy 1969 [1939]: 173–181; Davis 2011: 103–108). Japanese moras are intuitively isochronous units of rhythm for native speakers, and scholars familiar with traditional Japanese language research often use the musical term haku ‘beat.’ Thus, in a well-known categorization of languages into rhythmic types (Pike 1943: 35; Abercrombie 1967: 96–98), Japanese is the textbook example of a mora-timed language, in contrast to syllable-timed languages such as Spanish and stress-timed languages such as English (Ladefoged 1982: 226; Rogers 2000: 270–271; Kubozono and Honma 2002: 20–21).
Languages do not, however, exhibit a straightforward relationship between syllable weight and duration (Davis 2011: 132–133), and experimental work on Japanese has not been able to demonstrate even approximate isochrony of moras. Of course, no linguist would maintain that every mora has precisely the same duration, even at a fixed tempo. Phonetic research on mora timing in Japanese has focused on trying to show that compensation effects make average mora durations more nearly equal than would be expected from inherent segment durations. A thorough review of this research concludes that the evidence for such compensation is not persuasive (Warner and Arai 2001), and it has even been suggested that moraic isochrony in Japanese is merely an illusion caused by the relative proportion of vowels and consonants in utterances (Ramus, Nespor, and Mehler 1999).
The remainder of this chapter assesses Japanese moras and syllables as psychological units. The arguments that researchers offer involve language-internal phenomena and behavior on psycholinguistic tasks. Moraic isochrony is not at issue.
1.3 Subsyllabic Constituents
The standard terminology for the parts of syllables presupposes that the boundaries between syllables can be determined, although it can accommodate ambisyllabicity, that is, a segment that is both at the end of one syllable and at the beginning of the next. The nucleus of a syllable is the portion with the highest sonority (Zec 2007: 163) and is prototypically a vowel, although many languages allow at least some consonants to function as nuclei (i.e. syllabic consonants). A syllable must have a nucleus, and some languages allow syllables with only a nucleus, as in the monosyllabic English word /ɔ/ awe. The non-nuclear segments in a syllable make up the margins. Consonants in the margin preceding the nucleus are the onset, and those in the margin following the nucleus are the coda. In the monosyllabic English word /plænt/ plant, for example, the nucleus is /æ/, the onset is /pl/, and the coda is /nt/. A syllable with a coda is closed, and a syllable without a coda (e.g. /pli/ plea) is open.
It is often claimed that in English and many other languages, the nucleus and the coda in a closed syllable form a unit and that syllables therefore have the hierarchical structure in Figure 7.1 (Fudge 1969: 268; Blevins 1995: 212–216; Treiman and Kessler 1995). (The symbol σ stands for a syllable.) The unit containing the nucleus and the coda (if there is a coda) is called the rhyme.

Figure 7.1 Hierarchical structure: onset + rhyme
Two main types of evidence are usually cited for the rhyme. First, English shows a strong preference for dividing between the onset and the rhyme in speech errors (Fromkin 1971: 32; Kubozono 1989: 266), as in /ren trɛk/ rain trek for intended /tr^en r^ɛk/ train wreck, in intentional blends, as in /brʌnč/ [brʌ̃ntʃ] brunch from /br^ɛkfəst/+/l^ʌnč/ breakfast+lunch, and in language games (Haraguchi 2003: 49), as in Pig-Latin /ɪtsple/ it-splay for /spl^ɪt/ split. (The symbol ^ marks onset-rhyme division points in the source forms.) Not all phonologists regard such phenomena as persuasive evidence for the rhyme as a constituent (Davis 1989; Goldsmith 2011: 172).
The other type of evidence involves phonotactics, specifically, the claim that phonotactic restrictions are much more stringent between the nucleus and the coda than between the onset and the nucleus (Fudge 1969: 272–283; Haraguchi 2003: 48; Duanmu 2009: 40). For example, /a͜u/ is an admissible English nucleus, as in /ka͜unt/ count, and /mp/ is an admissible coda, as in /kæmp/ camp, but this nucleus and this coda cannot co-occur (Haraguchi 2003: 48): */ka͜ump/. (A preceding asterisk indicates that a form is deviant.) Nonetheless, it is not true in every language that an admissible onset followed by an admissible rhyme is always an admissible syllable, and some phonologists therefore find the phonotactic evidence for the rhyme unconvincing (Zec 2007: 177).
In languages with weight distinctions, the fact that onsets are never relevant to syllable weight (Hyman 1985: 6; Davis 2011: 117) has been used to motivate structures like the one in Figure 7.2, with the onset attached directly to the syllable node and weight represented straightforwardly by the number of mora nodes (Zec 2007: 176; Duanmu 2009: 8). (The symbol µ stands for a mora.) The first mora in a heavy syllable is often called the head mora, and it typically has a higher sonority threshold than the non-head mora (Zec 2007: 183), that is, the class of segments that can fill the head mora position is limited to a higher portion of the sonority scale.

Figure 7.2 Hierarchical structure with moraic nodes representing weight
2 Japanese Moras
The psychological reality of moras in Japanese is beyond dispute. Native speakers learn to count moras as small children, and moras are the units of traditional Japanese poetic meters. Furthermore, since Japanese has quite restrictive phonotactics, there is no uncertainty about the boundaries between moras in the sense that every phoneme in a traditional linear transcription is unambiguously a member of one particular mora.
A prototypical Japanese mora consists of a single consonant followed by a short vowel, as in the three moras of /miµzoµre/ mizore ‘sleet’ (using a subscript µ to mark the boundary between one mora and the next). In some phonemic analyses, including the one adopted here, there are also moras with a two-consonant cluster preceding the vowel. The second consonant can only be /y/, as in the first mora of /kyaµku/ kyaku ‘guest.’ There are also moras consisting entirely of a vowel, as in the first and last moras of /eµgaµo/ egao ‘smiling face.’
Departing even further from the CV prototype are the two moraic consonants. The moraic nasal /N/ has a wide range of phonetic realizations, but its place of articulation and aperture (stop or approximant) are determined by the immediately following segment (Vance 2008: 96–105). The second mora in /teµNµpo/ tenpo ‘tempo’ (phonetically [tẽmːpo]) is a typical example. A moraic nasal can immediately precede a vowel or a semivowel, and so can the non-moraic nasals /m/ and /n/, but there is no ambiguity in the phonetic realizations, as shown in Table 7.1. The odd-looking transcription [ɰ̃ː] in the left example in Table 7.1 is intended to convey the fact that the moraic nasal in this environment is realized as a long, nasalized approximant, that is, a vowel-like segment that functions as a consonant. The important point is just that /Ni/ is clearly distinguishable from /mi/ and /ni/.
Table 7.1 Phonetic realizations of moraic and non-moraic nasals
| kan’i ‘easy’ | kami ‘paper’ | kani ‘crab’ |
|---|---|---|
| /kaµNµi/ | /kaµmi/ | /kaµni/ |
| [kɑ̃ɰ̃ːi] | [kɑmʲi] | [kɑɲi] |
When a moraic nasal immediately precedes a non-moraic nasal, the phonetic realization of the two-phoneme sequence is an extra-long nasal with no auditory dividing point between one mora and the next. For example, /saµNµma/ sanma ‘saury’ is phonetically [sɑ̃mːːɑ]. Intuitively, the three moras correspond to [sɑ̃] (/sa/), [mː] (/N/), and [mɑ] (/ma/), but [sɑ̃mːmɑ] is misleading because it implies that there is some sort of phonetic boundary between the end of the second mora and the beginning of the third mora. The two length marks in [sɑ̃mːːɑ] are intended to convey the idea that long [mː] realizes the moraic nasal and that the remaining portion of the bilabial nasal (i.e. the second [ː]) realizes the onset of the following mora.
The moraic obstruent /Q/ usually occurs immediately preceding a non-moraic obstruent and assimilates totally to that following obstruent. For example, /baµQµta/ batta ‘grasshopper’ is realized as [bɑtːːɑ], and /reµQµša/ ressha ‘train’ is realized as [ɾeɕːːɑ]. Here again, there is no auditory dividing point within the extra-long consonant, although the three moras in the former intuitively correspond to [bɑ] (/ba/), [tː] (/Q/), and [tɑ] (/ta/). The transcription [bɑtːtɑ] is misleading, however, because it implies that there is some sort of phonetic boundary between the end of the second mora and the beginning of the third mora, and [bɑttɑ] is even worse because it implies both a phonetic boundary and a short [t] as the realization of /Q/. The two length marks in [bɑtːːɑ] are intended to convey the idea that long [tː] realizes the moraic obstruent and that the remaining portion of the voiceless alveolar stop (i.e. the second [ː]) realizes the onset of the following mora. (The transcription [bɑtːɑ], with only one length mark, could be taken as suggesting the deviant phonemic form */baQa/.) When the non-moraic obstruent in such a sequence is an affricate, only the stop portion is elongated, as in /maµQµča/ matcha ‘powdered green tea’ realized as [mɑtːːɕɑ].
There is also an intuitive boundary between moras within a long vowel, but once again no auditory division. For example, using /R/ to represent moraic vowel length, /toµRµka/ tōka ‘ten days’ is realized as [toːkɑ]. The question of whether vowel length should be represented with a length phoneme or with a double vowel (as in /toµoµka/ rather than /toµRµka/) requires consideration of syllables and will therefore be postponed until Section 4.1.
Japanese has no restrictions on hiatus, that is, any two vowels can occur in sequence with no intervening consonant. In a sequence of two short vowels, the second constitutes a V mora, as in the second moras of /šiµo/ shio ‘salt’ and /taµi/ tai ‘sea bream.’ A sequence of two identical short vowels is phonetically distinct from a long vowel. For example, /kiµR/ kī ‘key’ is pronounced [kʲiː], with a long vowel, but /kiµ+i/ ki-i ‘strange’ is pronounced [kʲiˇi] in careful speech, with two short vowels separated by a brief dip in intensity known as vowel rearticulation (Bloch 1950: 105–106; Martin 1952: 13), transcribed here as [ˇ]. There is almost always a morpheme boundary (at least arguably) where vowel rearticulation appears, but a word meaning ‘flame’ is an exception. This word is synchronically monomorphemic, but some speakers have vowel rearticulation where there used to be a boundary, that is, some speakers treat it as /hoµnoµo/, realized as [honoˇo], although others treat it as /hoµnoµR/, realized as [honoː].
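The mora inventory described in this section can be summarized procedurally. The following is only a minimal sketch under stated assumptions, not an established tool: it assumes a boundary-free phonemic string in which every consonant phoneme is written as a single character, it uses the symbols /N/, /Q/, and /R/ as above, and the function name `split_moras` is hypothetical. The difference between a long vowel and a rearticulated vowel sequence is taken to be already encoded in the input by the choice between /R/ and a second vowel symbol.

```python
VOWELS = set("aiueo")
SPECIAL = set("NQR")  # moraic nasal, moraic obstruent, vowel length


def split_moras(phonemes: str) -> list[str]:
    """Split a boundary-free phonemic string into moras.

    Recognized mora shapes: V, CV, CyV, and the single-symbol
    moras N, Q, and R.
    """
    moras = []
    i = 0
    while i < len(phonemes):
        symbol = phonemes[i]
        if symbol in SPECIAL or symbol in VOWELS:
            # N, Q, R, or an onsetless vowel is a mora by itself.
            moras.append(symbol)
            i += 1
            continue
        # Otherwise the symbol is an onset consonant: expect C V or C y V.
        j = i + 1
        if j < len(phonemes) and phonemes[j] == "y":
            j += 1
        if j >= len(phonemes) or phonemes[j] not in VOWELS:
            raise ValueError(f"no vowel after onset at position {i}")
        moras.append(phonemes[i:j + 1])
        i = j + 1
    return moras


print(split_moras("mizore"))  # three CV moras
print(split_moras("kyaku"))   # CyV mora followed by a CV mora
print(split_moras("teNpo"))   # moraic nasal as the second mora
```

Because every symbol is assigned to exactly one mora, the sketch reflects the observation that there is no uncertainty about mora boundaries in a linear transcription.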
3 Japanese Syllables
In traditional Japanese language research in Japan, the term onsetsu (the usual translation of syllable) was used to denote the moras described above in Section 2 (McCawley 1968: 131; Kubozono and Honma 2002: 18). Influential American Descriptivists followed this tradition and used the English word syllable to denote these same units (Bloch 1950: 90–92; Martin 1952: 12; Hockett 1955: 59). These syllables/moras do not correspond exactly to the weight units in moraic analyses like the one in Figure 7.2 because the traditional units incorporate onsets, as in model A in Figure 7.3. Model A does not quite faithfully reflect the now standard version of the traditional analysis, because all the syllables in Figure 7.3 have equal status. In this tradition, the moraic nasal /N/, the moraic obstruent /Q/, the vowel-length phoneme /R/, and (for some researchers) the second vowel in some V1V2 sequences are categorized as “special” moras (tokushu-haku). The crucial characteristic that makes them special is that they are less independent than “ordinary” moras. Although Japanese allows monomoraic words, such as /ka/ ka ‘mosquito’ and /u/ u ‘cormorant,’ there are no words consisting entirely of a single special mora. This systematic gap is unsurprising, since pronouncing special moras in isolation is unnatural (to varying degrees). In the remainder of this chapter, the term mora-qua-syllable analysis will refer to an analysis like model A in Figure 7.3.

Figure 7.3 Model A: Moras as syllables
One way of dealing with the dependent character of special moras is to draw a distinction between phonological and phonetic syllables (Arisaka 1959 [1940]: 106–107). The idea is that in very slow and careful pronunciation, each mora can be pronounced as a separate syllable, although a special mora combines with the preceding mora into a single phonetic syllable in normal pronunciation. It has been suggested that this same distinction can be extended to handle so-called vowel devoicing, which often results in the complete absence of any vowel-like acoustic interval. For example, /ašita/ ashita ‘tomorrow’ is normally pronounced [ɑɕtɑ], but it can be hyperarticulated as [ɑɕitɑ], so [ɑɕtɑ] can be described as three phonological syllables (/aσšiσta/) but only two phonetic syllables ([ɑɕ.tɑ]). This description has encouraged some researchers to treat CV moras that have undergone vowel devoicing/deletion as an additional type of dependent mora (Terakawa 1941: 18–21; Labrune 2012b: 135–136). On the other hand, the distinction between phonetic and phonological syllables has been used to handle vowel devoicing without treating special moras as phonological syllables (Hattori 1954: 32).
In any case, the appeal to unnaturally careful, mora-by-mora pronunciation is problematic for at least two reasons. First, because of the close (although not quite perfect) correspondence between moras and the individual letters of kana writing, mora-by-mora pronunciation is arguably just a way of spelling words out loud. Second, there is an important distinction between careful pronunciation and “elaborated” pronunciation (Linell 1979: 54–56), although it is not always easy to draw. Careful pronunciation has a special status in phonological analysis because it appears to provide the basis for native-speaker intuitions (Jakobson and Halle 1962: 466–467; Lass 1984: 294–295). Elaborated pronunciation, on the other hand, introduces a kind of unnaturalness, and pronouncing special moras as separate syllables should probably be discounted as misleading elaboration (Kawakami 1977: 76).
Labrune (2012b: 139–140) advocates an analysis very similar to the one in model A in Figure 7.3 (see Section 6 below), but she calls the three units in each example word moras and rejects the idea that moras are the syllables of Japanese. One could certainly argue that treating Japanese special moras as syllables is at odds with the notion that syllables correspond fundamentally to a “sonority cycle” (Clements 1990: 299), that is, the “wave-like recurrence of peaks of sonority” (Goldsmith 2011: 194). If Japanese moras are syllables, all special moras are anomalous syllables, and many are highly anomalous.
Most phonologists who analyze Japanese as having light and heavy syllables adopt something like model B in Figure 7.4 (Kubozono 1989: 254; Terao 2002: 47–48). Model B represents the dependence of special moras directly by grouping each special mora into a syllable with the preceding mora. Syllable structure can be indicated in a linear phonemic transcription by using a subscript σ to mark the boundary between one syllable and the next and a subscript µ to indicate the boundary between two moras within the same syllable, as in /toµQσte/ and /kiσnoµR/ in Figure 7.4. A syllable boundary in this model always coincides with a mora boundary. In the remainder of this chapter, the term mora-plus-syllable analysis will refer to an analysis like model B in Figure 7.4.

Figure 7.4 Model B: Moras as subsyllabic constituents
The moras in model B, like the mora-size units in model A in Figure 7.3, incorporate onsets and thus do not correspond exactly to the weight units in moraic analyses like Figure 7.2. This structure is not usually argued for explicitly, but the rationale presumably is that “the initial C and V display strong solidarity” (Labrune 2012b: 133), that is, there are phonotactic restrictions on onset-nucleus combinations but not on nucleus-coda combinations. To give just a few examples, */wu/ and */yi/ are prohibited, and so is */ye/ for most speakers. Assuming a phonemic analysis that treats [ɸ] as /f/ and [h]/[ç] as /h/ (Vance 2008: 80–82), */hu/ is prohibited. In contrast, any permissible ordinary mora can be followed by any permissible special mora (Labrune 2012b: 121).
Whether consonant+glide onsets, as in the /kyo/ of /koµNσkyo/ konkyo ‘evidence’ in Figure 7.4, are grouped into a single onset constituent is a question that has not attracted any attention, and the answer has no important consequences here.
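The grouping that model B performs can be illustrated with a short sketch. It is offered only as an illustration under stated assumptions: the input is a list of moras that has already been segmented, only /N/, /Q/, and /R/ are treated as special (so the uncertain status of onsetless high vowels is deliberately left aside), and the names `group_into_syllables` and `weight` are hypothetical.

```python
SPECIAL_MORAS = {"N", "Q", "R"}  # moraic nasal, moraic obstruent, vowel length


def group_into_syllables(moras: list[str]) -> list[list[str]]:
    """Attach each special mora to the syllable of the preceding mora."""
    syllables: list[list[str]] = []
    for mora in moras:
        if mora in SPECIAL_MORAS:
            if not syllables:
                raise ValueError("a special mora cannot begin a word")
            syllables[-1].append(mora)   # dependent mora joins the syllable
        else:
            syllables.append([mora])     # ordinary mora opens a new syllable
    return syllables


def weight(syllable: list[str]) -> str:
    """Label a syllable by its mora count."""
    return {1: "light", 2: "heavy"}.get(len(syllable), "superheavy")


for word in (["to", "Q", "te"], ["ki", "no", "R"], ["ko", "N", "kyo"]):
    grouped = group_into_syllables(word)
    print(grouped, [weight(s) for s in grouped])
```

Every syllable boundary produced this way falls at a mora boundary, which is the property of model B noted above.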
4 Vowel Sequences
4.1 Vowel Length and Sequences of Identical Vowels
As noted in Section 2, there is a phonetic distinction in Japanese, at least in careful pronunciation, between a long vowel and a sequence of two identical short vowels, as the words /ǰiR/ jī ‘G’ ([dʑiː]) and /ǰi+i/ ji-i ‘intention to resign’ ([dʑiˇi]) illustrate. Intuitively, the second mora of /ǰiR/ behaves like a special (i.e. dependent) mora, but the second mora of /ǰi+i/ behaves like an ordinary (i.e. independent) mora, and traditional accounts in Japan treat most or all V moras as ordinary moras. For example, in the commentaries included in successive editions of the pronunciation dictionary issued by Japan’s public broadcasting corporation (NHK), Kindaichi (1966: 17–18, 1985: 20–21, 1998: 105–106) consistently recognizes only three special moras: /N/, /Q/, and /R/.
For Kindaichi (1966: 10), a special mora is just a mora consisting of one of the three “special phonemes” (tokushu-onso), that is, /N/, /Q/, or /R/. As he himself pointed out many years ago (Kindaichi 1950a), the phonetic distinction between a long vowel and two identical short vowels in sequence is a problem for any phonemic analysis that treats long vowels as double short vowels, and he saw this as a decisive argument in favor of the vowel-length phoneme /R/. A syllable advocate could argue that /ǰi+i/ has two light syllables and /ǰiR/ has one heavy syllable, allowing the difference to be represented as /ǰiσi/ versus /ǰiµi/ (Vance 2008: 60–61), but this analysis violates a widely accepted principle, namely, that “lexical items do not contrast minimally in their syllabic divisions” (Steriade 2003: 190). Since sequences of two identical short vowels almost always straddle a morpheme boundary (see Section 2), the difference between [dʑiː] and [dʑiˇi] could be attributed to the difference between /ǰii/ and /ǰi+i/, with a syllable boundary corresponding to the morpheme boundary in the latter, but this solution will not work for /honoo/ ‘flame’ pronounced [honoˇo] (see Section 2). In any case, not all phonologists agree that lexical items cannot differ minimally in syllabification (Duanmu 2009: 59).
4.2 V1V2 Sequences Ending in a High Vowel
V1V2 sequences are problematic when V2 is a high vowel (i.e. /i/ or /u/). Most phonologists who ascribe light and heavy syllables to Japanese recognize a fourth type of special mora in addition to /N/, /Q/, and /R/, namely, “the second half of diphthongal vowel sequences” (Kubozono 1999: 32). Most such sequences end in /i/, but some end in /u/. A diphthong can be described as a change in vowel quality within a syllable that is noticeable enough to make it sound as if there is a sequence of two phonetically distinct vowels. Depending on the language, a diphthong like [a͜i] can be a single phoneme (/a͜i/) or a sequence of two phonemes (/ai/ or /ay/), just as a stop+fricative sequence like [ts] can be an affricate (one phoneme) in some languages but a cluster (two phonemes) in other languages (Hyman 1985: 2; Duanmu 2009: 6–7). Nonetheless, there is a tendency to misinterpret the term diphthong as implying a single-phoneme analysis, and Kubozono’s phrase diphthongal vowel sequence is presumably an attempt to avoid such misunderstanding. There is no question that all Japanese diphthongs should be analyzed phonemically as sequences of two vowels, and the more compact term quasi-diphthong will be used in the remainder of this chapter to denote the Japanese two-vowel sequences in question.
In a mora-plus-syllable analysis, like model B in Figure 7.4, it is often hard to tell whether a V/i/ or V/u/ sequence is a quasi-diphthong because it is hard to tell whether the two vowels are in the same syllable. In an analysis with no distinction between syllables and moras, like model A in Figure 7.3, a quasi-diphthong can be defined as a V/i/ or V/u/ sequence in which the second vowel behaves like a dependent mora, and the corresponding challenge is deciding whether there is such dependence. If all onsetless V moras were dependent, then all V1V2 sequences would be quasi-diphthongs. The remainder of this section is devoted to arguing that there is a distinction between dependent and ordinary V moras, although some instances of /i/ in V/i/ and /u/ in V/u/ are not readily categorizable.
The evidence for quasi-diphthong status involves accent patterns, so a brief introduction to the (modern Tokyo) Japanese pitch-accent system is necessary here (see also Chapter 8, Section 2.1, this volume). Some words are accented and others are unaccented, and the intonation pattern on a phrase is determined in part by whether or not the word(s) it contains are accented and, if so, by the location of the accent(s). An accent is manifested by a steep fall from a relatively high pitch to a relatively low pitch. The smallest intonational units in Japanese are accent phrases, and an accent phrase has at most one accentual fall. Most of the examples cited in this chapter consist of a content word in isolation or followed by a particle, and these are all pronounced as accent phrases. An unaccented accent phrase has no steep pitch fall.
In the remainder of this chapter, the location of an accent is marked in phonemic transcription by a downward-pointing arrow. In /ka↓makiri/ kamakiri ‘praying mantis,’ for example, there is a relatively high pitch on /ka/ and a relatively low pitch on /ma/, /ki/, and /ri/, and /ka/ is the accented mora (or accented syllable). Virtually all textbooks and pronunciation guides for Japanese use a phonetically imprecise system for describing the pitch patterns on accent phrases that categorizes each mora as either high-pitched (H) or low-pitched (L), and in this system, the pitch pattern on /ka↓makiri/ is HLLL. In an accent phrase that begins with an ordinary mora and is not initially accented, the first mora is L and the second mora is H, as in /pori+bu↓kuro/ pori-bukuro ‘plastic bag’ (LHHLL). This phrase-initial LH pattern is sometimes called initial lowering.
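The H/L labeling just described can be sketched as a small function. This is an illustration of the textbook system only, with assumptions stated plainly: the accent phrase is assumed to begin with an ordinary mora and to contain at least two moras, every mora between the lowered first mora and the accented mora is assumed to be H (as in the LHHLL example), and the function name `pitch_pattern` and its 1-based `accent` parameter are hypothetical.

```python
from typing import Optional


def pitch_pattern(n_moras: int, accent: Optional[int] = None) -> str:
    """Return an H/L string for an accent phrase of n_moras moras.

    accent is the 1-based position of the accented mora,
    or None for an unaccented phrase.
    """
    if n_moras < 2:
        raise ValueError("this sketch covers phrases of two or more moras")
    if accent is not None and not 1 <= accent <= n_moras:
        raise ValueError("accent must be a mora position within the phrase")
    labels = []
    for position in range(1, n_moras + 1):
        if accent is not None and position > accent:
            labels.append("L")   # every mora after the accentual fall
        elif position == 1 and accent != 1:
            labels.append("L")   # initial lowering
        else:
            labels.append("H")
    return "".join(labels)


print(pitch_pattern(4, accent=1))  # kamakiri, accent on the first mora
print(pitch_pattern(5, accent=3))  # pori-bukuro, accent on the third mora
print(pitch_pattern(4))            # an unaccented four-mora phrase
```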
In principle, a noun consisting of n ordinary moras (or, equivalently, n light syllables) can have any of n+1 accent patterns: it can have an accent on any one of its n moras or it can be unaccented. Special moras, on the other hand, cannot ordinarily bear accent. Consequently, in a mora-plus-syllable analysis like model B in Figure 7.4, the number of possible accent patterns for a noun is one more than the number of syllables, regardless of whether the syllables are heavy or light, and this relationship is the basis for saying that syllables, not moras, are the accent-bearing units in Japanese (McCawley 1968: 59; Kubozono and Honma 2002: 37–38), as illustrated in the examples in Table 7.2. The downward-pointing arrow appears between the two moras of an accented heavy syllable because, according to traditional descriptions, the first mora of such a syllable is H and the second mora is L. Phonetically, however, the pitch just falls smoothly from the beginning to the end of an accented heavy syllable.
Table 7.2 Possible accent locations in disyllabic nouns
| Light syllables | Heavy syllables |
|---|---|
| 2 Moras | 4 Moras |
| /ha↓ši/ hashi ‘chopsticks’ | /se↓N+niN/ sen-nin ‘thousand people’ |
| /haši↓/ hashi ‘bridge’ | /seN+ni↓N/ sen-nin ‘hermit’ |
| /haši/ hashi ‘edge’ | /seN+niN/ sen-nin ‘full-time work’ |
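The count of possible accent patterns (one more than the number of syllables) can be made concrete by enumeration. The sketch below is illustrative only: it assumes that a noun is supplied as a list of syllables, each a list of moras, that the accent arrow is written after the first mora of the accented syllable, and that morpheme boundaries are omitted. The function name `accent_patterns` is hypothetical.

```python
ARROW = "\u2193"  # the downward-pointing arrow that marks accent


def accent_patterns(syllables: list[list[str]]) -> list[str]:
    """List every accent pattern: one per syllable, plus unaccented."""
    patterns = []
    for target in range(len(syllables)):
        parts = []
        for index, syllable in enumerate(syllables):
            head, rest = syllable[0], "".join(syllable[1:])
            # The arrow follows the first mora of the accented syllable.
            parts.append(head + (ARROW if index == target else "") + rest)
        patterns.append("".join(parts))
    # The unaccented pattern has no arrow at all.
    patterns.append("".join("".join(s) for s in syllables))
    return patterns


# Two heavy syllables (four moras) still yield only three patterns.
for pattern in accent_patterns([["se", "N"], ["ni", "N"]]):
    print(pattern)
```

Applied to the four-mora input above, the sketch yields three patterns, not five, which is the asymmetry that motivates treating syllables as the accent-bearing units.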
It is resistance to bearing accent that has led researchers to treat some V/i/ and V/u/ sequences as quasi-diphthongs. If onsetless high-vowel moras repelled accent consistently (or at least nearly consistently), they could simply be identified as special moras, always dependent on an immediately preceding ordinary mora. In fact, however, there is no such consistency. Onsetless /i/ and /u/ often bear accent, and in a mora-plus-syllable analysis, such instances are treated as separate syllables, as in /hiσroσi↓σzuσmu/ hiroizumu ‘heroism.’ In a mora-qua-syllable analysis like model A in Figure 7.3, the phonemic transcription would be the same, but an onsetless /i/ or /u/ that carries accent is clearly not dependent on a preceding ordinary mora.
If accent falls on the first mora or on neither mora of a V1V2 sequence, the accent pattern usually provides no information about whether V2 is syllabic (i.e. independent). For example, there is no reliable way to tell whether (in a mora-plus-syllable analysis) /ka↓i+ro/ kai-ro ‘circuit’ should be treated as /ka↓µiσro/ or as /ka↓σiσro/ and whether /ni+kai+me/ ni-kai-me ‘second time’ should be treated as /niσkaµiσme/ or as /niσkaσiσme/. Initial lowering can be diagnostic if the second mora of a word is an onsetless V and the first mora does not bear accent, since initial lowering is optional in words that begin with a heavy syllable (see Section 5.1). For example, /dai+gaku/ dai-gaku ‘university’ would be /daσiσgaσku/ if initial lowering is obligatory (LHHH) and /daµiσgaσku/ if initial lowering is optional (LHHH~HHHH).
It has been noted many times that the default location for accent in recent borrowings and foreign names is the syllable containing the antepenultimate mora (or the first syllable if the word is shorter than three moras) (McCawley 1968: 133–134; Kubozono 1999: 43; Kubozono and Honma 2002: 36–38). In many loanwords, a sequence of the form (C)V/i/ arguably behaves like a heavy syllable with respect to default accent. For example, /taipura↓itaR/ taipuraitā ‘typewriter’ has default accent if it is syllabified as /taµiσpuσra↓µiσtaµR/, with /i/ treated as a special mora.
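The default-accent generalization lends itself to a short worked example. The following sketch is only an illustration under stated assumptions: a word is given as a list of syllables, each a list of moras, the syllabification is supplied by the analyst, and the function name `default_accent_syllable` is hypothetical.

```python
def default_accent_syllable(syllables: list[list[str]]) -> int:
    """Return the 0-based index of the syllable that contains the
    antepenultimate mora, or 0 if the word has fewer than three moras."""
    if not syllables:
        raise ValueError("a word needs at least one syllable")
    total = sum(len(syllable) for syllable in syllables)
    if total < 3:
        return 0
    target = total - 3  # 0-based position of the antepenultimate mora
    seen = 0
    for index, syllable in enumerate(syllables):
        seen += len(syllable)
        if target < seen:
            return index
    raise AssertionError("unreachable: target lies inside the word")


# taipuraitaa with both /i/ moras treated as dependent (quasi-diphthongs)
with_diphthongs = [["ta", "i"], ["pu"], ["ra", "i"], ["ta", "R"]]
print(with_diphthongs[default_accent_syllable(with_diphthongs)])

# The same word with every onsetless /i/ treated as its own syllable
without_diphthongs = [["ta"], ["i"], ["pu"], ["ra"], ["i"], ["ta", "R"]]
print(without_diphthongs[default_accent_syllable(without_diphthongs)])
```

Under the first syllabification the rule selects the syllable /rai/, whose head mora /ra/ carries the accent in the form cited above; under the second it selects the onsetless /i/, which is not the cited form. This is the sense in which the (C)V/i/ sequence behaves like a heavy syllable.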
Comparable examples involving (C)V/u/ sequences are much harder to find, but verb forms provide some relevant data. When the citation form of a verb or adjective combines with the interrogative particle ka and the particle has a low and non-rising pitch, the combination expresses a kind of acquiescence. For example, Kowareru ka means something like ‘So (it’s) going to break.’ When the verb form is accented, like /koware↓ru/ kowareru ‘to break,’ its accent is retained before ka in this pattern: /koware↓ru ka/. When the verb form is unaccented, an accent appears immediately preceding ka, as in /agaru↓ ka/ (cf. unaccented /agaru/ agaru ‘to rise’) and /susumu↓ ka/ (cf. unaccented /susumu/ susumu ‘to advance’). The examples in Table 7.3 show what happens when ka in this meaning follows the citation form of an unaccented verb ending in (C)/au/ or (C)/ou/ or follows the citation form of an unaccented adjective ending in (C)/ai/, (C)/oi/, or (C)/ui/. In such combinations, the accent falls on the penultimate mora rather than the final mora. This is the expected result if the verb or adjective form ends in a heavy syllable (on the mora-plus-syllable analysis) or, equivalently, in a dependent (i.e. special) mora (on the mora-qua-syllable analysis).
Table 7.3 Accentuation of verb and adjective forms followed by ka
| Citation form | Form followed by ka |
|---|---|
| /čigau/ chigau ‘to differ’ | /čiga↓u ka/ |
| /sasou/ sasou ‘to invite’ | /saso↓u ka/ |
| /akai/ akai ‘red’ | /aka↓i ka/ |
| /omoi/ omoi ‘heavy’ | /omo↓i ka/ |
| /usui/ usui ‘thin’ | /usu↓i ka/ |
There are, however, V/i/ and V/u/ sequences that do not behave like quasi-diphthongs. Pronunciation dictionaries (NHK Hōsō Bunka Kenkyūjo 1998; Kindaichi and Akinaga Reference Haruhiko and Kazue2001) give /fukui↓+ši/ for Fukui-shi ‘the city of Fukui,’ with /u/ and /i/ in separate syllables (/fuσkuσi↓σ+ši/), although many speakers seem to prefer /fuku↓i+ši/ (implying /fuσku↓μiσ+ši/). There are foreign province and state names that show the same kind of variability. When such a name combines with /šuR/ shū ‘province,’ name-final accent appears. Typical examples are /arubaRta↓+šuR/ Arubāta-shū ‘the province of Alberta’ (where /ta/ is a short syllable and an ordinary mora) and /orego↓N+šuR/ Oregon-shū ‘the state of Oregon’ (where /goN/ is a long syllable or a sequence of an ordinary mora followed by a dependent mora). In /hawai↓+šuR/~/hawa↓i+šuR/ Hawai-shū ‘the state of Hawai‘i’ and /irinoi↓+šuR/~/irino↓i+šuR/ Irinoi-shū ‘the state of Illinois,’ however, many speakers prefer the form in which the onsetless /i/ mora carries the accent. In general, there is considerable variability and uncertainty regarding the dependent status of the second vowel in (C)V/i/ sequences and, to a lesser extent, (C)V/u/ sequences (which are relatively uncommon).
When there is good reason to treat a high front vowel as a dependent mora, it is not uncommon to see it transcribed phonemically as /J/ (see, e.g., Kubozono Reference Kubozono and Haraguchi1993: 73; Kawahara Reference Kawahara2016: 170), as in /taJpura↓JtaR/ for taipuraitā ‘typewriter.’ This phonemic distinction between /i/ and /J/ has the same basic motivation as the distinction between ordinary vowel phonemes and /R/. As noted earlier in this section, if long vowels and V1V1 sequences are both analyzed phonemically as two identical vowel phonemes in a row, lexical items can differ minimally in syllabification or, equivalently, in the dependence versus independence of an onsetless V mora. In the same way, if V1V2 sequences ending in a high front vowel are all analyzed phonemically as ending in /i/, the door is open to violations of the widely (but not universally) accepted principle that minimal pairs differing only in syllabification are not possible (Section 4.1). For example, although /ha↓ir-u/ hairu ‘to enter’ and /oi↓-ru/ oiru ‘to grow old’ are not a minimal pair, if they are treated as /ha↓μiσru/ and /oσi↓σru/, they both have penultimate accent (the regular accent pattern for an accented citation form of a verb). The problem of contrastive differences in syllabification disappears if these two words are phonemically /ha↓Jru/ and /oi↓ru/, because /J/, like /R/, implies dependence.
It would be preferable to be able to transcribe all onsetless high front vowel moras as /i/ and use other information to determine which instances of /i/ are dependent. If this were possible for all onsetless vowel moras, there would be no need for the phonemes /J/ and /R/, and the claim that lexical items do not differ minimally in syllabification could still be maintained. The same logic applies to the much less common instances of a high back vowel behaving as a dependent mora. If dependence is not predictable, then in addition to /R/ and /J/ a phoneme /W/, distinct from /u/, would presumably be necessary in examples like /dona↓W+gawa/ Donau-gawa ‘Danube River’ for speakers who have this pronunciation. River names ending in /gawa/ are generally accented on the syllable immediately preceding /gawa/ (or, in a mora-qua-syllable analysis, the last ordinary mora preceding /gawa/), as the examples in Table 7.4 illustrate. According to one authoritative accent dictionary (Kindaichi and Akinaga Reference Haruhiko and Kazue2001), Donau-gawa can be either /donau↓+gawa/ or /dona↓u+gawa/ (the same kind of variability exhibited by state names such as /hawai↓+šuR/~/hawa↓i+šuR/ Hawai-shū ‘the state of Hawai‘i’).
Table 7.4 Accentuation of river names
| Isolation form | Form with /gawa/ |
|---|---|
| /te↓muzu/ Thames | /temuzu↓+gawa/ |
| /mišiši↓Qpi/ Mississippi | /mišišiQpi↓+gawa/ |
| /a↓mazoN/ Amazon | /amazo↓N+gawa/ |
| /ma↓reH/ Murray | /mare↓H+gawa/ |
| /do↓nau/ Donau ‘Danube’ | /donau↓+gawa/~/dona↓u+gawa/ |
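The river-name pattern lends itself to a small worked sketch. The Python below assumes a mora-plus-syllable representation (a list of syllables, each a list of mora strings) supplied by hand; the function name `compound_accent` and its `suffix` parameter are hypothetical and serve only to illustrate the generalization that the accent falls on the syllable immediately preceding /gawa/.

```python
def compound_accent(name_syllables, suffix="gawa"):
    """Place the accent on the last syllable of the name in a
    name+suffix compound, marking it after that syllable's first mora."""
    if not name_syllables:
        raise ValueError("name must contain at least one syllable")
    *initial, last = name_syllables
    head = "".join("".join(syl) for syl in initial)
    # The mark follows the first (ordinary) mora of the final syllable,
    # so a dependent mora such as /N/ or /u/ ends up after the mark.
    accented = last[0] + "↓" + "".join(last[1:])
    return head + accented + "+" + suffix


print(compound_accent([["a"], ["ma"], ["zo", "N"]]))   # heavy final syllable
print(compound_accent([["do"], ["na"], ["u"]]))        # /u/ independent
print(compound_accent([["do"], ["na", "u"]]))          # /u/ dependent
```

On this sketch the two dictionary variants of Donau-gawa follow from the same rule applied to two different syllabifications, and passing `suffix="šuR"` reproduces the parallel variation in state names such as Hawai-shū.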
5 Syllable-based Generalizations
This section reviews four generalizations that have usually been stated in terms of syllables, presupposing a mora-plus-syllable analysis. In each case, it is not difficult to capture the same generalization if one assumes a mora-qua-syllable analysis instead. In fact, there do not appear to be any decisive empirical differences between the two analyses as long as they treat the same subset of moras as special (i.e. dependent). The upshot is that the choice between the two analyses rests on the researcher’s view of language in general, that is, what properties are taken to be universal and what degree of language-specific variability is regarded as plausible.
5.1 Initial Lowering
In an accent phrase that is not initially accented, initial lowering (i.e. the rise from a low pitch on the first mora to a high pitch on the second mora; see Section 4.2) is optional in some cases. In a mora-plus-syllable analysis, the generalization is that initial lowering is optional when the initial syllable is heavy (Hattori Reference Hattori1954: 246; McCawley Reference McCawley and Hyman1977: 262; Kubozono Reference Kubozono2006b: 14). The facts may actually be a bit more complicated (Labrune Reference Labrune2012b: 123–124), but in a mora-qua-syllable analysis, the same generalization can be captured by saying that initial lowering is optional when the second mora of a word is a special mora. It is not clear in either analysis what the motivation is for this optional flattening of the LH contour.
5.2 Accent-bearing Units
In a mora-plus-syllable analysis, as explained above in Section 4.2, it is syllables, not moras, that are the accent-bearing units of Japanese. In a mora-qua-syllable analysis, needless to say, moras are the accent-bearing units, but since special moras ordinarily resist carrying accent, there is usually “a leftward shift of the accent when they occupy a prosodic position where an accent would be expected to occur” (Labrune Reference Labrune2012b: 124). Accented special moras do sometimes occur, however, and only /Q/ is indisputably incapable of carrying accent (ibid.: 125, n. 10).
One example of accent on /N/ involves a compounding pattern in which a place name combines with an emphatic /Q/ and /ko/ ‘child’ to form a noun denoting a person from that place. Most compounds that follow this pattern are accented on the penultimate syllable. When the place name is /ro↓NdoN/ Rondon ‘London,’ the compound is /roμNσdoσN↓μQσko/ Rondon-kko ‘born and bred Londoner’ in a mora-plus-syllable analysis, with the accent carried by the first mora of the unusual heavy syllable /NμQ/. The alternative would be /roμNσdo↓μNμQσko/, with the accent carried by the first mora of the superheavy syllable /doμNμQ/, but Kubozono (Reference Kubozono and Tsujimura1999: 50–55) argues that such trimoraic syllables are universally dispreferred and that Japanese usually avoids creating them or repairs them when they arise. In a mora-qua-syllable analysis, the atypical accent on the second /N/ in /roNdoN↓Qko/ could be accounted for on the assumption that a special mora is dependent on the nearest preceding ordinary mora. Treating /N/ as independent and allowing it to bear accent can be understood as a way to avoid having the two special moras in /doNQ/ both dependent on the same ordinary mora /do/.
Labrune (Reference Labrune2012b: 125) claims that /R/ can also bear accent, but the example she cites is not persuasive. Accent dictionaries (NHK Hōsō Bunka Kenkyūjo 1998; Kindaichi and Akinaga Reference Haruhiko and Kazue2001) give two alternative patterns for a loanword meaning ‘chain store’: /če↓RN+teN/~/čeR↓N+teN/ chēn-ten, both represented in a way that implies accent-bearing /R/ in the latter. However, in the audio recording that accompanies this entry in electronic editions of the NHK dictionary, the latter alternative is pronounced /čee↓NteN/ [tɕeˇẽnːtẽɴː], with clear vowel rearticulation, implying the sequence /ee/ rather than the long vowel /eR/. Furthermore, initial lowering (Section 5.1) is obligatory when the latter form appears phrase-initially (*HHLLL), as expected if the first two moras are in separate syllables (/čeσe···/). In contrast, the unequivocal long vowel /eR/ in /čeRsu+gi↓NkoR/ Chēsu-ginkō ‘Chase Bank’ does not require (and typically lacks) initial lowering.
On a mora-plus-syllable analysis, the form of chēn-ten with a second-mora accentual peak can be treated as /čeσe↓μNσ+teμN/, with no trimoraic syllable. On a mora-qua-syllable analysis, this same form can be treated as /čeσe↓σNσteσN/, with no sequence of two special moras both dependent on the same ordinary mora. In neither case is there an accent-bearing special mora. However, if the second mora were /R/ rather than /e/, mora-plus-syllable /čeμR↓μNσteμN/ and mora-qua-syllable /čeσR↓σNσteσN/ would both have two dispreferred characteristics. The former has an accent-bearing special mora and a trimoraic syllable, and the latter has an accent-bearing special mora and two special moras (/R/ and /N/) both dependent on the ordinary mora /če/. It would be hard to explain why a form that violates two constraints would ever arise, given that the alternative (mora-plus-syllable /čeσe↓μNσteμN/ or mora-qua-syllable /čeσe↓σNσteσN/) avoids both violations. In short, it appears that /R/ cannot carry accent.Footnote 13
5.3 Accent Loss Before /no/
Some lexically accented nouns lose their accent immediately preceding the genitive particle no (Martin Reference Martin1975: 23–24). On a mora-plus-syllable analysis, this loss affects only nouns of two or more syllables that are lexically accented on the last syllable, which can be either light or heavy. For example, /aσšiσta↓/ ashita ‘tomorrow’ (with a light final syllable) and /kiσnoμ↓R/ kinō ‘yesterday’ (with a heavy final syllable) retain their lexical accents when followed by nominative ga (/ašita↓ ga/, /kino↓R ga/) but become unaccented when followed by no (/ašita no/, /kinoR no/). There are many exceptions, and accent retention before no may actually be the default (Labrune Reference Labrune2012b: 129–130), but the fact remains that every noun that does undergo this accent loss is lexically accented on its final syllable. Furthermore, the accent loss never affects monosyllabic nouns, regardless of whether the lone syllable is light or heavy. On a mora-qua-syllable analysis, the nouns in question can be characterized as (1) containing more than one ordinary mora and (2) having a lexical accent on the last ordinary mora.
If one is willing to posit underlying/input accents in locations where they never surface, it is possible to claim that a noun ending in a special mora that loses its accent before no has its lexical accent on that last mora (Labrune Reference Labrune2012b: 130). For example, /niho↓N/ Nihon ‘Japan,’ which loses its accent (/nihoN no/), would be LEX/nihoN↓/, whereas /hoRge↓N/ hōgen ‘dialect,’ which does not lose its accent (/hoRge↓N no/), would be LEX/hoRge↓N/. Since a special mora cannot ordinarily bear accent (see Section 4.2), the accent shifts leftward onto the preceding ordinary mora to yield the actual output form /niho↓N/ when not followed by no. No shift is necessary for /hoRge↓N/, since the output matches the input. This approach cannot account for nouns ending in an accented ordinary mora that fail to undergo accent loss. The fact that /cugi↓/ tsugi ‘next,’ for example, retains its accent (/cugi↓ no/) while /yama↓/ yama ‘mountain’ does not (/yama no/) cannot reasonably be attributed to an underlying difference in accent location.
The same sort of analysis is possible for some compounds with a second element (E2) ending in a special mora (Labrune Reference Labrune2012a: 227–231). For example, in descriptions that assume syllables distinct from moras, it is frequently claimed that in a compound with an E2 that is three moras or longer and accented on its final syllable as an independent word, the compound will be accented on the first syllable of E2 (Kubozono Reference Kubozono, Miyagawa and Saito2008: 177). A typical example with a light final syllable is /tera+o↓toko/ tera-otoko ‘temple assistant’ (cf. /otoko↓/ ‘man’), and one with a heavy final syllable is /aka+čo↓RčiN/ aka-chōchin ‘red lantern’ (cf. /čoRči↓N/ ‘lantern’). If the word for ‘lantern’ is underlyingly LEX/čoRčiN↓/, the accent in /aka+čo↓RčiN/ can be attributed to the underlying accent on the final mora of E2, just as in LEX/otoko↓/, making it unnecessary to refer to syllables distinct from moras. As a word on its own, of course, LEX/čoRčiN↓/ surfaces as /čoRči↓N/ in order to avoid an accent on a special mora. On this approach, the accent in compounds like /koRčoR+seNse↓i/ kōchō-sensei ‘school principal’ (cf. /seNse↓i/ ‘teacher’) indicates that the word for ‘teacher’ is underlyingly LEX/seNse↓i/, with the accent on the penultimate mora.
A major problem for an abstract analysis involving underlying accents on word-final special moras is that accent loss before no and E2-initial accent in compounds do not necessarily go together. For example, /niho↓N/ Nihon ‘Japan’ loses its accent before no (as noted above), indicating that the underlying form is LEX/nihoN↓/, but compounds like /niši+niho↓N/ nishi-Nihon ‘western Japan’ indicate that the underlying form is LEX/niho↓N/. On the other hand, /čoRči↓N/ chōchin ‘lantern’ retains its accent before no (/čoRči↓N no/), indicating that the underlying form is LEX/čoRči↓N/, but compounds like /aka+čo↓RčiN/ aka-chōchin ‘red lantern’ (cited above) indicate that the underlying form is LEX/čoRčiN↓/.
To summarize, the necessary (but not sufficient) conditions for accent loss before no can be stated either in terms of a mora-plus-syllable analysis (more than one syllable and accent on the final syllable) or in terms of a mora-qua-syllable analysis (more than one ordinary mora and accent on the last ordinary mora). It does not appear that either analysis offers any advantage in dealing with the irregularity of this accent-loss phenomenon.
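The two formulations of the necessary condition can be written side by side as a minimal sketch. The Python below assumes hand-coded representations: a list of syllables with the index of the accented syllable for the mora-plus-syllable version, and a list of `(mora, is_special)` pairs with the index of the accented mora for the mora-qua-syllable version. Both function names are hypothetical, and a result of `True` means only that a noun is eligible for accent loss, not that it actually undergoes it.

```python
def eligible_syllable_based(syllables, accented_syllable):
    """Mora-plus-syllable version: more than one syllable,
    with the accent on the final syllable."""
    return len(syllables) > 1 and accented_syllable == len(syllables) - 1


def eligible_mora_based(moras, accented_mora):
    """Mora-qua-syllable version: more than one ordinary mora,
    with the accent on the last ordinary mora."""
    ordinary = [i for i, (_, special) in enumerate(moras) if not special]
    return len(ordinary) > 1 and accented_mora == ordinary[-1]


# /kino↓R/ kinō: eligible under both formulations
print(eligible_syllable_based([["ki"], ["no", "R"]], 1))
print(eligible_mora_based([("ki", False), ("no", False), ("R", True)], 1))

# /ta↓roR/ Tarō: accent is not final, so not eligible under either
print(eligible_syllable_based([["ta"], ["ro", "R"]], 0))
print(eligible_mora_based([("ta", False), ("ro", False), ("R", True)], 0))
```

Because /hoRge↓N/ satisfies both predicates yet keeps its accent before no, the sketch also shows why the conditions are necessary but not sufficient.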
5.4 Names ending with /taroR/
The name /ta↓roR/ Tarō is a traditional favorite for first-born sons, and it is also popular as a second element in longer names. Assuming a mora-plus-syllable analysis, these longer names are unaccented if the first element (E1) is monosyllabic, accented on the second syllable if E1 is two light syllables, and accented on the first syllable of /ta↓σroμR/ if E1 is three moras or longer (Kubozono Reference Kubozono and Tsujimura1999: 45–46). This pattern is clearly productive and applies, for example, to /kyuμRσ+taσroμR/ Kyū-tarō (the name of a cartoon character), in which E1 is a single heavy syllable. Additional examples are given in Table 7.5. There are also a few common nouns ending with Tarō, and in these as well, light and heavy monosyllabic E1s are treated the same: /yoσ+taσroμR/ yo-tarō ‘dunce,’ /puμRσ+taσroμR/ pū-tarō ‘day laborer.’
Table 7.5 Names ending with /taroR/
| E1 = (μ)σ | E1 = (μμ)σ | E1 = (μ)σ(μ)σ | E1 ≥ 3μ |
|---|---|---|---|
| /ya+taroR/ Ya-tarō | /kiN+taroR/ Kin-tarō | /momo↓+taroR/ Momo-tarō | /čikara+ta↓roR/ Chikara-tarō |
| /ki+taroR/ Ki-tarō | /šoR+taroR/ Sō-tarō | /kuni↓+taroR/ Kuni-tarō | /kareR+ta↓roR/ Karē-tarō |
To formulate the generalization in terms of a mora-qua-syllable analysis, one can say that names ending in Tarō are unaccented when E1 is either a single mora or a sequence of an ordinary mora followed by a special mora (Labrune Reference Labrune2012b: 131). The same set of E1s could also be specified as those that contain only one ordinary mora. It is not clear why this set of E1s should be a natural class in a mora-qua-syllable analysis (Kawahara Reference Kawahara2016: 186), but the generalization is not difficult to state.
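The three-way pattern for names ending in Tarō can be sketched as a small function. The Python below assumes that E1 is supplied as a hand-syllabified list of syllables, each a list of mora strings; the function names `taro_name` and `unaccented_mora_based` are hypothetical, and the sketch covers only the regular pattern described above.

```python
def taro_name(e1_syllables):
    """Accent a name of the form E1+taroR (mora-plus-syllable version)."""
    n_syllables = len(e1_syllables)
    n_moras = sum(len(syl) for syl in e1_syllables)
    e1 = "".join("".join(syl) for syl in e1_syllables)
    if n_syllables == 1:
        return e1 + "+taroR"       # monosyllabic E1: unaccented
    if n_syllables == 2 and n_moras == 2:
        return e1 + "↓+taroR"      # two light syllables: accent on the second
    return e1 + "+ta↓roR"          # three moras or longer: accent on /ta/


def unaccented_mora_based(e1_moras):
    """Mora-qua-syllable restatement: the name is unaccented when E1
    contains exactly one ordinary mora. `e1_moras` is a list of
    (mora, is_special) pairs."""
    return sum(1 for _, special in e1_moras if not special) == 1


for e1 in ([["ya"]], [["ki", "N"]], [["mo"], ["mo"]],
           [["či"], ["ka"], ["ra"]], [["ka"], ["re", "R"]]):
    print(taro_name(e1))
```

The loop reproduces one example from each column of Table 7.5, and `unaccented_mora_based` picks out the same E1s that the first branch of `taro_name` treats as monosyllabic.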
6 Conclusion
This chapter has not attempted to catalog all the phonological properties of Japanese that researchers have described in terms of moras or in terms of syllables, since comprehensive reviews are readily available (Kubozono Reference Kubozono and Tsujimura1999; Kubozono and Honma Reference Kubozono and Honma2002: 25–96; Vance Reference Vance2008: 117–138). Instead, the focus here has been on the long-running controversy over whether or not Japanese has heavy syllables as well as light syllables, that is, syllables distinct from moras. As noted at the beginning of Section 5, both a mora-plus-syllable analysis and a mora-qua-syllable analysis are capable of capturing the same generalizations, provided that certain moras can be categorized as dependent (i.e. “special”). The crux of the debate is the fact that heavy syllables are not psychologically salient units for ordinary native speakers of Japanese. As Labrune (Reference Labrune2012b: 114) puts it, “the syllable is rather inconspicuous” in Japanese, and this lack of salience is presumably the reason that such speakers do not have a folk category for the units treated as syllables in a mora-plus-syllable analysis (as noted at the beginning of Section 1).Footnote 14
If, however, syllables are basic units of speech production and perception in all languages, as suggested in Section 1.1, then Japanese must have syllables, regardless of whether ordinary native speakers have a name for them. One possibility is that moras are the syllables of Japanese, as in a mora-qua-syllable analysis like model A in Figure 7.3. The drawback to this approach is that it treats Japanese as aberrant from a cross-linguistic point of view, with many anomalous syllables (Section 3). A more appealing possibility is that Japanese has both light and heavy syllables but that some language-particular factor prevents syllables from becoming psychologically salient units. The most likely candidate for a language-particular factor is the Japanese writing system, specifically the mora-based kana subsystems (hiragana and katakana) that children learn first on the path to literacy. It seems highly plausible that learning to read and write kana might cause or at least enhance the strong moraic intuitions of adult speakers (Kubozono Reference Kubozono and Tsujimura1999: 57). There is also experimental evidence that pre-literate children find it natural to treat syllables as units instead of or in addition to moras, and that their behavior becomes more mora-based as they learn kana (Inagaki, Hatano, and Otake Reference Inagaki, Hatano and Otake2000).
1 Introduction
The pitch accent system of standard Tokyo Japanese is well known in the literature, but it represents only one type of Japanese pitch accent system. In fact, many regional dialects have systems that are strikingly different from the Tokyo system. This chapter aims to describe the diversity of Japanese pitch accent systems with the Tokyo system as a reference and, thereby, to illuminate the nature and range of pitch accent systems in the language.
To achieve this goal, this chapter is organized as follows. Section 2 discusses the number of pitch accent patterns permitted in the system as well as the functions that pitch accent plays therein. The peculiar accent class known as “unaccented words” is also sketched. Section 3 analyzes the phonetic correlate of pitch accent. In Tokyo Japanese and many other dialects, an abrupt pitch fall manifests the word-level phonological prominence, but there are also some dialects in which pitch rise plays the same role.
Section 4 focuses on the domain within which pitch accent patterns are defined. Tokyo Japanese and some other dialects use the word as this domain, while other dialects define pitch accent patterns within a larger domain. A hybrid system involving the two domains is also discussed. This is followed by a discussion of culminativity, the question of whether the system permits only one prominence or more than one prominence within the domain of lexical pitch accent (Section 5). This discussion bears crucially on the roles that the word-level prominence plays in each pitch accent system.
Sections 6 and 7 are concerned with the ways in which the word-level phonological prominence is assigned. Section 6 examines the linguistic unit that is used to determine the position of the prominence as well as the unit that bears the prominence. Tokyo Japanese is known to be a typical mora-counting language, but it relies crucially on the syllable when actually assigning pitch accent.Footnote 1 Moreover, some dialects use the syllable as a counting unit. As McCawley (Reference McCawley, Hinds and Howard1978) demonstrated, the choice between the mora and the syllable is entirely independent of the distinction between counting and accent-bearing units. Japanese dialects vary remarkably in this respect, depending on which unit they use as a counting unit and which unit they actually assign pitch accent to.
Section 7 discusses the direction in which the phonological prominence is determined. Tokyo Japanese typically determines the prominent position from the end of the word, but there are other systems in the language that choose the opposite directionality.
Section 8 expands the scope of our analysis to compound words, or words that consist of two or more words. Tokyo Japanese has a typical right-dominant accent rule whereby compound accent patterns are determined with reference to the phonological structure of the rightmost element. This contrasts with the left-dominant compound accent rule found in many other Japanese dialects. Moreover, some dialects exhibit a hybrid system where both the rightmost and leftmost elements are referred to by the compound accent rule. The final section gives a summary of the chapter as well as some agenda items for future work. Map 8.1 shows the locations where the main dialects that are discussed in this chapter are spoken.
Map 8.1 Main dialects of Japanese
2 Multiple-pattern Versus N-pattern Systems
2.1 Tokyo Japanese
Based on the number of pitch patterns observed or permitted in the system, Uwano (Reference Uwano and Kaji1999, Reference Uwano2012a) proposed classifying Japanese pitch accent systems into two major groups, accented systems and accentless ones, with the former further divided into two groups, multiple-pattern and N-pattern systems. Of these, the accentless group refers to systems that have no fixed pitch pattern for words, namely, those where pitch is not specified at the lexical level. Accented systems, on the other hand, have pitch specifications at the lexical level. Among this latter group, multiple-pattern systems refer to those in which the number of pitch patterns increases as the word becomes longer, whereas the number is fixed to a certain integer in N-pattern systems, typically ranging between one and four, independent of the length of the word.
Of these three types, Tokyo Japanese belongs to the multiple-pattern group since it exhibits two pitch patterns for monosyllabic nouns, three patterns for disyllabic nouns, and four patterns for trisyllabic nouns. This is illustrated in (1–3). In (1–3) and the rest of the chapter, pitch accent in Tokyo Japanese is denoted by an apostrophe, whereas superscript /°/ is added to unaccented words to clearly distinguish them from accented words. Dots indicate syllable boundaries, wherever necessary, and /ga/ is a grammatical particle denoting the nominative case (nom).
(1) Monosyllabic nouns
a. hi’-ga ‘fire-nom’
b. hi°-ga ‘sun-nom’
(2) Disyllabic nouns
a. ha’.na-ga ‘Hana (female name)-nom’, a’.me-ga ‘rain-nom’, a’.ki-ga ‘autumn-nom’
b. ha.na’-ga ‘flower-nom’, a.ki’-ga ‘weariness-nom’
c. ha.na°-ga ‘nose-nom’, a.me°-ga ‘candy-nom’, a.ki°-ga ‘vacancy-nom’
(3) Trisyllabic nouns
a. ka’.bu.to-ga ‘helmet-nom’
b. ko.ko’.ro-ga ‘heart-nom’
c. o.to.ko’-ga ‘man-nom’
d. sa.ka.na°-ga ‘fish-nom’
As shown in (1–3), each syllable can bear an accent, yielding n patterns for n-syllable nouns.Footnote 2 Tokyo Japanese permits one additional pattern for each word length, hence (n+1) patterns for n-syllable words. This additional pattern is called the unaccented pattern, which involves no sudden pitch fall even if the noun is followed by a grammatical particle. This pattern apparently violates the principle of obligatoriness, or the idea that every lexical word must have at least one prominence or prominent position (Hyman Reference Hyman2006), and may differentiate Tokyo Japanese from many pitch accent languages in the world (Kubozono Reference Kubozono2012a). In phonetic terms, it is often difficult to distinguish this accent pattern from the finally accented pattern shown in (2b) and (3c) when words are pronounced in isolation or in utterance-final position (Vance Reference Vance1995), but they can be readily distinguished when followed by a grammatical particle since pitch suddenly falls immediately before the particle in one case but not in the other. Because of the unaccented pattern, moreover, the multiple-pattern system illustrated in (1–3) as a whole displays a contrast in accentedness – presence or absence of a pitch accent – in addition to accent position.
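The (n+1) count can be illustrated by enumerating the logically possible patterns for a syllabified noun. The Python sketch below is an illustration only: it assumes the word is given as a list of syllable strings, reuses the apostrophe and superscript-circle notation of (1–3), and uses a hypothetical function name `possible_patterns`. It generates notational possibilities, not attested words.

```python
ACCENT = "’"       # follows the accented syllable, as in (1)-(3)
UNACCENTED = "°"   # follows the whole word when it is unaccented


def possible_patterns(syllables):
    """List the n accented patterns plus the one unaccented pattern
    that are logically available to an n-syllable noun."""
    patterns = []
    for k in range(len(syllables)):
        marked = [s + ACCENT if i == k else s for i, s in enumerate(syllables)]
        patterns.append(".".join(marked))
    patterns.append(".".join(syllables) + UNACCENTED)
    return patterns


print(possible_patterns(["ko", "ko", "ro"]))
```

For a trisyllabic input the function returns four strings, matching the four patterns in (3); only one of them is the attested form of any given word.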
Four points must be noted here. First, while nouns thus exhibit multiple pitch patterns in Tokyo Japanese, verbs and adjectives permit only two patterns, as shown in (4): (a) a pattern with an accent on the penultimate mora and (b) the unaccented pattern.Footnote 3 Since accent position is fixed in accented verbs and adjectives, pitch accent can only be contrastive in terms of accentedness in these types of words.
(4) Verbs and adjectives in Tokyo Japanese
a. Accented: ki’.ru ‘to cut’, su’.ru ‘to print’, mo.to.me’.ru ‘to request’, a.o’i ‘blue, green’Footnote 4, u.ma’i ‘tasty’
b. Unaccented: ki.ru° ‘to wear’, su.ru° ‘to do’, ma.to.me.ru° ‘to sum up’, a.kai° ‘red’, a.mai° ‘sweet’
Second, the multiple pitch accent patterns displayed by nouns are not equally distributed across the vocabulary. On the contrary, there are only two accent patterns that are popular across nouns of different lengths: the antepenultimate pattern with an accent on the third mora from the end of the word, as in (3a), and the unaccented pattern as in (2c) and (3d) (Haraguchi Reference Haraguchi1991; Kubozono Reference Kubozono2006a). For example, these two accent patterns combined account for more than 90% of trimoraic nouns (Kubozono Reference Kubozono2006b). Moreover, the (n+1) rule does not work in relatively long words; for example, there are very few six-mora words, if any, that are initially accented or finally accented. This led Kubozono (Reference Kubozono, Miyagawa and Saito2008) to propose that Tokyo Japanese basically has a two-pattern system for nouns as it does for verbs and adjectives and that nouns differ from verbs/adjectives in the extent to which lexical exceptions are permitted.
Third, since there are only two dominant patterns in nouns in Tokyo Japanese, the contrastive function of pitch accent in this system naturally involves an opposition in accentedness. That is, most pairs of words that contrast in pitch accent show a contrast between the accented and unaccented patterns. This is exemplified in (5).
(5) Accented and unaccented pairs in Tokyo Japanese
a. Accented words: mi.ya.gi’.san ‘Mt. Miyagi’, ko’o.koo ‘filial piety’, ta’n.go ‘Tango (place name), the tango (dance)’, i’.on ‘ion (in chemistry)’
b. Unaccented words: mi.ya.gi.san° ‘produce of Miyagi’, koo.koo° ‘high school’, tan.go° ‘word’, i.on° ‘allophone’
Finally, and related to the third point, the contrastive function of pitch accent in Tokyo Japanese is not as high as might be assumed: it only accounts for 14% of all pairs of words that are segmentally homophonous, while the remaining 86% are pairs of words that are also homophonous accentually (Sibata and Shibata Reference Sibata and Shibata1990). This has two implications. For one thing, it implies that accentuation in Tokyo Japanese is rule-governed to a considerable extent: accent patterns are more or less predictable if the structure of the word is given. Second, it also implies that the primary function of pitch accent in Tokyo Japanese is not a distinctive one but rather a demarcative or culminative one, that is, the function of signaling the unity of words in the sentence or in connected speech. This view is supported by the fact that the default position of pitch accent is the antepenultimate mora irrespective of the length of the word: an abrupt pitch fall signals that it is almost the end of the word.
2.2 N-pattern Systems
While the number of accent patterns increases as the word becomes longer in Tokyo Japanese, there are also many dialects – especially in the southern part of Japan – that have N-pattern systems, or systems where the number of accent patterns is independent of the length of the word. The number may vary from one to four, with the two-pattern system being by far the most common (Uwano Reference Uwano and Kaji1999). One of the most well-known two-pattern systems is that of Kagoshima Japanese, which has a high (H) tone on the penultimate syllable (Type A) or on the final syllable (Type B) for both nouns and verbs/adjectives (Hirayama Reference Hirayama1951; Kibe Reference Kibe2000; Kubozono Reference Kubozono2006b, Reference Kubozono2010). The choice between the two accent patterns is mostly lexically determined in short words, although it becomes more predictable as the word becomes longer or morphologically more complex (Kubozono Reference Kubozono and van Oostendorp2011; see also Section 8 below). In (6) and the rest of the chapter, high-pitched portions are denoted by capital letters.Footnote 5 Parentheses describe an alternative accentual analysis of this system whereby Type A is regarded as accented and Type B as unaccented (Haraguchi Reference Haraguchi1977; Shibatani Reference Shibatani1990; Kubozono Reference Kubozono2012a; see also Section 3 below for more details).Footnote 6
(6) Two accent patterns in Kagoshima Japanese
a. Type A: NA.tsu (na’.tsu) ‘summer’, A.ka (a’.ka) ‘red’, na.tsu.ya.SU.mi (na.tsu.ya.su’.mi) ‘summer holiday’, KI.ru (ki’.ru) ‘to wear’, ma.to.ME.ru (ma.to.me’.ru) ‘to sum up’
b. Type B: ha.RU (ha.ru°) ‘spring’, a.O (a.o°) ‘blue, green’, ha.ru.ya.su.MI (ha.ru.ya.su.mi°) ‘spring holiday’, ki.RU (ki.ru°) ‘to cut’, mo.to.me.RU (mo.to.me.ru°) ‘to request’
Note that this system resembles the two-pattern accent system of verbs and adjectives in Tokyo Japanese illustrated in (4) above, although surface pitch patterns are often different between the two systems.
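Because the Kagoshima system fixes the position of the H tone by accent type rather than by word length, it can be sketched in a few lines. The Python below assumes a hand-syllabified word (a list of lowercase syllable strings) and a lexically given type label; the function name `kagoshima_pitch` is hypothetical, and monosyllabic Type A words are deliberately left outside the sketch because the description above does not cover them.

```python
def kagoshima_pitch(syllables, accent_type):
    """Capitalize the H-toned syllable: the penultimate syllable for
    Type A, the final syllable for Type B."""
    if not syllables:
        raise ValueError("word must contain at least one syllable")
    if accent_type == "A":
        if len(syllables) < 2:
            raise ValueError("monosyllabic Type A words are outside this sketch")
        high = len(syllables) - 2
    elif accent_type == "B":
        high = len(syllables) - 1
    else:
        raise ValueError("accent_type must be 'A' or 'B'")
    return ".".join(s.upper() if i == high else s
                    for i, s in enumerate(syllables))


print(kagoshima_pitch(["na", "tsu", "ya", "su", "mi"], "A"))
print(kagoshima_pitch(["ha", "ru", "ya", "su", "mi"], "B"))
```

The only lexical information the function needs is the type label, which is what distinguishes an N-pattern system from the multiple-pattern system of Tokyo nouns.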
2.3 Accentless Systems
In addition to the multiple-pattern and N-pattern systems, Uwano (Reference Uwano and Kaji1999) posits an independent type called an accentless system. This is a system where word prosody does not use pitch. In the Kumamoto dialect spoken near Kagoshima, words are usually produced with a more or less flat pitch in utterances (Maekawa Reference Maekawa and Sato1997a). Pitch is used at the sentence level in this system, too, to differentiate interrogative sentences from declarative ones, for example. However, it is not specified at the lexical level, so a single word may be produced with several pitch patterns depending on the syntactic and/or pragmatic context, without changing its lexical meaning.
The accentless systems should not be confused with one-pattern systems that represent a subtype of N-pattern systems. A typical one-pattern system is found in the Miyakonojo dialect, which is also adjacent to Kagoshima (Haraguchi Reference Haraguchi1977; Shibatani Reference Shibatani1990). In this system, all words are pronounced in the same way, usually with a high pitch in phrase-final position so that pitch is not used distinctively in this system just as in accentless systems. However, pitch must nevertheless be specified at the lexical level in this system since words would sound awkward if they were pronounced with different pitch patterns, for example, with an H tone in their initial position. Moreover, the H tone is usually bound by bunsetsu, the basic syntactic phrase consisting of a content word and one or more function words. A typical pitch pattern of a sentence is illustrated in (7).
(7)
bo.ku-WA    ni.hon-ka.RA    ki.TA
I-top       Japan-from      came
‘I came from Japan.’
3 Distinctive Feature: Pitch Fall Versus Pitch Rise
As mentioned above, the pitch accent system of Tokyo Japanese is sensitive to a pitch fall, processing words with an abrupt pitch fall as accented and those without it as unaccented. Accented words may also contrast with each other in terms of the position of the accent. In this type of system, pitch fall functions as the distinctive phonetic feature of pitch accent. While this feature is shared by many dialects of the language, there are also some dialects that exceptionally display sensitivity to pitch rise rather than pitch fall (Uwano Reference Uwano2012a). These dialects are found mainly in the Tohoku area of northern Japan such as Aomori, Akita, and Iwate Prefectures. One geographical exception to this is the Narada dialect spoken in Yamanashi Prefecture, which is surrounded by Tokyo-type dialects where pitch fall is distinctive (Uwano Reference Uwano2012a).
Narada, which is a highly endangered dialect spoken in a mountainous area, is similar to Tokyo Japanese in permitting (n+1) accent patterns for n-syllable nouns. However, it crucially differs from the standard variety in using pitch rise as a distinctive feature. This can be seen from the comparison of the two systems given in (8): the original data of Narada are cited from Uwano (Reference Uwano2012a: 1427) although in a different notation.
(8) Tokyo versus Narada (surface pitch patterns)
Tokyo          Narada
KA.bu.to       ka.BU.to       ‘helmet’
KA.bu.to-ga    ka.BU.to-ga    ‘helmet-nom’
ko.KO.ro       KO.ko.RO       ‘heart’
ko.KO.ro-ga    KO.ko.RO-ga    ‘heart-nom’
o.TO.KO        O.to.ko        ‘man’
o.TO.KO-ga     O.to.ko-GA     ‘man-nom’
sa.KA.NA       SA.ka.na       ‘fish’
sa.KA.NA-GA    SA.ka.na-ga    ‘fish-nom’
Comparison of the surface pitch patterns in (8) reveals that the two systems are more or less mirror-images of each other. Specifically, the Narada patterns exhibit a pitch rise in the positions where the Tokyo patterns show a pitch fall. Uwano (Reference Uwano2012a) interpreted this as evidence that pitch rise rather than pitch fall is distinctive in Narada. Using /˩/ as an accent mark for pitch rise, he interpreted the data in (8) as in (9). This analysis captures the basic identity between the two systems as well as their crucial difference: they are identical to each other with respect to the position where an abrupt pitch change occurs, but are different in the direction involved in the pitch change. The system of Narada Japanese is thus different from that of Tokyo Japanese in using pitch rise rather than pitch fall as a distinctive feature of pitch accent.Footnote 7
(9) Tokyo versus Narada (phonological analysis)
Tokyo          Narada
ka’.bu.to      ka˩.bu.to      ‘helmet’
ka’.bu.to-ga   ka˩.bu.to-ga   ‘helmet-nom’
ko.ko’.ro      ko.ko˩.ro      ‘heart’
ko.ko’.ro-ga   ko.ko˩.ro-ga   ‘heart-nom’
o.to.ko’       o.to.ko˩       ‘man’
o.to.ko’-ga    o.to.ko˩-ga    ‘man-nom’
sa.ka.na       sa.ka.na       ‘fish’
sa.ka.na-ga    sa.ka.na-ga    ‘fish-nom’
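The relation between the Tokyo columns of (9) and (8) can be sketched in code. The following is only an illustration under simplifying assumptions: every syllable is treated as a single mora, the initial mora is low unless it carries the accent, and pitch stays high up to the accented mora and low after it. The function name `tokyo_surface` is hypothetical, and the Narada column is left aside because its surface realization is not spelled out as a rule here.

```python
import re

ACCENT = "’"      # accent mark used in (9): pitch falls after the marked mora
UNACCENTED = "°"  # optional mark for unaccented words

def tokyo_surface(form):
    """Map a Tokyo form such as 'ko.ko’.ro-ga' to a surface pattern.

    Assumption: all syllables are light (one mora each).
    High-pitched moras are written in capitals, as in (8).
    """
    parts = re.split(r"([.-])", form)   # moras at even indices, separators at odd
    moras = parts[0::2]
    accent_idx = next((i for i, m in enumerate(moras) if m.endswith(ACCENT)), None)
    surface = []
    for i, mora in enumerate(moras):
        bare = mora.rstrip(ACCENT + UNACCENTED)
        if accent_idx is None:
            high = i > 0                      # unaccented: low first mora, then high
        else:
            # high up to the accented mora; the first mora is high only if accented
            high = i <= accent_idx and (i > 0 or accent_idx == 0)
        surface.append(bare.upper() if high else bare)
    parts[0::2] = surface
    return "".join(parts)

for form in ["ka’.bu.to", "ko.ko’.ro-ga", "o.to.ko’-ga", "sa.ka.na-ga"]:
    print(form, "->", tokyo_surface(form))
```

Under these assumptions the sketch reproduces the Tokyo forms listed in (8), including the difference between /o.to.ko’-ga/ and /sa.ka.na-ga/ that only becomes visible on the particle.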
We have seen two types of pitch accent systems so far, one sensitive to pitch fall and the other to pitch rise. This classification may seem simple, but it is not as easy or straightforward as it might appear to be. To take one example, the two-pattern system of Kagoshima Japanese described in (6) may be interpreted either way. The traditional analysis assumes that the two patterns contrast with each other in terms of the position of the H tone, that is, penultimate versus final (Hirayama Reference Hirayama1951; Kibe Reference Kibe2000). This analysis implies that it is the position of a pitch rise that is relevant, that is, that pitch rise is the phonetic correlate of pitch accent in this system. However, this is not the only analysis that one could propose for this system.
Looking at the same data, Haraguchi (Reference Haraguchi1977) and Shibatani (Reference Shibatani1990) put forth an entirely different analysis whereby the system is sensitive to pitch fall rather than pitch rise: Type A has a pitch fall, whereas Type B does not. This led Haraguchi (Reference Haraguchi1977) and Shibatani (Reference Shibatani1990) to propose an accentual analysis whereby Type A is labeled as accented and Type B as unaccented (as shown in the parentheses in (6) above), just like the two major accent types in Tokyo Japanese described in (5).
One and the same set of data from a single dialect can thus be analyzed in two different ways, either as evidence for a system where pitch rise is distinctive or as evidence for a system involving pitch fall as a distinctive feature. This poses difficult questions in many cases. However, a more careful analysis might favor one analysis over the other. In the case of Kagoshima Japanese, the two competing analyses can be compared with each other in an objective way by considering a wider range of data. The most important fact is that this dialect displays a tonal contrast in monosyllabic words as well: a pitch fall occurs within the sole syllable in Type A words, for example, /hi/ ‘sun’ and /ha/ ‘leaf,’ while no abrupt pitch fall occurs in Type B words, for example, /hi/ ‘fire,’ /ha/ ‘tooth.’ These two types of monosyllables are distinguished from each other not in terms of the position of a pitch rise but in terms of the presence or absence of a pitch fall (Kubozono Reference Kubozono and van Oostendorp2011; see Ishihara Reference Ishihara2004 and Kubozono Reference Kubozono, Riad and Gussenhoven2007a for additional arguments for this view).
4 Domain of Pitch Accent Assignment
Our discussion so far has assumed that word accent patterns are defined within the domain of the word across Japanese dialects. This is correct to the extent that pitch accent is a property of a particular word or morpheme in Japanese in general. However, this does not mean that the word is the domain of pitch accent assignment in all dialects. Japanese pitch accent systems actually fall into two groups in this respect, those whose pitch accent is manifested within the domain of the word per se and those that use a larger domain for the same purpose. Tokyo Japanese and other multiple-pattern systems belong to the first group, while N-pattern systems generally belong to the second. This difference can be understood by comparing Tokyo and Kagoshima Japanese.
In Tokyo Japanese, pitch accent does not change its position whether the word is pronounced in isolation, as in (10a), or in combination with the following grammatical particle(s), as in (10b–d). This reveals the nature of pitch accent in the dialect (see Akinaga Reference Akinaga1985 and Poser Reference Poser1984 for some exceptional cases where the grammatical particles affect the pitch accent of the preceding content word).
(10)
a. ko.ko’.ro ‘heart’
b. ko.ko’.ro-ga ‘heart-nom’
c. ko.ko’.ro-ka.ra ‘from (the) heart’
d. ko.ko’.ro-ka.ra-mo ‘from (the) heart, too’
In Kagoshima Japanese, in contrast, the position of the prominence is not fixed on a particular syllable of the word but shifts rightward if the word is followed by one or more particles. This occurs in both accent types, as shown in (11–12).
(11) Type A
a. sa.KA.na ‘fish’
b. sa.ka.NA-ga ‘fish-nom’
c. sa.ka.na-KA.ra ‘from (the) fish’
d. sa.ka.na-ka.RA-mo ‘from (the) fish, too’
(12) Type B
a. ko.ko.RO ‘heart’
b. ko.ko.ro-GA ‘heart-nom’
c. ko.ko.ro-ka.RA ‘from (the) heart’
d. ko.ko.ro-ka.ra-MO ‘from (the) heart, too’
What is invariant in (11–12) is that the prominence appears on the penultimate and final syllables, respectively, within the phrasal domain: the H tone apparently moves rightward as the phrase becomes longer. This is the domain generally referred to as bunsetsu (the minimal syntactic phrase consisting of a content word plus one or more grammatical particles). Positing the bunsetsu as the relevant domain where pitch accent patterns are manifested, the two accent patterns can be described in a principled way. This Kagoshima-type property can be found across dialects with an N-pattern accent system (Uwano Reference Uwano2012b).
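The bunsetsu-based description of (11–12) can be sketched as a small function. This is a minimal illustration, not a full model of the dialect: the name `kagoshima_surface` is hypothetical, syllables are assumed to be separated by `.` and morphemes by `-`, and the treatment of monosyllabic Type A phrases (the sole syllable is simply marked high) is an assumption made only to keep the function total.

```python
import re

def kagoshima_surface(bunsetsu, accent_type):
    """Place the H tone within the whole bunsetsu (content word plus particles).

    Type A: H on the penultimate syllable; Type B: H on the final syllable.
    The H-toned syllable is written in capitals, as in (11-12).
    """
    parts = re.split(r"([.-])", bunsetsu)   # syllables at even indices
    syllables = parts[0::2]
    if accent_type == "A":
        target = max(len(syllables) - 2, 0)  # assumption: fall back to the sole syllable
    elif accent_type == "B":
        target = len(syllables) - 1
    else:
        raise ValueError("accent_type must be 'A' or 'B'")
    parts[0::2] = [s.upper() if i == target else s for i, s in enumerate(syllables)]
    return "".join(parts)

for phrase in ["sa.ka.na", "sa.ka.na-ga", "sa.ka.na-ka.ra", "sa.ka.na-ka.ra-mo"]:
    print(kagoshima_surface(phrase, "A"))
```

Because the target is counted from the right edge of the entire string, adding particles moves the H tone rightward without any change to the lexical specification, which is the point made about the bunsetsu domain.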
To account for the difference between the Tokyo-type and the Kagoshima-type systems, Hayata (Reference Hayata1999) proposed to divide Japanese pitch accent into two types: word-accent and word-tone systems. In word-accent systems, a particular syllable or mora is chosen as the prominent position of the word; since the prominent position is marked at the lexical level, it will not change even if the word is placed in a larger domain. In word-tone systems, on the other hand, it is the pattern of prominence, not the prominent position per se, that is lexically specified. This pattern may naturally spread to a larger domain and be subsequently realized within each bunsetsu.
This analysis can explain why the two parameters pertaining to pitch accent – that is, the multiple versus N-pattern distinction and the word versus bunsetsu distinction – are largely correlated with each other (Kubozono Reference Kubozono2012b). In a word-accent system, where the prominent position is lexically marked, the number of loci (moras or syllables) for the prominence may increase as the word becomes longer. Moreover, the lexically marked position will not change whether the word is pronounced in isolation or in a phrase. In a word-tone system, in contrast, a particular prominence pattern rather than a particular position is lexically specified. In such a system, the number of distinctive pitch accent patterns should not increase in principle as the word becomes longer. Moreover, since the prominence is not specified for a certain position, the prominence pattern is permitted to spread beyond the word.
This situation is complicated by the fact that the word and the bunsetsu may both be the domains of pitch accent assignment in one and the same system. The Wan dialect of Amami-Kikaijima Ryukyuan is such a dialect (Uwano Reference Uwano2000).Footnote 8 As illustrated in (13), this dialect has a two-pattern system like Kagoshima Japanese, but unlike Kagoshima, one accent pattern is realized within the word and the other pattern in the bunsetsu domain, both on a moraic basis. Specifically, one accent pattern – Type α in Uwano’s analysis – involves a low (L) tone on the penultimate mora within the bunsetsu domain, while the other pattern – Type β – has an L tone on the antepenultimate mora and an H tone on the penultimate mora in the word domain.
(13) Wan dialect of Amami-Kikaijima
a. Type α
HA.sa.MI ‘scissors’
HA.SA.mi-GA ‘scissors-nom’
HA.SA.MI-ka.RA ‘from the scissors’
KAN.na.RI ‘thunder’
KAN.NA.ri-GA ‘thunder-nom’
KAN.NA.RI-ka.RA ‘from the thunder’
b. Type βFootnote 9
ha.TA.na (~ha.TA.NA) ‘sword’
ha.TA.NA-ga (~ha.TA.NA-GA) ‘sword-nom’
ha.TA.NA-KA.ra (~ha.TA.NA-KA.RA) ‘from the sword’
MEe.RA.bi (~MEe.RA.BI) ‘young girl’
MEe.RA.BI-ga (~MEe.RA.BI-GA) ‘young girl-nom’
MEe.RA.BI-KA.ra (~MEe.RA.BI-KA.RA) ‘from the young girl’
This system is arguably a hybrid system which has combined the two types of systems found in other Amami-Kikaijima dialects, that is, a system where accent patterns are realized in the word domain and a system where they are manifested in the bunsetsu domain (Kubozono Reference Kubozono2015b).
5 Culminativity
In prosodic systems with a lexical accent, whether pitch accent or stress accent, a certain constituent (mora or syllable) is generally marked as the phonological head of the word so that the prominence associated with the head constituent signals the peak or edge of the word. However, Japanese permits two major exceptions to this culminative function of word accent. One is the existence of unaccented words discussed above, which do not have a phonologically prominent position. The other exception is the existence of words that have more than one pitch peak.
Tokyo Japanese permits unaccented words but not words that have more than one phonological peak or pitch accent. Thus, one word may have at most one pitch accent no matter how long it may be, as long as it is realized in one prosodic word.Footnote 10 In phonetic terms, this dialect does not allow pitch to rise again after it has fallen within the word domain: /ka’.ma.ki.ri/ ‘mantis’ and /ka’n.sai/ ‘Kansai,’ for example, show a pitch fall immediately after the first mora, /ka/, but no pitch rise after that. If the pitch should rise again, it would signal the beginning of the next word.
While this feature is shared by many Japanese dialects, including Kagoshima, it is not shared by all of them. In fact, there are several dialects, especially in the southern part of Japan, where one word permits more than one pitch peak. Koshikijima Japanese spoken on a small island in Kagoshima Prefecture is one such dialect. A sister dialect of Kagoshima Japanese, this endangered dialect permits two (and only two) accent types, Type A and Type B, and realizes them within the domain of bunsetsu rather than the word. Unlike its sister dialect, however, it permits two pitch peaks – or two H tones – in three-mora or longer words (Kamimura Reference Kamimura1937, Reference Kamimura1941; Kubozono Reference Kubozono2012c, Reference Kubozono2016). This is illustrated in (14), where the Teuchi dialect of Koshikijima Japanese is compared with Kagoshima and Tokyo Japanese.
(14)
Koshikijima-Teuchi  Kagoshima        Tokyo
Type A
A.me                A.me             a.ME             ‘candy’
ba.REe              BA.ree           BA.ree           ‘volleyball’
o.NA.go             o.NA.go          o.NA.GO          ‘woman’
KA.ma.BO.ko         ka.ma.BO.ko      ka.MA.BO.KO      ‘boiled fish paste’
KE.da.MOn           ke.DA.mon        ke.DA.MON        ‘wild animal’
NA.TSU.ya.SU.mi     na.tsu.ya.SU.mi  na.TSU.YA.su.mi  ‘summer holiday’
Type B
a.ME                a.ME             A.me             ‘rain’
MI.kaN              mi.KAN           MI.kan           ‘orange’
KO.ko.RO            ko.ko.RO         ko.KO.ro         ‘heart’
A.SA.ga.O           a.sa.ga.O        a.SA.ga.o        ‘morning glory (flower)’
A.NI.saN            a.ni.SAN         A.ni.san         ‘elder brother’
HA.RU.YA.su.MI      ha.ru.ya.su.MI   ha.RU.YA.su.mi   ‘spring holiday’
As these examples show, Koshikijima Japanese has an H tone on the penultimate mora (Type A) and on the final mora (Type B), respectively, but it permits an additional H tone at the beginning of relatively long words. The two H tones are usually separated by one low-toned syllable as in (15).
(15)
a. Type A
KE.da.MOn ‘wild animal’
KE.da.MOn-ga ‘wild animal-nom’
KE.DA.mon-KA.ra ‘from the wild animal’
KE.DA.MON-ka.RA-mo ‘from the wild animal, too’
b. Type B
A.NI.saN ‘elder brother’
A.NI.san-GA ‘elder brother-nom’
A.NI.SAN-ka.RA ‘from the elder brother’
A.NI.SAN-KA.ra-MO ‘from the elder brother, too’
These accent patterns can be accounted for if one assumes that the dialect has two melodies – /H1L1H2L2/ (Type A) and /H1L1H2/ (Type B) – and that these melodies are associated with the segmental material from the right edge of the domain. This pitch accent assignment process can be described in a derivational way as in Figure 8.1 (Kubozono Reference Kubozono2012c).

Figure 8.1 Pitch accent assignment in Koshikijima Japanese
One may wonder here if the secondary prominence at the beginning of the word may be a phrasal tone signaling the beginning of the phrase just like the phrase-initial pitch rise in Tokyo Japanese (e.g. /a.ME.RI.KA/ ‘America’). This interpretation seems correct in the old system of Koshikijima that Kamimura (Reference Kamimura1937, Reference Kamimura1941) described eighty years ago, where H1 was linked only to the second mora in both accent classes (Kubozono Reference Kubozono2016). In the present-day system of Koshikijima-Teuchi, however, the same H tone is realized over multiple moras/syllables. Moreover, this tone signals not only the onset of a new bunsetsu, but also the Type A/B distinction in connected speech (Kubozono Reference Kubozono2012c). More specifically, H2 disappears in non-final position of the sentence, while H1 survives as the sole prominence. This is illustrated in (16), where /…/ means that the phrase is followed by another phrase in the same utterance.
(16) H2 deletion in connected speech in Koshikijima-Teuchi Japanese
a. Type A
KE.da.MOn… → KE.da.mon… ‘wild animal…’
KE.DA.mon-KA.ra… → KE.DA.mon-ka.ra… ‘from the wild animal…’
b. Type B
A.NI.saN… → A.NI.san… ‘elder brother…’
A.NI.SAN-ka.RA… → A.NI.SAN-ka.ra… ‘from the elder brother…’
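The mapping in (16) can be sketched as an operation on surface strings. This is a rough illustration only: the function name `delete_h2` is hypothetical, H-toned material is assumed to be written in capitals, and the guard that leaves forms with a single H stretch unchanged is an assumption, since the examples in (16) all contain both H1 and H2.

```python
def delete_h2(surface):
    """Lower the rightmost H-toned stretch (H2) of a non-final phrase.

    Syllable (.) and morpheme (-) boundaries are treated as transparent,
    so 'A.NI.SAN' counts as one uninterrupted H stretch.
    """
    runs = []      # one list of character indices per H-toned stretch
    current = []
    for i, ch in enumerate(surface):
        if ch in ".-":
            continue                 # boundaries do not interrupt a stretch
        if ch.isupper():
            current.append(i)
        elif current:
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    if len(runs) < 2:                # assumption: nothing to delete without H1
        return surface
    chars = list(surface)
    for i in runs[-1]:
        chars[i] = chars[i].lower()
    return "".join(chars)

for form in ["KE.da.MOn", "KE.DA.mon-KA.ra", "A.NI.saN", "A.NI.SAN-ka.RA"]:
    print(form, "->", delete_h2(form))
```

After the operation, the two accent types differ only in how far H1 extends, for example KE.DA.mon-ka.ra versus A.NI.SAN-ka.ra.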
This H tone deletion is a peculiar phenomenon involving the deletion of the lexically dominant H tone (H2) and the subsequent promotion of the secondary H tone (H1) as the dominant tone at the post-lexical level. Since this process applies to both Type A and Type B alike, the lexical tonal contrast comes to be signaled by the domain of H1 in non-final phrases in connected speech. This is shown in Figure 8.2. What this means is that, like H2, H1 is a lexical tone rather than a phrasal one in this system.

Figure 8.2 H2 tone deletion in connected speech
6 Counting and Accent-bearing Units
Japanese dialects also show diversity with respect to the phonological units used in pitch accent assignment. McCawley (Reference McCawley, Hinds and Howard1978) proposed a two-way classification of word accent systems whereby accent-bearing units are distinguished from the unit used to measure phonological distances. According to this classification, Tokyo Japanese is a “mora-counting, syllable language”: the mora is used as the basic unit to measure phonological distances, whereas the syllable actually bears the accent. The famous antepenultimate rule, for example, is formulated as in (17) (McCawley Reference McCawley1968), which accounts for the accent position of most accented nouns including loanwords, as exemplified in (18) (Kubozono Reference Kubozono2006a).
(17) Antepenultimate rule
Nouns are accented on the syllable containing the third mora from the end of the word.
(18)
a. ba’.na.na ‘banana’
b. pa.re’e.do ‘parade’
c. i’n.do ‘India’
d. pi’i.man ‘green pepper’
e. ba.do.mi’n.ton ‘badminton’
f. e.re.be’e.taa ‘elevator’
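Rule (17) can be sketched procedurally. The snippet below is a minimal illustration under stated assumptions: words are given in the chapter's romanization with `.` between syllables, a syllable's mora count is taken to be its number of vowel letters plus one for a final consonant, and words shorter than three moras fall back to the initial syllable. The helper names `mora_count` and `antepenultimate_accent` and the fallback are hypothetical and are not part of the rule as stated.

```python
VOWELS = set("aiueo")
ACCENT = "’"

def mora_count(syllable):
    """Assumed mora count: each vowel letter is a mora, plus one for a coda consonant."""
    vowels = sum(ch in VOWELS for ch in syllable)
    coda = 0 if syllable[-1] in VOWELS else 1
    return vowels + coda

def antepenultimate_accent(word):
    """Accent the syllable that contains the third mora from the end of the word."""
    syllables = word.split(".")
    remaining = 3
    target = 0  # assumption: words shorter than three moras get initial accent
    for i in range(len(syllables) - 1, -1, -1):
        remaining -= mora_count(syllables[i])
        if remaining <= 0:       # the third-from-last mora lies in this syllable
            target = i
            break
    syllable = syllables[target]
    # The mark goes after the head mora (onset plus first vowel) of the syllable.
    head_end = next(j for j, ch in enumerate(syllable) if ch in VOWELS) + 1
    syllables[target] = syllable[:head_end] + ACCENT + syllable[head_end:]
    return ".".join(syllables)

for word in ["ba.na.na", "pa.ree.do", "in.do", "pii.man", "ba.do.min.ton", "e.re.bee.taa"]:
    print(antepenultimate_accent(word))
```

Distances are counted in moras, but the accent is attached to a syllable. This is why /pii.man/ and /ba.do.min.ton/ end up accented on the fourth mora from the end.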
The reason for the discrepancy between the syllable and the mora in Tokyo Japanese is that non-head moras of bimoraic syllables – for example, the moraic nasal /n/ as in /ba.do.mi’n.ton/ and the second half of the long vowel as in /e.re.be’e.taa/ in (18) – cannot bear the accent.Footnote 11 When accent falls on such moras by rule, it automatically shifts one mora to the left, that is, onto the head mora of the same syllable. This interaction between the mora and the syllable can be accounted for by generalizing the accent rule with the notion of bimoraic foot (see Poser Reference Poser1990 and Kubozono Reference Kubozono and Tsujimura1999 for more evidence for this constituent). This generalization is given in (19) and exemplified in (20), where foot boundaries (parentheses) are minimally given. In (20a), for example, the accent is placed on /ba/, which is the head mora of the bimoraic foot whose right edge is not aligned with the right edge of the word.
(19) Nouns are accented on the head mora of the rightmost, non-final foot.
(20)
a. (ba’.na).na ‘banana’
b. pa.(re’e).do ‘parade’
c. (i’n).do ‘India’
d. (pi’i).(man) ‘green pepper’
e. ba.do.(mi’n).(ton) ‘badminton’
f. e.re.(be’e).(taa) ‘elevator’
Note that this foot-based generalization does not dispense with the notion of syllable since, as the examples in (20) show, foot formation respects syllable boundaries (see Kubozono Reference Kubozono and Tsujimura1999; Ito and Mester Reference Ito, Mester and Kubozono2015; and Kawahara Reference Kawahara2016, among others, for more evidence for the syllable in Tokyo Japanese).
While the syllable and the mora are both indispensable units in the description of lexical pitch accent in Tokyo Japanese, only one of them is needed for the description of some other dialects. Nagasaki Japanese, for example, has a two-pattern pitch accent system like Kagoshima, but unlike this sister dialect, it uses the mora both to measure phonological distances and to bear the accent, which is realized as the prominent H tone in this dialect. Specifically, this system assigns an H tone on the second mora of the word – or the bunsetsu, to be more precise – in one class of word (Type A), while assigning a flat pitch pattern to the entire word/phrase in the other accent class (Type B) (Sakaguchi Reference Sakaguchi2001; Matsuura Reference Matsuura2014).Footnote 12
(21)
a. ba.NA.na ‘banana’
b. pa.REe.do ‘parade’
c. iN.do ‘India’
d. piI.man ‘green pepper’
e. koN.saa.to ‘concert’
In this system, Type A words have an H tone on their second mora, be it a head mora as in (21a–b) or a non-head mora as in (21c–e). These accent patterns can be formulated by the mora alone, without reference to the syllable or syllable boundaries. Nagasaki Japanese can therefore be labeled as a “mora-counting, mora language.”
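The purely moraic character of the Nagasaki Type A pattern can be sketched as follows. This is an illustration under assumptions, not a description of the full system: the names `split_moras` and `nagasaki_type_a` are hypothetical, and a syllable is split into a head mora (onset plus first vowel) followed by one mora per remaining letter, which fits the romanized examples in (21) but is not claimed to hold in general.

```python
VOWELS = set("aiueo")

def split_moras(syllable):
    """Assumed split: head mora = onset + first vowel; each later letter is one mora."""
    first_vowel = next(i for i, ch in enumerate(syllable) if ch in VOWELS)
    head = syllable[: first_vowel + 1]
    return [head] + list(syllable[first_vowel + 1 :])

def nagasaki_type_a(word):
    """Put the H tone (capitals) on the second mora, ignoring syllable boundaries."""
    syllables = [split_moras(s) for s in word.split(".")]
    seen = 0
    for moras in syllables:
        for k in range(len(moras)):
            seen += 1
            if seen == 2:                    # second mora counted from the left edge
                moras[k] = moras[k].upper()
    return ".".join("".join(moras) for moras in syllables)

for word in ["ba.na.na", "pa.ree.do", "in.do", "pii.man", "kon.saa.to"]:
    print(nagasaki_type_a(word))
```

The counting loop never consults syllable boundaries, so the H tone can land on a non-head mora such as the nasal of /in.do/. This is what distinguishes this system from the Tokyo rule in (17).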
While Nagasaki Japanese relies solely on the mora, Kagoshima Japanese uses only the syllable in pitch accent assignment. As already mentioned in (6) above, this system assigns an H tone on the penultimate syllable in Type A, whether this syllable is monomoraic or bimoraic. Likewise, it assigns an H tone on the final syllable in Type B. Alternatively, if a pitch fall rather than the H tone itself is postulated as the distinctive phonetic feature of pitch accent, as assumed by Shibatani (Reference Shibatani1990), this system bears an accent on the second syllable from the end of the word/phrase. In either case, the syllable is used both as a unit to measure phonological distances and as the bearer of the prominence. There is no evidence for the mora in this dialect.Footnote 13 It is highly interesting to ask why the two sister dialects – Nagasaki and Kagoshima – thus use different units for pitch accent assignment.
It is worth referring to a hybrid system here, a system where both the mora and the syllable are used to measure phonological distances (and to bear phonological prominences). The Koshikijima-Teuchi dialect sketched in (14–15) above, for example, bears two H tones in both accent classes and assigns them in different ways – assigning the lexically more dominant H tone (H2) by counting the number of moras from the end of the word/phrase, while spreading the less dominant H tone (H1) to all syllables preceding H2 except the syllable immediately preceding it.
The peculiarity of this hybrid system may be understood by comparing it with the simpler pitch accent system of Amami-Kikaijima Ryukyuan. The Nakasato dialect of Amami-Kikaijima, for example, assigns an HLHL melody to loanwords mora by mora from the end of the word. This system consistently uses the mora both as a counting and an accent/tone-bearing unit, although it is otherwise very similar to Koshikijima-Teuchi Japanese. Example (22) shows the differences between the two systems.
(22) Kikaijima-Nakasato versus Koshikijima-Teuchi
Kikaijima-Nakasato  Koshikijima-Teuchi
TAN.ba.RIn          TAN.ba.RIn          ‘tambourine’
TEE.PU.RE.KOo.DAa   TEE.PU.RE.koo.DAa   ‘tape recorder’
E.SU.Oo.E.su        E.SU.oo.E.su        ‘SOS’
PII.PIi.E.mu        PII.pii.E.mu        ‘PPM’
7 Directionality
Directionality is another parameter that can be used to demonstrate the diversity of languages in general (Hyman Reference Hyman1977). This is also true of Japanese pitch accent systems. As the foregoing discussion shows, Tokyo Japanese determines the position of word-level prominence from the end of the word. The antepenultimate accent rule illustrated in (18) and (20), for example, places pitch accent on the third or fourth mora counted from the end of the word. The same directionality is employed in all other accent rules of the dialect, including the compound accent rule (McCawley Reference McCawley1968; Kubozono Reference Kubozono1995a, Reference Kubozono, Agbayani and Tang1997). Word-level prominence is measured in the same way in Kagoshima Japanese, too, where H tones are associated with the penultimate syllables (Type A) or the final syllables (Type B), as illustrated in (6) above. Koshikijima-Teuchi Japanese described in (14–15) also exhibits this directionality.
While this right-to-left procedure is very common among Japanese dialects, there are also some dialects where the position of the word-level prominence is determined from the beginning of the word. Nagasaki Japanese is a typical example. As shown in (21) above, this dialect assigns an H tone on the second mora in Type A words, based on a left-to-right procedure – note that Type B words do not show any evidence for directionality since they exhibit a rather flat pitch pattern (Sakaguchi Reference Sakaguchi2001; Matsuura Reference Matsuura2014). Again, it is very interesting to ask why the directionality of accent assignment differs between Kagoshima and Nagasaki Japanese, two sister dialects both with two-pattern systems.
The situation is further complicated by the existence of a hybrid system where the two directionalities – left to right and right to left – are both involved. The Kokonogi dialect spoken in Fukui Prefecture, for example, has three distinctive pitch patterns based on pitch fall (Nitta Reference Nitta2012), one of which is obviously defined from the left edge of the word and another from the right edge, both on a moraic basis. Thus, a group of words including /hi.da.ri/ ‘left’ and /no.ko.gi.ri/ ‘saw, handsaw’ involve a pitch fall between the second and third moras in each bunsetsu, whereas another group including /ku.ru.ma/ ‘car’ and /ya.ma.za.ku.ra/ ‘wild cherry blossoms’ have a pitch fall between the final two moras in the same domain. These two patterns are illustrated in (23–24), respectively. A third group of words are pronounced with a flat pitch pattern involving an initial pitch rise, for example, /ma.KU.RA/ ‘pillow.’ This third pattern can be interpreted as involving either a left-to-right or right-to-left procedure.
(23) One accent pattern in the Kokonogi dialect
a. hi.DA.ri ‘left’    hi.DA.ri-ga ‘left-nom’
b. no.KO.gi.ri ‘saw’    no.KO.gi.ri-ga ‘saw-nom’
(24) Another accent pattern in the Kokonogi dialect
a. ku.RU.ma ‘car’    ku.RU.MA-ga ‘car-nom’
b. ya.MA.ZA.KU.ra ‘wild cherry blossoms’    ya.MA.ZA.KU.RA-ga ‘wild cherry blossoms-nom’
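The two directionalities in (23–24) can be contrasted in a short sketch. It is an illustration under assumptions: the function name `kokonogi_surface` and the labels `"left"` and `"right"` are hypothetical, every syllable in the examples is treated as one mora, and the low pitch on the initial mora is read off the examples rather than derived from a stated rule.

```python
import re

def kokonogi_surface(bunsetsu, edge):
    """Locate the pitch fall from the left or the right edge of the bunsetsu.

    edge='left':  fall between the second and third moras, as in (23)
    edge='right': fall between the final two moras, as in (24)
    High-pitched moras are written in capitals.
    """
    parts = re.split(r"([.-])", bunsetsu)   # moras at even indices
    moras = parts[0::2]
    if edge == "left":
        fall = 2                    # index of the first low mora after the fall
    elif edge == "right":
        fall = len(moras) - 1
    else:
        raise ValueError("edge must be 'left' or 'right'")
    # Assumption from the data: the initial mora is low, then pitch is high until the fall.
    parts[0::2] = [m.upper() if 1 <= i < fall else m for i, m in enumerate(moras)]
    return "".join(parts)

print(kokonogi_surface("no.ko.gi.ri-ga", "left"))
print(kokonogi_surface("ya.ma.za.ku.ra-ga", "right"))
```

Adding a particle leaves the left-anchored fall in place but moves the right-anchored fall, which is how the two patterns reveal their different reference edges.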
The same kind of hybrid system can be found in the Yuwan dialect of Amami Ryukyuan, spoken in the south of Kagoshima Prefecture.Footnote 14 With three distinctive pitch patterns, this system exhibits a pitch fall immediately after the syllable containing the second mora counted from the beginning of the bunsetsu in one accent pattern, and immediately after the syllable containing the penultimate mora in another pattern (Niinaga and Ogawa Reference Niinaga and Ogawa2011). Although a little more complicated than the Kokonogi system in (23–24), this dialect, too, has a hybrid system involving both the left-to-right and right-to-left procedures.
This situation is even further complicated by the Kuwanoura dialect in Koshikijima Japanese, which exhibits a hybrid situation within one and the same word (Kubozono Reference Kubozono2016). This dialect has a two-pattern system just like all its sister dialects of Koshikijima Japanese as well as Kagoshima and Nagasaki Japanese. It also permits two H tones within relatively long words just like most other dialects of Koshikijima Japanese. However, it differs from all its sister dialects in assigning the first H tone (H1) from the beginning and the second H tone (H2) from the end of the bunsetsu. Thus, H1 is usually associated with the first two moras from the beginning in both Type A and Type B words,Footnote 15 while H2 is linked basically to the penultimate and final moras counted from the end in Type A and Type B, respectively. Unlike the Koshikijima-Teuchi dialect sketched in (14–15) above, this dialect determines the positions of the two H tones independently and allows them to clash with each other as in /KA.ZAI.MO.no/ in (25).Footnote 16 The pitch patterns of the Koshikijima-Teuchi dialect are given for comparison.
(25)
Koshikijima-Kuwanoura  Koshikijima-Teuchi
Type A
KA.ZA.ri.MO.no         KA.ZA.ri.MO.no      ‘decoration’
KA.ZA.ri.mo.NO-ga      KA.ZA.RI.mo.NO-ga   ‘decoration-nom’
KA.ZAI.MO.no           KA.zai.MO.no        ‘decoration’ (colloquial)
KA.ZAI.mo.NO-ga        KA.ZAI.mo.NO-ga     ‘decoration-nom’
Type B
I.NA.bi.ka.RI          I.NA.BI.ka.RI       ‘lightning’
I.NA.bi.ka.ri-GA       I.NA.BI.KA.ri-GA    ‘lightning-nom’
I.NA.bi.kaI            I.NA.BI.kaI         ‘lightning’ (colloquial)
I.NA.bi.kai-GA         I.NA.BI.kai-GA      ‘lightning-nom’
8 Compound Accent
Japanese dialects fall into two groups according to the ways accent patterns of compound words are determined. One group, which is represented by Tokyo Japanese, looks at the rightmost element of the compound and preserves the accent of this element as much as possible. The other group is represented by Kagoshima Japanese and refers to the accent pattern of the leftmost element in determining the accent pattern of the compound (Uwano Reference Uwano and Kunihiro1997).
8.1 Right-dominant Compound Rule
The right-dominant nature of compound accentuation in Tokyo Japanese is illustrated in (26–28). Whether the rightmost element is ‘short’ (monomoraic or bimoraic) or ‘long’ (trimoraic or longer) (McCawley Reference McCawley1968), the basic principle underlying this rule is to preserve the accent of the rightmost element if it is lexically accented, unless it violates the non-finality constraint prohibiting the preservation of any accent on the word-final syllable (Kubozono Reference Kubozono1995a, Reference Kubozono, Agbayani and Tang1997; cf. Poser Reference Poser1990). Thus, the lexical accent of the rightmost element is readily preserved if it does not violate this constraint as in (26). If this accent cannot be preserved due to the non-finality constraint or if the rightmost element has no accent to preserve, a default compound accent emerges on the rightmost, non-final bimoraic foot of the compound.Footnote 17 This is exemplified in (27a, b), respectively, where foot structure is minimally shown.
(26) o.na.ga° + sa’.ru → o.na.ga-(za’.ru) ‘long tail + monkey; long-tailed monkey’
koo.so.ku° + ba’.su → koo.so.ku-(ba’.su) ‘highway + bus; highway bus’
ku’.ro + ka’.ra.su → ku.ro-(ga’.ra).su ‘black + crow; black crow’
ryuu.kyu’u + a.sa’.ga.o → ryuu.kyuu-a.(sa’.ga).o ‘Ryukyu + morning glory (flower); Ryukyuan morning glory’
(27) a. a’.ki.ta + i.nu’ → a.ki.ta’)-(i.nu) ‘Akita + dog; Akita Dog’
me’.ron + pa’n → me.(ro’n)-(pan) ‘melon + bread; melon flavored bread’
te’ + ka.ga.mi’ → te-(ka’.ga).mi ‘hand + mirror; hand mirror’
b. ku.gu.ri’ + to° → ku.gu.ri’)-do ‘to go through + door; side door’
o.na.ga° + to.ri° → o.na.ga’)-do.ri ‘long tail + bird; long tailed cock’
ku’.ro + ga.ra.su° → ku.ro-(ga’.ra).su ‘black + glass; black glass’
mi.na.mi° + a.me.ri.ka° → mi.na.mi-(a’.me).(ri.ka) ~ mi.na.mi-a.(me’.ri).ka ‘south + America; South America’
A major exception to the generalization illustrated in (26–27) is unaccented compounds, most of which occur due to the deaccenting nature of so-called deaccenting morphemes (McCawley Reference McCawley1968; see Giriko Reference Giriko2009 and Kubozono Reference Kubozono2017 for other factors triggering unaccented compounds in Tokyo Japanese). Thus, native morphemes such as /i.ro’/ ‘color’ and /ka.ta’/ ‘type’ as well as Sino-Japanese morphemes including /to’o/ ‘political party’ and /se’n/ ‘line’ have an effect of deaccenting the entire compounds of which they form the final member.Footnote 18
(28) a. pi’n.ku + i.ro’ → pin.ku-i.ro° ‘pink + color; pink’
ne.zu.mi° + i.ro’ → ne.zu.mi-i.ro° ‘rat + color; gray’
b. e’e + ka.ta’ → ee-ga.ta° ‘A + type; blood type A’
ha’.mu.ret.to + ka.ta’ → ha.mu.ret.to-ga.ta° ‘Hamlet + type; the type of Hamlet’
c. kyoo.san° + to’o → kyoo.san-too° ‘common wealth + political party; Communist Party’
roo.doo° + to’o → roo.doo-too° ‘labor + political party; Labor Party’
d. too.ka’i.doo + se’n → too.kai.doo-sen° ‘Tokaido + line; Tokaido Line’
yo.ko.su.ka° + se’n → yo.ko.su.ka-sen° ‘Yokosuka + line; Yokosuka Line’
Compound accentuation in Tokyo Japanese is thus determined by the rightmost member of compounds, including the deaccenting morphemes in (28). One potential exception to this general rule is the accentuation of “dvandva” compounds, or compounds involving a coordinate structure. This type of compound tends to preserve the lexical accent of the initial member and lose the accent (if any) of the second member if they consist of two short members as in (29a) (Akinaga Reference Akinaga1985). If they consist of relatively long members, on the other hand, they tend to split into two accentual units or prosodic words instead of being fused into one unit (Kubozono Reference Kubozono1988, Reference Kubozono2017). This latter pattern is illustrated in (29b), where {} denotes prosodic word boundaries.
a. a’.sa + ban° → {a’.sa-ban} ‘morning + evening; morning and evening’
a’.me + ka.ze° → {a’.me-ka.ze} ‘rain + wind; rain and wind’
ta’ + ha’.ta → {ta’-ha.ta} ‘rice field + farm; the fields’
b. che’.ko + su.ro.ba’.ki.a → {che’.ko}{su.ro.ba’.ki.a} ‘Czecho + Slovakia; Czecho-Slovakia’
bik.ku’.ri + gyoo.ten° → {bik.ku’.ri}{gyoo.ten°} ‘being astonished + being stunned; being astonished and stunned’
i’p.pu + ta.sai° → {i’p.pu}{ta.sai°} ‘one husband + many wives; polygamy’
to’o.zai + na’n.bo.ku → {to’o.zai}{na’n.bo.ku} ‘east & west + south & north; all directions, everywhere’
8.2 Left-dominant Compound Rule
While compound accentuation in Tokyo Japanese is basically right-dominant, there are many dialects whose compound accent patterns are left-dominant. The two-pattern accent systems found in Kyūshū – for example, Kagoshima, Nagasaki, and Koshikijima Japanese – are typical examples (Hirayama 1951; Uwano 1997; Hayata 1999; Kibe 2000). To take some examples from Kagoshima Japanese, compound words in this dialect take the Type A pattern (an H tone on the penultimate syllable) if their initial member is lexically Type A, while they take the Type B pattern (an H tone on the final syllable) if the initial member is Type B. This is exemplified in (30a, b), respectively.
a. NA.tsu + ma.TSU.ri → na.tsu-ma.TSU.ri ‘summer + festival; summer festival’
NA.tsu + ya.su.MI → na.tsu-ya.SU.mi ‘summer + holiday; summer holiday’
b. ha.RU + ma.TSU.ri → ha.ru-ma.tsu.RI ‘spring + festival; spring festival’
ha.RU + ya.su.MI → ha.ru-ya.su.MI ‘spring + holiday; spring holiday’
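The left-dominant rule illustrated in (30) is simple enough to state as a short procedure. The Python sketch below is an illustration under assumptions made here: the function name and the string representation (dot-separated lowercase syllables, with capitals marking the H-toned syllable) are hypothetical conveniences, not part of the analysis itself.

```python
def kagoshima_compound(first, second, first_type):
    """Place the H tone of a Kagoshima compound.

    first, second: dot-separated lowercase syllables of the two members.
    first_type:    'A' or 'B', the lexical type of the initial member.
    Type A puts H on the penultimate syllable of the whole compound,
    Type B on its final syllable; the second member's own type is ignored.
    """
    if first_type not in ("A", "B"):
        raise ValueError("first_type must be 'A' or 'B'")
    first_syls = first.split(".")
    syllables = first_syls + second.split(".")
    target = len(syllables) - 2 if first_type == "A" else len(syllables) - 1
    syllables[target] = syllables[target].upper()
    boundary = len(first_syls)
    return ".".join(syllables[:boundary]) + "-" + ".".join(syllables[boundary:])


print(kagoshima_compound("na.tsu", "ya.su.mi", "A"))
print(kagoshima_compound("ha.ru", "ya.su.mi", "B"))
```

Note that the function never inspects the lexical tone of the second member, which is the sense in which the rule is left-dominant.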
One naturally wonders here if the right-dominant and left-dominant nature of compound accentuation may have to do with the distinction between multiple-pattern and N-pattern systems discussed in Section 2, which, in turn, may be linked to the typological categorization of Japanese pitch accent into “word accent” and “word tone” discussed in Section 4 (see Kubozono 2012b for arguments for this idea). Whether these parameters correlate with each other is a very important and interesting topic for future work.
8.3 Hybrid System
Interestingly, there also exists a pitch accent system that exhibits both the right-dominant and left-dominant features in its compound accentuation. This system is widely found in Kinki Japanese (Uwano 1997; Hayata 1999). In the Kyoto dialect, for example, compound words display two major prosodic features, one concerning the pitch height of their initial position – high or low – and the other regarding their accentedness – accented or unaccented. Of these two features, the second one is shared by Tokyo Japanese as shown in (26–28). Kyoto Japanese is, in fact, quite similar to Tokyo Japanese in that compounds look at their final member to determine whether they are accented or unaccented. They also tend to preserve the lexical accent of their final member as much as possible, subject to the non-finality constraint.
In addition to this, compounds in Kyoto Japanese refer to their initial member to determine whether they begin with a high or low pitch. If the initial member begins with a high pitch when pronounced in isolation, this feature is inherited by the compound, as in (31a). Likewise, the compound begins with a low pitch, as in (31b), if the initial member is a low-beginning morpheme (Wada 1942).
a. NA.tsu + YA.su.mi → NA.TSU-YA.su.mi ‘summer + holiday; summer holiday’
KYA.be.tsu + ha.TA.ke → KYA.BE.TSU-BA.ta.ke ‘cabbage + field; cabbage field’
b. ha.RU + YA.su.mi → ha.ru-YA.su.mi ‘spring + holiday; spring holiday’
ya.saI + ha.TA.ke → ya.sai-BA.ta.ke ‘vegetable + field; vegetable field’
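The division of labor in (31) can be sketched in the same procedural style: the initial member contributes only its beginning pitch, and the final member contributes the position of the pitch fall. The Python code below is a rough illustration under assumptions made here. The function name and parameters are hypothetical, the surface form of the final member is supplied by hand, and the `accent_index` parameter generalizes beyond the four examples, all of which have the fall immediately after the first syllable of the final member.

```python
def kyoto_compound(first, second, high_beginning, accent_index=0):
    """Sketch of the surface pitch of a Kyoto compound (capitals = high).

    first, second:  dot-separated lowercase syllables (surface forms).
    high_beginning: True if the initial member begins high in isolation.
    accent_index:   index, within the final member, of the last high syllable
                    before the fall (0 in all four examples in (31)).
    """
    first_syls = first.split(".")
    second_syls = second.split(".")
    if not 0 <= accent_index < len(second_syls):
        raise ValueError("accent_index must point inside the final member")
    if high_beginning:
        # High from the beginning of the compound up to the pitch fall.
        first_out = [s.upper() for s in first_syls]
        second_out = [s.upper() if i <= accent_index else s
                      for i, s in enumerate(second_syls)]
    else:
        # Low beginning: only the accented syllable is high.
        first_out = list(first_syls)
        second_out = [s.upper() if i == accent_index else s
                      for i, s in enumerate(second_syls)]
    return ".".join(first_out) + "-" + ".".join(second_out)


print(kyoto_compound("kya.be.tsu", "ba.ta.ke", high_beginning=True))
print(kyoto_compound("ya.sai", "ba.ta.ke", high_beginning=False))
```

The two arguments that matter come from opposite edges of the compound, which is what makes the system a hybrid of the right-dominant and left-dominant types.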
Compound accentuation in Kyoto Japanese thus shows a hybrid system combining the right-dominant nature of compound accentuation found in Tokyo Japanese and the left-dominant one found in Kagoshima Japanese. To use Hayata’s (1999) terminology, Kyoto Japanese has both “word accent” and “word tone” and its compound accentuation involves the right-dominant preservation of “word accent” (pitch fall) and the left-dominant preservation of “word tone” (word-initial pitch pattern in this particular dialect).
9 Conclusions
9.1 Summary
This chapter has considered the diversity of pitch accent systems in Japanese with the system of Tokyo Japanese as a reference. It has shown that Japanese dialects differ from each other in many features pertaining to word accent. For example, Tokyo Japanese permits multiple accent patterns for nouns – two patterns for monosyllabic nouns, three patterns for disyllabic ones, four patterns for trisyllabic ones, etc. On the other hand, N-pattern systems permit a fixed number of accent patterns irrespective of the length of the word: Kagoshima Japanese and its sister dialects, for example Nagasaki and Koshikijima Japanese, exhibit only two patterns (Type A and Type B) no matter how long the word may be (Section 2).
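The contrast between the two kinds of system is essentially one of counting, as the following Python sketch shows. It is a minimal illustration with hypothetical function names: a Tokyo-type noun of n units can carry the accent on any one unit or be unaccented, giving n + 1 patterns, whereas an N-pattern system has the same number of patterns at every length.

```python
def tokyo_noun_patterns(n):
    """Possible accent locations for a Tokyo noun of n units.

    Positions are counted from 1; None stands for the unaccented pattern.
    """
    if n < 1:
        raise ValueError("a noun has at least one unit")
    return list(range(1, n + 1)) + [None]


def n_pattern_count(n, system_size=2):
    """Number of patterns in an N-pattern system: independent of length n."""
    if n < 1:
        raise ValueError("a noun has at least one unit")
    return system_size


for length in (1, 2, 3, 4):
    print(length, len(tokyo_noun_patterns(length)), n_pattern_count(length))
```

The Tokyo count grows with word length, while the count for a two-pattern system such as Kagoshima Japanese stays at two throughout.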
Japanese dialects also show variability with respect to the distinctive phonetic feature of lexical pitch accent (Section 3). Tokyo Japanese and many other dialects employ pitch fall to differentiate one accent pattern from another, whereas some like the Narada dialect exceptionally use pitch rise for the same purpose.
Japanese dialects also display variability in how accent patterns are determined. First, they differ in the domain where pitch accent patterns are defined: Tokyo Japanese uses the word as the domain, whereas many dialects such as Kagoshima and Nagasaki Japanese realize pitch accent patterns in the domain of the bunsetsu, ‘the basic syntactic phrase’ (Section 4). Second, many dialects including Tokyo, Kagoshima, and Nagasaki Japanese permit only one prominence or underlying H tone per domain, whereas some dialects such as Koshikijima Japanese permit more than one prominence or H tone in the same domain (Section 5). Third, Tokyo and Nagasaki Japanese use the mora as the basic unit to measure phonological distances in pitch accent assignment, while Kagoshima Japanese measures phonological distances with the syllable (Section 6).
Fourth, Japanese dialects may vary in the directionality of pitch accent assignment (Section 7). Tokyo Japanese as well as Kagoshima and Koshikijima Japanese count the number of moras (or syllables) from the right edge of the domain. In contrast, Nagasaki Japanese determines the position of the word-level prominence from the left edge.
Finally, Japanese dialects display variability regarding compound accentuation, too (Section 8). Tokyo Japanese has a typical right-dominant compound accent rule by which the phonological structure of the rightmost element plays the key role in determining compound accent patterns. In contrast, Kagoshima Japanese and its sister dialects have a left-dominant accent rule whereby the prosodic property of the leftmost element is inherited by the entire compound.
These observations, which are summarized in Table 8.1, show how and to what extent pitch accent systems of Japanese differ from each other. They clearly demonstrate that Tokyo Japanese represents only one type of system among various Japanese pitch accent systems.
Table 8.1 Summary of various parameters and dialects
| Dialect | Pitch patterns in nouns | Distinctive feature | Domain | Prominence peak | Unit | Direction | Compound |
|---|---|---|---|---|---|---|---|
| Tokyo (Tokyo) | multiple (n+1) | pitch fall | word | one | mora + syll | R → L | R-dominant |
| Kyoto (Kyoto) | multiple (2n±1) | pitch fall | word | one | mora | R → L | R-dominant + L-dominant |
| Kokonogi (Fukui) | N-pattern (N=3) | pitch fall | bunsetsu | one | mora | R → L, L → R | ? |
| Nagasaki (Nagasaki) | N-pattern (N=2) | pitch fall | bunsetsu | one | mora | L → R | L-dominant |
| Kagoshima (Kagoshima) | N-pattern (N=2) | ? | bunsetsu | one | syll | R → L | L-dominant |
| Koshikijima-Teuchi (Kagoshima) | N-pattern (N=2) | ? | bunsetsu | two | mora + syll | R → L | L-dominant |
| Koshikijima-Kuwanoura (Kagoshima) | N-pattern (N=2) | ? | bunsetsu | two | mora + syll | R → L, L → R | L-dominant |
| Amami-Yuwan (Kagoshima) | N-pattern (N=3) | pitch fall | bunsetsu | one | mora + syll | R → L, L → R | ? |
Note: Parentheses after the dialect show the prefecture where it is spoken. ‘syll’ stands for ‘syllable,’ whereas R and L denote right and left, respectively. Question marks indicate that data and/or analyses are inconclusive.
The situation summarized in Table 8.1 is complex enough. However, it is further complicated by the existence of hybrid systems involving two seemingly competing features pertaining to one and the same parameter. For example, a single dialect may use both the word and the bunsetsu as the domain of pitch accent assignment: it defines one accent pattern within the word, while defining another pattern within the bunsetsu domain (Section 4). Similarly, a single system may use both the syllable and the mora as counting units: Koshikijima-Teuchi Japanese counts the number of moras to determine the position of the primary H tone, while counting syllables to define the position/domain of the secondary H tone (Section 6).
Moreover, while most dialects determine the position of word-level prominence from either the left edge or right edge of the word or bunsetsu, some dialects use both the left-to-right and right-to-left procedures in the same system or even within the same word (Section 7). Finally, the left-dominant and right-dominant compound accent rules may coexist within a single prosodic system (Section 8). The existence of these hybrid systems makes Japanese pitch accent look more complex, but even more interesting and fascinating at the same time.
9.2 Future Agenda
This chapter has raised as many questions as it has answered. One interesting question for future work concerns the relationship between the various parameters that are used to describe the pitch accent systems. It was suggested in passing that the distinction between multiple-pattern and N-pattern systems is closely related to the domain parameter (word versus bunsetsu) (Section 4) and also to the nature of compound accentuation (left-dominant versus right-dominant) (Section 8). On the other hand, our analysis has shown that the multiple-pattern versus N-pattern distinction is independent of other parameters regarding the distinctive feature (Section 3), culminativity (one prominence versus two prominences) (Section 5), the basic prosodic unit (mora versus syllable) (Section 6), and the directionality of pitch accent assignment (left to right versus right to left) (Section 7). The foregoing discussion has clearly shown that N-pattern systems display variability in these parameters: Nagasaki, Kagoshima, and Koshikijima Japanese, for example, all have two-pattern systems, but they exhibit different features with respect to culminativity, the prosodic unit, and directionality. It will be interesting to examine in more depth how these parameters interact with each other and other parameters.
To answer this interesting question, it will be necessary to look at more data from a wider range of Japanese dialects. Given that many regional dialects, such as the Koshikijima and Kikaijima dialects, are at risk of extinction, it is indeed vital to expand our data in order to better understand the diversity of pitch accent systems, while we also continue to examine the pitch accent system of Tokyo Japanese.
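The claim that two-pattern systems vary in the remaining parameters can be checked mechanically against the values reported in Table 8.1. The Python sketch below encodes only three of the table's rows (Nagasaki, Kagoshima, and Koshikijima-Teuchi) and three of its columns. The dictionary layout, key names, and function name are assumptions made for this illustration.

```python
# Values copied from Table 8.1 for three two-pattern (N=2) dialects.
TWO_PATTERN_DIALECTS = {
    "Nagasaki": {"peaks": "one", "unit": "mora", "direction": "L->R"},
    "Kagoshima": {"peaks": "one", "unit": "syll", "direction": "R->L"},
    "Koshikijima-Teuchi": {"peaks": "two", "unit": "mora+syll", "direction": "R->L"},
}


def values_by_parameter(dialects):
    """Collect, for each parameter, the set of values found across dialects."""
    collected = {}
    for settings in dialects.values():
        for parameter, value in settings.items():
            collected.setdefault(parameter, set()).add(value)
    return collected


for parameter, values in sorted(values_by_parameter(TWO_PATTERN_DIALECTS).items()):
    print(parameter, sorted(values))
```

Every parameter takes more than one value within this group, so sharing the two-pattern property does not by itself fix culminativity, the counting unit, or directionality.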
Expanding the scope of analysis is another dimension in which our research on Japanese pitch accent can be extended. Most research on Japanese pitch accent in the past has concentrated more or less on the analysis of word accent within the word or bunsetsu. On the other hand, how word accent patterns are manifested in sentences and connected speech remains a largely understudied topic. If word accent studies are expanded in this direction, they might reveal new aspects of lexical accent such as the loss of word accent or accentual contrasts in focus and other constructions beyond the word and phrase.
Research along these lines will inevitably call for cross-linguistic comparisons between Japanese and other languages. The diversity of Japanese pitch accent systems is certainly an interesting topic by itself, but it will be more interesting to consider it from cross-linguistic and typological perspectives to see what insight and implications the diversity has for the nature of word accent and prosodic typology.
1 Introduction
The last three decades have witnessed explosive development in the phonological study of the intonation structure of languages. Pierrehumbert’s (1980) thesis on English intonation is undoubtedly the most influential contribution to intonational phonology. The intonation systems of the Japanese language have been extensively investigated since the 1980s (e.g. Poser 1984; Pierrehumbert and Beckman 1988; Kubozono 1993). Findings from Japanese have contributed to the development of a widely accepted framework for describing the intonational phonology of numerous languages (Ladd 2008; Gussenhoven 2004).
This chapter aims to present an overview of the intonation structure of Japanese. Section 2 provides the background, defining what intonation is and briefly describing prosodic phrasing and boundary pitch movements. Sections 3 and 4 describe two major components of Japanese intonation, boundary pitch movements and prosodic phrasing, respectively. Section 5 discusses intonational contrasts, and Section 6 concludes this chapter. All illustrative utterances of Tokyo Japanese have been produced by the author (a native speaker). The speakers of the dialectal utterances discussed in Section 5.4 are native speakers of the corresponding dialects.
2 Intonation and Its Components
2.1 What is Intonation?
Ladd (2008: 4) defines intonation as “the use of suprasegmental phonetic features to convey ‘post-lexical’ or sentence-level pragmatic meanings in a linguistically structured way.” Suprasegmental features are commonly defined as fundamental frequency (F0), intensity, and duration. F0 corresponds to the rate of complete cycles of vibration of the vocal folds per unit of time. A higher F0 value gives listeners the auditory sensation of a higher pitch. Thus, F0 is a physical property of speech, whereas pitch is its perceptual correlate (however, these two terms are used interchangeably in this chapter). In Japanese and many other languages (Cruttenden 1997; Gussenhoven 2004), pitch is the primary feature in intonation; therefore, this chapter focuses on the F0 contours of the utterance.
Meanings conveyed by intonation apply to phrases or to an utterance as a whole. They include, according to Ladd (2008), sentence type, speech act, focus, or information structure. A prototypical sentence-type distinction signaled by intonation is statement versus question, which in many languages is achieved by a rising boundary pitch movement at the end of the question sentence. Intonation contrasts with lexical tones (generally called “lexical pitch accents” in Japanese linguistics), which distinguish word-level meanings. In the distinction between háshi ‘chopstick’ and hashí ‘bridge,’ for example, a high pitch (denoted by the acute accent marker) falls on the first syllable in the former but on the second syllable in the latter. In Japanese, lexical pitch accent constrains prosodic phrasing above the word level; therefore, this chapter gives particular attention to lexical pitch accent in Section 4.2.
Intonation conveys meanings in a structured way as a phonological organization (e.g. Ladd 2008: 3–42; Gussenhoven 2004: 49–70). Pitch variation conveying non-linguistic information (e.g. sex, age, and emotional state) is excluded from intonation as defined here. This information can be interpreted even by listeners who do not know the language because forms and functions are directly correlated. For example, the more excited a person is, the higher is the pitch of his/her voice. In the currently defined meaning of intonation, in contrast, forms and functions are phonological in the sense that the phonetic dimension is mediated by the language’s phonological system. To confirm that intonation differs from non-linguistic pitch variation, Gussenhoven (2004: 49–70) demonstrates that intonation and other language components share the three design features that Hockett (1958) identifies for human language: arbitrariness, discreteness, and duality of structure.
In fact, the form-function relation in intonation tends to be non-arbitrary. In many languages, rising or high pitch signals question while falling or low pitch signals statement (Ohala 1984). Moreover, a word produced with a higher pitch tends to be interpreted as being more informative than another produced with a lower pitch. Utterance-final rising, or high pitch in general, conveys continuation while falling or low pitch conveys finality (Gussenhoven 2004: 71–96). Even though in most cases intonation lacks an arbitrary relation between form and function, this is not always the case. In Chickasaw, for example, the interrogative contour is falling, whereas the declarative contour is rising (Gussenhoven 2004: 53–54), thus indicating the arbitrariness of the form-function relation in this language. While it is not easy to find evidence for arbitrariness in Tokyo Japanese, aspects of certain regional varieties of Japanese defy the non-arbitrary tendency. In the Kagoshima dialect, for example, a rising pitch contour does not signal interrogativity; the pitch contour in a question sentence is virtually identical to that in a statement (Kibe 2010). Furthermore, in the Imaichi dialect, the focused word bears the lowest pitch and the location of the pitch peak in the utterance is not associated with the focused word at all.
Most contemporary researchers agree that intonation has a discrete form-function relation, indicating that phonetic dimensions are not directly correlated with semantic ones. This can be illustrated by the case in which two phonetically similar pitch contours are interpreted as signaling discrete meanings. Pierrehumbert and Steele (1989) demonstrate that the phonetic continuum of English rise–fall–rise contours is interpreted by native speakers as two discretely different intonation patterns. However, we frequently observe cases in which a certain putative linguistic message is signaled by continuously variable pitch. For example, Liberman and Pierrehumbert (1984) show that in English, the degree of emphasis and pitch range size are correlated, with more emphatic utterances having a wider pitch range. This message (i.e. the degree of emphasis conveyed by a pitch range) cannot be considered to be the meaning that the intonation as defined here conveys. Instead, the pitch range variation at issue is considered to be a phonetic modification of a single phonological category of intonation. In fact, establishing discreteness in intonation is one of the most challenging issues in the study of intonation. In Japanese as well, the question of the number of intonation patterns is yet to be resolved. We will return to these issues in Section 3.3, which concerns the inventory of Japanese boundary pitch movements.
Duality of structure is the most controversial issue in intonation. According to Gussenhoven (2006), the existence of duality in intonation is suggested by the English calling contour or “vocative chant” (like Jo-ohn!) in which two phonological elements embody one intonational morpheme. However, we will not discuss duality of structure in this chapter.
2.2 Major Components of Japanese Intonation
We describe the Japanese intonation system in terms of two major components: boundary pitch movements (BPMs) and prosodic phrasing above the word level (prosodic phrasing).
BPMs are tones that occur at the end of the prosodic phrase and contribute to the pragmatic interpretation of the utterance (Venditti, Maekawa, and Beckman 2008: 471), for example, as showing questioning, continuation, or emphasis. This information is sometimes regarded as “modality,” which is defined as the mental attitude that a speaker assumes when s/he produces an utterance with a certain intention. According to Kori (1997: 190), modalities include questions, strong insistence, incredulity, and emphasis, which can be distinguished solely by intonation. For example, when Ichirō ga hōmuran o utta (Ichiro nom homerun acc hit.past ‘Ichiro hit a homerun’) is articulated with a rising pitch on the utterance-final mora (i.e. with a rising BPM), the sentence is interpreted as a question, whereas it is interpreted as a statement without this BPM. While BPM is sometimes referred to as “sentence-final intonation” (Kori 1997: 190), this terminology is avoided in this chapter as BPM occurs not only sentence-finally but also sentence-medially.
Prosodic phrasing is defined as the grouping of words in an utterance through suprasegmental features. In Japanese, prosodic phrasing can signal the focused word in the sentence, the syntactic constituency, and so forth. For example, when the NP utsukushii suishagoya no musume (beautiful miller gen daughter) is divided into two prosodic phrases, {utsukushii} {suishagoya no musume}, it is interpreted as ‘the miller’s beautiful daughter’ (curly brackets indicate boundaries of prosodic phrases). When, in contrast, the NP is produced with a single prosodic phrase, {utsukushii suishagoya no musume}, it is interpreted as ‘the daughter of the beautiful miller.’ These two interpretations are a consequence of differences in the syntactic structure. The former has the structure [utsukushii [suishagoya no musume]] in which the adjective utsukushii ‘beautiful’ modifies the extended NP suishagoya no musume ‘miller’s daughter’ (square brackets indicate syntactic boundaries). In contrast, the latter has the structure [[utsukushii suishagoya no] musume] in which the adjective modifies only the immediately following noun suishagoya.
3 Boundary Pitch Movements
3.1 Introduction
BPMs are tones that occur at the end of prosodic phrases. They include a slightly concave rising pitch movement, transcribed as LH% in the X-JToBI scheme (see Section 3.2), which typically occurs at the end of a question sentence. The sentence ending with a verb in a predicative form shown in Figure 9.1 is interpreted as a question when accompanied by LH% (left), whereas the same sentence is interpreted as a statement without LH% (right) (Uemura 1989; Japanese utterances can also have no BPM at all).
Figure 9.1 Question utterance with BPM and statement utterance without BPM. Ya’mano wa u’mi de oyo’gu. (LH%) (Yamano top sea loc swim) ‘Will Yamano swim in the sea?’ (left) and ‘Yamano will swim in the sea’ (right). The mora assigned a BPM is /gu/, which is underlined in this caption.
While LH% typically appears in question sentences, it does not always indicate a question. This is shown in Figure 9.2 where the sentence is interpreted as a statement even with LH%, indicating that the meanings of BPM are more abstract, as will be described in the following section.
Figure 9.2 Statement utterances with and without BPM (LH %). Ya’mano wa u’mi de oyo’gu yo. (Yamano top sea loc swim sfp) ‘Yamano will swim in the sea.’ The mora assigned a BPM is /yo/, which is underlined in this caption.
3.2 The BPM Inventory
LH% is not the only BPM in the inventory of Japanese BPMs. As discussed in Section 3.3 below, there is disagreement over the number of BPMs. Figure 9.3 depicts the four main types of BPM indicated by the X-JToBI framework (Maekawa et al. 2002), the extended version of Japanese Tone and Break Indices, or the J_ToBI system (Venditti 2005), which owes its theoretical foundation to Pierrehumbert and Beckman’s (1988) study. The four main types of BPM are H% (simple rise), HL% (rise–fall), LH% (scooped rise), and HLH% (rise–fall–rise). All the BPMs in this figure are attached to a phrase comprising a single word.
Figure 9.3 Main types of BPM. Ima ne (now sfp) ‘Now.’ The boundaries of the final mora /ne/ are marked.
H% differs from LH% mainly in its F0 shape. In the case of H%, F0 starts rising at the beginning of the phrase-final mora, whereas in the case of LH% it starts in the middle of the final mora. In addition to this alignment difference, pitch range is generally (but not necessarily) smaller in H% than in LH% (Venditti, Maekawa, and Beckman 2008). The resultant F0 shape for H% is a linear rise with a smaller excursion while that for LH% is a concave or scooped rise with a larger excursion.
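The alignment and excursion differences between the two rises can be visualized with a deliberately crude schematic. The Python sketch below is not a model of measured F0: time is normalized to the phrase-final mora (0 = onset, 1 = offset), pitch is normalized so that 0 is the starting low, and the excursion sizes (0.6 and 1.0) are arbitrary values chosen here only to respect the ordering described above. The scooped shape of LH% is approximated by a flat stretch followed by a linear rise.

```python
def schematic_rise(bpm, t):
    """Schematic normalized F0 at normalized time t within the final mora.

    Illustrative assumptions: H% rises linearly from the mora onset to 0.6;
    LH% stays low until the middle of the mora and then rises to 1.0.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie between 0 and 1")
    if bpm == "H%":
        return 0.6 * t
    if bpm == "LH%":
        return 0.0 if t < 0.5 else 2.0 * (t - 0.5)
    raise ValueError("only H% and LH% are sketched here")


for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(t, round(schematic_rise("H%", t), 2), round(schematic_rise("LH%", t), 2))
```

In the first half of the mora only H% has begun to rise, while at the end of the mora LH% reaches the higher value.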
H% in the utterance-final position generally does not signal a question interpretation. Instead, it provides information, for example, that the speaker is insisting (Venditti, Maeda, and van Santen 1998) or that s/he is firmly persuading the listener to agree with what has been said (Uemura 1989). As will be discussed in Section 3.3, H% is sometimes referred to as an “emphatic” rise in other frameworks (Kori 1997; Uemura 1989) as it emphasizes the phrase to which it is attached. H% is also used, according to Uemura (1989), when the speaker is seeking approval, convincing the listener, inviting the listener’s attention, or blaming. H% can appear utterance-medially, and, in this case, it can also lend prominence to the phrase in which it appears (Kori 1997; Yoshizawa 1960). It also signals a continuation of speech (Kori 1997).
LH% is most often observed at the ends of utterances and typically expresses a question. However, as mentioned in Section 3.1, LH% does not always convey the meaning of a question. Uemura (1989) summarizes the functions of this BPM as an expression of intimacy or a friendly attitude toward the listener.
HL% is a rise–fall BPM in which the beginning of the rise is at the onset of the phrase-final mora with the peak at the end of the rise aligned with the middle of the mora (close to its onset). After the rise, F0 falls at the end of the mora, and its duration is lengthened. The function of HL% is akin to that of H%: it imparts prominence to the phrase that the BPM is attached to. In their perception study, Venditti, Maeda, and van Santen (1998) showed that HL% is perceived by the listener as explanatory and emphatic, and it is judged to signal continuation. Citing this study, Venditti, Maekawa, and Beckman (2008) summarize the functions of HL% by stating that listeners expect speakers to use HL% when they are explaining a certain point and want to focus attention on a particular phrase in their explanation.
The choice of H% or HL% at least partly depends on speaking style and spontaneity. An analysis of the impression ratings assigned to talks in the Corpus of Spontaneous Japanese (CSJ, Maekawa 2003) showed that the rate of H% correlates positively with the rated formality of the speaking style and negatively with rated spontaneity, while the rate of HL% shows the opposite pattern (Maekawa 2006). In other words, listeners judge H% to be more formal and less spontaneous than HL%.
The F0 configuration of the former part of HLH% is akin to HL%; however, F0 rises again after the fall in the case of HLH%. The final mora is considerably lengthened. Venditti, Maekawa, and Beckman (2008) suggest that HLH% may be particularly characteristic of infant-directed speech (IDS) as it can give a wheedling or cajoling quality to the utterance to which it is attached. Indeed, an analysis of Japanese infant-directed speech using the RIKEN Japanese Mother–Infant Conversation Corpus (Mazuka, Igarashi, and Nishikawa 2006), which contains IDS and adult-directed speech (ADS) in Japanese, revealed a higher occurrence of HLH% in IDS than in ADS (Igarashi et al. 2013). However, this type of BPM occurs much less frequently than other types, even in IDS. It occurs only twelve times in eight hours of IDS produced by twenty-one mothers. The low frequency of HLH% is also confirmed by the CSJ analysis. It occurs only fourteen times in the forty-five hour core portion of the CSJ (Venditti, Maekawa, and Beckman 2008).
The X-JToBI framework describes types of BPM other than the four types discussed here (H%, LH%, HL%, and HLH%). These types are operationally considered to be variants of the main types of BPMs in X-JToBI (for details, see Igarashi 2015: Section 3.4).
The meanings of the BPMs described here are merely a first approximation. Unfortunately, no analysis of the meanings conveyed by BPMs is uncontroversial, and a comprehensive description is beyond the scope of this chapter. However, it is reasonable to point out here that the relations between forms and meanings in Tokyo Japanese BPMs seem to lack arbitrariness, and they fit comfortably into Gussenhoven’s theory of biological code concerning form-function relations based on the effects of the production process’s physiological properties on the speech signal (Gussenhoven 2004) (for a full discussion, see Igarashi 2015: 548–550).
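To put the two counts on a common footing, they can be converted to occurrences per hour of recorded speech. The short Python calculation below uses only the figures quoted above. The function name is hypothetical, and because the two corpora are not matched in genre or speakers, the comparison is only a rough one.

```python
def per_hour(count, hours):
    """Occurrences per hour of recorded speech."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return count / hours


ids_rate = per_hour(12, 8)    # HLH% in infant-directed speech: 12 tokens in 8 hours
csj_rate = per_hour(14, 45)   # HLH% in the CSJ core portion: 14 tokens in 45 hours

print(f"IDS: {ids_rate:.2f} per hour; CSJ core: {csj_rate:.2f} per hour")
```

On these figures HLH% occurs roughly 1.5 times per hour in the IDS recordings and roughly 0.3 times per hour in the CSJ core, so it is rare in both but noticeably rarer in the latter.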
3.3 How Many BPMs Does Japanese Have?
No consensus has emerged regarding the number of BPMs in Japanese. Moreover, few quantitative analyses have been conducted for BPMs. In the following paragraphs, we will discuss the extent to which researchers agree or disagree on the inventory of BPMs. Figure 9.4 is a schematic representation of the categorical boundaries of the rising BPMs.
Figure 9.4 Schematic representation showing correspondences among categories of the rising BPMs identified by different researchers
Most researchers distinguish two types of rises: information-seeking question rise (InfoQ rise) and prominence-lending rise (Prom rise) (Venditti, Maeda, and van Santen 1998). The InfoQ rise typically occurs in a question ending, whereas the Prom rise typically occurs when a speaker is making an insistent statement, such as Yameru! ‘I will definitely quit!’
In addition to these two rises, Kawakami (1963) and Uemura (1989) distinguish the InfoQ rise from what we may call the “incredulity question rise” (IncreQ rise) (Venditti, Maeda, and van Santen 1998). The IncreQ rise is typically observed in a question where the speaker is expressing disbelief, such as Yameru? ‘Will you quit??’ However, Venditti, Maekawa, and Beckman (2008) suggest that the InfoQ and IncreQ rises are the extreme endpoints of a continuum that includes many intermediate degrees of emphatic lengthening. The continuum between InfoQ and IncreQ is shown in Figure 9.5. They also point out that the gradient nature of the relation between the phonetic dimensions and the continuum of contrasting degrees of incredulity suggests an analysis analogous to the one proposed by Hirschberg and Ward (1992) for the uncertainty versus incredulity interpretations of the English rise–fall–rise contour (transcribed as L*+H L- H%).
Figure 9.5 An InfoQ-IncreQ continuum. Sō na no (so cop nmlz) ‘Is it so?’
Thus, defining the inventory of BPMs is far from trivial, and we cannot always rely solely on native speakers’ intuition as their intuition in the case of intonational contrasts is less sharp than in the case of lexical tone contrasts (Gussenhoven Reference Gussenhoven2004: 60). As exemplified in the InfoQ–IncreQ continuum, it is difficult to decide whether two phonetically different contours convey two categorically distinct intonational patterns or are merely phonetic variants of a single pattern. Researchers have recently been attempting to develop experimental methods for establishing intonational categories (for a discussion of the experimental approaches, see Gussenhoven Reference Gussenhoven2004: 62–70).
4 Prosodic Phrasing
4.1 Double-layered Model
Prosodic phrasing is the grouping of words in an utterance. Although controversies still exist on the number of levels of prosodic phrasing in Japanese and their organization (for review, see Ishihara Reference Ishihara and Kubozono2015: 570; Igarashi Reference Igarashi and Kubozono2015: 527–529), this chapter, based on the X-JToBI framework, adopts the double-layered model, with the hierarchically organized Accentual Phrase (at the lower level) and Intonation Phrase (at the higher level).
4.2 Accentual Phrase
Japanese has two types of words: accented and unaccented. The former exhibit a pitch contour with a steep fall from high (H) to low (L) somewhere in the word while the latter show no such fall. In this chapter, the term pitch accent refers to this lexically specified pitch fall in the accented words. For example, a’me ‘rain’ has an accent on the initial syllable and is, therefore, an accented word, whereas ame ‘candy’ has no pitch accent and is, therefore, an unaccented word (the accented vowel is post-marked by an apostrophe). In addition to the presence or absence of pitch accent, its location in the word is also lexically specified; for example, na’mida ‘tear’ has an accent on the initial syllable, nomi’ya ‘pub’ on the second, and atama’ ‘head’ on the final.
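The apostrophe notation used for the examples above can be read mechanically. The following is a minimal sketch, assuming that every vowel letter before the apostrophe counts as one unit (which holds for the short-vowel words cited here, but not necessarily for words with long vowels or diphthongs); the function name `accent_position` is hypothetical.

```python
VOWELS = set("aiueo")
ACCENT_MARKS = ("’", "'")  # typographic and plain apostrophes


def accent_position(word):
    """Return the 1-based position of the accented vowel, or None if unaccented.

    Assumption of this sketch: each vowel letter is counted as one unit.
    """
    for mark in ACCENT_MARKS:
        if mark in word:
            before = word.split(mark, 1)[0]
            return sum(ch in VOWELS for ch in before)
    return None  # no apostrophe: the word is unaccented


for word in ["a’me", "ame", "na’mida", "nomi’ya", "atama’"]:
    print(word, accent_position(word))
```

Under these assumptions, a’me and na’mida come out as initially accented, nomi’ya as accented on the second unit, atama’ on the third (final) unit, and ame as unaccented.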
An Accentual Phrase (AP) is defined as having a delimitative rise to high around the second mora and a subsequent gradual fall to low at the end of the phrase and as having at most one lexical pitch accent. While a typical AP comprises one lexical word plus a following particle or multiple particles (e.g. yama’ ga ‘mountain nom,’ niwa ni’ wa ‘garden loc top’), a single AP can often contain two or more lexical words. For instance, a noun with a genitive particle followed by another noun, such as Hiroshima no omiyage (Hiroshima gen souvenir) ‘a souvenir from Hiroshima,’ often forms a single AP. Moreover, a particle can form its own AP. For example, in a sequence of an accented noun and a following accented particle, such as nomi’ya ma’de (pub up-to) ‘to the pub,’ the noun and the particle are often merged into a single accented AP, with the accent of the particle being deleted, as in nomi’ya made. Deaccenting of particles is, however, not obligatory (see Igarashi Reference Igarashi and Kubozono2015: 538). When the accent is maintained, the particle forms its own AP.
The intonation contours of a single AP are described as a sequence of tones: an unaccented AP (1a), and an accented AP (1b).
“H*+L” stands for pitch accent, where the asterisk indicates the tone associated with the mora that is governed by the accented syllable. (An accent is assigned to a syllable, but the accentual H is associated with a mora.) Henceforth, the mora with which the H tone of H*+L is associated will be referred to as the accented mora.
L tones with a “%” symbol are called boundary tones, with %L and L% being the initial and final boundary tones, respectively. The low tone found at the beginning of the AP is called the initial lowering in some frameworks (e.g. Haraguchi Reference Haraguchi1977). The H- tone is called the phrasal high. The %L and H- function as starting and ending points, respectively, of the AP’s initial rise. The L% serves as the endpoint of a gradual pitch fall from H- in the case of an unaccented AP, or from L of H*+L in the case of an accented AP. In Figure 9.6, the unaccented adjective amai ‘sweet’ is combined with the following unaccented noun ame ‘candy’ into a single unaccented AP, where the %L H- L% pattern can be clearly observed.
Figure 9.6 An unaccented AP amai ame ‘sweet candy’ (left), and an accented AP uma’i ame ‘tasty candy’ (right)
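The tonal make-up of a single AP described in this section can be summarized in a few lines of code. This is only a sketch: the unaccented string %L H- L% is stated in the text, whereas the placement of H*+L between H- and L% in the accented string is an assumption inferred from the prose description; the function name `ap_tones` is hypothetical.

```python
def ap_tones(accented: bool) -> list[str]:
    """Return the tone string of a single Accentual Phrase (AP)."""
    tones = ["%L", "H-"]      # initial boundary tone, then phrasal high
    if accented:
        tones.append("H*+L")  # lexical pitch accent (position is an assumption)
    tones.append("L%")        # final boundary tone
    return tones


print(" ".join(ap_tones(accented=False)))  # unaccented AP, e.g. amai ame
print(" ".join(ap_tones(accented=True)))   # accented AP, e.g. uma’i ame
```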
4.3 AP Sequences
When a speaker produces fluent utterances of the sentences in (2), s/he groups the words into several APs. The syntactic branching structure is [A [N1 N2]], not [[A N1] N2], where A is an adjective and N is a noun followed by a particle. The grouping of words into APs depends on the interaction of various factors such as word accentuation, syntactic branching, focus, and/or discourse structure (Venditti Reference Venditti and Jun2005). A typical prosodic phrasing of these sentences is shown in Figures 9.7–9.10.Footnote 2 The rise at the end of these utterances is an LH% BPM, which was discussed in Section 3.
(2)
a. Amai jagaimo no nimono wa do’re desu ka?
   sweet potato gen stew top which cop int
   ‘Which are the sweet stewed potatoes?’
b. Uma’i jagaimo no nimono wa do’re desu ka?
   tasty potato gen stew top which cop int
   ‘Which are the tasty stewed potatoes?’
c. Amai zuwa’igani no nimono wa do’re desu ka?
   sweet snow.crab gen stew top which cop int
   ‘Which is the sweet stewed snow crab?’
d. Uma’i zuwa’igani no nimono wa do’re desu ka?
   tasty snow.crab gen stew top which cop int
   ‘Which is the tasty stewed snow crab?’
Figure 9.7 Phrasing at the AP level. Amai jagaimo no nimono wa do’re desu ka? (sweet potato gen stew top which cop.pol int) ‘Which are the sweet stewed potatoes?’
Figure 9.8 Uma’i jagaimo no nimono wa do’re desu ka? (tasty potato gen stew top which cop.pol int) ‘Which are the tasty stewed potatoes?’
Figure 9.9 Amai zuwa’igani no nimono wa do’re desu ka? (sweet snow_crab gen stew top which cop.pol int) ‘Which is the sweet stewed snow crab?’
Figure 9.10 Uma’i zuwa’igani no nimono wa do’re desu ka? (tasty snow_crab gen stew top which cop.pol int) ‘Which is the tasty stewed snow crab?’
When a right-branching syntactic boundary exists, an AP boundary is frequently inserted. Thus in (2), the adjective amai or uma’i forms a single AP. When no right-branching boundary intervenes, an unaccented word and the word that follows it tend to be conjoined into a single AP. In (2a–b), therefore, two NPs jagaimo no and nimono wa are conjoined into an AP. When an accented word is followed by another word, the latter often forms its own AP (Vance Reference Vance2008: Section 7.6) even if there is no right-branching boundary. Thus, in (2c–d), two NPs zuwa’igani no and nimono wa form separate APs. In all the examples in (2), the VP do’re desu ka constitutes a single AP. Thus, prosodic phrasing in (2) can be couched in the form of (3), where parentheses denote the AP boundaries.
(3) Prosodic phrasing in (2)
a. (amai) (jagaimo no nimono wa) (do’re desu ka)
b. (uma’i) (jagaimo no nimono wa) (do’re desu ka)
c. (amai) (zuwa’igani no) (nimono wa) (do’re desu ka)
d. (uma’i) (zuwa’igani no) (nimono wa) (do’re desu ka)
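The phrasing tendencies just described can be sketched as a simple procedure. The code below is an illustration under stated assumptions, not a model proposed in the literature: it treats the tendencies as categorical rules, and the `break_after` flag is annotated by hand (the right-branching boundary after the adjective, plus the break before the predicate, which is an assumption of this sketch). All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    form: str                  # a word plus its following particle(s)
    accented: bool             # lexically accented?
    break_after: bool = False  # hand-annotated syntactic break (assumption)


def phrase_into_aps(units):
    """Group units into APs: close an AP after an accented unit or a break."""
    aps, current = [], []
    for unit in units:
        current.append(unit.form)
        if unit.accented or unit.break_after:
            aps.append(" ".join(current))
            current = []
    if current:                # trailing unaccented material forms the last AP
        aps.append(" ".join(current))
    return aps


def show(units):
    return " ".join(f"({ap})" for ap in phrase_into_aps(units))


sentence_2a = [Unit("amai", False, True), Unit("jagaimo no", False),
               Unit("nimono wa", False, True), Unit("do’re desu ka", True)]
sentence_2c = [Unit("amai", False, True), Unit("zuwa’igani no", True),
               Unit("nimono wa", False, True), Unit("do’re desu ka", True)]

print(show(sentence_2a))
print(show(sentence_2c))
```

With these annotations the sketch reproduces the phrasings in (3a) and (3c). Because focus, discourse structure, and other factors also affect phrasing, actual utterances may well deviate from this output.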
4.4 Downstep and Intonation Phrase
The Intonation Phrase (IP) is defined as the prosodic phrase immediately above the AP in the hierarchy within which pitch range is specified. At the beginning of each new IP, the speaker chooses a new pitch range that is independent of the specification of the preceding IP (Venditti Reference Venditti and Jun2005: 175). This process is called pitch reset. The pitch range specification of IPs is closely connected with a phonological process called downstep. Through this process, the pitch range of each AP is compressed when it follows an accented AP. Downstep is displayed in Figure 9.11. The peak of the third AP is significantly lower when preceded by an accented AP (right) than when preceded by an unaccented AP (left).
Figure 9.11 Downstep. An utterance without downstep: Yubiwa o wasureta onna’ wa da’re desu ka? (ring acc forgot woman top who cop.pol int) ‘Who is the woman that left the ring behind?’ (left), and an utterance with downstep on the third AP: Yubiwa o era’nda onna’ wa da’re desu ka? (ring acc chose woman top who cop.pol int) ‘Who is the woman that chose the ring?’ (right). The relevant portions of the F0 contours are marked by broken-line boxes. Dotted vertical lines symbolize AP boundaries.
When multiple accented APs form a single IP, downstep occurs iteratively, thus resulting in a staircase-like F0 contour, as demonstrated in Figure 9.12. However, in a sequence of four APs (in a syntactic phrase with a uniformly left-branching structure), as in Figure 9.12, the pitch range of the third AP is frequently expanded so that a staircase-like F0 is not observed. This effect is known as “rhythmic boost” (Kubozono Reference Kubozono1988 [Reference Kubozono and Haraguchi1993]: 220–223) and is shown in Figure 9.13, in which the pitch range of the third AP is larger than that of the preceding AP.
Figure 9.12 Successive downstep: Ao’i ie’ o era’nda onna’ wa da’re desu ka? (blue house acc chose woman top who cop.pol int) ‘Who is the woman that chose the blue house?’ without the rhythmic effect. Vertical lines indicate AP boundaries.
Figure 9.13 Successive downstep: Ao’i ie’ o era’nda onna’ wa da’re desu ka? (blue house acc chose woman top who cop.pol int) ‘Who is the woman that chose the blue house?’ with the rhythmic effect. Vertical lines indicate AP boundaries.
When an IP boundary is inserted, downstep is blocked at this boundary; that is, pitch reset occurs, and a new pitch range is specified for the IP. Various linguistic factors result in pitch reset at the IP boundary, including syntactic constituency and focus (Kawakami Reference Kawakami1957; Kori Reference Kori and Kunihiro1997; Ishihara Reference Ishihara2007; Kubozono Reference Kubozono and Ishihara2007b). The pitch ranges of post-focal words in accented APs are significantly reduced, a process called post-focal compression. Pitch reset and post-focal compression are shown in Figures 9.14 and 9.15, respectively. The prosodic phrasing in the utterances in these figures is shown in (4), in which curly brackets represent the boundaries of the IPs.
(4) The prosodic phrasing in the utterances in Figures 9.14 and 9.15. Focused words are capitalized.
a. {(na’oya no) (ane ga)} {(nomi’ya de) (no’nda)} (Figure 9.14, left)
b. {(na’oya no)} {(ANE GA nomi’ya de) (no’nda)} (Figure 9.14, right)
c. {(na’oya no) (a’ni ga)} {(nomi’ya de) (no’nda)} (Figure 9.15, left)
d. {(na’oya no)} {(A’NI GA) (nomi’ya de) (no’nda)} (Figure 9.15, right)
Figure 9.14 Pitch reset and post-focal compression: Na’oya no ane ga nomi’ya de no’nda (Naoya gen sister nom pub loc drank) ‘Naoya’s sister drank in the pub,’ without (left) and with (right) focus on the second unaccented AP ane ga. Focused words are capitalized.
Figure 9.15 Na’oya no a’ni ga nomi’ya de no’nda (Naoya gen brother nom pub loc drank) ‘Naoya’s brother drank in the pub,’ without (left) and with (right) focus on the second accented AP a’ni ga.
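The interaction of iterative downstep and pitch reset described in this section can be illustrated with a toy calculation. The sketch below is not a quantitative model from the literature: the compression ratio of 0.7 and the relative pitch-range scale (1.0 at the start of each IP) are hypothetical values chosen for illustration, and rhythmic boost and post-focal compression are deliberately ignored.

```python
DOWNSTEP_RATIO = 0.7  # hypothetical compression factor, for illustration only


def pitch_ranges(ips, ratio=DOWNSTEP_RATIO):
    """Return (AP, relative pitch range) pairs.

    ips: a list of IPs, each a list of (form, accented) APs.
    """
    result = []
    for ip in ips:
        current = 1.0                 # pitch reset at the start of each IP
        for form, accented in ip:
            result.append((form, round(current, 3)))
            if accented:
                current *= ratio      # downstep: compress what follows
    return result


# Phrasing (4c): two IPs, each containing two accented APs.
phrasing_4c = [[("na’oya no", True), ("a’ni ga", True)],
               [("nomi’ya de", True), ("no’nda", True)]]
# The same APs in one IP (hypothetical phrasing, for comparison).
single_ip = [[ap for ip in phrasing_4c for ap in ip]]

print(pitch_ranges(phrasing_4c))
print(pitch_ranges(single_ip))
```

With the phrasing in (4c), the range returns to its full value at the start of the second IP, whereas the hypothetical single-IP phrasing yields a steadily shrinking, staircase-like sequence of ranges.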
5 Other Possible Intonational Contrasts
5.1 Intonational Contrasts in Phrasal Edges
Thus far, we have described the Japanese intonation system in terms of BPM and prosodic phrasing. Both are boundary-related prosodies: BPM is localized at the edges of prosodic phrases, and prosodic phrases are themselves signaled by pitch movements localized at the phrasal edges. In other words, intonational tones do not typically occur in the middle of prosodic phrases in Japanese.
A marginal case is the presence or absence of downstep as discussed in Section 4.4. Downstep is manifested as the manipulation of the pitch range of the AP, which may be considered a global prosodic characteristic specific to the AP as a whole rather than a local event at the phrasal boundary. However, downstep is simultaneously a cue for the presence or absence of the IP boundary and thus may also be considered a prosodic event localized at the phrasal boundary.
On the other hand, more obvious intonational contrasts that can be described in terms of types of tones (such as H% versus HL%) or the presence or absence of tones (such as with versus without BPM) are undoubtedly localized in the phrasal edges. This restriction may be considered to be one of the typological characteristics of Japanese (for a full discussion regarding these putative typological characteristics, see Igarashi et al. Reference Igarashi, Nishikawa, Tanaka and Mazuka2013: 1286–1288 and Igarashi Reference Igarashi and Kubozono2015: 556–563).
5.2 Possible Intonational Contrasts at the Beginning of the Phrase
In most studies of Japanese intonation, it is assumed that intonationally contrastive tones are localized at the phrasal end but not its beginning. However, Kawakami (Reference Kawakami1956) observed variability in the timing of the initial rise of the utterance-initial AP: the rise may be aligned earlier or later with respect to the phrasal boundary. He also proposed that the alignment varies according to the speaker’s “emotions.” In their experimental studies, Maekawa and Kitagawa (Reference Maekawa and Kitagawa2002) showed that the F0 contour at the beginning of the utterance varies significantly depending on the speaker’s attitude and intentions (“paralinguistic information,” in their terms), such as admiration, disappointment, and suspicion. For example, in an utterance produced with suspicion, the beginning of the initial rise is delayed considerably, thus yielding a long stretch of low F0 before the rise, and the contour exhibits a concave shape in the rising movement (BPM at the end of the phrase is what is referred to as IncreQ rise in Section 3.3). An example of strong suspicion is shown in Figure 9.16 (right).
Figure 9.16 Utterances produced with a neutral attitude (left) and with suspicion (right). Yamada-san de’su ka? (Yamada cop.pol int) ‘Is it Mr. Yamada?’ The boundaries of the first and final moras (/ya/ and /ka/, respectively) are marked to indicate their lengthening in the utterance with suspicion.
The delay of the initial rise, or the long low-pitched stretch at the beginning, may be considered a manifestation of intonational contrast. Further research is required to investigate whether this pattern is indeed contrastive or a mere variant of a single pattern. If the two contours shown in Figure 9.16 (i.e. non-delayed and delayed rise) are two contrastive intonational patterns, then we need to posit an intonational contrast at the beginning of the phrase as well. One possibility may be to posit a contrast in phrasal tones, such as a H- versus LH- contrast. Another may be to posit a %L vs. %LL contrast for the initial boundary tone.
5.3 Possible Intonational Contrasts in the Middle of the Phrase
Even if there is an intonational contrast at the beginning of the phrase, it is nevertheless localized at the phrasal edge. It must be noted, however, that the Japanese intonational contour is known to exhibit variability in the middle of the phrase.
In Section 4.2, we have seen that an unaccented phrase shows a gradual fall. The fall is accounted for by interpolation between H- and L% in the X-JToBI framework. However, Sugahara (Reference Sugahara2003), based on the inter-speaker variation found in her experimental results, claims that unaccented phrases may have no gradual fall at all and may show a high plateau instead. Indeed, in Japanese spontaneous speech, it is not difficult to find unaccented phrases without a gradual fall. An example of a high plateau in an unaccented AP is shown in Figure 9.17 (middle). Gradual pitch fall (Figure 9.17 [left]) could merely be regarded as one possible realization of the pitch pattern. Future research should address whether the two contours (with a gradual fall and with a high plateau) are indeed due to two categorically distinct intonation patterns. If they are contrastive, then Japanese intonation should be regarded as having intonational contrast in the middle of an AP.
Figure 9.17 Variations in the contour of unaccented APs: Ore no mondai da (I gen problem cop) ‘It’s my problem.’ An ordinary contour (left), a contour that would be accounted for by tone spreading (middle), and a rising contour without an apparent tone target (right).
In addition to a high plateau (Figure 9.17 [middle]), a gradual rise without any turning point in the contour is also observed in an unaccented AP. Figure 9.17 (right) exemplifies an F0 rise from the beginning of the unaccented AP to the final mora, without apparent targets for H- and L% (although F0 rises throughout the utterance, it is interpreted as a statement rather than a question). If this contour results from the deletion of H-, then we need to define the contrast between L% and H% for the final boundary tone (not for BPM), in addition to the tone deletion rule for H- (thus, the contour may be interpreted as a combination of the initial boundary tone %L and a final boundary tone H%). In contrast, if the contour results from the dislocation (or delay) of H-, we must posit a tone deletion rule for L% and a rule modifying the alignment of H- (thus, the contour may be interpreted as a combination of the initial boundary tone %L and the phrasal H- without a final boundary tone). In any case, given that the contour is contrastive, we may need to assume an intonational contrast in the middle of the phrase, although these putative contrasts can still be regarded as those at the phrasal edges since edge tones such as phrasal high and boundary tones are involved in the contrasts.
Intonational contrasts existing in the middle of an AP, if any, have rarely been investigated. The reason for the scarcity of research can, in my view, be attributed to the limited room for variability in pitch contours in the middle of an AP. The middle of an AP in Japanese is allocated to lexical tone contrast, which is manifested as the location of a sharp pitch fall, if any. To preserve lexical tone contrasts, it is impossible to delete a sharp fall in the case of an accented AP or to implement an additional sharp fall in the case of an unaccented AP. In general, it is not usual for intonational effects to cause the neutralization of lexical tone contrasts in Japanese (Kawakami Reference Kawakami1956).Footnote 3
However, there are regional dialects of Japanese without any lexically contrastive tones; these are generally called “accentless dialects.” They are expected to show more intonational contrasts as there are no restrictions imposed by lexical tones. Indeed, recent studies reveal that accentless dialects show more flexibility in varying the contours that may be involved in intonational contrasts (Maekawa Reference Maekawa1997b; Igarashi Reference Igarashi and Jun2014). Some of the contrasts occur in the middle of APs, as we will see in the following subsection.
5.4 Intonational Contrasts in the Accentless Dialects
Accentless dialects are widely scattered in non-contiguous areas of the Japanese archipelago. The high variability in the pitch contour of the accentless dialects is reported by Maekawa (Reference Maekawa1994, Reference Maekawa1999) for the Kumamoto dialect spoken in Kyūshū in which the AP basically has a rise–fall contour, %L H- L%. However, this rise–fall shape shows considerable variability. First, the alignment of H- varies. As shown in Figure 9.18a, H- can appear on virtually any syllable within the AP while in Standard Japanese it is generally aligned with the second mora. This unstable H- is called a “wandering high” by Maekawa (Reference Maekawa1994). Second, a high plateau can be observed between the rise and fall in Figure 9.18b. Maekawa (Reference Maekawa1994) accounts for this plateau by assuming the spreading of H-. Finally, the fall at the end of the phrase can be unrealized as shown in Figure 9.18c. Henceforth, these three phenomena will be called H wandering, H spreading, and L deletion, respectively.

Figure 9.18 Variations in pitch contours in AP in the Kumamoto dialect
Igarashi (Reference Igarashi and Jun2014) demonstrated that the Koriyama dialect spoken in Fukushima Prefecture also shows H wandering. Figure 9.19 illustrates the contours of six tokens of the same wh-question ‘What do you see?’ produced by a single speaker. Each utterance comprises a single AP, and the rise at the end of the AP is due to a BPM: %L H- L% LH%. We see that the location of H- in the AP varies from one token to another.
Figure 9.19 H wandering in the Koriyama dialect: Nani ga mien da-i? (what nom visible.nmlz cop-sfp) ‘What do you see?,’ produced by a female speaker born in 1957. Six tokens normalized along the temporal scale are overlaid.
H spreading can be observed in other accentless dialects. The two panels in Figure 9.20 contrast utterances without (top) and with (bottom) H spreading in the same wh-question produced by a single speaker in the Imaichi dialect spoken in the Northern Kanto district (Igarashi Reference Igarashi and Jun2014: 482). Both utterances comprise a single AP with a BPM (LH%) at its end. The AP in this dialect also has a rise–fall contour (%L H- L%), although the rise tends to be more concave than that in the Kumamoto dialect. In the utterance in the top panel, F0 continuously rises from the beginning of the AP to around the middle of the third word and then falls toward the end of the AP. In the utterance in the bottom panel, a sharp initial rise ends around the beginning of the second word, followed by a high (slightly rising) plateau that persists until approximately the middle of the final word.
Figure 9.20 Utterances without (top) and with (bottom) H spreading in the Imaichi dialect: Doko no narazumono ni nagur-are-ta? (where gen gang dat punch-pass-pst) ‘What gang were you punched by?,’ produced by a female speaker born in 1984.
L deletion is also observed in the Imaichi dialect (Igarashi Reference Igarashi and Jun2014: 482). Figure 9.21 shows two tokens of the same wh-question produced by the same speaker. Both utterances comprise two APs with a BPM at the end of the second AP. They differ, however, in that the fall seen in the second AP in the top panel is not found in the second AP in the bottom panel.
Figure 9.21 Utterances without (top) and with (bottom) L deletion in the Imaichi dialect: Odawara no dare ni nagur-are-ta? (Odawara gen who dat punch-pass-pst) ‘Who in Odawara were you punched by?,’ produced by a female speaker born in 1984.
As the observed variability in the contours is much more obvious here than in Tokyo Japanese, especially in the case of L deletion, it seems reasonable to assume that they are not merely phonetic variants of a single intonational category but are involved in intonational contrasts, conveying distinct pragmatic meanings. Issues concerning meaning differences in these contours are addressed in Maekawa’s (Reference Maekawa1999) perception study of the Kumamoto dialect, in which differing H- alignments affect the perceived politeness of the utterance, with a later alignment yielding more “polite” judgments. The utterance with L deletion in combination with the latest H- alignment led to the most “polite” judgments. Although further research is necessary to establish categorically distinct intonational patterns in the accentless dialects, these results suggest that intonational contrasts are involved in the different contours under investigation. This in turn suggests that more intonational contrasts exist in the accentless dialects than in Tokyo Japanese.
6 Conclusion
This chapter has provided an overview of the intonation system of Japanese. Japanese intonation can basically be described in terms of two edge-related components: BPM and prosodic phrasing. However, as discussed in Section 5, Japanese may have intonational contrasts in the middle of prosodic phrases; this aspect requires further investigation. An examination of the regional dialects of Japanese, as was conducted in Section 5.4, may shed new light on these issues.
The inventory and meanings of Japanese BPMs discussed in Section 3 also require further investigation. Quantitative analysis based on the experimental methods discussed in Gussenhoven (Reference Gussenhoven2004: 62–70) will help resolve this issue.
Due to space limitations, the present chapter could not adequately discuss principles that may govern prosodic phrases in Japanese. In general, prosodic phrasing cannot be predicted from syntax alone (Bolinger Reference Bolinger1972) and there can be multiple options for phrasing utterances with the same syntax. It is also known that extra-syntactic factors, such as utterance rhythm and length, as well as non-linguistic factors, such as speech rate, play a role in determining prosodic phrasing (see Shattuck-Hufnagel and Turk Reference Shattuck-Hufnagel and Turk1996) (for a review of recent discussions on issues in prosodic phrasing in Japanese, especially the prosody–syntax interface, readers may refer to Ishihara Reference Ishihara and Kubozono2015).
1 Introduction
The late 1990s to 2000s marked an important turning point in mimetics research, with researchers moving toward investigating mimetics from a theoretical standpoint and away from the traditional descriptive approach. Topics investigated also diversified to include exploration of what role mimetics play in the lexicon and how they function in the grammar of Japanese. Despite numerous publications, however, many aspects remain unaccounted for. Important remaining topics include how similar and dissimilar mimetics are to non-mimetic words (e.g. how mimetics participate in “verbal alternations” (see Section 4)) and whether the adopted linguistic theories can account for such similarities and differences without modification or whether some adjustment is required.
This chapter takes up some recent developments, focusing on the semantic and morphosyntactic characteristics of Japanese mimetics.Footnote 1 Section 2 introduces fundamental characteristics of mimetics, with a focus on their unique morphophonology and semantics. It touches on the notion of lexical category, showing how difficult it is to determine. The chapter then turns to some much-discussed issues, homing in on three topics. Section 3 discusses the controversy centering on Kita (Reference Kita1997), who treats mimetics as constituting a word group sui generis. Section 4 outlines different approaches to the semantics of mimetic verbs and the realization of their arguments in clause structure. Section 5 critiques work on the optionality of the quotative particle to with reduplicated mimetics. Section 6 contains concluding remarks.Footnote 2
2 Characteristics of Mimetics
2.1 Morphophonological Characteristics
In the literature of traditional Japanese grammar, mimetics have been grouped together with native Japanese words (cf. Kageyama and Saito Reference Kageyama, Saito, Kageyama and Kishimoto2016). In recent literature, however, it is more common to treat mimetics separately from Japanese native words, following McCawley (Reference McCawley1968), who treats Japanese vocabulary as constituting four types of lexical strata: native Japanese words, mimetics, Sino-Japanese words, and foreign words (loanwords from Western languages). This stratification is primarily based on phonological characteristics.
However, there are other reasons to suggest mimetics should be distinguished from native words. For example, mimetics have distinctive morphophonological forms. Akita (Reference Akita2009: 107–109) shows that mimetics fit into one of the fifteen “morphophonological templates,” arguing that the templates can be used to differentiate mimetics from non-mimetic words. The templates cover information on accent, root types (mono- versus bi-moraic) and morpheme types (reduplicated, suffixed). Table 10.1 shows the templates in square brackets, preceded by examples of mimetics.
Table 10.1 Morphological types of mimetics
| (I) Reduplicated | | (II) Non-reduplicated | |
|---|---|---|---|
| Accented (place of accent: initial vowel): | | | |
| a. búubuu ‘oink-oink’ | [CVV-CVV] | h. niQ(-to) ‘grinning’ | [CVQ] |
| b. púNpuN ‘reeking’ | [CVN-CVN] | i. zuN(-to) ‘zank’ | [CVN] |
| c. gúigui ‘jerking’ | [CVi-CVi] | j. poiQ(-to) ‘tossing’ | [CViQ] |
| d. méramera ‘blazing up’ | [CVCV-CVCV] | k. kaa(-to) ‘caw’ | [CVV] |
| | | l. kuraQ(-to) ‘dizzy’ | [CVCVQ] |
| Unaccented: | | m. doroN(-to) ‘vanishing’ | [CVCVN] |
| e. paNpaN ‘bursting’ | [CVN-CVN] | n. niyari(-to) ‘grinning’ | [CVCVri] |
| f. booboo ‘weedy’ | [CVV-CVV] | (III) CVCCVri | |
| g. betobeto ‘sticky’ | [CVCV-CVCV] | o. jiNwari ‘warmly moved’ | [CVCCVri] |
In the table, the representations are adjusted from Akita’s original. First, the templates are simplified by removing the pitch fall symbols (^). An acute accent is placed on some reduplicated mimetics (e.g. gúigui ‘jerking’), thus contrasting the accented and unaccented forms. Second, the templates are rearranged from Akita’s root-type-based organization to a morpheme-type-based one, grouping mimetics into three major morphological types: (I) reduplicated, (II) non-reduplicated, and (III) the CVCCVri forms (C=consonant, V=vowel).
Reduplicated mimetics have reduplicated roots, indicated by the presence of a morpheme boundary, which is represented by the hyphen in the template (e.g. CVV-CVV). While many reduplicated mimetics are accented (on the initial vowel) as in (a–d), some are unaccented as in (e–g). The same sequence may be accented or unaccented, in which case there is usually a contrast in meaning (e.g. gátagata ‘rattling sound’ versus gatagata ‘rough (surface)’).
Non-reduplicated mimetics, alternatively called “suffixed mimetics,” consist of a root with a suffix: N (=mora nasal), Q (=mora obstruent), V (=Vowel), or ri.Footnote 3 As indicated in the parentheses following the examples in the table, mimetics in this group must be marked by the quotative particle to when they appear in a sentence (cf. (3)).
The CVCCVri form, sometimes labeled “emphatic” or “ri-suffixed,” is similar to the reduplicated form in that it consists of four moras, but the internal structure is distinct: in the CVCCVri form, the accent falls on the last vowel before ri, and the consonant following the initial vowel is moraic (N or Q) as in jiNwári ‘warmly moved.’
One of the most interesting characteristics of mimetics is that many denote an event or state (Kita Reference Kita1997), a function normally carried out by verbs. Like verbs, mimetics express aspect (Hamano Reference Hamano1998), but unlike verbs, mimetics indicate aspect by their overall morphological form rather than by means of a particular aspectual morpheme.
While there is no coherent form–aspect correspondence in the CVCCVri form, there is a regular correspondence between the form and the aspectual meaning in both the reduplicated and non-reduplicated forms. Reduplicated mimetics, accented or unaccented, express unboundedness, which is characterized by various terms depending on the situation, such as iterativity (pókapoka ‘hitting repeatedly’), continuity or durativity (kírakira ‘glittering continuously’), and atelicity (zarazara ‘being coarse’). By contrast, non-reduplicated mimetics express boundedness, which may be characterized by terms such as brevity (kaa ‘a short cry of a crow’), completion of one cycle (pokaQ ‘hitting once’), punctuality (baN ‘a bang’), and telicity (koroQ ‘(someone) dies’). Moreover, the repetition of the form with a short phonological break iconically depicts the number of occurrences of the event. For example, toN expresses the sound of ‘a knock,’ but toN toN expresses ‘two knocks,’ toN toN toN ‘three knocks,’ and so forth. Furthermore, if the form contains a prolonged vowel, it represents a prolonged duration. For instance, while gyuQ expresses a brief contracting motion of a hand, gyuuuuuuQ expresses that this contraction is maintained for a longer period. Most CVCCVri forms are aspectually non-specific. Although some may express boundedness if they have non-reduplicated counterparts, as in paQkuri ‘split open’ (cf. pakuQ ‘pop open’), most express meanings other than aspect proper, such as degree (meQkiri ‘considerably’) and quantity (taQpuri ‘a lot’).
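The form–aspect correspondence described above lends itself to a small illustration. The sketch below is a simplification under stated assumptions: forms are given in the romanization used in this chapter without the particle to, a phonological break is represented by whitespace, and all CVCCVri forms are labeled "unspecified" even though some of them may express boundedness. The function names and the regular expression are hypothetical.

```python
import re
import unicodedata

ASPECT_BY_TYPE = {
    "reduplicated": "unbounded",
    "non-reduplicated": "bounded",
    "CVCCVri": "unspecified",  # simplification: no coherent form-aspect link
}

# Onset, vowel, moraic N or Q, consonant(s), vowel, then ri (e.g. paQkuri).
CVCCVRI = re.compile(r"^[^aeiou]*[aeiou][NQ][^aeiou]+[aeiou]ri$")


def strip_accent(form: str) -> str:
    """Remove acute accents so that pókapoka is compared as pokapoka."""
    decomposed = unicodedata.normalize("NFD", form)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def morphological_type(form: str) -> str:
    plain = strip_accent(form)
    half, odd = divmod(len(plain), 2)
    if plain and not odd and plain[:half] == plain[half:]:
        return "reduplicated"
    if CVCCVRI.match(plain):
        return "CVCCVri"
    return "non-reduplicated"


def aspect(form: str) -> str:
    return ASPECT_BY_TYPE[morphological_type(form)]


def count_events(utterance: str) -> int:
    """Count repetitions separated by a break (whitespace, by assumption)."""
    return len(utterance.split())


for form in ["pókapoka", "zarazara", "pokaQ", "kaa", "paQkuri"]:
    print(form, morphological_type(form), aspect(form))
print(count_events("toN toN toN"))
```

The classification relies only on the spelling of the form, so it cannot distinguish accidental repetition from true reduplication; it is meant to make the generalization concrete, not to serve as a morphological analyzer.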
2.2 Semantic Characteristics
According to Hinton, Nichols, and Ohala (Reference Hinton, Nichols and Ohala1994: 1), sound-symbolism refers to “the direct linkage between sound and meaning.” As convincingly shown in Hamano (Reference Hamano1998), Japanese mimetics are a quintessential example. For instance, in a CV-root-based mimetic, palatalization means “childishness” or “excessive energy,” and word-initial p expresses the meaning of “taut surface; explosive movement” (p. 99), while the final -ri of a CVCV-root-based mimetic expresses “quiet ending” (p. 174). Japanese mimetics also display what Hinton, Nichols, and Ohala call “imitative sound-symbolism” wherein the word itself mimes the meaning, as with “onomatopoeic words and phrases representing environmental sounds” (p. 3), but the entire system goes far beyond the category of onomatopoeia.
In traditional Japanese grammar, mimetics are frequently classed into semantic groups (cf. Kindaichi Reference Kindaichi and Asano1978). The most common grouping is a tripartite classification of giongo ‘phonomimes,’ gitaigo ‘phenomimes,’ and gijōgo ‘psychomimes’ (translations due to Martin Reference Martin1975: 1025). This classification is adopted in many recent linguistic studies.
(1)
a. Phonomimes (giongo)
Animal cry: kórokoro ‘croaking of a frog’
Human voice: géragera ‘guffawing’
Sound: bashaQ ‘sound of a splash,’ páchipachi ‘popping sound’
b. Phenomimes (gitaigo)
Manner: hírahira ‘fluttering,’ gúruguru ‘going round and round’
Condition: kushakusha ‘being all crumpled,’ betobeto ‘sticky’
c. Psychomimes (gijōgo)
Bodily sensation: múzumuzu ‘itchy,’ chíkuchiku ‘prickling’
Psychological state: íraira ‘irritatedly,’ gaQkari ‘feeling disappointed’
Phonomimes cover mimetics expressing an animal cry, a human voice, and sounds more generally. Japanese abounds in mimetics representing environmental sounds; for example, bashaQ ‘sound of a splash,’ kátakata ‘rattling sound,’ páchipachi ‘popping sound.’ Phenomimes cover expressions of manners (dynamic events) and conditions (states). Japanese has a variety of mimetics expressing manner of motion; for example, pyókopyoko ‘hopping,’ gúruguru ‘spinning,’ and zúruzuru ‘sliding.’ Japanese is also rich in mimetics expressing conditions that rely on tactility; for example, betobeto ‘sticky,’ tsurutsuru ‘smooth’ (Kindaichi Reference Kindaichi and Asano1978). Psychomimes cover two groups of mimetics: those expressing bodily sensations (e.g. hírihiri ‘smarting pain,’ múkamuka ‘feel nauseous’) and those expressing a psychological state (e.g. íraira ‘irritatedly’). As implied by the translations, mimetics usually have detailed meanings (e.g. yóchiyochi ‘toddling’). Because of this, they are sometimes characterized as “hyponyms” and contrasted with verbs which tend to have a more neutral meaning (aruku ‘walk’). The three types with detailed meanings in (1) constitute the semantic prototypes of mimetics.
It is essential to recognize three points about these semantic categories. First, the relationship between a mimetic and its categorical affiliation is not necessarily one-to-one, as mimetics may simultaneously express what is gathered through different senses. For instance, kórokoro can be categorized as a phonomime or a phenomime, as it can simultaneously express what is sensed from both audition (a light sound an object emits as it rolls) and vision (the manner of a small object rolling). This sharply contrasts with a non-mimetic word like mawaru ‘roll’; this vision-based descriptive word leaves the sound the object emits unexpressed, even if it has emitted a sound. Second, many mimetics extend their meanings across (sub-)categories. For instance, mimetics expressing bodily sensation are often extended to denote a psychological state, as in dókidoki, which can mean ‘heart pounding’ (sensation) or ‘feeling nervous’ (psych.), or a phonomimic sense is extended to a phenomimic sense, as in káchikachi, which can mean ‘click click’ (sound) or ‘being very hard’ (condition) (see Akita Reference Akita2013 for more on this topic). Third, a small group of “demimeticized” (Akita and Usuki Reference Akita and Usuki2016: 255) mimetics resides outside the tripartite classification in (1). Many take the CVCCVri form, expressing concepts such as quantity taQpuri ‘a lot,’ degree meQkiri ‘considerably,’ or pace yuQkuri ‘slowly.’ It is important to make note of their presence when making a generalization about mimetics in Japanese; they do not fit into the semantic prototypes in (1), even though they are mimetics in terms of their morphophonological forms.
2.3 Lexical Category
Among the forms introduced in Table 10.1, there is a major division in the pattern of how the mimetics behave in a morphosyntactic context (there are exceptions, notably CVCCVri),Footnote 4 with each group associated with a different lexical category.
The unaccented reduplicated mimetics (e.g. tsurutsuru ‘being slippery’) occur in some of the same environments as nouns [N] and adjectival nouns [AN], although semantically, they are adjectival in that they express a state.
(2)
a. Kore wa tsurutsuru (/ki/gōka) da. [N/AN]
this top mim wood/gorgeous cop.npst
‘This is slippery (/wood/gorgeous).’
b. Tsurutsuru (/gōka/ mizu) ni natta. [N/AN]
water cop.adv became
‘It became smooth (gorgeous/water)’
c. tsurutsuru (/ki) no yuka (cf. ?tsurutsuru na/*ki na yuka) [N]
cop.att floor
‘slippery (/wooden) floor’
d. gōka na yuka (cf. gōka *no yuka) [AN]
gorgeous cop.att floor
‘gorgeous floor’
e. Yoru made sono tsurutsuru ga tsuzuita. [N]
night till that mim nom continued
‘That smoothness continued till night (after brushing my teeth with the toothpaste).’
In the predicate position, the unaccented reduplicated mimetic is accompanied by the copula da as in (2a). This follows the pattern of N such as ki ‘tree’ and AN such as gōka ‘gorgeous,’ both of which require da. Similarly, as a complement of naru ‘become,’ the mimetic requires ni, following the pattern of both N and AN (see (2b)). In contrast, the mimetic in the prenominal position follows the pattern of nouns, requiring no to modify the head noun (see (2c)). The AN’s pattern, which usually requires na (see (2d)), seems less acceptable, though an Internet search indicates the form with na is also used, albeit much less frequently (see Uehara Reference Uehara1998 for the similar characteristic displayed by non-mimetic ANs). Further, the mimetic can be preceded by the demonstrative sono, following the pattern of nouns (e.g. sono hon ‘that book’) (see (2e)), but many ANs cannot be preceded by a demonstrative (*sono gōka ‘that gorgeous’). This distribution suggests that, overall, unaccented reduplicated mimetics more closely follow the morphosyntactic patterns of nouns.
The remaining large majority (accented reduplicated, non-reduplicated and CVCCVri forms) are usually classed as adverbs, as they syntactically modify the verb or adjective that co-occurs with them in the sentence. As adverbs, mimetics appear marked by the quotative particle to, which is obligatory for non-reduplicated forms (3a) but syntactically optional for the accented reduplicated (3b) and the CVCCVri forms (3c), as indicated by the parentheses around to (Tamori and Schourup Reference Tamori and Schourup1999: 65–68).
(3)
a. Kodomo ga nikoQ to waratta. [Non-reduplicated] [Adv.]
child nom mim quot laughed
‘The child laughed with a smile.’
b. Kodomo ga níkoniko (to) waratta. [Reduplicated, accented] [Adv.]
mim quot
‘The child laughed smilingly.’
c. Ocha o yuQkuri (to) nonda. [CVCCVri] [Adv.]
tea acc mim quot drank
‘I drank tea slowly.’
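The marking pattern just illustrated can be summarized in a short sketch. The function names (`to_marking`, `adverbial_phrase`) and the form-class labels are hypothetical, introduced only for illustration. The sketch assumes that unaccented reduplicated forms fall outside the adverbial pattern because, as described above, they pattern with nouns, and it encodes only the obligatory/optional distinction stated in the text.

```python
from typing import Optional


def to_marking(form_class: str, accented: bool = True) -> Optional[str]:
    """Status of the quotative particle to for an adverbial mimetic.

    Returns "obligatory" or "optional", or None for unaccented reduplicated
    forms, which this sketch treats as noun-like rather than adverbial.
    """
    if form_class == "non-reduplicated":
        return "obligatory"
    if form_class == "CVCCVri":
        return "optional"
    if form_class == "reduplicated":
        return "optional" if accented else None
    raise ValueError(f"unknown form class: {form_class!r}")


def adverbial_phrase(mimetic: str, form_class: str, accented: bool = True) -> str:
    """Render the mimetic with to, parenthesized when it is optional."""
    status = to_marking(form_class, accented)
    if status == "obligatory":
        return f"{mimetic} to"
    if status == "optional":
        return f"{mimetic} (to)"
    raise ValueError("form does not follow the adverbial pattern in this sketch")
```

Calling `adverbial_phrase("nikoQ", "non-reduplicated")` gives the bare to-marked form, while the accented reduplicated and CVCCVri classes are rendered with the particle in parentheses, mirroring the notation used in the examples.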
Furthermore, these three forms can be used as verbs, as exemplified below; in such cases, the mimetics are followed by the light verb suru (lit. ‘do’). These forms are usually called “mimetic verbs,” with suru carrying the tense.Footnote 5
a. Non-reduplicated: mukaQ to suru ‘get angry,’ shiN to suru ‘become quiet,’ pitaQ to suru ‘fit,’ karari to suru ‘dry’
b. Reduplicated, accented: múkamuka suru ‘feel angry,’ píNpiN suru ‘be healthy,’ húsahusa suru ‘be affluent,’ pásapasa suru ‘dry’
c. CVCCVri: gaQkari suru ‘get disappointed,’ hiQsori suru ‘be quiet,’ huNwari suru ‘be fluffy,’ gaQshiri suru ‘firm’
Beyond this major division, it is not easy to assign a category to mimetics. For one thing, it is unclear whether the mimetics in (4) (without suru) should be assigned a category at all, when the entire sequence with suru is already classed as a verb (see Sells Reference Sells and Iwasaki2017 for more on this topic). For another, mimetics can appear in a sentence without any verb, as in (5a–b), or in isolation, as in (5c).
(5)
a. Shachō wa uriage-zō ni niNmari.
president top sales-increase at mim
‘The president (is) grinning (looking) at the increase in sales.’
b. … totsuzen no kōtsūjiko no shirase ni urouro.
sudden gen car.accident gen news to mim
‘(The mother got into) panic to hear the sudden news of (her son’s) car accident.’ (Tamori and Schourup Reference Tamori and Schourup1999: 88)
c. SutoN.
mim
‘(Something) fell.’
The puzzle is whether the mimetic is an adverb in (5a–b), as Tamori and Schourup (Reference Tamori and Schourup1999: 88) assume (in their view, the clause-mate verb has simply been elided), or all the mimetics in (5) are verbal, since they are, arguably, the constituting element of the predicate, even though the mimetic has no morphosyntactic indications of a verbal element (i.e. there is neither an inflectional morpheme nor a verbal element such as suru or iu).
Given the heterogeneous patterns, it does not seem appropriate to accept the existence of a lexical category called “mimetics,” despite claims to the contrary in references to sound-symbolic words in other languages (cf. Newman Reference Newman1968). The inquiry must remain open as to whether mimetics have a lexical category at all, as questioned in Tsujimura (Reference Tsujimura2001). Alternatively, can mimetics be grouped into a traditional lexical category, such as noun and verb (or even a new hybrid category)? A final unresolved point of inquiry is how forms are related in the word-formation process.
3 Semantic Uniqueness of Mimetics
3.1 Two-dimensional Semantic Representation
Kita’s work (Reference Kita1997) is an important cornerstone in the research history of the semantics of mimetics. It paved the way for a reoriented focus, moving from a morphophonological descriptive account of the word group – mimetics – to an explanatory investigation of why mimetics are “unique,” with a special focus on their semantics. His discussion illuminated various characteristics of mimetics hitherto taken for granted, trivialized, or simply neglected. These include: (i) the underlying principle of the form–meaning relationship of mimetics is iconicity; (ii) the semantics of mimetics have a direct appeal to sensory, motor, and affective aspects. Both characteristics clearly set the semantics of mimetics apart from those of prosaic words.
To capture the unique semantics of mimetics, Kita claims that it is necessary to posit what he calls a “two-dimensional semantic representation,” whereby the semantics of mimetics are distinguished from those of non-mimetic words. Namely, non-mimetic words belong to “the analytic dimension” in which “[a] thought or experience is represented as a proposition,” whereas mimetics belong to “the affecto-imagistic dimension” where “different facets of an experience are represented [which] include the affective, emotive, and perceptual activation in an experience” (p. 387). To defend the two-dimensional model, Kita argues that “the semantics of a mimetic and that of other parts of a sentence are not fully integrated with each other despite the fact that they are syntactically integrated” (p. 388).
While Kita’s contribution cannot be emphasized enough, his arguments are not problem-free. First, Kita’s proposal is a direct challenge to Jackendoff’s “The Conceptual Structure Hypothesis,” which states: “There is a single level of mental representation, conceptual structure, at which linguistic, sensory and motor information are compatible” (Reference Jackendoff1983: 17, emphasis in original). Kita simply says this “single level of mental representation” is only for non-mimetic words (Reference Kita1997: 380, 409), naturally assuming the conceptual structure is unable to cover the semantics of mimetics. He takes no notice of Jackendoff’s point that the conceptual structure is expected to “be rich enough in expressive power to deal with all things expressible by language … [and] to deal with the nature of all the other modalities of experience as well” (Reference Jackendoff1983: 17). This seems to imply that Jackendoff’s system is intended to be able to cover the semantics of mimetics. Kita neither explains why Jackendoff’s model is inadequate nor elucidates why having some unique semantic characteristics alone guarantees an independent dimension for mimetics. Second, Kita (Reference Kita1997) claims that, while the semantic representation of mimetic adverbs belongs to the affecto-imagistic dimension, that of “nominal mimetics” belongs to both the analytic and the affecto-imagistic dimension.Footnote 6 As Tsujimura (Reference Tsujimura2001: 416) aptly notes, the use of the categorial status (the notion belonging to the analytic dimension) as a key criterion for assigning mimetics to the affecto-imagistic dimension is irrational. Third, Kita makes a generalization about mimetics without encompassing all types of mimetics. As noted in Section 2.2, some mimetics express prosaic word-like meanings, deviating from the semantic prototypes. For these mimetics, the dimensional assignment leads to a clash. For instance, taQpuri ‘a lot’ must belong to the affecto-imagistic dimension since it is a mimetic, but at the same time, it must belong to the analytic dimension since it is a quantifier. Fourth and finally, when there is a certain linguistic behavioral difference between mimetic and non-mimetic words, Kita automatically attributes it to the dimensional difference, without exploring other possible causes. Examples include a phenomenon of negation, a semantic redundancy, and certain selectional restrictions, as elaborated below.
3.2 Linguistic Evidence of the Need to Separate the Dimension for Mimetics
3.2.1 Negation
Some earlier work on sound-symbolic words argues that such words are incompatible with negation. For instance, Diffloth notes: “[In Korean] one cannot negate the ideophone itself but only the appropriateness of a given ideophone to describe a certain situation” (Reference Diffloth1972: 446, note 4). Interpreting Diffloth’s comment to mean “metalinguistic negation is the only possible interpretation” for mimetics, Kita uses an example like (6) to substantiate his argument.
(6)
a. Tama ga shizukani korogatta no de wa nai.
ball nom quietly rolled nmlz cop top neg
(i) ‘It was not the case that a ball rolled quietly.’
(ii) ‘It was not a ball that rolled quietly.’
(iii) ‘It was not rolling quietly that a ball did.’
b. Tama ga gorogoro to korogatta no de wa nai.
ball nom mim quot rolled nmlz cop top neg
(i) *‘It was not the case that a ball rolled gorogoro.’
(ii) ??‘It was not a ball that rolled gorogoro.’
(iii) *‘It was not rolling gorogoro that a ball did.’
(adapted from Kita Reference Kita1997: 390)
Example (6) contains the negated no da construction, wherein the content of the clause marked by the nominalizer no is negated entirely or partially. The two sentences minimally contrast in the element expressing the manner: (6a) has a non-mimetic adverb shizukani ‘quietly,’ and (6b) contains a mimetic gorogoro ‘manner of a heavy object rolling.’ According to Kita, with the non-mimetic adverb, negation can apply to various parts of the sentence, thus yielding different interpretations, such as (i–iii) in (6a), but the same is not true of the sentence with the mimetic, as indicated by the marks representing downgraded degrees of acceptability, such as (i–iii) in (6b). Kita explains (6b) is acceptable if the mimetic is prosodically emphasized, in which case, the sentence can be given a metalinguistic negation interpretation.Footnote 7 Though he does not provide the entire sequence, it is understood that (6b) can be followed by a sentence like Korokoro to korogatta no da ‘It was korokoro (‘manner of a light object rolling’) that (it) rolled,’ which rejects the heaviness of the weight of the ball but not the occurrence of the ball’s rolling per se. The contrast between (6a) and (6b) obtains, he argues, because “logical negation … [is] an operation in the analytic dimension” (Reference Kita1997: 390); that is, (6a), with all the elements from the analytic dimension, can participate in logical negation, but (6b) cannot, as it contains a mimetic, an element which does not belong to the analytic dimension.Footnote 8
For Tamori and Schourup (Reference Tamori and Schourup1999: 156), (6a) and (6b) are not qualitatively very different, and they provide examples like (7), showing logical negation is indeed operative.
(7)
a. Gorogoro to korogatta no wa tama de wa naku, taru datta.
mim quot rolled gen top ball cop top neg barrel was
‘It was not a ball that rolled gorogoro, but was a barrel.’
b. Tama ga shita koto wa gorogoro to korogaru koto
ball nom did event top mim quot roll event
de wa naku, ikioiyoku hazumu koto de atta.
cop top neg vigorously bounce event cop existed
‘What the ball did was not rolling gorogoro, but was bouncing vigorously.’ (Tamori and Schourup Reference Tamori and Schourup1999: 157)
These examples show that the negation of part of the proposition is possible for the semantic argument tama ‘ball’ in (7a) and for the event including the manner expressed by the mimetic in (7b), even though Kita questions such interpretations, as in (6b-ii) and (6b-iii) (for a fuller range of examples in Japanese and English, see Tamori and Schourup Reference Tamori and Schourup1999: 154–160).
Tamori and Schourup (Reference Tamori and Schourup1999: 158) rightly conclude that if (6b) is awkward at all, it stems from the violation of a general pragmatic condition that the sentence should have one focus. Example (6b) is awkward as it has two focus-bearing elements: the negation and the mimetic.
3.2.2 Semantic Redundancy
The second piece of evidence which leads Kita (Reference Kita1997) to claim the need to distinguish between the two dimensions is the absence of semantic redundancy observed in an example like (8). In the description of a fast-walking event, (8a) contains a mimetic sutasuta to ‘manner of walking hurriedly,’ and (8b) has a non-mimetic adjunct isogi-ashi de ‘with hurried steps,’ with both modifying the verbal phrase haya-aruki o shita ‘did a fast-paced walking.’
(8)
a. Tarō wa sutasuta to haya-aruki o shita.
top mim quot haste-walk acc did
‘Taro walked hurriedly.’
b. #Tarō wa isogi-ashi de haya-aruki o shita.
top hurried-feet with haste-walk acc did
‘Taro walked hastily hurriedly.’ (adapted from Kita Reference Kita1997: 388)
According to Kita (Reference Kita1997: 389), even when the mimetic co-occurs with a verbal phrase with a similar meaning, as in (8a), the sentence remains felicitous without being wordy (because the mimetic comes from the affecto-imagistic dimension, distinct from the analytic dimension to which the other phrase belongs). But if the mimetic is replaced by an adjunct as in (8b), the sentence becomes infelicitous, as it causes semantic redundancy (because all elements belong to the analytic dimension).
Tsujimura (Reference Tsujimura2001) points out that the wordiness in (8b) is caused by the repetition of virtually the same meaning in isogi-ashi de and haya-aruki o, that is, “both refer to feet and describe the manner of fast walking” (p. 411). Meanwhile, the lack of wordiness in (8a) can be explained by the fact that “sutasuta to refers to fast walking but also expresses smoothness of movement” (p. 411), that is, two different things are expressed. Accordingly, the sentence is rendered felicitous. Tsujimura’s point is corroborated in the following constructed example.
(9)
#Uta ga umai hito wa goman-to gorogoro iru.
song nom skilled person top large.numbers-in mim exist
‘There are many people in large quantities who are good at singing.’
Example (9) has a mimetic gorogoro and a non-mimetic adverb goman-to, both meaning ‘in large numbers.’ Kita’s theory would predict the sentence to be felicitous, as the semantic representations of the words belong to different dimensions, but this is not the case; the sentence sounds redundant, as both the mimetic and the non-mimetic have practically the same meaning. As this example implies, the semantic coverage of the two words must be first examined to determine redundancy before considering semantic dimension.
3.2.3 Selectional Restrictions
Kita claims mimetics impose “selectional restrictions” on the theme objects, but “[i]n general, mimetics never impose restrictions on [the] agent” (Reference Kita1997: 404).Footnote 9 Accordingly, in (10) gorogoro ‘manner of a heavy object rolling’ requires that the object be heavy but the heaviness cannot refer to the weight of the agent, as the acceptability contrast in the translation below indicates.
(10)
Dareka ga tama o gorogoro to korogashita.
someone nom ball acc mim quot rolled
a. ‘Somebody rolled a heavy ball.’
b. *‘Somebody heavy rolled a ball.’ (adapted from Kita Reference Kita1997: 403)
Tsujimura (Reference Tsujimura2001: 412) cogently points out that the oddity of the (10b)-reading comes from the fact that the mimetic is simply inappropriate to describe the subject. That is, gorogoro in (10) describes a motion of an object that rolls. Even if it is used in a transitive sentence like (10), the fact remains that it describes something about the object, not about the subject. Naturally, the (10b)-reading is unavailable.
Tsujimura further shows that the unavailability of the (10b)-reading is not because the mimetic never imposes a restriction on the agent, as in some instances, the mimetic does precisely that.
(11)
Gamu o kuchakucha suru
chewing.gum acc mim do
‘Somebody is chewing gum.’ (adapted from Tsujimura Reference Tsujimura2001: 413)
In (11), Tsujimura notes, “the agent must be a human being. If the mimetic does not impose a selectional restriction on the agent, it could be a dog or a bird, or even a car, none of which would be acceptable” (Reference Tsujimura2001: 413).
In short, not all Kita’s arguments are convincing. This does not require us to deny that mimetics have unique semantics. Far from it. Kita’s work has caused other researchers to take the semantics of mimetics very seriously. The influence of mimetics on the interpretation of event participants or “argument structure” is an important issue, especially when the mimetic is used as a verb, as discussed in the next section.
4 Mimetic Verbs and Verbal Alternations
One of the major concerns of work dealing with the theories of the syntax–semantics interface is how to account for the ability of one verb to co-occur with a different set of arguments or with a set of arguments realized in distinct encoding, depending on the morphosyntactic context. This phenomenon is sometimes called “verbal alternation” and can take several forms. For instance, in valency alternation, a labile verb such as eat can be used monovalently, as in Kim ate, or bivalently, as in Kim ate the apple (cf. Kulikov and Lavidas Reference Kulikov and Lavidas2014). In locative alternation, such as Lee spread paint on the wall versus Lee spread the wall with paint, the two arguments – the entity being acted upon (paint) and the location affected by the action (wall) – appear in distinct morphosyntactic codings in the respective portrayals, keeping the actor (Lee) constant (e.g. Iwata Reference Iwata2005).
Verbal alternations have been approached from two theoretical standpoints, projectionist and constructionist (cf. Levin and Rappaport Hovav Reference Levin and Rappaport Hovav2005; Van Valin Reference Van Valin and Pustejovsky2013). According to Levin and Rappaport Hovav, “the fundamental assumption [of projectionism is] that a verb’s lexical entry registers some kind of semantically anchored argument structure, which in turn determines the morphosyntactic expression – or projection – of its arguments” (Reference Levin and Rappaport Hovav2005: 186). In contrast, constructionist theorists posit that “the lexical entry of the verb registers only its core meaning … and this core meaning combines with the event-based meanings which are represented by syntactic constructions themselves or are associated with particular syntactic positions or substructures” (Reference Levin and Rappaport Hovav2005: 190). In other words, on the one hand, projectionist theories posit a specific representation for the lexical entry of the verb, and this determines the syntactic structure of the clause (e.g. Foley and Van Valin Reference Foley and Van Valin1984); constructionist theories, on the other hand, posit an underspecified representation for the verb while postulating the construction as the supplier of the rest of the necessary information (e.g. Pustejovsky Reference Pustejovsky1995).
Tsujimura (Reference Tsujimura, Fried and Boas2005) is perhaps the first to extend these two positions to discuss the syntactico-semantic characteristics of mimetic verbs, a verbal form in which a mimetic is followed by suru (cf. (4)). She discusses the case of multiple sets of arguments attributed to the polysemy of a mimetic verb. She offers a constructionist account, addressing the difficulty of decomposing the image/iconicity-based meaning of mimetic verbs, in comparison with prosaic verbs: “The so-called ‘meaning’ of a mimetic verb should not be attributed solely to the mimetic word itself, but rather it results from more global information obtained throughout a sentence in which the mimetic verb appears” (Reference Tsujimura, Fried and Boas2005: 139). To prove her point, she discusses an instance like (12).
(12)
a. Doa no totte ga burabura-suru.
door gen knob nom mim-do
‘The door knob is loose.’
b. Tarō ga kōen o burabura-shita.
nom park loc mim-did
‘Taro strolled leisurely in the park.’
c. Tarō ga uchi de burabura-shi-te iru.
nom home at mim-do-conj exist
‘Taro is being lazy at home.’
d. Tarō ga ashi o burabura-suru.
nom leg acc mim-do
‘Taro swings his legs.’ (adapted from Tsujimura Reference Tsujimura, Fried and Boas2005: 147)
This example features the mimetic verb burabura-suru. As the translation suggests, the mimetic verb can have multiple meanings, depending on the context: (i) to sway to and fro, (ii) to stroll, or (iii) to idle the time away. Tsujimura argues that without the aid of co-occurring NPs and the morphological cues on the nouns and the verb, the verb’s meaning cannot be determined. For instance, (12a) and (12d) have the same verbal form, both expressing the sense of swaying.Footnote 10 The precise meaning is attained by referring to the co-occurring NPs: (12a) shows the intransitive frame with an inanimate NP; together they yield the stative meaning of the object being loose. By contrast, (12d) shows the transitive frame with the nominative-marked agent and the accusative-marked body part; together they yield the dynamic sense of the agent’s swinging action. If the agent NP co-occurs with an o-marked NP expressing a spacious place where people usually take a walk, as in (12b), the verb yields the sense of strolling, but with a de-marked NP expressing the sense of a home where someone usually stays and relaxes, the verb yields the idling sense as in (12c). In short, “these varying ‘meanings’ are not attributed to the mimetic verb alone, but should be deduced from the construction in which it appears” (Reference Tsujimura, Fried and Boas2005: 148).
Eschewing the constructional argument, Kageyama (Reference Kageyama and Frellesvig2007) articulates a projectionist view whereby “mimetic words are inherently outfitted with rather clear-cut conceptual meanings” (p. 30). In his proposal, Kageyama decomposes the meaning of a mimetic verb into two parts, the mimetic and suru ‘do,’ each represented by a Lexical Conceptual Structure (LCS). The LCS of suru has a skeletal representation and serves as the template into which the LCS of the mimetic is “integrated” via “Semantic Incorporation” (p. 46). Kageyama divides the LCS for suru into two groups: one with Agent or Experiencer subjects (subdivided into Type 1 through Type 4), and the other with Theme subjects (subdivided into Type 5 through Type 7). For instance, one of the senses of burabura suru (introduced in (12b), glossed as ‘stroll leisurely’) is classed as Type 3; it contains “verbs of locomotion designating continuous movement in random directions in a place” (p. 54). Example (13) illustrates the LCS for the mimetic verb.
(13)
a. Type 3 suru: [event x CONTROL […]]
b. Mimetic: x MOVE <Manner α> [Route ] (e.g. burabura: [x MOVE <Manner SWAYING> [Route the park]])
c. Mimetic verb: [event x CONTROL [event x MOVE <Manner α> [Route ]]]
(adapted from Kageyama Reference Kageyama and Frellesvig2007: 56, 59)
Type 3 suru has a skeletal meaning of an event controlled by the subject x, as represented in (13a). To this, the meaning of a mimetic, shown in (13b), is integrated. Here, the LCS indicates that x moves in a particular manner (where the variable α is replaced by the specific manner expressed by the mimetic such as ‘swaying’ in the case of burabura) in a certain place, to be indicated after “route” (e.g. the park). Example (13c), the LCS of the mimetic verb, presents the outcome of Semantic Incorporation, yielding the reading, “the subject x controls his own movement along a Route” (p. 59). In his in-depth discussions of examples like (13), then, Kageyama shows the semantic composition of mimetic verbs can be achieved “without resorting to the notion of Construction” (p. 34).
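The composition step in (13) can be mimicked with simple template filling. The following is a rough sketch rather than Kageyama's formalism: the class `MimeticLCS`, the constant `TYPE3_SURU`, and the function `incorporate` are hypothetical names, and the structures are handled as plain bracketed strings purely to show how the mimetic's LCS fills the open slot in the skeletal LCS of suru.

```python
from dataclasses import dataclass

# Skeletal LCS of Type 3 suru, with {slot} standing in for the open position
# written as [...] in (13a). Treating the LCS as a string is an assumption
# made only for this illustration.
TYPE3_SURU = "[event x CONTROL {slot}]"


@dataclass(frozen=True)
class MimeticLCS:
    """LCS of a manner-of-motion mimetic: x MOVE <Manner α> [Route ...]."""
    manner: str      # value substituted for the variable α, e.g. "SWAYING"
    route: str = ""  # optional Route filler, e.g. "the park"

    def render(self) -> str:
        return f"[event x MOVE <Manner {self.manner}> [Route {self.route}]]"


def incorporate(mimetic: MimeticLCS) -> str:
    """Embed the mimetic's LCS in the Type 3 suru template."""
    return TYPE3_SURU.format(slot=mimetic.render())


burabura = MimeticLCS(manner="SWAYING", route="the park")
print(incorporate(burabura))
# [event x CONTROL [event x MOVE <Manner SWAYING> [Route the park]]]
```

The printed structure corresponds to (13c) with α and the Route filled in. A fuller model would use nested data structures and distinguish Types 1 through 7, which this sketch does not attempt.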
Elsewhere, I offer another projectionist view, working within the framework of Role and Reference Grammar (RRG) (Toratani Reference Toratani, Nolan and Diedrichsen2013). RRG is, categorically, a projectionist theory, but it does not alienate the notion of “constructions.” Following Van Valin’s (Reference Van Valin and Pustejovsky2013) analysis of verbal alternations, which incorporates Pustejovsky’s (Reference Pustejovsky1995) constructional theory or “co-composition,” I take a consolidating position; the seemingly opposing views that center on mimetic verbs are, in fact, complementary, that is, the projectionist account corresponds to information-processing from the speaker’s perspective, and the constructionist account corresponds to information-processing from the hearer’s perspective. In other words, in the former, the speaker knows exactly which verb to use, pulling it from the lexicon to make the utterance (represented as semantics-to-syntax linking). By contrast, in the latter, the hearer must rely on the morphosyntactic or constructional clues, as well as the meanings of the lexical items, to arrive at the precise meaning of the verb and the entire sentence (represented as syntax-to-semantics linking).
As far as semantic representation is concerned, the meaning of mimetic verbs can be decomposed, in step with Kageyama (Reference Kageyama and Frellesvig2007), but the representation differs from Kageyama’s in that the mimetic verb is analyzed as having a lexical entry of its own, not as having two components: the LCS of the mimetic and the LCS of suru. For instance, the third sense of burabura-suru introduced in (12c) ‘(someone) is being lazy (at home),’ is represented as do′ (x, [loaf.around′ (x)]) (Toratani Reference Toratani, Nolan and Diedrichsen2013: 52).
Meanwhile, Tsujimura revisits her constructional view of mimetic verbs to posit two types of mimetics: those that parallel prosaic verbs with the “conventionalized and fossilized” (Reference Tsujimura and Rainer2014: 304) meanings typically listed in (mimetic) dictionaries, and those that do not. In the former case, Tsujimura agrees with Kageyama (Reference Kageyama and Frellesvig2007): a “unified analysis for mimetic and prosaic verbs” is valid (Tsujimura Reference Tsujimura and Rainer2014: 304). But in the latter, she emphasizes a difference, saying they “exhibit a high degree of extension to innovative meanings and a relative freedom of argument structure possibilities, far beyond the level to which prosaic verbs have access.” For instance, speakers may use a mimetic verb, such as gachagacha-suru on the fly, mutatis mutandis, intending to mean “some chaotic situation such as doing odds-and-ends that lack organization” (Reference Tsujimura and Rainer2014: 307), presumably basing it on the dictionary meaning of ‘a rattling/clattering noise.’
As this summary indicates, verbal alternations involving mimetic verbs require more intense investigation. The two works adopting the projectionist position (Kageyama 2007; Toratani 2013) make a good start, but both rely on a rather simple decomposition to represent the meaning of mimetics. While it may be sufficient for the purpose of syntax-semantics linking, it is likely not fine-grained enough to encapsulate the detailed and elastic aspects of the meanings of mimetic verbs which distinguish them from prosaic verbs, as Tsujimura (2014) argues. A challenge for projectionist theorists, then, is to develop a richer representational system, capable of handling the unique aspects of the meanings of mimetic verbs. A similar challenge should be issued to constructionist theorists. To date, to the best of my knowledge, no work has formally illustrated the precise mechanism whereby the underspecified meaning of a mimetic verb can receive a specific reading when placed in a specific context.
Whichever stance is taken, how the verbal alternations of mimetic verbs relate to those of non-mimetic verbs, and what the (non-)relatedness implies for semantic and morphosyntactic theories in general, remain unclear.
5 Marking Alternation on Reduplicated Mimetics
Another topic now generating considerable interest is the optionality of marking by the quotative particle to on reduplicated mimetics used adverbially. As introduced in (3), to is obligatory for non-reduplicated mimetics but is syntactically optional for reduplicated mimetics. The central question is what determines the choice of one marking over the other. The literature to date suggests that different factors are at work, independently or in combination. Some syntactic and semantic factors are introduced below.
5.1 Syntactic Condition
The relevance of the marking distinction to syntactic realization of mimetics is first discussed in Tamori (1980). He notes to-marking is usually optional for reduplicated mimetics in normal clause-internal position, such as pakupaku ‘munch-munch’ (cf. (14a)), but it becomes obligatory at the postposed position (cf. (14b)) and a preferred choice at the preposed position (cf. (14c)).Footnote 11
(14)
a. Jon ga pakupaku to/Ø pan o tabeta.
   nom mim quot bread acc ate
   ‘John ate the bread, with a munch-munch.’
b. Jon ga pan o tabeta, pakupaku to (*Ø).
   nom bread acc ate mim quot
   ‘John ate the bread, with a munch-munch.’
c. Pakupaku to (??Ø) Jon ga pan o tabeta.
   mim quot nom bread acc ate
   ‘With a munch-munch John ate the bread.’
(adapted from Tamori 1980: 165)
While Tamori (1980) is the first to relate the mimetic’s marking distinction to syntactic position, his discussion focuses on the clause-external positions, leaving the clause-internal positions untreated.
Following up on the topic, elsewhere I examine literary texts (309 tokens) to investigate how reduplicated mimetics distribute within a clause. I find that Ø-marked forms occupy the immediately preverbal position far more often (151/187 = 81%) than to-marked forms do (58/122 = 48%) (Toratani 2006).Footnote 12 Although the results reveal a tendency, a critical question remains unconsidered. Is to- or Ø-marking obligatory for some mimetics at the immediately preverbal position, and if so, what types of mimetics are they? Section 5.2 partially addresses the issue, but for more on the topic, see Akita and Usuki (2016) and Toratani (2017).
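The two proportions just cited can be rechecked with a few lines of Python. The sketch below is only an arithmetic illustration: the counts come from the paragraph above, while the dictionary layout and the function name `preverbal_share` are hypothetical conveniences, not part of the original study.

```python
# Token counts as reported in the paragraph above (Toratani 2006).
# The data layout and names are illustrative assumptions.
counts = {
    "zero-marked": {"preverbal": 151, "total": 187},
    "to-marked": {"preverbal": 58, "total": 122},
}


def preverbal_share(marking: str) -> float:
    """Share of tokens with the given marking that sit immediately before the verb."""
    c = counts[marking]
    return c["preverbal"] / c["total"]


# The two subtotals should add up to the 309 tokens examined.
total_tokens = sum(c["total"] for c in counts.values())

for marking in counts:
    print(f"{marking}: {preverbal_share(marking):.0%} immediately preverbal")
print(f"total tokens: {total_tokens}")
```

The rounded shares (81% and 48%) and the total of 309 tokens agree with the figures given in the text.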
5.2 Semantic Condition
The literature suggests that the distinction of to/Ø-marking is related to different semantic factors. For instance, mimetics expressing abstract notions tend to be Ø-marked (e.g. frequency: chokuchoku-Ø iku ‘frequently go,’ degree: meQkiri-Ø heru ‘drastically reduces’) (Tamori and Schourup 1999: 68–69). In addition, the marking can help identify which verb/adjective the mimetic is modifying (Toratani 2006: 419). For instance, (15) has a mimetic gorigori ‘sound of grating’ with two potential hosts: futoi ‘thick’ and mawasu ‘turn.’
(15)
a. Mata gorigori to futoi kubi o mawasu.Footnote 13
   again mim quot thick neck acc rotate
   ‘Again, (he) rotated his thick neck gorigori (= with audible noises).’
b. #Mata gorigori futoi kubi o mawasu.
   again mim thick neck acc rotate
   ‘Again, (he) rotated his noisily thick neck.’
If the mimetic is bare (i.e. Ø-marked) as in (15b), it favors a reading in which the element next to it is taken to be its host, yielding an infelicitous reading (as gorigori and futoi are semantically incompatible), but if it is to-marked as in (15a), it allows a reading whereby the verb in the distance is interpreted as its host, yielding the felicitous reading (as gorigori and mawasu are semantically compatible).
Beyond these, another much-discussed semantic factor is concerned with the mimetic-host predicate relation. Elsewhere, I argue that the distinction of to/Ø-marking indicates whether the mimetic co-occurs with a particular type of host predicate.
(16)
To-marking indicates a “semantic mismatch between the predicted host and the host that actually co-occurs, whereas Ø-marking suggests a match”
(Toratani 2006: 419).
This applies to the contrast observed in (17).
(17)
a. Kare wa nikoniko warat-te iru.
   he top mim laugh-conj exist
   Lit. ‘He is laughing smilingly (He is smiling).’
b. Kare wa nikoniko to akarui.
   he top mim quot cheerful
   Lit. ‘He is cheerful, smiling.’
Example (17a) has a bare (i.e. Ø-marked) mimetic. This is consistent with (16) because the mimetic–host relation is considered a match. Mimetics typically co-occur with a verb belonging to the mimetic’s superordinate category. Warau ‘laugh’ is considered to belong to the superordinate category of the mimetic as it subsumes the meaning of the mimetic nikoniko ‘smilingly,’ thereby qualifying as the predicted host of the mimetic. Example (17b) has a to-marked mimetic. This is also consistent with (16). Here, the mimetic–host relation is considered a mismatch, as the host akarui ‘cheerful’ does not belong to the superordinate category of the mimetic; naturally it is an “atypical host,” or an unpredicted host.
Although the marking distinction in (16) seems intuitively reasonable, the distinction turns out to be less straightforward. Example (16) suffers from a vague usage of the term “predicted host,” which, in turn, builds on an unexplained term “typical host.” Without clear definitions, the task of prediction concerning to/Ø marking is practically impossible.
That said, if the criteria can be set, the empirical validity of (16) can readily be tested. For the sake of argument and for present purposes, I searched literary texts and gathered 317 instances of reduplicated mimetics that occur at the immediately preverbal position; I then examined whether the marking of the mimetics conforms to the prediction made by (16), assuming “typical hosts” refer to the verbs/adjectives listed as part of the entry in a mimetic dictionary; for example, mawaru ‘turn’ is the typical host for the mimetic kurukuru ‘roll-roll.’Footnote 14 If (16) is tenable, we should expect only two types of reduplicated mimetics: Ø-marked mimetics co-occurring with their typical hosts, and to-marked mimetics co-occurring with atypical hosts. This turns out not to be the case, however. When the two variables (the marking and the host) have two values each (to/Ø, typical/atypical, respectively), we get four logical combinatory possibilities. As Figure 10.1 shows, all combinations are found in my examples (the number represents token frequency or the ratio to the total).
Figure 10.1 Distribution of to/Ø-marked mimetics
Examples for each combination are shown as follows:Footnote 15 (a) Ø-marked mimetics with typical hosts (níkoniko warau ‘laugh smilingly,’ óioi naku ‘cry boohoo,’ púkapuka uku ‘float bobbingly’); (b) Ø-marked mimetics with atypical hosts (níkoniko tanoshii ‘fun smilingly’ [instead of warau ‘laugh’], nóronoro kurasu ‘live sluggishly’ [instead of susumu ‘move forward’], hírahira oyogu ‘(goldfish) swim wavingly’ [instead of chiru ‘scatter’]); (c) to-marked mimetics with atypical hosts (e.g. níkoniko to unazuku ‘nod smilingly’ [instead of warau ‘laugh’], kúnekune to aruku ‘walk meanderingly’ [instead of ugoku ‘move’], súrusuru to shimaru ‘close smoothly’ [instead of suberu ‘slide’]); and (d) to-marked mimetics with typical hosts (e.g. níyaniya to warau ‘laugh grinningly,’ súyasuya to neru ‘sleep peacefully,’ pyónpyon to tobihaneru ‘jump frisking around’). Of these, the combinations in (a) and (c) are expected from (16), but those in (b) and (d) are not expected.Footnote 16
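The four-way cross-classification can be made explicit with a short Python sketch. It assumes a simplified reading of (16), on which Ø-marking is predicted with typical hosts and to-marking with atypical hosts; the names `predicted_by_16` and `ATTESTED` are hypothetical, and one attested example per cell is taken from the list above. No token frequencies are included, since only the attestation of each cell matters here.

```python
from itertools import product

MARKINGS = ("zero", "to")          # Ø-marked vs. to-marked
HOSTS = ("typical", "atypical")    # dictionary-based host typicality


def predicted_by_16(marking: str, host: str) -> bool:
    """Simplified reading of (16): Ø-marking signals a match (typical host),
    to-marking signals a mismatch (atypical host)."""
    return (marking == "zero") == (host == "typical")


# One attested example per combination, taken from types (a)-(d) in the text.
ATTESTED = {
    ("zero", "typical"): "níkoniko warau",         # type (a)
    ("zero", "atypical"): "níkoniko tanoshii",     # type (b)
    ("to", "atypical"): "níkoniko to unazuku",     # type (c)
    ("to", "typical"): "níyaniya to warau",        # type (d)
}

# Combinations that are attested although (16) does not predict them.
unexpected = [
    combo
    for combo in product(MARKINGS, HOSTS)
    if combo in ATTESTED and not predicted_by_16(*combo)
]

for marking, host in unexpected:
    print(f"unexpected: {marking}-marked + {host} host, e.g. {ATTESTED[(marking, host)]}")
```

Under these assumptions the unexpected cells are exactly types (b) and (d), which is the puzzle taken up next.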
The puzzle is why there are (b)- and (d)-type combinations at all.Footnote 17 In the case of the latter, some suggest the mimetic is to-marked when it is the focus-bearing element of the sentence (e.g. Toratani 2006: 419–420). In other words, the mimetic is the sole carrier of new information, with the verb carrying old information, introduced in the immediately preceding discourse (Lambrecht 1994). Even if all the instances in (d) can be assumed to be motivated by the notion of focus, the (b)-type combination, such as níkoniko tanoshii ‘fun smilingly,’ remains unexplained. At least, the dictionary-based determination of typicality of host does not work well for an account of the distribution of to/Ø-marked reduplicated mimetics. That is, their distribution is not as simple as (16) assumes. A closer examination of the contexts may shed more light on the issue.
6 Concluding Remarks
This chapter outlines some fundamental characteristics of mimetics and provides a critical survey of recent literature. While many issues remain unsettled, the past two decades have seen some much-needed research breakthroughs. Among other things, researchers have worked to situate the discussion cross-linguistically, to diversify the methods of inquiry by incorporating experiments and corpora (e.g. Shinohara and Uno 2013), and to appeal to different theoretical frameworks and notions (e.g. Iwasaki, Sells, and Akita 2017). Such endeavors inspire new questions while shedding welcome light on certain characteristics of mimetics likely to have gone unnoticed in the descriptive accounts dominating earlier studies. As most current work resorts to a small set of cognitive-functional-based theories, future work may glean different insights from other linguistic theories and approaches.
1 Introduction
Based on a semantic distinction between stable properties and dynamic events, this chapter attempts to open up a new perspective on the interaction of form and meaning observed in certain unruly phenomena in Japanese that constitute a major challenge for general theories of morphology and syntax. The term property (also called quality or attribute in the literature on lexical aspect) refers to stable features of an entity that are supposed to remain intact through the passage of time, whereas the term event (also called eventuality or situation), when paired with property, is intended as a general notion covering dynamic events/actions and temporary states that may change as time progresses. This opposition can be readily understood by comparing two English sentences: (i) The rope broke all of a sudden, with the ergative break representing an event that happened at a particular place at a particular time; and (ii) This rope won’t break easily, with the middle verb break describing an inherent quality of the rope that holds regardless of place and time.
In Western linguistics, the event-property distinction has been extensively discussed under the heading of “stage-level predicates” and “individual-level predicates” (e.g. Milsark 1974; Carlson 1977; Krifka et al. 1995), as illustrated in (1).
In (1a), the occurrence of the time adverbial today indicates that the doctor’s being available/sick/drunk is a temporary state, whereas in (1b) the infelicity of the same adverb suggests that the doctor’s property of being tall/intelligent/altruistic is not affected by the passage of time. The same kind of contrast is known to be overtly manifested by two be-verbs in some languages, such as Spanish ser for properties and estar for temporary states (Arche 2006). Aomori dialects of Japanese also embody the same opposition by means of two copula forms.
(2) Goshogawara dialect of Aomori Prefecture (Yakame 2008: 132)
Property: Tanaka-san ekichō da.
          station.manager cop
          ‘Mr. Tanaka is a station manager.’
Event: Tanaka-san ekichō dera.
       ‘Mr. Tanaka is a temporary station manager.’
Presumably due to its topic-prominent character, Japanese boasts a broader variety than European languages of syntactic and morphological constructions that pertain to the description of a property of a subject/topic, as in (3).
(3)
a. Topic-predicate construction
   Kanojo wa chōshin da/ aisō ga yoi.
   she top tall cop affability nom good
   ‘She is tall/affable.’
b. Generic subjects
   Zō wa hana ga nagai.
   elephant top nose nom long
   ‘Elephants have a long trunk.’
c. Generic tense
   Kono inu wa hito ni kamitsuka-nai.
   this dog top people dat bite-neg
   ‘This dog will never bite.’
d. Resultative aspect with -te iru or -ta
   Kono futon wa fuka-fuka shi-te iru.
   this top fluffy do-ger is
   ‘This futon is fluffy.’
   fuka-fuka shi-ta futon
   fluffy do-pst
   ‘a fluffy futon’
Since the event-property distinction is determined by the interpretation of an entire sentence, not just by the lexical meaning of the verb or adjective, we will employ the term predication instead of predicate. Event predication thus largely corresponds to stage-level predication, and property predication to individual-level predication.
This chapter proceeds as follows. By sketching the history of the relevant research in both Western and Japanese linguistics, Section 2 lays the foundation of theoretical concepts necessary for the discussion in the subsequent sections centering on peculiar phenomena that are recalcitrant to standard analyses in morphology and syntax. In particular, Section 3 presents a novel type of compounding, called agent compounding, that combines a transitive predicate with its subject in direct violation of the putatively universal prohibition on such combinations. It will be demonstrated that the deviant nature of this compounding finds a rationale in its special function as property predication. Section 4 turns attention to a unique construction in Japanese that depicts human physical attributes with the verb suru ‘do,’ as in Naomi wa aoi me o shite iru ‘Lit. Naomi is doing blue eyes,’ that is, ‘Naomi has blue eyes.’ The seemingly idiosyncratic use of suru, it will be shown, has a reasonable motivation if it is viewed as a special construction depicting an inborn trait of the subject. Overall, the major finding of the present chapter is that sentences of property predication are subject to different grammatical conditions from sentences of event predication. This generalization regarding the form–meaning mismatch calls for a reconsideration of the basic tenet of lexical semantics that the syntactic behavior of words is largely predictable from their lexical meanings. Section 5 concludes this chapter.
2 Research on Events and Properties in Japanese and Western Linguistics
Preliminary to a probe into particular constructions in Japanese, this section presents an overview of the history of the relevant research in Japanese and Western linguistics. Apart from Bolinger’s (1973) seminal paper pointing out that Spanish ser and estar respectively depict “essence” and “accident,” in the tradition of generative syntax and formal semantics, Milsark (1974) was the first to point out the pertinence of the distinction between transitory states and unalterable properties to the English existential construction illustrated in (4).
(4)
a. There were several people {shot/awake/undressed/tired}. [temporary states]
b. *There were several people {tall/foolish/intelligent/talkative}. [properties]
Sentences of perception report like Martha saw the policemen X constitute another test frame to separate states, which fit in the construction (Martha saw the policemen shot), from properties, which do not (*Martha saw the policemen tall).
Milsark’s observation was extended by Carlson (1977) to a classification of predicates into stage-level predicates (SLPs, Milsark’s “states”), individual-level predicates (ILPs, Milsark’s “properties”), and kind-level predicates (such as extinct used to refer to a kind of things). Stage in Carlson’s terminology suggests that an event, action, or state is conceived of as a cluster of different stages that develop as time moves on, whereas the term individual-level implies a property or quality of an individual entity that persists over time. The basic diagnoses of SLP and ILP are summarized in Table 11.1 (see Krifka et al. (1995) and Fernald (2000) for details).
Table 11.1 Diagnoses of SLP and ILP
| | Compatibility with punctual time adverbials | Compatibility with progressive aspect | Occurrence in complements to direct perception verbs |
|---|---|---|---|
| SLP | Yes: Firemen were not available at that time. | Yes: The service is being available only in the daytime. | Yes: I saw John drunk. |
| ILP | No: Firemen are altruistic (*at this moment). | No: He is (*being) intelligent. | No: *I saw John intelligent. |
The semantic notions of SLP and ILP espoused by Milsark and Carlson were integrated into syntactic structure by Diesing (1992), who proposed to set the subject of an ILP in the Specifier of IP (outside of VP) while relegating the subject of an SLP to the Specifier of VP. Remarkable progress was achieved by Kratzer (1995), who, by consolidating Diesing’s syntactic bipartition of SLP/ILP subjects and Davidson’s (1967) idea of an “event argument,” proposed that SLPs do but ILPs do not have an event argument (in addition to the regular arguments such as agent and theme).
In traditional Japanese grammar, on the other hand, inquiries into the semantic modes of sentences more or less comparable to the SLP/ILP distinction started much earlier than in Western linguistics and have been carried out until recently totally independent of the Milsark–Carlson–Kratzer line of research. Sakuma (1941) was the first to classify diverse construction patterns from the viewpoint of functions in communication. Among other modes of sentences, he specifically distinguished monogatari-bun (storytelling sentences), which describe the progress or development of events, from shinasadame-bun (characterizing sentences), which express the characteristics or properties of entities.
What is notable is that Sakuma’s storytelling sentences mark the subject phrase with the nominative (ga), while his characterizing sentences mark the subject phrases with the topic marker (wa) in the copula construction “X wa Y da.” With hindsight, Sakuma’s differentiation of subject marking presents an amazing resemblance to Diesing’s (1992) theory of partitioning the subject positions of SLPs and ILPs: the nominative subject is parallel to the VP-internal subject of SLPs in Diesing’s theory, and the topic phrase to the VP-external subject of ILPs. The ga/wa distinction in Sakuma’s theory merits further attention because of its similarity to Kuroda’s (1972) distinction between “thetic judgment” with ga-marked subjects and “categorical judgment” with wa-marked subjects (cf. Chapter 13, this volume), which Ladusaw (2000) identifies with the SLP/ILP bipartition.
Sakuma’s modes of sentences were taken over by Mikami (1953) and more recently by Masuoka (1987, 2004, 2008), who has developed what appears to be the most detailed classification of Sakuma’s characterizing sentences. Masuoka proposes to view Sakuma’s modes of sentences as different types of predication, thereby subsuming the variety of storytelling sentences under the rubric of event predication and the variety of characterizing sentences under the rubric of property predication, with each type of predication divided into subcategories. Particularly important among Masuoka’s subcategories of property predication is the distinction between “categorical” properties (inherent or inborn properties that serve a classificatory function) and “experience-based” properties (properties acquired through one’s experience).
The notion of acquired properties, which appears absent from discussion in Western linguistics, has the potential for universal applicability. For example, the formation of adjectival passives in English, which is characterized by Levin and Rappaport (1986) as being targeted only at an internal argument (theme) of the base verb, as in baked potatoes (from the transitive bake) and an expired passport (from the unaccusative expire), actually applies exceptionally to the external arguments (agents) of certain unergative verbs, as in a much-traveled man and a well-read scholar. Problematic as they are to Levin and Rappaport’s theory, these exceptional adjectival passives receive a natural account if they are interpreted as representing an acquired property of the agent. In a much-traveled man, the man, due to his extensive travel experiences, is characterized as being knowledgeable about foreign countries, and in a well-read scholar, the scholar is characterized as being erudite as a result of his numerous experiences of book-reading.
This brief survey shows that domestic Japanese grammarians and Western theoretical linguists have investigated similar phenomena with similar treatments, but quite independently. Although Western linguistics excels in the depth of abstract theorization in formal terms, Japanese grammarians appear more advanced in the breadth and richness of the empirical data they deal with. Of special importance is the fact that, while the SLP/ILP distinction in Western linguistics is primarily motivated by a classification of adjectives and similar lexical categories, the studies on event and property predications in Japanese linguistics have an all-embracing coverage extending to all lexical categories and to whole sentences in discourse.
As hinted at by the English adjectival passive examples, the semantic distinction of events and properties exerts substantial influence on morphological and syntactic forms. To be more specific, morphological and syntactic structures that ought to be ruled out by general principles governing sentences of event predication gain acceptability when viewed as property predications. Such a claim was already adumbrated by Masuoka (1987: 188), who pointed out that Japanese has a passive of property predication which takes a different agent marking from regular passives of event predication.
(5)
a. Kono shōsetsu wa jū-nen-mae ni Tarō {niyotte/*ni} kak-are-ta.
   this novel top ten-years-ago loc {by/*dat} write-pass-pst
   ‘This novel was written by Taro ten years ago.’
b. Kono shu no suiri-shōsetsu wa nihon no sakka ni wa
   this kind gen mystery-novel top Japan gen novelist dat top
   ichido mo kak-are-ta koto ga nai.
   once even write-pass-pst nmlz nom be.neg
   ‘This kind of mystery novel has never been written by Japanese novelists.’
The passives of creation verbs such as hon o kaku ‘write a book’ generally call for the agent marker niyotte ‘by’ rather than the simple dative ni, as shown in (5a). Contrary to this regularity, however, the passive sentence in (5b) with the same verb is fully acceptable with the agent marked by the dative ni. What distinguishes these two sentences is that, while (5a) is a mere storytelling sentence, (5b) is construed as a characterizing sentence denoting a unique property of the subject as a ‘new type of mystery story’ on the basis of the information provided by the passive predicate. The property interpretation is schematically represented in (6).
(6) Semantic representation of property reading
[Topic phrase] [semantic concept denoting a property] + da.
To be interpreted as ‘The topic is such that it is endowed with such and such a property.’
3 Agent Compounding as Property Predication
This section introduces a novel morphological process termed agent compounding, where an agent (external argument) is compounded with a transitive predicate in direct violation of the universal principle of verbal compounding or noun incorporation that bans the morphological combinations of a volitional agent and a transitive verb. This phenomenon, which was discovered only recently (Kageyama 2006b), provides perhaps the most apposite and most forceful illustration of the kind of division of labor between event and property predications.
3.1 Exclusion of Agents from Compounds of Event Predication
In the literature on lexical semantics it is generally agreed that how the arguments of a verb are realized in syntactic structure is largely predictable from the lexical semantic representation of the verb and/or the semantic roles borne by its arguments (Levin and Rappaport Hovav 2005). The core linking rule holds that the agent argument of a transitive verb is realized as subject in syntactic structure, and the theme argument as direct object. The linking pattern of the canonical transitive verb kill, for example, is schematically shown in A of Figure 11.1. Only when implemented by a special operation like passive can a distorted argument realization as in B be accepted.

Figure 11.1 Normal and distorted argument realization
Generalizations on argument linking are formally stated in terms of argument structure configurations where the agent argument is situated in an external argument position (x) and the theme argument in an internal argument position (y).
(7) Argument structure
a. transitive verbs: (x <y>)
b. unaccusative verbs: (<y>)
c. intransitive (unergative) verbs: (x < >)
Besides the argument structure of transitive verbs (7a), those of unaccusative verbs (non-volitional intransitive verbs that have only an internal argument) and unergative verbs (volitional intransitive verbs that are characterized as having only an external argument) are represented as in (7b) and (7c), respectively. For some researchers, an event argument is added on top of the thematic argument structures in (7).
Word-formation processes that are based on verbs as the head are generally sensitive to the distinction of external versus internal argument of the base verbs. As formulated by Lieber (1983: 272), for example, English deverbal (synthetic) compounding applies either to the internal argument of a verb if it has one, as in letterwriting, or to an adjunct if it does not, as in sleepwalking. This principle correctly rules out compounds that involve a transitive verb and its external argument, as shown by the ungrammaticality of *student-writing of a letter (meaning ‘Students write a letter’). The exclusion of external arguments is in fact deemed a universal constraint that holds for noun incorporation in polysynthetic and other languages (e.g. Mithun 1984: 875).
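Lieber’s generalization, as summarized above, can be sketched in Python against the argument structures in (7). This is an illustrative simplification rather than Lieber’s own formalism: the class `ArgStructure`, the role labels, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArgStructure:
    """Thematic argument structure as in (7); the event argument is ignored."""
    external: bool  # has an external argument x
    internal: bool  # has an internal argument <y>


TRANSITIVE = ArgStructure(external=True, internal=True)     # (x <y>)
UNACCUSATIVE = ArgStructure(external=False, internal=True)  # (<y>)
UNERGATIVE = ArgStructure(external=True, internal=False)    # (x < >)


def compoundable_role(verb: ArgStructure) -> str:
    """Role that may compound with the verb: its internal argument if it has one,
    otherwise an adjunct. The external argument is never selected."""
    return "internal argument" if verb.internal else "adjunct"


def is_licensed(verb: ArgStructure, role: str) -> bool:
    """True if a non-head bearing `role` may compound with `verb`."""
    return role == compoundable_role(verb)


# letterwriting: internal argument of transitive write
print(is_licensed(TRANSITIVE, "internal argument"))
# sleepwalking: adjunct of unergative walk
print(is_licensed(UNERGATIVE, "adjunct"))
# *student-writing (of a letter): external argument of transitive write
print(is_licensed(TRANSITIVE, "external argument"))
```

On this sketch the first two checks succeed and the third fails, mirroring the contrast between letterwriting, sleepwalking, and *student-writing.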
Japanese N-V compounding complies with this universal restriction faithfully. Limiting attention to the argument realization in three major types of N-V compounds that function as predicates as summarized in Table 11.2, we find that there is no unequivocal instance of compounds consisting of a transitive verb and its subject.
Table 11.2 Argument restrictions on N-predicate compounds
| | A. Trans. object | B. Trans. subject | C. Unaccusative subj. |
|---|---|---|---|
| I. native N + V compounds | tema-doru [time-take] ‘to take time’ | None | nami-datsu [wave-rise] ‘to billow’ |
| II. Sino-Japanese V + N compounds | doku-sho [read-book] ‘book-reading’ | None | shuk-ka [break.out-fire] ‘outbreak of fire’ |
| III. post-syntactic compounds | erebētā : shiyō [elevator : use] ‘to use an elevator’ | None | chōnan : tanjō [first.son : be.born] ‘one’s first son be born’ |
Examples like kaeru-oyogi [frog-swim] ‘breaststroke’ and inu-kaki [dog-paddle] ‘dog-paddle’ that apparently involve an agent as the first member of a compound do not denote an action in which the named animals actually participate but only a particular manner of conventionalized action that is typically associated with these animals. On this interpretation, since the animal nouns do not represent an agent argument, they are irrelevant to the constraint on agent compounding.
In contrast to English, where compounds headed by verbs are hardly productive, Japanese abounds in productive compounding rules that apply to verbs and other predicates as the head. Among them, what Shibatani and Kageyama (1988) call post-syntactic compounding (Row III in Table 11.2) exhibits by far the highest productivity, freely applying to any combination of a VN (Verbal Noun; nouns that have argument structure and case but do not inflect for tense by themselves) and its internal argument. Corresponding to the clausal structure in (8a), for example, a post-syntactic compound is formed as in (8b), where the compound is enclosed by square brackets, with the colon (:) inside signaling a short phonological break. The same kind of phonological break is observed with other types of complex words using Sino-Japanese elements as in prefixed words (e.g. datsu : genpatsu ‘de-nuclear power generation,’ zen : daitōryō ‘ex-president’) and compound nouns (e.g. bōeki-gaisha : shachō ‘president of a trading company’).
(8)
a. daitōryō ga Howaito-Hausu de pātī o kaisai no sai
   president nom White-House loc party acc host gen time
   ‘when the president hosted/hosts a party at the White House’
b. daitōryō ga Howaito-Hausu de [pātī : kaisai] no sai
   president nom White-House loc [party : host] gen time
In Japanese, the order of arguments and adjuncts may be altered by scrambling as in (9a), with the consequence that the subject argument shows up in front of the predicate ‘host.’
(9)
a. Howaito-Hausu de pātī o daitōryō ga kaisai no sai
   White-House loc party acc president nom host gen time
   (Same meaning as (8a), with the word order changed by scrambling)
b. *Howaito-Hausu de pātī o [daitōryō : kaisai] no sai
   White-House loc party acc [president : host] gen time
Despite the linear adjacency, compounding the agent (president) with the following predicate yields an illegitimate result in (9b). This indicates that post-syntactic compounding is constrained by the composition of argument structure rather than by surface order.
3.2 Compounding of External Arguments in Property Predication
Surprisingly, the systematic exclusion of an external argument explained in Section 3.1 is flatly contradicted by a fairly productive process of compounding an agent (external argument) with a Sino-Japanese VN. The total ungrammaticality of the post-syntactic compound in (9b) should thus be compared with the well-formedness of compounds like those in (10).
(10)
a. [daitōryō : shusai] no pātī
   [president : host] gen party
   ‘a/the party hosted by the president’
b. [ichiryū-kenchikuka : sekkei] no bijutsukan
   [first.class-architect : design] gen museum
   ‘a/the museum designed by a/the first-class architect’
In (10), transitive VNs shusai ‘host’ and sekkei ‘design’ are combined with the external arguments ‘president’ and ‘first-class architect,’ while leaving the internal arguments ‘party’ and ‘museum’ outside of the combinations. By using diagnoses of “lexical integrity” showing that words make up an integral unit whose internal structure cannot be manipulated by syntactic means, it is easily confirmed that these combinations of agent nouns and VNs truly constitute morphological compounds. For example, their internal structure cannot be interrupted by insertion of a focus particle between the two members. Likewise, deletion of only the noun members results in total ungrammaticality, as in *shusai no pātī ‘Lit. a/the party hosted’ and *sekkei no bijutsukan ‘Lit. a/the museum designed.’ Such genuine agent compounds thus exhibit a striking contrast to spurious agent-VN combinations due to particle ellipsis, such as fan : taibō (no shinsaku) ‘(a new work) awaited by fans,’ which permit deletion of the agent noun, as in taibō (no shinsaku) ‘a long-awaited (new work).’
Given that agent compounds like those in (10) are robust morphological constructs, how could the asymmetry in argument realization between them and the post-syntactic compounds in (8b) be accounted for? A first conceivable analysis would be to hypothesize that the predicates in agent compounds are passives derived by an invisible passive morpheme. This suggestion sounds plausible in view of the fact that comparable compounds in English use adjectival passives, as in architect-designed (homes). Under this analysis, the Japanese agent compounds do not violate the universal ban on external argument compounding, because the erstwhile agents (‘president’ and ‘first-class architect’ in (10)) are demoted to adjuncts. The passive analysis, however, does not seem to be supported empirically. First, postulation of an invisible passive morpheme is dubious because VNs themselves cannot be passivized due to their noun morphology. Second, the agent nouns in (10) refuse paraphrases using an adjunct agent phrase with niyotte ‘by’; instead, they are most plausibly paraphrased with a nominative marking, as shown in (11).
(11)
a. [daitōryō {ga/*niyoru} shusai] no pātī
[president nom/*by host] gen party
b. [ichiryū-kenchikuka {ga/*niyoru} sekkei] no bijutsukan
[first.class-architect nom/*by design] gen museum
While the passive analysis can be dismissed as infeasible, a second approach, suggested to me by Akira Watanabe (p.c. 2010), has some validity. Watanabe’s idea was that what we call agent compounds might be a special case of post-syntactic compounding applied to relative clause structure, where relativization of an object argument renders the transitive subject linearly adjacent to the predicate and hence compounding takes place. This analysis, it will be shown, is relevant only to one type of agent compound, that is, Class C in (12) below.
The two approaches mentioned above are based on the assumption that the relevant data involving external arguments constitute a uniform class with no internal variation. Careful scrutiny, however, reveals that the data are not monolithic but should be split into a few distinct classes. Kobayashi (2004) gathered a large number of actually occurring examples from newspaper articles and attempted to confirm their compound status by applying certain diagnostics for lexical integrity such as modifiability of the noun members by external adjectives and deletability of the VN members on identity. Kageyama (2004) closely inspected Kobayashi’s as well as other data and proposed to tease apart three classes of expressions that share the formal appearance of an external argument combined with a transitive VN. The three classes (A to C) are exemplified in (12), with their essential features cataloged in Table 11.3.
(12)
A. [Supirubāgu-kantoku : seisaku] no eiga ([Spielberg-director : produce] gen film) ‘a/the film(s) produced by Spielberg,’ [Andō-Tadao-shi : sekkei] no hoteru ([Ando-Tadao-Mr : design] gen hotel) ‘a/the hotel(s) designed by Architect Tadao Ando’
B. [Warutā : shiki] no Koronbia-kōkyōgakudan ([Walter : conduct] gen Columbia-Symphony.Orchestra) ‘Columbia Symphony Orchestra conducted by Bruno Walter,’ [josei-untenshi : sōjū] no shinkansen ([female-driver : operate] gen Shinkansen) ‘a/the Shinkansen train operated by a female driver’
C. [kishōchō : happyō] no taifū-jōhō ([meteorological.agency : release] gen typhoon-information) ‘typhoon information released by the Meteorological Agency,’ [Seki-Tsutomu-shi : hakken] no shin-shōwakusei ([Seki-Tsutomu-Mr : discover] gen new-asteroid) ‘a/the new asteroid discovered by Mr Tsutomu Seki’
Table 11.3 Four classes of agent-transitive predicate combinations
| Class | Compound status | Copula construction | Spatiotemporal adverbs |
|---|---|---|---|
| A | Yes | Yes [inborn property] | No [property predication] |
| B | Yes | No [acquired property] | No [property predication] |
| C | Marginal | No | Yes [event predication] |
| D | No | No | Yes [event predication] |
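The diagnostics in Table 11.3 can be read as a small lookup procedure. The following Python sketch is purely illustrative and is not part of the analysis itself: the names `Diagnostics`, `TABLE_11_3`, `classify`, and `predication_type` are hypothetical, and the judgments in the table are reduced to simple values for the sake of the example.

```python
from typing import NamedTuple, Optional


class Diagnostics(NamedTuple):
    """Outcome of the three tests in Table 11.3 (hypothetical encoding)."""
    compound_status: str  # "yes", "marginal", or "no"
    copula_ok: bool       # fits the copula construction
    adverbs_ok: bool      # accepts spatiotemporal adverbs


# Rows transcribed from Table 11.3.
TABLE_11_3 = {
    "A": Diagnostics("yes", True, False),
    "B": Diagnostics("yes", False, False),
    "C": Diagnostics("marginal", False, True),
    "D": Diagnostics("no", False, True),
}


def classify(d: Diagnostics) -> Optional[str]:
    """Return the class whose row matches the diagnostics, or None."""
    for label, row in TABLE_11_3.items():
        if row == d:
            return label
    return None


def predication_type(label: str) -> str:
    """Per the table, compatibility with adverbs signals event predication."""
    return "event" if TABLE_11_3[label].adverbs_ok else "property"
```

For instance, `classify(Diagnostics("yes", True, False))` yields class A, and a combination of judgments that matches no row of the table yields `None` rather than a forced classification.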
Only Classes A and B qualify as full-fledged morphological compounds consisting of an external argument and a transitive VN. A typical diagnostic for compound status is to see whether a given sequence can be interrupted by a syntactic element. Classes A and B are equipped with the hallmark of lexical integrity, that is, non-interruptibility, while Class C appears to have a marginal status as compounds. Although the examples given in (12C) are collected from Japanese web pages, I conjecture that their acceptability will diverge among individual native speakers. Their questionable status can be ascribed to their irregular production by an aberrant operation. Recall that post-syntactic compounding rules out combinations of a transitive VN and an external argument, as shown by (13b).
(13)
a. kishōchō ga [taifūjōhō : happyō] chokugo
meteorological.agency nom [typhoon.info : release] right.after
‘right after the Meteorological Agency released typhoon information’
b. *taifūjōhō o [kishōchō : happyō] chokugo
typhoon.info acc [meteorological.agency : release] right.after
However, if the accusative object is relativized, its object position becomes invisible (being occupied by a trace), as in (14).
(14)
Kishōchō ga t_i happyō no taifūjōhō
meteorological.agency nom release gen typhoon.info
‘the typhoon information released by the Meteorological Agency’
In this structure, the subject argument (Meteorological Agency) is linearly adjacent to the transitive VN ‘release’ (ignoring the trace of the relativized noun). Given this surface order, the condition on post-syntactic compounding may be relaxed (for some speakers) in such a way that it can work on linearly adjacent words to give rise to a compound-like unit. This rather irregular implementation of post-syntactic compounding, I suspect, accounts for the marginal status of Class C compounds. Their questionable status is confirmed by the fact that, unlike the full-fledged compounds of Classes A and B, Class C compounds are incapable of participating in an iterative application of compounding with the theme nouns they modify, as shown in (15).
(15)
A: [[Supirubāgu-kantoku : seisaku] bōken-eiga] (OK as a compound)
[[Spielberg-director : produce] adventure-film]
B: [[Warutā : shiki] Koronbia-kōkyōgakudan] (OK as a compound)
[[Walter : conduct] Columbia-Symphony.Orchestra]
C: *[[kishōchō : happyō] taifūjōhō] (Bad as a compound)
[[meteorological.agency : release] typhoon.info]
3.3 Inherent versus Acquired Properties in Agent Compounding
Having delineated the range of relevant data, we will now grapple with the vital question of why Class A and B compounds are acceptable despite their ostensible violation of the universal constraint on external argument compounding. The key to the problem lies in the compatibility or incompatibility with spatiotemporal adverbials that designate a particular time or place when/where the event denoted by a compound takes/took place. This point is made clearer by comparing compounds of Classes A and B with Class C compounds. Observe first that Class C compounds, for those speakers who accept them at all, are fully compatible with such adverbials, as demonstrated in (16).
(16)
a. gogo san-ji ni [kishōchō : happyō] no taifūjōhō
p.m. three-o’clock at [meteorological.agency : release] gen typhoon.info
‘the typhoon information that the Meteorological Agency released at 3 p.m.’
b. kyō no gikai de [yatō-giin : teishutsu] no shitsumonsho
today gen Diet loc [opposition-dietmen : present] gen written.inquiry
‘the written inquiry the opposition party presented at today’s diet meeting’
The acceptability of (16) contrasts dramatically with the total incompatibility of such adverbials with compounds of Class A (17a) and Class B (17b).
(17)
a. Sono eiga wa (*2001-nen ni *Hariuddo de) [Supirubāgu-kantoku : seisaku] da.
the film top (*2001-year in Hollywood loc) [Spielberg-director : produce] cop
‘That film was produced by Spielberg in Hollywood in 2001.’
b. (*1958-nen ni *Nyūyōku no sutajio de) [Warutā : shiki] no Koronbia-kōkyōgakudan
(*1958-year in New.York gen studio loc) [Walter : conduct] gen Columbia-symphony.orchestra
‘the Columbia Symphony Orchestra conducted by Walter in 1958 in a studio of New York’
The disparity between (16) and (17) is exactly parallel to the one we saw in Section 2 between stage-level predication (as in My doctor is not available today) and individual-level predication (as in *My doctor is tall today). This is tantamount to saying that Class C compounds, compatible with spatiotemporal adverbials, are event predications describing the occurrence of a particular action or event along the time dimension, whereas the compounds of Classes A and B, which are incompatible with such adverbials, are property predications depicting a more or less constant quality attributed to the theme nouns they modify.
For example, (17a), with the adverbials left out, characterizes the particular film as one of superior quality on the basis of the pragmatic knowledge that Spielberg is a celebrated film director with outstanding expertise. Since, however, the quality of the film is a permanent attribute, the time and place adverbials that refer to the process of film-making are rendered irrelevant. Likewise, (17b) evaluates the performance of the Columbia Symphony Orchestra as having been excellent when it was under the baton of Bruno Walter, a conductor of eminent reputation in the early 1900s. Although the referential NPs do not necessarily refer to leading figures like Spielberg or Walter, they are indeed required to convey cogent information that helps to assign a noteworthy quality to the theme noun that the compound modifies. Nouns that do not convey information semantically or pragmatically sufficient to uniquely characterize the quality of a theme noun, such as ‘my son’ in (18a) or ‘a certain composer’ in (18b), are not suitable as the noun members in agent compounding.
(18)
a. *[chōnan : seisaku] no ehon
[first.son : create] gen picture.book
b. *[bō-sakkyokuka : sakkyoku] no gasshō-kyoku
[certain-composer : compose] gen chorus-music
So far, it has been established that compounds of Class A and Class B are acceptable because of their semantic function as property predication. What, then, distinguishes the two classes? They can be sharply differentiated from each other by the simple test of whether a given compound can fit into the copula construction ‘X (theme noun) wa COMPOUND da.’ Class A compounds comfortably fit into the copula construction, while Class B compounds do not.
(19) Class A compounds in the topic-copula construction
a. Kono eiga wa [Supirubāgu-kantoku : seisaku] da.
this film top [Spielberg-director : produce] cop
‘This film is one that was produced by Spielberg.’
b. Kono bijutsukan wa [Andō-Tadao-shi : sekkei] da.
this museum top [Ando-Tadao-Mr : design] cop
‘This museum is one that was designed by Tadao Ando.’
(20) Class B compounds in the topic-copula construction
a. *Kono kōkyōgakudan wa [Warutā : shiki] da.
this symphony.orchestra top [Walter : conduct] cop
‘This symphony orchestra is one that was conducted by Walter.’
b. *Kono shinkansen wa [josei-untenshi : sōjū] da.
this Shinkansen.train top [female-driver : operate] cop
‘This Shinkansen train is one that is operated by a female driver.’
The discrepancy between (19) and (20) can be attributed to the archetypical function of the copula construction of the form “X wa Y da,” which serves to describe an inherent property of the theme noun realized as the topic.
As briefly noted in Section 2, Masuoka (2004) distinguishes two types of property: (i) property inherent in a category and (ii) property acquired from a past experience. The former refers to an inherent property, and the latter to an acquired or derived property. Since the copula construction prototypically denotes an inherent, classificatory property, the Class A compounds are construed as representing inherent properties of a topic noun. Strong evidence for this construal derives from the lexical meaning of VNs employed in these compounds. Specifically, the VNs qualified for Class A compounds, such as shusai ‘host,’ sekkei ‘design,’ and sakusei ‘create,’ are all accomplishment verbs that converge on the core meaning of production and creation. With the aid of the agent nouns, which mostly represent eminent figures in a particular field of activity, the VNs of creation and production assign an inborn quality to the topic phrase. The meaning of inborn properties is also activated by compounds like those in (21), where the noun members denote producing centers for artifacts, food, and other commodities.
(21)
a. Kono kagu wa [Itaria : chokuyunyū] da.
this furniture top [Italy : direct.import] cop
‘This furniture is imported directly from Italy.’
b. Kono ise-ebi wa [sanchi : chokusō] da.
this lobster top [producing.district : direct-send] cop
‘This lobster is sent directly from the producing center.’
By metonymic extension, a noun designating a famous place of production is naturally conceived of as its “creator” or agent. Notice that the VNs in (21) express ‘acquisition’ or ‘arrival’ of the goods, which are broadly equivalent to the lexical meaning of ‘production.’
The delineation of the semantic range of VNs in Class A compounds as having the meaning of production and creation strongly indicates that these compounds bear a classificatory function, assigning an inherent property to the topic phrase they are predicated of. By contrast, Class B compounds, which are unable to occur in the topic-copula construction, represent the kind of property that has been acquired through the action or event denoted by its modifying compound. Since past experiences that motivate the assignment of an acquired property are variegated, the VNs that may participate in this class of compounds exhibit diverse lexical meanings in diverse aspectual classes.
This section has shown that Masuoka’s distinction between inherent property and acquired property, which is not known in Western linguistics, is neatly reflected in the Japanese agent compounds of Class A and Class B. It remains to be seen how the distinction is formally represented and exploited in theories of event semantics. A plausible proposal is to attribute the difference of the two classes to different modules of grammar in which each of them is generated. Class A compounds denoting inborn properties are formed in the lexicon, whereas Class B compounds representing acquired properties are formed in syntax (Kageyama 2006b). Telling evidence for this distinction is found in the availability or unavailability of the honorific prefix go-, which normally cannot be embedded inside lexical words. As predicted, Class B compounds in syntax permit this prefix on the VNs inside compounds, while Class A compounds in the lexicon do not.
(22)
A: Kono kyoku wa [Takada-Saburō-sensei : (*go-)sakkyoku] desu.
this music top [Takada-Saburo-Mr : (*hon-)compose] cop
‘This music was composed by Mr. Saburo Takada.’
B: [Heika : go-hōmon] no chi
[emperor : hon-visit] gen place
‘the place(s) visited by the emperor’
To recapitulate, agent compounding in Japanese is a robust rule for creating compounds that depict a characteristic property/quality of the theme noun they are predicated of by combining a transitive VN with its external argument. Given that agent compounding is allowed only in property predication, we must ask whether the convergence of its morphological peculiarity and its property-predication function is a mere coincidence or a logical consequence of some abstract principle relating them to each other. Probably it is not an accidental idiosyncrasy because there are many other phenomena in which property predication is associated with the violation of otherwise valid constraints in syntax and morphology, as reported by Kageyama (2006a, 2009). Those phenomena that look disorderly when viewed from event predication become orderly when regarded as belonging to the realm of property predication that is distinct from the more familiar realm of event predication.
4 The Physical Attribute Construction Using the Verb Suru ‘Do’
This section tackles thorny issues surrounding the form–meaning mismatch presented by another curious construction in Japanese, schematically represented in (23). Let us call this the physical attribute construction.
(23)
[Subject top] [adjective + body part noun] acc do-ger be
Naomi wa aoi me o shi-te iru.
‘Naomi has blue eyes; Naomi is blue-eyed.’
Extremely common in colloquial as well as written language, as contrasted with the formal style associated with the agent compounding discussed in Section 3, the physical attribute construction describes a noteworthy physical/personal attribute of a human subject with the verb suru ‘do.’ From a theoretical standpoint, it has an array of intriguing and probably mutually intertwined properties ranging from morphology to syntax to semantics and pragmatics that are foreign to regular sentences predicated by the action verb suru. Although the curious behavior of the construction has long been noted by Japanese grammarians, there have been very few attempts so far to unravel the interaction of the puzzling properties in a principled way. This section will suggest a direction for an integrated analysis by associating the peculiarities of the construction with the notion of property predication.
The most fundamental problem of this construction is why suru ‘do,’ an archetypical action verb denoting a dynamic action when it takes an accusative object, is exploited to designate a static attribute of the subject/topic. In the template of (23), while the slots for subject, adjective, and body part noun are freely occupied by appropriate lexical items, the verb suru cannot be replaced by any of its synonyms like okonau ‘carry out,’ yaru ‘do,’ and dekiru ‘be able to do’ (suppletive for the potential form of suru), nor by possessive verbs like motsu ‘have’ and aru ‘exist, have.’ The tense and aspect of suru are also strictly limited to the resultative -te iru in concluding a sentence or to the resultative -ta in noun-modifying (i.e. relative) constructions, as in aoi me o shi-ta shōjo ‘a girl who is blue-eyed.’ The physical attribute construction thus counts as a “constructional idiom” in the terminology of Jackendoff (2010: 272–274). The idiomatic nature of the construction is highlighted by the selectional restriction that only human nouns are fully qualified for its subject (topic), as suggested by the degraded acceptability of sentences like ?*Zō wa nagai hana o shite iru ‘Elephants have a long trunk’ and *Sono ie wa akai yane o shite iru ‘That house has a red roof.’ The subsequent discussion will attempt to disentangle the intertwined characteristics of this constructional idiom by teasing them apart into three groups as follows:
(24) Characteristics related to suru ‘do’
A. Why is the verb suru employed in this construction?
B. Why are the verb forms restricted to shite iru or shita?
(25) Characteristics related to objects and adjuncts
C. Why are nouns like ‘wart’ and ‘mole’ excluded from the construction?
*Naomi wa ōkina hokuro o shite iru.
‘Naomi has a big mole.’
D. Why is the construction incompatible with particular spatiotemporal adverbials?
*Naomi wa kyō wa kyōshitsu de aoi me o shite iru.
Lit. ‘Naomi has blue eyes in the classroom today.’
E. Why is an adjective or other kind of modifier mandatory for the object noun?
Naomi wa {aoi me/*me} o shite iru.
‘Naomi is {blue-eyed/*eyed}.’
(26) Characteristics related to the definiteness restriction
F. Why must the object NPs be ‘indefinite’?
*Naomi wa sono aoi me o shite iru.
Lit. ‘Naomi has the blue eyes.’
G. Why do the object NPs resist syntactic movement?
*Naomi ga shite iru no wa aoi me da.
Lit. ‘What Naomi has are blue eyes.’
The characteristics cataloged in (24), (25), and (26) will be discussed in Sections 4.1, 4.2, and 4.3, respectively.
4.1 Characteristics Related to the Verb Suru ‘Do’
We begin our discussion with the question of why suru turns up in this construction. Tsunoda (1996) provides a descriptive observation on the distribution of possession verbs along what he calls the inalienability cline: shoyū-suru ‘own’ is used for objects representing nouns of alienable possession such as ‘car,’ motsu ‘have’ for kinship terms such as ‘wife’ and ‘son’ as well as nouns of alienable possession, aru ‘exist, have’ for acquired properties such as hokuro ‘mole’ as well as kinship terms, and finally suru for inborn attributes. Tsunoda, however, gives no account of why the action verb suru is employed to represent static properties. A conceivable analysis might be to regard suru in the physical attribute construction as an extension of the same verb used for wearing accessories on one’s body.
Japanese is known to have a set of special dressing verbs that are distinguished by the body part to which a clothing item is applied: kaburu for covering one’s head or face with items like a hat or a mask; haku for covering one’s feet, legs, or lower body with items like shoes, socks, stockings, pants, or skirts; and kiru for covering one’s upper body or whole body with such items as clothes, pullover, jacket, shirt, or suit. While these three verbs are dedicated to dressing actions (‘putting on’ and ‘wearing’), there are no dedicated verbs of wearing that refer to other small parts of a body such as a waist, neck, or finger. For these body parts, suru comes into play as an elsewhere verb, thus serving as a surrogate verb for shimeru ‘fasten’ (a necktie, a belt), hameru (a ring, gloves), kakeru (glasses), and maku (a scarf, a sash). The suggested analysis of identifying suru in the physical attribute construction with the same verb used as a wearing verb cannot be upheld, however, because they exhibit distinct aspectual properties, the former (as a non-volitional verb) depicting a permanent and unchanging feature of the subject and the latter (as a volitional verb) describing a temporary state. In Section 4.3 it is suggested that the suru in the physical attribute construction might be a variant of suru in light verb constructions.
Concerning the aspectual properties of verbs, Kindaichi (1950b), a pioneering work on Japanese lexical aspect, classifies Japanese verbs into four groups: (i) stative verbs such as aru ‘exist’ and dekiru ‘be able’ (incompatible with the “progressive” -te iru in the conclusive form), (ii) continuative verbs such as hataraku ‘work’ and kaku ‘write’ (expressing an ongoing action or event when followed by -te iru), (iii) instantaneous verbs such as shinu ‘die’ and tsuku ‘(light) come on’ (expressing a resultant state of an event when followed by -te iru), and (iv) Type 4 verbs such as sobieru ‘soar’ and sugureru ‘excel’ (which express an unalterable state that obtains as a result of some sort of change and always occur with -te iru in the conclusive form or with -ta in prenominal modification).
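The four-way classification just described can be laid out as a simple mapping from verb class to the reading of -te iru. The Python sketch below is only an illustration of that summary: the identifiers (`KindaichiClass`, `TE_IRU_READING`, `EXAMPLE_VERBS`, `te_iru_reading`) are hypothetical, and the verb list contains only the examples cited above.

```python
from enum import Enum
from typing import Optional


class KindaichiClass(Enum):
    STATIVE = 1        # (i)   e.g. aru, dekiru
    CONTINUATIVE = 2   # (ii)  e.g. hataraku, kaku
    INSTANTANEOUS = 3  # (iii) e.g. shinu, tsuku
    TYPE_4 = 4         # (iv)  e.g. sobieru, sugureru


# Reading of -te iru per class; None marks incompatibility with -te iru.
TE_IRU_READING = {
    KindaichiClass.STATIVE: None,
    KindaichiClass.CONTINUATIVE: "ongoing action or event",
    KindaichiClass.INSTANTANEOUS: "resultant state of an event",
    KindaichiClass.TYPE_4: "unalterable state",
}

# Only the example verbs cited in the text.
EXAMPLE_VERBS = {
    "aru": KindaichiClass.STATIVE,
    "dekiru": KindaichiClass.STATIVE,
    "hataraku": KindaichiClass.CONTINUATIVE,
    "kaku": KindaichiClass.CONTINUATIVE,
    "shinu": KindaichiClass.INSTANTANEOUS,
    "tsuku": KindaichiClass.INSTANTANEOUS,
    "sobieru": KindaichiClass.TYPE_4,
    "sugureru": KindaichiClass.TYPE_4,
}


def te_iru_reading(verb: str) -> Optional[str]:
    """Look up the -te iru reading for one of the listed example verbs."""
    return TE_IRU_READING[EXAMPLE_VERBS[verb]]
```

A fixed verb-to-class table of this kind presupposes that each verb belongs to exactly one class, which is a simplification made for the example.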
The verb suru ‘do,’ which falls under Type 2 (continuative verbs) in its typical usage as an activity verb, is actually indeterminate in terms of aspectual class and is versatile enough to acquire different aspectual properties depending on the combinations with particular objects or adverbs. Thus, suru behaves as a continuative verb when it takes as its object such activity nouns as shukudai (homework)/tenisu (tennis)/onigokko (tag)/kyōshi (teacher) o shite iru ‘He is doing homework/tennis/tag/[Lit.] a teacher.’ If the object designates items for wearing, the same verb behaves as an instantaneous verb, as in nekutai o shite iru meaning either ‘He is putting on a tie’ (progressive) or more plausibly ‘He has a tie on’ (resultative). The latter interpretation is made possible by the fact that a necktie has an intrinsic purpose of being worn (the Telic role in Pustejovsky’s (1995) theory of qualia structure). Since neckties and other kinds of accessories can be put on or taken off volitionally, shite iru can also express an ongoing action in addition to the resultative state.
Now, if body part nouns show up in the object position of suru, the -te iru construction is no longer ambiguous. The example in (23), Naomi wa aoi me o shite iru, cannot express an ongoing action like ‘Naomi is in the process of putting on blue eyes,’ but only the permanent state of Naomi being blue-eyed. When put in relative clauses, the same verb can be set in the past (i.e. resultative) form -ta, as in aoi me o shita shōjo ‘a blue-eyed girl.’ This provides good reason to assume that suru in the physical attribute construction belongs to the class of Type 4 verbs. In fact, Kindaichi (1950b) cites the physical attribute expression takai hana o shite iru ‘has a prominent nose’ as an example of Type 4.
4.2 Characteristics Related to Objects and Adjuncts
Previous studies were mostly concerned with the lexical semantic properties of the nouns that do and do not fit in with the physical attribute construction. Details aside, only certain nouns of inalienable possession qualify as the object of the construction under discussion. Thus, nouns like those in (27), denoting alienable possession, are ruled out, while those in (28), denoting inalienable possession, are acceptable.
(27) *Sensei wa {akai kuruma/tayorininaru musuko/kireina okusan} o shite iru.
Intended meaning: ‘My teacher has {a red car/a reliable son/a beautiful wife}.’
(28) Sensei wa {ōkii me/hosoi ashi/utsukushii koe/yōkina seikaku} o shite iru.
‘My teacher has {big eyes/slender legs/a beautiful voice/a cheerful character}.’
The last two examples in (28) suggest that the nouns qualified for this construction are not limited to body parts but may include physical and mental traits intrinsic to the subject, such as voice and character. Drawing on previous studies, Takuzo Sato (2003) presents an almost exhaustive list of eligible nouns and characterizes them as denoting an essential attribute of the subject, that is, an intrinsic property that cannot be externally added after the person in the subject is born. Sato’s characterization is sufficient to exclude from this construction nouns like hokuro ‘mole,’ ibo ‘wart,’ shiraga ‘white hair,’ and yakedo no ato ‘burn scar,’ which are accidentally acquired after one’s birth.
Let us now consider the marked contrast in the compatibility with spatiotemporal adverbials between temporary facial expressions as in (29), which are event predications, and permanent attributes as in (30), which are property predications.
(29)
a. Kanojo wa sono toki totsuzen fukuret-tsura o shita.
she top that time suddenly sullen-look acc did
‘She suddenly put on a sullen look at that time.’
b. Kanojo wa chichioya no mae de wa itsumo fukuret-tsura o shi-te iru.
she top father gen front loc top always sullen-look acc do-ger be
‘She always looks sullen in front of her father.’
(30)
a. *Kanojo wa sono toki totsuzen hosoi yubi o shita.
she top that time suddenly slim fingers acc did
Lit. ‘She suddenly put on slim fingers at that time.’
b. *Kanojo wa chichioya no mae de wa itsumo hosoi yubi o shi-te iru.
she top father gen front loc top always slim finger acc do-ger be
Lit. ‘She always has slim fingers in front of her father.’
Another diagnostic to distinguish properties from events is susceptibility to adversative passive formation. Regardless of whether the verb is transitive or intransitive (unergative), sentences of event predication can form adversative passives, although sentences predicated by unaccusative verbs such as aru ‘exist’ and okoru ‘occur’ are excluded because they lack an external argument. Sentences of property predication are also resistant to adversative passive. The total ungrammaticality of (31b) indicates that the physical attribute construction lacks an external argument (in other words, it is uncontrollable).
(31)
a. Watashi wa sensei ni nemusōna me o s-are-ta.
I top teacher dat sleepy eye acc do-pass-pst
‘I was disturbed by my teacher’s sleepy look.’
b. *Watashi wa sensei ni aoi me o s-are-ta.
I top teacher dat blue eye acc do-pass-pst
Lit. ‘I was disturbed by my teacher’s having blue eyes.’
It is thus shown that the physical attribute construction presents property predication, where an inherent bodily feature of the subject is described by the verb suru in the resultative forms -te iru or -ta. The status of the construction as a characterizing sentence naturally accounts for why the verb suru ‘do’ exhibits idiosyncratic behavior that deviates from the archetypical usage of the same verb as a volitional and self-controllable activity verb in event predication sentences.
The outstanding problem with this construction is why the object noun should be accompanied by an adjective or other modifier in syntactic structure.
(32)
a. *Kanojo wa {me/yubi/kami} o shi-te iru.
she top {eye/finger/hair} acc do-ger is
Lit. ‘She is {eyed/fingered/haired}.’
b. Kanojo wa {sunda me/hosoi yubi/nagai kami} o shi-te iru.
she top {clear eye/slim finger/long hair} acc do-ger is
‘She has {clear eyes/slim fingers/long hair}.’
The reason for the ill-formedness of sentences like (32a) is not pragmatic because addition of meaningful information by way of morphological compounds as in (33) does not improve the grammaticality (see Tsujioka (2002: 140) for related discussion).
(33)
*Kanojo wa {chō-hatsu/chijire-ge/tare-me} o shi-te iru.
she top {long-hair/curly-hair/droopy-eye} acc do-ger is
‘She has long hair/curly hair/droopy eyes.’
My proposal is that the contrast between (32a) and (32b) is syntactic in nature. The syntactic composition of ‘adjective+noun’ in the object phrase is a logical necessity to provide an appropriate semantic representation for property predication. Specifically, the body part noun used in the physical attribute construction does not refer to the body part as a physical entity but to the noun’s characteristic state expressed by the adjectival modifier. For example, aoi me ‘blue eyes’ does not refer to the entity itself (eyes that are blue) but instead represents a state of the eyes being blue, namely, me ga aoi ‘the eyes are blue.’ In other words, aoi me o shite iru is paraphrasable as me ga aoi. Actually, such a reversed interpretation of a prenominal modifier as a predicative element is not rare in Japanese. Some “exocentric” compounds of the form “A+N” are also interpretable in the same vein as a predication of the form “N is A” (Kageyama 2010). For example, futop-para [big-stomach] ‘big-hearted, generous’ does not refer to a big stomach itself but to the quality of being generous paraphrased as Hara ga futoi ‘The heart [stomach] is big.’
To sum up, the object noun in the physical attribute construction calls for adjectival modification in syntactic structure. This syntactic structure is motivated by the need to obtain productively, in semantic representation, a “subject+adjective” predication in reversed order. The form–meaning mismatch found in the physical attribute construction is thus generalized as a shift from prenominal modification in syntactic structure (‘blue eyes’) to predicative structure in semantic representation (‘eyes are blue’). This analysis renders the semantic contribution of the verb suru minimal.
In fact, the meaning of the aoi me o shite iru sentence can be easily paraphrased as (34a), where the verb suru is dispensed with in an overt topic-copula construction.
(34)
a. Naomi wa (*itsumo) aoi me da.
Naomi top (*always) blue eye cop
‘Naomi is (*always) blue-eyed.’
b. Naomi wa itsumo fukuret-tsura/katsura da.
Naomi top always sulky-face/wig cop
‘Naomi always looks sulky/Naomi is always wigged.’
The inappropriateness of the time adverb itsumo ‘always’ in (34a) reinforces the claim that the construction is a property predication, as opposed to the compatibility of the same adverb in (34b) with fukuret-tsura o suru/fukuret-tsura da ‘wear a sulky face’ and katsura o suru/katsura da ‘wear a wig,’ which are event predications.
4.3 Syntactic Behavior of Object NPs
The object phrases of the construction under discussion must be indefinite NPs rather than definite or referential NPs. This is indicated not only by incompatibility with definite determiners like sono ‘that’ but also by incompatibility with any kind of quantifier.
(35)
a. Naomi wa (*sono/*kanojo no) aoi me o shi-te iru. top (*those/*she gen) blue eyes acc do-ger is
b. Naomi wa (*takusan/*jup-pon/*zenbu) hosoi yubi o shi-te iru. top (*many/*ten-cl/*all) slender fingers acc do-ger is
One might be tempted to attribute the impossibility of determiners in (35a) to the fact that the object (eyes) is lexically bound by the subject phrase. However, this account does not go through for (35b), where both weak quantifiers (‘many,’ ‘ten’) and strong quantifiers (‘all’) are excluded. Notably, this restriction is shared by the topic-adjective construction in (36).
(36)
Naomi wa (*takusan/*jup-pon/*zenbu) yubi ga hosoi. top (*many/*ten-cl/*all) fingers nom slender Lit. ‘Naomi has many/ten/all slender fingers.’
On the other hand, the quantifier juppon-tomo ‘all of ten,’ as in (37a), appears acceptable in the shite iru construction in parallel with the acceptability of the same quantifier in the topic-adjective construction (37b).
(37)
a. Naomi wa juppon-tomo hosoi yubi o shi-te iru. top ten-cl-all.together slender finger acc do-ger is ‘All of Naomi’s ten fingers are slender.’
b. Naomi wa juppon-tomo yubi ga hosoi. top ten-cl-all.together finger nom slender (Same meaning as 37a)
The reason why (37) is acceptable, in contrast to (35) and (36), should be sought in the special function of -tomo ‘all/both of,’ which quantifies the whole proposition instead of the noun phrase it is directly associated with, as in “The proposition that Naomi’s fingers are slender holds true of all of her fingers.”
The object phrase in the physical attribute construction thus lacks referentiality precisely because it is semantically construed as a predicate in a topic-adjective construction. Its predicative status correctly prohibits the object NP from undergoing syntactic movement rules like relativization and topicalization, as shown in (38).
(38)
a. *Naomi ga shi-te iru hosoi yubi wa haha-yuzuri desu. nom do-ger is slim fingers top mother-inherited cop ‘The slim fingers that Naomi has are inherited from her mother.’
b. *Hosoi yubi wa Naomi ga shi-te iru. slim fingers top nom do-ger is
Given the property predication function of the construction under discussion, the question remains whether it represents inborn properties or acquired properties. Observe that there are a variety of expressions that exhibit much the same behavior with respect to their meaning and syntactic realization with the verb suru. These include expressions like gotsu-gotsu suru ‘be rugged’ and geijutsukazen to suru ‘be a prototypical artist,’ which, categorized in Kindaichi’s Type 4, appear obligatorily in the resultative aspect with -te iru when used as predicates of finite sentences. Kindaichi (1950b: 49) characterizes the basic meaning of Type 4 verbs as “the taking on of a state” rather than “being in a state.” In other words, Type 4 verbs imply that the current state has been achieved gradually over time. Support for this characterization derives from their compatibility with adverbs like masumasu ‘increasingly’ when accompanied by the auxiliary -te kuru ‘come to.’ Example (39) thus denotes an incremental change.
(39)
Yamada-san wa masumasu geijutsukazen to shi-te kita. Yamada-Mr top increasingly true.artist quot do-ger came ‘Mr Yamada is increasingly acquiring the air of a true artist.’
By contrast, the physical attribute construction is wholly incompatible with such adverbs.
(40)
*Naomi wa masumasu aoi me o shi-te kita. top increasingly blue eye acc do-ger came Lit. ‘Naomi increasingly came to have blue eyes.’
The total unacceptability of (40) leads us to conclude that the physical attribute construction of the ‘blue-eyed’ type is distinct from Type 4 verbs and constitutes a pure representation of inborn properties whose process of acquisition has no relevance to the speaker.
In closing this section, I will briefly mention how all the idiosyncratic features of the construction and their intricate interaction can be accounted for in a unified manner. While Tsujioka (2002) entertains a generative-syntactic analysis, Kageyama (2004) explores an approach exploiting “semantic incorporation” (a process reported in diverse languages), whereby syntactically immobile indefinite objects are analyzed as being semantically incorporated with the head verb. Under this analysis, the indefinite and non-referential character of body part nouns in the physical attribute construction falls out as a natural consequence of the formation of abstract composite predicates consisting of accusative NPs and the “light” verb suru.
5 Conclusion and Future Research Perspectives
By highlighting certain idiosyncratic phenomena that have been largely neglected in the mainstream research on Japanese grammar, this chapter has argued that the distinction between event predication and property predication that lies at the heart of these phenomena has far-reaching implications for theories dealing with the interface of morphology, syntax, semantics, and possibly pragmatics – much more so than has previously been thought in Western linguistics. Specifically, it has been argued that sentences of property predication constitute a realm of their own that is distinct from the realm of event predication sentences on the grounds that these two realms are governed by different rules and principles. The rules and principles that determine the well-formedness of sentences of event predication often fail to apply to sentences of property predication. Whereas sentences of event predication are interpreted on the basis of the predicate’s argument structure, sentences of property predication are fundamentally built on the topic-copula structure, which is prevalent in the grammar of Japanese as a topic-prominent language.
Support for the universal validity of this line of thought can be found in a wide array of phenomena from various languages that have eluded satisfactory accounts in past studies focused on event predication but are susceptible to a systematic account on the interpretation that they represent property predication. These allegedly idiosyncratic phenomena include peculiar passives in Japanese and English, middle constructions in English and Japanese, peculiar reflexive constructions in Spanish and other languages, adjunct subject constructions in English and Chinese, potential forms in some Japanese dialects, and many others (Kageyama 2006a, 2009).
Given the reality of the event-property distinction, then, a number of questions remain for future research. One of them is how to capture the distinction in formal terms. A possible approach is to consider sentences of event predication to be turned into sentences of property predication by “suppressing” the event argument. An alternative approach would be to dissociate syntactic constructions from their semantic structures and allow a non-one-to-one correspondence between them.
Another non-trivial question is whether events and properties form a strict dichotomy or constitute a continuous cline. The possibility suggested here is that the two predication types basically constitute distinct realms of grammar, without denying the possibility of shifting one predication type to the other by certain grammatical means. Masuoka (2008), on the other hand, holds that the several subcategories he sets up for event predication and for property predication are related to each other in a continuous manner. This question has to do with how events, states, and properties can be related to each other. On the assumption that verbs describe events that are temporally unstable, nouns describe objects whose character is stable, and adjectives have an intermediate status, Givón (1984) contends that these three categories are arranged on a continuous scale of temporal stability. The assumption that events and properties are continuous concepts, however, cannot easily account for the disparity in their syntactic and morphological behavior as evidenced by agent compounds, the physical attribute construction, and many other phenomena. The papers collected in Kageyama (2012) will serve as a good point of departure for future research.