
Chapter 2 - The Chinese Language and Writing System

Published online by Cambridge University Press: 04 January 2024

Erik D. Reichle, Macquarie University, Sydney
Lili Yu, Macquarie University, Sydney

Summary

This chapter provides an overview of the Chinese language and its many dialects and how they differ from other Asiatic languages (e.g., Japanese). The chapter then reviews the origins and evolution of the Chinese writing system. The chapter closes with an overview of the modern Chinese writing system and its conventions, including points of difference with other writing systems (e.g., alphabetic scripts like English) and examples of how words can be decomposed into characters, radicals, and strokes.

Chapter in The Psychology of Reading: Insights from Chinese, pp. 21–45. Publisher: Cambridge University Press. Print publication year: 2024.

This chapter is intended to provide a high-level description of the Chinese language and writing system. The description is not comprehensive; it is only sufficient to show how the similarities and differences between Chinese and the other languages and writing systems that have been used to study reading, most notably English, have been (and might continue to be) leveraged to provide theoretically interesting points of contrast. During the past few decades, these points of contrast have fostered a growing appreciation that the science of reading might be advanced by studying the reading of languages that have markedly different writing systems, like Chinese. Our description of the Chinese language and writing system will therefore focus mainly on the writing system and the aspects of it that make it so unique and worthy of study. For a comprehensive treatment of the spoken Chinese language and its history, please consult Norman’s (1988) definitive volume on the topic, which provides a wealth of information about the origins of the language, its relation to other languages, and its key linguistic attributes. With this important disclaimer, let us now begin our description of the Chinese language and writing system.

2.1 The Chinese Language

As observed by W. Wang (1973: 60), “the Chinese language has the largest number of speakers in the world and the greatest time depth of its literature,” with the latter “spanning a period of 35 centuries” (51). Currently, Chinese is spoken as a first language by approximately 1.3 billion people. It is a misnomer to call it a single “language,” however, because it is a family of languages consisting of seven to thirteen major, mutually unintelligible linguistic groups or dialects that in turn consist of hundreds of regional variants (Norman, 1988). For a variety of historical, political, and geographic reasons, the sizes of these language groups vary considerably, with larger, more homogeneous enclaves across the northern plains of China and smaller, more heterogeneous pockets of speakers in the mountainous regions of southern China (see Figure 2.1). These linguistic groups form a dialect “continuum,” in which intelligibility generally declines in a graded manner with increasing geographical distance, but with the rate of decline punctuated by, for example, mountain ranges or large rivers that have historically separated two or more regions.

Figure 2.1 A map of China showing the main dialects and where they are spoken

The most commonly spoken of these languages is Mandarin, which is based on the Beijing dialect and currently has roughly 800 million speakers.Footnote 1 Mandarin was adopted as the official language of the Republic of China in the 1930s. It is also the official language of Taiwan, is one of a handful of official languages of both Singapore and the United Nations, and is spoken by millions of members of the Chinese diaspora around the globe. Other widely used dialects include Yue or Cantonese, which is spoken by about 68 million people; Wu or Shanghainese, which is spoken by about 74 million people; and Min, which is spoken by about 75 million people. Again, it is important to emphasize that most of these dialects are distinct languages: a speaker of Mandarin, for example, is as unintelligible to a speaker of Cantonese as a speaker of English would be to a speaker of German (Norman, 1988: 2).

The Chinese languages are also more distantly related to the Tibeto-Burman language group, which, as its name suggests, includes both Tibetan and Burmese. This larger group can be contrasted with the main language groups that surround China, including the Altaic group (i.e., Turkic, Mongolian, Tungusic, and possibly Korean and Japanese) to the north and, to the south, the various Tai languages spoken in Vietnam, Laos, Thailand, and Burma. The languages within each of these main groups bear a “family resemblance” to each other, with the Chinese languages sharing an overlapping constellation of features that, collectively, distinguish them from the languages of the other groups. One of these features is that Chinese morphemes are monosyllabic, with each syllable corresponding to a single morpheme, or unit of meaning. Thus, in contrast to English, where single-syllable words can contain multiple morphemes (e.g., cats = cat + s to denote plurality; ran = run + inflection to denote past tense) and multisyllabic words can correspond to a single morpheme (e.g., elephant, hammer, continent), most syllables in Chinese correspond to exactly one unit of meaning. (There are a few exceptions to this rule, but they are rare; e.g., the Chinese disyllabic word meaning “spider.”) However, as is true in English, most Chinese words consist of two or more morphemes and are therefore polysyllabic.

Each spoken syllable has a specific phonological structure that, at a minimum, includes a vowel but can also include an optional onset, consisting of a consonant or a consonant plus a medial glide, and an optional coda consonant. In English, for example, the word “steel” has an onset consisting of the consonant cluster “st” (/st/), a vowel spelled “ee” (/iː/), and the coda consonant “l” (/l/). In contrast to English, however, consonant clusters are not permitted in either the onset or the coda of Chinese syllables. One final property of the Chinese syllable is perhaps the most obvious to speakers of European languages: In contrast to languages like English and German, but also to many other Asian languages like Korean and Japanese, spoken Chinese sounds “melodic” to the ear because each syllable is spoken with a characteristic pitch contour, or tone. In Mandarin, for example, each syllable has one of four possible tones: (1) level; (2) rising; (3) falling and then rising; or (4) falling. (Some descriptions of Mandarin include a fifth, neutral tone that can be contrasted with the other four.) These tones differentiate between the meanings that might be associated with a given syllable. For example, as Figure 2.2 illustrates, the syllable pronounced /ma/ can have one of four distinct meanings depending on the tone used in its pronunciation; whereas the level tone /ma1/ means “mother,” the falling-then-rising tone /ma3/ means “horse.”Footnote 2 The tones thus function as phonemes in that they provide the minimal contrasts used to discriminate between two morphemes or words, in the same manner that the contrast between the phonemes /k/ and /b/ discriminates between the words “cat” and “bat” in English. And although Mandarin is spoken with four tones, some northern dialects use as few as three, and some southern dialects use six or more.
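Readers who meet Chinese examples written with tone numbers (as in /ma1/ and /ma3/ above) may find it useful to see how those numbers correspond to the tone marks used in printed pinyin. The Python sketch below is an illustration only: the function name `mark_tone` is hypothetical, the placement rule it encodes is the conventional one taught for pinyin ("a" or "e" takes the mark, "o" does in "ou", otherwise the last vowel does), and it assumes one syllable per call, with "v" standing in for "ü".

```python
import unicodedata

# Combining diacritics for Mandarin tones 1-4; tone 5 (neutral) is unmarked.
TONE_MARKS = {1: "\u0304", 2: "\u0301", 3: "\u030c", 4: "\u0300"}
VOWELS = "aeiouü"


def mark_tone(syllable: str) -> str:
    """Convert one numbered-pinyin syllable (e.g. 'ma3') to tone-marked form."""
    if syllable and syllable[-1].isdigit():
        base, tone = syllable[:-1], int(syllable[-1])
    else:
        base, tone = syllable, 5          # no tone number: leave unmarked
    base = base.replace("v", "ü")         # 'v' is a common typing stand-in for 'ü'
    lower = base.lower()
    vowel_positions = [i for i, ch in enumerate(lower) if ch in VOWELS]
    if tone not in TONE_MARKS or not vowel_positions:
        return base
    # Conventional placement: 'a' or 'e' takes the mark, 'o' does in 'ou',
    # otherwise the last vowel does (so 'liu2' -> 'liú', 'gui4' -> 'guì').
    if "a" in lower:
        position = lower.index("a")
    elif "e" in lower:
        position = lower.index("e")
    elif "ou" in lower:
        position = lower.index("o")
    else:
        position = vowel_positions[-1]
    marked = base[: position + 1] + TONE_MARKS[tone] + base[position + 1 :]
    # NFC composes base letter + combining mark into a single code point.
    return unicodedata.normalize("NFC", marked)


for tone, contour in [(1, "level"), (2, "rising"), (3, "falling-rising"), (4, "falling")]:
    print(f"ma{tone} -> {mark_tone(f'ma{tone}')} ({contour})")
```

For example, `mark_tone("ma3")` returns the tone-marked syllable for "horse," and a syllable numbered 5 (neutral tone) comes back without a diacritic.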

Figure 2.2 An example illustrating the four tones used to differentiate the meanings of the spoken syllable /ma/ in Mandarin

The arrows show each tone’s change in the pitch contour as might be measured using an oscilloscope.

Because individual syllables correspond to morphemes, they often correspond to single-syllable words. However, most words are polysyllabic, with most being disyllabic but a non-negligible proportion consisting of three or four syllables. As is true of English, Chinese morphemes can be divided into those that convey independent meaning, or contentives, and those that modify the contentives in some systematic manner, or functives. The former can be classified as nouns, verbs, or adjectives and are used to construct words in those classes, whereas the latter are used to convey the grammatical relationships among those words. Because the syntactic structures of phrases and sentences are conveyed using prepositions, particles, and word order (with subject-verb-object as the default order), Chinese is generally considered to be an analytic or isolating language (Norman, 1988). One implication of this is that both derivational and inflectional morphology are comparatively rare.

In contrast to English, where an inflection can require internal changes to the base word (e.g., ate = eat + inflection to denote completed action), in Chinese a small number of suffixes (e.g., /le/, /zhe/) can be used immediately after the verb to indicate aspect (e.g., perfective or continuous). For example, /chi1/ is the basic verb meaning “eat,” but adding /le/ to the verb gives the phrase /chi1 le/, which indicates that the act of eating has been completed. (Note that aspect differs from tense in that the completed action could refer to a past or a future event.) Similarly, /chi1 zhe/ indicates that the act of eating is in progress. As another example, the plural suffix /men/ can be added to an animate pronoun or noun to mark plurality and thereby denote a collective (e.g., /wo3/ “I” + /men/ = /wo3 men/, meaning “we” or “us”; /xue2 sheng1/ “student” + /men/ = /xue2 sheng1 men/, meaning “students”).

It is also worth noting that the use of inflected forms is obligatory only for plural pronouns (e.g., /wo3 men/ “we” or “us”). For instance, the plural demonstrative /zhe4 xie1/ “these” can be combined with unmarked nouns (e.g., added to /xue2 sheng1/ “student” to give /zhe4 xie1 xue2 sheng1/, meaning “these students”). Similarly, durative actions can be expressed by placing the adverb /zheng4 zai4/, meaning “in the process of,” before the verb to indicate that the action is in progress without using the inflectional suffix /zhe/. Some linguists therefore consider the use of inflectional suffixes a grammatical or syntactic process rather than a morphological one (e.g., Norman, 1988; cf. Packard, 2015). One likely implication of this is that the concept of “word” in Chinese is not clear to many of its readers, or even to some linguists!

There are also some derivational affixes that can be attached to a base word to form new words or phrases that have a different syntactic category or meaning. At least as compared to inflected words, these derived words appear to be more common in Chinese. To give a few examples, the prefix /fu4/ denotes “again” and can be used to generate words like /fu4 he2/ (“reunite”), /fu4 cha2/ (“re-examine”), and /fu4 yuan2/ (“recover”). Likewise, the prefix /wu2/ negates a base word, allowing for the generation of such words as /wu2 xu1/ (“no need”), /wu2 xian4/ (“unlimited”), and /wu2 chang2/ (“without pay”). As one final example, the suffix /hua4/ corresponding to “-ify” or “-ize” can likewise be used to generate /jian3 hua4/ meaning “simplify,” /yang3 hua4/ meaning “oxidize,” and /gong1 ye4 hua4/ meaning “industrialize.”

Despite the use of inflectional and derivational affixes in Chinese, most Chinese disyllabic and multisyllabic words are formed through compounding. Additionally, certain aspects of Chinese word formation and grammar are quite different from anything found in English.

One example is that, in Chinese, the demonstratives and numerals that modify nouns cannot directly precede those nouns. They must instead be separated from their nouns by classifiers, which function somewhat like units of measure. Thus, while a speaker of English might perfectly well say “that cat” or “three books,” the equivalent phrases in Chinese (i.e., /na4 mao1/ and /san1 shu1/, respectively) would be ungrammatical. The Chinese speaker would instead be obliged to say /na4 zhi1 mao1/, or “that piece cat,” and /san1 ben3 shu1/, or “three piece books,” where the word “piece” is a loose English translation that has little semantic content but serves as a placeholder for the two classifiers, /zhi1/ and /ben3/.
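The classifier requirement can be pictured as a lookup that must happen before a demonstrative or numeral can combine with a noun. The sketch below assumes a hypothetical two-entry lexicon built from the examples in the text (/mao1/ with /zhi1/, /shu1/ with /ben3/); a real lexicon would be far larger, and many nouns can take more than one classifier.

```python
# Hypothetical mini-lexicon: noun (numbered pinyin) -> its classifier,
# taken from the two examples in the text.
CLASSIFIER_FOR = {
    "mao1": "zhi1",  # 'cat' takes zhi1
    "shu1": "ben3",  # 'book' takes ben3
}


def noun_phrase(determiner: str, noun: str) -> str:
    """Insert the noun's classifier between a demonstrative/numeral and the noun."""
    try:
        classifier = CLASSIFIER_FOR[noun]
    except KeyError:
        raise KeyError(f"no classifier recorded for {noun!r}") from None
    return f"{determiner} {classifier} {noun}"


print(noun_phrase("na4", "mao1"))   # na4 zhi1 mao1
print(noun_phrase("san1", "shu1"))  # san1 ben3 shu1
```

Raising an error for an unlisted noun mirrors the point that the bare determiner-plus-noun sequence is not an acceptable fallback.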

A second example involves syllable reduplication, wherein a noun can be repeated to convey the added meaning of “every.” For example, repeating the word /ren2/, which by itself means “person,” produces “every person” (i.e., /ren2 ren2/). Similarly, repeating the word for “day,” /tian1/, gives “every day” (i.e., /tian1 tian1/). Applying this reduplication principle to verbs instead gives them a tentative meaning, denoting a brief or casual action. For example, repeating the word /kan4/, which by itself means “to look,” produces “to take a look” (i.e., /kan4 kan4/), while repeating the word that means “to walk,” /zou3/, produces “to take a walk” (i.e., /zou3 zou3/). And to give one final example, an adjective can be converted into an adverb via reduplication and the addition of the /de/ suffix; for example, the adjective /kuai4/, meaning “quick,” can be converted into the adverb /kuai4 kuai4 de/, meaning “quickly.”
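The three reduplication patterns just described (nouns, verbs, and adjectives) can be summarized as templates keyed by word class. In this hypothetical Python sketch the English gloss templates are crude approximations that happen to work for the examples in the text (e.g., "quick" becomes "quickly"); they are not a general rule of English or Chinese morphology.

```python
# Hypothetical templates: word class -> (form template, rough English gloss template).
REDUPLICATION_PATTERNS = {
    "noun": ("{0} {0}", "every {gloss}"),          # ren2 -> ren2 ren2
    "verb": ("{0} {0}", "to take a {gloss}"),      # kan4 -> kan4 kan4
    "adjective": ("{0} {0} de", "{gloss}ly"),      # kuai4 -> kuai4 kuai4 de
}


def reduplicate(syllable: str, gloss: str, word_class: str) -> tuple:
    """Return (reduplicated form, approximate English gloss) for one syllable."""
    form_template, gloss_template = REDUPLICATION_PATTERNS[word_class]
    return form_template.format(syllable), gloss_template.format(gloss=gloss)


print(reduplicate("tian1", "day", "noun"))         # ('tian1 tian1', 'every day')
print(reduplicate("zou3", "walk", "verb"))         # ('zou3 zou3', 'to take a walk')
print(reduplicate("kuai4", "quick", "adjective"))  # ('kuai4 kuai4 de', 'quickly')
```

Note that the noun and verb patterns produce the same surface form; only the word class of the base determines which meaning the doubled syllable takes on.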

A third example is related to the formation of new words by conjoining two morphemes that have the opposite meaning. For example, the antonyms /mai3/ (“buy”) and /mai4/ (“sell”) can be joined to form /mai3 mai4/, meaning “business.” Similarly, conjoining /chang2/ (“long”) and /duan3/ (“short”) gives /chang2 duan3/, or “length.” However, the meaning of the conjoined words is not always transparently related to their parts; for example, /fan3/, which means “turned over,” can be combined with /zheng4/, meaning “right side up,” to produce /fan3 zheng4/, which means “in any case.”

As indicated previously, these features of the Chinese language differentiate it from English as well as from the other Asiatic language groups that were mentioned earlier. For example, although the use of monomorphemic, tonal syllables is a feature shared by the Chinese languages and many of the languages spoken to the south of China (e.g., Miao, Thai, Vietnamese, or Yao), it differentiates Chinese from northern languages that use polymorphemic, atonal syllables (e.g., Japanese, Korean, Manchu, Mongolian). Conversely, Chinese languages share features with their northern linguistic neighbors (e.g., each syllable onset can only contain one consonant, adjectives must precede the nouns that they modify, etc.) that are at odds with their southern linguistic neighbors. These similarities and differences provide clues about the evolution of the Chinese language family and its linguistic neighbors. Although this evolution has undoubtedly been bidirectional (e.g., as evidenced by the fact that new words, especially those describing technology, have been introduced into Chinese after first being appropriated into Japanese), it is no exaggeration to say that the influence of Chinese culture has been profound. In fact, the influence of the Chinese language on other regional languages has been likened to that of ancient Greek and Latin on the development of European languages (Norman, 1988).

Finally, as was indicated at the beginning of this chapter, our main objective in providing this overview has been quite modest: to provide the minimal background that someone unfamiliar with the Chinese language and writing system needs to gain a better understanding and appreciation of the latter (in the remainder of this chapter). This understanding is a prerequisite for understanding the topics that will be discussed in the rest of this book, namely, what has been learned about the psychology of reading from research on the reading of one writing system, Chinese. For a more in-depth discussion of the Chinese language, we again invite the reader to consult Norman (1988) because it arguably provides the most complete and authoritative treatment of the topic (at least, in English). With that caveat, we now turn to a brief discussion of the origins of the Chinese writing system.

2.2 The History of the Chinese Writing System

The Chinese writing system has played a defining role in Chinese history and is one of the most remarkable cultural inventions in all of human history. The former claim rests on the fact that the Chinese writing system has provided a foundation for the development of Chinese culture and political unity, owing to three factors that are perhaps unique to China. First, as has already been discussed, the Chinese “language” comprises many mutually unintelligible languages. Second, the geographic region making up China is occupied by a large number of different ethnic groups. Third, the history of China is every bit as complex and rich as that of Europe (see Keay, 2009). Together, these three factors have meant that the Chinese writing system has served as a lingua franca for the peoples of China, allowing for ready commerce and the sharing of knowledge, much as Latin allowed for such exchanges throughout much of the history of Europe. As Norman states in his discussion of the Chinese writing system:

The aptness of language as a symbol of cultural and even political unity was facilitated by the use of a script that for all practical purposes was independent of any particular phonetic manifestation of the language, allowing the Chinese to look upon the Chinese language as being more uniform and unchanging than it actually was.

This continuity of the written form, in combination with the fact that it has existed for at least 3,500 years, makes Chinese unique among the languages of the world. One implication of this fact is quite remarkable: Modern readers of Chinese can often read texts that were written hundreds or even thousands of years ago! This is true even though the spoken language (like all spoken languages) has continued to evolve, so that the spoken form of ancient Chinese is as different from its modern counterpart as Latin is from modern Italian or French. Thus, although a native speaker of Chinese might be able to read and understand portions of the Analects, the collection of Confucius’s sayings that was compiled by his followers more than two millennia ago (during the Warring States period, approximately 475–221 bce), the same speaker would not be able to hold a spoken conversation with Confucius. This is not to say that the writing system has not also changed; of course it has. In fact, much more is known about the evolution of the writing system than about that of the spoken language, for the simple reason that physical evidence of its changes has been preserved in various media.

The earliest evidence of Chinese writing comes from inscriptions on turtle shells, animal bones, and bronze vessels that were used for divinatory purposes (e.g., predicting weather).Footnote 3 These inscriptions have been dated to the Shang dynasty (sixteenth to eleventh centuries bce), but both their prevalence and their level of sophistication suggest that the script was in widespread use perhaps centuries earlier (R. Chang & Chang, 1978; Norman, 1988). A few examples of these inscriptions are illustrated in Figure 2.3. As shown, many consist of pictographs, in that the referent of each inscription can be readily inferred without any knowledge of written Chinese. For example, the inscription meaning “sun” consists of a circle with a dot in the center, while the inscription for “horse” is a simple line drawing showing the animal complete with its mane and tail. It is important to note, however, that this is not true of all the inscriptions, and that more than four thousand characters have been cataloged (Robinson, 1995). These two facts indicate that the inscriptions were part of a complete writing system and were not, for example, used only for artistic purposes. They also suggest that the writing system may have been developed hundreds of years earlier than the current archeological evidence indicates, perhaps as early as the Xia dynasty, which ended in the sixteenth century bce.

Figure 2.3 A few examples of the earliest form of Chinese writing

Inscriptions that have been found on turtle shells, animal bones, and bronze vessels that were often used for oracles.

These examples were extracted from www.zdic.net.

Although this early Chinese writing system may have been sufficient for its purposes, careful consideration of the examples in Figure 2.3 suggests at least a few limitations inherent in the approach. The first is that pictographs are by their very nature applicable primarily to concrete referents that can be drawn, such as plants, animals, geographic features, and human artifacts. Although some abstract concepts can also be represented (e.g., the concept “above” is represented by two horizontal lines, with the shorter of the two above the longer), these representations are less transparent and require the community of users to adopt and learn the conventions that allow the symbols to be used. This approach also becomes increasingly difficult as concepts become more abstract or complex, making it hard to represent complex thoughts about, for example, human emotions, political concepts, or future events. A second limitation is that the inscriptions provide no direct link to their spoken forms; for example, the symbols for “sun” and “horse” shown in Figure 2.3 give no indication of how the two words were pronounced. Finally, because of their complexity, the symbols can be cumbersome to inscribe and may even require some artistic talent to render in a manner that is intelligible to others.

For these reasons, the increasing use of pictographs to keep other types of records (e.g., financial transactions) subjected them to selective pressure to become easier to use. One such change was their simplification, a movement away from true pictographs toward more abstract characters. This abstraction in turn afforded the depiction of more complex and abstract concepts. Figure 2.4 shows examples of how a few pictographs evolved into their modern equivalents, Chinese characters, during the centuries following the Shang dynasty (sixteenth to eleventh centuries bce). In tandem with this simplification, another convention intended to make reading easier was the adoption of the rebus principle, in which concepts that were difficult to depict were written with characters that sounded like their spoken counterparts. For example, at one point, the character meaning “wheat” and pronounced /lai2/ was used in place of the concept “come,” which was difficult to depict but was also pronounced /lai2/. Over time, the character came to be used exclusively for its newly adopted meaning as the spoken word for “wheat” fell out of use. In a similar manner, other characters were adopted to represent concepts whose pronunciations matched those of the concepts originally associated with the characters. This trend had two consequences: it connected the phonology of the spoken language more directly to its written form, and it expanded the number of possible referents.

Figure 2.4 A few examples showing how early pictographs changed into their modern character equivalents during the evolution of the Chinese writing system

These examples were extracted from www.zdic.net.

As Figure 2.4 shows, the Chinese writing system continued to change throughout recorded Chinese history. It is worth noting, however, that this evolution did not proceed at a constant rate or toward a planned end product; rather, the changes were sporadic and likely emerged unsystematically in various locales, with some being adopted and spreading to other locales and many (perhaps even most) either going unnoticed or falling into disuse. The evolution of the writing system was thus analogous to biological evolution in that its features were retained and proliferated as a function of their utility, or “fitness,” with the surviving changes being those that facilitated the more widespread reading and writing of Chinese.

However, as carefully documented by Norman (1988), there were also at least three significant periods in the development of the Chinese writing system when the changes were rapid, systematic, and deliberate. The first was the unification of China under the Qin dynasty (221–207 bce). This political unification brought about the standardization of units of measurement and legal statutes, and the replacement of various local scripts with a single script that further consolidated the government’s control of its empire. This script came in two forms: a seal script that, as its name implies, was used for official seals and documents, and a clerical script that was used by government officials and clerks for commerce, the maintenance of inventories, and so on. Both forms were highly standardized, but the former was more complex and stylized, whereas the latter was simpler and thus easier to use for everyday purposes.

The second watershed period in the development of the Chinese writing system occurred during the Han dynasty (206 bce – 24 ce), when the seal script was abandoned in favor of the clerical script, which was further simplified and standardized. For example, earlier efforts to preserve the pictographic links between characters and their referents were abandoned in favor of utility. The line segments composing the characters were also shortened and straightened. The circular lines in the character representing “sun,” for example, were straightened to produce its current form, a box bisected by a horizontal line. This simplified clerical script is therefore the basis of the modern Chinese characters used today; although native speakers of Chinese would have considerable difficulty reading most characters written in the seal script, readers with a good understanding of written Chinese can decipher many characters written in clerical script. Finally, an abbreviated, cursive form of the characters was also introduced for informal purposes and for writing draft documents.

The third and final watershed period in the development of the Chinese writing system occurred in the mid-twentieth century. After the founding of the People’s Republic of China in 1949, its government began implementing a nationwide reform of the Chinese writing system. Until 1956, Chinese had been written in the traditional manner, using relatively elaborate characters and the convention of writing the characters from top to bottom, in columns running from right to left. The reform brought about the simplification of a large proportion of existing characters, a few examples of which are shown in Figure 2.5. The Western convention of writing in rows from left to right was also adopted. In addition, for pedagogical purposes, an alphabetic writing system called pinyin was adopted for teaching elementary school children in mainland China about the phonology of the Chinese language.Footnote 4 Pinyin is still used today as a “scaffold” to support literacy education, one that is eventually abandoned as children learn to read characters. It is also worth noting that, although these conventions were adopted in mainland China, the more traditional writing system has largely been retained in Hong Kong, Macau, Taiwan, and other regions with large Chinese diaspora communities; in these locations, the more complex traditional characters are often used, and it is not uncommon to see books, newspapers, and signs written in the traditional manner, with columns of characters running from right to left. Finally, modern Chinese is now also written using punctuation to denote clauses and the ends of sentences.Footnote 5

Figure 2.5 A few examples illustrating the similarities and differences between traditional and simplified Chinese characters

In closing this section, it is worth noting how the Chinese writing system has influenced the cultures and writing systems of the many countries surrounding China. Given China’s position as the preeminent political and cultural entity in eastern Asia, it is perhaps not surprising that the development of the Chinese writing system and literature had a profound effect on the cultural development of the surrounding regions. The prime example is the fact that the Chinese writing system was borrowed by the Thai, Korean, and Japanese and used for official government purposes for many centuries. Although the Thai developed their alphabet (based on an Old Khmer script) in the thirteenth century ce and the Koreans likewise developed their own alphabetic system (called hangul) in the fifteenth century, both countries continued to employ some Chinese characters until the mid-twentieth century. Similarly, although the Japanese also developed their own writing system (a system called kana that represents spoken syllables), this indigenous system is still used in combination with Chinese characters (which are called kanji) to the present day. The fact that speakers of these vastly different languages could adapt the Chinese writing system to their own languages underscores the point made earlier about Chinese characters not being directly linked to the pronunciations of the syllables that they represent, a point that will be raised again in the next section. One immediate implication of this, however, is that native speakers of what are effectively different languages, such as Mandarin and Cantonese and, to a more limited extent, Japanese, can read the same text even though they cannot then talk about its contents. This remarkable situation would be analogous to speakers of English and German being able to read the same newspaper but being unable to discuss its contents.

2.3 The Modern Chinese Writing System

Figure 2.6 shows a sentence written in modern (simplified) Chinese, following the current convention in the People’s Republic of China. To the Western eye, perhaps the most salient feature is that the sentence is written as a single continuous line of uniformly sized, box-shaped characters, without the blank spaces between words that, by convention, are used in most alphabetic writing systems. (Not all alphabetic writing systems use blank spaces to demarcate word boundaries, however; for example, Thai is written using an alphabet but, like Chinese, does not use blank spaces between words; see Reilly et al., 2011.) The absence of between-word spaces is of significant theoretical importance for at least two reasons.

Figure 2.6 An example sentence written in modern Chinese using simplified characters

The first is that Chinese words comprise a variable number of characters. For example, estimates of word type frequency derived from a large corpus of natural text (e.g., Lexicon of Common Words in Contemporary Chinese Research Team, 2008) indicate that only about 6 percent of Chinese words comprise one character, whereas the large majority, 72 percent, consist of two characters, 10 percent consist of three characters, and the remainder (about 12 percent) consist of four or more characters. (Chinese words containing four or more characters are usually slang, proverbs, or words borrowed from other languages; e.g., 庐山真面目 /lu2 shan1 zhen1 mian4 mu4/ means “the truth of something”; 阿尔卑斯山 /a1 er3 bei1 si1 shan1/ means “Alps.”) However, token frequency counts, which estimate how often instances of given words occur, indicate that about 70 percent of the words that occur in printed text contain one character, 27 percent contain two characters, 2 percent contain three characters, and only 1 percent contain four or more characters. The difference between the two counts likely reflects selective pressure to shorten the most frequently used words. Similar pressure exists in English, where the most commonly used words also tend to be very short.
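The gap between the type and token counts becomes concrete if one computes the average word length implied by each distribution. The sketch below uses the percentages reported in the text, with the type remainder taken as 12 percent, and assumes that every word in the "four or more" category has exactly four characters, so both averages are slight underestimates.

```python
# Percentages reported in the text. Key = word length in characters;
# 4 stands in for "four or more," so the resulting means are lower bounds.
TYPE_PERCENT = {1: 6, 2: 72, 3: 10, 4: 12}   # 12 = 100 - 6 - 72 - 10
TOKEN_PERCENT = {1: 70, 2: 27, 3: 2, 4: 1}


def mean_length(percent_by_length: dict) -> float:
    """Weighted mean word length (in characters) for a length distribution."""
    total = sum(percent_by_length.values())
    weighted = sum(length * pct for length, pct in percent_by_length.items())
    return weighted / total


print(f"mean length of a word type:  {mean_length(TYPE_PERCENT):.2f}")   # 2.28
print(f"mean length of a word token: {mean_length(TOKEN_PERCENT):.2f}")  # 1.34
```

Under these assumptions, a word type averages about 2.28 characters, whereas a word token in running text averages about 1.34, a numerical expression of the point that the most frequently encountered words are short.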

As might be guessed from this brief discussion of how text corpora are used to estimate frequency counts, another important fact is that the sheer number of characters is vast and has continued to grow over the course of Chinese history. For example, Norman (1988) has documented how the number of characters in circulation expanded from approximately 10,000 during the Han dynasty (206 bce – 220 ce) to more than 50,000 by the Song dynasty (960–1279 ce), although a significant proportion of those characters are now obsolete and thus only rarely used. A better indicator of the number of characters still in active use may come from estimates of the number known by the average literate Chinese person. Again, Norman provides these estimates. For example, a study conducted in the 1960s by the Institute of Psychology at the Academy of Sciences suggests that the average university-educated person knows approximately 3,500 to 4,000 characters, and that a working knowledge of about 3,000 characters is required to read a newspaper. R. Chang and Chang (1978) provide a similar estimate of 2,800 to 3,000 characters, whereas W. Wang (1973) provides a considerably larger estimate: that knowledge of 4,000 to 7,000 characters is required to read a newspaper. Finally, the government of the People’s Republic of China defines functional literacy as knowledge of the 2,000 most common or important characters. Although these estimates clearly vary, they consistently show that, perhaps paradoxically, a well-educated reader of Chinese must devote years of study to learning thousands of characters but, despite this effort, ends up knowing only a fraction of the characters currently in existence.

Returning now to our discussion of Figure 2.6, the fact that words consist of a variable number of characters, in conjunction with the absence of clear word boundaries, can result in ambiguity about how a sequence of characters should be grouped together or segmented, thus causing confusion about the boundaries and identities of individual words (Hsu & Huang, 2000; Inhoff & Wu, 2005). A somewhat analogous situation can be experienced in English if the spaces between two or more words are removed. For example, consider the letter sequence “catchair.” Does it refer to a “cat chair,” whatever that might be, or is it instead a directive to “catch air”? Such ambiguities occasionally occur in written Chinese. For example, the three characters “花生长” can be segmented into “花” and “生长,” respectively meaning “flower” and “grows,” but can also be segmented into “花生” and “长,” respectively meaning “peanut” and “grows.” Similarly, the four characters “通过去年” contain three overlapping words (i.e., “通过,” “过去,” and “去年,” respectively meaning “through,” “past,” and “last year”), which can also lead to segmentation difficulty, especially during the initial pass of reading. Indeed, studies in which native Chinese speakers are simply asked to mark the word boundaries in sentences show that people find it difficult to define the concept of a word, often disagreeing about precisely where the boundary between two words is located (e.g., Hoosain, 1992; P. Liu et al., 2013). More will be said about this in Chapters 3 and 4.
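The segmentation ambiguity in “花生长” can be reproduced mechanically by enumerating every way of splitting a character string into words listed in a lexicon. The sketch below assumes a tiny hand-built lexicon (not a real dictionary), and the function name `segmentations` is hypothetical; it illustrates the ambiguity itself, not how readers resolve it.

```python
def segmentations(text, lexicon):
    """Enumerate every way of splitting `text` into words found in `lexicon`."""
    if not text:
        return [[]]  # one way to segment the empty string: no words
    results = []
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in lexicon:
            # Keep this candidate word and segment whatever follows it.
            for rest in segmentations(text[end:], lexicon):
                results.append([word] + rest)
    return results

lexicon = {"花", "长", "花生", "生长"}  # toy lexicon, an assumption
for seg in segmentations("花生长", lexicon):
    print(" / ".join(seg))
# 花 / 生长
# 花生 / 长

print(segmentations("通过去年", {"通过", "过去", "去年"}))  # [['通过', '去年']]
```

In the second case only one segmentation covers the whole string, yet the overlapping word 过去 is still present as a substring, which is why a reader working left to right may briefly entertain it.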

A second reason why the absence of clear word boundaries is theoretically interesting is that it raises the question of how readers of Chinese “know” where to move their eyes (Footnote 6). In alphabetic languages like English and German, for instance, it is generally accepted that readers use information about the length and location of the upcoming or parafoveal word in programming eye movements from one word to the next (Reichle et al., 2012; Schad & Engbert, 2012). Thus, from a fixation on word N, readers simply direct their eyes to the center of word N+1 because this viewing location affords the most efficient processing of the word (McConkie et al., 1988; O’Regan, 1992); a fixation near the center of word N+1 allows it to be processed from the center of vision, where visual acuity is maximal (Bouma, 1973; Frey & Bosse, 2018; Normann & Guillory, 2002; Veldre et al., 2023). However, it remains unclear whether and how this strategy might be adapted to the reading of Chinese, given the lack of clear information about an upcoming word’s length and location. This uncertainty has given rise to different theoretical accounts of saccadic targeting in Chinese reading (e.g., Y. P. Liu, Yu, Fu, et al., 2019; M. Yan et al., 2010; for a review, see Y. P. Liu et al., 2023). These accounts will be discussed at length in Chapter 4; for now, suffice it to say that the main significance of this debate is that it has called into question long-standing assumptions about the psychology of reading, some of which may eventually be shown to be incorrect (or at least incomplete; e.g., see Y. P. Liu et al., 2019).
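The idealized English strategy of targeting the center of word N+1 can be sketched in a few lines, under the simplifying assumption that blank spaces are the only cue to word boundaries and that positions are measured in character slots. The helper `next_word_center` is hypothetical, and it ignores oculomotor error and every other factor that shapes real saccades.

```python
def next_word_center(text, n):
    """Hypothetical helper: character offset of the center of word n+1,
    using blank spaces as the only cue to word boundaries."""
    spans, offset = [], 0
    for word in text.split(" "):
        spans.append((offset, offset + len(word)))
        offset += len(word) + 1  # skip the blank space that follows the word
    if n + 1 >= len(spans):
        return None  # no boundary information: the next word cannot be located
    start, end = spans[n + 1]
    return (start + end - 1) / 2  # midpoint between the word's first and last letters

print(next_word_center("the cat sat", 0))   # 5.0, the "a" of "cat"
print(next_word_center("花生长得很快", 0))  # None
```

Given unspaced Chinese text, the same procedure has nothing to work with and returns `None` rather than guessing; how readers fill that gap is precisely what the competing accounts of saccadic targeting in Chinese reading try to explain.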

Returning to our discussion of Figure 2.6, another striking feature of the Chinese writing system that has not yet been discussed is the characters themselves. As briefly mentioned in the previous section of this chapter, the characters have their origins in pictographic inscriptions (see Figure 2.3), but by the Han dynasty (206 bce – 220 ce; see Figure 2.4) they had already taken on much of their current form, with each character occupying a uniformly sized, box-shaped area. Closer examination of the characters, however, reveals other important features that differentiate them from words in alphabetic writing systems.

The first is that the characters consist of a variable number of simple line segments that are called strokes because, throughout most of China’s history, the characters were written using a brush and ink. As shown, both the number and arrangement of the strokes can vary, producing a full range of orthographic complexity that has often been quantified by counting strokes. At one end of the continuum are single-stroke characters like 一 /yi1/, which means “one,” while at the other end are characters having thirty-six strokes, such as 齉 /nang4/, which means “(nose is) blocked.” Interestingly, the average number of strokes per character is five or six, similar to the average number of letters per word in English. Despite the potential variability in the shape, size, and location of the individual strokes, their sizes and locations are restricted in that they must fit within the confines of the character. The shapes of the strokes can also be grouped into eight basic types (Y. Hu, 1981; see Footnote 7). Following the tradition of Chinese calligraphy practice, these eight types of strokes can be illustrated using the single character 永 meaning “forever,” as shown in Figure 2.7. Another integral part of learning to write Chinese is the order in which the strokes within each character are written, which is explicitly taught through repetition. This order can be described by a set of rules, which include writing strokes from top to bottom and from left to right, writing horizontal strokes before vertical strokes, and so on. An example showing the order in which the strokes of a character should be written is also shown in Figure 2.7.

Figure 2.7 An example illustrating the eight basic types of strokes and the order in which character strokes are normally written

Note that the second and third strokes in 永 are compound strokes, which consist of a sequence of basic strokes written in a continuous fashion (i.e., without lifting the pen from the page).

A second important fact about the characters is that, in contrast to alphabetic writing systems, where the smallest units are letters arranged along a single horizontal dimension, the characters are rendered along two dimensions and often consist of a hierarchy of elements. For example, as Figure 2.6 shows, the character 猫 meaning “cat” is composed of two clusters of strokes, one on the left side of the character and the other on the right. These stroke clusters are called radicals, and they often convey information about either the meaning or the pronunciation of the characters in which they are embedded. In fact, about 80 percent of Chinese characters are phonograms, which consist of one radical that signifies the character’s meaning and a second radical that indicates its pronunciation. The semantic radical is most often located on the left or top side of the character, with the phonetic radical located on the right or bottom side, though this is not always true. Characters can contain up to nine radicals, and those radicals are arranged in a variety of configurations (e.g., left-right, above-below, inside-outside). There are currently about 220 radicals in circulation in the writing system. This is an admittedly large number, but one small enough that the reuse of radicals across many characters makes learning those characters significantly easier.

Although most Chinese characters are phonograms, the remainder can be sorted into three other broad categories:

  1. the simple pictographic characters;

  2. ideographic characters that represent abstract concepts using symbols whose meanings are suggested by the visual forms of the characters (e.g., the character 上 shown in Figure 2.3 meaning “above”);

  3. associative compound characters whose meanings are suggested by their constituent radicals (e.g., the character 武 meaning “military force” or “martial arts” and pronounced /wu3/ consists of radicals meaning “dagger-ax” and “stop”).

Only the associative compound characters and phonograms remain productive in the modern writing system; the number of pictographic and ideographic characters is largely closed in that most concrete referents and concepts that are easy to depict have already been represented by characters.

Returning now to Figure 2.6 and our discussion of phonograms, it is important to note that phonetic radicals vary in how diagnostic they are of a given character’s pronunciation. Because most phonetic radicals can also be characters on their own and thus have a pronunciation (Footnote 8), a phonogram is regular if its pronunciation is the same as that of the phonetic radical embedded within it (Footnote 9), and irregular if its pronunciation differs from that of its phonetic radical. Phonetic radicals also vary in their consistency: the pronunciation of some is relatively consistent across the characters that contain them, whereas the pronunciation of others is relatively inconsistent across their characters. These two properties are illustrated by the examples shown in Panels A and B of Figure 2.8, respectively. Finally, because individual characters correspond to monomorphemic syllables, Chinese also exhibits a large amount of homophony, which in turn means that characters can vary in their phonological density. As shown in Figure 2.8C, for example, some pronunciations are common to many characters, so those characters exhibit relatively high homophone density, whereas other pronunciations are shared by only a few characters, so those characters exhibit relatively low density.

Figure 2.8 Example phonograms of phonetic radical variation

The example phonograms show how phonetic radicals can vary in terms of their: (A) regularity; (B) consistency; and (C) homophone density.
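The three variables can be operationalized on a small lexicon. The sketch below assumes a hypothetical six-character family sharing the phonetic radical 青 /qing1/ (清 qing1, 请 qing3, 晴 qing2, 情 qing2, 精 jing1, 猜 cai1) and ignores tone; the consistency measure used here, the share of the radical's family with the same syllable, is one simple definition among several, and all function names are illustrative.

```python
from collections import Counter

# Hypothetical mini-lexicon: character -> (phonetic radical, pinyin with tone number).
LEXICON = {
    "清": ("青", "qing1"),
    "请": ("青", "qing3"),
    "晴": ("青", "qing2"),
    "情": ("青", "qing2"),
    "精": ("青", "jing1"),
    "猜": ("青", "cai1"),
}
RADICAL_SOUND = {"青": "qing1"}

def syllable(pinyin):
    """Drop the tone number, i.e., ignore tone (one option noted in Footnote 9)."""
    return pinyin.rstrip("012345")

def is_regular(char):
    """Regular if the character sounds like its phonetic radical (tone ignored)."""
    radical, sound = LEXICON[char]
    return syllable(sound) == syllable(RADICAL_SOUND[radical])

def consistency(char):
    """Share of the radical's family (including char itself) with char's syllable."""
    radical, sound = LEXICON[char]
    family = [s for r, s in LEXICON.values() if r == radical]
    return sum(syllable(s) == syllable(sound) for s in family) / len(family)

def homophone_density(lexicon):
    """Number of characters in the lexicon sharing each (toneless) syllable."""
    return Counter(syllable(s) for _, s in lexicon.values())

print(is_regular("晴"), is_regular("猜"))   # True False
print(round(consistency("清"), 2))          # 0.67
print(homophone_density(LEXICON)["qing"])   # 4
```

With a realistic lexicon, homophone density would be counted over all characters in the language rather than over one radical family; the toy lexicon keeps the arithmetic easy to check.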

The fact that Chinese characters vary in their phonological regularity, consistency, and density is thus similar to what is found in languages with alphabetic writing systems like English, where individual words can vary in their regularity (e.g., “cat” vs. “yacht”), consistency (e.g., “pint” vs. “mint,” “hint,” “lint,” etc.), and density (e.g., “-at” in “cat,” “hat,” “sat,” etc. vs. “-eopard” in “leopard”). However, in alphabetic writing systems, these variables directly reflect the relationships between individual graphemes (i.e., letters and letter combinations like “sh”) and phonemes, in a manner that is not possible in Chinese. This difference underscores another fundamental distinction between the two writing systems.

In alphabetic writing systems, the pronunciations of words can be generated in two ways: by retrieving the pronunciation directly from memory, or by “sounding out” the letters (Coltheart et al., 2001). The first method is often referred to as addressed phonology because the representation of a word’s pronunciation is assumed to be retrieved from some “address” in memory. By contrast, the second method is referred to as assembled phonology because the pronunciation is constructed from “rules” that specify the set of grapheme-to-phoneme correspondences (GPCs) within a language. By this account, addressed phonology is required to pronounce words that do not conform to a language’s GPCs (e.g., “yacht” and “colonel” in English), whereas assembled phonology is required to pronounce unknown letter strings (e.g., “brane” and “flink”; see Footnote 10).
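The two routes can be illustrated with a deliberately tiny sketch. It assumes a two-word lexicon for the addressed route and a handful of GPCs for the assembled route (hypothetical toy data, with approximate British IPA), and it simply tries the lexicon first; in models such as that of Coltheart et al. (2001), the two routes instead operate in parallel. The nonwords “shap” and “flim” are used because this small rule set covers them.

```python
# Hypothetical toy data, for illustration only.
# Addressed route: whole-word pronunciations stored in memory (approximate British IPA).
LEXICON = {"yacht": "/jɒt/", "colonel": "/kɜːnəl/"}
# Assembled route: a handful of grapheme-to-phoneme correspondences (GPCs).
GPC = {"sh": "ʃ", "a": "æ", "p": "p", "f": "f", "l": "l", "i": "ɪ", "m": "m"}

def pronounce(letters):
    """Return (route, pronunciation), trying stored words before the GPC rules."""
    if letters in LEXICON:
        return "addressed", LEXICON[letters]
    phonemes, i = [], 0
    while i < len(letters):
        for size in (2, 1):  # prefer two-letter graphemes such as "sh"
            grapheme = letters[i:i + size]
            if len(grapheme) == size and grapheme in GPC:
                phonemes.append(GPC[grapheme])
                i += size
                break
        else:
            raise ValueError(f"no GPC rule for {letters[i]!r}")
    return "assembled", "/" + "".join(phonemes) + "/"

print(pronounce("yacht"))  # ('addressed', '/jɒt/')
print(pronounce("shap"))   # ('assembled', '/ʃæp/')
```

A letter string such as “flim” is likewise assembled grapheme by grapheme, whereas “yacht” can only be pronounced because it is stored whole.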

In contrast, the pronunciation of individual Chinese characters cannot be assembled from their constituent parts but must instead be retrieved from memory in a manner similar to what presumably happens with irregular words (e.g., “yacht”) in English. The fact that the pronunciations of Chinese characters must be generated using addressed phonology has led to descriptions of Chinese phonology as being available to the reader in an “all or none” manner. For example, as Perfetti et al. (2005) argue:

…in an alphabetic system, the word-level units do not wait for a complete specification of all letter units prior to activating word level phonology (i.e., cascade style). In Chinese, the word-level phonology is not activated prior to a full orthographic specification of the character – hence, threshold style.
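The contrast in this passage can be expressed as two toy activation functions that map orthographic progress onto word-level phonological activation between 0 and 1. The linear ramp and the step function are assumptions chosen for simplicity; neither Perfetti et al. nor this chapter specifies these functional forms, and the function names are hypothetical.

```python
def cascade_phonology(letters_identified, n_letters):
    """Cascade style: partial letter information already activates word phonology."""
    return letters_identified / n_letters

def threshold_phonology(components_identified, n_components):
    """Threshold style: character phonology is activated only once the
    character's orthography is fully specified."""
    return 1.0 if components_identified >= n_components else 0.0

# Activation after identifying 0..4 of 4 orthographic units.
for k in range(5):
    print(k, cascade_phonology(k, 4), threshold_phonology(k, 4))
```

Halfway through orthographic processing, the cascade function already yields 0.5, whereas the threshold function stays at 0.0 until the final unit is identified.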

Although the distinction between “cascade style” and “threshold style” phonology as articulated by Perfetti and colleagues (see also Coltheart et al., 1993) is important for understanding how Chinese differs from alphabetic writing systems, we hasten to add one qualifier. Although “threshold style” might accurately describe how the pronunciations of individual characters are generated, the pronunciations of multi-character words can probably also be generated via a process that is more analogous to assembled phonology. That is, the pronunciations of the individual characters might be “sounded out” and then blended to produce the pronunciation of the whole word. Thus, one might argue that, although individual characters serve as the equivalent of both the syllables and the morphemes of languages that use alphabetic writing systems, the characters can also play the role of graphemes in that they can be used to assemble the pronunciations of multi-character words. This claim requires one additional caveat, however.
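The two-step process suggested here, addressed retrieval for each character followed by assembly across characters, can be sketched as follows. The lookup table `CHAR_SOUNDS` and the function `word_pronunciation` are hypothetical, and the table stores only one reading per character.

```python
# Hypothetical lookup table storing one whole-syllable reading per character.
CHAR_SOUNDS = {"去": "qu4", "年": "nian2", "生": "sheng1", "长": "zhang3"}

def word_pronunciation(word):
    """Retrieve each character's pronunciation (addressed), then blend them (assembled)."""
    return " ".join(CHAR_SOUNDS[char] for char in word)

print(word_pronunciation("去年"))  # qu4 nian2
print(word_pronunciation("生长"))  # sheng1 zhang3
```

Storing a single reading per character is an oversimplification: 长, for example, is read zhang3 in 生长 (“grow”) but chang2 when it means “long,” so real assembly of word pronunciations must also take the surrounding characters into account.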

As mentioned previously, an important distinction between alphabetic writing systems and written Chinese is that, in the former, the “building blocks” of words, the individual letters, are arranged along a single horizontal dimension, whereas in the latter, the “building blocks” of words are both hierarchical (i.e., consist of radicals and characters) and arranged along two spatial dimensions. These distinctions are important because having a better understanding of Chinese words will likely inform our understanding of two general issues in the psychology of reading.

The first is called the alignment problem and refers to the question: How do readers (most often) perceive the correct order of letters within words? In reading the word “cats,” for example, how does the reader know that the word is not “acts” or “cast”? Early models of word identification (e.g., McClelland & Rumelhart’s (1981) interactive-activation model, as discussed in Chapter 1) simply ignored this question by assuming that individual letters are encoded and represented in their respective “slots” (e.g., for the word “cats,” the letter “c” would be encoded in the first spatial position, “a” in the second, and so on). The fact that letter-transposition errors (e.g., mistaking “cast” for “cats”) are more common than letter-substitution errors (e.g., mistaking “cars” for “cats”) led to the recognition that the individual letters within a word are not encoded veridically but are instead encoded in some manner that is subject to error. More contemporary models therefore attempt to explain these findings by assuming that the positions of letters are subject to spatial uncertainty (Gomez et al., 2008; Norris & Kinoshita, 2012), are converted into temporal codes that are subject to error (Davis, 2010; Whitney, 2001), or are affected by the syllabic structure of words as represented in memory (Taft & Krebs-Lazendic, 2013), or by assuming that words are also represented as ordered letter pairs, including non-adjacent pairs (i.e., open bigrams; Grainger & van Heuven, 2003).
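The open-bigram idea can be checked directly on the examples above. The following sketch assumes one common parameterization, in which a bigram is any ordered letter pair separated by at most two intervening letters, and it measures similarity as the number of shared bigrams; both choices are simplifications, and the function names are hypothetical.

```python
from itertools import combinations

def open_bigrams(word, max_gap=2):
    """Ordered letter pairs separated by at most `max_gap` intervening letters."""
    return {word[i] + word[j]
            for i, j in combinations(range(len(word)), 2)
            if j - i - 1 <= max_gap}

def shared(a, b):
    """Number of open bigrams two letter strings have in common."""
    return len(open_bigrams(a) & open_bigrams(b))

for candidate in ["cast", "acts", "cars"]:
    print(candidate, shared("cats", candidate))
# cast 5
# acts 5
# cars 3
```

Because a transposition preserves most ordered pairs, “cast” and “acts” share five of the six open bigrams of “cats,” whereas the substitution in “cars” preserves only three, consistent with transposition errors being the more common of the two.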

The second general issue is related to the first and pertains to the question: What are the basic features in visual word identification? In alphabetic writing systems like that of English, for example, there is evidence that the individual letters that comprise words are the basic features that are used in lexical processing. One type of evidence supporting this claim is the letter-transposition effects mentioned earlier, along with the facts that these errors are usually much more difficult to detect than letter-substitution errors but much easier to detect than transpositions involving larger units, such as morphemes (e.g., mistaking “cowboy” for “boycow”). Another type of evidence is that, during natural reading, the individual letters that are initially perceived on the printed page appear to be rapidly converted into abstract orthographic representations that are invariant across fonts, case, or other typographical factors. For example, in eye-movement experiments where participants read text displayed in alternating lower- and upper-case letters (e.g., “LiKe ThIs”), there is no cost associated with changing (e.g., “lIkE tHiS”) as compared to not changing (“LiKe ThIs”) the case in which the letters are displayed following each successive saccade (McConkie & Zola, 1979). This finding suggests that the lower- and upper-case forms of letters that are as markedly different as “g” and “G” or “a” and “A” are somehow represented in the same manner (i.e., as abstract letters that are somehow devoid of their specific visual features), thereby allowing the set of twenty-six abstract letters in English to function as the invariant features that are used to access a reader’s lexicon.
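As a software analogy only (it says nothing about how the visual system achieves this), case folding shows what it means for two visually different displays to map onto the same sequence of abstract letter identities drawn from a set of twenty-six. The function name `abstract_letters` is hypothetical.

```python
def abstract_letters(text):
    """Reduce text to case-free letter identities (non-letters are dropped)."""
    return [ch.casefold() for ch in text if ch.isalpha()]

print(abstract_letters("LiKe ThIs") == abstract_letters("lIkE tHiS"))  # True
print(len(set(abstract_letters("The quick brown fox jumps over the LAZY dog"))))  # 26
```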

As will be discussed at length in Chapter 3, whether and how this happens during the reading of Chinese is unclear. What is clear, however, is that Chinese words can be decomposed into characters, which can in turn be decomposed into complex radicals, simple radicals, or strokes (see Figure 2.6). Adding to this complexity is the fact that these units can:

  1. play multiple functional roles (e.g., the radical 苗 in the character 猫 /mao1/, meaning “cat,” can also be a single-character word 苗 /miao2/, meaning “sprout,” which consists of two simple radicals, 艹 and 田);

  2. occur in a variety of spatial locations (e.g., the radical 木 occurs in the top half of the character 杰 but on the right side of the character 休); and

  3. undergo some amount of distortion in both their shape and size to accommodate their spatial locations (e.g., compare the radical 土 in the characters 坝 vs. 吐).
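The first two properties can be captured by representing a character as a tree of units, each with a layout for its parts. The sketch below assumes a hypothetical `Unit` structure and hand-coded decompositions of 猫, 杰, and 休; it reuses the same 苗 node both inside 猫 and as a character in its own right, but it does not model the shape and size distortions described in the third point.

```python
from dataclasses import dataclass, field

@dataclass
class Unit:
    """A node in a character's decomposition: a character or radical plus its parts."""
    form: str
    layout: str = ""  # e.g. "left-right" or "top-bottom"; empty for simple radicals
    parts: list = field(default_factory=list)

    def simple_radicals(self):
        """Leaves of the decomposition tree, in left-to-right / top-to-bottom order."""
        if not self.parts:
            return [self.form]
        return [leaf for part in self.parts for leaf in part.simple_radicals()]

    def position_of(self, form):
        """Index of `form` among the immediate parts (0 = left or top), or None."""
        for i, part in enumerate(self.parts):
            if part.form == form:
                return i
        return None

miao = Unit("苗", "top-bottom", [Unit("艹"), Unit("田")])  # also a word on its own
mao = Unit("猫", "left-right", [Unit("犭"), miao])
jie = Unit("杰", "top-bottom", [Unit("木"), Unit("灬")])
xiu = Unit("休", "left-right", [Unit("亻"), Unit("木")])

print(mao.simple_radicals())  # ['犭', '艹', '田']
print(jie.layout, jie.position_of("木"), xiu.layout, xiu.position_of("木"))  # top-bottom 0 left-right 1
```

Even this minimal structure needs two dimensions of information per node (which parts, and how they are arranged), whereas a word in an alphabetic script can be described by a single ordered sequence of letters.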

Each of these problems lacks a direct analog in alphabetic writing systems, making most models of word identification (which have been developed around English) of questionable value for solving them. And as we shall see in Chapter 3, although models of Chinese word identification have made valiant attempts to explain key behaviors related to the reading of Chinese characters and words, these attempts have often required simplifying assumptions that sidestep many of the inherent difficulties discussed here. These issues will be revisited in the upcoming chapters.

2.4 Conclusion

This chapter was intended to provide a brief introduction to the Chinese language and writing system – one that allows a reader who is unfamiliar with either to understand the remainder of this book. Because the focus of this book will be the reading of Chinese, and how it is like but also different from the reading of alphabetic writing systems such as that of English, our discussion of the Chinese language has only been intended to provide a context for any subsequent discussion of the Chinese writing system. Therefore, in the remainder of this book, we discuss in more detail how properties of the Chinese writing system have influenced how readers identify printed characters and words (Chapter 3), the skilled reading of text (Chapter 4), and the development of skilled reading, its impairment, and what has been learned from cognitive neuroscience about reading (Chapter 5). These chapters will each discuss the evidence about their respective topics that has been accumulated from behavioral and brain-imaging experiments and computer models of Chinese reading.

Footnotes

2 There are different conventions for representing the tones associated with Chinese syllables written using the Roman alphabet (e.g., via diacritical markings above the vowels; W. Wang, 1973). The convention that will be adopted in this book entails appending numbers to the ends of syllables to indicate their tones, with 1 to 4 respectively indicating the level, rising, falling-then-rising, and falling tones. (The fifth, neutral tone is not marked with a number.) Relatedly, the pronunciations of syllables and words will be indicated by the use of forward slashes.

3 For discussion of the earliest forms of writing around the world and the archeological evidence dating their development, see Robinson (1995).

4 Similarly, another phonetic system called zhuyin is also used in Taiwan to support early literacy training.

5 One other historically significant milestone related to the development of Chinese writing warrants mention: the invention of printing. The earliest known printed texts were Buddhist scriptures produced in the eighth century ce using inked wooden blocks carved in relief. Moveable type, made of fired clay, was then invented during the Northern Song dynasty (in the 1040s ce), predating its invention by Johannes Gutenberg in Germany around 1440 ce by about four centuries (Keay, 2011). As was true in Germany, the mass printing of text allowed for its broad dissemination and likely contributed to the standardization of the writing system.

6 The scare quotes are used here because the decisions that readers make about when and where to move their eyes are not made consciously, but are instead determined by various perceptual, cognitive, and motoric factors that largely operate outside of conscious awareness (see Reichle, 2006).

7 Note that there are different taxonomies for describing the basic strokes of Chinese characters. For example, the State Language Commission and Ministry of Education of China (2001) define five basic stroke types and thirty-three subordinate stroke types that can be derived from those basic types.

8 A small proportion of phonetic radicals are not standalone characters but, because they consistently appear in characters that share similar pronunciations, are nonetheless deemed to be phonetic radicals; e.g., the top (complex) radical in the characters 奖 /jiang3/, 浆 /jiang1/, 桨 /jiang3/, and 酱 /jiang4/.

9 Note that definitions of “phonogram regularity” can either include or ignore tonal similarity. This is also true for definitions of “phonological consistency” and “homophone density” (see Chapter 3).

10 The distinction between addressed vs. assembled phonology also provides a natural account of the two most common types of dyslexia or reading-specific impairment: phonological dyslexia is an impairment in the capacity to use assembled phonology, whereas surface dyslexia is an impairment in the capacity to use addressed phonology (see Castles & Coltheart, 1993). More will be said about dyslexia in the reading of Chinese in Chapter 5.

Figure 2.1 A map of China showing the main dialects and where they are spoken

Figure 2.2 An example illustrating the four tones used to differentiate the meanings of the spoken syllable /ma/ in Mandarin. The arrows show each tone’s change in the pitch contour as might be measured using an oscilloscope.

Figure 2.3 A few examples of the earliest form of Chinese writing. Inscriptions that have been found on turtle shells, animal bones, and bronze vessels that were often used for oracles. These examples were extracted from www.zdic.net.

Figure 2.4 A few examples showing how early pictographs changed into their modern character equivalents during the evolution of the Chinese writing system. These examples were extracted from www.zdic.net.

Figure 2.5 A few examples illustrating the similarities and differences between traditional and simplified Chinese characters
