Learning from the Input: A Corpus-Based Investigation of Chinese Classifiers in Children’s Books and Child-Directed Speech

Published online by Cambridge University Press:  06 April 2026

Jinyu Shi*
Affiliation:
Department of Experimental Psychology, University of Oxford, UK
Yaling Hsiao
Affiliation:
School of Psychology and Clinical Language Sciences, University of Reading, UK
Yifan Yang
Affiliation:
Department of Experimental Psychology, University of Oxford, UK
Elizabeth Wonnacott
Affiliation:
Department of Education, University of Oxford, UK
Kate Nation
Affiliation:
Department of Experimental Psychology, University of Oxford, UK
*Corresponding author: Jinyu Shi; Email: jinyu.shi@psy.ox.ac.uk

Abstract

In Mandarin Chinese, numeral classifiers form a grammatical category that is syntactically obligatory when a noun is modified by a numeral or a demonstrative. The appropriate choice of a classifier is associated with the semantic properties of its corresponding noun and is context dependent. Experience with language is needed to learn these patterns, but little is known about how classifiers are structured in children’s language environments. We compared the frequency and distribution of classifier phrases in four corpora: child-directed speech, children’s television shows, children’s books, and adult-directed speech. Classifier usage in children’s books was more diverse than in both child-directed and adult speech. Books contained more specific classifiers that co-occurred with a higher proportion of unique nouns, whereas everyday speech relied on more generic classifiers. Books therefore provide access to classifier–noun combinations that are rare in speech. Implications for language development and language processing are discussed.

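Abstract (translated from Chinese)

In Mandarin numeral–classifier–noun constructions, the choice of classifier is closely tied to the semantic features of the accompanying noun and is context dependent. However, little is currently known about how children acquire these constructions from their language environment. This study therefore examined the frequency and distribution of classifier phrases in four types of corpora (child-directed speech, children’s television programmes, children’s books, and adult-directed speech) to explore the potential influence of different types of input on children’s acquisition of the classifier system. The results show that classifier diversity in children’s books was significantly higher than in child-directed and adult-directed speech, and that books contained a higher proportion of low-frequency classifiers (e.g., 股 and 缕); these low-frequency classifiers also co-occurred with a higher proportion of unique nouns. By contrast, everyday speech relied more on high-frequency general classifiers (e.g., 个 and 只). These results indicate that children’s books can provide children with classifier–noun combinations that occur rarely, or not at all, in everyday speech. This finding highlights the distinctive role of children’s books in children’s language input and language development. The paper closes with a further discussion of the potential influence of reading on language acquisition and language processing.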

Information

Type
Research Article
Creative Commons
Creative Commons Licence: CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2026. Published by Cambridge University Press
Figure 1. An overview of the extraction process with our regular expression algorithm and exemplar sentences.

Table 1. Precision, recall, and F1 values for classifiers, nouns, and classifier–noun pairings obtained by automated extraction, compared with manual coding, in each corpus. The number of identified cases is shown in parentheses below each corpus name (machine identified/manual coding). The raw numbers of errors are shown in parentheses below precision and recall (parsing error, algorithm error).
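
The three measures named in the Table 1 caption follow their standard definitions, which can be sketched as below. This is a minimal illustration rather than the authors' evaluation script: the function name `precision_recall_f1`, the representation of each case as a (sentence id, classifier) tuple, and the toy data are all assumptions made for the example.

```python
def precision_recall_f1(machine_items, manual_items):
    """Score machine-extracted items against manually coded items.

    Both arguments are iterables of hashable items; here each item is a
    hypothetical (sentence_id, classifier) tuple.
    """
    machine = set(machine_items)
    manual = set(manual_items)
    true_positives = len(machine & manual)

    # Precision: share of machine-identified cases that the manual coding confirms.
    precision = true_positives / len(machine) if machine else 0.0
    # Recall: share of manually coded cases that the machine also found.
    recall = true_positives / len(manual) if manual else 0.0
    # F1: harmonic mean of precision and recall.
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data, not taken from the article.
machine = [(1, "个"), (2, "只"), (3, "条"), (4, "本")]
manual = [(1, "个"), (2, "只"), (3, "张"), (5, "本"), (6, "把")]

p, r, f = precision_recall_f1(machine, manual)
print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

With this toy data two of the four machine-identified cases match the five manually coded cases, so precision is 0.50 and recall is 0.40.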

Figure 2. Frequency of classifier phrases (a) and diversity of classifiers (b) in adult-directed speech (ADS), child-directed speech (CDS), children’s media, and children’s books. The labels show raw count in (a) and type frequency in (b).

Figure 3. Frequency counts of the 30 most frequent classifiers in each corpus, colour-coded by the number of corpora (one to four) in which the classifier featured within the top 30. The cumulative percentage of classifier observations out of all classifier phrases is shown in the labels on the right. The dotted lines mark the classifiers that together accounted for 50% (red) or 75% (green) of classifier phrases. See text for more description.
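
The 50% and 75% cut-offs described in the Figure 3 caption amount to asking how many of the most frequent classifiers are needed to cover a given share of all classifier phrases. A minimal sketch of that calculation follows; the function name `classifiers_needed_for_coverage` and the toy token list are assumptions for illustration, not the authors' code or data.

```python
from collections import Counter


def classifiers_needed_for_coverage(classifier_tokens, threshold):
    """Return how many top-ranked classifier types are needed for their
    cumulative share of all tokens to reach `threshold` (e.g. 0.50)."""
    counts = Counter(classifier_tokens)
    total = sum(counts.values())
    if total == 0:
        return 0

    cumulative = 0
    # most_common() yields (classifier, count) pairs in descending frequency.
    for rank, (_, count) in enumerate(counts.most_common(), start=1):
        cumulative += count
        if cumulative / total >= threshold:
            return rank
    return len(counts)


# Hypothetical toy corpus: one classifier token per classifier phrase.
tokens = ["个"] * 6 + ["只"] * 2 + ["条"] + ["本"]

n50 = classifiers_needed_for_coverage(tokens, 0.50)
n75 = classifiers_needed_for_coverage(tokens, 0.75)
print(n50, n75)
```

In this toy corpus the single most frequent classifier covers 60% of the phrases and the top two cover 80%, so one classifier reaches the 50% line and two reach the 75% line. A corpus that leans on generic classifiers reaches both lines with fewer types than a corpus with more varied classifier use.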

Figure 4. The top four classifiers that co-occurred with the highest number of unique noun types in each corpus. Raw counts of unique nouns are shown above the bars. Each bar is divided into nouns that only appeared with the corresponding classifier (light blue) and nouns that can occur with other classifiers (dark blue). Percentages inside the bars indicate the proportion of nouns that co-occurred with other classifiers.

Table 2. Expected cumulative number of classifier–noun pairs, and of unique classifiers and unique classifier–noun pairs, to which children are exposed annually, by number of readings per week. The numbers of unique classifiers and classifier–noun pairs that are present or absent in child-directed speech are shown in parentheses. All values are rounded to the nearest integer. Note a: “Never” is represented mathematically as 0.11 times per week, “1–2” as 1.5 books per week, “3–5” as 4 books per week, “daily” as 7 books per week, and “multiple books per day” as 35 books per week.
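
The weekly rates in the Table 2 note can be turned into annual reading counts with simple arithmetic, sketched below. The weekly values come from the note; the 52-week year, the dictionary keys, and the variable names are assumptions for illustration, and the article's own conversion may differ.

```python
# Assumption: a year is treated as 52 weeks; the article may use another value.
WEEKS_PER_YEAR = 52

# Weekly reading rates as given in the Table 2 note (keys are illustrative labels).
books_per_week = {
    "never": 0.11,
    "1-2 per week": 1.5,
    "3-5 per week": 4,
    "daily": 7,
    "multiple per day": 35,
}

# Annual number of book readings implied by each weekly rate.
books_per_year = {
    label: rate * WEEKS_PER_YEAR for label, rate in books_per_week.items()
}

for label, n_books in books_per_year.items():
    print(f"{label}: {n_books:.2f} books per year")
```

Under this assumption daily reading corresponds to 364 readings a year and the "never" category to fewer than six, which shows how widely annual exposure differs across the reading-frequency categories.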

Supplementary material: File

Shi et al. supplementary material

File 215.9 KB