What ratings and corpus data reveal about the vividness of Mandarin ABB words

Published online by Cambridge University Press:  23 April 2024

Thomas Van Hoey
Affiliation:
Department of Linguistics, KU Leuven, Leuven, Belgium
Xiaoyu Yu
Affiliation:
Department of Linguistics, The University of Hong Kong, Hong Kong, SAR of China
Tung-Le Pan
Affiliation:
Graduate Institute of Linguistics, National Taiwan University, Taiwan
Youngah Do*
Affiliation:
Department of Linguistics, The University of Hong Kong, Hong Kong, SAR of China
*Corresponding author: Youngah Do; Email: youngah@hku.hk

Abstract

A well-known method of studying iconic words is the collection of subjective ratings. We collected such ratings regarding familiarity, iconicity, imagery/imageability, concreteness, sensory experience rating (SER), valence and arousal for Mandarin ABB words. This is a type of phrasal compound consisting of a prosaic syllable A and a reduplicated BB part, resulting in a vivid phrasal compound, for example, wù-mángmáng 雾茫茫 ‘completely foggy’. The correlations between the newly collected ABB ratings are contrasted with two other sets of prosaic word ratings, demonstrating that variables that characterize ABB words in an absolute sense may not play a distinctive role when contrasted with other types of words. Next, we provide another angle for looking at ABB words by investigating to what degree rating data converge with corpus data. Across these case studies, the variable that most consistently characterizes ABB items is their high imageability score, which shows that they are rightly characterized as vivid. Methodologically, we show that it pays off not to take rating data at face value but to contrast them with other comparable datasets of a different phenomenon or with data about the same phenomenon compiled in an ontologically different manner.

Information

Type
Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press

Table 1. Stratified sample of mean familiarity ratings, sorted from most familiar to least familiar

Table 2. Number of participants, their intra-class correlation coefficient (ICC) and the mean number of times a stimulus was rated

Table 3. Descriptive statistics for mean item ratings

Figure 1. Mean and standard deviation of each rating plotted against each other. The dashed vertical line indicates the mean of each rating set. The dotted line at x = 3 indicates the midpoint of the rating scale. Horseshoe curves appear for familiarity, concreteness, imagery, SER, and iconicity, but not for arousal or valence.

Figure 2. Pairwise correlation plot of all ratings. The lower triangle shows scatterplots and a fitted linear model. The diagonal shows histograms, which display the distribution. The upper triangle shows the result of Pearson correlation tests, with asterisks indicating the level of significance.

Table 4. ABB words versus prosaic words (set of Yao et al., 2017)

Table 5. Prosaic words (set of Song & Li, 2021) versus ABB words

Table 6. Summary of corpus material

Table 7. ABB words occurring across all types of corpus data, with their raw token frequencies

Figure 3. Pairwise correlation plot of all corpus-based measures. The lower triangle shows scatterplots and a fitted linear model. The diagonal shows histograms, which display the distribution. The upper triangle shows the result of Pearson correlation tests, with asterisks indicating the level of significance.

Figure 4. Scree plot of the principal components analysis based on the 13 transformed variables (ratings and corpus).

Figure 5. Contributions of variables to the first and second PCA dimensions. The red line indicates the expected contribution if the variables’ contributions were equal.

Figure 6. ABB items plotted on the PCA space for Dimension 1 and Dimension 2. A density plot (green lines) shows the concentration of items in different bands. The colored rectangles indicate three clusters of interest. Highlighted items are those from Table 1 and Table 7. Versions with transcriptions and English translations are provided in the OSF repository.
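
The reference line described in the caption of Figure 5 (the expected contribution if all variables contributed equally) can be illustrated with a minimal sketch. It assumes the common convention that a variable's contribution to a PCA dimension is its squared loading as a percentage of the sum of squared loadings on that dimension; whether the article computes contributions in exactly this way is an assumption. The loading values and the function names below are hypothetical and chosen for illustration only; they are not taken from the article.

```python
from typing import Dict


def contributions(loadings: Dict[str, float]) -> Dict[str, float]:
    """Percentage contribution of each variable to one PCA dimension.

    Assumed convention: squared loading divided by the sum of squared
    loadings on that dimension, times 100.
    """
    total = sum(v * v for v in loadings.values())
    return {name: 100.0 * v * v / total for name, v in loadings.items()}


def expected_contribution(n_variables: int) -> float:
    """Contribution each variable would have if all contributed equally."""
    return 100.0 / n_variables


# Hypothetical loadings on one dimension (illustrative values only).
dim1 = {
    "imageability": 0.6,
    "concreteness": 0.5,
    "familiarity": 0.4,
    "valence": 0.1,
}

contrib = contributions(dim1)
threshold = expected_contribution(len(dim1))

# Variables whose contribution exceeds the equal-contribution reference line.
above = sorted(name for name, c in contrib.items() if c > threshold)
```

Under this convention, the 13 transformed variables mentioned in the caption of Figure 4 would put the equal-contribution reference line at 100/13, roughly 7.7%.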