Linguistic synesthesia detection: Leveraging culturally enriched linguistic features

Published online by Cambridge University Press:  09 September 2024

Qingqing Zhao*
Affiliation:
Institute of Linguistics, Chinese Academy of Social Sciences, Beijing, China
Yunfei Long
Affiliation:
School of Computer Science and Electronic Engineering, University of Essex, Essex, UK
Xiaotong Jiang
Affiliation:
Natural Language Processing Lab, Soochow University, Suzhou, China
Zhongqing Wang
Affiliation:
Natural Language Processing Lab, Soochow University, Suzhou, China
Chu-Ren Huang*
Affiliation:
Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong, China
Guodong Zhou
Affiliation:
School of Computer Science and Technology, Soochow University, Suzhou, China
* Corresponding authors: Qingqing Zhao; Email: zhaoqq@cass.org.cn; Chu-Ren Huang; Email: churen.huang@polyu.edu.hk

Abstract

Linguistic synesthesia, a productive form of figurative language, has received little attention in Natural Language Processing (NLP). Although linguistic synesthesia resembles metaphor in that it involves conceptual mappings, and although it is highly useful for NLP tasks such as sentiment analysis and stance detection, well-studied metaphor detection methods cannot be applied directly to the detection of linguistic synesthesia. This study incorporates comprehensive linguistic features (i.e., character and radical information, word segmentation information, and part-of-speech tags) into a neural model that automatically detects linguistic synesthetic usages in a sentence. In particular, we employ a span-based boundary detection model to extract sensory words. In addition, we propose a joint model that detects the original and the synesthetic modalities of the sensory words together. Experiments show that our model achieves state-of-the-art results on the dataset for linguistic synesthesia detection. The results demonstrate that leveraging culturally enriched linguistic features and joint learning is effective for linguistic synesthesia detection. Furthermore, because the proposed model leverages linguistic features that are not language-specific, it could be applied to the detection of linguistic synesthesia in other languages.
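
To make the idea of span-based boundary detection concrete, here is a minimal sketch of one common decoding heuristic, assuming the model outputs a start probability and an end probability for every character. The function name `decode_spans`, the 0.5 threshold, the maximum span length, and the nearest-end pairing rule are all illustrative assumptions, not the authors' published decoding procedure.

```python
from typing import List, Tuple


def decode_spans(start_probs: List[float], end_probs: List[float],
                 threshold: float = 0.5, max_len: int = 4) -> List[Tuple[int, int]]:
    """Hypothetical decoder: pair each predicted start with the nearest
    predicted end at or after it, within max_len characters.

    Returns inclusive (start, end) character indices.
    """
    if len(start_probs) != len(end_probs):
        raise ValueError("start and end sequences must have equal length")
    # Positions whose boundary probability passes the threshold.
    starts = [i for i, p in enumerate(start_probs) if p >= threshold]
    ends = [j for j, p in enumerate(end_probs) if p >= threshold]  # ascending
    spans = []
    for s in starts:
        # Keep ends that do not precede the start and respect the length cap;
        # because `ends` is ascending, the first candidate is the nearest one.
        candidates = [e for e in ends if s <= e < s + max_len]
        if candidates:
            spans.append((s, candidates[0]))
    return spans
```

For the phrase 甜美的音色 ("sweet timbre", a gustatory word describing sound) with invented probabilities `start_probs = [0.9, 0.1, 0.0, 0.2, 0.1]` and `end_probs = [0.3, 0.8, 0.0, 0.1, 0.2]`, the decoder returns `[(0, 1)]`, i.e. the span 甜美. Predicting boundaries rather than per-character BIO labels lets a sketch like this handle multi-character sensory words without a separate tag-transition model.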

Information

Type
Article
Creative Commons
Creative Commons License: CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press

Figure 1. A hierarchical model for linguistic synesthesia (Williams 1976, see p. 463).

Figure 2. Transfer directionalities of linguistic synesthesia based on Mandarin corpus data (Zhao et al. 2019, see p. 9).

Figure 3. The procedure for dataset acquisition.

Table 1. Inter-annotator agreements for annotation of linguistic synesthesia

Table 2. Data distribution of the five sensory modalities in synesthetic and original sub-datasets

Figure 4. An example of annotation of linguistic synesthesia in Chinese.

Figure 5. The architecture of our proposed methods.

Figure 6. An example of the two sub-tasks: sensory word extraction and joint sensory modality detection.
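
Joint modality detection can be pictured as two classification heads, one for the original modality and one for the synesthetic modality, trained on a shared representation of the extracted sensory word with a combined loss. The sketch below shows only that combined objective and the per-head prediction step, assuming each head emits five logits. The label order in `MODALITIES`, the weighting parameter `alpha`, and the function names are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

# Assumed label order for the five sensory modalities (hypothetical).
MODALITIES = ["touch", "taste", "smell", "vision", "hearing"]


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a list of scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def joint_loss(orig_logits: Sequence[float], syn_logits: Sequence[float],
               gold_orig: int, gold_syn: int, alpha: float = 0.5) -> float:
    """Weighted sum of the cross-entropy losses of the two heads.

    alpha is an assumed weighting hyperparameter between the heads.
    """
    ce_orig = -log_softmax(orig_logits)[gold_orig]
    ce_syn = -log_softmax(syn_logits)[gold_syn]
    return alpha * ce_orig + (1.0 - alpha) * ce_syn


def predict(orig_logits: Sequence[float],
            syn_logits: Sequence[float]) -> Tuple[str, str]:
    """Pick the highest-scoring label from each head independently."""
    o = max(range(len(orig_logits)), key=lambda i: orig_logits[i])
    s = max(range(len(syn_logits)), key=lambda i: syn_logits[i])
    return MODALITIES[o], MODALITIES[s]
```

With uniform logits, each head's cross-entropy equals ln 5, so the joint loss is also ln 5 for any `alpha`. Because both heads share one backbone, the gradient of each loss term also shapes the representation used by the other head, which is the usual motivation for joint learning.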

Table 3. An example of the representation of linguistic features
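
As a rough sketch of how character, radical, word segmentation, and part-of-speech information can be aligned per character, the function below expands a segmented, POS-tagged sentence into one feature tuple per character. It assumes a BMES segmentation scheme, generic POS labels, and a tiny illustrative radical table (`RADICALS`); none of these are taken from the paper. In a neural model, each field would typically be embedded and the embeddings concatenated.

```python
from typing import Dict, List, Tuple

# Hypothetical radical lookup; a real system would use a full
# character-to-radical table covering the whole vocabulary.
RADICALS: Dict[str, str] = {"甜": "甘", "美": "羊", "的": "白", "音": "音", "色": "色"}


def char_features(tagged_words: List[Tuple[str, str]]) -> List[Tuple[str, str, str, str]]:
    """Expand (word, POS) pairs into (char, radical, seg_tag, pos) tuples.

    seg_tag follows the BMES scheme: B(egin), M(iddle), E(nd), S(ingle).
    """
    feats = []
    for word, pos in tagged_words:
        for i, ch in enumerate(word):
            if len(word) == 1:
                seg = "S"
            elif i == 0:
                seg = "B"
            elif i == len(word) - 1:
                seg = "E"
            else:
                seg = "M"
            # Unknown characters fall back to a placeholder radical.
            feats.append((ch, RADICALS.get(ch, "<unk>"), seg, pos))
    return feats
```

For example, `char_features([("甜美", "ADJ"), ("的", "PART"), ("音色", "NOUN")])` yields `("甜", "甘", "B", "ADJ")`, `("美", "羊", "E", "ADJ")`, `("的", "白", "S", "PART")`, `("音", "音", "B", "NOUN")`, and `("色", "色", "E", "NOUN")`. Here the radical 甘 ("sweet") on 甜 carries a gustatory cue even though the phrase describes sound.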

Figure 7. An example of linguistic synesthesia detection.

Table 4. The results of sensory word extraction

Table 5. Results of original modality detection. F1 denotes weighted F1: the mean of the per-class F1 scores, each weighted by the proportion of instances belonging to that class
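
The weighted F1 used in Tables 5 and 6 can be computed from scratch as below. This is a minimal sketch of the standard support-weighted definition, which should agree with scikit-learn's `f1_score(y_true, y_pred, average="weighted")`. The function name and the toy labels are illustrative only.

```python
from collections import Counter
from typing import Sequence


def weighted_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Mean of per-class F1 scores, each weighted by the class's share of gold labels."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    support = Counter(gold)
    total = len(gold)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = n - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += (n / total) * f1  # weight = support / total
    return score
```

With toy labels `gold = ["hearing", "hearing", "vision", "touch"]` and `pred = ["hearing", "vision", "vision", "touch"]`, the per-class F1 scores are 2/3 (hearing), 2/3 (vision), and 1 (touch), with weights 0.5, 0.25, and 0.25, giving 1/3 + 1/6 + 1/4 = 0.75. Weighting by support keeps frequent modalities from being outweighed by rare ones, which matters when the modality distribution is skewed.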

Table 6. Results of synesthetic modality detection. F1 denotes weighted F1: the mean of the per-class F1 scores, each weighted by the proportion of instances belonging to that class

Table 7. Results of our proposed model on subsets of the test data grouped by original modality, where “(Num.)” is the number of instances transferred from a given original modality to a given synesthetic modality
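
The “(Num.)” counts in Table 7 amount to grouping instances by original modality and counting each synesthetic target. A minimal sketch, assuming each instance is reduced to an (original, synesthetic) modality pair; the function name and toy pairs are hypothetical and are not data from the paper:

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def transfer_table(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Group (original, synesthetic) modality pairs by original modality and count them."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for original, synesthetic in pairs:
        table[original][synesthetic] += 1
    return dict(table)
```

For invented pairs `[("taste", "hearing"), ("taste", "hearing"), ("touch", "vision"), ("taste", "smell")]`, the table maps `taste` to two `hearing` and one `smell` instance, and `touch` to one `vision` instance. Each original-modality key then corresponds to one test subset.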

Figure 8. Influence of the size of the training data.