Using large corpus data for the reconstruction of cross-cultural concept differences: happiness and joy in West Slavic languages and in English

Published online by Cambridge University Press: 11 May 2026

Lucie Saicová Římalová*
Affiliation:
Faculty of Arts, Institute of Czech Language and Theory of Communication, Charles University, Czech Republic

Abstract

Happiness is a complex concept that has been intensively researched from many perspectives, but its linguistic aspects remain under-researched. Using corpus-based analysis of semantically similar words (word embedding), the author studies lexical units denoting happiness and joy in three West Slavic languages (Polish, Czech, Slovak) and compares them with the corresponding lexical units in English. The results show that, despite mutual linguistic and non-linguistic ties, the Polish, Czech and Slovak understandings of happiness exhibit not only similarities (e.g. the relationship between happiness and joy and the outward orientation of joy) but also significant differences (e.g. the different value of the component ‘luck’ in happiness, a different relationship between joy, sadness and fear, and cross-cultural differences related to religion). The results also highlight similarities and differences between the West Slavic languages and English. In addition, the study tests the advantages and limitations of word-embedding analysis for studying concepts and their culturally specific features. The author believes that the method is useful because it offers new insights into the analysed data, but that it also requires human oversight and careful interpretation.

Information

Type
Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2026. Published by Cambridge University Press

Table 1. The Aranea corpora used in the research, their size and web links to further information (all information retrieved April 23, 2025)

Table 2. The first 25 semantically most similar words to the English words happiness, luck and joy according to SEMÄ, their sin θ and absolute frequency (ordered according to sin θ)

Table 3. The first 25 semantically most similar words to the Polish words szczęście ʻhappinessʼ, radość ʻjoyʼ according to SEMÄ, their basic translation into English, their sin θ and absolute frequency (ordered according to sin θ)

Table 4. The first 25 semantically most similar words to the Czech words štěstí ʻhappinessʼ, radost ʻjoyʼ according to SEMÄ, their basic translation into English, their sin θ and absolute frequency (ordered according to sin θ)

Table 5. The first 25 semantically most similar words to the Slovak words šťastie ʻhappinessʼ, radosť ʻjoyʼ according to SEMÄ, their basic translation into English, their sin θ and absolute frequency (ordered according to sin θ)