
Code-switching in text and speech challenges information-theoretic speaker design

Published online by Cambridge University Press:  27 April 2026

Debasmita Bhattacharya*
Affiliation:
Columbia University in the City of New York, USA
Marten van Schijndel
Affiliation:
Cornell University, USA
* Corresponding author: Debasmita Bhattacharya; Email: db3526@columbia.edu

Abstract

In this work, we use language modeling to investigate the factors that influence insertional code-switching. Code-switching occurs when a speaker alternates between one language variety (the primary language) and another (the secondary language), and is widely observed in multilingual contexts. Recent work has shown that code-switching is often correlated with areas of low predictability in the primary language, but it is unclear whether low primary language predictability only makes the secondary language relatively easier to produce at code-switching points – that is, purely speaker-driven code-switching – or whether code-switching is additionally used by speakers for other purposes, for instance to signal the need for greater attention on the part of listeners. In this paper, we use bilingual Chinese–English online forum posts and transcripts of spontaneous Chinese–English speech to replicate prior findings that low primary language (Chinese) predictability is correlated with insertional switches to the secondary language (English). We then demonstrate that the predictability of the English productions is even lower than that of meaning-equivalent Chinese alternatives, and these are therefore not easier to produce, rejecting the purely speaker-driven theory of code-switching in both writing and speech.
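The predictability comparison described above can be illustrated with a minimal sketch, assuming predictability is measured as surprisal, the negative log probability a language model assigns to a word in its context. The function names and the probabilities below are hypothetical and chosen only for illustration; they are not the authors' code or values from the paper.

```python
import math


def surprisal_bits(probability: float) -> float:
    """Return surprisal in bits, -log2 p(word | context)."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)


def english_is_less_predictable(p_english: float, p_chinese: float) -> bool:
    """True when the English insertion has higher surprisal than the
    meaning-equivalent Chinese alternative in the same context."""
    return surprisal_bits(p_english) > surprisal_bits(p_chinese)


# Hypothetical model probabilities for one code-switch point (illustrative only).
p_chinese_alternative = 0.02   # assumed p(Chinese word | context)
p_english_insertion = 0.005    # assumed p(English word | context)

print(surprisal_bits(p_chinese_alternative))
print(surprisal_bits(p_english_insertion))
print(english_is_less_predictable(p_english_insertion, p_chinese_alternative))
```

Under a purely speaker-driven account, the English insertion would be expected to have the lower surprisal; the abstract reports the opposite pattern.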

Information

Type
Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2026. Published by Cambridge University Press

Table 1. Schematic illustrating the shorthand terminology used throughout the remainder of this paper on (1) original code-switched sentences, (2) code-switched sentences that have been fully translated to Chinese and (3) control non-code-switched sentences in the written dataset.

Figure 1. Comparing surprisal of CS1 words in code-switched (CS) and non-code-switched (Non-CS) sentences to Non-CS words in Non-CS sentences. Boxplot whiskers extend to 1.5x the interquartile range.

Table 2. Summary of the logistic regression model (R² = 0.862) for CS1 (coded 1) versus random Non-CS1 (coded 0) in the dataset of written code-switches.

Figure 2. Comparing CS1 in English and the monolingual (ML) English vocabulary across (a) word length, (b) part-of-speech tag distribution, (c) word frequency and (d) surprisal, in writing.

Figure 3. Comparing normalized CS1 surprisal in English to Chinese in writing.

Table 3. Summary of the logistic regression model (R² = 0.290) for CS1 (coded 1) versus random Non-CS1 (coded 0) in the dataset of spoken code-switches.

Figure 4. Comparing CS1 in English and the monolingual (ML) English vocabulary across (a) word length, (b) part-of-speech tag distribution, (c) word frequency and (d) surprisal, in speech.

Figure 5. Comparing normalized CS1 surprisal in English to Chinese in speech.

Supplementary material

Bhattacharya and van Schijndel supplementary material 1 (File, 257.6 KB)
Bhattacharya and van Schijndel supplementary material 2 (File, 428.3 KB)