Tracking Meaning Through Variation: Cross-Situational Learning of Tonal Words and Underlying Phonological Rules

01 December 2025, Version 1
This content is an early or alternative research output and has not been peer-reviewed by Cambridge University Press at the time of posting.

Abstract

Cross-situational learning (CSL) enables learners to acquire word meanings by tracking statistical co-occurrences between auditory forms and visual referents across ambiguous situations. While CSL has been shown to support the learning of phonological contrasts and varied lexical categories, little is known about whether it can accommodate systematic phonological alternations, such as tone sandhi in tonal languages, which introduce context-dependent surface variability. The present study investigates whether adult Mandarin speakers can learn (i) sound–referent mappings under tone-sandhi-induced allotonal variation and (ii) the underlying tone sandhi patterns themselves. We constructed an artificial tone language, PseuSH, modelled on Shanghai Wu, containing eight tone sandhi rules spanning leftward tone reduction and rightward tone extension. Eighty Mandarin speakers completed four tasks: an AX discrimination task, a CSL exposure phase pairing disyllabic forms with static or dynamic referents, a two-alternative forced-choice (2AFC) memory test, and a 2AFC generalisation test using novel word combinations. Participants successfully discriminated both tonal and allotonal contrasts. In the CSL memory task, learners demonstrated robust above-chance mapping performance despite surface variability. In the generalisation task, they also showed sensitivity to tone-sandhi-constrained tonal patterns, generalising these alternations to new lexical contexts. These findings show that CSL mechanisms can tolerate context-conditioned tonal variability and support partial abstraction of systematic phonological alternations. The results broaden current accounts of CSL by demonstrating its applicability to more naturalistic, variability-rich tonal systems and offer new insights into how learners extract structure from dynamic speech input.
