Aging and distributional tone learning: the role of pitch memory in older adults’ discrimination of Mandarin lexical tones

Yin-To Chui; Susu Lai; Quentin Zhen Qin

doi:10.1017/langcog.2025.10035

Published online by Cambridge University Press: 15 October 2025

Yin-To Chui: Affiliation:
Speech, Learning, and the Brain (SLaB) Lab, Division of Humanities, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
Susu Lai: Affiliation:
Speech, Learning, and the Brain (SLaB) Lab, Division of Humanities, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
Quentin Zhen Qin*: Affiliation:
Speech, Learning, and the Brain (SLaB) Lab, Division of Humanities, The Hong Kong University of Science and Technology, Kowloon, Hong Kong; Center for Aging Science, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
*: Corresponding author: Quentin Zhen Qin; Email: hmzqin@ust.hk

Abstract

Distributional learning enables listeners to form phonetic categories by extracting statistical regularities from speech input. Younger Cantonese speakers can acquire the Mandarin level-falling (T1–T4) contrast through distributional learning, with bimodal exposure facilitating category formation and unimodal exposure suppressing it, and with fine-grained pitch sensitivity predicting success. However, aging is associated with declines in pitch sensitivity and phonetic boundary formation, which may disrupt this process. This study examined whether Cantonese-speaking older adults exhibit distributional learning of Mandarin T1–T4 and whether individual cognitive factors predict learning success. Sixty-four participants completed a pretest–training–posttest procedure with bimodal or unimodal exposure. While older adults improved in tone discrimination, no group differences emerged. Further analysis showed that those with lower pitch-related auditory memory failed to learn from unimodal input. On the other hand, fine-grained pitch perception abilities did not predict learning outcomes. These results suggest that older adults may rely on alternative learning mechanisms, such as memory-based strategies, when exposed to ambiguous input distributions. The findings indicate a shift from perceptual encoding to memory-driven processing in aging and highlight the limits of passive statistical learning in older adulthood.

Keywords

distributional learning; lexical tone perception; Mandarin level-falling tones; older adults; pitch memory

Information

Type: Article
Information: Language and Cognition, Volume 17, 2025, e80

DOI: https://doi.org/10.1017/langcog.2025.10035
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright: © The Author(s), 2025. Published by Cambridge University Press

1. Introduction

Speech perception involves categorizing continuous acoustic input into discrete phonological categories, a process shaped by linguistic experience. Native language (L1) exposure refines phonetic categories from infancy, making the perceptual learning of non-native (L2) speech contrasts in adulthood particularly challenging due to interference from pre-existing phonetic categories (Best et al., 2001; Flege, 1995; Van Leussen & Escudero, 2015). One well-documented mechanism that facilitates phonetic learning is distributional learning, in which listeners extract statistical regularities from input distributions to form new phonetic categories (Maye et al., 2002), an ability that adds to a large repertoire of statistical language learning capabilities demonstrated by infants, children and adults (Newport, 2016). Exposure to a bimodal distribution, which contains two distinct frequency peaks corresponding to the target categories, enhances category formation. In contrast, exposure to a unimodal distribution, which has a single peak centered on an ambiguous token, tends to suppress category formation. As a result, participants trained with a bimodal distribution typically show better posttraining discrimination of the endpoint sounds than those trained with a unimodal distribution. This performance divergence is known as a ‘typical’ distributional learning effect (e.g., Escudero et al., 2011; Maye et al., 2002; Ong et al., 2015).

Distributional learning has been widely studied in both infants and adults, and has been extended to various linguistic contrasts, including segmental distinctions such as vowels (Escudero et al., 2011) and suprasegmental contrasts such as lexical tones (Liu et al., 2022; Ong et al., 2015, 2017). However, while distributional learning is well-documented in younger adults, its effectiveness in older adults remains largely unexplored, despite known age-related declines in auditory perception and cognitive processing (e.g., Neger et al., 2014; Shen et al., 2016). This study examines the distributional learning of the Mandarin level-falling (T1–T4) tone contrast by Cantonese-speaking older adults. Despite speaking a tonal language, Cantonese speakers struggle with the Mandarin T1–T4 contrast even more than speakers of non-tonal languages do (Hao, 2012). A previous study has shown that L1-Cantonese younger adults are able to learn the contrast through distributional training (Chui & Qin, 2024b). This study investigates whether Cantonese-speaking older adults can learn the contrast in a similar way, and explores the individual differences that govern successful distributional learning in aging.

1.1. Distributional learning of lexical tones

Following its success with segmental contrasts, the distributional learning paradigm has also been extended to the learning of suprasegmental features like lexical tones. Research in this area has shown that learning effectiveness is not uniform, but is instead modulated by a listener’s prior linguistic experience. Specifically, the success of tonal distributional learning often depends on the listener’s L1 background (i.e., whether they are a tonal or non-tonal language speaker; Ong et al., 2015, 2017) and the specific perceptual mapping patterns between the L2 tones and the listener’s L1 tonal categories (Chui & Qin, 2024b).

Ong et al. (2015) provided one of the first demonstrations of the efficacy of distributional learning for L2 lexical tones. They recruited L1-English speakers with no prior exposure to lexical tones, and trained them on the discrimination of a Thai tone pair (mid-level versus falling). Results showed that participants in the bimodal distributional training group exhibited better post-training discrimination than participants in the unimodal group when the tones were embedded in a novel test syllable, highlighting the sensitivity of non-tonal language speakers to statistical patterns in pitch when learning L2 tones (see also Liu et al., 2022 for a study on the distributional learning of Mandarin tones). Ong et al. (2017) further examined whether linguistic experience with lexical tones would enhance distributional learning. They tested L1-Mandarin speakers on the same distributional learning task, and found a larger distributional learning effect than for L1-English speakers, suggesting that tonal language speakers may extract and generalize distributional patterns of L2 tones more effectively than non-tonal language speakers.

More recently, Chui and Qin (2024b) extended these findings to test the generalizability of the distributional training regime to different tonal stimuli. The study examined young adult Cantonese-speaking learners’ (i.e., tonal language speakers) distributional learning of two Mandarin tone contrasts: T1–T4 (high-level versus falling) and T1–T2 (high-level versus rising). This was designed to test whether different L1-L2 perceptual mapping patterns for the non-native tones may have differential effects on distributional learning effectiveness. Specifically, Mandarin T1–T4 are mapped and perceived as variants of a single Cantonese tone (high-level) for L1-Cantonese speakers, while Mandarin T1–T2 are mapped and perceived as two different Cantonese tones (high-level and rising; Hao, 2012). According to the Perceptual Assimilation Model and its later extension to L2 learning (Best & Tyler, 2008), if two L2 sounds are mapped (i.e., perceptually assimilated) to a single L1 category, learners cannot depend on their single category representation for effective discrimination, and training may be needed to separate them; on the other hand, if two L2 sounds are mapped to two different L1 categories, the pre-existing two-category representation is enough for effective discrimination and no learning is needed. Indeed, results showed that the bimodal group outperformed the unimodal group in the discrimination test only for the Mandarin T1–T4 pair. Specifically, for T1–T4, the bimodal group had a rising performance curve and improved their discrimination after training, while the unimodal group had a flat performance curve, and neither improved nor worsened after training. This is in contrast with T1–T2, where both groups maintained their performance from pretest to posttest. This shows that only T1–T4 was trainable through distributional exposure.¹

Given age-related declines in auditory processing, cognitive function and statistical learning abilities (e.g., Neger et al., 2014; Shen et al., 2016; though see Veríssimo et al., 2022 for evidence of improvement in some aspects of cognitive abilities such as executive control), it remains an open question whether older adults can acquire new tonal categories via distributional exposure in a manner comparable to younger learners. If older adults show reduced sensitivity to statistical cues, this would suggest that distributional learning mechanisms deteriorate with age. Alternatively, if bimodal exposure still facilitates category formation (and unimodal still suppresses it), this would indicate that statistical learning remains robust across the lifespan. Investigating distributional learning in older adults is therefore crucial for understanding the lifespan limits of implicit phonetic learning mechanisms and whether alternative learning strategies become necessary with age.

1.2. Individual differences in the distributional learning of lexical tones

Lexical tone learning is subject to substantial individual variability, with research highlighting a range of cognitive, perceptual and experiential factors that shape learning success (e.g., Laméris & Post, 2023; Laméris et al., 2024; also see Kim et al., 2020 for a study on segmental learning/adaptation). Pitch perception ability has been widely recognized as a critical determinant, as individuals with heightened sensitivity to pitch differences tend to perform better in tone perception and learning tasks (Perrachione et al., 2011). This ability is also multifaceted. For a contour tone language like Mandarin, successful learning depends not just on perceiving static pitch height, but crucially on perceiving dynamic pitch direction. For example, Chandrasekaran et al. (2010) demonstrated that good learners of Mandarin are distinguished by their pre-existing tendency to weight the pitch direction cue more heavily than poor learners. This highlights the importance of assessing sensitivity to pitch contours specifically. Other factors like musical training have also been linked to enhanced lexical tone learning, as musicians tend to have heightened sensitivity to pitch variations (e.g., Lee & Hung, 2008; Wong & Perrachione, 2007), though the benefits of musical experience for tonal language speakers remain debated because of possible ‘saturation effects’, where their exposure to pitch in their L1 already provides sufficient pitch-processing experience and makes additional advantages from musical training negligible (e.g., Cooper & Wang, 2012; Laméris & Post, 2023).

This debate over the roles of fine-grained pitch perception and musicality highlights the importance of distinguishing between different types of predictors. For example, in a study with young adult learners, Bowles et al. (2016) found that pitch-specific perceptual abilities were stronger predictors of Mandarin tone learning than more general musicality measures (and other L2 aptitude and general cognitive measures). Their work supports the principle that the most effective predictors are those that tap into abilities most closely related to the target linguistic skill. In the general learning literature, there have also been findings of complex interactions between individual differences factors (e.g., musical aptitude and working memory) that modulate learning success (e.g., Li et al., 2024; Ong & Chan, 2023). The relative importance of domain-general perceptual skills versus existing linguistic knowledge is a key area of investigation. For example, in a recent study on Mandarin tone perception, Zhou and Veríssimo (2025) found that learners’ success was predicted by their domain-general pitch acuity, while their lexical knowledge (i.e., vocabulary size) played no significant role. This suggests that for challenging phonetic contrasts, fundamental auditory processing abilities can be more critical than a learner’s existing L2 vocabulary.

Chui and Qin (2024a) investigated individual differences predictors of distributional tone learning, and examined whether fine-grained pitch perception abilities, memory capacities and vocabulary size of the target language may modulate the effectiveness of distributional training of Mandarin tones by L1-Cantonese speakers. Given that distributional tone training is selectively effective towards non-native tonal stimuli with specific L2-to-L1 mapping patterns (i.e., Mandarin T1–T4 is trainable while Mandarin T1–T2 is not; Chui & Qin, 2024b), the study only investigated the effect of individual differences predictors for the Mandarin T1–T4 pair. Fine-grained pitch perception was measured using two tasks: a contour-based subtest of the Pitch Threshold Task which measures lower level pitch discrimination (Zhang et al., 2021), and pitch-based subtests of the Montreal Battery of Evaluation of Amusia which measure higher level melodic discrimination (MBEA; Peretz et al., 2003). The former was designed to test the just-noticeable difference of pitch between two pitch excursions, while the latter was designed to test discrimination of two melodic sequences that minimally differ from each other by a single note in pitch. Memory capacity was measured using an automated version of a complex memory span task called Operation Span (Unsworth et al., 2005), designed to measure working memory requiring the simultaneous manipulation of arithmetic and the temporary mental storage of English letters. Target language vocabulary size was measured using a Mandarin version of the Peabody Picture Vocabulary Test (Tsoi et al., 2019), designed to quantify a learner’s receptive vocabulary size using a sample of progressively more difficult vocabulary items.

Results indicated that training-induced performance change was only predicted by fine-grained pitch perception (specifically, by learners’ individual pitch thresholds).² It was found that lower pitch threshold (or higher pitch aptitude) predicted improvement by the bimodal group. In other words, learners with better fine-grained pitch perception were able to learn from the bimodal tonal distribution. Presumably, participants with higher pitch aptitude were able to track more accurately the small step-wise pitch changes between each distinct token of the tonal continuum, and the better encoding facilitated more accurate construction of the two categories necessary to discriminate between the two Mandarin tones. This finding may imply an additional challenge for older adult distributional tone learning, as their pitch-related sensitivity and perception has been shown to decline with age (Shen et al., 2016; Yang et al., 2015), which means that accurate pitch encoding as a potential prerequisite for successful distributional tone learning may be compromised in aging.

1.3. Aging and lexical tone perception and learning

Aging weakens pitch perception across different levels – from lower level pitch pattern recognition to higher level categorical perception and learning of lexical tones. First, at the lower level of basic auditory processing, older adults experience a decline in pitch pattern representation (Shen et al., 2016; Yang et al., 2015). For example, Shen et al. (2016) investigated pitch perception in older adults using synthesized vowel stimuli. Results showed that older adults performed worse on dynamic pitch discrimination than on static pitch discrimination, and the perception deficit was more severe when pitch changes were gradual rather than abrupt, suggesting that older adults struggle more with detecting subtle pitch changes. At the higher level, there is some evidence that aging affects the ability to map auditory patterns into phonological categories. For lexical tones, Wang et al. (2017) investigated categorical perception of Mandarin tones, and found that older adults exhibited shallower identification slopes and poorer discrimination compared to younger adults. Similarly, Yang et al. (2015) investigated older adults’ identification performance of Mandarin monosyllabic tone words, and found that older adults performed significantly worse than younger adults on tone identification accuracy. This suggests that older adults have less defined phonetic boundaries across tonal categories.

While younger and older adults exhibit differential patterns of pitch and tonal perception, it may also be the case that the two groups exhibit differential patterns in tonal learning, especially when it comes to cognitive predictors of successful learning. In the general L2 learning literature, learning in younger and older adulthood is influenced by distinct cognitive predictors. While younger adults benefit more from perceptual acuity, which allows for efficient phonetic encoding and rapid speech processing, older adults may rely more on crystallized cognitive skills to compensate for declines in perceptual abilities when learning a new language. In particular, it has been shown that memory plays a significant role in older learners’ language learning success (Fong et al., 2022; Nilsson et al., 2021). A recent study on older adult lexical tone learning reports similar findings (Ingvalson et al., 2017). In that study, older adults were tested on their ability to learn lexical tones in an artificial word-learning task, where they associated level, falling and rising pitch contours with meaning. Participants underwent 8 days of training and had to generalize their learning to untrained talkers. Unlike younger adults, whose learning is linked to phonetic aptitude and pitch perception, older adults’ performance was only predicted by declarative memory capacity. Further analysis revealed that older adults with higher declarative memory capacity made significantly fewer tone-related errors, demonstrating that memory retrieval and association formation played a key role in their learning success. These results suggest that older adults may depend more on explicit memory-based encoding and retrieval strategies when acquiring novel phonetic categories.

The above findings suggest that, as perceptual abilities decline, older adults compensate by leveraging memory-based learning strategies. In the context of the current study, it is similarly possible that Cantonese-speaking older adults rely on memory-related abilities to acquire the Mandarin T1–T4 contrast through distributional exposure. In other words, while Chui and Qin (2024a) found that pitch aptitude modulates younger learners’ improvement through distributional training, older adults may depend on memory encoding instead.

2. The present study

While previous work has established that younger Cantonese-speaking adults can successfully acquire non-native tonal contrasts through distributional exposure, the extent to which this learning mechanism operates similarly in older adults remains unclear. Given age-related changes in auditory processing and tonal learning mechanisms (e.g., Ingvalson et al., 2017; Shen et al., 2016), it is also essential to examine whether the same individual differences factors predict learning outcomes in this population.

Participants were exposed to a bimodal or unimodal distribution of tonal stimuli derived from a Mandarin level-falling tone contrast (T1–T4). To ensure a consistent comparison with the study on younger adults (Chui & Qin, 2024a, 2024b), we opted to replicate the individual difference measurements for older adults. That is, as in the younger adult study, we measured participants’ pitch perception abilities using two different tasks – (1) a Pitch Threshold task assessing participants’ just-noticeable difference in pitch contours, and (2) pitch-based subtests of a melodic discrimination task (i.e., the Montreal Battery of Evaluation of Amusia). Participants were also tested on a Mandarin version of the Peabody Picture Vocabulary Test for their Mandarin vocabulary size. Importantly, there are two differences between the current study and the younger adult study, which are both related to our choice of tasks that measure memory. First, while we measured younger adults’ working memory through the Operation Span task, which is a complex span task that requires participants to manipulate arithmetic and store English letters simultaneously, this task may be too difficult for older adults – in one variation of Operation Span, older adults average 2.07 out of 5, 1 SD below younger adults’ average performance (Zeintl & Kliegel, 2007). Instead, we opted for a backward digit memory span task. Second, because of the convergent literature on the role of memory as a predictor for older adult language learning and tonal learning, we also included a pitch-based short-term memory test designed to test domain-specific auditory memory related to pitch processing.

This study aims to answer two research questions. First, do Cantonese-speaking older adults exhibit distributional learning of the Mandarin T1–T4 (level-falling) contrast? Two hypotheses are possible. If distributional learning is preserved throughout the lifespan, we may see that older adults demonstrate similar patterns to younger adults and they could learn from distributional exposure. Specifically, in such a case, the bimodal group may be expected to improve their discrimination of T1–T4 after training, leading to a rising performance curve, while the unimodal training group may be expected to suppress their discrimination of T1–T4, leading to a flat performance curve with no improvement. On the other hand, given that aging is associated with a decline in lower level pitch pattern representations and the categoricalness of higher level tonal representations (Shen et al., 2016; Wang et al., 2017; Yang et al., 2015), as well as a decline in statistical learning mechanisms (Neger et al., 2014), distributional input may not be effective in training the perception of the contrast. The second research question is: which individual cognitive factors (e.g., pitch threshold and melodic discrimination measures, pitch-based and digit-based memory measures) predict successful distributional learning of Mandarin T1–T4 in older adults? Given evidence that older adults experience declines in fine-grained pitch discrimination, as well as evidence for the recruitment of memory mechanisms for tonal category learning (Ingvalson et al., 2017), we predict that pitch memory and working memory – rather than pitch perception measures – may emerge as stronger predictors of learning outcomes, reflecting a greater reliance on memory-based encoding rather than perceptual sensitivity.

3. Methods

3.1. Participants

Sixty-nine Cantonese-speaking older adults were initially recruited for the experiment. All participants were native speakers of Hong Kong Cantonese with no self-reported neurological disorders. Inclusion criteria required participants to have less than 6 months of Mandarin training and no prior residence in Mandarin-speaking regions exceeding 1 year. They were also required to have no formal musical training exceeding 3 years (Chang et al., 2016). Participants were screened for age-related cognitive decline using the Cantonese version of MoCA, which has been validated in Chinese older adults in Hong Kong (Yeung et al., 2014), and those scoring below 20 were excluded to ensure comparable cognitive baselines. Pure-tone audiometry (PTA) screening ensured hearing thresholds ≤40 dB across 250–1000 Hz (40 dB as the critical threshold is comparable to other phonetic perception studies for older adults, e.g., Kalaivanan et al., 2023).

Sixty-four participants remained after exclusion [18 men, mean age = 62.9 years, SD = 4.8 years], and were randomly (and equally) assigned to the bimodal distributional training group or the unimodal distributional training group [bimodal group: 9 men, mean age = 63.2 years, SD = 5.3 years; unimodal group: 9 men, mean age = 62.6 years, SD = 4.4 years]. To further ensure that our participants had limited Mandarin proficiency in addition to the inclusion criterion of less than 6 months of Mandarin training, we asked participants to complete a Language History questionnaire and extracted Mandarin proficiency metrics for each participant (LHQ3; Li et al., 2020). The proficiency score is based on the weighted sum of the participant’s self-rating of their Mandarin proficiency levels, and the overall proficiency score is 0.467 (i.e., around 3 on a 7-point Likert scale; bimodal group proficiency = 0.489, SD = 0.122; unimodal group proficiency = 0.455, SD = 0.132).³ As we will later show, the two groups are also matched in terms of an objective measure of Mandarin vocabulary size (see Table 1 below). Ethical approval was obtained from the institutional review board, and all participants provided written informed consent.

Table 1. Means and SDs of participants’ biographical information and cognitive battery variables, with Bayes factor tests (i.e., BF₁₀ ≤ 0.33) revealing no differences between the two groups in any of these measures

Note: Each entry follows the format of mean (SD). For Backwards Digit Span, BF₁₀ = 0.37 indicates anecdotal evidence for no group difference (i.e., evidence for no difference is 2.7 times stronger than evidence for a difference).
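The "2.7 times" figure in the note follows from inverting the Bayes factor: BF₁₀ quantifies evidence for a group difference relative to no difference, so its reciprocal (BF₀₁) quantifies evidence for no difference. The short sketch below only illustrates that arithmetic; the function name is hypothetical and is not taken from the authors' analysis scripts.

```python
def evidence_for_null(bf10: float) -> float:
    """Convert BF10 (evidence for a difference) into BF01 (evidence for no difference)."""
    if bf10 <= 0:
        raise ValueError("a Bayes factor must be positive")
    return 1.0 / bf10


# Value reported in the note for Backwards Digit Span
bf01_digit_span = evidence_for_null(0.37)

# Cut-off used in the Table 1 caption (BF10 <= 0.33)
bf01_cutoff = evidence_for_null(0.33)

print(round(bf01_digit_span, 1))
print(round(bf01_cutoff, 1))
```

Under this reading, the caption's cut-off of 0.33 corresponds to evidence for no difference that is about three times stronger than evidence for a difference, and the Backwards Digit Span value falls just short of it.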

3.2. Stimuli

3.2.1. Distributional training

The stimuli for this study were the same as those from Chui and Qin (2024a, 2024b), designed to investigate the distributional learning of Mandarin Tone 1 (high-level) and Tone 4 (falling) using a synthetic eight-step tonal continuum (Chui & Qin, 2024b). This continuum was generated in Praat with the ProsodyPro script (Xu, 2013). Natural productions of the monosyllable nua in T1 and T4 by a female native Mandarin speaker served as the endpoints of the continuum. Ten equally spaced points (0%, 10%, … 90%) along the pitch contours of these natural tones were identified to create ‘interpolation points’. These points were used to interpolate six intermediate steps between Steps 1 and 8. The stimuli were normalized for intensity (65 dB SPL) and duration (matched to the mean duration of the natural tones). A female production of nua was selected to keep the training syllable low in difficulty, because (a) nua has a sonorant onset, which provides clear tonal information, and (b) a female speaker has a naturally higher pitch range than a male speaker. Figure 1 illustrates the tonal continuum.

Figure 1. Tone 1–Tone 4 continuum. Dashed line = intermediate tokens. Numbers next to the lines denote step number.
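The construction of the continuum can be sketched in a few lines. The example below is illustrative only: it assumes linear interpolation in Hz between the two endpoint contours (the text does not state the interpolation scale), and the endpoint f0 values are hypothetical placeholders rather than measurements from the actual recordings. The original stimuli were generated in Praat, not with this code.

```python
def interpolate_continuum(t1_contour, t4_contour, n_steps=8):
    """Return n_steps pitch contours running from t1_contour (Step 1)
    to t4_contour (Step n_steps), interpolated point by point."""
    if len(t1_contour) != len(t4_contour):
        raise ValueError("endpoint contours must share the same sampling points")
    steps = []
    for k in range(n_steps):
        w = k / (n_steps - 1)  # 0.0 at Step 1, 1.0 at the final step
        steps.append([(1 - w) * a + w * b for a, b in zip(t1_contour, t4_contour)])
    return steps


# Hypothetical f0 values (Hz) at the ten interpolation points (0%, 10%, ..., 90%)
T1_HZ = [260.0] * 10                           # assumed high-level contour
T4_HZ = [290.0 - 15.0 * i for i in range(10)]  # assumed falling contour

continuum = interpolate_continuum(T1_HZ, T4_HZ)
intermediate_steps = continuum[1:-1]  # Steps 2 to 7
```

With eight steps, the first and last contours reproduce the natural endpoints and the six contours in between are the synthetic intermediate tokens.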

Additionally, 32 sine-wave beeps (440 Hz, 500 ms) were interspersed randomly between tonal stimuli, and each required a keyboard response, to maintain participants’ attention and ensure active listening throughout the experiment (Ong et al., 2015).

3.2.2. Discrimination test

The same female speaker from training, as well as one additional male native Mandarin speaker, recorded multiple tokens of novel syllables fao and nua in Mandarin T1 and T4. Two syllables and two speakers were used in the discrimination task to assess participants’ generalization across gender and syllabic contexts beyond the trained female nua context. The use of fao as the untrained syllable was due to three reasons: (a) like nua, it is a novel syllable that minimizes prior exposure effects, (b) fao is acoustically distinct from nua (fao lacks pitch information at syllable onset while nua contains it) and (c) both syllables have comparable durations. Three tokens of each gender–syllable combination were used in the discrimination task (see Procedure).

3.3. Procedure

The experimental procedure is illustrated in Figure 2. After passing the screening tests assessing general cognitive decline and hearing thresholds, participants completed a cognitive battery measuring individual scores on pitch aptitude, musical aptitude, pitch memory, working memory and Mandarin vocabulary size. Details of each task are given in the following sections.

Figure 2. Illustration of the experiment procedure. Note: PTA = Pure-Tone Audiometry; MoCA = Montreal Cognitive Assessment.

The experiment then employed a pretest–training–posttest design. Participants completed an ABX tone discrimination task, which tested their ability to differentiate between Mandarin T1–T4. On each trial, participants were presented with two reference tokens (A and B) of the tone pair, followed by a target token (X) taken from another recording. They were instructed to determine whether X matched A or B in terms of tonal category. The test stimuli included all gender–syllable combinations (female nua, female fao, male nua, male fao) to assess the participants’ ability to generalize across untrained contexts. The trials were presented in a randomized order, with an inter-stimulus interval of 1,000 ms. Each test consisted of 64 trials, with four practice trials at the beginning to familiarize participants with the task. No feedback was provided during the test trials to avoid influencing participants’ responses. Participants were required to give their answer within 5 seconds of the presentation of the target token X. Discrimination accuracy (scored as 0 or 1) served as the primary measure.

Participants were matched based on their cognitive battery results and randomly assigned to either the bimodal or unimodal training group (see Table 1 for details). The training involved exposure to a continuum of tonal stimuli (bimodal or unimodal, depending on the group) derived from the nua syllable produced by a female speaker. Participants completed 256 randomized trials, with an additional 32 sine-wave beeps (440 Hz, 500 ms) interspersed among the tonal trials. Participants were required to respond to each beep within a 750-ms interval to ensure active engagement with the stimuli (Ong et al., 2015). The training phase took approximately 10 minutes to complete. Immediately following the training, participants repeated the ABX discrimination task with the same procedure as the pretest to assess any immediate improvements in discrimination ability.
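As an illustration of how such a training list could be assembled, the following sketch builds a randomized sequence of 256 tonal trials plus 32 beeps for each group. The per-step token counts are hypothetical (the actual frequencies per continuum step are not given in this section); they were chosen only so that each distribution sums to 256 and has either two peaks or one.

```python
import random

# Hypothetical token counts for Steps 1-8 (assumed, not taken from the study).
BIMODAL_COUNTS = [16, 64, 32, 16, 16, 32, 64, 16]    # peaks near Steps 2 and 7
UNIMODAL_COUNTS = [16, 16, 32, 64, 64, 32, 16, 16]   # single peak at Steps 4-5
N_BEEPS = 32


def build_training_list(counts, n_beeps=N_BEEPS, seed=None):
    """Return a shuffled list of ("tone", step) and ("beep", None) trials."""
    trials = []
    for step, count in enumerate(counts, start=1):
        trials.extend([("tone", step)] * count)
    trials.extend([("beep", None)] * n_beeps)
    random.Random(seed).shuffle(trials)
    return trials


bimodal = build_training_list(BIMODAL_COUNTS, seed=1)
unimodal = build_training_list(UNIMODAL_COUNTS, seed=1)
print(len(bimodal), len(unimodal))
```

Both lists contain the same eight steps and the same number of trials; only the relative frequency of each step differs between groups.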

3.3.1. Pitch aptitude

Pitch aptitude was measured using the Pitch Threshold task (Qin et al., 2022; Zhang et al., 2021). It evaluated participants’ sensitivity to semitone differences. Participants were presented with pairs of auditory stimuli consisting of rising-falling or falling-rising glides. The stimuli were generated using pitch manipulations of a male Cantonese speaker’s production of a syllable ji and a complex tone. The task followed a two-alternative forced-choice (2AFC) paradigm, where participants judged the correct pitch movement of the auditory pair (rising-falling versus falling-rising). An adaptive two-down-one-up staircase procedure was employed: pitch differences started at 10 semitones and were dynamically adjusted based on participant performance. Correct responses reduced the pitch difference, and incorrect responses increased it, ensuring convergence toward the participant’s just-noticeable difference threshold. The task concluded after 14 reversals, and the final pitch aptitude score was calculated as the average pitch difference across the last six reversals.
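The logic of the adaptive procedure can be made concrete with a short simulation. This is a minimal sketch under stated assumptions: the fixed 1-semitone step size, the lower bound, the trial cap and the idealized listener are all hypothetical, since the section specifies only the starting value (10 semitones), the stopping rule (14 reversals) and the scoring rule (mean of the last six reversals).

```python
def two_down_one_up(respond, start=10.0, step=1.0, floor=0.1,
                    n_reversals=14, n_average=6, max_trials=1000):
    """Run a two-down-one-up staircase; return the mean level at the last reversals."""
    level = start
    streak = 0       # consecutive correct responses since the last level change
    direction = 0    # -1 after a decrease, +1 after an increase, 0 before any change
    reversals = []
    for _ in range(max_trials):
        if respond(level):
            streak += 1
            if streak < 2:
                continue      # need two correct responses in a row to go down
            move = -1
        else:
            move = +1         # a single error makes the task easier
        streak = 0
        if direction != 0 and move != direction:
            reversals.append(level)   # direction changed: record a reversal
        direction = move
        level = max(floor, level + move * step)
        if len(reversals) == n_reversals:
            return sum(reversals[-n_average:]) / n_average
    raise RuntimeError("staircase did not reach the required number of reversals")


# Idealized listener (assumption): always correct at 3 semitones or more.
threshold = two_down_one_up(lambda level: level >= 3.0)
print(threshold)
```

With this deterministic listener the track oscillates between 2 and 3 semitones once it reaches the listener's limit, so the estimated threshold falls between those two values.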

3.3.2. Musical aptitude

Musical aptitude was assessed using the Montreal Battery of Evaluation of Amusia (Peretz et al., 2003), which provides another measure of fine-grained pitch perception by assessing participants’ pitch-related melodic discrimination abilities. Three pitch-related subtests (scale, contour and interval) were administered. Participants listened to 30 musical phrases per subtest, with each phrase played twice, and judged whether the two versions were the same or different. The ‘scale’ subtest altered one note to fall outside the scale, the ‘contour’ subtest modified the pitch direction surrounding a target note, and the ‘interval’ subtest adjusted the semitone distance of a critical pitch while preserving contour and scale. The final score was calculated as the average accuracy across the three subtests (Qin et al., 2021).

3.3.3. Pitch memory

Participants’ short-term memory for pitch sequences was measured using the Tone Span task (Williamson & Stewart, 2010). Participants were presented with two successive sequences of tones drawn from a set of 10 triangle-waveform tones (frequencies ranging from 262 to 741 Hz in equally tempered whole-tone steps). Each tone in the sequence was 500 ms in duration, with an inter-stimulus interval of 383 ms and an inter-sequence interval of 2 s. The second sequence was either identical to or differed from the first sequence by a single tone reversal. Participants judged whether the sequences were the same or different. A two-up-one-down staircase procedure was employed to adaptively adjust the sequence length, starting at two tones and increasing based on performance. The task ended after eight reversals, and the pitch memory span was calculated as the average sequence length over the last six reversals.
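The tone set described above follows directly from the equal-tempered scale: each whole-tone step multiplies frequency by 2^(2/12). The short sketch below reproduces the stated range; the variable names are illustrative only.

```python
BASE_HZ = 262.0          # lowest tone in the set
N_TONES = 10             # size of the tone set
SEMITONES_PER_STEP = 2   # one whole tone = two equal-tempered semitones

# Frequency of the k-th tone: base * 2^(2k/12).
tone_set = [BASE_HZ * 2 ** (SEMITONES_PER_STEP * k / 12) for k in range(N_TONES)]
print([round(f) for f in tone_set])
```

Nine whole-tone steps above 262 Hz corresponds to a factor of 2^1.5, which gives the upper limit of about 741 Hz reported for the task.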

3.3.4. Working memory

Working memory was assessed using the Backwards Digit Span task (Wechsler, 2012). In this task, participants were presented with a sequence of digits and asked to recall them in reverse order. The sequences varied in length from two digits to a maximum of nine digits, with two sequences at each length (i.e., two two-digit sequences, two three-digit sequences and so on). Participants were required to complete all 16 sequences. This task was selected for its suitability for older adults: it offers a robust measure of cognitive capacity without the high task-switching demands of complex span tasks such as the Operation Span, and it has been used in other speech perception studies involving older adults (Neger et al., 2014). Performance was measured as the number of sequences correctly recalled out of 16.

3.3.5. Mandarin vocabulary size

As noted in a previous section, it was necessary to account for participants’ Mandarin proficiency because of natural exposure to Mandarin in their normal place of residence (i.e., Hong Kong). Although care was taken to exclude participants with extended Mandarin training (i.e., longer than 6 months) or immersion (i.e., longer than a year in a Mandarin-speaking city), and the Language History questionnaire confirmed a low proficiency among our participants, we opted to include a Mandarin vocabulary test as an objective measure of proficiency suitable for administration to adults, that is, the Mandarin version of the Peabody Picture Vocabulary Test (PPVT; Tsoi et al., 2019). The PPVT has been shown to correlate well with standardized tests of written and spoken language perception and production (De Wilde et al., 2020). During the task, participants were presented with a screen displaying four colored images and an auditory word. They were asked to select the image corresponding to the meaning of the word. The test included 72 items, taken from sets designed for adult participants aged 19 or older. Words ranged from one syllable to five syllables. Each word was presented only once, with no time limit for responding. The final score was calculated as the percentage of correct responses across the 72 items.

3.4. Statistical analysis

Data files, along with analysis scripts, are publicly available at OSF (https://osf.io/28c6p/). This study employed a Bayesian analytical framework to analyze discrimination accuracy, offering several advantages over traditional frequentist approaches. Bayesian models quantify uncertainty and enable direct probabilistic interpretation of parameter estimates. They also facilitate hypothesis testing for null effects, making them particularly useful in the absence of significant group differences (Vasishth et al., 2018).

Bayesian logistic regression models were implemented using the brms package in R, which interfaces with the probabilistic programming language Stan. The dependent variable was binary discrimination accuracy in the ABX task (‘1’ for correct responses, ‘0’ for incorrect responses). For the group analysis (i.e., the distributional learning effect), models included fixed effects for group (bimodal versus unimodal; deviation coding: −0.5, 0.5), session (pretest versus posttest; deviation coding: −0.5, 0.5) and their interaction. For the individual differences analysis, models additionally included all individual differences variables, entered additively as continuous measures, along with their interactions with session and group. In particular, Mandarin vocabulary size was included to account for individual variability in Mandarin proficiency. To ensure comparable effect sizes, all continuous individual difference predictors were converted to z-scores before being entered into the models. All models were fitted with the maximal random effects structure allowed by the experimental design (Vasishth et al., 2018). The structure included random intercepts for participants and trials, as well as by-participant random slopes for session and by-trial random slopes for group and session. Because the models take trial-level accuracy as the dependent variable (an improvement score would exist only at the level of individual participants, so trial-level data would be lost), the term of interest is the interaction between session and group (with or without other individual differences predictors). A credible interaction would mean that the session effect (i.e., improvement from pretest to posttest) is modulated by training group (i.e., bimodal versus unimodal), that is, that a distributional learning effect exists.
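To see why the group × session term captures the distributional learning effect, it helps to work through the deviation coding by hand. The sketch below ignores random effects and priors and simply solves the saturated model logit(p) = b0 + b1·group + b2·session + b3·group·session for four cell accuracies. It assumes that bimodal and pretest take the value −0.5 and unimodal and posttest the value 0.5 (our reading of the coding order given above); the cell accuracies are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def saturated_coefficients(acc):
    """Coefficients of the 2 x 2 deviation-coded logistic model, from cell accuracies."""
    a = logit(acc[("bimodal", "pretest")])     # group = -0.5, session = -0.5
    b = logit(acc[("bimodal", "posttest")])    # group = -0.5, session = +0.5
    c = logit(acc[("unimodal", "pretest")])    # group = +0.5, session = -0.5
    d = logit(acc[("unimodal", "posttest")])   # group = +0.5, session = +0.5
    intercept = (a + b + c + d) / 4            # grand mean on the logit scale
    group = (c + d) / 2 - (a + b) / 2          # unimodal minus bimodal
    session = (b + d) / 2 - (a + c) / 2        # posttest minus pretest
    interaction = (d - c) - (b - a)            # difference between the two gains
    return intercept, group, session, interaction


# Hypothetical cell accuracies, chosen only for illustration.
cells = {
    ("bimodal", "pretest"): 0.70, ("bimodal", "posttest"): 0.75,
    ("unimodal", "pretest"): 0.70, ("unimodal", "posttest"): 0.75,
}
b0, b_group, b_session, b_interaction = saturated_coefficients(cells)
print(round(b_session, 3), round(b_interaction, 3))
```

When both groups gain equally, as in this made-up example, the session coefficient is positive and the interaction coefficient is zero; a distributional learning effect would appear as a non-zero interaction.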

Weakly informative priors were specified to ensure stable and interpretable estimates while reflecting minimal prior assumptions (Ghosh et al., 2018; Vasishth et al., 2018). Priors were centered around neutral effects, consistent with a conservative approach for hypothesis testing. Priors were set as Normal (0, 3) for fixed effects, Truncated Normal (0, 0.1) for random effects, Normal (0, 1.5) for intercepts and LKJ(2) for random correlations. Prior predictive checks were conducted to verify that the priors were reasonable and generated plausible data distributions (see Supplementary Figure S1). This step ensured that the priors neither overly constrained the model nor allowed for implausible outcomes.
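One quick way to gauge what an intercept prior of this kind implies is to map its central 95% range from the logit scale back to accuracy. The sketch below does this for the Normal (0, 1.5) intercept prior, assuming that the second parameter is a standard deviation (as in brms); it is a back-of-the-envelope illustration and not a substitute for the prior predictive checks reported in the supplement.

```python
import math


def inv_logit(x):
    return 1 / (1 + math.exp(-x))


INTERCEPT_PRIOR_SD = 1.5   # assumed to be a standard deviation on the logit scale
Z_95 = 1.96                # central 95% of a normal distribution

low = inv_logit(-Z_95 * INTERCEPT_PRIOR_SD)
high = inv_logit(Z_95 * INTERCEPT_PRIOR_SD)
print(f"{low:.3f} to {high:.3f}")
```

Under this reading, the prior places most of its mass on baseline accuracies between roughly 5% and 95%, which leaves the data free to determine the estimate while ruling out near-deterministic performance.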

Posterior distributions for model parameters were estimated using Markov Chain Monte Carlo (MCMC) sampling with four chains of 2,000 iterations each, including 1,000 warm-up iterations. Convergence diagnostics, including R-hat statistics (<1.01) and effective sample size, were used to confirm reliable parameter estimates. Key results were summarized as posterior means and 95% credible intervals (CrIs). Hypothesis testing utilized Bayes factors (BF₁₀), comparing models with and without the effects of interest. For example, the Bayes factor for the interaction of group and session can assess whether bimodal and unimodal training groups exhibited differential improvement across sessions. Bayes factors quantify the evidence for the alternative hypothesis (H₁) over the null hypothesis (H₀). Following conventional guidelines (e.g., Kass & Raftery, 1995), a BF₁₀ > 3 is considered substantial evidence for H₁, a BF₁₀ < 1/3 is substantial evidence for H₀ and values in between are considered ambiguous or anecdotal evidence. This approach can provide robust evidence for or against specific hypotheses while accommodating complex interactions in the data.
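The decision rule for Bayes factors and the size of the posterior sample can be written out explicitly. The helper names below are hypothetical, and the example values passed to the function are arbitrary illustrations rather than results from the study.

```python
def interpret_bf10(bf10, threshold=3.0):
    """Classify a Bayes factor using the BF10 > 3 and BF10 < 1/3 cut-offs."""
    if bf10 <= 0:
        raise ValueError("a Bayes factor must be positive")
    if bf10 > threshold:
        return "substantial evidence for H1"
    if bf10 < 1 / threshold:
        return "substantial evidence for H0"
    return "ambiguous or anecdotal evidence"


def bf01(bf10):
    """Evidence for H0 over H1 is the reciprocal of BF10."""
    return 1 / bf10


# Post-warm-up draws retained across chains: 4 chains x (2,000 - 1,000) iterations.
POST_WARMUP_DRAWS = 4 * (2000 - 1000)

for value in (5.0, 1.0, 0.2):   # arbitrary example values
    print(value, interpret_bf10(value), round(bf01(value), 2))
print(POST_WARMUP_DRAWS)
```

Reporting BF₀₁ alongside BF₁₀ can make evidence for the null easier to read, since a small BF₁₀ becomes a large BF₀₁.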

4. Results

Table 1 summarizes the cognitive battery results of the participants by group. A total of 64 participants completed the study, with 32 participants assigned to the unimodal group and 32 to the bimodal group. Bayes factor tests were conducted (Hoijtink et al., 2019), which showed that the two groups were well-matched in terms of pitch perception thresholds, musical aptitude, pitch memory and working memory. The two groups were also matched in objective Mandarin proficiency, as measured through vocabulary size. This similarity across baseline measures ensured that any observed differences in subsequent results could not be attributed to pre-existing group differences.

4.1. Group results of distributional learning

To assess the effects of group and training on discrimination accuracy, a Bayesian logistic regression model was fit to the data. Fixed effects included group (bimodal versus unimodal), session (pretest versus posttest) and their interaction. Table 2 shows the model output. Results showed that there was a general improvement in discrimination accuracy from pretest to posttest, regardless of training group. Crucially, there was no evidence for an interaction between group and session, meaning that there is no divergence in performance change from pretest to posttest.

Table 2. Model output for the group analysis

Note: The predictors with 95% credible intervals not overlapping with 0 are in bold.

Figure 3 (left panel) shows the (model-based) predicted proportions of correct responses between groups from pretest to posttest; Figure 3 (right panel) shows the posterior distribution for the interaction between group and session. Post hoc pairwise comparisons of discrimination accuracy across sessions using the emmeans package in R confirmed comparable improvements within both training groups. For the bimodal group, discrimination accuracy increased from pretest to posttest, with a mean difference of 0.233 in the logit scale (95% highest posterior density = [0.083, 0.385]). Similarly, the unimodal group demonstrated improvement from pretest to posttest, with a mean difference of 0.178 in the logit scale (95% highest posterior density = [0.027, 0.331]).

Figure 3. (Left panel) Means (circles) and 95% credible intervals (vertical bars) of predicted proportions of correct responses by group and by session. (Right panel) Posterior distribution of the regression coefficient for group × session interaction. Shaded areas show 95% credible intervals.
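Because the pretest-to-posttest differences are reported on the logit scale, they can be read as odds ratios by exponentiating. The sketch below applies this to the estimates reported in the text; note that exponentiating the endpoints of a highest posterior density interval only approximates the corresponding interval on the odds-ratio scale, since such intervals are not invariant under transformation.

```python
import math

# (mean difference, lower bound, upper bound) on the logit scale, as reported in the text.
estimates = {
    "bimodal": (0.233, 0.083, 0.385),
    "unimodal": (0.178, 0.027, 0.331),
}

# Exponentiate to express each pretest-to-posttest change as an odds ratio.
odds_ratios = {
    group: tuple(math.exp(value) for value in values)
    for group, values in estimates.items()
}

for group, (mean, lower, upper) in odds_ratios.items():
    print(f"{group}: OR = {mean:.2f} (approx. {lower:.2f} to {upper:.2f})")
```

On this reading, the odds of a correct response rose by roughly a quarter in the bimodal group and by roughly a fifth in the unimodal group, with both approximate intervals excluding an odds ratio of 1.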

To further evaluate the evidence for the absence of a group × session interaction, Bayesian hypothesis testing was conducted. Two models were compared: a full model including the interaction term (group × session) and a reduced model without it. The Bayes factor (BF₁₀) was 0.04176, which strongly favors the reduced (null) model and indicates comparable improvements across the two groups. Furthermore, to ensure that this conclusion is robust to the choice of prior distributions, we conducted a prior sensitivity analysis (Vasishth et al., 2018; Zhou & Veríssimo, 2025), repeating the comparison between the full and reduced models under five different priors. Two narrower priors assume smaller effect sizes, while two wider priors allow for larger potential effects. Across all tested priors, the Bayes factor remained below 0.33, providing evidence in favor of the reduced model. The result is therefore robust across a range of reasonable assumptions about effect size. The prior sensitivity analysis is visualized in Supplementary Figure S2.

Finally, further post hoc analysis eliminated the possibility of a trial effect within the pretest, which mitigates the concern that the overall improvement is because of task familiarization (see Supplementary Figure S3 for the investigation of trial effect; more scrutiny will follow in the Discussion section).

4.2. Individual differences in distributional learning

Given the absence of a group-level difference in performance change, further analyses were conducted to examine the role of individual differences in modulating tone discrimination accuracy and improvement between groups. A Bayesian logistic regression model was fit, and fixed effects included session (pretest versus posttest; deviation coding: −0.5, 0.5), group (bimodal versus unimodal; deviation coding: −0.5, 0.5) and individual differences factors entered additively, along with their interactions with group and session. All continuous individual difference predictors were converted to z-scores before being entered into the models. Table 3 shows the model output.

Table 3. Model output for the individual differences analysis

Note: The predictors with 95% credible intervals not overlapping with 0 are in bold.

Results revealed that pitch threshold and pitch memory were predictors of tone discrimination accuracy in general. Lower pitch thresholds (i.e., higher pitch aptitude) and better pitch memory were associated with improved discrimination accuracy, regardless of group and session. Crucially, the model revealed that pitch memory was the only predictor that interacted with session and group. Specifically, the three-way interaction between session × group × pitch memory showed a credible effect, indicating that pitch memory modulated performance improvements differently between the two training distributions. In contrast, the three-way interactions involving pitch threshold, musical aptitude and working memory did not yield credible effects, even after accounting for individual variation in Mandarin proficiency through including PPVT scores, as their 95% credible intervals overlapped zero. Figure 4 (left panel) shows the (model-based) predicted proportions of correct responses between groups across a range of pitch memory scores in the pretest and the posttest; Figure 4 (right panel) shows the posterior distribution for the interaction between group, session and pitch memory.

Figure 4. (Left panel) Means (line) and 95% credible intervals (shaded area) of model-predicted proportions of correct responses by group and by session. (Right panel) Posterior distribution of the regression coefficient for group × session × pitch memory interaction. Shaded areas show 95% credible intervals.

To clarify the nature of the three-way interaction involving pitch memory, post hoc pairwise comparisons were conducted. In the unimodal group, participants with higher pitch memory scores showed a credible decrease in discrimination accuracy from pretest to posttest (estimate = −0.2637, 95% highest posterior density [−0.510, −0.0097]). This suggests that only participants with higher pitch memory were affected by the ambiguous single-peak distributional structure, which may have placed greater demands on auditory memory, hindering successful discrimination. In the bimodal group, however, no credible improvement was associated with pitch memory (estimate = 0.0317, 95% highest posterior density [−0.112, 0.1918]). This suggests that participants in the bimodal group performed largely as expected across the board; that is, bimodal training facilitated discrimination regardless of individual pitch memory abilities. To better visualize the improvement trend and provide data at the level of individual participants, Figure 5 plots participant-level accuracy change against pitch memory scores for the two training groups.

Figure 5. Accuracy changes (i.e., an illustration based on raw data, but not model-based predictions) as a function of pitch memory score across both distribution groups. Shaded areas show 95% confidence intervals.

In summary, while individual differences in pitch threshold, musical aptitude and working memory did not further modulate performance improvements, pitch memory emerged as a key factor and exerted an effect on the unimodal group only. Combined with the group-level results, the current data suggest that older adults did not exhibit the traditional pattern of distributional learning (i.e., bimodal improvement versus unimodal stagnation). This trend may be driven by the unexpected performance of participants with low pitch memory in the unimodal group, who seemed to resist learning from a distribution that would otherwise have hindered improvement. The findings highlight that pitch memory selectively influenced learning outcomes in the unimodal condition, where the absence of a bimodal stimulus distribution may have necessitated stronger reliance on auditory memory resources.

5. Discussion

The present study investigated whether older Cantonese-speaking adults exhibit distributional learning of the Mandarin T1–T4 contrast and examined individual cognitive predictors of learning success. The results revealed two key findings. First, the predicted pattern of distributional learning was only partially realized. While the bimodal training group improved, as expected under successful distributional learning, so did the unimodal training group. It seemed that while older adults might have been successful in learning from the bimodal exposure to facilitate discrimination, unimodal exposure failed to elicit suppression of category formation. Second, our individual differences analysis provided further evidence that the current trend may indeed be driven by the unexpected performance of the unimodal group. Namely, memory (specifically pitch-related auditory memory) rather than pitch perception measures modulated training-induced performance change, and did so for the unimodal group only, indicating that memory-based mechanisms played a role in learning success after unimodal exposure specifically. These results suggest that older adults process tonal distributions differently from younger learners, relying more on memory-based mechanisms than on pitch acuity in implicit category formation, especially when extracting statistical regularities from more ambiguous distributions with lower perceptual salience of target cues. The following sections discuss the implications of these findings in greater detail.

It is informative to compare the current results with Chui and Qin (2024a, 2024b), where the same distributional learning procedure was administered to Cantonese-speaking younger adults. In the younger adult study, participants showed a distributional learning effect, with the bimodal training group exhibiting improvement while the unimodal training group stagnated in performance. A first observation for the current study is that, despite previously reported difficulties in pitch perception and tonal category learning for older adults (Wang et al., 2017; Yang et al., 2015), especially when stimuli involve dynamic pitch (Shen et al., 2016), our older participants were still able to refine their perception of the level-falling tone contrast through training. However, in the current study, there was no differential pattern between the bimodal and unimodal groups, as both groups improved comparably. Bayesian hypothesis testing was consistent with this result and demonstrated the robustness of the lack of divergence across a range of prior specifications. The fact that both training groups improved may initially raise suspicions of task familiarity and increased engagement with the stimuli as a potential explanation. Namely, since the pretest and posttest both involved an ABX discrimination task requiring participants to make tonal judgments across trials, older adults may simply have benefited from repeated exposure to the task format, as familiarity may have reduced cognitive load and enhanced efficiency in decision-making.
However, we addressed this in our statistical analysis (Supplementary Figure S3) by showing that there was no trial effect for either group in the pretest: there was no improvement from the first half to the second half of the baseline discrimination task, but there was a marked improvement for both groups from the pretest to the posttest. In other words, the improvement did not come from familiarization with the discrimination test, but from the training. How exactly did the unimodal distribution elicit improvement for our older participants?

Since successful statistical extraction of unimodal input necessarily implies the formation of one single category, which hinders improvement (Maye et al., 2002), the current improvement trend likely suggests that older adults in the unimodal group might not have extracted such statistical information, and instead engaged in an alternative learning process. One possibility is that they may have become more sensitive to tonal variability through structured training, which helped them gradually refine their ability to discriminate the contrast independent of statistical distributional structure. This improvement may reflect a shift to what Chládková and Šimáčková (2021) refer to as an ‘auditory mode’ of processing. In a similar study, they found that learners without prior categorical experience for a contrast also improved after unimodal exposure. They proposed that these listeners, rather than attempting to form categories, focus on the fine-grained, continuous acoustic differences within the stimuli. Applied to our study, this would mean our older adult learners improved their discrimination by becoming more sensitive to the raw tonal variations through repeated exposure, a process that is independent of learning the statistical distribution itself. Specifically, since unimodal training, despite its specific distributional shape, still exposed participants to multiple steps from the T1–T4 continuum in a repeated manner, this might have prompted them to compare stimuli across exposures. Over time, this process may have helped participants establish more stable internal reference points for differentiating between the tones, allowing for incremental improvements in discrimination ability.
In other words, even when no category boundaries could reliably be drawn from a unimodal distribution, the range of tonal variations may have encouraged learners to refine their perception of pitch movement, which may have led to improved discrimination. Crucially, this does not mean they have tracked the distribution shape accurately; rather, it suggests that their perception of the contrast became more stable over time as they became increasingly familiar with tonal variation.

We have identified a possible alternative learning mechanism for our older adult participants: steady exposure to a distribution of tokens may have been enough to elicit improvement. But a question remains: could the improvement of the bimodal group plausibly be argued to have come from alternative strategies of structured training as well? Our individual differences analysis may shed some light on the matter, and may favor the interpretation that any alternative learning method is unimodal-specific. This is because our individual differences results also pointed to the unimodal group as the source of variation in posttraining performance: pitch memory (and only pitch memory) modulated improvement for the unimodal group only, while bimodal training elicited improvement regardless of pitch and memory abilities. For the unimodal group, learners with high pitch memory performed worse after unimodal training, while learners with low pitch memory improved after unimodal training. This means that a subset of the participants in the unimodal training group did learn from the distribution, which hindered subsequent discrimination, but this happened only to participants who were strong in pitch memory. This finding aligns with previous studies which found that memory predicted older adults’ learning of tone words (Ingvalson et al., 2017) and performance in other general language learning scenarios (Fong et al., 2022; Nilsson et al., 2021). Since this modulating effect of memory has not been reported in the distributional learning literature specifically, we attempt to give a first hypothesis as to how this might occur in our context of Mandarin T1–T4 distributional learning.
Unlike fine-grained pitch perception, which involves moment-to-moment encoding of subtle acoustic differences, pitch memory supports longer term retention and comparison of auditory information (Gaab et al., 2003). We propose that participants with high pitch memory were more effective at storing and recalling the acoustic exemplars from the training, allowing them to accurately infer the underlying statistical structure. In the unimodal condition, this meant successfully learning that the input centered on a single ambiguous peak. Per distributional learning theory (Maye et al., 2002), this successful learning of a one-category distribution then hindered their posttest discrimination ability. In contrast, participants with low pitch memory were likely unable to track and retain the frequency distribution of the unimodal input. As a result, they were not misled by the statistical cue and instead appear to have relied on an alternative learning mechanism, perhaps establishing more refined internal reference points through repeated exposure to tonal variation, which led to their improved discrimination. This interpretation explains why better pitch memory was associated with less desirable learning outcomes in the unimodal group.

One further question is why such a pattern is observed for the unimodal group only and why pitch memory did not exert an influence on the bimodal group. One plausible explanation involves the salience of cues inherent in unimodal versus bimodal distributions. In bimodal distributions, the presence of two distinct peaks provides clear statistical structure, making category boundaries more perceptible. This high cue salience may have allowed learners to form phonetic categories with relative ease, reducing reliance on memory-based strategies. Consequently, individual differences in pitch memory have a diminished impact on learning outcomes in bimodal conditions, leading to more uniform performance among participants. Conversely, unimodal distributions lack clear category boundaries: because the sounds follow a gradual frequency distribution without distinct, separable peaks, cue salience is lower. Learners may have relied more heavily on memory-based strategies to discern patterns within this ambiguous input. Participants with lower pitch memory may struggle to track and retain these subtle variations, leading to greater variability in learning outcomes within the unimodal group.

Finally, it is also worth noting the other individual differences measures that were not found to be predictors of successful distributional learning. First is working memory, as operationalized by the accuracy in a Backwards Digit Span task. This shows that a domain-specific memory measure may have a larger effect than a domain-general one (see similar findings in Laméris et al., 2024). Specifically, the pitch memory task required domain-specific auditory retention, directly assessing participants’ ability to store and compare pitch sequences over time. In contrast, the digit span task was domain-general, assessing verbal working memory capacity rather than auditory-specific memory mechanisms. Given that the task required discriminating pitch trajectories, the domain-specific nature of pitch memory may have been more directly relevant. Second, fine-grained pitch perception was not found to modulate distributional learning success, in contrast to Chui and Qin (2024a, 2024b). Perhaps this was because of an overall decline in pitch sensitivity (see Supplementary Figure S4 for a comparison of melodic discrimination task performance between younger and older adults). As older adults experience general degradation in pitch representation and categorical perception of tones, it may be more difficult to use fine-grained pitch perception to improve their tone discrimination. This may have less of an impact on the bimodal group, where the relatively higher perceptual salience of the critical cues (at the ends of the tonal continuum) still made it possible for older adults to distinguish and rely on the distributional information.
For the unimodal group, however, the difficulty and lower salience of critical cues (between two consecutive steps at the middle of the tonal continuum) may have forced older adults to rely on higher level memory-based strategies rather than direct acoustic sensitivity, employing it as a compensatory mechanism to perform the distributional learning task.

In conclusion, our results revealed that, unlike younger learners, older adults showed complex and only partially realized patterns of distributional learning. While the bimodal group performed as expected under successful distributional learning, the unimodal group improved at a similar rate and may have employed alternative strategies, independent of distributional structure, to facilitate subsequent discrimination. Both the group-level and individual-level results support this claim. Specifically, rather than statistical distributional cues being leveraged at the group level, only older adults high in domain-specific pitch memory capacity were able to encode, retain and utilize the statistical regularities present in the training stimuli, especially for more ambiguous distributions with lower perceptual salience of target cues. These findings suggest that older adults do not rely on statistical learning in the same way as younger learners, but that those with stronger pitch-related auditory memory are better able to integrate distributional information into their perceptual adjustments. This pattern highlights a shift from perceptual sensitivity to memory-based compensatory strategies in older adults. The findings have direct implications for second-language phonetic training in older learners, suggesting that training paradigms should be adapted to engage memory-based learning mechanisms rather than relying solely on passive statistical exposure.

Acknowledgments

This research was supported by the Hong Kong PhD Fellowship Scheme, awarded to Y.-T.C., and the Seed Funding from the Center for Aging Science at the Hong Kong University of Science and Technology, awarded to Q.Z.Q. Portions of this work have been presented at the 16th Annual Meeting of the Society for the Neurobiology of Language (SNL 2024).

Supplementary material

The supplementary material for this article can be found at http://doi.org/10.1017/langcog.2025.10035.

Data availability statement

The data that support the findings of this study are publicly available at OSF (https://osf.io/28c6p/).

Competing interests

The authors declare none.

Footnotes

¹ The study involved both an immediate post-training test and a delayed post-sleep test. The distributional learning effect was found to be largest in the delayed test.

² The study also found that working memory was a significant predictor modulating bimodal/unimodal groups’ performance for overnight consolidation.

³ As is common for this population, all participants were also L2 speakers of English. Their English proficiency was measured using the LHQ3 (Li et al., 2020), yielding a mean proficiency score of 0.565 (SD = 0.141). To account for any potential confounding effects of bilingualism, we included this proficiency score as a control predictor in our statistical models. It did not emerge as a significant predictor, and its inclusion did not alter the pattern of results reported in the manuscript.

References

Best, C. T., McRoberts, G. W., & Goodell, E. (2001). Discrimination of non-native consonant contrasts varying in perceptual assimilation to the listener’s native phonological system. The Journal of the Acoustical Society of America, 109(2), 775–794. https://doi.org/10.1121/1.1332378.

Best, C. T., & Tyler, M. D. (2008). Nonnative and second-language speech perception: Commonalities and complementarities. In Bohn, O.-S. & Munro, M. J. (Eds.), Language experience in second language speech learning: In honor of James Emil Flege (pp. 13–34). John Benjamins Publishing Company. https://doi.org/10.1075/lllt.17.07bes.

Bowles, A. R., Chang, C. B., & Karuzis, V. P. (2016). Pitch ability as an aptitude for tone learning. Language Learning, 66(4), 774–808. https://doi.org/10.1111/lang.12159.

Chandrasekaran, B., Sampath, P. D., & Wong, P. C. (2010). Individual variability in cue-weighting and lexical tone learning. The Journal of the Acoustical Society of America, 128(1), 456–465. https://doi.org/10.1121/1.3445785.

Chang, D., Hedberg, N., & Wang, Y. (2016). Effects of musical and linguistic experience on categorization of lexical and melodic tones. The Journal of the Acoustical Society of America, 139(5), 2432–2447. https://doi.org/10.1121/1.4947497.

Chládková, K., & Šimáčková, Š. (2021). Distributional learning of speech sounds: An exploratory study into the effects of prior language experience. Language Learning, 71(1), 131–161.

Chui, Y.-T., & Qin, Q. Z. (2024a). Individual differences in the distributional learning and overnight consolidation of the Mandarin level-falling tone contrast. The Journal of the Acoustical Society of America, 156(6), 4256–4268. https://doi.org/10.1121/10.0034717.

Chui, Y.-T., & Qin, Z. (2024b). Distributional learning and overnight consolidation of nonnative tonal contrasts by tonal language speakers. Journal of Speech, Language, and Hearing Research, 67(7), 2038–2052. https://doi.org/10.1044/2024_JSLHR-23-00711.

Cooper, A., & Wang, Y. (2012). The influence of linguistic and musical experience on Cantonese word learning. The Journal of the Acoustical Society of America, 131(6), 4756–4769. https://doi.org/10.1121/1.4714355.

De Wilde, V., Brysbaert, M., & Eyckmans, J. (2020). Learning English through out-of-school exposure: How do word-related variables and proficiency influence receptive vocabulary learning? Language Learning, 70(2), 349–381. https://doi.org/10.1111/lang.12380.

Escudero, P., Benders, T., & Wanrooij, K. (2011). Enhanced bimodal distributions facilitate the learning of second language vowels. The Journal of the Acoustical Society of America, 130(4), EL206–EL212. https://doi.org/10.1121/1.3629144.

Flege, J. E. (1995). Second language speech learning: Theory, findings and problems. In Strange, W. (Ed.), Speech perception and linguistic experience: Issues in cross-language research (pp. 233–277). York Press.

Fong, M. C. M., Ma, M. K. H., Chui, J. Y. T., Law, T. S. T., Hui, N. Y., Au, A., & Wang, W. S. (2022). Foreign language learning in older adults: Anatomical and cognitive markers of vocabulary learning success. Frontiers in Human Neuroscience, 16, 787413.

Gaab, N., Gaser, C., Zaehle, T., Jancke, L., & Schlaug, G. (2003). Functional anatomy of pitch memory—An fMRI study with sparse temporal sampling. NeuroImage, 19(4), 1417–1426. https://doi.org/10.1016/s1053-8119(03)00224-6.

Ghosh, J., Li, Y., & Mitra, R. (2018). On the use of Cauchy prior distributions for Bayesian logistic regression. Bayesian Analysis, 13(2), 359–383. https://doi.org/10.1214/17-BA1051.

Hao, Y.-C. (2012). Second language acquisition of Mandarin Chinese tones by tonal and non-tonal language speakers. Journal of Phonetics, 40(2), 269–279. https://doi.org/10.1016/j.wocn.2011.11.001.

Hoijtink, H., Mulder, J., van Lissa, C., & Gu, X. (2019). A tutorial on testing hypotheses using the Bayes factor. Psychological Methods, 24(5), 539–556. https://doi.org/10.1037/met0000201.

Ingvalson, E. M., Nowicki, C., Zong, A., & Wong, P. C. M. (2017). Non-native speech learning in older adults. Frontiers in Psychology, 8. https://doi.org/10.3389/fpsyg.2017.00148.

Kalaivanan, K., Wong, P. C. M., Wong, F. C. K., & Chan, A. H. D. (2023). Native language perceptual sensitivity predicts nonnative speech perception differently in younger and older Singaporean bilinguals. Journal of Speech, Language, and Hearing Research, 66(3), 987–1017. https://doi.org/10.1044/2022_JSLHR-22-00199.

Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90(430), 773–795.

Kim, D., Clayards, M., & Kong, E.-J. (2020). Individual differences in perceptual adaptation to unfamiliar phonetic categories. Journal of Phonetics, 81, 100984. https://doi.org/10.1016/j.wocn.2020.100984.

Laméris, T. J., Llompart, M., & Post, B. (2024). Non-native tone categorization and word learning across a spectrum of L1 tonal statuses. Bilingualism: Language and Cognition, 27(4), 729–743. https://doi.org/10.1017/S1366728923000871.

Laméris, T. J., & Post, B. (2023). The combined effects of L1-specific and extralinguistic factors on individual performance in a tone categorization and word identification task by English-L1 and Mandarin-L1 speakers. Second Language Research, 39(3), 833–871. https://doi.org/10.1177/02676583221090068.

Lee, C.-Y., & Hung, T.-H. (2008). Identification of Mandarin tones by English-speaking musicians and nonmusicians. The Journal of the Acoustical Society of America, 124(5), 3235–3248. https://doi.org/10.1121/1.2990713.

Li, P., Zhang, F., Yu, A., & Zhao, X. (2020). Language history questionnaire (LHQ3): An enhanced tool for assessing multilingual experience. Bilingualism: Language and Cognition, 23(5), 938–944. https://doi.org/10.1017/S1366728918001153.

Li, P., Zhang, Y., Baills, F., & Prieto, P. (2024). Musical perception skills predict speech imitation skills: Differences between speakers of tone and intonation languages. Language and Cognition, 16(3), 647–665. https://doi.org/10.1017/langcog.2023.52.

Liu, L., Yuan, C., Ong, J. H., Tuninetti, A., Antoniou, M., Cutler, A., & Escudero, P. (2022). Learning to perceive non-native tones via distributional training: Effects of task and acoustic cue weighting. Brain Sciences, 12(5), Article 5. https://doi.org/10.3390/brainsci12050559.

Maye, J., Werker, J. F., & Gerken, L. (2002). Infant sensitivity to distributional information can affect phonetic discrimination. Cognition, 82(3), B101–B111. https://doi.org/10.1016/S0010-0277(01)00157-3.

Neger, T. M., Rietveld, T., & Janse, E. (2014). Relationship between perceptual learning in speech and statistical learning in younger and older adults. Frontiers in Human Neuroscience, 8. https://doi.org/10.3389/fnhum.2014.00628.

Newport, E. L. (2016). Statistical language learning: Computational, maturational, and linguistic constraints. Language and Cognition, 8(3), 447–461. https://doi.org/10.1017/langcog.2016.20.

Nilsson, J., Berggren, R., Garzón, B., Lebedev, A. V., & Lövdén, M. (2021). Second language learning in older adults: Effects on brain structure and predictors of learning success. Frontiers in Aging Neuroscience, 13, 666851. https://doi.org/10.3389/fnagi.2021.666851.

Ong, J. H., Burnham, D., & Escudero, P. (2015). Distributional learning of lexical tones: A comparison of attended vs. unattended listening. PLoS One, 10(7), e0133446. https://doi.org/10.1371/journal.pone.0133446.

Ong, J. H., Burnham, D., Escudero, P., & Stevens, C. J. (2017). Effect of linguistic and musical experience on distributional learning of nonnative lexical tones. Journal of Speech, Language, and Hearing Research, 60(10), 2769–2780. https://doi.org/10.1044/2016_JSLHR-S-16-0080.

Ong, J. H., & Chan, A. H. (2023). Working memory modulates the effect of music on word learning. Language and Cognition, 15(1), 131–147.

Peretz, I., Champod, A. S., & Hyde, K. (2003). Varieties of musical disorders. The Montreal battery of evaluation of Amusia. Annals of the New York Academy of Sciences, 999, 58–75. https://doi.org/10.1196/annals.1284.006.

Perrachione, T. K., Lee, J., Ha, L. Y. Y., & Wong, P. C. M. (2011). Learning a novel phonological contrast depends on interactions between individual differences and training paradigm design. The Journal of the Acoustical Society of America, 130(1), 461–472. https://doi.org/10.1121/1.3593366.

Qin, Z., Jin, R., & Zhang, C. (2022). The effects of training variability and pitch aptitude on the overnight consolidation of lexical tones. Journal of Speech, Language, and Hearing Research, 65(9), 3377–3391. https://doi.org/10.1044/2022_JSLHR-22-00058.

Qin, Z., Zhang, C., & Wang, W. S. (2021). The effect of Mandarin listeners’ musical and pitch aptitude on perceptual learning of Cantonese level-tones. The Journal of the Acoustical Society of America, 149(1), 435–446. https://doi.org/10.1121/10.0003330.

Shen, J., Wright, R., & Souza, P. E. (2016). On older listeners’ ability to perceive dynamic pitch. Journal of Speech, Language, and Hearing Research, 59(3), 572–582. https://doi.org/10.1044/2015_JSLHR-H-15-0228.

Tsoi, E. Y. L., Yang, W., Chan, A., & Kidd, E. (2019). Mandarin–English speaking bilingual and Mandarin speaking monolingual children’s comprehension of relative clauses. Applied Psycholinguistics, 40(4), 933–964. https://doi.org/10.1017/S0142716419000079.

Unsworth, N., Heitz, R. P., Schrock, J. C., & Engle, R. W. (2005). An automated version of the operation span task. Behavior Research Methods, 37(3), 498–505. https://doi.org/10.3758/BF03192720.

Van Leussen, J.-W., & Escudero, P. (2015). Learning to perceive and recognize a second language: The L2LP model revised. Frontiers in Psychology, 6. https://doi.org/10.3389/fpsyg.2015.01000.

Vasishth, S., Nicenboim, B., Beckman, M. E., Li, F., & Kong, E. J. (2018). Bayesian data analysis in the phonetic sciences: A tutorial introduction. Journal of Phonetics, 71, 147–161. https://doi.org/10.1016/j.wocn.2018.07.008.

Veríssimo, J., Verhaeghen, P., Goldman, N., et al. (2022). Evidence that ageing yields improvements as well as declines across attention and executive functions. Nature Human Behaviour, 6, 97–110.

Wang, Y., Yang, X., Zhang, H., Xu, L., Xu, C., & Liu, C. (2017). Aging effect on categorical perception of Mandarin tones 2 and 3 and thresholds of pitch contour discrimination. American Journal of Audiology, 26(1), 18–26. https://doi.org/10.1044/2016_AJA-16-0020.

Wechsler, D. (2012). Wechsler Adult Intelligence Scale—Fourth Edition [Dataset]. https://doi.org/10.1037/t15169-000.

Williamson, V. J., & Stewart, L. (2010). Memory for pitch in congenital amusia: Beyond a fine-grained pitch discrimination problem. Memory, 18(6), 657–669. https://doi.org/10.1080/09658211.2010.501339.

Wong, P. C. M., & Perrachione, T. K. (2007). Learning pitch patterns in lexical identification by native English-speaking adults. Applied Psycholinguistics, 28(4), 565–585. https://doi.org/10.1017/S0142716407070312.

Xu, Y. (2013). ProsodyPro—A tool for large-scale systematic prosody analysis [proceedings paper]. An Interspeech 2013 satellite event. In: Tools and resources for the analysis of speech prosody (pp. 7–10). Laboratoire Parole et Langage, France: Aix-En-Provence. https://discovery.ucl.ac.uk/id/eprint/1406070/

Yang, X., Wang, Y., Xu, L., Zhang, H., Xu, C., & Liu, C. (2015). Aging effect on Mandarin Chinese vowel and tone identification. The Journal of the Acoustical Society of America, 138(4), EL411–EL416. https://doi.org/10.1121/1.4933234.

Yeung, P. Y., Wong, L. L., Chan, C. C., Leung, J. L. M., & Yung, C. Y. (2014). A validation study of the Hong Kong version of Montreal cognitive assessment (HK-MoCA) in Chinese older adults in Hong Kong. Hong Kong Medical Journal = Xianggang Yi Xue Za Zhi, 20(6), 504–510. https://doi.org/10.12809/hkmj144219.

Zeintl, M., & Kliegel, M. (2007). The role of inhibitory control in age-related operation span performance. European Journal of Ageing, 4(4), 213–217. https://doi.org/10.1007/s10433-007-0066-0.

Zhang, C., Ho, O.-Y., Shao, J., Ou, J., & Law, S.-P. (2021). Dissociation of tone merger and congenital amusia in Hong Kong Cantonese. PLoS One, 16(7), e0253982. https://doi.org/10.1371/journal.pone.0253982.

Zhou, C., & Veríssimo, J. (2025). L2 difficulties in the perception of Mandarin tones: Phonological universals or domain-general aptitude? (pp. 1–15). Bilingualism: Language and Cognition. https://doi.org/10.1017/S1366728925100114.

Table 1. Means and SDs of participants’ biographical information and cognitive battery variables, with Bayes factor tests (i.e., BF10 ≤ 0.33) revealing no differences between the two groups in any of these measures

Figure 1. Tone 1–Tone 4 continuum. Dashed line = intermediate tokens. Numbers next to the lines denote step number.

Figure 2. Illustration of the experiment procedure. Note: PTA = Pure-Tone Audiometry; MoCA = Montreal Cognitive Assessment.

Table 2. Model output for the group analysis

Table 3. Model output for the individual differences analysis

