1. Introduction
Child-directed speech (CDS), characterised by simplified sentences (Newport et al., Reference Newport, Gleitman and Gleitman2020), exaggerated prosody (Fernald & Simon, Reference Fernald and Simon1984), enhanced acoustic features, and greater variability (Huttenlocher et al., Reference Huttenlocher, Waterfall, Vasilyeva, Vevea and Hedges2010), has been suggested to facilitate early language development. This claim is supported by a body of experimental research demonstrating that infants engaged with CDS exhibit improved performance in various linguistic tasks such as speech sound discrimination (Karzon, Reference Karzon1985; Liu et al., Reference Liu, Kuhl and Tsao2003), word recognition (Singh et al., Reference Singh, Nestor, Parikh and Yull2009), and word learning (Ma et al., Reference Ma, Golinkoff, Houston and Hirsh-Pasek2011) compared to those exposed to adult-directed speech (ADS). The ability to segment continuous speech into meaningful units, such as words, is essential for language acquisition and processing. While behavioural studies suggest that CDS facilitates this process (Menn et al., Reference Menn, Michel, Meyer, Hoehl and Männel2022; Thiessen et al., Reference Thiessen, Hill and Saffran2005), the specific advantages of CDS in segmentation with naturally occurring spontaneous speech are not yet well-established. This paper investigates whether and how CDS facilitates early word segmentation by combining corpus analyses of linguistic properties with computational modelling across multiple segmentation algorithms.
1.1. Linguistic properties of Korean relevant to segmentation
Most research on word segmentation has focused on stress-based, right-branching languages like English (Jusczyk et al., Reference Jusczyk, Houston and Newsome1999). However, Korean presents a valuable contrast due to its lack of lexical stress (Jun, Reference Jun1998; Ko, Reference Ko2024), head-final syntax, and agglutinative morphology, making it an ideal case to explore how universal and language-specific cues interact during segmentation. Unlike English, Korean does not rely on lexical stress as a segmentation cue. Instead, Korean prosody is organised primarily around the Accentual Phrase (AP; Jun, Reference Jun1998), a phrasal-level unit without consistent word-level prominence (Ko, Reference Ko2013, Reference Ko2024). Infants exposed to languages without strong stress cues, such as French, rely heavily on alternative segmentation cues, including syllable-based regularities (Nazzi et al., Reference Nazzi, Iakimova, Bertoncini, Frédonie and Alcantara2006). Korean infants might similarly rely on these syllable-level cues for segmentation.
At the syntactic level, Korean is a head-final language with canonical Subject–Object–Verb (SOV) order. This syntactic structure makes forward transitional probabilities particularly informative, as arguments and modifiers usually precede heads. Korean learners thus demonstrate sensitivity primarily to forward rather than backward transitional probabilities (Onnis & Thiessen, Reference Onnis and Thiessen2013). At the same time, Korean’s flexible word order and frequent omission of arguments, particularly in conversational speech, reduce transitional probability stability, potentially complicating segmentation. CDS, however, with its structurally simpler, shorter utterances and frequent repetition, might partially offset these challenges by increasing regularity and predictability.
Morphologically, Korean’s agglutinative structure poses additional segmentation challenges. Morphemes in Korean are linearly concatenated but can be blurred by phonological processes such as coalescence. Recent computational modelling work, however, suggests morphological complexity plays a modest role compared to algorithmic factors (Loukatou et al., Reference Loukatou, Stoll, Blasi and Cristia2022b). In Korean CDS, the shorter and morphologically simpler words relative to ADS likely mitigate some of these segmentation difficulties. Still, the question remains whether infants initially segment at the word, morpheme, or prosodic unit (e.g., AP) level. In this study, we adopt orthographic words as segmentation targets, following prior research, though we acknowledge this approach may oversimplify segmentation tasks in languages with complex morphology like Korean.
Lexically, Korean CDS frequently employs onomatopoeia, expressive lengthening, and playful linguistic forms, particularly in speech directed towards younger infants (Jo & Ko, Reference Jo and Ko2018). Onomatopoeic words, acoustically salient and frequently repeated, may act as anchor points to aid segmentation, analogous to isolated or familiar words that facilitate segmentation (Bortfeld et al., Reference Bortfeld, Morgan, Golinkoff and Rathbun2005). Thus, Korean provides a distinct linguistic environment for investigating how language-specific structures interact with general CDS modifications to shape segmentation strategies, making it a valuable test case for examining the cross-linguistic applicability of segmentation models.
1.2. Segmentation of speech into words
Word segmentation plays a crucial role in language acquisition and processing, as it enables infants to identify and extract meaningful linguistic units from continuous speech (Goyet et al., Reference Goyet, Nishibayashi and Nazzi2013; Saffran et al., Reference Saffran, Aslin and Newport1996). However, segmenting speech into words presents challenges since natural speech lacks explicit word boundary markers. To overcome these challenges, infants rely on various cues present in the speech input, including stress (Jusczyk et al., Reference Jusczyk, Cutler and Redanz1993; Jusczyk & Aslin, Reference Jusczyk and Aslin1995; Menn et al., Reference Menn, Michel, Meyer, Hoehl and Männel2022), phrase-level prosody (Estes & Hurley, Reference Estes and Hurley2013; Jusczyk et al., Reference Jusczyk, Friederici, Wessels, Svenkerud and Jusczyk1993; Shukla et al., Reference Shukla, White and Aslin2011), statistical regularities at the word (Bortfeld et al., Reference Bortfeld, Morgan, Golinkoff and Rathbun2005), syllable (Black & Bergmann, Reference Black and Bergmann2017), and phoneme levels (Mattys & Jusczyk, Reference Mattys and Jusczyk2001; Saffran et al., Reference Saffran, Aslin and Newport1996), and allophonic patterns (Estes, Reference Estes2014; Jusczyk et al., Reference Jusczyk, Houston and Newsome1999).
CDS has been shown to facilitate word segmentation in laboratory-based behavioural research (Hoareau et al., Reference Hoareau, Yeung and Nazzi2019; Thiessen et al., Reference Thiessen, Hill and Saffran2005). However, the specific mechanisms that facilitate segmentation in CDS remain unclear. Its efficacy is often attributed to its unique prosody, such as exaggerated pitch contours and slower speech rates (Cooper & Aslin, Reference Cooper and Aslin1990; Estes & Hurley, Reference Estes and Hurley2013; Fernald & Kuhl, Reference Fernald and Kuhl1987), and musical attributes that function similarly to songs (Ma et al., Reference Ma, Fiveash, Margulis, Behrend and Thompson2020). These features enhance the appeal and effectiveness of CDS, while its higher information value at the prosodic level aids attentional processing (Räsänen et al., Reference Räsänen, Kakouros and Soderstrom2018). These characteristics create an environment conducive to infants processing the statistical regularities in speech, essential for successful word segmentation (Saffran et al., Reference Saffran, Aslin and Newport1996).
While CDS is recognised for its salient acoustic characteristics that benefit children’s language learning (Fernald & Simon, Reference Fernald and Simon1984; Garnica, Reference Garnica1977; Menn et al., Reference Menn, Michel, Meyer, Hoehl and Männel2022; Stern et al., Reference Stern, Spieker, Barnett and MacKain1983), it is important to undertake a deeper analysis of input properties beyond phonological and phonetic features. Previous research has primarily focused on CDS features that attract and maintain infants’ attention and enhance speech signals (Fernald & Simon, Reference Fernald and Simon1984; Grieser & Kuhl, Reference Grieser and Kuhl1988). However, relatively few studies have thoroughly examined key linguistic features through corpus analysis, such as the proportion of one-word utterances, hapax legomena, onomatopoeia, and utterance length.
Recent naturalistic studies have begun bridging laboratory findings with real-world language acquisition. For example, Loukatou et al. (Reference Loukatou, Scaff, Demuth, Cristia and Havron2022a) showed that CDS consistently exhibits shorter utterances and lexical simplification across diverse speakers in French and Sesotho, indicating these properties facilitate segmentation regardless of cultural context. However, inconsistencies in the magnitude of CDS advantages across languages remain, potentially due to factors such as morphological complexity and prosodic structure. In agglutinative languages such as Korean, features like syllable-timed rhythm and case-marking particles might uniquely influence CDS segmentation outcomes. By systematically comparing Korean CDS and ADS, this study explores how language-specific structures interact with universal CDS characteristics to shape segmentability outcomes.
1.3. Simulating word segmentation
In the line of research utilising computational simulation to model word segmentation, findings on the advantage of CDS have been mixed. For instance, Batchelder (Reference Batchelder2002) reported a ~10% CDS advantage in English, Spanish, and Japanese using a single algorithm, and Fourtassi et al. (Reference Fourtassi, Borschinger, Johnson and Dupoux2013) similarly found a CDS advantage in English and Japanese, also using a single algorithm. Ludusan et al. (Reference Ludusan, Mazuka, Bernard, Cristia and Dupoux2017) also identified a CDS advantage in Japanese using four algorithms. However, more comprehensive studies have shown smaller advantages: Loukatou et al. (Reference Loukatou, Le Normand and Cristià2019) found only a ~3% CDS advantage in French using 18 algorithms, and Cristia et al. (Reference Cristia, Dupoux, Ratner and Soderstrom2019) reported a similar result in English using nine algorithms.
These discrepancies likely arise from methodological, linguistic, and corpus-related factors. Methodologically, simpler algorithms relying solely on transitional probabilities may be especially sensitive to repetitive features of CDS, potentially inflating its apparent advantage. More complex algorithms that incorporate multiple cues (e.g., diphone information, phonotactic patterns, or rule-based inference) distribute their reliance across features, resulting in more nuanced or modest CDS benefits.
The linguistic structure of the target language also matters. Agglutinative languages with regular morphological patterns (e.g., Korean, Japanese, Turkish) might facilitate segmentation, especially when morpheme boundaries consistently align with clear phonological or syllabic boundaries. However, this advantage may depend on how transparently these boundaries surface in actual speech, particularly in CDS, which tends to have shorter words and simpler morphological forms. In contrast, stress-timed languages like English often rely on stress patterns or function words as segmentation cues, which may not consistently align with word boundaries, making statistical patterns potentially less predictable. Thus, language-specific phonological and morphological structures interact differently with segmentation algorithms, potentially influencing the observed effectiveness of CDS.
Additionally, corpus sampling conditions, including differences in naturalness (e.g., spontaneous versus laboratory-recorded speech) and interactional context, can substantially influence segmentation outcomes. Spontaneously collected CDS in naturalistic contexts may provide more representative and consistent segmentation cues than data elicited in laboratory settings, contributing further variability to cross-study comparisons.
Finally, the age of the children in the corpus plays a role. CDS directed at younger infants often includes shorter utterances and frequent repetition, which support segmentation, whereas speech to older children becomes more syntactically complex and less optimised for segmentation (Chai & Ko, Reference Chai and Ko2025).
Together, these methodological and linguistic factors help explain the divergent findings across studies using different corpora, age groups, and segmentation approaches.
1.4. Linguistic properties of CDS associated with segmentability
Several linguistic properties of CDS consistently support efficient word segmentation, including shorter utterances, shorter words, frequent word repetition, one-word utterances, and enhanced prosodic features (Batchelder, Reference Batchelder2002; Brent & Siskind, Reference Brent and Siskind2001; Menn et al., Reference Menn, Michel, Meyer, Hoehl and Männel2022; Newport et al., Reference Newport, Gleitman and Gleitman2020; Thiessen & Saffran, Reference Thiessen and Saffran2003).
Short utterances are thought to facilitate segmentation by providing a simplified and predictable linguistic environment (Bernstein Ratner & Rooney, Reference Bernstein Ratner and Rooney2001; Brent & Siskind, Reference Brent and Siskind2001; Swingley & Humphrey, Reference Swingley and Humphrey2018). Computational modelling has demonstrated that shorter utterances improve segmentation performance (Frank et al., Reference Frank, Goldwater, Griffiths and Tenenbaum2010), indicating that the brevity of CDS may be beneficial for early word learning. Additionally, isolated words, which occur frequently in CDS, offer particularly clear perceptual anchors that help infants segment speech more efficiently (Brent & Siskind, Reference Brent and Siskind2001; Lew-Williams et al., Reference Lew-Williams, Pelucchi and Saffran2011).
Interjections, another frequent feature in spontaneous CDS, might also support segmentation despite lacking grammatical connections to adjacent words. Familiar anchor words like “mommy” or the child’s name are known to facilitate segmentation (Bortfeld et al., Reference Bortfeld, Morgan, Golinkoff and Rathbun2005), but whether interjections provide similar benefits remains unexplored. Variation in how interjections are classified, and in how they are treated when defining one-word utterances, could therefore influence conclusions about the structure and segmentation benefits of CDS.
CDS also consistently exhibits lower lexical diversity compared to ADS (Soderstrom, Reference Soderstrom2007), characterised by frequent repetition and fewer rare words (Goodman et al., Reference Goodman, Dale and Li2008). Higher lexical diversity and more unique words (hapax legomena) typically found in ADS introduce more segmentation complexity, as algorithms or learners encounter less predictable linguistic patterns. In contrast, the repetition and lexical simplicity in CDS strengthen statistical regularities, increasing predictability and ease of segmentation (Thiessen & Saffran, Reference Thiessen and Saffran2003). Variation sets, or repeated words presented in varied contexts, additionally reinforce segmentation cues (Lester et al., Reference Lester, Moran, Küntay, Allen, Pfeiler and Stoll2022). Computational studies have similarly shown that shortening utterances and increasing repetition significantly enhances segmentation performance (Batchelder, Reference Batchelder2002), further emphasising how CDS structurally supports infants’ segmentation abilities.
1.5. Research objectives
This study aims to advance our understanding of word segmentation in Korean, a language that is typologically different from languages like English at multiple linguistic levels. Our primary objectives are (1) to examine how linguistic features and statistical regularities differ between CDS and ADS in a naturalistic Korean speech corpus; (2) to evaluate the performance of word segmentation algorithms in segmenting words from Korean CDS compared to ADS; and (3) to assess the effectiveness of various algorithms in segmenting words from the Korean corpus. We employ computational modelling to investigate how Korean CDS and ADS, each with distinctive properties, interact with different segmentation strategies to influence word segmentation performance. In light of the results, we will discuss the underlying assumptions of each algorithm, and their ability to capture the unique statistical properties of Korean.
To the best of our knowledge, this is the first study to explore word segmentation in Korean CDS and ADS using computational models applied to a corpus of spontaneous speech and to compare the segmentation strategy of algorithms in relation to linguistic properties of Korean.
2. Methods
2.1. Design
This study utilises the Korean Ko corpus (Ko et al., Reference Ko, Jo, On and Zhang2020), which includes transcriptions of natural interactions between Korean mothers and their children (CDS) and adults (ADS). To control for potential corpus-size effects (Montag et al., Reference Montag, Jones and Smith2018), we supplemented the ADS portion of the Ko corpus with additional data from the Call Friend Korean corpus (ADS-CF; Ko, Reference Ko2013). This augmentation ensured that our ADS dataset matched the CDS portion of the Ko corpus more closely in size, helping to rule out the possibility that observed segmentation differences were artefacts of corpus-size disparities.
Previous studies typically used dictionary lookups to convert text into phonemic transcriptions. However, Beech and Swingley (Reference Beech and Swingley2023) show that using actual pronunciations leads to worse segmentation performance than phonemic forms, because many surface forms that are phonologically distinct correspond to the same underlying word, and vice versa. To address this mismatch, we transformed the orthographic form into a phonemic representation and applied phonological rules to derive actual pronunciations in spontaneous speech. While this approach has limitations, such as the inability to replicate the application of optional rules that are often dependent on speaking rate or prosodic phrasing, it represents an advancement over previous methods based on dictionary forms by aligning the input more closely with the phonological reality encountered by children during word segmentation.
2.2. Data and preparation
To simulate word segmentation of the Korean language, we used spontaneous speech from the Ko corpus (Ko et al., Reference Ko, Jo, On and Zhang2020), which includes transcripts of 35 mother–child dyads (14 girls; mean age = 16 months; age range = 6–30 months) during free-play sessions, totalling 23.3 hours of CDS. The same mothers also produced ADS while talking with adult family members or experimenters, totalling 5.8 hours. As the amount of ADS in the Ko corpus was relatively limited, we supplemented our analyses with the ADS-CF corpus (Ko, Reference Ko2013), a collection of 15-minute transcriptions from 100 phone calls between Korean-speaking adults residing in the U.S. These informal peer conversations, held primarily among young adults (mean age = 21.8 years), involve more than 200 speakers across the 100 calls (55 female–female, 28 male–male, and 17 mixed-gender). Although the speech is not face-to-face and varies in demographic background, it offers additional adult-directed input that matches the CDS in size. The Ko corpus, with rich annotations and matched CDS/ADS speakers, served as the primary source for comparing registers.
In the Korean writing system, Hangul, each alphabetical letter corresponds to an underlying phoneme, and each block of characters represents a syllable. This phonemic design allows phonetic output to be derived directly from the orthography, without dictionary lookups. For example, the word “한글” (/hankɯl/), meaning “the Korean alphabet,” consists of two syllables: “한” (/han/) and “글” (/kɯl/). Each character block can contain up to three phonemes, typically in a consonant-vowel (CV) or consonant-vowel-consonant (CVC) structure.
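As a concrete illustration of this property (our own sketch, not code from the study or the KoG2P package), a precomposed Hangul syllable block can be decomposed into its phoneme letters with simple Unicode arithmetic:

```python
# Illustrative sketch: decompose precomposed Hangul syllable blocks
# (U+AC00-U+D7A3) into their constituent jamo (phoneme letters).
# Names and structure are ours, not the paper's pipeline.

CHOSEONG = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")            # 19 initial consonants
JUNGSEONG = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")     # 21 vowels
JONGSEONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 27 finals (+ empty)

def decompose(block):
    """Return the jamo letters of one precomposed Hangul syllable block."""
    idx = ord(block) - 0xAC00                # offset into the syllable area
    initial, rest = divmod(idx, 21 * 28)     # 21 vowels x 28 final slots per initial
    medial, final = divmod(rest, 28)
    jamo = [CHOSEONG[initial], JUNGSEONG[medial]]
    if JONGSEONG[final]:                     # final slot may be empty (CV syllable)
        jamo.append(JONGSEONG[final])
    return jamo

# "한글": 한 = ㅎ+ㅏ+ㄴ, 글 = ㄱ+ㅡ+ㄹ
print([decompose(s) for s in "한글"])
```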
We transformed the orthographic transcriptions into phonetic symbols using phonological rules from the KoG2P package (Hong et al., Reference Hong, Ki and Gweon2018; Mun et al., Reference Mun, Kim and Ko2022). The phonetic input resulting from applying phonological rules to the phonemic representation includes adjusted syllable boundaries and other phonological changes to accurately reflect actual pronunciations. The phonologisation process involved applying context-sensitive phonological rules to account for phonological variations and connected speech processes. For example, an application of the h-deletion, optional w-deletion, place assimilation, j-deletion, and resyllabification rules would transform the orthographic form “전화주시는거죠” (/d͡ʑʌn.hwa.cu.si.nɯn.gʌ.d͡ʑjo/, “You will call me, right?”) to the phonetic form [d͡ʑʌ.na.cu.si.nɯŋ.gʌ.d͡ʑo].
2.3. Metrics for assessing linguistic properties of corpus data
Metrics for primary linguistic properties. We systematically quantified several linguistic properties, including utterance and word length, the proportion of hapax legomena, the percentage of monosyllabic words, one-word utterances, onomatopoeic words and word play, as well as interjections. These metrics were specifically chosen due to their theoretical significance in understanding word segmentation and language complexity. The metrics are detailed below.
Utterance length and word length: We measured word and utterance length in both phones (i.e., total number of individual phone segments within a word or utterance) and syllables (i.e., total number of Hangul blocks within a word or utterance). Average lengths were then calculated for each participant at both phonetic and syllabic levels.
Hapax legomena: Hapax legomenon values were calculated for each participant by dividing the number of word types that occur only once by the total number of word types in that participant’s sample.
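For illustration (our own sketch, not the study’s code; the toy sample is invented), this type-based ratio can be computed as:

```python
from collections import Counter

def hapax_ratio(words):
    """Proportion of word types occurring exactly once among all word types."""
    counts = Counter(words)
    hapaxes = sum(1 for c in counts.values() if c == 1)
    return hapaxes / len(counts)

# "the" and "ball" recur; "red" and "dog" are hapaxes: 2 of 4 types
print(hapax_ratio(["the", "ball", "the", "ball", "red", "dog"]))  # 0.5
```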
Monosyllabic words: We identified monosyllabic words and measured their percentage relative to the total number of words in the Ko Corpus.
One-word utterances: We adopted a conservative approach, excluding words that never occur in combination with other words and interjection words such as discourse fillers (e.g., 어 “uh,” 아 “ah”), affirmative responses (e.g., 네 “yes,” 응 “mm-hmm,” 예 “yes,” 그래 “okay”), negative responses (e.g., 아니 “no”), and non-lexical vocalisations (e.g., 우와 “wow,” 아고 “oh dear,” 어머 “oh my”). These excluded words are collectively categorised as interjections due to their linguistic function.
Interjections: We defined interjections as exclamatory words or phrases that can stand alone or be inserted within a sentence to vividly convey sudden feelings such as surprise, excitement, anger, or disappointment (Ameka, Reference Ameka1992). This definition focuses on their linguistic properties rather than their acoustic features.
Onomatopoeic words and word play: The Ko corpus (Ko et al., Reference Ko, Jo, On and Zhang2020) annotates onomatopoeic words with the symbol “@o” and word play with the symbol “@wp.” To assess the ratio of these words, we tallied the total frequency of words identified with the tag, allowing for redundant counts between the two categories, and divided by the total number of word tokens in each corpus and across speech registers. This measure represents the proportion of onomatopoeic and word play tokens relative to all word tokens.
Metrics for corpus complexity. Expanding on the basic linguistic unit percentages outlined above, we further explored corpus complexity by assessing composite measures of lexical diversity and lexical ambiguity/entropy.
MATTR: To assess lexical diversity, we employed the Moving-Average Type-Token Ratio (MATTR) across a 20-word window (Covington & McFall, Reference Covington and McFall2010). This metric iteratively calculates the average type-token ratio as the window progresses through the corpus. A higher MATTR indicates greater lexical diversity in the speech, whereas a lower MATTR points to a greater degree of repetition.
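A minimal sketch of this windowed calculation (our illustration, not Covington & McFall’s implementation; a toy window of 3 is used only for the demo, whereas the study uses 20):

```python
def mattr(tokens, window=20):
    """Moving-Average Type-Token Ratio: the mean TTR over every
    window of `window` consecutive tokens in the sample."""
    if len(tokens) < window:
        return len(set(tokens)) / len(tokens)  # fall back to plain TTR
    ttrs = [len(set(tokens[i:i + window])) / window
            for i in range(len(tokens) - window + 1)]
    return sum(ttrs) / len(ttrs)

# Toy window of 3: windows "a b a", "b a b", "a b c" give TTRs
# 2/3, 2/3 and 1, so MATTR = 7/9
print(mattr(["a", "b", "a", "b", "c"], window=3))
```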
Entropy: We quantified lexical ambiguity using the normalised Shannon entropy (NSE) formula. NSE evaluates the ambiguity or unpredictability of text segmentation by employing a unigram model that assigns probabilities to each possible segmentation based on word frequency. High entropy values indicate greater lexical ambiguity and unpredictability in segmentation, while low entropy values denote more constrained and predictable segmentation outcomes.
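The study derives the probabilities from a unigram model over candidate segmentations; the normalisation step itself can be sketched as follows (illustrative only, with an invented probability distribution standing in for the segmentation probabilities):

```python
import math

def normalised_entropy(probs):
    """Shannon entropy of a probability distribution, normalised by its
    maximum (log2 of the number of outcomes), so 0 = fully predictable
    and 1 = maximally ambiguous."""
    h = -sum(p * math.log2(p) for p in probs if p > 0)
    k = len(probs)
    return h / math.log2(k) if k > 1 else 0.0

print(normalised_entropy([0.25, 0.25, 0.25, 0.25]))  # 1.0: uniform, maximal ambiguity
print(normalised_entropy([0.97, 0.01, 0.01, 0.01]))  # ~0.12: one dominant segmentation
```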
2.4. Word segmentation simulations
We opted to use the WordSeg package (Bernard et al., Reference Bernard, Thiolliere, Saksida, Loukatou, Larsen, Johnson and Cristia2020) to enable direct comparison of our outcomes with those of previous studies that have utilised this package (e.g., Cristia et al., Reference Cristia, Dupoux, Ratner and Soderstrom2019; Loukatou et al., Reference Loukatou, Le Normand and Cristià2019). These studies have reported a minimal advantage of CDS over ADS, which may be attributed to variations in methods and language contexts. To model sub-lexical level segmentation, which focuses on sound sequences between syllables and phones, we employed the transitional probability (TP; Saffran et al., Reference Saffran, Aslin and Newport1996) and diphone-based segmentation (DiBS; Daland & Pierrehumbert, Reference Daland and Pierrehumbert2011) algorithms. For lexical-level segmentation, we utilised PUDDLE (Phonotactics from Utterances Determine Distributional Lexical Elements; Monaghan & Christiansen, Reference Monaghan and Christiansen2010) and the Adaptor Grammar approach (e.g., Cristia et al., Reference Cristia, Dupoux, Ratner and Soderstrom2019; Loukatou et al., Reference Loukatou, Le Normand and Cristià2019; Ludusan et al., Reference Ludusan, Mazuka, Bernard, Cristia and Dupoux2017).
The algorithms selected for this study represent a range of segmentation strategies, from local statistical cues (e.g., transitional probabilities, diphone statistics) to more global, lexicon-driven approaches (e.g., PUDDLE, Adaptor Grammar). Rather than modelling child cognition directly, our aim is to test which types of cues and mechanisms are most sensitive to differences between CDS and ADS in Korean, and which align best with the language’s structural properties. While these algorithms are not intended as fully cognitively plausible models, they help identify which input features (from CDS and ADS) and statistical regularities are most informative for segmentation, offering insight into the types of cues that may be available to learners.
In the following section, we provide a concise summary of each algorithm and outline the specific settings employed in our study, referencing Bernard et al. (Reference Bernard, Thiolliere, Saksida, Loukatou, Larsen, Johnson and Cristia2020) for further descriptions of the algorithms.
Algorithms. Baseline: We conducted a series of baseline simulations at both the phone and syllable levels. These simulations were configured to assign word boundaries with a 50% probability, i.e., inserting a word boundary between unit tokens (phones or syllables) at chance.
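Such a chance baseline can be sketched as follows (our illustration, not the WordSeg implementation; the `seed` parameter is added only for reproducibility):

```python
import random

def random_baseline(units, p=0.5, seed=0):
    """Chance baseline: after each unit (phone or syllable), insert a
    word boundary independently with probability p."""
    rng = random.Random(seed)
    words, word = [], []
    for u in units:
        word.append(u)
        if rng.random() < p:   # coin flip at every unit transition
            words.append(word)
            word = []
    if word:                   # flush any trailing partial word
        words.append(word)
    return words

# Segment a toy six-phone utterance at chance
print(random_baseline(list("abcdef")))
```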
Transitional probabilities: We implemented four TP algorithms varying in transition thresholds (absolute and relative) and direction (forward and backward). Relative TP identifies boundaries based on a relative decrease in TP, while absolute TP sets the boundary at the average TP of the corpus. Forward TP analyses transitions left-to-right, and backward TP right-to-left, without requiring known word boundaries. Each algorithm processed syllable and phone units, yielding eight evaluation sets.
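As a sketch of the absolute forward-TP variant (our own simplification, not the WordSeg implementation; the threshold here averages over bigram types, and the toy syllable corpus is invented):

```python
from collections import Counter

def forward_tp_segment(utterances):
    """Absolute forward-TP segmenter (sketch): posit a boundary wherever
    the unit-to-unit transitional probability falls below the corpus
    mean TP. Utterances are non-empty lists of syllables (or phones)."""
    bigrams, unigrams = Counter(), Counter()
    for utt in utterances:
        unigrams.update(utt[:-1])                  # counts in non-final position
        bigrams.update(zip(utt, utt[1:]))
    tp = {bg: c / unigrams[bg[0]] for bg, c in bigrams.items()}
    threshold = sum(tp.values()) / len(tp)         # absolute threshold = mean TP
    segmented = []
    for utt in utterances:
        words, word = [], [utt[0]]
        for a, b in zip(utt, utt[1:]):
            if tp[(a, b)] < threshold:             # low TP -> word boundary
                words.append(word)
                word = []
            word.append(b)
        words.append(word)
        segmented.append(words)
    return segmented

# "ba.bi" always cohere (TP = 1.0); what follows varies (TP = 1/3 each),
# so a boundary is placed after "bi"
print(forward_tp_segment([["ba", "bi", "ku"], ["ba", "bi", "go"], ["ba", "bi", "du"]]))
```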
Diphone-based segmenter: The DiBS algorithm utilises Bayes’ rule to estimate the probability of a word boundary between two phones/syllables. We implemented two versions of DiBS: (1) DiBS-phrasal uses utterances as the chunks for identifying word boundaries, and (2) DiBS-lexical utilises a seed lexicon. DiBS-lexical is still considered a sub-lexical algorithm, as it focuses on the statistical properties of sub-word units, such as diphones, rather than statistical information at the word level. Both DiBS-phrasal and DiBS-lexical processed at both syllable and phone units, resulting in four sets of processes.
Phonotactics from utterances determine distributional lexical elements (PUDDLE): PUDDLE is an utterance-based word segmentation algorithm developed to model the incremental nature of language learning in a psychologically plausible way (Monaghan & Christiansen, Reference Monaghan and Christiansen2010). Rooted in a chunk-based approach, PUDDLE is conceptually related to Christiansen’s broader work on usage-based learning and chunking; it assumes that learners store and process input as chunks, gradually refining their representations as they encounter more data. Notably, this chunk-based perspective is also reflected in approaches like Lieven’s traceback method, which traces children’s utterances back to previously encountered chunks and frame-and-slot patterns in their input (Hartmann et al., Reference Hartmann, Koch and Quick2021; Lieven et al., Reference Lieven, Salomo and Tomasello2009).
Unlike local event-based approaches like DiBS and TP, PUDDLE breaks down utterances into candidate words using three long-term storage buffers: a “lexicon,” onset bigrams, and offset bigrams. The algorithm operates incrementally by (1) scanning utterances and matching sequences to the “lexicon,” (2) storing unmatched utterances as words with frequency and phonetic information, (3) using a phone window to capture word boundaries, (4) comparing new utterances to stored phonetic information for segmentation, and (5) favouring high-frequency words while adding unknown words to the “lexicon.”
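A heavily simplified sketch of this lexicon-driven idea (illustrative only: the real PUDDLE also maintains onset/offset bigram buffers and frequency information, and the example input is invented):

```python
def puddle_sketch(utterances):
    """Toy PUDDLE-style incremental segmenter: split each utterance around
    an already-stored lexical item; any residue, or any unsegmentable
    utterance, is itself stored as a new 'word' for future matching."""
    lexicon, segmentations = [], []
    for utt in utterances:
        segments, rest = [], utt
        for word in sorted(lexicon, key=len, reverse=True):  # prefer long matches
            if word in rest and word != rest:
                pre, _, post = rest.partition(word)          # split at first match
                segments = [s for s in (pre, word, post) if s]
                rest = None
                break
        if rest is not None:                                 # nothing matched
            segments = [rest]
        for s in segments:                                   # grow the lexicon
            if s not in lexicon:
                lexicon.append(s)
        segmentations.append(segments)
    return segmentations, lexicon

# First utterance is stored whole; it then carves up the second one
segs, lex = puddle_sketch(["mommy", "mommyball"])
print(segs)  # [['mommy'], ['mommy', 'ball']]
```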
Adaptor grammar (AG): AG (Goldwater et al., Reference Goldwater, Griffiths and Johnson2009; Johnson et al., Reference Johnson, Griffiths and Goldwater2006; Phillips, Reference Phillips2015) is a computational framework designed for parsing a corpus and inferring the probabilities of rewrite rules that may have generated it. AG assumes sentences consist of words, and words consist of basic units. The framework generates, stores, and rewrites rules through an iterative process: (1) parsing the corpus multiple times to refine sub-rules and prune uneconomical parses; (2) reapplying these parses to generate possible segmentations; and (3) using Minimum Bayes Risk to select the most probable segmentations. This flexible approach allows AG to robustly segment words while adapting to language-specific characteristics.
Evaluations. We initially trained each of the nine unique algorithms (baseline, absolute forward TP, relative forward TP, absolute backward TP, relative backward TP, DiBS-phrasal, DiBS-lexical, PUDDLE, AG) on 140 unique corpus sets, derived from 35 dyads across two registers (CDS and ADS) and two levels of processing units (phones and syllables).
In this study, we analyse both phones and syllables as processing units, given their central role in Korean structure and bottom-up statistical learning. Phones represent the most basic segmental units, whereas syllables are highly salient in perception, production, and orthography, and often align closely with morpheme boundaries. Sensitivity to statistical regularities at these levels is therefore a critical precursor to effective word and morpheme segmentation.
Each algorithm processed the utterances produced by the mothers in the Ko corpus, which were categorised as either CDS or ADS based on the register tag. For evaluation purposes, we prepared a gold corpus that defines the gold standard for target word boundaries in the input utterances, and a unitised corpus used for algorithm training. The orthographic form of the data includes information on syllable and word boundaries, as well as the phones.
Performance evaluation for both phone-level and syllable-level algorithms involved comparing their segmented outputs against the gold standard corpus. Phone-level algorithms are denoted by a “_p” suffix and syllable-level algorithms by a “_s” suffix, with each being supplied with the corresponding unitised corpus at the phone or syllable level. We assessed performance using precision, recall, and F-scores, derived from the comparison of the segmented outputs to the gold corpus.
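As a concrete illustration of the scoring procedure (a minimal sketch, not the actual evaluation script; utterances are represented as lists of words over arbitrary basic units), boundary-level precision, recall, and F-score can be computed by comparing the cut positions implied by a segmenter's output with those in the gold corpus:

```python
def boundary_f_score(gold_utts, seg_utts):
    """Compare word-boundary positions (counted in basic units) in the
    segmented output against the gold standard; utterance edges are
    excluded, since they are given to every algorithm for free."""
    def boundaries(words):
        pos, cuts = 0, set()
        for w in words[:-1]:        # final word contributes no internal cut
            pos += len(w)
            cuts.add(pos)
        return cuts

    tp = fp = fn = 0
    for gold, seg in zip(gold_utts, seg_utts):
        g, s = boundaries(gold), boundaries(seg)
        tp += len(g & s)            # boundaries correctly posited
        fp += len(s - g)            # spurious boundaries (oversegmentation)
        fn += len(g - s)            # missed boundaries (undersegmentation)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f
```

An undersegmenting output scores high precision but low recall, while an oversegmenting one shows the reverse pattern, which is why the F-score serves as the headline measure.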
3. Results
3.1. Linguistic properties of CDS and ADS
We report relevant linguistic properties extracted from the Korean corpus in Table 1, providing means and standard deviations (SDs) accompanied by histograms generated using the skimr package in R (McNamara et al., Reference McNamara, Arino de la Rubia, Zhu, Ellis and Quinn2018). Building upon previous research by Cristia et al. (Reference Cristia, Dupoux, Ratner and Soderstrom2019) and Loukatou et al. (Reference Loukatou, Le Normand and Cristià2019), our study focuses on utterance and word length (measured in phones and syllables), the proportion of hapax legomena (words), one-word utterances, MATTR, and entropy, with the addition of the percentages of monosyllabic words, onomatopoeic words, word play, and interjections.
Table 1. Linguistic properties of the corpus, their mean and standard deviation, M (SD), across registers (CDS, ADS, and ADS Call Friend)

1 Not an exhaustive list.
2 Information not available due to the lack of annotation for this property in the Call Friend corpus.
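For readers less familiar with these diversity measures, MATTR and Shannon entropy can be sketched as follows (an illustrative toy implementation; the window size and tokenisation here are hypothetical, not those of the actual analysis):

```python
import math
from collections import Counter

def mattr(tokens, window=10):
    """Moving-average type-token ratio: the mean TTR over all sliding
    windows of fixed size, which controls for the well-known dependence
    of plain TTR on text length."""
    if len(tokens) <= window:
        return len(set(tokens)) / len(tokens)
    ratios = [len(set(tokens[i:i + window])) / window
              for i in range(len(tokens) - window + 1)]
    return sum(ratios) / len(ratios)

def shannon_entropy(tokens):
    """Shannon entropy (in bits) of the word-frequency distribution;
    lower values indicate more repetitive, predictable input."""
    n = len(tokens)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(tokens).values())
```

Lower MATTR and lower entropy in CDS would both reflect a smaller, more frequently repeated vocabulary.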
To compare property-related differences between the registers, we fitted a series of linear models for each linguistic property, using the metrics shown in Table 1. The dependent variable in each model was the linguistic property, with corpus register type (CDS, ADS, and ADS-CF) as the fixed factor. We also included the total token number as a covariate to account for differences in corpus size. Table 2 shows regression estimates for linguistic properties, comparing ADS and ADS-CF with CDS as the reference level, as well as the effect of corpus size (represented per 1,000 tokens). A positive estimate in the ADS or ADS-CF columns indicates that the respective register exhibits a higher value for that linguistic property compared to CDS. Negative estimates for corpus size indicate that as corpus size increases, the corresponding linguistic property value decreases.
Table 2. Summary of a series of linear regressions comparing differences in corpus properties categorised by registers (CDS, ADS, ADS-Call-Friend), with corpus size (total tokens) as the covariate. Values represent estimates with standard errors (SE) in parentheses

***p < 0.001; **p < 0.01; *p < 0.05.
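The structure of these models can be illustrated with a toy ordinary-least-squares fit in Python (a minimal sketch of the dummy coding, not the R models actually used; all data values below are invented). Register is coded with CDS as the reference level, and token count enters as a covariate:

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X)b = X'y,
    solved by Gaussian elimination with partial pivoting."""
    n, k = len(X), len(X[0])
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         for i in range(k)]
    b = [sum(X[r][i] * y[r] for r in range(n)) for i in range(k)]
    for i in range(k):                          # forward elimination
        p = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[p], b[i], b[p] = A[p], A[i], b[p], b[i]
        for r in range(i + 1, k):
            f = A[r][i] / A[i][i]
            for c in range(i, k):
                A[r][c] -= f * A[i][c]
            b[r] -= f * b[i]
    beta = [0.0] * k
    for i in reversed(range(k)):                # back substitution
        beta[i] = (b[i] - sum(A[i][c] * beta[c]
                              for c in range(i + 1, k))) / A[i][i]
    return beta

def design_row(register, tokens_per_1000):
    """Dummy-code register with CDS as the reference level:
    [intercept, ADS, ADS-CF, corpus-size covariate]."""
    return [1.0,
            1.0 if register == "ADS" else 0.0,
            1.0 if register == "ADS-CF" else 0.0,
            tokens_per_1000]
```

Under this coding, the fitted ADS and ADS-CF coefficients are exactly the register contrasts against CDS, adjusted for corpus size, mirroring how the estimates in Table 2 are to be read.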
The analysis of different linguistic properties across CDS and ADS reveals distinct patterns. ADS (and ADS-CF) is characterised by longer utterance and word length; greater lexical diversity, as evidenced by higher MATTR; and a larger proportion of hapax legomena. The higher entropy in ADS (Footnote 3) suggests more complex and less predictable language use compared to CDS, which may in part reflect the broader range of topics and socio-environmental contexts present in ADS samples. However, ADS-CF does not differ significantly from CDS in entropy, likely because of linguistic features that the two contexts share, deriving from the social distance between speakers. ADS also shows a lower proportion of one-word utterances than CDS, indicating that speakers are less likely to produce a single word when addressing an adult.
In summary, our findings reveal clear structural differences between CDS and ADS, underscoring how adults adapt their language to support children’s linguistic development. CDS is characterised by simpler, more repetitive patterns that likely cater to the developmental needs of young learners, while ADS is generally more complex and varied, reflecting typical adult communication. It is important to note that the ADS in the Ko corpus includes interactions with both family members and experimenters; these subsets were not separated in our segmentation analysis due to the limited corpus size, which may contribute to the observed complexity in ADS.
3.2. Segmentability of speech
To systematically investigate register-related variations (Figure 1) between CDS and ADS, and to assess differences among segmentation algorithms, we employed a linear mixed-effects regression model. This model analyses the effects of register type (CDS, ADS-CF, and ADS), algorithm type (nine types), and the unitised corpus (at phone-level and syllable-level) on word segmentation performance, quantified by F-scores (Footnote 4). We account for individual differences among dyads by incorporating a by-dyad random intercept into the model. The specific formula used for the model is as follows:
Figure 1. Scatter plot showing the distribution of raw F-scores measuring model performance across speech registers (CDS, ADS), speech processing algorithms, and phone/syllable units in the word segmentation simulations.

f_score ~ algorithm + register + phonological_unit + (1 | dyad).
Post-hoc analyses are conducted to examine the statistical contrasts between the estimated marginal means of each variable level (Figure 2).
Figure 2. Comparison of model-estimated marginal mean F-scores on the word segmentation simulations, across speech registers (CDS, ADS, ADS-CF), algorithms, and unitised type, with 95% confidence intervals. The purple bars represent 95% confidence intervals around each estimate. Red arrows highlight pairwise contrasts between conditions where a statistically significant difference was found.

Register effect. The model output reports a significant main effect of register type (χ2 = 711.38, df = 1, p < .001). The register effect showed overall better word segmentation performance in CDS (emmean = .70) than in ADS (emmean = .63, p < .001) and in ADS-CF (emmean = .62, p < .001), with no significant difference between ADS and ADS-CF (p = .347).
Algorithm differences. The model output reports a significant main effect of algorithm type (χ2 = 2210.52, df = 8, p < .001). Post-hoc comparisons between algorithms revealed that the baseline control measure was significantly lower than the other algorithms (p’s < .001), thus suggesting that the other algorithms were more effective in word segmentation. Notably, among the transitional probability (TP) algorithms, forward probabilities demonstrated greater segmentation performance (p’s < .001) than backward probabilities. Comparisons between the diphone-based (DiBS-phrasal and DiBS-lexical) algorithms showed that the phrasal variant outperforms the lexical variant (p’s < .001), suggesting that phrase-based DiBS is more effective at identifying word boundaries. Adaptor grammar consistently outperformed the other algorithms (p’s < .001), with lexical-DiBS and PUDDLE showing the poorest performance.
Phonological unit. We also found a significant main effect of the phonological unit (χ2 = 2981.28, df = 1, p < .001). Post-hoc comparisons of units showed overall better word segmentation performance for syllables (emmean = .70) than for phones (emmean = .60, p < .001). This syllable advantage aligns with the linguistic properties of the Korean language, which we discuss in detail in Section 4.4.1.
Mediation of corpus properties. This analysis examines whether corpus-level linguistic properties account for the observed differences in word segmentation performance between registers, essentially testing whether these properties mediate the relationship between register and algorithm performance. We investigated the mediation of corpus properties on the register effect in word segmentation performance, separately for phone-level and syllable-level units. For example, we compared the linear mixed model for the phone-based algorithms, with the F-score as the outcome and algorithm and register as predictors, to the same model with corpus properties added as covariates. These covariates included phone-based word and utterance length, and other common properties such as the percentage of hapax legomena, MATTR, entropy, and the percentages of monosyllabic words, isolated words, and interjections. The same modelling approach was applied to the syllable-based algorithms (see Figure 3). The inclusion of corpus properties significantly improved the models (phone: χ2 = 532.32, p < .001; syllable: χ2 = 465.80, p < .001). At the phone level, the inclusion of corpus properties reduced the CDS advantage over ADS from 0.12 to 0.05 (i.e., the difference in F-score emmeans) and over ADS-CF from 0.09 to 0.02, though both advantages remained statistically significant (ps < .05). At the syllable level, the inclusion of corpus properties rendered the CDS advantage over ADS non-significant, decreasing it from 0.09 to 0.02 (p = .062), while the advantage over ADS-CF decreased from 0.06 to 0.02 (p = .012). These results indicate that much of the segmentation advantage observed in CDS can be attributed to underlying corpus properties, highlighting their mediating role in shaping algorithmic performance across registers.
Figure 3. Predicted F-scores (emmeans) for word segmentation models comparing child-directed speech (CDS), adult-directed speech (ADS), and Call Friend ADS (ADS-CF) across base and mediation models using phone-level (left) and syllable-level (right) units. The inclusion of corpus-level linguistic properties as covariates in the mediation models reduces the CDS segmentation advantage over ADS and ADS-CF, indicating a mediating role of corpus properties in register effects. Error bars reflect estimated model uncertainty.

4. Discussion
We conducted a detailed examination of the linguistic properties of Korean CDS and ADS to explore their impact on word segmentation performance, providing new insights into how these characteristics influence segmentation processes. While our finding that CDS shows better segmentability than ADS is consistent with cross-linguistic expectations, since simpler, more repetitive speech is generally easier to segment, Korean presents a particularly interesting case due to its flexible word order and complex, agglutinative morphology. These structural features, along with other Korean-specific cues discussed below, likely interact with corpus properties to shape segmentation outcomes in ways that differ from more commonly studied Indo-European languages. Crucially, we utilised statistical modelling techniques to explore the relationship between linguistic properties and segmentation algorithms in assessing the potential advantages of CDS. This approach allowed us to examine the complex interplay of factors involved in early language processing.
4.1. Differences in linguistic properties of CDS and ADS
The corpus analysis reveals systematic differences between CDS and ADS that are demonstrated by complementary measures of lexical predictability and diversity. ADS is characterised by longer utterances and words, greater lexical diversity (reflected in higher MATTR), and a larger proportion of hapax legomena compared to CDS. These characteristics, along with higher entropy in ADS, suggested a more complex and less predictable linguistic environment, consistent with the broader vocabulary and intricate sentence structures characteristic of adult communication. In contrast, CDS demonstrates lower MATTR values, reduced entropy, and fewer hapax legomena, underscoring a fundamentally more repetitive and predictable linguistic strategy. These patterns indicate that CDS employs a restricted set of vocabulary with frequent repetition, creating a more learnable environment for children by enhancing predictability and supporting statistical learning. The reduced lexical diversity and increased repetitiveness in CDS, as captured by both MATTR and entropy, directly align with findings that children benefit from repeated exposure to vocabulary items, facilitating word acquisition through consistent linguistic input (Ko et al., Reference Ko, Jo and Chai2022). Thus, the interplay of these metrics not only highlights the distinctiveness of CDS but also underscores its adaptive role in language development by providing children with a structured and accessible linguistic environment.
Beyond these core measures of lexical diversity, the register differences extend to specific linguistic features that serve different communicative functions. Interjections are often neglected in linguistic research, typically considered marginal (Ameka, Reference Ameka1992; Dingemanse, Reference Dingemanse2024). In the current study, we found that ADS exhibited a lower percentage of one-word utterances but a higher percentage of isolated interjection words compared to CDS. While ADS may involve more complex language and a higher reliance on interjections as discourse markers, CDS tends to prioritise simplicity and clarity to support children’s language acquisition. This is evident in its fewer interjections, more one-word utterances, and the frequent use of onomatopoeic words and word play, which likely serve as attention-getters, simple directives, or labels for objects or actions. The simplicity of CDS contrasts with the increased complexity of ADS, where a broader vocabulary and more complex sentence structures are employed. This divergence in the linguistic properties of CDS and ADS highlights the inherent adaptability of language to the linguistic needs of interlocutors.
4.2. CDS advantage and mediation effect of corpus properties
Our investigation into word segmentation in Korean, utilising a spontaneous speech corpus, revealed superior results for CDS compared to ADS. These findings align with several prior studies (Batchelder, Reference Batchelder2002; Fourtassi et al., Reference Fourtassi, Borschinger, Johnson and Dupoux2013; Ludusan et al., Reference Ludusan, Mazuka, Bernard, Cristia and Dupoux2017; Ma et al., Reference Ma, Golinkoff, Houston and Hirsh-Pasek2011; Stärk et al., Reference Stärk, Kidd and Frost2022) and represent one of the first attempts to explore this phenomenon in the Korean language. However, it is important to note that there are conflicting reports; for instance, Cristia et al. (Reference Cristia, Dupoux, Ratner and Soderstrom2019) and Loukatou et al. (Reference Loukatou, Le Normand and Cristià2019) reported mixed or minimal advantages of CDS using similar algorithms. Despite replicating Cristia et al.’s (Reference Cristia, Dupoux, Ratner and Soderstrom2019) methods, our analysis of the Korean corpus demonstrated clear segmentation advantages for CDS. These cross-linguistic differences in CDS effectiveness are expected, as different language structures impact segmentability in varying ways. Korean exemplifies this principle particularly well, given its flexible syntax and complex agglutinative morphology, which create unique challenges and opportunities for word segmentation. The language’s SOV word order with flexible constituent placement, combined with rich inflectional morphology where multiple morphemes attach to stems, creates segmentation conditions that differ substantially from previously studied languages.
There are some notable differences between this study and Cristia et al. (Reference Cristia, Dupoux, Ratner and Soderstrom2019), such as language (current work: Korean; Cristia et al., Reference Cristia, Dupoux, Ratner and Soderstrom2019: English) and data size (i.e., based on the number of utterances; current work: ADS = 2,544; CDS = 22,203; Cristia et al., Reference Cristia, Dupoux, Ratner and Soderstrom2019: ADS = 1,772; CDS = 5,320). Although both studies used the same algorithms and matched spontaneous speech corpora, these differences appear to account for the register-based effects on word segmentation observed in our findings. The intrinsic attributes of CDS appear to facilitate word segmentation more effectively than ADS in Korean.
When we controlled for differences in linguistic properties between CDS and ADS, such as utterance length, word length, repetition, and lexical diversity, the segmentation advantage of CDS was reduced or, in the case of syllable-based models, became statistically non-significant. This indicates that the superior segmentability of CDS is not solely due to its register (i.e., being child-directed), but is largely mediated by specific corpus properties that make the input simpler, more repetitive, and more predictable for learners. In other words, we identified specific features of CDS, namely shorter utterances, higher repetition, and lower lexical diversity, that directly facilitate word segmentation. This finding highlights the importance of considering how measurable linguistic properties, rather than just social context, shape the learnability of speech input.
4.3. Comparative analysis of word segmentation algorithms
The interplay between word segmentation mechanisms and the linguistic properties of a language plays a crucial role in shaping how words are segmented from continuous speech. In this study, we selected a range of segmentation algorithms to probe which statistical cues are most informative for Korean, a head-final and morphologically rich language. TP and DiBS algorithms focus on local phonological and statistical regularities, such as syllable and diphone patterns, which are highly relevant given Korean’s predictable syllable structure. PUDDLE and AG algorithms, in contrast, operate at higher levels, incorporating longer-range phonotactic or hierarchical patterns and lexical storage, which may be less cognitively plausible for infants but allow us to test the informativeness of more global or memory-based cues. This comparative approach not only tests which statistical cues align best with Korean’s linguistic structure, but also allows us to consider how closely each algorithm approximates mechanisms that might be accessible to infants or language learners. Algorithms relying on local phonological patterns or simple transitional probabilities may better reflect cognitively plausible strategies available to early learners, whereas those requiring extensive lexical storage or hierarchical rule induction may be less so. This distinction is crucial for evaluating the relevance of different segmentation strategies to real-world language acquisition.
Forward versus backward probabilities: The impact of Korean’s head-final structure. In our study focusing on Korean language processing, we observed intricate interactions between forward and backward TPs and various corpus properties. These findings reflect Korean’s head-final structure, where forward contextual information is generally more predictive and informative than backward information. Our results align with previous evidence that forward TPs hold an advantage over backward TPs for languages like Korean (Onnis & Thiessen, Reference Onnis and Thiessen2013).
In the SOV word order of Korean, the verb occurs sentence-finally, often carrying critical semantic information. Forward TP algorithms, which calculate the probability of a word given the preceding word(s), can effectively capture these predictive cues. For example, in the sentence 나는 사과를 먹었다 na-nun sakwa-lul mek-ess-ta “I-TOPIC apple-ACC eat-PAST-DECL; I ate an apple,” the forward TP from the object 사과를 (sakwa-lul, “apple”) to the verb 먹었다 (mek-ess-ta, “ate”) is likely to be high because objects typically precede verbs, strongly predicting their occurrence. In contrast, backward TP algorithms, which calculate the probability of a word given the following word(s), may be less effective in Korean. Because Korean verbs appear at the end, they offer fewer precise cues to identify preceding words. Although the verb 먹었다 (mek-ess-ta, “ate”) suggests an eating-related context, it provides limited clues to predict specifically which words preceded it, such as 나 (na, “I”) or 사과 (sakwa, “apple”). This predictive relationship also applies at the word level. Consider the noun phrase 학교에 (hakkyo-ey, “at school”), composed of 학교 (hakkyo, “school”) followed by the postposition 에 (ey, “at”). Here, the preceding noun strongly predicts the following postposition, resulting in clear forward transitional probabilities at the word boundary.
The SOV word order and the use of grammatical markers and case particles in Korean create a linguistic environment where the preceding context is more informative for predicting upcoming words than the following context, aligning well with the strengths of forward TP algorithms. This alignment potentially explains their superior performance compared to backward TP algorithms in Korean word segmentation tasks.
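This asymmetry can be made concrete with a toy computation of absolute forward and backward TPs over syllable sequences (a simplified sketch; the simulations also included relative variants and threshold or local-minimum boundary decisions):

```python
from collections import Counter

def transitional_probs(utterances):
    """Absolute TPs over adjacent units.
    Forward:  TP(x -> y) = count(x, y) / count(x as left element)
    Backward: TP(x <- y) = count(x, y) / count(y as right element)
    A word boundary is typically posited where the TP is low."""
    bigrams, left, right = Counter(), Counter(), Counter()
    for utt in utterances:          # each utterance: a list of syllables
        for x, y in zip(utt, utt[1:]):
            bigrams[(x, y)] += 1
            left[x] += 1
            right[y] += 1
    fwd = {bg: c / left[bg[0]] for bg, c in bigrams.items()}
    bwd = {bg: c / right[bg[1]] for bg, c in bigrams.items()}
    return fwd, bwd
```

On a toy corpus where the syllables of sakwa (“apple”) always co-occur but the following particle varies, the forward TP within the noun stays at 1.0 while the TP into the particle drops, flagging the word boundary.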
While TPs can, in theory, be computed at multiple linguistic levels (phonetic, syllabic, morphological, and lexical), our study focuses on the phonetic and syllabic levels, where computational methods and theoretical foundations are most robust. Extending TP-based segmentation to higher levels, such as morphology and words, presents additional complexities, including challenges in consistently defining and classifying boundaries, and thus remains an active area of research.
Phrasal-based versus lexical-based DiBS: The role of phonological cues. The superior performance of the phrasal-based DiBS over the lexical-based approach in Korean highlights the reliance on phonetic and phonological cues such as syllable structure and phonotactics at phrasal edges, rather than on lexical information, for word segmentation. This finding aligns with the phonological properties and the agglutinative nature of the Korean language, which has a relatively simple and consistent syllable structure, but a rich set of phonotactic constraints that govern the permissible combinations of phonemes within and across phrasal boundaries. For example, diphone sequences such as [sa] or [cha] never occur across word or phrasal boundaries due to the coda neutralisation rule applying to laryngeal consonants. On the other hand, sequences such as [ln] or [ng] are allowed only across phrasal boundaries, which are also word boundaries. These phonotactic regularities provide infants with a preliminary guide to word demarcations (Daland & Pierrehumbert, Reference Daland and Pierrehumbert2011). This is likely to be particularly beneficial in Korean, where the AP, the domain of many phonological rules, is also demarcated by intonational cues in speech (Jun, Reference Jun1998) and alignment with strong beats in children’s songs (Ko, Reference Ko2024).
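The phrasal-DiBS logic can be sketched as follows (a simplified toy version in the spirit of DiBS, Daland & Pierrehumbert, Reference Daland and Pierrehumbert2011: phrase edges are treated as known word boundaries, an independence assumption combines the edge statistics, and the `prior` boundary rate is an invented placeholder value):

```python
from collections import Counter

def phrasal_dibs(phrases, prior=0.3):
    """Estimate P(word boundary | diphone x.y) from phrase-edge statistics:
        P(# | x, y) ~= P(x is phrase-final) * P(y is phrase-initial) * prior
                       / P(x, y)
    Diphones that rarely occur phrase-internally but whose halves are
    common at phrase edges receive high boundary probabilities."""
    finals, initials, diphones = Counter(), Counter(), Counter()
    for phones in phrases:              # each phrase: a list of phones
        finals[phones[-1]] += 1
        initials[phones[0]] += 1
        for x, y in zip(phones, phones[1:]):
            diphones[(x, y)] += 1
    n_phrases = len(phrases)
    total = sum(diphones.values())
    return {(x, y): min(1.0, (finals[x] / n_phrases)
                             * (initials[y] / n_phrases)
                             * prior / (c / total))
            for (x, y), c in diphones.items()}
```

A word boundary would then be posited wherever the estimated probability exceeds a decision threshold such as 0.5.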
Our results demonstrate that phonological cues, such as syllable structure and phonotactics, are a highly effective source of information for word segmentation in Korean, especially for infants who, early in development, may not yet rely on lexical-based strategies. These bottom-up cues from the speech input help children discern word boundaries and support early word learning. While our findings are specific to Korean, it is important to note that transitional probabilities and phonological cues at this level are also recognised as valuable segmentation sources in other languages, including English, though the degree of reliance may vary with linguistic structure.
Adaptor grammar advantage. The AG algorithm’s superior performance in Korean word segmentation stems from its use of Pitman-Yor process-based adaptors that hierarchically cache and reuse frequent linguistic structures. This creates a “rich get richer” dynamic where common Korean structures like case markers (이/가, 은/는) become increasingly probable through repeated caching. This hierarchical caching mechanism is particularly effective for Korean’s agglutinative morphology, where morpheme boundaries often align with syllable boundaries, allowing AG to learn predictable attachment patterns without requiring preset linguistic unit definitions. While such alignment could theoretically increase the risk of oversegmentation in morphologically rich languages (Loukatou et al., Reference Loukatou, Le Normand and Cristià2019, Reference Loukatou, Stoll, Blasi and Cristia2022b), AG’s multi-level statistical learning and ability to identify co-occurring morpheme sequences help mitigate this risk by favouring linguistically meaningful units over arbitrary boundaries. The Bayesian nonparametric framework enables flexible adaptation to Korean’s complex morphological structure by automatically discovering optimal segmentation patterns from the data.
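The caching dynamic can be illustrated with a minimal Pitman-Yor sampler (a toy, one-table-per-type simplification of the process, not the AG inference procedure itself; the base vocabulary and parameter values below are invented):

```python
import random

def pitman_yor_sample(n_draws, base_vocab, a=0.5, b=1.0, seed=0):
    """Draw items from a Pitman-Yor process with discount a and
    concentration b over a uniform base distribution. Previously drawn
    items are reused with probability proportional to (count - a),
    so frequent items become ever more likely: 'rich get richer'."""
    rng = random.Random(seed)
    counts, draws = {}, []
    for _ in range(n_draws):
        p_new = (b + a * len(counts)) / (len(draws) + b)
        if not counts or rng.random() < p_new:
            item = rng.choice(base_vocab)        # fresh draw from the base
        else:
            items = list(counts)
            item = rng.choices(items,            # reuse a cached item
                               weights=[counts[i] - a for i in items])[0]
        counts[item] = counts.get(item, 0) + 1
        draws.append(item)
    return draws, counts
```

In AG, the cached units are not atomic symbols but whole subtrees, so frequently re-derived strings, such as a stem plus case marker, come to behave like stored lexical chunks.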
4.4. Cross-linguistic implications
Understanding word segmentation requires recognising that languages differ not only in their structure but also in the cues they make available to learners. Specifically, while Korean’s clear syllable structure and agglutinative morphology make syllables especially salient for segmentation, other languages highlight different cues: French and Spanish listeners benefit from syllable boundaries (Cutler & Carter, Reference Cutler and Carter1987), English relies more on stress patterns (Cutler, Reference Cutler, Reed and Levis2015; Cutler & Norris, Reference Cutler and Norris1988), and languages like Turkish utilise vowel harmony (Hohenberger et al., Reference Hohenberger, Altan, Kaya, Tuncer and Avcu2016) or French employs final lengthening (Welby, Reference Welby2007) as segmentation aids. These differences are not merely theoretical; experimental evidence from artificial-language learning studies shows that when presented with identical speech streams, listeners from different language backgrounds preferentially use the cues most prominent in their native language (Spanish in Toro-Soto et al., Reference Toro-Soto, Rodríguez-Fornells and Sebastián-Gallés2007; English, French, and Dutch in Tyler & Cutler, Reference Tyler and Cutler2009; German and Italian in Ordin & Nespor, Reference Ordin and Nespor2016). Thus, the optimal segmentation strategy is shaped by the typological features of each language. In the following subsections, we detail how Korean’s linguistic properties shape segmentation strategies, focusing on the role of syllables, the influence of prosody and statistical cues, and the importance of one-word utterances in CDS.
Advantages of syllable-level word segmentation in Korean. Korean primarily employs simple syllable structures, such as CV and CVC, which stand in contrast to the more complex syllable forms typical in English. As a syllable-timed language, each syllable in Korean tends to occupy a relatively uniform amount of time, which makes syllable-based processing both natural and effective, closely aligning with the language’s morpho-phonemic attributes. These syllables serve not merely as rhythmic units but often function as morphemes or distinct word parts, thereby enhancing the utility of syllable-based analytical approaches. Each syllable carries significant morphemic information that is crucial for comprehending the language’s morphology and syntax (Kim, Reference Kim2004).
The significance of the syllable as a unit of psychological processing is further exemplified in Korean traditional poetry, such as Sijo, where syllable count plays a critical role analogous to the use of meter in English formal poetry. In linguistic games and wordplay, syllables similarly hold distinct importance. For instance, in 끝말잇기 kkeutmal-ittgi ‘end-word connecting,’ or word chain games, play is strictly based on syllables, in contrast to the segmental-level manipulation in Pig Latin in English (Footnote 5). This emphasis on syllables underpins a richer linguistic interaction and highlights their essential role in shaping Korean linguistic structure and psychological processing. More broadly, this illustrates how children can flexibly learn from the structures that provide the most salient evidence for segmentation in their language, with syllables serving as especially informative units in Korean.
In addition, the poor performance of lexical algorithms, such as the lexical-DiBS and PUDDLE algorithms, in Korean word segmentation may be attributed to their inability to fully capture the statistical cues and linguistic patterns specific to the language. Lexical-DiBS relies on lexical-related information and may struggle with the agglutinative nature of Korean, where words are formed by combining multiple morphemes. PUDDLE’s incremental learning approach may not align well with the statistical properties of Korean, where the agglutinative nature of grammatical markers and particles could potentially bias the algorithm’s lexicon construction. These findings suggest that infants may not rely solely on lexical information or incremental learning strategies for word segmentation in languages with complex morphological structures.
Notably, these structural and psychological properties that make syllables advantageous for word segmentation in Korean are not unique to this language. Similar syllable-morpheme alignment and syllable-based processing advantages are also found in other languages, such as Chinese, suggesting broader cross-linguistic relevance for future research.
Word segmentation in Korean: statistical cues and beyond. Another aspect to consider is the influence of prosody. Korean deviates significantly from languages like English, where lexical stress (Cutler, Reference Cutler, Reed and Levis2015) serves as one important cue among multiple segmentation strategies, including transitional probabilities and phonotactic patterns (Cutler & Norris, Reference Cutler and Norris1988; Saffran et al., Reference Saffran, Aslin and Newport1996; Thiessen & Saffran, Reference Thiessen and Saffran2003). Importantly, both Korean and English provide rich distributional structure that is, in principle, usable for segmentation; however, the relative weighting and availability of specific cues differ across languages. In Korean, the absence of lexical stress precludes stress-based segmentation strategies, potentially increasing learners’ reliance on alternative sources of information, such as lexical repetition and prosodic grouping. From this perspective, utterance-level prosodic organisation itself becomes a primary locus of segmentation-relevant information.
From an interactional perspective, such learner-side pressures plausibly invite systematic adjustments in caregiver speech, a pattern consistent with longstanding accounts of CDS as an accommodated register shaped in response to children’s linguistic capacities (Giles et al., Reference Giles, Taylor and Bourhis1973; Snow, Reference Snow1972, Reference Snow, Snow and Ferguson1977). In Korean, where clause structure is canonically verb-prominent with an SOV order, nouns in CDS are often rendered salient through repetition and flexible ordering, and may surface in utterance-final positions associated with heightened perceptual salience (Ko et al., Reference Ko, Jo and Chai2022). This pattern illustrates how prosodically prominent edges can be populated with salient lexical material, which may in turn strengthen the association between structural position and perceptual prominence. Taken together, these cues point to a segmentation strategy in Korean that is grounded in prosodic phrasing (Kim & Cho, Reference Kim and Cho2009) and edge-prominence (Jun, Reference Jun and Jun2014) rather than in lexical stress, and thus reflect a typologically distinct pathway to early segmentation relative to stress-timed languages such as English.
While the role of onomatopoeic words and word play in word segmentation was not directly examined in the current study, a related study by Chai and Ko (Reference Chai and Ko2025) did not find a positive association between these features and word segmentation performance. This pattern raises the possibility that such features exert their influence primarily through acoustic salience or attentional engagement, rather than by systematically shaping the distributional cues that computational and behavioural models of segmentation typically exploit. Although the current study does not evaluate the specific impact of onomatopoeia, word play, and expressive lengthening on word segmentation directly, their presence in CDS highlights the unique characteristics of speech directed to children and the potential for these features to influence language acquisition through acoustic and attentional mechanisms.
The prevalence of one-word utterances in CDS varies across languages and studies, owing in part to differing criteria for defining these utterances (Stärk et al., Reference Stärk, Kidd and Frost2022). Some studies use inclusive definitions, while others employ stricter criteria (Brent & Siskind, Reference Brent and Siskind2001). The current study adopted a conservative approach for the Korean corpus by excluding interjections from the count of one-word utterances. This exclusion yielded a lower percentage of one-word utterances than reported in French and German studies that used more inclusive criteria; when isolated interjections are included, however, the percentage aligns with that of the German study (Stärk et al., Reference Stärk, Kidd and Frost2022). These variations underscore the importance of explicit definitions for cross-linguistic comparison and for understanding language-specific factors in CDS structure. A refined analysis distinguishing strict single-word utterances from isolated interjections is crucial for understanding their potentially differential effects on word segmentation.
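The effect of the counting criterion can be made concrete with a minimal sketch. The utterances and the interjection list below are hypothetical romanised examples, not items from the Ko corpus; only the contrast between the inclusive and the conservative criterion is at issue.

```python
# Sketch: how the one-word-utterance rate shifts under inclusive vs. strict
# criteria. Utterances and interjections are hypothetical romanised examples.
INTERJECTIONS = {"eung", "eo", "ne", "wa"}  # hypothetical interjection inventory

utterances = [
    ["mamma"],           # single content word
    ["eung"],            # isolated interjection
    ["kkoch", "ipeyo"],  # multi-word utterance
    ["wa"],              # isolated interjection
    ["kangaci"],         # single content word
]

def one_word_rate(utts, exclude_interjections):
    """Proportion of one-word utterances under a given criterion."""
    singles = [u for u in utts if len(u) == 1]
    if exclude_interjections:
        singles = [u for u in singles if u[0] not in INTERJECTIONS]
    return len(singles) / len(utts)

inclusive = one_word_rate(utterances, exclude_interjections=False)
strict = one_word_rate(utterances, exclude_interjections=True)  # conservative
print(f"inclusive: {inclusive:.0%}, strict: {strict:.0%}")  # inclusive: 80%, strict: 40%
```

On this toy input the rate halves under the conservative criterion, which is the kind of gap that makes cross-study comparison hazardous without explicit definitions.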
Beyond methodological considerations, isolated words function as perceptual anchors that facilitate speech segmentation. Research has shown that these “anchor words,” previously heard in isolation, help infants segment novel words from continuous speech (Cunillera et al., Reference Cunillera, Laine and Rodriguez-Fornells2016; Lew-Williams et al., Reference Lew-Williams, Pelucchi and Saffran2011). This anchoring mechanism operates through a bootstrapping process: words encountered in isolation become perceptually familiar and subsequently “pop out” when heard embedded within continuous utterances, providing reference points that help infants identify adjacent word boundaries. This process may be particularly helpful in Korean, given its complex interplay between morphological structure and prosodic units, such as APs. Due to Korean’s agglutinative morphology and the frequent, but not consistent, alignment of content words with AP boundaries, isolated words in Korean CDS likely offer crucial perceptual anchors, aiding infants in parsing the language’s morphological and prosodic complexity.
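The bootstrapping process described above can be sketched in a few lines: a word first encountered in isolation is matched inside a continuous syllable stream, and its edges are proposed as candidate boundaries for the neighbouring material. The syllable strings are hypothetical romanised examples, and the sketch is an illustration of the mechanism, not any model used in the study.

```python
# Sketch of the anchoring mechanism: known isolated words are located in a
# continuous syllable stream, and their edges yield candidate boundaries.
def anchor_boundaries(syllables, anchors):
    """Return boundary indices implied by known anchor words in the stream."""
    boundaries = set()
    for anchor in anchors:
        n = len(anchor)
        for i in range(len(syllables) - n + 1):
            if tuple(syllables[i:i + n]) == tuple(anchor):
                boundaries.add(i)       # boundary before the anchor
                boundaries.add(i + n)   # boundary after the anchor
    return sorted(boundaries)

# Suppose "kkoch" was heard in isolation; now it appears mid-utterance.
stream = ["i", "ge", "kkoch", "i", "ya"]
anchors = [["kkoch"]]
print(anchor_boundaries(stream, anchors))  # [2, 3]
```

The boundaries flanking the familiar word also delimit the adjacent stretches, which is exactly the reference-point role that the behavioural findings attribute to anchor words.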
4.5. Limitations and future avenues
The current study highlights the significance of language-specific structural organisation in shaping segmentation cues, while also revealing limitations that call for further work to disentangle typological effects from methodological variation. Infants universally exploit statistical regularities in broadly similar ways; however, the particular cues that instantiate these regularities differ across languages. Korean infants, for instance, may rely on syllable structure and prosodic boundaries (Jun, Reference Jun1998; Kim & Cho, Reference Kim and Cho2009), whereas English infants draw on lexical stress and strong–weak syllable patterns (Cutler & Norris, Reference Cutler and Norris1988; Jusczyk et al., Reference Jusczyk, Houston and Newsome1999). Other languages highlight different cues, such as vowel harmony in Turkish (Hohenberger et al., Reference Hohenberger, Altan, Kaya, Tuncer and Avcu2016), final lengthening in French (Welby, Reference Welby2007), or pitch movements in Dutch (Johnson & Seidl, Reference Johnson and Seidl2009). Although this study highlights segmentation cues specific to Korean, we did not directly compare these processes across languages. Future research should systematically examine how language-specific structures shape statistical segmentation cues and how infants adapt accordingly.
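The shared statistical regularity underlying these language-specific cues is often modelled as transitional probabilities (TPs) between adjacent syllables, with boundaries posited where TP dips. A minimal sketch over a toy romanised corpus (hypothetical, not data from this study) shows the contrast between within-word and across-word TPs:

```python
from collections import Counter

# Minimal transitional-probability sketch: P(next | current) over adjacent
# syllables, estimated from a toy romanised corpus (hypothetical data).
def transitional_probs(corpus):
    """Estimate P(b | a) for every adjacent syllable pair (a, b)."""
    pair_counts, syll_counts = Counter(), Counter()
    for utt in corpus:
        for a, b in zip(utt, utt[1:]):
            pair_counts[(a, b)] += 1
            syll_counts[a] += 1
    return {pair: c / syll_counts[pair[0]] for pair, c in pair_counts.items()}

corpus = [
    ["eom", "ma", "kkoch"],
    ["eom", "ma", "mul"],
    ["kkoch", "eom", "ma"],
]
tps = transitional_probs(corpus)
# Within-word TP is high: "eom" is always followed by "ma".
# Across-word TPs are lower: "ma" is followed by varying syllables.
print(tps[("eom", "ma")])  # 1.0
```

A TP-based learner would posit a boundary after "ma", where the probability dips; which surface units carry this computation (stressed feet, syllables, AP edges) is precisely what differs across languages.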
Additionally, the study highlights the importance of considering the social context and the nature of the linguistic input when investigating word segmentation. While the data were collected in a semi-naturalistic setting, the ADS corpus was smaller than the CDS corpus. The ADS in the Ko corpus included speech involving both family members and experimenters, but these subsets were not separated in the segmentation analysis because of the small corpus size. As a result, social distance could not be fully controlled when the two types of ADS were merged for the segmentation analyses. Future research should collect larger and more balanced corpora to enable a more fine-grained analysis of the linguistic properties and segmentability of speech across different social contexts and speaker–listener relationships.
While age effects were omitted from the current study due to space constraints and are detailed in a separate study by Chai & Ko (Reference Chai and Ko2025), investigating age-related changes in the statistical properties of CDS and their impact on word segmentation performance could provide valuable insights into the developmental trajectory of language acquisition.
This study’s findings underscore the intricate interplay between the language input and the assumptions and mechanisms of different word segmentation algorithms. Future research should systematically vary linguistic properties like the percentage of monosyllabic words, syllable complexity, and word length distribution to better understand how these factors interact with various algorithms. Additionally, investigating these effects in naturalistic language input across different languages could provide valuable insights into the generalisability of these findings and the potential role of language-specific adaptations in word segmentation strategies.
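Systematically comparing algorithms across manipulated inputs presupposes a shared scoring metric. A common choice in the segmentation literature is boundary precision, recall, and F1 against gold word boundaries; the sketch below illustrates the computation on hypothetical segmentations, and is not the study's own evaluation code.

```python
# Boundary F1 sketch: score a predicted segmentation against gold boundaries.
# Each segmentation is a list of words, each word a list of syllables.
def boundary_set(words):
    """Word-internal boundary positions (in syllables) for a segmentation."""
    positions, i = set(), 0
    for w in words[:-1]:
        i += len(w)
        positions.add(i)
    return positions

def boundary_f1(predicted, gold):
    """Harmonic mean of boundary precision and recall."""
    p_b, g_b = boundary_set(predicted), boundary_set(gold)
    hits = len(p_b & g_b)
    precision = hits / len(p_b) if p_b else 0.0
    recall = hits / len(g_b) if g_b else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = [["a", "ppa"], ["mul"], ["ju", "e"]]   # gold: appa | mul | jue
pred = [["a"], ["ppa", "mul"], ["ju", "e"]]   # first boundary misplaced
print(boundary_f1(pred, gold))  # 0.5
```

Holding the metric fixed while varying, say, the proportion of monosyllabic words in the input is what would isolate the property-by-algorithm interactions discussed above.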
The definition of words in this study was based on their orthographic representation, specifically the spacing between words in the transcripts. In agglutinative languages like Korean, a single orthographic word can encompass multiple morphemes, conveying what might be several words’ worth of meaning in non-agglutinative languages such as English. Conversely, Korean has a category of nouns called dependent nouns, which are always cliticised but are orthographically represented as independent words. For example, in a phrase like 할 수 있다 hal swu iss-ta “do way exist; can do,” the dependent noun 수 swu must be integrated with a verb to form a meaningful unit. The resulting phrase, though syntactically complex, forms a single AP, a prosodic unit demarcated by intonation. Given the important role APs play in Korean phonology, infants might segment the speech stream based on the AP rather than on individual words or morphemes.
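The gap between the two criteria can be shown directly on the example from the text: the orthographic criterion used in the study treats each space as a boundary, whereas an AP-based criterion would treat the whole dependent-noun construction as one unit (the AP grouping shown is illustrative).

```python
# Orthographic vs. prosodic word counts for a dependent-noun construction.
# The AP grouping is illustrative of the single-AP analysis described above.
phrase = "할 수 있다"                 # hal swu iss-ta, "can do"
orthographic_words = phrase.split()   # spacing-based criterion -> 3 units
ap_units = [phrase.replace(" ", "")]  # single Accentual Phrase -> 1 unit
print(len(orthographic_words), len(ap_units))  # 3 1
```

Depending on which criterion a learner (or a model) adopts, the same utterance thus presents three targets or one, which is why the choice of segmentation unit matters for evaluating performance.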
Future research should identify and validate the optimal segmentation units for Korean, considering how prosody, morphology, and perceptual cues interact. Although we assessed segmentation using adult-defined word boundaries, infants may segment speech using units that do not correspond neatly to traditional linguistic categories. Thus, methodologies such as experimental paradigms that examine infants’ sensitivity to various boundary types or computational models allowing flexible, multi-level segmentation may be needed. Leveraging raw audio data could also offer a more ecologically valid approach, revealing whether infants segment speech according to conventional linguistic units or whether segmentation emerges primarily from statistical and acoustic regularities.
5. Conclusion
This study provides a detailed analysis of word segmentation in Korean, showing that features of CDS, such as shorter, simpler utterances, higher repetition, and more frequent single-word utterances, enhance segmentability compared to ADS. These findings highlight clear structural differences between CDS and ADS, reflecting how adults adapt their language to support children’s linguistic needs. We demonstrate that the segmentation advantage of CDS is mediated by specific linguistic properties, and that these properties interact with segmentation algorithms in language-specific ways; the advantage is thus not merely a product of corpus properties but also of how those properties interact with computational models designed to simulate human linguistic processing. While our results clarify the importance of Korean-specific cues such as syllable structure, morphological complexity, and repetition, we acknowledge that the study does not include direct cross-linguistic comparisons or address all possible segmentation cues, such as morpheme-level boundaries or prosodic units. Future research should investigate which segmentation cues are most reliable for Korean-learning children and how these differ from those in other languages, and should extend the analysis to additional linguistic units and to distinctions within ADS, to deepen our understanding of word segmentation mechanisms in Korean.
Data availability statement
The data and analysis files associated with this study are available through the Open Science Framework at https://osf.io/uadw9/.
Funding statement
This work was supported by the National Research Foundation of Korea (NRF-2025S1A6B5A02004207) and a research fund from Chosun University, 2024.
Competing interests
The authors declare none.