Largest-chunk strategy for syllable-based segmentation


We apply the largest-chunk segmentation algorithm to texts whose smallest units are syllables. The algorithm was proposed in Drienkó (2016, 2017a), where it was applied to texts whose smallest units were letters/characters. The present study investigates whether the largest-chunk segmentation strategy yields higher precision of boundary inference when syllables are processed rather than characters. The algorithm looks for successive largest chunks that occur at least twice in the text, where text means a single sequence of characters, without punctuation or spaces. The results are quantified in terms of four precision metrics: Inference Precision, Alignment Precision, Redundancy, and Boundary Variability. We segment CHILDES texts in four languages: English, Hungarian, Mandarin, and Spanish. The data suggest that syllable-based segmentation enhances inference precision. Thus, our experiments (i) provide further support for the possible role of a cognitive largest-chunk segmentation strategy, (ii) point to the syllable as a better unit for segmentation than the letter/phoneme/character, and (iii) do so in a cross-linguistic context.
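The idea of looking for successive largest chunks that occur at least twice can be illustrated with a minimal sketch. This is not the published implementation: it assumes a greedy left-to-right scan, counts overlapping occurrences, and falls back to a single unit when no repeated chunk starts at the current position. The function names `count_occurrences` and `largest_chunk_segment` are hypothetical. Because the input is a sequence of units, the same code accepts a string of characters or a list of syllables.

```python
from typing import List, Sequence, Tuple


def count_occurrences(units: Sequence[str], chunk: Sequence[str]) -> int:
    """Count (possibly overlapping) occurrences of chunk within units."""
    n, k = len(units), len(chunk)
    target = tuple(chunk)
    return sum(1 for i in range(n - k + 1) if tuple(units[i:i + k]) == target)


def largest_chunk_segment(units: Sequence[str]) -> List[Tuple[str, ...]]:
    """Greedy sketch: at each position take the longest chunk seen >= 2 times.

    Assumption (not stated in the abstract): when no chunk of two or more
    units starting here is repeated, a single unit is emitted as a segment.
    """
    units = tuple(units)
    segments: List[Tuple[str, ...]] = []
    pos = 0
    while pos < len(units):
        best = 1  # fallback: a single unit
        # Try candidate lengths from longest to shortest.
        for length in range(len(units) - pos, 1, -1):
            if count_occurrences(units, units[pos:pos + length]) >= 2:
                best = length
                break
        segments.append(units[pos:pos + best])
        pos += best
    return segments


if __name__ == "__main__":
    # Character-based input: a string is treated as a sequence of characters.
    print(largest_chunk_segment("thedogthecat"))
    # Syllable-based input: a list of (hypothetical) syllables.
    print(largest_chunk_segment(["the", "dog", "gy", "the", "dog"]))
```

The sketch favours clarity over speed: every candidate chunk is recounted from scratch, which is acceptable for short texts but would need an index (for example a suffix array) for larger ones.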

Babarczy, A. (2006). The development of negation in Hungarian child language. Lingua, 116, 377–392.
Bagou, O., Fougeron, C., & Frauenfelder, U. H. (2002). Contribution of prosody to the segmentation and storage of ‘words’ in the acquisition of a new mini-language. Paper presented at Speech Prosody 2002, Aix-en-Provence, France, 11–13 April.
Bell, A., & Hooper, J. B. (1978). Issues and evidence in syllabic phonology. In Bell, A. & Hooper, J. B. (Eds.), Syllables and segments (pp. 3–22). Amsterdam: North-Holland.
Cholin, J. (2011). Do syllables exist? Psycholinguistic evidence for the retrieval of syllabic units in speech production. In Cairns, Ch. E. & Raimy, E. (Eds.), Handbook of the syllable (pp. 225–253). Leiden: Koninklijke Brill NV.
Cutler, A., & Carter, D. M. (1987). The predominance of strong initial syllables in English vocabulary. Computer Speech and Language, 2, 133–142.
Cutler, A., & Norris, D. G. (1988). The role of strong syllables in segmentation for lexical access. Journal of Experimental Psychology: Human Perception and Performance, 14, 113–121.
Drienkó, L. (2016). Discovering utterance fragment boundaries in small unsegmented texts. In Takács, A., Varga, V., & Vincze, V. (Eds.), XII. Magyar Számítógépes Nyelvészeti Konferencia [12th Hungarian Computational Linguistics Conference] (pp. 273–281). Szeged: University of Szeged. Online: <>.
Drienkó, L. (2017a). Largest chunks as short text segmentation strategy: a cross-linguistic study. In Wallington, A., Foltz, A., & Ryan, J. (Eds.), Selected papers from the 6th UK Cognitive Linguistics Conference (pp. 273–292).
Drienkó, L. (2017b). Syllable-based largest-chunk segmentation. Poster presentation for the Linguistics Beyond and Within (LingBaW) Conference, 18–19 October 2017, Lublin, Poland.
Eimas, P. D. (1997). Infant speech perception: processing characteristics, representational units, and the learning of words. In Goldstone, R. L., Schyns, P. G., & Medin, D. L. (Eds.), The psychology of learning and motivation, vol. 36 (pp. 127–169). London: Academic Press.
Harris, Z. S. (1955). From phoneme to morpheme. Language, 31, 190–222.
Johnson, E. K., Seidl, A., & Tyler, M. D. (2014). The edge factor in early word segmentation: utterance-level prosody enables word form extraction by 6-month-olds. PLoS ONE, 9(1), e83546. Online: <>.
Jusczyk, P. W., Friederici, A. D., Wessels, J. M. I., Svenkerud, V. Y., & Jusczyk, A. M. (1993). Infants’ sensitivity to the sound patterns of native language words. Journal of Memory and Language, 32(3), 402–420.
Liberman, I. Y., Shankweiler, D., Fischer, F. W., & Carter, B. (1974). Explicit syllable and phoneme segmentation in the young child. Journal of Experimental Child Psychology, 18(2), 201–212.
Livingstone, J. (2014). Do syllables exist? The Guardian, 25 June. Online: <>.
MacWhinney, B. (2000). The CHILDES Project: tools for analyzing talk (3rd ed.) (Vol. 2): The database. Mahwah, NJ: Lawrence Erlbaum Associates.
Mattys, S. L., White, L., & Melhorn, J. F. (2005). Integration of multiple speech segmentation cues: a hierarchical framework. Journal of Experimental Psychology: General, 134(4), 477–500.
Mehler, J., Dupoux, E., & Segui, J. (1990). Constraining models of lexical access: the onset of word recognition. In Altmann, G. T. M. (Ed.), Cognitive models of speech processing (pp. 236–262). Cambridge, MA: MIT Press.
Montes, R. G. (1987). Secuencias de clarificación en conversaciones con niños. Morphe 3/4, Universidad Autónoma de Puebla.
Montes, R. G. (1992). Achieving understanding: repair mechanisms in mother–child conversations. Unpublished doctoral dissertation, Georgetown University.
Peters, A. (1983). The units of language acquisition. Cambridge: Cambridge University Press.
Réger, Z. (1986). The functions of imitation in child language. Applied Psycholinguistics, 7, 323–352.
Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science, 274(5294), 1926–1928.
Seidl, A., & Johnson, E. K. (2006). Infant word segmentation revisited: edge alignment facilitates target extraction. Developmental Science, 9, 565–573.
Sonderegger, M. (2008). Infant word segmentation: a basic review. Online:<∼morgan/segReview.pdf>.
Swift, J. (n.d.). Gulliver’s Travels. The Project Gutenberg eBook. Online:<>.
Tardif, T. (1993). Adult-to-child speech and language acquisition in Mandarin Chinese. Unpublished doctoral dissertation, Yale University.
Tardif, T. (1996). Nouns are not always learned before verbs: evidence from Mandarin speakers’ early vocabularies. Developmental Psychology, 32, 492–504.
Theakston, A. L., Lieven, E. V., Pine, J. M., & Rowland, C. F. (2001). The role of performance limitations in the acquisition of verb-argument structure: an alternative account. Journal of Child Language, 28(1), 127–152.
Thiessen, E. D., & Saffran, J. R. (2007). Learning to learn: infants’ acquisition of stress-based strategies for word segmentation. Language Learning and Development, 3(1), 73–100.
Language and Cognition
  • ISSN: 1866-9808
  • EISSN: 1866-9859
  • URL: /core/journals/language-and-cognition