
A learning perspective on the emergence of abstractions: the curious case of phone(me)s

Published online by Cambridge University Press:  03 May 2023

Petar Milin*
Affiliation:
Department of Modern Languages, University of Birmingham, Birmingham, UK
Benjamin V. Tucker
Affiliation:
Department of Linguistics, University of Alberta, Edmonton, AB, Canada Department of Communication Sciences and Disorders, Northern Arizona University, Flagstaff, AZ, USA
Dagmar Divjak
Affiliation:
Department of Modern Languages, University of Birmingham, Birmingham, UK Department of English Language and Linguistics, University of Birmingham, Birmingham, UK
*
Corresponding author: Petar Milin; Email: p.milin@bham.ac.uk

Abstract

In this study, we propose an operationalization of the concept of emergence, which plays a crucial role in usage-based theories of language. The abstractions linguists operate with are assumed to emerge through a process of generalization over the data language users are exposed to. Here, we use two types of computational learning algorithm that differ in how they formalize and execute generalization, and consequently abstraction, to probe whether a type of language knowledge that resembles linguistic abstractions could emerge from exposure to raw data alone. More specifically, we investigated whether a phone, indisputably the simplest of all linguistic abstractions, could emerge from exposure to speech sounds, using two computational learning processes: memory-based learning (MBL) and error-correction learning (ECL). Both types of model were presented with a substantial amount of pre-processed speech produced by one speaker. We assessed (1) the consistency or stability of what these simple models learn and (2) their ability to approximate abstract categories. The two types of model fare differently on these tests. We show that only ECL models can learn abstractions and that at least part of the phone inventory, and its grouping into traditional types, can be reliably identified from the input.
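The error-correction learning the abstract refers to can be illustrated with the classic Widrow–Hoff (delta) rule: weights from input cues to outcomes are nudged in proportion to the prediction error. The sketch below is a minimal toy illustration of that rule only; the array sizes, learning rate, and cue/outcome labels are illustrative assumptions, not the parameters or data used in the study.

```python
import numpy as np

def widrow_hoff_update(W, cue_vec, outcome_vec, rate=0.01):
    """One Widrow-Hoff step: adjust weights to reduce prediction error."""
    prediction = cue_vec @ W            # current outcome activations
    error = outcome_vec - prediction    # discrepancy drives the update
    return W + rate * np.outer(cue_vec, error)

W = np.zeros((4, 3))                    # 4 cues (e.g., acoustic features) x 3 outcomes (e.g., phones)
cue = np.array([1.0, 0.0, 1.0, 0.0])    # which cues are present in this event
target = np.array([0.0, 1.0, 0.0])      # the phone actually heard

for _ in range(500):                    # repeated exposure to the same cue-outcome pair
    W = widrow_hoff_update(W, cue, target)

print(np.round(cue @ W, 2))             # activations converge toward the target
```

The key property for the emergence argument is that no category labels are stored as such: whatever phone-like structure arises is carried entirely by the learned weight matrix.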

Information

Type
Article
Creative Commons
CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
© The Author(s), 2023. Published by Cambridge University Press

Table 1. Success rates, in percentages, of predicting the correct phone for three different learning models (WH, TD, and MBL) using three different types of input data (raw, random, and Gaussian). The leftmost column represents the probability of a given phone in the raw test sample. Random data have a uniform probability distribution of phones (100 randomly chosen occurrences) while still allowing for differences in the durations of each sampled phone. Gaussian data have a fully uniform probability distribution of phones (100 generated datapoints).


Figure 1. Conditional effect of learning model and type of data on success rates with respective 95% credible intervals. The dashed line represents the most probable phone (/s/) choice at 9.5%; the dotted line marks the flat chance level at 2.5%.


Table 2. Bayesian Kendall’s tau-b between the probability of a phone in the test sample and the combination of learning (MBL, WH, and TD) and type of data (raw, random, and Gaussian)


Figure 2. Dendrogram of phone clustering using Widrow–Hoff weights from training on raw data.


Figure 3. Dendrogram of phone clustering using memory-based probabilities from training on raw data.


Table 3. Mantel test results of correlations with the criterion matrix based on the phone features. Simulated p-values are based on 1,000 replicates
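The Mantel test reported in Table 3 correlates two distance matrices and derives a simulated p-value by permuting the rows and columns of one matrix in tandem. The following is a hedged sketch of that general procedure with toy data; the matrices, point count, and permutation count here are illustrative assumptions and do not reproduce the paper's phone-feature criterion matrix or its 1,000-replicate setup.

```python
import numpy as np

def mantel(A, B, n_perm=1000, seed=0):
    """Simple Mantel permutation test for two symmetric distance matrices."""
    rng = np.random.default_rng(seed)
    iu = np.triu_indices_from(A, k=1)       # off-diagonal upper triangle
    obs = np.corrcoef(A[iu], B[iu])[0, 1]   # observed matrix correlation
    count = 0
    for _ in range(n_perm):
        p = rng.permutation(A.shape[0])
        Ap = A[np.ix_(p, p)]                # permute rows and columns together
        if np.corrcoef(Ap[iu], B[iu])[0, 1] >= obs:
            count += 1
    return obs, (count + 1) / (n_perm + 1)  # simulated one-sided p-value

rng = np.random.default_rng(1)
X = rng.random((6, 2))                                  # toy points
D1 = np.linalg.norm(X[:, None] - X[None, :], axis=-1)   # pairwise distances
D2 = D1 + rng.normal(0.0, 0.05, D1.shape)               # a noisy copy of D1
D2 = (D2 + D2.T) / 2                                    # keep it symmetric
np.fill_diagonal(D2, 0.0)

r, p = mantel(D1, D2)
print(round(r, 2), p < 0.05)
```

Permuting rows and columns jointly preserves the internal structure of the matrix while breaking its alignment with the criterion, which is what makes the null distribution appropriate for matrix-to-matrix correlations.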