
Phonological features emerge substance-freely from the phonetics and the morphology

Published online by Cambridge University Press: 21 November 2022

Paul Boersma*
Affiliation:
University of Amsterdam, Amsterdam, The Netherlands
Kateřina Chládková*
Affiliation:
Charles University, Prague, Czech Republic; Institute of Psychology, Czech Academy of Sciences
Titia Benders*
Affiliation:
Macquarie University, Sydney, Australia

Abstract

Theories of phonology claim variously that phonological elements are either innate or emergent, and either substance-full or substance-free. A hitherto underdeveloped source of evidence for choosing between the four possible combinations of these claims lies in showing precisely how a child can acquire phonological elements. This article presents computer simulations that showcase a learning algorithm with which the learner creates phonological elements from a large number of sound–meaning pairs. In the course of language acquisition, phonological features gradually emerge both bottom-up and top-down, that is, both from the phonetic input (i.e., sound) and from the semantic or morphological input (i.e., structured meaning). In our computer simulations, the child's phonological features end up with emerged links to sounds (phonetic substance) as well as with emerged links to meanings (semantic substance), without containing either phonetic or semantic substance. These simulations therefore show that emergent substance-free phonological features are learnable. In the absence of learning algorithms for linking innate features to the language-specific variable phonetic reality, as well as the absence of learning algorithms for substance-full emergence, these results provide a new type of support for theories of phonology in which features are emergent and substance-free.
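The learning dynamic described in the abstract can be illustrated with a toy model. The Python sketch below is not the authors' actual network (which the article specifies in detail); under simplifying assumptions of its own (idealized spectral "sounds" as Gaussian bumps, one-hot meanings, and a hypothetical recruit-or-strengthen Hebbian rule), it merely shows how intermediate category nodes can acquire links to both sound and meaning without themselves containing phonetic or semantic substance.

```python
import numpy as np

def gaussian_bump(center, n_bins=20, sigma=1.0):
    """An idealized auditory spectrum: activity peaked at one frequency bin."""
    bins = np.arange(n_bins)
    return np.exp(-0.5 * ((bins - center) / sigma) ** 2)

# Five toy "vowels": sound = spectral bump, meaning = one-hot morpheme index.
centers = [2, 6, 10, 14, 18]
sounds = [gaussian_bump(c) for c in centers]
meanings = np.eye(5)

class EmergentCategoryLearner:
    """Recruits an intermediate category node per novel sound and links it,
    Hebbian-style, to both sound (bottom-up) and meaning (top-down).
    The category nodes themselves carry no phonetic or semantic content;
    they only have weighted links to the two peripheries."""

    def __init__(self, threshold=0.9):
        self.sound_w, self.meaning_w = [], []
        self.threshold = threshold

    def _best_match(self, sound):
        """Return the index and cosine similarity of the best-matching node."""
        if not self.sound_w:
            return None, 0.0
        sims = [s @ sound / (np.linalg.norm(s) * np.linalg.norm(sound))
                for s in self.sound_w]
        i = int(np.argmax(sims))
        return i, sims[i]

    def learn(self, sound, meaning, rate=0.5):
        i, sim = self._best_match(sound)
        if i is None or sim < self.threshold:   # recruit a new category node
            self.sound_w.append(sound.copy())
            self.meaning_w.append(meaning.copy())
        else:                                   # strengthen its existing links
            self.sound_w[i] += rate * (sound - self.sound_w[i])
            self.meaning_w[i] += rate * (meaning - self.meaning_w[i])

    def comprehend(self, sound):
        """Sound -> best category node -> most strongly linked meaning."""
        i, _ = self._best_match(sound)
        return int(np.argmax(self.meaning_w[i]))

learner = EmergentCategoryLearner()
for _ in range(3):                              # three passes over the data
    for sound, meaning in zip(sounds, meanings):
        learner.learn(sound, meaning)

print([learner.comprehend(s) for s in sounds])  # → [0, 1, 2, 3, 4]
```

After training, each of the five sounds activates its own category node, which in turn retrieves the correct meaning, even though the nodes were not given in advance and store no phonetic or semantic content of their own.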

Résumé

Theories of phonology vary in whether they assert that phonological elements are innate or emergent, and in whether they assert that these elements do or do not carry substance. A hitherto underdeveloped source of evidence for choosing among the four combinations of claims lies in showing precisely how a child can acquire phonological elements. This article presents computer simulations that showcase a learning algorithm with which the learner creates phonological elements from a large number of sound–meaning pairs. In the course of language acquisition, phonological features emerge gradually both bottom-up and top-down, that is, from the phonetic input (i.e., sound) and from the semantic or morphological input (i.e., structured meaning). In our computer simulations, the child's phonological features end up with emergent links to sounds (phonetic substance) as well as emergent links to meanings (semantic substance), without containing either phonetic or semantic substance. These simulations therefore show that emergent substance-free phonological features are learnable. In the absence of learning algorithms for linking innate features to language-specific variable phonetic reality, as well as of learning algorithms for the emergence of substance-full features, these results provide a new type of support for theories of phonology in which features are emergent and substance-free.

Information

Type
Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
Copyright © Canadian Linguistic Association/Association canadienne de linguistique 2022

Figure 1: A schematic representation of the three modules and their two interfaces. Information can flow from meaning to sound (production), from sound to meaning (comprehension), or from sound–meaning pairs inwards (acquisition). The number of nodes and levels within each module is chosen arbitrarily for the purpose of this illustration.

Table 1: The meanings (morphemes) of the five possible words

Table 2: Potential adult phonological representations of the five possible words (inaccessible to the child)

Table 3: The sounds of the five possible words

Figure 2: One hundred randomly generated intended ambient utterances and their auditory realizations

Figure 3: Auditory input representations for a learner who listens to the five possible ambient utterances, produced with average F1 and F2 values. The (mostly big) dark red disks depict positive activity (the bigger the disk, the larger the activity, with a fully filled circle depicting an activity of +5), whereas the (usually smaller) light blue disks depict negative activity.

Figure 4: Typical auditory input representations for a learner who listens to the five possible ambient utterances, each repeated three times with random variation in F1 and F2 (any visible correlations or anticorrelations between F1 and F2 values are purely coincidental).

Figure 5: Semantic input representations for a learner who listens to the five possible ambient utterances.

Figure 6: Network structure. In this figure and many others, the numbers from 4 to 28 are basilar frequencies expressed in ERB; they apply only to the bottom left row of nodes.
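The basilar frequencies in Figure 6 are given in ERB rather than Hz. The ERB-rate (ERB-number) scale of Glasberg and Moore (1990) is a standard way to make such a mapping; the sketch below is offered as an illustration of that scale, not as code from the article itself. On this scale, the 4 to 28 ERB range of the figure corresponds to roughly 120 Hz to 4.4 kHz.

```python
import math

def hertz_to_erb(f_hz):
    """ERB-rate scale of Glasberg & Moore (1990): maps frequency in Hz
    onto a scale that is roughly equidistant for the auditory system."""
    return 21.4 * math.log10(4.37e-3 * f_hz + 1.0)

def erb_to_hertz(erb):
    """Inverse mapping, from ERB-rate back to Hz."""
    return (10.0 ** (erb / 21.4) - 1.0) / 4.37e-3
```

For example, `hertz_to_erb(1000.0)` gives about 15.6 ERB, near the middle of the figure's frequency axis.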

Figure 7: The utterances from Figure 2 as they reach a sound-only learner.

Figure 8: The overt realizations of one thousand random utterances.

Figure 9: The initial state of a network that can learn from a distribution of sounds.

Figure 10: The development of a network that is learning from a distribution of sounds.

Figure 11a: The classification of five instances of an intended ambient a.

Figure 11e: The classification of five instances of an intended ambient e.

Figure 11i: The classification of five instances of an intended ambient i.

Figure 11o: The classification of five instances of an intended ambient o.

Figure 11u: The classification of five instances of an intended ambient u.

Figure 12: The perceptual magnet effect at work, after 3,000 pieces of sound data.

Figure 13: Loss of the perceptual magnet effect, after 10,000 pieces of sound data.

Table 4: Phonological similarities between the standard forms of the five utterances in comprehension after sound-only learning (in percent)

Table 5: Phonological similarities between the standard forms of the five utterances in comprehension after sound-only learning, averaged over 100 learners (in percent)

Figure 14: Initial state of learning from meaning alone.

Figure 15: The development of a network that is learning from meaning alone.

Figure 16: Meaning-only production of the utterances a, e, i, o and u.

Table 6: Phonological similarities between the five utterances in production after meaning-only learning (in percent)

Table 7: Phonological similarities between the five utterances in production after meaning-only learning, averaged over 100 learners (in percent)

Figure 17: Initial state of learning from both sound and meaning.

Figure 18: The development of a network that is learning from both sound and meaning.

Figure 19: Production of the utterances a, e, i, o and u.

Figure 20: Variable production of the utterance a, caused by Bernoulli noise.
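Bernoulli noise of the kind that drives the token-to-token variability in Figure 20 can be sketched as follows. This is an illustrative assumption about its form (independent silencing of individual node activities with a hypothetical survival probability `p_keep`), not the article's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

def bernoulli_noise(activities, p_keep=0.9):
    """Multiplicative Bernoulli noise: each node's activity independently
    survives with probability p_keep and is silenced (set to 0) otherwise,
    so repeated productions of the same intended utterance differ."""
    mask = rng.random(activities.shape) < p_keep
    return activities * mask
```

Calling `bernoulli_noise` repeatedly on the same activity vector yields a different pattern each time, which is what makes repeated productions of the utterance a come out variably.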

Table 8: Phonological similarities between the five utterances in production after sound–meaning learning (in percent)

Table 9: Phonological similarities between the five utterances in production after sound–meaning learning, averaged over 100 learners (in percent)

Figure 21: Comprehension of random tokens of the utterances a, e, i, o and u.

Figure 22: Scanning through the front vowels after learning from 3,000 pieces of data: strong perceptual magnet effect and effective categorization.

Figure 23: Scanning through the front vowels after learning from 10,000 pieces of data: the perceptual magnet effect has decreased, but semantic classification is still entirely appropriate.

Table 10: Phonological similarities between the standard forms of the five utterances in comprehension after sound–meaning learning (in percent)

Table 11: Phonological similarities between the standard forms of the five utterances in comprehension after sound–meaning learning, averaged over 100 learners (in percent)

Table 12: Phonological similarities between the five utterances in production after sound–meaning learning, averaged over 100 learners of an anti-correlating language (in percent)

Figure 24: Grammar model of Bidirectional Phonology and Phonetics.