
Simulating procedural discovery in early language acquisition: Domain-general cognition with contextual learning

Published online by Cambridge University Press:  22 July 2025

Yang Ji*
Affiliation:
Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence, University of Groningen, Groningen, The Netherlands
Jacolien van Rij
Affiliation:
Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence, University of Groningen, Groningen, The Netherlands
Niels Taatgen
Affiliation:
Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence, University of Groningen, Groningen, The Netherlands
*
Corresponding author: Yang Ji; Email: y.ji@rug.nl

Abstract

We present a simulation study based on a cognitive architecture that unifies various early language acquisition phenomena in laboratory and naturalistic settings. The model adaptively learns procedures through trial-and-error using general-purpose operators, guided by learned contextual associations to optimise future performance. For laboratory-based studies, simulated preferential focusing explains the delayed behavioural onset of statistical learning and the possible age-related decrease in algebraic processing. These findings suggest a link to continuous, implicit learning rather than explicit strategy acquisition. Moreover, procedures are not static but can evolve over time, and multiple plausible procedures may emerge for a given task. In addition, the same model provides a proof-of-concept for word-level phonological learning from naturalistic infant-directed speech, demonstrating how age-related processing efficiency may influence learning trajectories implicated in typical and atypical early language development. Furthermore, the article discusses the broader implications for modelling other aspects of real-world language acquisition.

Information

Type
Article
Creative Commons
CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

Figure 1. The PRIMs architecture. The PRIMs architecture, following its predecessors (e.g., SOAR and ACT-R), assumes that cognition can be decomposed into a set of specialised modules that are connected through a central workspace (sometimes called the global workspace, see Anderson, 2007; Dehaene et al., 1998; Taatgen, 2013). Each module projects onto a so-called buffer to communicate information via the central system to other modules. Within this structure, the primitive operators involve either comparing the available contents between two buffer slots (indicated by a double arrow) or encoding contents from one buffer slot to another empty buffer slot (indicated by a single arrow).
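The two kinds of primitive operators in the caption above can be sketched as operations on a central workspace of buffer slots. This is an illustrative sketch only, not the actual PRIMs implementation; all names (`workspace`, `compare`, `encode`) are hypothetical.

```python
# Hypothetical sketch of PRIMs-style primitive operators acting on a
# central workspace of module buffers, each with a few slots.

workspace = {
    "input":     {"slot1": "a", "slot2": None},
    "wm":        {"slot1": "a", "slot2": None},
    "retrieval": {"slot1": None, "slot2": None},
}

def compare(buf_a, slot_a, buf_b, slot_b):
    """Comparison operator (double arrow): test whether two buffer slots match."""
    return workspace[buf_a][slot_a] == workspace[buf_b][slot_b]

def encode(src_buf, src_slot, dst_buf, dst_slot):
    """Encoding operator (single arrow): copy content into an empty buffer slot."""
    if workspace[dst_buf][dst_slot] is None:
        workspace[dst_buf][dst_slot] = workspace[src_buf][src_slot]
        return True
    return False  # destination slot already occupied

# A minimal sequence: compare input with working memory, then encode it.
assert compare("input", "slot1", "wm", "slot1")
assert encode("input", "slot1", "wm", "slot2")
```

Complex task-specific procedures are then, on this view, sequences built from only these two operator types.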


Figure 2. The processing of an algebraic pattern. The upper panel provides an overview of the processing steps within a cognitive architecture. The lower panel compares traditional cognitive architecture with PRIMs.


Figure 3. Alternative processing steps for the same algebraic pattern. Note: the current model assumes that working memory encoding occurs spontaneously with long-term memory retrieval. Therefore, exogenous working memory encoding operators 1, 2, and 3 are automatically followed by a retrieval request that endogenously encodes content from long-term storage into the retrieval buffer slots.


Figure 4. Simulated preferential focusing dynamics regarding statistical learning tasks. (a) Averaged processing time under test conditions. Calculated as the sum of on-task operator latencies during the test phase. Note. y-axis: average processing time (in ms); x-axis: test conditions; white bars: consistent conditions; various gray bars: inconsistent conditions; error bars: $ \pm $1SD; orange dots: data points from individual model runs. (b) Averaged operator efficiency under test conditions. Calculated based on the average context-based latency (excluding fixed default action time) of all operators across model runs. Note. y-axis: average latency (in ms); x-axis: test conditions; white bars: consistent conditions; various gray bars: inconsistent conditions; error bars: standard deviations; orange dots: data points from individual model runs. The significance of regression coefficients is denoted by brackets and indicators (sig., $ p $ < 0.001). Note that the overarching brackets denote the main effect of model efficiency.


Figure 5. The averaged proportion of procedures applied by the model in the statistical learning task across model runs. The n-gram procedures detect differences between the working memory and the retrieved pattern at the (n + 1)th position. The first repetition procedures detect repetition of input syllables compared to an already encoded working memory slot (slot 1) at the 3rd (orange diamond) and 4th (green square) syllable positions, respectively. Note. y-axis: averaged trial proportion of each procedure (within each block); training-phase x-axis (A1/B1): 20 blocks each consisting of 10 trisyllabic patterns; test-phase x-axis (A2/B2): test conditions; error band/bars: $ \pm $1SD; transparent dots: data points from individual model runs. The sum of the trial proportions is not equal to 1, as the model may not use any procedure or may use more than one procedure in a trial.


Figure 6. Simulated preferential focusing dynamics regarding algebraic tasks. (a) Averaged processing time under test conditions. Calculated as the sum of on-task operator latencies during the test phase. Note: y-axis: average processing time (in ms); x-axis: test conditions; white bars: consistent conditions; various gray bars: inconsistent conditions; error bars: $ \pm $1 SD; orange dots: data points from individual model runs. (b) Averaged operator efficiency under test conditions. Calculated based on the average context-based latency (excluding fixed default action time) of all operators across model runs. Note: y-axis: average latency (in ms); x-axis: test conditions; white bars: consistent conditions; various gray bars: inconsistent conditions; error bars: standard deviations; orange dots: data points from individual model runs. The significance of regression coefficients is denoted by brackets and indicators (sig., $ p $ < 0.001). Note that the overarching brackets denote the main effect of model efficiency.


Figure 7. The averaged proportion of procedures applied by the model after training a-b-a across model runs. The first two repetition procedures detect repetition of input syllables compared to an already encoded working memory slot (slot 1) at the 2nd (purple cross) and 3rd (orange diamond) syllable positions, respectively. Repetition at the 2nd position is due to the omission of the middle token d in c-d-c. Another repetition procedure detects a match between the input and a different encoded working memory slot (slot 2) also at the 3rd syllable position (blue dot). Alternatively, the 1-gram procedure detects differences between the working memory pattern and the retrieved pattern immediately at the 2nd position (pink hourglass). Note. y-axis: averaged trial proportion of each procedure (within each block); training-phase x-axis (A1/B1): 10 blocks each consisting of 10 trisyllabic patterns; test-phase x-axis (A2/B2): test conditions; error band/bars: $ \pm $1 SD; transparent dots: data points from individual model runs. The sum of the trial proportions is not equal to 1, as the model may not use any procedure or may use more than one procedure in a trial.


Figure 8. The averaged proportion of procedures applied by the model after training a-b-b across model runs. The first two repetition procedures detect repetition of input syllables compared to an already encoded working memory slot (slot 1) at the second (purple cross) and third (orange diamond) syllable positions, respectively. Repetition at the second position is due to the omission of the middle token d in c-d-c. Another repetition procedure detects a match between the input and a different encoded working memory slot (slot 2) also at the third syllable position (blue dot). Alternatively, the 1-gram procedure detects differences between the working memory pattern and the retrieved pattern immediately at the second position (pink hourglass). Note. y-axis: averaged trial proportion of each procedure (within each block); training-phase x-axis (A1/B1): 10 blocks each consisting of 10 trisyllabic patterns; test-phase x-axis (A2/B2): test conditions; error band/bars: $ \pm $1 SD; transparent dots: data points from individual model runs. The sum of the trial proportions is not equal to 1, as the model may not use any procedure or may use more than one procedure in a trial.
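The two procedure families described in the captions above can be sketched as simple checks over trisyllabic patterns: a repetition procedure compares the incoming syllable against an already encoded working-memory slot, while an n-gram procedure compares the input against a pattern retrieved from long-term memory. This is a hypothetical illustration, not the article's model code; the function names and the example patterns are assumptions.

```python
# Hypothetical sketch of repetition vs 1-gram procedures on trisyllabic
# patterns (e.g., a-b-a vs a-b-b), represented as lists of syllables.

def repetition_match(syllables, position, wm_slot):
    """Repetition procedure: does the syllable at `position` repeat the
    syllable already encoded in working-memory slot `wm_slot`?"""
    return syllables[position] == syllables[wm_slot]

def one_gram_mismatch(syllables, retrieved):
    """1-gram procedure: index of the first position where the input
    pattern and a retrieved pattern differ, or None if they match."""
    for i, (s, r) in enumerate(zip(syllables, retrieved)):
        if s != r:
            return i
    return None

# a-b-a: the 3rd syllable (index 2) repeats working-memory slot 1 (index 0).
assert repetition_match(["a", "b", "a"], position=2, wm_slot=0)
# a-b-b: the 3rd syllable (index 2) repeats working-memory slot 2 (index 1).
assert repetition_match(["a", "b", "b"], position=2, wm_slot=1)
# 1-gram: an a-b-a input diverges from a retrieved a-b-b pattern at index 2.
assert one_gram_mismatch(["a", "b", "a"], ["a", "b", "b"]) == 2
```

On this reading, a-b-a and a-b-b training favour different repetition checks (against slot 1 vs slot 2), while the 1-gram check fires wherever input and retrieved pattern first diverge.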


Figure 9. Learning outcomes and trajectories of word-level phonological patterns. (a) Acquired activation of word-level phonological patterns at different model efficiency levels at the end of the 50th block. Note: y-axis: activation of phonological patterns (i.e., chunk activation in ACT-R); x-axis: phoneme length of word-level phonological patterns. The color coding from blue to orange indicates default action time increasing from 60 to 300 ms. Each mini-dot represents a word-level phonological pattern. The means and standard deviations are highlighted by the larger dots and error bars. (b) The trajectory of word-level phonological patterns over the 50 blocks at different model efficiency levels. Note: y-axis: activation of phonological patterns (i.e., chunk activation in ACT-R); x-axis: 50 blocks; color coding as above. The dots represent the averaged phonological activation across the blocks, averaged over the shared patterns within either the high (60–100 ms) or the low (200–300 ms) efficiency range.

Supplementary material: File

Ji et al. supplementary material

File 665.2 KB