Computational cognitive modeling for syntactic acquisition: Approaches that integrate information from multiple places

Abstract
Computational cognitive modeling is a tool we can use to evaluate theories of syntactic acquisition. Here, I review several models implementing theories that integrate information from both linguistic and non-linguistic sources to learn different types of syntactic knowledge. Some of these models additionally consider the impact of factors coming from children's developing non-linguistic cognition. I discuss some existing child behavioral work that can inspire future model-building, and conclude by considering more specifically how to build better models of syntactic acquisition.


Introduction
About computational cognitive modeling for syntactic acquisition
One tool we can use to understand how syntactic acquisition works is computational cognitive modeling. The "computational" part refers to implementing an idea (that is, a theory) very precisely, typically using mathematical techniques that are carried out on computers. The "cognitive" part refers to what the implemented ideas are about, which is some part of human cognition. The "modeling" part refers to the theory itself, which captures (i.e., models) some aspect of cognition (here: syntactic acquisition). With this tool, we can make a theory about syntactic acquisition concrete enough to evaluate, because the computational cognitive model allows us to generate predictions about children's syntactic behavior. That is, when we have a computational cognitive model for syntactic acquisition, we have a theory about syntactic acquisition that is implemented precisely enough to evaluate against empirical data.
Importantly, the computational cognitive model serves as a "proof of concept" for a theory. When the model generates predictions that match human behavior (e.g., children's syntactic behavior), this is proof there is at least one way the theory could explain human behavior, namely the way the theory was implemented in the computational cognitive model. An important limitation of computational cognitive modeling is that modeling success (or failure) can only be interpreted with respect to the specific theory implemented by the model. That is, if the model succeeds at matching human behavior, we can only interpret this success as success of that specific implementation of that acquisition theory; we have nothing to say about other implementations of this particular theory, or other theories not implemented in the model. The same is true for interpreting model failure: failure is only demonstrated for that specific theory implementation. If we want to evaluate some other theory implementation, we need to build another model and see how it does. See Pearl (2014, in press) for more detailed discussion about how to interpret computational cognitive model success (and failure).
Implementing a theory in a computational cognitive model
When we have a theory of syntactic acquisition, how do we implement it in a computational cognitive model? Implementing the model involves several key aspects. First, the model needs to encode relevant prior knowledge and learning abilities the child is supposed to have at this stage of development. This knowledge and these abilities are often assumed implicitly by the acquisition theory. For instance, a syntactic acquisition theory might assume prior knowledge of individual words in the language and the ability to segment speech reliably from the input.
Second, the model needs to learn from realistic input. For instance, a model meant to capture syntactic acquisition behavior that occurs at age four should ideally learn from input that children encounter by age four.
Third, the model needs to output predictions that connect in some interpretable way to children's behavior. For instance, a model might predict if a child at age four would treat two verbs as being syntactically the same (i.e., appearing in the same syntactic contexts and having the same interpretations of their arguments).
Fourth, the model needs to encode learning, which is how the modeled child uses the information from the input to update hypotheses about syntax. Learning is typically the main component specified by the acquisition theory. For instance, a model might attend to the distribution of certain features of the input viewed as relevant (e.g., animacy of verb arguments, syntactic contexts a verb appears in), and then use probabilistic inference to group verbs together that seem similar enough with respect to those relevant features.
So, to sum up, implementing an acquisition theory in a computational cognitive model involves encoding the acquisition theory assumptions (i.e., the prior knowledge assumed, the learning abilities assumed, and how learning proceeds), learning from realistic input estimates, and generating interpretable output that can be evaluated against empirical data from children. This is an approach that the models reviewed below have taken for investigating syntactic acquisition.
Road map
I will focus on computational cognitive models of syntactic acquisition that integrate information from multiple places, including both linguistic and non-linguistic sources of information. That is, the syntactic acquisition theories implemented by these models assume that syntactic learning proceeds by children attending to information from these different sources, rather than solely syntactic sources. Why discuss this kind of model? To me, these models seem more realistic because children are surrounded by many different types of information and have many different learning goals simultaneously. That is, children never learn only about syntax; instead, they learn about syntax and about who is likely to give them a hug and about how to communicate their desire for more milk, among many other things. So, non-syntactic sources of information may be particularly salient in any given moment while children are learning about syntax; if these sources of information happen to be helpful for learning about syntax, then children may very well be able to harness those sources to do so.
Moreover, children are likely impacted by non-linguistic factors during acquisition. For instance, cognitive limitations on memory, attention, and executive control can affect how children perceive the information in their input, how they update their internal hypotheses, and how they generate their observable syntactic behavior. In addition, children likely rely on non-linguistic learning mechanisms to update their internal hypotheses, such as probabilistic inference. In fact, all the models of syntactic acquisition reviewed below rely on probabilistic inference, and so already incorporate this non-linguistic component into their theories of syntactic acquisition. Here, as mentioned, I focus on syntactic acquisition models that also integrate information from non-syntactic sources. I should note that these are selected case studies in syntactic acquisition modeling from my own work, rather than capturing the full range of computational cognitive models that implement this type of syntactic acquisition theory. I first review three case studies, whose acquisition theories incorporate conceptual information such as the animacy of an event participant, participant event roles more generally, and components of lexical meaning. Some of these theories additionally incorporate non-linguistic cognitive limitations by implementing the impact of those limitations on input perception and hypothesis updating. I note that these theories are agnostic as to the specific source of the cognitive limitations (e.g., whether the source of the limitations is developing knowledge, developing learning abilities, or something else); instead, the practical impact of the cognitive limitations on the acquisition process is what the model captures. These case studies involve the acquisition of syntactic knowledge about linking theories, the passive, and pronoun interpretation.
I then briefly review some existing child behavioral work that we can take inspiration from when it comes to building better computational cognitive models of syntactic acquisition. I also discuss more specifically how we can think about building better models, and how we can incorporate the insights from both the behavioral work reviewed and current modeling work. I conclude with a few other ideas for building better models of syntactic acquisition in the future.

Some modeling case studies in syntactic acquisition
For each of the modeling case studies below, I first describe the syntactic knowledge children are trying to acquire. I then describe the relevant aspects of the acquisition theory implemented in the computational cognitive model, including the prior theories the implemented theory builds on, which information sources are used, the form the information sources take, and how those sources are used to update the modeled child's hypotheses. I explicitly highlight which information sources are non-syntactic, as relevant. I also describe the input to the model, how the model's output is evaluated against empirical data from children's behavior, and what we learned by using modeling this way.

Linking theories
The syntactic knowledge
One type of syntactic knowledge is how to interpret a verb's arguments in context. For instance, consider this sentence: The little girl blicked the kitten on the stairs. Even if we do not know what blick means, we still prefer to interpret this sentence as the little girl doing something (blicking) to the kitten, and that event happening on the stairs. We as adults prefer this interpretation because we have linking theories that link the thematic roles specified by a verb's lexical semantics (e.g., Agent, Patient, Location) to the syntactic argument positions specified by that verb's syntactic frame (e.g., subject, direct object, object of a preposition). Moreover, our linking theories are so well-developed that they can impose these links even when we do not know a verb's specific lexical semantics (like here with blick).
Verbs can be grouped together into classes where the verbs in a class behave the same way with respect to the links between syntactic positions and thematic roles. That is, solving the linking problem (i.e., acquiring linking theories for the verbs of the language) involves learning how to link syntactic positions and thematic roles for different verbs; verb classes are collections of verbs that behave the same way for linking. For example, verbs with "subject-raising" behavior like appear and seem allow their subject to not have a thematic role. So, in Lindy seemed/appeared to hug the kitten, Lindy is not a "seemer" or an "appearer", but rather a kitten-hugger. As another example, verbs with "unaccusative" behavior like fall and break have a Patient in the subject position. So, in The toy kitten fell/broke, falling or breaking is happening to the toy kitten. As a third example, verbs with passivizable behavior like hug and break allow their subject to be a Patient in the passive construction, while verbs like appear, seem, and fall do not. That is, The toy kitten was hugged/broken by Lindy, with hugging or breaking happening to the toy kitten, is acceptable. In contrast, The toy kitten was seemed/appeared/fallen by Lindy, with seeming, appearing, or falling happening to the toy kitten, is not acceptable.
These examples demonstrate that a verb class can involve many linking behaviors. Here, one verb class involving fall might be characterized as +unaccusative and -passivizable; another verb class involving break might be characterized as +unaccusative and +passivizable; a third verb class involving seem and appear might be characterized as +subject-raising and -passivizable. To learn what verbs belong together in a class, children must implicitly develop the linking theory for that verb class. This is why acquiring verb classes can be used as a measure of linking theory development. In short, if a child (and therefore a modeled child) can cluster verbs together into classes that behave the same linking-wise, then the child (real or modeled) can be said to have developed the relevant linking theory knowledge that leads to those verb classes.
The acquisition theory implemented in the model
Pearl and Sprouse (2019) proposed that children can cluster verbs into appropriate verb classes by paying attention to several pieces of information associated with verbs in their input: argument animacy, syntactic context, and link distribution. This verb information has been proposed by prior theories as (potentially) relevant (e.g., Becker, 2009, 2014, 2015; Becker & Estigarribia, 2013; Fisher, Gertner, Scott & Yuan, 2010; Gillette, Gleitman, Gleitman & Lederer, 1999; Gleitman, 1990; Gutman, Dautriche, Crabbé & Christophe, 2015; Harrigan, Hacquard & Lidz, 2016; Hartshorne, Pogue & Snedeker, 2015b; Kirby, 2009a, 2009b; Landau & Gleitman, 1985; Levin, 1993; Scott & Fisher, 2009). To see a concrete example of each information type, consider two of the utterances involving break from our examples: the unaccusative The toy kitten broke and the passive The toy kitten was broken by Lindy. First, the animacy of the verb's arguments matters. For instance, a child would notice that The toy kitten is inanimate. Second, the syntactic contexts that a verb appears in matter. So, a child would notice that break appeared in an unaccusative context of the form Noun-Phrase Verb and a passive context Noun-Phrase was Verb Preposition Noun-Phrase. Third, the distribution of links between thematic roles and syntactic positions matters. Here, a child would notice that break has the following links in the two utterances above: two instances of Patient in subject position (from The toy kitten in both utterances) and one instance of Agent in the prepositional phrase position (from Lindy in the passive utterance).
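To make the three information types concrete, here is a minimal Python sketch of how the two break utterances could be tallied. It is an illustration only, not the representation Pearl and Sprouse used: the class name VerbProfile, its fields, the frame labels, and the role labels written out as Patient and Agent are all hypothetical choices.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class VerbProfile:
    """Running tallies of the three information types for one verb (hypothetical)."""
    animacy: Counter = field(default_factory=Counter)  # (position, animacy) -> count
    frames: Counter = field(default_factory=Counter)   # syntactic context -> count
    links: Counter = field(default_factory=Counter)    # (role, position) -> count

    def add_use(self, frame, arguments):
        """Record one verb use; arguments are (position, role, is_animate) triples."""
        self.frames[frame] += 1
        for position, role, is_animate in arguments:
            label = "animate" if is_animate else "inanimate"
            self.animacy[(position, label)] += 1
            self.links[(role, position)] += 1


break_profile = VerbProfile()
# Unaccusative use: "The toy kitten broke"
break_profile.add_use("NP V", [("subject", "Patient", False)])
# Passive use: "The toy kitten was broken by Lindy"
break_profile.add_use(
    "NP was V P NP",
    [("subject", "Patient", False), ("prepositional-phrase", "Agent", True)],
)
```

After these two uses, the link tallies hold two Patient-in-subject links and one Agent-in-prepositional-phrase link, which mirrors the counts described in the text.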
Pearl and Sprouse made the idealizing assumption that children would have enough prior knowledge and sufficient learning abilities to accurately extract this information from any particular verb use they encountered. This assumption can be relaxed in future work (i.e., we can assume that children do not accurately extract information due to immature knowledge, immature learning abilities, or cognitive limitations more generally). However, this assumption of accurate extraction provides a simple starting point for theory evaluation via computational cognitive modeling, in the absence of a particular theory about how children may inaccurately extract information.
So, with this information extracted from the input, children would then create verb classes by using Bayesian inference, a type of probabilistic learning shown to accord with a variety of developmental patterns across cognition (see Pearl, 2021 for a brief review). When using Bayesian inference, a learner updates hypotheses by balancing prior knowledge or biases against fit to the observed data. For learning verb classes, Pearl and Sprouse (2019) built in a standard type of prior knowledge for learning classes of any kind, which is that fewer classes are preferred. The fit to the observed data is about the child's input: here, if the modeled child assumes a certain set of verb classes, is the information observed in the input about argument animacy, syntactic context, and link distribution more probable or less probable? A verb class hypothesis that causes the observed information to be more probable is a better fit than a hypothesis that causes the observed information to be less probable.
To better understand this idea of a hypothesis fitting the observed data, consider two verb class hypotheses involving seem and appear. The first hypothesis H1 puts each verb in its own verb class (H1: class 1 = {appear}, class 2 = {seem}); the second hypothesis H2 puts both verbs together into one verb class (H2: class 1 = {appear, seem}). Suppose the observed information the modeled child learns from comes from this utterance: Lindy appeared to be sad, but then she seemed to be happy.
In this utterance, the information from argument animacy, syntactic contexts, and link distributions is the same for appear and seem. Hypothesis H1, which separates these verbs into different verb classes, views this similarity as a coincidence: similar verb behavior is not expected if verbs are in different classes. In contrast, hypothesis H2, which puts these verbs into the same verb class, expects this similarity in verb behavior precisely because the verbs are in the same verb class. When a hypothesis's expectations are met, it will find the observed information to be more probable and therefore be a better fit. So, H2 will find the observed information to be more probable, and a modeled learner relying on Bayesian inference will prefer H2 over H1 as a better fit for the observed information.
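The comparison between the two hypotheses can be illustrated with a toy calculation. The sketch below is not Pearl and Sprouse's model: it assumes, purely for illustration, that each verb use shows one of four possible "feature bundles", that each class has its own unknown distribution over bundles with a uniform Dirichlet prior, and that the prior on a hypothesis is proportional to one over its number of classes. The names NUM_BUNDLES, class_likelihood, and hypothesis_score are hypothetical.

```python
from fractions import Fraction

NUM_BUNDLES = 4  # assumed number of distinct feature bundles a verb use can show


def class_likelihood(observations, alpha=1):
    """Marginal probability of one class's observations (Dirichlet-multinomial).

    Each observation is predicted from the ones seen earlier in the same class,
    so repeated bundles within a class become more probable.
    """
    counts = {}
    likelihood = Fraction(1)
    for seen_so_far, bundle in enumerate(observations):
        likelihood *= Fraction(counts.get(bundle, 0) + alpha,
                               seen_so_far + alpha * NUM_BUNDLES)
        counts[bundle] = counts.get(bundle, 0) + 1
    return likelihood


def hypothesis_score(classes, uses, prior):
    """Unnormalized posterior: prior times the likelihood of every class's data."""
    likelihood = Fraction(1)
    for verbs in classes:
        observations = [bundle for verb, bundle in uses if verb in verbs]
        likelihood *= class_likelihood(observations)
    return prior * likelihood


# Both verbs show the same (hypothetical) bundle in the example utterance.
uses = [("appear", "animate-subject/raising"), ("seem", "animate-subject/raising")]

h1 = hypothesis_score([{"appear"}, {"seem"}], uses, prior=Fraction(1, 3))
h2 = hypothesis_score([{"appear", "seem"}], uses, prior=Fraction(2, 3))
posterior_h2 = h2 / (h1 + h2)
```

Under these assumptions, H2 receives the higher score for two reasons: the prior favors fewer classes, and the second matching observation is more probable when it is predicted from a class that has already shown that bundle.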

Information integrated
The acquisition theory implemented in the model involves integrating several types of information: (i) animacy (non-linguistic), (ii) syntactic contexts (syntactic), and (iii) links between thematic roles (semantic) and syntactic positions (syntactic). These information sources are combined using the non-linguistic learning mechanism of Bayesian inference.

Model input
To generate predictions about verb classes that English-learning children would have, the model learned from verb uses in English child-directed speech samples. Pearl and Sprouse estimated how many verb uses children at different ages (three, four, and five) would encounter, and implemented models that learned from these same quantities. So, for instance, the three-year-old modeled child learned from the number of verb uses a three-year-old English-learning child would encounter, distributed according to the samples of speech directed to English-learning children up to age three.

Model output and evaluation
To evaluate a modeled child, Pearl and Sprouse compared the verb classes predicted by the modeled child against verb classes that children of the appropriate age seem to have. More specifically, Pearl and Sprouse used 12 types of syntactic or interpretation behavior surveyed from a large collection of child behavioral studies in order to identify verb classes that three-, four-, and five-year-old English children have. These behaviors included subject-raising, unaccusative, and passivizable, among others. From these verb behaviors at ages three to five, Pearl and Sprouse derived age-specific verb classes that a modeled child should attempt to match when it learns from the same data that three-, four-, or five-year-olds learn from. In particular, verbs in the same class are treated the same by children of that age (i.e., the verbs either have or do not have a specific syntactic or interpretation behavior, such as being passivizable). So, the modeled child of that age should cluster those verbs together if it has learned the way children of that age learn.
Pearl and Sprouse found that their modeled three-, four-, and five-year-olds were able to generate verb classes that matched English-learning children's verb classes fairly well.
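One simple way to quantify how well predicted verb classes match target verb classes is pairwise agreement (the Rand index): for every pair of verbs, check whether the prediction and the target agree on putting the pair in the same class or in different classes. The sketch below is an assumption for illustration; the text does not say which measure Pearl and Sprouse used, and the function name and the toy class assignments are hypothetical.

```python
from itertools import combinations


def pairwise_agreement(predicted, target):
    """Fraction of verb pairs on which two clusterings agree (Rand index).

    Both arguments map each verb to an arbitrary class label.
    """
    verbs = sorted(set(predicted) & set(target))
    pairs = list(combinations(verbs, 2))
    agreements = sum(
        (predicted[a] == predicted[b]) == (target[a] == target[b])
        for a, b in pairs
    )
    return agreements / len(pairs)


# Hypothetical target classes and a hypothetical model output.
target = {"seem": 0, "appear": 0, "fall": 1, "break": 2, "hug": 3}
predicted = {"seem": "x", "appear": "x", "fall": "y", "break": "y", "hug": "z"}

score = pairwise_agreement(predicted, target)
```

In this toy case the only disagreement is the pair fall and break, which the prediction groups together and the target separates, so nine of the ten verb pairs agree.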

What we learned
The model's success at matching available empirical data from children supports the acquisition theory implemented in the model, and suggests that children may indeed be learning from these different information types when developing the linking theory knowledge that leads to their observable verb classes. More specifically, the way English-learning children cluster verbs together during syntactic acquisition aligns with them learning not just from syntactic information (e.g., syntactic contexts), but also from non-syntactic information (e.g., animacy and thematic roles).

Passives
The syntactic knowledge
As mentioned above, the passive structure in English allows the subject to be a Patient. For instance, in The toy kitten was broken by Lindy, The toy kitten is the one being broken. So, this sentence seems to have a structure more like The toy kitten was broken _The toy kitten by Lindy, where _The toy kitten marks the position where The toy kitten is understood (as the object of break).
Children then need to learn that this interpretation is possible, which involves understanding where the element in the subject position is understood (in this case, a position where it can serve as Patient). Importantly, not all verbs passivize: recall that The toy kitten was fallen is not acceptable to English speakers (i.e., fall doesn't passivize). So, a key learning problem is to learn which verbs in English can passivize (i.e., which verbs allow the passive structure and related interpretation with the subject as Patient).
Interestingly, there seems to be significant variation in English for when children realize certain verbs are passivizable. Some verbs, such as hug, are recognized as passivizable as early as age three, while others, such as love, appear delayed until after age five. Moreover, verb meaning (i.e., the lexical semantics) seems to matter. For instance, hug is an observable action, and love is not; love is a "psych subject-experiencer" verb where the subject experiences the psychological state described (love), while hug is not a psychological verb at all. These and other lexical semantic features have been proposed to impact when English-learning children learn that specific verbs are passivizable (see Nguyen & Pearl, 2021 for a review of the acquisition trajectory and proposed lexical semantic features). In addition, the syntactic feature of transitivity has been proposed as a key indicator that a verb is likely passivizable in English (Levin, 1993). A transitive syntactic context has a subject and direct object, as in Lindy broke the toy kitten, with Lindy as the subject and the toy kitten as the direct object. So, verbs that allow a transitive context, like break, are likely to be passivizable in English.
The acquisition theory implemented in the model
Nguyen and Pearl (2019) proposed that children decide whether a verb is passivizable on the basis of two things. First, children consider several of the verb's lexical semantic features (like being observable or a psych subject-experiencer verb) and potentially the syntactic feature of transitivity, as proposed by prior acquisition theories (Liter, Huelskamp, Weerakoon & Munn, 2015; Maratsos, Fox, Becker & Chalkley, 1985; Pinker, Lebeaux & Frost, 1987; Levin, 1993; Messenger, Branigan, McLean & Sorace, 2012). Second, children consider how often verbs with those features are passivized in their input. Information about a verb's features is integrated via Bayesian inference.
As with the Pearl and Sprouse model, Nguyen and Pearl made the idealizing assumption that children would have enough prior knowledge and sufficient learning abilities to accurately extract this information from any particular verb use they encountered. As mentioned before, this assumption of accurate extraction provides a simple starting point for theory evaluation via cognitive modeling, in the absence of a particular theory about how children may inaccurately extract information.
As before, Bayesian inference balances prior knowledge or biases against fit to the observed data. Here, the prior captures how easy (or difficult) it is for children to deploy their knowledge of the passive in the moment, which can be impacted by immature cognitive development. That is, even if a child knows a specific verb is passivizable, she might not be able to access the passive structure appropriately in the moment after hearing the verb in the passive. So, she might not use her syntactic knowledge of the passive structure for that verb instance.
The fit to the observed data is again about the child's input. In particular, the modeled child assumes passivization is based on a verb's features and the frequencies of those features in passive forms. Is the information observed in the input about how often verbs with certain features passivize more or less probable? If the observed information is more probable under these assumptions, then there is a good fit to the observed data.
Importantly, the modeled child can heed or ignore any given feature when deciding if a particular verb is passivizable. So, for instance, a five-year-old might ignore whether a verb is an observable action, and instead key into whether it encodes a psychological state. The acquisition theory implemented in the model of Nguyen and Pearl explored theories of selective learning for the English passive (i.e., selectively ignoring available information when deciding if a verb is passivizable).
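The effect of heeding or ignoring a feature can be sketched with a toy calculation. The following is not Nguyen and Pearl's implementation: it simply pools passive and active uses over all verbs that match a given verb on the attended features, and returns a smoothed passive rate. The function name, the feature values, and the usage counts are all invented for illustration.

```python
def passive_rate(verb, lexicon, usage, attended):
    """Smoothed rate of passive uses among verbs matching `verb` on attended features."""
    key = tuple(lexicon[verb][feature] for feature in attended)
    passive = total = 0
    for other, features in lexicon.items():
        if tuple(features[feature] for feature in attended) == key:
            passive_uses, active_uses = usage[other]
            passive += passive_uses
            total += passive_uses + active_uses
    return (passive + 1) / (total + 2)  # add-one smoothing


# Invented feature values and (passive, active) counts, for illustration only.
lexicon = {
    "hug": {"transitive": True, "observable": True},
    "love": {"transitive": True, "observable": False},
    "fall": {"transitive": False, "observable": True},
}
usage = {"hug": (8, 92), "love": (0, 100), "fall": (0, 100)}

both_features = ("transitive", "observable")
only_transitivity = ("transitive",)

love_heeding_both = passive_rate("love", lexicon, usage, both_features)
love_ignoring_observable = passive_rate("love", lexicon, usage, only_transitivity)
hug_ignoring_observable = passive_rate("hug", lexicon, usage, only_transitivity)
```

In this toy setup, a learner who ignores observability treats love and hug identically, because both are transitive, whereas a learner who heeds both features sees no passive evidence for verbs like love. Which features are heeded therefore changes which verbs look passivizable.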

Information integrated
The information integrated via Bayesian inference is the selected features of a verb (syntactic and lexical semantic), whatever those happen to be. Notably, these features will be the ones children attend to for all the verbs of the language (rather than a feature set for each verb or type of verb). So, the acquisition theory assumes both syntactic and non-syntactic information is relevant. These information sources are then combined using the non-linguistic learning mechanism of Bayesian inference.

Model input
The model learned from verb uses in English child-directed speech samples, both passive uses like The toy kitten was broken and active uses like The toy kitten broke.

Model output and evaluation
To evaluate a modeled learner attending to some set of features, Nguyen and Pearl looked at the age when children have been observed to correctly interpret or produce the passive of a verb more than half the time in previous child behavioral experiments. They called this age the age of acquisition (AoA) for the passive of that verb, and used the AoA of 30 verbs as a model target. They focused on age five, and therefore split the 30 verbs into verbs whose AoA was five or younger versus verbs whose AoA was older.
The modeled learner predicts a specific verb is either passivizable or not at a certain age, on the basis of its input. So, the modeled five-year-old learned from the distribution of verb input that English-learning five-year-olds encounter and predicted which verbs would be passivizable. Nguyen and Pearl found that a modeled five-year-old who ignored many of the available features was able to match the behavior of English-learning five-year-olds, and passivize the subset of verbs whose AoA was five or younger. This modeled child instead focused on the syntactic feature of transitivity and a single lexical semantic feature.

What we learned
These modeling results suggest that English five-year-old passivization behavior can be captured if five-year-olds selectively attend to these syntactic and lexical semantic features in their input.

Pronoun interpretation
The syntactic knowledge
Consider this English sentence: Lisa sang to the triplets and then P took a nap, where P stands for a pronoun. How we interpret P depends on several factors. One is agreement information: If the pronoun is the singular she, we look for a singular antecedent like Lisa; if the pronoun is the plural they, we look for a plural antecedent like the triplets. Another factor is our discourse-level knowledge about the lexical items that connect the two clauses together, such as and then. In languages like Spanish, the equivalent to and then biases the interpretation towards the subject Lisa rather than the object the triplets. Another factor in languages like Spanish is whether the pronoun is overt (i.e., pronounced) or not. Spanish is a language that allows the pronoun not to be pronounced; when it is not pronounced, the subject (e.g., Lisa) tends to be favored as the pronoun's antecedent (see Pearl and Forsythe, 2022 for a brief overview of these factors in pronoun interpretation). Children need to learn how to interpret pronouns of their language in context, taking these factors (and others) into account the way adult speakers of their language do.
Pearl and Forsythe considered two options for how accurately children extract this information from their input. One option was that the modeled child has enough prior knowledge and sufficient learning abilities to accurately extract this information, similar to the two models discussed before. The other option was that the modeled child does not, and in fact would inaccurately represent this information (for whatever reason: immature knowledge, immature learning abilities, and/or cognitive limitations more generally). More specifically, the modeled child would skew the probability distributions observed in the input about these information sources (e.g., how often singular agreement information occurs when the pronoun's antecedent is singular). In particular, a modeled child with inaccurate representations of the information in the input could flatten a distribution (e.g., turning a 30/70 distribution into a 40/60 distribution) or sharpen a distribution (e.g., turning a 30/70 distribution into a 20/80 distribution).
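Flattening and sharpening can be sketched with a single parameter. The version below raises each probability to a power and renormalizes; this mechanism is an assumption chosen for illustration, since the text here does not specify how the distributions were skewed, and the function name skew and the parameter name sharpness are hypothetical.

```python
def skew(distribution, sharpness):
    """Raise each probability to `sharpness` and renormalize.

    sharpness < 1 flattens (moves toward uniform); sharpness > 1 sharpens
    (exaggerates existing differences); sharpness == 1 leaves it unchanged.
    """
    powered = [p ** sharpness for p in distribution]
    total = sum(powered)
    return [p / total for p in powered]


observed = [0.3, 0.7]
flattened = skew(observed, 0.5)   # closer to 50/50 than the input
sharpened = skew(observed, 2.0)   # further from 50/50 than the input
unchanged = skew(observed, 1.0)
```

With these particular values, the flattened version of the 30/70 distribution comes out close to 40/60, while the sharpened version pushes the smaller probability well below 30.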
As before, Bayesian inference balances prior knowledge or biases against fit to the observed data. Here, the prior encodes how often a pronoun preferred a particular antecedent in children's input, irrespective of any other useful information about how to interpret that pronoun. The fit to the observed data is about how often each information type occurs in children's input when a pronoun has a particular interpretation. If certain information (e.g., singular agreement information) almost always occurs when a pronoun's antecedent is interpreted a certain way (e.g., a singular antecedent), then using that highly-reliable information to interpret the pronoun is a good fit.
Pearl and Forsythe also considered two options for how accurately children perform this inference in the moment of deciding a pronoun's interpretation. One option was that the modeled child would use all the information sources when performing the Bayesian inference calculation. The other option was that the modeled child would ignore one or more information sources when performing that inference calculation (for whatever reason: immature knowledge, immature learning abilities, and/or cognitive limitations more generally).
So, to sum up, Pearl and Forsythe modeled two types of children. The first type was a modeled child without cognitive limitations, able to (i) accurately extract and represent the probability distributions from the information sources in the input, and (ii) always use those represented probabilities during the Bayesian inference calculation. The second type was a modeled child with cognitive limitations (of whatever kind) that affected (i) the accurate representation of information in the input, (ii) the use of all that information in the Bayesian inference calculation, or (iii) both. Thus, the models of Pearl and Forsythe considered certain theories for children's pronoun interpretation behavior that involve cognitive limitations; irrespective of their source, the effect of those limitations is to impact either the representation of information from the input, the use of that information when deciding a pronoun's interpretation in context, or both.
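The inference step, with and without ignored information sources, can be sketched as follows. This is a simplified illustration rather than Pearl and Forsythe's implementation: it assumes the information sources are conditionally independent given the antecedent, and the function name interpret, the cue names, and every probability are invented.

```python
def interpret(prior, likelihoods, observed, ignored=()):
    """Posterior over antecedents: prior times the likelihood of each heeded cue."""
    scores = {}
    for antecedent, prior_probability in prior.items():
        score = prior_probability
        for cue, value in observed.items():
            if cue not in ignored:  # an ignored cue contributes nothing
                score *= likelihoods[cue][antecedent][value]
        scores[antecedent] = score
    total = sum(scores.values())
    return {antecedent: score / total for antecedent, score in scores.items()}


# Invented numbers: how often each antecedent is preferred overall, and how
# often each cue value occurs given that antecedent.
prior = {"subject": 0.6, "object": 0.4}
likelihoods = {
    "agreement": {"subject": {"singular": 0.9, "plural": 0.1},
                  "object": {"singular": 0.2, "plural": 0.8}},
    "form": {"subject": {"null": 0.7, "overt": 0.3},
             "object": {"null": 0.4, "overt": 0.6}},
}
observed = {"agreement": "plural", "form": "null"}

uses_everything = interpret(prior, likelihoods, observed)
ignores_agreement = interpret(prior, likelihoods, observed, ignored=("agreement",))
```

With these invented numbers, the modeled child who uses every cue prefers the object antecedent, while the one who ignores agreement prefers the subject. A skewed representation of the input would correspond to altering the stored probabilities before this calculation is run.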

Information integrated
The information integrated via Bayesian inference is linguistic: agreement information (morphology), the lexical connectives between clauses (lexical), and whether the pronoun is pronounced (syntactic/phonological). These information sources are then combined using the non-linguistic learning mechanism of Bayesian inference. The way the information is combined can be mediated by non-linguistic factors arising from cognitive limitations: misrepresenting the information from the input and/or not using select information during Bayesian inference.

Model input
The modeled child learned from pronoun uses in Spanish speech samples involving children. These pronoun uses involved two clauses and had the pronoun as the subject of the second clause (e.g., [Lisa sang to the triplets]_clause1 and then [P took a nap]_clause2).

Model output and evaluation
Pearl and Forsythe evaluated modeled children that attended to this set of linguistic features and potentially had cognitive limitations impacting information representation and/or use. The modeled children generated predictions for how to interpret pronouns that Spanish-learning children ages three to five had interpreted in different experimental contexts involving information about agreement, lexical connectives, and whether the pronoun was pronounced.
Pearl and Forsythe found that modeled three-, four-, and five-year-olds were able to best match the interpretation preferences of actual three-, four-, and five-year-olds when cognitive limitations impacting either information representation or information use (but not both) were active. That is, children's interpretation behavior could be captured by integrating information from agreement, lexical connectives, and whether the pronoun was pronounced as long as children either (i) always mis-perceived information from these sources in the input, leading to inaccurate information, or (ii) often ignored accurate information from these sources when deciding how to interpret a pronoun in the moment. Importantly, children's behavior wasn't captured as well if the modeled child had both effects (inaccurate information often ignored) or neither effect (accurate information never ignored).
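To make the two kinds of limitation concrete, here is a minimal sketch of how they could enter a Bayesian calculation over a pronoun's antecedent. This is an illustration under stated assumptions, not the implementation used by Pearl and Forsythe: the exponent `sigma` (which distorts how probabilities are represented), the usage probability `beta` (how often each cue is actually used), the function names, and all the toy probabilities are hypothetical.

```python
from itertools import product

def posterior(prior, likelihoods, observed, sigma=1.0):
    """P(antecedent | cues), treating cues as independent given the antecedent.
    sigma != 1 distorts every probability (hypothetical misrepresentation)."""
    scores = {}
    for antecedent, p in prior.items():
        score = p ** sigma
        for cue, value in observed.items():
            score *= likelihoods[cue][antecedent][value] ** sigma
        scores[antecedent] = score
    total = sum(scores.values())
    return {a: s / total for a, s in scores.items()}

def limited_posterior(prior, likelihoods, observed, sigma=1.0, beta=1.0):
    """Average the posterior over which cues get used, where each cue is
    used independently with probability beta (hypothetical parameter)."""
    cues = list(observed)
    mixed = {a: 0.0 for a in prior}
    for mask in product([True, False], repeat=len(cues)):
        weight = 1.0
        for used in mask:
            weight *= beta if used else 1.0 - beta
        if weight == 0.0:
            continue
        used_cues = {c: observed[c] for c, u in zip(cues, mask) if u}
        for a, p in posterior(prior, likelihoods, used_cues, sigma).items():
            mixed[a] += weight * p
    return mixed

# Toy, made-up numbers (not estimates from any corpus).
PRIOR = {"subject": 0.5, "non_subject": 0.5}
LIKELIHOODS = {
    "agreement": {"subject": {"singular": 0.9, "plural": 0.1},
                  "non_subject": {"singular": 0.2, "plural": 0.8}},
    "connective": {"subject": {"and_then": 0.6, "because": 0.4},
                   "non_subject": {"and_then": 0.4, "because": 0.6}},
}
OBSERVED = {"agreement": "singular", "connective": "and_then"}

no_limits = limited_posterior(PRIOR, LIKELIHOODS, OBSERVED)
misrepresent = limited_posterior(PRIOR, LIKELIHOODS, OBSERVED, sigma=0.5)
ignore_often = limited_posterior(PRIOR, LIKELIHOODS, OBSERVED, beta=0.3)
```

In this toy setup, both kinds of limitation pull the preference for the subject antecedent back toward the prior, but they do so through different routes: `sigma` changes what is represented, while `beta` changes how often accurate representations are consulted.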

What we learned
These modeling results thus offer specific explanations about how cognitive limitations (whatever their specific source happens to be) could impact children's pronoun interpretation preferences, if children rely on these linguistic information sources.

Some experimental work to take inspiration from
I now briefly turn to some work from child behavioral experiments that can provide inspiration for other factors we might want to consider (or consider further) for syntactic acquisition. The first set of experiments involves cognitive limitations, while the second involves knowledge about pragmatics and the world more generally.

Cognitive limitations
The model of Pearl and Forsythe highlighted one effect that cognitive limitations could have on children's acquisition (syntactic or otherwise): children have adult-like knowledge but can't deploy it effectively in the moment. Several child behavioral experiments have been interpreted as demonstrating this effect for syntactic acquisition, including Gerard, Lidz, Zuckerman, and Pinto (2018), Ud Deen, Bondoc, Camp, Estioca, Hwang, Shin, Takahashi, Zenker, and Zhong (2018), and Liter, Grolla, and Lidz (2022).
In Gerard et al. (2018), four- and five-year-old English-learning children were asked to interpret utterances with unpronounced subject pronouns in the second clause, like Dora washed Diego before eating a red apple. An adult-like interpretation is that Dora is the one eating a red apple, so the syntactic representation is something like this: Dora washed Diego before P_Dora eating a red apple. Children were asked to interpret this kind of utterance in tasks that were either more or less cognitively-demanding. A more cognitively-demanding task might involve children having to hold additional information in mind and also evaluate whether the utterance itself is true; a less cognitively-demanding task would involve children simply indicating their interpretation by coloring a picture of the appropriate interpretation (i.e., Dora eating the apple, rather than Diego). When children had to do the more cognitively-demanding task (and so use up more cognitive resources on something besides interpreting the unpronounced pronoun), they gave more non-adult-like interpretations (e.g., Diego eating the apple). In contrast, when children did the less cognitively-demanding task (and so focused more cognitive resources on interpreting the unpronounced pronoun), they gave more adult-like interpretations (e.g., Dora eating the apple). One way to interpret these results is that four- and five-year-olds have adult-like knowledge of how to interpret these unpronounced pronouns, but cannot always use that knowledge in the moment when their cognitive resources are being used up by other things. This idea aligns broadly with the Pearl and Forsythe modeled children who cannot accurately use their information about pronoun interpretation in the moment.
Another example comes from Ud Deen et al. (2018) on children's interpretation of the passive. English-learning four-year-olds correctly interpreted passives like Elephant was surprised by Monkey more often when the utterance was simply repeated. One interpretation of this finding is that children can adjust their mistaken expectations about the thematic role associated with the subject (i.e., that Elephant is not the surprise-causer but instead the surprise-experiencer) when they hear the sentence again because they know they made a mistake the first time. That is, children can inhibit the incorrect thematic role assignment of Elephant because they know it will not be correct. However, the first time children hear the utterance, they do not know this and so they make an incorrect assignment (e.g., of Elephant as surprise-causer), which is hard for them to adjust afterwards. In other words, children have adult-like knowledge about how to interpret the passive, but cannot use it effectively when their cognitive inhibition ability is not strong enough. So, more broadly, this child behavior was interpreted as domain-general cognitive factors like immature cognitive inhibition impacting children's ability to use their knowledge of the passive.
A third example comes from Liter et al. (2022), and also involves immature cognitive inhibition, this time impacting children's production of questions involving wh-words like where. More specifically, English-learning children will sometimes produce "medial wh" questions that seem to duplicate the wh-word, with an extra copy appearing in the middle, such as Where do you think where they were walking? Liter et al. (2022) found that children's production of medial-wh questions correlated with a measure of their cognitive inhibition abilities. One way to interpret this is that children do in fact know that English does not allow medial wh, but children sometimes simply lack the cognitive control to inhibit the extra wh-word from being produced in the moment. As with the passive example above, this result highlights that acquisition theories (and therefore the computational cognitive models we build to explain children's behavior) need to consider the non-linguistic systems controlling cognitive inhibition in children.

Pragmatics and world knowledge
Other sources of information children could harness involve knowledge about how speakers use their language (i.e., pragmatic knowledge) and knowledge about the world more generally. We already have behavioral evidence that children can rely on these information sources during syntactic acquisition, such as when learning to interpret pronouns (e.g., Hartshorne et al., 2015a; Pyykkönen et al., 2010; Song & Fisher, 2005, 2007; Wykes, 1981; among others).
As one example of pragmatic knowledge with pronouns, consider the sentence Lisa sang to Lindy and then she took a nap. The pronoun she could refer to either Lisa or Lindy, but adults know that speakers like to have clauses refer to the same topic (Asher & Lascarides, 2003). This leads to a "first-mention bias", where the element first mentioned (e.g., the subject Lisa) is the topic and listeners prefer a subsequent pronoun to refer to that first-mentioned element (Crawley, Stevenson & Kleinman, 1990; Arnold, Eisenband, Brown-Schmidt & Trueswell, 2000; Järvikivi, van Gompel, Hyönä & Bertram, 2005). English-learning children ages three to five also seem to have this pragmatic knowledge, leading to a first-mention bias in a variety of contexts (Song & Fisher, 2005, 2007; Pyykkönen et al., 2010; Hartshorne et al., 2015a).
As one example of world knowledge with pronouns, consider this sentence pair: Jane needed Susan's pencil. She gave it to her. Knowledge about how the world works allows listeners to pick situationally-appropriate interpretations (e.g., Hobbs, 1979; Kertz, Rohde & Elman, 2008). Here, if Jane needs a pencil, she cannot already have one, so she cannot be the one to give a pencil away. That means that the one doing the giving (referred to by She in the second sentence) must not be Jane, and instead is probably the other mentioned person, Susan. Similarly, if Jane needs a pencil, she is likely to be the one getting a pencil from someone else, i.e., the recipient of giving indicated by her. So, world knowledge allows listeners to interpret She as Susan and her as Jane. English-learning five-year-olds seem able to complete this chain of reasoning and correctly interpret the second sentence (Wykes, 1981).
These are just select examples of pragmatic and world knowledge impacting pronoun interpretation, which of course is simply one aspect of syntactic knowledge. More generally, these examples suggest that future syntactic acquisition theories (and the computational cognitive models implementing them) should consider these information sources.

Moving forward
Computational cognitive modeling is a tool that complements other techniques for investigating language development, providing insight into aspects of language acquisition that can be difficult to investigate otherwise. For instance, the models reviewed here investigated how children might learn certain syntactic knowledge from their input (verb constructions like subject-raising, unaccusatives, and passives) and why child behavior may differ from adult behavior for certain syntactic elements (pronoun interpretation).
In general, I think questions of how acquisition works and why children behave as they do are much easier to investigate with modeling. This is because the underlying factors that impact how acquisition works (and therefore why children behave as they do) can be explicitly defined and manipulated within a computational cognitive model. Such factors include how information from the input is perceived, which information is learned from, and how information is used to update internal hypotheses, as well as which hypotheses are under consideration in the first place. To me, it is not at all obvious how to control these factors (and others) with other techniques commonly used to investigate child language development, such as behavioral techniques.
With that said, informative models typically build on data collected with other techniques. Model input is based on estimates of the information children encounter in their language interactions. Model learning mechanisms are based on ideas of what abilities and learning biases children demonstrate at certain ages. Model output is based on data collected from children (or that can be collected in the future), so that the model can explain children's observed linguistic behavior.
As we move forward, a basic goal is to build "better" models, that is, models that capture more of the relevant aspects of the acquisition process so that we can better link children's input to their observable behavior. When we have these better models, we then have better explanations (as implemented in the models) for why acquisition (syntactic or otherwise) proceeds the way it does. So, how do we build better models?

Building better models
To build a computational cognitive model of language acquisition, we need to be very precise about the acquisition process the model is implementing. One concrete proposal for the relevant components of the acquisition process is in Figure 1, adapted from Pearl (in press). This proposal specifies components both external and internal to the child during the acquisition process, and is meant to capture the iterative process of acquisition unfolding over time.
External components are observable. We can observe the input signal available to children (e.g., the child language interactions they experience). For example, consider a version of our utterance from before: "Lisa sang to the triplets and then she took a power nap." The input signal is the physical signal in the world, such as auditory components like pitch, timbre, and loudness of the utterance. The input can also include other aspects of the environment, such as who said the utterance, where they said it, when they said it, and what people or objects were in the environment at the time.
We can also observe children's behavior at any stage of development, either through naturalistic productions and behavior or clever experimental designs that elicit productions or behavior. In the example utterance above, we can observe who the child thinks she refers to, Lisa or the triplets. One way to do this is to present the child with two pictures, one of Lisa napping and one of the triplets napping, and ask the child to point to the picture the utterance describes.
The internal components of the acquisition process involve several pieces. The first piece concerns the information the child is able to perceive in the input signal. In particular, perceptual encoding involves extracting information from the input signal to create the perceptual intake. Perceptual encoding draws on the child's developing knowledge and systems to extract information. For instance, in our example utterance, the child may be able to perceive syllables (e.g., /li/, /sǝ/, /sεŋ/, etc.), words (e.g., Lisa, sang, etc.), syntactic structure (e.g., [IP Lisa [VP sang [PP to [NP the triplets]]]]), pronoun interpretations (she = Lisa), as well as the event participants (Lisa, the triplets) and properties of the events described (singing, napping), among many other types of information. What children can perceive depends on what they know about their language (e.g., developing linguistic knowledge: Lisa, the triplets, and she are words), what they know about the world (e.g., developing non-linguistic knowledge: who's likely to take a power nap), and how well they can extract information of different kinds (e.g., developing linguistic systems: speech segmentation, syntactic parsing, pronoun interpretation biases; developing non-linguistic systems: memory, cognitive inhibition). Notably, extracting information from the input signal involves ignoring information present (e.g., where the utterance was spoken) and adding information not explicitly present (e.g., where the words are, how a pronoun is interpreted). What children ignore and add depends on their developing knowledge and developing systems.
The second internal piece concerns how children generate their observable behavior. For this, children rely on the information they have been able to perceptually encode (the perceptual intake) and their developing systems and knowledge. In particular, children apply their production systems to the perceptual intake in order to generate behavior like speaking (which relies on linguistic systems and non-linguistic systems involved in utterance generation). In our example utterance, a child might say "Lisa's the one napping". Children can also respond non-verbally (e.g., look at a picture that encodes a scene described by the utterance, which relies on non-linguistic systems like motor control, attention, and decision-making). In our example utterance, a child might look at the picture of Lisa napping.
Figure 1. Proposal for the relevant components of the acquisition process that a computational cognitive model of language acquisition should consider. External components (input and behavior) are observable. Internal components are not observable, and include perceptually encoding information from the input signal (yielding the perceptual intake), generating output from the encoded information (yielding observable behavior), and learning from the encoded information (using constraints & filters to yield the acquisitional intake, and doing inference over that intake). The developing systems and developing knowledge (both linguistic and non-linguistic) impact all internal components, while the learning component updates the developing knowledge.
The last internal piece concerns learning, which is how the child's developing knowledge (both linguistic and non-linguistic) is updated over time. As with the other internal pieces, the child's developing systems and knowledge impact this piece. In particular, learning occurs over the part of the perceptual intake the child deems relevant to learn from: this is the acquisitional intake. The acquisitional intake is typically not all of the perceptual intake. That is, it is not everything the child is able to encode. Instead, depending on what the child is trying to learn, what is relevant is likely some subset of the perceptual intake. For instance, in our example utterance, the fact that the pronoun she is singular may be in the acquisitional intake, while the fact that she is a separate word from took may not.
The child's developing knowledge can filter the perceptual intake down to the relevant information by providing both constraints on possible hypotheses (i.e., what options are worth considering) and attentional filters (i.e., what in the information signal to pay attention to). For instance, in our pronoun interpretation example, a linguistic constraint may limit the possible hypotheses for she's antecedent to noun phrases, and so the number feature is relevant for choosing among different noun phrases; a non-linguistic constraint may limit potential antecedents to animate participants who are capable of power napping. An attentional filter may focus the child on the pronoun's interpretation, rather than other aspects of the utterance, because of uncertainty about how to interpret pronouns more generally at the child's current stage of development.
Inference then operates over the acquisitional intake, and typically involves non-linguistic abilities like probabilistic inference, statistical learning, or hypothesis testing. The result of this inference can be used to update the developing knowledge, potentially both linguistic knowledge and non-linguistic knowledge. For instance, in our pronoun interpretation example, the child might update her hypotheses about how likely it is that she's antecedent is singular (linguistic knowledge) and how likely adults like Lisa are to take power naps (non-linguistic knowledge).
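The learning side of this proposal (perceptual encoding, then constraints & filters, then inference) can be sketched as a tiny pipeline. This is only a schematic illustration: the class name `ModeledChild`, the feature names, and the use of simple pseudo-counts as the inference step are all assumptions made for the example, not part of the proposal itself.

```python
from dataclasses import dataclass, field

@dataclass
class ModeledChild:
    # Developing knowledge: pseudo-counts for the number of a pronoun's antecedent.
    counts: dict = field(default_factory=lambda: {"singular": 1.0, "plural": 1.0})
    # What the child can currently encode from the input signal (hypothetical features).
    encodable: tuple = ("words", "pronoun_number")
    # Attentional filter: what the child deems relevant to learn from.
    relevant: tuple = ("pronoun_number",)

    def perceptual_encoding(self, signal):
        """Input signal -> perceptual intake (drops what cannot be encoded)."""
        return {k: v for k, v in signal.items() if k in self.encodable}

    def constraints_and_filters(self, perceptual_intake):
        """Perceptual intake -> acquisitional intake (keeps only relevant parts)."""
        return {k: v for k, v in perceptual_intake.items() if k in self.relevant}

    def inference(self, acquisitional_intake):
        """Update developing knowledge; here, just increment a pseudo-count."""
        for value in acquisitional_intake.values():
            self.counts[value] += 1.0

    def learn_from(self, signal):
        intake = self.constraints_and_filters(self.perceptual_encoding(signal))
        self.inference(intake)

    def belief(self, value):
        return self.counts[value] / sum(self.counts.values())

child = ModeledChild()
child.learn_from({
    "words": ["Lisa", "sang", "to", "the", "triplets", "and", "then", "she", "took", "a", "power", "nap"],
    "location": "kitchen",          # present in the signal, ignored by encoding
    "pronoun_number": "singular",   # encoded and kept in the acquisitional intake
})
```

A fuller model would let the developing knowledge feed back into `encodable` and `relevant` over time, which is what makes the process iterative.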
With this proposal in hand for relevant components of a computational cognitive model of acquisition, we can now think about some of the ideas we might want to incorporate into future models of syntactic acquisition. I briefly discuss some ideas for incorporating non-syntactic components and simultaneous acquisition of different knowledge aspects.

Incorporating non-syntactic components into acquisition models
Prior behavioral work has found that children are sensitive to animacy when learning aspects of syntax (e.g., see Becker, 2015). Pearl and Sprouse (2019) used animacy in their model of linking theory acquisition, allowing the animacy of a verb's arguments to be part of the acquisitional intake that children learned from.
Prior behavioral work has also found that children can use both pragmatic and world knowledge to help them choose between potential interpretations of pronouns (e.g., Hartshorne et al., 2015a; Pyykkönen et al., 2010; Song & Fisher, 2005, 2007; Wykes, 1981). Some recent computational cognitive modeling work has investigated how children choose between potential interpretations of utterances like Every horse didn't jump, which can either mean "No horses jumped" or "Not all horses jumped" (Savinelli, Scontras & Pearl, 2017, 2018; Scontras & Pearl, 2021). The modeled children in these studies incorporated both pragmatic knowledge about what speakers think the topic of conversation is and world knowledge about the event described (e.g., how likely horses are to jump) into the perceptual intake. Notably, differences in children's ability to adjust their expectations about the pragmatics and world of the experiment (due to immature non-linguistic systems) can explain children's observed non-adult-like behavior, according to these models.
More generally, prior behavioral work (Gerard et al., 2018; Liter et al., 2022; Ud Deen et al., 2018) has noted the impact of immature non-linguistic systems (e.g., cognitive inhibition) in children's use of their knowledge, that is, how children generate their observed behavior in experimental contexts. So, I think it is useful for future computational cognitive models to consider the impact of these developing non-linguistic systems when accounting for children's behavior (i.e., the output generation process).
Moreover, these developing non-linguistic systems may also impact several other pieces of the acquisition process: (i) perceptual encoding, leading to a perceptual intake that captures immature representations of information in the input, (ii) constraints & filters, leading to an acquisitional intake that is inaccurate, and (iii) inference, leading to learning that is non-adult-like. The exact way developing non-linguistic systems impact these pieces depends on what system is developing and how that system is proposed to contribute to the acquisition process. While this is certainly non-trivial to specify for any given non-linguistic system and model piece, the more we can do it, the better we will be able to capture the acquisition process in children and link their input to their observable behavior with a concrete acquisition theory encoded in a model.

Thinking about simultaneous acquisition
Another interesting consideration is simultaneous acquisition, where multiple types of knowledge are learned simultaneously. In the case studies discussed here, the acquisition of linking theories from Pearl and Sprouse (2019) was an example of this. More specifically, when learning how to cluster verbs together into classes whose linking theories were similar, the modeled child effectively learned about many different verb constructions simultaneously (e.g., which verbs are subject-raising, which verbs are unaccusative, which verbs are passivizable, etc.). The key insight is that the modeled child's objective was broad: to learn about verbs that "behave" similarly with respect to certain types of information in the acquisitional intake (argument animacy, syntactic contexts, links between thematic roles and syntactic positions), instead of learning about which verbs allow a specific syntactic behavior (e.g., subject-raising). In other words, the specific syntactic knowledge about which constructions any given verb allows is a by-product of trying to learn something else about that verb, namely, which other verbs it behaves similarly to (i.e., which class it belongs to) and what the behavior of that verb class is.
I think this may be a more realistic approach to syntactic acquisition (and acquisition more generally), with children trying to learn about their language more broadly and picking up specific linguistic knowledge along the way as part of that broader learning goal. What this means modeling-wise is that the modeled child's objective (what hypotheses are being considered) would be adjusted. For instance, instead of explicitly learning if a verb is subject-raising, can children's observable behavior about which verbs are subject-raising be captured by a modeled child learning about verb classes more generally and implicitly learning which verbs are subject-raising? This approach worked well for Pearl and Sprouse (2019).
Another example of simultaneous syntactic acquisition from my own research (Bates & Pearl, 2019; Dickson, Pearl & Futrell, 2022; Pearl & Bates, in press; Pearl & Sprouse, 2013) is the acquisition of knowledge about "syntactic islands" in children. For example, English-speaking children must learn that Who did Lily think the kitten for __who was cute? is not a good wh-question, which draws on their implicit knowledge of syntactic islands. Here, the modeled child's objective is to learn in general how to represent wh-dependencies like those in wh-questions, rather than learning how good a specific wh-dependency is (or is not). By learning to do this, modeled children learn to have adult-like preferences about how good different wh-dependencies are (Bates & Pearl, 2019; Pearl & Bates, in press; Pearl & Sprouse, 2013), especially if the modeled children are trying to represent wh-dependencies in an "efficient" way (Dickson et al., 2022) that makes processing future wh-dependencies easier.
A related approach gaining momentum in syntactic acquisition modeling involves simply learning to predict the next word, with the modeled children implicitly learning whatever knowledge is necessary to make that next word highly probable (and therefore easier to process). Along the way, several models of this type seem to implicitly learn a variety of syntactic knowledge, including knowledge about syntactic islands (e.g., Wilcox, Levy, Morita & Futrell, 2018; Futrell et al., 2019; Chaves, 2020; Warstadt et al., 2020; Wilcox, Futrell & Levy, 2021).
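The link between "highly probable" and "easier to process" is often expressed as surprisal, the negative log probability of a word given its context. The sketch below illustrates the idea with a deliberately tiny bigram model and add-one smoothing; the cited models use far richer learners, and the function names and toy corpus here are assumptions made for illustration only.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count bigrams in whitespace-tokenized sentences, with a start symbol."""
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        vocab.update(tokens)
        for prev, word in zip(tokens, tokens[1:]):
            unigrams[prev] += 1
            bigrams[(prev, word)] += 1
    return unigrams, bigrams, vocab

def surprisal(prev, word, model):
    """Surprisal in bits: -log2 P(word | prev), with add-one smoothing."""
    unigrams, bigrams, vocab = model
    prob = (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab))
    return -math.log2(prob)

# Toy corpus (made up for the example).
model = train_bigram(["lisa sang", "lisa napped", "the triplets sang"])
seen = surprisal("lisa", "sang", model)        # continuation attested in the corpus
unseen = surprisal("lisa", "triplets", model)  # continuation never attested
```

In this toy case the unattested continuation receives higher surprisal than the attested one, which is the same kind of contrast used to probe whether a next-word learner disprefers dependencies that cross syntactic islands.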

Conclusion
Here I hope to have shown how computational cognitive modeling can inform our understanding of syntactic acquisition by implementing theories of acquisition precisely enough to evaluate against empirical data from children. I reviewed some previous models that consider information from non-syntactic sources and the impact of non-linguistic cognitive development on syntactic acquisition. I also highlighted some behavioral work that notes the role of other information sources children use and specific cognitive limitations children have during syntactic acquisition. I then discussed how we might build future models that incorporate these insights and so provide better explanations of syntactic acquisition. With this information in mind, I believe we can create, evaluate, and refine better theories of syntactic acquisition through computational cognitive modeling.