Building a unified model of the Optional Infinitive Stage: Simulating the cross-linguistic pattern of verb-marking error in typically developing children and children with Developmental Language Disorder

Abstract
Verb-marking errors are a characteristic feature of the speech of typically-developing (TD) children and are particularly prevalent in the speech of children with Developmental Language Disorder (DLD). However, both the pattern of verb-marking error in TD children and the pattern of verb-marking deficit in DLD vary across languages and interact with the semantic and syntactic properties of the language being learned. In this paper, we review work using a computational model called MOSAIC. We show how this work allows us to understand several features of the cross-linguistic data that are otherwise difficult to explain. We also show how discrepancies between the developmental data and the quantitative predictions generated by MOSAIC can be used to identify weaknesses in our current understanding and lead to further theory development; and how the resulting model (MOSAIC+) helps us understand differences in the cross-linguistic patterning of verb-marking errors in TD children and children with DLD.


Introduction
The OI phenomenon
Verb-marking errors are a characteristic feature of children's early language. For example, between the ages of 2 and 4 years, English-speaking children often make errors like those in (1) to (4) in which they use a zero-marked form in a context that requires a past tense or third person singular (3sg) present tense form (examples taken from the Manchester corpus; Theakston, Lieven, Pine, & Rowland, 2001; MacWhinney, 2000).
(1) *I buy them yesterday (Anne, 2;6.04)
(2) *He want to go (Carl, 2;8.15)
(3) *We make this yesterday (Gail, 2;8.13)
(4) *And the lorry go on top (Warren, 2;7.05)
Early analyses of these kinds of errors assumed that they reflected incomplete knowledge of the target inflection (e.g., Brown, 1973), or the dropping of the inflection due to performance limitations in production (e.g., Bloom, 1990; Valian, 1991). However, analyses of a wider range of languages (e.g., Wexler, 1994) have shown that, in languages other than English, the equivalent errors tend to include verb forms marked with an infinitival morpheme like those in (5) to (8). Since these errors do not involve the use of a bare stem, they cannot be explained in terms of inflection drop, and this has led to the view that the pattern of verb-marking errors across languages (including the incorrect use of zero-marked forms in English) reflects the use of infinitives and other non-finite forms in finite contexts. These errors are sometimes referred to as Root Infinitive (RI) errors (Rizzi, 1993/1994). However, since they tend to occur during a stage in which the child is also producing correct finite forms, they are also often referred to as Optional Infinitive (OI) errors (Wexler, 1994), and the period during which they occur as the Optional Infinitive (OI) Stage.

Why is building a unified cross-linguistic model of the OI stage challenging?
The fact that OI errors occur across a range of different languages has stimulated numerous attempts to explain the cross-linguistic pattern of verb-marking error in children's speech. However, developing a unified cross-linguistic model of the OI stage has proved challenging for a number of reasons.
First, the rate at which children make OI errors varies considerably across languages. Early models (e.g., Wexler, 1994) tended to assume a qualitative distinction between languages in which children make OI errors and languages in which they do not. However, it is now clear that rates of OI errors vary along a continuum across languages. For example, Phillips (1995) reviews data from children learning 5 OI languages (Dutch, English, French, German and Swedish) and 4 non-OI languages (Catalan, Hebrew, Italian and Spanish) and concludes that rates of OI errors range from high in English and Swedish through moderate in Dutch, French and German to low (but by no means zero) in Catalan, Hebrew, Italian and Spanish. A good model of verb-marking error needs to be able to explain this variation.
Second, the patterning of OI errors interacts in complex ways with the semantic properties of the language being learned. Thus, contrary to the predictions of some models of the OI stage (e.g., Wexler, 1998; Legate & Yang, 2007), OI errors and the correct finite forms with which they are assumed to alternate tend to be distributed differently in children's speech. That is, OI errors tend to have a modal or irrealis meaning (the Modal Reference Effect), and to be restricted to eventive verbs like 'play' or 'buy' (the Eventivity Constraint), whereas correct finite forms tend to have a non-modal meaning and occur with both eventive and stative verbs. This pattern has been reported in a number of OI languages, including Dutch (Jordens, 1990; Wijnen, 1998); French (Ferdinand, 1996); German (Ingram & Thompson, 1996) and Swedish (Josefsson, 2002). However, it is also subject to variation. For example, the Modal Reference Effect and the Eventivity Constraint appear to be absent or much reduced in early child English (Hoekstra & Hyams, 1998). A good model of verb-marking error therefore needs to be able to explain the presence of these semantic conditioning effects in most OI languages, and the absence or reduced size of these effects in English.
Third, the patterning of OI errors interacts with the syntactic properties of the language being learned. Thus, in many OI languages, OI errors occur in declaratives but not in Wh-questions. However, in English OI errors occur in both declaratives (e.g., *Dolly eat biscuit) and Wh-questions (e.g., *What dolly eat?). Some UG accounts (e.g., Wexler, 1998) can explain this pattern, but others cannot. For example, Rizzi's (1993/1994) truncation account predicts that children learning OI languages will make OI errors in declaratives but not in Wh-questions and therefore cannot explain why English-speaking children make OI errors in both declaratives and Wh-questions. A good model of verb-marking error therefore needs to be able to explain why children make OI errors in Wh-questions in English, but not in other OI languages.
Finally, although deficits in verb-marking are a characteristic feature of Developmental Language Disorder (DLD), the pattern of deficit varies across languages. Thus, English-speaking children with DLD tend to produce OI errors at higher rates, even relatively late in development, than younger children matched for their Mean Length of Utterance (MLU) (Rice, Wexler, & Cleave, 1995; Rice, Wexler, & Hershberger, 1998). However, this does not appear to be the case in other OI languages. For example, Wexler, Schaeffer, and Bol (2004) found no such effect in their Dutch-speaking sample at MLU = 3 and MLU = 4; and, although Rice, Noll, and Grimm (1997) did find an MLU-matching effect at MLU = 2.66 in their German sample, this effect had disappeared by MLU = 3.77. In contrast, several researchers have found higher rates of subject-verb agreement and verb-positioning errors in Dutch and German. For example, both de Jong (1999) and Wexler et al. (2004) report elevated (though still relatively low) rates of agreement error in Dutch-speaking children with DLD (see also Blom, de Jong, Orgassa, Baker, & Weerman, 2013); Clahsen, Bartke, and Göllner (1997) report higher rates of agreement error in German-speaking children with DLD (see also Rothweiler, Chilla, & Clahsen, 2012); and a number of researchers have reported verb-positioning errors in both Dutch (de Jong, 1999; Wexler et al., 2004) and German (Clahsen et al., 1997; Hamann, Penner, & Lindner, 1998; Roberts & Leonard, 1997; see Leonard, 2014, for a review). These positioning errors typically involve the incorrect use of finite forms (which are restricted to second position in Dutch and German) in utterance-final position, though the incorrect use of infinitives in second position has also been reported. A good model of verb-marking error needs to be able to explain this pattern of deficits.

MOSAIC
Most research on the OI phenomenon has been conducted within a Universal Grammar (UG) framework, which assumes that the child is analysing the input in terms of innate linguistic categories and principles, and that, where OI errors occur, they reflect an underlying difference between the child and the adult grammar. For example, both Wexler (1998) and Legate and Yang (2007) offer unified accounts of the cross-linguistic pattern of OI errors in young children's speech. According to Wexler (1998), children have correctly set all the inflectional and clause structure parameters of their language from a very early age, but produce OI errors because they are subject to a Unique Checking Constraint (UCC), which sometimes prevents the specification of Tense and Agreement in the underlying representation of the sentence; and according to Legate and Yang (2007), children produce OI errors because they are engaged in a probabilistic parameter-setting process, which, for tense-marking languages, results in them alternating between the correct [+Tense] and an incorrect [-Tense] grammar until sufficient evidence has accrued that they are learning a tense-marking language. However, in a series of papers, our research group has used a computational model of early language development (MOSAIC), with no built-in knowledge of syntactic categories or rules, to show that the cross-linguistic patterning of OI errors can be understood in terms of the interaction between psychologically plausible constraints on learning and the semantic-distributional properties of the language to which children are exposed.
MOSAIC is an unsupervised learning model that learns from input in the form of orthographically transcribed child-directed speech. MOSAIC gradually builds a network of words and strings of words on the basis of the input to which it has been exposed and produces output in the form of 'utterances' that become progressively longer as learning proceeds. Some of these utterances are produced by rote (i.e., have occurred as utterances or parts of utterances in the input). Others are produced generatively (i.e., by substituting distributionally similar words into frames that have occurred as utterances or parts of utterances in the input). The rationale for generating novel utterances in this way is that words that occur in distributionally similar contexts tend to be of the same syntactic class (e.g., Redington, Chater, & Finch, 1998), and that children have been shown to be able to use this kind of distributional information to form syntactic categories from an early age (e.g., Gerken, Wilson, & Lewis, 2005). Since the average length of MOSAIC's output (both rote-learned and generative) increases with learning, MOSAIC can be used to simulate developmental changes in children's speech as a function of increasing MLU.
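The idea of generating novel utterances by substituting distributionally similar words into attested frames can be illustrated with a minimal sketch. This is not MOSAIC's implementation: the similarity measure (Jaccard overlap of the preceding and following word), the 0.5 threshold, the toy input and all function names are assumptions introduced here purely for illustration.

```python
from collections import defaultdict

# Hypothetical toy input; real simulations use transcribed child-directed speech.
TOY_INPUT = [
    "the dog runs",
    "the cat runs",
    "the dog sleeps",
    "the cat sleeps",
    "a dog barks",
]

def collect_contexts(utterances):
    """Map each word to the set of (previous word, next word) pairs it occurs in."""
    contexts = defaultdict(set)
    for utterance in utterances:
        words = ["<s>"] + utterance.split() + ["</s>"]
        for prev, word, nxt in zip(words, words[1:], words[2:]):
            contexts[word].add((prev, nxt))
    return contexts

def similar(contexts, a, b, threshold=0.5):
    """Assumed similarity test: Jaccard overlap of contexts at or above threshold."""
    union = contexts[a] | contexts[b]
    if not union:
        return False
    return len(contexts[a] & contexts[b]) / len(union) >= threshold

def substitutions(frame, contexts, seen, threshold=0.5):
    """Return novel utterances made by swapping one word for a similar word."""
    words = frame.split()
    novel = set()
    for i, word in enumerate(words):
        for candidate in list(contexts):
            if candidate != word and similar(contexts, word, candidate, threshold):
                new = " ".join(words[:i] + [candidate] + words[i + 1:])
                if new not in seen:
                    novel.add(new)
    return sorted(novel)

contexts = collect_contexts(TOY_INPUT)
print(substitutions("a dog barks", contexts, set(TOY_INPUT)))
```

In this toy input 'dog' and 'cat' share most of their contexts, so the attested frame 'a dog barks' yields the unattested 'a cat barks', whereas 'dog' and 'runs' share no contexts and are never exchanged.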
MOSAIC simulates OI errors because it has a strong utterance-final (and, in later work, an additional weaker utterance-initial) bias in learning. MOSAIC's utterance-final bias implements a recency effect in the processing and learning of sequential information, which is generally taken to reflect capacity limitations in working memory. Several authors have argued, on the basis of empirical data, that language-learning children are particularly sensitive to material in utterance-final position. That is, children comprehend and learn words and phrases that occur in utterance-final position more readily than words and phrases that occur in other positions (e.g., Braginsky, Yurovsky, Marchman, & Frank, 2019; Shady & Gerken, 1999). It has also been argued that this kind of utterance-final bias in learning may provide a plausible explanation of the OI phenomenon itself (at least in languages such as Dutch and German). For example, Wijnen, Kempen and Gillis (2001) argue that the high proportion of OI errors in early child Dutch can be at least partly explained by the fact that in Dutch non-finite verb forms are restricted to sentence-final position and are hence easier for children to learn than finite verb forms, which tend to occur earlier in the sentence.
MOSAIC's weaker utterance-initial bias in learning implements a primacy effect in learning, consistent with evidence that humans show primacy as well as recency effects in sequence and list learning, with the primacy effects generally being smaller than the recency effects (Bellezza, Andrasik, & Lewis, 1982; Gupta, 2005; Jahnke, 1965; Murdock, 1962). In fact, the issue of whether young children are subject to primacy effects in learning is not as straightforward as it might at first appear, since primacy effects are often taken to reflect active elaboration processes such as rehearsal (which are unlikely to be a factor in younger children) rather than processing limitations per se. However, primacy effects have often been found in humans in the absence of rehearsal (e.g., Neath, 1993; Sikström, 2006). Moreover, primacy (and recency) effects have also been found in non-human species (monkeys and pigeons), where active elaboration processes are unlikely to play a major role (Wright, Santiago, Sands, Kendrick, & Rook, 1985). These findings can be explained in terms of the increased distinctiveness of items occurring at the beginning and end of a list or sequence, and suggest that language-learning children are likely to be preferentially sensitive to both the beginning and the end of unfamiliar utterances.
MOSAIC's utterance-final and utterance-initial biases result in the production of partial utterances: in early work, strings that were present as utterance-final phrases in the input on which the model was trained and, in later work, both utterance-final strings and concatenations of utterance-initial words or phrases with utterance-final strings. The structures in the input that give rise to OI errors are compound-finite structures: utterances that contain a (finite) modal or other auxiliary and an infinitive, such as the English utterance 'This could go there' or the German utterance 'Oma kann die Brücke bauen' (Grandma can the bridge build-INF). The truncation of utterances like these results in subjectless OI errors such as 'go there' and 'Brücke bauen'. The concatenation of utterance-initial and utterance-final phrases from such utterances results in OI errors with subjects such as 'This go there' or 'Oma Brücke bauen' (Grandma bridge build-INF).
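The two operations described above, truncation to an utterance-final phrase and concatenation of an utterance-initial with an utterance-final phrase, can be sketched as simple string operations. The function names and the fixed word counts are hypothetical and stand in for what MOSAIC learns gradually from its input; this is an illustration of the outcome, not of the learning mechanism.

```python
def final_phrase(utterance: str, n_final: int) -> str:
    """Keep only the last n_final words (n_final >= 1), mimicking an utterance-final bias."""
    words = utterance.split()
    return " ".join(words[-n_final:])

def initial_plus_final(utterance: str, n_initial: int, n_final: int) -> str:
    """Join the first n_initial words to the last n_final words, dropping the middle."""
    words = utterance.split()
    if n_initial + n_final >= len(words):
        return " ".join(words)  # nothing left to drop
    return " ".join(words[:n_initial] + words[-n_final:])

english = "This could go there"
german = "Oma kann die Brücke bauen"

print(final_phrase(english, 2))           # go there
print(final_phrase(german, 2))            # Brücke bauen
print(initial_plus_final(english, 1, 2))  # This go there
print(initial_plus_final(german, 1, 2))   # Oma Brücke bauen
```

In both languages the finite modal sits in the dropped middle portion, which is why the remaining string surfaces as an OI error.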
MOSAIC simulates the developmental patterning of OI errors because it learns to produce progressively longer utterances as a function of the amount of input to which it has been exposed. Children produce OI errors at high rates early in development and produce fewer OI errors as the length of their utterances increases. MOSAIC simulates this pattern because of the way that compound finites pattern in the relevant languages. In compound finites, the finite modal or other auxiliary precedes the infinitive. Since MOSAIC produces increasingly long utterance-final phrases, the short phrases it produces early on are likely to contain only non-finite verb forms. As the phrases MOSAIC produces become longer, finite modals and other auxiliaries start to appear, and OI errors are gradually replaced by the compound finites from which they have been learnt.
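A toy calculation shows why longer utterance-final phrases contain fewer OI errors. The sketch below assumes a hand-tagged input (FIN for finite verbs, INF for infinitives, "-" for everything else) and uses a fixed phrase length as a crude stand-in for developmental stage; both the tagging scheme and the fixed-length cut-off are simplifying assumptions, not features of MOSAIC itself.

```python
# Hypothetical hand-tagged input: three compound finites and one simple finite.
TOY_INPUT = [
    [("this", "-"), ("could", "FIN"), ("go", "INF"), ("there", "-")],
    [("he", "-"), ("can", "FIN"), ("eat", "INF"), ("it", "-")],
    [("she", "-"), ("will", "FIN"), ("build", "INF"), ("a", "-"), ("bridge", "-")],
    [("dolly", "-"), ("eats", "FIN"), ("biscuits", "-")],
]

def oi_rate(utterances, length):
    """Share of verb-containing final phrases of the given length with no finite verb."""
    with_verb = 0
    non_finite_only = 0
    for utterance in utterances:
        tags = [tag for _, tag in utterance[-length:]]
        if "FIN" not in tags and "INF" not in tags:
            continue  # phrase contains no verb at all, so it is not counted
        with_verb += 1
        if "FIN" not in tags:
            non_finite_only += 1
    return non_finite_only / with_verb if with_verb else 0.0

for length in (2, 3, 4):
    print(length, round(oi_rate(TOY_INPUT, length), 2))
```

With this toy input the rate falls as the phrase length grows from 2 to 4 words, because longer phrases reach back far enough to include the finite auxiliary that precedes the infinitive.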
It is worth emphasising at this point that MOSAIC is a relatively simple distributional analyzer, with no direct access to semantic, pragmatic or phonological information, and is clearly not powerful enough to acquire many aspects of adult morpho-syntax. MOSAIC therefore does not provide a complete model of the language acquisition process or even of children's early morpho-syntactic development. For example, MOSAIC does not have access to information about the morphological structure of different verb forms and is hence unable to build the kind of productive knowledge of verb-marking that would allow it to inflect novel verbs. However, because of its ability to produce child-like utterances across a range of different languages, MOSAIC does provide a powerful means of testing hypotheses about the relation between cross-linguistic variation in children's early language and cross-linguistic differences in the language to which they are exposed. This makes it a very useful model for exploring the sources of children's verb-marking errors.

Explaining differences in the rate of OI errors across languages
The first challenge facing any unified model of the OI stage is to explain differences in the rate of OI errors across languages. In an early paper, Freudenthal, Pine, Aguado-Orea, and Gobet (2007) showed that MOSAIC was able to simulate variation in the developmental patterning of OI errors across three languages (Dutch, German and Spanish) as well as the developmental patterning of OI errors with 3sg subjects in English. The model was thus able to explain the apparent qualitative difference between OI languages like Dutch and German, in which OI errors are very common during the early stages, and a non-OI language like Spanish, in which OI errors are relatively rare. However, it was also able to explain the more subtle difference between Dutch, in which OI errors occur at particularly high rates, and German, in which OI errors are somewhat less frequent.
The way MOSAIC simulates these differences can be understood by considering the data presented in Table 1. This table provides details of the rate at which the Dutch child Matthijs, the German child Leo and the Spanish child Juan produced OI errors early in development; the rate at which compound-finite constructions occurred in each child's input; and the rate at which non-finite versus finite verb forms occurred in utterance-final position in the input.
It can be seen from these data that, although compound-finite constructions occur at relatively low rates in Dutch, German and Spanish (34%, 29% and 25%, respectively), the rate at which non-finite forms occur in utterance-final position is very different in the three languages (90% in Dutch, 66% in German and 26% in Spanish), and very similar to the rate at which the Dutch, German and Spanish children produce OI errors during the early stages (91%, 61%, and 23%). MOSAIC simulates these differences in rates of OI errors because the model's utterance-final bias interacts with the way that compound finites pattern in the input language to result in high rates of OI errors in Dutch and German (where non-finite forms are tied to utterance-final position) and low rates of OI errors in Spanish (where non-finite forms can also occur earlier in the sentence). This suggests that the apparently qualitative difference in the rate of OI errors between Dutch and German on the one hand and Spanish on the other can be understood in terms of the interaction between an utterance-final bias in learning and differences in the rate at which non-finite forms occur in utterance-final position in the input language.
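The input measure at the heart of this argument, the share of utterance-final verbs that are non-finite, is straightforward to compute from a tagged corpus. The sketch below assumes one particular operationalisation (only utterances whose last word is a verb are counted) and uses a handful of invented, hand-tagged utterances; neither the operationalisation nor the data should be read as those of the original study.

```python
from typing import List, Optional, Tuple

Utterance = List[Tuple[str, str]]  # (word, tag) with tag in {"FIN", "INF", "-"}

def final_nonfinite_share(utterances: List[Utterance]) -> Optional[float]:
    """Among utterances ending in a verb, the share in which that verb is non-finite."""
    final_tags = [
        utterance[-1][1]
        for utterance in utterances
        if utterance and utterance[-1][1] in ("FIN", "INF")
    ]
    if not final_tags:
        return None
    return final_tags.count("INF") / len(final_tags)

# Invented examples: Dutch infinitives are clause-final in compound finites,
# whereas Spanish infinitives can be followed by other material.
dutch_toy = [
    [("ik", "-"), ("wil", "FIN"), ("een", "-"), ("boek", "-"), ("lezen", "INF")],
    [("papa", "-"), ("gaat", "FIN"), ("fietsen", "INF")],
    [("wil", "FIN"), ("je", "-"), ("spelen", "INF")],
    [("hij", "-"), ("slaapt", "FIN")],
]
spanish_toy = [
    [("quiero", "FIN"), ("comer", "INF"), ("pan", "-")],
    [("Juan", "-"), ("come", "FIN")],
    [("puede", "FIN"), ("venir", "INF")],
    [("mamá", "-"), ("duerme", "FIN")],
]

print(final_nonfinite_share(dutch_toy))
print(final_nonfinite_share(spanish_toy))
```

Under an utterance-final learning bias, a higher value of this measure means that the verb forms a learner encounters first are more often infinitives, which is the relation the figures in Table 1 illustrate.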
Note that the smaller difference in the rate of OI errors in Dutch and German also appears to reflect a difference in the percentage of non-finite versus finite forms in utterance-final position, and explains why MOSAIC is also able to simulate this difference. However, since Dutch and German are structurally very similar, and the data in Table 1 are based on only one Dutch- and one German-speaking child, it is unclear whether this pattern generalises beyond these two children. Freudenthal et al. addressed this question directly by examining the rate of OI errors in a larger number of children (7 German and 6 Dutch children at MLU = 1.5), and the rate of non-finite verbs in utterance-final position in these children's input. Their results revealed a significantly higher rate of OI errors in Dutch than in German (Mean = 77% v Mean = 60%), a significantly higher rate of non-finite verbs in utterance-final position in Dutch than in German (Mean = 85% v Mean = 77%), and a significant rank order correlation (Rs = .70, N = 13, p < .01) between these measures across the two languages. The implication is that Dutch-speaking children do tend to produce OI errors at higher rates than German children during the early stages, and that this difference is related to differences in the rate at which they hear non-finite verbs in utterance-final position in the input.
Freudenthal et al. also identified a key factor that was responsible for differences in the rate at which non-finite verbs occurred in utterance-final position in Dutch and German. This was the tendency of Dutch caregivers to use a 'go + infinitive' construction to describe future events where German caregivers tended to use a simple present tense form together with a temporal adverb. Note that the Dutch 'go + infinitive' construction is precisely the kind of compound-finite construction from which children would be expected to learn OI errors under an input-driven account. These results therefore provide particularly compelling evidence both for the idea that OI errors are learned from compound-finite structures in the input and for the idea that quantitative differences in the rate of OI errors reflect the interaction between an utterance-final bias in learning and the way in which compound-finite structures pattern in the input language.
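For readers unfamiliar with the rank order correlation used to relate children's OI rates to the rate of utterance-final non-finite verbs in their input, the statistic can be computed as a Pearson correlation over ranks. The sketch below is a plain-Python illustration with tied values given their average rank; the numbers passed to it are invented and are not the data from the study.

```python
def average_ranks(values):
    """Rank values from 1 upwards, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman rank order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Invented per-child values: % OI errors and % non-finite utterance-final verbs.
child_oi = [80, 75, 62, 58, 70]
input_nonfinite = [88, 84, 79, 74, 76]
print(round(spearman(child_oi, input_nonfinite), 2))
```

Because the statistic depends only on rank order, it is suited to small samples such as the 13 children in the analysis, where percentage scores need not be normally distributed.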

Explaining variation in the semantic conditioning of OI errors
A second challenge facing any unified model of the OI stage is to explain variation in the semantic conditioning of OI errors. Freudenthal, Pine, and Gobet (2009) investigated whether the kind of input-driven account implemented in MOSAIC could explain the Modal Reference Effect and the Eventivity Constraint in Dutch and German, and the absence or reduced size of these effects in English. This was done simply by marking the infinitives that occurred in modal structures in the model's input, and then using this marking to determine whether the OI errors in the model's output had been learned from modal contexts, on the assumption that OI errors learned from modal contexts would have modal semantics.
The results showed that MOSAIC simulated both the tendency for Dutch and German OI errors to have modal semantics (i.e., to have been learned from modal contexts), and the significantly lower proportion of OI errors in English that have a modal reading. It also simulated the fact that the vast majority of OI errors in Dutch and German occurred with eventive verbs (with the model producing eventive OI errors at rates of over 90% in each case), and the significantly lower percentage of eventive OI errors in English (with the model producing eventive OI errors at the significantly lower rate of 76%). Moreover, in both cases, the model's ability to simulate these differences could be traced back to the fact that English contains a large number of compound-finite utterances in which an infinitive is combined with a finite form of auxiliary DO. Since auxiliary DO behaves like a modal auxiliary in English, but does not have modal semantics, compound finites with auxiliary DO in the input provide a large number of contexts from which the English-speaking child can extract OI errors with non-modal semantics. Moreover, since, unlike modal auxiliaries, auxiliary DO tends to combine with both eventive and stative verbs, compounds with auxiliary DO also provide contexts from which the English-speaking child can extract both eventive and stative OI errors. In short, the idea that OI errors are learned from compound-finite structures in the input not only provides a simple input-driven explanation of the Modal Reference Effect and the Eventivity Constraint in Dutch and German; it can also explain the absence or reduced size of these effects in English.
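The marking procedure can be pictured as tagging each compound finite in the input with the type of auxiliary it contains and letting the truncated string inherit that tag. The sketch below is a deliberately crude illustration: the word lists, the toy utterances and the rule of truncating immediately after the first auxiliary are all assumptions, and simple word matching will misclassify homographs (e.g., 'can' as a noun or 'do' as a main verb).

```python
# Hypothetical closed-class lists used only for this illustration.
MODALS = {"can", "could", "will", "would", "shall", "should", "may", "might", "must"}
DO_FORMS = {"do", "does", "did"}

def truncate_after_auxiliary(utterance):
    """Return (source, phrase) for the first auxiliary found, or None if there is none."""
    words = utterance.lower().split()
    for position, word in enumerate(words):
        if word in MODALS:
            source = "modal"
        elif word in DO_FORMS:
            source = "do"
        else:
            continue
        phrase = " ".join(words[position + 1:])
        return (source, phrase) if phrase else None
    return None

def modal_share(utterances):
    """Share of truncated phrases whose source context was a modal auxiliary."""
    results = [truncate_after_auxiliary(u) for u in utterances]
    sources = [r[0] for r in results if r is not None]
    return sources.count("modal") / len(sources) if sources else None

ENGLISH_TOY = ["he can go home", "does it fit", "did you see that", "she will play"]

for utterance in ENGLISH_TOY:
    print(utterance, "->", truncate_after_auxiliary(utterance))
print(modal_share(ENGLISH_TOY))
```

In this toy sample half of the truncated phrases come from DO contexts, and these include the stative verbs 'fit' and 'see', which is the pattern the account relies on to explain the weaker semantic conditioning in English.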

Explaining variation in the patterning of OI errors in declaratives and Wh-questions
A third challenge facing any unified model of the OI stage is to explain variation in the patterning of OI errors in declaratives and Wh-questions. In a more recent paper, Freudenthal, Pine, Jones, and Gobet (2015) showed that a modified version of MOSAIC, which incorporated both a strong utterance-final bias in learning and a weaker utterance-initial bias in learning, could simulate the pattern of OI errors in declaratives and Wh-questions across English, Dutch, German and Spanish. More specifically, they showed that a version of the model that learned to produce both utterance-final strings and concatenations of utterance-initial chunks and utterance-final strings was able to capture the cross-linguistic patterning of OI errors in declaratives by learning from declarative input, and the cross-linguistic patterning of OI errors in Wh-questions by learning from interrogative input. Interestingly, the reason MOSAIC was able to capture the pattern in Wh-questions is that there is a critical difference in the way that Wh-questions are formed in English compared to Spanish, Dutch and German, namely, that English does not allow subject main-verb inversion. This difference means that most Wh-questions in English contain a compound-finite verb and hence a context from which an interrogative OI error can be learned (e.g., What (does) dolly eat?), whereas most Wh-questions in Spanish, Dutch and German contain a simple finite main verb (e.g., Spanish: ¿Qué come Dolly? Dutch: Wat eet Dolly? German: Was isst Dolly?), and hence do not provide such a context. These results suggest that the idea that OI errors are learned from compound-finite forms in the input can not only explain the cross-linguistic patterning of OI errors in declaratives; it can also be extended to explain the cross-linguistic patterning of OI errors in Wh-questions.

Using MOSAIC to identify weaknesses in our current understanding
The papers reviewed so far show that the input-driven account implemented in MOSAIC provides a simple and elegant explanation both of cross-linguistic variation in the rate at which OI errors occur and of the way in which such errors interact with the semantic and syntactic properties of the language being learned. However, another important strength of MOSAIC is that it makes further quantitative predictions about the pattern of error across languages. As a result, discrepancies between these predictions and rates of OI errors can be used to identify weaknesses in our current understanding, and lead to further theory development. For example, Freudenthal, Pine, and Gobet (2010) compared MOSAIC with Legate and Yang's (2007) Variational Learning Model (VLM) and investigated how well the two models explained the rate and lexical patterning of OI errors at an MLU of approximately 2 in English, Dutch, German, French, and Spanish.
The VLM is a probabilistic parameter-setting model under which the child's grammar at any particular point in development consists of a population of innately derived hypotheses whose composition changes in response to linguistic information in the input. The child is assumed to entertain a number of possible parameter settings, each of which is associated with a particular probability. When a particular parameter setting is used to parse linguistic data, it is rewarded by utterances that are consistent with this setting and punished by utterances that are not. The child is assumed to converge on the correct grammar of the language by gradually abandoning hypotheses that are not consistent with the input data. However, the probabilistic nature of the parameter-setting process means that there will be a period during which the child continues to entertain two or more competing hypotheses (and hence continues to make errors), and that the length of this period will vary depending on the amount of evidence for the correct parameter setting that is available in the input language.
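The reward-and-punish dynamics described above can be made concrete with a small simulation of two competing grammars. The sketch assumes a linear reward-penalty update, a learning rate gamma, a starting probability of 0.5, and an input in which each utterance is overtly tense-marked with a fixed probability; it further assumes that the [+Tense] grammar parses every utterance, while the [-Tense] grammar fails only on tense-marked ones. These are illustrative assumptions, and the function names are hypothetical rather than taken from the VLM itself.

```python
import random

def update(p_plus, chose_plus, tense_marked, gamma):
    """Linear reward-penalty update of P([+Tense]) after parsing one utterance."""
    # Assumption: [+Tense] always parses; [-Tense] fails on tense-marked input.
    chosen_succeeds = chose_plus or not tense_marked
    # [+Tense] gains when it is rewarded or when [-Tense] is punished.
    plus_gains = chose_plus == chosen_succeeds
    if plus_gains:
        return p_plus + gamma * (1.0 - p_plus)
    return (1.0 - gamma) * p_plus

def simulate(tense_rate, steps=200, gamma=0.02, seed=0):
    """Return P([+Tense]) after a number of utterances with the given tense-marking rate."""
    rng = random.Random(seed)
    p_plus = 0.5
    for _ in range(steps):
        chose_plus = rng.random() < p_plus
        tense_marked = rng.random() < tense_rate
        p_plus = update(p_plus, chose_plus, tense_marked, gamma)
    return p_plus

def mean_final(tense_rate, runs=50):
    """Average the outcome over several seeded runs to smooth out sampling noise."""
    return sum(simulate(tense_rate, seed=s) for s in range(runs)) / runs

print(round(mean_final(0.9), 2), round(mean_final(0.1), 2))
```

Under these assumptions the expected change in P([+Tense]) per utterance is proportional to the tense-marking rate, so input that marks tense more often drives out the competing grammar sooner, which is the qualitative prediction the model makes about the length of the period of error.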
According to the VLM, OI errors reflect the fact that children learning tense-marking languages initially entertain the hypothesis that they are learning a non-tense-marking language like Mandarin Chinese. This hypothesis is gradually abandoned as a result of exposure to utterances in the input language in which tense is overtly marked, which reward the [+Tense] grammar and punish the [-Tense] grammar. The VLM therefore predicts that children learning a morphologically rich language such as Spanish will make few OI errors and emerge from the OI stage relatively early, because a large proportion of the utterances in their input reward the [+Tense] grammar, whereas children learning a morphologically impoverished language such as English will make more OI errors and emerge from the OI stage relatively late, because a large proportion of the utterances in their input are consistent with the [-Tense] grammar. Legate and Yang (2007) tested the VLM by deriving corpus-based measures of the extent to which English, French and Spanish input reward the [+Tense] grammar and showed that Spanish input rewards the [+Tense] grammar more than French input, and that French input rewards the [+Tense] grammar more than English input. Since rates of OI errors tend to be very high in English, relatively low in French and very low in Spanish, these results provide some support for the VLM. However, given that the model is intended to capture variation across the full range of tense-marking languages, evaluating its success on one language with very high rates of OI errors and two languages with relatively low rates of OI errors is a relatively weak test of its predictions.
Freudenthal et al. (2010) therefore conducted a stronger test of the VLM by testing its predictions on a wider range of languages (English, French and Spanish, but also Dutch and German), and comparing its ability to explain the cross-linguistic data on OI errors with that of MOSAIC. They also sought to differentiate more clearly between the two accounts by testing a prediction specific to the MOSAIC account, namely, that there would be a significant correlation between the extent to which particular verbs occurred as OI errors versus correct finite forms in children's speech and the extent to which those verbs occurred as infinitive versus finite forms in the input. Note that MOSAIC predicts such a correlation because it assumes that OI errors are infinitives learned directly from compound structures in the input. The VLM, on the other hand, does not predict such a correlation since it assumes that OI errors reflect an underlying difference in the child's grammar rather than differences in children's knowledge about particular verbs. Freudenthal et al.'s (2010) results provide support for MOSAIC's account of OI errors in the form of significant correlations between the rate at which children produce OI errors with specific verbs and the rate at which those verbs occur in compound-finite as opposed to simple-finite structures in child-directed speech in all five of the languages studied. This result has since been replicated for French and German by Laaha and Bassano (2013).
However, Freudenthal et al. also show that, although both MOSAIC and the VLM were good at explaining differences in the rate of OI errors in Dutch, German, French and Spanish, neither was able to explain the very high rate of OI errors in early child English. In the case of the VLM, this problem reflects the VLM's lack of sensitivity to lexical patterning in the data. Thus, the reason why the VLM is unable to explain the very high level of OI errors in early child English is that it does not discriminate between evidence for the [+Tense] grammar in the form of inflected copula and auxiliary verb forms and evidence for the [+Tense] grammar in the form of inflected lexical verbs. Although English lexical verbs provide very little evidence for the [+Tense] grammar, and considerably less evidence than Dutch lexical verbs, English copulas and auxiliaries actually provide a great deal of evidence for the [+Tense] grammar. This, together with the fact that copulas and auxiliaries make up a much greater proportion of the English child's input, means that the VLM actually predicts lower levels of OI errors in English than in Dutch. A more lexically oriented input-driven account could probably deal with this problem relatively easily by simply distinguishing between what the child is learning about copulas and auxiliaries and what the child is learning about lexical verbs, and predicting high levels of OI errors on lexical verbs and lower levels of OI errors on copulas and auxiliaries. The VLM, however, is not a lexically oriented account since it assumes that the child is not learning how to inflect particular lexical forms but learning to reject a particular grammar or parameter setting. The VLM is thus unable to explain why English-speaking children entertain the [-Tense] grammar for so long, when there is so much evidence for the [+Tense] grammar in the form of tensed and agreeing copulas and auxiliaries in the input.
MOSAIC's problems simulating the high level of OI errors in early child English reflect the fact that the proportion of utterance-final verbs that are non-finite in English is simply not high enough to explain the almost exclusive production of OI errors during the early stages. One obvious reason why this might be the case is that English differs from the other four languages in that for lexical verbs the infinitive is indistinguishable from the bare stem. Since the only present tense form that is not a bare stem in English is the third person singular, this means that a much higher proportion of lexical verb forms in the input are either infinitives or forms that are indistinguishable from the infinitive. This fact is likely to slow down the process of paradigm building in English and result in defaulting effects where the child produces a bare stem/infinitive in the absence of knowledge of the relevant 3sg or past tense form. Since MOSAIC is insensitive to the morphological structure of the verbs that it encodes, it is clearly unable to simulate this kind of defaulting effect, and hence predicts fewer OI errors than children actually produce.
In response to these findings, Freudenthal et al. (2010) argue for an extended model of verb-marking error in which some errors reflect the use of infinitives learned from compound-finite structures in the input and others reflect a process of defaulting to the most frequent form of the verb. Räsänen, Ambridge, and Pine (2014) tested this hypothesis by correlating the tendency to produce OI errors involving specific verbs in an elicited production task with the relative frequency with which those verbs occurred as bare stems or 3sg forms in English child-directed speech. Their results revealed a significant relation between these two variables, suggesting that some OI errors in English do indeed reflect a process of defaulting to the bare stem (see Blom, 2007, for a UG account that also assumes that some apparent OI errors in English are finite bare stems). Räsänen et al.'s (2014) results have also been replicated by Kueser, Leonard, and Deevy (2018) in a group of English-speaking children with DLD and a group of language-matched controls, suggesting that defaulting to the most frequent form of the verb is also a factor in explaining the pattern of verb-marking errors in children with DLD.
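The verb-level analysis described above can be illustrated with a minimal sketch. It assumes that, for each verb, we have counts of bare-stem and 3sg tokens in child-directed speech and an observed OI error rate from an elicitation task. All counts and rates below are invented for illustration, the function names are hypothetical, and a plain Pearson correlation stands in for whatever statistical model the original studies used.

```python
from math import sqrt

def bare_stem_proportion(bare_count, third_sg_count):
    """Proportion of a verb's bare + 3sg input tokens that are bare stems."""
    return bare_count / (bare_count + third_sg_count)

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical input counts per verb: (bare-stem tokens, 3sg tokens).
input_counts = {"go": (80, 20), "want": (60, 40), "fit": (20, 80), "live": (30, 70)}
# Hypothetical proportion of 3sg contexts in which children produced a bare form.
oi_error_rate = {"go": 0.7, "want": 0.5, "fit": 0.2, "live": 0.3}

verbs = sorted(input_counts)
xs = [bare_stem_proportion(*input_counts[v]) for v in verbs]
ys = [oi_error_rate[v] for v in verbs]
r = pearson_r(xs, ys)
print(f"r = {r:.2f}")
```

A positive value of r in data of this kind is what the defaulting account predicts: verbs that occur mostly as bare stems in the input should attract more bare-stem errors in 3sg contexts.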
These studies show how a computational model that makes quantitative predictions about the developmental data can be used to identify weaknesses in our current understanding and thereby lead to further theory development. They also show how this kind of approach can generate hypotheses which, when tested in subsequent research on typically developing children and children with DLD, shed further light on the sources of verb-marking errors in both these populations.
MOSAIC+: An extended model of the cross-linguistic pattern of verb-marking error in TD children and children with DLD
In a recent paper, Freudenthal, Gobet, and Pine (submitted) tested the extended model of the OI stage suggested by Freudenthal et al. (2010) by supplementing MOSAIC's basic learning mechanism with a mechanism that defaults to the most frequent form of the verb (i.e., substitutes the most frequent form of the verb for low frequency forms of the verb in the model's output) when the relative frequency of that form in the input is above a certain threshold. They then investigated the extent to which this new version of the model (MOSAIC+) provided both a better fit than the previous version of the model to the cross-linguistic data, particularly the very high rate of OI errors in early child English, and a means of simulating the cross-linguistic pattern of verb-marking deficit in children with DLD. The latter was done by manipulating the defaulting threshold in the model to increase the rates of defaulting in the DLD models.
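The defaulting mechanism described above can be sketched in a few lines. This is an illustration under stated assumptions, not the implementation used in the model: the function names, the counts and the threshold values are all hypothetical, and the sketch replaces any non-default form rather than only forms below some frequency cut-off. In this sketch a lower threshold yields more defaulting, which is one way the "DLD" setting could be expressed.

```python
from collections import Counter

def most_frequent_form(form_counts):
    """Return the most frequent form of a verb and its relative frequency."""
    form, count = form_counts.most_common(1)[0]
    return form, count / sum(form_counts.values())

def apply_defaulting(target_form, form_counts, threshold):
    """Replace target_form with the verb's most frequent form when that
    form's relative input frequency exceeds the threshold (hypothetical rule)."""
    default_form, rel_freq = most_frequent_form(form_counts)
    if target_form != default_form and rel_freq > threshold:
        return default_form
    return target_form

# Hypothetical input counts for the forms of one English verb.
go_counts = Counter({"go": 90, "goes": 10})

# Assumed thresholds: a stricter one for the TD setting, a laxer one for DLD.
td_output = apply_defaulting("goes", go_counts, threshold=0.95)
dld_output = apply_defaulting("goes", go_counts, threshold=0.80)
print(td_output, dld_output)
```

Because the most frequent form of an English lexical verb is typically the bare stem, substitutions of this kind surface as apparent OI errors, whereas in a language whose most frequent form is finite the same rule would produce a finite form in the wrong context.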
Freudenthal et al.'s simulations showed that this new version of MOSAIC could simulate both the very high rates of OI error in early child English and the fact that English-speaking children with DLD tend to show significantly higher rates of OI errors at high MLUs than MLU-matched controls, whereas Dutch- and German-speaking children do not, tending instead to show elevated, though still relatively low, rates of agreement and positioning errors. The model captured these patterns because, in English, defaulting tended to result in bare stem errors that are indistinguishable from OI errors, whereas in Dutch and German it tended to result in the substitution of high-frequency finite forms into lower frequency finite and non-finite contexts, resulting in agreement and positioning errors. The implication is that the cross-linguistic pattern of verb-marking error in both TD children and children with DLD can be understood in terms of a model in which some errors reflect the learning of infinitives from compound structures in the input and others reflect a process of defaulting to the most frequent form of the verb.
These findings show how a unified model of the cross-linguistic pattern of verb-marking error can provide insights into the sources of verb-marking error in both TD and language-impaired children. However, since they are based on simulations in which the defaulting parameter was manipulated directly, they also leave open the question of precisely what underlies the increased level of defaulting in children with DLD. One possibility that maps more or less directly on to the way defaulting is implemented in the current version of the model is that greater defaulting in DLD reflects a deficit in the ability to inhibit competition from higher frequency forms (see McMurray, Klein-Packard, & Tomblin, 2019, for an explanation of lexical deficits in DLD in terms of reduced lexical inhibition). A second possibility is that greater defaulting reflects a deficit in word learning which has knock-on effects on paradigm building. According to this view, greater defaulting reflects an underlying deficit in the ability to learn low frequency forms and hence more productive morphological patterns, which leaves the child with DLD more susceptible to competition from high frequency forms (see Harmon, Barak, Shafto, Edwards, & Feldman, 2022, for an account of deficits in past tense marking along these lines). A third possibility is that greater defaulting reflects a deficit in the ability to process long-distance dependencies that differentiate between contexts that require lower and higher frequency forms. According to this view, children with DLD use the most frequent form of the verb because they have yet to distinguish between contexts that require a lower frequency form of the verb (e.g., Baby sleeps in there) and contexts that require a higher frequency form of the verb (e.g., Does Baby sleep in there? or We let Baby sleep in there). This leaves children with DLD susceptible to competition from higher frequency forms of the verb for longer than TD children (see Leonard, Fey, Deevy, & Bredin-Oja, 2014, for a more detailed description of this Competing Sources of Input account).
In a recent paper, Freudenthal, Ramscar, Leonard, and Pine (2021) tested this third type of account using a sequential learning model based on Gureckis and Love's (2010) adaptation of Rescorla and Wagner's (1972) error-driven model of associative learning. Their results showed that, in line with the cross-linguistic data, the model was slower to learn to reliably mark verbs in 3sg contexts in English than it was to learn 1sg, 2sg and 3sg marking in Spanish. This reflected the fact that in English it had to learn to ignore bare forms in 3sg questions such as Does he go there? and Where does he go? on the basis of information in the preceding context, whereas in Spanish it did not, since Spanish verbs are marked in the same way in both declaratives and questions. Their results also showed that reducing the model's sensitivity to information in the preceding context had a much larger effect on its ability to learn 3sg marking in English than on its ability to learn 1sg, 2sg and 3sg marking in Spanish. The model was therefore able to simulate the greater deficit in verb-marking shown by children with DLD learning English than by children with DLD learning Spanish. These results provide further support for the view that the cross-linguistic pattern of deficit in verb-marking error in DLD can be understood in terms of the interaction between psychologically plausible constraints on learning (in this case reduced sensitivity to information earlier in the sentence) and the semantic-distributional properties of the language being learned.
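To make the logic of error-driven learning concrete, the following is a toy sketch of the plain Rescorla-Wagner update rule, not the sequential model used in the study cited above. Cues are words in the preceding context, outcomes are verb forms, and each cue's weight changes in proportion to the prediction error. The two training events, the learning rate, the number of epochs and the per-cue salience parameter (used here as a stand-in for reduced sensitivity to earlier context) are all assumptions made for illustration.

```python
from collections import defaultdict

def rw_update(weights, cues, outcomes, present_outcome, rate, salience):
    """One Rescorla-Wagner step: every present cue moves by rate * salience * error."""
    for outcome in outcomes:
        target = 1.0 if outcome == present_outcome else 0.0
        prediction = sum(weights[(cue, outcome)] for cue in cues)
        error = target - prediction
        for cue in cues:
            weights[(cue, outcome)] += rate * salience.get(cue, 1.0) * error

def train(events, outcomes, epochs=50, rate=0.1, salience=None):
    """Train on the events repeatedly and return the cue-outcome weights."""
    weights = defaultdict(float)
    salience = salience or {}
    for _ in range(epochs):
        for cues, present_outcome in events:
            rw_update(weights, cues, outcomes, present_outcome, rate, salience)
    return weights

def support(weights, cues, outcome):
    """Summed associative support for an outcome given a set of cues."""
    return sum(weights[(cue, outcome)] for cue in cues)

# Toy English-like events (hypothetical): the bare form follows "does he",
# while the 3sg form follows "he" alone.
events = [(("he",), "goes"), (("does", "he"), "go")]
outcomes = ("go", "goes")

full = train(events, outcomes)
# Assumed manipulation: the earlier cue "does" is learned about more weakly.
reduced = train(events, outcomes, salience={"does": 0.2})

print(support(full, ("he",), "go"), support(reduced, ("he",), "go"))
```

In this toy setting the cue "he" initially gains support for the bare form from questions and loses it only as "does" takes over the credit. When learning about "does" is weakened, "he" retains more support for the bare form in declarative contexts, which mirrors the qualitative pattern described in the text. A language in which the verb form does not depend on an earlier cue would not create this competition.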

MOSAIC v UG approaches to the OI phenomenon
When taken as a whole, the work described above illustrates how conceptualising the developmental data as the product of an interaction between psychologically plausible constraints on learning and the semantic-distributional properties of the language to which children are exposed can provide important insights into the sources of verb-marking errors in language learning children. It also shows how using the same computational model to simulate data across children learning different languages can be a powerful way of developing unified accounts of phenomena in child language acquisition. Of course, as noted earlier, MOSAIC is a simple distributional analyser which is not powerful enough to acquire many aspects of adult morpho-syntax. Nevertheless, it is worth asking why MOSAIC has been so successful in explaining the cross-linguistic data on OI errors, particularly in comparison to many UG models of the OI stage.
One reason is that MOSAIC learns slowly and probabilistically and is therefore sensitive to the amount of information about verb-marking that is available in different languages. This allows it to simulate quantitative variation in rates of verb-marking error in a way that early UG models were unable to do, though this problem has been at least partially solved by the advent of UG + statistics models, which assume that children set linguistic parameters gradually in response to the amount of supporting evidence in the input (see Pearl, 2021, for a review). A second reason is that MOSAIC incorporates psychologically plausible constraints on learning. This allows the model to simulate effects which reflect the child's differential sensitivity to information from different parts of the utterance, rather than making the unrealistic assumption that the child is equally sensitive to information at each point in the utterance. More recent UG + statistics models may be able to explain at least some of these effects by linking morpho-syntactic development to parser development (see Omaki and Lidz, 2015, for a review). However, a third reason is that it builds representations that reflect the semantic-distributional properties of the language being learned by staying close to the surface properties and lexical patterning of the input data. It is this feature of the model that allows it to simulate variation in the way that OI errors interact with the semantic and syntactic properties of the language being learned, and that leads it to make predictions, which have since been confirmed, about the relation between children's tendency to make OI errors with specific verbs and the rate at which those verbs occur as infinitives in compound structures in the input. It is difficult to see how current UG + statistics models can explain these effects since, by parsing the input in terms of high-level categories and features, they abstract away from the contexts in which infinitives and other non-finite forms occur, and hence away from the semantic-distributional properties of the input that appear to condition the pattern of OI errors in children's speech.

Future directions
Given MOSAIC's success in simulating the cross-linguistic data, it is also worth asking to what extent it might be possible to develop MOSAIC into a more comprehensive cross-linguistic model of morpho-syntactic development. That is, to what extent is it possible to add mechanisms to MOSAIC to allow the model to simulate a wider range of morpho-syntactic phenomena in a wider range of languages?
In answering this question, it is important to recognise that one of the great strengths of MOSAIC is its relative simplicity compared to other computational models of language learning (e.g., Connectionist models of morphological development, Deep Learning models of the acquisition of long-distance agreement dependencies). For example, MOSAIC currently learns from orthographically transcribed child-directed speech segmented into words. This simplifying assumption is clearly unrealistic. However, it means that the model can accept as input corpora of child-directed speech in a range of different languages, and hence model cross-linguistic variation in children's speech in terms of the interaction between exactly the same learning model and the semantic-distributional properties of the language to which they are exposed. This makes it relatively easy to understand how MOSAIC is able to capture effects in child language data, and hence allows us to use the model to test specific hypotheses about the relation between cross-linguistic variation in children's speech and cross-linguistic variation in the input. However, it also precludes using the model to study the acquisition of non-isolating (polysynthetic) languages (such as K'iche' or Inuktitut), or to simulate effects below the level of the word (such as phonological effects on morphological learning or paradigm building).
One way in which MOSAIC could be developed in the future is by changing the input representation from strings of words to strings of phonemes, syllables or morphemes. This would open up the possibility of simulating a wider range of phenomena in a wider range of languages. For example, Freudenthal, Pine and Gobet (2011) used a syllabified version of MOSAIC to simulate the patterning of bare root errors in K'iche'. Their simulations showed that children's production of bare verb roots, which are ungrammatical and hence extremely rare in the input language, could be understood in terms of the omission of verb prefixes from non-suffixed verb forms, as a result of an utterance-final bias in learning similar to that used to simulate OI errors in isolating languages. Note, however, that this syllabified version of MOSAIC is still very much a string-learning model, and hence leaves unanswered the questions of how the child segments the input into morphemes, identifies the function of those morphemes, and builds adult-like knowledge of inflectional paradigms.
Another approach is to use the insights gained from MOSAIC to develop more comprehensive models using other more powerful modelling approaches. For example, Freudenthal, Pine, and Bannard (submitted) are currently using a deep learning LSTM model to simulate the acquisition of English present tense verb-marking in typically developing children and children with DLD. This kind of model has the advantage that it is sensitive to both the semantic and the distributional properties of the input language; can learn long-distance dependencies; and can accept input segmented in a variety of different ways (e.g., segmented into words, morphemes, or phonemes). Phasing the input to such a model in a way that was consistent with MOSAIC's edge-based biases in learning would therefore result in a model with the potential to simulate a wider range of phenomena in a wider range of languages than the current version of MOSAIC, though it is worth noting that it is likely to be considerably more difficult to interrogate this kind of model to understand how it is capturing effects in the child language data.

Conclusion
To conclude, when taken as a whole, the modelling work described above illustrates how conceptualising the developmental data as the product of an interaction between psychologically plausible constraints on learning and the semantic-distributional properties of the language to which children are exposed can provide important insights into the sources of verb-marking errors in language learning children. This includes both the sources of verb-marking error in typically developing children and the sources of verb-marking error in children with DLD. It also shows how using the same model to simulate data across children learning different languages can be a powerful way of developing a unified constructivist model of a particular set of phenomena that generates predictions that can be tested in subsequent research. Future work could use this approach to simulate a wider range of phenomena in a wider range of languages, either by adding mechanisms to MOSAIC or by using the insights gained from MOSAIC to inform other modelling approaches.

Table 1. Early rates of OI errors in Dutch, German and Spanish and the rate of compound finites in the input and non-finite verbs in utterance-final position