Mind the gap: Learning the surface forms of movement dependencies

Published online by Cambridge University Press:  08 April 2026

Laurel Perkins*
Affiliation:
Department of Linguistics, University of California Los Angeles, United States
Naomi H. Feldman
Affiliations:
Department of Linguistics, University of Maryland College Park, United States; Institute for Advanced Computer Studies, University of Maryland College Park, United States
Jeffrey Lidz
Affiliation:
Department of Linguistics, University of Maryland College Park, United States
*Corresponding author: Laurel Perkins; Email: perkinsl@ucla.edu

Abstract

In acquiring a syntax, children must detect evidence for abstract structural dependencies that can be realized in variable ways in the surface forms of sentences. In ‘What did David fix?’, learners must identify a nonlocal relation between a fronted object of the verb (‘what’) and the phonologically null ‘gap’ in canonical direct object position after the verb, where it is thematically interpreted. How do learners identify a nonadjacent dependency between an expression and something that has no overt phonological form? We propose that identifying abstract syntactic dependencies requires statistical inference over both overt linguistic material and unsatisfied grammatical expectations: noticing when a predicted argument for a verb is unexpectedly missing may serve as evidence for the gap of an argument movement dependency. We provide computational support for this hypothesis. We develop a learner that uses predicted but unexpectedly missing objects of verbs to identify possible gaps of object movement, and identifies which surface morphosyntactic properties of sentences are correlated with these possible movement gaps. We find that it is in principle possible for a learner using this mechanism to identify the majority of sentences with object movement in child-directed English, and that prior knowledge of which verbs require objects provides an important guide for identifying which surface distributions characterize object movement. This provides a computational account for why verb argument-structure knowledge developmentally precedes the acquisition of movement in a language like English. More broadly, these findings illustrate how statistical learning and learning from violated expectations can be combined to novel effect in the domain of language acquisition.
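
The core intuition in the abstract, that a missing object after a verb known to require one can signal a possible movement gap, can be illustrated with a deliberately simplified sketch. This is not the learner described in the article, which performs statistical inference over many sentences and features; it is a deterministic toy under stated assumptions. The verb list, the sentence records, and the function name `flag_possible_object_gaps` are all hypothetical.

```python
# Toy illustration only: flag sentences where a verb assumed to require
# a direct object appears without one. The verb set and sentence records
# below are invented for this sketch and are not taken from the article.

# Hypothetical prior knowledge: verbs the learner treats as obligatorily transitive.
ASSUMED_TRANSITIVE_VERBS = {"fix", "bring"}


def flag_possible_object_gaps(sentences, transitive_verbs):
    """Return the sentences whose verb is expected to take a direct object
    but where no direct object was observed after the verb."""
    return [
        s for s in sentences
        if s["verb"] in transitive_verbs and not s["has_direct_object"]
    ]


sentences = [
    {"text": "What did David fix?", "verb": "fix", "has_direct_object": False},
    {"text": "David fixed the car.", "verb": "fix", "has_direct_object": True},
    {"text": "David slept.", "verb": "sleep", "has_direct_object": False},
]

flagged = flag_possible_object_gaps(sentences, ASSUMED_TRANSITIVE_VERBS)
flagged_texts = [s["text"] for s in flagged]
print(flagged_texts)
```

In this toy, only the question is flagged: the declarative has an overt object, and the intransitive verb creates no expectation of one. A probabilistic learner would instead treat such violations as noisy evidence and combine them with other surface features of the sentence.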

Information

Type
General Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2026. Published by Cambridge University Press on behalf of the Linguistic Society of America
Figure 1. Graphical model for Joint Inference Learner. Nodes correspond to random variables: the observed direct objects X and other features F in each sentence, the transitivity category T and rate of direct objects θ for each verb, the latent ‘category’ c of each sentence, the rate of direct objects δ(X) and other sentence features δ(F) produced by each category, and whether each category produces a transitivity violation e. Arrows denote conditioning relationships between variables.

Table 1. Corpora of child-directed speech.

Table 2. Known verbs and transitivity categories assumed by learner.

Table 3. Direct objects and morphosyntactic features observed by learner (X and F). The presence of a direct object is the sole feature encoded by X. The remaining twenty-one features are encoded within the feature vector $\overrightarrow{F}$.

Table 4. Distribution of underlying clause types in data set.

Figure 2. Proportions of clause types in inferred sentence categories, joint inference model.

Figure 3. Accuracy on identifying sentences with object movement in three metrics: precision (proportion of model’s object-gap categories that contain object movement), recall (proportion of object movement in corpus identified by model), and F1 (harmonic mean of precision and recall).
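
The three metrics named in the Figure 3 caption can be computed as in the minimal sketch below. It assumes each sentence carries a boolean model decision (assigned to an object-gap category or not) and a boolean gold label (contains object movement or not); the function name and the example labels are hypothetical and do not reproduce the article's data.

```python
def precision_recall_f1(predicted, gold):
    """Compute precision, recall, and F1 from parallel lists of booleans.

    predicted[i]: the model placed sentence i in an object-gap category.
    gold[i]:      sentence i actually contains object movement.
    """
    true_pos = sum(1 for p, g in zip(predicted, gold) if p and g)
    false_pos = sum(1 for p, g in zip(predicted, gold) if p and not g)
    false_neg = sum(1 for p, g in zip(predicted, gold) if not p and g)

    # Guard against empty denominators by returning 0.0 in that case.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


# Invented example: one hit, one false alarm, one miss, one correct rejection.
predicted = [True, True, False, False]
gold = [True, False, True, False]
precision, recall, f1 = precision_recall_f1(predicted, gold)
print(precision, recall, f1)
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two values, so a model cannot score well by flagging every sentence (high recall, low precision) or almost none (high precision, low recall).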

Figure 4. Distribution of movement types in model’s object-gap categories.

Table 5. Proportion of object-movement sentences identified, by verb type.

Table 6. Features with significantly higher odds in object-gap categories.

Figure 5. Graphical models for (a) no-category baseline and (b) no-transitivity baseline.

Figure 6. Proportions of clause types in sentence categories, no-transitivity baseline.

Figure 7. Distribution of movement types in object-gap categories, no-transitivity baseline.

Table A1. Accuracy of sentence feature and clause-type coding.

Table A2. Odds ratios for direct objects within transitivity-violating categories, joint inference model.

Table A3. Odds ratios for direct objects within sentence categories, no-transitivity baseline.

Table A4. Odds ratios for features F within object-gap categories, joint inference model.