A quantitative model of verb–object order in Middle English with special reference to the prose–poetry distinction

This article reports a regression model for the change from OV to VO in Middle English. It focuses on genre (prose versus poetry) as a predictor by including data from a recently published corpus, the Parsed Corpus of Middle English Poetry (PCMEP; Zimmermann 2015). Other independent variables considered are time, object type, clause type and weight. The results specify the time course of the development in Middle English with great precision and replicate several effects from the previous literature including the importance of the genre variable, poetry being considerably more conservative than prose. It is recommended that poetry texts should be considered in studies on early Middle English syntax more generally to arrive at comprehensive assessments of linguistic changes at the time.


Introduction
Old and Middle English witness a switch in verb phrase headedness from a final to an initial configuration. The phrase structure trees in (1) sketch the resulting change from object (O) verb (V) to verb-object order. Despite the fact that the development has received great attention in the literature (e.g. Canale 1978 and much subsequent work), researchers have only recently begun to model […]
[…] (2021), it includes c. 216,000 words in 50 poems and continues to grow at a rate of about 20,000 word tokens per year. It is parsed according to the same guidelines as its much larger sister corpus, the Penn-Helsinki Parsed Corpus of Middle English (PPCME2; Kroch & Taylor 2000a). Researchers familiar with the PPCME2 thus do not have to learn a new annotation scheme but can use their search queries directly on both corpora.
As an illustration, consider the corpus annotation for sentence (2a) shown in (2b). Note in particular the OV structure relevant to this paper in the last line. It uses the label for direct object, NP-OB1, sandwiched between a modal, MD, and a non-finite main verb, VB.
The PCMEP holds great potential for researchers working on ME syntax. The addition of its texts can (i) reduce the risk of erroneous generalisations based on a small number of texts with strong idiosyncrasies, (ii) give a fuller picture of a point of variation, for instance with respect to genre or dialect, and (iii) provide more realistic quantitative estimates for factors influencing a linguistic development. It is therefore advisable that ME poetry should be used to supplement the prose not just for the variation in OV/VO order discussed here, but for studies on ME syntax more generally.

Data collection and coding
Following the reasoning in Kroch & Taylor (2000b), studies on the variation between OV/VO should only consider cases in which a direct object co-occurs with a non-finite main verb following a finite auxiliary or modal. 4 I therefore collected all such clauses from the PPCME2 and PCMEP. 5
Figure 1. Temporal distribution and size of poetry and prose texts (from Zimmermann 2020: 11)
4 In essence, clauses with a finite main verb and an object are not diagnostic of the headedness of the verb phrase because early English places finite verbs in a projection above VP as evidenced by facts from inversion and adverb placement. Clauses with final finite verbs are ignored because they are very rare and do not allow VO order. For an alternative envelope of variation containing OV-auxiliary versus V-auxiliary-O clauses, see e.g. Taylor & Pintzuk (2011).
5 Predicates of non-finite copulas are also included as direct objects in this study because the parsing scheme of the corpora requires the same label for both.
(IacoIose,7.206.204)
The following independent variables were included:
(b) Every text file was coded for its genre, prose versus poetry. The genre distinction will be a major focus of the subsequent modelling.
(c) Every text was assigned an approximate date of composition. 7
(d) Clause type was a binary variable with the levels main versus subordinate clause. The corpus annotation guidelines involve some ambiguities (e.g. for-clauses can be parsed as either), but serve as an objective and replicable standard.
(e) The presence or absence of an overt subject was registered.
(f) Object type distinguished between pronominal, definite, indefinite, negative and quantified objects. Pronoun objects consisted of a single personal pronoun. Definite objects contained a th-based determiner, possessor or proper name. Indefinite objects involved other determiners, bare nouns or the word other. Negative objects included a quantifier starting with n- (e.g. no), quantified objects other quantifiers or numerals. These conditions were checked in the order pronoun > negative > quantified > definite > indefinite. Hence, conflicts are resolved according to this cascade. For instance, (4) involves both a quantifier, al, and a determiner, this, but the object was classified as quantified because this type appears earlier in the coding sequence.
In total, the dataset comprises 12,419 examples. Of those, 1,955 have conservative OV order and 10,464 innovative VO order. A total of 9,570 instances are found in prose and 2,849 in poetic texts.
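The coding cascade for object type can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure actually used for the corpus queries: the class `ObjectFeatures`, its field names and the function `classify_object` are hypothetical, and the sketch presupposes that the relevant properties of each object have already been read off the corpus annotation.

```python
from dataclasses import dataclass


@dataclass
class ObjectFeatures:
    """Hypothetical feature bundle for one direct object (names are illustrative)."""
    single_personal_pronoun: bool = False      # object is a single personal pronoun
    negative_quantifier: bool = False          # quantifier starting with n-, e.g. "no"
    other_quantifier_or_numeral: bool = False  # any other quantifier or a numeral
    definite_marker: bool = False              # th-based determiner, possessor or proper name


def classify_object(f: ObjectFeatures) -> str:
    """Apply the cascade pronoun > negative > quantified > definite > indefinite."""
    if f.single_personal_pronoun:
        return "pronoun"
    if f.negative_quantifier:
        return "negative"
    if f.other_quantifier_or_numeral:
        return "quantified"
    if f.definite_marker:
        return "definite"
    # Everything else: other determiners, bare nouns, the word "other"
    return "indefinite"


# An object with both a quantifier (al) and a determiner (this), as described for (4)
al_this_object = ObjectFeatures(other_quantifier_or_numeral=True, definite_marker=True)
print(classify_object(al_this_object))
```

Because the checks are ordered, an object that satisfies several conditions always receives the label that comes first in the cascade, which is the conflict resolution described above.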

Results
I fitted mixed-effects logistic regression models to the coded data. The model I would like to propose is given in table 1. I present models for pronominal and nominal objects separately. They predict the word order variation between VO and OV from the independent variables outlined in the last section and control for clustering within texts. Every predictor is given a short abbreviation, which will be referenced in the subsequent discussion.
'Year' is a highly significant predictor for the occurrence of VO order. As time passes, VO gradually becomes more probable (P1, N1). However, not only does the significant 'Genre' effect indicate greater conservativeness of the poetry overall, more pronounced for pronoun objects (P2) than for nominal ones (N2), but the significant 'Year:Genre' interaction also shows that the rate of change is slower in that genre. The poetry 'slowdown' is substantial for pronominal objects (P3), and so great that it almost seems to stall the development altogether for nominal objects (N3). Figure 2 illustrates these points.
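The way a 'Genre' main effect and a 'Year:Genre' interaction combine on the log-odds scale can be made concrete with a small sketch. All coefficient values below are invented for illustration and are not the estimates in table 1; the function name `p_vo`, the reference year and the scaling of time in centuries are likewise assumptions. Random text intercepts are omitted, so the sketch shows fixed effects only.

```python
import math


def p_vo(year, poetry, b0, b_year, b_genre, b_inter, ref_year=1150):
    """Predicted probability of VO from fixed effects only (illustrative)."""
    t = (year - ref_year) / 100.0   # time in centuries since an assumed reference year
    g = 1.0 if poetry else 0.0      # genre dummy: prose = 0, poetry = 1
    log_odds = b0 + b_year * t + b_genre * g + b_inter * t * g
    return 1.0 / (1.0 + math.exp(-log_odds))


# Invented coefficients: a negative genre effect (poetry more conservative overall)
# and a negative interaction (poetry changes more slowly).
COEFS = dict(b0=-0.5, b_year=1.5, b_genre=-0.5, b_inter=-1.0)

for year in (1200, 1300, 1400):
    prose = p_vo(year, poetry=False, **COEFS)
    verse = p_vo(year, poetry=True, **COEFS)
    print(year, round(prose, 3), round(verse, 3))
```

With a negative interaction coefficient, the slope for poetry is `b_year + b_inter`, so the gap between the genres on the log-odds scale widens over time instead of remaining constant.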
As a nominal object becomes heavier, the probability of VO increases (N4). In fact, once an object is at least 9 words long, it almost categorically occurs after the verb. I found only two exceptions, the most extreme of which, an OV configuration with a heavy, 18-word object, is presented in (5).
(5) […]
for not shall you no-ways
[O þeos ilke twa cumforz min & þe worldes.
these same two comforts my and the world's
þe Ioie of þe hali gast. & þe flesches frofre]
the joy of the holy ghost and the flesh's comfort
[V habben] to gederes.
have to gether
'For you shall in no way have at the same time these two comforts, mine and the world's, the joy of the holy ghost and the comfort of the flesh.' (CMANCRIW-1,II.81.972)
Another influence I have ignored is language contact. It is exceedingly difficult to demonstrate that some early ME text should be regarded as influenced by another language because of the anecdotal nature of textual transmission during that period. For a warning example, see the criticism in Cloutier (2005).
One should also keep in mind that there are further sources of noise in the data. I will give a few examples: (i) Lack of control for social variables. This is, I believe, an unsolvable problem in ME. (ii) Subgenres. Subgenres could be modelled as a random effect, but this only makes sense once the variable has, say, 10-100 different levels. This is a hard but perhaps not altogether unfeasible task, at least for the state of English literature from 1400 on. (iii) Palaeographical problems. Coding errors may be caused by issues of manuscript copying and updating.
I do not see sufficient justification to posit further interaction effects between 'Weight', 'Year' and 'Genre'. The 'Weight' slopes are somewhat steeper in the poetry than in the prose, and the slope for prose levels off as VO becomes the norm. However, these effects may be spurious. The diachronic increase in the prose may be more pronounced simply because there is a lack of prose texts in the data between 1250 and 1350, precisely the gap that the poetry texts are designed to fill. This might make the slopes of the prose texts appear more advanced simply because the prose texts are also on average younger. I will therefore model the effect of object heaviness as identical between the two genres and additive with time.
Main clauses have a higher chance of occurring with VO than subordinate clauses for pronoun objects (P4). Interestingly, this only holds for prose, not for poetry (P5).
The effect of clause type on word order is more complicated for nominal objects. Here, main clauses show a relatively linear relation between weight and word order. Subordinate clauses, on the other hand, have a lower probability of VO, but this is due in large part to greater conservativeness of 2-word objects. More research is needed to evaluate this surprising finding. The effect flattens the weight slope in subordinate clauses, leading to a significant interaction effect between 'Clause Type' and 'Weight' (N5, N6). These points are illustrated in figure 3.
Finally, indefinite objects have a significantly higher probability of occurring with VO order than definite objects, whereas quantified and negatively quantified objects have a significantly lower one (N7, N8, N9). Furthermore, negative objects innovate VO order at a slower rate than the […], as figure 4 illustrates. I assume, for the same reasons as above, that additional interactions, especially with 'Genre', are accidental and are therefore not included in the model.
The variance of the random text intercepts is substantial, especially for pronominal objects. This shows that it is essential to control for the clustering of examples within individual texts to avoid misleading coefficient estimates.

Discussion
I will now put the quantitative findings above into the context of existing studies. First, the rise of VO in Middle English can be measured in prose and poetry. Both sources of data are therefore useful as evidence. However, the change proceeds in a markedly different manner in prose and poetry. The latter genre is considerably more conservative (replicating Foster & van der Wurff 1995) and also innovates the new order more slowly (a new finding in this explicit form). One could attempt to establish a typology of syntactic features that are more or less prone to sustained variation in poetry. Shorter, morphosyntactic features may be harder to manipulate consciously whereas longer, macrostructural patterns are easily exploited for poetic effect. This assumption could begin to explain why there is no marked genre-specific differential in the rise of when as a subordinator (Zimmermann 2020) or the development of Jespersen's Cycle (Truswell et al. 2019), whereas the loss of null subjects (Walkden & Rusten 2017) or the change in the relative order of verb and object clearly proceed differently between prose and poetry.
A related perspective on the diachronic genre differences is offered by the notion of 'poetic function' (Jakobson 1960), poetry's focus on message formulation and linguistic creativity, utilising archaic features as readily as colloquial ones for aesthetic effect (e.g. Semino 2002). Most importantly, conservative OV patterns may persist to accommodate rhymes on the verb, although other artistic uses are conceivable (e.g. Fisher 1992: 373). These poetic functions may obfuscate the true rate of VO order associated with a given time, thus adding noise not encountered in the prose. Indeed, the variance of deviance residuals 9 of the model for texts before 1400 indicates larger between-text variability and poorer predictivity for poetry (1.29) than for prose (0.43).
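For readers unfamiliar with the measure, the following sketch shows one common definition of the deviance residual for a binary outcome and how its variance might be compared across genres. It is an illustration under assumptions, not a reconstruction of the calculation reported above: the exact procedure is not specified in the running text, the function names are hypothetical, and the toy observations and predicted probabilities are invented.

```python
import math
from statistics import variance


def deviance_residual(y, p):
    """Deviance residual for a binary outcome y (1 = VO, 0 = OV) and predicted P(VO) = p."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)            # guard against log(0)
    log_lik = math.log(p) if y == 1 else math.log(1.0 - p)
    return math.copysign(math.sqrt(-2.0 * log_lik), y - p)


def residual_variance_by_genre(rows):
    """rows: iterable of (genre, observed y, predicted p); returns sample variance per genre."""
    by_genre = {}
    for genre, y, p in rows:
        by_genre.setdefault(genre, []).append(deviance_residual(y, p))
    return {genre: variance(res) for genre, res in by_genre.items()}


# Invented toy data: the prose predictions are confident and mostly right,
# the poetry predictions are closer to chance and more often wrong.
toy_rows = [
    ("prose", 1, 0.90), ("prose", 1, 0.95), ("prose", 1, 0.90), ("prose", 0, 0.10),
    ("poetry", 1, 0.30), ("poetry", 0, 0.70), ("poetry", 1, 0.60), ("poetry", 0, 0.40),
]
print(residual_variance_by_genre(toy_rows))
```

A larger variance for one group means that the model's predicted probabilities sit further from the observed outcomes in that group, which is the sense in which the fit is described as poorer for poetry.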
One can also think of the findings in terms of 'archaisation', a concept by which I mean a gradual divergence in the frequency of use of a linguistic feature between idiomatic usage, where the decline is faster, and a more marked genre, in which the loss is slower, leading to the feature's association with that genre and its eventual niche survival. Here, prose is closer to spoken language and loses OV more quickly, while poetry is less natural and abandons OV more slowly, so that OV patterns are re-evaluated as distinctly 'poetic'. 10 The concept can be applied to a wide range of phenomena, e.g. the restriction of be come or thou to hyper-formal or biblical registers. It bears repeating that VO order reaches a probability of more than 99 per cent of use in prose in the fifteenth century. VO should thus be regarded as the norm from that time on (pace some authors who sketch the time course differently, e.g. 'the VO base was dominant by 1200' (Kiparsky 1996: 144 11), 'the change took place … through roughly the late 17th c.' (Wallenberg et al. 2021: 3); see the excellent summary in Fischer et al. 2000: ch. 5). In fact, the data presented here allow a refinement of a plausible end point of the change in prose, datable to the early to mid fifteenth century.
The weight variable is implicated in a likely reason for the emergence of VO. The earliest medieval English is sometimes regarded as underlying OV plus a rightward extraposition process (e.g. Lightfoot 1979; Kemenade 1987). This early extraposition is mainly driven by weight. Indeed, the ancient poem Beowulf shows post-verbal placement predominantly with heavy objects or across metrical breaks (Pintzuk & Kroch 1989; Taylor 2005). Subsequently, lighter objects begin to appear post-verbally and eventually a new underlying VO base is said to come about as a result. It would in principle be possible to derive a testable prediction from this account: if weight and time are merely additive, VO orders would emerge in addition to and independently of weight constraints. If on the other hand there is a significant interaction effect between weight and time such that VO rises faster with light objects, the new word order could be interpreted as feeding on extraposed objects, which would support a causal association. As stated above, the data are unfortunately too sparse and too noisy to ascertain with reasonable certainty if such an interaction effect should be postulated.
Ever since Lightfoot's (1991) proposal for a learning bias leading children to acquire linguistic structures predominantly from unembedded clauses (degree-0 learnability), main clauses have been seen to be overall more innovative than subordinate clauses. A corresponding clause type effect has been identified in the literature for the rise of VO (e.g. Pintzuk & Taylor 2006: 254-5). On the other hand, Stockwell & Minkova (1991) articulate the claim that subordinate clauses lead the way in the transition from OV to VO. While my model does include effects that support the general innovativeness of main clauses, it also suggests that the relationship between word order and clause type is much more complicated.
Pronoun objects in poetic texts do not provide evidence for the advanced status of main clauses. Nominal objects show a greater dependence on weight in subordinate than main clauses because, for unknown reasons, 2-word objects are more conservative than others in that environment. In general, it remains somewhat vague how the discovery of any distributional difference between clause types in historical corpora would impact on Lightfoot's proposal. There is thus room for a more detailed theory of clause type interactions of the type presented here. Perhaps it is possible to explain the complex influence of clause type on word order variation at least partially in terms of information density (Wallenberg et al. 2021).
The object type effects largely replicate previous findings. The more advanced status of indefinite as compared to definite objects most likely reduces to information-structural constraints already present in Old English. Indefinite objects tend to be new, and newness is a strong predictor for VO order (e.g. Taylor & Pintzuk 2012). Inversely, definite objects are often old and givenness predicts OV order (e.g. Struik & Kemenade 2020). The distinction appears to be less rigorous in my Middle English data than in previous studies (e.g. the repeated formulation of OV as 'reserved' for given objects in Struik & Kemenade 2020) perhaps because poetic requirements like rhyme can easily override general processing constraints. This is illustrated in (6), where an indefinite object, 'a present', new to the addressee, nevertheless appears with an OV configuration, quite possibly to enable a rhyme on the main verb.

(6) ffor I haue [O a presante] [V brouȝt]
for I have a present brought
ffro hym þat made all thyng of nowȝht
from him that made all things of nought
'for I have brought a present // from him who has made all things from nothing' (SirCleges,49.306.164)
The finding that quantified, and particularly negatively quantified, objects have a significantly higher probability of occurring with OV than other objects in Middle English ties in seamlessly with identical constraints in the previous Old English (Pintzuk & Taylor 2004, 2006) and subsequent Early Modern English periods (Moerenhout & van der Wurff 2000; Ingham 2002). The constraint eventually leads to a narrowing of OV predominantly to these contexts in late Middle and Early Modern English. One interesting question that does not seem to have received great attention so far is to what extent this state of affairs was sustained by influence from French, as illustrated in (7). The issue might be resolved by separating negative indefinite pronouns (nothing) from negatively quantified nominals (no + noun).

RICHARD ZIMMERMANN
It has been claimed that subjectless clauses are another context permitting OV alignments in late prose (van der Wurff 1997). However, 'Subject' (overt versus null) did not emerge as a significant variable in any of my models. Perhaps the special context can be explained as a late exaptation of remnant OV order for a new use (e.g. Traugott 2004). More generally, the finding raises the important methodological question of how to distinguish between unquantifiable and non-existent effects.
I will spend the remainder of my discussion on what I believe to be a fundamental and recurring problem arising when historical linguists attempt to map estimates of effects from quantitative models onto phrase-structural grammar models. I will illustrate the problem with reference to the results from the present study.
One can make a rough distinction between two types of studies of historical syntax, which I will call 'statistical' and 'generative'. Statistical treatments attempt to predict the rate of a variant of a linguistic variable from quantifiable and additive factors. Generative treatments, on the other hand, attempt to build a grammar algorithm that can parse a string as a hierarchical phrase structure tree or reject it as ungrammatical. This article is an example of a statistical study, modelling variation in the form of a regression analysis, and not a generative study, as I have not contributed to the development of an explicit, phrase-structural model of Middle English syntax.
Generative analyses are important because they model syntactic structures in a computationally rigid way (e.g. through merge in Minimalism, context-free rewrite rules etc.). For example, the innovation in VO order can be modelled as the gradual emergence of speakers' knowledge to put together an initial verb with a subsequent object to form a VP, as shown in (1b). In particular, generative models can reveal structural ambiguities. This is a crucial contribution to historical syntax, where it is often the case that linguistic examples may represent an innovative structure of interest, but may also instantiate an alternative parse. In the context of reanalysis, such cases are often referred to as bridge contexts. For example, the existence of ME patterns such as V-X-O (where X can be any constituent) shows that a realistic grammar algorithm for that language stage must be able to handle post-posing structures (extraposition, rightward raising, heavy NP shift etc.). Consequently, the string VO should also be parsable with such an option resulting in ambiguity between two hierarchies, say, underlying VO, […[VO]], and post-posed VO, [[…V]O]. 12 The methodological question I wish to draw attention to is this: how can one reconcile the prediction of the frequency of word order patterns from statistical models with the insight that some tokens may instantiate extraneous rather than the relevant phrase structure parses from generative models?
In the present case, a logistic regression model 13 predicts VO order from a number of independent variables, but it is wholly uninformative about the frequency of distinct phrase structural parses of these VO patterns, such as an integral VP versus post-position.
While I do not currently have a good solution for this problem, I can see a number of possible paths forward. A first attempt may be to break down the statistical model for specific constructions that are assumed to diagnose the phrase structure of interest unambiguously. For instance, pronominal objects cannot readily post-pose and are hence less problematic with respect to the ambiguity discussed. Generative models can make phrase structure rules sensitive to a categorical difference between pronominal and full nominal phrases. In this way, the grammatical difference can map directly onto separate statistical models. Indeed, I have presented separate models for pronominal and nominal objects above. This method, however, has a number of shortcomings. It is inadequate for distinctions that are probabilistic in nature. For example, the significant effect of object type cannot readily be addressed in this way. In fact, some linguists may not assume a categorical difference even for pronominal and full noun phrases, but rather regard the difference as falling at the extreme end of a spectrum regulating the probability of post-position, since demonstrably post-posed pronoun objects, although extremely rare, are not entirely absent in Middle English. Furthermore, there will often be so many structural ambiguities that no one sensible subdivision is obvious. Not infrequently, a subdivision requires isolating data that disambiguate one parse of interest, while the same data would have to remain in the combined dataset because they are ambiguous with respect to some other parse. For example, pronominal objects do not normally post-pose, but they frequently pre-pose, creating ambiguous parses for OV orders. At best, the separate model for pronominal objects therefore estimates a lower bound for the true rate of underlying VO order with pronouns.
Another avenue could be inspired by the idea to correct the total counts of an ambiguous syntactic structure by estimating the rates of their distinct parsing alternatives. For example, one could compare V-X-O structures, where O must be post-posed, to V-O-X structures, where O is unlikely to have post-posed, to arrive at an estimate of the rate of post-position. This estimate can be used to calculate the proportion of post-posed objects in ambiguous VO patterns. For an excellent example of this methodology in connection with information structure, see Taylor & Pintzuk (2011). However, this approach has problems as well. One cannot be sure that the rate of a particular parse estimated independently from one diagnostic will carry over to another environment. I agree that it is fruitful to work under the assumption 'that postposing applies at a constant rate' (Pintzuk & Taylor 2006: 266) in all environments, but this is not a necessity. In the example above, the rate of post-position in contexts with an additional element X may be different from the rate of post-position in contexts without such an element. I also worry that the method may stretch the relatively small early English corpora beyond their limits so that fluctuations by random factors, like individual texts, may influence the results. I therefore believe that it may be more advantageous to attempt a principled mapping of regression coefficients to a probability for the application of a syntactic process.
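The arithmetic behind this correction can be sketched as follows. This is a deliberately crude illustration, not the method of Taylor & Pintzuk (2011): the function names and all counts are invented, the post-position share is taken as the simple proportion of V-X-O among V-X-O plus V-O-X clauses, and carrying that share over to plain VO clauses is exactly the constant-rate assumption questioned above.

```python
def postposition_share(n_vxo, n_vox):
    """Share of post-verbal objects that are post-posed, estimated from diagnostic clauses.

    n_vxo: clauses with V-X-O order (object taken to be post-posed)
    n_vox: clauses with V-O-X order (object taken not to be post-posed)
    """
    total = n_vxo + n_vox
    if total == 0:
        raise ValueError("no diagnostic clauses available")
    return n_vxo / total


def split_ambiguous_vo(n_vo, share):
    """Divide ambiguous VO tokens into estimated post-posed and integral-VP parses."""
    post_posed = n_vo * share
    return {"post_posed": post_posed, "integral_vp": n_vo - post_posed}


# Invented counts, for illustration only
share = postposition_share(n_vxo=30, n_vox=270)
estimate = split_ambiguous_vo(n_vo=1000, share=share)
print(share, estimate)
```

With small diagnostic counts the estimated share becomes unstable, which is one way of seeing the concern that the method may stretch the early English corpora beyond their limits.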
Why do I contend that this is an important problem? First, generative analyses can make unfalsifiable claims about the prevalence and time of birth or death of underlying orders because they can freely interpret ambiguous tokens that could but do not have to be parsed according to the underlying order of interest. Exaggerating to illustrate the point, I could claim that VO order arises extremely late, say in 1400, and extremely fast, say within 10 years, by analysing all prior VO strings as results of post-position processes. Intuitively, the best way to inoculate against such absurdities is to tie historical generative models to quantitative corpus data. Yet statistical models do not actually resolve the issue in principle at the moment. Second, an enhanced synthesis of statistical and generative models can help to avoid ideological fragmentation in the field. Usage-based linguists may interpret the output from statistical models as directly representative of speakers' knowledge thus paying little attention to the reality of structural ambiguity. Generativists may struggle to include probabilistic effects in their discrete phrase structure rules. If successful, a synthesis could build strong bridges between historical syntacticians of different persuasions and between their ideas.

Conclusion
This article investigated the ME change from OV to VO quantitatively with a particular focus on the distinction between prose and poetry. The main findings are as follows: (i) The time course of the change is complex. Referring to positive objects in prose texts and abstracting away from syntactic pre- and post-posing processes, one can say that the transitional period comes to its end in the early to mid fifteenth century. Previous assessments that locate the end point of the change considerably earlier should be reassessed in light of the large textual base and conditioning factors considered here. (ii) The prose-poetry contrast emerges as a central determinant for the change. Poetry is more conservative overall and innovates the new order more slowly. The variable is more important than object type (information structure) or clause type. Studies on the variation between OV/VO in early English should therefore not ignore this crucial variable. Furthermore, claims about the existence of OV word order after c. 1400 should explicitly refer to genre categories in order to avoid misleading statements. (iii) ME poetic texts have great value as a bridge during M2, 1250-1350, when prose texts are exceedingly rare. At the same time, their inclusion must accommodate genre-specific patterns. Here, poetic texts lend strong support to a gradual rather than abrupt decline in OV throughout M2 but also behave so fundamentally differently from the prose that one cannot treat them as a mere supplement. Poetry is at once an illuminating and an obfuscating source of data.
I have mentioned limitations, possible extensions and open questions throughout the article. I will repeat only the most pressing issue: there is a great need to develop a methodology that can map precisely between frequency predictions for string-based levels of a dependent linguistic variable from regression models and their hierarchical, phrase-structural parses from explicit grammatical theories, which, crucially, may involve structural ambiguity.
The change from OV to VO in early English is a decades-old and well-studied research topic. Historical linguists have learned about its long-term development as well as its fine-grained expression in individual texts or examples, its potential causes, dialectal distribution and a host of significant predictors. With a lot of the low-hanging fruit picked, it is to be expected that progress will now be made more slowly, by incremental processes of revisiting and reconceptualising some of the relevant issues. I hope the present article will function as one piece in the ongoing effort to put this jigsaw puzzle together.