Unpredictable grammatical choices are not harder than predictable ones

Published online by Cambridge University Press:  23 March 2026

Thomas Van Hoey
Affiliation:
Fonds voor Wetenschappelijk Onderzoek, Belgium and KU Leuven, Brussel, Belgium
Matt Hunt Gardner*
Affiliation:
Queen Mary University of London, London, UK
Ruiming Ma
Affiliation:
KU Leuven, Leuven, Belgium
Benedikt Szmrecsanyi
Affiliation:
KU Leuven, Leuven, Belgium
*Corresponding author: Matt Hunt Gardner; Email: matthuntgardner@gmail.com

Abstract

Recent variationist research indicates that grammatical intra-speaker variation (or: optionality) is unproblematic in speech production. Optionality contexts do not coincide with dysfluencies. In this study, we ask a theoretically significant follow-up question about optionality contexts that strongly cue grammatical variant choice: are optionality contexts in which all variants are probabilistically equally likely more problematic than those that strongly cue choice of grammatical variant? After all, unbiased (un-cued or freer) choices are sometimes theorized as being more difficult than biased (cued) ones. We empirically analyzed a subset of the SWITCHBOARD corpus of spoken American English on a turn-by-turn basis. The dataset covers 7,295 conversational turns containing 7,001 optionality contexts (spread over 20 grammatical alternation types), 2,970 filled pauses, and 41,297 unfilled pauses. Contrary to claims in the literature, weak probabilistic cueing does not trigger more production difficulties than strong probabilistic cueing. Unpredictable grammatical choices are not harder than predictable grammatical choices.

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike licence (http://creativecommons.org/licenses/by-nc-sa/4.0), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the same Creative Commons licence is used to distribute the re-used or adapted article and the original article is properly cited. The written permission of Cambridge University Press or the rights holder(s) must be obtained prior to any commercial use.
Copyright
© The Author(s), 2026. Published by Cambridge University Press.

Table 1. Datasets and dependent variables under study in previous SWITCHBOARD research. Szmrecsanyi et al. (in press) offer a short synopsis of the studies cited here.

Table 2. Demographics of the SWITCHBOARD corpus

Table 3. Summary of alternations with distribution of major variants considered for 35 young women from the South Midlands Dialect Area in SWITCHBOARD

Figure 1. Conditional random forest metrics per alternation. Models were tuned for number of trees and iterations. Metrics include accuracy (x-axis) and the concordance C-value (y-axis). For binary models, the C-value equals the area under the curve (AUC). For the ternary alternation #18 (Quotatives), we used AUNP, a multiclass metric that computes the area under the curve for each class against the rest and weights each class by the a priori class distribution.
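
As an illustration of the AUNP description in the caption above, the following is a minimal sketch in plain Python, not the authors' implementation. The function names `binary_auc` and `aunp` and the class labels "a", "b", and "c" are hypothetical. The sketch assumes that each one-vs-rest AUC is weighted by the observed share of that class in the data.

```python
from collections import Counter


def binary_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs in which the positive
    item receives the higher score; tied pairs count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative item")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def aunp(y_true, proba, classes):
    """One-vs-rest AUC per class, weighted by the prior class distribution.

    y_true  : observed class label for each item
    proba   : one row of predicted probabilities per item, ordered as `classes`
    classes : class labels in the column order of `proba`
    """
    n = len(y_true)
    counts = Counter(y_true)
    total = 0.0
    for k, c in enumerate(classes):
        labels = [y == c for y in y_true]   # class c against the rest
        scores = [row[k] for row in proba]  # predicted probability of class c
        total += (counts[c] / n) * binary_auc(labels, scores)
    return total


# Hypothetical three-variant example (labels are placeholders).
y = ["a", "a", "b", "c"]
p = [
    [0.8, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
]
score = aunp(y, p, ["a", "b", "c"])
```

With two classes and complementary probabilities, both one-vs-rest AUCs coincide, so this weighted average reduces to the ordinary binary AUC, which is consistent with the caption's remark about binary models.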

Figure 2. Histogram of deviance values per alternation. Notice that most alternations have a peak at the 0.5 mark.

Table 4. Mixed-effects linear regression model with min-max scaled speech dysfluency as dependent variable and number of alternations per turn, turn duration, mean word length, and speech rate as fixed effects predictors and individual speaker as a random effect
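
The min-max scaling named in the caption above maps the smallest observed value to 0 and the largest to 1. The following is a minimal sketch; the function name `min_max_scale` is hypothetical. It assumes that scaling is applied across all turns at once, which the caption does not specify.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum becomes 0.0 and the maximum 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant input is mapped to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical dysfluency counts for five turns.
scaled = min_max_scale([0, 2, 4, 1, 3])
```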

Table 5. Mixed-effects linear regression model with min-max scaled speech dysfluency as dependent variable; turn duration, mean word length, and speech rate as fixed effect predictors; and alternation type as a random effect

Table 6. Mixed-effects linear regression model with min-max scaled speech dysfluency as dependent variable, and turn duration, mean word length, speech rate, and turn mean deviance as predictors

Figure 3. Estimates and confidence intervals (estimates ± standard error) for adjustments to intercept by grammatical alternation type in the enhanced baseline model. Response variable: number of dysfluencies by turn.