Communicative efficiency and the Principle of No Synonymy: predictability effects and the variation of want to and wanna

Published online by Cambridge University Press:  19 April 2022

Natalia Levshina*
Affiliation:
Max Planck Institute for Psycholinguistics
David Lorenz
Affiliation:
University of Rostock
*Corresponding author. Email: natalevs@gmail.com

Abstract

There is ample psycholinguistic evidence that speakers behave efficiently, using shorter and less effortful constructions when the meaning is more predictable, and longer and more effortful ones when it is less predictable. However, the Principle of No Synonymy requires that all formally distinct variants should also be functionally different. The question is how much two related constructions should overlap semantically and pragmatically in order to be used for the purposes of efficient communication. The case study focuses on want to + Infinitive and its reduced variant with wanna, which have different stylistic and sociolinguistic connotations. Bayesian mixed-effects regression modelling based on the spoken part of the British National Corpus reveals a very limited effect of efficiency: predictability increases the chances of the reduced variant only in fast speech. We conclude that efficient use of more and less effortful variants is restricted when two variants are associated with different registers or styles. This paper also pursues a methodological goal regarding missing values in speech corpora. We impute missing data based on the existing values. A comparison of regression models with and without imputed values reveals similar tendencies. This means that imputation is useful for dealing with missing values in corpora.

Information

Type
Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
© The Author(s), 2022. Published by Cambridge University Press
Table 1. Variables tested in this study

Fig. 1. A fragment of the correspondence analysis map with text categories as labels. The analysis was based on co-occurrence frequencies of verbs (not shown) and text categories. Dimension 1 can be interpreted as a contrast between formal (left) vs. informal communication (right), while the vertical dimension can be interpreted as a contrast between informative language (bottom) and language for aesthetic purposes and entertainment (top).

Table 2. Table of coefficients of the Bayesian models

Fig. 2. Probabilities of wanna for different age groups in two models. This is a conditional plot showing the predicted probabilities of wanna depending on the age group, computed for the following values of the predictors: conversations, female speakers, no negation, not a question and mean values of the numeric predictors in the data set with imputed values. The points represent the mean posterior estimates, and the error bars stand for 95% credibility intervals based on MCMC sampling. Red: the model with complete observations only; turquoise: the model with imputed values.
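
The points and error bars described in this caption can be illustrated with a minimal sketch. It assumes the posterior draws are available as log-odds of wanna for one predictor setting, and that the summary is computed on the probability scale; both are assumptions, as are the simulated draws and the helper names `inv_logit` and `summarize`, none of which come from the article.

```python
import math
import random


def inv_logit(x: float) -> float:
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def summarize(draws_logodds, level=0.95):
    """Return (mean, lower, upper) of the draws on the probability scale.

    The interval is an equal-tailed interval taken from the sorted draws.
    """
    probs = sorted(inv_logit(d) for d in draws_logodds)
    n = len(probs)
    alpha = (1.0 - level) / 2.0
    lower = probs[math.floor(alpha * (n - 1))]
    upper = probs[math.ceil((1.0 - alpha) * (n - 1))]
    mean = sum(probs) / n
    return mean, lower, upper


# Hypothetical stand-in for MCMC output: 4000 simulated log-odds draws.
random.seed(1)
draws = [random.gauss(-0.5, 0.3) for _ in range(4000)]

mean, lower, upper = summarize(draws)
print(f"posterior mean = {mean:.3f}, 95% interval = [{lower:.3f}, {upper:.3f}]")
```

With real model output, the simulated `draws` would be replaced by the sampled linear predictor for the chosen combination of predictor values.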

Fig. 3. Interaction between speech rate of a recording and informativity of WANT given the word on the left in the model with complete observations. This is a conditional plot showing the predicted probabilities of wanna depending on Speech Rate and Informativity of WANT given the word on the left, computed for the following values of the predictors: conversations, female speakers, Age Group 0, no negation, not a question and mean values of the remaining numeric predictors in the dataset with complete observations only. The lines correspond to the predicted posterior mean values, and the shaded areas are 95% error bands based on MCMC sampling. The blue line corresponds to the model predictions for high values of Info_WANT_given_left (6.17); the green line corresponds to medium values (4.01), and the red line corresponds to low values (1.85).
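
As a rough illustration of what an informativity score such as Info_WANT_given_left might measure, the sketch below computes surprisal from co-occurrence counts. It assumes informativity is the negative base-2 logarithm of the conditional probability of WANT given the neighbouring word; this page does not state the definition, so the formula, the function name `informativity` and all counts are hypothetical.

```python
import math


def informativity(count_context_and_target: int, count_context: int) -> float:
    """Surprisal in bits: -log2 P(target | context), estimated from counts."""
    if count_context <= 0:
        raise ValueError("context count must be positive")
    if not 0 < count_context_and_target <= count_context:
        raise ValueError("joint count must be in (0, context count]")
    return -math.log2(count_context_and_target / count_context)


# Hypothetical counts: (times the word is followed by WANT, total times the word occurs).
counts = {
    "i": (250, 1000),    # WANT is fairly predictable after this word
    "they": (10, 1000),  # WANT is much less predictable after this word
}

scores = {word: informativity(joint, total) for word, (joint, total) in counts.items()}
for word, bits in scores.items():
    print(f"{word}: {bits:.2f} bits")
```

Under this assumed definition, higher scores mean that WANT is less predictable in its context, which is the direction in which an efficiency account would expect fewer reduced forms.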

Fig. 4. Random intercepts of different grammatical subjects in both models. The points represent the mean posterior estimates, and the error bars stand for 95% credibility intervals based on MCMC sampling. Red: the model with complete observations only; turquoise: the model with imputed values.

Fig. 5. Interaction between Info_WANT_given_left and Info_WANT_given_Verb in the model with imputed values. This is a conditional plot showing the predicted probabilities of wanna depending on Informativity of WANT given the word on the left and Informativity of WANT given the verb, computed for the following values of the predictors: conversations, female speakers, Age Group 0, no negation, not a question and mean values of the remaining numeric predictors in the dataset with imputed values. The lines correspond to the predicted posterior mean values, and the shaded areas are 95% error bands based on MCMC sampling. The blue line corresponds to the model predictions for high values of Info_WANT_given_Verb (8.32); the green line corresponds to medium values (7.01), and the red line corresponds to low values (5.7).

Supplementary material: File

Levshina and Lorenz supplementary material

File 13.7 KB