Predictive language processing revealing usage-based variation

Published online by Cambridge University Press: 04 June 2018

VÉRONIQUE VERHAGEN*
Affiliation:
Department of Culture Studies, Tilburg University
MARIA MOS
Affiliation:
Department of Communication and Cognition, Tilburg University
AD BACKUS
Affiliation:
Department of Culture Studies, Tilburg University
JOOST SCHILPEROORD
Affiliation:
Department of Communication and Cognition, Tilburg University
* Address for correspondence: Véronique Verhagen, Department of Culture Studies, Tilburg University, D 418, Postbus 90153, 5000 LE Tilburg, the Netherlands. e-mail: v.a.y.verhagen@tilburguniversity.edu

Abstract

While theories on predictive processing posit that predictions are based on one’s prior experiences, experimental work has effectively ignored the fact that people differ from each other in their linguistic experiences and, consequently, in the predictions they generate. We examine usage-based variation by means of three groups of participants (recruiters, job-seekers, and people not (yet) looking for a job), two stimuli sets (word sequences characteristic of either job ads or news reports), and two experiments (a Completion task and a Voice Onset Time task). We show that differences in experiences with a particular register result in different expectations regarding word sequences characteristic of that register, thus pointing to differences in mental representations of language. Subsequently, we investigate to what extent different operationalizations of word predictability are accurate predictors of voice onset times. A measure of a participant’s own expectations proves to be a significant predictor of processing speed over and above word predictability measures based on amalgamated data. These findings point to actual individual differences and highlight the merits of going beyond amalgamated data. We thus demonstrate that it is feasible to empirically assess the variation implied in usage-based theories, and we advocate exploiting this opportunity.

Information

Type
Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
Copyright © UK Cognitive Linguistics Association 2018

Table 1. Mean number of responses participants gave per cue; standard deviations between parentheses

Table 2. Mean stereotypy scores (on a 0–100 scale); standard deviations between parentheses

Fig. 1. Mean stereotypy score on the two types of stimuli for each individual participant.

Fig. 2. The difference between the mean stereotypy score on Job ad stimuli and the mean stereotypy score on News report stimuli for each individual participant; black bars show each group’s mean difference. A circle below zero indicates that that participant obtained higher stereotypy scores on News report stimuli than on Job ad stimuli.

Table 3. Mean percentage of target words that had been mentioned by the participants in the Completion task; range between parentheses

Table 4. Mean Voice Onset Times in seconds; standard deviations between parentheses

Fig. 3. Mean Voice Onset Time on the two types of stimuli for each individual participant.

Fig. 4. The difference between the mean VOT on Job ad stimuli and the mean VOT on News report stimuli for each individual participant; black bars show each group’s mean difference. A circle below zero indicates that that participant responded faster on Job ad stimuli than on News report stimuli.

Table 5. Generalized linear mixed-effects model (family: Gaussian) fitted to the voice onset times, using ‘Target not mentioned’ as the reference condition

Fig. 5. Scatterplot of the log-transformed corpus frequency of the target word (lemma), residualized against word length, and the Voice Onset Times, split up according to whether or not the target word had been mentioned by a participant in the preceding Completion task. Each circle represents one observation; the lines represent linear regression lines, each with a 95% confidence interval around it.

Table 6. Mixed-effects logistic regression model (family: binomial) fitted to the responses to the Completion task (0 = does not correspond to a complement in the specialized corpus; 1 = corresponds to a complement in the specialized corpus), using ‘Recruiters–Job ad stimuli’ as the reference condition

Table 7. Mixed-effects logistic regression model (family: binomial) fitted to the responses to the Completion task (0 = does not correspond to a complement in the specialized corpus; 1 = corresponds to a complement in the specialized corpus), using ‘Job-seekers–Job ad stimuli’ as the reference condition

Table 8. Mixed-effects logistic regression model (family: binomial) fitted to the responses to the Completion task (0 = does not correspond to a complement in the specialized corpus; 1 = corresponds to a complement in the specialized corpus), using ‘Inexperienced–News report stimuli’ as the reference condition

Table 9. Generalized linear mixed-effects model (family: Gaussian) fitted to the voice onset times, using ‘Recruiters–Job ad stimuli’ as the reference condition

Table 10. Generalized linear mixed-effects model (family: Gaussian) fitted to the voice onset times, using ‘Job-seekers–Job ad stimuli’ as the reference condition

Table 11. Generalized linear mixed-effects model (family: Gaussian) fitted to the voice onset times, using ‘Inexperienced–News report stimuli’ as the reference condition