
Predictive language processing revealing usage-based variation


While theories on predictive processing posit that predictions are based on one’s prior experiences, experimental work has effectively ignored the fact that people differ from each other in their linguistic experiences and, consequently, in the predictions they generate. We examine usage-based variation by means of three groups of participants (recruiters, job-seekers, and people not (yet) looking for a job), two stimuli sets (word sequences characteristic of either job ads or news reports), and two experiments (a Completion task and a Voice Onset Time task). We show that differences in experiences with a particular register result in different expectations regarding word sequences characteristic of that register, thus pointing to differences in mental representations of language. Subsequently, we investigate to what extent different operationalizations of word predictability are accurate predictors of voice onset times. A measure of a participant’s own expectations proves to be a significant predictor of processing speed over and above word predictability measures based on amalgamated data. These findings point to actual individual differences and highlight the merits of going beyond amalgamated data. We thus demonstrate that it is feasible to empirically assess the variation implied in usage-based theories, and we advocate exploiting this opportunity.
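To make the contrast between amalgamated and individual predictability measures concrete, the following is a minimal sketch under stated assumptions. It is not the authors' analysis pipeline. It takes hypothetical Completion-task responses for a single stimulus and computes three things: (a) a pooled cloze-style probability, the share of all participants who produced a given target word; (b) the corresponding surprisal, −log2 p; and (c) a participant-specific indicator of whether that participant produced the target themselves. The response data, participant IDs, target words, and function names are all invented for illustration.

```python
import math
from collections import Counter

# Hypothetical Completion-task responses for ONE stimulus: each participant's
# completion of the final word. Invented for illustration only.
responses = {
    "p01": "experience",
    "p02": "experience",
    "p03": "knowledge",
    "p04": "experience",
}


def cloze_probability(responses: dict[str, str], target: str) -> float:
    """Amalgamated measure: share of all participants who produced `target`."""
    counts = Counter(responses.values())
    return counts[target] / len(responses)


def surprisal(p: float) -> float:
    """Surprisal in bits, -log2(p); returned as infinity when p == 0."""
    if p <= 0:
        return math.inf
    return -math.log2(p)


def own_expectation(responses: dict[str, str], participant: str, target: str) -> int:
    """Individual measure: 1 if this participant produced `target`, else 0."""
    return int(responses.get(participant) == target)


if __name__ == "__main__":
    target = "experience"
    p = cloze_probability(responses, target)
    print(f"cloze = {p:.2f}, surprisal = {surprisal(p):.3f} bits")
    for pid in responses:
        print(pid, own_expectation(responses, pid, target))
```

The pooled value is identical for every participant. The individual indicator, by contrast, can differ from person to person for the same stimulus, and that difference is what allows a participant's own expectations to explain variation in voice onset times beyond an amalgamated measure. This sketch does not reproduce the study's statistical models.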

This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence, which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Corresponding author
Address for correspondence: Véronique Verhagen, Department of Culture Studies, Tilburg University, D 418, Postbus 90153, 5000 LE Tilburg, the Netherlands. e-mail:

We thank Louis Onrust and Antal van den Bosch for their help in analyzing the corpus data, Sanneke Vermeulen for her help in collecting the experimental data, and Elaine Francis and two reviewers for their helpful comments and suggestions on this manuscript.

Language and Cognition
  • ISSN: 1866-9808
  • EISSN: 1866-9859
  • URL: /core/journals/language-and-cognition