5 - Using Corpus Linguistics in Formulaic Language Learning

Published online by Cambridge University Press: 26 December 2025

Gavin Brookes
Affiliation:
Lancaster University
Niall Curry
Affiliation:
Manchester Metropolitan University
Robbie Love
Affiliation:
Aston University

Summary

This chapter leverages the IdiomsTube project to illustrate how corpus linguistics enhances research and tool development for formulaic language acquisition. Formulaic language, encompassing idioms, proverbs, and sayings, is common in everyday communication. However, English as a foreign language (EFL) learners often struggle with these conventionalised expressions due to limited exposure to authentic spoken contexts. To address this challenge, the IdiomsTube project conducted corpus studies to uncover patterns in formulaic language use, including prosodic features and distribution across internet television genres. Corpus linguistic methods have also enabled the development of the IdiomsTube app, a specialised tool for computer-assisted formulaic language learning. Informed by corpus-derived frequency data and innovative concordancer design, the app uniquely prioritises user experience. Unlike conventional concordancers, the IdiomsTube app dynamically compiles a corpus from captions retrieved in real time from YouTube videos based on the user’s search word, allowing users to read concordance lines from current, trending videos. This design makes concordancing engaging and motivating for learners. This chapter demonstrates how modernising concordancer designs with a focus on learner accessibility and real-time content can significantly advance formulaic language acquisition.
