Cambridge English Corpus

The language taught in our English language teaching materials is guaranteed to be natural, relevant and up to date.

Using the Cambridge English Corpus in our products means we teach the language that learners will encounter in their everyday lives – language that’s useful, current and helps them to sound natural when they speak and write.

We use the Corpus to observe and track how language is changing, to identify specific problems, and to tailor solutions to your learners' individual needs, because every learner is different.

  • What is a corpus?

    A corpus is a massive electronic collection of examples of spoken and written English. We use our corpus to answer questions about English vocabulary, grammar and usage – this works in a similar way to how you might look for information using a search engine.

  • How does a corpus work?

    Think of a question you have about how English is used. To find the answer, you might ask a friend what they think. You could ask your whole class, which would give you a better idea of how representative the answer is. Now imagine you could ask your whole neighbourhood, your whole city or even your country.

    This is how corpus linguistics works. Because we gather data from a huge range of speakers and writers of English, we can identify the typical use of words and phrases. These frequent words and patterns are the ones we include in our courses – they're based on evidence drawn from a huge sample of language and not on the values or intuition of one person.
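
    As a rough illustration of the idea of finding frequent patterns, here is a minimal Python sketch that counts how often each three-word sequence appears in a handful of sentences. The sample sentences, the function names and the simple tokenizer are all assumptions made for this example; they are not taken from the Cambridge English Corpus or its tools, and a real corpus is vastly larger and analysed with far more sophisticated methods.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def phrase_frequencies(texts, n=3):
    """Count every n-word sequence across a collection of texts."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        counts.update(
            " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
    return counts


# Invented sample sentences, standing in for a (much larger) corpus.
sample_texts = [
    "Yeah no, I think that's fine.",
    "Could I have a coffee, please?",
    "Could I have the bill, please?",
    "Yeah no, I know what you mean.",
]

counts = phrase_frequencies(sample_texts, n=3)

# Show the sequences that occur more than once in the sample.
for phrase, count in counts.most_common():
    if count > 1:
        print(phrase, count)
```

    With more data, the same kind of count starts to show which patterns are typical across many speakers and writers rather than particular to one person.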

  • The Cambridge Learner Corpus

    In collaboration with Cambridge Assessment English, we collect and analyse learner writing. This unparalleled collection allows us to clearly see how learners from around the world are similar and different in how they acquire and use language. This means that we can provide tailored and comprehensive support to groups of similar learners; for example, those with the same first language or who are at the same level in their learning.

Our Corpus in action

Here are just a couple of examples of how the Cambridge English Corpus improves our content, making it more accurate and up to date.


The 'Get it right' sections in THiNK show our Corpus data in action. They alert learners and teachers to areas where learners are statistically more likely to encounter difficulties based on findings from the Cambridge Learner Corpus.

This exam preparation course features exercises which are based on exclusive insights into real exam candidates' areas of difficulty from the Cambridge Learner Corpus, to help students avoid common mistakes.

In the 'Changing Language' sections in Talent, we present some features of language change in a fun way through video with our language expert, and through entertaining interviews by our reporter on the streets of London.

Constantly evolving

It’s essential that our corpus is current and relevant. We add to it all the time, refining our research data along the way. We are always working on new research projects, in partnership with universities and institutions worldwide. Some current research activities include:

• What are the features of current spoken English, and how have they changed?
• How can we best represent realistic dialogue features in our coursebooks?
• What language do we use to carry out certain language functions (e.g. asking for things politely)?
• Which features of language typify good exam answers, and how do these change across learner levels?

We run projects to collect data on all varieties of English. Recently we worked with Lancaster University on a groundbreaking project – the Spoken British National Corpus – which documented current spoken British English and looked at how language has changed over the past 20 years.

  • The Corpus in the World of Better Learning

    Our Corpus has had an impact on course materials and content for many years and we've created blog articles that bring that to life in the World of Better Learning. Take a look below at the latest articles from the blog.

The Spoken British National Corpus: Using “Yeah no” in spoken English

IATEFL 2019: The Corpus, 1000 hours of conversations: what does it mean for ELT?

Investigating idioms in the Cambridge Learner Corpus
