
Cambridge Learner Corpus

The Cambridge Learner Corpus is the world’s largest learner corpus.

It is made up of many thousands of exam scripts written by students taking Cambridge English exams in countries around the world. It forms part of the Cambridge English Corpus and was built by Cambridge University Press and Cambridge English Language Assessment.


The Cambridge Learner Corpus currently contains over 200,000 exam scripts from students who speak 148 different languages and live in 217 different countries or territories. The corpus continues to grow.

Exams currently represented in the Cambridge Learner Corpus are shown below:

What is unique about the Cambridge Learner Corpus?

Specialists at Cambridge University Press carefully check each exam script and highlight all errors made by students. We can then use this information to see which words or structures are difficult for learners of English.

The Cambridge Learner Corpus also contains information about the student's first language, nationality, level of English, grade, age, gender, and date of exam. Along with the error information, this means that we can:

  • Focus on certain groups of learners and see what they find easy or hard.
  • Make sure our materials contain appropriate content for a particular level or exam.
  • Find mistakes that are universal to English language learning, and those that are a result of first-language interference.
  • Find plenty of examples of language used by students and use them to help other students.


The Cambridge Learner Corpus allows Cambridge University Press to produce specifically targeted materials that give learners help exactly where they need it!


The Cambridge Learner Corpus is also a valuable resource in the development of English Profile - a collaborative programme designed to enhance the learning, teaching and assessment of English worldwide.
