
Investigating idioms in the Cambridge Learner Corpus

Laura Grimes

This article explores the use of idioms and fixed expressions in the Cambridge Learner Corpus. As language learners, we often feel a great sense of accomplishment on discovering a new idiom or fixed expression, and we’re keen to put it to use right away. Perhaps as a result of this enthusiasm, we are prone to errors and the expressions are often overused. In this article, find out which idioms and fixed expressions prove most popular with learners, as well as which ones are the trickiest to master.

An idiom is a group of words in a fixed order that have a particular meaning that is different from the meanings of each word on its own (Cambridge English Dictionary). In other words, idioms often can’t be taken literally; they are code for another meaning, and there’s a degree of deciphering involved in understanding them. In a similar way, fixed expressions often take on a more nuanced meaning than their component words alone might suggest. Perhaps it’s this opaqueness in meaning which makes idioms and fixed expressions so intriguing and inviting to learners; successfully using them can feel like cracking a secret code!

But learning how to use idioms appropriately can be a rocky road, and it definitely takes practice to get them right. The Cambridge Learner Corpus (CLC) gives us plenty of insight into this journey, because in the CLC errors with idioms and fixed expressions are specifically tagged. Within this broad “idiom error” tag, we find a range of different issues which fall into three main categories:

  • An idiom or fixed expression has been incorrectly reproduced, e.g. I have a bee to my bonnet.
  • The learner has translated an idiom or fixed expression from their own language, where there may not be an exact English equivalent, e.g. They were expensive, but one day is a day. (From Spanish un día es un día, which has a similar sentiment to you only live once.)
  • An English idiom or fixed expression has been correctly reproduced but is incorrect in the context, e.g. He was a perfectionist, by all means.

 

In the CLC, we find nearly 11,000 examples of such errors. These errors generally become more prevalent the higher the CEFR level; this doesn’t mean that learners are necessarily getting worse at using idioms, but rather that they are gaining the confidence to experiment with some of the more nuanced, tricky areas of language learning.

Percentage of idiom errors across CEFR

A note on relative percentages: in the CLC, we don’t have an equal amount of data for every CEFR level. This means that if we compare the raw frequency of idiom errors across levels, we’ll get an imbalanced idea of how errors are distributed. A relative percentage score of 100% means that the frequency of errors is perfectly proportionate to the size of that sub-section of the corpus. A relative frequency of 200% means there are twice as many errors as we might expect. A score of under 100% means there are fewer errors than we might expect.
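For readers who want to see the arithmetic, here is a minimal Python sketch of how a relative percentage of this kind could be calculated. It assumes the score is each level's share of all idiom errors divided by that level's share of the corpus. The function name and every count below are invented for illustration and are not CLC figures.

```python
def relative_percentage(errors_by_level, words_by_level):
    """Return, per level, the error share divided by the corpus share, as a percentage.

    Assumed formula (not confirmed by the source):
        100 * (errors at level / all errors) / (words at level / all words)
    """
    total_errors = sum(errors_by_level.values())
    total_words = sum(words_by_level.values())
    return {
        level: 100 * errors * total_words / (total_errors * words_by_level[level])
        for level, errors in errors_by_level.items()
    }


# Hypothetical counts, made up for this example only.
errors = {"B1": 100, "B2": 300, "C1": 600}
words = {"B1": 500_000, "B2": 300_000, "C1": 200_000}

scores = relative_percentage(errors, words)
for level, score in scores.items():
    print(f"{level}: {score:.0f}%")
```

In this invented data set, B1 holds half of the words but only a tenth of the errors, so it scores well under 100%, while C1 holds a fifth of the words and most of the errors, so it scores well over 100%.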

The key culprits

Many of the most error-prone idioms relate to argumentation and expressing opinion, often functioning as cohesive devices. (We should note at this point that there may be some task effect at play here due to the nature of the data in the CLC, which comprises exam scripts from the Cambridge suite.)

Here are the top ten most error-prone idioms and fixed expressions in the CLC:

Error-prone idioms in the CLC

Many of these idioms are used unnecessarily, where using no idiom at all would be more appropriate. This is particularly common with on the other hand and in my opinion.

The table below shows some of the most common errors learners make when attempting to use these idioms:

Target idiom | Learner errors
on the other hand | in the other hand; on the other side; on the contrary
in my opinion | according to me; from my point of view; for me
on the one hand | one one hand; in one hand; on one side
in contrast | on the contrary; contrary to; in contrary
in my view | in my point of view; from my point of view; according to me
last but not least | at last but not least; the last but not the least; last but not the least
in conclusion | as a conclusion; concluding
from my point of view | in my point of view; on my point of view
in addition | on the other hand
to sum up | in summary; in sum

Learners vs expert speakers

As well as investigating how learners use idioms by looking at error tags in the CLC, we can also find examples of overuse by comparing the frequency of an idiom amongst learners with its frequency amongst expert and native speakers. The idiom raining cats and dogs, for example, is a firm favourite amongst learners – it’s almost 60 times more common than amongst expert speakers! Under the weather is also more popular amongst learners than expert speakers, as are cost an arm and a leg, beat around the bush, blessing in disguise, best of both worlds and at the drop of a hat.

As before, many of the key culprits relate to argumentation and opinion. Here’s a small sample of the most overused, with normalised frequencies for learners and expert speakers. The list is ordered by the degree of disparity, with the biggest difference at the top:

Idiom | Frequency of use by learners (per million words) | Frequency of use by expert speakers (per million words)
to sum up | 86.83 | 0.53
last but not least | 29.53 | 0.2
from my point of view | 29.68 | 0.26
as far as I am concerned | 13.15 | 0.39
on the other hand | 267.96 | 15.77
in my view | 21.01 | 1.99
on the one hand | 33 | 3.38
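One way to read "degree of disparity" is as the ratio between the learner frequency and the expert frequency. The short Python sketch below takes the figures from the table above and ranks the idioms by that ratio. The ratio is an assumption about how the disparity was measured, and the names `frequencies`, `overuse_ratio` and `ranked` are purely illustrative.

```python
# Normalised frequencies (per million words) copied from the table above:
# idiom -> (learner frequency, expert speaker frequency)
frequencies = {
    "to sum up": (86.83, 0.53),
    "last but not least": (29.53, 0.2),
    "from my point of view": (29.68, 0.26),
    "as far as I am concerned": (13.15, 0.39),
    "on the other hand": (267.96, 15.77),
    "in my view": (21.01, 1.99),
    "on the one hand": (33, 3.38),
}


def overuse_ratio(learner_freq, expert_freq):
    """How many times more frequent an idiom is among learners than among experts."""
    return learner_freq / expert_freq


# Rank idioms from the largest ratio to the smallest.
ranked = sorted(
    frequencies,
    key=lambda idiom: overuse_ratio(*frequencies[idiom]),
    reverse=True,
)

for idiom in ranked:
    learner, expert = frequencies[idiom]
    print(f"{idiom}: {overuse_ratio(learner, expert):.1f} times more frequent among learners")
```

Ranked this way, the idioms come out in the same order as the table. Note that on the other hand has by far the highest raw learner frequency, yet it sits mid-table because expert speakers also use it fairly often.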

Implications for the classroom

So, how can we help learners use idioms appropriately? One key insight we can draw from the CLC is that learners often use them too much, shoehorning them into their writing when they don’t quite fit. In the context of English exams, learners may feel that using idioms and fixed expressions as much as possible will result in a higher mark; however, it’s much more important that learners carefully consider the meaning of each idiom and fixed expression they use, and whether using them will genuinely enhance their writing.

It’s also worth focussing learners’ attention on the most error-prone idioms listed in this blog post; give them particular practice using these idioms so they become familiar with their forms, but also make sure they can pinpoint the subtle differences in meaning between them. Stress to learners that developing a nuanced understanding of a few idioms and fixed expressions is a better strategy than memorising long lists of them, wedging them into their writing and hoping for the best!

If you’re interested in reading more about our Corpus, read Laura’s most recent article on the language associated with love.

