Chunk-spotting – a user’s guide

Scott Thornbury

Scott Thornbury is an internationally recognised academic and writes books about language and teaching. He is the author of the latest paper in our Cambridge Papers in ELT series, Learning Language in Chunks.

How do you know a chunk when you see one?

According to Michael Lewis, a core principle of his Lexical Approach is that ‘language consists of chunks which, when combined, produce continuous coherent text’ (Lewis 1997: 7). Accordingly, Lewis foregrounds a technique he calls ‘pedagogical chunking’, which, he argues, should be ‘one of the central activities of language teaching’ (Lewis 1993: 122). This is primarily because of the key role that lexical chunks play in achieving fluency: a memorized store of chunks allows more rapid processing, not only in production but also in reception. It’s quicker to process several words at a time than to process each word individually.

So, a typical activity might involve asking learners to identify the chunks in a text like the following (O’Connor 2015):

Feed ducks frozen peas instead of stale bread, charity asks

It may be a favourite family pastime, but apparently going to your local park to throw stale bread at ducks is completely wrong.

The Canal and River Trust is launching a campaign this week which urges people to feed ducks with frozen peas and sweetcorn instead. Ducks are also reportedly partial to grapes, which should be cut into quarters to make them easier to eat.

People in England and Wales feed an estimated six million loaves of bread a year to ducks, which can cause damage to birds’ health and pollute waterways.

Ducklings that are fed on bread end up being malnourished, while birds that get used to hand-outs can lose their natural fear of humans and may become “aggressive”.

The charity warns families that bread is essentially “junk food” for ducks, and the remnants left behind encourage rats, disease and algae. Oats, barley, rice and vegetable trimmings are also acceptable replacements for leftover crusts, it advises.

But how easy is it, really, for learners to identify chunks?

If stale bread is a chunk, what about leftover crusts? If launching a campaign is a chunk, what about feeding ducks?

A chunk – by definition – is a word sequence, such as a collocation, that has become conventionalized over time. But this means that being able to identify chunks presupposes that you are already familiar with the way the language is conventionally used, which, of course, most learners are not.

However, it may be possible to train learners to make intelligent guesses as to what constitutes a conventional chunk – as opposed to a random sequence – and to check their intuitions accordingly.

How? Here are some pointers:

Length: chunks can be up to six words or so in length, but the vast majority are only two or three words long: junk food, launch a campaign, left behind.

Form: chunks consist of at least one lexical word, and may include one or more function words, such as prepositions or articles: end up, get used to, loaf of bread. Typical structures include:

  • adjective + noun: stale bread, or noun + noun: junk food, or noun + of + noun: loaf of bread
  • preposition + noun phrase: at the moment, in the end
  • adverb + adjective/participle: completely wrong; densely populated
  • verb + particle (as in phrasal verbs): end up, leave behind
  • verb + noun phrase: make a point, launch a campaign, cause damage
  • noun + and + noun, or adjective + and + adjective pairs: fish and chips; sweet and sour


Occasionally, they break the rules of conventional grammar altogether: by and large, as it were. Sometimes they betray their age by including words that no longer exist on their own: spick and span, scot-free.

Fixedness: chunks are often – but not always – invariable: point blank, by the way, trial and error. But others can display considerable variability: he made his point; the point was made…; he made a valid point.

Pronunciation: they often display some kind of sound harmony, such as alliteration or rhyme, as in safe and sound, deep sleep, fish fingers.

Meaning: their meaning may be non-literal, i.e. figurative or idiomatic: a far cry, leaps and bounds, pay lip service, a red herring, get along with…

Function: they often function as discourse markers (by the way, on the other hand), sentence starters (I was wondering… Do you think…?), and adverbials, e.g. of time or place, or other adjuncts (last night, by myself, frankly speaking).

And, of course, they are frequent, with some chunks (you know, of course, for instance) being even more frequent than many of the commonest single words. In fact, many words are highly frequent simply because they form part of high-frequency chunks, e.g. by the way, the way forward, way of life; take cover, take a bus, take off, etc. But this is precisely the kind of knowledge that learners lack. Even lifelong users of a language are notoriously bad at estimating frequency, which is why we rely on corpus linguists to help us out.

Equipped with the above list of characteristics (or a simplified version thereof), learners can then work (preferably together) on identifying possible chunks in the text. They can then check these by:

  • using a standard dictionary or a collocations dictionary
  • using a collocations website, such as HASK. A visualization on HASK of the nouns that follow frozen, for example, suggests that frozen pea(s) is a common combination, i.e. that it has ‘chunk status’


  • using an online corpus, such as COCA (although using corpus sites effectively takes some training). COCA, for example, shows that the most common nouns in the sequence feed the * are: world, animals, baby, children, chickens, family and … ducks! Moreover, since ducks is a relatively low-frequency word (compared, say, to world or family), its presence in this list is significant – a significance that can be computed statistically, and is called its ‘collocational strength’
  • more simply and speedily, using a search engine. Stale bread, for example, gets 143 million hits on Google, but leftover crusts gets a mere ten thousand, suggesting it is not sufficiently frequent to qualify as a chunk
  • comparing their analysis with one that the teacher has made previously. (For an analysis of the text about ducks, see the appendix.)
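For readers curious about what ‘computed statistically’ might involve, here is a minimal sketch in Python of one widely used association measure, mutual information (MI). It compares how often two words actually occur together with how often they would be expected to co-occur by chance, given how frequent each word is on its own. The counts below are invented purely for illustration (they are not COCA figures), the function name is hypothetical, the slot pattern feed the * is simplified to a plain word pair, and corpus tools may offer other measures (such as t-score or log-likelihood) instead of, or alongside, MI.

```python
import math


def mutual_information(pair_count: int, node_count: int,
                       collocate_count: int, corpus_size: int) -> float:
    """Log2 ratio of observed co-occurrences to those expected by chance."""
    # Expected co-occurrences if the two words were distributed independently
    expected = node_count * collocate_count / corpus_size
    return math.log2(pair_count / expected)


# Hypothetical counts, invented for illustration only (NOT real COCA data)
CORPUS_SIZE = 1_000_000   # total words in the imaginary corpus
FEED_COUNT = 500          # occurrences of 'feed'
PAIR_COUNT = 10           # times each collocate follows 'feed'

# Same number of co-occurrences, but very different overall frequencies
for collocate, collocate_count in [("world", 5_000), ("ducks", 100)]:
    score = mutual_information(PAIR_COUNT, FEED_COUNT,
                               collocate_count, CORPUS_SIZE)
    print(f"feed ... {collocate}: MI = {score:.2f}")
```

With these made-up figures, world scores 2.00 and ducks about 7.64: both follow feed ten times, but because ducks is far less frequent overall, ten co-occurrences are much more than chance alone would predict.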


In the end, chunks are a moving target: some chunks are more chunky than others. Moreover, the presence of chunks ‘on the page’ does not necessarily reflect how chunks are stored and accessed in the mind. Nevertheless, sensitizing learners to the presence of chunks, through ‘pedagogical chunking’ tasks such as the one described, may help redress the traditional view of vocabulary as consisting simply of individual words.


Davies, M. (2008–). The Corpus of Contemporary American English (COCA): 560 million words, 1990–present.

Lewis, M. (1993). The Lexical Approach. Hove: Language Teaching Publications.

Lewis, M. (1997). Implementing the Lexical Approach. Hove: Language Teaching Publications.

O’Connor, R. (2015). Feed ducks frozen peas instead of stale bread, charity asks. The Independent. Reprinted in Thornbury, S. (2017). About Language (2nd edition).


If you’d like to delve deeper into the world of lexical chunks, you can read Scott’s paper on the subject. Or why not find out how the Cambridge Handbooks for Language Teachers series, of which Scott is Series Editor, has been inspiring teachers for over 40 years?
