In this article, Dr. Charles Browne, Dr. Brent Culligan and Joseph Phillips compare how important vocabulary words have changed over the course of 60 years, and discuss how this is useful for language learners and teachers.
The English language has a surprisingly large number of words. Even if we count words like accept, accepts, accepting and acceptable as part of the same word family, there are still more than 500,000 words in English! Fortunately for teachers and students, language has built-in redundancy, with certain words occurring much more frequently than others (the word the, for example, makes up 6-7% of all the words in any book, magazine or newspaper). Because of this, the average native speaker of English knows only a small percentage of these half million words (about 22,000 words for a recent college graduate).
Although 22,000 words may sound like a daunting number, there is good news. Corpus linguistics, the science of analyzing large collections of texts, has shown that knowledge of just a few thousand of the most important words can give an astonishing degree of coverage of English used in daily life. In 1953, Michael West published a list of about 2,000 important vocabulary words known as the General Service List (GSL). Based on more than two decades of pre-computer corpus research and a corpus size of 2.5 million to 5 million words, the GSL gives about 84% coverage of general English. However, as useful as this list has been over the decades, it has been criticized for (1) being based on a corpus that is both dated and small by modern standards and (2) not clearly defining what constitutes a “word.”
On the 60th anniversary of West’s publication of the GSL, we created a New General Service List (NGSL) based on a carefully selected 273-million-word subsection of the 1.6-billion-word Cambridge English Corpus (CEC), formerly known as the Cambridge International Corpus. Following many of the same steps as West and his colleagues (as well as the suggestions of Professor Paul Nation, project advisor and a leading figure in modern second language vocabulary acquisition), we have tried to combine the strong objective scientific principles of corpus and vocabulary list creation with useful pedagogic insights to create a list of approximately 2,800 high-frequency words. Our goals have been:
- to update and expand the size of the corpus used (273 million words) compared to the limited corpus behind the original GSL (about 5 million words), with the hope of increasing the generalizability and validity of the list
- to create a NGSL of the most important high-frequency words for second language learners of English which gives the highest possible coverage of English texts with the fewest words
- to make a NGSL that is based on a clearer definition of what constitutes a word
- to be a starting point for discussion among interested scholars and teachers around the world, with the goal of updating and revising the list based on this input (in much the same way that West did with the original Interim version of the GSL).
The NGSL: A word list based on a large, modern corpus
Using a range of computer-based corpus tools, we began developing the NGSL with an analysis of the CEC. The CEC is a 1.6-billion-word corpus of the English language that contains both written and spoken data of British and American English. To get a balanced corpus, we drew on different parts of the overall corpus (Learners, Fiction, Journals, Magazines, Non-Fiction, Radio, Spoken, Documents, and TV) for a total of 273 million words. Then we used some mathematical wizardry and advice from vocabulary expert Professor Paul Nation to bring it all together.
We compared the NGSL to the original GSL to see how the list had changed. Of course, many of the words are still the same. Words like fashion, flower, and music are still used, especially when making a list to teach to young people. In some ways the words that didn’t make it onto the NGSL are more interesting and tell us a little about how our world is changing. Words like flour, roast, and grind suggest that preparing food was a much bigger part of daily life then than it is nowadays. Farming was also more common then, which we can see with words like barrel, donkey, nest, straw, and bucket. Words like courage and coward reflect the original GSL’s development during the war years. In our modern times, words like tobacco and slavery are thankfully fading away.
The NGSL: More coverage for your money!
One of the important goals of this project was to develop a NGSL that would be more efficient and useful to language learners and teachers by providing more coverage with fewer words than the original GSL. For a meaningful comparison between the GSL and NGSL, the words on each list need to be counted in the same way. A comparison of the number of “word families” in the GSL and NGSL reveals that there are 1,964 word families in the former and 2,368 in the latter (using level 6 of Bauer and Nation’s 1993 word family taxonomy). Coverage within the 273-million-word CEC is summarized in the table below, showing that the 2,368 word families in the NGSL provide 90.34% coverage while the 1,964 word families in the original GSL provide only 84.24%. That the NGSL, with approximately 400 more word families, provides more coverage than the original GSL may not seem a surprising result, but when the lists are lemmatized, the usefulness of the NGSL becomes more apparent. With more than 800 fewer lemmas, the NGSL provides 6.1 percentage points more coverage than West’s original GSL.
| Vocabulary List | Number of Word Families | Number of Lemmas (headwords) | Coverage in CEC Corpus |
|---|---|---|---|
| GSL | 1,964 | 3,623 | 84.24% |
| NGSL | 2,368 | 2,818 | 90.34% |
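To make the idea of coverage concrete, here is a minimal sketch of how it can be computed: the share of running words (tokens) in a text that appear on a given word list. The sample sentence, the toy word list, and the function names `tokenize` and `coverage` are all invented for illustration; this is not the tooling used to build the NGSL. It also matches exact word forms only, whereas the figures above count inflected forms through word families or lemmas.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def coverage(tokens, word_list):
    """Return the percentage of running words (tokens) found in word_list."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    # Sum the frequencies of every distinct word that is on the list.
    covered = sum(n for word, n in counts.items() if word in word_list)
    return 100.0 * covered / len(tokens)


# Toy data, invented for illustration only (not taken from the CEC).
sample_text = "The cat sat on the mat and the dog sat on the rug."
toy_list = {"the", "on", "and", "sat"}

tokens = tokenize(sample_text)
print(f"{coverage(tokens, toy_list):.1f}% of {len(tokens)} tokens covered")
```

In this toy case four list words account for 9 of the 13 tokens, which is the same effect that lets a few thousand high-frequency words cover most of a large corpus.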
Where to find the NGSL:
The list of 2,818 words is now available for download, comments and debate from the website dedicated to the development of this list.
Bauer, L., & Nation, I. S. P. (1993). Word Families. International Journal of Lexicography, 6(4), 253–279.
Browne, C. (2013). The New General Service List: Celebrating 60 years of Vocabulary Learning. JALT Publications: The Language Teacher.
West, M. (1953). A General Service List of English Words. London: Longmans, Green & Co.