What is a Dictionary?
Before we can discuss how a word gets entered into an English dictionary, we first need to examine what purpose a dictionary serves.
Many people believe that a dictionary acts as a sort of gatekeeper for proper English, listing only the words that are considered ‘correct’ or ‘worthwhile’. The corollary to that belief is the assumption that a word is not an ‘official’ word or a ‘real’ word until it has been entered into a professionally edited dictionary. Nothing could be further from the truth. The job of a dictionary is to record, as much as possible, the language as it is actually used, and not as people think it should be used. This may seem like a minor distinction, but as we will see, it’s an important one and forms the basis for how modern dictionaries chronicle the language.
A Short History of the Early English Dictionaries
People can be forgiven for assuming that dictionaries only record ‘proper English’ or ‘elegant English’, because for much of the history of English dictionaries, that is exactly what they did. The book considered to be the first English monolingual dictionary was written in 1604 by a schoolmaster named Robert Cawdrey and is called A Table Alphabeticall. (The full title of the work is much longer, in keeping with book-naming conventions of the 1600s, but modern scholars use the shortened form.) Cawdrey’s focus was on listing and defining what he called ‘hard usual’ words that he felt would improve plainspoken communication, though it’s clear in reading through his wordlist that his ideas of what constituted ‘plainspoken communication’ were a bit loftier than the modern person’s idea of simple speech. Cawdrey’s dictionary is rudimentary and contains none of the features that we associate with modern dictionaries: no comprehensive pronunciations or etymologies, no example sentences showing the word in use, and no extensive or multi-sense definitions. He does mark some of his entries as coming from French or Greek where appropriate, and in the introduction, he explains how to look up a word without knowing how to spell it – proof that dictionary use was not universal even among the educated gentry. There are 2,543 defined words in his book, and they cover everything from general vocabulary like abhorrent and confidence, to legal terms (misprision and rejoinder), scientific terms (meteor and comet), and theological terms (tabernacle and sanctification). In writing A Table Alphabeticall, Cawdrey consulted previously published wordlists, bilingual dictionaries of English, and specialised glossaries – sometimes borrowing entries wholesale from those other sources. But his dictionary is not just a compilation of those works: it is written in its own style and is intended to be as systematic and helpful as possible. 
It is a remarkable lexicographical work, and soon other dictionaries like it followed (most notably, John Bullokar’s 1616 English Expositor and Henry Cockeram’s 1623 English Dictionary).
A Table Alphabeticall did, however, have one significant shortcoming that was shared by these other early seventeenth-century English dictionaries. It was written by one person, and though Cawdrey did his research, everything presented in A Table Alphabeticall is based ultimately on Cawdrey’s thoughts and opinions about English. There are words that he chose not to enter because they were not ‘the plainest & best kind of speech’, including many foreign terms he deemed to be nothing but hot air used only to impress the hearer with the speaker’s intelligence, as well as any terms that he considered to be ‘low’ (that is, common among the speech of the lower classes). Simple terms were also omitted. The target audience for Cawdrey’s book and these other early English dictionaries was educated gentlepersons. Literacy was not as widespread in the early seventeenth century as it is now, and a comprehensive education was generally reserved for the wealthy or the well-connected. Early dictionary writers (or lexicographers, as they are properly called) had no reason to include simple words in their dictionaries, since their intended audience already knew these words. These early English dictionaries were not general surveys of English, nor were they intended to be. They were instead meant to polish the already-decent English of the educated.
This began to change in the late seventeenth and early eighteenth centuries, as literacy rates in England began to increase sharply. Scholars give several reasons for this increase: urbanisation, as many rural and less-educated people moved to London to seek work; a newfound class mobility, as the educated gentry lost influence and power to a rising – and comparatively less well-educated – merchant class; and the establishment of ‘dissenting academies’, which, unlike the traditional universities at Oxford and Cambridge, taught English as a subject and also taught other subjects in English (as opposed to the scholarly language of Latin). As literacy increased, so did the interest in (and market for) didactic books like grammars and dictionaries. Lexicographers, however, largely continued to write their dictionaries for the well-educated, though some of them grew savvy to the advantages of marketing a book well: Edward Phillips’s A New World of English Words (1658) primarily focused on hard words, as was the custom, but he called his book ‘a general dictionary’ in deference to the broader interest in dictionaries. (He also came under fire from another lexicographer, Thomas Blount, who accused him of heavily plagiarising Blount’s 1656 Glossographia.)
Elisha Coles, a schoolmaster, was the first lexicographer to make substantial moves towards creating a general dictionary. His 1676 An English Dictionary contains the usual lexical suspects – hard words and specialised vocabulary from mathematics, law, science, and theology – but also includes regional terms from all over England, as well as some ‘canting terms’, or criminal jargon of the time. It still did not include many simple words, nor did it include etymologies or quotations, but its extended scope was a hint of things to come.
The general dictionary came into its own in the early eighteenth century. John Kersey, a philologist and trained lexicographer, wrote one of the first truly general dictionaries of English, the 1702 A New English Dictionary. Kersey’s dictionary included hard words but focused primarily on common words that were in use, as his intended audience was ‘Young Persons, Tradesmen, Artificers, and the Female Sex’ – about as broad an audience as any writer could hope to get in the early eighteenth century. Nineteen years later, another philologist and lexicographer, Nathaniel Bailey, released his own general dictionary called An Universal Etymological English Dictionary, also written for a broad audience. (Despite its title, Bailey’s dictionary did not focus on etymology.) They were truly general dictionaries: Kersey was the first to include the words cat and dog in later editions of his New English Dictionary, and Bailey was the first to enter the definite article the. Bailey’s dictionary, especially, was very popular: its last edition, the thirtieth, was printed in 1802.
Samuel Johnson and the Modern English Dictionary
Despite the proliferation of dictionaries, there was nonetheless an expressed desire among writers and grammarians for The Dictionary, not just a dictionary – an authoritative reference work. Many of these same grammarians and writers were in a state of panic about the English language: it was too profligate, borrowed from other languages too easily, and allowed for the creation of words without regard to elegance of style. The Dictionary, then, would not just chronicle the language, but help set borders around it and point its users towards a more elegant and lasting manner of expression. In the 1740s, a group of London booksellers banded together to commission the creation of this work and tapped Samuel Johnson to create it.
Samuel Johnson was not the obvious choice for this project. He did not have a university degree; he was not well connected among the aristocracy; he was not a teacher, schoolmaster, or well-regarded scholar. What he was, however, was available and interested. In 1746, he agreed to write what was hoped to be the first authoritative dictionary of English.
Johnson did not, like lexicographers before him, rely solely on his own sense of the language to come up with his list of headwords (that is, the list of words that would appear as main entries in his dictionary) or his definitions. Instead, he systematically read hundreds of sources as preparatory work for his dictionary – everything from Shakespeare and Milton to legal texts, educational treatises, geology texts, and poems – and as he did, he watched for interesting words or passages that he wanted to use in his dictionary. He underlined the word to be quoted and defined, then marked the surrounding context of that word, and finally put the initial letters of that word in the margin. When Johnson was finished reading a volume, his assistants would go through it and copy all the marked passages onto individual slips of paper, which were then organised alphabetically according to the underlined word. In the end, Johnson had hundreds of thousands of these slips, which modern lexicographers call ‘citations’, and he used them as the source of the headwords, definitions, and illustrative quotations in his dictionary.
Johnson’s Dictionary of the English Language, published in 1755, was remarkable in its scope and is still considered to be a masterwork of lexicography. The definitions were extensive; unlike in earlier dictionaries, Johnson did not simply give each entry one or two definitions that he felt were most important, least understood by the educated public, or most common in use. Johnson based every definition in his dictionary on the contextual meaning that a word was given in each of the citations he had collected, which meant his dictionary often gave numerous meanings for each headword in an attempt to catalogue the totality of the word’s use. For instance, the noun light has two separate meanings in Kersey’s dictionary and just one in Bailey’s, but Johnson gives fourteen; the adjective general has two discrete meanings in Bailey and one in Kersey, but ten in Johnson.
Johnson’s dictionary was also the first to make extensive use of illustrative quotations. These quotations help orient a word’s definition, which is an abstracted statement of meaning without context, within the word’s actual use in print, which is its native habitat. These quotations also inadvertently defended Johnson against any detractors who complained that his dictionary included terms considered inelegant or low. In the preface to his dictionary, Johnson writes, ‘Some of the examples have been taken from writers who were never mentioned as masters of elegance or models of stile; but words must be sought where they are used’. The Dictionary of the English Language immediately established the basic defining method and set the scholarly standard for all dictionaries that followed.
By the nineteenth century, dictionaries were a growth industry, not just in Britain, but in America as well. Widespread literacy and general education created a booming market for dictionaries of all kinds, and by the mid-1800s, dictionary publishers moved away from a single-author model and began to recruit and retain editorial staffs to keep up with the popular demand for more extensive and comprehensive dictionaries. They also established reading programmes to make sure that the raw material they were collecting for their dictionaries covered as much thematic, geographical, and sometimes chronological territory as possible. It was during this time that the modern template for how a word gets into a dictionary was solidified.
The Nuts and Bolts of the Defining Process
Entering a word into an English dictionary generally consists of two processes: the collection of written evidence, and the analysis of that evidence. We’ll look at each process separately.
Gathering Evidence of Use
There are several ways to gather the written evidence of a word’s use. Until the late twentieth century, the primary method was a process based on the preparatory work Samuel Johnson did for his dictionary. This process is often called ‘reading and marking’.
Each dictionary company compiles a list of written, edited prose sources that they have their editors or trained readers go through in order to find new words or new uses of existing words. These lists can be as targeted or as comprehensive as necessary, according to the type of dictionary being written. For a general English dictionary, lexicographers try to formally read as much edited, published work as possible: books from all genres, magazines, newspapers, speciality trade journals, monographs, pamphlets, and so on. For a more specialised dictionary, the reading list may focus on one particular type of source (such as medical texts) or one particular era (like eighteenth-century science works). The list is then assigned to readers or editors, and they will read through the sources, looking for new words, new uses of old words, or sometimes just words that catch their eye. The word is then underlined and the context around that word bracketed, and then the page is somehow marked for the assistants who will go through each source and copy each bracketed citation into a database (or, formerly, on 3x5 inch index cards, which were then filed alphabetically). Just as it was with Johnson’s dictionary, these citations comprise the raw materials used to create a dictionary.
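The filing step described above can be pictured with a short sketch. It is only an illustration, not any publisher's actual system: the Citation record, its field names, the file_citations function, and the sample slips are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    headword: str  # the underlined word
    context: str   # the bracketed passage around it
    source: str    # where the reader found it (invented here)


def file_citations(citations):
    """Group citations by headword and return the groups in alphabetical order."""
    drawers = defaultdict(list)
    for citation in citations:
        drawers[citation.headword.lower()].append(citation)
    return dict(sorted(drawers.items()))


# Invented sample slips, for illustration only.
slips = [
    Citation("hardball", "the senator decided to play hardball", "hypothetical newspaper"),
    Citation("hand", "she gave him a hand with the boxes", "hypothetical novel"),
    Citation("hardball", "a hardball negotiating style", "hypothetical magazine"),
]
filed = file_citations(slips)
```

Here filed maps each headword to the slips collected for it, in the same alphabetical order in which the index cards would once have been filed.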
In addition to formal reading and marking, lexicographers often read and mark additional sources that they find on their own: the marketing copy on TV dinner boxes, menus, phone books, playbills, catalogues. If it has print on it and is widely distributed, there’s a very good chance that a lexicographer has read it for citations. Some dictionary publishers also solicit or accept citations from the general public. The goal is to collect the biggest possible cross-section of the language. Lexicographers cannot hope to record everything in print – they are lucky if they can record even a measurable fraction of what makes it into print. But the goal is to at least have a good representative sampling of the language to draw upon when writing dictionaries.
This goal extends to the geographic reach and the types of books, magazines, trade journals, and informal materials read as well. Major regional newspapers from around the English-speaking world are often included in a comprehensive reading programme and marked for regional variations. A balanced reading programme is not snobbishly academic: while technical fields like medicine and computer science do add to the language, lexicographers recognise that non-academic fields such as cooking and pop culture give us just as much – if not more – new language as academia. A representative cross-section of English includes everything from legal texts to romance novels, from Today’s Chemist at Work to Thrasher Magazine, from California to Australia and back again.
The Internet and the Lexicographer
This process has changed slightly with the rise of the Internet, of course. Just as most people now get their news online, so, too, does the dictionary: many lexicographers now comb through news sites, popular blogs, and well-known public social media feeds as they look for new words.
The Internet has been both a blessing and a curse as the lexicographer gathers evidence of English in use. On the one hand, it has given the lexicographer access to vast amounts of writing that they did not have easy access to before. Books that have been out of print for decades now show up on Gutenberg.org or Google Books; magazines now offer full archives on their sites. Lexicographers can also now collect more informal writing from personal blogs and public social media feeds, though most lexicographers tend to track well-known blogs and feeds so as not to run foul of data-collection laws around the world. They can also read through published TV and radio transcripts now, which means that they can collect more types of English than they previously could – academic English, spoken English, informal and formal Englishes. On the other hand, the overwhelming amount of information now available online is a mountain the lexicographer will never scale – particularly as market realities mean that fewer and fewer lexicographers are available and trained to read and mark.
Fortunately, one of the blessings of the Internet is the online corpus. A corpus is a curated collection of searchable full-text sources, and there are easily dozens of corpora available for lexicographical use. Some corpora focus on one particular type of source (transcripts from the Old Bailey, for instance, or American soap-opera scripts) or on a particular period of time (such as American English print from 1750–1850), but they are invaluable sources which help add to the lexicographer’s citation database without requiring more time-consuming reading and marking. They are also helpful during the defining process itself.
Sorting the Evidence Grammatically and Semantically
The defining process is when the words collected during reading and marking are analysed and evaluated for entry. The assignment of words for review varies depending on the type of defining that a lexicographer is doing. A new edition or a complete revision of an existing dictionary is generally broken up into bite-sized alphabetical batches of defining that, thanks to the conventions established during the era of print dictionaries, correspond to one column’s worth of words – hand to hardball, for instance. Lexicographers move through the alphabet one batch at a time. If a lexicographer has special training in a particular subject or field where one needs technical knowledge in order to define terms related to that subject or field, they may do something called ‘group defining’, where they are assigned only the words from that subject area, sometimes for the entire alphabet. Other times, lexicographers are assigned words based on other factors, like the set of words that are most often looked up on a dictionary’s website. When a lexicographer is given a batch of defining, they are given all the citational evidence for the words in that batch as well.
All entries are based on the collected evidence for a word, so the first step in the process of defining is to read and analyse that collected evidence. There are several kinds of analysis that lexicographers must do while defining. The first is to determine whether the word under consideration is actually a discrete lexical item that belongs in a dictionary.
‘Discrete’ is the key here. English combines words into compounds very easily: dog food, dog days, dog-tired. Some of these compounds are what lexicographers call ‘self-explanatory’; that is, the meaning of the compound is already covered by the existing meanings of each constituent word in that compound. Of the three dog compounds above, only dog food is self-explanatory: it is food for a dog. Dog days does not refer to days for a dog, but to the hot days in the summer, and so it requires entry. Dog-tired does not literally mean ‘tired as a dog’, but is a word derived from a metaphor. Dog-tired actually means ‘very tired’, and so it requires entry.
Occasionally, a citation like ‘She’s the Ginger Rogers of coding’ will present itself to the lexicographer. Though a reader marked Ginger Rogers as a term for possible entry, this is not quite a generic lexical use. The writer is trying to evoke something by comparing the subject to Ginger Rogers – perhaps that the coder is deft, or writes elegant computer code – but the phrase Ginger Rogers does not itself mean ‘deft’ or ‘elegant’. There are similar examples where a marked word may not be eligible for entry simply because it’s not considered a discrete lexical unit with generic use.
The next type of sorting that the lexicographer does is syntactical: the lexicographer must analyse the context given in the citation to determine what part of speech the target word is. Definitions are written to match the word’s part of speech: a noun definition begins with a noun or noun phrase (such as ‘a state or condition of … ’); a verb definition usually begins with an infinitive verb (‘to VERB’); adjective definitions begin with adjectival phrases (‘of or pertaining to … ’); and so on. A lexicographer must know what part of speech a word is in order to define it properly. While this is usually not difficult, there are many words which do not sit comfortably in one part of speech. The word apple in ‘I like apple pie’ is functioning like an adjective in that sentence, but it is actually a noun acting like an adjective – it does not exhibit any of the characteristics of what grammarians call a ‘true’ adjective (use after the verb ‘seem’; ability to be modified by the word ‘very’; ability to be compared as more or less of itself; ability to be gradable into comparative ‘-er’ and superlative ‘-est’ forms). Many dictionary publishers make use of in-house databases where all the words in that database are programmatically tagged with part-of-speech information, but even the best tagging programme is not 100 per cent correct.
Once the lexicographer has determined which part of speech the target word in that citation is, they must determine if the word’s meaning as given in that citation is already entered in the dictionary they are working on. This is based on a contextual reading of the citation: a word’s meaning is only recorded by the lexicographer, not established by the lexicographer. It is not unusual for some amount of the collected citations for a word to be for meanings that are already entered into the dictionary. After all, no reader knows all the meanings of every word entered into a dictionary, and the professional lexicographer is no different in this regard. In those cases, the lexicographer marks the citation as already entered, either physically on the citation slip itself or using a marking or sorting programme, so that the citation is not pulled and re-evaluated the next time that the same dictionary is revised. This is not the frustration one might think it is, however. If nothing else, it is evidence to the lexicographer that an older meaning is still in use.
Most often, however, a citation will give evidence for a meaning that is not currently entered in that dictionary. In that case, the citation is set aside or marked as ‘new’ and grouped with other citations that exhibit the same meaning. In the days when citational evidence was on paper, this resulted in a lexicographer’s desk being covered with piles of index cards; now, marking and sorting software turns those desk-bound piles into electronic ones. Once all the collected citations have been sorted, any new meanings uncovered while analysing the citations are evaluated for entry.
The Essential Criteria for Entry
Generally speaking, a new word or a new meaning must meet three basic criteria for entry into a dictionary. The first, and easiest, criterion is that the word in question must have an easily identifiable meaning. Fortunately, almost all words that make it into print do. Even words that many people regard as ‘filler words’ have some sort of lexical value that might be worth entering into a dictionary: um, when seen in print, is often used to mean that the speaker is hesitating for some reason, while ugh can communicate disgust or irritation. There are, very occasionally, words that make it into print as examples of a particular type of word (mostly long words), and they do not have a lexical meaning in context. The best-known example of a commonly known word that does not have any lexically significant print use is the word antidisestablishmentarianism, which is often cited in print as an example of ‘the longest word in English’ or ‘the longest word in the dictionary’, though it is neither of those things. Even in the previous sentence, the word antidisestablishmentarianism is used as an example of a long word – but its lexical meaning is not ‘a long word’. It has no lexical meaning in that sentence. These cases are rare, but they do occur.
The second criterion a word must meet before it can be entered into a dictionary is widespread use in printed and, ideally, edited prose. ‘Widespread’ here is not just meant to cover geography, but register (that is, the various types of language that a person uses in different social contexts). The ideal candidate for entry that meets this criterion is one which is used all over the country, or in several English-speaking countries, in formal contexts like academic writing, everyday journalistic writing, and informal published writing like lifestyle blog posts or entertainment weeklies.
The third criterion for entry is sustained written use. General English dictionaries are meant to cover words which have ‘settled’ into the language, and this is not always as straightforward a process as one thinks. Most words are coined in speech, which means they are invisible to the lexicographers who track written use. Sometimes a word is used very sparingly in print for a very long time, and when it appears in print, it is usually glossed (that is, explained in running text parenthetically, like this). To a lexicographer, this is evidence that the word has not quite settled into the language. Additionally, words sometimes shift between meanings when they make the transition from speech into writing as new readers and writers take the word up and try it on, and a lexicographer cannot accurately describe the word’s use because the meaning is in flux. Other times, a word becomes very popular very quickly, but then falls out of common use just as rapidly. Lexicographers need to see a word consistently in print for a certain period of time before it passes the test of ephemerality.
When evaluating words for entry, corpora are often consulted to confirm the lexicographer’s read of a word and to alert the lexicographer to any uses that they might have missed. Corpora are invaluable tools for finding regionalisms that the lexicographer may not be aware of, helping a lexicographer determine the relative frequency of a particular meaning, and even uncovering more uses of an under-marked word.
These three criteria are general guidelines for entry, not hard-and-fast rules. The larger the dictionary is, the more likely it is that the lexicographer will enter words that have a narrower scope of use (either geographically, chronologically, or technically). Dictionaries that focus on one subject area, such as medicine, will obviously not require their medical terms be used in the grocery-store tabloids before they are entered. And dictionaries for English-language learners will enter fewer technical and scientific words and many more idioms and phrases than a dictionary for native English speakers will. Even within one dictionary, the lexicographer must weigh each word carefully with regard to the criteria for entry. Some words (particularly scientific terms) may not have decades of general and sustained use, but the lexicographer nonetheless may decide to enter them because of the prevalence of the things those words describe. Names for newly named chemical elements are good examples of this type of situation, as are the names of viruses like Ebola or Zika. At other times, a word may have a significant amount of use in one type of source – food writing, for instance – and a comparatively small amount of use outside of food writing, and the lexicographer will still draft an entry for the term based on the volume of use within that field. And there are also instances when a lexicographer decides that a discourse marker like ‘mm-hmm’ does have lexical uses after all and should be entered into a dictionary, hundreds of years after its first appearance in print.
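The second and third criteria can be pictured as a rough check over a word's collected citations. This is a sketch only: the Citation fields, the meets_guidelines function, and every threshold are invented for illustration, and excluding glossed citations is an assumption made here, not a stated rule. The first criterion, an identifiable meaning, is left to the lexicographer's judgement, and, as the passage above stresses, real decisions weigh these factors rather than applying fixed cut-offs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    year: int
    region: str
    register: str  # e.g. "academic", "journalistic", "informal"
    glossed: bool  # True if the writer explained the word in running text


def meets_guidelines(citations, min_regions=2, min_registers=2, min_years=5):
    """Rough check of widespread and sustained use; all thresholds are invented."""
    # Assumption: glossed uses suggest the word has not yet settled, so skip them.
    unglossed = [c for c in citations if not c.glossed]
    if not unglossed:
        return False
    regions = {c.region for c in unglossed}
    registers = {c.register for c in unglossed}
    span = max(c.year for c in unglossed) - min(c.year for c in unglossed)
    return (len(regions) >= min_regions
            and len(registers) >= min_registers
            and span >= min_years)


# Invented evidence for a hypothetical candidate word.
evidence = [
    Citation(2009, "US", "informal", glossed=True),
    Citation(2012, "US", "journalistic", glossed=False),
    Citation(2015, "UK", "informal", glossed=False),
    Citation(2019, "Australia", "academic", glossed=False),
]
ready = meets_guidelines(evidence)
```

With these invented numbers, the candidate shows unglossed use across three regions, three registers, and seven years, so ready is True; a dictionary with a narrower or wider remit would set its thresholds differently, or not use fixed thresholds at all.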
Objectivity and Subjectivity
It may be apparent at this point that petitioning a dictionary publisher to enter a pet coinage is not an effective way to get that word entered into a dictionary. Every part of a dictionary entry, from the way the headword is hyphenated to the definition itself, is based on accumulated written use, and while lexicographers value the interaction that petitions can sometimes give rise to, no one person’s thoughts or feelings about a word will – or should – affect how that word is defined in a dictionary. In that sense, the dictionary reflects the democratic nature of English: all speakers collectively take part in shaping the language.
This is also true of the lexicographer’s own thoughts and feelings about a particular word. Unlike the early English lexicographers, modern lexicographers must set aside their own linguistic prejudices about what makes a word ‘good’ or worthy of inclusion and simply record the language as it is used. In doing so, the lexicographer discovers that much of what they assumed about English is wrong: that peruse has been used for hundreds of years to mean both ‘to skim something’ and ‘to study something in detail’; that fewer people adhere to the less and fewer distinction than claim to; that ‘non-words’ like irregardless do make it into print with regularity; and that, in spite of these supposed errors, English is nonetheless thriving.
When the dictionary-publishing house of Chambers closed its lexicographic department in 2009, a sympathetic article by Allan Brown in the London Times lamented the passing of a venerable institution. For Brown, lexicographers are by nature ‘boffinish’ and ‘pedantic’ and, with the demise of Chambers, the world was losing a valued team of ‘white-haired, cardiganed index-carded old duffers’ whose responsibility it was to keep the language safe. Underlying Brown’s article is a set of popular misconceptions about the process and purpose of lexicography, with which most people in the dictionary community will be depressingly familiar. The telltale word here is ‘index-carded’, reflecting an era when lexicographers – in the tradition of James Murray – approached the task of entry-writing by rifling through citation slips. For Brown (and probably most of the general public), fifty-odd years of technological innovation might never have happened. The reality is that computers have had an essential role in lexicography since as long ago as the 1960s.
In this chapter, we review the transformative effects of technology on dictionary-making in four main areas. Firstly, we look at the dictionary as a document consisting of continuous plain text, which can be organised as a database. Secondly, we review changes in the evidence base for dictionaries, which has moved from the aforementioned index cards (recording manually gathered citations) to computerised corpora. Thirdly, we look at how, as corpora grew, the software tools for interrogating them became more sophisticated, thanks to the application of techniques developed in the natural language processing (NLP) community. Finally, we discuss the migration of dictionaries from print to digital media, and the implications of this change for both publishers and consumers.
Early Days of Computers and Lexicography
The first application of computer technology to dictionary-making came during the project which produced the Random House Dictionary of the English Language (1966). Managing Editor Laurence Urdang recognised that the text of a dictionary entry consisted of a finite set of recurrent and clearly delineated components (headword, part-of-speech label, pronunciation, definition, and so on) which were well adapted to configuring as a computer database. This was a huge advance, significantly enhancing the value of a text which could now be stored, searched, and output in new ways. In an early indication of the benefits of technology, it also brought efficiency savings, enabling a large project to be completed in a relatively short time. From our twenty-first-century perspective, Urdang’s insight seems self-evident, but in the early 1960s this was a brilliantly far-sighted innovation. Indeed, Urdang was so far ahead of his time that the computerisation of the Random House operation stopped short at the typesetting stage. The ‘production department failed to find a typesetter able to undertake computer-driven typesetting from customer-generated tape … Astonishingly, therefore, the whole dictionary was dumped onto paper and re-keyboarded at the printing house, adding a year to the schedule’ (Hanks 2008, 468). Nevertheless, within a couple of decades of Urdang’s groundbreaking work, the approach he pioneered had become more or less routine.
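Urdang's observation that an entry is a finite set of recurrent components can be illustrated with a modern sketch. It makes no claim about how the Random House database was actually built: the Entry record, its field names, and the sample definitions and respelled pronunciations are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    headword: str
    part_of_speech: str
    pronunciation: str
    definitions: list = field(default_factory=list)


# Invented sample entries; the wording is illustrative, not quoted from any dictionary.
entries = [
    Entry("comet", "noun", "KOM-it", ["a celestial body with a luminous tail"]),
    Entry("abhor", "verb", "ab-HOR", ["to regard with horror or loathing"]),
    Entry("meteor", "noun", "MEE-tee-or", ["a streak of light from a body entering the atmosphere"]),
]

# Once entries are structured records, they can be searched by any component.
nouns = sorted(e.headword for e in entries if e.part_of_speech == "noun")
```

Pulling out every noun is a trivial query here, but it is the kind of operation that is impractical on continuous plain text and routine once each component has its own field.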
The Corpus Revolution
Around the same time, another of the great visionaries of computerised lexicography, John Sinclair, was taking his first steps in corpus development and corpus analysis. With his colleagues Susan Jones and Robert Daley, Sinclair led the Office for Scientific and Technical Information Project, known as the ‘OSTI Project’, in the first half of the 1960s (Daley et al. 2004). The project aimed to provide an empirical description of collocation in English, and at the heart of the research was a corpus of 135,000 words of transcribed conversation stored on magnetic tape. In the report on this project, Sinclair was building on ideas developed by the British linguist J. R. Firth, who believed that the complete meaning of a word is always contextual, and no study of meaning apart from a complete context can be taken seriously. Sinclair’s focus was on recurrent patterns in text (especially collocation) and their relationship with meaning, and he recognised at an early stage that very large corpora would be needed in order to investigate these phenomena on the scale required for lexicography. Electronic corpora of English – notably the Brown Corpus and the Lancaster-Oslo/Bergen (‘LOB’) Corpus – were beginning to be used in the study of grammar. But the well-known Zipfian distribution of vocabulary in a language meant that a corpus like Brown – at one million words – was too small to support an empirical description of the lexicon of English.
Another decade or so would pass before Sinclair returned to serious corpus study, and this time he planned a corpus an order of magnitude larger than Brown and LOB. At the beginning of the 1980s, Sinclair inaugurated a joint project with the publisher Collins, the outcome of which was the Collins COBUILD English Language Dictionary (1987), the first English dictionary compiled through the systematic exploration of a corpus.
The COBUILD project broke new ground in a number of areas, and technology played a major part at every stage. As dictionary text was compiled, it was initially written (by hand) onto a series of paper slips which ‘were specially designed to hold the information in a format suitable for computer input to the dictionary database’ (Krishnamurthy 1987, 79). This data was then keyed into the computer, and as the database grew, ‘facilities for interrogating [it] using almost any item (e.g. a syntax pattern, a particular lexical item entered as a synonym, etc.) were made available … and these facilities were refined and extended as the need arose’ (ibid.). By this time, it was no longer unusual for dictionary text to be stored as a computer database. For example, the first edition of the Longman Dictionary of Contemporary English (1978) was compiled in a database, and included a number of added-value features (including an elaborate system of semantic coding) which – though never visible to the dictionary’s end users – proved an invaluable resource for people working in the field of NLP. But the COBUILD project saw greater interaction than ever before between lexicographers and a database, and this contributed to the internal consistency of the eventual dictionary.
What really set COBUILD apart, however, was its systematic use of corpus data as the evidence base for the dictionary. In the later stages of the compilation process, the Birmingham Collection of English Texts (as the corpus was known) ran to twenty million words, but for much of the project a corpus of around 7.3 million words was used. By today’s standards this looks small, but collecting a corpus of this size was a laborious task in the early 1980s. Many of the source texts for the corpus did not exist in digital form, and the conversion process depended on an early version of the optical scanner – which, like most emerging technologies, was expensive, slow, and error-prone. The raw data for the corpus had to be processed by the University of Birmingham’s mainframe computer in order to produce concordances, testing the available technology to its limits. The lexicographers themselves, it should be stressed, rarely had direct access to computers. Concordances for the words they were working on were printed off from microfiches, and these hard copy versions formed the basis for every linguistic fact recorded in the dictionary. The dictionary entries themselves, as noted above, were handwritten onto forms before being keyed by a separate team of computer officers. The most revolutionary aspect of the COBUILD project was its corpus-driven approach to describing language (which ushered in the new discipline of corpus linguistics), while the technological innovations mostly built on earlier work. Notwithstanding these limitations, COBUILD represented a significant advance in the application of technology to the creation and management of dictionary text. (For an excellent introduction to the COBUILD project, see Moon 2009.)
The decade or so after the publication of the COBUILD dictionary coincided with rapid advances in computer technology. Increased processing power and data storage went hand in hand with dramatically lower costs, and, as personal computers became more widely available, working on computer (though not yet online) became the norm for lexicographers on most dictionary projects. Corpora, too, became progressively larger, and the business of corpus creation – though still a far from trivial task – was no longer quite the heroic undertaking it had been in earlier years. During this period, corpus-based lexicography became standard practice among British dictionary publishers.
The Role of Natural Language Processing
The concordances used in the COBUILD project differed from what we are familiar with today in two important respects. First, they came in the form of static printed pages, with concordances sorted to the right: what you saw was what you got, and options such as sampling or left-sorting were simply not available. Second, there was little in the way of linguistic annotation. The COBUILD corpus (at least for the duration of the original project) was neither part-of-speech-tagged (i.e. each word token is assigned a part-of-speech category) nor lemmatised (i.e. each word is assigned a corresponding basic form or ‘lemma’). Consequently, a lexicographer writing the dictionary entry for sound would be issued with separate concordances for each word form (sound, sounds, sounding, sounded), and would then have the job of disentangling the various adjective, verb, and noun uses; the concordance for the form sound, for example, would include instances of all three word classes.
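The difference between a fixed, right-sorted printout and an interactive concordance can be sketched in a few lines. This is a toy illustration over an invented token list, not a reconstruction of the COBUILD software; the function names and the context width are assumptions.

```python
def concordance(tokens, node, width=3):
    """Return (left context, node, right context) for every occurrence of `node`."""
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == node:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append((left, token, right))
    return lines


def sort_right(lines):
    """Order lines by the words following the node (the only view on a printout)."""
    return sorted(lines, key=lambda line: [w.lower() for w in line[2]])


def sort_left(lines):
    """Order lines by the words preceding the node, nearest word first."""
    return sorted(lines, key=lambda line: [w.lower() for w in reversed(line[0])])


# Invented sample text; note that no word form other than 'sound' is retrieved.
text = "the sound of rain is a sound sleep aid but sound advice is rare".split()
lines = concordance(text, "sound")
```

Because the corpus here is neither tagged nor lemmatised, the noun and adjective uses of sound arrive mixed together, and forms such as sounds or sounded would need separate concordances, exactly as described above.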
This was an obvious bottleneck in the corpus analysis, but manual part-of-speech (or lemma) annotation has always been a time-consuming process which is not feasible for reasonably large corpora. To annotate a corpus of ‘only’ 100,000 words would add months to the lexicographic process, and would first require a reliable annotation methodology to be established. Having the annotation, however, allows the corpus query system to handle all word forms of the same lemma together, and spares the lexicographer a rather cumbersome synthesis of corpus results. To gain these benefits without the cost of manual annotation, lexicographers soon recognised the value of NLP systems which automate these tasks: specifically, part-of-speech taggers and lemmatisers. Fortunately, these were of primary interest in the NLP community too, as the first step of natural language analysis (in terms of linguistic levels), and a great deal of effort was (and still is) put into the development of high-accuracy tools for these tasks.
Technically speaking, to perform part-of-speech tagging means assigning a PoS tag – a string of characters encoding the morphological information about the word – to each word. In the case of English, a language with rudimentary morphology, these tags were usually just atomic labels describing the part of speech, possibly with a more fine-grained distinction for some classes such as verbs (e.g. for their tense, or for auxiliary verbs).
Every PoS tagger makes errors, so the information is never perfect; the accuracy of the first automatic taggers was around 70 per cent (which means three words in ten had a wrong tag). Today’s tools achieve around 98 per cent accuracy on English texts. This still cannot be seen as ‘problem solved’, because 98 per cent accuracy at word level usually means that only some 50 per cent of sentences are tagged completely correctly (so follow-up systems that operate at sentence level still suffer from error propagation). Even so, lexicography has greatly benefited from automatically PoS-annotated corpora.
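The gap between word-level and sentence-level accuracy can be made concrete with a little arithmetic. The sketch below rests on a simplifying assumption which real taggers do not satisfy exactly, namely that tagging errors are independent of one another; the function names are illustrative.

```python
import math


def sentence_accuracy(token_accuracy, sentence_length):
    """Chance that every token in a sentence is tagged correctly,
    ASSUMING errors are independent (a simplification)."""
    return token_accuracy ** sentence_length


def length_for_sentence_accuracy(token_accuracy, target):
    """Sentence length at which whole-sentence accuracy falls to `target`."""
    return math.log(target) / math.log(token_accuracy)


half_point = length_for_sentence_accuracy(0.98, 0.5)   # a little over 34 tokens
twenty_words = sentence_accuracy(0.98, 20)             # roughly two in three
```

Under this assumption, a sentence of about 34 tokens has only an even chance of being tagged without error at 98 per cent word-level accuracy. Shorter sentences fare better, but the per-word errors still compound quickly.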
Apart from making it possible to distinguish between sound-noun and sound-verb, the PoS tagging enables querying the corpus for grammatical constructions, such as sound followed by an adverb. For this purpose, a technical language for querying corpora was designed at the University of Stuttgart in the early 1990s – the Corpus Query Language (CQL). Using this language, a corpus can be queried very precisely for a wide range of constructions and word combinations.
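As a rough illustration of what such a query does, the sketch below matches a sequence of token constraints against a tiny hand-tagged corpus. In CQL itself the first query would typically be written along the lines of `[lemma="sound"] [tag="RB.*"]`, but the attribute names and the tagset (Penn Treebank-style tags are assumed here) vary from corpus to corpus. The matcher is a toy, not an implementation of any real query engine.

```python
import re

FIELDS = ("word", "lemma", "tag")

# Invented, hand-annotated sample: (word, lemma, tag) with Penn-style tags.
corpus = [
    ("It", "it", "PRP"), ("sounds", "sound", "VBZ"), ("really", "really", "RB"),
    ("odd", "odd", "JJ"), (".", ".", "."),
    ("The", "the", "DT"), ("sound", "sound", "NN"), ("slowly", "slowly", "RB"),
    ("faded", "fade", "VBD"), (".", ".", "."),
    ("Sound", "sound", "JJ"), ("advice", "advice", "NN"), ("helps", "help", "VBZ"),
    (".", ".", "."),
]


def token_matches(token, constraint):
    """A constraint maps attribute names to regular expressions (full match)."""
    return all(
        re.fullmatch(pattern, token[FIELDS.index(attr)]) is not None
        for attr, pattern in constraint.items()
    )


def find_sequences(tokens, constraints):
    """Return every run of adjacent tokens satisfying the constraints in order."""
    hits = []
    n = len(constraints)
    for start in range(len(tokens) - n + 1):
        window = tokens[start:start + n]
        if all(token_matches(t, c) for t, c in zip(window, constraints)):
            hits.append(" ".join(t[0] for t in window))
    return hits


# sound (any word class) followed by an adverb
any_sound = find_sequences(corpus, [{"lemma": "sound"}, {"tag": "RB.*"}])
# sound as a verb only, followed by an adverb
verb_sound = find_sequences(corpus, [{"lemma": "sound", "tag": "VB.*"}, {"tag": "RB.*"}])
```

The second query shows how the PoS tag separates sound-verb from sound-noun: it returns only the verbal use, whereas the first returns both.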
A separate task is lemmatisation, which involves assigning a basic form (lemma) to each word in the corpus. Lemmatisation enables searches for all occurrences of a word (e.g. sleep), regardless of the particular word form (e.g. sleeps, slept, etc.). For English, lemmatisation is rather easy (there are only a few suffixes and only a little homonymy), but for morphologically rich languages, it is a challenging task requiring use of large databases. Usually, the lemmatisation is performed after PoS tagging, so that the lemmatiser already knows that this particular word ending with -s denotes a third person singular of a verb, rather than a plural of a noun.
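The value of tagging before lemmatising can be shown with a deliberately naive sketch. The suffix rules and the exception table below are invented for illustration and would fail on many English words (boxes, running, and so on); real lemmatisers rely on much fuller rules and lexicons. Penn Treebank-style tags are assumed.

```python
# Hypothetical exception table for irregular forms, keyed by (word form, tag).
IRREGULAR = {
    ("slept", "VBD"): "sleep",
    ("slept", "VBN"): "sleep",
    ("mice", "NNS"): "mouse",
}


def lemmatise(word, tag):
    """Return the lemma of `word`, using the PoS tag to choose the right rule."""
    form = word.lower()
    if (form, tag) in IRREGULAR:
        return IRREGULAR[(form, tag)]
    # Strip -s only when the tag says it is an inflection:
    # third person singular verb (VBZ) or plural noun (NNS).
    if tag in ("VBZ", "NNS") and form.endswith("s"):
        return form[:-1]
    if tag == "VBG" and form.endswith("ing"):
        return form[:-3]
    if tag in ("VBD", "VBN") and form.endswith("ed"):
        return form[:-2]
    return form


forms = [("sleeps", "VBZ"), ("slept", "VBD"), ("sounding", "VBG"),
         ("sounded", "VBD"), ("sounds", "NNS"), ("news", "NN")]
lemmas = [lemmatise(word, tag) for word, tag in forms]
```

The final item shows why the tag matters: news ends in -s but is tagged as a singular noun, so the suffix is left alone.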
PoS tagging and lemmatisation are the basic annotations that are present in most of today’s corpora. It is possible (and automatic tools are available) to add more information, such as an output from a syntactic parser, or information about a particular word sense or word family – but such annotations are rarer because the accuracy of these tools is significantly lower and they also do not contribute that much to the querying options, except in special cases.
Mega-Corpora and Their Implications for Lexicography
The volume of corpus data available to lexicographers increased by two orders of magnitude in the fifteen years following the publication of the COBUILD dictionary. The British National Corpus (BNC), released in 1993, was created by a consortium of UK dictionary publishers (Oxford, Longman, Chambers) and academic institutions (Lancaster University, Oxford University’s Computer Services department, and the British Library), with significant funding from the British government. (Details of the design, creation, and linguistic annotation of the BNC can be found at British National Corpus, www.natcorp.ox.ac.uk/corpus/.) Its aim was to collect ‘samples of written and spoken language from a wide range of sources, designed to represent a wide cross-section of British English from the later part of the twentieth century, both spoken and written’. Annotation of the BNC, including PoS tagging and lemmatising, was carried out at UCREL, the Centre for Computer Corpus Research on Language at Lancaster University. Crucially, while the COBUILD corpus was measured in millions of words (rising to twenty million words by 1987), the BNC was the first corpus of English to reach the 100-million-word mark.
For lexicographers and corpus linguists alike, the BNC represented a major advance. Yet within a further ten years, we begin to see corpora another order of magnitude larger. The first of these was the Oxford English Corpus, a billion-word corpus assembled by Oxford University Press and made up of texts dating from the year 2000 onwards. Larger corpora followed in quick succession, and at the time of writing, the largest available corpus of English runs to over thirty billion words – though effectively, there is no longer an upper limit on corpus size for English and other major languages. What drove these changes was the arrival of the Internet, so that assembling very large corpora became a far less daunting enterprise than it had been in the 1980s and 1990s. And in another move towards working practices which we now take for granted, lexicographers began to work not only on-screen but also online – enabling large dictionary projects to be completed by geographically dispersed lexicographic teams.
For lexicographers working in English, the days of data-sparseness were well and truly over. But this presented its own challenges. Since the beginnings of corpus lexicography, the concordance had been the primary tool for language analysis. Scanning a couple of hundred instances of a word in order to identify its lexicographically relevant features requires skill and patience, but it is perfectly feasible. With the new mega-corpora, even relatively infrequent words come with thousands of concordance lines. For example, the word conceptual occurs 1,000 times in the 100-million-word BNC, which gives it a normalised frequency of ten hits per million words of text. On this basis, one would expect the COBUILD lexicographers to have had access to around seventy-five instances of this word. But users of a billion-word corpus would be confronted with perhaps 10,000 occurrences of conceptual (and most dictionary developers now use even larger corpora). And conceptual is only a mid-frequency word; for most really common English words, today’s corpora provide hundreds of thousands or even millions of examples. It is simply not practical to analyse that much data in the form of a concordance.
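The arithmetic behind these figures is a simple rescaling, sketched below with the numbers given in the text. The function names are illustrative, and the estimate assumes that a word's relative frequency stays the same from one corpus to another, which is only approximately true in practice.

```python
def per_million(hits, corpus_size):
    """Normalised frequency: occurrences per million words of text."""
    return hits * 1_000_000 / corpus_size


def expected_hits(freq_per_million, corpus_size):
    """Occurrences expected in a corpus of a given size at that frequency."""
    return freq_per_million * corpus_size / 1_000_000


bnc_rate = per_million(1_000, 100_000_000)              # 10 per million
cobuild_hits = expected_hits(bnc_rate, 7_300_000)       # 73, i.e. around seventy-five
billion_hits = expected_hits(bnc_rate, 1_000_000_000)   # 10,000
```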
Random sampling was one response to this problem (take your 10,000-line concordance for conceptual and make a random sample of 500), but not a satisfactory one. What was needed was a technology which could ‘fully exploit the benefits of very large corpora, while preserving lexicographers from an excess of information’ (Kilgarriff and Rundell 2002, 808). A solution was on the horizon.
‘Lexical Profiling’ and the First Word Sketches
As early as 1990, it was recognised that the available tools for corpus analysis were no longer adequate for use with the larger corpora which were then emerging. A groundbreaking paper co-written by a computational linguist (Ken Church) and a lexicographer (Patrick Hanks) proposed the use of a word association metric from the field of information theory as a means of identifying statistically significant patterns of co-occurrence among words in a corpus. This opened up the possibility that a mathematical formula ‘could be applied to a very large corpus of text to produce a table of associations for tens of thousands of words’ (Church and Hanks 1990, 28). One of the potential applications of this approach was ‘enhancing the productivity of lexicographers in identifying normal and conventional usage’ (ibid.) by providing a simple summary (derived from the data in a corpus) of a word’s most salient collocates.
The metric used by Church and Hanks was the so-called ‘mutual information’ measure (MI), and corpus-querying tools of the time began to incorporate lists of frequent collocates as revealed both by MI and another metric, named the T-score. Initially these had little impact on lexicographers’ working practices, because the lists tended to be noisy and random, requiring a good deal of human effort to extract genuinely useful information. But Adam Kilgarriff and David Tugwell, working at Brighton University, hit on the idea of finding (and then grouping) collocates on the basis of an inventory of grammatical relations. These relations included combinations such as verb+object collocations (like forge+alliance, bond, link, partnership, etc.), adjective+noun combinations (sound, practical, useful, etc. +advice), adverb+adjective combinations, and many others. They also experimented with different word association metrics, in order to smooth out some of the biases in the lists generated by the MI and T-score. Their research led to the development of the Word Sketch, which provides an at-a-glance summary of the most important facts about a word’s combinatory preferences.
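The two metrics can be written down compactly. The sketch below uses the formulations commonly given in the corpus linguistics literature, with f(x, y) the co-occurrence count, f(x) and f(y) the individual word counts, and N the corpus size. The counts in the example are invented to show the contrasting biases and are not drawn from any real corpus.

```python
import math


def mutual_information(f_xy, f_x, f_y, n):
    """MI: log2 of observed co-occurrence over what chance would predict."""
    return math.log2(f_xy * n / (f_x * f_y))


def t_score(f_xy, f_x, f_y, n):
    """T-score: observed minus expected co-occurrence, scaled by sqrt(observed)."""
    expected = f_x * f_y / n
    return (f_xy - expected) / math.sqrt(f_xy)


N = 1_000_000
# Invented counts: a rare pair seen 5 times, and a frequent pair seen 2,000 times.
rare = dict(f_xy=5, f_x=10, f_y=10, n=N)
frequent = dict(f_xy=2_000, f_x=20_000, f_y=50_000, n=N)

mi_rare, mi_frequent = mutual_information(**rare), mutual_information(**frequent)
t_rare, t_frequent = t_score(**rare), t_score(**frequent)
```

With these counts, MI ranks the rare pair far above the frequent one, while the T-score does the reverse. Biases of this kind help to explain why raw collocate lists needed so much human filtering.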
The base for Word Sketches is a set of queries in the CQL referred to earlier. Each of these queries describes a grammatical pattern, such as a verb and its object. If the pattern is found in the corpus, the particular words (lemmas) matching the position of the verb and the object are extracted and remembered as a collocation candidate. Then, the word pairs extracted by this procedure are counted and scored statistically. The original Word Sketches used the MI score to rank co-occurrences, but since 2006 another metric, the logDice coefficient, has been used because it is not affected by the size of the corpus, and has been found, through a great deal of trial and error, to deliver more satisfactory results.
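The procedure can be sketched as follows. The logDice formula is given in its usual published form, 14 + log2(2f(x, y) / (f(x) + f(y))), which contains no term for corpus size. Everything else is a simplified assumption: the triples are invented, and both word frequencies are counted within the grammatical relation, which may differ in detail from what production systems do.

```python
import math
from collections import Counter, defaultdict


def log_dice(f_xy, f_x, f_y):
    """logDice: no corpus-size term appears anywhere in the formula."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def word_sketch(triples, headword):
    """Group a headword's collocates by grammatical relation, best-scoring first.

    `triples` are (relation, headword, collocate) tuples, standing in for the
    matches that the grammatical-pattern queries would extract from a corpus.
    """
    pair_counts = Counter(triples)
    head_counts, collocate_counts = Counter(), Counter()
    for (relation, head, collocate), count in pair_counts.items():
        head_counts[(relation, head)] += count
        collocate_counts[(relation, collocate)] += count

    sketch = defaultdict(list)
    for (relation, head, collocate), count in pair_counts.items():
        if head == headword:
            score = log_dice(count, head_counts[(relation, head)],
                             collocate_counts[(relation, collocate)])
            sketch[relation].append((collocate, round(score, 2)))
    for relation in sketch:
        sketch[relation].sort(key=lambda item: item[1], reverse=True)
    return dict(sketch)


# Invented extraction results for illustration only.
VO = "verb+object"
triples = ([(VO, "forge", "alliance")] * 3 + [(VO, "forge", "link")] * 2
           + [(VO, "forge", "document")] + [(VO, "form", "alliance")]
           + [(VO, "sign", "document")] * 4)
sketch = word_sketch(triples, "forge")
```

Here document scores lowest for forge because it occurs more often as the object of another verb, even though forge + document is attested.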
A primitive version of the Word Sketches was used in the late 1990s, during the development of the Macmillan English Dictionary (2002; now available at macmillandictionary.com). Around 8,000 sketches, using data from the BNC and covering the most frequent nouns, verbs, and adjectives in English, were loaded onto the computers used by Macmillan lexicographers (who at this stage still worked offline). Collocate lists were organised according to grammatical relations, and for each listed collocate, ten examples from the BNC were provided. The original purpose of using Word Sketches during the compilation stage was to complement the use of concordances and support a systematic account of collocation in English. But the Word Sketches turned out to have considerable diagnostic power. By delivering a user-friendly snapshot of the most important features of a word’s behaviour, Word Sketches ‘came to be the lexicographer’s preferred starting point for analyzing a given word’ (Kilgarriff and Rundell 2002, 817).
Further developments followed, as the same technology was applied to corpora in other languages. The Word Sketch function was harnessed to the fastest concordancer then available, resulting in the first version of Sketch Engine (Kilgarriff et al. 2014) – a software package which gradually acquired more functions (and many more corpora) to become the industry-standard corpus-querying system for lexicography in English.
Dictionary-Writing Systems
Following Laurence Urdang’s pioneering experiments in the 1960s, it gradually became normal for publishers to use computer databases to store and organise their dictionary content. One significant development was the digitisation of the Oxford English Dictionary (OED). The OED, published in twelve volumes in 1933, had been updated with a four-volume Supplement in 1986, and at this point the dictionary ran to over 20,000 printed pages, all typeset using old technology. During the second half of the 1980s, it was converted into a digital database (in a joint project with the University of Waterloo, in Canada) – an enormous undertaking.
At this stage, each dictionary publisher would use its own home-grown software systems, developed in-house, and sometimes linked to general-purpose text-processing programs. This model began to be disrupted with the launch, in 1992, of a generic, off-the-shelf software package for dictionary production. Gestorlex, developed by the Danish software house TEXTware, was an ambitious product, consisting of a database module with a structured environment for writing and editing dictionary entries, and a corpus-querying system which generated concordances.
Gestorlex was too far ahead of its time to survive. Lexicographers were still working offline through the 1990s, and this applied to Gestorlex users too. The corpus and editing software had to be installed on individual PCs, and batches of work would be downloaded, edited, then uploaded to a central database. On top of this, the software was engineered to work on IBM’s new (at the time) operating system OS/2 – which never became a mainstream product. Although Gestorlex was used for a time by several dictionary publishers, including Longman, it proved to be a false start. But it set an impressively high bar for future developments.
Wide Internet accessibility led, finally, to the now-usual working model, in which lexicographers investigate corpus data and compile or edit dictionary text online, with the corpora and dictionary databases typically held on remote servers. To exploit this new environment, a number of dedicated dictionary-writing systems were launched in the early 2000s, including the South African TshwaneLex system and the Dictionary Production System (DPS) produced by Paris-based IDM, which (at the time of writing) is used by most UK dictionary publishers. These sophisticated packages generally include project-management modules and systems for publishing dictionary text in either print or digital media.
Automating Lexicographic Processes
Producing a dictionary is a complex and labour-intensive business. From the outset, the application of new technology has brought efficiency savings, and by the early 2000s, some of the processes involved in dictionary-creation had been substantially automated. With the advent of corpora, the creation of headword lists became simpler and more ‘scientific’. Broadly speaking, a dictionary which is planned to have 50,000 headwords will start from the 50,000 most frequent words in the corpus, plus (say) the next 10,000 words by frequency: this provides a candidate list from which the final headword list can be created.
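The frequency-based starting point can be sketched in a few lines, assuming the corpus has already been lemmatised. The function name, the toy data, and the tiny target size are illustrative only.

```python
from collections import Counter


def candidate_headwords(lemmas, target_size, margin):
    """Return the `target_size` most frequent lemmas plus the next `margin`,
    as a candidate list from which editors select the final headwords."""
    counts = Counter(lemmas)
    return [lemma for lemma, _ in counts.most_common(target_size + margin)]


# Invented lemmatised 'corpus'; a real run would use, say, 50,000 and 10,000.
lemmas = (["the"] * 5 + ["sound"] * 4 + ["advice"] * 3
          + ["comet"] * 2 + ["praggish"])
candidates = candidate_headwords(lemmas, target_size=2, margin=1)
```

The margin matters because frequency alone is not the final arbiter: editors still need room to drop some frequent items and promote others.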
The new generation of dictionary-writing systems has not only made life easier for lexicographers, but has also contributed to accuracy and systematicity. The internal ‘syntax’ of a dictionary entry is built into the software, and this facilitates the entry-writing process. For data fields with a finite set of options – such as style labels, grammar codes, or part-of-speech markers – lexicographers typically select from a drop-down list. This ensures that the content of these fields, and the order in which they appear in the entry, are controlled by the software. Thus, ‘human error is to a large extent engineered out of the writing process’ (Rundell and Kilgarriff 2011, 260). Meanwhile, various routine tasks are handled in the background, unobtrusively and without human intervention – notably the tedious and (for humans) error-prone business of ensuring that cross-references match up correctly.
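Two of these safeguards, closed lists for certain fields and automatic cross-reference checking, can be illustrated with a minimal sketch. The field names, the permitted values, and the draft entries are all hypothetical and do not reflect the data model of any particular dictionary-writing system.

```python
# Hypothetical closed lists, standing in for the drop-down options.
ALLOWED_POS = {"noun", "verb", "adjective", "adverb"}
ALLOWED_LABELS = {"formal", "informal", "technical"}


def check_entries(entries):
    """Report values outside the closed lists and cross-references
    which point to entries that do not exist."""
    headwords = {entry["headword"] for entry in entries}
    problems = []
    for entry in entries:
        head = entry["headword"]
        if entry["pos"] not in ALLOWED_POS:
            problems.append(f"{head}: unknown part of speech '{entry['pos']}'")
        for label in entry.get("labels", []):
            if label not in ALLOWED_LABELS:
                problems.append(f"{head}: unknown label '{label}'")
        for target in entry.get("see_also", []):
            if target not in headwords:
                problems.append(f"{head}: cross-reference to missing entry '{target}'")
    return problems


# Invented draft with one mistyped label and one dangling cross-reference.
draft = [
    {"headword": "comet", "pos": "noun", "see_also": ["meteor"]},
    {"headword": "meteor", "pos": "noun", "labels": ["technical"], "see_also": ["asteroid"]},
    {"headword": "sound", "pos": "adjectiv", "labels": ["formal"]},
]
problems = check_entries(draft)
```

In a real system the first kind of error cannot arise at all, since the value is picked from a list rather than typed; the check is shown here only to make the constraint visible.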
Developments like these have relieved lexicographers of much of the ‘drudgery’ which Samuel Johnson famously saw as the lexicographer’s lot: the dull, intellectually undemanding jobs which nevertheless have to be done, and have to be done well. A more interesting challenge is to see how far one can automate the more complex parts of the editorial process. In response to a request from UK publisher Macmillan, Adam Kilgarriff and his colleagues created an algorithm, in 2008, which would facilitate the procedure for finding appropriate example sentences in a corpus (Kilgarriff et al. 2008). Examples are a feature of most dictionaries, especially pedagogical dictionaries, but the usual working practice – whereby lexicographers scan dozens or hundreds of concordance lines in order to identify suitable instances – is time-consuming and therefore expensive. The GDEX (‘good dictionary examples’) tool works from a set of criteria for what constitutes a ‘good’ example, such as sentence length, the presence or absence of rare words and proper names, and the number of pronouns a sentence contains. Those corpus sentences which most closely match the criteria are then ‘promoted’ to the top of the concordance, providing the lexicographer with a list of perhaps a dozen likely cases. When the system works optimally, the task of picking examples for the dictionary becomes much simpler and can be performed far more quickly.
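The general idea of scoring sentences against such criteria can be sketched as follows. This is not the GDEX algorithm itself: the penalties, the length limits, the pronoun list, and the crude test for proper names are all invented for illustration, and only the four criteria are taken from the description above.

```python
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them", "this", "that"}


def example_score(sentence, common_words, min_len=5, max_len=15):
    """Score a candidate example; higher is better. All weights are invented."""
    tokens = sentence.rstrip(".!?").split()
    score = 1.0
    if not min_len <= len(tokens) <= max_len:
        score -= 0.3                                    # too short or too long
    # Crude proxy for proper names: capitalised words after the first token.
    score -= 0.2 * sum(1 for t in tokens[1:] if t[0].isupper())
    score -= 0.1 * sum(1 for t in tokens if t.lower() in PRONOUNS)
    score -= 0.1 * sum(1 for t in tokens if t.lower() not in common_words)
    return score


def rank_examples(sentences, common_words, top=12):
    """Promote the best-scoring sentences to the top of the list."""
    ranked = sorted(sentences, key=lambda s: example_score(s, common_words),
                    reverse=True)
    return ranked[:top]


# Invented sentences and an invented list of 'common' words.
common = {"the", "doctor", "gave", "her", "patient", "some", "sound", "advice",
          "it", "sounded", "odd", "from", "rarely", "reaches"}
sentences = [
    "Sound advice from Zebulon rarely reaches Ptolemy.",
    "It sounded odd.",
    "The doctor gave her patient some sound advice.",
]
ranked = rank_examples(sentences, common)
```

The sentence about the doctor comes first because it is of moderate length and contains only common words, while the one with two unfamiliar names drops to the bottom.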
Software tools such as Word Sketches and GDEX point the way to a new way of working. Traditionally, a lexicographer would carefully examine the corpus evidence in order to identify a word’s most significant features, such as its syntactic and collocational preferences, and to find good example sentences. With these new technologies, a different working model is emerging, where the computer presents the lexicographer with intelligently selected information, which can then be finalised by human editors – thus largely bypassing the laborious process of ‘manual’ corpus analysis.
Such automatically generated dictionary drafts exploit NLP technologies to derive headword lists, find corpus examples, induce word senses (by clustering collocates or by other means), and suggest collocation candidates, dictionary labels, or even definitions (which may take the form of explanations based on collocations, synonyms, or even multimedia). The focus then needs to be put on devising a solid methodology and tools for the post-editing workflow. One of the most recent attempts in this direction is Lexonomy (Měchura 2017), a web-based dictionary-writing system designed for the post-editing of dictionary drafts. Its key features are functions which ease the ‘fixing’ of the drafts, and functions which preserve the data’s connection to the underlying corpus evidence, so that lexicographers can quickly accept automatically generated content, access corpus data in case of doubt, and revise the drafted entries accordingly.
From Print to Digital
Most of our discussion has focused on the creation, editing, and management of dictionary text. But probably the most significant change since the year 2000 has been in the way that dictionaries are delivered to their users. The complete OED was released as a CD-ROM in 1989, and other publishers quickly followed Oxford’s lead. But at this stage, changes were largely cosmetic; these were not much more than ‘books in digital form’. The CD-ROM is in any case an obsolescent technology, and dictionaries in this form were soon superseded, as most publishers migrated their products to online platforms. (Many online dictionaries continue to exist in print editions, too, but this situation seems unlikely to continue for long.)
From a historical perspective, this is a very recent change, and it will be some time before its full implications become clear. But the concept ‘dictionary’ is in the process of being redefined. Most online dictionaries already include multimedia content, as publishers experiment with the opportunities which the new media offer. Features such as audio pronunciations, animations, and video clips are now common, and one can observe a general trend towards visual (rather than text-based) ways of presenting lexical information. Supplementary materials (blogs, Q&A features, quizzes, and the like) are becoming a normal part of a dictionary website, and the addition of ‘user-generated content’ (still in its early stages) is further broadening the scope of what users expect in a dictionary. Another obvious consequence is that dictionaries can now stay fully up to date. When dictionaries were printed books, they would typically be updated with new editions every four or five years. But in online media, new material (such as novel words or meanings) can be added at any time – and should be, as there is an expectation among users that they will find such data in their online dictionary.
With the move online, lexicography is entering a new era, and the consequences of this change are still being worked out. For example, the space constraints of a printed dictionary meant that the headword list had to be carefully selected, and publishers would have strict inclusion criteria. With no such limitations in digital media, how do we now decide which words should be included (or excluded)? Issues such as these are still under debate in the lexicographic community.
Conclusion
The automation of lexicographic tasks continues. Emerging technologies which will contribute to this include systems for detecting novel vocabulary (new words and new meanings of existing words), and tools for automatically applying subject-field labels (for example, where a word is predominantly used in medical or legal discourse). Developments like these reflect the interaction between lexicography and NLP, which has been so fruitful in streamlining many of the processes involved in creating a dictionary. There are still many aspects of the lexicographer’s job which require a great deal of skilled human effort, notably word sense disambiguation and definition-writing. But even these are unlikely to be intractable in the long term. In this chapter, we have shown how – over the last half-century or so – the application of technology, and collaborations with researchers in the NLP community, have transformed both lexicography and the dictionary. Further exciting developments are in the pipeline.
In modern lexicography, a core distinction has been made between diachronic and synchronic dictionaries. English dictionaries are at the centre of this debate, since the Oxford English Dictionary (OED), a landmark scholarly undertaking of the nineteenth century, is arguably the most successful exposition of the diachronic approach to dictionary-making. While many other historical language dictionaries have modelled themselves on the OED, the development of a more theoretical basis for synchronic dictionaries was largely led by English language learner dictionaries in the late twentieth century. This chapter seeks to explain the distinctions between diachronic, or historical, dictionaries and their synchronic counterparts; how the distinction arose in English lexicography; what it means for those using or writing dictionaries; and, perhaps, why it’s important. In conclusion, I will assess how meaningful the distinction continues to be today, and what changes we might expect to see in the future.
Terms and References
A diachronic dictionary is one which is concerned primarily with describing the language as it has developed and evolved through time. A synchronic dictionary, on the other hand, is concerned primarily with the language as it exists at a particular time (typically the present) without explicit reference to the past. In modern references, the term ‘historical’ is often used interchangeably with ‘diachronic’. The term ‘historical’ to refer to dictionaries on historical principles emerged before the distinction of ‘diachronic’ and ‘synchronic’ was formalised by early twentieth-century linguistics. For most practical purposes, synchronic dictionaries deal with the present-day language, although this is not necessarily the case: a dictionary only of Shakespeare’s language is also a synchronic dictionary. Diachronic dictionaries typically take the present day, or near present day, as the vantage point for their compilation and exposition.
The OED is the best-known and most celebrated diachronic dictionary in English, and is the main diachronic reference point for this chapter. There are many synchronic dictionaries, and those I’ll be referring to span both the US and the UK and include, among others, the Oxford Dictionary of English (ODE, a core text of the website OxfordDictionaries.com), the American Heritage Dictionary, and the Random House Dictionary (RHD, a core text of dictionary.com).
Structural Features of Synchronic and Diachronic Dictionaries
Differences between synchronic and diachronic dictionaries are best explored by looking at the treatment of various dictionary structures. Modern dictionaries have highly formalised structures; they follow a regular internal organisation and share broad organisational features with one another. Commentators and meta-lexicographers since the 1970s have broadly agreed on the division of dictionary structure into features of macrostructure and microstructure.
Macrostructure deals with large-scale features of the dictionary organisation outside individual entries; for example, inclusion policy and word coverage, as well as high-level treatment of derived forms, compounds, phrases, and homographs. Microstructure deals with features within an entry; for example, variant spellings, sense organisation, definitions (or translations in bilingual dictionaries), labelling, and examples of usage.
Many of these features of dictionary macro- and microstructure have a diachronic or synchronic aspect. As a result, synchronic dictionaries will tend to treat features in one way, while diachronic dictionaries will tend to treat them in another. In the next sections, I focus on individual features, highlighting the way each feature informs or underpins a diachronic or a synchronic approach. As will become clear, the two approaches are not always easily separated, not just in practice but axiomatically, such that synchronic approaches are often dependent on a diachronic perspective to some extent, and vice versa.
Macrostructure Features
Inclusion Policy and Coverage
Inclusion and coverage refer to the categories, numbers, and types of words that are included; for example, whether the dictionary includes all forms of language (slang, technical, regional, obsolete, etc.) or alternatively includes, for example, only standard terms, or only modern slang (such as the crowdsourced consumer website UrbanDictionary.com). Decisions about inclusion and coverage for synchronic dictionaries are largely driven by factors relating to the purpose and target audience. A small dictionary of only a few thousand words designed for children, for example, will try to focus on the words that children need to know and use and exclude the rest, while a general-purpose learner’s dictionary will focus on core vocabulary and avoid unusual and less-used terms.
While synchronic dictionaries are generally practical tools to help with understanding and using the language of the present day, diachronic dictionaries typically have a mission to describe and document history for its own sake. Their primary aim is to be comprehensive and inclusive rather than to fulfil a particular audience need (though they may well do that as well). In a later section, I shall look at how, in the nineteenth century, this idea of the mission to describe the whole language became important. But whatever the reasons, diachronic dictionaries such as the OED are inevitably much larger and contain more information than synchronic dictionaries. The OED includes obsolete and archaic words from around 1,000 C.E. to the present day, together with their forms, etymologies, examples, grammatical features, and meanings. Around 20 per cent of the OED is made up of obsolete entries and senses (52,000 entries and 166,000 senses in March 2018). Some obsolete entries are likely always to have been rare (such as praggish, meaning ‘cunning’, or praiere, ‘a meadow’), while others were more widely used at one time but do not exist today (such as accite, ‘to excite (feelings)’ and abroach, ‘to pierce’, both of which are found in Shakespeare’s plays).
While synchronic dictionaries focus on present-day English and typically do not include obsolete vocabulary at all, they do include archaic vocabulary. Archaic terms can be understood as those terms which are not actively used in the present day but still retain influence or resonance owing to their presence in literature, or to their deliberate use to give an old-fashioned flavour (as for example words such as knave or alack). Obsolete and archaic terms are a peculiarly diachronic feature of synchronic dictionaries, just as they are a peculiarly synchronic feature of diachronic dictionaries. In the first instance, archaic terms exist only in reference to the past, which means that, despite appearances, synchronic dictionaries include historical information; in the second, archaic terms exist only in the context of looking at them from the perspective of the present (synchronic) period. In summary, both diachronic and synchronic dictionaries deal with time and language change.
Entry Organisation and Homographs
Both synchronic and diachronic dictionaries identify words using their modern spelling. For a synchronic dictionary, this is relatively uncomplicated. For a diachronic dictionary, it is justified by the benefits of normalisation for search and retrieval, and also by the fact that, before the eighteenth century, spelling was so variable that picking one form from many would be largely arbitrary anyway. Nonetheless, normalising to present-day spelling conventions can give rise to apparently counterintuitive results: for example, the head form for an obsolete word (for example, abroach) may not actually be attested in the evidence gathered (for example, abbroche, abroche).
Decisions about the organisation of ‘homographs’ (words that have the same spelling but are otherwise distinct in meaning, grammar, or origin) can be complex. Whether guy as in ‘a man’ is treated separately from guy as in ‘a rope’ because it has a different origin is a question that results either in one single entry or in two separate ones (two nouns with the same spelling, each a separate homograph). Similarly, bear meaning the animal and bear meaning ‘to carry’ can be separated either on grounds of origin or on grounds of representing different parts of speech (a noun and a verb), or both.
For the OED’s diachronic approach, none of these questions is really a problem: homograph organisation is driven strictly by origin, and, knowing that, the dictionary editor can organise usages with the same origin together (whether the usages are similar semantically is of secondary importance). Since different parts of speech also have their own histories, each part of speech is typically also a separate homograph in the OED. Thus, the OED lists two entries for guy, and eight in total for bear. Considerations of synchronic logic or later semantic development have no bearing on high-level homograph organisation. In the treatment of capital, for example, two different etymologies and hence two different homographs are traced, which show different development from Latin via Old French, but which are both ultimately connected to Latin caput meaning ‘head’. The first, slightly older homograph gives rise to the architectural sense of capital meaning ‘the top of a column’, while the second homograph gives rise to the more usual meanings: capital meaning ‘capital city’, financial meanings of capital, and capital as in ‘capital letter’. These meanings are grouped together, even though, to a synchronic eye, they seem at least as different semantically from each other as they are from the architecture sense.
We might expect synchronic dictionaries to take a different approach to homograph division, given their focus on present-day language. Certainly, this is true of learner’s dictionaries: both the Oxford Advanced Learner’s Dictionary of Current English and the Cambridge International Dictionary of English (CIDE, at dictionary.cambridge.org) have a single homograph for capital. However, curiously, in most general-purpose dictionaries from the US and the UK, this is not the case. Both the Oxford Dictionary of English and dictionary.com (i.e. Random House Dictionary) replicate the OED structure and retain two distinct homographs. I shall return to this tendency of synchronic English dictionaries to retain etymological homograph distinctions in the section on etymologies.
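The two approaches to homograph division can be sketched as a difference in grouping key. The following is only an illustration, not a description of any dictionary's actual data: the origin labels (origin-1 and so on) are placeholder identifiers invented for the example, and the function name group_homographs is hypothetical.

```python
from collections import defaultdict

# (spelling, part of speech, origin label, gloss).
# The origin labels are opaque placeholders, not real etymologies.
USAGES = [
    ("guy", "noun", "origin-1", "a man"),
    ("guy", "noun", "origin-2", "a rope"),
    ("bear", "noun", "origin-3", "the animal"),
    ("bear", "verb", "origin-4", "to carry"),
]


def group_homographs(usages, by_origin=True):
    """Group usages into homographs.

    by_origin=True  -> one homograph per (spelling, origin, part of speech),
                       roughly the diachronic principle described above.
    by_origin=False -> one homograph per spelling, as a learner's
                       dictionary might prefer.
    """
    groups = defaultdict(list)
    for spelling, pos, origin, gloss in usages:
        key = (spelling, origin, pos) if by_origin else (spelling,)
        groups[key].append(gloss)
    return dict(groups)


diachronic_view = group_homographs(USAGES)
learner_view = group_homographs(USAGES, by_origin=False)
```

Under these assumptions the same four usages yield four homographs when grouped by origin and part of speech, and two when grouped by spelling alone; the underlying records are identical in both cases.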
Derivatives
Derivatives, or derived forms, are words derived morphologically from other words. For example, prettily is an adverb which is regularly derived from pretty, while comfortable is an adjective from the noun comfort. Derivation is by definition a diachronic feature (since it describes a temporal development of one word from another), and is one of the primary types of productive word formation in a synchronic context as well.
The treatment of derivatives in diachronic dictionaries is straightforward and consistent. All derivatives have full entry status regardless of their relative semantic distinctness or importance, because each necessarily has a separate traceable origin and history. Thus, capitally has its own entry and so does capitalise, even though the former is quite uncommon.
Within a single synchronic dictionary, derivatives may be separate entries or they may be treated as run-ons at the parent entry. This will depend on how important and distinct they are in their own right (formerly in print dictionaries, placing derivatives under their parent entry was also a common device for saving space). For example, depression is related morphologically to depress but it has many distinct modern meanings and usages in psychiatric, economic, and meteorological domains. It is not useful to most readers to place it as if it were just a simple morphological development of depress, and consequently all modern synchronic dictionaries position it as an entry in its own right. However, the same modern dictionaries treat more straightforward derivatives as undefined forms listed at the parent entry (such as listlessly and listlessness under listless at dictionary.com). This ‘nesting’ of one entry inside another is done on the assumption that the user can extrapolate the meaning and use of the derivative from its parent given a basic understanding of derivational rules.
Synchronic dictionaries are free to make pragmatic, user-focused decisions on a case-by-case basis on the treatment of derivatives, which makes sense for users but generates inconsistency in the data model. In the digital era, the need for a consistent data model is increasingly important for machine-readable use cases, so the underlying data structure may need to be different from what is presented in the (human) user interface. For diachronic dictionaries, historical principles have driven a more systematic and consistent approach.
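One way to picture the separation between data structure and user interface is to store every derivative as a full record and treat run-on placement as a display choice. This is a minimal sketch under that assumption; the Lexeme class, its field names, and render_entries are hypothetical and do not reflect any publisher's schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lexeme:
    headword: str
    parent: Optional[str] = None   # headword this form is derived from, if any
    show_as_run_on: bool = False   # presentation choice only, not a data distinction


# Every form is a first-class record, whatever the printed layout.
LEXEMES = [
    Lexeme("depress"),
    Lexeme("depression", parent="depress"),  # distinct enough for its own entry
    Lexeme("listless"),
    Lexeme("listlessly", parent="listless", show_as_run_on=True),
    Lexeme("listlessness", parent="listless", show_as_run_on=True),
]


def render_entries(lexemes):
    """Build the human-facing view: {entry headword: [run-on forms]}."""
    view = {lx.headword: [] for lx in lexemes if not lx.show_as_run_on}
    for lx in lexemes:
        if lx.show_as_run_on and lx.parent in view:
            view[lx.parent].append(lx.headword)
    return view


view = render_entries(LEXEMES)
```

Here the machine-readable layer stays uniform (five records of the same shape), while the rendered view nests listlessly and listlessness under listless and gives depression an entry of its own.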
Compounds, Phrases, and Other Multi-word Units
While the OED promotes all morphologically distinct forms (derivatives) to entry level status, the majority of compounds, phrases, and other multi-word units are treated below entry level. In the example of capital, the relatively unimportant derivatives capitally and capitalhood have their own entries, while the compound forms capital gains tax and capital punishment appear in a section following the main sense blocks for capital. This is consistent with the historical principles of the diachronic dictionary, which privileges morphological over semantic development. In synchronic dictionaries the situation is often reversed, in that compounds are more likely to be treated as separate entries on the grounds of their importance for users, but as with the treatment of derivatives, it depends on the editorial policy of the dictionary. In modern digital approaches, the data model could accommodate variable outputs depending on the use case.
Microstructure Features
Etymologies
Etymologies describe the origin of words. Etymologies are one of the essential features of diachronic dictionaries, while for synchronic dictionaries they are optional. In diachronic dictionaries, etymologies are placed at the beginning of the entry, emphasising their importance. For complex entries, OED’s etymologies may run to hundreds of words in length; they are a notable feature of recent work on the third edition, drawing on extensive new scholarship and becoming mini scholarly articles in their own right. Where they exist, etymologies in synchronic dictionaries are shorter, often truncated versions of what may be found in a historical ‘parent’, and in some cases, summaries of the OED’s sense development; typically, they are placed at the end of the entry. Few learner’s dictionaries or smaller dictionaries include etymologies.
It might seem surprising that etymologies exist in synchronic dictionaries at all, focused as they are on the current language. Part of the reason for their inclusion is tradition: the early synchronic dictionaries grew out of diachronic antecedents and kept many of the hallmarks of them. But it’s more than that too. Users of synchronic dictionaries do not need to know where words come from in order to use them, but knowing word origins seems to be rooted deep in native speakers’ sense of language identity, as well as in the practice of dictionary editors.
Sense Organisation
Where a word (or homograph) has multiple meanings, dictionaries need a set of principles to determine how to order them.
It seems self-evident that diachronic dictionaries organise meanings by chronology. In the OED, the first sense is always the one for which there is the earliest documentary evidence — even if it is obsolete, archaic, or rare in modern-day English — and the date of the first citation of that sense is taken as the earliest date of the entry. Thus, the first sense of the noun budget in the OED is dated 1434 and defined as a ‘pouch, bag, wallet, usually of leather. Obs. exc. dial’. The modern meanings relating to forecasts of expenditure do not emerge chronologically for another 300 years, and do not appear until sense 4.
By contrast, synchronic dictionaries typically aim to put the most ‘salient’ modern meanings first. Criteria for determining salience vary. Using evidence from large modern text corpora, frequency is a key indicator of salience, but this may be overridden by ordering principles such as ‘logically prior’ (e.g. concrete senses before figurative ones), sub-sense groupings (related senses together), or specialism (generalised senses coming before specialised ones). In synchronic dictionaries, there is a degree of subjectivity that is difficult to escape: the entry for trunk, for example, has several important senses, including ‘the main part of the human body’, the trunk of an elephant, and ‘a luggage box’, and there is little agreement among synchronic dictionaries over which to put first.
In fact, chronology is not the only principle for ordering in diachronic dictionaries. In larger entries, semantic, grammatical, or syntactical ordering criteria are used at secondary levels to give logical shape and coherence as well as to enable navigation, and in this structure they abandon a single strict chronological sequence. As the OED’s first editor, James Murray, wrote: ‘As … the development often proceeded in many branching lines … it is evident that it cannot be adequately represented in a single linear series’ (Murray 1884, vii–xiv). The noun capital, for example, has subsections distinguishing core strands of meaning development, such as ‘relating to the head’ and ‘standing at the head’.
Synchronic dictionaries aim to foreground core modern meanings, while diachronic dictionaries do not. In diachronic dictionaries, the modern sense may appear at any point of the entry (as with the entry for ‘budget’); this can be disorienting, since the modern sense is, by default, the access point for modern users as well as its anchor point. With flexible data models, there is no reason that diachronic and synchronic approaches to sense ordering cannot be combined to allow users to access both.
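As a rough illustration of how one set of sense records could serve both orderings, consider the sketch below. It assumes each sense carries a first-attestation year and a corpus frequency; the 1434 date comes from the budget example above, while the 1733 date and both frequency values are invented placeholders, and the obsolete flag simplifies the OED's ‘Obs. exc. dial’ label.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    gloss: str
    first_attested: int     # year of the earliest citation
    frequency: float        # placeholder relative frequency in a modern corpus
    obsolete: bool = False


def diachronic_order(senses):
    """Earliest documentary evidence first, obsolete senses included."""
    return sorted(senses, key=lambda s: s.first_attested)


def synchronic_order(senses):
    """Current senses only, most frequent first (frequency as a proxy for salience)."""
    current = [s for s in senses if not s.obsolete]
    return sorted(current, key=lambda s: s.frequency, reverse=True)


budget = [
    # 1733 and the frequencies are illustrative values, not OED data.
    Sense("forecast of income and expenditure", 1733, 0.9),
    Sense("pouch, bag, wallet, usually of leather", 1434, 0.0, obsolete=True),
]

historical_view = diachronic_order(budget)
modern_view = synchronic_order(budget)
```

Frequency is only one of the salience criteria mentioned earlier; a fuller sketch would need to model overrides such as concrete-before-figurative ordering, which are editorial judgements rather than sortable values.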
Definitions
Definitions, or explanations, are core to any monolingual dictionary. Modern English synchronic dictionaries have focused on ensuring that definitions are accurate and accessible. At least part of this drive has been the explosion of English language learning in the last half century. Drawing on today’s large proprietary text corpora, such as the Oxford Corpus of English and the Cambridge English Corpus, lexicographers assess the evidence with the help of statistical tools and aim to write definitions in normal everyday English that accurately represent the typical meanings of a word.
Definitions seem to have been less of a focus for the OED’s original editors. Despite a whole section on ‘Signification’, definitions and definition writing are barely mentioned in Murray’s ‘General Explanations’. The first OED editors sometimes borrowed or quoted definitions from earlier dictionaries. While that practice has been abandoned, the defining language of the OED today is noticeably formulaic and does not read like natural language, as in this example from the entry for capital, sense 6b: ‘Of a town, mansion, estate, monastery, etc.: that is the principal of its kind (in a region, group, etc.); (also) chief, principal, main; important. Hence: designating, of, or relating to a capital city’. Whereas a huge amount of attention has been given to improving definitions for synchronic dictionaries, the same is not true of diachronic dictionaries, and perhaps this is understandable: the focus of the OED, and other diachronic dictionaries, is the scholarly endeavour to delineate the history and origins of words, and capturing that level of complexity has not always been in sync with defining language that is clear and accessible.
Evidence: Quotations and Illustrative Examples
In modern lexicography, evidence (whether from large text corpora or individual citations) is the basis for both synchronic and diachronic approaches. But while it might appear that quotations in the OED are performing the same function as examples in synchronic dictionaries, a closer look suggests they are different. Synchronic English dictionaries typically include real examples of text from modern sources, but rarely cite their source. The focus is not the source but the illustration of the word in natural language; to this end, small changes to the wording of the source text may be admitted for clarity or brevity.
Diachronic dictionaries include verifiable citations (i.e. snippets of text from cited sources). The scholarly aim is primarily to verify that a word or sense exists, precisely, at a particular date and in a particular source. Considerable effort is expended on bibliography and dates, with the earliest attestation being of particular interest. For older periods, the research might involve, for example, dealing with lacunae, illegible text, or variable spelling. The selection criteria for quotations are not primarily based on typicality of use (even if that could be known for a period when evidence is scarce), but on the necessity of showing the earliest date and a spread of quotations over time. This is partly practical: whereas for synchronic dictionaries there is, in practice, sufficient evidence to inform accurate generalisations about word use and behaviour, for diachronic dictionaries this is not the case. In general, the older the period, the fewer pieces of evidence are available and the more difficult it is to select typical citations and make accurate generalisations.
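The selection principle described here (earliest attestation plus a spread over time) can be approximated very crudely as follows. This is a sketch under stated assumptions, not a description of OED editorial practice: the fixed 100-year period, the function name select_quotations, and the list of citation years are all invented for illustration.

```python
def select_quotations(years, period=100):
    """Keep the earliest citation overall, then the earliest in each later period.

    `period` is an assumed bucket size in years; real editorial practice
    is a matter of judgement, not a fixed interval.
    """
    chosen = []
    seen_periods = set()
    for year in sorted(years):
        bucket = year // period
        if bucket not in seen_periods:
            seen_periods.add(bucket)
            chosen.append(year)
    return chosen


# Hypothetical citation years for a single sense.
citation_years = [1895, 1434, 1620, 1450, 1733, 1611, 1890]
selected = select_quotations(citation_years)
```

Note that nothing in this selection considers how typical a quotation is, which mirrors the point above: for older periods there may be too little evidence to judge typicality at all.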
Theory and Origins
This section looks at the origins and possible theoretical basis for the distinctions between synchronic and diachronic dictionaries. The use of the contrastive terms ‘synchronic’ and ‘diachronic’ in relation to language and dictionaries is relatively recent. Neither term is recorded in English in this context before the 1920s. The nineteenth century was characterised by the growing field of historical linguistics and the establishment of the best-known dictionaries on historical principles, including the Grimm brothers’ Deutsches Wörterbuch (first volumes published 1854) and the OED (published 1884–1928). In English, the term ‘historical’ with reference to language is first recorded in the early part of the nineteenth century (see OED’s sense 2c, and the citation from 1832). During this period, the study of language was increasingly seen as the study of evolutionary development, tracing words from their earliest origins to the present day, where the present day was seen as the pinnacle of human progress.
The late eighteenth and early nineteenth centuries were a period in which the dictionary was established in English as having a normative, standardising function as well as a scholarly mission to capture the entirety of language for the first time. In the views of the time, the march of ‘progress’ signalled by the growing middle class and the extension of the British Empire throughout the world brought with it a perceived need to ‘fix’ and ‘stop’ the forces of decay in the language at that moment of high achievement, however impossible and subjective that mission might turn out to be.
Thus, the great historical dictionaries were not only bold cultural statements; they were also primarily documentary accounts rather than practical tools. According to Richard C. Trench: ‘A Dictionary is an historical monument, the history of a nation contemplated from one point of view’ (Trench 1860, 6). Of course, the claim to be comprehensive did not extend to less favoured registers of the language: the language of the working classes and regional dialects could happily be ignored, for example. The notion of comprehensiveness was focused on the ‘best’ sort of language, and the literary greats of the past were read and absorbed in order to enrich and enlighten the present population’s understanding of words in the present. Not only did words come to have meaning only by means of their history, but also, according to Trench’s logic, ‘[t]he study of language [was] … the most potent means … for planting us in the true past of our country’ (Trench 1860, 8).
The high-minded historical view of language prevailed until well into the twentieth century. It was not until the work of Ferdinand de Saussure and the development of structural linguistics in the twentieth century that the case was put forward for a serious study of language as a system in itself, operating at a particular point in time and without reference to a temporal dimension. Where words came from or how they had been used in the past was less important than how words related to each other and how meaning was constructed through those relations. In the 1920s, the words ‘diachronic’ and ‘synchronic’ were borrowed directly into English from the French, as used by Saussure and recorded in his posthumous Cours de linguistique générale (1916).
Structuralism was hugely influential in linguistics and anthropology in the early and mid-twentieth century. Even if much of Saussure’s thinking has now been superseded, the work of the structuralists was important in changing the terms of the debate and overturning the strongly held assumption that linguistic study was necessarily historical. This process thereby reset the debate for the emergence of synchronic theories, especially theories of meaning, which would have a greater influence on lexicography later on.
Of course, dictionaries which were synchronic in their approach and content had existed for years, but they were deprecated as functional tools rather than serious contributions to the field of language study. They were ‘primers’ for the ordinary man and woman, designed to enable them to understand the current language and communicate effectively. Funk & Wagnalls’ A Standard Dictionary of the English Language (1893) was a large and ambitious synchronic dictionary of current American English which ordered definitions according to current rather than historical usage and placed etymologies at the end of definitions, rather than at the beginning, but the professed approach was functional, rather than academic, as indicated in the preface: ‘It has been thought better not to follow a system simply because it is logical or philosophically correct, if practically it hinders rather than helps the inquirer’ (Funk & Wagnall 1893, xi).
As the demand for everyday synchronic dictionaries increased, publishers initially found it difficult to free them from historical principles. Dictionaries published around the turn of the twentieth century — for example, the Concise Oxford Dictionary (1911) and Webster’s Collegiate Dictionary (1898) — were published as dictionaries for practical, everyday use, but they were more like compact versions of their historically based parents, and as such often preserved their organisational principles and features, ordering senses chronologically and including obsolete terms. Some of these habits remain in synchronic dictionaries today, particularly the practice of homograph distinction on grounds of origin (e.g. separating guy into two entries), which matches a strong impulse, at least with dictionary editors, to organise on the basis of origin regardless of current meaning and use.
Structural linguistics can be seen as a necessary foundation that gave academic credence to synchronic approaches and allowed new discourses to emerge. Several theoretical developments, in particular those related to theories of meaning, were later to become important in terms of their application and use in dictionaries. In particular, common features and themes existed among Wittgenstein’s later work, with its interest in fluid meaning boundaries (as with the well-known discussion of the word game; see Philosophical Investigations); the Californian psychologist Eleanor Rosch’s prototype theory in the 1970s, with its focus on the natural human perception of typical meanings rather than peripheral edge cases; and the English linguist J. R. Firth’s focus on the context-dependent nature of meaning (Firth is famous for his aphorism, ‘You shall know a word by the company it keeps’). First, these studies were empirical, and second, they located meaning in the usage and behaviour of language itself and its context, rather than in any external reference, historical truth, or abstract ‘langue’.
Important as these developments were, it was the growth of corpus linguistics that inspired a new generation of synchronic dictionaries. The Brown corpus of 1967 was tiny (one million tokens) by modern-day standards, but well annotated and large enough to reveal the potential of empirical, objective investigation of synchronic language. The Collins COBUILD English Language Dictionary (1987), the brainchild of the linguist and Birmingham University Professor John Sinclair, was the first English language dictionary to be systematically compiled from scratch on the basis of real corpus evidence of everyday written language. In a mission to provide a brand-new dictionary to transform the learning of English, the project team compiled a large modern corpus (the Bank of English), used statistical methods to analyse frequency and prioritise uses, and abandoned many commonly held dictionary conventions (such as a lingering historical bias, or the principle of ‘substitutable’ definitions) in favour of a resolute reliance on the evidence in front of them. COBUILD revolutionised the learner’s dictionary market and, over the next few decades, exerted influence on bilingual dictionaries and general monolingual dictionaries. When Oxford University Press decided to invest in a new monolingual English dictionary in the early 1990s (first published as the New Oxford Dictionary of English in 1998; later the Oxford Dictionary of English and its cousin, the New Oxford American Dictionary), it was not only looking to contemporary rivals such as the Collins English Dictionary, but also aimed to take a bold step that would fully mark it out from its venerable parent, the OED.
It is hard to say where the debate between synchronic and diachronic dictionaries currently lies. There is no doubt that rapid migration online in the early twenty-first century, for both diachronic and synchronic dictionaries, has led to huge disruption and transformed business models. But the principles of dictionary writing do not seem to have radically changed. Online dictionaries look remarkably similar in layout, structure, and approach to print dictionaries (see Hargraves this volume).
There are signs that this could and should change. In the digital era, dictionaries as isolated datasets have limited value compared to dictionaries as data which is interconnected, agnostic, and primed for multiple uses. Thinking of synchronic and diachronic approaches as somehow mutually exclusive has been driven by analogue ideas of, for example, the physical ordering of elements (on the page or webpage). In turn, this magnifies the differences and disadvantages of each approach, whereas there are plenty of similarities and benefits to their being more closely aligned. From a data perspective, diachronic and synchronic are just two of the axes on which data points can be plotted, and access to all the data relating to them allows for variable outputs, as well as the enrichment of one or the other approach. This is how technology companies think about language data, and as dictionary makers, we need to as well.
Conclusion
Diachronic dictionaries aim to record and document the language over time, while synchronic dictionaries typically provide practical guidance on using the current language. In the past, this gave synchronic approaches a lower intellectual status and gave historical dictionaries higher status and cultural authority. In the twentieth century, synchronic lexicography was able to plug into academic discourse relating to modern theories of meaning, and to reflect more accurately how language is actually used and understood. These approaches have had a strong impact on modern synchronic dictionaries.
Diachronic dictionaries need to investigate and present a much wider set of information compared to synchronic dictionaries. Their organisation and focus are driven by the need to document the history of words and notable changes in meaning, form, grammar, and use.
Neither the diachronic nor the synchronic perspective exists in complete isolation from the other in the context of a dictionary. Register labels such as ‘dated’ or ‘archaic’ imply a historical perspective in a synchronic dictionary and a synchronic perspective in a historical dictionary. For one’s own language — as opposed to another language being learned — an appreciation of word origins and history seems to be deeply embedded, and there is a natural human tendency to regard older uses as somehow ‘better’ or more authentic. Learner’s and other pedagogical dictionaries need make little or no reference to history, but there may continue to be a public appetite for etymological background in general synchronic dictionaries.
While diachronic and synchronic approaches have taken different developmental paths, they could now be better aligned and signposted. Their goal of capturing types of language information is not as different as might be supposed. In the digital era, this raises some exciting possibilities, including a vision of data models and language repositories (rather than dictionaries) that include lexical information about both the current language and the past, and use linking to enable users, and dictionary editors, to draw on both.
Introduction
Adults taking medicines prescribed by a physician are acquainted with instructions such as ‘Take one tablet twice daily with meals’ and ‘Do not take if pregnant’, which appear on labels and package inserts. Besides ‘dos’ and ‘don’ts’, package inserts may provide descriptions of the chemical structure of a drug compound and other technical facts. In a realm quite different from prescription drugs, most adults do not ordinarily need instructions about how to use their language. We generally say what we want to say without paying much attention to the process – and we’re usually understood by those we address. Unlike prescription medications, language is a shared community resource, and it succeeds not by following instructions but by use with naturally acquired customary understandings. Still, English speakers sometimes solicit aid in understanding what others have expressed in written materials, media broadcasts, and social media, or in expressing themselves accurately when writing. In those cases, consulting a dictionary is a common remedy.
Focusing on monolingual general-purpose dictionaries of English, this chapter addresses aspects of dictionary-making that are important and sometimes controversial but little understood by many dictionary users, who refer to ‘the dictionary’ as though all were the same – and all equally authoritative. The focus here is on practices that differentiate between more-or-less descriptive and more-or-less prescriptive approaches to lexicography. The chapter asks how dictionaries have balanced their primary task of describing word usage with the kind of guidance about ‘correct’ and ‘incorrect’ that many dictionary users expect.
As every dictionary maker knows, living languages change continuously. Dr Johnson (1755, Preface) noted that ‘to enchain syllables, and to lash the wind, are equally the undertakings of pride, unwilling to measure its desires by its strength’. Noah Webster (1806) straightforwardly made the same point: ‘Every man of common reading knows that a living language must necessarily suffer gradual changes in its current words, in the significations of many words, and in pronunciation’. Every facet of English varies from place to place and situation to situation, and over time that variation leads to changes in pronunciation, grammar, vocabulary, meaning, and spelling. Of interest in this chapter, words and word meanings come and go: speakers invent words (sistren, fatberg) or adopt them from other languages, and some words, both old and new, drop out of use. Word meanings also die out and new meanings for existing words arise: nice, mad, and feminism no longer mean what they once meant. Words may take on additional meanings (think bug, virus, and bookmark in talk about computer software). A word may expand its range of application by changing from one part of speech to another – from being, say, only a noun to also being a verb – sometimes altering in form (from the noun enthusiasm came the verb enthuse) but more often not (keyboard and bookmark are nouns and verbs). What happens, then, when speakers meet a new word or a known word whose familiar meaning doesn’t fit the context just read or heard? In those circumstances, turning to ‘the dictionary’ for help seems second nature, and with electronic and online accessibility it’s easier than ever to ‘look it up’.
People who consult a dictionary may be interested in learning what a word means (cryptocurrency, hirsute, malarkey, probity) or how to pronounce it (banal, harass, quinoa, status) or spell it. Student writers especially may seek synonyms and antonyms. Of course, dictionary makers take an interest in which words are looked up, and online dictionary sites keep track of lookups and lookup spikes. Lexicographers can often determine what prompted a spike: mention of hoosegow during a sports broadcast watched by millions or emolument, Brexit, or hashtag prominent in political, social, or cultural discussions. In 2017, dictionary makers at Merriam-Webster – the Springfield, Massachusetts successor company to the original Noah Webster dictionaries – chose feminism as their word of the year because it spiked several times in discussions associated with the #metoo movement. Besides such occasional spikes, online sites witness perennial favourites in lookups for empathy, esoteric, insidious, integrity, paradigm, pragmatic, and ubiquitous, among others – doubtless because people want to be clear and confident in their understanding and use of these less common words (Sokolowski, 2014).
Earlier Dictionaries
Today’s dictionary entries tend to be rich sources of information about words, but the earliest English dictionary was a mere glossary – basically translations of ‘hard usual’ words into ‘plaine’ ones, as Robert Cawdrey put it in his Table Alphabeticall (1604). An entire entry in the Table – commonly regarded as the first English dictionary – could be limited to just a couple of words, as spare as ‘fact, deede’ or ‘potion, a drinke’ or ‘memorable, worthie to be remembred’. Headwords in these entries are not defined but merely associated with a synonym, as with fact and potion. Other complete entries from Cawdrey’s Table are similar:
| absurd, foolish, irksome | § mirrour, a looking-glass |
| foraine, strange, of another country | § moderne, of our time |
| § maniacque, mad: braine sick | § nice, slow, laysie |
| matrixe, wombe | obtestate, humble, to beseech, or to call to witnesse |
| microcosme, (g) a little world | pistated, baked |
Because by 1604 English had benefitted from a significant boost in word borrowing from other languages, there were lots of ‘hard usual’ words, and besides glossing them Cawdrey signalled which language they came from: § for words from French, (g) for those from Greek, and no marking for the more numerous Latin borrowings. Some words have fallen out of use since 1604, and when a dictionary like the Oxford English Dictionary includes them for the historical record, it labels them obsolete, as with Cawdrey’s obtestate and pistated. Others of his headwords have changed in meaning (matrixe and nice) or spelling (foraine and maniacque). Absurd, mirror, and modern are easy today but were ‘hard’ in 1604. Cawdrey compiled his dictionary as a teaching tool but offered no guidance about using the words, not even identifying their part of speech.
Besides Dr Johnson’s famous 1755 dictionary (see Chapter 11), other significant dictionaries were compiled in the two centuries after 1604, but for the purposes of this chapter we focus next on the first dictionary compiled by Noah Webster. The scope of his A Compendious Dictionary of the English Language (1806) reached beyond hard words to ordinary ones like book, hour, and swan, and he identified multiple meanings for some words. For example, for the noun meal he identified the distinct senses ‘grain ground to a powder’ and ‘the food taken at one eating’. Being a patriotic Connecticut Yankee in the young United States, he sought to produce a dictionary that would be independent of Johnson’s and others made in England and would recognise peculiarly American words and meanings such as Americanize and presidential. The entries in his first dictionary were, like Cawdrey’s, basically glosses, and although they don’t identify a word’s origin, they contain other information, as shown:
| Absurd′, a. contrary to reason, foolish, inconsistent | Ill, a. bad, sick, disordered, not in health, evil |
| Amer′icanize, v.t. to render American | Mániac, a. raving with madness; n. a person mad |
| Fémale, Fem′inine, a. belonging to the female, effeminate, kind, tender, soft, delicate, weak. | Nice, a. exact, refined, squeamish, finical, fine |
| For′eign, [forun] a. belonging to another country, distant, not connected with | Preſiden′tial, a. pertaining to a president |
Between Cawdrey’s day and Webster’s, dictionary-making had grown in sophistication, and dictionaries were more informative. Although not the first to do so, Webster’s Compendious Dictionary furnished part of speech (Amer′icanize, verb transitive; ill, adjective) and pronunciation (vowel quality: fémale and mániac; stress placement: absurd′ and presiden′tial; phonetic approximation: forun for foreign). While Webster sought to correct the work of earlier lexicographers and shine light on distinctively American words and meanings, he made only incidental note of disputed usage. The spelling accompt was ‘false orthography’ for account, and corps, meaning ‘a body of soldiers’, was ‘an ill word’, but such comments were very few. Later, in his mammoth two-volume An American Dictionary of the English Language (1828), besides faulting the etymologies of previous dictionaries, he aimed for some spelling reform (calice for chalice; sythe for scythe; tung for tongue) and judged certain pronunciations ‘incorrect’ (senna pronounced seena). But he generally reported what his collection of examples showed – for example, ‘We observe that between is not restricted to two’ (as some grammarians had claimed it should be).
Modern Dictionaries
Modern dictionaries may be available online and in digital formats and differ in other ways from earlier ones, but the basic challenges that faced Cawdrey, Johnson, Webster, and others in creating dictionary entries remain today. One challenge is how best to balance the commitment to describe words as speakers and writers use them with guidance about ‘correct’ use that some dictionary users expect when usage varies. That balance is sometimes viewed as a tug of war between descriptive and prescriptive approaches to lexicography; one way or another, dictionaries managed it until the mid-twentieth century, as we’ll see.
A given dictionary can address only select aspects of the language it represents. Medical dictionaries focus on words and meanings of importance in medicine; legal dictionaries, those of significance in law. Others limit themselves to slang or gambling terms or the language of the underworld. Fowler’s Dictionary of Modern English Usage (1926), now nearing a century in age, remains famous for its advice about what H. W. Fowler saw as the best ways to say and write things in early twentieth-century Britain. In the same tradition, Garner’s Modern American Usage (2009) and The Cambridge Guide to Australian English Usage (2007) offer advice about so-called disputed usages like hopefully as a sentence adverb, who versus whom, infer versus imply, and I versus me – language phenomena whose correctness people judge differently. Such usage dictionaries don’t normally have entries for absurd, foreign, maniac, mirror, or other ordinary words unless some aspect of their use varies in ways that prompt dispute. Precisely because usage dictionaries offer advice about how to speak and write and what expressions or meanings to avoid, they are considered prescriptive. In the words of the editor of the most recent edition of Fowler (Butterfield 2015, ix), they function as ‘a sort of linguistic emergency service’.
The dictionaries most people consult most of the time aren’t usage dictionaries. They are general-purpose dictionaries, and they contain many more headwords and a wider range of them than in specialised dictionaries, not just everyday words but less quotidian ones like esoteric and ubiquitous that are looked up year in and year out, as we saw above. The fundamental task of a general-purpose dictionary is to provide a description of the ‘general’ vocabulary that is as accurate as possible, limited only by the raw materials – usually written texts (books, newspapers, magazines, blogs) – that lexicographers rely on, materials that are richer and more abundant now than ever before, with the availability of gigantic online reservoirs of writing and transcribed speech. The principal object of description in a dictionary is what the raw materials show about which words writers and speakers use, how they use them, and what they use them to mean. This is a descriptive task – listing headwords in alphabetical order, with part of speech and meanings, as well as pronunciations and in some cases etymology, though the last two cannot usually be extracted from the data reservoirs of writing.
Prescriptive and Descriptive Approaches to Usage
If a word is sufficiently common and widespread to be part of the general vocabulary but warrants an indication of special status, general-purpose dictionaries add a label to the word or one of its meanings. ‘Some words … are appropriate only to certain situations or are found only in certain contexts, and where this is the case a label … is used’, reports the Concise Oxford English Dictionary (Stevenson and Waite 2011). Subject labels indicate that a word or meaning is associated with a particular field or activity (‘Catholicism’, ‘baseball’). Geographical labels specify a nation or region of use (‘US’, ‘chiefly Brit’, ‘NewEng’). Register or status labels may be temporal (‘archaic’) or stylistic, the latter being numerous, diverse, and sometimes controversial (‘slang’, ‘literary’, ‘colloquial’, ‘nonstandard’, ‘substandard’, ‘dialect’, ‘offensive’). Although labels are generally perceived as descriptive, a label such as ‘substandard’ also implicitly warns a writer not to employ the labelled word in a text intended to represent standard English. Even without a prescriptive intent, then, some labels inevitably exert a prescriptive influence. More generally, the centuries-old practice of labelling words and meanings sits on the edge between description and prescription – describing the status of a word or meaning and thereby implicitly guiding use.
In the debate between those who favour strictly descriptive roles for dictionaries and those who judge a dictionary’s responsibility to include explicit guidance about ‘correct’ usage, even the mere listing of certain words and meanings can be a bone of contention. Besides taboo words like those bleeped out in radio and television broadcasts or asterisked and otherwise disfigured in magazines and newspapers, the more general concern involves headwords like ain’t and irregardless, along with usages that might not be recognised by everyone as problematical, as with singular they or the part-of-speech-altering suffix -ize on, say, prioritize or finalize. In 1961, in the front matter to what has been dubbed ‘the most controversial dictionary ever published’ (Skinner 2012), the editor of Webster’s Third New International Dictionary (Gove 1961) wrote that ‘Accuracy … requires a dictionary to state meanings in which words are in fact used, not to give editorial opinion on what their meanings should be’ (Gove 1961, 4a), and, he added, ‘This new Merriam-Webster unabridged is the record of this language as it is written and spoken. It is offered with confidence that it will supply in full measure that information on the general language which is required for accurate, clear, and comprehensive understanding of the vocabulary of today’s society’. By any reasonable measure, Webster’s Third (W3) provided an accurate description of the general vocabulary as it was in fact used. It supplied ‘information’ and ‘the record’ but not ‘editorial opinion’. As we’ll see, not everyone applauded.
Discussion about how descriptive and how prescriptive a general-purpose dictionary should be goes back at least to Dr Johnson’s 1747 Plan of a Dictionary and his 1755 dictionary itself. But two centuries later, a balance struck one way or another between description and prescription in lexicographical practice was upset when, in 1961, many influential writers among the learned public in North America and the United Kingdom condemned what they perceived as laxness – even negligence – in the overtly descriptive approach of W3. For the harshest of W3’s critics, a dictionary’s job – its chief responsibility – was to uphold language standards and, as the primary linguistic gatekeeper, exclude words and meanings deemed incorrect or unsuitable for dictionary recognition – at the very least to label them ‘substandard’, ‘low’, ‘illiterate’, or the like. Gatekeeping also meant not relying on – and certainly not citing as examples – any language written or spoken by people who weren’t sufficiently esteemed for their use of English to merit quotation. Critics ignored or had forgotten Dr Johnson’s observation that ‘words must be sought where they are used’.
To take one notorious example, in describing what Merriam-Webster’s files showed about how Americans used ain’t, W3 reported that, when it means ‘am not’, ain’t is ‘disapproved by many and more common in less educated speech’ and that it is ‘used orally in most parts of the U.S. by many cultivated speakers esp. in the phrase ain’t I’. Then and now, that statement is a fair description, but newspapers and magazines at the time headlined their reviews in terms that made clear the descriptivism of the new word book had touched a nerve: ‘Sabotage in Springfield’, ‘The Death of Meaning’, ‘Madness in Their Method’, and ‘Say It Ain’t So!’ The dictionary had ‘abandoned a function indispensable in any advanced society, that of maintaining the quality of its language’, said The New Republic. The New York Times, in an editorial, wrote that W3 had ‘surrendered to the permissive school that has been busily extending its beachhead on English instruction in the schools … disastrous because … it serves to reinforce the notion that good English is whatever is popular’ (see Sledd and Ebbitt 1962). Sabotage, madness, and disastrous are stinging words: applying them to a dictionary reflected the fact that some critics judged W3 as failing to protect the language from change and supposed corruption.
Publishing houses took note of the negative reactions to W3’s strict descriptivism, and five years later when The Random House Dictionary of the English Language (Stein 1966) made its debut its editor acknowledged that a dictionary must be ‘fully descriptive’, but for him ‘fully descriptive’ covered more than how people used words – it also included how they viewed them:
the lexicographer must give the user an adequate indication of the attitudes of society toward particular words or expressions, whether he regards those attitudes as linguistically sound or not. The lexicographer who does not recognize the existence of long-established strictures in usage has not discharged his full responsibility. He has not been objective and factual; he has reported selectively, omitting references to a social attitude relevant to many words and expressions. He does not need to express approval or disapproval of a disputed usage, but he does need to report the milieu of words as well as their meanings.
One could interpret the editor’s statement as merely extending dictionary description beyond actual recorded usage to a wider range of phenomena. Given his allowing that there is no need ‘to express approval or disapproval’, Random House’s approach could be viewed as not prescriptive. But the very act of reporting ‘strictures’ in a reference work without explicitly contextualising, if not discounting, them could be interpreted by users as an endorsement. As for whether the lexicographer should report attitudes regarded ‘as linguistically sound or not’, professional linguists would certainly judge any reporting of linguistically unsound attitudes as irresponsible unless they were identified as unsound.
To report ‘the milieu of words’, Random House appended usage notes to entries the editorial team regarded as ‘disputable’ – notes ‘intended to reflect the opinions of educated users of English, particularly editors and teachers’ (Stein 1966, xxxi). The note at ain’t, for example, says the word ‘is so traditionally and widely regarded as a nonstandard form that it should be shunned by all who prefer to avoid being considered illiterate’. The note at enthuse acknowledges that the verb ‘is too widely encountered in the speech and writing of reputable teachers and authors to be listed as anything short of standard’, but it adds this advice: because it is ‘felt by many to be poor style … it would be best to paraphrase it’ in formal writing. An uneasy balance between description and prescription is clearly evident in usage notes like those.
Elsewhere, too, feathers were ruffled by W3’s vigorously descriptive approach to reporting usage, so ruffled that one publishing house attempted to suppress the dictionary. Spurred by reaction to W3’s ‘permissiveness’ and a conviction that the public needed guidance in using English, the president of American Heritage Publishing Co. sought to buy out the Merriam-Webster Company. The coup failed, and American Heritage set about compiling a dictionary of a different kind. When The American Heritage Dictionary of the English Language (AHD) (1969) appeared, its editor proclaimed it a dictionary that, besides recording usage, aimed to ‘add the essential dimension of guidance … toward grace and precision which intelligent people seek in a dictionary’. For the first time in dictionary history, American Heritage established a panel of distinguished writers and others and polled them about their judgements of particular usages (a practice continuing to the present day in American Heritage dictionaries). In the usage notes appended to hundreds of entries, the panel’s judgements are expressed as percentages of approval or disapproval. In various editions over the years, AHD’s usage notes rely on terms such as ‘correct’ and ‘proper’, ‘stigma’ and ‘stigmatization’, ‘distaste’, ‘beyond rehabilitation’, ‘ignorance’ and ‘vulgarism’, ‘should never’, and ‘condemned’ to prescribe and proscribe common usages. Such characterisations, it should be noted, apply to words and meanings in sufficiently widespread use to have drawn the attention of AHD’s lexicographers; they’re frequent, not rare, occurrences. Inevitably, such negative characterisations cast the usages themselves into disrepute, and, more significantly, they also risk doing the same to those who speak or write them. 
In a reference work widely regarded as an essential vehicle for promoting equality, mentions of ‘upper class’ and ‘lower classes’ of people in AHD’s usage notes, along with an editor’s reference to what ‘intelligent people’ seek, suggest elitism more than equality.
While usage notes and panel votes of the kind introduced by AHD would have been anathema to editors of W3, Merriam-Webster (W3’s publisher) eventually did introduce ‘usage paragraphs’ to its dictionaries in 1983, likely reflecting recognition that usage notes had boosted AHD’s appeal and that, to compete in a market where dictionary users do seek guidance, Merriam-Webster’s word books needed to provide them. The company did not rely on panel polling, however. As the editor of Webster’s Ninth New Collegiate Dictionary (Mish 1983, 6) wrote: ‘The guidance offered [in the usage paragraphs] is never based merely on received opinion, though opinions are often noted, but typically on both a review of the historical background and a careful evaluation of what citations reveal about actual contemporary practice’. Differing in tone from AHD’s notes, Merriam-Webster’s usage paragraphs are characterised by terms such as ‘appropriate’, ‘flourishing’, ‘evidence’, ‘observation’, ‘objected to’, ‘apparently’, and ‘less educated’, not the more absolutist terms characteristic especially of early AHD notes.
The theoretical difference between descriptivism and prescriptivism is encapsulated in the contrasting words, both quoted above, of W3’s editor (‘information … required for accurate, clear, and comprehensive understanding’) and AHD1’s editor (‘guidance … toward grace and precision’). In practice the difference can be illustrated by comparing the treatment of unique in three twentieth-century dictionaries. Below are listed all the senses found in the roughly contemporary Random House Dictionary of the English Language, College Edition (RHD/C), American Heritage Dictionary (AHD1), and Webster’s Seventh New Collegiate Dictionary (NCD7).
RHD/C:
AHD1:
1. being the only one of its kind: solitary; sole
2. being without an equal or equivalent: unparalleled
NCD7:
1. sole
2. unequaled
3. very rare or uncommon: very unusual
RHD/C and AHD provide only two senses for unique – roughly ‘sole’ and ‘unequaled’. NCD7 provides those two plus the sense ‘rare or uncommon’. Only in the usage note appended to AHD1’s entry does it identify a third sense that would allow unique to be qualified in degree or intensity. The very fact of appending a note indicates that in its citation files AHD had identified unique in a sense that allowed qualification frequently enough to list the third sense and to poll the usage panel’s views about it if it chose. Given the panel’s strong disapproval, however, AHD1 omitted that sense from its list. Below is the relevant part of the usage note.
Unique, in careful usage, is not preceded by adverbs that qualify it with respect to degree. Examples such as rather unique, with reference to a book, and the most unique, referring to the most unusual or a rare species of animals, are termed unacceptable by 94 per cent of the Usage Panel, on the ground that the quality described by unique cannot be said to vary in degree or intensity and is therefore not capable of comparison. The same objection is raised about examples in which unique is preceded by more, somewhat, and very.
Only in its third edition (1993) a quarter of a century later did American Heritage list a third sense of unique: ‘Informal. Unusual; extraordinary’, and the usage note for unique in AHD3 is nearly quadruple the length of the notes for unique in AHD1 and AHD2. The fourth edition (2000) echoes the substance of the third, and AHD5 (Pickett 2011) (currently online) lists three meanings: ‘Being the only one of its kind’; ‘Characteristic only of a particular category or entity’; and ‘Remarkable; extraordinary’. The initial part of the accompanying note in AHD5 says:
Unique may be the foremost example of an absolute term – a term that, in the eyes of traditional grammarians, should not allow comparison or modification by an adverb of degree like very, somewhat, or quite. Thus, most grammarians believe that it is incorrect to say that something is very unique or more unique than something else, though phrases such as nearly unique and almost unique are presumably acceptable, since in these cases unique is not modified by an adverb of degree. A substantial majority of the Usage Panel supports the traditional view. In our 2004 survey, 66 percent of the Panelists disapproved of the sentence Her designs are quite unique in today’s fashion, although in our 1988 survey, 80 percent rejected this same sentence, suggesting that resistance to this usage may be waning.
Claims about what ‘most grammarians believe’ are deliberately loose, of course – after all, who counts as a grammarian and how would grammarians’ beliefs be counted? For the purposes of this chapter, what matters is the note’s report about very unique being ‘incorrect’ in the judgement of ‘most grammarians’; it matters because that judgement sets grammarians (as AHD uses the term) apart from the facts about actual usage and consequently apart from the fundamental descriptive task of lexicography; it also sets the prescriptive view in direct opposition to the fundamental conviction of linguists that the relationship between sound and meaning is essentially arbitrary and that word meanings naturally change over time. The judgement raises questions about the basis of the beliefs and about why AHD reports them on an equal footing with the contradictory evidence of what their own research would have shown. What does it mean for a dictionary to report that traditional grammarians say unique ‘should not allow … modification by an adverb of degree like very’?
To balance what could be regarded as the implicit warning by ‘traditional grammarians’ and, implicitly, by AHD itself in the first part of its usage note, the second part effectively undercuts the warning and highlights actual usage.
In fact, the nontraditional modification of unique may be found in the work of many reputable writers and has certainly been put to effective use: “I am in the rather unique position of being the son, the grandson, and the great-grandson of preachers”
On the one hand, then, traditional grammarians and a substantial majority of AHD’s usage panel disapprove the sense ‘unusual’; on the other hand, ‘many reputable writers’ use it – and do so effectively.
Comparing that most recent note in AHD5 with Merriam-Webster’s 2018 online note (www.merriam-webster.com/dictionary/) highlights the continuing challenge of balancing description and prescription in general-purpose dictionaries. Online, Merriam-Webster asks whether something can be very or somewhat unique and answers this way:
Many commentators have objected to the comparison or modification (as by somewhat or very) of unique, often asserting that a thing is either unique or it is not. Objections are based chiefly on the assumption that unique has but a single absolute sense, an assumption contradicted by information readily available in a dictionary. Unique dates back to the 17th century but was little used until the end of the 18th when, according to the Oxford English Dictionary, it was reacquired from French. H. J. Todd entered it as a foreign word in his edition (1818) of Johnson’s Dictionary, characterizing it as ‘affected and useless’. Around the middle of the 19th century it ceased to be considered foreign and came into considerable popular use. With popular use came a broadening of application beyond the original two meanings [‘sole’ and ‘unequaled’]. In modern use both comparison and modification are widespread and standard but are confined to the extended senses [‘distinctively characteristic’ and ‘unusual’]. When sense 1 [‘sole’] or sense 2a [‘unequaled’] is intended, unique is used without qualifying modifiers.
In reporting the history of unique and its senses, as well as its treatment in a two-hundred-year-old dictionary, Merriam-Webster’s note carries strikingly different information from that found in AHD. As interesting as such detail may be to some readers, one must wonder whether it is what typical users seek to learn about unique in a general-purpose dictionary: wouldn’t that kind of information be more appropriately sought in a usage dictionary such as Webster’s Dictionary of English Usage (1989)?
As we have seen, AHD’s usage notes and those of Merriam-Webster differ significantly, AHD’s highlighting attitudes and judgements about usage, Merriam-Webster’s highlighting a word’s senses and in some cases the history of attitudes. Neither may be what an ordinary dictionary user is seeking – and neither is ideal. A different model is illustrated in the Concise Oxford English Dictionary (Stevenson and Waite 2011), whose usage note at unique economically blends description with guidance:
Strictly speaking, since the core meaning of unique is ‘being the only one of its kind’, it is logically impossible to modify it by adverbs such as really or quite. However, unique has a less precise sense in addition to its main meaning: ‘special or unusual’ (a really unique opportunity). Here, unique does not relate to an absolute concept, and so the use of really and similar adverbs is acceptable.
Is that not the kind of information that most users are seeking when they consult a dictionary about the word unique?
Conclusion
Examining the struggle to balance descriptivism and prescriptivism in lexicography shows a persistent desire among influential people to slow or halt changes to English right alongside recognition that the language is changeable by nature. Speakers and writers often seem attached to words and meanings learned earlier in life and reluctant to accept certain innovations, especially when newer (or newer-seeming) forms or meanings violate a familiar rule or prohibition. Judging by criticism of W3 and the fact that major publishers, including Merriam-Webster, have for decades appended usage notes to dictionary entries, a fair inference is that users seek guidance about ‘correctness’ in ‘the dictionary’ and that lexicographers are providing it, alongside reports of actual usage. Description is essential to sound lexicography. Increasingly in recent decades dictionaries treat description of attitudes as part of their mandate. Cawdrey and Webster aimed to teach, and today’s dictionaries aim to report and guide. Just how they balance description and prescription in doing that differs from dictionary to dictionary.
The frontispiece facing the title page of James Howell’s Lexicon Tetraglotton, An English–French–Italian–Spanish Dictionary (London 1660) depicts four female figures and a soldier who guards their meeting, set against a woody background (Figure 6.1). A double caption – in Latin on top, ‘Associatio Linguarum’, and in French below, ‘La Ligue des Langues’ – suggests that the engraving should be interpreted symbolically: the ladies represent the countries where the languages in the Lexicon are spoken, or the languages themselves. The figurative import is highlighted by five letter labels (possibly added by the printer) over the figures’ heads: from left to right, the four ladies have S for Spanish, F for French, I for Italian, and E for English; the helmeted soldier, instead, has Br for British or Welsh, though he may originally have been meant to represent Teutonic, from which English derives. The engraver, William Faithorne the elder, significantly varied the ladies’ postures and attitudes: the French lady seems to be stepping forward in order to embrace or kiss the English lady, and has already taken her by the hand; the Italian lady is sympathetically near behind them, while the unsmiling Spanish lady looks cold and standoffish. This hints at the socio-political climate of the mid-seventeenth century – England and France had fought against Spain during the Thirty Years’ War, concluded not long before by the Peace of Westphalia – and also at the relationships among the languages, with Italian giving way to French as the vernacular that had most influenced English since after the Renaissance, while Spanish had always had a limited linguistic impact.
Figure 6.1 James Howell’s Lexicon Tetraglotton (1660), frontispiece
Faithorne’s engraving is the visual counterpart of other interesting paratextual material in the Lexicon Tetraglotton. In a prefatory poem, Touching the Association of the English-Toung, with the French, Italian, and Spanish, James Howell invites the ‘sisters three’, the Latin-derived Romance languages, to accept English in their ‘society’, so that English ‘Consonants and tougher strains / Will bring more Arteries ’mong your soft veins’ – an anatomical metaphor that may have been suggested by the recent, 1653 translation into English of William Harvey’s book on the circulation of blood, and arguably a harbinger of the future cross-linguistic lexical influence of the English language all over Europe. The poem is followed, among other prefatory material, by a six-page address To the tru Philologer, in which Europe is said to have ‘eleven Originall, Independent, and Mother-Toungs’, among them ‘Teutonic or High-Dutch’, from which English derived, and Latin, the source of Italian, Spanish, and French. In time, Howell wrote, English came to its perfection by adopting words from other languages, especially French, which has now ‘arrivd to a great pitch of perfection, purity and sweetness’. Italian ‘is held the smoothest, the civillst, and charmingst vulgar Toung of Europe’, also because it is ‘the topbranch or eldest daughter of the Latin’. Spanish ‘may be sayed to be nothing els but Latin inlayed with Morisco (and som few old Gothic words)’. Given this historical and sociolinguistic context, Howell claims that his Lexicon Tetraglotton will contribute to the public good and the honour of Great Britain. English will also benefit from the publication of the dictionary, since it is made to compare with ‘the civillst languages of Christendom, and as it were incorporated with them’; it will also ‘expand, and spread further abroad by mixing with these spacious languages’ and remove all those aspersions that are cast on it.
In sum, James Howell argues that English, by the mid-seventeenth century, has come to be on a par with the most important European languages, and that the Lexicon Tetraglotton, by linking English with them, provides good evidence for this claim.
Howell’s opinions as they are symbolically and verbally expressed in the paratext of his polyglot Lexicon exemplify what was (or at least might be) behind the compilation of a dictionary in those days. Thus, they may aptly introduce a brief description and assessment of the evolving role of English in early modern Europe as this is reflected in dictionary-making from the late Middle Ages to the end of the eighteenth century – that is to say, from the time when the scholarly tradition of medieval Latin Christendom still held sway in Europe through the period Samuel Johnson once defined as an age of dictionaries for its abundance and diversity of lexicographical products, be they monolingual, bilingual or polyglot, general or specialised, lexical or encyclopaedic.
In very broad terms, the early modern period in Europe witnessed, on the one side, the development of the national vernaculars, favoured by the rising literate middle class, at the expense of Latin – both the medieval Latin of the Church and the neoclassical Latin of the humanists – and, on the other side, the development of local varieties and dialects. Yet, the historical and sociolinguistic background of Europe was much more nuanced than this. For instance, Latin continued to be used by lawyers, diplomats, and even travellers abroad if they had not mastered the local language, and at the well-known annual book fair in Frankfurt, most texts for sale were in Latin until the mid-seventeenth century. As to the vernacular languages, not only were they used for a wider and wider range of communicative purposes, but there was also a steady increase in language contact and mixing, because of migrations (for example, the French Huguenots leaving France for England or other Reformed countries), travel, trade, and warfare, all the more so as the boundaries between the states, and accordingly between the speech communities and their languages, were less clear-cut than at later times. Moreover, the introduction and development of the printing press since the mid-fifteenth century also played a key role in the growing use of the vernaculars: it contributed to the standardisation of national languages, it favoured the study of modern foreign languages through the publication of countless dictionaries and grammars, and it stimulated massive translation activity from one vernacular into another. In more than one way, then, the decline of international communication in Latin was counterbalanced by the growth of language contact and exchange among European speech communities. And, of course, it is in language contact situations that dictionaries get compiled.
A final, introductory caveat: in early lexicography, it was common practice – at best judicious, at worst flagrant – to copy from other dictionaries, which was done most often without acknowledging one’s sources, or mentioning some minor ones in order to disguise the most influential. D. T. Starnes and G. E. Noyes, the authors of a seminal study on the early history of English dictionary-making, wrote that English lexicography up to Samuel Johnson progressed by plagiarism, that the best lexicographers were simply the most discriminating plagiarists, and that a good dictionary was its own justification, whatever the method of compilation. Compilation, rather than plagiarism, is arguably the keyword here: a lexicographer puts together different pieces of information, and legitimately so, wherever they are found. Lexicography develops by accretion rather than progression; continuity is more important than innovation. However unappealing this may sound to modern ears, lexicography was (and still is) largely traditional. It may be conceded that originality and intellectual property became more and more relevant as the eighteenth century progressed; but lexicographers continued to be inspired by their colleagues’ work, all the more so as imitation did not simply mean using glosses from an earlier compilation but also, for instance, making a polyglot dictionary out of a bilingual one, or using the wordlist of a bilingual dictionary to compile a monolingual one. Moreover, this kind of giving and taking usually went beyond national boundaries, so that a real network of dictionaries and dictionary-making was created in Europe.
Dictionaries in Early Modern Europe
Dictionaries were no novelty in early modern Europe, nor in England. English lexicography is generally said to have begun around the early eighth century: hard words in Latin manuscripts were annotated with interlinear or marginal glosses, that is to say, translation equivalents in Old English (much as we do when we read a text in a foreign language nowadays); these glosses were sometimes collected in a wordlist, or glossary, either alphabetically or topically (that is, arranged according to semantic fields). A glossary is no proper dictionary, at least in the modern sense of the word, as a dictionary provides a lot more information than simply lexical equivalents; still, once glosses are no longer directly linked to a given text, once they are decontextualised, they qualify as proper lexicographical items, since decontextualisation is one of the main features of dictionaries.
The four oldest surviving Anglo-Saxon glossaries – nowadays named for the libraries where they are held – are the Epinal, the Erfurt, the Leiden, and the Corpus (Christi College, Cambridge) glossaries. It is worth noting that Continental Latin-Latin glossaries were copied and augmented in Anglo-Saxon England, and that Anglo-Saxon glossaries were copied on the Continent – for example, the Leiden glossary in St Gall and the Erfurt glossary in Cologne: even in the early Middle Ages, learning and scholarship were international, at least as far as Western Europe was concerned. In post-Conquest medieval England, glossaries were still compiled as handy tools for learning Latin. Four large glossaries from the late Middle English period represent the culmination of medieval dictionary-making in England: the so-called Medulla Grammaticae and the Ortus Vocabulorum are Latin–English glossaries, whereas the Promptorium Parvulorum and the Catholicon Anglicum are English–Latin glossaries. The earliest of them seems to have been the Medulla, which lists about 17,000 entries and probably had as its main source the Catholicon of Johannes Balbus (Giovanni Balbi), a thirteenth-century Italian priest – further evidence of the supranational connections of medieval Christianity, and of the European foundations of most dictionary-making. Interestingly, the Promptorium was printed by Richard Pynson in 1499, and the Ortus, of which no manuscripts are extant, by Wynkyn de Worde in 1500; both glossaries were frequently reprinted for about three decades, which proves that the distinction between the Middle Ages and the Renaissance, or between the age of manuscripts and the age of printing, was not so clear-cut as many suppose.
The simple fact that none of these four compilations had glossary, vocabulary, lexicon, or dictionary in its title shows that the very concept of ‘dictionary’ as a specific text-type was still missing or, at least, was being formulated in medieval England: the first ever English reference work to use the word dictionary in its title was Sir Thomas Elyot’s Latin–English compilation of 1538, most probably prompted by Elyot’s main source, Ambrogio Calepino’s Dictionarium, first published in 1502. As for English monolingual lexicography, one has to wait until Henry Cockeram’s The English Dictionarie of 1623 for the word to appear in a title. As a matter of fact, though, the sixteenth century saw significant developments – both quantitative and qualitative – in dictionary-making all over the Continent and, with some delay, in England too. The abovementioned rise of the national vernaculars and the increase in contacts among people for economic, political, cultural, and social reasons could but result in the production of dictionaries, initially bilingual and polyglot, and later monolingual. By yoking together two languages in a bilingual dictionary, or by establishing correspondences among a number of languages in a polyglot one, lexicographers showed that, although Latin was slowly in retreat, Europeans shared a cultural heritage: their vernaculars were different, but equivalences might usefully be established among them, so that communication was still possible. At a later stage, as the national states were consolidating their power, the compilation of monolingual dictionaries responded to a different stimulus, to an intellectual and socio-political climate of linguistic improvement and standardisation: marking boundaries and claiming national identity had also become important.
This is particularly true of England and of English lexicography. English did not lead the way in the European development of early modern dictionary-making: indeed, what follows in this chapter will show that English lexis took time to be listed in polyglot dictionaries, and one has to wait until the Restoration of 1660 to find Howell claiming that English could rival the more prestigious Romance languages of Europe. Moreover, although Queen Elizabeth transformed England into a world power, she did not live to see the first English monolingual dictionary, Robert Cawdrey’s A Table Alphabeticall of 1604; a century and a half later, Samuel Johnson’s Dictionary was a masterpiece admired (and soon to be influential) all over Europe, but Italy, France, and Spain had published their academy dictionaries long before that.
Polyglot Dictionaries
Merchants, travellers, and soldiers were the social groups that most needed polyglot dictionaries: they moved across Europe, they lacked an advanced education, and their communicative needs might be rather basic, so that multilingual lists were sufficient. Commercial and other contacts (for whatever reason) among peoples in the late sixteenth and seventeenth centuries explain why, where, and how polyglot dictionaries were compiled and published: their sharp increase in number corresponds to the expansion of trade all over Europe; this tendency is also shown by the dictionaries’ places of publication, very often trading centres and ports like Venice, Amsterdam, and Antwerp; further, since polyglot dictionaries usually started as bilingual ones to which one or more further languages were added, the number and variety of the languages involved show that they were most often meant to serve practical, specific, and clear purposes.
The European tradition of polyglot dictionaries began in Italy in 1477, when the German-born Adam von Rottweil, a former collaborator of Johann Gutenberg, published in Venice Introito e Porta, an Italian–German topically arranged wordlist. After being repeatedly printed with such titles as Solennissimo Vocabulista or Libro Utilissimo, the Introito became the basis for a number of polyglot dictionaries: a Vocabolarius Quatuor Linguarum (1510), with Latin, Italian, French, and German wordlists; three years later, Italian and French were dropped and replaced by Czech in order to produce a Dictionarius Trium Linguarum for an Eastern European market (but note the mediating role of Latin!); the following year another trilingual edition included Latin, French, and German. The four-language edition of 1510 added a Spanish wordlist in 1526, while another five-language dictionary, published in Antwerp in 1534, replaced German with Dutch, which was inserted in a pre-eminent position immediately after Latin as the first of the vernaculars. This latter edition became the basis of the Sex Linguarum expansion of the Introito e Porta, in which English first made its appearance in polyglot lexicography: this dictionary was published in ‘Southwarke’, London, in 1537, with English added as the last language. In turn, this London edition became the basis of the Septem Linguarum (undated, but published c. 1540 in Antwerp), in which German was reintroduced. The first compilation with eight languages was published in Paris, and accordingly, had a French title, Le Dictionnaire des Huict Langaiges (1546): the newcomer was Greek, preposed to Latin. All in all, by the mid-1630s, eighty-nine editions of the polyglot dictionaries derived from the Introito e Porta had been published in a number of European countries.
Noël de Berlaimont’s Vocabulare (1530), a Dutch–French wordlist that is preserved only in its second edition of 1536, had a similar gradual expansion into a polyglot dictionary, of which more than ninety editions were published. Let it suffice to say that, here again, the English wordlist was included in this dictionary fairly late, in the French Dictionnaire en six langues (1576), which included Dutch, English, German, French, Spanish, and Italian; notice that the three Germanic languages precede the Romance ones here. The polyglot dictionaries, which derived from both Rottweil’s Introito e Porta and Berlaimont’s Vocabulare, had a practical orientation, as is shown by the compilers’ choice of the lexical items to be listed and by the very layout of the page, with each language given its own column for quick and easy consultation.
Other polyglot dictionaries in early modern Europe had different sources and a more learned scope and aim. One is the very famous Calepinus, that is to say, the Latin Dictionarium of Ambrogio Calepino, an Italian Augustinian friar and lexicographer, first published in Reggio, a town in Northern Italy, in 1502, and very often reprinted. Calepino’s Dictionarium did not only become the basis of Elyot’s Latin–English dictionary, as mentioned above, but was also the starting point of a number of polyglot dictionaries, gradually expanded up to an eleven-language edition published in Basel in 1580; this included Latin, Greek, German, Dutch, French, Italian, Spanish, and Hebrew wordlists from an earlier edition, into which three newly listed languages – Polish, Hungarian, and English – were incorporated. The English language had to wait five more years to be added to the Nomenclator omnium rerum, a polyglot dictionary with a topical arrangement originally composed as such and first published in 1567 by Hadrianus Junius, or Adriaen de Jonghe, a Dutch physician and classical scholar. Neither Calepinus’s nor Junius’s compilation has a primarily practical purpose: their wordlists also include learned words, and the different languages are not given their own column; instead, each headword is followed by its equivalents in the various languages, so as to make something like a philological note, and above all give testimony to the lexical richness of the languages of Europe.
Although most polyglot dictionaries originated on the Continent, with the English language added at a later time and playing a secondary role in them, an outstanding English contribution to European polyglot lexicography can be mentioned: this is John Minsheu’s Ductor in Linguas: The Guide into Tongues, published in London in 1617, the first polyglot dictionary with English headwords. In The Second Epistle to the Reader, Minsheu explains that he placed English before the other tongues:
for the vse chiefly of our owne Nation, or others that vnderstand the English Tongue, to find out any Word by order of Alphabet they call or looke for, and so by that to have a fit French, Italian, Spanish word, to speake or write, (in which Calepine is very faultie) besides to haue the Etymologies of them as of all the rest, (the better euer to hold them in their memorie) which none other yet euer hath performed.
In other words, Minsheu proudly claims to have compiled a productive polyglot dictionary for his countrymen and a receptive one for speakers of a large number of other languages, as The Guide into Tongues includes eleven of them: in fact, English headwords are followed by Welsh, Dutch, German, French, Italian, Spanish, Portuguese, Latin, Greek, and Hebrew equivalents, to make a huge folio of over 500 pages. The entries include citations, cognates, and etymologies, which aimed to help dictionary users learn multiple languages. Minsheu’s sources comprise a variety of Continental dictionaries: Calepinus is mentioned here, and it should also be remembered that Minsheu had compiled a Spanish–English bilingual dictionary, first printed in 1599.
The idea that comparing words and their etymologies in a polyglot dictionary might make language-learning easier and faster is also found in the prefatory matter to James Howell’s Lexicon Tetraglotton, the second large polyglot dictionary to be printed in seventeenth-century England. This is, however, a less learned book than Minsheu’s, a compilation including both an alphabetical and a topical dictionary in four languages, and a collection of proverbs. Howell was not new to lexicography: in 1650 he had revised the second edition of Randle Cotgrave’s French–English Dictionary (1632) to which Robert Sherwood had added an English–French section, and in his address To the tru Philologer preceding the Lexicon proper, Howell explains that his work includes ‘very many recent words in all the fower languages which were never inserted’ in the earlier bilingual dictionaries of John Florio, Randle Cotgrave, and John Minsheu. He fails to mention, however, that he used Sherwood’s English wordlist to compile the Lexicon Tetraglotton.
It is usually said that polyglot dictionaries went out of fashion, in Britain as elsewhere in Europe, before the end of the seventeenth century. This is true if one thinks of the more learned kinds of compilations such as Minsheu’s or the ones derived from Calepino’s Dictionarium. However, polyglot dictionaries meant to satisfy practical communicative needs kept on being compiled and printed, as shown by the following examples from the last decade of the eighteenth century. First, The Soldier’s Pocket-Dictionary, or Friend in Need: Being a Vocabulary of Many Thousand Words, Terms, and Questions, In General Use, and most likely to occur in Military Service, Expressed in Six Languages, viz. English, German, Dutch, French, Italian, and Spanish. To Which Are Annexed, Accurate Tables of the Coins of Various European Nations was compiled by one Capt. James Willson, of the Marines, and printed in London in 1794; it was meant for the British expeditionary force operating on the Continent with the Dutch and Austrian armies as part of the anti-French coalition. Intended for a completely different readership was the Universal European Dictionary of Merchandise, compiled by the German-born Philipp Andreas Nemnich and published in London in 1799; it had entries in English, German, Dutch, Danish, Swedish, French, Italian, Spanish, Portuguese, Russian, Polish, and Latin. Other compilations by this prolific writer included such works as Catholicon oder encyclopädisches Wörterbuch aller europäischen Sprachen (1791), Allgemeines Polyglotten-Lexicon der Natur-Geschichte (1793–8), and Lexicon nosologicum polyglotton omnium morborum (1801), which show Nemnich’s interest in joining the centuries-old tradition of polyglot lexicography with eighteenth-century encyclopaedism.
Bilingual and Trilingual Dictionaries
Quite unsurprisingly, the first early modern bilingual dictionaries published in England involved English and Latin. These were not derived from the extant Middle English glossaries, since sixteenth-century humanist lexicographers preferred turning to the Continent for their sources and models. As has already been mentioned, Elyot’s dictionary of 1538 was mainly based on Calepinus, and when a new edition came out in 1542 as the Bibliotheca Eliotae. Eliotis Librarie, another Continental lexicographer, the French Robert Estienne (or Robertus Stephanus) provided new material through his Latin Dictionarium, seu Latinae Linguae Thesaurus (1531) and Dictionarium Latino-Gallicum (1538). A later edition of this latter dictionary, together with the Swiss Johannes Fries’s (or Frisius) Dictionarium Latino-Germanicum (itself based on Estienne’s Thesaurus), became the main sources of the biggest early modern Latin–English dictionary, this being Thomas Cooper’s Thesaurus Linguae Romanae et Britannicae of 1565, running to 1,812 folio pages. Smaller in bulk was the last English–Latin dictionary compiled in sixteenth-century England, John Rider’s Bibliotheca Scholastica, first published in 1589, but frequently reprinted as Riders Dictionarie. This was not meant for a learned readership but for a wider one; more importantly, editions of Riders had an impact on seventeenth-century English monolingual dictionary-making because lexicographers extracted a general English wordlist from it.
Latin, however, was the language of Papist Rome, and the Reformation of the 1530s as well as the increasing importance of Tudor Britain favoured the publication of bilingual and trilingual dictionaries where English was set against other living European languages, especially French, Italian, and Spanish, rather than Latin. A political motive was behind the compilation of the very first of them, John Palsgrave’s Lesclarcissement de la langue francoyse (1530): King Henry VIII’s sister was to marry the King of France, Louis XII, and Henry chose Palsgrave to teach her the language of her future husband and subjects; in time, Palsgrave wrote Lesclarcissement, which includes the first grammar of the French language and a French–English dictionary. The leading role of the Italian language and literature in the European Renaissance, moreover, brought the Welsh William Thomas to compile and publish the Principal Rvles of the Italian Grammar, with a Dictionarie for the Better Vnderstanding of Boccace, Petrarcha, and Dante (1550), which was based on two Italian sources. Not until after the defeat of the Armada in 1588 did Spanish become part of English bilingual lexicography: The Spanish Grammer: With certeine Rules teaching both the Spanish and French tongues, published by John Thorius in 1590, had a Spanish–English glossary appended to it; the following year, Richard Percyvall’s Bibliotheca Hispanica came out, including, like Palsgrave’s and Thomas’s books, a grammar as well as a dictionary. This proves that these early bilingual dictionaries were meant to help English people study the most important European languages rather than to celebrate the beauty of English or at least its linguistic adequacy.
More and more bilingual and trilingual dictionaries were published in the early modern period, with the usual giving and taking of models and lexicographical material, both within the national lexicographical tradition and beyond it. A few representative examples will suffice. To Robert Estienne’s Dictionariolum Puerorum Latino-Gallicum (1542) were added English translation equivalents by John Veron who, ten years later, published his Dictionariolum Puerorum, Tribus Linguis, Latina, Anglica et Gallica Conscriptum. The well-known Italian–English dictionaries published by John Florio relied on both English and Continental material: A Worlde of Words (1598) has a list of seventy-two texts and authors, most of them Italian literary works, but also bilingual dictionaries, although Florio does not mention the dictionary he borrowed most words and definitions from, that is to say, the enlarged 1592 edition of Thomas Thomas’s Dictionarium Linguae Latinae et Anglicanae. The seventy-two cited works became 252 in Florio’s second Italian–English dictionary, Queen Anna’s New World of Words (1611). This was later (1659) revised by Giovanni Torriano: new entries in the Italian–English wordlist were taken from the 1623 edition of the Italian Vocabolario della Crusca and from Cotgrave’s French–English dictionary, but Torriano’s real innovation was the English–Italian section, though derived from a single source: the English–French dictionary that Robert Sherwood had included in the second edition (1632) of Cotgrave’s work. A final example, out of many that could be mentioned, concerns Captain John Stevens’s Dictionary English and Spanish (1705): Stevens used Minsheu’s bilingual dictionary of Spanish and English as the basis for his compilation; the additions in the wordlist were taken from Torriano and from Abel Boyer’s French and English Royal Dictionary (1699); as for the equivalents, Stevens relied on Minsheu’s polyglot Guide into Tongues.
Monolingual Dictionaries
It has been calculated that about 170 bilingual, trilingual, and polyglot dictionaries and wordlists preceded the publication of the first monolingual English dictionary, Cawdrey’s Table Alphabeticall, in 1604. Cawdrey’s work inaugurated the so-called tradition of hard-word lexicography, as the earliest monolingual dictionaries of English only covered, as Cawdrey’s title page states, ‘hard vsuall English wordes, borrowed from the Hebrew, Greek, Latine, or French. &c’. Indeed, they were meant to teach ‘Ladies, Gentlewomen, or any other vnskilfull persons’ (that is to say, people who could read but had not studied Latin and Greek) the spelling and meaning of learned and foreign borrowings lately added to the English word-stock. These dictionaries, therefore, derived their wordlists from spelling books, from glossaries of technical terms appended to specialised texts, and above all from earlier Latin–English dictionaries, thus establishing a link, albeit indirect, with the Continental tradition of Latin lexicography.
Another hard-word lexicographer, the barrister Thomas Blount, was ready to write as follows in the preface To the Reader of his Glossographia (1656):
I profess to have done little with my own Pencil; but have extracted the quintessence of Scapula, Minsheu, Cotgrave, Rider, Florio, Thomasius, Dasipodius, and Hexams Dutch, Dr. Davies Welsh Dictionary, Cowels Interpreter, &c. and other able Authors, for so much as tended to my purpose.
Some of Blount’s real sources (John Bullokar’s An English Expositor, another hard-word dictionary, and John Rastell’s Les Termes de la Ley) are not mentioned. More revealing, however, is the variety of the dictionaries referred to here: Joannes Scapula was a German lexicographer who published a very successful abridgement of Henri Estienne’s Thesaurus Graecae Linguae; Minsheu, Cotgrave, Rider, and Florio need no explanation; Thomasius is the Latinised name for Thomas Thomas; Dasipodius was the Strasburg schoolmaster and lexicographer Peter Hasenfuss (or Petrus Dasypodius) who wrote a Latin–German dictionary; Henry Hexham was an English soldier, translator, and lexicographer who compiled the first English–Dutch dictionary in 1647; Dr John Davies published a Welsh grammar in Latin, and a bilingual dictionary of Welsh and Latin in 1632; finally, The Interpreter was a very influential law dictionary published by the jurist John Cowell in 1607. In short, Blount’s seemingly self-effacing statement may instead be read as his attempt to fit in with the Renaissance tradition of European lexicography.
Hard-word lexicography came to an end in the early eighteenth century when the first general (or universal) English dictionaries were published, because the likes of John Kersey and Nathan Bailey thought that the ordinary words of English should be included in monolingual dictionaries. What they needed were substantial wordlists running through the alphabet from which they could easily lift the everyday words that were not to be found in seventeenth-century hard-word dictionaries. They found them in English–Latin and English–French dictionaries, the Latin and French definitions being translated into English or replaced by new ones. As a result, here again, the production of English monolingual dictionaries came to be rooted in the European tradition of dictionary-making.
Samuel Johnson’s Dictionary: Redressing the Balance
Up to the early eighteenth century, English dictionary-making played a marginal role in Europe; the taking – of lexicographical models and material – was much more relevant than the giving. Then, thanks to a number of different lexicographers, first and foremost Samuel Johnson, the balance was redressed.
Research, mainly Robert DeMaria’s, has demonstrated that A Dictionary of the English Language (1755) should be placed in the humanist tradition of European learning, and that Johnson’s verbal universe was not limited to eighteenth-century English but extended back to sixteenth- and seventeenth-century Latin and Continental culture. From this point of view, the celebration of the newly printed Dictionary as a masterpiece that rivalled and surpassed the great Italian and French academy dictionaries (think of Garrick’s laudatory lines: ‘And Johnson, well arm’d like a hero of yore, / Has beat forty French, and will beat forty more’) might well be meant to promote the Dictionary and its author, but it was conceptually wrong: Johnson’s achievement was heroic indeed, but the hero’s arms had been taken from the Continental armoury! A more balanced attitude was shown by Johnson himself, in his preface to the Dictionary, when he wrote: ‘I have devoted this book, the labour of years, to the honour of my country, that we may no longer yield the palm of philology, without a contest, to the nations of the continent’; and although two paragraphs later he stresses that his work was not compiled ‘under the shelter of academick bowers’, Johnson must have felt that, by writing his Dictionary, he had become part of the ideal community of European scholars, an academician although there was no academy in Britain.
More than anything else, Johnson’s use of quotations to illustrate the meaning and usage of words was found impressive – and rightly so, since the Dictionary, in support of 42,773 entries, contains about 110,000 quotations. Quotation gathering for dictionaries, however, was no new lexicographical practice, as is witnessed by a number of Continental dictionaries of Greek and Latin, the Vocabolario degli Accademici della Crusca (1612), and the French dictionaries of Richelet (1680) and Furetière (1690). In more general terms, when trying to focus on the books that may have served Johnson as a model or provided him with material for his Dictionary, three pieces of evidence should be analysed. The first is the five-volume Catalogus Bibliothecae Harleianae (1743–5); that is, the catalogue of the vast library of Edward Harley, second earl of Oxford, that Johnson helped prepare: the library contained nearly 350 dictionaries, and it is likely that Johnson went through many if not all of them, and was able to examine the methodology of the lexicographers that would become his predecessors. The second consists of the dictionaries that Johnson himself mentions in his preface and in a large number of entries: research has shown that Johnson refers to Robert Ainsworth’s Thesaurus Linguae Latinae Compendiarius (1736) 584 times, and to Nathan Bailey’s Dictionarium Britannicum (1730 and 1736) 197 times; moreover, there are 1,144 references to ‘Dicts’, which is an unspecified reference to a dictionary (very often Bailey’s again). The preface also comments on Stephen Skinner’s and Francis Junius’s etymological dictionaries, and quotations in the entries also come from a variety of dictionaries, including Minsheu’s polyglot compilation, Boyer’s Royal Dictionary, Cowell’s legal lexicon, and Chambers’s Cyclopaedia.
The third piece of evidence is the sale catalogue of Johnson’s library, printed after his death: Johnson’s familiarity with Renaissance lexicography is proved by the fact that he owned such dictionaries as Estienne’s Thesaurus, Gerhard Vossius’s Etymologicon Linguae Latinae (1695), Robert Constantine’s Lexicon Graecolatinum (second edition 1592), and an eight-language edition of Calepine.
Therefore, however innovative Johnson’s Dictionary was for English lexicography, his work was firmly grounded in European humanism and the tradition of Continental dictionary-making, and showed remarkable affinity with it. Arguably, this is one of the reasons that his masterpiece captured attention throughout Europe as soon as it was published. Indeed, Johnson’s ideas as a lexicographer had been appreciated long before: the July–September 1747 issue of the Bibliothèque Raisonnée des Ouvrages des Savants de l’Europe, a quarterly journal published in Amsterdam, warmly praised the recently printed Plan of a Dictionary of the English Language, which detailed Johnson’s proposed methodology for compiling the dictionary.
Since it first appeared on 15 April 1755, A Dictionary of the English Language has exerted its influence in Britain and beyond. It comes as no surprise that Johnson’s Italian friend, Giuseppe Baretti, systematically consulted Johnson’s Dictionary when revising the English–Italian section of Ferdinando Altieri’s bilingual dictionary that he published in 1760. In 1777, Ferdinando Bottarelli, another Italian living in London, published a pocket dictionary of the Italian, French, and English languages: the title page mentions, as Bottarelli’s sources, the dictionaries of the French and Italian academies, and Johnson’s – in a way, acknowledging the Englishman’s dictionary as the extraordinary product of a one-man academy. Indeed, the identification of Johnson’s achievement with that of the Italian and French academies had already been made: as soon as the Dictionary was published, a copy was sent to the two academies, and they gratefully reciprocated with their own. More significantly, when in the early nineteenth century the Crusca academy appointed a committee to revise its Vocabolario, the prologue to the Castilian dictionary of 1770 and Johnson’s preface were translated into Italian and published in 1813 in order to help improve the methodology behind the Vocabolario. A few years later, between 1818 and 1824, the Italian classicist poet Vincenzo Monti published the volumes of his Proposta di alcune correzioni ed aggiunte al Vocabolario della Crusca (Proposal for some emendations and additions to the Crusca vocabulary). The second volume (1819) included a 52-page essay by the Piedmontese Giuseppe Grassi entitled Parallelo del Vocabolario della Crusca con quello della Lingua Inglese compilato da Samuele Johnson e quello dell’Accademia Spagnola ne’ loro principj costitutivi (Comparison between the Vocabolario della Crusca and the dictionaries of the English language, compiled by Samuel Johnson, and of the Spanish Academy in their guiding principles).
Here, Johnson’s is defined as ‘the most philosophic dictionary of all living languages’, and some entries are translated into Italian in order to compare them with the corresponding ones in the Vocabolario. France did not behave differently: in 1778, when Voltaire urged the Académie française to revise its dictionary, he proposed Johnson’s Plan as a model, and in 1823, when a new edition of the Dictionnaire was being prepared, François Andrieux, possibly influenced by what had been happening in Italy, translated the preface into French.
Apart from its impact as a lexicographic model, the wordlist in Johnson’s Dictionary was incorporated into European bilingual dictionaries. In Germany, Johann Christoph Adelung’s Neues grammatisch-kritisches Wörterbuch der englischen Sprache für die Deutschen (1783) was a translation of the Dictionary’s fourth edition, up to the letter J. The second volume of this English–German dictionary (letters K to Z) appeared in 1796, and in the following years, the German–English section was added. Interestingly, the prefatory matter of Adelung’s Wörterbuch included his assessment of Johnson’s work; this was translated into English by Anthony Willich and published in 1798 as On the Relative Merits and Demerits of Johnson’s English Dictionary. In France, Alexander Spiers compiled Dictionnaire général anglais-français, nouvellement rédigé d’après Johnson, Webster, Richardson, … (1846–9). Johnson’s masterpiece was also used by the lexicographers of Portugal, Sweden and the Netherlands, not to mention being influential across the Atlantic (with Webster’s early reliance on Johnson, despite his later criticism, and Worcester’s defence of Johnson’s work) and even beyond (an 1851 Bengali–English dictionary used Johnson as a model).
Conclusion
Johnson’s was the most important, but not the only, English lexicographical resource for dictionary-making available in eighteenth-century Europe. For example, what was published in London in 1727 as the second volume of Nathan Bailey’s The Universal Etymological English Dictionary (actually, a supplement) included An Orthographical Dictionary – originally compiled for foreign as well as English readers – that was translated into German by Theodore Arnold and published as Mr Nathan Bayley’s English Dictionary (1736). Another pre-Johnsonian work, A New General English Dictionary by Thomas Dyche and William Pardon, first published in 1735, was mentioned by Denis Diderot in the 1750 Prospectus of the Encyclopédie as a dictionary to which the latter work was indebted; as a matter of fact, in the 1750s, French editions of Dyche and Pardon’s dictionary were published in Avignon, Paris, and Amsterdam. Even more surprisingly, the influence of English lexicography reached as far as Russia: Slovar’ na shesti iazykakh (A Dictionary in Six Languages) is a multilingual dictionary of 1763, which included Russian, Greek, Latin, French, German, and English words; its preface identifies as its source John Ray’s Dictionariolum Trilingue (English, Latin, and Greek), reissued in 1696 under the title Nomenclator Classicus.
These and most of the above examples demonstrate that the art and craft of lexicography developed harmoniously in Europe, different national traditions of dictionary-making smoothly merging with each other, cross-fertilising each other, and contributing to a common heritage. John Considine is the historian of lexicography who has most convincingly argued that ideas of heritage can help us understand how a European lexicographical tradition evolved and how European lexicography has come to have a recognisable identity since the Renaissance.
For historical, socio-political, and linguistic reasons, English lexicography was not at the forefront of dictionary-making in the early modern period; however, by the eighteenth century the situation had changed, and English lexicography set the stage for the great influential innovations of the nineteenth century – above all, the Oxford English Dictionary – and of the twentieth and twenty-first centuries, for example, the Learner’s Dictionaries of English for the international market that have revolutionised pedagogical lexicography in Europe and across the globe, as well as the dictionaries of varieties of English and of endangered languages all over the world.
In keeping with its character as evasive and subversive language, slang is hard to define. Some see it as urban masculine vocabulary focused on sex, intoxication, and excretion (Green 2015), others as instrumentally valuable in the construction of in- and out-groups (Eble 1996), or as a matter of style to facilitate fitting in and standing out (Adams 2009). Coleman (2012, 57–8) identifies a coherent set of eleven ‘ideal conditions for slang’, including an ‘accepted form of the language which it exists within and rebels against’, a hierarchy of which the bottom rung includes group solidarity, ‘dense social networks’, and ‘a real (or perceived) threat to individuality and self-expression’. Such disagreements about the slang concept have not impeded the progress of slang lexicography, however.
One might mistakenly assume that slang is of its nature a reaction to standard English, and while it certainly does dissent from standard varieties of English worldwide, standard varieties are not required. Local norms are sufficient, norms imposed from the top down, inhibiting the young but also those dissatisfied with other local hierarchies. Most extant early English texts are not slangy, but if we subscribe to the Uniformitarian Principle, we must suppose that language has always included a slang dimension. Slang lexicography, on the other hand, rises only after codification of standard English requires codification of its anti-languages, which makes it a thoroughly modern enterprise. The first glossary of thieves’ slang or cant appeared in 1566; slang lexicography has thrived ever since, but has undergone radical transformations in step with lexicography of standard English along the way.
Hard, Bad Words: The Earliest English Slang Dictionaries
The pre-history of English slang dictionaries begins as early as the brief wordlist – just over 100 items – that Thomas Harman appended to his A Caveat or Warening for Commen Cursetors (1566), a pamphlet that exposed how vagabonds and beggars practised on unwary citizens – public anxiety was high in the age of so-called ‘masterless men’. The cant he reported was so interesting to readers that Caveat went rapidly through several printings and the wordlist ended up in books by others, sometimes wholesale, sometimes adapted, in a textual heritage that extends into the seventeenth century, with Martin Mark-all, Beadle of Bridewell (1610), by the semi-anonymous S. R. Interest in cant renewed in mid-century, with Richard Head’s The English Rogue (1665), with a wordlist of fewer than 200 items, though the list was lengthened slightly between the two editions published that year. As with Harman, several later authors borrowed Head’s glossary, through to the History of the Lives and Actions of Jonathan Wild (1725). (The textual interrelationships among wordlists in both the Harman and Head families are accounted for thoroughly in Coleman Volume 1, 20–75.)
The cant wordlist tradition ended in 1725, but the first dictionary of English slang had already appeared by then, in 1699. The transition from list to book was no coincidence. The earliest English dictionaries were wordlists, too, but included considerably more entries than Harman’s and Head’s. They focused on supposedly ‘hard words’ rather than general vocabulary, and one might view the cant wordlists as in the ‘hard words’ vein of lexicography – the language of thieves, vagabonds, and rogues was, for the average speaker of English, as hard to understand as Latinate terms were for schoolboys. To grasp the meaning of such terms, all one needed was a more familiar gloss, a synonym or two. Such dictionaries defined words but did not treat them as objects worthy of extended investigation.
Over the course of the seventeenth century until publication of Samuel Johnson’s A Dictionary of the English Language (1755), lexicographers developed increasingly complex dictionary structures motivated by new assumptions about lexical knowledge and readers’ interests. Edward Phillips’s The New World of Words (1658), Elisha Coles’s An English Dictionary (1676), John Kersey’s A New English Dictionary (1702), and Nathan Bailey’s Universal Etymological Dictionary (1721) established this lexicographical trajectory (for commentary, see Starnes and Noyes 1991 and Read 2003). The New World of Words treated approximately 11,000 words and is more closely aligned with the ‘hard words’ tradition than the other two; An English Dictionary ran to as much as 25,000 items, including cant terms from Head’s list; A New English Dictionary included still more, at roughly 28,000 entries. Along the same line, that is, in keeping with contemporary developments in lexicography, the semi-anonymous B. E., Gent., published A New Dictionary of the Terms Ancient and Modern of the Canting Crew (1699), narrower in scope, so on a smaller scale, at some 4,000 items.
Several features of B. E.’s dictionary follow modern dictionary practice. First, it provides multiple senses for polysemous words, especially those with standard as well as cant, slang, or jargon meanings. Second, it labels register, so that cant terms are marked with a ‘c’. Third, it ventures several etymologies, my favourite of which is that in the entry for taudry:
garish, gawdy, with Lace or mismatched and staring Colours: A Term borrow’d from those times when they Trickt and Bedeckt the Shrines and Altars of the Saints, as being at vye with each other upon that occasion. The Votaries of St. Audrey (an Isle of Ely Saint) exceeding all the rest in the Dress and Equipage of her Altar, it grew into a Nay-word, upon any thing very Gawdy, that it was all Taudry, as much as to say all St. Audrey.
It sounds like a perfect folk etymology, but it’s actually more or less correct – the OED says so (s.v. tawdry lace). Most important, however, is not the truth of the matter, but that from B. E. forward, slang lexicography assumes that readers want etymologies, analysis of polysemy, labelling, etc., and that any adequate explanation of English vocabulary – cant and slang included – depends on such explanatory features. Slang lexicography thus emulated lexicography of standard English.
English Vulgar and Flashy
The great event of eighteenth-century English slang lexicography was the publication of Francis Grose’s Classical Dictionary of the Vulgar Tongue (1785). The third edition (1796) saw several printings. Captain Grose was an officer successively in the Hampshire and Surrey militias but also a Fellow of the Society of Antiquaries, an avid collector of provincial words, as well as cant and slang. His Classical Dictionary was somewhat smaller than B. E.’s, at just under 3,900 entries. He viewed slang as a broad concept, something like Eric Partridge’s ‘unconventional English’, including the innocuous babble and the vulgar and obscene cunt. His dictionary includes features that suggest he paid attention, not only to general dictionaries of the period – Bailey’s and Johnson’s – but also to B. E.’s. Some entries include etymologies, others observations on usage, still others labels. Perhaps most significantly, Grose on occasion refers to other dictionaries and, as Coleman (Volume 2, 21) shows, provides illustrative citations in 11.5 per cent of entries, improving on B. E., though falling far short of Johnson. Grose’s dictionary, itself perennially successful, gave rise to any number of other slang dictionaries (see Coleman Volume 2, 72–105). Until the middle of the nineteenth century Grose was the pre-eminent influence on slang lexicography and chief provider of knowledge about English slang.
Grose’s Classical Dictionary varies in tone: generally, it is detached (or scientific), but sometimes preceptive (or normative), and occasionally even facetious. Grose saw no reason to write in one and only one tone, and lexicographical theory and practice had not yet regularised dictionary style to an absolute seriousness. Johnson wrote facetiously, for instance, in his infamous definitions of lexicographer, ‘a writer of dictionaries; a harmless drudge that busies himself in tracing the original, and detailing the signification of words’ and oats, ‘a grain, which in England is generally given to horses, but in Scotland supports the people’. In following Johnson’s practice, Grose merely kept pace with the lexicography of his time. In the second edition, for instance, he includes an entry for Richard Snary, with the following definition and etymology: ‘A dictionary. A country lad, having been reproved for calling persons by their Christian names, being sent by his master to borrow a dictionary, thought to shew his breeding by asking for a Richard Snary’. Were the story true, it hardly warranted the entry, but one suspects Grose of having invented the word and written a folk etymology to justify it, all for fun. Today, we think of dictionaries as scientific reference works rather than joke books, but Grose’s occasionally facetious style reminds us that not all audiences turn to dictionaries for information only or prefer a detached tone. Cant, jargon, and slang tempt wit. Readers of slang dictionaries often hope to be entertained, and, sometimes, their correspondent lexicographers have obliged.
In Life in London (1821), Pierce Egan brought ‘flash’ language to national and even international attention. Flash admits several related senses, not easily teased from one another. Following the OED (s.v. flash, adj.3), it can mean ‘connected with or pertaining to the class of thieves, tramps, and prostitutes’, but also ‘connected with […] the class of sporting men’ and ‘knowing, wide-awake, “smart”, “fly”’, as well as ‘dashing, ostentatious, swaggering, “swell”’. In the conflation of these senses, flash raised the social prestige of slang. Slang might be vulgar and cant the language of thieves, but neither was only that – representatives of high and low life mingled around the ropes of the boxing ring. In the wordlists of Life in London, Egan elevated both slang and its lexicography. On the heels of that book, he brought out a new edition of Grose’s third edition of the Classical Dictionary of the Vulgar Tongue (1823). Thus, he moved into new sociolexical territory while acknowledging and extending an earlier tradition of slang lexicography.
Victorian Slang Dictionaries: From Hotten to Farmer and Henley
John Camden Hotten was a London bookseller. The first book he published was his own Dictionary of Modern Slang, Cant, and Other Vulgar Words (1859), which registered over 2,000 slang words. The work was revised four times, and each edition included new entries. Indeed, by the time of the posthumous fifth edition (1874), the dictionary had expanded to include more than twice the original entries. No major dictionary of English slang had appeared after Egan’s edition of Grose, so Hotten’s filled the gap, satisfied the public appetite for slang, and remained dominant until nearly the end of the century, when John Stephen Farmer and William Ernest Henley published their far superior Slang and Its Analogues, Past and Present (1890–1904), in seven volumes.
Far from perfect, Hotten’s dictionary was more professional than Grose’s. Following post-Johnsonian dictionaries of standard English, he introduced new features into slang dictionaries, for instance, frontmatter that explains slang and its history. He also intensified features by then considered essential to a good dictionary: roughly half of his entries attempt etymologies, although, as he points out in a cautionary note on the first page of entries in the third edition: ‘Slang derivations are generally indirect, turning upon metaphor and fanciful allusions, and other than direct etymological connexion. Such allusions and fancies are essentially temporary or local; they rapidly pass out of the public mind; the word remains, while the key to its origin is lost’ (Hotten 1865, 65). Grose had labelled cant as distinct from slang, and thus slang emerged as a register on his watch; but Hotten was the first to use slang in his dictionary’s title, after which it became the umbrella term for several registers, from cant to flash. He was the first lexicographer to treat rhyming slang and back slang.
Hotten drew on a fairly extensive bibliography of printed works, but he also conducted fieldwork, lurking in alleyways himself and engaging agents to collect yet more material, probably under the influence of Henry Mayhew, whose London Labour and the London Poor (1851) he acknowledged as a source. All these features of method and analysis represent advances in English lexicography. Hotten, though a publisher of American works, stuck with English slang in his dictionary. One might criticise him for such parochialism, but perhaps he simply knew his audience.
More significantly, he fell short of expectations raised by Grose and defended by his successors in the tradition of slang lexicography, Farmer and Henley, though he acted wholly in keeping with Victorian manners: ‘Filthy and obscene words’, he wrote, ‘have been excluded’. While heralding his incrementally increasing wordlist, Hotten ignored many words in the slang register of English. Farmer and Henley supplied this deficiency and ended up in court defending the principle that recording obscene vocabulary in a dictionary is not the same as committing obscenity. They lost the case in 1891, but the argument has since justified not only slang dictionaries but all dictionaries in their treatment of the whole vocabulary, rather than just the words some people prefer to hear or read. Mounting this defence was perhaps their greatest gift to lexicography and linguistics.
Yet Slang and Its Analogues achieved much more than mere principle. At last, a dictionary of English slang included American and Australian slang. The number of entries correspondingly expanded to approximately 20,000, leaps and bounds beyond any previous slang dictionary. And, doubtless with the OED’s example in mind, about 50 per cent of entries include citations. Nearly all entries employ usage labels, nearly three-quarters indicate grammatical function, and some supply etymologies. Farmer and Henley also elevated the most boring element of dictionary entry structure, cross-references – theirs are frequent and accurate. In all of these features, Farmer and Henley converged on the canons of lexicography established by mainstream nineteenth-century dictionaries of English – Webster, Richardson, Ogilvie, and ultimately the OED. Still, their practice fell short of those models. It took another century before anyone produced even part of an historical dictionary of English slang to complement the OED.
The Partridge Family: Slang Lexicography of the Twentieth Century
In the twentieth century, speakers of English gradually let down their hair, indulged teen culture, and levelled social hierarchies – lots more slang was spoken, heard, and printed than ever before, and there were many slang dictionaries of all imaginable sizes and of all imaginable niche vocabularies (Coleman Volume 4). Here, we can address only the most significant dictionaries of the twentieth and twenty-first centuries, the most ambitious and durably influential, the most comprehensive and, in lexicographical terms, historical.
Eric Partridge redefined slang lexicography in the twentieth century, not always for the better, but indelibly, and because his work was so popular, his dictionaries are significant, not just in the history of lexicography, but in the history of Anglophone culture. Partridge arrived in England from New Zealand, in 1921, on a Queensland Visiting Fellowship, entering Balliol College, and remained there, with a non-standard accent, for the rest of his life. His undergraduate years at the University of Queensland had been interrupted by service in World War I, and out of that experience, he and John Brophy edited Songs and Slang of the British Soldier 1914–1918 (1930). The following years saw publication of American Tramp and Underworld Slang (Irwin 1931), for which Partridge was assistant editor, and a new edition of Grose’s third edition of 1796 (1931), all three works published by Partridge’s own Scholartis Press. By the time his book, Slang Today and Yesterday (1933), appeared, his interest and expertise in slang were well established.
Partridge published the twentieth century’s most important Anglophone slang dictionary, the Dictionary of Slang and Unconventional English (DSUE) (1937), which he revised six times before he died in 1979. In part, its significance lay in its length. The fifth edition (1961) ran to 1,362 double-column pages, comprising the original dictionary – with over 55,000 entries and a section of addenda including another 1,100 or so – and a supplement that gathered together the addenda for all previous volumes and yet newer material. DSUE is not an historical dictionary – its entries do not identify the first extant use of a headword and do not include a quotation paragraph in the style of the OED – but they do quote sources occasionally, refer to use in texts and earlier treatment in dictionaries, and opine on the period of earliest use. Partridge follows Johnson as much as he follows the developed lexicography – general/commercial or historical – of his time. Like Johnson, given the chance, he cannot resist a pithy, opinionated comment occasioned by a word or its definition. For instance, in the fifth edition supplement, he defines quota quicky as ‘A short British film put on a cinema programme to fulfil the regulation concerning the quota of British films to be used, in Britain, in proportion to foreign (including American) films: cinema world: 1936. (It doesn’t matter how short the films are; a nasty reflection on British films.)’ The definition and the parenthetical comment illustrate an encyclopaedic tendency that diverged from mainstream modern lexicographical practice.
Partridge’s title, A Dictionary of Slang and Unconventional English, anticipated and – in its copiousness – attempted to avoid the problems of slang as a term for linguists or anyone else. Yet, remarkably, Partridge refused to include American slang in DSUE, which certainly limited its scope and obscured the history of some Anglophone slang terms while simply ignoring others – many others. The American alternative, Harold Wentworth and Stuart Berg Flexner’s Dictionary of American Slang (1960), came to the game late and thin; in terms both of the number of entries and the density of treatment, it remained thin even in later revisions by Robert L. Chapman, compared to DSUE, which prompted Jonathan Lighter to attempt an historical dictionary of American slang, as discussed below.
Partridge wrote with assurance and, rhetorically, this served him well with a popular audience and to some degree teachers and scholars, too. Often, however, he is simply in error. As Coleman (Volume 4, 16) observes: ‘Providing precise dates for a notoriously slippery movement of terms between levels of informality was a bold development, but this appearance of authority is undermined by Partridge’s deductive dating’. So, Partridge correctly identifies O.K. as ‘orig. U.S.’, but misdates it terribly, at c. 1880, when in fact it appeared first in 1839, a much more precise date than Partridge could deduce from the evidence at hand. Etymologies are often as bad as the dates, partly because, though he was a good Latinist, Partridge had little to no direct knowledge of other Indo-European languages. As the novelist and sometime lexicographer Anthony Burgess (1980, 27) put it memorably, ‘he preferred a shaky etymology to none at all’, but etymology, especially by amateurs, requires restraint, a willingness to say nothing rather than introduce falsehoods into knowledge. Arguably, too, Partridge’s work lacked sociological perspective and grounding in the sort of speech facts uncovered by fieldwork, so it gradually lost status among language scholars to sociolinguistic treatments of slang and historical lexicography.
Still, DSUE’s reputation was so strong that Paul Beale could publish a substantially revised Eighth Edition as late as 1984. Then, Tom Dalzell and Terry Victor arranged with Routledge, publisher of DSUE from the beginning, to bring out a freshly conceived two-volume New Partridge Dictionary of Slang and Unconventional English (NPDSUE), the first edition of which appeared in 2006. First among NPDSUE’s virtues over the original is its comprehensive attention to American slang. Second, it ignores etymology entirely – thus minimising error – and dates according to other authorities – in most cases – rather than ‘deductively’. It labels items geographically – UK, US, AUSTRALIA – but not according to register or ‘levels of informality’. It is not an historical dictionary, but each entry is accompanied by one or two illustrative quotations.
Partridge’s DSUE appealed to readers partly because it was so idiosyncratic, to slang what H. W. Fowler’s A Dictionary of Modern English Usage (1926) was to English usage – pungent, thoughtful, occasionally witty, critical, intuitive, and so appealing rhetorically that, though wrong (in the case of Partridge) or opinionated (in the case of Fowler), though written for a specific generation, it could last unexpectedly and perhaps inexplicably for generations. Dalzell and Victor’s NPDSUE, in contrast, is the twenty-first century’s great general-purpose dictionary of Anglophone slang. It provides an option for lovers or students of slang lighter than historical slang dictionaries of roughly the same period.
Two Historical Dictionaries of English Slang
Towards the end of the twentieth century, two talented and unusually committed lexicographers, Jonathan Lighter and Jonathon Green, ventured into historical dictionaries of slang. Lighter published the first and second volumes of his as the Random House Historical Dictionary of American Slang (HDAS) (1994 and 1997), covering the alphabetical range from A to O, but a series of problems led to suspension of the project, which is still incomplete. Green later published his massive, three-volume Green’s Dictionary of Slang (GDoS) (Green 2010), perhaps the standard and indispensable dictionary of English slang for the foreseeable future.
Lighter was a precocious lexicographer, having compiled his first work, ‘The Slang of the American Expeditionary Forces in Europe, 1917–1919: An Historical Glossary’ (1972) – published as almost an entire issue of the journal American Speech – while he was an undergraduate at the University of Tennessee. Encouraged by this début, he started work on what would become HDAS, with a draft of the A section submitted as his doctoral dissertation. Random House published the first two volumes of HDAS but withdrew from the project after it was acquired by Bertelsmann and its language reference division was closed. Oxford University Press later adopted the project but has not proceeded with it. These abandonments have nothing to do with Lighter’s scholarship, and HDAS A–O rightly enjoys a high reputation among linguists, lexicographers, historians of English, and an admiring public.
Lighter plumbs the textual well of American slang underlying HDAS thoroughly and he analyses senses and defines like a new Webster. The entry for bull, for instance, unfolds into seventeen senses – some with subsenses – followed by accounts of bull’s use in phrases and proverbs, on the latter of which his treatment is reliably strong throughout the dictionary. Combinations like bull butter ‘margarine’ and bullshit – and even further extended combinations like bullshit bomber, ‘an aircraft or airman engaged in psychological warfare operations, as dropping leaflets or broadcasting propaganda messages’ – receive their own entries. In cases like this one, we see Lighter’s virtues as a definer: it is elegantly written, precise, and concise, but not too concise – a less adept definer might have stopped at ‘psychological warfare’, yet the examples of ‘psychological warfare operations’ clarify the aptness of bullshit in this context. The primary sense of bullshit is defined with similar grace and also care to delineate the word’s full semantic extension, as illustrated by the accompanying quotations: ‘lies, nonsense, exaggeration, or flattery; trickery or tomfoolery’.
As with Lighter and HDAS, Green had been working up to GDoS for a very long time. His Dictionary of Contemporary Slang (DCS) (Green 1984, 1992, and 1995) started with about 15,000 entries and grew slightly longer and heavier with each edition. Clearly, it helped satiate the reading public’s appetite for slang, and led the market for a spate of other user-friendly dictionaries of roughly the same period, such as the third edition of Robert L. Chapman’s Dictionary of American Slang (1995), which was based on Wentworth and Flexner; John Ayto’s Oxford Dictionary of Slang (1998); Tony Thorne’s Dictionary of Contemporary Slang (1990), now in its fourth edition; Richard Spears’ NTC’s Dictionary of American Slang and Colloquial Expressions (1989); and Pamela Munro’s Slang U. (1989), compiled from material developed by her students at the University of California, Los Angeles.
Yet Green was all along working on a more ambitious project, and DCS was immediately superseded by Cassell’s Dictionary of Slang (CDS) (1998 and 2005), which treats slang from the 1700s to what was then the present. CDS swallowed DCS whole, and, with the second edition, Green had added some 70,000 entries that stretched to 1,565 pages of double-column text. CDS entries were compact but indicated rough dates of use in brackets after the headword, identified region – it recorded slang from throughout the Anglophone world – and provided more etymological information than DCS. CDS was an unusual dictionary in both scope and style. Ayto includes dates at which items first entered English and usually a single quotation to illustrate use, as does Thorne, who also comments on usage and word history more often, and commendably so, than dictionaries of similar size and purpose usually do. But Thorne’s dictionary includes roughly 7,000 senses of slang words and Ayto’s over 10,000 items, in either case nowhere near the slang wordhoard collected in CDS.
Green’s ultimate project, towards which the other dictionaries were increasingly confident steps, was GDoS, which Julie Coleman (2012, 193) judged ‘the best historical dictionary of English slang there is, ever has been or (in print at least) is ever likely to be’. Originally published in three heavy volumes at 2,204 pages, including something like 53,000 entries, many of them treating impressively polysemous words, amply illustrated with more than 415,000 quotations, it justly deserves the accolades it has received. Based on an astonishingly full bibliography – roughly 6,000 works are listed, even though the list includes only those cited at least five times – it turns over lots of stones in the back alleys of slang c. 1000 to nearly the present, though, of course, many stones are still left unturned. However, since launching an online edition in 2016 (in partnership with David P. Kendal), Green has turned some more, adding 2,500 new entries comprising 5,000 new word senses, and more than 60,000 new quotations throughout the dictionary text.
The quotations are the lifeblood of GDoS or any other historical dictionary. They bring slang items both to light and to life. There is much to praise in Green’s execution, much to criticise, too (Adams 2012). Though one might complain about some of his etymologies, others reflect his profound familiarity with English popular culture, especially London’s, but those of the US and Australia, as well. For instance, he identifies the origin of foxy grandpa, ‘sly person, neither necessarily old nor a grandfather’, in ‘the cartoon character Foxy Grandpa, by C. E. Schultze (1866–1939), which appeared c.1900 and featured an adult who in a reverse of the usual cartoon situation, played tricks on children’. Perhaps Schultze’s work was already within Green’s knowledge when he crafted the entry, or perhaps he asked the intelligent question, ‘Why foxy grandpa?’ which then prompted a search that turned Schultze up. Either way, it is evidence of sound lexicographical intuition.
Green is a discriminating purveyor of slang: when it comes to sense analysis, he is a splitter, and he compiles compelling lists of compounds, phrases, and idioms towards the ends of entries – sometimes such lists occupy a lot of entry space, much to the reader’s benefit. GDoS is also the first fully global Anglophone historical dictionary of English slang, which is not to say that it includes all items of English slang worldwide, any more than it includes all English slang from ‘inner circle’ countries, but it stretches far beyond all other slang dictionaries in this respect. As a result, forms are labelled precisely – on any two-page spread, one can tell at a glance which words hail from the West Indies or South Africa, or are from the United States but originally, largely, or specifically African-American. In the digital edition, colourful flags of the Anglophone nations pin quotations to our mental maps of the world. Timeline charts also accompany entries, so that readers confront the intersectionality of space and time in historical lexicography.
Unfortunately, on publication of the print GDoS, some reviews, especially one by Simon Winchester (2012), praised GDoS at the expense of HDAS. Winchester claimed that HDAS is inferior to GDoS in nearly every way, but careful evaluation of the two dictionaries reveals their independent strengths, and, as Geoffrey Nunberg (2012) noted after reading Winchester’s review, several of Winchester’s claims were weak or false. Green replied to Nunberg with some corrections but also with considerable poise and grace. It seems very unlikely that HDAS will ever see completion, but for slang vocabulary in the alphabetical range A-O, we are fortunate to have two complementary historical approaches to American slang. Nonetheless, GDoS is superior to HDAS in at least five ways: it includes many more quotations; it accounts for Anglophone slang worldwide; it is complete; because it is complete, it is available in a very attractive online format; and because it is online, it can be updated and revised continually.
Conclusion
Speakers of proper English may look down on slang, though they probably use it occasionally. Yet even the purest of speakers, even those at the top of the social hierarchy, even well-intentioned prescriptive teachers and clergy cannot resist pleasuring themselves with slang dictionaries. From the beginnings of slang lexicography to this day, many readers have revelled in the exotic, illicit, and thrilling aspects of Anglophone culture from a textual distance. In the privacy of one’s parlour, one may experience less conventional lives vicariously. Entry by entry, dictionaries withdraw curtains of obscurity – rhyming slang, back slang, wild metaphor, items whose etymologies rest on specific cultural moments now lost to all but lexicographical memory – and there slang lies revealed, naked by definition.
The value of slang dictionaries exceeds this sort of lexical prurience, however. Slang dictionaries are important because they provide access to words and phrases, to morphological and metaphorical strategies, of various underworlds, youth culture and its in-grouping and out-grouping, workers protesting, everyday people muttering under their breath. Adequate histories of Anglophone culture and of the English language in England or North America or elsewhere require knowledge of those words and what they have meant to those who use them. Some speakers of English – indeed, some lexicographers – may turn their heads, avert their eyes, and otherwise avoid unconventional behaviour – including language – preserving their innocence, their cultural cleanliness, they suppose, by preserving their ignorance. But avoiding facts out of distaste or embarrassment seems intellectually irresponsible and culturally callous. Indeed, the historical dictionaries of Anglophone slang are perhaps most important of all because they are the most democratic. They enable historical and cultural eavesdropping, amplifying voices of real speakers past and present, from the street just below us or from great distances, voices absent from and therefore silenced by mainstream dictionaries and authorised histories.