Part II English Dictionaries Throughout the Centuries

Chapter 8 A Dictionary Ecosystem: Four Centuries of English Lexicography

Metaphors of evolutionary development should always invite critical scrutiny, and the narrative summed up in the title of Starnes and Noyes’s dangerous classic The English Dictionary from Cawdrey to Johnson has been misleading us all for decades. Robert Cawdrey’s Table Alphabeticall of 1604 and Samuel Johnson’s Dictionary of the English Language of 1755 were very different kinds of books, and in so far as they both play a part in an evolutionary story, it is not the story of how a small dictionary evolved into a larger and more complex dictionary. Rather – to keep our metaphor in touch with the world of living things – it is the story of the development of a whole ecosystem. The largest and most conspicuous members of this ecosystem, its charismatic megafauna, as it were, have never been as important as its smaller members. For the story of the English dictionary since the early seventeenth century has always primarily been a story of cheap, unpretentious, and fairly portable texts. These were often published in a long succession of editions, and conservative tendencies both in the dictionary-making process and in the dictionary-buying public enabled some dictionaries to remain in print for over a century, and in circulation for long after the date of the last printed edition. One must not be misled by the date of the first edition of John Bullokar’s English Expositor (1616) into seeing it simply as a dictionary contemporaneous with James I; in its last Dublin editions (1766, 1769, and 1775), it was a dictionary contemporaneous with George III.

The story which follows can be divided into a prologue and three phases: the age of cheap hand-press printing; the age of cheap mechanical printing; and the age of digital publication. By chance, these four periods can be roughly defined in terms of centuries: the fifteenth and sixteenth; the seventeenth and eighteenth; the nineteenth and twentieth; and the twenty-first.

Prologue

The prologue begins with an evolutionary dead end. The first printed dictionaries which included English were the Promptorium parvulorum, first published in 1499, with English headwords and Latin equivalents, and the Ortus vocabulorum, first published in 1500, with Latin headwords and Latin and English equivalents. These were descended from a Latin dictionary of the thirteenth century, the Catholicon of Giovanni Balbi of Genoa, adaptations of which had had English equivalents added to some of their entries from the late fourteenth century onwards. The Catholicon was superseded in the early sixteenth century by a new Latin dictionary, the Dictionarium of Ambrogio Calepino, and this was adapted to make a Latin–English dictionary, the Dictionary of Sir Thomas Elyot, published in 1538. The title of this book, which is simply an anglicisation of Calepino’s title, is the first attestation of Dictionary as an English word. The old Promptorium and Ortus, which had been published in a number of editions, were never published again, and a succession of Latin–English and English–Latin dictionaries based on Elyot were published in the remainder of the sixteenth century and into the seventeenth and beyond. The most elaborate of these were large, stout folio volumes, and none of them were pocket-sized.

Many smaller free-standing dictionaries which included English were produced in the sixteenth century. The most numerous of these were thematically ordered polyglot volumes, offering dialogues and wordlists, which presented English together with other European languages (including Latin). These portable little books ran to dozens of editions, most of them printed in continental Europe. Bilingual dictionaries of English with Law French, Welsh, vernacular French, Italian, and Spanish also appeared, the largest of them in non-portable formats. Difficult words were explained in a number of books in English, the explanations being sometimes presented in alphabetical glossaries or tables after the main text. The least sophisticated lexicographical texts were spelling books, which were intended for the use of small children or, at least in some later cases, illiterate adults (as in George Fisher’s The Instructor, or, Young Man’s Best Companion of 1733), and presented lists of words without definitions. The most successful of these to originate in the sixteenth century was Edmund Coote’s English Schoole-Maister (1596), which included a glossary of 1,368 hard words. The last surviving edition, of 1737, claimed that it was for the ‘Four and fiftieth time imprinted’. Books like this are an integral part of the story of the English dictionary, and were compiled by a number of lexicographers who also compiled larger dictionaries, including Elisha Coles, Thomas Dyche, Nathan Bailey, William Pardon, and Noah Webster.

The Age of Hand-Press Printing

Robert Cawdrey’s Table Alphabeticall of 1604 drew many of its 2,498 lemmas from Coote’s English Schoole-Maister, but provided them with definitions. It drew other hard words from glossaries, but unlike the glossaries, it presented the vocabulary of multiple texts, and it was published as a very small free-standing book. Cawdrey did not call his book a dictionary, because the English word dictionary had always been used for bilingual wordlists: what he had produced was, he stated, a table of hard words in an innovative format, rather than a monolingual equivalent of an English–Latin dictionary. John Bullokar, whose enlarged adaptation of Cawdrey’s Table superseded the original, likewise called his work An English Expositor rather than ‘dictionary’, and when, in 1623, Henry Cockeram produced an adaptation of Bullokar’s Expositor and daringly called it An English Dictionarie, Bullokar’s work continued to be republished under the title of Expositor. Cockeram had given a name to his own book, but not to a genre.

Editions of Bullokar and Cockeram competed with each other until 1676, when a new octavo English wordbook came on the market: the work of Elisha Coles, called An English Dictionary by way of announcing that it was a competitor with Cockeram’s book, which it duly drove out of the market. It squeezed 25,698 entries into just over 300 triple-columned pages. Future editions of Coles and Bullokar continued to compete with each other into the eighteenth century.

Half a century after the first edition of Cawdrey’s Table Alphabeticall, a rather different sort of English dictionary from the little volumes of Bullokar and Cockeram was published. This was the Glossographia of Thomas Blount (1656), a larger and more gracious book, meant to appeal, as its preface made clear, to a more educated reader. It was followed two years later by an adaptation, Edward Phillips’s New World of English Words, and for the rest of the seventeenth century, editions of Blount and Phillips competed with each other to be the inventories of English words on the shelves of the better sort of reader. The difference between Phillips’s dictionary and Coles’s is vividly shown by examining original copies of late seventeenth-century editions side by side: the gracious folio of Phillips is a book for a gentleman’s library, and the octavo of Coles is a schoolbook. Once the title ‘dictionary’ had been taken by Cockeram, it was not used by Blount or Phillips, who did not wish to suggest that their books were like his. To be sure, they took over some of Cockeram’s entries, but that was a different matter; their books and his occupied very different ecological niches.

It was only at the beginning of the eighteenth century that a tradition of English dictionaries derived from that of Phillips started to use the word Dictionary as a title: namely, the Universal Etymological English Dictionary of Nathan Bailey, of which the first edition appeared in 1721, and the so-called ‘thirtieth’ (when they can be checked, such edition counts seem usually to be approximately right) in 1802. These were large, stout octavos, more compact than Phillips and more comprehensive than Coles, which registered both words relating to technology and the sciences, and high-frequency words. The latter must have reassured readers anxious about the developing sense that there was only one way to spell any English word, and must have helped a growing readership for whom English was an acquired language.

As for technical and scientific words, these were, and have remained, a considerable challenge for English lexicographers. Since the seventeenth century, English has had a large and expanding naturalised technical vocabulary. Some of this has been treated in specialised dictionaries, for instance of law, of medicine, and of gardening (notably Philip Miller’s Gardener’s Dictionary, of which there were eight unabridged and six abridged editions from 1724 to 1771). Some has been treated in encyclopaedic texts, seen increasingly as distinct from the traditions of general lexicography, such as John Harris’s Lexicon Technicum (1704–10), Ephraim Chambers’s Cyclopedia (1728), and the Encyclopaedia Britannica (1768–71). But Phillips’s dictionary was advertised from the beginning as being rich in the technical language of a multiplicity of named subject fields, and eighteenth-century dictionaries such as Bailey’s were hospitable to such language, the New General English Dictionary of Thomas Dyche and William Pardon (1735; twentieth edition 1794) being especially richly encyclopaedic.

The title Dictionary was, then, given in the eighteenth century to the works of Bailey and of Dyche and Pardon, and to specialised works like Miller’s. It was also given to little books like John Entick’s New Spelling Dictionary of 1764. Although these books were sold in great quantities, it is easy to underestimate their importance. The closely printed oblong octavos of Entick’s dictionary do not reward reading in the way that the two folio volumes of Samuel Johnson’s Dictionary do, and they often come down to us in decidedly scruffy condition. But when the miscellaneous writer John Trusler was reflecting at the beginning of the nineteenth century on the trouble which a large edition of a given book might give to the publisher, the counter-example which occurred to him was Entick: ‘Mr. Dilly used to print 5000 copies at a time of Enticks Spelling Dictionary’ (quoted in Raven 2007, 305). They evidently flew off the shelves, and fifty-four editions are known, printed in London, Dublin, Wilmington, Glasgow, New Haven, New York, and Belfast, the latest being of 1831; even if 5,000 was the largest edition size, there may well have been over a hundred thousand copies all told.

The greatest activity took place at the bottom of the ecosystem. Thomas Dyche’s spelling-book A Guide to the English Tongue (1707) was, by 1800, in its ‘Hundred and second edition’, and printing records show that between 4 December 1733 and 1 February 1748, 275,000 copies were printed in London alone (there was also a Dublin edition); only six of these copies survive. In 1723, Dyche brought out another spelling-book, called A Dictionary of All the Words Commonly Us’d in the English Tongue, marking stressed syllables; this was the first indication of pronunciation in the English lexicographical tradition, and was emulated almost at once in one of Bailey’s dictionaries.

The idea of a history of the English dictionary leading from Cawdrey to Johnson suggests that after the publication of Johnson’s Dictionary of the English Language in 1755, English lexicography entered a new phase, and this did not happen at all. The two folio volumes of Johnson, in which beautifully written entries, richly illustrated with literary quotations, were laid out spaciously upon the page, did not compete with concise dictionaries like Bailey’s, or with little dictionaries like Entick’s, let alone with spelling books. On the contrary, they introduced a new kind of English dictionary to the market, much grander than anything which had come before. Johnson’s forerunners were the great sixteenth-century dictionaries of Latin and Greek, and the seventeenth- and eighteenth-century academy dictionaries of Italian, French, and Spanish. The story of lexicographical evolutionary development from which his dictionary emerges is a story which plays out in the international high culture of Renaissance and Baroque Europe. The other evolutionary story behind it is that of upmarket reference publishing in the English book trade; the consortium of London booksellers who financed Johnson’s work on the Dictionary had seen expensive works like Chambers’s Cyclopedia making good returns on the investments of their proprietors.

In strictly lexicographical terms, we might indeed see Johnson’s work as evolving into, rather than out of, other English dictionary traditions. The early folio editions which we think of as Johnson’s dictionary, and their unabridged successors in folio and quarto, were accompanied by abridged editions, and these were more numerous than the folios, and printed in larger quantities: 2,000 copies of the first edition in 1755, but 5,000 of the first abridged edition in 1756. When Becky Sharp was given a copy of ‘Johnson’s Dictionary’ in Thackeray’s novel Vanity Fair, what Thackeray had in mind was a portable single-volume abridged edition; that is why Becky was able to toss it unappreciatively out of a window.

In the second half of the eighteenth century, then, the Bailey tradition was superseded by abridged Johnsons and other books in the market for concise dictionaries, and miniature dictionaries claiming misleadingly to be abridged from Johnson’s competed for a share of the market for small dictionaries (no fewer than 309 editions are recorded in the standard bibliography of Johnson, the last ones dating to the early 1890s).

A new element in both markets was the pronouncing dictionary, marketed in particular to Scottish, Irish, and provincial speakers of English embarrassed about the relationship of their speech to emerging metropolitan norms. One of the most influential was the General Dictionary of the English Language of the actor and elocutionist Thomas Sheridan (1780; London, Dublin, and Philadelphia editions to 1797), superseded by the Critical Pronouncing Dictionary and Expositor of the English Language of John Walker, in the title of which we still hear an echo of the title of Bullokar’s dictionary of 1616. The first edition of Walker’s dictionary appeared in 1791, and the last ones in the early twentieth century. These dictionaries always offered definitions as well as giving keys to pronunciation (the latter must, indeed, have been difficult to use effectively).

The Age of Machine-Press Printing

Until the end of the eighteenth century, monolingual English dictionaries were, with only a few minor exceptions – a handful of Scots wordlists, and one or two lists of distinctive words from overseas varieties – composed in England, although they might be reprinted in Ireland, Scotland, or North America. Spelling books were somewhat more international: James Porterfield’s Edinburgh’s English School-Master appeared as early as 1695; spelling books which did not claim to be distinctively American had been compiled in Boston and New York before the end of the seventeenth century; an edition of Dyche’s Guide was published at Tharangambadi ‘For the use of English schools on the Coast of Coromandel in the East Indies’ in 1723; editions of Fisher’s Instructor were published in Philadelphia as The American Instructor from 1748 onwards. Noah Webster’s spelling-book, first published in an edition of 5,000 in 1783 as Grammatical Institute of the English Language, was reprinted in about 400 editions and adaptations.

The independence of the United States of America in 1776, and a developing awareness among Americans that their English differed from, and was not inferior to, the English of England, led inevitably to the production of dictionaries of the English language as used in the United States. The experience of Noah Webster as a maker of spelling books and smaller dictionaries (one of them based on Entick), his patriotism as an American, and his confidence in his ability to produce a dictionary on the scale of Johnson’s but superior to it in multiple respects, led to his compilation of An American Dictionary of the English Language, published in two large quarto volumes in 1828. As well as introducing American words and senses of words which had not been treated by Johnson, and providing illustrative quotations from American texts, Webster removed a great many of Johnson’s quotations, reworked much of Johnson’s own text, and replaced Johnson’s etymologies with more ambitious ones of his own.

A tradition of dictionaries derived from Webster’s American Dictionary and bearing his name on the title page was published in the United States throughout the nineteenth century, competing vigorously in mid-century with dictionaries by Joseph Worcester. The Webster brand – owned from 1847 by George and Charles Merriam – won that competition, and by the end of the century, the leading one-volume English dictionary in the United States was Webster’s International Dictionary of the English Language (1890). More extensive was the multi-volume Century Dictionary (1889–91), which was founded on the Imperial Dictionary (1850), itself a British descendant of the Webster tradition. Its publisher produced a successful informative magazine, and this gave it both the resources to invest in a multi-volume dictionary and also a sense of the size and energy of the potential readership of such a dictionary. Just as the success of Chambers’s Cyclopedia gives us a context for the financial viability of Johnson’s Dictionary, so the flourishing of the multi-volume encyclopaedia in late nineteenth-century America gives us a business context for the Century Dictionary, which was indeed rich in encyclopaedic material.

Meanwhile, new four-volume editions of Johnson were produced in England, by Henry Todd in 1818 and by R. G. Latham in 1876. These competed at the upper end of the British dictionary market with a two-volume English edition of Webster (1831), with the Imperial Dictionary, and with successive editions of the dictionary of Charles Richardson (book publication 1836–7; overlapping publication in fascicles as part of an encyclopaedia 1818–44), which deliberately turned away from the analytical, and fundamentally synchronic, definitions of the Johnson tradition to a defining style which attempted a much more unified treatment of the sense-development of each word, supported by a chronologically ordered array of quotations, the earliest being from Middle English texts. Before Richardson, the Etymological Dictionary of the Scottish Language of John Jamieson, in two volumes (1808) with a two-volume supplement (1825), had already presented chronological sequences of quotations for each word, starting with the earliest available: as well as being the first major learned dictionary of a variety of English from beyond England, this was the first dictionary of a variety of English – or of any other European language – to be compiled on historical principles.

In the late 1850s, work began on a new English dictionary on historical principles, inspired not by Jamieson but by continental European developments in the lexicography of ancient Greek (by the German scholar Franz Passow, brought into English and improved on by Henry Liddell and Robert Scott) and of post-medieval German (by Jacob and Wilhelm Grimm). This new dictionary was originally imagined as a work on the same scale as the largest of its English-language predecessors, but grew in the planning, and was finally published in fascicles from 1884 to 1928 as the New English Dictionary, later called the Oxford English Dictionary (OED). At the time of its completion, it was the most extensive completed dictionary of any language in the world. Its aim was to document all but the rarest and most specialised words in all the printed texts in every variety of English subsequent to the Old English period, arranging the definitions on historical principles, and illustrating them accordingly with a great profusion of quotations. Although it naturally drew on the dictionaries of Johnson, Jamieson, Webster, and Richardson, it should not be seen as an evolutionary development from any of these. Like Johnson’s Dictionary of 1755, the OED was something unprecedented in the story of English-language lexicography, with its origins and closest relatives on the Continent. An independent parallel project, the six-volume English Dialect Dictionary (1896–1905), covered regionalisms from the British Isles much more fully than the OED.

In the nineteenth century, as in the eighteenth, the busiest part of the lexicographical ecosystem was the part which is hardest to see at a distance: the production in huge numbers of intellectually unoriginal single-volume dictionaries. The difference between the two centuries is that nineteenth-century printers used steam-powered presses to mass-produce books on an unprecedented scale. For instance, about one and a half million copies of Peter Nuttall’s Standard Pronouncing Dictionary of the English Language were sold in the century after its first publication in 1863. Small dictionaries from Great Britain might claim to incorporate the best elements of Johnson, Walker, and the American lexicographers, an indication of the marketing reach of the latter: an Illustrated National Pronouncing Dictionary of the English Language on the Basis of Webster, Worcester, Walker, Johnson, Etc., published in London and Glasgow by Collins around 1891, was identified on its title page as in its three hundred thousandth edition.

The ecosystem was more diversified in the twentieth century than ever before. On a grander scale than any other dictionary of English, the OED – completed, as we have seen, in 1928 – was reissued with a supplement and bibliography in 1933; after a period of dormancy, a second supplement was issued in four volumes from 1972 to 1986, and was integrated with the first edition of the dictionary in a so-called second edition of 1989, in twenty volumes. The great innovation of this second edition is that it had been electronically prepared: underlying the printed text was a marked-up database, which led the way not only to the release of the dictionary as a CD-ROM in 1992 and online in 2000, but also, more importantly, to the possibility of its revision as a flexible electronic text. Other multi-volume scholarly dictionaries followed from the OED project: of older and of modern Scots, of Middle English, of American English. Shorter scholarly dictionaries were made of a number of world Englishes: of Jamaica, Canada, Australia, New Zealand, South Africa, and so on. Oxford also began to publish a suite of smaller dictionaries: Shorter, Concise, Little, and Pocket, all of these competing with dictionaries from other British publishers. Some of these were adapted for general use in English-speaking countries beyond the British Isles (where of course they might compete with editions or adaptations of American dictionaries), and some of these adaptations had their own progeny: for instance, the Canadian Oxford Dictionary (1998) was the parent between 1999 and 2003 of Canadian Spelling, Paperback, High School, and Compact dictionaries, and of two for small children, My Very First and My First Canadian Oxford Dictionary.

A similar kind of diversification was taking place in the United States. There, since the Century Dictionary had no multi-volume successor, the largest format in active use was that of the unabridged dictionary, which might present more than half a million entries in a monstrous single volume (Webster’s New International Dictionary of the English Language, Second Edition of 1934 was issued as a tall quarto, nearly six inches thick) or two or even three large volumes. An active and lucrative market from the late 1940s to the 1990s was that for collegiate dictionaries: more manageable single-volume works which were regarded as suitable equipment for undergraduate students, and of which up to two million might be sold in a given year. The genealogical relationships between larger dictionaries might be complex: for instance, the Century Dictionary was abridged as the New Century Dictionary (1927), which was the basis for the American College Dictionary (1947), which was expanded as the Random House Dictionary of the English Language (1966), revised and updated as the Random House Unabridged Dictionary (1993). Far below this level, small derivative dictionaries, usually issued in soft covers as the century went on, were sold very cheaply, sometimes in drugstores and other businesses which did not normally deal in books. Because the word Webster’s was not protected as a trademark, it was sometimes appropriated by these dictionaries, just as their predecessors at the bottom of the market two centuries before had used the names of Johnson, Sheridan, and Walker. It was also appropriated by the World Publishing Company and by Random House.

One of the two most important new developments which took place in the twentieth century was the rise of yet another dictionary genre: the specialised dictionary for adult learners of English. Such persons had been more or less formulaically identified on monolingual dictionary title pages since the seventeenth century, but it was in the 1930s that their distinctive needs were first understood to require distinctive dictionaries, which would be rich in information about grammatical constructions and idioms. The other development was that of corpus-based lexicography. The two were of course related, because some of the questions which are most urgent for learners – what prepositions can follow this verb? – are much better answered by a lexicographer who uses a corpus than by one who relies on her own intuition. Learner’s dictionaries such as the first corpus-based dictionary of English, the Collins COBUILD English Language Dictionary (1987), devised elaborate ways to make such information about words perspicuous to their users.

A third important twentieth-century development was in the presentation of the dictionary page: a competitor in the lucrative collegiate and learner’s markets needed to be attractive to sell well, and many were pleasingly designed and (especially for the American market) illustrated.

The Age of the Digital Dictionary

A spectacular twenty-first-century development at the upper end of the ecosystem has been the incremental revision of the OED. The revision of an entry has often entailed the rethinking of sense-analysis as well as the provision of more quotations and the drastic revision of etymologies, and many new entries have been added. The revised material is being published online but not in hard copy, and Oxford University Press is not committed to the publication of a new hard-copy edition of the dictionary once every entry has undergone revision. Although many of the principles on which OED entries are written have remained constant, the form in which the dictionary is regularly used has evolved very significantly, as have the ways in which users retrieve information from the dictionary.

This development has been echoed by changes in the more populous parts of the ecosystem. For whereas the story of the evolution of the English dictionary from the beginning of the seventeenth century to the end of the twentieth was very largely that of the development of cheap, attractive single-volume dictionaries, its story as the twenty-first century has progressed has been that of the emergence of the free online dictionary. The users of such dictionaries perform rather conservative activities: checking the spelling of a word, ascertaining its meaning. But these happen to be activities well suited to online texts, and the advertising revenue from dictionary websites is considerable.

A very successful competitor in the online market has been dictionary.com, which was launched in 1995, and was acquired in 2008 by the media company IAC, bringing it under the same ownership as the news website The Daily Beast and the dating website Tinder. News reports at the time of the acquisition suggested that the price was in the region of $100 million, justified by the high traffic attracted by the site, and hence its potential advertising revenues; as of 2018, the dictionary.com website claimed 5.5 billion hits annually. A basis for the dictionary entries provided by dictionary.com is the Random House Unabridged Dictionary, and work is presumably done to update content from this source, as well as to provide entertaining features like a Word of the Day (for which fifteen million daily users are claimed). It is suggestive that the three persons named on the dictionary.com website at the time of writing are not lexicographers but a CEO with experience as a business analyst; a VP, Revenue and Analytics; and a VP of Engineering.

Oxford University Press and Merriam-Webster have likewise established websites offering free access to dictionary entries together with third-party advertising and entertaining information about words, with claims to be ‘The World’s Most Trusted Dictionary Provider’ (oxforddictionaries.com) and ‘America’s most trusted online dictionary’ (merriam-webster.com). A possible development would be for the non-specialised paper dictionary to be superseded by such websites, and some publishers of formerly successful dictionaries have closed their dictionary departments – for instance, Random House in 2000. But book publishing and online publishing have co-existed for such a short time that it would be rash to make predictions about their futures.

Summing up the evolution of the English dictionary over four centuries is scarcely possible: it has evolved in too many directions. Again, we might do better to think in terms of ecosystems. Just as a palaeontologist might take a particular creature as a representative of each of the successive ecosystems which she wished to discuss – a trilobite, an ammonite, a limpet – so we might take one English dictionary per century. Coles’s English Dictionary could represent the seventeenth century, as the first attempt to cram a big vocabulary into a cheap dictionary. Walker’s Pronouncing Dictionary could represent the eighteenth, as an attempt to bring spoken and written English together. Jamieson’s Etymological Dictionary of the Scottish Language could represent the nineteenth century, as the first major dictionary of a variety of English used beyond England (and the first English dictionary on historical principles). The Collins COBUILD English Language Dictionary could represent the twentieth, as a major learner’s dictionary and an exemplary product of corpus lexicography. Whether dictionary.com will be the best choice to represent the twenty-first century remains to be seen.

Seventeenth-Century English Dictionaries: Hard Words

Chapter 9 Cawdrey, Coote, and ‘Hard Vsual English Wordes’

The emergence of the very first monolingual dictionaries in English is far from a simple and obvious progression, and previous scholarly views and theorising need careful consideration. Many factors – including attitudes to English in both its contemporary form and its older forms, education and the state of learning in the sixteenth century, rising literacy rates, and printing technology, as well as the interests of the printing industry itself – all exerted significant influence. Lexicographers had aims and rationales in mind which were not simply to record the language. The motives of the dictionary-makers are crucial – no dictionary, especially one of an early historical stage, is produced in a cultural and social vacuum or is devoid of personal interest and bias. Contemporary lexical trends and fads might meet with mixed reactions, educators might espouse a variety of aims, from training a learned elite to providing social, commercial, or religious help, and printing houses at that time imposed their own standards and moulded texts to suit themselves. A further matter for reconsideration is the set of notions which modern scholars have entertained about the state of the early modern English lexicon.

Lexicography in England had produced a rich tradition by the end of the sixteenth century; though admittedly lagging behind the immense scholarship of the continental humanist tradition exemplified by the Estiennes in Paris, it was worthy in its own way. It had also shared in the Renaissance interest in vernacular–Latin dictionaries, a departure from the medieval derivation-based Latin tradition. Elyot, Cooper, Howlet, Thomas, and Rider achieved much. The vernacular–vernacular dictionary had also appeared, recording French, Welsh, Italian, and Spanish alongside English in print, while Russian had been put in manuscript form by Mark Ridley, and Algonquin by Thomas Harriot. Against this background, the monolingual dictionary seems an obvious next step in hindsight, but this was not apparent to Elizabethan England.

Concern among users about contemporaneous dictionaries, all of them bilingual or polyglot, had already surfaced. William Turner, a mid-sixteenth-century botanist and puritan polemicist, called for a dictionary revision in 1568. In A new boke of the natures and properties of all wines that are commonly vsed here in England, Turner pointed out the inadequacies of Thomas Cooper’s bilingual Thesaurus linguae Romanae & Britannicae, calling for a committee of learned authorities to revise it. He wished that ‘one learned phisition & philosopher like vnto Linaker, one olde and learned gramarian like vnto Clemond, and one perfite Englishman like vnto Sir Thomas Moore, had the amendment and making perfite of this booke commited vnto them’.

The bald claim that Robert Cawdrey produced the first English monolingual dictionary, A Table Alphabeticall (1604), which is often made, obscures several other considerations. His work, a dictionary of ‘hard words’, is in fact heavily dependent on an earlier glossary, and in no sense shares the lexicographical interest and seriousness of the unfinished work of Georg Henisch in Germany in his Teutsche Sprach und Weißheit, published in Augsburg in 1616. Cawdrey’s agenda, as we shall see, was not primarily lexicographical. Since Henisch’s intention was to exalt German as a pure language based on monosyllabic roots comparable to Hebrew, foreign borrowings could have no part in his dictionary; quite the contrary. Henisch’s publication was multilingual, incorporating both Latin and some Greek into his entries, and was preceded by a Latin preface. Unfortunately, only A–G was ever published.

In England, there was an impetus towards making lists of words both as aids to memory and as teaching aids, along with a rise in vernacular lexicography. Keeping commonplace books of useful information for personal reference was a common practice. In the 1580s, the educator Richard Mulcaster (1531/2–1611), who taught at Merchant Taylors’ School in London and then at St Paul’s School, saw the desirability of a complete dictionary of English, writing in The First Part of the Elementarie (1582) that

It were a thing verie praiseworthie in my opinion … if som one well learned and as laborious a man, wold gather all the words … in our English tung, whether natural or incorporate, out of all professions, as well learned as not, into one dictionarie, and … open vnto us therein, both their natural force, and their proper use

(p. 166)

Mulcaster wisely points out that people learn a foreign language by rule and effort, but learn their own tongue instinctively and without conscious consideration – ‘our natural tung cometh upon us by hudle, and therefore hedelesse’ (167) – that is, known, but without conscious knowing. Thus, he would endeavour to emulate those in other countries who had undertaken this by ‘expounding their own words by their own language’ (168). He offers a ‘table’ listing many English words, but without definitions, and merely adding a few orthographical and disambiguating notes. Mulcaster stresses the more commonly used words. His list is largely monosyllabic, reflecting a perception that borrowings are generally polysyllabic.

The idea that hard words needed explanations for those who might have difficulty with them was not new when Edmund Coote, who taught at Bury St Edmund’s, published a teaching text in 1596, entitled The English Schoole-Maister. Hard words were seen as those which were borrowed from Latin, Greek, or other languages or were obscure in meaning and perhaps infrequently used. Such words, occurring in many subject areas in the sixteenth century, including religion, law, and medicine, were often called dark, obscure, or strange words by contemporaries. The most familiar contemporaneous designation was that they were inkhorn terms, perhaps only written and printed, probably rather pretentious, but not often in common spoken use. Authors increasingly felt that explanations of such terms were needed, as well as a guide to their pronunciation. That a pedagogue should eventually gather such terms into a single list is thus no great surprise, especially given the ever-increasing rate at which English was assimilating new vocabulary during this period.

Edmund Coote’s The English Schoole-Maister (1596)

Coote’s dictionary, which is appended to The English Schoole-Maister, is a list of ‘wordes taken from the Latin or other learned languages’ and interpreted by a ‘more familiar English word’ (73) where this is possible. There are 1,368 entries in all. Some left without glosses – such as alight, necessitie, wrought, and surcingle (a strap around the body of a horse to attach the saddle) – were intended to teach the pupil the correct spelling, since their orthography was felt to be problematic. Most of the hard words were from Latin and French, including astronomie, captious, perilous, propriety, and temporize.

Coote’s intended readership is made clear on the title page. His work, he declares, ‘teacheth a direct course how any unskilfull person may easily both vnderstand any hard English words, which they shall in the Scriptures, Sermons, or else-where heare or reade, and also be made able to vse the same aptly themselues; and generally, whatsoeuver is necessary to bee knowne for English speech’.

Coote then indicates that this will be of use for those wanting to take up an apprenticeship, children who have already begun their education, and those who have not learned Latin. One thing is clear – the words he means here may be hard, but they are English, and they are essential to English speech. There is no mention of neologisms which have not been generally accepted. His preface embraces the use of a plain style, and he addresses those who want to teach, especially those in various trades, offering advice on teaching practice. His method is based on the division of words into syllables, which allows the pupil and the would-be teacher access to even the longest and most difficult words, barring only those about which there is disagreement as to spelling. The glossary is an aid and finding-list for this project. He declares that the headwords will be explained by simpler English words.

Typefaces were commonly varied to indicate the status of text: roman and italic for learned material, and blackletter for English. Coote uses typefaces to indicate the language source in a slightly more complex way: blackletter, which he calls the English letter (73), for English or any other vernacular, roman for Latin, and italic for French. Coote is more specific than this, using italics for ‘French words made English’.

The dictionary itself is embedded in a text intended for use in schools, one which proved to be very popular and enduring since it went through numerous editions, the fifty-third and last as late as 1737. The rarity of many of the various editions nowadays suggests that many copies were used to destruction. Coote gives practical advice to teachers as well as language exercises for the students. His instructions for using his glossary begin by directing the reader to master the order of the alphabet, and he then proceeds to outline the way in which words from Latin, Greek, and French are recognised and handled.

Robert Cawdrey’s A Table Alphabeticall (1604)

Robert Cawdrey’s stand-alone dictionary, the first such monolingual dictionary of English, was reprinted posthumously in 1609, 1613, and 1617. Cawdrey (1537/8–1604?) is one of the few early English lexicographers to have attracted a lot of research interest, since he is understandably seen as a lexicographical pioneer, but this is perhaps to overstate the case. Cawdrey’s work is modest in scope, amounting to only 2,543 lemmas in all, and is simple and straightforward as a dictionary.

The title page of the Table has been repeatedly cited as adopting a patronising attitude, especially to women. This needs to be assessed in terms of the contemporaneous social structures and attitudes rather than what we might find offensive now. The title page contains additional useful information as well, claiming not only to contain the ‘true writing of hard usual English wordes’ borrowed from Latin, Greek, French, and other languages, but also to teach the reader both to understand and to spell them. This he will do ‘with the interpretation thereof by plaine English words, gathered for the benefit & helpe of Ladies, Gentlewomen, or any other vnskilfull persones, whereby they may the more easilie and better vnderstand many hard English wordes, which they shall heare or read in Scriptures, Sermons, or elsewhere, and also be made to vse them more aptly themselves’.

Women users of this work are conceived here as of good social standing but not necessarily versed in Latin or Greek, that is, ‘learned’, and as people who are independent and thoughtful readers with a didactic and religious responsibility within the family and society (Shapiro 2018, 188–9). We should also recall that individual, silent reading had not been the norm in the later middle ages, thus magnifying the role of those who could read and instruct others by reading to them.

Further categories of potential readers are specified in the dedication, including foreigners (‘strangers’), and pupils who can familiarise themselves with Latin words, that is, the kind of person who might feel the need for further assistance with their education. Not that knowing such expressions is an end in itself – Cawdrey claims that one’s speech should be to the point and comprehensible and that ‘apt words that properly agree vnto that thing, which they signifie’ (A4^r) is what is required, no more and no less. Verbosity for its own sake is to be avoided, and he is harsh on those men who ‘pouder their talke with ouer-sea language’ (A3^r). It is to be comprehended, but used judiciously and only when necessary.

Cawdrey clearly has a religious agenda, as indicated by the references on his title page to women being able to cope with sermons and scriptures, and as Sylvia Brown (2001) has further suggested. Gentlewomen such as the Harington sisters, to whom the work is dedicated, were to be the domestic inculcators of godliness and, in this respect, are contrasted to the men whose peregrinations about Europe, particularly France and Italy, led them to lard their speech with foreignisms. As Brown points out, the business of such women was to understand these terms rather than pointlessly to ape them. Since they were in the best position to influence and persuade their children, relatives, and household servants, to catechise and to instruct, using such terms would have a closely defined purpose. Cawdrey’s front matter is also derived in large part from Thomas Wilson’s The Arte of Rhetorique of 1553, a work by an author similarly concerned with reformed religion and with the abuses of those who polluted their own language with affectations picked up in Catholic Europe. The notion that they thus ‘pouder’ their English is Wilson’s, and Cawdrey was happy to reproduce it verbatim in his own preface. Cawdrey and Wilson, both puritans, concurred that the common speech should be the acceptable form of English.

In assessing Cawdrey, one should recall that he wrote several other works which deal with education in the sense of creating a godlier society, and that lexicography was by no means his primary concern. It was merely part of a larger agenda, but effectively meant lexicographical restriction. Thus, rather than adumbrating a set of lexicographical principles about borrowings into English, a process which was in its full flowering at the end of the sixteenth century, Cawdrey rather obviously identifies a need and a niche market for such a book among his other publications. Cawdrey’s instincts were sound, since complaints about the abundance of such borrowings were much more frequent than attempts to assist people to cope with them. He does employ sound lexicographical practice in his work, however, ordering and rationalising what he found in his sources, and being prepared to edit when it seemed necessary.

Comparing Coote and Cawdrey

The significance of minor changes when material from one dictionary is incorporated into a later one has been somewhat underrated by scholars, and needs further examination. Since merely talking about plagiarism as a blanket term obscures important data, close attention to the way in which Cawdrey utilises Coote’s material proves to be revealing. In general, Cawdrey takes over Coote’s entries with little or no change; orthographic alterations, such as that from unlawefully under illegitimate in Coote to unlawfully and illegitemate in Cawdrey, may be explained by compositorial custom or individual practice in the printing house. There are, however, many significant changes.

While Cawdrey’s wordlist, like Coote’s, is simple, containing no multiword expressions at all, some of his additions have quite lengthy definitions. A good example is stigmaticall, defined as ‘(g) knauish, noted for a lewd naughty fellowe, burnt through the eare for a rogue’. Smatterer (‘some what learned, or one hauing but a little skill’) is another. This suggests that Cawdrey’s work is partly glossarial and partly definitional. Coote’s definitions are mostly brief, and Cawdrey adds to many of them, usually by way of adding polysemy, synonymic expansion, or now and then adding something otiose. An example of the first appears under collation, defined simply as ‘recitall’ in Coote, but as ‘recitall, a short banquet’ in Cawdrey. Although it may be difficult to distinguish in all cases between synonymous additions, polysemous additions, and genuine lexical definitions, especially at four centuries’ remove, these also seem to be polysemous: Coote’s notifie (‘giue knowledge’) is expanded to notifie (‘to make knowne, or to giue warning of’), thus recognising a special sense of the word; similarly, Coote’s tenure (‘hold’) Cawdrey expands to ‘tenure, hold; or manner of holding a profession’, and two distinct senses seem to be apparent in Cawdrey expanding Coote’s duetifull to ‘dutifull, diligent, very readie or willing to please’ under officious. The same may be true of Coote’s entry for sinister, glossed as ‘vnhappie’, which Cawdrey renders as ‘vnhappie, bad, vnlawefull, or contrarie’, and the same happens under refection, where Cawdrey adds ‘retreating’ to Coote’s ‘refreshing’, which may be a sense not quite grasped by the OED (see OED 2a and 3a) if one assumes that a meal taken in a religious house may in some sense be a retreat.

Synonymous additions are numerous. To Coote’s ‘gentlenes’ for clemencie, Cawdrey adds ‘curtesie’, and he likewise augments Coote’s ‘thought’ for cogitation with ‘musing’. Under erect, Cawdrey adds ‘lyft vp’ to Coote’s ‘set vp’. While the addition of a single synonym is frequent, these augmentations are sometimes much greater, an example being estimate (‘esteeme’, for which Cawdrey offers ‘esteem, value, or prise, thinke or iudge’), which distinguishes two major senses of this term. Cawdrey adds ‘fine, singular, curious’ to the headword exquisite, as well as ‘qualifie, or pacifie’ to Coote’s single gloss ‘asswage’ for mitigate. Fabulous in Coote presents some further challenges in that, first, it is recorded in blackletter, and the gloss ‘feigned’ indicates the degree to which the sense has altered since. To this Cawdrey adds ‘counterfeited, much talked of’, indicating a major division in sense, as well as changing Coote’s spelling to ‘fained’.

Explanatory extensions are very frequent and often helpful. Where Coote has ‘failing’ for eclipse, Cawdrey’s entry supplies ‘failing of the light of the sunne or moone’, which looks like a genuine lexical definition. Coote defined edict as a ‘commandement’, which Cawdrey again supplements with more specific information as well as a synonym: edict, ‘a commandement from authoritie, a proclamation’. Coote’s not very clear definition of lapidarie as ‘skilfull in stones’ is improved by Cawdrey as ‘one skilfull in pretious stones or iewells’, as well as disambiguated as a noun. Illegitimate is clearly glossed as ‘unlawefully born’, but Cawdrey specifies this further as ‘unlawfully begotten, and born’. A larger extension is Coote’s Nicholitan (‘an heretick, fro[m] Nicholas’), to which Cawdrey adds an informative and rather encyclopaedic explanation: ‘nicholaitan, (g) an heretike, like Nicholas, who helde that wiues should bee common to all alike’. A rather less useful addition is made to paraphrase, which Coote glosses as ‘exposition’; Cawdrey makes this ‘exposition of any thing by many words’. A last example of this technique is quintessence, glossed by Coote as ‘chiefe virtue’; Cawdrey gives this a much more technical context, explaining that it is the ‘chiefe virtue, drawn by arte out of many compounds together’.

Coote lists a number of words without defining them, such as defame, hipocrite, malitious, reioynder, and sense, explaining that this has been done simply to show the correct spelling of these words (p. 73). Cawdrey defines reioynder as ‘a thing added afterwards, or is when the defendant maketh answere to the replication of the plaintife’, and Coote’s rheume is defined as ‘catarre, a distilling of humors from the head’. The practice of bracketing terms which were either synonymous, morphological variants, or variant spellings is also used by Coote and adopted by Cawdrey (see Stein 2010, 165–6).

Other types of editorial change are also apparent. Relatively pointless additions are not all that frequent, but include coaiutor, defined in Coote as ‘a helper’, but by Cawdrey as ‘a fellow helper’. A definition is recast under Coote’s notifie, since he suggests ‘giue knowledge’, whereas Cawdrey offers ‘to make knowne, or to giue warning of’. Coote’s ecclesiastical (‘belonging to the Church’) is unmarked for source language, but Cawdrey adds ‘(g)’, as well as decapitalising ‘church’. The marker (g) is also added to episcopal and monasterie, among others. Coote’s equinoctiall is replaced by equinoctium, the more familiar form in the late sixteenth century, and Coote’s headword subuersion is altered in Cawdrey to subuerte. Change in grammatical class is not uncommon. Finally, a potential ambiguity in Coote under compact, glossed as ‘ioyned together’, is resolved by Cawdrey, who offers ‘compact, ioyned together, or an agreeme[n]t’. Coote’s definition of prospect, ‘a sight farre off’, becomes ‘a sight a farre off’ in Cawdrey. Changes like those may indicate more general editorial patterns, but this remains to be researched.

As in Coote, Cawdrey’s Latinate headwords are typed in roman and left unmarked otherwise, but those of French and Greek origin are given markers. The address to the reader mentions further languages, but none are identified. He also uses the marker ‘k’ to mean ‘a kind of something’, as Coote does, although there are only twenty-one of these, mostly in the early letters, and exactly what he meant by this is unclear. There are also inconsistencies, such as cockatrice being defined as ‘a kind of beast’, rather than using the abbreviation. No italics are used in the headwords, and the English glosses and definitions are in blackletter.

Cawdrey spread his net wide in utilising source material. The most used was Coote, Cawdrey employing about 90 per cent of his entries. He also used Thomas Thomas’s Latin-English dictionary of 1587, and lists such as that in Peter Bales’s book on shorthand, The Writing Schoolmaster of 1590, in which the author advises the reader to jot down words which seem hard, and then to practise writing them, another instance of the practical function of wordlists. Some legal terms came from the 1598 edition of John Rastell’s Exposition of Certaine Difficult and Obscure Wordes … of the Lawes of this Realme, a book first published in about 1523 under the title Exposiciones terminorum legum anglorum. Cawdrey also used various glossaries, including that in M. A.’s translation of the Artzneybuch of Oswald Gabelkhover (1539–1616), published in 1598. This includes some very obscure words, such as floscles, pluviatile, and snipperinges, all three of which Cawdrey incorporates. Others include ebullient and ebulliated. Snipperinges is unmarked by Cawdrey but, as with many other headwords, cannot possibly be from Latin, and may have a low German origin, according to the OED. Floscles is from Latin, but Cawdrey marks it as French, an understandable misattribution for the time. Cawdrey’s use of this glossary is heavy. Only thirty-two (28 per cent) entries are omitted, and in several cases another form of the etymon is used. Thus Cawdrey has diurnal, exciccate, menstruous, and puluirisated, for instance, for diurnally, exciccated, menstruosities, and puluers in Gabelkhover.

Cawdrey seems sometimes to abandon the idea that unmarked words are Latinate, listing a variety of others, including akecorne, bequeath (not in Coote), cowslip, float (not in Coote), heathen, lore, and loam (not in Coote) of Old English origin; askey (‘askew’), bob (not in Coote), maffle (‘stammer’; not in Coote) and queach (a dense thicket), all of uncertain origin; brooch (not in Coote), cider, hush (not in Coote), misknow (not in Coote), thwart (not in Coote), and throttle (not in Coote) all from Middle English; grease (not in Coote), gulf, moosell (‘muzzle’; not in Coote), strangle (not in Coote), and troop (not in Coote) from French; moote of Germanic origin, and swain (not in Coote) from Old Norse.

There are some obvious and more subtle changes in what Cawdrey does with Coote’s list. First, he adds a considerable number of entries, mainly Latinate terms. Second, he deletes some, tending to excise archaic terms (or terms in somewhat archaic form, such as diffine, discomfit, enchaunt, enterlace, guerdaine, moitie); familiar words which could not be considered ‘hard’ (anchor, assises, bloud, bonnet, certaine, citie, consent, creede, cupboard, fountaine, moote, and throtle); and words for plants and animals (boare, ciuet, elephant, gourd, hyssope, laurel, and thyme). Legal terms are sometimes deleted, including arraigne, attainder, escheat, nonage, nonsuit, and premunire. On the other hand, it is unclear why expulsion and feminine have been omitted. It is also hard to know, given his stated aims, why Cawdrey would add couch, diet, foraine, globe, glee, hush, or specke. His methods can be illustrated from the letter M, where Cawdrey omits mercement, mercie, and mercer, replacing them with mercenary. Rather against this trend, he adds the words miscreants, misknow, and misprision between mirror and mitigate.

More typical additions are exclude, excogitate, and excommunicate after exclaime and before excrement; fecunditie and felicitie following fatall; interpellate between intermission and interpretor, and sincere and singularitie after simplicitie. Deletions with medical senses include apothecarie, expulsion, leprosie, maladie, quintessence, scalp, scarrifie, squinancie, and tympanie, while others such as adustion, aliment, emerods (haemorrhoids), infection, and laxatiue are retained. It has already been claimed that very few if any of Cawdrey’s medical terms actually have a definition which is unambiguously medical (McConchie 2012a, 184). The case of medical terms does not seem to suggest a principled procedure on Cawdrey’s part at face value, but further research might indicate that Cawdrey was responding to what he considered ‘hard usual’ and what he did not, a distinction which would be rewarding to make but is hard to pin down. Similar considerations might apply to other technical areas, such as the law. Only research on large-scale word-searchable databases is likely to resolve this question.

The layout adopted by Coote is taken over by Cawdrey by and large, with a few modifications. Cawdrey indicates Latin by not marking these words and French with a pilcrow, where Coote has used italics. Both used either (gr) or (g) for Greek. Coote indicates in a marginal note at the word abricot that ‘k’ means ‘a kind of’. This explanation is taken over by Cawdrey, and inserted into the preamble to the table itself (Bi^r). Other examples include abricot, akecorne, anchoue (a kind of pear), barbell, buglasse, chough, cowslip, emerods, frontlet, lethargie, and pomgranet (which appears to be the last one). The use made of both Coote’s title page and forematter was pointed out by Starnes and Noyes (1946/1991, 13–19). Cawdrey’s debts to Coote also extend to the instructions to the reader on the alphabet, part of which he adopts almost verbatim.

Conclusion

We need not assume that the words in Cawdrey were by and large either recent or neologisms, since many turn out to be well established. The advent of printing in the last quarter of the fifteenth century in England opened up a rich new means of preserving words previously in use but unrecorded in manuscripts. While the large apparent increase in the lexicon of English through the sixteenth century, which is often commented upon, is impressive, it should not be over-stressed, given the rate at which source material also increased. Now, the ready availability of word-searchable texts online only increases the rate at which antedatings are found, and our previous perceptions demand modification.

Despite being rarely mentioned in print later in the seventeenth century, Cawdrey’s work became an important source for the later monolingual dictionaries of John Bullokar and Henry Cockeram, albeit somewhat indirectly. In 1581, the spelling reformer William Bullokar promised to publish a dictionary; he never did so, but his son did in 1616. This may or may not have been the promised work or some version of it. John Bullokar’s dictionary incorporated into its pages rather more than half of the headwords and a little less than half of the definitions in Cawdrey, a matter recently canvassed at some length by Kusujiro Miyoshi (2010, 22). Both lexicographers also relied on Thomas. Miyoshi provides a detailed analysis of the use Bullokar made of the Table Alphabeticall, which shows a close and relatively dependent relationship on the whole, usually as copying and/or abbreviation of Cawdrey’s entries. Although Cawdrey’s dictionary did not outlast the early seventeenth century, Bullokar’s was much more enduring and thus prolonged Cawdrey’s influence, having survived into the eighteenth century. This influence also extended into the third monolingual English dictionary, compiled by Henry Cockeram, which again had a long published life extending to yet another revision by R. Brown in 1719.

The appearance of the first monolingual dictionaries in English represents the confluence of several interconnecting streams in English society and culture. Lexicography as an independent practice was important, but not primary among these. Both Coote and Cawdrey advanced their particular religious and educational agendas in collaboration with the printing industry to produce the impetus for a succession of such dictionaries of steadily increasing weight and scholarly seriousness over the next century and a half. A linguistic vision, however, had little to do with it; indeed, it seems more than a little gratuitous. The hard words tradition also relied to some degree on the medieval tradition of derivation-driven lexicography in that the notion of hard words was concerned with the linguistic potential of the lexicon, as much as with what the lexicon actually consisted of. The conventional notion of copiousness in rhetoric and in lexicography also fed into this impetus to seek out words otherwise little used or simply conjured up for their own sakes.

Chapter 10 Seventeenth- and Eighteenth-Century English Lexicography

It was not always as easy to learn about words and their definitions as it is today. Typically, all we need to do now is to open up a dictionary (or go online) and read what’s written there. However, dictionaries have not always assumed these now-familiar formats. For instance, a few hundred years ago dictionary-makers did not always place words in alphabetical order, nor were their definitions as clear and detailed as we have come to expect. Early dictionaries did not offer information about word origins, parts of speech, relation to other words, proper pronunciation, or sentences exemplifying usage. Indeed, because dictionaries were so new, authors felt the need to explain how to use them. In order to understand how dictionaries evolved, and to recognise the social and cultural developments that made dictionaries so important, it is helpful to look back to a period when dictionaries and other reference works were just beginning to be standardised. Each period is steeped in its own belief system, and the dictionaries of the seventeenth century are precursors to the monolingual English dictionaries with which we are most familiar. At that time, authors were still looking to the ideas of the Renaissance, and their writing reflected the period’s emphasis on scientific inquiry and experimentation with form, as well as classical topics and translation from the classical languages to English. What that means is that these first dictionaries focused on understanding the world and attempting to make meaning out of it by creating systems and programmes.

Early lexicographers often approached their work empirically, in the manner of natural philosophers; that is to say, they collected words and samples of language, setting them out as a collection of particulars potentially useful for inductively understanding larger patterns of language. An excellent example of this process is provided by John Ray (1627–1705). Ray is primarily known as a botanist and for his Wisdom of God Manifested in the Works of the Creation (1691); during his surveys of the flora of rural England, he also recorded language and words that were particular to the same regions where he found plants. Ray’s Collection of Words Not Generally Used (1674) is one of the first collections of dialectal variants, featuring words unique to northern and southern counties, away from the linguistically hegemonic region of London. Importantly, he credits colleagues who helped provide his corpus, explaining in the preface: ‘I desired my friends and acquaintances living in several Countreys to communicate to me what they had observed each of their own Countrey words, or should afterwards gather up out of the mouths of the people; which divers of them accordingly did. To whose contributions I must acknowledge my self to owe the greatest part of the words’ (Collection of Words n.p.). He acknowledges that his etymologies have deficiencies, and are possibly incorrect, asserting, ‘I am sensible that this collection is far from perfect … But it is as full as I can at present make it and may give occasion to the curious in each county to supply what are wanting, and to make the work compleat’; as such, he refers readers to other, more learned books such as Stephen Skinner’s Etymologicon Linguae Anglicanae (1671).

Ray catalogues and charts the natural language of people in everyday use, also incorporating information about native birds and fish, cataloguing ‘a splinter, of the same original’, relating a possible connection to Greek. He concludes that others may take up his work and add to it, suggesting that he considers this project both unfinished and collaborative, as true scientific works are. Ray’s entries are short and not always clear, and yet his works introduce readers to the language of people outside the metropole, as well as some specialised language of the natural sciences. Not only does he make his scientific learning generally available, but he provides a way to contextualise it.

Throughout the seventeenth century lexicographers added new words, specialised terminology, and other information to dictionaries. The predecessors to these dictionaries were wordlists or bilingual reference works intended to assist students, scholars, and travellers. Soon, dictionaries became not merely repositories of definitions, but also collections of professional jargon, scientific terms, words from other languages, and cultural, biblical, or historical knowledge. Lexicographers laboured to present and order words taxonomically as their corpora greatly expanded over the course of the century. A significant trend during this time was the introduction of ‘hard words’ into the English lexicon and dictionary. Jonathon Green calls hard words ‘the new buzzword’ of that period, and he defines them as ‘the new scholarly vocabulary, usually drawn from Latin and Greek but also from Arabic and Hebrew, which by then had been infiltrating standard English’. He continues, ‘Hard words were still largely incomprehensible outside the scholarly circles that had taken them from the classics and introduced them to the vernacular, but the difference was that not only were such words seen as a necessary part of English, but for all their scholarly background, there was now a mass market eager and willing to take them on’ (Green 1996, 171–2). These hard-word dictionaries were monolingual, so that by including words that were either new or etymologically different from common English words, the authors were asserting that English had more words than previous lexicographers had added to their dictionaries.

One of the earliest hard-word dictionaries was by John Bullokar (c. 1580–c. 1642), who wrote An English Expositor (1616). In his preface, he is careful to mention that the words to which he introduces his readers, while new to them, are familiar to those whose practices require their use, such as ‘professors of Logicke, Philosophy, Law, Physicke, Astronomie, … Divinitie’. In addition to hard words in use by those in technical fields, Bullokar includes obsolete words, so all readers may understand them when they appear in older works. Bullokar is also one of the first lexicographers to specify that ladies and gentlewomen were among his intended readers. Like many lexicographers who depended on their predecessors for their wordlists, Bullokar uses works by Robert Cawdrey (1604) and Thomas Thomas (1606) as his base; however, he enlarges upon the wordlists. He attempts to achieve regularity in orthography, marks obsolete words with asterisks, and notes in which field or profession particular words are found. Most of the entries are generally a few short lines on a page with two columns, though others are much longer, spanning many sentences, such as the entry for beaver, in which he mixes senses of the word, first mentioning the part of a suit of armour and then moving on to the familiar animal in nature and as it is used in medicine, or Physick:

In Armour it signifieth that part of the Helmet which may be lifted up, to take breath the more freely. It is also a Beast of a very hot Nature, living much in the water. His two forefeet are like the feet of the beast called Gattus, (as Ionnes de Sancto Amando writeth;) but what this Gattus is, I doe not well understand.

Bullokar goes on for several more sentences, finally asserting, ‘The stones of this beast are sold in Apothecaries shoppes, by the name Castoreum: they are much vsed in Phisicke, being very good against palsies and cold diseases of the sinewes: But the skin is of more valew then the stones’ (English Expositor n.p.).

It might strike the modern reader as odd when Bullokar uses the first-person pronoun in discussing his book. Today we tend not to think of dictionaries as the production of one person – or one person with assistants – but this was not always the case. Until relatively recently, dictionaries were compiled by the person whose name appeared on the title page. This reflects the economic realities of the book trade. The named authors often owned the rights to their books, though gradually printers and booksellers paid a flat rate to authors for the copyright, assuming both the cost and the profits of publishing. Bullokar’s dictionary proved to be popular, spawning many other editions (mostly posthumous), as well as expansions, such as R. Browne’s English Expositor Improv’d (1648), The English Expositor or Compleat Dictionary (1663) by ‘A Lover of the Arts’, and others, all the way to 1775.

It is clear immediately that the proliferation of new hard-word dictionaries was directed to an upper-class readership, as the spread of literacy to the middle and lower classes would not occur for almost a century. That the hard-word dictionary was geared towards a mostly elite audience can be seen in the title page of Henry Cockeram’s The English Dictionarie, or An Interpreter of Hard English Words (1623), as Cockeram declares that his book is meant for:

ladies and Gentlewomen, young Schollers, Clarkes, Merchants, as also strangers of any nation, to the understanding of the more difficult authors already printed in our language, and the more speedy attaining of an elegant perfection of the English Tongue, both in reading, speaking, and writing. Being a collection of the choisest words, contained in the Table alphabeticall and English Expositor, and of some thousands of words never published by any heretofore.

Moreover, the dedication is to Sir Richard Boyle, a powerful politician and aristocrat. Cockeram, like Bullokar, clearly gives credit to dictionaries that preceded his, placing his dictionary in a line of respectable works upon which, as he asserts, his own improves. He explicitly credits his predecessors in the first sentence of his ‘Premonition from the Author to the Reader’:

I am not ignorant of the praiseworthy labours which some scholars of deserved memory have heretofore bestowed on the like subject that I have here adventured on; however it might therefore seeme a needlesse taske of mine, to intrude upon a plot of study, the foundation of whose building hath been formerly level’d and laid, yet the Justice of defence herein is so cleare, that my endeavours may be truly termed rather a necessity of doing, than an arrogancie in doing.

He continues to explain to his readers exactly how he went about ordering his words, both those in common use and also those that are of a ‘vulgar’ nature. He even accedes to the current trend of including ‘fustian-termes’ used by those who are a la mode – a term much used at that time to indicate someone or something fashionable. Cockeram explains that he adds at the end of his book a ‘recitall of severall persons, Gods and Goddesses, Giants and Devils, Monsters and Serpents, Birds and Beasts, Rivers, Fishes, Herbs, Stones, Trees, and the like’ (n.p.). With this list, he inaugurates the tradition of expanding the hard-word dictionary to include more than mere lexical items. In his first alphabetical section, some of his definitions are as scant or vague as those of previous lexicographers: the entry for Abecedarium is simply ‘the crosserow’; others are tautological, as when he states that ‘Grammaticall’ means ‘of, or belonging to Grammar’. On the other hand, some of his entries obviously take for granted that all readers would have some information in common, as in the definition for lignum vitae: ‘A wood much used in Physick against the French disease’. Doubtless, most readers would not need to consult a medical dictionary to recognise the oft-insulting reference to syphilis.

Cockeram’s final section brings together terms that might not otherwise belong in his word entry sections; these are terms he clearly considers more unusual, and culturally marked. Still, he categorises them alphabetically by group and within each group. So, for example, in ‘Of Animals’, he includes crocodile, defining it as ‘a Beast hatched of an Egg, yet some of them grow to great bigness, as much as twenty or thirty foot in length; it hath cruel teeth, and a scaly back, with very sharp claws on his feet; … Having eaten the body of a man, it will weep over the head … thence the proverb, She shed Crocodile tears’. Entries like this indicate that dictionaries had not yet evolved to the point of separating senses or phrases. Cockeram’s grouping ‘Of Boys’ does not define types of children; rather, it accounts for specific youths in classical literature like Ganymede or Phuong. In this section Cockeram interweaves figures from classical mythology and historical figures, as in the several sections devoted to various types of men who were known for heroic or tragic events, inventions, and other achievements. Cockeram thus places Flavio, ‘who invented the Seaman’s Dial’, right before Oedipus.

As with Cockeram’s English Dictionarie, other dictionaries increased in size and scope, adding content that made them resemble almanacs or encyclopaedias – they contained more than words and definitions, hard or otherwise. Lexicographers added more categories of words and information, and some entries were longer and discursive, explaining, for example, scientific advances or types of military hardware and fortifications. On occasion, a dictionary would include short essays that provided more knowledge than the general reader would typically need. Often dictionaries signalled a different kind of reader or user: one who was adapting to new professions; for instance, clerks of various types, who were suddenly in need of information about the law or other professions, such as supporting noblemen by administering their businesses or estates. Typical title pages advertised the kinds of people who would need to know these terms; Michael Hancher has written extensively on the purported readership of early modern dictionaries, paying particular attention to the categories of information advertised on title pages and provided within. These reference books, with their specialised focus, show the rapid changes in English society and economy that would require people to acquire new terms and apply them quickly, as social roles and professions changed during economic shifts, the first stages of the industrial revolution, and the beginning of the rise of the middle class. These dictionaries, then, were as much about teaching as they were about learning; indeed, many early lexicographers were also school teachers. 
Unsurprisingly, then, the authors of both general and specialised dictionaries, such as Cockeram’s English Dictionarie, Thomas Blount’s ΝΟΜΟ-ΛΕΞΙΚΟΝ: A Law Dictionary (1670), and Pierre Danet’s Complete Dictionary of the Greek and Roman Antiquities (1700), acknowledged that the books they wrote would be best used by students, ‘young schollars’. Such books were often called ‘helps’, so that the reader would be able to enter a new or burgeoning field and be confident that the information contained within would be accurate and useful. To that end, some early lexicographers informed readers, either on title pages or in prefatory front matter, that they had consulted with experts who had provided specialised knowledge, explicitly claiming their works were up to date, accurate, and useful. Many authors, however, only asserted that they had referred to experts, but since they did not provide names, readers would have to trust in the authors’ sources.

In the sixteenth and seventeenth centuries, intellectual communities throughout Europe were creating academies and societies to share information in meetings, letters, and print. As they took up language as an area of study, some organisations had as a key element of their charter the regulation of language and usage. Italy’s Accademia della Crusca (1583), France’s Académie française (1634), and Spain’s Real Academia Española (1713) were each dedicated to preserving and refining their languages, creating standardised dictionaries with the imprimatur of the Crown or a royally appointed governing body of scholars. England’s Royal Society, established in 1660 for the advancement of science and intellectual exchange, and seriously interested in language, was never able to establish an academy for the advancement and standardisation of the language. Their publications, however, in the form of letters or observations in the Philosophical Transactions, pamphlets, and monographs, indicate a keen interest in researching language in the manner of the new science – historically, culturally, and theoretically. In his History of the Royal Society (1667), Thomas Sprat explains that an ‘Assembly’ was needed to refine and fix the language, ‘to have Reason set out in plain, undeceiving expressions’ (p. 41). Members of the Society published on language throughout the later seventeenth and eighteenth centuries; notably, John Wilkins’s Essay Towards a Real Character and a Philosophical Language (1668) advocated for creating an authoritative English dictionary in which words would be ordered according to philosophical tables. To put words in tables meant to order them systematically according to logical principles based on the new science of lexicography. Language, Wilkins insisted, was a topic ripe for study, in the same way that scholars of flora, fauna, and geology were gathering from other fields of knowledge and systemising their findings according to natural philosophy.

Even though a number of scholars in the Royal Society considered forming an organisation devoted to language, one was never created. Nor did the interest of literary figures have much effect, though such influential writers as John Dryden, John Dunton, Daniel Defoe, and Jonathan Swift wrote about the need for English to be regulated by an academy. Had such an organisation emerged out of the interests of the Royal Society and others, the scandals and dictionary wars involving Thomas Blount and Edward Phillips might have been avoided.

When Thomas Blount (1618–79) wrote his hard-word dictionary, the field was already well established, with lexicographers using the words and works of others, generally indicating whose works they had used or quarrelled with, and to whom they were indebted. His Glossographia first appeared in 1656, running to seven editions before the turn of the century. Blount approached lexicography from the perspective of a lawyer with literary and classical interests (he also published a law dictionary in 1670). The title page of Glossographia made clear that Blount would define not only words from Romance, classical, or biblical languages, but others as well, to serve the needs of those who would need specialised terms or jargon belonging to various fields of study or interest. His work was, as he wrote, a

Dictionary, Interpreting all such Hard Words, Whether Hebrew, Greek, Latin, Italian, Spanish, French, Teutonick, Belgick, British, or Saxon; as are now used in our refined English Tongue.

Also the Terms of Divinity, Law, Physick, Mathematicks, Heraldry, Anatomy, War, Musick, Architecture; and of several other Arts and Sciences Explicated.

With Etymologies, Definitions, and Historical Observations on the same.

Very useful for all such as desire to understand what they read.

Whereas earlier lexicographers tended to advertise to scholars, clerks, ladies, or students, Blount’s audience – at least on the title page, and less so in the preface – expanded to include just about everyone. His readers had to be general, as the words he included were both in current use and belonged to numerous professions. What additionally made his work important was his acknowledgment that English was in constant flux; therefore, he chose to exclude words that were either obsolete or were aging out of the lexicon. He admitted that to do otherwise would become a never-ending task, writing that ‘our English Tongue daily changes habit’. If people wished to learn about old, foreign, or specialised words, they could consult other dictionaries, which he names in the preface. Blount likewise names the previous lexicographers upon whom he relied; not only did he use their wordlists and even many of their entries, but also, as DeWitt Starnes and Gertrude Noyes indicate, Blount supplemented his borrowings from other hard-word dictionaries with words drawn from Latin–English dictionaries, such as Dictionarium linguae Latinae et Anglicanae (1587) by the noted scholar Thomas Thomas (1553–88). Thus, while he stated an obligation in general, he did so without providing specifics regarding where, exactly, credit was due.

While Blount was not the first to elide source material, he is largely credited with being the first lexicographer of a monolingual dictionary to attempt, though not always accurately, to insert etymology into his work. Though Starnes and Noyes call some of his word histories ‘fanciful’ (p. 46), he does particularly well with Latinate terms such as those related to the law, his original field. An example of this is his definition of cornage: ‘(from the Lat.), Cornu, a horn in our Common Law signifies a kind of Grand Sergeancy, the service of which tenure, is to blow a horn, when any invasion of the Northern enemy is perceived, and by this many hold their Land Northward, about the wall commonly called the Picts wall’. This kind of entry, enriched by Blount’s knowledge of both Latin and the law, helped make his Glossographia one of the first dictionaries to register how words entered English, as well as to contextualise how they were used professionally. Most of Blount’s entries were succinct and free of tautology, so the field of lexicography could be said to be increasing in rigour, while the knowledge base was growing to the point that there were multiple sources to credit. However, what Blount did not do was to credit sources in the text. For instance, when Blount borrowed language from other dictionaries, as he does from Thomas or Bullokar, he did not mention specifically that he was doing so. That level of qualification and scholarship would have to wait until much later, with Samuel Johnson’s Dictionary of the English Language (1755) and the posthumous revision of Nathan Bailey’s New Universal English Dictionary (1756), in which extensive quotations were attributed.

The work that set off English lexicography’s first serious plagiarism debate was Edward Phillips’s New World of English Words (1658), which followed Blount’s dictionary by only two years. Unlike Blount’s volume, Phillips’s dictionary appeared in folio, and appeared more impressive on the shelf. N. E. Osselton has characterised the volume as unusual, not only for allegedly pirating Blount, but because it was the first to move away from hard words, to proclaim its encyclopaedic scope, and to expand its front matter. Phillips provided an essay on the history of the English language, instructions on how to use the dictionary, and a statement of the purpose the dictionary serves as a work of social and national import. With his front matter, Phillips inaugurated a more vigorous professionalism and prescriptivism than earlier lexicographers had manifested. For example, Osselton notes that Phillips indicated that certain words were not acceptable, writing, these ‘are prefixed by a dagger symbol as a warning to the dictionary user that those are not acceptable words in English. Altogether 95 words are thus marked in the dictionary as “Pedantisms”’ (2009, 144).

Scholars of dictionaries concur that Phillips generally did not provide much in the way of new material or original entries, but that it was the organisation and structure of his work that was important. The New World of English Words might not be known as very special at all except as the latest in lexicographical innovations, had Phillips’s predecessor and competitor Blount not taken offence at what might have been understood as the largely accepted tradition of borrowing another lexicographer’s word base. The problem that Blount seems to have found in Phillips’s dictionary was not so much that Phillips borrowed extensively – Blount himself borrowed from others before him – and not that it was published so soon after Blount’s, but that Phillips ostentatiously announced on the title page that his work was the product of extensive consultation with sundry learned gentlemen, named and unnamed. It is true that Phillips’s title page is a study in self-aggrandisement, densely packed with claims about the expanded fields of knowledge for which he and his consultants provided terms, the encyclopaedic scope of the work, and the wealth of etymologies and derivations from other languages. Moreover, he dedicated his dictionary to ‘the Most-Illustrious, and Impartial Sisters’, Oxford and Cambridge, implicitly asserting his place in a pantheon of scholars and intellectuals. At the same time, he neglected to mention many earlier lexicographers upon whose works he drew; this conduct went against the typical protocol of naming predecessors to whom one is indebted. Granted, other lexicographers also used words and definitions from others, but they mostly acknowledged and thanked predecessors. Phillips does, however, mention John Minsheu’s Hegemon eis tas glossas, id est, Ductor in Linguas, The Guide into Tongues (1617); Minsheu was one of the few lexicographers who protected his work by taking out a patent.
Phillips also openly took up Blount’s ‘challenge’ in the preface to the Glossographia, that ‘to compile a Work of this nature and importance, would necessarily require an Encyclopedie of knowledge, and the concurrence of many learned Heads; yet, that I may a little secure the Reader from a just apprehension of my disabilities for so great an Undertaking, I profess to have done little with my own Pencil’; Blount then lists several scholars whose work he has consulted, and concludes by hoping ‘I have taken nothing upon trust, which is not authentick’.

Therein seems to lie the crux of the dispute: whether what Phillips used was ‘authentick’ or not. Responding to the perceived infractions by Phillips, Blount produced the pamphlet A World of Errors Discovered in the New World of Words (1673), which takes to task all of the supposed errors of fact and omission. Blount wrote that Phillips had ‘extracted almost wholly out of mine’ numerous words and entries, which he lists, along with errors of printing and minor faults, in excruciating detail. At the heart of the debate over whether Phillips plagiarised Blount is how plagiarism was defined and understood at that time. Richard Terry has written about literary theft using Blount’s own term, plagiary, which refers to when someone takes and brings to fruition work that was actually created by another (2007). The term was subject to semantic broadening, not unlike the case of the word rape, initially indicating a crime against property but coming to include metaphorically stealing something immaterial, but of deep value. The term plagiary was used by Randle Cotgrave and then Blount to mean, first, ‘One that steales, or takes free people out of one countrey, & sells them in another for slaues; a stealer, or suborner of mens children, or seruants, for the same, or the like, purpose; […] also, a booke-stealer, or booke-theefe; one that fathers other mens workes upon himselfe’. To understand, then, how plagiarism became a crime against one’s words, it is helpful to consider literary works as intellectual property. Terry, however, shows how our modern sense of using words or text without attribution has changed over time, so now ‘plagiarism is associated not with an irregularity in how a work has been composed or put together but with the fabrication of a false claim of ownership over a work’ (2007, 9). This must be what Blount meant when he wrote of a ‘book thief’, protesting a crime against literary property.
When one goes to Blount’s definition of authentick, it becomes easier to understand the objection: he defines authentick as ‘that which is allowed, or hath just authority, the original’. Considering that there were few ‘originals’ in the wordlists of lexicographers, and considering how difficult it is to define words succinctly without repeating many of the same words, the claim of being ‘original’ is at best disingenuous.

At their most basic, dictionaries are tools for literate people to impose order on language, to create meaning, to systematise knowledge, and to teach users about the world of ideas. The seventeenth century was an especially fertile time for dictionary-making because the inability to create a legislative academy meant that lexicographers had more freedom in what was contained in their works, and that the criteria for inclusion were left up to the author and his – and later her – particular objectives. The result of authors having more autonomy was that they could develop theories, adapt each other’s works, and affect the theories and development of the English language in a more bottom-up process than would have occurred otherwise. So, if Ray shared information about dialectal variations; or Blount provided information about English loanwords from other languages and what their roots were; or Phillips elaborated on the history and development of English grammar, then they all could do so. This authorial and editorial freedom extended to the determination of just who could be a lexicographer as well, and as it happened, as in the case of Blount, several early lexicographers either came from minority Protestant sects or were Catholic. Thus, writing dictionaries and other reference books or school books meant that they had available to them not only a profession, but a way to influence the language – as well as the ideology – of the nation. People looked to reference books for assistance; as dictionaries developed structure, organisation, and reliability, they became essential in teaching the newly literate populace not only what kinds of words were useful in what kinds of circumstances, but also how best to pronounce those words. It is not an accident that John Seargent, in a laudatory poem following Blount’s preface, wrote about the developing field of lexicography as a matter of ‘fixing’ what had been done to the English language:

Since all Science first from Notions springs,

Notions are known by Words; there’s nothing brings,

Then [sic] treating these, to Knowledge more advance,

Held Pedantry by witty Ignorance.

In fine, what’s due t’ industrious observation,

And re-acquainting our self-stranger Nation

With its disguised self; what’s merited

By rendring our hard English Englished;

What, when our Tongue grew gibberish, to be then

National Interpreter to Books and Men;

What ever praise does such deserts attend,

Know, Reader, ’tis thy debt unto my Friend.

Dictionaries soon became an invaluable tool in the search for a kind of national normativity with respect not only to spelling, but also to reading, grammar, and even speaking. Dictionaries and other books like them attempted to help people gain a more copious supply of words, to use them properly, and even to regulate their accent based on what was considered ‘best’. They would come in handy as the British Empire grew, people migrated to the cities, and education became more accessible. Thus, the dictionary was a tool for the people and a levelling instrument, but it was likewise a tool for controlling people by determining what and how they accessed their own language.

Table 10.1 Prominent seventeenth- and eighteenth-century English dictionaries, and the texts upon which they were based

Dictionary title	Date	Base dictionary	Author/Editor
An Universal Etymological English Dictionary	1733, sixth edition	Kersey	Bailey, Nathan
Glossographia, or a Dictionary, Interpreting all such Hard Words, …	1656	Thomas; Holyoke; Bullokar	Blount, Thomas
An English Expositor, Teaching The Interpretation Of The Hardest Words Vsed In Our Language	1616	Cawdrey; Thomas	I. B. [Bullokar, John]
The English Dictionarie, Or, An Interpreter of Hard English Words	1623	Cawdrey; Bullokar	Cockeram, Henry
The New World of English Words, Or, a General Dictionary	1658	Bullokar; Cockeram; Blount	Phillips, Edward
A Collection of English Words Not Generally Used	1691, second edition		Ray, John

Eighteenth-Century English Dictionaries: Prescriptivism and Completeness

Chapter 11 Recording the Most Proper and Significant Words

In attempting to understand the lexicography of the eighteenth century – or indeed the history of lexicography in any century, including our own – the following questions are among those one should ask: 1) what is a dictionary’s intended audience, and what needs is it trying to meet; 2) what kinds of words does it include (i.e. is it expansive or restrictive), and what criteria does it bring to bear in its selections; 3) is there a pronunciation component, or an emphasis on spelling; 4) is the dictionary prescriptive and proscriptive, or is it more firmly descriptive, based on usage; 5) is the dictionary concerned with assisting foreign learners; 6) does the dictionary indicate grammar; 7) to what extent is the dictionary encyclopaedic, and to what extent is it restricted to information about words; does it include pictorial illustrations; 8) does the dictionary include attention to spoken as well as written language; 9) do the entries include multiple definitions of words; 10) is the wordlist illustrated or explained by literary or other written examples?

These are not merely technical inquiries or neutral questions; rather, the answers provide insights into the intentions of lexicographers as well as the nature and demands of their users. The dictionaries of the eighteenth century reflect the concerns of a rapidly growing population and increasing commerce, and eventually, industrialisation. The spread of contacts and culture increasingly demanded new sources of information – especially linguistic information, but also many other kinds. More people meant contact with more foreign languages, as well as more regional variations of English. Increased opportunity for social and geographical mobility also induced uncertainty. What, people wondered, were the standards for correct, cultured speech? How should one write English with correct, prescribed spelling? In other countries, linguistic academies had established standards for various aspects of their respective languages through national dictionaries; in England, there was no linguistic academy to issue such authoritative standards through a dictionary. The Act of Union, signed in 1707, creating a united England and Scotland (inhabited by ‘North Britons’), produced new concerns for uncertain speakers of English in the new nation. As more people spoke and wrote English, varieties of English seemed to multiply; recognition of the need for guides to speaking, writing, and the meaning of words and their grammatical uses proceeded coextensively with the expansion of the country and its culture. And concerns for correctness sometimes rested uncomfortably with the counter-desire for and importance of comprehensiveness in describing existing examples of English.

How did the lexicographers of the eighteenth century fit into these broader concerns? The earlier tradition in English lexicography, the so-called ‘hard-word’ tradition of providing meanings for foreign words, chiefly Latin, which had found their way into English, soon evolved to meet other concerns after the turn of the eighteenth century. Users turned to dictionaries for instruction in other aspects of language besides hard words, such as the use of common words, pronunciation, and spelling. The attention to common English words begins to make these early eighteenth-century dictionaries more recognisable to modern users. In 1702, in fact, in a dictionary by ‘J. K’ (probably John Kersey, who would produce other dictionaries in 1706 and 1708), we can see attention paid to ordinary words as well as unfamiliar ones, the common words having been taken from bilingual dictionaries in which English was the basis: bilingual dictionaries indicated to the lexicographer what words were necessary for the learner and the common reader (Osselton 2009, 149). In his later dictionaries, Kersey expanded the attention to ordinary language and even recorded dialect words. Bilingual dictionaries with wide and diverse readerships (for example, those of Abel Boyer and Robert Ainsworth) opened up the monolingual tradition to new challenges and possibilities, such as the treatment of current usage in a variety of practical circumstances (Cormier 2009, 84–5). These early eighteenth-century dictionaries became much more useful to people of all manner of livelihood and ability. Kersey’s 1708 Dictionarium Anglo-Britannicum advertised itself as ‘Compil’d, and Methodically Digested, for the Benefit of Young Students, Tradesmen, Artificers, Foreigners, and others’ – a learned dictionary compressed into ‘a portable Volume, which may be had at any easie Rate’.

Nathan Bailey’s dictionary of 1721, An Universal Etymological English Dictionary, raised the standard for comprehensiveness in a dictionary (it had 40,000 entries) and emphasised the importance of etymology for understanding the meanings of its headwords. The definitions, however, were simple and sometimes vague, and rarely took into consideration multiple meanings of words. Nevertheless, the information included in the dictionary ranged widely from mythological to geographical, from technical to mechanical to legal. The inclusion of such information not specifically linguistic is usually referred to as ‘encyclopaedism’, and was a characteristic of English dictionaries of an age previous to the eighteenth century. In the editions of this dictionary as well as Bailey’s 1730 (revised 1736) folio Dictionarium Britannicum, the wordlist and its pages contained information and illustrations of all sorts, especially extensive technical terms and explanations. Bailey relied heavily on Ephraim Chambers’s Cyclopaedia (1728) for the Dictionarium Britannicum. This dictionary would treat approximately 48,000 entries, and nearly 60,000 in the second edition (Osselton 2009, 151–2). In their various editions and formats (versions of An Universal Etymological English Dictionary alone would go through over thirty editions in the eighteenth century), Bailey’s dictionaries were the most popular of all eighteenth-century dictionaries.

But Bailey’s emphasis, it might be said, was only secondarily linguistic. His dictionaries were concerned with the distribution of information from a variety of published sources; his interests lay in presenting facts, not in establishing good English or in explaining the English language that existed. Definitions of headwords in both of Bailey’s dictionaries were particularly curt and limited. His dictionaries paid nearly no attention to polysemy (multiple meanings of words) or phrasal verbs (verbs made up of a verb and either an adverb or preposition to form an idiomatic phrase, very common in English, such as to put on or to help out); grammar and pronunciation were scarcely treated. Whatever their accomplishments in aiding comprehension and mediating knowledge, Bailey’s dictionaries remained clearly limited in their linguistic and lexicographic functions.

In the early eighteenth century, the collective demands for a new authority on the language – issuing from the general population, experts, and publishers – gathered strength. The belief that the language was disordered and its standards uncertain had been growing since the establishment of the Royal Society’s committee to monitor and reform the English language in 1664, and various authors and experts had offered their voice for the establishment of an authority on the language. Some argued for the creation of an English Academy along the lines of the French, Italian, and Spanish academies: its function would be to compile an authoritative dictionary that endeavoured to preserve the language from disarray and perceived (further) dissolution. Usually couched in terms of regulation and ‘fixing’ (in the sense of correcting as well as freezing against change), these calls stressed the need for a prescriptive and normative authority, especially at a time of increasing contact with other populations, both internationally and nationally. The language was thought to have picked up enormous quantities of loan words, especially from French, which altered the very substance of the ‘English’ language. Samuel Johnson would write in the preface to his 1755 Dictionary of the English Language: ‘we have long preserved our constitution, let us make some struggles for our language’. Dynastic concerns of nationhood coincided with concern over the language: it is not difficult to understand the worry, growing into consensus, that Britain, and the English language, were falling into an uncultivated and increasingly primitive condition.

Finally, in 1746, a group of professional publishers (‘booksellers’, as they were called) recognised not only the need but the financial opportunity of publishing an authoritative dictionary, and turned to a capable but little-known writer in London, Samuel Johnson, to produce it. The story of what followed illuminates crucial aspects of the famous dictionary he eventually produced, as well as the role of a dictionary in mid-eighteenth-century Britain. Most students of eighteenth-century letters and lexicography may understandably assume that Samuel Johnson took on the writing of the most famous early English dictionary because he was already established as a great writer, critical authority, and man of letters; however, when he began his work, he was virtually unknown, having not even published anything under his own name. Johnson had shown himself to be a useful writer of journalism, prefaces, and various short works. Given the task of writing a proposal for a dictionary that would ‘fix’ or ‘delimit’ the language (to stop it from decay and expanse beyond appropriate borders), Johnson published his Plan of a Dictionary in 1747, addressed to Lord Chesterfield, Secretary of State, a public person (and potential patron) with well-known interest in the ‘proper’ use of language.

The Plan expressed both the need to control the untended profusions of the language and the author’s confidence in his own ability to do so. Relying upon previous dictionaries, especially Bailey’s Dictionarium Britannicum and Abel Boyer’s and Robert Ainsworth’s bilingual dictionaries, Johnson outlined the structure and method for constructing his own text. Identifying the process of defining as the most difficult of the lexicographer’s tasks, he studied the treatment of polysemy in these dictionaries (and probably consulted John Locke’s discussion of the use and meaning of individual units of language in Book III, ‘Of Words’, in his Essay Concerning Human Understanding) and developed his own system for defining. For words with multiple definitions or ‘explanations’ (which had hardly been considered in previous English monolingual dictionaries), he proposed the following seven categories: 1) the primitive or natural sense (that sense closest to the etymon); 2) the consequential or accidental; 3) the metaphorical; 4) the poetical (‘where it differs from that which is in common use’); 5) the familiar; 6) the burlesque; and 7) the peculiar sense as used by a great author. These categories begin with that closest to the etymological root (the original meaning) and extend from there, each more distant from its root meaning. This strategy provided him with a systematic and coherent structure on the basis of which he could order and prescribe proper English usage. He would incorporate examples from English writers for the various categories to provide evidence and authority for the definitions within this coherent structure. Any example of English usage he would encounter in printed sources, Johnson seems to have believed, would fit within this scheme for multiple definitions. Illustration would be found for most or even all definitions (Reddick 1990, 25–54).

The Plan was intended to construct Johnson’s authority, and it succeeded in publicising an eloquent, serious, and plausible blueprint for the work. Various prominent figures in England, both privately and in print, are recorded as having followed the work and checked its progress. The author employed several assistants, probably eventually six in all, to help with various tasks, including finding and copying quotations from printed sources. Johnson apparently predicted that the work would take three years; in fact, it was nine years before the book was published in 1755. What happened?

Textual evidence, as well as evidence from extant manuscript fragments, suggests that Johnson’s work was critically disrupted at a stage when the practical exigencies of constructing the lexicon coincided with a crisis of, and change in, Johnson’s lexicographic philosophy. To begin his work, Johnson seems to have relied on a copy of the second edition (1736) of Bailey’s Dictionarium Britannicum to organise his material, at first possibly writing his own additional material and changes on interleaves of Bailey’s book, to be later copied into notebooks. His procedure involved searching for examples of word usage in written literature and other kinds of texts, marking various instances of each word for the assistants to copy out onto slips. He claimed that he had sought examples of word usage in ‘the wells of English undefiled’ (preface) – that is, from the time of Sir Philip Sidney in the latter half of the sixteenth century up into the period of the Restoration of the monarchy, culminating with the crowning of Charles II in 1660 after the end of ‘the Commonwealth’ of Oliver Cromwell and his son. In practice, Johnson’s chronological boundaries were somewhat wider than this. He defended the decision to limit his sources in this way by asserting that it was the period in the development of the English language that came after ‘a time of rudeness antecedent to perfection’, yet before the language was considerably influenced by French (‘gradually departing from its original Teutonick character, and deviating towards a Gallick structure and phraseology, from which it ought to be our endeavour to recall it, by making our ancient volumes the ground-work of stile’).

But Johnson discovered that he could not impose an a priori model of word usage onto what he would call ‘The boundless chaos of a living speech’ (preface). He was overwhelmed by the multiplicity and sheer number of his ‘authorities’, as well as the unmethodical, often contingent variations in usage exemplified in his sources. The sources he gathered simply did not fit neatly into his categories of determining definitions. Manuscript evidence suggests that the text Johnson and his assistants were trying to build became unwieldy and overfilled. The confidence of the Plan is exploded in the ‘Preface to the Dictionary’ eight years later, when Johnson expands on the difficulties of establishing or fixing the language according to previously established categories. The ‘Preface’ articulates, and the surviving manuscript and other archival material confirm, that Johnson found that the written sources he consulted offered such a range of language (not only individual words but individual uses and phrasal combinations) that he had to rely upon his sources to supply his wordlist and the content of his entries. Rather than serving as simple illustrations of the definitions, the quotations from ‘authorities’ became the groundwork upon which the remainder of the dictionary was constructed. Clearly, this marked a crucial moment in the shift from prescriptive lexicography (prescribing correct usage according to an established standard) to descriptive (describing the language as it is used, without judgements concerning propriety or correctness). ‘[I]t must be remembered’, Johnson wrote in the preface (referring to English verbs), ‘that while our language is yet living, and variable by the caprice of every one that speaks it, these words are hourly shifting their relations, and can no more be ascertained in a dictionary, than a grove, in the agitation of a storm, can be accurately delineated from its picture in the water’.

Johnson’s doubts concerning an authoritative dictionary are articulated in the preface to the Dictionary in a profound meditation upon language, lexicography, and (often vain) human endeavour. He acknowledged that he had earlier hoped that he ‘should fix our language, and put a stop to those alterations which time and chance have hitherto been suffered to make in it without opposition’; yet such efforts are doomed and foolish: ‘to enchain syllables, and to lash the wind, are equally the undertakings of pride, unwilling to measure its desires by its strength’. He had originally intended to ‘ransack’ all literature and produce a book to take the place of all other reference books, but this proved to be ‘the hope of a poet doomed to awake a lexicographer’. In fact, the task overwhelmed Johnson’s confidence in completeness of system and coherence of approach. The preface records his radical disillusionment with prescriptive lexicography:

Those who have been persuaded to think well of my design, require that it should fix our language, and put a stop to those alterations which time and chance have hitherto been suffered to make in it without opposition. With this consequence I will confess that I flattered myself for a while; but now begin to fear that I have indulged expectation which neither reason nor experience can justify. When we see men grow old and die at a certain time one after another, from century to century, we laugh at the elixir that promises to prolong life to a thousand years; and with equal justice may the lexicographer be derided, who being able to produce no example of a nation that has preserved their words and phrases from mutability, shall imagine that his dictionary can embalm his language, and secure it from corruption and decay, that it is in his power to change sublunary nature, or clear the world at once from folly, vanity, and affectation.

Johnson necessarily re-orients and re-defines the role of the lexicographer, as one of those ‘who do not form, but register the language; who do not teach men how they should think, but relate how they have hitherto expressed their thoughts’.

Johnson’s reluctant shift from a prescriptive to a descriptive approach was certainly not absolute, however, and his Dictionary retains aspects of prescriptivism. For example, while usually accepting the evidence of usage provided by his authorities, Johnson also stigmatises examples he considered ‘a low word’, ‘a bad word’, or ‘a cant word’, as well as those he deemed ‘obsolete’, ‘a word proper, but little used’, or ‘not in use’, despite their being supported in examples by authorities. He frequently criticises authors, even William Shakespeare, John Milton, and Alexander Pope, for using a particular expression in an improper way. To some extent, the struggles outlined in the preface between desires for stability and order, on the one hand, and the inevitability of change and disorder, on the other, produce contradictions throughout and across the entries in Johnson’s work. While elements of prescriptivism are found throughout the dictionary, at least as often one finds that the criterion of word use is more important than predetermined criteria, such as etymology, semantic rules, or analogy. While Johnson recognised the importance of etymology for establishing proper spelling and settling disputes of meaning, his enthusiasm for etymology as the determinant of multiple meanings expressed in his ‘Plan’ was often abandoned in the body of the Dictionary. In this, Johnson departs dramatically from Bailey’s sense of the power of that science.

While recording written rather than spoken language is selective and potentially elitist, Johnson’s reasons appear to be largely practical (how would he record all the examples of spoken English?) rather than judgemental. And despite the claim on the title page of the Dictionary that his words are ‘illustrated in their different significations by examples from the best writers’, Johnson did not necessarily seek examples from ‘the best writers’: some of the illustrations are ‘extracted from writers who were never mentioned as masters of elegance or models of stile; but words must be sought where they are used’.

Johnson quotes works of poetry and prose, theology and philosophy, history and politics, philology and art history – not to mention technical works and special subjects, such as plants, coins, and printing technology. The specialised sources introduce a sometimes overtly encyclopaedic quality to parts of the work, not surprising since the sources themselves were often encyclopaedias. While not as obviously encyclopaedic as Bailey, Johnson tended to use encyclopaedic sources for words dealing with complicated artefacts, natural objects and phenomena, human institutions, professions, and fields of learning, even in some cases allowing the borrowed passages to serve as definitions (Lynch 2005, 137; Stone 2005).

Johnson’s definitions usually, though not always, went far beyond those of his monolingual English predecessors, whose definitions often consisted of synonyms, with only a bare reference to genus (i.e. superordinate, superset, or blanket or umbrella terms). He frequently defines by genus and both descriptive and functional differentiae, such as the following definition of desk: ‘An inclining table for the use of writers or readers, made commonly with a box or repository under it’ (Stone 2005, 155). Johnson would provide over 1,500 definitions verbatim for the OED. Its editor, James Murray, singled out Johnson as having ‘contributed to the evolution of the modern dictionary’ by ‘the illustration of the use of each word by a selection of literary quotations, and the more delicate appreciation and discrimination of senses which this involved and rendered possible’ (Murray 1900, 116).

Johnson’s Dictionary surpassed the aims and achievements of other dictionaries of his day, becoming what we may consider the first modern dictionary of English, combining the best features of current lexicography and introducing new ones. The presentation of Johnson’s Dictionary, in two folio volumes, with red-letter printing of the name of ‘Samuel Johnson, A.M.’ on the title page, and a lengthy, personalised preface, sets him up as the personal authority in a way that English lexicography had never seen. Johnson’s preface, as well as the title page, established the Dictionary as the first in English by an author, an identifiable writer who used his aims and intentions as the ‘author’ to mediate the linguistic information of the original work.

The unprecedented lengthy preface to the Dictionary, as well as the ‘Grammar of the English Language’ and ‘History of the English Language’, raised the stature and authoritative quality of the work.

One remarkable element – new in English lexicography, though previously introduced in the Vocabolario of the Accademia della Crusca and followed in other national monolingual dictionaries – was the incorporation of thousands of literary and other written quotations as ‘authorities’ or illustrations of language usage. Johnson’s increasing reliance upon empirical written evidence of usage in constructing the wordlist and ‘explanations’ was new, as we have seen. His attention to historical usage and development was noteworthy. And the unprecedented extensive treatment of polysemy (multiple meanings of words in different contexts) represented a dramatic advance in English lexicography (though Boyer’s bilingual dictionary and Benjamin Martin’s dictionary had expanded English lexicographical efforts in this direction, and the great Italian dictionary had also addressed multiple uses of headwords). Johnson relied on bilingual lexicons as prototypes for his heroic treatment of phrasal verbs (such as to put on and to put out), which, though they are particularly problematic for English users, his monolingual lexicographer predecessors hardly even mentioned. Under the intransitive verb fall, for example, Johnson provides twenty-eight instances of phrasal combinations, such as fall away or fall back; under the transitive verb take, he lists fifty-three, including take off, and two separate listings for take up (sense 104, ‘To engross; to engage’, and 114, ‘To adopt; to assume’, each illustrated by ten quotations from authorities). Bailey, by comparison, lists not one under take. Finally, the intertextual aspect of Johnson’s Dictionary, in which he incorporates quotations from thousands of written sources, emphasises uniquely in English history the relation of Johnson’s lexicography to the world of letters (Johnson was a great critic, of course).

Several editions of Johnson’s folio Dictionary, including the heavily revised fourth edition of 1773, continued to appear during Johnson’s lifetime and afterwards until the end of the century. A handy abridged octavo edition in two volumes, published soon after the first edition of the folio in January 1756, appealed to a much wider audience. New editions of Bailey’s various dictionaries were strong competitors throughout this time. Significant lexicographic innovations during these decades were few, however. Yet increasing attention began to be paid to one area of lexicographic responsibility into which neither Johnson’s nor Bailey’s dictionaries reached, either descriptively or prescriptively: that of pronunciation. In general, where there was disagreement, Johnson preferred the version of pronunciation closest to the orthography (spelling), but otherwise did not attempt to promote any version as ‘correct’ – a sign, I would suggest, of his limited concern with standardisation and attendance to a ‘correct’ language use related to class or education.

Nevertheless, concerns over correct pronunciation are discernible in the early eighteenth century, largely driven by Scottish, Irish, and provincial English concerns of not being able to speak English English. Early dictionaries, starting with Thomas Dyche’s A Dictionary of All the Words Commonly Used in the English Tongue of 1723, begin using accent (‘stress’) marks for their entry words. Bailey’s edition of his Universal Etymological English Dictionary of 1727 (called, confusingly, ‘Volume II’) comprehensively incorporates accentuation, a feature all dictionaries, including Johnson’s, afterwards used. These lexicographers acknowledged that their ‘common’ readers – those who learned to read outside of traditional boys’ public schools – would have little knowledge of Latin and so would not know where to place the stress in polysyllabic Latinate words (Beal 2009, 150). Dyche and Pardon’s New General English Dictionary (1735) found its niche ‘for the Use and Improvement of such as are unacquainted with the Learned Languages’ (title page). The burgeoning reading public, from a variety of backgrounds, needed help with any uncertainty of pronunciation.

By the middle of the century, actual ‘pronouncing dictionaries’ began to appear. James Buchanan’s Linguae Britannicae Vera pronuntiatio (1757) introduced phonetic equivalencies and other tools (indicating the pronunciation of sounds) for users to voice the ‘correct’ pronunciations. In his preface, Buchanan (a Scot) unabashedly announced that his efforts were intended to help those ‘north Britons’ suddenly expected to speak English (for some, nearly a foreign language) in such a way that it would not sound like a regional dialectal variant. Other dictionary-makers (from the provincial areas under concern) took advantage of this need and the nervousness of inadequacy behind it: William Johnston’s Pronouncing and Spelling Dictionary (1764), for example, promised to provide the pronunciation of sounds ‘which we would all along keep in view, … that pronunciation of them, in most general use, amongst people of elegance and taste of the English nation, and especially in London’ (‘Introduction’). The 1770s saw even more pronouncing dictionaries, especially guides to explicitly ‘proper’ and ‘polite’ pronunciation, fuelled in part by upward mobility among the population, including and especially those from Scotland, Ireland, or provincial England. The guides to pronunciation became more sophisticated; improvements included increasingly advanced forms of notation (Beal 2009, 156).

For example, William Kenrick’s New Dictionary of the English Language (1773) and especially Thomas Sheridan’s General Dictionary of the English Language (1780) employed sophisticated annotations for pronunciation, including superscripted numbers marking vowel sounds and italics marking consonantal distinctions, to enable users to use ‘true pronunciation’ (Beal 2009, 160). These dictionaries, and Sheridan’s in particular, were very much prescriptive, designed to ‘fix’ pronunciation according to ‘polite’ and ‘correct’ standards. An Irishman himself, Sheridan wrote ‘not only for foreigners, but … Provincials … [i.e.] all British Subjects, whether inhabitants of Scotland, Ireland, Wales, the several counties of England, or the city of London, who speak a corrupt dialect of the English tongue’ (quoted in Beal 2009, 167). There was no doubt as to the desired standard and the social disgraces faced by ignorant speakers.

John Walker’s Critical Pronouncing Dictionary of 1791 was the most successful commercially; the fact that it remained influential throughout the next century indicates how stubborn the nervousness and aspiration of the reading and speaking public really were. Walker offered methodical and systematic explanations, providing evidence and the impression of authoritative knowledge and power (his Pronouncing Dictionary was prefaced by over five hundred detailed rules he called ‘principles’, keyed to dictionary entries for which the usages were unclear, varied, or debated). Readers were eager for clear, authoritative rules for correct speech. All provincial pronunciations were to be corrected. National and regional variation was discouraged, indeed suppressed: ‘Regional variants were viewed as “defects” or “mistakes”, which could be corrected by the purchase of a pronouncing dictionary written by somebody familiar with the educated speech of London’ (Beal 2010, 32). Standardisation became codified, and guides to ‘correct’ speech proliferated. As the eighteenth century moved into the nineteenth, normative approaches to the English language were stronger than they had ever been; the nineteenth century would develop its own kind of conformity to ‘polite’ standards of speaking and writing.

Summing up his assessment of seventeenth- and early eighteenth-century lexicography, Noel Osselton states:

In one way or another, the works of the early lexicographers [up to but excluding Johnson] thus came to incorporate much of what we should expect to find in monolingual English dictionaries today. But pronunciations (beyond mere word-stress), the meaning of compound nouns, set collocations, phrasal verbs, particles, abbreviations, idiomatic expressions (other than proverbs), irregular plurals, all kinds of grammatical information – anything like a systematic coverage of these was to be for future generations of dictionary-makers.

(Osselton 2009, 154)

Johnson addresses several of these concerns, most spectacularly and brilliantly, phrasal verbs. The pronouncing dictionaries focused on pronunciation, expanding the treatment of phonetics and proper speech. Increasingly, the dictionary was becoming modern.

Table 11.1 Key eighteenth-century dictionaries of English (including bilingual and pronouncing dictionaries)

Dictionary title	Date of first publication	Compiler/author
Royal Dictionary (French and English; English and French)	1699	Abel Boyer
A New English Dictionary	1702	J. K. (probably John Kersey)
Edward Phillips’s The New World of English Words, revised	1706	John Kersey
Dictionarium Anglo-Britannicum	1708	John Kersey
An Universal Etymological English Dictionary	1721	Nathan Bailey
A Dictionary of All the Words Commonly Used in the English Tongue	1723	Thomas Dyche
An Universal Etymological English Dictionary, Volume II	1727	Nathan Bailey
Cyclopaedia, or an Universal Dictionary of Arts and Sciences	1728	Ephraim Chambers
Dictionarium Britannicum	1730 (revised 1736)	Nathan Bailey
A New General English Dictionary	1735	Thomas Dyche and William Pardon
Thesaurus Linguae Latinae compendiarius (English–Latin; Latin–English)	1736	Robert Ainsworth
Lingua Britannica Reformata	1749	Benjamin Martin
A Dictionary of the English Language	1755 (fourth revised edition 1773)	Samuel Johnson
Linguae Britannicae Vera pronuntiatio	1757	James Buchanan
Pronouncing and Spelling Dictionary	1764	William Johnston
New Dictionary of the English Language	1773	William Kenrick
General Dictionary of the English Language	1780	Thomas Sheridan
Critical Pronouncing Dictionary	1791	John Walker

In the nineteenth century, lexicographers found it necessary to respond to or augment Johnson’s authority as the arbiter of the English language. Joseph Worcester’s dictionaries in the mid-nineteenth century were built on Johnson’s general principles. Charles Richardson and Noah Webster engaged in angry rebuttals of his models and authority. James Murray’s dictionary, the Oxford English Dictionary, was originally called the New English Dictionary; it was conceived in 1857 as ‘a volume supplementary to … Johnson, or to Richardson, and containing all words omitted in either of these dictionaries’. It remained indebted especially to Johnson’s practice of selecting illustrative quotations (Reddick 1996, 176). In short, English lexicography advanced in dramatic and profound ways in the eighteenth century and set the stage for the great innovations of the dictionaries of the nineteenth, twentieth, and twenty-first centuries.

Chapter 12 Samuel Johnson and the ‘First English Dictionary’

If the public knows one thing about the history of English dictionaries, it is that Samuel Johnson’s Dictionary of the English Language (1755) is the ‘first English dictionary’. A quick search of databases for ‘Johnson’ and ‘first English dictionary’ will turn up hundreds of results in newspapers, magazines, and online sources. In 2005, on the Dictionary’s 250th anniversary, one helpful correspondent to the Montreal Gazette clarified things for less enlightened readers: ‘I should explain that the Dr Johnson mentioned above was neither paediatrician nor urologist, but the author, among other things, of the first Dictionary of the English Language’.

So says the myth – but the myth is wrong. English had a printed monolingual dictionary a full century and a half before Johnson’s and, while there are many ways to count dictionaries, none of them puts Johnson’s first. The English Short Title Catalogue records 811 English printed books published through 1755 with the word ‘dictionary’ in the title; Robin Alston’s monumental bibliography says Johnson’s is the 177th printing of a general monolingual English dictionary; if we exclude subsequent editions and reprints, looking only at the first printing of each title, it is the twenty-first general monolingual English dictionary.

Some commentators have tried to qualify ‘first’. Lord Macaulay, for instance, in 1856 called Johnson’s Dictionary ‘the first dictionary which could be read with pleasure’. More recently, linguist David Crystal called it ‘the first attempt at a truly principled lexicography’. Some people in the know have tried to defend the claim that Johnson’s really was the first English dictionary through a witty and paradoxical redefinition of one or more of the terms. In a lecture called ‘Dictionary’ Johnson, delivered in 1967, J. P. Hardy admits all the ways in which Johnson was anticipated by his predecessors – ‘Yet, after this has been said’, he writes, ‘it can still be demonstrated that his was essentially the first English dictionary’. For him, ‘first’ means primus inter pares – first in our affection, if not in our chronologies. In a piece in the New York Times Book Review, James L. Clifford acknowledges that Johnson’s was not the first English dictionary; ‘Yet if he made no discoveries, Johnson was the first in England to combine in one reliable work the various functions we now demand of a dictionary’. And Walter Jackson Bate calls it ‘the first modern English dictionary’, though he never bothers to define modern.

Can these claims be salvaged? Johnson’s Dictionary was not the first, but was it the first to do anything particular? In fact, Johnson made a number of innovations in English lexicography, though almost nothing he did was without precedent in classical or continental lexicography. James Sledd and Gwin J. Kolb put it plainly in one of the most important books about the Dictionary: ‘Johnson, as lexicographer, asked no questions, gave no answers, and invented no techniques which were new to Europe, though they may very well have been new to English lexicography’ (1955, 44). But perhaps the most valuable ‘first’ for our purposes is that Johnson’s Dictionary is the first English reference work for which there is detailed information about its genesis and production. Some of that information is fragmentary and some is contradictory, but we know much more about how he proceeded than we do about any of his predecessors (and most of his successors).

Johnson’s Dictionary began as a booksellers’ project. A conger of publishers pooled their resources (and shared their risks) in publishing a large dictionary. They wanted something on the scale of the great dictionaries of the French and Italian academies, the Vocabolario degli Accademici della Crusca (1612) and the Dictionnaire de l’Académie françoise (1694). Great Britain, though, lacked an academy on the French or Italian models. There had been calls for a comparable English academy for many years – Daniel Defoe and Jonathan Swift were prominent advocates – but these demands never came to anything. Producing a comparable English dictionary would have to be the work of an individual.

Why the booksellers tapped Johnson for this role is a mystery. Though he would go on to be one of the most distinguished men of letters of the eighteenth century, having edited Shakespeare’s plays in eight volumes and written ten volumes of literary biography, when they approached him in 1746 Johnson had no record of scholarly achievement. His only comparable work was a five-volume library catalogue, the Catalogus bibliothecæ Harleianæ, which he prepared with William Oldys between 1743 and 1745. Whatever the booksellers’ reasons, though, the choice was a good one. Johnson had read widely in English literature and had a famously retentive memory. He was also exceptionally sensitive to subtle shades of meaning. He admitted years later that a project like this was on his mind even before the booksellers approached him: ‘Dodsley first mentioned to me the scheme of an English Dictionary’, he told his friend James Boswell; ‘but I had long thought of it’ (Boswell 1791, 189).

The European dictionaries required not only the labour of entire learned academies; they also took a long time to prepare. The Académie Française consisted of forty renowned scholars – the ‘immortals’, as they are known – who worked for forty years. Johnson had only himself and a small number of ‘amanuenses’, who were employed in routine copying tasks. When Johnson promised to complete the work on his dictionary in just three years, people scoffed. His friend and biographer, James Boswell, records a conversation where he addressed the scale of the task:

Dr Adams found him one day busy at his Dictionary, when the following dialogue ensued. ‘Adams. This is a great work, Sir. How are you to get all the etymologies? Johnson. Why, Sir, here is a shelf with Junius, and Skinner, and others; and there is a Welch gentleman who has published a collection of Welch proverbs, who will help me with the Welch. Adams. But, Sir, how can you do this in three years? Johnson. Sir, I have no doubt that I can do it in three years. Adams. But the French Academy, which consists of forty members, took forty years to compile their Dictionary. Johnson. Sir, thus it is. This is the proportion. Let me see; forty times forty is sixteen hundred. As three to sixteen hundred, so is the proportion of an Englishman to a Frenchman.’ With so much ease and pleasantry could he talk of that prodigious labour which he had undertaken to execute.

(Boswell 1791, 37)

He did not manage it in three years; the work took him nine. Still, the proportion is astonishing.

Upon agreeing to prepare the Dictionary, Johnson spelled out the principles that would guide his work. This proposal, too, marks a kind of first – it is the first detailed statement of the principles that would guide the creation of an English dictionary. It was published in 1747 as A Plan of a Dictionary of the English Language. There he declared his intention to ‘sort the several senses of each word’ and to trace the evolution of each word from ‘its natural and primitive signification’ through its extensions and metaphorical uses. He would provide extensive illustrative quotations, showing how words had been used by ‘writers of the first reputation’. ‘By this method’, he wrote, ‘every word will have its history, and the reader will be informed of the gradual changes of the language, and have before his eyes the rise of some words, and the fall of others’.

Johnson then began an extensive reading programme. He worried that his own ‘zeal for antiquity’ – he was well read in medieval and early modern authors – ‘might drive me into times too remote, and croud my book with words now no longer understood’, and therefore made the late sixteenth-century poet Sir Philip Sidney ‘the boundary, beyond which I make few excursions’. (Those ‘few excursions’ include Robert Grosseteste in the thirteenth century, Geoffrey Chaucer in the fourteenth, and Sir Thomas More in the early sixteenth.) He chose ‘to admit no testimony of living authours’, but there, too, he made exceptions, including the novelist Samuel Richardson, the poet Edward Moore, and the dramatist Henry Brooke. He gave particular attention to the period from the 1580s to about 1660: ‘I have studiously endeavoured to collect examples and authorities from the writers before the restoration, whose works I regard as the wells of English undefiled, as the pure sources of genuine diction’ (Dictionary xix).

Most of the authors he read were what one would call ‘literary’, including Sidney, Francis Bacon, William Shakespeare, John Milton, Sir Thomas Browne, Samuel Butler, John Dryden, Joseph Addison, Jonathan Swift, and Alexander Pope. But he also read Christian apologists and theologians like Richard Hooker, Richard Allestree, John Tillotson, and Robert South; philosophers like John Locke and Joseph Glanvill; ‘natural philosophers’ (we would call them scientists) like Sir Isaac Newton and Robert Boyle; politicians like the Earl of Clarendon and Sir William Temple; and so on. There were few areas of intellectual inquiry where he did not do at least some reading. He did, however, place a few restrictions on his reading on political or ideological grounds. He excluded Thomas Hobbes ‘because I did not like his principles’. Milton’s verse is among the most widely cited literature in the Dictionary, but his prose works – particularly his politically radical and theologically heterodox pamphlets – are read much more selectively. Nearly all the authors were male, though about a half-dozen women, including Elizabeth Carter, Hester Mulso, and Jane Collier, made the cut.

Out of the hundreds of books Johnson used, thirteen survive, allowing us to see how he proceeded. As he read, he marked the books by underscoring potential headwords and marking quotations with a pair of vertical lines, one at the beginning, one at the end. He then wrote the first letter of the headword in the margin. All this work he did by himself, but he trusted the copying to his amanuenses, who wrote the quotations onto sheets of paper and then cut them into slips. By the time he had finished his reading, he had hundreds of thousands of slips of paper. These were arranged in alphabetical order, and then he began working through the alphabet, subdividing words into senses.

Little is distinctive about Johnson’s wordlist. Noah Porter, who revised Noah Webster’s American Dictionary of the English Language, wrote in 1864 that ‘Johnson’s was in fact the first dictionary with anything like a complete vocabulary’. This is not quite true: Johnson was not the first to include the common vocabulary, nor was he the most comprehensive. The exact number of entries in the Dictionary depends on exactly what and how one counts, but one often-cited total is 42,773 headwords. This does not make it the most comprehensive dictionary of its time: the second edition of Nathan Bailey’s Dictionarium Britannicum (1736), Johnson’s most important predecessor, had around 60,000 headwords. Still, Johnson treated the common vocabulary more attentively than anyone had before him.

Nearly all of Johnson’s words came from his reading project, though he realised that he might miss some words. He therefore supplemented his reading by going through all the major dictionaries and noting words he had not found in use: ‘in reviewing my collection’, he admits with embarrassment, ‘I found the word SEA unexemplified’ (Dictionary xxii). In cases like that he was able to go back to his books and find appropriate quotations. In other cases, though, he entered words based only on their appearance in other dictionaries:

Many words yet stand supported only by the name of Bailey, Ainsworth, Philips, or the contracted Dict. for Dictionaries subjoined: of these I am not always certain that they are seen in any book but the works of lexicographers. Of such I have omitted many, because I had never read them; and many I have inserted, because they may perhaps exist, though they have escaped my notice: they are, however, to be yet considered as resting only upon the credit of former dictionaries.

(Dictionary vi)

Having established his wordlist, Johnson started defining. Some of his definitions have become famous. He was an Englishman well known for teasing the Scots, for instance, and humorously expressed his prejudice in the definition of oats: ‘A grain, which in England is generally given to horses, but in Scotland supports the people’. His own profession, lexicographer, is ‘A writer of dictionaries; a harmless drudge, that busies himself in tracing the original, and detailing the signification of words’. Sometimes he shows a political edge: excise is ‘A hateful tax levied upon commodities, and adjudged not by the common judges of property, but wretches hired by those to whom excise is paid’, and while a Tory (his own political identity) is ‘One who adheres to the antient constitution of the state, and the apostolical hierarchy of the church of England’, the rival Whig is merely ‘The name of a faction’. Other definitions are famous because they are impenetrable, as when he defined cough as ‘convulsion of the lungs, vellicated by some sharp serosity’ and network as ‘Any thing reticulated or decussated, at equal distances, with interstices between the intersections’. And a few definitions were simply wrong:

A few of his definitions must be admitted to be erroneous. Thus, Windward and Leeward, though directly of opposite meaning, are defined identically the same way … A lady once asked him how he came to define Pastern the knee of a horse: instead of making an elaborate defence, as she expected, he at once answered, ‘Ignorance, Madam, pure ignorance’.

(Boswell 1791, 79)

About these he was admirably frank in the Dictionary itself. ‘Some words there are which I cannot explain’, he acknowledges, ‘because I do not understand them’.

These eccentric and inadequate entries, though, constitute only a tiny proportion of the whole, and his tens of thousands of definitions reveal him to be an excellent definer. His preface makes him the first English lexicographer to spell out some of the difficulties in writing useful ‘explanations’, as he called his definitions. Thomas Carlyle admired the Dictionary’s ‘clearness of definition, its general solidity, honesty, insight and successful method’, and even Macaulay, who had few kind words for Johnson, praised his definitions for showing ‘much acuteness of thought and command of language’ (Macaulay 1856, 37).

The definitions show another respect in which Johnson’s Dictionary deserves to be called first: his was the first dictionary to be planned with extensive use of numbered senses. Benjamin Martin’s Lingua Britannica Reformata, which appeared a few years before Johnson, made some use of numbered senses, but Johnson expressed his intention to divide senses before Martin published, and he used numbered senses much more extensively and to greater effect. He was certainly the first to make minute discriminations in meaning, and he gave the core vocabulary of English more attention than any previous English lexicographer.

The definitions in most seventeenth- and early eighteenth-century English dictionaries are little more than single synonyms. This run of entries in John Kersey’s New English Dictionary (1702) is typical:

Ling, a sort of salt-fish.

Ling, or furze.

Ling-wort, an herb.

To linger, or delay.

A Lingerer.

A Linger, or linget, a bird.

Kersey’s wordlist is limited, his definitions are vague, and sometimes he does not even bother with a definition at all (as with lingerer, listed just to show that the word exists). Johnson’s most important predecessor, Nathan Bailey, was more expansive in his definitions in 1736:

Ling, a sort of salt fish.

Ling Wort, the herb angelica.

To Li′nger [of langern, Teut.] to delay, to loiter; also to pine away with a disease.

Li′ngots [with Chymists] iron moulds of several shapes, in which melted metals are usually poured.

Li′ngua, the tongue; also a language or speech, L[atin].

Lingua′cious [linguax, L.] long-tongued, blabbing, talkative.

Benjamin Martin, writing in 1749 in Lingua Britannica Reformata, took more trouble to provide multiple definitions for words with more than one sense:

LING, 1 a fish so called.
2 heath, or furze.
LI′NGEL (of lingula, L. a dim. of lingua a tongue) a little tongue, or thong of leather.
To LI′NGER, 1 to loiter, or be long in doing.
2 to languish, or continue long in.
3 to hanker after, or desire.
LINGUA′CITY (of linguicitas, L. of lingua a tongue) talkativeness, or the being full of talk.

Johnson’s definitions, by comparison, are detailed and precise:

Li′nen. n.s. [linum, Latin.] Cloth made of hemp or flax.
Li′nen. adj. [lineus, Latin.]
1. Made of linen.
2. Resembling linen.
Li′nendraper. n.s. [linen and draper.] He who deals in linen.
Ling. n.s. [ling, Islandick.]
1. Heath. This sense is retained in the northern counties; yet Bacon seems to distinguish them.
2. [Linghe, Dutch.] A kind of sea fish.
Ling. The termination notes commonly diminution; as, kitling, and is derived from klein, German, little; sometimes a quality; as, firstling, in which sense Skinner deduces it from langen, old Teutonick, to belong.
To Li′nger. v.n. [from leng, Saxon, long.]
1. To remain long in languor and pain.
2. To hesitate; to be in suspense.
3. To remain long. In an ill sense.
4. To remain long without any action or determination.
5. To wait long in expectation or uncertainty.
6. To be long in producing effect.
To Li′nger. v.a. To protract; to draw out to length. Out of use.
Li′ngerer. n.s. [from linger.] One who lingers.
Li′ngeringly. adj. [from lingering.] With delay; tediously.
Li′nget. n.s. [from languet; lingot, French.] A small mass of metal.
LI′NGO. n.s. [Portuguese.] Language; tongue; speech. A low cant word.
Lingua′cious. ad. [linguax, Latin.] Full of tongue; loquacious; talkative.

Here we see Johnson’s attention to parts of speech, as when he distinguishes transitive from intransitive verb senses (v.a., ‘verb active’, and v.n., ‘verb neuter’) and the nominal and adjectival uses of linen. The typography of his headwords distinguishes root words (in ALL CAPS, as with lingo) from derived forms (in Small Caps); the italics on lingo indicate the Portuguese word is not yet entirely naturalised as English. Usage notes identify one word as obsolete and another as ‘low cant’.

Johnson was at his best in treating the most polysemous words in the language – he identified many senses that had been neglected by all previous lexicographers. He is the first English lexicographer to write about the difficulty of defining what are now called phrasal verbs:

We modify the signification of many verbs by a particle subjoined; as to come off, to escape by a fetch; to fall on, to attack; to fall off, to apostatize; to break off, to stop abruptly; to bear out, to justify; to fall in, to comply; to give over, to cease; to set off, to embellish; to set in, to begin a continual tenour; to set out, to begin a course or journey; to take off, to copy; with innumerable expressions of the same kind, of which some appear wildly irregular, being so far distant from the sense of the simple words, that no sagacity will be able to trace the steps by which they arrived at the present use.

This attention paid off, for he made tremendous progress in recording senses that others had neglected. Bailey’s definition for go in the Dictionarium Britannicum (1736), for instance, reads, in full, ‘to walk, move, &c. to pass’. Martin, more attentive to senses, identifies fourteen meanings for go in 1749. Johnson, however, distinguishes fully sixty-eight senses. This is typical of the little verbs that are part of the core English vocabulary. Bailey’s seven-word definition of get – ‘to obtain, to acquire, to find out’ – is expanded to seven separate senses in Martin, but this becomes thirty-one numbered senses in Johnson, including meanings as diverse as ‘To win’ (‘Henry the sixth hath lost / All that which Henry the fifth had gotten’), ‘To prevail on; to induce’ (‘the king could not get him to engage in a life of business’), ‘To move; to remove’ (‘Rise up and get you forth’), and ‘To become by any act what one was not before’ (‘The laughing sot … Bathes and gets drunk’). Bailey pays more attention to take, for which he provides eighteen different meanings (‘To take Root, (in Plants) to sprout or push downwards’; ‘To take a Walk, to go a Walking’; ‘To take after [or resemble] any Person or Thing’). Martin discovers seventeen (‘1 to receive from another. 2 to seize, or lay hold of. 3 to drink, or swallow, as to take physic’, and so on). Johnson, by contrast, identifies 133 distinct meanings, most of them never recorded in any earlier English dictionary. It was not merely verbs: he was similarly attentive to other polysemous words, distinguishing ten senses of virtue, twelve of tender, fourteen of walk, and nineteen of service.

Omitted from the selection of Johnson’s definitions above is his most important innovation in English lexicography: dozens of quotations, showing the words in use. He was the first to make extensive use of illustrative quotations – something familiar in continental lexicography, but, at least on this scale, something genuinely new in Britain. As his title page announces, the words are ‘Illustrated in Their Different Significations by Examples from the Best Writers’.

Johnson’s reading project in English literature became the basis of his wordlist, and – unlike the French academicians, who invented quotations in their Dictionnaire – he backed up nearly every sense of every word with one or more quotations from English literature. These 114,000 or so quotations are the heart of the dictionary, and they account for the bulk of the text in the book. No work was more cited than the King James Version of the Bible, which illustrates words like god, faith, love, hell, judgment, and hope. Among works with named authors, Shakespeare is the most common – all of his recognised plays are quoted, and Johnson praises him for providing examples of ‘the diction of common life’. Other great writers provide essential context, and reveal just how carefully he read the books he marked up.

There is one other respect in which Johnson can claim to be a ‘first’. Despite the existence of dozens of earlier English dictionaries in the century and a half before Johnson, there remained a curiously widespread complaint that English still lacked a proper dictionary. In his essay ‘Of the Original and Progress of Satire’ (1693) the poet and playwright John Dryden is direct: ‘we have yet no English Prosodia, not so much as a tolerable Dictionary, or a Grammar; so that our Language is in a manner Barbarous’. Nearly half a century later, after Kersey and Bailey had added their dictionaries to the stack, the philosopher and historian David Hume complained that ‘Elegance and Propriety of Stile have been very much neglected among us. We have no Dictionary of our Language, and scarce a tolerable Grammar’. And in 1747, critic and clergyman William Warburton noted that ‘the English tongue, at this Juncture’ has ‘neither Grammar nor Dictionary, neither Chart nor Compass, to guide us through this wide sea of Words’.

The complaints about the lack of a dictionary became even more pressing when Johnson’s was under way. A few months before the Dictionary appeared, Lord Chesterfield puffed the forthcoming work and insisted that ‘The time for discrimination seems to be now come. Toleration, adoption and naturalisation have run their lengths. Good order and authority are now necessary’. He attributed this sad state of affairs to the lack of an authoritative dictionary: ‘we had no lawful standard of our language set up, for those to repair to, who might chuse to speak and write it grammatically and correctly’. It is, he wrote, ‘a sort of disgrace to our nation, that hitherto we have had no … standard of our language; our dictionaries at present being more properly what our neighbours the Dutch and the Germans call theirs, word-books, than dictionaries in the superior sense of that title’ (Chesterfield 1754). (The Oxford English Dictionary notes that wordbook ‘is often used where it is desired to avoid the implication of completeness or elaboration of treatment characteristic of a dictionary or lexicon’.) In these inferior wordbooks, ‘All words, good and bad, are there jumbled indiscriminately together’. In proclaiming that Johnson’s dictionary would succeed where all the others failed, he manages to redefine dictionary and in the process to eliminate all the others from competition, leaving room for Johnson’s to be the first.

And in this sense, it may be true, for Johnson’s was the first dictionary about which such grand pronouncements could be made. None of the twenty earlier English lexicographers achieved a comparable position in British culture. The closest Nathan Bailey ever came to being a household name is a reference in Henry Fielding’s novel Tom Jones (1749): ‘“Will you never learn the proper Use of Words?” answered the Aunt. “Indeed Child, you should consult Bailey’s Dictionary.”’ Benjamin Martin clearly hoped to occupy the post of Britain’s authoritative lexicographer – his preface boasts that ‘this Dictionary is by much the most perfect of its Kind’ – and yet the Lingua Britannica Reformata created little stir on its appearance. The fact is that lay readers rarely thought to look in a dictionary when they had questions about usage.

All of that changed with Johnson’s Dictionary, which created a stir almost from the day it appeared. Not everyone, of course, was pleased with it. The linguistic theorist John Horne Tooke was the most vocal of the early critics; he found the Dictionary ‘most imperfect and faulty’ and called it ‘the least valuable of any of his productions’. He objected strenuously to Johnson’s inclusion of obscure words, arguing that ‘Nearly one third of this Dictionary is as much the language of the Hottentots as of the English’. Another critic, Archibald Campbell, parodied Johnson’s fondness for difficult Latinate words: ‘Without dubiety you misapprehend this dazzling scintillation of conceit in totality, and had you had that constant recurrence to my oraculous dictionary, which was incumbent upon you from the vehemence of my monitory injunctions, it could not have escaped you’.

Most, though, thought the Dictionary a literary and intellectual milestone, an important moment in the history of Great Britain itself. The Scottish economist Adam Smith wrote an enthusiastic notice in The Edinburgh Review (despite Johnson’s swipe at the Scots), and the actor David Garrick, Johnson’s onetime student and later close friend, celebrated the Dictionary in explicitly nationalist terms:

Talk of war with a Briton, he’ll boldly advance,

That one English soldier will beat ten of France;

Would we alter the boast from the sword to the pen,

Our odds are still greater, still greater our men …

And Johnson, well arm’d like a hero of yore,

Has beat forty French, and will beat forty more!

Despite the strong notices and critical buzz, the work sold slowly at first; its £4 10s. price tag was too much for any but the wealthiest readers. But in 1756 a smaller abridged edition appeared, this time without the quotations, and from that time well into the nineteenth century, Johnson’s Dictionary was one of the most respected books in the English language. All the lexicographers in his wake over the next century were forced to situate their works in a Johnsonian tradition.

In the years and decades after Johnson’s Dictionary was published, there was a widespread sense that the lack felt so keenly by writers like Dryden, Hume, and Warburton was filled: the English language finally had a proper dictionary. The second edition of the Encyclopædia Britannica (1778–83), for instance, notes that ‘The only attempt which has hitherto been made towards forming a regular dictionary of the English language, is that of the learned Dr Samuel Johnson’ – and so, in a single sentence, the dozens of earlier dictionaries are wiped out of existence. Thomas Tyers made a similar claim in 1783, in his Historical Essay on Mr Addison: that Johnson was the first and only lexicographer to produce a dictionary worthy of the name.

Soon enough, Johnson’s name was shorthand for lexicography generally; to name him was to invoke linguistic authority itself. When Henry Tilney questions Catherine Morland’s use of the word nicest in Jane Austen’s Northanger Abbey, Eleanor warns, ‘You had better change it as soon as you can, or we shall be overpowered with Johnson’. No name before Johnson’s was enough to strike that sort of terror into people, and only one since – Noah Webster’s – has joined it.

And so, if we adjust our criteria and allow ‘the first dictionary’ to mean ‘the first standard dictionary’ – the first one widely perceived as an authoritative standard – then Johnson’s does seem to become number one. Britain had redefined dictionary to mean ‘authoritative dictionary’, thereby excluding everything that came before as mere ‘wordbooks’. Whether or not Johnson was entirely comfortable with that role is another question: when Chesterfield tried to turn him into a linguistic dictator, Johnson refused to play the part. Nine years of work had taught him that usage is what shapes the language, and that no lexicographer has the power to bend it to his whim. Still, he was proud of his achievement of giving the English language a dictionary to rival the works of Italy and France. When Boswell carelessly mused, of the writing of the Dictionary, that ‘You did not know what you were undertaking’, Johnson shot back, ‘Yes, Sir, I knew very well what I was undertaking, – and very well how to do it, – and have done it very well’.

Nineteenth-Century English Dictionaries: Descriptivism

Chapter 13 The Making of American English Dictionaries

Immigrants from Europe were too busy settling into America in the seventeenth century to worry much about dictionaries, though they adopted new words and new senses of old ones from the day after they established a colony. They went on doing so into the nineteenth century, when, in the early Republic, Americans – notably Noah Webster – finally started to account for their English as fully and carefully as their English and Scottish cousins had accounted for theirs. Subsequently, the history of American dictionaries was largely an entanglement of learning, patriotism, commerce, and cultural authority. Webster and the Merriam-Webster Company bring the story full circle, as, apparently, the alpha and the omega of American lexicography, although one hopes for a twenty-first century resurgence of American enthusiasm for dictionaries and a robust dictionary market.

Americans have made many more dictionaries – indeed, excellent dictionaries – than an essay of this scope can mention, let alone describe. Here, we consider the general commercial dictionaries and historical dictionaries, unfortunately overlooking many worthy place-name dictionaries, folk dictionaries, usage dictionaries, thesauri, and terminological dictionaries, even those, like Black’s Law Dictionary, that are both culturally significant and uniquely American.

Contact

English settlers arrived in the New World in the very early seventeenth century at four points: Cuper’s Cove in Newfoundland (1610), St George’s in Bermuda (1612), the Jamestown Colony of Virginia (1607), and the Massachusetts Bay Colony (1628). They were not prepared for what they found, either materially or lexically. For instance, what Spaniards called maize, New Englanders called Indian corn; but they borrowed the Native American word hominy for corn reduced to grits and cooked for food. Common American English terms for flora and fauna are reanalysed versions of Native American words: persimmon and hickory, quahog and raccoon. Settlers often borrowed Native American names for places they settled, from the Abagadasset River in Massachusetts (now Maine) to Yokum in the same commonwealth’s Berkshire County. By the end of the century, native-born descendants of English settlers would establish backlog and bull frog as distinctively American English, and while the insular English drew lots to determine allotted roles, played the lots ‘lottery’, and paid lots ‘taxes’, their American cousins were dividing land up into lots in a new sense of the word as early as 1633.

Settlers and their descendants, as well as visitors, noticed the new words right away. Captain John Smith, who helped to found Jamestown, first recorded raugroughcum or raccoon in 1608. Opossum/possum and moose were not far behind. But while American vocabulary accumulated, some saw corruption in the ways Americans declared their independence of insular English usage. As often happened in the history of English lexicography, a Scot, the Reverend John Witherspoon, led the way. Thirteen years after he arrived in New Jersey to become president of the College of New Jersey – later Princeton University – Witherspoon contributed a series of essays critical of American usage to the Pennsylvania Journal and The Weekly Advertiser (1781). Some already patriotic Americans responded with letters in favour of American English. The Reverend Jonathan Boucher, similarly disapproving, avoided controversy, because bits of his eighteenth-century Glossary of Archaic and Provincial Words were published posthumously (1807 and 1832).

After the War of Independence and gradual stabilisation of the American Republic, people recognised, not just American words, phrases, and usage, but a national language – propose a national ideology, and dictionaries are not far behind. One of the earliest was John Pickering’s A Vocabulary or Collection of Words and Phrases which Have Been Supposed to Be Peculiar to the United States of America (1816). But a Connecticut patriot, Noah Webster, first attempted a systematic approach to American English in his Compendious Dictionary of the English Language (1806). Critics rejected it; Webster was not cowed. He laboured for another two decades and more before he published the first great American dictionary.

Noah Webster, the Brothers Merriam, and the First War of the Dictionaries

Webster wrote his dictionaries – as do all who write dictionaries – in a specific cultural and political environment. ‘Americans of his day were obsessed with the idea of an independence from Britain complete not merely in a political sense, but in every sense. Extremists advocated the adoption of an utterly different language’ (Leavitt 1947, 13). English prevailed, however, and Webster was central to recording and stabilising an American variety of that language, especially in his monumental two-volume American Dictionary of the English Language (1828), with some 70,000 entries, covering not only basic English vocabulary, but Americanisms, new technical terms, and Webster’s one coinage, demoralize, which is, of course, familiar today.

The American Dictionary is an American classic, the foundation of American general lexicography and a model for other lexicographers, even British ones, like John Ogilvie, whose similarly two-volume Imperial Dictionary of the English Language: A Complete Encyclopedic Lexicon, Literary, Scientific, and Technological (1847–50) was based on Webster’s last revision (1841) of his dictionary. Webster’s definitions are notably clear and precise. James Murray, chief editor of the OED, called Webster ‘a born definer of words’. Webster explained words key to developing American institutions and culture in expansive entries – law, for instance, admits twenty-six senses, including ‘Municipal or civil laws are established by the decrees, edicts or ordinances of absolute princes, as emperors and kings, or by the formal acts of the legislatures of free states. Law therefore is sometimes equivalent to decree, edict, or ordinance’, which articulates distinctively American political experience.

One cannot overlook some deficiencies in Webster’s work. He resisted the New Philology, which articulated the relations among many unexpectedly connected Indo-European languages, and was himself given to ‘fantastic speculations, devoid of any save that of historical curiosity’ (Landau 2001, 71), sometimes tracing words back to supposed origins in Chaldean, a Semitic rather than Indo-European language. Webster was an ardent spelling reformer, and the American Dictionary incorporated many doomed Websterian spellings – he liked to remove unnecessary letters, preferring bred to bread, for instance. Later editions of Webster’s 1841 dictionary adopted conventional spellings and sounder etymological principles.

The American Dictionary – large and expensive – had limited distribution, but from the outset Webster expected to produce an abridged version, which was published in 1829 as an affordable octavo volume and immediately transformed an iconic national dictionary into a democratic one. A decade later, Webster embarked on the wholesale revision published in 1841, ‘under the title An American Dictionary of the English Language, 2nd Edition, Corrected and Enlarged, but it soon became almost universally known by the popular name of Webster’s Unabridged’ (Leavitt 1947, 36), and we call Merriam-Webster’s big dictionaries – even the online edition – by that name to this day.

After Webster died, George and Charles Merriam bought the rights to the American Dictionary. Masters of stereotyped printing, which was especially cost-effective for very large print runs of school books, the Bible, etc., the Merriams saw the potential in dictionary sales. They engaged Professor Chauncey A. Goodrich of Yale College, Webster’s son-in-law and literary executor, as chief editor of a one-volume, newly revised and enlarged edition – it contained some 85,000 entries – published in 1847, which proved as successful as the Merriams had hoped. Yet, the next decade would prove turbulent, with the eruption of the infamous war of the dictionaries.

Joseph E. Worcester, though a critic of Webster’s spelling reforms, had compiled the Abridged dictionary of 1829 under Webster’s supervision. When he published Worcester’s Pronouncing and Explanatory Dictionary (1830), Webster accused him of plagiarism, Worcester responded with further criticism, people took sides, and ‘by the time Worcester’s Universal and Critical Dictionary appeared in 1846 there were all the makings of a first-class fracas’ (Leavitt 1947, 53). Worcester fuelled the controversy with subsequent editions, and his Dictionary of the English Language (1860), with 104,000 or so entries, remained Webster’s chief competitor for decades, yet the Merriams effectively won the war – the attention brought by endless press battles drove sales through the roof. Still, the so-called war alerted Goodrich and the Merriam brothers against complacency, and even though the United States found itself on the eve of the Civil War, they devised plans to put Worcester behind them once and for all.

Late Nineteenth-Century American Lexicography

Worcester’s last dictionary was widely praised in the press upon its publication in 1860 (Friend 1967, 96) and by prominent twentieth-century scholars such as George Phillip Krapp (1925, I, 371–2), who, in his English Language in America, judged it better than Webster’s on many points. But its quality was not the only issue. During the 1850s, Worcester had added encyclopaedic material and illustrations to new editions of the Universal and Critical Dictionary, and with each successive printing of its dictionary, Merriam-Webster titted for those tats, but the dictionary needed a thorough revision, one that replaced Webster’s etymologies according to much improved knowledge of comparative Indo-European philology. Worcester had improved on Webster by providing restrained etymologies and some cognate forms, not exceeding his relatively limited knowledge. Goodrich, the Merriams, and Webster’s heirs settled on Goodrich’s chief assistant, Noah Porter, to take the reins, and Porter convinced them all to hire the prominent German etymologist C. A. F. Mahn to write etymologies for the new edition from scratch.

The new work, An American Dictionary of the English Language, Royal Quarto Edition (1864), known in professional shorthand as ‘Webster-Mahn’, established a dictionary lineage that would last nearly a century, and that most Americans believed set the standard of American lexicography. Although there were intermediate printings with sections of new entries appended at the dictionary’s end, the next full revision, Webster’s International Dictionary, was published in 1890, again under Porter’s leadership. The wordlist had expanded to 175,000 items, around 56,000 more than in Webster-Mahn. Webster’s New International Dictionary (1909), edited by William T. Harris, had grown to 400,000 entries. Webster’s New International Dictionary, Second Edition (1934) was Merriam-Webster’s most ambitious undertaking ever, with a staff of more than 300 – not all of them in Springfield, Massachusetts – and at a cost of $1,300,000, or, in 2018, nearly $25,000,000. Led by William Allan Neilson, President of Smith College, with daily operations directed by Thomas A. Knott and Paul W. Carhart, etymologies revised by Harold H. Bender of Princeton University, and 12,000 or so illustrations revised by H. Downing Jacobs, the printed dictionary comprised roughly 552,000 entries. It was so monumental, and so well respected, that the next revision, Webster’s Third (1961), was rejected by many American dictionary users – indeed, some resistance continues in the twenty-first century.

Merriam-Webster may have won the war against Worcester, but it faced bracing competition, nonetheless, principally from Funk & Wagnalls, a well-established New York publisher that brought out its Standard Dictionary of the English Language in 1893, edited by one of the firm’s owners, Isaac Funk. The Standard Dictionary included 304,000 entries, far outpacing the size of Webster’s International, published just three years earlier. After years of head-to-head competition, Funk & Wagnalls published its New Standard Dictionary in 1913, with 450,000 entries, still ahead of Merriam-Webster’s flagship dictionary. The Standard and New Standard ‘introduced lasting changes in dictionary practice’, that ‘mark[ed] the maturity of the unabridged as a genre’ (Landau 2001, 86). For instance, rather than organise entries with senses in historical order, Funk & Wagnalls listed them from most common to most specialised. Etymologies, formerly placed at the beginnings of entries, were demoted to their ends – definitions and pronunciations mattered more to dictionary users than word histories. Merriam-Webster would win this market skirmish when it published Webster’s Second, but Funk & Wagnalls’s challenging performance no doubt prompted that colossal effort.

American Dictionaries at School and University

Merriam-Webster had developed a series of school dictionaries in the mid-nineteenth century, but publication of Webster’s Collegiate Dictionary (1898) extended the school tradition into American colleges and universities. Funk & Wagnalls entered this market with the College Standard Dictionary (1922), edited by Frank H. Vizetelly. But after World War II, when many men matriculated at colleges and universities, funded by the GI Bill of 1944, the demand for college dictionaries expanded tremendously (for a full treatment, see Landau 2009). Funk & Wagnalls seized the opportunity with a second edition of the College Standard (1947), supervised by Charles Earle Funk. Webster’s New World Dictionary of the American Language (1951) was soon followed by a college edition (1953), both edited by David B. Guralnik and Joseph H. Friend. Webster’s New World College Dictionary, as it is now called, entered its fifth edition in 2016, while Webster’s Collegiate has reached an eleventh edition, first published in 2003. These dictionaries, as well as their ‘parents’, are usually updated in intermediate printings or continuously online.

In the 1930s and 1940s, the psychologist Edward L. Thorndike applied what we knew then about childhood learning to construct word books and dictionaries, producing a series of ‘Thorndike-Century’ dictionaries for different levels of education, from the beginning – precisely, American grades three to five – to a ‘senior’ dictionary for high-school students. These works were forward looking. In later editions of the 1950s and 1960s, he was joined by Clarence L. Barnhart, whose contribution was so considerable that the series was re-named ‘Thorndike-Barnhart’. Barnhart had become famous immediately after World War II as chief editor of yet another of the spate of college dictionaries, the American College Dictionary (1947). In fact, it was the first of them to reach the bookstores and so captured an unexpectedly large share of the growing market. Unlike its competitors, the American College Dictionary was built to be itself, rather than abridged and adapted from a larger dictionary.

Eventually, all the major American dictionary publishers offered college and school dictionaries and thesauri. Such dictionaries introduced young people to dictionary brands but also – except for the American College Dictionary – capitalised on each signature dictionary, making the most of the research and writing that had gone into it. Of course, students needed dictionaries, but college dictionaries were so affordable, they insinuated that everyone ought to own at least two dictionaries, a small one in college – a desk reference – and an unabridged one to go along with the white picket fence. As material objects, dictionaries signified status within the American class system in the so-called post-war period. The extent to which they will continue to do so remains to be seen (see Adams 2018).

The Second War of the Dictionaries: Webster’s Third and its Adversaries

Backlash over publication of Webster’s Third (1961) proved the significance of dictionaries in American culture, regardless of the merits of the public criticism or Merriam-Webster’s defence. Academic views of language changed considerably in the first half of the twentieth century, marked by the rise of linguistics as a discipline, punctuated in 1933 with publication of Leonard Bloomfield’s Language. Though published in 1934, Webster’s Second might be considered the last great nineteenth-century dictionary and the apotheosis of the Merriam-Webster dictionary tradition. Many people – both Merriam-Webster insiders and members of the American dictionary-reading audience – saw no reason to challenge that tradition. Merriam-Webster was the dictionary brand Americans could rely on for sound information about American English words and sound advice about usage.

When Philip Babcock Gove succeeded John Bethel as general editor of the Merriam-Webster dictionaries, he resolved to build the next edition of the big dictionary on linguistic principles. Thus, Webster’s Third benefited from more systematic approaches to etymology, pronunciation, and defining. For instance, Edward Artin, the pronunciation editor, went into the field and recorded speech, so that he could identify the range of American pronunciations, rather than merely represent the ‘general American’ norm, and he included variant pronunciations in entries, suggesting, for the first time in the history of American lexicography, that a dictionary’s job was to record language facts rather than render judgement about usage. If people across America pronounced a word differently, then dictionary users should know that and should also stop assuming that one pronunciation – the supposedly ‘standard’ pronunciation – was better than the others, ‘correct’ where the others were incorrect.

Gove and his associates introduced descriptive linguistics into the prescriptive public discourse about English usage. Descriptivists believe that linguists, lexicographers, and teachers should describe how we speak or write, not the way we should speak or write. Prescriptivists believe that some authority must propose and regulate what is standard or correct and what is not. So, they were naturally shocked when Gove announced the consensus among linguists: language is constantly changing; change is normal (that is, not something to worry about or resist); spoken English, not written English, constitutes ‘the’ language; correctness depends on how people actually speak; and usage is relative, due to regional, gender, and class identities, among others. Dividing the world into descriptivists and prescriptivists is unhelpfully reductive; ultimately it proves a red herring in arguments about usage and language authority. Yet, prescriptivists in 1961 took Webster’s Third and the principles on which it was founded as an act of culture war.

A chapter of this size cannot dig into the fascinating details of the Webster’s Third story, neither what actually changed from Webster’s Second to Webster’s Third, nor what people said or assumed had changed. Two fine books, Herbert C. Morton’s The Story of Webster’s Third: Philip Gove’s Controversial Dictionary and Its Critics (1994) and David Skinner’s The Story of Ain’t: America, Its Language, and the Most Controversial Dictionary Ever Published (2012), leave no stone unturned and, refreshingly, take different perspectives on Gove’s method and the controversy. Skinner puts ain’t into his title because, supposedly, Webster’s Third had entered it for the first time, thus legitimating it, and, by not labelling it as substandard, proposed it as acceptable American English. The first supposition was false, the second closer to true. Some people objected to Merriam-Webster’s apparent permissiveness, expecting a dictionary to uphold usage standards. Journalists had a field day: ‘Good English Ain’t What We Thought’, ‘Saying Ain’t Ain’t Wrong’, and ‘Say It “Ain’t” So!’ were sample headlines upon the dictionary’s release. Another headline, ‘But What’s a Dictionary For?’, asked the fundamental question.

One captures the majority journalistic response to Webster’s Third succinctly in just such headlines. Suddenly, under the influence of linguistics, Merriam-Webster was too hip to slang: ‘Dig Those Words’ and ‘Webster’s Way Out Dictionary’, said the headlines. Suddenly, under the influence of linguistics, time-honoured laws of usage were overturned – ‘100,000 Words Become Legal’ – and the whole point of language condemned – ‘The Death of Meaning’ – the language markets were upended – ‘Logomachy – Debased Verbal Currency’ – and institutions went up in flames – ‘Anarchy in Language’. ‘Keep Your Old Webster’s’, The Washington Post advised, because, one editorial claimed, ‘New Dictionary Cheap, Corrupt’. One critic scolded the mere ‘Ruckus in the Reference Room’, while another returned to the metaphors of dictionary wars with ‘Sabotage in Springfield’. Those on the side of Webster’s Third came up with such gems as ‘Linguistic Advances and Lexicography’, ‘English as It’s Used Belongs in Dictionary’, ‘Webster Editor Disputes Critics; Says New Dictionary Is Sound’, ‘A Lexicon for the Scientific Era’, and ‘The Lexicographer’s Uneasy Chair’. We know who won the war of the headlines. In theirs, opponents of Webster’s Third essentially declared the second war of the dictionaries, though there were no dictionaries ready to fight against Merriam-Webster’s permissiveness at the time. That would soon change.

James Parton, president of the American Heritage publishing company, had tried to buy Merriam-Webster in 1959, but failed. Whatever the merits of Webster’s Third – and there are many – the incendiary response to it, however unfair, opened the American dictionary market to alternatives. Parton hired William Morris – who had once been a Merriam-Webster salesman – to produce an American Heritage Dictionary of the English Language that could step into the prescriptive role Merriam-Webster had disavowed in Webster’s Third. Whereas Webster’s Third was irresponsibly permissive, the American Heritage Dictionary would advise on usage, even if it did not prescribe it. And it would not do so capriciously, but with the help of a ‘usage panel’ whose opinions would supposedly inform some five hundred usage notes, a veritable dictionary of usage incorporated in the dictionary’s general and otherwise utterly conventional structure. Webster’s Third came under attack, too, because its definitions were complex, single sentences of a pattern that would not, it turned out, fit all words equally well. The American Heritage Dictionary definitions were written with its audience in mind, rather than on semantic principles. The dictionary also included an appendix of Proto-Indo-European roots – written by Harvard University professor Calvert Watkins – and when appropriate referred to it in entry-level etymologies.

The American Heritage Dictionary was no flash in the pan; its fifth edition – probably the last to appear in print – was published in 2011. It broadened the sense among Americans of what a dictionary can do and presented its users with lexical and usage information missing in other dictionaries. Gradually, however, as it added new encyclopaedic features – word history notes, ‘our living language’ notes, and regional notes – and revised its structure, it became a nearly descriptive dictionary, much more like Webster’s Third, the dictionary it was meant to challenge, than its own first edition (Adams 2015). Interestingly, the Merriam-Webster dictionaries gradually inserted usage and word history notes into their dictionaries, as with Worcester a century earlier, beating competitors by absorbing their best ideas and practices.

Other dictionary programmes – like the Webster’s New World line, published by the World Publishing Company of Cleveland, Ohio – entered the expanding and diversifying American dictionary market. The American College Dictionary was wildly successful, and Random House decided to base yet another competitor, the Random House Dictionary of the English Language (1966), edited by Jess Stein and Laurence Urdang, on it. The big Random House dictionary was not a reaction to Webster’s Third – editing began in the mid-1950s – but it certainly benefited from the temporary hobbling of Webster’s thoroughbred unabridged, and its introduction remarks on how the Random House dictionary navigates the impasse between the Scylla and Charybdis of doctrinaire descriptivism and prescriptivism. Random House’s process was unusual and worth noting. Generally, college and school dictionaries are cut-down versions of large, if not unabridged, parent dictionaries. Before the Random House dictionary, no one had built a big dictionary up from a college dictionary. Random House was also the first dictionary to rely on computers for parts of the editing and production processes. The 1960s, given the controversy over Webster’s Third and the market created by the controversy, marked the onset of an American dictionary heyday that would last into the twenty-first century.

American Historical Dictionaries and Historical Dictionaries of American English

The OED has dominated the world of historical dictionaries of English since its completion. For a while, the ten-volume Century Dictionary (1889–95), also a dictionary made on historical principles, was its chief competition. The great Sanskrit scholar William Dwight Whitney of Yale, the college and ultimately university that supplied so many Merriam-Webster editors, conceived it and led the project to its completion. Unfortunately, it has been so long out of print that America’s quite successful attempt at a general English historical dictionary goes largely unremembered. The Century is a remarkable dictionary and has been beautifully accounted for – conception, typography, etymology, definitions and usage, pronunciation, illustration, and influence – in a special section of an issue of the journal Dictionaries (Chisholm 1996). But, while the Century was America’s answer to the OED, it was not a historical dictionary of American English.

In 1919, William A. Craigie, one of the OED’s editors, proposed several historical dictionary schemes to the Philological Society of London, among them a dictionary of American English. He left England for the University of Chicago to make that dictionary, the Dictionary of American English (DAE), in 1925; it began to appear in a series of twenty fascicles in 1936 and was published in four volumes between 1938 and 1944. Craigie was one of the most experienced and prominent anglophone lexicographers in the world, yet Americans noted immediately the strange proposition that a Scot from England was best qualified to compile a dictionary of American English. It was a rare post-colonial moment, and the American editors sometimes resisted Craigie’s been-knighted imperial authority (Adams 1998) to the last line of Z.

DAE ignores much American English. As Craigie explained in the preface to the first volume, the dictionary could not include all English used in America – it would have taken forever to collect the information and another forever to write entries and see them into print, in total something like the amount of time and labour that went into the OED. The more modest DAE focused on Americanisms, senses of words unique to America, words and uses more common in America than in England, and ‘every word denoting something which has a real connection with the development of the country and the history of its people’, the last an incoherent criterion. DAE includes next to no slang and a limited selection of regionalisms, lapses that necessitated the Historical Dictionary of American Slang and the Dictionary of American Regional English (DARE). DAE also imposed 1900 as a deadline for new words, though early twentieth-century quotations sometimes figured in entries for words well established in the nineteenth century. After 1900, much American English ensued, and DAE is now far behind the national language, a historical historical dictionary, so to speak.

A Dictionary of Americanisms on Historical Principles (1951), edited by one of DAE’s assistant editors, Mitford M. Mathews, better stands the test of time. In two volumes, it is certainly more manageable than DAE. DAE had included and marked many Americanisms, but Mathews felt that the larger dictionary recorded them too sparingly. DAE included some 35,000 main entries – in fact covering many derivatives, phrases, sayings, etc., within those entries – and some 150,000 quotations, from more than 2,500 sources. By contrast, the Dictionary of Americanisms included 14,000 entries based on 100,000 quotations from 4,000 sources – Mathews saw the abundant evidence as culturally important; it distinguishes American from British and other Anglophone experience.

Both DAE and the Dictionary of Americanisms belong squarely to the tradition of anglophone historical lexicography established by the OED, and it is no surprise that Craigie packed that method in his luggage when he left England for Chicago – he was confident that one method fit all English vocabulary, regardless of place, culture, or history. DARE’s chief editor, Frederic G. Cassidy, claimed that DARE followed the OED’s example, too, but the claim was true only to a point. Cassidy had worked on the Middle English Dictionary and Early Modern English Dictionary projects – the latter was barely begun and never finished – both conceived originally as period supplements to the OED. But Cassidy’s vision of DARE was very different from the OED.

First, Cassidy and his colleagues quoted a wider array of material than the OED and other historical dictionaries. Because newspapers are mostly local – disregard the handful of national newspapers of record, like the New York Times – they well support the quest for regional words, so are more prominent in DARE. Then, he also quoted scholarship on the pronunciations or regional distributions of words, or field guides to describe flora and fauna, practices eschewed by other historical dictionaries. DARE is a bold synthesis of linguistic atlas and historical dictionary. Between 1965 and 1970, Cassidy and a team of fieldworkers administered a questionnaire with 1,847 questions meant to elicit regional usage to 2,777 informants in 1,002 communities across America, and the pinpoint responses appear in many entries, an unusual stream of information within a dictionary text.

DARE illustrates the regional distribution of certain words on computer-generated maps reconfigured according to population density rather than political geography, which thus presents readers with a new America, or, at least, a different way of understanding the old one. DARE contains 3,000 of those maps, accompanying 60,000 headwords and senses across 5,544 pages in five volumes, with a sixth volume devoted to apparatus, all published between 1985 and 2013. Few historical dictionaries have captured public imagination as surely as DARE – lionised in the press, admired by the academy, funded by foundations and interested individuals, it is a democratic dictionary, a dictionary of the people in which lots of people participated. It speaks in American voices of underlying American ideologies.

Twilight of the American Dictionary

In the twenty-first century, America mostly stopped making new dictionaries: Random House closed its dictionary office in 2002; Houghton Mifflin discontinued the focused American Heritage editorial programme in 2018. These dictionaries will be revised; new editions will appear. But the publishers depend on loose teams of freelance lexicographers to do the work. They are dictionaries without vision now, rote exercises in reference publishing, lacking any specifically American perspective, dictionaries outside of the cultural argument. Oxford University Press may still be in the game, but the New Oxford American Dictionary’s most recent edition appeared in 2001 – seventeen years (at this writing) is a long interval between editions of such a dictionary. Only Merriam-Webster is committed existentially to producing American dictionaries for the general commercial market.

Similarly, academic dictionaries are less well funded than in the past – what foundation today would accept responsibility for a several-decades-long project like DARE? DARE relied heavily on the National Endowment for the Humanities, but its editors could not find enough patrons to support a post-2013 research programme, so the online edition cannot benefit from continual revision in the way of the OED and Merriam-Webster’s current Unabridged. DARE has inspired other historical dictionaries of regional American English, notably Michael B. Montgomery and Joseph S. Hall’s Dictionary of Smoky Mountain English (2004). Perhaps the future of American lexicography depends on making dictionaries of that scope and kind, or within the dictionary domains left out of this chapter.

Of course, there are Web-based dictionaries now – Wordnik, for instance, and dictionary.com. These sites are destinations for word lovers and include a lot more than definitions and the like, the things one expects to find in print dictionaries. So does Merriam-Webster online. But Merriam-Webster is a dictionary programme that has adapted to the online environment, not an online enterprise anchored by a dictionary. Such online dictionaries may be based in the United States, but they are American dictionaries in the sense that Amazon is an American business. They are not acts of patriotism; they are not partisans in the culture wars. It will be interesting to see how long Merriam-Webster – the through line of Americanness in American dictionaries – can endure.

Chapter 14 The Oxford English Dictionary

The Oxford English Dictionary (OED) is described on its website as ‘the definitive record of the English language’, and this is no exaggeration. Begun over 150 years ago, the OED is the largest, most comprehensive, scholarly, and authoritative dictionary of the English language. It covers more than 600,000 words from all varieties of English, over a period of one thousand years. There have been two editions of the OED, and the third edition is currently being worked on by a team of seventy people in Oxford. The process is slow because of the size of the task (the first and second editions were ten volumes and twenty volumes respectively) and the level of precision and rigour which the editors insist on maintaining. For example, the entry for the verb run is currently over 600 senses in length, and took one editor over nine months to revise.

As an ‘historical’ dictionary, the OED shows how words are used across time and describes them from their first recorded usage to the present day, traced through three-and-a-half million quotations from varied sources that are all dated and verifiable. Therefore, one of the most distinctive parts of an entry in an historical dictionary such as the OED is the large quotation paragraph that follows the definition, and includes citations from sources as varied as literature, newspapers, poetry, specialist journals, song lyrics, film scripts, letters, diaries, even Twitter and emails. Hence, one finds all kinds of language from slang to literary language to scientific vocabulary being exemplified by all kinds of citations from Eminem to Cursor Mundi to Principia Botanica. If an English word appears in a dated source, and is used by writers over a number of years, then it is eligible for inclusion in the OED.

The structure of an OED entry is unmistakable and instantly recognisable. Take any entry from 1884, and place it beside one from 2019 – the structure is identical. The entry starts with a headword in bold typeface, followed by a part of speech and pronunciation. These are followed by a list of variant spellings, an etymology, the definition, and the quotation paragraph.

The OED is not to be confused with the ‘Oxford Dictionary’. Although both are published by Oxford University Press, they represent two different types of dictionaries. There are ‘synchronic’ dictionaries, such as Oxford Dictionary Online, which cover one point in time, usually focusing on current language; and in an entry, the most important or common meanings are given first. In historical or ‘diachronic’ dictionaries, such as the OED, by contrast, meanings are ordered chronologically, starting with the first recorded use. Historical dictionaries also illustrate how words are used and change over time and include obsolete and historical terms.

The Original Vision of the OED

In the middle of the nineteenth century, the English-speaking world was ready for a new dictionary. In America and in Britain people had begun to articulate the desire for a dictionary with thorough coverage of words and more precise etymologies and definitions. As the New York Times put it in 1858, ‘There is no doubt about the want of a better English dictionary’ (2 November, p. 4). It had been one hundred years since the publication of Samuel Johnson’s magisterial Dictionary of the English Language (1755), thirty years since Noah Webster’s American Dictionary of the English Language (1828), and twenty years since Charles Richardson’s quirky New Dictionary of the English Language (1836–7). Over this period huge advances had taken place in philological scholarship which made aspects of these dictionaries seem out-of-date and inadequate. They were seen as excessively subjective and prescriptive, and lacking scientific rigour, systematic investigation, and comprehensive coverage of the lexicon.

Indeed, the many deficiencies of contemporary English dictionaries were highlighted in two famous speeches delivered in 1857 to the London Philological Society, the oldest learned society in Britain devoted to the study of language, by Richard Chenevix Trench (1807–86), who along with Frederick Furnivall (1825–1910), and Herbert Coleridge (1830–61), became one of the three founders of the OED. In these lectures, entitled ‘On Some Deficiencies in Our English Dictionaries’, Trench highlighted the ways that existing dictionaries fell short: they failed to survey literature for suitable quotations to illustrate the first use of a word, its etymology, and its meaning; they failed to include obsolete terms by any consistent method; they failed to discriminate between synonyms; they missed many useful illustrative quotations; they were cluttered with irrelevant and redundant information; and they were inconsistent in their coverage of families and groups of words. Trench and his colleagues declared that it was time for ‘an entirely new Dictionary; no patch upon old garments, but a new garment throughout’, and called on the Philological Society to support its creation (1860, 1).

The esteemed members of the London Philological Society were supportive of the proposal, but one wonders whether they would have so readily agreed to begin such a daunting task had they known that the New English Dictionary on Historical Principles (it was not officially called the Oxford English Dictionary (OED) until 1933) would take seventy years to complete. The aims of the dictionary project were hugely ambitious: to be an ‘inventory of the language’ describing the pronunciation, history, meaning, and usage of every word in the English language from 1150 C.E. to the current day, divided into three historical periods: c. 1250–1526, 1526–1674, and 1674 onwards. As the co-founder and first editor, Herbert Coleridge, declared: ‘every word could tell its own story – the story of its birth and life, and in many cases of its death, and even occasionally of its resuscitation’ (quoted in Trench 1860, 72). These aims echoed the vision of earlier German lexicographers such as Franz Passow and Jacob Grimm: a dictionary that attempted to be descriptive rather than prescriptive. This could only be achieved lexicographically through the analysis of thousands of printed sources showing how words were used over time.

When faced with the daunting task of describing every word across such vast periods and giving a history of each one, the founders of the dictionary realised that a small group of men could not do it alone. They opened up the task to the public and invited them to read their local texts and send in words and citations. Crowdsourcing in this way was an idea that had been successfully pioneered in Germany two decades earlier by the lexicographers Wilhelm and Jacob Grimm for the Deutsches Wörterbuch, which sought assistance from the German scholarly community. The OED founders went one step further, inviting not just scholars but also the general public. In this way, the OED could accurately be described as the Wikipedia of the nineteenth century.

Throughout the dictionary’s seventy-year compilation, several thousand people responded to the invitation to contribute to the dictionary. These volunteers made up a collaborative network that included men and women of all ages from varied social backgrounds and disparate corners of the globe. There were three main types of public volunteers: readers who read books and sent in citations; specialists who were experts in a particular subject and advised on certain words; and subeditors who arranged quotations, prepared definitions, and marked and corrected proofs.

This network of contributors expanded vastly under the leadership of the third editor, James Murray (1837–1915), from 1879, but even the first editor, Herbert Coleridge, had by 1859 attracted 150 volunteers who helped him by reading books and writing out citations that demonstrated the usage and meaning of words on ‘slips’ of 4 x 6-inch paper. When Murray took over the editorship, he made an official appeal to English speakers around the world, asking them to read their local texts and send him examples of how words were used. The response was massive. So many people responded, from within the British Empire and beyond it, that the Royal Mail installed a red postbox outside Murray’s house in north Oxford to cope with the volume of mail. Murray devised a system of storage for all the slips in shelves of pigeon holes that lined the shed, later called the Scriptorium, in the back garden of his house where he and his assistant editors worked.

Nineteenth-Century Context: Empire, Europe, and Continental Philology

The mid-nineteenth century was a crucial time in history for English lexicography. Many factors coincided at this moment – the emergence of continental philology; the automatisation of the postal service; the use of steam technology for ships, trains, and printing presses; cheaper publications; and a growing reading public – to create the perfect conditions for the successful creation of a new English dictionary that was historical, descriptive, comprehensive, and publicly crowdsourced.

The period had witnessed advances in the study of the history and evolution of language in Europe but stagnation amongst British philologists, who were slow to catch up. While scholars on the Continent such as the Danish Rasmus Rask (1787–1832), the German Franz Bopp (1791–1867), and the German Jacob Grimm (1785–1863) were pioneering methods for systematically comparing languages and tracing a word’s etymology, their British counterparts had been stuck in older models of speculative language theory. The establishment of the London Philological Society in 1842 marked a watershed moment for British disciples of the new comparative philology, and it is little surprise that the proposal for a new dictionary based on scientific, historical principles came from within its ranks. It was clear to most members of the Philological Society that Europe was more advanced than England in the study of philology and the creation of historical dictionaries. Several European languages already had large comprehensive dictionaries in progress: in Germany, the Brothers Grimm had begun the Deutsches Wörterbuch in 1838; in France, Émile Littré had begun the Dictionnaire de la langue française in 1841; and in the Netherlands, Matthias de Vries had begun the Woordenboek der Nederlandsche Taal in 1852. British lexicographers had some catching up to do. It was time to shift away from speculative language theory, as evidenced by the fanciful etymologies of Richardson and Webster, in favour of more systematic and empirical investigations, and that vision was borne out in the creation of the first edition of the OED.

The First Edition of the OED

The first edition of the OED (OED1) was proposed in 1857, begun in 1859, and completed in 1928. Although it had been known informally as the ‘Oxford English Dictionary’ since the 1890s, it was originally called the ‘New English Dictionary on Historical Principles’ and officially kept this title until 1933. The first edition covered 400,000 words in ten volumes, and took six main editors, several teams of editorial assistants, and thousands of volunteers around the world to complete. The dictionary was pioneering in its policy and practice: it was historical rather than synchronic; descriptive rather than prescriptive; and global rather than confined to Britain in both its content and compilation.

While no other dictionary until the OED had combined all these elements, no one element was unique to the OED itself. There had been earlier dictionaries in English or other languages that were comprehensive (Johnson’s dictionary and the Grimms’ dictionary of German), historical (Jamieson’s dictionary of Scots), descriptive (Richardson’s dictionary), and crowdsourced (the Grimms’ dictionary, although this was confined to a network of scholars rather than the public). What was unique about OED1 was that it combined all these features: it aimed to include every word in the English language; it traced a word’s usage across time, from its first appearance in a written source until the present day; it attempted to describe how a word was actually used rather than how it should be used; and it was crowdsourced, drawing on the help of members of the public rather than only scholars and specialists (although it included them too, of course).

Lexicon totius Anglicitatis: a Dictionary of All English

The founders aimed to create a Lexicon totius Anglicitatis, a dictionary of all English (Trench 1860, 64). As Trench explained, the editors would draw ‘as with a sweep-net over the whole extent of English literature’ so that ‘innumerable words … which are lurking unnoticed in every corner of our literature, will ever be brought within our net’. According to him, ‘the business which [the lexicographer] has undertaken is to collect and arrange all words … whether they do or do not commend themselves to his judgement’ (Trench 1860, 69-70).

While it is never possible to document every single word in a language, in practice the editors of OED1 did their best to implement these lofty ideals of inclusion and comprehensive coverage, except in the case of swear words. Victorians are frequently portrayed as prudish when it comes to sex and morality, and although many historians contest this characterisation, the OED’s treatment of swear words and taboo terms is one way in which the editors lived up to this portrayal. Despite existing in written English before the nineteenth century, words such as fuck and cunt were excluded from the OED. In addition, scholars have highlighted ways in which the OED editors displayed biases relating to ideologies of race, gender, and class in the dictionary entries but, while there are isolated instances that support such criticism, such bias is certainly not as endemic as some scholars have implied.

If the dictionary was to include all words in English, then its editors needed to reach out far beyond Britain to include all varieties of English. At a time in the nineteenth century when the British Empire was at its zenith in terms of expanse and power, Murray’s invitation to readers around the globe ensured that the OED was an international text. Indeed, words flooded in from contributors around the world, most of which were included in the dictionary, making it truly global in coverage. There is a myth that the first editors of the OED were anglocentric and deliberately excluded slang or foreign words, but this is far from the truth. They included thousands of loanwords and words from World Englishes which were sent in from the public around the world – the first page alone included aardvark, aardwolf, ab2, aba, abaca. Indeed, apart from their prudish coverage of swear words, the OED1 editors were incredibly enlightened in their coverage of marginal language, including slang, and they frequently quoted from non-canonical and unconventional sources.

When the OED was completed in 1928, the British Prime Minister declared it a ‘national treasure’. This echoed a sentiment that had haunted the dictionary throughout its compilation. Many people had tried to ascribe nationalistic intentions to the editors of the dictionary, but such intentions are difficult to prove and little evidence supports this view. The editors had a more ambivalent understanding of the role of the dictionary in national identity, and this was rooted in their view that the English language was global and extended beyond the boundaries of England. This should not surprise us: as literary criticism has reminded us in recent decades, the author’s intention is not necessarily the same as the reader’s understanding and the reception of a text.

The Editors of the First Edition of the OED

One of the three founders of the OED, Herbert Coleridge, grandson of the poet Samuel Taylor Coleridge, became its first editor, but he died within two years of taking up the job at the age of thirty-one (supposedly a result of sitting through a lecture at the London Philological Society in damp clothes) (Murray 1977, 136). His inspiring vision – that ‘every word could tell its own story’ – set out the editorial policy for the dictionary. Coleridge initiated many enduring editorial practices such as reaching out to volunteers for assistance.

Frederick Furnivall, who alongside Coleridge and Trench had founded the dictionary project, took over after Coleridge’s death and managed the project from 1861 to 1879. Not much direct work on the dictionary took place during Furnivall’s reign, but he did set up systems and support structures that would prove invaluable to the project in the long run. He was a hugely colourful figure, who in typical Victorian fashion was involved in many activities simultaneously – from coaching the first female rowing team to creating several societies such as the Early English Text Society, the Chaucer Society, and the New Shakspere Society. These societies republished hundreds of early English texts, which were subsequently used as evidence in the dictionary.

It took twenty-five years for the project to publish its first portion of the alphabet, A–Ant, which came out in 1884 under the watch of the third and most famous editor, James Murray (editor 1879–1915). He was joined by a team of about a dozen assistants and three main editors – Henry Bradley (1845–1923), Charles Onions (1873–1965), and William Craigie (1867–1957). Murray’s commitment to the OED was legendary. He worked on it every day from 1879 until his death in 1915. He had started to work on the dictionary in the house in north Oxford where he lived with his wife Ada and their eleven children. But eventually the number of books and volume of papers became too much and he moved the project out of the house into a purpose-built shed, his ‘Scriptorium’, in the back garden. Murray worked here with a small team of assistant editors, despite the cold and dank conditions, for the rest of his life. He died on the letter T, and never knew whether his life’s work would ever be finished.

Murray had trained an excellent team who, despite World War I, ensured that work continued on the project. Henry Bradley took over as chief editor until his own death in 1923, when William Craigie took over the project, in collaboration with Charles Onions, and saw it through to completion in 1928.

The OED Supplements

In his speech at the launch party of the ten volumes of the first edition of the OED on 6 June 1928, the Prime Minister Stanley Baldwin reminded everyone that the mammoth task was not exactly over: a supplement volume was pending. In fact, work was already under way and the editors had been gathering materials for a supplement for decades. The OED’s long gestation meant that, by 1928, over forty years had passed since the first fascicle had been published, and all that time the editors had been collecting slips for hundreds of additions and revisions for all parts of the alphabet. If words or corrections came to light after the relevant OED1 alphabetical range had been published, such as appendicitis or aeroplane (which came to the editors’ attention after the letter A had been published), then the evidence was written on slips and put in a file called ‘Supplement’. These were set aside for the day when a proper supplement volume would be published. The lexicographic policies and practices remained largely the same on the Supplement as they had been on the first edition of the OED, except that the editors dropped the policy of marking foreign words in the dictionary as ‘alien or not yet naturalized’ by the use of two small parallel lines beside the headword. (This policy was reinstated by Robert Burchfield in the 1970s and 1980s but was dropped again for the third edition.)

As soon as the ten volumes of the first edition were published, Craigie and Onions, the two main editors still involved in the project, began updating it by compiling new entries for missing words, or new information for existing entries. The task of editing the 1933 Supplement was complicated by the fact that the editors lived on separate continents and communication between them was laboured. Onions and his team were based in Oxford, and Craigie and his team were based in Chicago, where they were also working on the Dictionary of American English (1938–44). In 1933, a single-volume Supplement was published and the original name of the dictionary, ‘New English Dictionary on Historical Principles, founded mainly on the materials collected by the Philological Society’, was officially changed to the title by which it had been informally known for several decades: the Oxford English Dictionary. More Supplement volumes would follow over the next fifty years. The ‘first Supplement’ by Craigie and Onions in 1933 was followed by the ‘second Supplement’ by Robert Burchfield, which was published in four volumes in 1972 (A–G), 1976 (H–N), 1982 (O–Scz), and 1986 (Se–Z) respectively. Originally from New Zealand, Burchfield had come to Oxford as a Rhodes Scholar in 1949. He remained chief editor until the publication of the fourth and final volume of the Supplement in 1986.

The Second Edition of the OED

In 1986, two of Burchfield’s editorial assistants, John Simpson (b. 1953) and Edmund Weiner (b. 1950), became co-editors of the OED, preparing the second edition which was published in 1989. The second edition was not really a revision of the first edition but rather a combination of the first edition with Burchfield’s Supplement and 5,000 new entries. But the process of integrating these texts and bringing them together as one digitised entity was hugely complicated and a major undertaking in computing in the 1980s, when the dictionary team collaborated with computer scientists to digitise the OED. The process cost $13.5 million over five years, and the second edition was published in twenty volumes of 22,000 pages in 1989. A CD-ROM version of the work, which was considered cutting-edge technology at the time, was released in 1992. In 1993 and 1997, three volumes of Additions to the Second Edition, edited by Michael Proffitt (b. 1965), who would later become editor of the third edition of the OED (OED3), were published.

The Third Edition and OED Online

A team of seventy editors is currently working on the third edition of the OED, which began in 1993 and will probably be completed in 2028, upon the one hundredth anniversary of the publication of the first edition. The OED3 is the first complete revision of the first edition and the Supplement volumes, and therefore involves the re-working of many entries which have remained untouched since the nineteenth century. In addition to revising existing entries, the OED3 adds over 3,000 new words or senses every year. These are sourced not only by searching large databases and corpora but also in the same way that they were 150 years ago – by appealing to the public for help. Recent appeals, made especially via social media, have sought words relating to science fiction, varieties of English, parenting, hobbies, regional terms, and youth culture.

Under the editorial leadership of John Simpson until 2013, followed by Michael Proffitt, the third edition is published gradually online: every three months new entries and revised portions of the alphabet are added to OED Online (www.oed.com, launched in 2000). Hence OED Online comprises a mix of second- and third-edition material. There are several additional features of an entry on OED Online which do not appear in any printed sources. These include the integration of the Historical Thesaurus of the Oxford English Dictionary so that users can not only read a definition of a word and learn about the life of that word across time, but also link through to the historical thesaurus which tells the user about that word’s semantic domain and its synonyms across time. In addition, users of OED Online are presented with statistics on the current frequency of a word in printed sources in a ‘frequency band’ which appears after the variant forms and before the etymology. It draws on Google Books Ngrams data, and gives the user an idea of how common a word is in contemporary usage. OED Online is currently offered under paid subscription (or free to public libraries in the United Kingdom).

OED as Data

The OED has always been slightly ahead of the curve in lexicographic method and practice, and in its application of new technology. In the nineteenth century, it was the first English dictionary to demonstrate the innovative descriptive and historical principles of continental philology. In the twentieth century it applied and developed cutting-edge technology to integrate its various versions and to create the second edition. Now in the twenty-first century the OED is offering its data for experimentation with digital humanities, and for those who require high-quality, curated language data for the development of morphological analysers, ngrams, virtual assistants, text-to-speech tools, topic modelling, sentiment analysis, part-of-speech taggers, and chatbots. Indeed, the vision of the marketing team at the OED is shifting away from seeing the dictionary as a discrete printed text towards seeing it as language data with unlimited functions and uses. Hence, the OED is forging collaborations with scholars, industry, and local communities in order to respond to changing uses of dictionary data, which includes the development of apps and APIs, and the increasing support of machine learning, natural language processing, and artificial intelligence. It remains to be seen whether these recent digital developments outlive the need for the OED to be offered to users as a discrete text. One thing remains certain: no other language content can match the OED in its high degree of quality, curation, and scholarly rigour.

Twentieth and Twenty-First-Century Dictionaries

Chapter 15 The English Period Dictionaries

The idea for a series of English period dictionaries on historical principles dates to 1919, when William A. Craigie, one of the chief editors of the Oxford English Dictionary (OED), proposed such a series (to include the Middle English, the Early Modern English, and the Modern English periods, and eventually to include the Old English period) in order to extend and supplement the treatment in the OED. Work on an Early Modern English Dictionary (1475–1700) began in 1928 at the University of Michigan under the direction of Charles C. Fries and was carried out in earnest until 1939, but was then suspended for both financial and theoretical reasons, though some work was carried on informally until 1943. Further research was carried out, primarily on the citation file, between 1968 and 1978, but plans for a full-fledged dictionary were ultimately abandoned. In 1994 the quotation slips (both those originally donated by Oxford University Press and those collected at Michigan) were sent to the Press for use in its third edition (in progress) of the OED. Work on a separate dictionary of Modern English has also been subsumed by OED3. Of the other two, the Middle English Dictionary has been completed and the Dictionary of Old English is still in progress.

The Middle English Dictionary

History

Three years after Craigie’s proposal the Modern Language Association of America (MLA) began to take an interest in a comprehensive dictionary of Middle English, and by 1925 had assumed responsibility for promoting its compilation. In that year, work began at Cornell University, with Clark S. Northup as editor. Northup was able to obtain the Middle English slips for A through G that had been collected for the OED, and he and his staff supplemented these slips by excerpting various Middle English texts. By 1928, funds for the work at Cornell had been exhausted, and during the next two years the MLA tried to secure both funding and one or more university sponsors for the project.

In early 1930, the University of Michigan invited the MLA to move the dictionary to Ann Arbor, in the expectation that the presence of the dictionary would benefit the Early Modern English Dictionary already in progress there, to which the OED had already donated its entire stock of quotations for the Early Modern English period. The Cornell materials were transferred to the University of Michigan, and Samuel Moore of the Department of English was chosen as editor. The OED’s Middle English slips from H to Z were already in Ann Arbor when Moore began his work, the total donation (A through Z) amounting to approximately 430,000 slips, including both those used in the printed dictionary and those rejected; in the next few years, the slips for the 1933 OED Supplement were also transferred to the University of Michigan.

During the first fifteen years, under editors Moore (1930–4) and Thomas A. Knott (1935–45) – which coincided with the Depression and World War II – the main activity of the small staff, assisted by a number of volunteers, was to carry out an extensive and systematic reading programme to supplement the original collection of citations; it has been estimated that by 1944 the collection of slips had grown to 1,360,400. During these same years Moore and two of his editors (Sanford Meech and Harold Whitehall) completed and published a dialect survey, ‘Middle English Dialect Characteristics and Dialect Boundaries’ (1935), and Knott and his staff prepared and circulated some specimens of individual letters in 1937 and 1940 (L, and later A), though these caused disappointment and produced serious criticism from some reviewers, and very little further work was completed during the War years.

It was not until 1946, with the appointment of Hans Kurath as editor (1946–61), that the Middle English Dictionary (MED) in the form in which it exists today began to take shape. By the end of that year, Kurath had drawn up a formal editing plan, which was based on the strengths of the MED collection and on what he believed could be done well: 1) there was to be a full display of Middle English quotations from the extensive collection in Ann Arbor; 2) there was to be a systematic treatment of what he called ‘the formal features of M[iddle] E[nglish] – spellings, grammatical forms and regional variants’; and 3) the meanings of the Middle English words were to be conveyed ‘in the briefest form possible – by giving the Modern English equivalents (with clarifying comments, when needed) and resorting to explicit definition only when translation into M[oder]n E[nglish] is not feasible or [is] misleading’ (cited from Kurath’s unpublished 1946 report on the MED).

Between 1946 and 1952, editing began on E and then progressed to F (A, B, C, and parts of D had already been edited according to Knott’s plan, but were postponed for re-editing until Kurath’s plan had been tried out on E and F). During this period, all slips were refiled using Southeast Midland headwords, the dating of manuscripts was set by correspondence with librarians and scholars, the short titles with their dates were put into final form, and finally, in 1952, the first fascicle (the first part of E) was published by the University of Michigan Press. Two years later, a description of Kurath’s editing plan was published in the original Plan and Bibliography (1954). His successors, Sherman M. Kuhn (1961–83) and Robert E. Lewis (1982–2001) followed that plan in broad outline and basic essentials through the final fascicle in 2001, though they made a number of improvements as time went on.

Between 1952 and 1984, fascicles were published at an average rate of two per year, progressing from E and F to A through D, then from G to the end of P; and from 1984 to 2001, from Q on, at an average rate of three per year, thanks to a change to a computer-assisted system. With the publication of the final X-Y-Z fascicle in 2001, the completed MED proper runs to 14,939 pages in 115 fascicles (combined into thirteen volumes), with 54,081 separate entries and 891,531 quotations. A second edition of the Plan and Bibliography appeared a few years later, in 2007.

Since 1998, an electronic version of the MED, developed under the editorial direction of former editor Frances McSparran, has been available online (https://quod.lib.umich.edu/m/mec/); it constitutes one of the three resources in the Middle English Compendium (the other two are the HyperBibliography of Middle English, based on the MED bibliographies, and the Corpus of Middle English Prose and Verse, a series of fully searchable electronic texts linked to the HyperBibliography).

Since the early years of publication, a supplement has been part of the long-range plan. The files for it began to be organised in the 1960s, and between then and 2001 they were systematically added to, resulting in some 20,000 slips of supplementary materials. Some preliminary work was done during the summer of 2001, when four members of the production staff, under the direction of McSparran and editor Marilyn Miller, organised and classified these materials, partially entered the additional quotations into the computer, and proofed them against the texts. Since 2016, Paul Schaffner (former editor and currently senior associate librarian at the University) and a small staff have been appending these new quotations to the appropriate entries online, revising some definitions and etymologies, correcting the obvious errors, and adding new entries to both the HyperBibliography and the Corpus of Middle English Prose and Verse.

Characteristics and Innovations of the Middle English Dictionary

The first key characteristic of the MED is the time period it covers. The MED covers the variety of English known as Middle English, which was spoken and written in England, Ireland, and Wales during the period between Old English and Early Modern English. Without going into detail about the arguments for the dates, the editors of the MED set 1100, when the first Middle English began to appear, as their beginning date, and 1475, ‘the beginning of printing’, as their end date. As time went on, the end date was gradually extended to 1500, since so many texts without known composition dates appear in manuscripts dated simply ‘a1500’, which can mean any of the following: ‘between 1475 and 1500’, ‘late fifteenth century’, or ‘any time in the fifteenth century’. Printed books, however, all produced after 1475, are excluded, on the grounds that the invention of printing led to a standardisation of spelling practices that anticipates those of Modern English.

The second feature worth addressing is the thorough overhauling of the bibliographical apparatus, which was one of the first items of business during Kurath’s editorship and was undertaken between 1946 and 1949 by three members of the editorial staff (Margaret Ogden, Charles Palmer, and Richard McKelvey). The previous apparatus had accumulated over a number of years and contained a mixture of composition dates and manuscript dates and various other inconsistencies. The system that was adopted contained what has been called the ‘double-dating’ feature. As Kurath put it in his unpublished 1947 report:

We have decided to assign the MS [manuscript] date to all texts, and to add the composition date in parentheses if the text was composed a quarter of a century or more earlier than the date of the MS from which we quote. Paleographic evidence gives us a fairly reliable approximate date for all the MSS, whereas the composition date is often highly conjectural.

Because the purpose of the MED is to describe, or recover, the English language current between 1100 and 1500, this bibliographic system, with its focus on manuscripts, puts the emphasis on the closest thing available to the running passages of speech or writing used as data for modern dictionaries. Each manuscript, whether preferred or non-preferred, is treated as a witness (more coherent or less coherent, more reliable or less reliable, as witnesses are), and the dated short title stands for that witness rather than for the edition in which it happens to appear. If a manuscript reproduces a garbled or erroneous reading, a bracketed reconstructed reading may be added after it to indicate what the reading was presumably intended to be or what it was derived from, but the manuscript reading is kept intact. In that way, the integrity of the scribe, the actual witness, and of his manuscript, his testimony, are preserved.
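The double-dating rule quoted above can be illustrated with a minimal sketch. It assumes that dates are reduced to plain years and that the label takes the form ‘MS year (composition year)’; the function name and the output format are hypothetical and are not the MED’s own notation.

```python
from typing import Optional


def double_date(ms_year: int, composition_year: Optional[int] = None,
                gap: int = 25) -> str:
    """Return a dated label following the rule in Kurath's report.

    The manuscript (MS) year is always given. The composition year is
    added in parentheses only if the text was composed a quarter of a
    century (25 years) or more before the MS was written.
    The label format is an illustrative assumption.
    """
    if composition_year is not None and ms_year - composition_year >= gap:
        return f"{ms_year} ({composition_year})"
    return str(ms_year)


# A text composed in 1380 and quoted from a MS of 1450 is double-dated.
print(double_date(1450, 1380))
# A text composed in 1430 and quoted from the same MS is not.
print(double_date(1450, 1430))
```

On this sketch, the first call yields a label carrying both years and the second a label carrying the manuscript year alone, which matches the emphasis on the manuscript as the primary witness.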

Third, the MED, though it belongs to the genre of historical dictionaries given its coverage of a period of historical time and its use of chronologically arranged quotations, has characteristics of a synchronic dictionary, covering as it does a specific slice of the language as a whole. But this raises the issue of how to distinguish the chronological (or diachronic) dimension from the synchronic dimension. Many changes occurred during the nearly 400-year period of Middle English, and it would give a misleading impression if there were not some way to call attention to these changes. Lexical changes can be seen in the dated quotations, but orthographic (and, through the orthographic, phonological) and morphological changes, on the other hand, can be seen only in the variant spellings and forms, of which, depending on the word, there may be a large number represented in the quotations. The OED calls attention to the diachronic aspects of these variant spellings and forms by labelling them by century in its first and second editions, and from the sixteenth century on in its third edition (in progress). The MED, in contrast, labels them ‘early’ (up to c. 1300) or ‘late’ (between 1450 and 1500), with unmarked spellings and forms to be assumed to cover the whole period (or as much of it as is covered by quotations) or at least as not specifically ‘early’ or ‘late’. These two labels have been used from the early volumes of the MED on, though the number of occurrences of the label ‘early’ increased greatly in the later volumes, by more than double in Kuhn’s part (G through P) and more than eightfold in Lewis’s (Q through Z).
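The ‘early’ and ‘late’ labels described above can be sketched as a simple classification. The sketch assumes that a variant form is labelled from the years of the quotations that attest it, and that a form is marked only when all of its attestations fall within the relevant range; this criterion and the function name are assumptions for illustration, not the editors’ documented procedure.

```python
from typing import List, Optional

EARLY_LIMIT = 1300  # 'early': up to c. 1300
LATE_START = 1450   # 'late': between 1450 and 1500


def period_label(attestation_years: List[int]) -> Optional[str]:
    """Label a variant form 'early', 'late', or leave it unmarked (None).

    Assumption: a form is labelled only if every attestation falls
    inside the labelled range; otherwise it stays unmarked.
    """
    if not attestation_years:
        raise ValueError("at least one attestation year is required")
    if max(attestation_years) <= EARLY_LIMIT:
        return "early"
    if min(attestation_years) >= LATE_START:
        return "late"
    return None


print(period_label([1175, 1225]))  # attested only before c. 1300
print(period_label([1460, 1475]))  # attested only from 1450 onwards
print(period_label([1250, 1400]))  # spans the period, so unmarked
```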

Fourth, it is notable that Middle English is predominantly dialectal in both its spoken and its written varieties, and it was not until the fifteenth century that this regional diversity gave way to the dialect of the Southeast and Central Midlands that was becoming current in London, and (in writing) to ‘Chancery’ English, the official usage of the London administration. This situation accounts for the rationale, early in the history of the MED, for the dialectal survey carried out by Moore, Meech, and Whitehall (1935), and for the heavy emphasis on regional dialect in the original Plan (1954). But it also raised the issue of how to indicate the geographical/regional dimension and how to distinguish it from the chronological dimension. Kurath seldom used dialectal labels among the variant spellings and in the form sections in the first six volumes (A through F), preferring instead to let the list of regional texts and manuscripts and the discussion of dialectal characteristics in the Plan stand in their place. In Kuhn’s part (G through P) the number of labels increases, especially from M on; in Lewis’s (Q through Z) there was a further, larger increase. In addition, there was an attempt in the later volumes to indicate the combination of the diachronic and the geographical/regional (e.g. ‘early SWM [Southwest Midland]’) whenever possible, a practice reflected in OED3.

Fifth, in the editing plan devised by Kurath in the late 1940s, the MED was conceived as primarily a bilingual translation dictionary, intended to be used by those who are fluent in English, and that general orientation has remained throughout. A Modern English translation or paraphrase is often the most effective way to convey the meaning of the Middle English word and is the preferred method in the MED. This typically involves the use of a synonym; if there is any ambiguity to the synonym, a second and sometimes a third synonym or a qualifying phrase or statement may be added. Explicit definition is normally reserved for words with involved senses (frequently verbs or abstract nouns) or words which have no equivalents in Modern English (medieval tools and weapons, legal concepts, medical terms, and the like).

In the course of the MED’s production, however, there has been a gradual increase in the amount of explicit definition, and the definitions themselves have become more elaborate, more precise, and more descriptive or contextual, with more editorial guideposts for the reader along the way. This is in keeping with the consensus among lexicographers that definitions are the core of a dictionary and that the MED has an obligation to define as accurately as possible. The editors have always tried to use at least one quotation for each quarter century, if available, but in the later volumes it is not unusual to find two, three, or more quotations for each quarter century, especially from the last quarter of the fourteenth century to the end of the fifteenth century. At the same time, the average length of the quotations has increased, so that the cuttings could stand on their own syntactically and would give the reader enough context to actually show how they documented, or illustrated, the definitions.

The Dictionary of Old English

History

The Dictionary of Old English (DOE, www.doe.utoronto.ca) is a historical dictionary based on records written in English between 600 and 1150. It documents the earliest period of the language, just as the MED provides a comprehensive record of English vocabulary between 1100 and 1500. The DOE will take its place with the MED and the OED, the series providing full coverage of the English lexicon. The editor of the OED, James Murray, never intended to provide complete coverage of Old English vocabulary. His goal was to survey the history of a word from its first appearance in the language down to modern times. If a word did not survive beyond the Old English period (i.e. beyond 1150), it was not included in the OED. The DOE complements the OED by providing entries for the 80 per cent of the Old English vocabulary that was excluded by Murray’s editorial policy. The DOE also complements the MED in its analysis of the transitional texts between late Old English and early Middle English so that no words falling outside the boundaries of either stage of the language are overlooked.

Before the DOE there was only one dictionary of Old English which attempted to be comprehensive: An Anglo-Saxon Dictionary, edited by J. Bosworth and T.N. Toller (1898), together with its Supplement, edited by T.N. Toller (1921), and an Enlarged Addenda and Corrigenda edited by A. Campbell (1972). Because of its insufficiencies, a new Old English dictionary was proposed at a specially convened conference at the University of Toronto in 1969. Attendees from Canada, the US, and Great Britain supported the initiative. Through the Old English Group of the Modern Language Association, an International Advisory Committee was established for the DOE, which appointed Angus Cameron (Toronto) and Christopher Ball (Lincoln College, Oxford) as Editors that same year. Ball resigned in 1976 while Cameron continued as editor until his death in 1983.

Cameron planned for three stages in the production of the DOE: the development of the research collection (completed in 1975); the digitisation of the corpus of Old English texts and the running of its concordances (completed in 1981); and the writing of the DOE (still in progress).

1 Development of the Research Collection

As the first step to collection development, Cameron prepared a catalogue by genre of all texts in the language (poetry, prose, Old English glosses to Latin texts and glossaries, and finally, inscriptions both in runes and in the Latin alphabet) as one had not previously existed. He then assembled a microfilm library of the major manuscripts containing Old English, together with hard copy. This library has lately been enhanced by the availability of digital archives (e.g. Parker on the Web and the British Library’s Digitised Medieval Manuscripts). Next, he gathered a collection of the best print editions of Old English texts which were checked for accuracy against the manuscripts. This collection of editions is constantly updated as new editions appear and are verified against the manuscripts. Another component of the research collection was the reference library of dictionaries. Other Old English dictionaries were of primary importance, as were the MED, the OED, the Dictionary of the Older Scottish Tongue, and the English Dialect Dictionary. Dictionaries of the cognate Germanic languages, Medieval Latin dictionaries, and specialised dictionaries were also collected. Finally, the largest single category of material assembled in the collection was an archive of some 3,000 word studies, selected primarily with semantic considerations in mind. In 1983, the project published a bibliography and index of Old English word studies as a guide to the collection. The original archive has since been augmented as new word studies are published, and a digitised bibliography and word index exist internally at the project, with plans to release this tool in the future. The main collecting phase of the project took place between 1970 and 1975.

2 Digitisation of Old English Texts

The second stage of the project was the computer processing of the Old English materials. Cameron’s foresight in the late 1960s anticipated the role technology could play in lexicography, for digitisation would revolutionise how dictionaries were written and read. The movement from manuscripts to megabytes was fairly straightforward. Cameron first delimited the corpus, beginning with the earliest texts and continuing on to texts of the mid-twelfth century. As costs in the 1970s were prohibitive for online editing, this material was typed in a typeface legible to a primitive optical scanner and then read onto magnetic tape. The tapes were then printed and proofread. Through the mid and late seventies, the editing, correcting, concording, and lemmatisation of the Old English Corpus were carried out using the text processing system, LEXICO, developed by Richard Venezky, subsequently the DOE’s Director of Computing until 2004.

The culmination of the second stage of the project was marked by three events. The first was the printing of the corpus on slips in concordance format, each slip containing the concorded word in a full sentence of context with a reference back to its printed edition. The second was the publication of A Microfiche Concordance to Old English. The concordances of individual texts were concatenated into one long alphabetic sequence and published on microfiche, each word being cited with a full sentence of context, except for some 200 spellings of high-frequency words such as he (‘he’), ond (‘and’), which were given only frequency counts. These high-frequency words were later concorded and published in full in 1985 as A Microfiche Concordance to Old English: The High-Frequency Words. The two complementary concordances represented a linguistic milestone: it was the first time the corpus of any sizeable language had been published in analysed form, and the first time any dictionary had published its full citation base. The third significant event in DOE’s digitisation was the publication in 1981 of the corpus in text format, which completed the initial computerisation phase.
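The concordance format just described can be illustrated with a minimal Python sketch. It is not a reconstruction of LEXICO or of any DOE program: the function name, the two invented toy lines, and the two-member high-frequency set are assumptions made purely for illustration. Each spelling is listed with its sentence of context and a reference, except for designated high-frequency spellings, which receive only a count.

```python
from collections import defaultdict

# Hypothetical stand-in for the roughly 200 high-frequency spellings
# mentioned in the text; only the two examples cited there are used.
HIGH_FREQUENCY = {"he", "ond"}

# Invented toy lines, not citations from the Old English corpus.
TOY_SENTENCES = [
    ("toy 1", "he com ond he spræc"),
    ("toy 2", "se cyning com"),
]

def build_concordance(sentences, high_frequency=HIGH_FREQUENCY):
    """Map each spelling to its (reference, sentence) contexts, or to a
    bare frequency count if the spelling is in the high-frequency set."""
    contexts = defaultdict(list)
    counts = defaultdict(int)
    for reference, sentence in sentences:
        for word in sentence.lower().split():
            word = word.strip(".,;:")
            if not word:
                continue
            if word in high_frequency:
                counts[word] += 1
            else:
                contexts[word].append((reference, sentence))
    # Concatenate everything into one sequence. Sorting here is by code
    # point, which is only an approximation of Old English alphabetical order.
    merged = {}
    for word in sorted(set(contexts) | set(counts)):
        merged[word] = counts[word] if word in counts else contexts[word]
    return merged
```

Calling `build_concordance(TOY_SENTENCES)` gives `he` and `ond` as counts only, while `com` is listed with both of its contexts and their references.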

In the early years of the project, Cameron carried out his research with the help of a secretary, a copyeditor, and a research assistant. He began to build up his research team in the second stage of the project: in 1976 Ashley Crandell Amos was appointed, in 1977 Sharon Butler, and in 1978 Antonette diPaolo Healey.

3 Writing the Dictionary of Old English

Specimen entries for the DOE, produced in the 1970s and discussed at various conferences of Anglo-Saxonists and lexicographers, led to the ten-field format for DOE entries: headword; part of speech; attested spellings; occurrences and usage; definitions; citations; citation parenthetical material (mainly readings from manuscript variants and citations from Latin sources); Latin equivalents (Latin corresponding to the Old English headword and found in the same context in the same manuscript); Old English references (the headword in relation to its word family); and secondary references (references to entries in other dictionaries, mainly later reflexes of the headword). In the early 1980s, the number of headwords was estimated to be 33,000 to 35,000, and that range is still accurate today. The search for a computer system then began in earnest. Up until 1982, the project had been using large mainframe computers, but for a long-term project such as the DOE, a stand-alone system was essential. Amos, who succeeded Cameron as editor in 1983, oversaw with Venezky the installation of the first computer system that year. During this time, Amos also devoted her superb analytical abilities to formulating guidelines for writing the DOE, thereby setting the project on the sure footing which led to the publication of its earliest letters. Unfortunately, Cameron did not live to see the publication of any letter of the DOE, and Amos saw publication of only the first two (D and C). Healey succeeded Amos as editor in 1989. Between 1986 and 2017, nine letters, containing 15,545 headwords, appeared in various formats, but never in print. A, Æ, B, C, D, and E were published only on microfiche; A to F on CD-ROM, and F on microfiche; A to G online and on CD-ROM, and G on microfiche; and A to H online and on CD-ROM. I, the tenth letter of a 22-letter alphabet, appeared online in 2018 under the editorship of Haruko Momma. This brief publication history suggests a corresponding narrative in which technological developments enabled lexicographic advances.
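The ten-field format listed above lends itself to representation as a structured record. The following Python sketch is offered only as an illustration: the class name, the field names, and the helper `search_field` are hypothetical and do not reflect the DOE's actual markup scheme or software. It shows how a record with one slot per field allows a search to be confined to a single field rather than to headwords alone.

```python
from dataclasses import dataclass, field, fields

@dataclass
class EntrySketch:
    """One slot per field of the ten-field format (names are assumptions)."""
    headword: str
    part_of_speech: str
    attested_spellings: list = field(default_factory=list)
    occurrences_and_usage: str = ""
    definitions: list = field(default_factory=list)
    citations: list = field(default_factory=list)
    citation_parentheticals: list = field(default_factory=list)
    latin_equivalents: list = field(default_factory=list)
    old_english_references: list = field(default_factory=list)
    secondary_references: list = field(default_factory=list)

def search_field(entries, field_name, needle):
    """Return the entries whose named field contains the search string."""
    hits = []
    for entry in entries:
        value = getattr(entry, field_name)
        # Treat single-string fields and list fields uniformly.
        values = value if isinstance(value, list) else [value]
        if any(needle in item for item in values):
            hits.append(entry)
    return hits
```

With placeholder data, `search_field(entries, "attested_spellings", "x")` would return only those entries in which the string occurs among the attested spellings, leaving definitions and citations out of account.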

Characteristics and Innovations of the Dictionary of Old English

First and foremost, the heart of the DOE is its electronic corpus of edited texts verified against manuscripts. Cameron’s development of the corpus allowed the DOE a fresh start and a comprehensive examination of the surviving material of Old English. This is perhaps the most significant difference between the DOE and earlier Old English dictionaries. The corpus includes at least one copy of every extant text and forms the citation base of the DOE. From the earliest recorded English glossaries (dated to the end of the seventh century) to the eighteenth-century copies of Old English texts with no extant manuscript authority, all are included as constituting the body of Old English. Cameron decided also to include multiple copies of a text in the corpus if the material was of importance for date, dialect, point of view, etc. Common sense dictated to Cameron that except for a unique manuscript witness, such as the Beowulf manuscript, there is not always a ‘single’, ‘definitive’, ‘authoritative’ text. This holistic view still guides editorial thinking. Today, the 3,060 texts of Old English embody some twenty-five million characters, occupying sixty megabytes. In literary terms, this size is about five times that of the Collected Works of Shakespeare. The corpus is updated as improved editions appear and rare new finds are incorporated. The corpus is distributed in two ways. The 2009 Text Corpus on CD-ROM appears in HTML (HyperText Markup Language) and XML (eXtensible Markup Language). The XML release conforms to the latest guidelines, known as TEI-P5, of the Text Encoding Initiative, which sets the standard for the creation, markup, and distribution of electronic text. The 2009 Web Corpus is the latest iteration of the concording programmes the corpus has undergone. It enables scholars to create interactive concordances on the Old English and Latin in the corpus. The output is a sentence of context which can expand into three sentences. It also allows Boolean searches as well as searches on phrases, features most useful for textual analysis.

Second, the corpus is an essential tool for writing the DOE: it determines the number of headwords, the shape of the definitions, and the exemplary citations. Its spellings populate the attested spellings field of the DOE. The purpose of this field is to list all the attested spellings of a word, in parsed order. The attested spellings, representing material not available in previous dictionaries, lay out for the reader the evidence for testing the statements in Old English grammars. Frequency counts, given in the occurrences field, are another unique feature of the DOE that is generated from the corpus. They allow the reader to know what proportion of the evidence has been cited in the entry. Usage labels, now assigned more confidently because of the corpus, call the reader’s attention to patterns or facts – to restrictions in use or occurrence by manuscript date, dialect, genre, text, or author.

Third, in the entry-writing plan devised by Amos in the mid-1980s, the DOE was conceived of as an Old English/Modern English translation dictionary. In general, the aim was simplicity in defining. Where it is possible, without misleading, to use a one-word equivalent, this is done. The editors begin with the assumption that the senses of the word under consideration are essentially the same, and let the evidence force them into the creation of sub-senses. On the other hand, they at times define more elaborately than did previous dictionaries and with finer subdivisions, often based on context or collocation. In some cases, this may represent the fact that they must analyse a larger amount of material; in others it may only be a matter of approach. One of the main principles in defining is not to claim more knowledge than exists. If one definition seems likelier than another, it appears first, and the other possibilities which cannot be entirely discounted follow. For difficult words of uncertain meaning, editors tend to abandon the role of arbiter and instead summarise previous suggestions, leaving readers to judge for themselves. To some readers this may appear as scholarly rectitude; to others, the total abdication of editorial authority. This strategy clearly marks the DOE as a product of its times.

Fourth, the culmination of recent editorial labours was the 2016 release of the DOE: A to H online. Bundled into the electronic DOE is a bibliography of the short titles and editions of Old English texts in the corpus, as well as the short titles and editions of the Latin sources cited in DOE. The bibliography is accessible at two points: through each short title at the start of a citation by a hotlink, and on the DOE’s homepage under the heading, ‘List of Texts’. In addition to simply browsing the DOE, users can navigate using a dropdown menu. The section search models itself on the ten-field structure – the logical structure of a DOE entry – to which a markup scheme is applied for efficient searching. Now that searches are no longer restricted to headwords and the tyranny of the alphabet, the DOE can become an important research tool for interrogating legal, medical, social, literary, cultural, and other issues in Anglo-Saxon England, as well as for investigating questions of language, such as morphology, spelling, semantics, and even notions of genre or the idiolect of named authors. A dictionary is a significant repository of the culture of an age. A tagged dictionary makes this repository accessible.

Finally, technology has not only enabled new ways of searching the DOE, but has allowed it to connect outward to other digital resources. One development is the DOE’s links to other dictionaries. Users of the online DOE are able to click on hotlinks – to the OED since 2007 and the MED since 2016 – and go to those online dictionaries instantaneously. In turn, the OED and the MED have reciprocally linked back to the DOE. This advance in the history of English lexicography allows readers to trace easily the development of specific words from their beginning up to the present. The three major historical dictionaries of English are now in mutual conversation. In 2016 the DOE: A to H online enlarged its system of hotlinks to include CoNE (Corpus of Narrative Etymologies from primitive Old English to early Middle English). The information CoNE provides on the etymologies of the Germanic vocabulary attested in the Linguistic Atlas of Early Middle English Corpus (1175–1325) enriches the DOE, as etymologies were originally excluded. Technology has also enabled the display of image as well as text in the DOE to address a genuine research need: to provide readers with the visual evidence for the DOE’s interpretation of some textual difficulties. In collaboration with Stanford University Libraries’ Parker on the Web digital archive, the DOE: A to H online in 2016 linked contested words or passages in a citation to a thumbnail image of its manuscript context. Here a picture is genuinely worth a thousand words.

Chapter 16 English-as-a-Foreign-Language Lexicography

Teaching English as a foreign or additional language is a global business, which developed rapidly in the second half of the twentieth century and now employs thousands of people – teachers, examiners, course-materials writers, publishers, and lexicographers. Learners of a second or subsequent language usually acquire a dictionary to help them understand and use unfamiliar words in the new language. In the beginning stages, this will normally be a bilingual dictionary, but as learners become more advanced they may choose to use a monolingual dictionary especially designed for them as advanced learners. It is these monolingual learners’ dictionaries (MLDs) that are the topic of this chapter. As we shall see, English MLDs are a triumph of British lexicography; they constitute a genre of dictionary that is distinct from general-purpose dictionaries aimed at native speakers, and yet innovations in lexicography pioneered by MLDs have influenced native-speaker dictionaries as well.

Beginnings

The development of English learners’ dictionaries began in the 1930s in India and Japan. Working as a teacher of English in India, Michael West, together with colleague James Endicott, compiled a monolingual English dictionary in which word meanings were explained using a limited defining vocabulary of 1,490 words, so that learners could more easily understand the definitions of words that they found in texts that they were reading. It was called the New Method English Dictionary and was published in 1935. The emphasis was, thus, on enabling users to decode texts, because the words used to define items looked up in the dictionary were selected based on the likelihood that they would already be familiar to users of the dictionary.

The English teachers working in Japan, at the Institute for Research in English Teaching (IRET), were more concerned to help students with their encoding skills – writing rather than reading. H. E. Palmer and A. S. Hornby recognised that, while native-speaker dictionaries contained information useful for decoding, they provided little help for encoding. The information that Palmer and Hornby thought was needed encompassed the syntactic operation of words, especially of verbs, and the regular lexical combinations of words, such as in collocations and idioms. Palmer’s A Grammar of English Words was published in 1938; it contained around 1,000 entries of the most troublesome words for a learner of English. Verbs were categorised by one or more ‘verb patterns’ that indicated which syntactic structures were possible for the verbs taking each pattern. For example, the verb remind is listed as entering the following patterns:

V. P.4 (Verb + Direct Object) ‘remind somebody’ If I forget, please remind me
V. P.10 (Verb + Direct Object + Prep + Prep. Obj) ‘remind somebody of something’
V. P.17 (Verb + Direct Object + to + Infinitive) ‘remind somebody to do something’
V. P.23 (Verb + Direct Object + (that) + Clause) ‘remind somebody that … ’ Please remind me that I have to write a letter

Each entry is also supplied with copious examples, many illustrating typical phrases in English, so that a learner, if they could not remember the grammatical information, could memorise the phrases.

Palmer’s work was highly innovative, and it was followed in 1942 by the equally innovative Idiomatic and Syntactic English Dictionary (ISED), compiled by A. S. Hornby, together with colleagues E. V. Gatenby and H. Wakefield, and published by Kaitakusha in Tokyo. As the title implies, the dictionary aimed especially to help learners with encoding (writing) by providing information on the syntactic and lexical patterning of words. The verb patterns developed at IRET, which Palmer had used, make an appearance. For example, the entry for the verb rock reads as follows:

rock vt & i (P 1, 10, 18) cause to sway or swing backwards and forwards or from side to side, as to rock a baby in its cradle (i.e. to send it to sleep); to rock oneself from side to side.

Additionally, nouns are designated as C (countable) or U (uncountable) to indicate their possible co-occurrence with determiners, e.g. much snow (U), many snowdrops (C). Almost 1,500 illustrations help to explain the meaning of nouns with a concrete reference; snow, snowdrops, snow-plough, and snow-shoe are illustrated in this way. Definitions are couched in as simple a language as possible, though without a limited defining vocabulary; examples, crafted for the purpose, demonstrate how words are used in context.

After World War II, in 1948, the ISED was published in the UK by Oxford University Press under the title A Learner’s Dictionary of Current English. In its revised and expanded second edition, published in 1963 and edited solely by A. S. Hornby (Gatenby and Wakefield had died, though their names still appeared on the title page), the word ‘advanced’ was added to the title, which became The Advanced Learner’s Dictionary of Current English, to distinguish it from smaller dictionaries aimed at less advanced learners of English. The second edition retains the emphasis on the idiomatic and syntactic characteristics of words, but with many more example sentences included. More attention is paid to the decoding needs of learners as well, with the inclusion of a broader range of scientific and technical terms.

A third edition appeared in 1974, for which the publisher’s name was added to the title, so it now became the Oxford Advanced Learner’s Dictionary of Current English (OALDCE). It was edited by A. S. Hornby, with the assistance of A. P. Cowie. This reset and thoroughly revised edition benefited from research on the English language undertaken by the Survey of English Usage at University College London. The verb patterns were reordered, and a handy reference list was provided on the inside back cover; though there is a hint in the entries that the editors were beginning to realise that the verb pattern codes were perhaps not the most understandable and useful way to present grammatical information. The entry for remind begins as follows:

remind vt [VP6A, 11, 14, 17, 20, 21] ~ sb (to do sth/that …); ~ sb of sth/sb, cause (sb) to remember (to do sth. etc); cause (sb) to think (of sth) …

Here, the patterns are spelled out as well as being coded. A large number of examples follows, illustrating both syntactic and lexical structures. Additionally, many of the pictorial illustrations are replaced by black-and-white photographs, phonetic transcriptions are revised, and US pronunciation is also indicated where relevant.

In the by now burgeoning and lucrative industry of teaching English as a foreign language, the OALDCE had set the standard, and had also been the only contender in the advanced MLD field for some thirty years. That was about to change.

The Challengers

In 1978, the publishing house Longman entered the MLD market with the Longman Dictionary of Contemporary English (LDOCE), under the editorship of Paul Procter. In 1972, Longman had published A Grammar of Contemporary English, and the title of LDOCE, as it came to be called, was meant to resonate with that of the grammar and to form a companion volume to it. LDOCE exhibited a number of important innovations, not least of which was the employment of a limited defining vocabulary, so that all definitions and examples were couched in the approximately 2,000 words that constituted this vocabulary, thus attempting to fulfil lexicographers’ aspirations to describe the meaning of words using simpler words than those being described. Moreover, every definition and example was computer-checked to ensure that only words from the defining vocabulary were being used. If it was found necessary to use a word not in the defining vocabulary, then it was spelled in small capital letters, to act as a cross-reference. For example, the definition for renal reads: ‘of, near, or concerning the parts of the body (kidneys) that separate waste matter from the blood and send it out of the body in a liquid form’. This definition thus avoids using the word urine but needs to include kidney, neither of which is in the defining vocabulary; kidney is, therefore, in small capitals.
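The kind of computer check described here can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Longman's actual procedure: the function names are hypothetical, and `TOY_VOCABULARY` is an invented list containing just enough surface forms to cover the renal definition, not the real 2,000-word defining vocabulary (a realistic check would also need to relate inflected forms such as kidneys to their base forms). Upper case is used as a plain-text stand-in for small capitals.

```python
import re

RENAL_DEFINITION = (
    "of, near, or concerning the parts of the body (kidneys) that separate "
    "waste matter from the blood and send it out of the body in a liquid form"
)

# Invented toy list of permitted surface forms; NOT the LDOCE vocabulary.
TOY_VOCABULARY = {
    "of", "near", "or", "concerning", "the", "parts", "body", "that",
    "separate", "waste", "matter", "from", "blood", "and", "send", "it",
    "out", "in", "a", "liquid", "form",
}

WORD = r"[A-Za-z]+(?:-[A-Za-z]+)*"

def out_of_vocabulary(definition, vocabulary):
    """List the words in a definition that are not in the vocabulary."""
    words = re.findall(WORD, definition.lower())
    return sorted({word for word in words if word not in vocabulary})

def mark_cross_references(definition, vocabulary):
    """Upper-case every out-of-vocabulary word so that it reads as a
    cross-reference, leaving permitted words untouched."""
    def replace(match):
        word = match.group(0)
        return word if word.lower() in vocabulary else word.upper()
    return re.sub(WORD, replace, definition)
```

Applied to the renal definition with the toy list, `out_of_vocabulary` reports only `kidneys`, and `mark_cross_references` returns the same definition with that one word in capitals.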

In presenting grammatical information for verbs, adjectives, and nouns, LDOCE used a coding system that was rather more transparent than the verb pattern codes of OALDCE. The LDOCE codes consisted of a letter followed by a number. The letter ‘I’ stood for intransitive, ‘T’ for transitive, and ‘D’ for ditransitive (with two objects); the number ‘Ø’ stood for not followed by anything, and ‘1’ for followed by one or more nouns or pronouns. After that the numbers became arbitrary: ‘3’ indicated followed by an infinitive with to, ‘4’ followed by the -ing form, and so on. The codes were displayed in a table on the inside back cover of the dictionary. The entry for remind begins as follows:

remind v [T1 (of); D5; V3] 1 (of a person) to tell or cause (someone) to remember (a fact, or to do something) … 2 (of a thing or event) to make (someone) remember (a fact, or to do something) …
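The letter-plus-number scheme can be made concrete with a small Python sketch. It is a hypothetical decoder, not part of any Longman software, and it covers only the letters and numbers glossed in the account above; any other letter or number (such as the ‘V’ and ‘5’ that occur in the entry for remind) is reported as unexplained rather than guessed at.

```python
# Only the values glossed in the surrounding text are included.
LETTERS = {
    "I": "intransitive",
    "T": "transitive",
    "D": "ditransitive (two objects)",
}
NUMBERS = {
    "Ø": "not followed by anything",
    "1": "followed by one or more nouns or pronouns",
    "3": "followed by an infinitive with to",
    "4": "followed by the -ing form",
}

def decode_code(code):
    """Split a code such as 'T1' into its letter and number and gloss each."""
    if len(code) < 2:
        raise ValueError(f"expected a letter followed by a number: {code!r}")
    letter, number = code[0], code[1:]
    letter_gloss = LETTERS.get(letter, "letter not explained in the text")
    number_gloss = NUMBERS.get(number, "number not explained in the text")
    return f"{letter_gloss}; {number_gloss}"
```

For instance, `decode_code("T1")` yields "transitive; followed by one or more nouns or pronouns", which shows both the relative transparency of the letters and the need to look the numbers up in a table.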

Each sense of remind is illustrated by multiple examples. Each pictorial illustration has several words associated with it; some depict scenes, such as a kitchen or a building site, while others show examples of a category, such as plants or ships.

LDOCE also began the trend, which continues to this day, of taking measures to make information in the dictionary as accessible as possible to users and to facilitate finding the information looked for as effortlessly as possible. In the first edition of LDOCE such measures included: giving headword status to phrasal and prepositional verbs, such as look up, look after; making compound words headwords, such as shortsighted, short story, short-term; and putting idioms and other fixed expressions in bold print within an entry, such as in short, little/nothing short of, make short work of, short and sweet. Users could expect to search less for items nested within an entry and find more information in bold, either as headwords or standing out in an entry.

A second edition of LDOCE was published in 1987 under the editorship of Della Summers; it took users’ reactions to the first edition into account. The 2,000-word defining vocabulary, which users had welcomed as an innovation, was retained. However, the ‘impenetrable’ grammatical coding was radically simplified, and the range and clarity of examples came under review. The entry for remind now reads:

remind v [T (of)] to tell or cause (someone) to remember (a fact, or to do something): (examples …). [+obj +to-v] Remind me to write to Mother. [+obj +that] She reminded me that I hadn’t written to Mother …

remind sbdy. of sbdy./sthg. phr v [T] to appear to (someone) to be similar to: This hotel reminds me of the one we stayed in last year.

A few transparent letter symbols, like ‘T’ and ‘I’, ‘U’ and ‘C’, are retained, but in general there is more explicit spelling out of grammatical patterning.

In the same year another challenger entered the lists: the Collins COBUILD English Language Dictionary, with John Sinclair as the chief editor. In many ways, the COBUILD dictionary represented a radical departure from previous practice. It was the first dictionary to be based entirely on a computerised corpus of texts. The corpus informed the choice of words to include in the headword list, the order of the meanings within an entry, and the examples included, which were all taken or adapted from the corpus. The strapline for the dictionary was ‘helping learners with real English’.

The use of a computer corpus was not the only innovation. All words were defined using full-sentence definitions, so that the explanation of meaning reads as if it were a teacher talking in the classroom. Here is the definition of the first sense of remind:

If someone or something reminds you of a fact or event that you are already aware of, they do or say something which makes you think about that fact or event.

The meaning of nouns is usually explained, as in the following definition for mustard:

Mustard is a yellow or brown paste which tastes hot and spicy. You often have a small amount of mustard with meat such as beef or ham.

The definitions contain some indication of typical grammatical and lexical patterning. However, the detailed information about the grammatical operation of words is contained in an ‘extra column’ to the right of the entry. The first sense of remind is marked as:

v+o: usu+of/about/report-cl

The abbreviations used are explained at the appropriate place in the dictionary; report-cl, for example, is listed after report and reportage. The extra column also contains lexical information relating to synonymy, antonymy, and hyponymy: repudiate is marked as the synonym of reject and the antonym of accept; reprimand has the hypernym scold and the synonym admonish.

The headword list in the COBUILD dictionary was based on the principle of one entry per spelling. This means that homonyms, as well as items belonging to more than one word class, are dealt with in a single entry. Under light, for example, both the ‘not dark’ and the ‘not heavy’ meanings are entered, as well as the noun, verb, and adjective uses of each of these. Although informed by frequency of occurrence in the corpus, entries do have a rational structure, and each sense (thirty-one in the case of light) starts on a new line. Nevertheless, this arrangement does not make it easy to navigate some of the longer entries, and the practice was abandoned in later editions of the dictionary.

Some of COBUILD’s innovations were influential in the subsequent development of MLDs, especially the use of a computer corpus as the source of lexical data and the use of full-sentence definitions, though no other dictionary has used full-sentence definitions for all words and senses.

In 1989, Oxford published the fourth edition of the OALDCE, although only the words Oxford Advanced Learner’s Dictionary appeared on the front cover. It was edited by A. P. Cowie, Hornby having died in 1978. The verb pattern scheme was thoroughly revised, with grammatical codes now more transparent, consisting of a capital letter (‘I’ for intransitive, ‘T’ for transitive, ‘C’ for complex-transitive, etc.) followed by one or two lower-case letters (‘n’ for noun, ‘pr’ for prepositional phrase, ‘t’ for to-infinitive, and so on). Close attention was also paid to the treatment of idioms and phrasal verbs, with a revised entry structure facilitating access to such items. Several thousand new words were added, examples and pictorial illustrations were renewed, and ‘notes on usage’ were adopted to explain the differences between near-synonyms or semantically related words. Change, for example, has a usage note distinguishing it from alter, modify, and vary. Otherwise, the fourth edition of OALD, as it would come to be known, was following in the Hornby tradition.

The Year of the Dictionaries

The year 1995 became known in lexicographical circles as ‘The Year of the Dictionaries’. LDOCE appeared in a third edition; the second edition of COBUILD came out, as did the fifth edition of OALD; and a new MLD, the Cambridge International Dictionary of English (CIDE), appeared. Some wondered whether MLDs had reached their apogee.

LDOCE3, edited by Della Summers, is corpus-based and encompasses both British and American English, with a particular focus on spoken English. The corpus informs the coverage (i.e. which words to include in the dictionary). Frequency information from the corpus is used to determine the order of homographs as well as the order of meanings within an entry, and explicit frequency information is given for words that are among the 3,000 most frequent items in both spoken and written English. For example, remind is marked as S1, W2, indicating that it is among the 1,000 most frequent words in spoken English and among the 1,000–2,000 most frequent words in written English; while report (verb) is marked as S3, W1, indicating that it is among the 2,000–3,000 most frequent words in spoken English, but among the 1,000 most frequent in written English.
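The banding scheme just described can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the ranks are invented rather than taken from LDOCE3, and the sketch assumes that the spoken and written markers are assigned independently of each other.

```python
def band_label(prefix, rank):
    """Return a marker such as 'S1' for a frequency rank, or None beyond 3,000."""
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    if rank > 3000:
        return None  # outside the 3,000 most frequent items: no marker
    band = (rank - 1) // 1000 + 1  # ranks 1-1000 -> 1, 1001-2000 -> 2, 2001-3000 -> 3
    return f"{prefix}{band}"


def frequency_markers(spoken_rank, written_rank):
    """Combine the spoken (S) and written (W) markers for one word."""
    labels = [band_label("S", spoken_rank), band_label("W", written_rank)]
    return [label for label in labels if label is not None]


# Invented ranks, chosen so that the result matches the marking quoted for 'remind'.
print(frequency_markers(850, 1400))  # ['S1', 'W2']
```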

Grammatical coding has all but disappeared from LDOCE3. The symbols ‘I’, ‘T’, ‘C’, and ‘U’ are retained, but everything else is spelled out, and each structural pattern is illustrated by one or more examples. More extensive information on collocations and idioms is included: under populate is entered ‘densely / heavily / highly / thickly populated’, as well as ‘thinly / sparsely populated’, all given in bold type. Accessibility – finding the meaning of a word that you are looking for – is aided by the inclusion of ‘signposts’ in longer entries; each meaning starts on a new line and is preceded by the signpost, so that a user can glance down the column and identify the desired meaning. The entry for remember has the signposts the past, information/facts, to do/get something, keep sth in mind, honour the dead, give sb a present. Even longer entries are prefaced by a ‘menu’ of headings, under each of which a number of meanings is grouped: the entry for put has a menu of eleven headings, encompassing some twenty-seven meanings, beginning with the headings move sth, change sb’s situation, say/express, ask for an answer/decision. While pictorial illustrations are retained, the dictionary also includes a number of full-page colour plates, illustrating scenes like the ‘kitchen’, and also nouns such as ‘physical contact’ and prepositions of ‘position and direction’.

LDOCE3 was also issued with a companion CD-ROM, which contained the dictionary, together with audio pronunciations and a raft of search facilities (on CD-ROM dictionaries, see Hargraves this volume).

COBUILD2, based on an expanded corpus of 200 million words, also contains frequency information, though on a different scale from that in LDOCE3 and undifferentiated for spoken and written language. There are five bands, indicated by black diamonds, of unequal size: the band indicating most frequent words, marked with five black diamonds, contains 700 words; the next band 1,200 words; the third band 1,500 words; the fourth band 3,200 words; and the fifth band (just one black diamond) 8,100 words. For example, eye is in the most frequent band, eyebrow has two black diamonds, and eyelash only one. The editors had realised that some dictionary users found the policy of one entry per spelling rather confusing; in this edition, some of the longer entries are supplied with ‘superheadwords’, to divide the entry into semantically coherent sections. The entry for fire, for example, has been divided into three, with the superheadwords ‘fire 1 burning, heat or enthusiasm’, ‘fire 2 shooting or attacking’, and ‘fire 3 dismissal’. Full-sentence definitions have been retained for all meanings, and the examples have been newly chosen from the expanded corpus. The extra column contains the frequency bands, as well as the grammatical information; hyponymy is no longer indicated, although synonyms and antonyms are.
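The five unequal bands can be modelled as cumulative cut-offs on a frequency-ranked word list. The following Python sketch uses the band sizes given above; the assumption that the bands are consecutive slices of a single ranked list, and the function name, are mine rather than COBUILD's.

```python
from itertools import accumulate

# Band sizes from most frequent (five diamonds) to least frequent (one diamond).
BAND_SIZES = [700, 1200, 1500, 3200, 8100]
# Cumulative upper rank of each band: 700, 1900, 3400, 6600, 14700.
BAND_LIMITS = list(accumulate(BAND_SIZES))


def diamonds(rank):
    """Return the number of diamonds for a frequency rank (0 if outside all bands)."""
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    for index, limit in enumerate(BAND_LIMITS):
        if rank <= limit:
            return 5 - index
    return 0


# Hypothetical ranks: a very common word, a mid-range word, and a rarer one.
for rank in (350, 1900, 12000):
    print(rank, "◆" * diamonds(rank))
```

On this reading, the five bands together cover the 14,700 most frequent words.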

OALD5, edited by Jonathan Crowther, showed that it had learned from its competitors. It used the 100-million-word British National Corpus to inform lexicographical decisions, including adding to the vocabulary coverage, determining syntactic patterning, and providing appropriate examples. It also, for the first time, used a limited defining vocabulary – of 3,500 words – for the description of meanings. Additionally, it sought to enhance its attractiveness to the advanced learner by including a number of ‘study pages’ on grammatical topics, as well as full-colour maps and pages of cultural information.

CIDE, from Cambridge University Press, edited by Paul Procter, was the fourth MLD to enter the market, and it innovated in a number of interesting ways. As had by now become normal practice in MLD lexicography, CIDE was based on a corpus, the Cambridge Language Survey (CLS). Headwords are based not on spellings or lexemes, but on meanings or senses; words may have multiple entries, each provided with a ‘guide word’; low, for example, is entered seven times, as low distance, low small in amount, low not important, low not honest, low sound, low sad, and low cow noise. This has the effect of increasing the number of headwords, but decreasing the length of entries, and arguably making it easier for the user to find the meaning they are searching for. Definitions are written using a limited defining vocabulary of under 2,000 words.

Close attention is paid in CIDE to phraseology: appropriate prepositions, collocations, and idioms are marked in bold within entries, and the dictionary is provided with a comprehensive phrase index, in which a phrase is entered under all its main words, with page and line number indicating where it is entered in the dictionary. Grammatical information is always linked to an example, which is given first; so, the entry for remind includes the following:

remind obj … v [T] to make (someone) aware of something they have forgotten or might have forgotten • Could you remind Paul about dinner on Saturday? • Please remind me to post this letter [+ obj + to infinitive] • I rang Jill and reminded her (that) the conference had been cancelled [+ obj + (that) clause]

The dictionary also contains a number of ‘language portraits’, many of which deal with grammatical topics.

As well as the CLS, CIDE used a corpus of learner English, from which a number of lists of ‘false friends’ (words that are spelled similarly in two languages but which have different meanings) were compiled, covering the major languages of Europe together with Japanese, Korean, and Thai. These lists were not continued in the second edition of the dictionary, published in 2003 and renamed the Cambridge Advanced Learner’s Dictionary (CALD); this edition, however, included a large amount of additional material to help the learner with using English.

Further Competitors

A fifth MLD came on the market in 2002: the Macmillan English Dictionary for Advanced Learners (MEDAL), edited by Michael Rundell. MEDAL incorporated many of the innovations from previous dictionaries: it is corpus-based; it uses menus for longer entries; it explains meaning using a 2,500-word defining vocabulary; it has the occasional full-sentence definition; and it gives frequency information for the 7,500 most frequent words. The headwords for the entries of these words are printed in red, and they are divided into three bands marked by red stars – three for the most frequent group. These 7,500 words are also those that are deemed a learner’s basic productive vocabulary, and so they have a more detailed treatment than the other words entered in the dictionary. This detail includes: grammatical patterning, spelled out and not in code; lexical patterning, with typical collocations; and copious examples from the corpus. By this time, computer corpora had become so large that the traditional lexicographer’s tool, the concordance program, produced an unmanageable amount of data, especially for more frequent words; MEDAL used a software tool called Sketch Engine, developed by Adam Kilgarriff and Pavel Rychlý, the output from which was a one-page lexical profile or ‘word sketch’. MEDAL’s information on syntactic patterns and collocational behaviour is based on these sketches.

While MLDs generally included information on US English (pronunciation and vocabulary) as well as British English, MEDAL had separate editions for British and American English. A second edition of MEDAL was published in 2007, but since 2012 it has been available only on the Internet, where it can be regularly updated (on electronic dictionaries, see Hargraves this volume).

The first specifically American English MLD appeared in 2008: the Merriam-Webster’s Advanced Learner’s English Dictionary (MWALED), edited by Stephen J. Perrault. It identifies 3,000 core vocabulary words, which are underlined in blue, and for which each entry contains an extensive range of examples, which are also printed in blue. The wealth of examples – which accompany words outside of the core vocabulary, too – is probably MWALED’s signature characteristic, and they also give some indication of typical lexical patterning. Definitions are said to be written in ‘simple and clear language’, but not with a limited defining vocabulary.

Electronic Variants

From the mid-1990s, MLDs were also made available on CD-ROM, usually as an accompaniment to the print edition, though at additional cost. The CD-ROM typically contained the full text of the print dictionary, along with additional material in the form of audio pronunciations of each word, extra pictorial illustrations, possibly extra examples, and exercises to test a learner’s vocabulary knowledge or to foster vocabulary-building. The CD-ROM version also offered search facilities of varying kinds: all allowed the headword list to be searched, along with any items nested within entries, such as derivatives and idioms, and some allowed the complete text of the dictionary to be searched, including definitions and various kinds of labels. One or two MLDs on CD-ROM, notably CALD and LDOCE, offered the facility of searching by semantic field, so that, for example, all the words in the dictionary relating to music or to fashion could be listed.

As the Internet became faster with the introduction of broadband, and more reliable during the early 2000s, dictionary publishers began to migrate their products to the World Wide Web. Initially the Internet versions of dictionaries, including MLDs, merely replicated the print version, but over the intervening years MLDs in particular have begun to exploit the potential of the electronic medium. All MLDs now have an Internet presence, MEDAL exclusively so. All are free to access, but the user has to contend with advertisements appearing on the pages displayed. Some online dictionaries offer a subscription service that is advert-free and gives access to additional material.

Online MLDs seek to enhance what they offer to users in various ways. COBUILD is integrated with the general-purpose Collins English Dictionary and it offers video pronunciations. OALD has links to the Oxford Collocations Dictionary, as well as the facility to access etymology (word origins) and to access extra examples; it has also arranged its vocabulary into around twenty-four ‘topics’, or semantic fields, such as ‘Animals’, ‘Body and appearance’, ‘Business’, ‘Travel and tourism’, ‘War and conflict’, and ‘Work’. MEDAL provides a ‘word forms’ box, listing the possible inflectional variants of a headword; it also indicates synonyms and related words with a link to an expanded ‘thesaurus’ entry of semantically related terms.

LDOCE online has the widest range of additional features. Entries are introduced with the ‘word family’ of the headword (i.e. words that are morphologically linked); remind, for example, has ‘mind’, ‘minder’, ‘reminder’, ‘mindless’, ‘minded’, ‘mindful’, ‘mindlessly’. A ‘verb table’ is provided to display the inflectional variants of verbs. It is not just the headwords that have an audio pronunciation, but the examples as well, so that a user can hear words pronounced in context. Some words are provided with collocation boxes, to display typical collocations of the headword, and there is a thesaurus function. Additionally, the vocabulary has been arranged under around 200 topic headings, including ‘Advertising and marketing’, ‘Geology’, ‘Nutrition’, ‘Women’, and ‘Youth’. Lists of extra examples from the corpus are given, though these do not have audio pronunciations. Finally, where relevant, entries are included from the Longman Business Dictionary – for example, invest and startup.

Conclusion

Apart from MEDAL, MLDs have continued to be published in successive print editions. At the time of writing, OALD is in its ninth edition (2015), LDOCE in its sixth (2014), COBUILD in its ninth (2018), CALD in its fourth (2013), and MWALED in its second (2016). However, many users will be accessing their favoured MLD in its Internet format, and it is here that future significant developments can be expected. Without the space constraints of the print format, and with the flexibility of the electronic medium, there is much potential for further enhancement. However, MLD publishers will need to bear in mind that users may well be accessing their dictionaries not on the larger screens of desktop computers or laptops, but on the smaller screens of tablets and smartphones, which raises the challenge of how to present information, sometimes for very long entries, for ready accessibility on the small screen.

Learners’ dictionaries have come a long way since they were first conceived in the 1930s; they have evolved to satisfy users’ perceived needs more adequately, both in the range of information that they contain and in the ways in which it is made accessible. If anything, the user is presented with an overwhelming plethora of dictionary information from which they must select the particular item that they have need of for a specific lookup. It may be that adaptive technology will in the future be able to ascertain a user’s specific need and select and present only the information required to satisfy it.

Chapter 17 Electronic Dictionaries

Three features characterise the broad class of products called electronic dictionaries: the dictionary data is stored in digital format; the user interacts with the data on a screen; and use of the dictionary requires electric current. These features evolved in the digital technology revolution that affected every form of text publication starting in the mid-twentieth century and continuing even today.

Digital technology has had a profound and generally beneficial effect on dictionaries and other language reference tools. Electronic dictionaries continue to evolve and it seems likely that for people born in the current century and beyond, ‘dictionary’ may cease to have its primary denotation as a thick book filled with a list of words in alphabetical order, along with their definitions. A few databases that are legitimately included under the umbrella of ‘electronic dictionary’ as described above are intended for machine use, and only indirectly, if at all, for human use. These include such things as dictionaries of data structures, computing subroutines, or correct spellings that are accessed primarily by programs, not by people. Such dictionaries, while sharing many technological features with human-use dictionaries, are largely ignored in this chapter, which focuses on dictionaries in the traditional sense: repositories of words and their definitions for use as language reference tools.

Digitisation of Databases

A distinguishing aspect of all electronic dictionaries is that their users are considerably closer to the publisher-owned data that constitutes the dictionary – in the sense that the users have more complete access to that information, sometimes directly interacting with it. This model is in contrast to the older model, in which dictionary users were presented with an ink-and-paper representation (and often, a considerable reduction) of the dictionary data. The foundational development that enabled this more intimate and thorough access to dictionary data began with the digitisation of dictionary databases in the 1960s.

The evolution of typography away from ‘hot metal’ (that is, type and other printing elements produced by a metallic casting machine) and toward phototypesetting, in which templates for printing are produced by the exposure of typographic shapes on photosensitive paper, began after the mid-twentieth century. This development was accompanied by a proliferation of magnetic and electronic media for the storage of data. These two major changes to methods for producing words on paper evolved hand-in-hand, by necessity: the input for phototypesetting required a form of encoding best achieved by some kind of computer. For a very brief period, some phototypesetting machines accepted punched cards or paper tape as input, but very soon all inputs for typeset material were digitised and stored on magnetic or electronic media.

Dictionary data, which by its nature is highly regular and heavily formatted, was an ideal candidate for digitisation. The large variety of typographic elements that occur in a single dictionary entry required painstaking effort to produce, maintain, and change in metal typography, and the possibility of converting such complexities to simple codes and commands was a great benefit for publishers. They began the process of moving their paper-based data to digital storage in the 1960s. Efforts were at first largely home-grown because there was very little in the way of industry-wide standards for data storage and formatting at that time. The first versions of ASCII (American Standard Code for Information Interchange) had come into use, enabling the standard encoding of alphanumeric characters but little else, and publishers were left to their own devices in figuring out how to encode the intricacies of their data and translate it reliably to a printed page. In the 1970s and 80s, work was under way to standardise a platform-independent markup language, which eventually evolved into SGML (Standard Generalised Markup Language) and its many contemporary descendants, of which XML (Extensible Markup Language) is particularly prominent today. With this, dictionary publishers (and for that matter, publishers of any sort of text) had all that was required to translate encoded and marked-up text into user-friendly, readable print publications. These standards also enabled the rise of commercial dictionary software systems that have now largely replaced the earlier home-grown ones.
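To make the idea of translating marked-up text into a readable entry concrete, here is a minimal Python sketch using the standard library's XML parser. The tag names (`entry`, `hw`, `pos`, `def`, `ex`) and the style table are invented for illustration and do not reproduce any publisher's actual schema; asterisks and underscores merely stand in for bold and italic type.

```python
import xml.etree.ElementTree as ET

# A hypothetical marked-up entry; the tag names are assumptions for this sketch.
ENTRY_XML = """
<entry>
  <hw>mustard</hw>
  <pos>noun</pos>
  <def>a yellow or brown paste which tastes hot and spicy</def>
  <ex>You often have a small amount of mustard with meat.</ex>
</entry>
"""

# One formatting rule per element type, standing in for typographic styling.
STYLES = {"hw": "**{}**", "pos": "_{}_", "def": "{}", "ex": "'{}'"}


def render(entry_xml):
    """Turn the structural markup of one entry into a single display string."""
    entry = ET.fromstring(entry_xml)
    parts = [STYLES[child.tag].format(child.text.strip()) for child in entry]
    return " ".join(parts)


print(render(ENTRY_XML))
```

Because the markup records what each element is rather than how it looks, the same stored entry could be given a different appearance simply by swapping the style table.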

This migration to electronic encoding and storage made possible a facility that in itself has revolutionised text storage and language reference data in particular: searchability. The ability to find all occurrences of a single word, tag, or other text string in a large database with a single command issued from a keyboard immediately supplanted the older, less reliable, and extremely labour-intensive process of indexing. The labour involved in manually indexing a dictionary database – creating, in effect, a concordance of the dictionary – was so onerous that it was never actually undertaken, despite the benefits it would have provided. For example, the ability to generate a simple list of every definition that included the word fish would have provided a basis for spotting possible important omissions and for checking uniformity of treatment; a collection of all the words that use terms such as container, vessel, or receptacle as a hypernym would have enabled the development of much more consistent definitions.
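The two searches mentioned in this paragraph can be sketched in Python as follows. The miniature dictionary is invented for the purpose, and the second function simply checks whether any of the given terms occurs in the definition, which is a crude stand-in for identifying the hypernym properly.

```python
import re

# Invented toy data: headword -> definition text.
DEFINITIONS = {
    "cod": "a large sea fish with white flesh",
    "trout": "a freshwater fish valued as food",
    "net": "a mesh of cord used for fishing",
    "jug": "a container with a handle, used for pouring liquids",
    "urn": "a vessel in which ashes are kept",
}


def headwords_defined_with(word, definitions):
    """List headwords whose definition contains `word` as a whole word."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return sorted(hw for hw, text in definitions.items() if pattern.search(text))


def headwords_defined_with_any(words, definitions):
    """List headwords whose definition contains at least one of `words`."""
    found = set()
    for word in words:
        found.update(headwords_defined_with(word, definitions))
    return sorted(found)


print(headwords_defined_with("fish", DEFINITIONS))
print(headwords_defined_with_any(["container", "vessel", "receptacle"], DEFINITIONS))
```

Matching on word boundaries keeps "fishing" out of the results for "fish", which is the kind of distinction a lexicographer checking consistency would probably want.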

Perhaps the most revolutionary aspect of digitally stored and accessed text was the ease with which it could be converted to hypertext. Hypertext, to quote the definition from the CD-ROM version of the Random House Webster’s Unabridged Dictionary (1999), is ‘a method of storing data through a computer program that allows a user to create and link fields of information at will and to retrieve the data nonsequentially’. Of the wide variety of print publications that have existed since the invention of the printing press, it is hard to think of one more ideally suited to take advantage of hypertext than dictionaries, because so many parts of dictionary entries are interrelated by nature. Interconnecting more of them by means of hypertext would be a natural way to make all the data in the dictionary more immediately and easily accessible to the user.

Ironically, not a single element in the display of the definition of hypertext on the CD-ROM noted above is clickable, in the sense of being linked to additional definitions or information. This points to a challenge all owners of text data faced when they began to convert publications that had always existed only on paper to digital format. Few if any publishers had experience with the capabilities of the new medium, and so the earliest electronic dictionaries were in essence a transfer of the print-dictionary entry to a screen. There, each entry enjoyed the benefit of being slightly less cluttered, searchable, and perhaps more profusely illustrated and exemplified than was ever practical in a printed dictionary, but in no other way different.

Electronic dictionaries today, having now benefited from a development period spanning decades and existing in a highly competitive environment, increasingly take full advantage of the many user-friendly features that hypertext, multimedia, advanced markup languages, and web-browser extensions make possible to enhance the presentation of dictionary data to users. What is presented to users in an electronic dictionary now goes far beyond what was presentable on paper: electronic dictionary users may, for example, listen to pronunciations of words, find numerous images associated with the word they are looking up, and see multiple attributed examples taken from real-language usage of the words they are interested in.

The Development of Corpora

The advancing technology that enabled the digital capture, storage, and organisation of dictionary data also gave rise to a different but closely related development in text capture: the development of searchable text corpora. Electronic dictionaries and text corpora exist independently of each other, but in practice their use and development are so closely interconnected that the user cannot examine one without frequently coming into contact with the other. Therefore, a brief overview of corpus history is appropriate here.

Scholars, just like publishers, began taking advantage of the ability to capture text digitally and electronically in the 1960s, but with a different object: to create databases for researching language. Starting with the Brown Corpus (1967, about a million words), linguists were able to use computer algorithms to query large text databases in order to discover patterns in and gather statistics about language that would have been unobtainable before. Prior to the digital age, the most sophisticated corpus-like tool available to the researcher was the concordance: an extremely labour-intensive index of, for example, all the mentions of all the words in a body of work, such as the Bible or the oeuvre of Shakespeare. With the development of text corpora created by digital capture and the querying software that accompanies these corpora, a concordance of a particular word or phrase can be called up with a single search command.
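A concordance of the kind described here, often displayed in keyword-in-context form, takes only a few lines of Python. This is a minimal sketch rather than a description of any real corpus tool: the function name, the crude tokenisation, and the two-sentence stand-in for a corpus are all assumptions.

```python
import re


def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for each occurrence of keyword."""
    tokens = re.findall(r"\w+", text.lower())  # crude tokenisation: runs of word characters
    target = keyword.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


# A two-sentence stand-in for a corpus.
SAMPLE = ("Could you remind Paul about dinner on Saturday? "
          "Please remind me to post this letter.")

for left, word, right in kwic(SAMPLE, "remind", width=2):
    print(f"{left:>20} [{word}] {right}")
```

Real corpus software adds sorting, lemmatisation, and statistics, but the single search command that replaced years of manual indexing is essentially this loop.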

Dictionary publishers very quickly saw the great benefit of having such a large body of searchable text available for their use. It enabled them to discover features of language not immediately evident to the intuitions of lexicographers, such as the variety and relative frequency of the different senses of polysemous words, the distribution of a word’s most frequent or salient collocations, or a statistical picture of the frequencies of a verb’s inflections. Accordingly, dictionary publishers began to develop their own corpora, and in some cases cooperated to produce corpora for common use. The Collins COBUILD Dictionary (1987) was the first English dictionary compiled with the aid of a corpus and it is still regarded as groundbreaking today, for that reason and others. Since that time, all reputable dictionary publishers have either developed their own corpora, or have subscribed to corpora through services such as Sketch Engine (www.sketchengine.co.uk). These corpora are accessible by lexicographers for the preparation of dictionary entries, and today the necessity of consulting corpus data in compiling dictionaries is widely accepted. Indeed, a dictionary of a modern language, or even an ancient language, would be considered a very poor product if it were compiled without the aid of a corpus today.

Historically, one of the chief constraints on the publisher of a print dictionary was space. Dictionaries had necessarily to present the greatest possible amount of information in the smallest possible space, which often meant that the optional elements – primarily, illustrative examples of word use – got short shrift, and could be eliminated unless they were deemed absolutely essential. Electronic dictionaries, with liberal and in some cases unlimited amounts of nominal space, eliminate this constraint, and so dictionary publishers are free to illustrate a word sense with as many examples as seems practical and helpful. Corpora provide a trove of such examples that give the lexicographer a wide selection of choices from natural language. The inclusion of examples taken from real usage stored in corpora is now often a selling point for dictionaries.

The Handheld Electronic Dictionary: A Limited Technology

A logical development in the concurrent evolution of digital media and the miniaturisation of electronic components was the proliferation of novel personal devices that integrated these advances in technology. The late twentieth century was the backdrop for innovative devices that were touted, when introduced, as rendering obsolete the older products that they were intended to improve upon. Examples include the stand-alone word processor, intended to displace the typewriter, and the personal digital assistant, designed to supplant the personal calendar or diary. Both of these devices, only a short time later, are now more or less obsolete. The same can be said of the handheld electronic dictionary. This device, comparable in size to today’s smart phones, incorporated an entire dictionary database stored digitally, a miniature keyboard from which the user typed in look-ups, and a small display screen, typically using LCD (liquid crystal display) technology. These devices were popular as carriers of monolingual, bilingual, and multilingual dictionaries and probably had their greatest market penetration in Asia.

While these devices, being both portable and searchable, efficiently addressed two shortcomings of cumbersome paper dictionaries, they fell far short of the great facility that online dictionary websites (discussed below) provide. The handheld dictionaries were effectively as constrained by space restrictions as their book counterparts because storage technology was not very advanced at the time that handhelds were developed. Additionally, handheld dictionaries provided no effective means of updating, meaning that their content was likely to become dated as quickly as that in paper dictionaries.

After peaking in the 1990s, handheld dictionaries have disappeared from many markets for want of demand. There is still a limited market for them, mainly in Asia, where they are aimed at a diverse student population that is not permitted to use smart phones in classrooms but may use these devices.

The Dictionary on CD-ROM

In the 1990s, when the price of consumer-level digital technology declined sharply and it became increasingly common for individuals and families to own personal computers, publishers of paper dictionaries felt the need to enhance the marketability of their product with an electronic version. This often took the form of a CD-ROM, tucked into a transparent pocket between the final page and the back cover of the book. While the data in digital format was, in principle, available for distribution before this time, it was not practical to make it available to end-users because of its volume: the maximum capacity of the old-fashioned floppy disk (1.44 MB at most) would have made it necessary to supply a small box of floppies with the book. As CD-ROMs (maximum capacity of about 700 MB) became a standard storage medium and a CD-ROM reader became a standard feature on personal computers, it was a natural development to include the data in searchable format for the end user.

For dictionary users who were already computer-literate, or at least aspiring to be, the inclusion of a CD-ROM dictionary with the book was indeed a value proposition, primarily for the same reason that digital data was much more valuable to publishers: searchability. With few products on the market to serve as models, dictionary publishers had a clean slate when they began to market their content on CD-ROM, and so the presentation of data in CD-ROM dictionaries was varied and experimental from the beginning, and continues in kind today. Some publishers, for reasons explained below, were parsimonious with access to their data and offered little more than simple look-ups. Others, perhaps inspired by internal use of the data, built search features that they found to be useful in-house into the CD-ROM interface. The 2003 CD-ROM version of the Merriam-Webster 11th Collegiate Dictionary, for example, includes the following search options:

Entry word is …

Defining text contains …

Rhymes with …

Forms a crossword of …

Is a cryptogram of …

Is a jumble of …

Homophones are …

Etymology includes …

Date is …

Verbal illustration contains …

Author quoted is …

Function label is …

Synonymy paragraph contains …

Usage paragraph contains …

Usage note contains …

In addition to these, advanced searches are offered in which the searches above can be combined with Boolean operators. Wildcard characters (* for multiple characters, ? for a single character) are also permitted. To name only a couple of useful examples, such a facility would enable the user to find all nouns ending in -ture or words beginning with al- whose etymology includes ‘Arabic’.
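The two example queries can be approximated in Python with the standard library's `fnmatch` module, which uses the same `*` and `?` wildcard conventions. The sketch below makes no claim about how the CD-ROM itself is implemented: the field names, the helper functions, and the six simplified entries are assumptions made for illustration, and the criteria are combined with an implicit Boolean AND.

```python
from fnmatch import fnmatchcase

# Simplified toy entries; the field names are assumptions for this sketch.
ENTRIES = [
    {"headword": "alcohol", "function": "noun", "etymology": "Arabic"},
    {"headword": "algebra", "function": "noun", "etymology": "Arabic"},
    {"headword": "altitude", "function": "noun", "etymology": "Latin"},
    {"headword": "nature", "function": "noun", "etymology": "Latin"},
    {"headword": "culture", "function": "noun", "etymology": "Latin"},
    {"headword": "mature", "function": "adjective", "etymology": "Latin"},
]


def matches(entry, headword="*", function=None, etymology_includes=None):
    """True if the entry satisfies every criterion supplied (Boolean AND)."""
    if not fnmatchcase(entry["headword"], headword):
        return False
    if function is not None and entry["function"] != function:
        return False
    if etymology_includes is not None and etymology_includes not in entry["etymology"]:
        return False
    return True


def search(entries, **criteria):
    """Return the headwords of all entries matching the criteria."""
    return [entry["headword"] for entry in entries if matches(entry, **criteria)]


print(search(ENTRIES, headword="*ture", function="noun"))          # ['nature', 'culture']
print(search(ENTRIES, headword="al*", etymology_includes="Arabic"))  # ['alcohol', 'algebra']
```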

By contrast, the New Oxford Dictionary of English on CD-ROM (2000) offers a single facility: the ability to look up a word that is included in the dictionary’s headword list. This contrast can be explained by the fact that, from the publisher’s point of view, there were risks associated with making their databases – the product of years if not decades of labour-intensive development – available in a form that might be digitally capturable by users. For many other kinds of books, copyright is a relatively straightforward issue: copyrighted content is regarded as unique and, in most cases, is readily identifiable as such. Dictionary data, by contrast, contains large amounts of information that does not vary significantly from one dictionary to another: pronunciations, etymologies, some definitions, and headword lists themselves may show only minor variation from one dictionary to its competitors in the marketplace. As such, the risk to dictionary publishers in offering their valuable data, literally on a platter, was that it could be copied, perhaps systematically and inexpensively altered in minor ways, and then repackaged and rebranded as a different dictionary at little cost to the data pirates. Attempts to prove copyright infringement in dictionaries are difficult, labour-intensive, and not generally successful.

The response of most publishers to these perceived risks of data appropriation was to place limitations on what data was easily retrievable by using search algorithms incorporated into the user interface of the CD-ROM. All CD-ROM dictionaries offer the user the core capability of a paper dictionary: the ability to look up a word. Many CD-ROM dictionaries, unfortunately, offer little more than this, and make it impossible, for example, to retrieve multiple definitions from a single complex or wildcard search, or any number of other functions that a computer user would expect to be standard features of a searchable dataset. In other words, publishers’ attempt to protect their data and thwart possible theft also had the effect of hobbling the user’s access to it.

CD-ROM dictionaries are still marketed today, some as stand-alone products and some as an inclusion in a print dictionary. It is not unusual to find CD-ROM dictionaries in the ‘bargain’ section of a bookstore, where they have perhaps landed because of a lack of demand, but they are still very useful products at times and places where Internet access is poor or unavailable.

Yet another aspect of commercial importance for digitally distributed dictionary data on portable media is the fact that a single distribution medium easily becomes a source for multiple users. A print dictionary is typically used by one person at a time and so it was reasonable, in the print era, for an individual to own a dictionary for personal use. A dictionary loaded onto a personal computer from a CD-ROM is also typically used by one person at a time. But a dictionary loaded onto a server, or loaded onto multiple PCs from a single CD-ROM, could in fact be used by an unlimited number of humans at one time. As a result, dictionary publishers had to contend with the possibility that the sale of a single CD-ROM might serve the needs of a large community and represent the loss of considerable sales. Their only easy means of preventing free (after the first user) distribution of their data was to go to the expense of providing individual access keys for each CD-ROM, thereby limiting their use to a single device or owner.

From the dictionary user’s point of view, a dictionary on a CD-ROM shares some of the limitations of a paper dictionary. There are fewer but still considerable space limitations, and there is no possibility of keeping the CD-ROM current. Like books, CD-ROMs are ‘read-only’. This is not to say that a dictionary on CD-ROM is an obsolete technology: it has the great advantage of being usable even in the absence of an Internet connection, which, while increasingly taken for granted in today’s world, is not available in all places at all times.

The Internet Migration

The boom in consumer software and technology that developed in the 1990s was accompanied by another related development: that of increasing access to and facility with the World Wide Web – the part of the Internet that people usually mean when they say ‘Internet’. This is also the part of the Internet that increasingly became host to dictionary databases that were made available to the public via a user interface. The migration of dictionary data to the Internet shares many characteristics with transitions by other varieties of consumer-focused media that originally existed only or primarily in print versions. A look at the earliest presence and history of dictionaries online reveals that, like the owners of other forms of print media, dictionary publishers:

did not at first fully understand, and therefore did not at first fully exploit the capabilities of the new medium;
were at first reluctant to make any data available for free, until it became clear that many others were doing so and that failure to provide at least some free content online was tantamount to disappearing from the marketplace;
have experimented with a variety of models aimed at maximising both their market presence and their profitability, two goals that are often in conflict in the online world;
currently struggle to maintain a distinctive presence in the online world for a number of reasons discussed below.

While the move to online access has posed many challenges for publishers and owners of dictionary data, one challenge from the print era has largely disappeared in the era of the online dictionary: the challenge of keeping data up to date. Print dictionaries were typically updated, if at all, on a cycle based on decades. With the advent of the CD-ROM it became marginally more economical to issue more timely updates to dictionaries, based on the relative ease of issuing an updated CD-ROM as opposed to a new and updated book printing. Dictionaries online, by contrast, may have only a thin veil between the actual dictionary database and the user – namely, the user interface – and so it is now possible for publishers to offer new dictionary data to users in real time: there need be no barrier or time lag between the update of the dictionary owner’s data and its availability to users. Dictionary publishers can, in principle, respond to world events that have a lexical implication by augmenting or updating their databases immediately.

Today, a handful of the traditional and well-known dictionary publishers have made a successful migration online, to the point that their websites are now a popular destination for people looking up the meanings of words, just as printed dictionaries were for earlier generations. Through continuous improvements to their websites, aggressive marketing, and the use of social media, they have succeeded in establishing a presence in today’s digitally focused world. A number of dictionary publishers did not make that transition, but that does not mean that their data is now gathering dust in the stacks of libraries and home bookshelves. Many fully mature and well-researched dictionaries that ceased publication in paper live on in licensed data that is now available on a variety of dictionary aggregator sites, which can be thought of as indexers of dictionaries and dictionary definitions. The best known, surely by virtue of its ownership of the domain, is dictionary.com, but many other data-aggregation sites enjoy considerable traffic by offering an eclectic mix of licensed dictionary data, home-grown content, and advertising. While such websites lack the imprimatur of association with a traditional dictionary publisher – a feature that may be prized by older dictionary users – the dictionary aggregators wisely focus their marketing, content, and visual appeal toward a younger generation that is less likely to have grown up with a brand-name dictionary that was regarded as authoritative.

The presence of so much dictionary data online today, much of it ‘unbranded’, has had the effect of eroding the authority associated with the well-known dictionary publishers of the twentieth century. While some established dictionary publishers still have an active Internet presence, they compete with websites whose greatest claim to authority is their ownership of a user-friendly domain name associated with language reference, and whose presence on the Internet is mainly in the interest of generating ad revenue.

As the Internet and the various features of access and presentation associated with it have evolved, websites that have some claim to the notion of ‘dictionary’ have also proliferated. Students investigating a word’s meaning today might choose from a more traditional dictionary publisher’s website – such as Cambridge, Merriam-Webster, or Macmillan – or they might have better luck consulting UrbanDictionary.com, the crowdsourced online dictionary for modern slang. Wiktionary.org, another crowdsourced online dictionary, combines the features of a traditional dictionary with a wiki. Still other lexically focused websites combine elements of a traditional dictionary with many other kinds of information. Examples include Wordnik.com, Vocabulary.com, WordReference.com, and OneLook.com; the last, for instance, indexes numerous dictionaries and includes a ‘reverse dictionary’, in which users enter words they would expect to find in a definition and see what words are returned. Google itself licenses some dictionary data and provides on-demand definitions from its search engine when a user types, for example, ‘define litigation’ or ‘apoplexy definition’. Finally, many universities and other research institutions have digitised data from historical dictionaries and made it available online, thereby converting these works into electronic dictionaries. While these historical dictionaries usually lack the technological bells and whistles of modern online dictionaries, they provide wide access to important data that was formerly available only to a few in select physical locations.
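The ‘reverse dictionary’ idea mentioned above can be approximated very simply: rank headwords by how many of the user’s words occur in their definitions. The sketch below makes no claim about how any real website implements the feature; the `DEFINITIONS` data and the `reverse_lookup` function are hypothetical, and the definitions are simplified for illustration.

```python
import re

# Hypothetical mini-dictionary; definitions are simplified for illustration.
DEFINITIONS = {
    "litigation": "the process of taking legal action in a court of law",
    "apoplexy": "a sudden loss of consciousness caused by a stroke",
    "verdict": "the decision reached by a jury in a court of law",
    "thesaurus": "a book that lists words in groups of similar meaning",
}

def tokenize(text):
    """Lower-case a string and return its alphabetic word tokens as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))

def reverse_lookup(query, definitions):
    """Rank headwords by the number of query words found in their definitions."""
    wanted = tokenize(query)
    scored = []
    for headword, definition in definitions.items():
        overlap = len(wanted & tokenize(definition))
        if overlap:
            scored.append((overlap, headword))
    # Highest overlap first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [headword for _, headword in scored]

results = reverse_lookup("legal action court", DEFINITIONS)
print(results)
```

A usable service would presumably also need stemming, synonym handling, and weighting of rare words, none of which is attempted here.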

The increasingly interactive nature of the Internet has meant that members of the public are not merely users of data, but also contributors of data. Owners of dictionary websites can have far more direct communication with their users than was ever possible in the print era; this is partly provided by feedback facilities that site owners can easily integrate into their websites. The Internet has provided another means of communication between user and publisher that users may only be dimly aware of: the opportunity for publishers and website owners to track user behaviour simply by capturing searches or search sequences. This data is invaluable to data owners for identifying gaps in their headword lists, and also for gaining insight into how users navigate the website. While it is impossible to know what an individual user takes away from a website, data on when they start their visit, what they look for, and when they leave are readily detectable. Analysing such data trails can provide useful information that enables site owners to improve navigability and add new features or content that seem to be in demand.
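As a rough illustration of how captured searches might reveal gaps in a headword list, the following sketch counts the queries that match no headword. The headword set, the search log, and the function name are all invented for the example; a real site would read its log from server records and would probably normalise queries more carefully.

```python
from collections import Counter

# Hypothetical headword list and captured search log.
HEADWORDS = {"apoplexy", "litigation", "dictionary", "corpus"}
SEARCH_LOG = ["corpus", "selfie", "litigation", "selfie", "Corpus", "yeet", "selfie"]

def failed_lookups(log, headwords):
    """Count searches (case-insensitively) that match no headword, most frequent first."""
    misses = Counter(q.lower() for q in log if q.lower() not in headwords)
    return misses.most_common()

gaps = failed_lookups(SEARCH_LOG, HEADWORDS)
print(gaps)
```

Frequently repeated misses are candidates for new entries, although some will simply be misspellings of words already covered.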

Universal Access and the Dictionary in the Cloud

The Internet grew up at a time when households were beginning to use personal computers and the two developed in sync, with ever-more sophisticated web content demanding greater bandwidth. Modem technology kept pace pretty well with these demands, but for most of a decade, the Internet continued to be a notional space to be visited while sitting at a computer, typically at work or at home. It was in one sense a vast reference library that could be accessed through one portal.

Now we are surrounded by Wi-Fi and we expect or hope to find usable bandwidth wherever we go, to enable whatever devices we carry with us – laptop, tablet, or mobile phone – to get us onto the Internet at will. With this development, the Internet has ceased to be a notional space accessed through a unique portal, and is now, in the familiar metaphor, a ‘cloud’ that surrounds us wherever we go, storing the countless bits of information that we may want to access at any moment. These bits of information certainly include the definitions of words that we do not know, or do not feel completely confident about using.

The idea of the dictionary developed over centuries, from the earliest print appearances of lists of ‘hard words’, often without definitions, to the place of privilege in the mid-twentieth century of an authoritative book that could be found in nearly every home, and was provided as a necessary gift to every young scholar. In the few decades since then, the idea of the dictionary has rapidly evolved to become, especially for today’s digital natives, an amorphous collection of data that lives in the cloud and that should be selectively retrievable, in a quick and easy way, to anyone who desires to find the definition of a word they do not know, using whatever device they have at hand. Makers of electronic dictionaries now have at their disposal all the technological enhancements that have grown up with digital technology, although it is not always clear whether they actually need all of these, or what they should do with them.

The needs of the contemporary dictionary user are varied, and perhaps broader in spectrum than the needs or interests of last century’s dictionary user, simply because of the wealth of information available in electronic dictionaries and the ease with which that information can be accessed. But it is important to keep in mind that the core need of the dictionary user remains unchanged over hundreds of years: the need to know the meaning of an unknown word encountered in reading or heard in speech. Fulfilling this simple need is still best handled in a simple way: by providing the user with a definition that is sufficiently short, simple, and comprehensible to be retained in short-term memory and substituted in the place of the encountered word as a first step toward integrating the word into the user’s lexicon.

In their efforts to make their products the newest, best, and most dazzling, makers of electronic dictionaries today must not lose sight of the fact that the core need of their user is a simple one that can be met with a simple solution, provided to them with what is now relatively simple technology. There are surely many more innovations in electronic dictionaries and other word reference products to come, but it is important that developers do not overlook or go too far beyond the core need of a person looking up a word – to learn quickly and efficiently what that word means.

Chapter 18 English Dictionaries and Corpus Linguistics

A standard dictionary presents the words of a language and their meanings, along with certain other information such as parts of speech. Until 1987 evidence for words and their meanings was collected in one of two ways.

The first way was to collect citations from literature. Great scholarly dictionaries such as the Oxford English Dictionary (OED) organised ‘reading programmes’ in which readers would read through books and other texts, and copy out ‘interesting’ citations. Inevitably, as James Murray pointed out in his presidential address to the Philological Society in 1880, not only was this methodology extremely time-consuming; it also led to distortion:

While rare, curious, and odd words are well represented, ordinary words are often most meagrely present; and the editor or his assistants have to search for precious hours for examples of common words, which readers passed by … Thus of abusion, we found in the slips about 50 instances: of abuse not five.

Murray attempted to deal with this problem by issuing an additional instruction to readers, asking them to collect examples of common, everyday words. In this, he was only partly successful. He saw the problem very clearly, but nineteenth-century technology was not able to provide a satisfactory solution. This had to wait for the emergence of corpus linguistics, which could not only provide plentiful examples of ordinary uses of ordinary words but could also provide evidence for their statistically significant collocations and combinations.

The other pre-corpus method for collecting lexicographical data involved consultation of the lexicographers’ intuitions, in some cases augmented by a ‘directed reading programme’, in which readers collected vocabulary from texts in particular fields such as science, sports and pastimes, and contemporary slang. It was not widely acknowledged, but nevertheless true, that lexicographers consulted previously published dictionaries, to prompt their intuitions. From the eighteenth century onwards – and indeed earlier – lexicography has been accretive, especially insofar as definition writing is concerned. James Murray, as editor of the OED, made no secret of the fact that if he found a perfectly good definition of the word in Samuel Johnson’s dictionary of 1755 or some other pre-existing dictionary, he copied it verbatim, with an explicit attribution.

This methodology is as flawed as reliance on a reading programme. Critical examination of definitions in existing dictionaries may lead to improvements in the understanding of word meaning, but all too often examination of definitions in existing dictionaries could be uncritical and even result in mindless copying of errors and inadequacies.

What Happened in 1987?

In 1987, the first edition of the COBUILD dictionary was published by Collins publishers. The project had been set up at the University of Birmingham in 1980, and its name was an acronym for ‘Collins Birmingham University International Language Database’. It was the first attempt to compile a dictionary on the basis of corpus evidence. In the years that followed, other dictionary publishers rapidly followed Collins’s lead. Today, every reputable dictionary makes at least some use of corpus evidence.

By today’s standards, the corpus used for the first edition of COBUILD was pathetically small. It consisted of 7.3 million words during the initial compilation (1980–5) and had risen to eighteen million words during final checking of the text (1986). Today’s corpora (corpora is the plural of the word corpus) consist of hundreds of millions, billions, and even trillions of words of text. So why was the COBUILD corpus so important? Two features stand out: 1) The texts were in machine-readable form, which meant that computers could be used for their analysis; 2) Previous electronic corpora, such as the Brown Corpus and the Lancaster-Oslo/Bergen Corpus (LOB), had consisted of just one million words each. In corpus linguistics, size matters, for a very simple reason. If there are (say) 200,000 different word forms (‘word types’) in a corpus, then in a corpus of one million words there is room for only five occurrences of each word on average. But a few words (for example the, of, and that) are ‘space hogs’: they are very, very frequent, which leaves even less space for all the other words. As a result, in a corpus of only one million words, it is impossible to distinguish statistically significant co-occurrences from random or chance co-occurrences. By moving from one million words to a corpus more than seven times larger, COBUILD crossed a threshold. For the first time ever, it became possible to measure collocations and to see how they might affect the meaning of words.
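The arithmetic behind this argument can be checked with a short calculation. The 200,000 word types and the corpus sizes are the figures used in the text; everything else is an assumption made for illustration. In particular, the number of types is held fixed as the corpus grows (in practice it would rise), and the ten per cent share given to the three ‘space hogs’ is an invented round figure, not a measurement.

```python
def mean_occurrences(tokens, types):
    """Average number of occurrences per word type in a corpus."""
    return tokens / types

TYPES = 200_000  # the illustrative figure used in the text

small = mean_occurrences(1_000_000, TYPES)
# Type count held fixed for simplicity (an assumption; real corpora gain types as they grow).
larger = mean_occurrences(7_300_000, TYPES)

# Illustrative assumption, not a measurement:
# three very frequent words take 10% of all tokens.
HOG_SHARE = 0.10
rest = mean_occurrences(1_000_000 * (1 - HOG_SHARE), TYPES - 3)

print(small, larger, round(rest, 2))
```

Under these assumptions the average for the remaining words falls below five in the small corpus, which is the squeeze that the paragraph describes.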

In 1987, at around the same time that the COBUILD dictionary was published, John Sinclair, the founder of the COBUILD research programme, wrote a short paper called ‘The Nature of the Evidence’, discussing (among other things) the comparative frequency of phrasal verbs formed with the verb set. He predicted, ‘We will require a fairly large number of occurrences of the combination of forms to show the characteristic patterns. This in turn means very large amounts of text, running into the hundreds of millions [of words]’. This was an early foray into what was later to become the fashionable field of distributional semantics. Nowadays, we have corpora of hundreds of millions of words and indeed billions – riches that were unimaginable only thirty years ago. However, lexicographers have been slow to recognise the highly patterned nature of linguistic behaviour and to take advantage of the evidence that has become available in the form of large corpora.

Inevitably, working with what we now realise was a very small corpus, the COBUILD lexicographers in the 1980s felt that the corpus evidence needed to be supplemented occasionally by consultation of existing dictionaries and by appeals to intuition – in particular, collective intuition, whereby members of the lexicographical team compared their understandings of the conventional meanings and phraseology of particular words. By the end of the project, the corpus had grown to eighteen million words. By this stage, if a COBUILD lexicographer was in doubt about a particular phrase or meaning, he or she gradually began to trust the corpus evidence (especially if the corpus provided several examples of the doubtful phrase), rather than continuing to rely on intuitions.

The impact of corpora on dictionaries has been discussed in several places in twenty-first-century literature on lexicography, in particular on pages 53–96 of Atkins and Rundell (2008), in Hanks (2009), and in a special issue of the International Journal of Lexicography (Hanks 2008) dedicated to the memory of John Sinclair (all of which are listed in the Further Reading section at the end of this volume).

What is a Corpus?

In today’s world, corpus linguistics is steadily replacing earlier traditions of speculative linguistics, in which investigators consulted their intuitions in order to address questions concerning syntax, phraseology, and meaning. So what is a corpus, and why are corpora important for the study of words and in particular for the compilation of dictionaries?

As we shall see in the next section, there are many different kinds of corpora, and not all of them are equally suitable for all purposes. What they all have in common includes features such as large size, multiple sources (a corpus consists of many different texts), and machine readability.

Different Kinds of Corpora

There are many different kinds of corpora, and they are – or can be – used for many different purposes in linguistic and other research. Some corpora are more suitable for lexicographical purposes than others. Computational analysis enables researchers, including lexicographers, to see how words go together (‘collocations’). Around each word in the language, several patterns of collocations can be perceived. Different patterns provide clues to the different meanings of a word. Idiosyncratic and other unusual uses can also be found. Insofar as these are deliberate, such uses can be regarded as creative exploitations of the conventional patterns of usage that make up a language. It is not always easy to distinguish creative and poetic uses of language from mistakes. A good dictionary aims to describe conventional usage and meaning, which means that unusual (creative and poetic) uses of words should be disregarded by the lexicographers. Not all dictionaries observe this distinction. Large scholarly dictionaries sometimes record idiosyncratic uses of words, sometimes in a haphazard and unsystematic way, rather than confining themselves to the description of conventional meanings of words. Worse still, most dictionaries focus on fine-grained meaning distinctions, while failing to say anything about the phraseological patterns that are associated with each word. Large corpora nowadays offer lexicographers an opportunity to remedy these deficiencies and to create empirically well-founded explanations of word use and meaning. Unfortunately, because lexicography is labour-intensive and funding is in short supply, these exciting possibilities are not being realised systematically at the time of writing.
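To make the notion of computing collocations concrete, here is a minimal sketch that counts which words occur next to each other and scores each pair with pointwise mutual information (PMI), one common association measure. The choice of PMI, the toy sentence, and the function name are assumptions made for this example and are not a description of any particular lexicographical tool.

```python
import math
from collections import Counter

def collocation_scores(tokens, window=1):
    """Count ordered word pairs within `window` tokens and score them by PMI."""
    word_freq = Counter(tokens)
    pair_freq = Counter()
    for i, word in enumerate(tokens):
        # Pair each word with the next `window` words to its right.
        for other in tokens[i + 1 : i + 1 + window]:
            pair_freq[(word, other)] += 1
    n = len(tokens)
    scores = {}
    for (w1, w2), joint in pair_freq.items():
        # Co-occurrences expected if the two words were independent.
        expected = word_freq[w1] * word_freq[w2] / n
        scores[(w1, w2)] = math.log2(joint / expected)
    return pair_freq, scores

# Toy 'corpus' invented for illustration.
TEXT = ("strong tea and strong coffee but powerful computer "
        "and powerful engine and strong tea")
pair_freq, scores = collocation_scores(TEXT.split())

print(pair_freq[("strong", "tea")])
print(scores[("strong", "tea")] > scores[("and", "strong")])
```

On a sample this small the scores mean very little: PMI is known to inflate pairs seen only once, which is one reason why measures of this kind need large corpora before they can separate genuine collocations from chance co-occurrences.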

General Corpora

Typically, a modern general corpus consists of a large number of recently published written texts that have been edited and proofread. A collection of texts that have been well edited and carefully proofread gives lexicographers and other researchers the best chance of discovering the normal phraseology – and hence the meanings – associated with each word in a language. Thus, the main purpose of a general corpus is – or could be – to enable researchers, including lexicographers, to study patterns of word use and to see how differences in meaning are reflected in differences in phraseology. Unfortunately, however, all too often dictionaries use a corpus to bolster pre-existing sense distinctions, rather than to investigate and report more radical topics such as the relationship between meaning and phraseology.

Some modern corpora aim to present a ‘balanced and representative’ sample of the language. The problem here, as any statistician will tell you, is that a representative sample of anything is reliant on the existence of a generally accepted classification of the different elements that make up a population. Unfortunately, there is no generally accepted classification of English text types. So, in the absence of such a conventional classification, by a strict interpretation of this definition of representativeness, this particular aim of a general corpus is bound to fail. Nevertheless, the British National Corpus (which aimed to achieve such a sample when it was being built in the early 1990s) is generally perceived as presenting a reasonable cross-section of twentieth-century English usage, and is widely used as such. It consists of 100 million words of British English text, of which approximately ten million words are transcriptions of speech.

A different approach to creating a general corpus was adopted by the Bank of English at the University of Birmingham. This was a greatly enlarged version of the corpus used for the COBUILD dictionary in the 1980s. It was characterised, perhaps rather unfairly, as ‘anything goes’. It is claimed in publicity literature, cited for example in Wikipedia, that it currently consists of 650 million words of mainly written text from all over the English-speaking world. At one stage in building this corpus, a certain amount of balance was achieved by noting domains that were poorly covered and adding texts in such domains that would give better coverage.

There are now many other corpora of English texts, most of which are freely available to researchers. The leading general corpus of American English is COCA (Corpus of Contemporary American English). According to its compilers at Brigham Young University, Utah, it consists of more than 560 million words from more than 160,000 texts. The corpus is roughly balanced between five genres: spoken American English, fiction, popular magazines, newspapers, and academic journals. It is matched by COHA (Corpus of Historical American English), consisting of 400 million words of American English from between 1810 and 2000.

Newspaper Corpora

In the early days of corpus linguistics, a popular slogan was ‘More data is better data’. The easiest way of building a large corpus of written, published texts in any language was and is to reach agreement with a newspaper publisher or an online newswire service to allow a research group or dictionary publisher to incorporate large sections of daily or weekly newspapers or journals into an electronic corpus. A corpus of newspaper texts satisfies most of the main criteria for a good corpus for lexicographical purposes. Such a corpus is not only built up of texts that are comparatively easily obtained and up to date; it also contains texts on many different subjects, written in different genres (news reports, commentaries, reviews, and even fiction), written by several different people who for the most part are professional writers. Moreover, generally the text of a newspaper or journal has been edited and proofread before publication.

Domain-Specific Corpora

By the end of the twentieth century, English had established itself not only as a major language, with around 365 million native speakers worldwide, but also as the leading lingua franca in the world, serving as a medium for international communication among non-native speakers. It is the leading medium, not only for social intercourse, but also for communication in the sciences and other domains. It is not surprising, therefore, that domain-specific corpora are being developed at an astonishing rate. Corpora of texts in fields such as medicine have existed since the 1990s. They are now being joined by corpora in every imaginable domain, ranging from sports and pastimes to hard sciences, the environment, ecology, and so on. Such corpora are typically built by web crawlers (programs that start by looking for a few keywords in a selected domain and accrue texts containing these terms and related terminology – generally supported by some interaction with a human expert in the field). In corpora of domains such as chemistry, the problem for lexicography is no longer finding candidate lexical items for possible inclusion in a dictionary, but rather deciding what to leave out. Terms found only in a domain-specific corpus belong in a term bank rather than a dictionary for general public use.
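The keyword-seeded collection process described above can be sketched without any network access by treating a handful of strings as if they were fetched web pages. Everything in the example is hypothetical: the `PAGES` data, the stopword list, the `select` and `expand` helpers, and the rule that a word counts as ‘related terminology’ if it appears in at least two of the pages already selected. In practice, as the text notes, a human expert would normally vet the expanded term list.

```python
import re
from collections import Counter

# Hypothetical stand-ins for fetched web pages; a real crawler would download these.
PAGES = {
    "p1": "The reaction of the acid with the base produced a salt and water.",
    "p2": "The team won the match after a late goal in extra time.",
    "p3": "A catalyst speeds up the reaction of an acid with a salt.",
    "p4": "The catalyst was recovered and the salt was dried.",
}
STOPWORDS = {"the", "a", "an", "of", "and", "with", "up", "in", "after", "was"}

def words(text):
    """Lower-case a text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def select(pages, terms):
    """Return the ids of pages containing at least one of the terms."""
    return sorted(pid for pid, text in pages.items() if terms & set(words(text)))

def expand(pages, selected, terms, min_pages=2):
    """Add non-stopwords that occur in at least `min_pages` of the selected pages."""
    doc_freq = Counter()
    for pid in selected:
        doc_freq.update(set(words(pages[pid])) - STOPWORDS)
    return terms | {w for w, df in doc_freq.items() if df >= min_pages}

seeds = {"acid"}
first_pass = select(PAGES, seeds)          # pages matching the seed keyword
terms = expand(PAGES, first_pass, seeds)   # seed plus related terminology
second_pass = select(PAGES, terms)         # wider harvest using the expanded terms

print(first_pass, sorted(terms), second_pass)
```

The second pass picks up a page that never mentions the seed keyword but shares vocabulary with the pages that do, while the unrelated sports page is left out.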

Historical Corpora

Just as computational analysis of a contemporary corpus can help people to understand meaning and usage in the present-day language, so a well-focused corpus of texts from a particular period will help readers to understand the relationship between a writer who was active in that period and the general conventions of word use and meaning that were prevalent in his or her time. When studying the work of a long-dead writer, it is important to understand the extent to which he or she was using innovative phraseology or new words and to which his or her phraseology and words were current and conventional at the time, though now obsolete. A historical corpus of the relevant period provides the necessary information, though at the present time such corpora have not yet been systematically developed.

There have been many attempts to build historical corpora of texts in specific domains or other specific selections. Most of them are open to the criticism that was levelled in the 1960s and 70s at the Brown and LOB corpora (discussed earlier), namely that they are too small (in most cases, under one million words) to enable statistical analysis of collocations. For this reason, they fail to some extent to shed light on the conventional meaning and phraseology of words. In other cases, the collections consist of photographic images of the texts, so are unsuitable for searching and analysis by corpus-driven computer tools.

From the point of view of historical lexicography in English, the most exciting recent development is undoubtedly the Text Creation Partnership (TCP), which coordinates efforts by several universities and other research organisations to encode electronic texts of early printed books, known collectively as EEBO (Early English Books Online). This resource consists of 755 million words drawn from more than 25,000 historical texts, ranging from the 1470s to the 1690s.

Spoken Corpora

General corpora and many of the domain-specific and other corpora mentioned so far contrast with corpora of spoken texts, which present a different picture of the language. A well-transcribed corpus of unscripted (i.e. spontaneous) conversation provides evidence of attempts by speakers to activate conventions of the language in order to communicate, but only rarely of what those conventions are. Dictionary makers are interested primarily in reporting conventional words and their meanings, rather than the psychological struggles of users of a language to put meanings into words. For that reason, spoken corpora must be treated with caution by lexicographers. Having said that, it must also be acknowledged that some words and phrases are more typically used in speech than in writing. So ideally a modern corpus contains both carefully edited, published texts and transcripts of natural, unscripted conversation.

An extreme example of a spoken text that was made available for computational analysis but that does not satisfy normal criteria of corpus-hood is the Challenger Inquiry. This was a (largely spoken) record of NASA’s investigation of the Challenger disaster, in which a spacecraft exploded shortly after take-off, killing all on board. From the point of view of language analysis, the Challenger Inquiry is too narrowly focused to serve as a representative sample of spoken American English.

Corpora of Headlines and Advertisements

It is well known that the syntax and vocabulary of newspaper headlines differ in many respects from continuous prose. Some adventurous researchers are now building corpora of such texts for qualitative analysis in relation to the text or products that they introduce – in some cases represented by visual images, rather than or as well as prose texts. Such corpora provide (potentially) interesting insights into linguistic and other human behaviour, but are of little interest to lexicographers, who are more interested in words used in continuous prose.

Bilingual Corpora

Finally, mention must be made of bilingual corpora of translated texts, although this is not strictly relevant to English dictionaries. A classic example is Canadian Hansard. In Canada parliamentary proceedings are reported in both English and French. Ever since the 1980s, computational linguists have been using Canadian Hansard as a basis for developing machine translation programs and improving bilingual dictionaries. There is now a large number of bilingual and indeed multilingual texts, some of which have been incorporated into corpora for research purposes.

Preparing a Corpus for Use

Once a text for inclusion in a corpus has been collected in machine-readable form, some basic processes must be carried out before it can be used. These processes are now standard and universal in corpus linguistics, with minor variations. They are implemented by suites of programs.

First, the text must be cleaned up – headers, footers, page numbers, etc., are removed, together with other ‘noise’. Removing noise from a corpus has become a big deal in recent years, for several reasons. One is that some marketing organisations now use a computer to generate vast quantities of meaningless text in order to embed the names of commercial products in freely available online texts. Conscientious corpus builders make strenuous efforts to identify and remove computer-generated meaningless noise.

Next, the text must be ‘tokenised’, which means put into a format in which each word in the text is recognised as a separate ‘token’, to which various kinds of information can be appended. This is usually done by putting each token on a separate line, so that other information can be added alongside it. Punctuation marks are treated as separate tokens. Some multiword expressions such as of course are treated by some programs as a single word spelled with a space in the middle.
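The tokenisation step just described can be illustrated with a minimal sketch in Python. It is not the procedure of any particular corpus suite: the function name tokenise and the one-item list of multiword expressions are assumptions made for illustration only.

```python
import re

# Hypothetical list of multiword expressions to be kept as single tokens.
MULTIWORD = ("of course",)


def tokenise(text, multiword=MULTIWORD):
    """Split text into tokens: multiword expressions, words, punctuation marks."""
    parts = []
    if multiword:
        # Try longer expressions first so that they win over shorter ones.
        longest_first = sorted(multiword, key=len, reverse=True)
        parts.append(r"\b(?:" + "|".join(re.escape(m) for m in longest_first) + r")\b")
    parts.append(r"\w+(?:['’-]\w+)*")  # words, allowing internal apostrophes and hyphens
    parts.append(r"[^\w\s]")           # each punctuation mark is a separate token
    pattern = re.compile("|".join(parts), re.IGNORECASE)
    return pattern.findall(text)


# One token per line, leaving room for other information alongside each token.
for token in tokenise("Of course, she swam home."):
    print(token)
```

Printing one token per line mirrors the layout described above, in which further annotation can later be added next to each token.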

Finally, the tokens must be ‘lemmatised’. In other words, the base form of each inflected lexical item must be stated explicitly. For example, SWIM is the name of the lemma realised in texts by the lexical types swim, swims, swimming, swam, swum.
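A lookup table is perhaps the simplest way to picture lemmatisation. The sketch below, in Python, covers only the SWIM example given above; the table, the function name lemmatise, and the fallback for unknown words (the token's own form in capitals) are illustrative assumptions, and a real lemmatiser would rely on a full morphological lexicon.

```python
# Toy lookup table mapping each inflected form to the name of its lemma.
LEMMA_TABLE = {form: "SWIM" for form in ("swim", "swims", "swimming", "swam", "swum")}


def lemmatise(tokens, table=LEMMA_TABLE):
    """Pair each token with its lemma.

    Tokens not found in the table fall back to their own form in capitals
    (an assumption made for this sketch, not a general rule).
    """
    return [(tok, table.get(tok.lower(), tok.upper())) for tok in tokens]


# Token and lemma side by side, one token per line.
for token, lemma in lemmatise(["She", "swam", "home"]):
    print(f"{token}\t{lemma}")
```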

Using a Corpus

After a corpus has been created, a suite of corpus tools is used to assist the lexicographer or indeed any other researcher. These tools are now fairly standard. The most lexicographer-friendly such suite is the Sketch Engine <https://sketchengine.eu>. This enables the lexicographer to create a concordance, select a sample, and sort the sample into a preferred order. In Figure 18.1, a sample of just twenty-six lines (out of 1882 hits for this verb in the British National Corpus) has been selected and presented as a KWIC (keyword in context) file. It is also possible to select whole sentences but the KWIC format makes it easier to place the target word in the centre of each line and to line them up one under the other. Additionally, this small sample has been sorted according to the letter or punctuation mark immediately to the right of the target word. Other sorting procedures are possible, for example sorting to the left of the target word.

Figure 18.1 Sample of twenty-six lines presented as a KWIC file
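The concordancing steps described above (find each hit, centre the keyword, sort on the context to its right) can be sketched in a few lines of Python. This is not how the Sketch Engine is implemented; the function names kwic, sort_right, and show are hypothetical, and the three-sentence text is invented for illustration rather than taken from the British National Corpus.

```python
def kwic(tokens, targets, width=4):
    """Return (left context, keyword, right context) for each hit of a target form."""
    targets = {t.lower() for t in targets}
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() in targets:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


def sort_right(hits):
    """Sort hits by the text immediately to the right of the keyword."""
    return sorted(hits, key=lambda hit: hit[2].lower())


def show(hits):
    """Print hits with the keyword lined up in a single column."""
    pad = max((len(left) for left, _, _ in hits), default=0)
    for left, keyword, right in hits:
        print(f"{left:>{pad}}  {keyword}  {right}")


# Invented sample text, already tokenised on spaces.
text = "They executed the plan . He will execute a will . She executed an order ."
hits = kwic(text.split(), targets=("execute", "executed"))
show(sort_right(hits))
```

Sorting to the left of the keyword would only require a different sort key, for example the reversed sequence of left-context words.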

Now at last, lexicography can begin. In PDEV (Pattern Dictionary of English Verbs, discussed in detail later in this chapter), lines in the sample are sorted into patterns. PDEV has detected five patterns for the verb execute. The comparative frequency of each pattern is expressed as a percentage of the sample analysed. Normally, samples consist of at least 250 corpus lines, and comparative percentages are based on such analyses. Each pattern element is represented by one or more semantic types, for example ‘Action’. Optionally, a contextually assigned role, for example ‘= Crime’, gives a more precise typification of the collocate. Semantic types represent intrinsic properties of words at their most literal. Contextual roles are much more variable. To take a recent example, it is indisputable that criticising the government is an action. On the other hand, if someone is executed for criticising the government of their country, the context assigns the semantic role ‘crime’ to that particular occurrence although, in the English-speaking world, criticising the government is not normally regarded as a crime. In other words, ‘Crime’ is not a semantic type of the word criticising.
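The arithmetic behind the comparative frequencies is simple: once each line in the sample has been assigned to a pattern, each pattern's share is its count divided by the sample size. The Python sketch below assumes a hypothetical list of pattern labels, one per corpus line; the labels and counts are invented and are not PDEV's figures for execute.

```python
from collections import Counter


def pattern_percentages(labels):
    """Return each pattern's share of the sample as a percentage, commonest first."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pattern: round(100 * n / total, 1) for pattern, n in counts.most_common()}


# Invented annotation of a ten-line sample: one pattern label per corpus line.
sample_labels = ["pattern 1"] * 6 + ["pattern 2"] * 3 + ["pattern 3"]
for pattern, share in pattern_percentages(sample_labels).items():
    print(f"{pattern}: {share}%")
```

With a sample this small the percentages are unstable, which is one reason for basing them on at least 250 lines.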

As can be seen in Figure 18.2, every pattern in PDEV is accompanied by an interpretation of the meaning, called the ‘implicature’. Implicatures are normally based on analysis of at least 250 corpus lines. They are equivalent to dictionary definitions, but instead of defining the word in isolation, they explain the pattern as a whole. The sample of twenty-six corpus lines in this chapter was selected to give a flavour of the work, as 250 corpus lines would be far too many for this chapter.

Figure 18.2 Pattern and primary implicatures

There is a lot more to be said about the procedures of corpus pattern analysis. The main points have been summarised here.

Different Kinds of English Dictionaries

There are many different kinds of English dictionary, ranging from tiny pocketbooks to the grand OED in twenty large volumes (second edition). The third edition will be even larger and is planned as an online resource, rather than as a collection of printed volumes. In the twenty-first century the market for dictionaries as printed books has declined dramatically; most dictionary users now consult an online product, free of charge, rather than looking in a printed book. In itself, this shift makes excellent sense, but unfortunately the shift to dictionaries as electronic products has been accompanied by a decline in quality. Corpus-based studies have shown that dictionaries sometimes fail to report word meaning accurately, but in addition online dictionaries, including works published by big software houses, seem to have reinforced the practice of speculating about word meanings on the basis of introspection, rather than analysis of evidence of word use.

Dictionaries on Historical Principles

Among dictionaries that base their explanations on analysis of evidence of word use, two very different principles for organising the entries can be identified: historical principles and synchronic principles. Dictionaries on historical principles aim to tell the story of the development of meaning of each word over time. Dictionaries on synchronic principles aim to give an account of the contemporary meaning of each word, sometimes adding obsolete or obsolescent senses at the end of the entry, for the benefit of readers of the literature of the past.

The distinction between historical principles and synchronic principles is important, because word meaning is unstable. Words tend to change their meanings or acquire new meanings unpredictably and unexpectedly. This can have a profound effect on the organisation of the dictionary entries and has been the source of some confusion among people who ought to know better, especially computational linguists.

Traditional large scholarly dictionaries such as OED and Webster’s Third (Merriam-Webster’s Third New International Dictionary, 1961) are organised on historical principles. In such a dictionary, the first meaning given for the word camera – following the etymology (from a Latin word meaning ‘chamber’) – is ‘the treasury department of the papal curia’ or, even more historically, ‘a small vaulted room’. In OED, it is not until sense 4b that we find ‘a device for taking photographs’. Sense 4c is ‘a device for capturing moving pictures or video signals’. Sense 4a is the obsolete ‘camera obscura’. This, of course, is historically correct, but is surely something of a distraction from the meaning of the word in modern English.

The adjective nice has changed its meaning many times over the centuries. In a dictionary on historical principles, the entry for this word starts by showing its derivation from Latin nescius ‘ignorant’, which was also one of its meanings in fourteenth-century English, then traces its development through ‘lascivious’, ‘elegant’, ‘precise’, ‘fastidious or fussy’, ‘cultured’, ‘shy or modest’, ‘subtle’, ‘appetising’, finally ending up with the modern meaning, ‘pleasant or attractive’. (This brief summary does not do justice to the full OED entry for this adjective, which consists of fourteen main sense distinctions and twenty-eight sub-senses.) Many thousands of similar dictionary entries based on historical principles could be cited.

Historical principles for lexicography were established during the eighteenth century, at least in part as a result of the mistaken belief that etymology guarantees meaning. Throughout Europe during the Enlightenment, the curious belief was accepted without challenge that the oldest meaning of a word is somehow more correct than its contemporary acceptation. A moment’s thought should be sufficient to convince you that this cannot possibly be true. The English word subject is derived from Latin subjectus, the etymological meaning of which is ‘something thrown under’, but of course this is not and never has been the ‘true’ meaning of the word in English. Indeed, the Latin word had already gone a long way towards developing the modern meaning during the period of classical Latin at the time of the Roman Republic.

If etymology could guarantee meaning, the literal meaning of a word would be the meaning of the letters of which the word is composed – which is, of course, nonsense. The great English lexicographer Samuel Johnson already noticed that etymological meaning is not a good guide to meaning in a contemporary language. In the preface to his Dictionary of 1755, he observed that while the English word ardent was undeniably derived from Latin ardens, meaning ‘burning’, the English word has never had this meaning in English. If your house is on fire, you do not call the fire brigade and say, ‘My house is ardent’. Despite this insight, Johnson clearly regarded it as part of the lexicographer’s job to ‘busy himself with tracing the original … of words’. That, indeed, forms part of his 1755 definition of the word lexicographer.

Dictionaries on Synchronic Principles

Perhaps writers and thinkers in the eighteenth, nineteenth, and early twentieth centuries believed that the conventional meanings of the words that they used were so obvious that no effort was needed to explain them. If that is what they believed, corpus-driven lexicography in recent years has shown that they were wrong. Or rather, they were not always right; reliance on introspection (a.k.a. common sense) sometimes misses an important point, as in the case of file and gleam (discussed below).

The first English dictionary to attempt to explain the meanings of contemporary words was Funk and Wagnalls’s Standard Dictionary of the English Language (1893–5).

By contrast with OED, Collins English Dictionary (CED) (1979), which, like Funk and Wagnalls’s, claimed to ‘put the modern meaning first’, starts its entry for nice with four senses that are supposedly still current (or were, in 1979): ‘pleasant’, ‘kind’, ‘subtle’, and ‘precise’. These are followed by five rare or obsolete senses (‘fastidious’, ‘foolish’, ‘delicate’, and ‘shy or reserved’). A modern corpus linguist might challenge the notion that the word nice is used to mean ‘subtle’ or ‘precise’ in modern English. These senses were apparently left unlabelled as a sop to readers who might believe that the English language had not yet left the eighteenth century. Had such readers (and the lexicographers – including the present writer – who, in 1979, pandered to them) had access to a corpus of contemporary English, containing few if any uses of nice in either of these senses, a different decision might have been made, namely to omit these senses or at least label them as obsolete. In the absence of such evidence, lexicographers tend to make very conservative judgements. The notion that a dictionary should state all and only the possible meanings of all and only the words of a language is hard to relinquish. But it must be relinquished if the dictionary is to be a learning tool of maximum usefulness.

Dictionaries for Foreign Learners

The English language has established itself as a vehicle for international communication. As a result, millions of people throughout the world are learning English as a foreign language (EFL). They need reliable dictionaries to support their efforts. There are now half a dozen reliable EFL dictionaries, and all of them draw heavily on corpus evidence to tease out, confirm, and refine statements about meanings of words in contemporary English.

A slightly sad story is the history of A. S. Hornby’s Idiomatic and Syntactic English Dictionary (ISED) (1942). This was a beautifully selective learning tool, designed to help learners acquire sufficient English to express themselves (rather than to understand unfamiliar words which they might encounter when reading or listening to broadcasts). This distinction between receptive skills and productive skills in language is still important in language teaching, but it is no longer observed in lexicography, no doubt because the distinction is fuzzy. Different learners need different words.

ISED is still available in Japan. Publication of this splendid work was taken over in 1948 by Oxford University Press, who re-published it as the (Oxford) Advanced Learner’s Dictionary. In the second edition (1963) thousands of entries were added, most of them being of questionable usefulness for learners seeking a tool to help them in productive use of the language. This flaw – if it is a flaw – has also been characteristic of subsequent editions. The sixth edition, published in 2000, benefited greatly from the British National Corpus, but no attempt was made in the selection of entries to return to Hornby’s original distinction between a dictionary as an aid for language learning as a productive skill and a dictionary for passive, receptive use.

Hornby also developed a theoretical framework for describing syntactic patterns of verb use. These patterns were rather abstract, but were applied systematically to every word in the dictionary and were very influential in English-language teaching until the 1990s, when they were replaced by rather simpler grammatical description, which conveyed much the same information.

Dictionaries of Collocations

An interesting corpus-driven innovation in the twenty-first century has been the emergence of a new lexicographic genre, namely dictionaries of collocations, which would not have been possible without corpora. Up to now, the approach taken by dictionaries of collocations has been pragmatic rather than theoretical. In other words, statistically significant word combinations are listed, but little or nothing is said about how such combinations affect meaning. This would appear to be a field wide open for further development.

How Have Dictionaries Made Use of Corpus Evidence?

Among other things, corpus evidence has encouraged lexicographers to leave out or label rare and obsolete senses, those senses for which no evidence was found in a corpus. They were judged to be merely confusing for learners of English, rather than helpful. Corpus evidence also helped EFL lexicographers to identify the most frequent meaning or meanings of each word and to place them first. This was judged to be particularly helpful in learners’ dictionaries. Research into dictionary use has shown that many learners do not read further in a dictionary entry than the first meaning.

A controversial feature of COBUILD – a feature that has been widely misunderstood by academic linguists, though sometimes praised by students – is its practice of stating a phraseological definiendum for each sense, as well as an explanation of the meaning. For example, the first sense of gallop is explained as follows:

When a horse gallops, it runs very fast so that all four legs are off the ground at the same time in each stride.

To those who object that zebras can also gallop, COBUILD would offer at least three answers:

1. In the English language, galloping is stereotypically associated with horses.
2. A learner who does not know the word gallop will get more benefit from a stereotypical explanation specifying horses as grammatical subject than from a lot of linguistic metalanguage.
3. Understanding meaning is a process governed by analogy rather than definition, so presenting a stereotypical phrase as the definiendum is more useful for learners than trying to write a definition that covers all possibilities.

There are, of course, other senses of gallop. Phraseology is a good guide to meaning, but is not always sufficient. For example, the sentence ‘He galloped into the hotel’ is ambiguous. Was he on horseback, or is this a metaphor meaning that he ran fast? In real usage, the wider context almost always disambiguates, but in a dictionary both interpretations must be recorded if there is sufficient evidence to show that they are both used. What a lexicographer should not do is to speculate about remote possibilities.

COBUILD’s practice of systematically embedding the definiendum in the explanation is not always successful. An entry that could be better is the adjective brief. This is explained as follows: ‘Something that is brief lasts for only a short time’. The problem here is the word ‘something’ in the definiendum. It seems to suggest that anything – absolutely anything – can be described as ‘brief’. A more thoughtful glance at a corpus could convince an alert lexicographer that this is not accurate. Typically, events and processes are brief, but physical objects and states of affairs are not. In this case, the wrong genus word was chosen for the definiendum.

A great deal more work is needed on the classification of semantic types as arguments of verbs and stereotypical phraseology before COBUILD’s lead can be safely followed by future dictionaries. Such work has already been started in a research project based at the University of Wolverhampton. This is the Pattern Dictionary of English Verbs (PDEV), which is held up by lack of funds rather than problems in the theoretical basis of the work. Results so far are freely available at http://pdev.org.uk. The aim of PDEV is to show how stereotypical meanings can be mapped onto stereotypical phraseology for each sense of each verb in the English language. If the British National Corpus is a reliable guide, there are fewer than 6,000 verbs in normal use in English. So far, PDEV has completed analysis of over 1,400 verbs, while another 400 are in progress at the time of writing. PDEV recognises that meanings are associated, not with words in isolation, but with phraseological patterns of word use. Verbs in particular depend for their meaning, not merely on syntactic distinctions such as transitive/intransitive – a distinction that was regularly made in pre-corpus dictionaries – but more importantly on collocations. To take a very simple example, executing a person and executing a will are both phrases containing execute as a transitive verb. But meaning, of course, is very different. Obviously, it depends on the semantic type of the direct object. But in real usage, problems start to arise. For example, does executing an order belong in the same meaning category as executing a will?

How Could Dictionaries Make Better Use of Corpus Evidence?

With the exception of COBUILD, current English dictionaries have tended to regard corpora as quarries for evidence to support preconceived lexical semantic distinctions, which were dreamed up in the pre-corpus era on the basis of introspection. Speculation on the basis of introspection needs to be replaced by empirical analysis of corpus data. Unfortunately, empirical analysis is both time-consuming and expensive, and in today’s world funding tends to go to projects that promise a ‘magic bullet’, often computer-driven, rather than painstaking analysis of the relationship between meaning and usage. Nevertheless, if we really want to understand how meaning works in English or any other language, painstaking analysis of this kind will be necessary.

In future dictionaries, explanations will, hopefully, aim to state stereotypical generalisations rather than necessary conditions for the ‘correct’ meaning of a term. This principle is based on theoretical work in the 1970s by Eleanor Rosch on prototype theory and by Hilary Putnam on stereotypes. They showed that human beings make and understand meanings by analogies with their shared knowledge of prototypes and stereotypes, rather than by defining words in terms of necessary conditions. It is the job of a synchronic dictionary to record stereotypes of phraseology as well as meaning. So far, dictionaries have focused on meaning, but have not done a good job on phraseology. Rosch and Putnam between them exploded the theoretical basis for definition by necessary conditions, but unfortunately the belief that necessary conditions govern meaning is hard to dislodge from popular expectation.

Meaningful use of language typically depends on the construction of clauses by speakers and writers. The pivotal word in each clause is normally a main verb. For this reason, corpus pattern analysis starts with verbs, aiming to use corpus evidence to construct stereotypical clauses and explain their meaning, which then serve as stereotypical examples, enabling the dictionary user to work out the meaning of words in context by commonsensical analogies. The nouns (heads of phrases) in a clause are known as arguments of a verb. It could be said that, in addition to contributing to the meaning of what is said, a verb organises relations among the nouns in a clause. Nouns, on the other hand, tend to organise the relationships between language and the world – or concepts in the world. The third major part-of-speech class consists of adjectives. Adjectives in English typically have two quite different functions: they can either be predicative, which means that they function like verbs, as in I am happy to see you, or they can be attributive, which means that they can subclassify nouns, as in a happy lexicographer. These two grammatical functions deserve separate treatment, which in most dictionaries they do not get.

These broad generalisations may be helpful in the corpus analysis of word meaning. To give an accurate account of the meaning of a verb, it is necessary to identify the roles played by nouns in patterns of argument structure. On the other hand, the collocational relationships around nouns may be much looser: co-occurrence of spider with web anywhere within the same sentence or paragraph should be sufficient to distinguish the meaning of the latter word from other stereotypical expressions such as a web of deceit or a web of intrigue.
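The looser, sentence-wide collocational test suggested for nouns can be sketched as follows in Python. The cue words and the two labels are hypothetical and taken only from the examples in the paragraph above; a real cue list would have to be derived from corpus analysis.

```python
# Hypothetical cue words for two uses of "web", based on the examples above.
CUES = {
    "spider's web": {"spider", "spiders", "spider's"},
    "web of deceit/intrigue": {"deceit", "intrigue"},
}


def sense_of_web(sentence):
    """Guess the use of 'web' from cue words found anywhere in the sentence."""
    # Lower-case each word and strip surrounding punctuation.
    words = {w.strip(".,;:!?'\"").lower() for w in sentence.split()}
    if "web" not in words:
        return None  # the target word does not occur
    for label, cues in CUES.items():
        if words & cues:  # any cue word anywhere in the sentence is enough
            return label
    return "undecided"


print(sense_of_web("The spider repaired its web overnight."))
print(sense_of_web("He was caught in a web of deceit."))
```

Note that word order and syntactic role play no part here, whereas an account of a verb would need to identify which noun fills which argument slot.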

Corpus lexicography requires (but all too often does not get) training not only in syntax and valency, but also in semantic types and collocational analysis. At the same time Saussure’s distinction between parole and langue is relevant. Corpus evidence consists of a number of different items of parole. The task of the lexicographer is to perceive the underlying patterns that constitute langue. In all of this, close attention to detail and alertness to the possibility of meaning change are essential. Some examples follow.

The word file seems to be changing its meaning, or rather acquiring additional meanings. Comparatively few uses today of this word as a verb involve placing papers in a filing cabinet. More often, they imply initiating a procedure, as in file a lawsuit, file a complaint, or file a flight plan. At the same time, the noun file has become far more common as a term in computing than a term in office procedure. A modern lexicographer, working on synchronic principles, has to decide which meaning of the noun is salient and therefore which to place first.
Cognitive salience and frequency (the latter might be termed ‘social salience’) are independent variables. For example, for most people the verb sweep conjures up an image of a person using a broom or some other device to remove dirt, leaves, or other unwanted stuff from a surface. This is the cognitively salient meaning, but in fact it is comparatively rare. Corpus evidence shows that this verb is more often used as a verb of movement, as in She swept into the room or A thunderstorm swept across East Anglia.
Corpus evidence can show how context affects meaning. For example, the verb treat with a human or animal direct object often denotes applying a medical procedure. However, if an adverbial of manner is added, the meaning changes to something much more general. The contrast is between sentences such as They treated her at the scene of the accident and sentences such as They treated her with respect.
Corpus evidence also reveals ongoing changes in meaning. The verb launch traditionally denotes putting a boat or ship into the water, and of course the word still has that meaning. However, it is more often used in modern English to denote sending a missile or satellite on its way through the air or into space. Even more commonly, it is used to denote commencement of some activity, as in launch an attack or launch a marketing campaign. The latter sense has been extended to the product being marketed. It is now standard to talk of launching a new product into the market – a far cry from a boat or a missile.
Sometimes the corpus can reveal unexpected facts. It is uncontroversial that a gleam denotes a ray of light, but if someone has a gleam in their eye, it is normally mischievous, malicious, or wicked, rather than happy.

Conclusion

The examples cited in this chapter are no more than the tip of the iceberg. In the days when lexicographers had nothing more to go on than their intuitions in order to make meaning distinctions, it was perhaps not surprising that dictionaries had so little to say about how one meaning of the word should be distinguished from another. Now we have massive phraseological evidence of how meaning distinctions are made, but in many cases lexicography is stuck in a time warp. What is needed is a massive programme of phraseological research showing how each meaning of each word is dependent on sets of prototypical phraseology and collocations. If lexicography can do this, it will provide a new, lexically based impetus for the study of language.

Prototype theory, inherited from Eleanor Rosch (1975), and stereotype theory – the philosophical equivalent developed at around the same time by Hilary Putnam – have an important role to play as organising principles for future corpus-driven lexical analysis. To understand their importance for dictionary-making we must go back over a hundred years, to Saussure’s 1916 distinction between parole and langue. Every individual use of a word, including uses recorded in corpora, is an event in parole. But taken together, a collection of uses (citations) can demonstrate the existence of an underlying pattern, which is part of langue, the linguistic heritage shared by all members of a speech community. Future dictionaries must record these patterns and distinguish them from what J. R. Firth (1950) called ‘the general mush of goings-on’.

Chapter 19 Natural Language Processing in Lexicography

Natural language processing and computational linguistics are two closely related fields that aim to build systems that can automatically carry out tasks related to text, such as syntactic parsing, question answering, and translation. In this chapter, we will not distinguish between these fields, and will refer to them both as natural language processing.

Natural language processing and lexicography are intertwined. Dictionaries are essential elements of many natural language processing systems, and natural language processing is crucial to the practice of modern lexicography. As an example of the former, many systems for sentiment analysis – predicting whether a text expresses a positive or negative sentiment – rely on specialised dictionaries that indicate, for each word, the degree to which it is typically used positively or negatively. The focus of this chapter is the latter – the role of natural language processing in lexicography, and how this might change in the future. A very brief history of the two fields is important for understanding their relationship to one another.

Focusing specifically on (monolingual) English lexicography, there are two particularly important developments with respect to lexicographical evidence and how it is used. The first is the COBUILD project of the 1980s, which can be viewed as the beginning of modern lexicography. This project was the first in which the corpus was the primary source of lexicographical evidence. The corpus replaced manually collected citations, which had been the primary source of evidence since this practice was introduced by Samuel Johnson in 1755 in A Dictionary of the English Language. Corpora allowed lexicographers to easily retrieve a random sample of usages of a target word, which was a tremendous advantage in that it allowed lexicographers to examine the usage of a word without the bias introduced through the process of selecting citations. Nevertheless, the volume of data available – even in the corpora of the 1980s, which are relatively small by today’s standards – presented a challenge to manual inspection. This led to a second major development: the application of statistical methods to summarise corpus evidence. In 1990, Church and Hanks – themselves a computational linguist and lexicographer respectively – proposed the use of pointwise mutual information to measure the strength of association between words. These word association measures provided summaries of the volume of data in a corpus, and enabled collocations, prototypical word combinations, and word senses to be quickly identified. The application of statistical methods in lexicography subsequently led to the development of WordSketches and the Sketch Engine, which has become a widely used lexicographic tool.
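For readers who want to see the calculation, pointwise mutual information compares how often two words occur together with how often they would be expected to co-occur if they were independent: PMI(x, y) = log2(P(x, y) / (P(x) P(y))). The Python sketch below is a simplification that counts only adjacent word pairs, whereas word association measures are commonly computed over a wider window; the function name pmi_scores and the toy text are illustrative assumptions.

```python
import math
from collections import Counter


def pmi_scores(tokens, min_count=1):
    """Return PMI for each adjacent word pair (x, y) seen at least min_count times.

    PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ), with probabilities estimated
    from raw counts. Only adjacent pairs are counted (a simplification).
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_pairs = max(n - 1, 1)  # number of adjacent pairs in the text
    scores = {}
    for (x, y), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_pairs
        p_x = unigrams[x] / n
        p_y = unigrams[y] / n
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Invented toy text; real use would require a large corpus.
toy = "strong tea and strong coffee and weak tea".split()
for pair, score in sorted(pmi_scores(toy).items(), key=lambda item: -item[1]):
    print(pair, round(score, 2))
```

On very small counts PMI overrates rare pairs, which is why a minimum frequency threshold such as min_count is usually applied in practice.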

Turning to natural language processing, this field underwent a so-called ‘statistical revolution’ in the late 1980s to early 1990s. Leading up to this, systems typically relied on hand-crafted rules that needed to be carefully engineered for a particular task and domain. The resulting systems tended to be brittle, in that they would fail when applied outside their narrow intended scope. The corpus-based statistical methods that took over the field in this revolution led to more robust systems. One particularly important development in natural language processing at the heart of the statistical revolution was again the work of Church and Hanks, on measures of strength of association, which is in fact still a key component of some contemporary natural language processing systems. This statistical revolution led to the widespread use of machine learning techniques in natural language processing throughout the 1990s, and continues today.

With the use of statistical methods established in lexicography, and natural language processing developing (primarily statistical) methods for automatically inferring, or ‘learning’, lexical knowledge from corpora, it has become increasingly possible to automate certain aspects of lexicography. This has brought about two main benefits in lexicography: efficiency and, perhaps surprisingly, quality. With aspects of tasks such as building headword lists, identifying collocations, and selecting example sentences being automated, these tasks can be accomplished more quickly, with lexicographers freed from some drudgery to concentrate on tasks that cannot (yet) be automated. With respect to quality, automation introduces a greater degree of systematicity. The output of automated corpus analysis methods supports lexicographers in making decisions that were previously based on intuition.

This chapter discusses the role of natural language processing in lexicography. Corpora are the foundation of modern lexicography, and natural language processing has played an important role in the construction of corpora, particularly web corpora. However, other contributions to this volume focus on web corpus construction, and so it will not be addressed in this chapter (see chapters by Hargraves and Hanks). We will start by considering how natural language processing is applied to pre-process corpora to add morpho-syntactic knowledge that supports lexicographic analysis. We will then examine statistical methods, specifically pointwise mutual information, for identifying collocations in corpora. These methods are the foundation for approaches to automatically constructing thesauri from corpora, which we will consider next. One key task in lexicography is identifying the various senses of lemmas. We will discuss the natural language processing tasks of word sense disambiguation and induction, and how they relate to this. We will then examine applications of natural language processing to select dictionary example sentences, and very recent methods for automatically generating definitions. We will then discuss specialised types of dictionaries that can already be automatically produced, and consider whether it might be possible in the future to fully automate the construction of dictionaries.

The statistical revolution was an important development in natural language processing that has enabled the automation of many tasks related to lexicography. Natural language processing has since undergone another revolution, with neural network-based methods having taken over the field in recent years, leading to widespread advances. Throughout this chapter we will also discuss ways in which these advances could benefit lexicography.

Corpus Pre-processing

A corpus query system allows a user to search a corpus for usages of a given term. Such systems typically allow queries for a specific word form, but also more nuanced queries, such as for a particular lemma (e.g. cat in either its plural or singular form), or queries restricted to a specific part of speech (e.g. dog used as a verb). Lemmatisation – determining the lemma corresponding to a particular word form – and part-of-speech tagging – determining the part of speech, or syntactic category, for each word token in a text – are therefore important problems in natural language processing for supporting such queries. Parsing – automatically determining the syntactic structure of sentences – is another important task for allowing queries that incorporate syntactic information, and for supporting analysis of collocations and building distributional thesauri, which are discussed below.

English has relatively simple morphology. Lemmatisation is therefore rather straightforward, and can be accomplished by rule-based approaches. Part-of-speech tagging and parsing, on the other hand, rely on more sophisticated statistical, or more recently, neural network-based methods. Approaches to these tasks typically use supervised machine learning. The high-level idea behind these approaches is that rather than attempt to write a set of rules describing how to determine the correct part of speech for a word in context, or the syntactic structure of a sentence, a system can automatically ‘learn’ how to carry out these tasks. In order to do so, training data is required. In the case of part-of-speech tagging, this is a corpus in which each word token has been annotated by a human with its corresponding part of speech. For parsing, a corpus in which each sentence is accompanied by its corresponding syntactic analysis (e.g. parse tree) is required. Such manually annotated corpora are expensive to create due to the human labour required, and are typically much smaller than the corpora used for lexicographical analysis. An algorithm is then applied to this training data to learn a model (i.e. for part-of-speech tagging or parsing) that can then be applied to new texts.

To see how this learning can work, we will consider part-of-speech tagging using a hidden Markov model, a fairly conventional approach for this task. Using the training corpus, which includes the part of speech of each word token, we can estimate the probability of (a word tagged with) one part of speech following another – for example, the probability of a noun following a determiner, or a verb following a preposition. We can also estimate the probability of a particular word occurring given a particular part of speech – for example, the probability of the word ‘the’ given that the part of speech is a determiner. Once these probabilities have been learned from the training corpus, we can use them to infer the parts of speech for new texts using the Viterbi algorithm. This example is only intended to give a high-level sense of how supervised machine learning operates. Part-of-speech taggers often incorporate further sources of knowledge, such as additional contextual information, and can use discriminative, as opposed to generative (as in the case of a hidden Markov model), approaches.
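The hidden Markov model approach described above can be sketched in a few lines of Python. This is a minimal illustration, not a production tagger: the toy training corpus, the tag names, the `<s>` start symbol, and the add-one smoothing are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

# Toy hand-tagged training corpus (invented for illustration).
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
    [("dogs", "NOUN"), ("bark", "VERB")],
]

def train_hmm(tagged_sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    transitions = defaultdict(Counter)  # previous tag -> counts of next tag
    emissions = defaultdict(Counter)    # tag -> counts of words
    for sentence in tagged_sentences:
        previous = "<s>"  # assumed sentence-start symbol
        for word, tag in sentence:
            transitions[previous][tag] += 1
            emissions[tag][word] += 1
            previous = tag
    return transitions, emissions

def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability (smoothing is an assumption here)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))

def viterbi(words, transitions, emissions):
    """Return the most probable tag sequence for a non-empty list of words."""
    tags = sorted(emissions)
    vocab_size = len({w for c in emissions.values() for w in c}) + 1  # +1 for unseen words
    # best[i][tag] = (log score of the best path ending in tag, previous tag)
    best = [{}]
    for tag in tags:
        score = (log_prob(transitions["<s>"], tag, len(tags))
                 + log_prob(emissions[tag], words[0], vocab_size))
        best[0][tag] = (score, None)
    for i in range(1, len(words)):
        best.append({})
        for tag in tags:
            emit = log_prob(emissions[tag], words[i], vocab_size)
            best[i][tag] = max(
                (best[i - 1][p][0] + log_prob(transitions[p], tag, len(tags)) + emit, p)
                for p in tags
            )
    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]

transitions, emissions = train_hmm(TRAIN)
tagged = viterbi(["the", "dog", "sleeps"], transitions, emissions)
```

With this toy corpus, `tagged` is `["DET", "NOUN", "VERB"]`. Working in log probabilities avoids numerical underflow on longer sentences.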

In the case of parsing, although there is a rich tradition of work in natural language processing on parsing with context-free grammars, dependency parsing is perhaps more widely used. A dependency parse directly represents the grammatical relations that hold between pairs of words – which is particularly important information for lexicographical analysis, in particular for identifying collocations – as opposed to explicitly representing constituency. Systems for dependency parsing can be very fast, particularly transition-based dependency parsers, and can therefore be easily applied to large corpora.

Collocations

Because of the very large size of modern corpora, methods for summarising the information in them are valuable. One common approach to accomplish this is to use measures of lexical association to identify frequent word combinations. This information can be helpful for lexicographers in tasks such as identifying collocations, multiword expressions, and word senses.

Pointwise mutual information (PMI) forms the basis for an early, and still widely used, approach to measuring the strength of association between words. It is defined as follows:

$$PMI (w_{1}, w_{2}) = \log_{2} \frac{P (w_{1}, w_{2})}{P (w_{1}) P (w_{2})}$$

where $w_{1}$ and $w_{2}$ are two words, $P (w_{1}, w_{2})$ is the probability of $w_{1}$ and $w_{2}$ co-occurring, and $P (w_{1})$ and $P (w_{2})$ are the probabilities of $w_{1}$ and $w_{2}$, respectively. These probabilities can be easily estimated by counting the number of times the item of interest (i.e. $w_{1}$, $w_{2}$, or the co-occurrence of $w_{1}$ and $w_{2}$) occurs, and then dividing by the number of words in the corpus. PMI identifies word combinations that co-occur more often than would be expected by chance. $PMI (w_{1}, w_{2})$ is high when $w_{1}$ and $w_{2}$ frequently co-occur, but the frequencies of $w_{1}$ and $w_{2}$ are relatively low.
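As a rough illustration of the estimation procedure just described, the following sketch computes PMI for adjacent word pairs. Defining co-occurrence as adjacency, the function name `pmi_scores`, and the toy sentence are assumptions made for the example.

```python
import math
from collections import Counter

def pmi_scores(tokens):
    """PMI for each adjacent word pair, estimating probabilities as count / N."""
    n = len(tokens)
    word_counts = Counter(tokens)
    pair_counts = Counter(zip(tokens, tokens[1:]))  # co-occurrence = adjacency
    scores = {}
    for (w1, w2), count in pair_counts.items():
        p_pair = count / n
        p_w1 = word_counts[w1] / n
        p_w2 = word_counts[w2] / n
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores

tokens = "new york is a big city and new york is a new city".split()
scores = pmi_scores(tokens)
# ("new", "york") occurs twice and ("new", "city") once, so the former
# receives the higher score even though both share the word "new".
```

In practice, very rare pairs tend to receive inflated PMI scores, so a minimum frequency threshold is commonly applied before ranking.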

Although we defined PMI for two words, it is commonly applied to words, lemmas, or the combination of a lemma and a particular part of speech. Various definitions of co-occurrence can also be used: for example, adjacent words, words occurring within a window of n words or within a sentence, or words occurring in a grammatical relation. A list of items ranked by strength of association is often examined manually. In this case, part-of-speech filters can be used to focus on particular constructions, such as noun compounds.

Pointwise mutual information is just one of a family of lexical association measures, which also includes the log-likelihood ratio, the t-score, and the Dice coefficient. A variant of the latter is used in the WordSketches provided by the Sketch Engine.

Distributional Thesauri

By examining the contexts in which a word occurs, one can typically determine its meanings. Moreover, words with similar meanings tend to occur in similar contexts. Such observations led to the development of methods in natural language processing for automatically forming representations of the meanings of words based on the contexts in which they occur.

To do this, given a corpus, we form a word–word co-occurrence matrix of size VxV, where V is the number of types (i.e. distinct word forms) in the corpus. The rows of the matrix represent ‘target’ words; the columns represent ‘context’ words. Each cell i,j in the matrix (i.e. the cell for row i and column j) represents the number of times context word j co-occurs with (i.e. occurs in the context of) target word i. Each row in this matrix is then a vector of length V, which is a distributional representation of the corresponding target word.

There is, however, considerable variation in how such a word–word co-occurrence matrix can be constructed. As for the case of lexical association measures, the definition of context can vary. Co-occurrence can be defined in terms of a fixed-size window of words, such as two words to the left and right of the target word. It can also instead be restricted to co-occurrence within grammatical relations. The word–word co-occurrence matrix need not in fact be constructed for word forms, but can instead be based on lemmas, or the combination of a lemma and a part-of-speech tag. Moreover, not all words (or lemmas, as the case may be) are necessarily used as context words. For example, high frequency function words – the, of, a – referred to as stop words, frequently co-occur with many words. Co-occurrence with stop words therefore provides little information, and stop words are therefore often excluded as context words, as are low-frequency context words. (As such, the co-occurrence matrix is not necessarily VxV.) Finally, strength of association can be more informative than co-occurrence frequency in distributional representations. A measure such as pointwise mutual information is therefore often used instead of co-occurrence frequency in the cells of the matrix.
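The window-based construction with stop-word filtering described above might look like the following sketch. The matrix is stored sparsely as one `Counter` per target word. The stop-word list, the default window size, and the example sentences are illustrative assumptions.

```python
from collections import Counter, defaultdict

STOP_WORDS = {"the", "of", "a"}  # illustrative list only

def cooccurrence_vectors(sentences, window=2, stop_words=STOP_WORDS):
    """Map each target word to a Counter of context-word counts.

    Context is a fixed-size window of `window` tokens on each side of the
    target; stop words are excluded as context words (but not as targets).
    """
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, target in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i and tokens[j] not in stop_words:
                    vectors[target][tokens[j]] += 1
    return vectors

sentences = [
    ["the", "dog", "chased", "the", "cat"],
    ["the", "cat", "chased", "a", "mouse"],
]
vectors = cooccurrence_vectors(sentences)
# vectors["chased"] counts "cat" twice and "dog" and "mouse" once each.
```

Replacing the raw counts with an association score such as PMI would be a further step on top of this structure.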

Distributional similarity can be computed between two words by measuring the similarity of the vectors representing those words. A range of measures can be applied, with cosine similarity being particularly widely used. Cosine similarity corresponds to the cosine of the angle between two vectors. It is 1 when the vectors point in the same direction (i.e. two words that have identical co-occurrence vectors) and 0 when the vectors are orthogonal (i.e. two words that do not co-occur with any common words). A distributional thesaurus can then be easily computed by finding the n most distributionally similar words for each target word.
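A minimal sketch of this computation follows, using sparse vectors stored as dictionaries. The vectors and their counts are invented for illustration, and the function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def thesaurus(vectors, n=2):
    """For each target word, list its n most distributionally similar words."""
    entries = {}
    for target, vec in vectors.items():
        ranked = sorted(
            ((cosine(vec, other), word)
             for word, other in vectors.items() if word != target),
            reverse=True,
        )
        entries[target] = [word for _, word in ranked[:n]]
    return entries

# Invented co-occurrence counts for four target words.
VECTORS = {
    "bicycle": {"ride": 4, "wheel": 3, "pedal": 2},
    "bike": {"ride": 5, "wheel": 2, "pedal": 3},
    "tea": {"drink": 4, "cup": 3, "hot": 2},
    "coffee": {"drink": 5, "cup": 2, "hot": 3},
}
nearest = thesaurus(VECTORS, n=1)
```

Here `nearest["bicycle"]` is `["bike"]` and `nearest["tea"]` is `["coffee"]`, while the cosine between `bicycle` and `tea` is 0 because they share no context words.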

These vectors representing words have length roughly V (i.e. the number of word types in the corpus, which is typically in the tens or hundreds of thousands) and are typically very sparse (i.e. many values are 0) because many pairs of words never co-occur. Moreover, near synonymy amongst context words is not accounted for. For example, if one target word occurs with bicycle, and another target word occurs with bike, this would not contribute to the similarity of these target words. To address these limitations, approaches to forming dense vector representations – where words are represented by much shorter vectors, typically with only hundreds of dimensions – have been considered. Recently, approaches based on neural networks have been developed to learn such word representations, commonly referred to as ‘word embeddings’. One widely used approach is the word2vec skipgram model. In contrast to standard models of distributional similarity, which are based on frequency counts, this model is based on prediction – a vector representing the target word is used to predict each of the co-occurring context words. Each token in the corpus is considered in turn as the target word, and the results of the predictions are used to refine the word representations. Word embeddings obtained through these models often outperform traditional distributional representations, for example, when correlating the word similarities predicted by the model with human judgments of word similarity.

Word Sense Disambiguation and Induction

By identifying collocations for a target lemma, lexical association measures can help lexicographers identify word senses. There is also, however, a large body of research in natural language processing on automatically identifying word senses.

Word sense disambiguation is the task of automatically selecting the appropriate sense from a predefined set of senses – such as those in a dictionary – for an instance of a target word in context. Word sense disambiguation could have numerous applications in lexicography and dictionaries, for example, allowing lexicographers to issue sense-specific searches through a corpus query system, and enabling dictionary users to look up information on the usage of a word in a particular context and be directed to sense-specific information. One common approach to word sense disambiguation applies supervised machine learning. In this setup, manually annotated training data is required, which in this case is (preferably a large number of) usages of a target lemma along with their sense, provided by a human, based on a predetermined sense inventory. Each instance of the target word can be represented as a vector based on the context in which it occurs; this is not unlike the case of forming vector representations of word types discussed with respect to distributional similarity, but in this case only one instance is available from which to form the representation. A classifier (e.g. a support vector machine) can then be trained on these representations and their manually labelled senses, which can then be applied to predict the sense for new unlabelled instances.
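To make the supervised setup concrete, here is a small sketch in which each instance is represented by the words in its context. For brevity it substitutes a naive Bayes classifier for the support vector machine mentioned above. The sense labels, training sentences, and function names are invented for illustration.

```python
import math
from collections import Counter, defaultdict

# Toy sense-annotated usages of "bank" (labels and sentences are invented).
TRAIN = [
    ("he sat on the bank of the river fishing", "bank/river"),
    ("the river bank was muddy after the flood", "bank/river"),
    ("she deposited money at the bank", "bank/finance"),
    ("the bank approved the loan and the account", "bank/finance"),
]

def train(examples):
    """Collect per-sense word counts from labelled usages."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    for text, sense in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return sense_counts, word_counts, vocab

def disambiguate(text, model):
    """Return the sense with the highest naive Bayes score for this context."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense in sorted(sense_counts):
        score = math.log(sense_counts[sense] / total)
        denom = sum(word_counts[sense].values()) + len(vocab)
        for word in text.split():
            if word in vocab:  # ignore words never seen in training
                score += math.log((word_counts[sense][word] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

model = train(TRAIN)
sense = disambiguate("they walked along the bank of the river", model)
```

With this toy data, `sense` is `"bank/river"`. A realistic system would need far more annotated usages per sense, which is the main cost of the supervised approach.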

Word sense disambiguation relies on a sense inventory being available. Word sense induction is a related task that aims to automatically determine the various senses of a word without relying on a pre-existing sense inventory. In word sense induction, a system automatically clusters (i.e. groups together) instances of a target lemma based on their sense, such that all instances with the same sense are in the same cluster, and each cluster contains only instances with the same sense. This is similar to the analysis lexicographers carry out when identifying the various senses of a word. In contrast to the supervised approaches that are commonly applied for word sense disambiguation, word sense induction systems tend to use unsupervised learning, including various clustering methods, and Bayesian approaches based on topic modelling (e.g. latent Dirichlet allocation).
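The clustering idea can be illustrated with a deliberately simple sketch: a greedy, single-pass clustering of instances by the overlap of their context words. This is not any particular published system. The Jaccard measure, the threshold, the stop-word list, and the example sentences are all assumptions.

```python
STOP_WORDS = {"the", "a", "of", "in", "from", "was"}  # illustrative list only

def jaccard(a, b):
    """Overlap between two sets of context words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def induce_senses(instances, target, threshold=0.2):
    """Greedily group instances whose contexts overlap; no sense inventory is used."""
    clusters = []  # each cluster: {"context": set of words, "members": [indices]}
    for index, text in enumerate(instances):
        context = {w for w in text.split()
                   if w != target and w not in STOP_WORDS}
        best = max(clusters,
                   key=lambda c: jaccard(context, c["context"]),
                   default=None)
        if best is not None and jaccard(context, best["context"]) >= threshold:
            best["members"].append(index)
            best["context"] |= context
        else:
            clusters.append({"context": set(context), "members": [index]})
    return [c["members"] for c in clusters]

instances = [
    "the river bank was muddy",
    "money in the bank account",
    "fishing from the muddy river bank",
    "the bank account held little money",
]
groups = induce_senses(instances, "bank")
```

Here `groups` is `[[0, 2], [1, 3]]`, separating the riverside usages from the financial ones. The number of clusters is not fixed in advance, which mirrors the situation of a lexicographer who does not know beforehand how many senses a word has.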

Although word sense induction could potentially automate a key aspect of lexicography, it has not yet been widely applied for this, in part because the quality of the sense clustering is not yet sufficient. One alternative that has been considered, although only to a limited degree, is to apply a word sense induction system to produce a preliminary sense grouping, which could then be manually refined by lexicographers. Nevertheless, word sense induction has been successfully applied to help lexicographers identify new word senses.

Word senses are not discrete. The boundaries between senses are such that multiple word senses can apply, to varying degrees, to a given corpus instance. There is substantial variation amongst dictionaries in terms of sense granularity – lumping usages together into coarser-grained senses, versus splitting them into finer-grained ones. (Word senses can even be argued to not exist at all!) These issues are a part of why word sense disambiguation and induction are such challenging problems for natural language processing. Corpus pattern analysis groups usages of a target word based on its regular patterns of usage, as opposed to the much more abstract notion of word sense. Automatically identifying and distinguishing between corpus patterns might therefore be less difficult problems for natural language processing.

Examples

Dictionaries often include real or lightly edited corpus examples of words to illustrate their usage. However, not all corpus instances make for good dictionary examples – particularly for language learners – and identifying such examples can be a time-consuming task. Nevertheless, good example sentences typically have some common properties. For example, they avoid words that learners are unlikely to know and anaphora that would require additional context beyond the sentence to understand. Moreover, examples ideally contain common patterns of usage, for example, showing typical collocations for a target lemma. These common properties of good dictionary examples can be leveraged to automatically identify them. The GDEX system scores sentences based on a variety of factors that could indicate that they are good examples. These factors include sentence length and word frequency (as proxies for difficulty of interpretation for learners), the number of anaphors (to avoid sentences that likely require broader context to interpret), and whether the target lemma occurs in the main clause of the sentence. Candidate examples, sorted by their scores as computed by GDEX, can then be presented to a lexicographer to speed up the process of selecting examples to be included in a dictionary.
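The following sketch shows how a handful of such factors could be combined into a single score. It is written in the spirit of GDEX but is not the GDEX implementation: the penalty weights, the preferred length range, the anaphor list, and the tiny set of 'common' words (standing in for a real frequency list) are all assumptions. The main-clause factor is omitted because it would require a parser.

```python
# Stand-in for a frequency list of words a learner is likely to know.
COMMON_WORDS = {"we", "went", "for", "a", "walk", "along", "the", "beach",
                "before", "breakfast", "it", "was", "he", "them"}
ANAPHORS = {"he", "she", "it", "they", "him", "her", "them", "this", "that"}

def example_score(sentence, target, common_words=COMMON_WORDS):
    """Score a candidate example between 0 and 1 (all weights are assumptions)."""
    tokens = [t.strip(".,;:!?") for t in sentence.lower().split()]
    if target not in tokens:
        return 0.0
    # Penalise sentences outside an assumed preferred length of 6 to 15 tokens.
    if len(tokens) < 6:
        length_penalty = 0.1 * (6 - len(tokens))
    elif len(tokens) > 15:
        length_penalty = 0.1 * (len(tokens) - 15)
    else:
        length_penalty = 0.0
    # Penalise words (other than the target) that are not in the common list.
    rare = sum(1 for t in tokens if t != target and t not in common_words)
    rare_penalty = 0.5 * rare / len(tokens)
    # Penalise anaphors, which may need context beyond the sentence.
    anaphor_penalty = 0.2 * sum(1 for t in tokens if t in ANAPHORS)
    return max(0.0, 1.0 - length_penalty - rare_penalty - anaphor_penalty)

def rank_examples(sentences, target):
    """Sort candidate sentences from best to worst."""
    return sorted(sentences, key=lambda s: example_score(s, target), reverse=True)

candidates = [
    "It was brisk.",
    "Notwithstanding the brisk perturbations, he obfuscated them.",
    "We went for a brisk walk along the beach before breakfast.",
]
ranked = rank_examples(candidates, "brisk")
```

The sentence about the walk along the beach is ranked first: it has a suitable length, contains only common words, and has no anaphors. The ranked list is intended as input to a lexicographer's judgement, not as a replacement for it.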

Definitions

Modern lexicography can be viewed as a two-step process. In the first step, ‘analysis’, lexicographers analyse corpus data to determine relevant facts about language, which are then entered into a lexical database. In the second step, ‘synthesis’, lexicographers use the contents of the database to produce a dictionary. Much work in natural language processing related to (semi-)automating lexicography focuses on analysis; here we consider the automatic creation of definitions, one aspect of synthesis.

Some sentences are themselves definitions. Although such sentences are typically rare, in a sufficiently large corpus, there could be many of them. Some work in natural language processing has therefore focused on automatically extracting definitions from corpora. ‘Hearst patterns’ are lexico-syntactic patterns that indicate hyponymy (‘is a’) relationships; for example NP, NP, and other NP as in apples, bananas, and other fruits, indicates that apples and bananas are both kinds of fruit. By searching for such patterns in a large corpus, hyponyms can be automatically discovered. This idea can also be applied to find definitions by using patterns such as X is defined as Y. More sophisticated approaches based on supervised machine learning have also been proposed to distinguish between definitional and non-definitional sentences. However, one limitation to any approach to finding definitions is that definitional sentences simply might not exist for many words in a corpus.
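Both kinds of pattern can be approximated with regular expressions over raw text, as in the sketch below. It treats each noun phrase as a single word, which is a strong simplification. The exact expressions and function names are assumptions made for illustration.

```python
import re

# "NP, NP, and other NP", with each NP approximated by a single word.
AND_OTHER = re.compile(r"\b((?:\w+, )*\w+),? and other (\w+)")
# "X is defined as Y", capturing up to the end of the sentence.
DEFINED_AS = re.compile(r"(\w[\w ]*?) is defined as ([^.]+)")

def extract_hyponyms(text):
    """Return (hyponym, hypernym) pairs found by the 'and other' pattern."""
    pairs = []
    for match in AND_OTHER.finditer(text):
        hypernym = match.group(2)
        for hyponym in match.group(1).split(", "):
            pairs.append((hyponym, hypernym))
    return pairs

def extract_definitions(text):
    """Return (term, definition) pairs found by the 'is defined as' pattern."""
    return [(m.group(1), m.group(2)) for m in DEFINED_AS.finditer(text)]

pairs = extract_hyponyms("Markets sell apples, bananas, and other fruits.")
definitions = extract_definitions(
    "A lemma is defined as the canonical form of a word."
)
```

Here `pairs` is `[("apples", "fruits"), ("bananas", "fruits")]`. Applied to a part-of-speech-tagged or parsed corpus, the same patterns could match full noun phrases instead of single words.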

Very recently, neural network approaches to automatically generating definitions – specifically, encoder-decoder models – have been proposed. In an encoder-decoder model, an encoder reads the input and encodes it as a fixed-length vector; the decoder then uses this representation to decode the output. This model can be applied to many problems. For example, in machine translation, a source language sentence is encoded into a vector (e.g. using a recurrent neural network), and the decoder generates a target language sentence from this vector (again using a recurrent neural network). In image captioning, an image is encoded as a vector (e.g. using a convolutional neural network), and a caption can then be generated from this vector (once again using a recurrent neural network). To apply the encoder-decoder model to generate definitions, the encoder represents the meanings of words as vectors via word embeddings; the decoder then generates definitions from these representations using a recurrent neural network. Training the model requires a corpus to learn word embeddings from, and a set of words and their definitions.

Automatically-Constructed Dictionaries

One interesting possibility at the intersection of natural language processing and lexicography is whether it will be possible to fully automate dictionary construction. For certain types of specialised dictionaries and lexicons, this has already been done. For example, sentiment lexicons indicating the degree to which words are typically used positively or negatively can be automatically derived, in part by using lexical association measures to identify words that are strongly associated with known positive and negative words. Normalisation lexicons, which associate non-standard word forms (e.g. tmrw) – which are commonly found in social media text – with their standard forms (e.g. tomorrow) are an important component of some natural language processing systems, and can be automatically built using methods that incorporate distributional similarity – non-standard word forms and their corresponding standard forms tend to occur in similar contexts. Texts that provide location metadata can be used to identify regionalisms, which could form the basis for a dictionary of regionalisms. Frequency dictionaries can be largely automatically derived from corpora, as can example dictionaries (dictionaries that only provide examples for each headword) by applying methods such as GDEX.

Here, we will consider whether the various natural language processing methods discussed so far could be used to automatically construct a more conventional dictionary. Starting from a corpus, there would be no need to choose a list of headwords; in a fully automatically produced dictionary, an entry could potentially be derived for any lemma that occurs sufficiently frequently. Corpus pre-processing methods including lemmatisation, part-of-speech tagging, and parsing could be applied to the corpus. With this morpho-syntactic information, methods for finding multiword expressions, building from measures of strength of association, could be used to identify multiword headwords. A word sense induction system could be run to cluster the usages of each lemma (possibly including multiword expressions) according to sense. From this clustering, it would be possible to automatically identify collocations and prototypical patterns of usage at the level of word senses. Furthermore, methods such as GDEX could be applied to extract example sentences, again at the word sense level. Methods for automatically generating definitions could potentially be extended to provide definitions for word senses. Distributional similarity could be applied to identify sense-specific synonyms, and corpus metadata could be used to infer usage and regional labels. The result would be something resembling a conventional dictionary.

Most of the technology to do this already exists, so it does not seem unreasonable to think that automatically generated dictionaries could be a future possibility. However, word sense induction remains a very challenging problem in natural language processing, and techniques for generating definitions have only very recently been proposed. High-quality methods for fulfilling these tasks would seem to be necessary for this to succeed. Another consideration is that even if it were possible to generate a dictionary automatically, would such a dictionary be useful to anybody? The needs of the intended user must be taken into account when a dictionary is created. User needs would therefore also have to be considered in building a system for automatically generating dictionaries. For example, a word sense induction system would need to produce not just a sense clustering, but one that makes sense distinctions that are appropriate for the intended user. Similar considerations would apply to definition generation.

Many dictionaries already exist, and keeping dictionaries up-to-date is an ongoing and expensive task. Although the possibility of automatically creating dictionaries is interesting, aside from the case of very specialised types of dictionaries, methods for automatically or semi-automatically updating dictionaries appear to have more immediate practical value. Another future possibility is therefore to use automated methods to produce dictionary updates, which could then be examined and revised by lexicographers, and optionally presented to users in the interim, before manual review is possible.

Conclusion

The application of natural language processing has enabled many lexicographical tasks to be automated to varying degrees. This can speed up the process of writing and updating dictionaries, and can also improve quality through increased systematicity. This chapter has presented an overview of natural language processing methods that are currently used in lexicography for tasks including pre-processing corpora, identifying collocations, creating distributional thesauri, and extracting good dictionary examples. We also considered word sense disambiguation and induction, which are challenging problems in natural language processing, but in which advances could lead to further automation of key tasks in lexicography. Finally, we considered very recent work on automatically generating definitions, and discussed how, in the future, these various technologies could potentially be put together to create dictionaries entirely automatically.
