31.1 Prologue: On Visionaries and Their Visions
I initially finished this contribution on the day Sue Atkins passed away. Sue Atkins was not only a great visionary in the field of lexicography; she also changed the lives of those who came into her orbit. More than that, she made them work so hard that her futuristic insights could be transformed into actual working tools and products. This achievement should not be taken lightly. People who are ahead of their time often work in isolation or are simply not understood by their contemporaries. Who would have thought that the obscure mathematics of the theoretical physicist Paul Dirac, which had been gathering dust for decades, would be rediscovered and put to good use for detecting and correcting errors in digital computing devices (Kaye et al. 2007)? Think: Scratch a CD-ROM and it can still be read. Listen to Ludwig van Beethoven’s Hammerklavier or his Razumovsky string quartets, knowing that this type of music utterly confused his audience 200 years ago, with musicians then finding it unplayable; yet listen to it today and find it most enjoyable. Also remember that Beethoven knew he was not bringing notes together for his contemporaries, claiming: “Oh, they are not for you, but for a later age” (McCallum 2020). Bring in Sue Atkins and realize that her most provocative concept – the “virtual dictionary” (Atkins 1996), one that exists only at the time of access (de Schryver 2003, 162–163) – is simply not for mere mortals to comprehend.
As Atkins (1992a, 521) rightly warned, if our future dictionaries “do not result in a product light years away from the printed dictionary, then we are evading the responsibilities of our profession.” She spoke those words at a EURALEX conference over three decades ago; since then, in my view, we have indeed been asleep at the wheel.
Yet in her orbit, Sue Atkins worked tirelessly to bring the best linguistic theories to lexicography, notably the application of Frame Semantics to corpus analysis with the linguist Charles J. Fillmore, which resulted in a string of highly influential theoretical papers (Atkins et al. 2003a; Fillmore and Atkins 1992, 1994, 1998, and 2000). Her collaboration with the corpus linguist Adam Kilgarriff eventually resulted in the creation of the Sketch Engine, “the de facto standard software for developing dictionaries and for corpus linguistic research” (Rundell 2015, 460). And then we have the binary star system, two of the UK’s practical lexicography stars orbiting around some elusive barycentre, performing an elegant tango: Sue Atkins on the one hand and Patrick Hanks on the other. Through their endless teasing and nudging of one another, they moved the field forward with their talks and writings.
The “real story” of the first COBUILD dictionary, for instance, when both Atkins and Hanks were involved in the revolutionary dictionary project of Sue’s brother John McH. Sinclair, remains to be written. To this date, COBUILD 1 (Sinclair 1987), for which Hanks was the managing editor, remains one of Hanks’ major contributions to practical lexicography. The probability measures proposed in Hanks’ most influential paper, “Word association norms, mutual information, and lexicography” (Church and Hanks 1990), were the starting point for Kilgarriff’s Word Sketches, which are now a component of the Sketch Engine. Another early project, the Hector Project, was “the first systematic attempt ever to link word meaning with word use using corpus evidence” (de Schryver 2010a, 11). The most relevant publications for Hector appeared as another pas de deux, at successive COMPLEX conferences: Atkins (1992b) and Hanks (1994). In his magnum opus Lexical Analysis: Norms and Exploitations, Hanks even goes as far as asking the rhetorical questions: “If Sue Atkins shakes the salt, is Sue Atkins to be regarded as some kind of a [FORCE]? […] But then, what if Sue Atkins shakes the world with her revelations about lexicography?” and he concludes with the realization that the semantics at work “cause cognitive destabilization” (Hanks 2013, 98).
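The probability measure at the heart of Church and Hanks’ paper, pointwise mutual information, is easy to state: it compares how often two words actually co-occur with how often they would co-occur by chance. A minimal sketch, using invented toy counts rather than real corpus data:

```python
import math

# Toy counts (invented for illustration): word frequencies, the
# co-occurrence count of the pair within some window, and corpus size N.
N = 1_000_000
count = {"strong": 1_800, "tea": 950, ("strong", "tea"): 45}

def pmi(x, y, counts, n):
    """Pointwise mutual information: log2( P(x,y) / (P(x) * P(y)) )."""
    p_x = counts[x] / n
    p_y = counts[y] / n
    p_xy = counts[(x, y)] / n
    return math.log2(p_xy / (p_x * p_y))

score = pmi("strong", "tea", count, N)
# A clearly positive score means the pair co-occurs far more often
# than chance would predict, i.e. it is a collocation candidate.
print(round(score, 2))
```

It is this simple ranking idea, later refined with other statistics, that underlies the corpus-generated collocation summaries mentioned above.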
The application of Frame Semantics to lexicography, the use of the Sketch Engine in corpus-driven dictionary-making, the fruitful dialogue with fellow lexicographic giants – all of this was sped up thanks to Atkins’s association with Michael Rundell, who had also been part of the COBUILD 1 team, where they first met. Later, Atkins and Rundell teamed up to teach others about these novel concepts and new tools, first in South Africa (Grahamstown in 1997, Pretoria in 1998), later in Europe (Brighton in 2001, followed by various other locations and return visits over the next two decades), as well as in Asia (Hong Kong in 2006), Oceania (Auckland in 2012) and the Americas (Boulder, Colorado in 2016). This hugely important outreach program is now known as the “Lexicom workshop in lexicography and lexical computing” series, with Adam Kilgarriff responsible for the “lexical computing” part from 2001 until 2015. The written material first used in South Africa eventually morphed into a key publication: The Oxford Guide to Practical Lexicography (Atkins and Rundell 2008). Himself an early believer in corpora (Rundell and Stock 1992a, b, c), Rundell contributed significant original thinking about the future of lexicography, in solo work and with Atkins and the colleagues orbiting her (Atkins et al. 2003b; Rundell 2002; Kilgarriff et al. 2008; Kilgarriff and Rundell 2002; Rundell 2014 and 2017; Rundell and Kilgarriff 2011).
I have written about the future of lexicography in a number of earlier studies. Two decades ago I wrote “Lexicographers’ dreams in the electronic-dictionary age” (de Schryver 2003), a decidedly upbeat and near-exhaustive discussion of the hot ideas in the field at the time. Dream 112 dealt with “electronic dictionaries in which the potential is explored to link an automatically derived dynamic user profile to the proffered multimedia lexicographic output” (de Schryver 2003, 189). Seeing that such types of dictionary were not being compiled in the years following the publication of “Dreams,” I grew frustrated and set out to give a fuller account of my own view of what “adaptive and intelligent dictionaries” could look like (de Schryver 2010b), using the concept of (Fuzzy) Simultaneous Feedback as a starting point (de Schryver 1999, 2005, and 2013). At the eLex 2019 conference, I was interviewed for the Brazilian journal Calidoscópio and gave “An overview of digital lexicography and directions for its future” (de Schryver et al. 2019a) – an interview in which I was neither upbeat nor frustrated, just realistic.
The problem with being realistic is that we are running after the facts, and the facts are that as lexicographers we are becoming irrelevant. To the majority of today’s youth, a dictionary means either that old book gathering dust on grandmother’s shelf or anything a search engine returns for any imaginable query, whether it be textual (running words), visual (pictures and videos), auditory (music), combinations of these (a newspaper article with photos; a music video), or all of it combined (subtitled movies). While we lexicographers were asleep at the wheel, we were succeeded by the Big Data companies, and they seem no longer to need us. This is at the very least annoying, as one of the starting points of the development of all search engines was exactly the manually compiled reference works – the ones compiled by us lexicographers. Lexicographers were even used as “guinea pigs” (often unknowingly) to help build and test such systems. To better understand where lexicography and metalexicography come from, in the hope of saying something new and valuable about the future of dictionaries, it is instructive to briefly review the past half century.
31.2 The Past: The Last Half Century Through the Eyes of Sue Atkins
When I learned that Sue Atkins had passed away, my first instinct, immediately implemented, was to invite the entire lexicographic community to re-watch the last talk given by her; it was a recorded conversation with Michael Rundell – as filmed and edited by Izzie Shaw (2020). While that video had been online for about a year, the number of views almost doubled overnight, to two thousand. As so well formulated by Rundell in the video, “Sue’s impact on the profession, on the way we produce dictionaries, on the way we think about language, really can’t be overestimated.” The video may be characterized as an extended TED event, marrying a lexicography masterclass with an autobiography and a review of the field from the 1970s onwards, taking in milestone papers and colleagues, while also attaching small vignettes to the heavenly bodies in Sue Atkins’ orbit.
It is hard to imagine today that when Sue Atkins started off her career working for Collins, she was simply told to look at other dictionaries for inspiration, to write out her dictionary articles on six-by-six index cards (which had to be sent back to headquarters by post), to use three colors of ink (black, red and green – each for a distinct typeface), and to phone colleagues after 18:30 (for the cheaper tariff). The state of many (competing) bilingual dictionaries then was one in which translation equivalents were listed in succession, separated merely by semicolons. By the time she completed compiling her English–French dictionary (Atkins and Robert 1978), she had developed the first style guide, come up with a practical approach to dealing with register labels, designed methods to choose good examples, and learned to think really hard about how to approach phrasal verbs.
Despite the tension between dictionary compilers who worked for commercial publishers (like Atkins) and dictionary buffs in academia, we learn in the conversation with Rundell how Atkins was a founding member of EURALEX, was elected to its first board (as secretary, naturally), and later became its president. She talks about the Hector Project, the BNC, WASPS, Word Sketches, the Sketch Engine, GDEX and DANTE – all of them milestones in lexicography. Take the Hector Project, for which she was invited to Palo Alto, California, together with Patrick Hanks and Rosamund Moon, where they were dazzled by having banks of six simultaneous screens but also quickly realized the Palo Alto engineers were not really interested in language nor the development of a proper language tool. More strikingly: “Every time it crashed, it was wonderful for them, because they knew what query had crashed it” – the system, it later turned out (though they had no idea at the time), was AltaVista, an early Web search engine then under development. About the struggles to create the British National Corpus (BNC), we learn that Della Summers (from Longman) came up with the important concept of viewing a corpus as “a pre-competitive resource,” meaning “we could all use the same corpus and produce completely different dictionaries.” To date, this simple concept has still not been grasped, as may for instance be deduced from the latest survey on licensing lexicographic data (Kosem et al. 2021b).
In order to fully grasp the genesis of the revolutionary COBUILD project, the reported conversations between Sue Atkins and her brother are revealing. Sue Atkins: “The real trouble about lexicography is that by the time the dictionary is in print, you’ve thrown away 90 percent of what you know about the words. And if you don’t throw that 90 percent away, no one would ever use the dictionary.” To which John Sinclair replied, “Well, I have a way to keep the 90 percent” and he would then explain how he first wanted to build a corpus.
Similarly, the reported conversations between Atkins and Fillmore give us an insight into how the American linguistic icon came to be involved. Fillmore: “I have to admit that you are not going to persuade me that I need a corpus. Because I know that in my head is what I know about language, and that is what I use when I am writing.” Atkins responds with “a little talk on RISK” and asks what the difference is between sentences like “There are too many risks there” and “There is too much risk involved.” Fillmore couldn’t tell and conceded: “I am really terribly sorry; I must now withdraw my comments.” Atkins and Fillmore then went on to work on “Frame Semantics Light,” which could be used by lexicographers.
What may come as a surprise to some readers is that Sue Atkins seems to have been particularly fond of her experiences in South Africa. About Nelson Mandela she says that “he felt particularly strongly that if you didn’t have a dictionary for your native language, you perpetually felt at a loss; you are identified as a second-class citizen.” She thus happily took up the invitation from Penny Silva to teach in South Africa but, tongue-in-cheek, claims: “I did all the wrong things! For instance, I’m only gonna give you half an hour on headwords, because of course headwords are so easy. And it turned out that the most difficult thing of all in many of the African languages was the headword. Which of the many, millions sometimes, forms of the word could you consider the headword?” When she later gave an assignment to compile an article for the word green, it came as a surprise that “many of the languages in the room had no word for green!” Clearly, this was a two-way learning exchange.
A last outstanding feature which needs to be mentioned as it is so characteristic of Sue Atkins is that, throughout the interview, she assigns vignettes to almost all the people she mentions, in order of appearance: Alain Duval, “a great friend and absolutely wonderful linguist”; John Sinclair, “a typical brother in every possible way”; Juri Apresjan, “a great lexical semanticist”; Krista Varantola, “a very honest translator”; Tony Cowie, “a brilliant lexicographer”; Antonio Zampolli, “the lifeblood of European computational linguistic research in those days”; Nicoletta Calzolari, “an extremely gifted linguist”; Beth Levin, “the most meticulous researcher I have ever met”; Igor Mel’čuk, “a brilliant academic lexicographer”; Chuck Fillmore, “the source of all wisdom for lexicographers”; Patrick Hanks, “a great lexicographer and a very-very gifted linguist, especially in lexical semantics; and a very good project manager”; Rosamund Moon, “a distinguished academic and lexicographer”; Jeremy Clear, “the computer whizz at COBUILD”; Nick Ostler, “an extraordinary linguist”; Uli Heid, “a brilliant organizer and a brilliant linguist; and very nice, very pleasant”; Penny Silva, “an absolutely organized, wonderful woman; and brilliant lexicographer”; Michael Rundell, “the most nice person I could live with for two weeks, and yet is competent and would do things really nice, and turn up on time, and know what he was doing”; Danie Prinsloo, “like living with a charged dynamo”; Gilles-Maurice de Schryver, “undoubtedly even more dynamically charged”; and Adam Kilgarriff, “a lateral thinker and completely brilliant.” Probably only a seasoned lexicographer like Atkins could have kept the pace, diligently filling in the vignette slot.
31.3 The Future of Dictionaries: Thinking Out of the Box
So what have we learned so far? Yes, stunningly innovative concepts and tools have been worked on over the past half century, and Sue Atkins was at the center of most of it. But perhaps, lexicographically speaking, Indo-European languages like English or French are simply “too easy” to warrant the development of truly innovative dictionaries, which in turn might explain the frustration some of us experience at the lack of true progress. It starts with the lemma. If it is non-problematic to lemmatize the words from each of the word classes (four morphological forms for verbs in English, three for adjectives, two for nouns, and for most other word classes just one), then, yes, a digital dictionary will simply be a calque of the paper dictionary when it comes to the macrostructure and the question of how to approach the lemmata. But consider a language in which one simply does not even know where to begin the lemmatization. Every attempt to produce a paper dictionary will look different, as every lexicographer will have their own approach, but this can easily be “solved” in a digital environment merely by not forcing any type of lemmatization. Users of such a dictionary simply search for orthographic words as they are pronounced or written, or as they are heard or read. A recent case study is provided by lexicographic work on Hupa, a Native American language of northwestern California (Spence 2021). While it is laudable that an attempt was made to find the “logical” forms which should serve as lemmata, it seems so much more straightforward for polysynthetic languages like Hupa to employ the power of a computer to do the decomposing and recomposing for the digital dictionary user: simply allow users to input full orthographic forms and give explanations (or translations) for those forms. That would also mean that the dictionary actually has no lemmata and no real macrostructure, but so what? That is truly thinking out of the box.
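To make the lemma-free idea concrete: a digital dictionary for a heavily inflecting or polysynthetic language can simply be keyed by full surface forms, so that users search for exactly what they hear or read. The sketch below uses invented placeholder forms and glosses, not actual Hupa data:

```python
# A minimal sketch of lemma-free lookup: the "dictionary" is keyed by
# full orthographic forms, so there are no lemmata and no macrostructure.
# All forms and glosses below are invented placeholders.
full_forms = {
    "nawhdiyaw": "I am walking around",
    "nawhdiya":  "you are walking around",
    "nandiyaw":  "I walked back",
}

def look_up(form: str) -> str:
    """Return the explanation for a full surface form, if known."""
    return full_forms.get(form, "form not (yet) described")

print(look_up("nawhdiyaw"))  # the user never needs to know a lemma
```

In a real implementation the mapping would of course be generated by a morphological analyzer rather than listed by hand, but the user-facing principle is the same: input a full form, receive an explanation.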
After having looked into the macrostructure, traditional lexicography dictates that we then need to discuss the microstructure of future dictionaries, followed by possible mediostructures, megastructures, and so on. But this approach is ineffective, I am convinced, as every aspect of future dictionaries, as well as lexicography in general, will be radically different. To illustrate this, I summarize my thoughts using about fifty “oppositions” in which I contrast the past half century with my extrapolations for the future. These oppositions are grouped into five subsections and then recapped in a table after each one.
31.3.1 The Dictionary-Making Process
All future dictionaries will be born digital. While this may be uncontroversial, many will also be compiled via crowdsourcing over the Internet (Arhar Holdt et al. 2020; Lew 2014; Rundell 2017) and stored in the cloud, rather than being produced by publishers or academics. The genesis and work-division of future dictionaries will be the result of bottom-up processes rather than the current top-down ones. As anyone will be able to compile dictionaries with anyone else, possibly anonymously, this would be decidedly more democratic than the current undemocratic craft undertaken within a controlled environment (in-house or otherwise). All future dictionary evidence will come from (semi-)automatic corpus extraction rather than from manual reading and marking or, since the COBUILD project, from (semi-)manual corpus extraction. Apart from dictionaries of newly described languages, most dictionaries of the past were the result of “copying in alphabetical order”; in future dictionaries, corpus material will be analyzed and synthesized from scratch. While entries used to be written by hand, defining and translating will be automated in future. The practice of inventing examples (Fox 1987) is definitely a thing of a bygone era; only real language as found in corpora of naturally produced speech or text will be deemed acceptable. Actually, the entire compilation process, which used to be undertaken by humans, will be performed by machines. The software and computational routines to make this possible are near-ready (Baisa et al. 2019; Kilgarriff et al. 2010).
Dictionary compilation will also be faster than ever before; in effect, and if so wished, it could be “live” online, with users instantly able to use (and comment on or even contribute to) the material. While lexicographers tended to be language graduates, future ones will typically be computational linguists or computer scientists. These changes can be summarized in Table 31.1.
Table 31.1 The dictionary-making process: From real world to born digital
| Past and Present | Future |
|---|---|
| publisher and academia-driven | crowd-sourced |
| top-down | bottom-up |
| undemocratic | democratic |
| manual reading and marking; since the 1980s: (semi-)manual corpus extraction | (semi-)automatic corpus extraction |
| copying in alphabetical order | analysis and synthesis from scratch |
| entries written out by hand | automatic defining and translating |
| invented examples | corpus examples |
| dictionary compilation by humans | dictionary assembly by machines |
| slow production | fast production |
| lexicographers are language graduates | lexicographers are computational linguists or computer scientists |
31.3.2 Supporting Tools and Concepts
Both in terms of tools and concepts, the field of dictionary-making will continue to professionalize. While many (especially large, historical) dictionary projects still make use of index cards stored in shoeboxes or filing cabinets, with lexicographers copying over and typing out material using word processors, all future dictionaries will employ corpus query packages and dictionary writing systems – a distinction which will disappear, as both are needed, so the two will end up being combined seamlessly (de Schryver and De Pauw 2007). While the large majority of the dictionaries of the past were produced in a theoretical vacuum, future ones will start by stipulating and using a linguistic theory, perhaps Meaning-Text Theory (Mel’čuk 1973), Frame Semantics (Fillmore 1982), Construction Grammar (Croft and Sutton 2016; Fillmore et al. 1988; Lakoff 1987), Generative Lexicon (Pustejovsky 1995), or the Theory of Norms and Exploitations (Hanks 2013). Up to the present day, lexicographers are still trying to cut up the field of lexicography into various fixed dictionary types: glossaries, vocabularies, terminologies; dictionaries, thesauri, encyclopedias; and so on (see Adams, Chapter 1, this volume). In future dictionaries, these distinctions will blur and thus become irrelevant. All but a few of the existing dictionaries show an undue insistence on meaning, while we have known for some time now that words don’t have meanings, only meaning potentials triggered by their contexts; future dictionaries, therefore, will show uses only, literally “mapping meaning onto use” (Hanks 2002).
A particular problem in bidirectional bilingual lexicography has been the absurd insistence on cramming eight dictionaries into one – trying to satisfy both native speakers and learners, for both encoding and decoding purposes, and including both directions within a single volume (2 × 2 × 2) – with the effect that the target language (TL) always exerts a “pull” on the source language (SL); future digital dictionaries can and will avoid this (Atkins 1996). While it used to be the case that the procedures seen in dictionaries for major languages like English set lexicographic trends, current and future lexicographic innovations are – and will be – found in dictionaries for languages of limited communication (Prinsloo 2005; Prinsloo et al. 2012; Prinsloo et al. 2017), inclusive dictionaries (Chishman et al. 2021; McKee and Vale 2016), and even dictionaries for endangered or revitalized languages (Ogilvie 2011). These changes are summarized in Table 31.2.
Table 31.2 Supporting tools and concepts: From amateurism to professionalism
| Past and Present | Future |
|---|---|
| shoeboxes, filing cabinets; word processors | corpus query packages; dictionary writing systems |
| theoretical vacuum | linguistic theories |
| fixed dictionary types | blurring of all dictionary types |
| focus on meaning | focus on use |
| TL exerts a pull on the SL | SL given proper treatment |
| dictionaries for major languages set the trends | small-language, inclusive and endangered-language dictionaries set the trends |
31.3.3 Appearance of the Dictionary
To this day, if one conjures up the image of a dictionary, a physical product comes to mind – even if that dictionary is a digital one. The appearance of the dictionary of the future, however, will be so intangible that one will not be able to form an image of it. To start with the obvious, all future dictionaries will be digital. However, they will also be dynamic rather than static in that the level of detail in the display will be variable, whereby one zooms in to or out from different levels of granularity, with multiple layers of varying complexity in the interpretation, and entire modules (“dictionary slots”) being switched on or off, all of it search-centric (think of the animated visualization of semantic fields, with the object of the search always in the center), and all of it organic and thus changing across time. Going from past to future, one also moves from a finished product to an interactive service. Today, lexicographers deliver a completed dictionary to their users – there is no dialogue to speak of. Hereafter, lexicographers will be conversing with their users, with the dictionary potentially morphing into a different dictionary following each interaction. The greatest problem with the dictionary of the past (and present) is that its published appearance must fit all users for all uses at all times (“one size fits all”). The greatest breakthrough will therefore be achieved by realizing a personal dictionary, unique for each user, adapting to the task at hand, and changing over time with the changing habits and knowledge of the user as well as the evolving language facts. Here, of course, Sue Atkins’ “virtual dictionary” comes into view (Atkins 1996), with earlier comparable ideas found in Dodd (1988 and 1989) and later ones in de Schryver (2010b).
More obvious is that while present dictionaries already deal with one medium (text), two media (+ drawings/pictures) and even three media (+ audio), future ones will truly include multimedia (+ video, + own recordings). Likewise, while monolingual, bilingual, and trilingual dictionaries are well established, future dictionaries (i.e. “digital databases”) will be able to output truly multilingual and even any-lingual dictionaries on demand (see the concept “one database, many dictionaries” by de Schryver and Joffe (2005)).
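The “one database, many dictionaries” concept can be illustrated with a minimal sketch: senses are stored once, language-neutrally, and any bilingual direction is generated on demand. The three-language sample entries below are invented:

```python
# A minimal sketch of "one database, many dictionaries": each sense is
# stored once, with labels in several languages, and any bilingual
# word list can be generated on demand. Sample data is invented.
senses = [
    {"id": "s1", "en": "dog",   "fr": "chien",  "zu": "inja"},
    {"id": "s2", "en": "house", "fr": "maison", "zu": "indlu"},
]

def make_dictionary(source: str, target: str):
    """Output a source-to-target word list for any language pair."""
    return {s[source]: s[target] for s in senses if source in s and target in s}

print(make_dictionary("fr", "zu"))  # → {'chien': 'inja', 'maison': 'indlu'}
```

The same database thus yields a French–Zulu, Zulu–English, or any other direction without any re-compilation, which is the essence of the any-lingual output described above.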
At present, consulting a dictionary still proceeds very much via the lemma, whereas future dictionaries will allow consultation via any inflected form. While searches for lemmata with unpredictable spellings are often hit-and-miss today, automatic re-routing will become standard in future dictionaries. There will no longer be a need to make a distinction between alphabetic (semasiological) and thematic (onomasiological) dictionaries, as any type will easily be output from the same digital database. More generally, the single access route of the current dictionary will be replaced by multiple access routes, leading not only to what is now the macrostructure but also directly to and into any part of the microstructure.
The study of log files attached to current online dictionaries (de Schryver and Joffe 2004; de Schryver et al. 2019b) has revealed that dictionary users no longer consult single words (whether lemmatized or not) and now seek treatment of any length of text. Future lexicographers will thus have to put effort into the handling of all sorts of (frequent) clusters and collocates (Grefenstette 1998, 39). More generally, dictionary users have long stopped looking up dictionary material but now routinely search digital dictionaries (de Schryver 2012, 130). Future dictionaries will have to be at least as good as today’s search engines in providing exact answers to imprecise, fuzzy, and even incomplete queries (Varantola 2002, 31).
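What “searching” rather than “looking up” might mean in practice can be sketched with nothing more than the Python standard library: an imprecise or misspelled query is re-routed to the closest stored headwords instead of failing. The tiny headword list is invented:

```python
import difflib

# A minimal sketch of fuzzy headword search: a misspelled query is
# re-routed to the most similar stored forms. Headword list is invented.
headwords = ["fuchsia", "fuselage", "fusion", "fussiness"]

def fuzzy_search(query: str, n: int = 3):
    """Return up to n headwords ranked by string similarity to the query."""
    return difflib.get_close_matches(query, headwords, n=n, cutoff=0.6)

print(fuzzy_search("fushia"))  # a common misspelling still finds "fuchsia"
```

A production system would of course combine such string similarity with phonetic matching and query logs, but even this crude version already beats the hit-and-miss exact lookup of many current dictionary interfaces.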
Whereas dictionary users presently still manually thumb a book or physically type in a query, future users will simply speak to their machines and ask and hear (explanations, translations, examples, etc.), or they will merely look and see (encyclopedic information – in their Google glasses), or even just point a device at objects and hear (decoded local street signs – on their smartphones) (de Schryver et al. 2019a, 669–670; Lew and de Schryver 2014, 352).
Current lexicographical products have a clear lexical bias, whereas those of the future will include seamlessly attached multimedia corpora (including original texts, songs, videos), as well as cross-references (hyperlinks) not just to dictionary-internal but especially to dictionary-external resources. Users unsatisfied with the synthesis prepared by the machines (even when curated by lexicographers) will be in a position to study the raw data or the summaries prepared by dedicated lexicographic tools such as Word Sketches – short corpus-generated summaries of how words typically behave in context (Kilgarriff and Tugwell 2001) – or GDEX, a tool for extracting “good” dictionary examples from corpora (Kilgarriff et al. 2008).
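The flavour of GDEX-style example scoring can be conveyed with a few crude heuristics; the real GDEX uses a much richer set of classifiers, and everything below, including the toy “common word” list, is an illustrative assumption. The idea: complete sentences of reasonable length that contain the target word and avoid rare vocabulary score higher.

```python
# A toy sketch of GDEX-style scoring of candidate dictionary examples.
# The word list and the penalty weights are invented for illustration.
COMMON = {"the", "a", "is", "was", "to", "of", "and", "in",
          "cat", "sat", "on", "mat"}

def gdex_score(sentence: str, keyword: str) -> float:
    words = sentence.lower().rstrip(".!?").split()
    if keyword not in words:
        return 0.0
    score = 1.0
    if not (6 <= len(words) <= 20):          # penalize too short/long
        score -= 0.3
    if not sentence[:1].isupper() or sentence[-1] not in ".!?":
        score -= 0.3                         # penalize sentence fragments
    rare = sum(w not in COMMON for w in words) / len(words)
    return round(score - 0.4 * rare, 2)      # penalize rare vocabulary

candidates = ["The cat sat on the mat.", "cat",
              "Zygomorphic cat antidisestablishmentarianism."]
best = max(candidates, key=lambda s: gdex_score(s, "cat"))
print(best)  # → "The cat sat on the mat."
```

Ranking candidates this way lets the machine pre-select examples, with the lexicographer (or, in future, no one) merely curating the top of the list.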
All in all, then, future dictionaries will cease to be a product and become a true service. As with any other service, users will have to decide how much they want to make use of it. They may very well not even realize that dictionaries are involved in what they are doing. “The dictionary” may still be pinpointed as long as these dictionaries are stand-alone tools, as the great majority are today, or when they come bundled with other language tools, such as in the still impressive COBUILD III on CD-ROM from 2001 (which came together with a thesaurus, a usage guide, a grammar, and a five-million-word corpus – all linked to one another) or such as the dictionaries that are part of spell-checking software. However, in the future most dictionaries will be subsumed by other tools, such as augmented writing assistants – a beautiful example of a work in progress being ColloCaid, a text editor that provides real-time collocation suggestions (Frankenberg-Garcia et al. 2019). Picturing the dictionary, therefore – in terms of what it is, what it does or how it looks – will be nigh impossible in the future. All these changes are summarized in Table 31.3.
Table 31.3 Appearance of the dictionary: From physical to intangible
| Past and Present | Future |
|---|---|
| paper | digital |
| static | dynamic |
| finished product (compilers → users) | interactive service (compilers ↔ users) |
| one size fits all | personal, customized |
| one medium (text); two media (+ drawings/pictures); three media (+ audio) | multimedia (+ video, + own recordings) |
| mono-, bi- and trilingual | multi- and any-lingual |
| dictionary entry via lemmata | dictionary entry via lemma or any inflected form |
| hit-and-miss | automatic re-routing |
| semasiological vs. onomasiological | no need for semasiological vs. onomasiological distinction |
| single access route | multiple access routes |
| word (headword) treatment | treatment of any length of text |
| (exact) look up | (fuzzy) search |
| manually thumb a book; physically type in a query | ask and hear; look and see; point and hear; … |
| lexical bias | lexis + data (i.e. multimedia corpora) |
| bound x-refs | unbound x-refs |
| synthesis only | DIY analysis |
| stand-alone; bundled | subsumed by other tools |
31.3.4 Facts About the Dictionary
When future dictionary users “use” their dictionaries (knowingly or, as argued above, in most cases unknowingly), they will know a few facts about their language tool. Firstly, they will know that their tool is always up to date. This will be the case not only because users will expect it, but especially because it will be a design feature of future dictionaries. (If it is not, the dictionary will not sell – period!) Given that dictionary production will be fully automated and that a variety of monitor corpora will keep a finger on the pulse of language change, dictionary contents will always be up to date, much as the indexes of search engines are routinely updated at present. (No one – well, hardly anyone – is interested in yesterday’s news or the location of yesterday’s traffic jams.) Secondly, future dictionary users will know that the searches they perform will always result in context-sensitive answers. (If they do not, the dictionaries will be useless.) The problem that needs to be (and will be) solved is that of word-sense disambiguation. Without it, dictionaries of the future will simply frustrate users and waste their time. Consider a search for first papers in the online Merriam-Webster Dictionary that I carried out on February 13, 2023. The definition clearly explained that first papers are those submitted as the first step in becoming an American citizen, but not one of the eight automatically generated examples related to this meaning. Thirdly, future dictionary users will know that their accessing the dictionary will always be helpful. Even if there is no perfect answer, the tool will still present the user with interesting information. (If this is not the case, the dictionaries will not be able to sell adverts nor steal users’ data.) After all, even if the e-book you so much wanted to buy is out of stock, you are offered another one which happens to be just as interesting.
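The word-sense disambiguation problem has a classic baseline, the simplified Lesk algorithm, which picks the sense whose gloss shares the most words with the query context. The two-sense mini-inventory below is invented for illustration; real systems use far richer sense data and statistical or neural models.

```python
# Simplified Lesk word-sense disambiguation: choose the sense whose
# gloss has the largest word overlap with the surrounding context.
# The sense inventory is a made-up miniature, not a real dictionary's.

SENSES = {
    "material": "thin sheet material made from wood pulp used for writing",
    "document": "an official document filed as a step in becoming a citizen",
}

STOPWORDS = {"a", "an", "the", "of", "for", "in", "as", "to", "such"}

def content_words(text):
    """Lowercase, split, and drop function words."""
    return {w for w in text.lower().split() if w not in STOPWORDS}

def lesk(context, senses):
    """Return the sense key whose gloss best overlaps the context."""
    ctx = content_words(context)
    return max(senses, key=lambda s: len(ctx & content_words(senses[s])))

print(lesk("the document she filed was a step in becoming a citizen", SENSES))
# → document
```

A dictionary lookup pipeline with even this crude a disambiguator would know to suppress the “sheet of writing material” examples when the citizenship sense is the one being asked about.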
Fourthly, future dictionary users will know that they are interacting with machines and thus with tools made for machines, to which they just happen to be privy. Those users will be unforgiving: machines don’t make mistakes and are thus “always right.” (If they are not, machines be damned.) Fifthly, while, by and large, the past was about historical (diachronic) dictionaries and the present about learners’ dictionaries, future users will know that their dictionaries are at their best when it comes to neologisms. (If they are not, there are literally “no recent events” and there is thus “no news.”)
When covfefe suddenly appears in a tweet, you simply have to know. And while some dictionary updates of the paper era are legendary, such as the one for sputnik which made it into the Thorndike-Barnhart Comprehensive Desk Dictionary in record time at the end of the 1950s (Barnhart 2017; Minton 1958), future neologisms will truly be able to enter dictionaries simultaneously with their entering the language. This fact has not escaped today’s lexicographers. Neologisms and what to do about them in digital lexicography featured at every one of the continental lexicography congresses in 2021. At the Dictionary Society of North America’s 2021 biennial meeting, Stefan Fatsis read a paper (Fatsis 2021) in which he explained that the tracking of failed searches led the dictionary team at Merriam-Webster to take “unprecedented measures” of preparing and releasing “coronavirus words” in about a month’s time. A batch of twenty items went live on March 16, 2020, and included COVID-19, social distancing, self-quarantine, community spread, contact tracing, super spreader, and patient zero. As is often the case with neologisms, most of these words and word combinations already existed but had not yet made it into the dictionary or were given an additional pandemic-related sense or were revised (as with the entry for coronavirus itself) to include cross references (here to COVID-19, in addition to MERS and SARS). As part of the scheduled April 2020 update (of “535 new words”), more coronavirus entries and senses entered the dictionary: self-isolate, PPE “personal protective/protection equipment,” WFH “work(ing) from home,” forehead thermometer, physical distancing – as well as the noun physical distance and the verb physically distance.
At the ASIALEX congress that same year, Yongwei Gao presented, not another update to a dictionary with a few coronavirus terms but two full-blown and published dictionaries (English–Chinese and Chinese–English) solely with terms related to COVID-19. It was no different at the other conferences that year. At AFRILEX, Evans Lwara talked about COVID-19 terms in Chichewa, and at eLex Iztok Kosem et al. (2021a) presented a monitor corpus for Slovene, revealing (unsurprisingly) that all the top salient words for 2020 were corona-related. At both EURALEX and AUSTRALEX, finally, GLOBALEX held the second and third iterations of its workshop series on neologisms, the first one of which had taken place at the DSNA 2019 congress. The third one was devoted entirely to COVID-19. These changes are summarized in Table 31.4.
Table 31.4 Facts about the dictionary: From old to new
31.3.5 Image of the Dictionary
Much to the chagrin of today’s lexicographic community, the authority with which the fruits of our labor used to be regarded is being eroded; the dictionary of the future will be self-effacing. Ironically, this change is going hand in hand with the (defensible) move from prescriptiveness to descriptiveness in lexicography (see Adams 2015; Chapman, Chapter 16, this volume; Finegan, Chapter 19, this volume). By presenting rather than imposing language facts, however, we concurrently hand over any control we used to have. In future, that handover will have been completed. While the coverage in our dictionaries has out of necessity always been limited, at least it was curated; in the future, with open-ended coverage, the result can only be a ragbag. Given the past prestige of our reference works, people have always been willing to pay for them; in the future only free dictionaries will be considered “good dictionaries.”
Until recently, people understood that it was essential to consult dictionaries and that they could truly benefit from doing so; in the future people will stop realizing that they actually need dictionaries. In the past, dictionary users appreciated that a lot of work went into compiling reference works; in the future, dictionaries will be seen as containers of mere facts, and because one cannot own facts, dictionary contents will be seen as a public good that can be freely copied, as well as used and reused, without acknowledgment. This is much akin to the current student practice of plagiarizing paragraph-long sections from Wikipedia, without any referencing, with the argument that that type of Internet content “belongs to no one.” We are also witnessing the rapid consolidation of the last few remaining reference publishers; a future monopoly is certainly not a good prospect.
Whereas travel to exotic places used to be truly challenging, including in linguistic terms, future dictionaries and the language tools in which they will be embedded will contribute to the illusion that the world’s rich diversity is gone, and in so doing will hasten its actual homogenization. An Ethiopian who wishes to order a dish in China, where the menu is in Mandarin only? No problem: point your smartphone at the Chinese characters and hear them pronounced in Amharic. Then speak and order in Amharic and have your smartphone do the talking for you in Mandarin. A Japanese speaker who wants to make sense of an email sent in Greek? No problem: simply dump the contents into Google Translate. Then compose an answer in Japanese and do the reverse: the answer arrives in Athens in perfect Greek. Millennia of writing and centuries of printing dictionaries have resulted in a wide range of linguistic, typographical, and non-typographical ways to condense material, including abbreviations, codes and symbols, and telegraphese. Interestingly, these techniques also helped to give dictionaries a scholarly, complex appearance. None of this will be needed in future dictionaries, however, and with the disappearance of the looks will also go the visualization of scholarship. All of this boils down to the fact that the symbolic value which (printed) dictionaries had in the past, their Bible-status, will vanish into thin air in the future. These changes can be summarized as in Table 31.5.
Table 31.5 Image of the dictionary: From authoritative to self-effacing
31.4 The Future of Lexicographers: Joining Forces with Big Data
During his keynote speech at the EURALEX 1998 congress, Gregory Grefenstette (1998) famously asked, “Will there be lexicographers in the year 3000?” Inasmuch as lexicographers are the people who compile dictionaries, the fifty-odd “past ⇔ future oppositions” presented in the five subsections of Section 31.3 should lead to a clear answer, because the question can be rephrased as whether or not there will still be dictionaries in the year 3000. Sadly, the answer is obvious: apart from legacy dictionaries, no doubt all of them digital or retro-digitized, no new dictionaries as we know them today will be found a thousand years hence, or even a hundred. Unless, of course, both the concept of a dictionary and the job of a lexicographer are redefined. This may be compared with the word chauffeur for a person driving wealthy people around. Initially, a big part of the job of a chauffeur was actually both to stoke a steam engine and to keep it running, with the word chauffeur derived from the French verb chauffer ‘to heat.’ Today’s electric cars use only power from the grid; no heat whatsoever is produced in the car, but chauffeur stuck. A decade ago, at the eLex 2011 conference, Michael Rundell, Erin McKean, Adam Kilgarriff, and others got ahead of themselves and openly wondered: “Will there still be dictionaries in 2020?” (see https://videolectures.net/elex2011_bled/, Round Table). The year 2020 came and went, and there are still (some) dictionaries, and there are still (some) lexicographers. That raises this question: Reconsidering the fifty-odd oppositions, is there anything common across them? And if so, can the commonalities be joined together and brought back to the current concept of a dictionary?
In de Schryver (2003, 146), I suggested viewing digital dictionaries as “collections of structured electronic data that can be accessed with multiple tools, enhanced with a wide range of functionalities, and used in various environments.” Laurence Urdang, the icon of early American dictionary making with computers, sincerely hoped that was not my definition of a digital dictionary (personal communication, email June 18, 2003), perhaps because an important feature is missing: a focus on the lexicon. But that was by design, as I have always felt uneasy with the artificial distinction made since Bloomfield (1933, 274): “The lexicon is really an appendix of the grammar, a list of basic irregularities.” But point taken – Urdang was right: what is common and what defines dictionaries and lexicographers is that they focus on lexical analysis, on meaning, and that they do so by studying and cataloging actual uses of words, collocates, phrases, and much larger chunks of text – these days as seen in unimaginably large quantities in corpora and thus with the help of computers to number-crunch the data and present humans with summaries. Corpora can be raw, but one obtains better results after lemmatization, part-of-speech tagging, and so on. In addition to not explicitly mentioning the “lexicon” or “lexical analysis,” my earlier characterization of digital dictionaries did not even refer to “language,” whether in its oral or written form – that too was by design. With my generic “electronic data” I wished to refer to the bits and bytes, the zeros and ones, of any type of communication, even of the non-verbal type, in any type of medium.
Surely, in future dictionaries one will be able to access the data via signing, smell, or touch, with possibly only images, audio, or video in return. In such dictionaries, there is no lexis, no language. Yet information and thus meaning are still conveyed, and the actions still have to be decoded and thus defined. That is the task of a lexicographer, even a future one; that is the task of a dictionary, even a future one. Search the dictionary of the future merely by raising your middle finger toward it, to hear that your sign is used to express displeasure but also that it is rude to signal like that, or to have your dictionary automatically fill in its Unicode character (U+1F595) in the code you are writing. Place your dictionary next to the dish you are about to eat in a Mumbai restaurant and have it pick up the aromas to inform you about the spices and herbs it contains or to present you with pictures of the plants from which the spices and herbs are sourced (Hanks 2012, 81). Or tap away a Russian folk tune on your dictionary, and hear and see how Beethoven recomposed it into his Razumovsky string quartets (Ferraguto 2014; Kumar 2020). That is the future of our dictionaries.
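Incidentally, the Unicode detail in that example can be checked with nothing more than Python’s standard library; the code point’s official name reads like a miniature dictionary definition of the gesture itself:

```python
import unicodedata

# U+1F595 is the code point alluded to above; its official Unicode
# character name glosses the gesture a future dictionary would define.
ch = "\U0001F595"
print(f"U+{ord(ch):04X}")     # U+1F595
print(unicodedata.name(ch))   # REVERSED HAND WITH MIDDLE FINGER EXTENDED
```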
Future dictionaries, then, will have more in common with today’s search engines, e-commerce platforms, mobile apps, social networks, personal computers, and (micro)blogging sites than with current dictionaries. But will future lexicographers also have more in common with today’s data scientists, who already work at the companies developing such tools, than with current lexicographers? If we are not attentive, if we continue to sleep at the wheel, our profession will indeed be gobbled up by them. So what can we offer them that they now miss? Answer: our craft – as in The Art and Craft of Lexicography (Landau 2001). We are still better than machines (and will likely always remain better) at taking computer-generated summaries of how language works and how language is used, and deriving meaning from them (Hanks 2002; Rundell 2002). Lexicographers will always understand lexis, language use, and all types of meaning better, and be better equipped to interpret and synthesize them, than people in any other profession, and certainly better than machines, no matter the sophistication of the machine learning or the artificial intelligence (AI) involved. (With the recent arrival of generative AI chatbots like ChatGPT, I am inclined to revise this statement. I am now of the opinion that the future-future has arrived, also in lexicography. For the impact on dictionaries, see de Schryver 2023.)
With characteristic insight, Sue Atkins (1992a, 519) once pointed out: “Lexicographers have to be born before they are made.” This led Edward Finegan to observe: “Her statement haunted me a bit as discussion of lexicographical automation may suggest that lexicographers don’t even need to be made – don’t need to exist – so long as the AI folks can finesse their algorithmic magic” (personal communication, email September 10, 2021). I hope to have shown that the Big Data technicians will never be able to supplant us, as long as we continue to convince their employers that they need our skill set, our know-how, and our scientific research.
31.5 Epilogue: A Fact-Based Extrapolation of the Future
Rather than jump right into “the future of dictionaries” to describe that future in medias res (for, frankly, how can one jump into the future when that future has yet to occur?), I have opted not to hypothesize about the future of dictionaries at all but to base all my claims on facts that can be observed today, and to extrapolate from them. What this approach demonstrated about dictionary making is how a real person, who was alive when some readers of this book were (for some time) also alive, changed the course of lexicography toward the future. So lexicographers make the future. By opening and closing with Sue Atkins, this chapter not only fleshes out a recent super-lexicographer but also articulates her network: her epithets for her colleagues show how linguists, lexicographers, and computer scientists work together in developing new approaches to the lexicographer’s work, with the aim of producing radically different types of reference works. Ironically, the core element of Sue Atkins’ vision of the “virtual dictionary,” uttered a quarter century ago, namely that the information provided will exist only at the time of consultation, is now everywhere (think for instance “map,” and see how it forms around you, on the screen in the palm of your hand, with you at the center of it all at all times, even as you move through space and time, as it indeed changes with time) – it is ubiquitous, yet it passed lexicography by. Alphabet (what’s in a name?) did it for general and factual information, Amazon for shopping, Apple for mobile assistants, Meta for social networking, Microsoft for personal computing, and X (formerly Twitter) for micro-blogging – truly, data is the new oil (The Economist 2017) – but Lexicography is not yet there. Rather than have the Big Data companies attempt to do our work, we should convince them that they will be better off by joining forces with Lexicography.
Lexicographers have something unique to offer – it is in our DNA; only we know how to get to the bottom of how language works, only we have the craft to derive meaning from usage, and only we are able to synthesize this – beautifully – for both human and machine consumption, in any number of ways.
In future, our lexical analyses will not be wrapped in dictionaries; they will disappear from view to become data among ever more data, in a Linked Data network. But if done right, they could become the “crown data” for which the Big Data companies will be willing to pay good money, as with our crown data their tools and products will both be better and sell better than they do today. Lexicographers should therefore continue doing what we already do: trust data, lots of it; have machines do the heavy lifting in pre-processing; then come in for the invaluable finishing conceptual and lexical touches. Given that we have now given up on the idea of the need to work face-to-face, future all-virtual lexicographic teams will become ever more international, will incorporate ever more varied but complementary skills, will work on ever more languages and language varieties, and will store their data and analyses in, and license them from, the cloud – all so that lexicographers will be able to serve users and machines with their ever-growing lexical needs.
Steve Jobs, co-pioneer of the personal-computer revolution and co-creator of a range of i-tools, is known to have said: “A lot of times, people don’t know what they want until you show it to them” (Ratcliffe 2018). That should also be the guiding principle of future lexicographers (and metalexicographers): stop asking dictionary users what they want; just make the best possible dictionary tools; and study how they are used. Then share your research results online in continuously growing repositories, approached and searched with the help of ever smarter access and summarizing software.
There is, then, a clear path toward future dictionaries and lexicographers. If we seize the moment, both may thrive going forward.