Special issue on spoken language in time and across time: introduction

The idea of this special issue on Spoken language in time and across time emerged at an international symposium on this topic that we organised at Lund University on 20 September 2019. The purpose of the symposium was to celebrate important past and present achievements of spoken language research as well as past and present corpora available for such research. Some speakers reported on academic and technical advances from the past, while others offered information about state-of-the-art research on spoken language and spoken corpus compilation. Our idea with the symposium was also to bring together early career scholars, somewhat more senior scholars as well as senior scholars – the latter actually active when interest in spoken language and spoken corpus compilation was in its infancy. The type of spoken corpora in focus extended from the world's first publicly available, machine-readable spoken corpus, The London–Lund Corpus of Spoken English (Svartvik 1990), nowadays referred to as LLC–1, through to the spoken parts of The British National Corpora (BNC) from 1994 (BNC Consortium 2007) and 2014 (Love et al. 2017), The Diachronic Corpus of Present-Day Spoken English (DCPSE) consisting of LLC–1 and the British component of The International Corpus of English (ICE–GB), Santa Barbara Corpus of Spoken American English (SBCSAE) (Du Bois et al. 2000–5), The Corpus of Contemporary American English (COCA) (Davies 2008–) and finally the most recent one, The London–Lund Corpus 2 (LLC–2) (Põldvere, Johansson & Paradis 2021a). The symposium thus covered approximately half a century of data from publicly available corpora compiled for multipurpose use by the academic community for research on spoken English in different contexts.

Compared to research on written language, research on spoken language is very limited. We therefore think that the time is ripe to encourage researchers to work on spoken language to raise the level of our current knowledge about this modality, the special conditions associated with it, its different social and geographical variants and its comparison with the written or signed modalities. We foresee new research efforts in both linguistically and psychologically oriented approaches trying to tackle the challenges connected with the spoken language medium, such as the motivations and mechanisms involved in synchronic variation and diachronic change with respect to language use, meaning-making, grammar, prosody, information structure and how speakers behave and interact with one another. We also foresee research on phenomena that are either specific to speech or at least more salient for one and all in spoken communication than in written production such as timing in speech, fillers, pauses, turn-taking, overlaps, laughter, mumbling, slips of the tongue and the ear, and interlocutor uptake. This special issue contains original research on spoken English by some of the participants at the symposium and their various collaborators, and the articles explore some of the above-mentioned topics using multipurpose corpora.

The importance of knowledge about spoken communication in the wild
Knowledge about how we actually communicate in spoken contexts is of utmost importance not only for basic research on speech as a phenomenon in itself but also for successful communication in both professional and private contexts. What are the language resources that are recruited for efficient and smooth communication? What are the temporal flow and the necessary gaps, overlaps and hesitations that constitute natural speech activities? What are the features that facilitate successful meaning negotiation? Knowledge about such foundational aspects is obviously of importance in the language sciences but more and more so also for disciplines where human beings are at the centre, such as psychology, clinical health research, human-computer interfaces, Artificial Intelligence, political science and education.
Spoken corpus data are necessary for explorations of natural language production, speaker uptake and speaker behaviour on the occasion of use in real time. They are also indispensable for the creation of stimuli and hypothesis testing for many types of experiments that take an interest in how communication through the spoken medium is produced, managed, processed and comprehended. In a great deal of today's experimental research, there is a demand for more ecologically valid data and more production data. For instance, the temporal unfolding of spoken language allows speakers to make predictions about their interlocutors' intentions and about where different expressions and structures are heading. Investigations of such aspects allow researchers to theorise about the socio-sensory-cognitive mechanisms and motivation in both planned and impromptu speech, in formal and informal contexts and in monologue and dialogue. This demonstrates that spoken corpora are resources for many more types of methodologies and approaches than what might fall within traditional areas of corpus linguistics, and it is perhaps not an exaggeration to say that spoken corpora are essential for the scientific ecosystem in the language sciences more broadly today.
Looking back, it seems correct to say that what was absolutely groundbreaking with the advent of the first spoken corpus, LLC-1, was not that speakers were recorded, and that it was possible for researchers to listen to their speech, but that different types of communicative situations were identified and systematically recorded, transcribed and made available to researchers in searchable, machine-readable form. In particular, the availability of recordings of spontaneous face-to-face conversation was something extraordinary, and analyses of how spoken discourse unfolds became eye-openers and important sources of inspiration for innovative researchers of the time. Hovering in the air were obviously questions posed by more traditionally disposed scholars: would mundane, everyday conversation among speakers really be worthy of academic research? Time has shown that such conversation is not only worthy of scientific research but in fact essential for a deeper and broader understanding of how speakers make use of language resources and how they behave and interact in different communicative situations. However, since the compilation, transcription and digitalisation of spoken data are very time-consuming undertakings, the rather dismaying fact is that there are, even today, only a limited number of spoken corpora available for research by the academic community. The simple reason for that is that the more thoroughly the corpus data are processed, the more time it takes to complete the work and make them available. For some research purposes, rough-and-ready corpora are enough, but for many fine-grained and sophisticated purposes, the more thoroughly processed ones with careful annotations and time-stamped sound files are invaluable.
The time of the launch of LLC-1 was the starting point of researchers' interest in spoken language in the wild. As the name of the corpus indicates, LLC-1 derives from two projects, one in London and the other one in Lund, namely the Survey of English Usage (SEU), launched in 1959 by Sir Randolph Quirk at University College London, and the Survey of Spoken English (SSE), launched by Jan Svartvik at Lund University in 1975. Part of LLC-1, with spoken data in computerised form from the 1950s to 1980s, was first made available in A Corpus of English Conversation (Svartvik & Quirk 1980). During the same time in the US, work started on the development of a spoken American corpus, the Santa Barbara Corpus of Spoken American English (Du Bois et al. 2000-5). This was also the time of the release of some important works with influential insights into spoken communication. A great number of seminal journal articles on spoken language had already been published in the 1960s by Wallace Chafe, culminating in his book with the title Discourse, consciousness, and time: The flow and displacement of conscious experience in speaking and writing (1994). Also influential were Herbert Clark's books Arenas of language use (1992) and Using language (1996).
Of particular interest in all existing spoken corpora are the recordings and the transcriptions of spontaneous face-to-face conversation. This is probably the type of data that has been most rewarding for new explorations and new insights into human communication. Spontaneous conversation can be seen to mirror social action in two important ways. On the one hand, it is in constant flux; that is, it is dynamic, distributed, adaptive and intersubjective in nature. On the other hand, it is intrinsically multimodal in that gestures, pointing, eye gaze and body movements are always present. A large part of people's interaction with others involves describing what our experiences with the world are and how our thoughts are shaped by our experiences. Unfortunately, the spoken corpora mentioned in the introduction do not include video recordings; these would have been excellent but again would have been even more time-consuming to collect, transcribe and annotate for corpora of the size of even the smallest of our spoken corpora. Multimodal corpora that include both sound and video exist and are common practice in Conversation Analysis (e.g. Pomerantz 2021), but they are much more limited in size, accompanied by extremely detailed annotation and generally not open for use by other researchers, unlike the corpora discussed here. What the spoken medium (monologue and dialogue, private and public, planned and impromptu) in those corpora contributes to communication research is the sound side of language including pronunciation of lexical items and prosodic patterns, both of which carry with them meanings of crucial importance and many clues to speaker identity and to the interactive management of dialogue in which production and comprehension are closely intertwined.
As pointed out by Clark (1996), human communication in general and spoken interaction in face-to-face conversation in particular share a lot of traits with ballroom dancing. This is a very apt comparison; both activities require a system of predictions, joint attention, joint activity and a certain amount of flexibility and negotiation across turns. Maybe the dancing simile can also be extended to a comparison between multiparty conversation and dancing in groups rather than just ballroom dancing in pairs. However, this does not change the gist of the comparison since that too requires the same kind of characteristics, albeit maybe not always at the same level of precision. For a successful outcome of both dialogue and dancing, the participants form part of a joint cooperative activity of taking and giving. They construct a common ground, that is, a space of discourse-relevant facts and behaviour (Clark 1996). This shared space is a working space where psychological processes construct and maintain common ground as the conversation unfolds (Pickering & Garrod 2021). It deserves to be mentioned here too that common ground is by no means restricted to dialogic contexts but is also necessary for planned monologues; speakers never speak in a vacuum, but they speak to someone, about something, for a certain reason and with certain intentions.
Most types of dialogic situations are, however, different from monologues in that the former are in a constant flux and the interlocutors must be flexible and adaptive in the interactive work of upholding the joint activity in the negotiation of meanings and intentions, and also they must make the most of timing their contributions. In this shared ground any one of the interlocutors may take the lead and change the direction of the conversation. All dialogic communication and language use are embedded in social conventions that apply in any given situation, ultimately ideally governed by a cooperative principle (Grice 1975). Dialogue constitutes a big challenge to the study of meaning making since it is distributed across speakers and utterances (Linell 2009; Levinson & Torreira 2015). Traditionally, corpus work has been concerned with the study of language use across speakers, sometimes from a sociolinguistic angle with reference to age, gender and place (e.g. Gardner et al. 2020), but also with focus on language use per se without reference to social aspects but just to various usage-based patterns of grammar, semantics or prosody. The same holds for more psycholinguistic studies of spoken language, where the focus primarily has been on comprehension and where comprehension and production have been held apart (but see Põldvere & Paradis (2020); Põldvere, Johansson & Paradis (2021b) which, using data from LLC-2, focus instead on interactive processes and collaborative behaviour). Thus, the more recent corpora of spoken language such as BNC2014 and LLC-2 may be important sources in marking a shift to studies of more interactive functions of language because for research on interactive patterns, we need corpora of manageable sizes, annotated for detailed communicative phenomena along the temporal flow of speech and/or released with the original sound files of speech in real time.

In-time and across-time contributions
The articles included in this special issue all relate to spoken language in time and across time in one way or another. There are contributions that report on aspects of time in the sense of diachronic explorations of various phenomena across time, and in the sense of the temporal unfolding of speech in real time. The first article, by Nele Põldvere, Victoria Johansson and Carita Paradis, entitled 'On The London-Lund Corpus 2: Design, challenges and innovations', describes the rather cumbersome process of recording different types of spoken discourse in different contexts, transcribing them and providing them with mark-up and annotation, aligning the transcriptions to the sound files and finally making a new corpus of spoken language, the LLC-2, available for use by researchers of the academic community. Speech is a transitory substance that is processed as we go along, leaving memories of what we just happened to pay attention to at a particular point in time. LLC-2 provides us with real-time data of contemporary spoken language, and it gives researchers the opportunity to go back and forth and listen to the recordings. This allows for the freezing of temporal transitoriness, leaving a permanent record of what otherwise is being processed in real time, and the process of compiling the real-time data has turned into a product as it were. Moreover, the design of LLC-2 is closely modelled on the design of LLC-1. This feature is particularly important because it makes it possible for researchers to carry out principled diachronic studies of spoken language use by speakers of British English, using the two sister corpora LLC-1 and LLC-2. The next article, by Charlotte Bourgoin, Gerard O'Grady and Kristin Davidse, deals with the temporal unfolding of spoken English with real-time focus. The title of the article is 'Managing information flow through prosody in it-clefts'. 
Drawing on data from LLC-1, the authors address the issue of how speakers manage the flow of information in natural conversation in specificational it-clefts by balancing grammatical and prosodic resources. Clefts allow speakers to emphasise certain elements of an utterance both through the grammatical construction as such and through the prosodic marking of important information. The authors show that speakers have considerable freedom to decide how to portray prominence using it-clefts. They confirm existing information structure typologies of the syntactic constituents of cleft constructions: new-given, new-new and given-new, and in addition they find a pattern that has not received attention in research on it-clefts, namely given-given; this pattern is the second most common structure in their data. From the point of view of prosody, they show that it-clefts always have a high onset, which signals how the upcoming utterance relates to the expectations that the addressee may have formed based on the previous discursive context. The high onset creates a contrast that communicates that what is said in it-clefted utterances falls foul of the expectations. Jointly, these structural, grammatical and prosodic resources give rise to a range of different possibilities of informational prominence and thereby make it-clefts a particularly effective means of responding to communicative needs and shifting goals in real time.
The third article, by Gunnel Tottie, is a detailed corpus study of not-negated utterances with indefinite complements. The title of the contribution is 'Not-negation revisited: Variation between a and any in verb complements in contemporary spoken American English'. Like the previous article, this is a synchronic study, but this time of American English, on the basis of the spoken part of The Corpus of Contemporary American English (COCA SPOK). It is a microanalysis of variation with a focus on the use of the indefinite determiners a and any with nouns in verb complements in over 21,000 not-negated utterances. The received view in major reference grammars is that singular count nouns in verb complements take the indefinite article, while any is used with singular non-count and plural nouns. However, not much attention has been given to the contextual preferences for these two determiners in not-negated sentences. Tottie shows that, on the whole, there is very little variation. The cases with the indefinite article in the complements make up 90 per cent of the occurrences. More specifically, variation is rare in utterances with copular BE, but more common in utterances with HAVE and existential BE. Structurally speaking, it appears that both contracted uses of negation and HAVE with do-support play a role for speakers in their choice of a or any, and from the point of view of meaning, variation most often happens in combination with abstract nouns. These are two issues that have rarely been discussed in the literature.
The remaining two articles are both diachronic studies of recent change that has taken place over a period of twenty years and both of them use The British National Corpus (BNC2014 and the spoken component of BNC1994). We start with Robbie Love and Niall Curry's study of the use and development of expressions of modality, notably the canonical modal auxiliaries. The title of their contribution is 'Recent change in modality in informal spoken British English: 1990s-2010s'. They approach this topic from the observation that there are contradictory statements about the development of modality expressions in the current literature. Some researchers claim that core modal auxiliaries are in decline, while others state the opposite, namely that they are on the increase. Love and Curry focus on three different groups of forms that express modality in English: core modal auxiliary verbs, semi-modal verbs and a sample of other items that are used as modal expressions, which they refer to as Modality Indicating Devices (MIDs). From the point of view of the frequencies of these forms, they find that core modal auxiliaries appear to be decreasing, while the other two types appear to be stable over time. But they also point out that not all the individual forms in the categories develop in the same direction; some are on the increase, e.g. could, while others are in decline, e.g. must. In addition, they also take a closer look at the internal distribution of the modal functions (epistemic, deontic and dynamic) of the core modal auxiliaries but find no statistically significant differences with respect to their distribution across time. The next diachronic study, by Susan Reichelt, is concerned with the pragmatic markers kind of and sort of in spoken language. The title of her contribution is 'Recent developments of the pragmatic markers kind of and sort of'. She too uses data from the BNC2014 and the spoken component of BNC1994.
Her diachronic approach is twofold. She analyses what is going on with respect to the use of these two pragmatic markers across time as it happens within the time period, and in addition she also carries out analyses of what happens in apparent time. She conducts a detailed sociolinguistic analysis of the two pragmatic markers and their development from the point of view of syntactic contexts, age groups, gender and lifespan and shows that the use of kind of has increased considerably over a period of some twenty years, while the use of sort of has been stable and does not seem to have been affected by the increase of kind of for the same function. Her analysis shows that age is a significant predictor of the use of kind of relative to sort of. There is a significant increase of the use of kind of relative to sort of in young speakers' communication, while gender was not found to be a significant predictor of change. Reichelt also reports that syntactic context plays a role in the use pattern and distribution of kind of in that it is mostly used as modifier of nouns and adjectives rather than verbs.
Reichelt's diachronic study of pragmatic markers in spoken language and Love and Curry's study of the development of modality markers across time both point to the usefulness and informativeness of spoken corpora compiled at different points in time and even at intervals that are relatively close in time, as is the case for the spoken BNCs. We hope to see more efforts of spoken data compilation with relatively short intervals in the future to allow for investigations of current shifts and changes. Again it deserves to be mentioned that developments may not only be of a purely linguistic nature pertaining to such phenomena as grammatical structures and meanings of words and constructions, but also to how speakers behave when they communicate with one another in terms of turn-taking strategies, timing and aspects of tact and politeness; see e.g. Põldvere, De Felice & Paradis (forthcoming) on the changing practices of advice-giving and advice uptake in conversation over a period of half a century. Of crucial importance for such investigations is the existence of corpora whose design makes comparisons across time possible in a principled way, as is the case for LLC-1 and LLC-2.
The articles commissioned for this special issue are a testament to the optimism of creating more interest in the spoken medium in order to reach a better understanding of