This paper presents a corpus-based investigation of Latin volo ‘to want’, arguing that it exhibits previously overlooked reportative uses from at least the 1st century BCE, whereby speakers attribute beliefs, opinions, or statements to an external source. Focusing on third-person present-tense forms (vult, volunt) across a corpus spanning from the 3rd century BCE to the 2nd century CE, the study analyses the semantic, pragmatic, and morphosyntactic properties of these constructions, as well as their diachronic development. Reportative volo is shown to emerge from ambiguous contexts where volition and doxastic stance overlap – especially in small-clause constructions with subject coreferentiality or passive infinitives of verbs of opinion. Diachronically, it is proposed that the doxastic component – implicit in volitional uses and anchored in the volitional subject – becomes explicit when the anchoring of an external doxastic source shifts from outside (i.e. the opinion of others) to the volitional subject, who is then reinterpreted as an evidential source. Comparisons with German wollen (and to a lesser extent with French vouloir) contextualise this development within a broader grammaticalisation path from volition to evidentiality. While wollen is already grammaticalised as a reportative marker, Latin volo offers novel diachronic and structurally distinct evidence for this cross-linguistic trajectory.
The opening chapter provides a historical overview of Taiwanese Southern Min (TSM), tracing its development through the convergence of Zhangzhou and Quanzhou dialects. It introduces the subsequent chapters, each dedicated to specific phonological aspects: vowels, consonants, tones, syllable structure, segmental and tonal mutations, tonal domains, rhythm, and the evolving accent patterns of younger speakers, particularly the iGeneration Taiwanese Southern Min (iTSM), which represents a distinctive phonological profile.
The chapter also introduces the Taiwanese Romanization notation system alongside the International Phonetic Alphabet (IPA), the framework for data presentation throughout the study. Three robust TSM corpora, synthesized from earlier National Science Council research, provide the empirical foundation for the analysis. Statistical evaluations of the corpora support investigations into segmental transformations, tonal evolution, and prosodic patterns.
This introduction sets the stage for a comprehensive exploration of TSM phonology, encouraging readers to critically engage with the evidence and form independent interpretations. It prepares readers for a nuanced journey into the complexities of TSM phonology in the chapters ahead.
This chapter argues that the interpretation of the dialogue should not be constrained by its relationship to the Apology, as has often been done, and that its chronological place among the dialogues is uncertain. The dialogue should be interpreted in its own terms.
Corpus linguistics involves compiling and examining authentic samples of everyday communication. Chapter 11, written by Phoebe Lin, explores its significance for L2 teaching and learning. It provides an overview of its goals and methods, and presents recent advances in corpus linguistics for the L2 classroom. This chapter demonstrates new and user-friendly online corpus tools for various L2 teaching and learning contexts, including vocabulary analysis, error correction, and writing idea suggestions. It also summarises findings about data-driven learning (DDL), a specialisation in corpus linguistics that explores best practices when incorporating corpus use inside and outside the classroom. A number of essential and practical questions are addressed, including the class time required for teaching concordancing, the types of learners who will particularly benefit from concordancing training, how to select concordancers based on learner needs, factors to consider when planning lessons involving concordancing, and so on. The chapter concludes by discussing the role of teachers in implementing corpus and concordancing training in the classroom.
Southern Min – the most commonly spoken variant of Taiwanese – has over 100 million speakers. This book provides the first comprehensive analysis of Taiwanese Southern Min (TSM) phonology, filling a critical gap in linguistic research. It demonstrates how the language's sound patterns have evolved over time, and explores its key phonological and tonal features. Beginning with an overview of the language's phonological system, it progresses to specialized topics, including segmental and tonal mutations, tonal domains, and metrical structures. Grounded in three purpose-built corpora, it integrates empirical data and statistical analyses to illuminate phonological processes and patterns. It also explores rarely addressed topics, including phonological interfaces, the rhythms of poetry and folk ballads, and the iGeneration dialectal variety, providing analytical clarity on complex phenomena. Serving as both a detailed reference for researchers and a supplementary text for phonology and Asian linguistics courses, its illuminating insights will inspire further research into this intricate linguistic system.
The present study uses probabilistic models of corpus data in a novel way to measure and compare the syntactic predictive capacities of speakers of different varieties of the same language. The study finds that speakers' knowledge of probabilistic grammatical choices can vary across different varieties of the same language and can be detected psycholinguistically in the individual. In three pairs of experiments, Australians and Americans responded reliably to corpus model probabilities in rating the naturalness of alternative dative constructions, their lexical-decision latencies during reading varied inversely with the syntactic probabilities of the construction, and they showed subtle covariation in these tasks, which is in line with quantitative differences in the choices of datives produced in the same contexts.
This chapter describes ongoing corpus-based research on representations of Islam in the British press. The study involved building large corpora of newspaper articles about Islam and/or Muslims and using techniques like collocation and keywords to identify patterns of representation as well as differences between newspapers and change over time. The chapter outlines some of the key findings of the research and describes the various impact activities that were carried out and the challenges these presented. These activities include working with a number of groups (ENGAGE, MEND, the Centre for Media Monitoring), presenting our work at the Labour Party Conference and in Parliament, and giving talks in mosques. We also detail how our project resulted in the creation of additional collections of newer corpora, enabling further examination of how representations have changed over time.
This article revisits the diachrony of the genitive alternation, the alternation between ’s and prepositional phrases headed by of in Present-Day English. It is usually assumed to have developed around 1400 CE. For Old English (c. 650–1000 CE), a different alternation between pre-modifying and post-modifying genitive-case-marked noun phrases is suggested to be the genitive alternation. Building on descriptions of competition between genitive-case-marked noun phrases (gen) and prepositional phrases with of (of) in Old English, and unpicking some of the preconceptions about the alternation in Old English, we propose a bottom-up method for systematically identifying possible alternation between of and gen in the York–Toronto–Helsinki Parsed Corpus of Old English Prose (Taylor et al. 2003). Our findings indicate that there is plausibly an alternation in Old English that stands in continuity with Present-Day English and suggest a more complex diachrony for the alternation characterized by continuity and discontinuity in the alternants and the envelope of variation.
Tagalog adjectives and nouns variably occur in two word orders, separated by an intermediary linker: adjective-linker-noun versus noun-linker-adjective. The linker has two phonologically conditioned surface forms, -ng and na. This article presents a large-scale corpus study of adjective/noun order variation in Tagalog, focusing in particular on phonological conditions. Results show that word-order variation in adjective/noun pairs optimizes for phonological structure, abiding by phonotactic, syllabic, and morphophonological well-formedness preferences that are also found elsewhere in Tagalog grammar. The results indicate that surface phonological information is accessible for word-order choice.
This report describes a new research resource: a searchable database of 4,700 naturally occurring instances of sluicing in English, annotated so as to shed light on the questions that have shaped research on ellipsis since the 1960s. The paper describes the data set and how it can be obtained, how it was constructed, how it is organized, and how it can be queried. It also highlights some initial empirical findings, first describing general characteristics of the data, then focusing more closely on issues concerning antecedents and possible mismatches between antecedents and ellipsis sites.
This article uses formal and usage-based data and methods to argue for a hybrid model of English tensed auxiliary contraction combining lexical syntax with a dynamic exemplar lexicon. The hybrid model can explain why the contractions involve lexically specific phonetic fusions that have become morphologized and lexically stored, yet remain syntactically independent, and why the probability of contraction itself is a function of the adjacent cooccurrences of the subject and auxiliary in usage, yet is also subject to the constraints of the grammatical context. Novel evidence includes a corpus study and a formal analysis of a multiword expression of classic usage-based grammar.
This article focuses on French espèce de + NP! ‘you + NP!’ to make a case that impoliteness can be conventionalized in linguistic form beyond the level of the lexicon. We argue that the pattern can be considered a construction in its own right and also that it is strongly conventionalized for impoliteness in particular. To support this claim, we adopt both a corpus-based and a questionnaire-based approach. The corpus study reveals not only that espèce de + NP! mainly serves impolite purposes in actual usage but also that it tends to force an impolite interpretation onto noun phrases that do not themselves express negative evaluation. Our questionnaire study complements these findings by showing, inter alia, that the construction is generally judged to be ill-formed when combining with positively evaluative or evaluatively neutral nouns and, at the same time, that such nouns are indeed rated as impolite in the construction. It also points to a difference between calling someone espèce d’idiot! ‘you idiot!’ and calling them just idiot!. We conclude the article with some reflections on why espèce de + NP! is an impoliteness construction.
This article presents a dictionary-based study of vowel reduction and preservation in British English in initial pretonic position and intertonic position. The different variables which have been claimed to influence those processes are tested on a data set of over 4,500 words using regression analyses. Our results confirm the significant effects of syllable structure, position of the vowel, word frequency and opaque prefixation. They also provide weak evidence for other factors such as vowel features and the existence of a base in which the vowel bears a stress, although no clear effects of word segmentability could be found. We also report new findings, as we find that foreign words reduce less than non-foreign words; we find that [+back] vowels reduce less than [−back] vowels in initial pretonic position; and we find a difference in behaviour for vowels followed by /sC/ clusters between non-derived words and stress-shifted derivatives.
This paper investigates the nonstandard use of first‑person singular pronouns (myself and I) in coordinate constructions, such as John and I or John and myself. Native English speakers frequently disregard prescriptive grammar rules by using subject or reflexive forms in place of object forms in sentences like Give those papers to John and I. The frequency of such nonstandard usage raises questions, such as when and why speakers substitute nominative or reflexive pronouns for object pronouns in coordinate constructions, and what evidence exists for the existence of fixed constructions like X and I or X and myself. To address these questions, the study analyzes data from the Corpus of Contemporary American English (COCA). Findings provide strong evidence for the existence of an X and I construction in that the nonstandard form is common after the coordinator but not before. Evidence for an X and myself construction is weaker, since untriggered reflexives also appear outside coordinate constructions. First‑person singular forms are more likely to appear in hypercorrect and untriggered forms than other pronouns. The research suggests that X and I may be stored as a chunk, possibly due to overgeneralizations resulting from prescriptive corrections during language acquisition.
We present a new corpus of child and child-directed speech (CDS) in Palestinian Arabic. It includes transcriptions following the CHILDES guidelines and features recordings of 16 monolingual Palestinian Arabic-speaking children with an age range of 19–58 months and their adult interlocutors. We analyse the children’s morphosyntactic development and identify a variety of target word orders (45 in child speech, 50 in CDS), with prevalent SV(O) structures; we also found high rates of null subjects in both populations, marginal errors in children’s verbal agreement morphology, and early emergence of serial verb constructions, observed from 23 months of age.
Chapter 3 examines the consanguinity of Ovid’s two bodies, or corpora: his body of work (his textual corpus) and his physical body, which here represents his living body, corpse, tomb and biographical life. Medieval commentators took great interest in the relationship between Ovid’s bodies, responding diversely to the opportunities – and challenges – posed by Ovid’s insistent focus on the relationship. Their responses illuminate the mechanisms by which Ovid was transformed from an immoral, salacious poet to a moral, edifying one. A surprising element of that metamorphosis is that the pagan Ovid became a justifiably Christian poet for the medieval age. The chapter discusses Ovid’s presentation of his corpora in the exile poetry and the medieval obsession with Ovid’s tomb, before focusing on three medieval case studies: the Nolo Pater Noster anecdote, a medieval Latin narrative where two clerics are visited by the spirit of Ovid; Guillaume de Deguileville’s Le pèlerinage de la vie humaine and John Lydgate’s English rendering of the text, The Pilgrimage of the Life of Man, where a figure on pilgrimage encounters Ovid’s exilic revenant; and Christine de Pizan’s Le livre de la cité des dames, in which Ovid is resurrected only to be castrated.
This study reflects on Japan's language policy, focusing on the government‑led proposals implemented in 2006, which suggested replacing loanwords with Japanese equivalents, known as Gairaigo Iikae Teian ‘proposals for replacing loanwords’. By investigating English loanwords, this article explores the impact of English on Japanese vocabulary, while providing insights into the practical implementation of the government-led language policy in Japan for a broader global audience. It also clarifies that the objective of the proposals was not to strictly regulate the use of English loanwords but to offer suggestions, with replacement as one strategy to improve communication, especially when disseminating information through government agencies and media organisations. Through a quantitative investigation on the usage of English loanwords in the media, the results reveal that the overall number of media articles containing the loanwords in the proposed list has increased over the last 30 years. The findings also confirm that loanwords and their Japanese equivalents are not in competition, with one replacing the other. Instead, their usage exhibits a parallel trend in both frequency and increase rates.
This chapter provides an overview of corpus-based advances in Construction Grammar. After a brief introduction on kinds of data in linguistics in general and the notion of corpora in particular, I discuss a variety of corpus-based studies categorized into (i) largely qualitative studies, (ii) studies based on frequencies and probabilities, (iii) studies focusing on association strengths, and (iv) statistical as well as machine-learning studies. In each section, representative studies covering a variety of languages and questions are covered with an eye to surveying methodological as well as theoretical advantages. I conclude with an assessment of the state of the art by comparing how recent developments fare relative to Dąbrowska’s discussion of Cognitive Linguistics’s seven deadly sins.
This article explores the language of social media by analyzing a selection of linguistic features in four corpora of Swedish social media available at Språkbanken Text: Blog mix, Familjeliv, Flashback, and Twitter. Previous research describes the language of these corpora as informal, spoken-like, unedited, non-standard, and innovative. Our corpus analysis confirms the informal and spoken-like nature of social media, while also showing that these traits are unevenly distributed across the various social media corpora and that they are also present in other traditional written corpora, such as novels. Our findings also reveal that the social media corpora show traits of involved and interactional language.
Do different emotion terms trigger different metaphorical conceptualizations of emotions? What are the effects of the discourse context of the genre on metaphor choice in the conceptualization of emotion concepts? Finally, are such lexical and discourse–contextual effects on emotion-targeted metaphor choice quantifiable? Prior discourse-oriented research has demonstrated from a largely qualitative perspective that metaphor use is dynamic and sensitive to discursive contextual variables (e.g., Deignan et al., 2013; Semino 2010, 2011; Semino et al., 2013; Dorst 2015; Caballero 2016; Knapton & Rundblad, 2018). In the present study, these questions are addressed from a corpus-based multivariate perspective, where detailed qualitative analysis of found examples is combined with quantitative modeling. The study examines negative self-evaluative emotions in English, operationalized through their two nominal exponents, i.e., shame and embarrassment, as attested in the discourse context of three genres – fiction, magazine and spoken TV language. The data are first analyzed qualitatively for relevant contextual variables and then modelled quantitatively. The results demonstrate that while both lexical and genre effects are observed in metaphor choice in the conceptualization of negative self-evaluative emotional experience, their combined effect should also be accounted for, as these two variables are found to interact with each other.