Using large corpus data for the reconstruction of cross-cultural concept differences: happiness and joy in West Slavic languages and in English
Lucie Saicová Římalová
Journal: Language and Cognition / Volume 18 / 2026
Published online by Cambridge University Press: 11 May 2026, e32
- Article
- - You have access
  - Open access
Happiness is a complex concept that has been intensively researched from many perspectives, but the linguistic aspects of this phenomenon are still under-researched. Using corpus-based analysis of semantically similar words (word embedding), the author studies lexical units denoting happiness and joy in three West Slavic languages (Polish, Czech, Slovak) and compares them with the corresponding lexical units in English. The results show that despite the mutual linguistic and non-linguistic ties, the Polish, Czech and Slovak understanding of happiness exhibits not only similarities (e.g. the relationship between happiness and joy and the outward orientation of joy) but also significant differences (e.g. the different value of the component ‘luck’ in happiness, a different relationship between joy, sadness and fear, and cross-cultural differences related to religion). The results also highlight similarities and differences between West Slavic languages and English. In addition to this, the study tests the advantages and limitations of the word-embedding analysis for the analysis of concepts and their culturally specific features. The author believes that the method is useful because it offers new insights into the analysed data, but it also requires human oversight and careful interpretation.

From volition to reportativity: the reportative uses of Latin volo in synchrony and diachrony (with remarks on German wollen and French vouloir)
Francesca Dell’Oro
Journal: Journal of Linguistics, First View
Published online by Cambridge University Press: 06 April 2026, pp. 1-26
- Article
- - You have access
  - Open access
This paper presents a corpus-based investigation of Latin volo ‘to want’, arguing that it exhibits previously overlooked reportative uses from at least the 1st century BCE, whereby speakers attribute beliefs, opinions, or statements to an external source. Focusing on third-person present-tense forms (vult, volunt) across a corpus spanning from the 3rd century BCE to the 2nd century CE, the study analyses the semantic, pragmatic, and morphosyntactic properties of these constructions, as well as their diachronic development. Reportative volo is shown to emerge from ambiguous contexts where volition and doxastic stance overlap – especially in small-clause constructions with subject coreferentiality or passive infinitives of verbs of opinion. Diachronically, it is proposed that the doxastic component – implicit in volitional uses and anchored in the volitional subject – becomes explicit, when the anchoring of an external doxastic source shifts from outside (i.e. the opinion of others) to the volitional subject, who is then reinterpreted as an evidential source. Comparisons with German wollen (and to a lesser extent with French vouloir) contextualise this development within a broader grammaticalisation path from volition to evidentiality. While wollen is already grammaticalised as a reportative marker, Latin volo offers novel diachronic and structurally distinct evidence for this cross-linguistic trajectory.

1 - Introduction
Yuchau E. Hsiao, National Chengchi University, Taiwan
Book: The Sound Patterns of Taiwanese Southern Min
Published online: 28 February 2026
Print publication: 19 March 2026, pp. 1-11
- Chapter
Summary

The opening chapter provides a historical overview of Taiwanese Southern Min (TSM), tracing its development through the convergence of Zhangzhou and Quanzhou dialects. It introduces the subsequent chapters, each dedicated to specific phonological aspects: vowels, consonants, tones, syllable structure, segmental and tonal mutations, tonal domains, rhythm, and the evolving accent patterns of younger speakers, particularly the iGeneration Taiwanese Southern Min (iTSM), which represents a distinctive phonological profile.
The chapter also introduces the Taiwanese Romanization notation system alongside the International Phonetic Alphabet (IPA), the framework for data presentation throughout the study. Three robust TSM corpora, synthesized from earlier National Science Council research, provide the empirical foundation for the analysis. Statistical evaluations of the corpora support investigations into segmental transformations, tonal evolution, and prosodic patterns.
This introduction sets the stage for a comprehensive exploration of TSM phonology, encouraging readers to critically engage with the evidence and form independent interpretations. It prepares readers for a nuanced journey into the complexities of TSM phonology in the chapters ahead.

Chapter 1 - The Dialogue’s Place in the Corpus
Brad Inwood, Yale University, Connecticut
Book: Plato's Crito
Published online: 04 March 2026
Print publication: 12 March 2026, pp. 3-7
- Chapter
Summary

This chapter argues that the interpretation of the dialogue should not be constrained by its relationship to the Apology, as has often been done, and that its chronological place among the dialogues is uncertain. The dialogue should be interpreted in its own terms.

10 - Why Corpus Linguistics Matters to L2 Teaching and Learning
- By Phoebe Lin
Edited by Kathy L. Sands, Marnie Jo Petray, Slippery Rock University of Pennsylvania, Gaillynn D. Clements, Duke University, North Carolina, Lynn Santelmann, Portland State University
Book: Linguistic Foundations for Second Language Teaching and Learning
Published online: 07 February 2026
Print publication: 05 March 2026, pp. 194-213
- Chapter
Summary

Corpus linguistics involves compiling and examining authentic samples of everyday communication. This chapter, written by Phoebe Lin, explores its significance for L2 teaching and learning. It provides an overview of the field's goals and methods and presents recent advances in corpus linguistics for the L2 classroom. It demonstrates new and user-friendly online corpus tools for various L2 teaching and learning contexts, including vocabulary analysis, error correction, and writing idea suggestions. It also summarises findings about data-driven learning (DDL), a specialisation in corpus linguistics that explores best practices when incorporating corpus use inside and outside the classroom. A number of essential and practical questions are addressed, such as the class time required for teaching concordancing, the types of learners who will particularly benefit from concordancing training, how to select concordancers based on learner needs, and factors to consider when planning lessons involving concordancing. The chapter concludes by discussing the role of teachers in implementing corpus and concordancing training in the classroom.

The Sound Patterns of Taiwanese Southern Min

Yuchau E. Hsiao
Published online: 28 February 2026
Print publication: 19 March 2026
- Book
Southern Min – the most commonly spoken variant of Taiwanese – has over 100 million speakers. This book provides the first comprehensive analysis of Taiwanese Southern Min (TSM) phonology, filling a critical gap in linguistic research. It demonstrates how the language's sound patterns have evolved over time, and explores its key phonological and tonal features. Beginning with an overview of the language's phonological system, it progresses to specialized topics, including segmental and tonal mutations, tonal domains, and metrical structures. Grounded in three purpose-built corpora, it integrates empirical data and statistical analyses to illuminate phonological processes and patterns. It also explores rarely addressed topics, including phonological interfaces, the rhythms of poetry and folk ballads, and the iGeneration dialectal variety, providing analytical clarity on complex phenomena. Serving as both a detailed reference for researchers and a supplementary text for phonology and Asian linguistics courses, its illuminating insights will inspire further research into this intricate linguistic system.

Predicting Syntax: Processing Dative Constructions in American and Australian Varieties of English
Joan Bresnan, Marilyn Ford
Journal: Language / Volume 86 / Issue 1 / March 2010
Published online by Cambridge University Press: 19 February 2026, pp. 168-213
- Article
The present study uses probabilistic models of corpus data in a novel way: to measure and compare the syntactic predictive capacities of speakers of different varieties of the same language. The study finds that speakers' knowledge of probabilistic grammatical choices can vary across different varieties of the same language and can be detected psycholinguistically in the individual. In three pairs of experiments, Australians and Americans responded reliably to corpus model probabilities in rating the naturalness of alternative dative constructions; their lexical-decision latencies during reading varied inversely with the syntactic probabilities of the construction; and they showed subtle covariation in these tasks, which is in line with quantitative differences in the choices of datives produced in the same contexts.

9 - Using CADS Research to Critique Representations of Muslims in the UK Press
- By Paul Baker, Tony McEnery
Edited by Gavin Brookes, Lancaster University, Niall Curry, Manchester Metropolitan University, Robbie Love, Aston University
Book: Applications of Corpus Linguistics
Published online: 26 December 2025
Print publication: 22 January 2026, pp. 178-195
- Chapter
Summary

This chapter describes ongoing corpus-based research on representations of Islam in the British press. The study involved building large corpora of newspaper articles about Islam and/or Muslims and using techniques like collocation and keywords to identify patterns of representation as well as differences between newspapers and change over time. The chapter outlines some of the key findings of the research and describes the various impact activities that were carried out, and the challenges these presented. These activities include working with a number of groups (ENGAGE, MEND, the Centre for Media Monitoring), presenting our work at the Labour Party Conference and in Parliament, and giving talks in mosques. We also detail how our project resulted in the creation of additional collections of newer corpora, enabling further examination of how representations have changed over time.

Identifying alternations in historical corpus data: the genitive alternation in Old English
Roxanne Taylor, Tine Breban, Kersti Börjars
Journal: English Language & Linguistics, First View
Published online by Cambridge University Press: 07 January 2026, pp. 1-22
- Article
- - You have access
  - Open access
This article revisits the diachrony of the genitive alternation, the alternation between ’s and prepositional phrases headed by of in Present-Day English. It is usually assumed to have developed around 1400 CE. For Old English (c. 650–1000 CE), a different alternation between pre-modifying and post-modifying genitive-case-marked noun phrases is suggested to be the genitive alternation. Building on descriptions of competition between genitive-case-marked noun phrases (gen) and prepositional phrases with of (of) in Old English, and unpicking some of the preconceptions about the alternation in Old English, we propose a bottom-up method for systematically identifying possible alternation between of and gen in the York–Toronto–Helsinki Parsed Corpus of Old English Prose (Taylor et al. 2003). Our findings indicate that there is plausibly an alternation in Old English that stands in continuity with Present-Day English and suggest a more complex diachrony for the alternation characterized by continuity and discontinuity in the alternants and the envelope of variation.

Phonological conditions on variable adjective and noun word order in Tagalog
Stephanie S. Shih, Kie Zuraw
Journal: Language / Volume 93 / Issue 4 / December 2017
Published online by Cambridge University Press: 01 January 2026, pp. e317-e352
- Article
Tagalog adjectives and nouns variably occur in two word orders, separated by an intermediary linker: adjective-linker-noun versus noun-linker-adjective. The linker has two phonologically conditioned surface forms, -ng and na. This article presents a large-scale corpus study of adjective/noun order variation in Tagalog, focusing in particular on phonological conditions. Results show that word-order variation in adjective/noun pairs optimizes for phonological structure, abiding by phonotactic, syllabic, and morphophonological well-formedness preferences that are also found elsewhere in Tagalog grammar. The results indicate that surface phonological information is accessible for word-order choice.

The Santa Cruz Sluicing Data Set
Pranav Anand, Daniel Hardt, James McCloskey
Journal: Language / Volume 97 / Issue 1 / March 2021
Published online by Cambridge University Press: 01 January 2026, pp. e68-e88
- Article
This report describes a new research resource: a searchable database of 4,700 naturally occurring instances of sluicing in English, annotated so as to shed light on the questions that have shaped research on ellipsis since the 1960s. The paper describes the data set and how it can be obtained, how it was constructed, how it is organized, and how it can be queried. It also highlights some initial empirical findings, first describing general characteristics of the data, then focusing more closely on issues concerning antecedents and possible mismatches between antecedents and ellipsis sites.

Formal Grammar, Usage Probabilities, and Auxiliary Contraction
Joan Bresnan
Journal: Language / Volume 97 / Issue 1 / March 2021
Published online by Cambridge University Press: 01 January 2026, pp. 108-150
- Article
This article uses formal and usage-based data and methods to argue for a hybrid model of English tensed auxiliary contraction combining lexical syntax with a dynamic exemplar lexicon. The hybrid model can explain why the contractions involve lexically specific phonetic fusions that have become morphologized and lexically stored, yet remain syntactically independent, and why the probability of contraction itself is a function of the adjacent cooccurrences of the subject and auxiliary in usage, yet is also subject to the constraints of the grammatical context. Novel evidence includes a corpus study and a formal analysis of a multiword expression of classic usage-based grammar.

Espèce de linguiste! An impoliteness construction in French?
Daniel Van Olmen, Delphine Grass
Journal: Journal of French Language Studies / Volume 35 / 2025
Published online by Cambridge University Press: 10 November 2025, e18
- Article
- - You have access
  - Open access
This article focuses on French espèce de + NP! ‘you + NP!’ to make a case that impoliteness can be conventionalized in linguistic form beyond the level of the lexicon. We argue that the pattern can be considered a construction in its own right and also that it is strongly conventionalized for impoliteness in particular. To support this claim, we adopt both a corpus-based and a questionnaire-based approach. The corpus study reveals not only that espèce de + NP! mainly serves impolite purposes in actual usage but also that it tends to force an impolite interpretation onto noun phrases that do not themselves express negative evaluation. Our questionnaire study complements these findings by showing, inter alia, that the construction is generally judged to be ill-formed when combining with positively evaluative or evaluatively neutral nouns and, at the same time, that such nouns are indeed rated as impolite in the construction. It also points to a difference between calling someone espèce d’idiot! ‘you idiot!’ and calling them just idiot!. We conclude the article with some reflections on why espèce de + NP! is an impoliteness construction.

An empirical study of vowel reduction and preservation in British English
Part of: Corpus Phonology
Quentin Dabouis, Jean-Michel Fournier
Journal: Phonology / Volume 42 / 2025
Published online by Cambridge University Press: 17 September 2025, e15
- Article
- - You have access
  - Open access
This article presents a dictionary-based study of vowel reduction and preservation in British English in initial pretonic position and intertonic position. The different variables which have been claimed to influence those processes are tested on a data set of over 4,500 words using regression analyses. Our results confirm the significant effects of syllable structure, position of the vowel, word frequency and opaque prefixation. They also provide weak evidence for other factors such as vowel features and the existence of a base in which the vowel bears a stress, although no clear effects of word segmentability could be found. We also report new findings, as we find that foreign words reduce less than non-foreign words; we find that [+back] vowels reduce less than [−back] vowels in initial pretonic position; and we find a difference in behaviour for vowels followed by /sC/ clusters between non-derived words and stress-shifted derivatives.

My object pronouns are nominative and reflexive: Nonstandard use of myself and I in coordinate constructions
Charlie Taylor
Journal: English Today / Volume 41 / Issue 4 / December 2025
Published online by Cambridge University Press: 13 August 2025, pp. 263-269
Print publication: December 2025
- Article
- - You have access
  - Open access
This paper investigates the nonstandard use of first‑person singular pronouns (myself and I) in coordinate constructions, such as John and I or John and myself. Native English speakers frequently disregard prescriptive grammar rules by using subject or reflexive forms in place of object forms in sentences like Give those papers to John and I. The frequency of such nonstandard usage raises questions, such as when and why speakers substitute nominative or reflexive pronouns for object pronouns in coordinate constructions, and what evidence there is for fixed constructions like X and I or X and myself. To address these questions, the study analyzes data from the Corpus of Contemporary American English (COCA). Findings provide strong evidence for the existence of an X and I construction in that the nonstandard form is common after the coordinator but not before. Evidence for an X and myself construction is weaker, since untriggered reflexives also appear outside coordinate constructions. First‑person singular forms are more likely to appear in hypercorrect and untriggered forms than other pronouns. The research suggests that X and I may be stored as a chunk, possibly due to overgeneralizations resulting from prescriptive corrections during language acquisition.

A corpus analysis of child and child-directed speech in Palestinian Arabic: A first approach to syntactic development
Tala Nazzal, Anna Gavarró
Journal: Journal of Child Language, First View
Published online by Cambridge University Press: 16 June 2025, pp. 1-16
- Article
- - You have access
  - Open access
We present a new corpus of child and child-directed speech (CDS) in Palestinian Arabic. It includes transcriptions following the CHILDES guidelines and features recordings of 16 monolingual Palestinian Arabic-speaking children with an age range of 19–58 months and their adult interlocutors. We analyse the children’s morphosyntactic development and identify a variety of target word orders (45 in child speech, 50 in CDS), with prevalent SV(O) structures; we also found high rates of null subjects in both populations, marginal errors in children’s verbal agreement morphology, and early emergence of serial verb constructions, observed from 23 months of age.

Chapter 3 - Resurrecting Ovid
from Part I - Responding to Exile
Rebecca Menmuir, University of Oxford
Book: Medieval Responses to Ovid's Exile
Published online: 27 May 2025
Print publication: 12 June 2025, pp. 90-122
- Chapter
Summary

Chapter 3 examines the consanguinity of Ovid’s two bodies, or corpora: his body of work (his textual corpus) and his physical body, which here represents his living body, corpse, tomb and biographical life. Medieval commentators took great interest in the relationship between Ovid’s bodies, responding diversely to the opportunities – and challenges – posed by Ovid’s insistent focus on the relationship. Their responses illuminate the mechanisms by which Ovid was transformed from an immoral, salacious poet to a moral, edifying one. A surprising element of that metamorphosis is that the pagan Ovid became a justifiably Christian poet for the medieval age. The chapter discusses Ovid’s presentation of his corpora in the exile poetry and the medieval obsession with Ovid’s tomb, before focusing on three medieval case studies: the Nolo Pater Noster anecdote, a medieval Latin narrative where two clerics are visited by the spirit of Ovid; Guillaume de Deguileville’s Le pèlerinage de la vie humaine and John Lydgate’s English rendering of the text, The Pilgrimage of the Life of Man, where a figure on pilgrimage encounters Ovid’s exilic revenant; and Christine de Pizan’s Le livre de la cité des dames, in which Ovid is resurrected only to be castrated.

Reflection on Japan's language policy for English loanwords: Policy aims and media usage analysis
Satoshi Nambu
Journal: English Today / Volume 41 / Issue 2 / June 2025
Published online by Cambridge University Press: 28 April 2025, pp. 127-136
Print publication: June 2025
- Article
- - You have access
  - Open access
This study reflects on Japan's language policy, focusing on the government‑led proposals implemented in 2006, which suggested replacing loanwords with Japanese equivalents, known as Gairaigo Iikae Teian ‘proposals for replacing loanwords’. By investigating English loanwords, this article explores the impact of English on Japanese vocabulary, while providing insights into the practical implementation of the government-led language policy in Japan for a broader global audience. It also clarifies that the objective of the proposals was not to strictly regulate the use of English loanwords but to offer suggestions, with replacement as one strategy to improve communication, especially when disseminating information through government agencies and media organisations. Through a quantitative investigation on the usage of English loanwords in the media, the results reveal that the overall number of media articles containing the loanwords in the proposed list has increased over the last 30 years. The findings also confirm that loanwords and their Japanese equivalents are not in competition, with one replacing the other. Instead, their usage exhibits a parallel trend in both frequency and increase rates.

7 - Corpus Linguistics and the Cognitive/Constructional Endeavor
from Part II - Methodological and Empirical Foundations of Constructional Research
- By Stefan Th. Gries
Edited by Mirjam Fried, Univerzita Karlova, Kiki Nikiforidou, University of Athens, Greece
Book: The Cambridge Handbook of Construction Grammar
Published online: 30 January 2025
Print publication: 06 February 2025, pp. 171-195
- Chapter
Summary

This chapter provides an overview of corpus-based advances in Construction Grammar. After a brief introduction on kinds of data in linguistics in general and the notion of corpora in particular, I discuss a variety of corpus-based studies categorized into (i) largely qualitative studies, (ii) studies based on frequencies and probabilities, (iii) studies focusing on association strengths, and (iv) statistical as well as machine-learning studies. In each section, representative studies covering a variety of languages and questions are covered with an eye to surveying methodological as well as theoretical advantages. I conclude with an assessment of the state of the art by comparing how recent developments fare relative to Dąbrowska’s discussion of Cognitive Linguistics’s seven deadly sins.

Exploring the language of Swedish social media: A contrastive corpus analysis
Evie Coussé, Yvonne Adesam
Journal: Nordic Journal of Linguistics, First View
Published online by Cambridge University Press: 23 January 2025, pp. 1-24
- Article
- - You have access
  - Open access
This article explores the language of social media by analyzing a selection of linguistic features in four corpora of Swedish social media available at Språkbanken Text: Blog mix, Familjeliv, Flashback, and Twitter. Previous research describes the language of these corpora as informal, spoken-like, unedited, non-standard, and innovative. Our corpus analysis confirms the informal and spoken-like nature of social media, while also showing that these traits are unevenly distributed across the various social media corpora and that they are also present in other traditional written corpora, such as novels. Our findings also reveal that the social media corpora show traits of involved and interactional language.
