This Element reports on the creation and analysis of a 1.5-million-word corpus consisting of a year's worth of UK national press news articles about Islam and Muslims, published between December 2022 and November 2023. The corpus also contains 8,546 image files which have been automatically tagged using Google's Vertex AI. Analysis was carried out on three levels: a) written text only; b) images only; and c) interactions between written text and images. Using examples from the analyses, the authors demonstrate the affordances of these three approaches, providing a critical evaluation of Vertex AI's capabilities and the abilities of popular corpus software to work with visually tagged corpora. The Element acts as a practical guide for researchers who want to carry out this form of analysis. This title is also available as Open Access on Cambridge Core.
Lexical Multidimensional Analysis (LMDA), an extension of Biber's (1988) Multidimensional Analysis, seeks to identify dimensions (correlated lexical features across texts in a corpus), unveiling underlying patterns of lexical co-occurrence and variation within texts, which are operationalized as a variety of latent, macro-level discursive constructs. Initially developed in the 2010s, LMDA has been applied to diverse domains, including education policy, national representations, applied linguistics, music, the infodemic, religion, sustainability, and literary style. This Element introduces LMDA for the identification and analysis of discourses and ideologies, offering insights into how lexis marks discourse formations and ideological alignments. Two case studies demonstrate the application of LMDA: uncovering discourses on climate change within conservative social media and analyzing ideological discourses in migrant education.
Equality is a global factor of prosperity in democratic societies. In this Element, thirty years of newspapers and magazines form the basis of an intersectional study on how different social actors are described in Czechia. A bird's eye perspective points to the news being very white male-oriented, but when scrutinising further, some results differ from previous studies, giving insights into linguistic othering and stratification that may be a threat to equality. The methodology can be used for most languages with a sufficient amount of digitised, annotated and available texts. Since more and more text is being gathered to form datasets large enough to answer any question we might have, this Element helps uncover why we should be careful about which conclusions to draw if the words put into the data are not adapted to the relevant register and context. This title is also available as Open Access on Cambridge Core.
This Element offers intermediate or experienced programmers algorithms for Corpus Linguistic (CL) programming in the Python language using dataframes that provide a fast, efficient, intuitive set of methods for working with large, complex datasets such as corpora. This Element demonstrates principles of dataframe programming applied to CL analyses, as well as complete algorithms for creating concordances; producing lists of collocates, keywords, and lexical bundles; and performing key feature analysis. An additional algorithm for creating dataframe corpora is presented including methods for tokenizing, part-of-speech tagging, and lemmatizing using spaCy. This Element provides a set of core skills that can be applied to a range of CL research questions, as well as to original analyses not possible with existing corpus software.
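The dataframe approach this abstract describes can be illustrated with a short sketch. The toy token dataframe and the `concordance` function below are illustrative assumptions for this listing, not the Element's own code; they show the general idea of producing a concordance from a one-row-per-token corpus dataframe.

```python
import pandas as pd

# Hypothetical toy corpus stored as a token-level dataframe: one row per token.
tokens = pd.DataFrame({
    "token": ["the", "cat", "sat", "on", "the", "mat", "the", "cat", "slept"],
})

def concordance(df: pd.DataFrame, node: str, window: int = 2) -> pd.DataFrame:
    """Return left context, node word, and right context for each hit."""
    hits = df.index[df["token"].str.lower() == node]
    rows = []
    for i in hits:
        left = " ".join(df["token"].iloc[max(0, i - window):i])
        right = " ".join(df["token"].iloc[i + 1:i + 1 + window])
        rows.append({"left": left, "node": df["token"].iloc[i], "right": right})
    return pd.DataFrame(rows)

print(concordance(tokens, "cat"))
```

In real use the dataframe would hold millions of rows with additional columns (part-of-speech tags, lemmas), and the same indexing pattern scales to collocate and lexical-bundle extraction.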
This Element provides a systematic overview and synthesis of corpus-based research into collocations focusing on the learning and use of collocations by second language (L2) users. Underlining the importance of collocation as a key notion within the field of corpus linguistics, the text offers a state-of-the-art account of the main findings related to the applications of corpora and corpus-based measures for defining, identifying and analysing collocations as related to second language acquisition. Emphasising the quality of L2 collocation research, the Element illustrates key methodological issues to be considered when conducting this type of corpus analysis. It also discusses examples of pertinent research questions and points to representative studies treated as models of good practice. Aiming at researchers both new and experienced, the Element also points to avenues for future work and shows the relevance of corpus-based analysis for improving the process of learning and teaching of L2 collocations.
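One family of corpus-based measures mentioned here, association measures for identifying collocations, can be sketched briefly. The counts below are invented toy numbers for illustration; the formula is the standard pointwise mutual information used widely in collocation research.

```python
import math

# Toy frequencies, purely illustrative: corpus size N, node frequency f_x,
# collocate frequency f_y, and observed co-occurrence frequency f_xy.
N, f_x, f_y, f_xy = 1_000_000, 500, 2000, 150

# Pointwise mutual information: log2(observed / expected), where the
# expected co-occurrence under independence is f_x * f_y / N.
expected = f_x * f_y / N
mi = math.log2(f_xy / expected)
print(round(mi, 2))  # → 7.23
```

A high MI score indicates that the two words co-occur far more often than chance would predict, one common operationalisation of collocational strength.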
The breadth and spread of corpus-assisted discourse studies (CADS) indicate its usefulness for exploring language use within a social context. However, its theoretical foundations, limitations, and epistemological implications must be considered so that we can adjust our research designs accordingly. This Element offers a compact guide to which corpus linguistic tools are available and how they can contribute to finding out more about discourse. It will appeal to researchers both new and experienced, within the CADS community and beyond.
This Element explores relationships between collocations, writing quality, and learner and contextual variables in a first-year composition (FYC) programme. Comprising three studies, the Element is anchored in understanding phraseological complexity and its sub-constructs of sophistication and diversity. First, the authors look at sophistication through association measures. They tap into how these measures may tell us different types of information about collocation via a cluster analysis. Selected measures from this clustering are used in a cumulative links model to establish relationships between these measures, measures of diversity and measures of task, the language background of the writer and individual writer variation, and writing quality scores. A third qualitative study of the statistically significant predictors helps understand how writers use collocations and why they might be favoured or downgraded by raters. This Element concludes by considering the implications of this modelling for assessment.
This Element explores approaches to locating and examining social identity in corpora with and without the aid of demographic metadata. This is a key concern in corpus-aided studies of language and identity, and this Element sets out to explore the main challenges and affordances associated with either approach and to discern what either approach can (and cannot) show. It describes two case studies which each compare two approaches to social identity variables – sex and age – in a corpus of 14-million words of patient comments about NHS cancer services in England. The first approach utilises demographic tags to group comments according to patients' sex/age while the second involves categorising cases where patients disclose their sex/age in their comments. This Element compares the findings from either approach, with the approaches themselves being critically discussed in terms of their implications for corpus-aided studies of language and identity.
The practices of visual artists can never be decontextualised from language. Firstly, artists are constantly in dialogue with their peers, dealers, critics, and audiences about their creative activities and these interactions impact on the work they produce. Secondly, artists' conceptualisations of what artistic practice encompasses are always shaped by wider social discourses. These discourses, however, and their manifestation in the language of everyday life are subject to continual change, and potentially reshape the way that artists conceptualise their practices. Using a 235,000-word diachronic corpus developed from artists' interviews and statements, this Element investigates shifts in artists' use of language to conceptualise their art practice from 1950 to 2019. It then compares these shifts to see if they align with changes in the wider English lexicon and whether there might be a relationship between everyday language change and the aesthetic and conceptual developments that take place in the art world.
Corpus analysis can be expanded and scaled up by incorporating computational methods from natural language processing. This Element shows how text classification and text similarity models can extend our ability to undertake corpus linguistics across very large corpora. These computational methods are becoming increasingly important as corpora grow too large for more traditional types of linguistic analysis. We draw on five case studies to show how and why to use computational methods, ranging from usage-based grammar to authorship analysis to using social media for corpus-based sociolinguistics. Each section is accompanied by an interactive code notebook that shows how to implement the analysis in Python. A stand-alone Python package is also available to help readers use these methods with their own data. Because large-scale analysis introduces new ethical problems, this Element pairs each new methodology with a discussion of potential ethical implications.
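The text similarity models mentioned here can be illustrated with a minimal sketch, assuming a simple bag-of-words representation; the function and example texts are illustrative stand-ins, not the Element's accompanying Python package.

```python
from collections import Counter
import math

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between two texts using raw word counts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in set(va) & set(vb))
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

print(cosine_similarity("corpus methods scale up",
                        "corpus methods scale out"))  # → 0.75
```

Production systems would typically replace raw counts with TF-IDF weights or neural embeddings, but the comparison logic is the same, which is why such measures scale to corpora too large for manual analysis.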
This Element provides a basic introduction to sentiment analysis, aimed at helping students and professionals in corpus linguistics to understand what sentiment analysis is, how it is conducted, and where it can be applied. It begins with a definition of sentiment analysis and a discussion of the domains where sentiment analysis is conducted and used the most. Then, it introduces the two main methods commonly used in sentiment analysis: supervised machine-learning and unsupervised (or lexicon-based) methods, followed by a step-by-step explanation of how to perform sentiment analysis with R. The Element then provides two detailed examples or cases of sentiment and emotion analysis, with one using an unsupervised method and the other using a supervised learning method.
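The lexicon-based (unsupervised) method this abstract describes can be sketched in a few lines. The Element itself works in R; the Python version below, with a tiny invented lexicon, is only meant to show the core idea of summing word polarities from a sentiment dictionary.

```python
# Tiny illustrative polarity lexicon; real work would use an established
# resource with thousands of scored entries.
LEXICON = {"good": 1, "great": 2, "bad": -1, "awful": -2}

def sentiment_score(text: str) -> int:
    """Sum lexicon polarities over the tokens of a text."""
    return sum(LEXICON.get(tok, 0) for tok in text.lower().split())

print(sentiment_score("A great film with a good cast but an awful ending"))  # → 1
```

A positive total suggests overall positive sentiment, a negative total the reverse; supervised machine-learning methods instead learn such associations from labelled training texts.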
This Element explores interdisciplinarity in academic writing. It describes the ways in which disciplines interact when forming interdisciplinary fields and how language reflects (and is reflected by) these interactions. Specifically, bibliographical citations are investigated in corpora of research articles from three interdisciplines: Educational Neuroscience, Economic History, and Science and Technology Studies, as well as the single-domain disciplines from which they are derived. Comparisons are carried out between the interdisciplinary fields and between those fields and their related single-domain disciplines. The study combines analysis of quantitative data and qualitative interpretation by means of close reading. It concludes that bibliographical citations constitute a viable tool to explore interdisciplinary writing in the fields explored. The Element demonstrates that it is possible to describe epistemologically distinct types of interdisciplinarity by means of linguistic evidence.
Paradoxically, doing corpus linguistics is both easier and harder than it has ever been before. On the one hand, it is easier because we have access to more existing corpora, more corpus analysis software tools, and more statistical methods than ever before. On the other hand, reliance on these existing corpora and corpus linguistic methods can potentially create layers of distance between the researcher and the language in a corpus, making it a challenge to do linguistics with a corpus. The goal of this Element is to explore ways for us to improve how we approach linguistic research questions with quantitative corpus data. We introduce and illustrate the major steps in the research process, including how to: select and evaluate corpora; establish linguistically motivated research questions, observational units, and variables; select linguistically interpretable variables; understand and evaluate existing corpus software tools; adopt minimally sufficient statistical methods; and qualitatively interpret quantitative findings.
Corpus-based discourse analysts are becoming increasingly interested in the incorporation of non-linguistic data, for example through corpus-assisted multimodal discourse analysis. This Element applies this new approach in relation to how news values are discursively constructed through language and photographs. Using case studies of news from China and Australia, the Element presents a cross-linguistic comparison of news values in national day reporting. Discursive news values analysis (DNVA) has so far been mainly applied to English-language data. This Element offers a new investigation of Chinese DNVA and provides momentum to scholars around the world who are already adapting DNVA to their local contexts. With its focus on national days across two very different cultures, the Element also contributes to research on national identity and cross-linguistic corpus linguistics.