This Element reports on the creation and analysis of a 1.5-million-word corpus consisting of a year's worth of UK national press news articles about Islam and Muslims, published between December 2022 and November 2023. The corpus also contains 8,546 image files which have been automatically tagged using Google's Vertex AI. Analysis was carried out on three levels: a) written text only; b) images only; and c) interactions between written text and images. Using examples from the analyses, the authors demonstrate the affordances of these three approaches, providing a critical evaluation of Vertex AI's capabilities and the abilities of popular corpus software to work with visually tagged corpora. The Element acts as a practical guide for researchers who want to carry out this form of analysis. This title is also available as Open Access on Cambridge Core.
Lexical Multidimensional Analysis (LMDA), an extension of Biber's (1988) Multidimensional Analysis, seeks to identify dimensions (correlated lexical features across texts in a corpus), unveiling underlying patterns of lexical co-occurrence and variation within texts, which are operationalized as a variety of latent, macro-level discursive constructs. Initially developed in the 2010s, LMDA has been applied to diverse domains, including education policy, national representations, applied linguistics, music, the infodemic, religion, sustainability, and literary style. This Element introduces LMDA for the identification and analysis of discourses and ideologies, offering insights into how lexis marks discourse formations and ideological alignments. Two case studies demonstrate the application of LMDA: uncovering discourses on climate change within conservative social media and analyzing ideological discourses in migrant education.
Equality is a global factor of prosperity in democratic societies. In this Element, thirty years of newspapers and magazines form the basis of an intersectional study on how different social actors are described in Czechia. A bird's eye perspective points to the news being very white male-oriented, but when scrutinising further, some results differ from previous studies, giving insights into linguistic othering and stratification that may be a threat to equality. The methodology can be used for most languages with a sufficient amount of digitised, annotated and available texts. Since more and more text is being gathered to form datasets large enough to answer any question we might have, this Element helps uncover why we should be careful about which conclusions to draw if the words put into the data are not adapted to the relevant register and context. This title is also available as Open Access on Cambridge Core.
This Element offers intermediate or experienced programmers algorithms for Corpus Linguistic (CL) programming in the Python language using dataframes that provide a fast, efficient, intuitive set of methods for working with large, complex datasets such as corpora. This Element demonstrates principles of dataframe programming applied to CL analyses, as well as complete algorithms for creating concordances; producing lists of collocates, keywords, and lexical bundles; and performing key feature analysis. An additional algorithm for creating dataframe corpora is presented including methods for tokenizing, part-of-speech tagging, and lemmatizing using spaCy. This Element provides a set of core skills that can be applied to a range of CL research questions, as well as to original analyses not possible with existing corpus software.
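The dataframe approach this abstract describes can be illustrated with a short sketch. The toy token dataframe and the `concordance` function below are illustrative assumptions for this listing, not the Element's own code; they show the general idea of producing a concordance from a one-row-per-token corpus dataframe.

```python
import pandas as pd

# Hypothetical toy corpus stored as a token-level dataframe: one row per token.
tokens = pd.DataFrame({
    "token": ["the", "cat", "sat", "on", "the", "mat", "the", "cat", "slept"],
})

def concordance(df: pd.DataFrame, node: str, window: int = 2) -> pd.DataFrame:
    """Return left context, node word, and right context for each hit."""
    hits = df.index[df["token"].str.lower() == node]
    rows = []
    for i in hits:
        left = " ".join(df["token"].iloc[max(0, i - window):i])
        right = " ".join(df["token"].iloc[i + 1:i + 1 + window])
        rows.append({"left": left, "node": df["token"].iloc[i], "right": right})
    return pd.DataFrame(rows)

print(concordance(tokens, "cat"))
```

In real use the dataframe would hold millions of rows with additional columns (part-of-speech tags, lemmas), and the same indexing pattern scales to collocate and lexical-bundle extraction.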
This Element provides a systematic overview and synthesis of corpus-based research into collocations focusing on the learning and use of collocations by second language (L2) users. Underlining the importance of collocation as a key notion within the field of corpus linguistics, the text offers a state-of-the-art account of the main findings related to the applications of corpora and corpus-based measures for defining, identifying and analysing collocations as related to second language acquisition. Emphasising the quality of L2 collocation research, the Element illustrates key methodological issues to be considered when conducting this type of corpus analysis. It also discusses examples of pertinent research questions and points to representative studies treated as models of good practice. Aiming at researchers both new and experienced, the Element also points to avenues for future work and shows the relevance of corpus-based analysis for improving the process of learning and teaching of L2 collocations.
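One family of corpus-based measures mentioned here, association measures for identifying collocations, can be sketched briefly. The counts below are invented toy numbers for illustration; the formula is the standard pointwise mutual information used widely in collocation research.

```python
import math

# Toy frequencies, purely illustrative: corpus size N, node frequency f_x,
# collocate frequency f_y, and observed co-occurrence frequency f_xy.
N, f_x, f_y, f_xy = 1_000_000, 500, 2000, 150

# Pointwise mutual information: log2(observed / expected), where the
# expected co-occurrence under independence is f_x * f_y / N.
expected = f_x * f_y / N
mi = math.log2(f_xy / expected)
print(round(mi, 2))  # → 7.23
```

A high MI score indicates that the two words co-occur far more often than chance would predict, one common operationalisation of collocational strength.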
The breadth and spread of corpus-assisted discourse studies (CADS) indicate its usefulness for exploring language use within a social context. However, its theoretical foundations, limitations, and epistemological implications must be considered so that we can adjust our research designs accordingly. This Element offers a compact guide to which corpus linguistic tools are available and how they can contribute to finding out more about discourse. It will appeal to researchers both new and experienced, within the CADS community and beyond.
This Element explores relationships between collocations, writing quality, and learner and contextual variables in a first-year composition (FYC) programme. Comprising three studies, the Element is anchored in understanding phraseological complexity and its sub-constructs of sophistication and diversity. First, the authors look at sophistication through association measures. They tap into how these measures may tell us different types of information about collocation via a cluster analysis. Selected measures from this clustering are used in a cumulative links model to establish relationships between these measures, measures of diversity and measures of task, the language background of the writer and individual writer variation, and writing quality scores. A third qualitative study of the statistically significant predictors helps understand how writers use collocations and why they might be favoured or downgraded by raters. This Element concludes by considering the implications of this modelling for assessment.
This Element explores approaches to locating and examining social identity in corpora with and without the aid of demographic metadata. This is a key concern in corpus-aided studies of language and identity, and this Element sets out to explore the main challenges and affordances associated with either approach and to discern what either approach can (and cannot) show. It describes two case studies which each compare two approaches to social identity variables – sex and age – in a corpus of 14-million words of patient comments about NHS cancer services in England. The first approach utilises demographic tags to group comments according to patients' sex/age while the second involves categorising cases where patients disclose their sex/age in their comments. This Element compares the findings from either approach, with the approaches themselves being critically discussed in terms of their implications for corpus-aided studies of language and identity.
The practices of visual artists can never be decontextualised from language. Firstly, artists are constantly in dialogue with their peers, dealers, critics, and audiences about their creative activities and these interactions impact on the work they produce. Secondly, artists' conceptualisations of what artistic practice encompasses are always shaped by wider social discourses. These discourses, however, and their manifestation in the language of everyday life are subject to continual change, and potentially reshape the way that artists conceptualise their practices. Using a 235,000-word diachronic corpus developed from artists' interviews and statements, this Element investigates shifts in artists' use of language to conceptualise their art practice from 1950 to 2019. It then compares these shifts to see if they align with changes in the wider English lexicon and whether there might be a relationship between everyday language change and the aesthetic and conceptual developments that take place in the art world.
Corpus analysis can be expanded and scaled up by incorporating computational methods from natural language processing. This Element shows how text classification and text similarity models can extend our ability to undertake corpus linguistics across very large corpora. These computational methods are becoming increasingly important as corpora grow too large for more traditional types of linguistic analysis. We draw on five case studies to show how and why to use computational methods, ranging from usage-based grammar to authorship analysis to using social media for corpus-based sociolinguistics. Each section is accompanied by an interactive code notebook that shows how to implement the analysis in Python. A stand-alone Python package is also available to help readers use these methods with their own data. Because large-scale analysis introduces new ethical problems, this Element pairs each new methodology with a discussion of potential ethical implications.
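The text similarity models mentioned here can be illustrated with a minimal sketch, assuming a simple bag-of-words representation; the function and example texts are illustrative stand-ins, not the Element's accompanying Python package.

```python
from collections import Counter
import math

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between two texts using raw word counts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in set(va) & set(vb))
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

print(cosine_similarity("corpus methods scale up",
                        "corpus methods scale out"))  # → 0.75
```

Production systems would typically replace raw counts with TF-IDF weights or neural embeddings, but the comparison logic is the same, which is why such measures scale to corpora too large for manual analysis.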
This Element provides a basic introduction to sentiment analysis, aimed at helping students and professionals in corpus linguistics to understand what sentiment analysis is, how it is conducted, and where it can be applied. It begins with a definition of sentiment analysis and a discussion of the domains where sentiment analysis is conducted and used the most. Then, it introduces the two main methods commonly used in sentiment analysis: supervised machine-learning and unsupervised (or lexicon-based) methods, followed by a step-by-step explanation of how to perform sentiment analysis with R. The Element then provides two detailed examples or cases of sentiment and emotion analysis, with one using an unsupervised method and the other using a supervised learning method.
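The lexicon-based (unsupervised) method this abstract describes can be sketched in a few lines. The Element itself works in R; the Python version below, with a tiny invented lexicon, is only meant to show the core idea of summing word polarities from a sentiment dictionary.

```python
# Tiny illustrative polarity lexicon; real work would use an established
# resource with thousands of scored entries.
LEXICON = {"good": 1, "great": 2, "bad": -1, "awful": -2}

def sentiment_score(text: str) -> int:
    """Sum lexicon polarities over the tokens of a text."""
    return sum(LEXICON.get(tok, 0) for tok in text.lower().split())

print(sentiment_score("A great film with a good cast but an awful ending"))  # → 1
```

A positive total suggests overall positive sentiment, a negative total the reverse; supervised machine-learning methods instead learn such associations from labelled training texts.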
This Element explores interdisciplinarity in academic writing. It describes the ways in which disciplines interact when forming interdisciplinary fields and how language reflects (and is reflected by) these interactions. Specifically, bibliographical citations are investigated in corpora of research articles from three interdisciplines: Educational Neuroscience, Economic History, and Science and Technology Studies, as well as the single-domain disciplines from which they are derived. Comparisons are carried out between the interdisciplinary fields and between those fields and their related single-domain disciplines. The study combines analysis of quantitative data and qualitative interpretation by means of close reading. It concludes that bibliographical citations constitute a viable tool to explore interdisciplinary writing in the fields explored. The Element demonstrates that it is possible to describe epistemologically distinct types of interdisciplinarity by means of linguistic evidence.
Paradoxically, doing corpus linguistics is both easier and harder than it has ever been before. On the one hand, it is easier because we have access to more existing corpora, more corpus analysis software tools, and more statistical methods than ever before. On the other hand, reliance on these existing corpora and corpus linguistic methods can potentially create layers of distance between the researcher and the language in a corpus, making it a challenge to do linguistics with a corpus. The goal of this Element is to explore ways for us to improve how we approach linguistic research questions with quantitative corpus data. We introduce and illustrate the major steps in the research process, including how to: select and evaluate corpora; establish linguistically motivated research questions, observational units, and variables; select linguistically interpretable variables; understand and evaluate existing corpus software tools; adopt minimally sufficient statistical methods; and qualitatively interpret quantitative findings.
Corpus-based discourse analysts are becoming increasingly interested in the incorporation of non-linguistic data, for example through corpus-assisted multimodal discourse analysis. This Element applies this new approach in relation to how news values are discursively constructed through language and photographs. Using case studies of news from China and Australia, the Element presents a cross-linguistic comparison of news values in national day reporting. Discursive news values analysis (DNVA) has so far been mainly applied to English-language data. This Element offers a new investigation of Chinese DNVA and provides momentum to scholars around the world who are already adapting DNVA to their local contexts. With its focus on national days across two very different cultures, the Element also contributes to research on national identity and cross-linguistic corpus linguistics.