The chapter examines how moral rhetoric is used in party communication using dictionary-based text analysis of party manifestos. The main data include 158 manifestos from six English-speaking democracies (Australia, Canada, Ireland, New Zealand, the United Kingdom, the United States) across thirty-four elections. I first measure moral rhetoric in the aggregate. The measure captures the overall level of moral rhetoric used by a party in its campaign. I show that there is variation in moral rhetoric across countries and within countries. I also show the validity of the measurement approach and its robustness to alternatives. Overall, we learn that moral rhetoric is a distinct aspect of party messaging. I then explore patterns in more disaggregated measures of moral rhetoric. Analyses reveal that there are more commonalities in the ways that parties use moral rhetoric than one might expect. Building on the framework of Moral Foundations Theory, I find that differences in the moral palettes of the left and the right that we can expect based on prior work are more nuanced and not as stark when we examine specific moral foundations separately and when we examine appeals at the level of issues.
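As a rough illustration of what an aggregate dictionary-based measure can look like, the sketch below computes the share of tokens in a text that match a word list. The word list, the tokenization rule, and the function name `moral_share` are assumptions made for this example; the chapter's actual dictionary and scoring procedure are not described in the abstract.

```python
import re

# Hypothetical mini-lexicon for illustration only; a real analysis would
# rely on a validated moral dictionary rather than this ad hoc list.
MORAL_TERMS = frozenset({"fair", "justice", "duty", "harm", "care", "loyal", "honor"})


def moral_share(text, lexicon=MORAL_TERMS):
    """Return the fraction of tokens in `text` that appear in `lexicon`."""
    # Lowercase and keep alphabetic runs (plus apostrophes) as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in lexicon)
    return hits / len(tokens)


# Toy sentence standing in for a manifesto.
manifesto = "We have a duty to care for every family and to deliver justice."
share = moral_share(manifesto)
print(f"Share of moral terms: {share:.3f}")
```

Normalizing by document length keeps long and short manifestos comparable; exact-match lookup is the simplest choice and ignores word stems and wildcards, which published dictionaries often use.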
In settings of deep poverty and inequality, implementing policies that balance urgent needs with long-term development is crucial. What strategies are used to build public support for long-term oriented policies? Evidence shows that both left- and right-wing governments have played a role in the expansion of social policy. This article explores the context and meanings that governments with different ideologies assign to distributive policies, focusing on how these policies are communicated. In particular, I argue that ideology significantly shapes the framing presidents use when discussing and announcing social policies. Left-leaning governments emphasize social inclusion while right-leaning governments stress the productivity-enhancing aspects of these policies. Using text analysis techniques, including à la carte (ALC) embeddings, this study analyzes presidential communications from Argentina, Uruguay, and Chile. The findings show how ideology drives communication strategies, revealing that in more polarized societies, presidents distinguish themselves more consistently through how they construct and communicate these policies.
Research shows that increased participation of women in parliaments benefits climate change outcomes. Yet, the actions taken by women parliamentarians to shape these outcomes have not been identified in the literature. I assert that a primary step by which women may generate impact is by championing environmentalism in their speeches before parliament. To test this, I analyse speeches from the UK House of Commons from 2010 to 2021, and find that women MPs both speak proportionately more about the environment than their male counterparts, and bring environmentalism into debates that are not explicitly coded as environmental. Finally, while Conservative women are outnumbered by men, they contribute significantly more to environmental speeches than their male counterparts. These results suggest that women are disproportionately responsible for embedding environmentalism into political discussions across Parliament and the Conservative Party, and prompt questions around the true cost of unequal representation for our climate.
Efforts to curb online hate speech depend on our ability to reliably detect it at scale. Previous studies have highlighted the strong zero-shot classification performance of large language models (LLMs), offering a potential tool to efficiently identify harmful content. Yet for complex and ambivalent tasks like hate speech detection, pre-trained LLMs can be insufficient and carry systemic biases. Domain-specific models fine-tuned for the given task and empirical context could help address these issues, but, as we demonstrate, the quality of data used for fine-tuning decisively matters. In this study, we fine-tuned GPT-4o-mini using a unique corpus of online comments annotated by diverse groups of coders with varying annotation quality: research assistants, activists, two kinds of crowd workers, and citizen scientists. We find that only annotations from those groups of annotators that are better than zero-shot GPT-4o-mini in recognizing hate speech improve the classification performance of the fine-tuned LLM. Specifically, fine-tuning using the highest-quality annotator group – trained research assistants – boosts classification performance by increasing the model’s precision without notably sacrificing the good recall of zero-shot GPT-4o-mini. In contrast, lower-quality annotations do not improve and may even decrease the ability to identify hate speech. By examining tasks reliant on human judgment and context, we offer insights that go beyond hate speech detection.
An ever-increasing availability of digital texts has opened new research opportunities for political scientists. Yet, researchers who want to utilise these data face several challenges. This paper presents the results of a community-wide survey tapping into various research challenges, training needs, and preferences of scholars using text analysis methodologies. The survey involved respondents from various academic fields and career levels. Our findings indicate that text-as-data methods are gaining momentum in various political science subfields and are used on a wide range of political texts. However, relevant training is not easily accessible to all. Only half of the respondents have ever participated in a training event, though there is a high demand for training opportunities in different formats and at different levels. In ‘Conclusions’, we discuss how the inaccessibility of training risks narrowing the field of researchers.
A close connection between public opinion and policy is considered a vital element of democracy. However, legislators cannot be responsive to all voters at all times with regard to the policies the latter favour. We argue that legislators use their speaking time in parliament to offer compensatory speech to their constituents who might oppose how they voted on a policy, in order to re‐establish themselves as responsive to the public's wishes. Leveraging the case of Brexit, we show that legislators pay more attention to constituents who might be dissatisfied with how they voted. Furthermore, their use of rhetorical responsiveness is contingent on the magnitude of the representational deficit they face vis‐à‐vis their constituency. Our findings attest to the central role of parliamentary speech in maintaining responsiveness. They also demonstrate that communicative responsiveness can substitute for policy responsiveness.
Debates about the European Union's democratic legitimacy put national parliaments into the spotlight. Do they enhance democratic accountability by offering visible debates and electoral choice about multilevel governance? To support such accountability, saliency of EU affairs in the plenary ought to be responsive to developments in EU governance, has to be linked to decision‐making moments and should feature a balance between government and opposition. The recent literature discusses various partisan incentives that support or undermine these criteria, but analyses integrating these arguments are rare. This article provides a novel comparative perspective by studying the patterns of public EU emphasis in more than 2.5 million plenary speeches from the German Bundestag, the British House of Commons, the Dutch Tweede Kamer and the Spanish Congreso de los Diputados over a prolonged period from 1991 to 2015. It documents that parliamentary actors are by and large responsive to EU authority and its exercise where especially intergovernmental moments of decision making spark plenary EU salience. But the salience of EU issues is mainly driven by government parties, decreases in election time and is negatively related to public Euroscepticism. The article concludes that national parliaments have only partially succeeded in enhancing EU accountability and suffer from an opposition deficit in particular.
The promises and pitfalls of automated (computer-assisted) and human-coding content analysis techniques applied to political science research have been extensively discussed in the scholarship on party politics and legislative studies. This study presents a similar comparative analysis outlining the pay-offs and trade-offs of these two methods of content analysis applied to research on EU lobbying. The empirical focus is on estimating interest groups’ positions based on their formally submitted policy position documents in the context of EU policymaking. We identify the defining characteristics of these documents and argue that the choice of a method of content analysis should be informed by the specificities of the research topic covered, of the research question asked and of the data sources employed. We discuss the key analytical assumptions and methodological requirements of automated and human-coding text analysis and the degree to which they match the identified text characteristics. We critically assess the most relevant methodological challenges research designs face in complying with these requirements and how these challenges might affect measurement validity. We also compare the two approaches in terms of their reliability and resource intensity. The article concludes with recommendations and issues for future research.
From the early use of TF-IDF to the high-dimensional outputs of deep learning, vector space embeddings of text, at a scale ranging from token to document, are at the heart of all machine analysis and generation of text. In this article, we present the first large-scale comparison of a sampling of such techniques on a range of classification tasks on a large corpus of current literature drawn from the well-known Books3 data set. Specifically, we compare TF-IDF, Doc2vec and several Transformer-based embeddings on a variety of text-specific tasks. Using industry-standard BISAC codes as a proxy for genre, we compare embeddings in their ability to preserve information about genre. We further compare these embeddings in their ability to encode inter- and intra-book similarity. All of these comparisons take place at the book “chunk” (1,024 tokens) level. We find Transformer-based (“neural”) embeddings to be best, in the sense of their ability to respect genre and authorship, although almost all embedding techniques produce sensible constructions of a “literary landscape” as embodied by the Books3 corpus. These experiments suggest the possibility of using deep learning embeddings not only for advances in generative AI, but also as a potential tool for book discovery and as an aid to various forms of more traditional comparative textual analysis.
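To make the simplest of the compared techniques concrete, the following sketch builds TF-IDF vectors for three toy "chunks" and compares them with cosine similarity. It is a minimal illustration under stated assumptions: the chunks are invented one-line texts rather than 1,024-token book chunks, the weighting is plain term frequency times unsmoothed log inverse document frequency, and the helper names `tfidf_vectors` and `cosine` are hypothetical. The article's actual pipeline is not specified in the abstract.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each tokenized document to a sparse {term: tf-idf weight} dict."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            # Relative term frequency times unsmoothed log IDF.
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Invented toy chunks: two from a crime story, one from a fantasy story.
chunks = [
    "the detective examined the crime scene".split(),
    "the detective questioned the suspect".split(),
    "the dragon guarded the ancient castle".split(),
]
vecs = tfidf_vectors(chunks)
sim_same = cosine(vecs[0], vecs[1])  # chunks that share a content word
sim_diff = cosine(vecs[0], vecs[2])  # chunks that share only "the"
print(sim_same > sim_diff)
```

Because "the" occurs in every chunk, its IDF is zero and it contributes nothing to similarity, so only shared content words bring two chunks closer together.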
Supervised learning is increasingly used in social science research to quantify abstract concepts in textual data. However, a review of recent studies reveals inconsistencies in reporting practices and validation standards. To address this issue, we propose a framework that systematically outlines the process of transforming text into a quantitative measure, emphasizing key reporting decisions at each stage. Clear and comprehensive validation is crucial, enabling readers to critically evaluate both the methodology and the resulting measure. To illustrate our framework, we develop and validate a measure assessing the tone of questions posed to nominees during U.S. Senate confirmation hearings. This study contributes to the growing literature advocating for transparency and rigor in applying machine learning methods within computational social sciences.
Servitization is a key strategy for enhancing competitiveness in manufacturing, yet the managerial drivers behind this transformation remain underexplored. This study investigates the impact of top executives’ service cognition on servitization using a novel index derived from text-mined disclosures of Chinese listed manufacturing firms (2007–2020). Results show that executives’ service cognition significantly promotes servitization, even after controlling for endogeneity using instrumental variables and Heckman’s two-stage model. Mechanism analysis reveals that this cognitive orientation enhances human capital accumulation and R&D investment, which in turn drive higher service levels. Furthermore, the relationship is moderated by executive power concentration and regional internet penetration. Heterogeneity tests indicate stronger effects in high-tech industries, state-owned enterprises, and large firms. These findings highlight the critical role of executive cognition in shaping strategic transformation and offer practical implications for firms and policymakers aiming to foster servitization through leadership development and supportive digital infrastructure.
The sentiment expressed in a legislator’s speech is informative. However, extracting legislators’ sentiment requires human-annotated data. Instead, we propose exploiting closing debates on a bill in Japan, where legislators in effect label their speech as either pro or con. We utilize debate speeches as the training dataset, fine-tune a pretrained model, and calculate the sentiment scores of other speeches. We show that the more senior the opposition members are, the more negative their sentiment. Additionally, we show that opposition members become more negative as the next election approaches. We also demonstrate that legislators’ sentiments can be used to predict their behaviors by using the case in which government members rebelled in the historic vote of no confidence in 1993.
Chapter 7 builds on students’ understanding of arrays and numeric and logical data types from Chapters 2 and 4, demonstrating how to use what they already know to manipulate text in MATLAB. Text in MATLAB comes in two forms: character arrays, in which text is stored in individual letters, numbers, symbols, and spaces; and strings, in which each element of text can store any number of those characters. Differences in the utility of these structures for different tasks are discussed, as is their interchangeability when providing inputs to other MATLAB functions. Once text is introduced, students learn to interface with MATLAB via input/output features, both in the console and in pop-up windows. Lastly, because MATLAB code is also text, students learn to run text as MATLAB code, as well as potential issues with doing so and workarounds to avoid those issues.
Propagandists discredit political ideas that rival their own. In China’s state-run media, one common technique is to place the phrase so-called, in English, or 所谓, in Chinese, before the idea to be discredited. In this research note we apply quantitative text analysis methods to over 45,000 Xinhua articles from 2003 to 2022 containing so-called or 所谓 to better understand the ideas the government wishes to discredit for different audiences. We find that perceived challenges to China’s sovereignty consistently draw usage of the term and that a theme of rising importance is political rivalry with the United States. When it comes to differences between internal and external propaganda, we find broad similarities, but differences in how the US is discredited and more emphasis on cooperation for foreign audiences. These findings inform scholarship on comparative authoritarian propaganda and Chinese propaganda specifically.
A critical challenge for biomedical investigators is the delay between research and its adoption, yet there are few tools that use bibliometrics and artificial intelligence to address this translational gap. We built a tool to quantify translation of clinical investigation using novel approaches to identify themes in published clinical trials from PubMed and their appearance in the natural language elements of the electronic health record (EHR).
Methods:
As a use case, we selected the translation of known health effects of exercise for heart disease, as found in published clinical trials, with the appearance of these themes in the EHR of heart disease patients seen in an emergency department (ED). We present a self-supervised framework that quantifies semantic similarity of themes within the EHR.
Results:
We found that 12.7% of the clinical trial abstracts dataset recommended aerobic exercise or strength training. Of the ED treatment plans, 19.2% related to heart disease. Of these heart disease treatment plans, only 0.34% identified aerobic exercise or strength training. Treatment plans from the overall ED dataset mentioned aerobic exercise or strength training less than 5% of the time.
Conclusions:
Having access to publicly available clinical research and associated EHR data, including clinician notes and after-visit summaries, provided a unique opportunity to assess the adoption of clinical research in medical practice. This approach can be used for a variety of clinical conditions, and if assessed over time could measure implementation effectiveness of quality improvement strategies and clinical guidelines.
This paper studies the role of central bank communication for the monetary policy transmission mechanism using text analysis techniques. In doing so, we derive sentiment measures from the European Central Bank’s (ECB) press conferences indicating a dovish or hawkish tone referring to interest rates, inflation, and unemployment. We provide strong evidence for predictability of our sentiments on interbank interest rates, even after controlling for actual policy rate changes. We also find that our sentiment indicators offer predictive power for professionals’ expectations, the disagreement among them, and their uncertainty regarding future inflation as well as future interest rates. Policy communication shocks identified through sign restrictions based on our sentiment measure also have significant effects on real outcomes. Overall, our findings highlight the importance of the tone of central bank communication for the transmission mechanism of monetary policy, but also indicate the necessity of refinements of the communication policies implemented by the ECB to better anchor inflation expectations at the target level and to reduce uncertainty regarding the future path of monetary policy.
It is often argued that when legislators have personal vote-seeking incentives, parties are less unified because legislators need to build bonds of accountability with their voters. I argue that these effects depend on a legislator’s ability to cultivate a personal vote. When parties control access to the ballot and the resources candidates need to cultivate personal votes, they can condition a legislator’s access to these resources on loyalty to the party’s agenda. I test this theory by conducting a difference-in-differences analysis that leverages the staggered implementation of the 2014 Mexican Electoral Reform. This reform introduced the possibility of consecutive reelection for state legislators, increasing their incentives to cultivate personal votes. I study unity in position-taking and voting behaviour of Mexican state legislators from 2012 to 2018. To analyze position-taking, I apply correspondence analysis to a new dataset of over half a million legislative speeches in twenty states. To study voting, I analyze over 14,500 roll-call votes in fourteen states during the same period. Results show that reelection incentives increased intra-party unity, which has broad implications for countries introducing electoral reforms aiming to personalize politics.
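The core logic of a difference-in-differences comparison can be sketched with a two-group, two-period toy example. This is a deliberate simplification: the article uses a staggered design across states, which needs more careful estimators, and every number below is invented for illustration rather than taken from the study. The function name `did_estimate` is likewise hypothetical.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group minus change in the control group."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Invented party-unity scores (0 to 1), for illustration only.
treated_pre = [0.70, 0.72, 0.74]    # reform states, before the reform
treated_post = [0.80, 0.82, 0.84]   # reform states, after the reform
control_pre = [0.60, 0.62, 0.64]    # comparison states, earlier period
control_post = [0.63, 0.65, 0.67]   # comparison states, later period

effect = did_estimate(treated_pre, treated_post, control_pre, control_post)
print(f"Difference-in-differences estimate: {effect:.2f}")
```

Subtracting the control group's change removes trends common to both groups, so the estimate reflects the reform only if the two groups would otherwise have moved in parallel.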
A common challenge in studying Italian parliamentary discourse is the lack of accessible, machine-readable, and systematized parliamentary data. To address this, this article introduces the ItaParlCorpus dataset, a new, annotated, machine-readable collection of Italian parliamentary plenary speeches for the Camera dei Deputati, the lower house of Parliament, spanning from 1948 to 2022. This dataset encompasses 470 million words and 2.4 million speeches delivered by 5830 unique speakers representing 77 different political parties. The files are designed for easy processing and analysis using widely used programming languages, and they include metadata such as speaker identification and party affiliation. This opens up opportunities for in-depth analyses on a variety of topics related to parliamentary behavior, elite rhetoric, and the salience of political themes, exploring how these vary across party families and over time.
Previous accounts have suggested a potential divergence between Xi Jinping and Li Keqiang in their approaches to economic governance. This study examines the policy orientations of the two leaders concerning state–market relations, providing empirical evidence for the recent manifestation of what insiders have termed the “dispute between north and south houses” (nanbeiyuan zhi zheng) and its economic implications. By applying semi-supervised machine learning methods to textual data, this study demonstrates that Li favoured market-oriented policies, whereas Xi displayed a pronounced preference for state-centric strategies. The findings notably indicate an initial divergence in policy orientation, which was followed by a considerable convergence during Xi's second term. Our analysis further reveals that Li's market-oriented rhetoric was particularly prominent during “Mass innovation week,” indicating a campaign-style policy mobilization. Moreover, the analysis identifies that the discursive differences between the two leaders are associated with a decline in firm-level investment, suggesting that disparities in policy orientation may engender political uncertainty. This study contributes to the extant literature on the impact of leadership dynamics on economic policy, the implications of mixed signals from the central leadership and the phenomenon of campaign-style mobilization in China.
We apply moral foundations theory (MFT) to explore how the public conceptualizes the first eight months of the conflict between Ukraine and the Russian Federation (Russia). Our analysis includes over 1.1 million English tweets related to the conflict over the first 36 weeks. We used Linguistic Inquiry and Word Count (LIWC) and a moral foundations dictionary to identify tweets’ moral components (care, fairness, loyalty, authority, and sanctity) from the United States, pre- and post-Cold War NATO countries, Ukraine, and Russia. Following an initial spike at the beginning of the conflict, tweet volume declined and stabilized by week 10. The level of moral content varied significantly across the five regions and the five moral components. Tweets from the different regions included significantly different moral foundations to conceptualize the conflict. Across all regions, tweets were dominated by loyalty content, while fairness content was infrequent. Moral content over time was relatively stable, and variations were linked to reported conflict events.