Edited by
Daniel Naurin, University of Oslo; Urška Šadl, European University Institute, Florence; Jan Zglinski, London School of Economics and Political Science
Empirical legal studies in EU law routinely, if not inevitably, engage with text. From the decisions of national courts applying EU law and applicants’ case filings to the Court’s own jurisprudence, these texts are an invaluable source of information for researchers seeking to understand the dynamics involved in the shaping of EU law and its broader societal impact. Distilling relevant information from legal texts, however, is anything but trivial. Intended to serve as a reference manual, the chapter offers detailed guidelines for researchers in both law and political science interested in employing a text-as-data approach to the study of EU law. To this end, we elaborate on how to conceptualise real-life phenomena in a way that renders them conducive to measurement, providing practical guidance on hand-coding and the use of deep learning classifiers. Further, we address potential challenges arising in the specific context of EU law, including limited access to relevant documents and the difficulty of ensuring inter-coder reliability in data collection efforts that require specialised legal expertise.
Understanding the values held by negotiating parties is central to the design and success of international climate change agreements. However, empirical understandings of these values – and the ways in which they structure negotiating countries’ value networks and interactions over time – are severely limited. In addressing this shortcoming, this paper uses keyword-assisted topic models to extract value networks for the 13 most recent Conferences of the Parties (COPs) to the United Nations Framework Convention on Climate Change (UNFCCC). It then uses network analysis tools to unpack these networks in relation to influential values, countries, and time. In doing so, it demonstrates that countries’ core climate change values (i) can be accurately recovered from COP High-level Segment (HLS) speeches and (ii) can, in turn, be used to understand the structure of negotiation networks at the UNFCCC. Analysis of the corresponding value networks for COPs 16–28 indicates that initially central values of “Fairness” and “Power” have increasingly given way to values associated with the “Environment” and “Achievement.” Thus, countries at the UNFCCC have increasingly eschewed values associated with common but differentiated responsibilities in favor of a consensus over the urgency of collectively combating climate change. These and related insights illustrate our approach’s potential for recovering and understanding value networks within climate change negotiations – a critical first step for any successful climate change agreement.
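The network step of such an analysis can be sketched in miniature. Assuming value labels have already been recovered per speech (the paper uses keyword-assisted topic models for this), the toy countries, values, and centrality measure below are purely illustrative, not the paper's data or code:

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: dominant values recovered from each country's
# HLS speech. Real inputs would come from the fitted topic model.
speech_values = {
    "A": ["Fairness", "Power", "Environment"],
    "B": ["Environment", "Achievement"],
    "C": ["Fairness", "Environment", "Achievement"],
}

# Build a value co-occurrence network: two values are linked whenever
# they appear together in the same country's speech.
edges = Counter()
for values in speech_values.values():
    for pair in combinations(sorted(set(values)), 2):
        edges[pair] += 1

# Weighted degree centrality: sum of edge weights incident to each value.
centrality = Counter()
for (u, v), w in edges.items():
    centrality[u] += w
    centrality[v] += w

print(centrality.most_common())
```

Tracking how this centrality ranking shifts from one COP's speeches to the next is one simple way to operationalize the paper's claim that central values change over time.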
The media have a major influence on public opinion about NGOs and on their legitimacy, which can seriously affect the effectiveness of NGOs’ programs. However, media biases often affect how subjects are framed. For instance, western countries are often portrayed negatively by the media of Muslim countries. This anti-western bias is less prevalent in English-language media than in local-language newspapers, as English-language media generally target elites, who often hold less anti-western opinions than the general population. As NGOs are usually considered a western construct in the Muslim world, I test whether the media’s sensitivity to its consumers’ sentiments extends to the coverage of NGOs by comparing English- and local-language (Urdu) newspapers in Pakistan. I confirm that Urdu newspapers portray NGOs more negatively than English-language newspapers and are more likely to question NGOs’ effectiveness and accountability.
This study develops a new text-as-data method for organization identification based on word embeddings. We introduce and apply the method to identify identity-based nonprofit organizations, using U.S. nonprofits’ mission and activity information reported in IRS Form 990s for 2010–2016. Our results show that the method is simple but versatile. It complements existing dictionary-based approaches and supervised machine learning methods for classification purposes and generates a reliable continuous measure of document-to-keyword relevance. Our approach provides a nonbinary alternative for nonprofit big data analyses. Using word embeddings, researchers can identify organizations of interest, track possible changes over time, and capture nonprofits’ multi-dimensionality.
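The core idea of an embedding-based document-to-keyword relevance score can be illustrated in a few lines. The hand-made vectors and mission text below are placeholders, not the study's trained embeddings or IRS data:

```python
import math

# Toy word vectors standing in for trained embeddings; the numbers are
# illustrative only (a real analysis would train them on mission texts).
vectors = {
    "veterans": [0.9, 0.1, 0.0],
    "military": [0.8, 0.2, 0.1],
    "families": [0.3, 0.7, 0.2],
    "opera":    [0.0, 0.1, 0.9],
    "music":    [0.1, 0.0, 0.8],
}

def embed(words):
    """Average the word vectors of a text (skipping unknown words)."""
    known = [vectors[w] for w in words if w in vectors]
    return [sum(dim) / len(known) for dim in zip(*known)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# A continuous relevance score, rather than a binary in/out label.
mission = embed("we serve military veterans and their families".split())
print(cosine(mission, vectors["veterans"]))  # high relevance
print(cosine(mission, vectors["opera"]))     # low relevance
```

Because the output is a similarity score rather than a class label, organizations can be ranked by relevance or tracked as their scores drift over time.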
Pretrained text embeddings are a fast and scalable method for determining whether two texts have similar meaning, capturing not only lexical similarity, but semantic similarity as well. In this article, I show how to incorporate these measures into a probabilistic record linkage procedure that yields considerable improvements in both precision and recall over existing methods. The procedure even allows researchers to link datasets across different languages. I validate the approach with a series of political science applications, and provide open-source statistical software for researchers to efficiently implement the proposed method.
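The linkage step can be sketched with a deliberately crude similarity function. The article embeds texts with pretrained models inside a probabilistic (Fellegi–Sunter-style) framework; here a bag-of-character-trigrams cosine stands in for the embedding similarity, and the party names are arbitrary examples:

```python
# Minimal record-linkage sketch: score every candidate pair, then link
# each left record to its best-scoring right record.
def trigrams(s):
    s = f"  {s.lower()} "  # pad so short strings still yield trigrams
    return {s[i:i + 3] for i in range(len(s) - 2)}

def similarity(a, b):
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / (len(ta) * len(tb)) ** 0.5

left = ["Labour Party", "Conservative Party"]
right = ["The Labour Party (UK)", "Conservatives", "Green Party"]

links = {a: max(right, key=lambda b: similarity(a, b)) for a in left}
print(links)
```

Swapping the trigram function for cosine similarity over pretrained sentence embeddings is what makes the approach robust to paraphrase and, with multilingual models, to cross-language matching.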
Presidents are often viewed as national policy leaders. Yet, they increasingly use negative rhetoric to attack the opposition rather than forge legislative compromise, contrary to theories of going public. Why? I argue presidents facing congressional obstruction eschew short-term policy persuasion. They speak as negative partisans to mobilize co-partisans and shape the longer-term balance of power in Congress, improving future policy-making prospects. I collect all presidential speeches delivered between 1933 and 2024 and use transformer methods to measure how often, and how negatively, presidents reference the out-party. They do so when the policy-making environment is unfavorable: when majorities are tenuous, government is divided, and as elections approach. I provide additional support with a case study of Democrats’ 2009 filibuster-proof Senate majority. Finally, this rhetoric has behavioral impact: presidential negative partisanship decreases co-partisan approval of the opposition. This research alters our understanding of going public and reinforces the partisan dimension of modern presidential representation.
In recent decades, researchers have analyzed professional military education (PME) organizations to understand the characteristics and transformation of the core of military culture, the officer corps. Several historical studies have demonstrated the potential of this approach, but they were limited by both theoretical and methodological hurdles. This paper presents a new historical-institutionalist framework for analyzing officership and PME, integrating computational social science methods for large-scale data collection and analysis to overcome limited access to military environments and the intensive manual labor required for data collection and analysis. Furthermore, in an era where direct demographic data are increasingly being removed from the public domain, our indirect estimation methods provide one of the few viable alternatives for tracking institutional change. This approach will be demonstrated using web-scraping and a quantitative text analysis of the entire repository of theses from an elite American military school.
Explorations of ideology retain special significance in contemporary studies of judicial politics. While some existing methodologies draw on voting patterns and coalition alignments to map a jurist’s latent features, many are otherwise reliant on supplemental proxies – often directly from adjacent actors or via assessments from various prognosticators. We propose an alternative that not only leverages observable judicial behavior, but does so through jurists’ articulations on the law. In particular, we adapt a hierarchical factor model to demonstrate how latent ideological preferences emerge through the written text of opinions. Relying on opinion content from Justices of the Supreme Court, we observe a discernible correlation between linguistic choices and latent expressions of ideology irrespective of known preferences or voting patterns. Testing our method against Martin-Quinn, we find our approach strongly correlates with this validated and commonly used measure of judicial ideology. We conclude by discussing the intuitive power of text as a feature of ideology, as well as how this process can extend to judicial actors and institutions beyond the Supreme Court.
As the use of computational text analysis in the social sciences has increased, topic modeling has emerged as a popular method for identifying latent themes in textual data. Nevertheless, concerns have been raised regarding the validity of the results produced by this method, given that it is largely automated and inductive in nature, and the lack of clear guidelines for validating topic models has been identified by scholars as an area of concern. In response, we conducted a comprehensive systematic review of 789 studies that employ topic modeling. Our goal is to investigate whether the field is moving toward a common framework for validating these models. The findings of our review indicate a notable absence of standardized validation practices and a lack of convergence toward specific methods of validation. This gap may be attributed to the inherent incompatibility between the inductive, qualitative approach of topic modeling and the deductive, quantitative tradition that favors standardized validation. To address this, we advocate for incorporating qualitative validation approaches, emphasizing transparency and detailed reporting to improve the credibility of findings in computational social science research when using topic modeling.
We perform the first mapping of the ideological positions of European parties using generative Artificial Intelligence (AI) as a “zero-shot” learner. We ask OpenAI’s Generative Pre-trained Transformer 3.5 (GPT-3.5) to identify the more “right-wing” option across all possible duplets of European parties at a given point in time, solely based on their names and country of origin, and combine this information via a Bradley–Terry model to create an ideological ranking. A cross-validation employing widely-used expert-, manifesto- and poll-based estimates reveals that the ideological scores produced by Large Language Models (LLMs) closely map those obtained through the expert-based evaluation, i.e., CHES. Given the high cost of scaling parties via trained coders, and the scarcity of expert data before the 1990s, our finding that generative AI produces estimates of comparable quality to CHES supports its usage in political science on the grounds of replicability, agility, and affordability.
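The aggregation step from pairwise judgments to a single scale can be sketched with a standard Bradley–Terry minorization-maximization fit. The parties and win counts below are hypothetical stand-ins for the LLM's duplet answers, not the paper's data:

```python
# wins[(a, b)] = number of times a was judged more right-wing than b
wins = {
    ("C", "B"): 9, ("B", "C"): 1,
    ("C", "A"): 10, ("A", "C"): 0,
    ("B", "A"): 8, ("A", "B"): 2,
}
parties = {"A", "B", "C"}
score = {p: 1.0 for p in parties}

# Standard MM update: p_i <- W_i / sum_j [ n_ij / (p_i + p_j) ]
for _ in range(200):
    new = {}
    for i in parties:
        w_i = sum(w for (a, b), w in wins.items() if a == i)
        denom = sum(
            (wins.get((i, j), 0) + wins.get((j, i), 0)) / (score[i] + score[j])
            for j in parties if j != i
        )
        new[i] = w_i / denom
    total = sum(new.values())
    score = {p: v / total for p, v in new.items()}  # normalise (identifiability)

ranking = sorted(parties, key=score.get, reverse=True)
print(ranking, score)
```

The fitted scores are only identified up to scale, which is why each iteration normalises them; the resulting ordering is what gets validated against expert measures such as CHES.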
Oral argument is the most public and visible part of the U.S. Supreme Court’s decision-making process. Yet what if some advocates are treated differently before the Court solely because of aspects of their identity? In this work, we leverage a causal inference framework to quantify the effect of an advocate’s gender on interruptions of advocates at both the Court-level and the justice-level. Examining nearly four decades of U.S. Supreme Court oral argument transcript data, we identify a clear and consistent gender effect that dwarfs other influences on justice interruption behavior, with female advocates interrupted more frequently than male advocates.
What drives changes in the thematic focus of state-linked manipulated media? We study this question in relation to a long-running Iranian state-linked manipulated media campaign that was uncovered by Twitter in 2021. Using a variety of machine learning methods, we uncover and analyze how this manipulation campaign’s topical themes changed in relation to rising Covid-19 cases in Iran. By using the topics of the tweets in a novel way, we find that increases in domestic Covid-19 cases engendered a shift in Iran’s manipulated media focus away from Covid-19 themes and toward international finance- and investment-focused themes. These findings underscore (i) the potential for state-linked manipulated media campaigns to be used for diversionary purposes and (ii) the promise of machine learning methods for detecting such behaviors.
We demonstrate how few-shot prompts to large language models (LLMs) can be effectively applied to a wide range of text-as-data tasks in political science—including sentiment analysis, document scaling, and topic modeling. In a series of pre-registered analyses, this approach outperforms conventional supervised learning methods without the need for extensive data pre-processing or large sets of labeled training data. Performance is comparable to expert and crowd-coding methods at a fraction of the cost. We propose a set of best practices for adapting these models to social science measurement tasks, and develop an open-source software package for researchers.
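The mechanics of a few-shot prompt are simple to sketch. The task framing, labels, and example texts below are placeholders in the spirit of the approach, not the authors' actual prompts, and no model is called:

```python
# Hypothetical labelled examples to include in the prompt ("shots").
examples = [
    ("The minister's plan is a disgrace.", "negative"),
    ("This reform will help working families.", "positive"),
]

def build_prompt(text):
    """Assemble a few-shot classification prompt for an LLM."""
    shots = "\n\n".join(f"Text: {t}\nSentiment: {s}" for t, s in examples)
    return (
        "Classify the sentiment of each text as positive or negative.\n\n"
        f"{shots}\n\nText: {text}\nSentiment:"
    )

print(build_prompt("A sensible, welcome proposal."))
```

The string would be sent to an LLM, whose completion after the final "Sentiment:" is taken as the label; because the examples carry the task definition, no training data or fine-tuning is needed.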
The purpose of this short research note is to draw attention to two major pitfalls of working with databases of decisions of the Court of Justice of the European Union. The first one is technical in nature and relates to the discrepant coverage of the Curia and Eur-Lex databases. The second one is linguistic in nature and relates to the fact that most scholars using these databases work in English. New work on this front is capable of addressing the first issue but a change to research practices would be required to address the second.
Primaries might also contribute to party transformation by incentivizing candidates to move position within an election cycle. Candidates might face a “strategic positioning dilemma” if they must first satisfy an extreme selectorate to earn the nomination before facing a comparatively moderate general electorate. This chapter therefore tests whether all candidates in a primary adapt their positions away from the center during the nomination phase of a single election cycle, presenting general election voters with polarized choices. To scale positions both during and after a primary it uses a text-as-data approach based on candidates’ communication on Twitter during the 2020 election cycle. It finds that Democratic candidates who lost primaries became significantly more moderate immediately after their defeat, especially if they lost in ideological or factional primaries. It does not observe this pattern among Republican losers. This chapter demonstrates a further way in which primaries may contribute to polarization, incentivizing candidates to adopt positions further from the ideological center during the nomination phase of the election cycle.
The rise of populism concerns many political scientists and practitioners, yet the detection of its underlying language remains fragmentary. This paper aims to provide a reliable, valid, and scalable approach to measure populist rhetoric. For that purpose, we created an annotated dataset based on parliamentary speeches of the German Bundestag (2013–2021). Following the ideational definition of populism, we label moralizing references to “the virtuous people” or “the corrupt elite” as core dimensions of populist language. To identify, in addition, how the thin ideology of populism is “thickened,” we annotate how populist statements are attached to left-wing or right-wing host ideologies. We then train a transformer-based model (PopBERT) as a multilabel classifier to detect and quantify each dimension. A battery of validation checks reveals that the model has a strong predictive accuracy, provides high qualitative face validity, matches party rankings of expert surveys, and detects out-of-sample text snippets correctly. PopBERT enables dynamic analyses of how German-speaking politicians and parties use populist language as a strategic device. Furthermore, the annotator-level data may also be applied in cross-domain applications or to develop related classifiers.
The economic shock of the Covid-19 crisis has disproportionately impacted small businesses and the self-employed. Around the globe, their survival during the pandemic often relied heavily on government assistance. This article explores how economic relief to business is understood through the lens of deservingness in the public. It examines the case of Germany, where the government has responded to the pandemic by implementing an extensive support programme. Notably, in this context, the self-employed are typically outsiders to the state insurance system. Combining computational social science methods and a qualitative analysis, the article focuses on the debate about direct subsidies on the social media platform Twitter/X between March 2020 and June 2021. It traces variation in the patterns of claim making in what is a rich debate about pandemic state support, finding that this discourse is characterised by the concern that economic relief threatens to blur existing boundaries of worth in society. The reciprocity principle of deservingness theory is pivotal in asserting business identities in times of crisis, yet it also reveals a fundamentally ambiguous relationship with the principle of need. Additionally, the claim of justice-as-redress, as a novel dimension of reciprocity, surfaces as an important theme in this debate.
Several disciplines, such as economics, law, and political science, emphasize the importance of legislative quality, namely well-written legislation. Low-quality legislation cannot be easily implemented because the texts create interpretation problems. To measure the quality of legal texts, we use information from the syntactic and lexical features of their language and apply these measures to a dataset of European Union legislation that contains detailed information on its transposition and decision-making process. We find that syntactic complexity and vagueness are negatively related to member states’ compliance with legislation. The finding on vagueness is robust to controlling for member states’ preferences, administrative resources, length of texts, and discretion. However, the results for syntactic complexity are less robust.
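Text-quality measures of this kind can be illustrated with two crude proxies: mean sentence length for syntactic complexity and a dictionary rate for vagueness. The vague-term list and sample clause below are hypothetical, and the study's actual measures are considerably richer:

```python
import re

# Hypothetical dictionary of vague legal terms (illustrative only).
VAGUE_TERMS = {"appropriate", "reasonable", "adequate", "necessary"}

def complexity_measures(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-zA-Z]+", text.lower())
    mean_sentence_len = len(words) / len(sentences)  # crude syntactic proxy
    vagueness = sum(w in VAGUE_TERMS for w in words) / len(words)
    return mean_sentence_len, vagueness

article = ("Member States shall take appropriate measures. "
           "Operators shall keep adequate records for a reasonable period.")
print(complexity_measures(article))
```

Scores like these, computed per legal act, can then be regressed on transposition outcomes, as the study does with its richer feature set.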
Used by politicians, journalists, and citizens, Twitter has for over a decade been the most important social media platform for investigating political phenomena such as hate speech, polarization, or terrorism. For a high proportion of Twitter studies of emotionally charged or controversial content, the ability to replicate findings is limited by incomplete Twitter-related replication data and the inability to recrawl the underlying datasets in full. This paper shows that such Twitter studies and their findings are considerably affected by nonrandom tweet mortality and data access restrictions imposed by the platform. Sensitive datasets suffer a notably higher removal rate than nonsensitive datasets, and attempting to replicate key findings of Kim’s (2023, Political Science Research and Methods 11, 673–695) influential study on the content of violent tweets leads to significantly different results. The results highlight that access to complete replication data is particularly important in light of dynamically changing social media research conditions. The study thus raises concerns about, and offers potential solutions to, the broader implications of nonrandom tweet mortality for future social media research on Twitter and similar platforms.
The influence of congressional primary elections on candidate positioning remains disputed and poorly understood. We test whether candidates communicate artificially “extreme” positions during the nomination, as revealed by moderation following a primary defeat. We apply a scaling method based on candidates’ language on Twitter to estimate the positions of 988 candidates in contested US House of Representatives primaries in 2020 over time, demonstrating validity against NOMINATE (r > 0.93) where possible. Losing Democratic candidates moderated significantly after their primary defeat, indicating strategic position-taking for perceived electoral benefit, where the nomination contest induced artificially “extreme” communication. We find no such effect among Republicans. These findings have implications for candidate strategy in two-stage elections and provide further evidence of elite partisan asymmetry.