
3 - Generating Political Event Data in Near Real Time: Opportunities and Challenges

from PART 1 - COMPUTATIONAL SOCIAL SCIENCE TOOLS

Published online by Cambridge University Press: 05 March 2016

John Beieler (Pennsylvania State University, john.b30@gmail.com)
Patrick T. Brandt (University of Texas, Dallas, pbrandt@utdallas.edu)
Andrew Halterman (Caerus Associates, ahalterman0@gmail.com)
Philip A. Schrodt (Parus Analytical Systems, schrodt735@gmail.com)
Erin M. Simpson (Caerus Associates, emsimpson@gmail.com)
R. Michael Alvarez (California Institute of Technology)

Summary

INTRODUCTION

Political event data are records of interactions among political actors using common codes for actors and actions, allowing for the aggregate analysis of political behaviors. These data include both material interactions between political entities and verbal statements. Such data are common in international relations, recording the spoken or direct actions between nation-states and other political entities. Event data can be generated through either human-coded or machine-based methods. Human-coded event data efforts continue to dominate research on global protests and social movements, although data sets in international relations have led the movement toward automated coding. While humans are better able to extract the meaning in sentences using background knowledge and innate abilities for dealing with complex grammatical constructions, human coding is dramatically more labor- and time-intensive than machine-coding approaches for anything but small or one-off data sets. Machine-coded methods can attain 70–80% accuracy when compared to a human-coded “gold standard,” which is comparable to, and in some cases exceeds, the intercoder reliability of human coding (King and Lowe, 2004). This makes the machine-coded methods quite scalable in terms of costs and time and thus attractive to academic, government, and private sector researchers.
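To make the idea of an event record concrete, the following is a minimal sketch in Python, not the chapter's coding system. The field names, the actor codes `AAA` and `BBB`, and the action labels are hypothetical placeholders chosen for illustration. It shows one way to represent "who did what to whom, and when" and to aggregate such records into counts.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    """One coded interaction between two political actors."""
    day: date
    source: str  # code for the initiating actor (hypothetical)
    target: str  # code for the receiving actor (hypothetical)
    action: str  # code for the action taken (hypothetical)


def monthly_counts(events):
    """Count events per (year, month, source, target, action)."""
    return Counter(
        (e.day.year, e.day.month, e.source, e.target, e.action)
        for e in events
    )


# Illustrative records only; these are not real coded events.
events = [
    Event(date(2011, 3, 1), "AAA", "BBB", "MEET"),
    Event(date(2011, 3, 9), "AAA", "BBB", "MEET"),
    Event(date(2011, 4, 2), "BBB", "AAA", "CRITICIZE"),
]
counts = monthly_counts(events)
```

Because every record shares the same small set of coded fields, aggregating by month and actor pair reduces to a single counting pass, which is what makes this kind of data convenient for aggregate analysis.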

King (2011) notes that the ability to code and process political texts to generate records like event data will be de rigueur in the later part of the 21st century. Machine-readable texts about politics, including news reports, speeches, press conferences, and intelligence reports, are already the basis of many political analyses. The ever-increasing availability of such texts presents both opportunities and challenges because they are a form of “big data.” Even processing just the lead sentences of Reuters and Agence France-Presse (AFP) news reports for the Levant from 1979 to 2011 generates more than 140,000 distinct time-series records (http://eventdata.parusanalytics.com/data.dir/levant.html), and these sentences could also be processed as a much larger set of network relationships. One recent effort to expand event data collection outside of this geographical region (albeit without the event de-duplication found in most event data sets) has generated nearly a quarter of a billion records. Extrapolating from our coding experience with the Levant and our initial experiments with the EL:DIABLO coding system described later, we estimate that a data collection with duplication controls like that for the Levant data set will generate around 4,000 to 8,000 distinct records per day for the entire globe.
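The effect of duplication controls can be illustrated with a short sketch. The summary above does not say how de-duplication is performed, so the rule used here (keep one record per day, source, target, and action) is an assumption, and the field names, actor codes, and `wire` labels are hypothetical.

```python
from datetime import date


def deduplicate(records):
    """Keep the first record seen for each (day, source, target, action)."""
    seen = set()
    unique = []
    for rec in records:
        key = (rec["day"], rec["source"], rec["target"], rec["action"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Illustrative reports only: two wires describe the same event on 1 March.
reports = [
    {"day": date(2011, 3, 1), "source": "AAA", "target": "BBB",
     "action": "MEET", "wire": "wire-1"},
    {"day": date(2011, 3, 1), "source": "AAA", "target": "BBB",
     "action": "MEET", "wire": "wire-2"},
    {"day": date(2011, 3, 2), "source": "AAA", "target": "BBB",
     "action": "MEET", "wire": "wire-1"},
]
unique = deduplicate(reports)
```

Under this rule the second report is dropped as a repeat of the first, while the report from the following day is kept as a distinct record. At the estimated 4,000 to 8,000 distinct records per day, a global collection would accumulate roughly 1.5 to 2.9 million records per year.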

Type: Chapter
In: Computational Social Science: Discovery and Prediction, pp. 98-120
Publisher: Cambridge University Press
Print publication year: 2016
