An Automated Information Extraction Tool for International Conflict Data with Performance as Good as Human Coders: A Rare Events Evaluation Design

Gary King; Will Lowe

doi:10.1017/S0020818303573064

Published online by Cambridge University Press: 24 July 2003

Abstract

Despite widespread recognition that aggregated summary statistics on international conflict and cooperation miss most of the complex interactions among nations, the vast majority of scholars continue to employ annual, quarterly, or (occasionally) monthly observations. Daily events data, coded from some of the huge volume of news stories produced by journalists, have not been used much for the past two decades. We offer some reason to change this practice, which we feel should lead to considerably increased use of these data. We address advances in event categorization schemes and software programs that automatically produce data by “reading” news stories without human coders. We design a method that makes it feasible, for the first time, to evaluate these programs when they are applied in areas with the particular characteristics of international conflict and cooperation data, namely event categories with highly unequal prevalences, and where rare events (such as highly conflictual actions) are of special interest. We use this rare events design to evaluate one existing program, and find it to be as good as trained human coders, but obviously far less expensive to use. For large-scale data collections, the program dominates human coding. Our new evaluative method should be of use in international relations, as well as more generally in the field of computational linguistics, for evaluating other automated information extraction tools. We believe that the data created by programs similar to the one we evaluated should see dramatically increased use in international relations research. To facilitate this process, we are releasing with this article data on 3.7 million international events, covering the entire world for the past decade.

Information

Type: Research Notes
Information: International Organization, Volume 57, Issue 3, Summer 2003, pp. 617–642

DOI: https://doi.org/10.1017/S0020818303573064
Copyright: Copyright © The IO Foundation 2003
