An Automated Information Extraction Tool for International Conflict Data with Performance as Good as Human Coders: A Rare Events Evaluation Design

  • Gary King and Will Lowe

Abstract

Despite widespread recognition that aggregated summary statistics on international conflict and cooperation miss most of the complex interactions among nations, the vast majority of scholars continue to employ annual, quarterly, or (occasionally) monthly observations. Daily events data, coded from some of the huge volume of news stories produced by journalists, have not been used much for the past two decades. We offer some reason to change this practice, which we feel should lead to considerably increased use of these data. We address advances in event categorization schemes and software programs that automatically produce data by “reading” news stories without human coders. We design a method that makes it feasible, for the first time, to evaluate these programs when they are applied in areas with the particular characteristics of international conflict and cooperation data, namely event categories with highly unequal prevalences, and where rare events (such as highly conflictual actions) are of special interest. We use this rare events design to evaluate one existing program, and find it to be as good as trained human coders, but obviously far less expensive to use. For large-scale data collections, the program dominates human coding. Our new evaluative method should be of use in international relations, as well as more generally in the field of computational linguistics, for evaluating other automated information extraction tools. We believe that the data created by programs similar to the one we evaluated should see dramatically increased use in international relations research. To facilitate this process, we are releasing with this article data on 3.7 million international events, covering the entire world for the past decade.

