
Clause Analysis: Using Syntactic Information to Automatically Extract Source, Subject, and Predicate from Texts with an Application to the 2008–2009 Gaza War

Published online by Cambridge University Press: 01 March 2017

Wouter van Atteveldt*
Department of Communication Science, VU University Amsterdam, The Netherlands
Tamir Sheafer
Department of Political Science and Department of Communication, The Hebrew University of Jerusalem, Israel
Shaul R. Shenhav
Department of Political Science, The Hebrew University of Jerusalem, Israel
Yair Fogel-Dror
Department of Political Science, The Hebrew University of Jerusalem, Israel


This article presents a new method and open source R package that uses syntactic information to automatically extract source–subject–predicate clauses. The method improves on frequency-based text analysis by dividing text into predicates with an identified subject and an optional source, thereby extracting the statements and actions of (political) actors as mentioned in the text. The content of these predicates can be analyzed using existing frequency-based methods, allowing for the analysis of actions, issue positions, and framing by different actors within a single text. We show that a small set of syntactic patterns can extract clauses and identify quotes with good accuracy, significantly outperforming a baseline system based on word order. Taking the 2008–2009 Gaza war as an example, we further show how corpus comparison and semantic network analysis applied to the results of the clause analysis can reveal differences in citation and framing patterns between U.S. and English-language Chinese coverage of this war.
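The idea of extracting source–subject–predicate clauses with a few syntactic patterns can be illustrated with a minimal Python sketch. It is not the authors' R package and does not reproduce their rule set. It assumes a sentence that has already been dependency-parsed, with relation labels in the style of Stanford typed dependencies (`nsubj`, `ccomp`, `dobj`). The `Token` class, the `SPEECH_VERBS` list, and the two patterns are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical, tiny list of speech verbs; a real lexicon would be far larger.
SPEECH_VERBS = {"say", "says", "said", "claim", "claims", "claimed"}


@dataclass
class Token:
    id: int    # 1-based position in the sentence
    word: str
    head: int  # id of the parent token, 0 for the root
    rel: str   # dependency relation to the parent


def children(tokens, head_id, rel):
    """Direct dependents of head_id that carry the given relation."""
    return [t for t in tokens if t.head == head_id and t.rel == rel]


def subtree(tokens, root_id):
    """All tokens dominated by root_id (inclusive), in sentence order."""
    ids = {root_id}
    changed = True
    while changed:
        changed = False
        for t in tokens:
            if t.head in ids and t.id not in ids:
                ids.add(t.id)
                changed = True
    return [t for t in tokens if t.id in ids]


def text(toks):
    return " ".join(t.word for t in toks)


def extract_clauses(tokens):
    # Pattern 1 (quotes): a speech verb with a subject and a clausal
    # complement; the subject is the source of the complement clause.
    source_of = {}
    speech_ids = set()
    for verb in tokens:
        if verb.word.lower() in SPEECH_VERBS:
            subj = children(tokens, verb.id, "nsubj")
            comp = children(tokens, verb.id, "ccomp")
            if subj and comp:
                source_of[comp[0].id] = text(subtree(tokens, subj[0].id))
                speech_ids.add(verb.id)

    # Pattern 2 (clauses): any other verb with a subject yields a clause;
    # the predicate is the verb's subtree minus the subject's subtree.
    clauses = []
    for verb in tokens:
        if verb.id in speech_ids:
            continue
        subj = children(tokens, verb.id, "nsubj")
        if not subj:
            continue
        subj_toks = subtree(tokens, subj[0].id)
        subj_ids = {t.id for t in subj_toks}
        pred = [t for t in subtree(tokens, verb.id) if t.id not in subj_ids]
        clauses.append({
            "source": source_of.get(verb.id),  # None when no quote was found
            "subject": text(subj_toks),
            "predicate": text(pred),
        })
    return clauses


# Hand-written parse of "Hamas said Israel attacked Gaza" (illustrative only).
sentence = [
    Token(1, "Hamas", 2, "nsubj"),
    Token(2, "said", 0, "root"),
    Token(3, "Israel", 4, "nsubj"),
    Token(4, "attacked", 2, "ccomp"),
    Token(5, "Gaza", 4, "dobj"),
]
clauses = extract_clauses(sentence)
print(clauses)
```

For the example sentence the sketch yields one clause with source "Hamas", subject "Israel", and predicate "attacked Gaza". The predicate text could then be passed to any frequency-based analysis, as the abstract describes.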

Copyright © The Author(s) 2017. Published by Cambridge University Press on behalf of the Society for Political Methodology. 



Authors’ note: The research was partly supported by the Israel Science Foundation and the Ministry of Science, Technology, and Space, Israel. The data and R scripts for replicating the validation and substantive analyses are published in the Harvard Dataverse (Van Atteveldt, Sheafer, Shenhav, and Fogel-Dror 2016).

Contributing Editor: R. Michael Alvarez


Baker, C., Fillmore, C., and Cronin, B. 2003. The structure of the framenet database. International Journal of Lexicography 16(3):281–296.
Blei, D. M., Ng, A. Y., and Jordan, M. I. 2003. Latent dirichlet allocation. The Journal of Machine Learning Research 3:993–1022.
Carreras, X., and Màrquez, L. 2005. Introduction to the conll-2005 shared task: semantic role labeling. In Proceedings of the ninth conference on computational natural language learning. Stroudsburg, PA: Association for Computational Linguistics, pp. 152–164.
Chen, D., Schneider, N., Das, D., and Smith, N. A. 2010. SEMAFOR: frame argument resolution with log-linear models. In Proceedings of the 5th international workshop on semantic evaluation. Stroudsburg, PA: Association for Computational Linguistics, pp. 264–267.
Collingwood, L., and Wilkerson, J. 2012. Tradeoffs in accuracy and efficiency in supervised learning methods. Journal of Information Technology & Politics 9(3):298–318.
De Marneffe, M., MacCartney, B., and Manning, C. 2006. Generating typed dependency parses from phrase structure parses. In Proceedings of LREC, vol. 6, pp. 449–454.
D’Orazio, V., Landis, S. T., Palmer, G., and Schrodt, P. 2014. Separating the wheat from the chaff: applications of automated document classification using support vector machines. Political Analysis 22(2):224–242.
Entman, R. M. 2008. Theorizing mediated public diplomacy: The U.S. case. International Journal of Press/Politics 13:87–102.
Fellbaum, C., ed. 1998. WordNet: an electronic lexical database. Cambridge, MA: MIT Press.
Fogel-Dror, Y., Sheafer, T., Shenhav, S. R., and Van Atteveldt, W. 2015. Real-time sentiment analysis in the context of a political conflict. In Annual meeting of the American political science association, San Francisco, CA.
Grimmer, J. 2010. A bayesian hierarchical topic model for political texts: measuring expressed agendas in senate press releases. Political Analysis 18(1):1–35.
Grimmer, J., and Stewart, B. M. 2013. Text as data: the promise and pitfalls of automatic content analysis methods for political texts. Political Analysis 21:267–297.
Grossman, D. A., and Frieder, O. 2012. Information retrieval: algorithms and heuristics, vol. 15. Springer Science & Business Media.
Hillard, D., Purpura, S., and Wilkerson, J. 2008. Computer assisted topic classification for mixed methods social science research. Journal of Information Technology and Politics 4(4):31–64.
Kellstedt, P. M. 2003. The mass media and the dynamics of American racial attitudes. New York: Cambridge University Press.
Laver, M., Benoit, K., and Garry, J. 2003. Extracting policy positions from political texts using words as data. American Political Science Review 97(2):311–331.
Lowe, W., and Benoit, K. 2013. Validating estimates of latent traits from textual data using human judgment as a benchmark. Political Analysis 21(3):298–313.
Miller, G. 1995. WordNet: a lexical database for English. New York: ACM Press.
Monroe, B. L., Colaresi, M. P., and Quinn, K. M. 2008. Fightin’ words: lexical feature selection and evaluation for identifying the content of political conflict. Political Analysis 16(4):372–403.
Quinn, K. M., Monroe, B. L., Colaresi, M., Crespin, M. H., and Radev, D. R. 2010. How to analyze political attention with minimal assumptions and costs. American Journal of Political Science 54(1):209–228.
Roberts, M. E. 2015. Introduction to the virtual issue: recent innovations in text analysis for social science. Political Analysis 23:254–277.
Ruigrok, N., and Van Atteveldt, W. 2007. Global angling with a local angle: How U.S., British, and Dutch newspapers frame global and local terrorist attacks. The Harvard International Journal of Press/Politics 12:68–90.
Schrodt, P. A. 2014. TABARI: textual analysis by augmented replacement instructions, version 0.8.4b3.
Schrodt, P. A., and Gerner, D. J. 1994. Validity assessment of a machine-coded event data set for the Middle East, 1982–1992. American Journal of Political Science 38(3):825–854.
Schrodt, P. A., and Gerner, D. J. 2000. Cluster-based early warning indicators for political change in the contemporary levant. American Political Science Review 94(4):803–818.
Schrodt, P. A., Gerner, D. J., and Yilmaz, O. 2005. Using event data to monitor contemporary conflict in the Israel-Palestine dyad. International Studies Perspectives 6(2):235–251.
Sebastiani, F. 2002. Machine learning in automated text categorization. ACM Computing Surveys 34(1):1–47.
Sheafer, T., and Shenhav, S. R. 2010. Mediated public diplomacy in a new era of warfare. The Communication Review 12:272–283.
Sheafer, T., Shenhav, S. R., Takens, J., and Van Atteveldt, W. 2014. Relative political and value proximity in mediated public diplomacy: the effect of state-level homophily on international frame building. Political Communication 31(1):149–167.
Slapin, J. B., and Proksch, S.-O. 2008. A scaling model for estimating time-series party positions from texts. American Journal of Political Science 52(3):705–722.
Stone, P. J., Dunphy, D. C., Smith, M. S., Ogilvie, D. M., et al. 1966. The General Inquirer: a computer approach to content analysis. Cambridge, MA: MIT Press.
Van Atteveldt, W. 2008. Semantic network analysis: techniques for extracting, representing, and querying media content (dissertation). Charleston, SC: BookSurge.
Van Atteveldt, W. 2013. News media: platform or power broker? A study of political quotes in newspaper content using syntactic analysis. Presented at the New Directions in Analyzing Text as Data workshop, LSE, 27–28 September.
Van Atteveldt, W., Kleinnijenhuis, J., and Ruigrok, N. 2008. Parsing, semantic networks, and political authority: using syntactic analysis to extract semantic relations from Dutch newspaper articles. Political Analysis 16(4):428–446.
Van Atteveldt, W., Sheafer, T., Shenhav, S. R., and Fogel-Dror, Y. 2016. Replication data for: clause analysis: using syntactic information to automatically extract source, subject, and predicate from texts with an application to the 2008–2009 Gaza War. doi:10.7910/DVN/DZZXAD, Harvard Dataverse, V1 [UNF:6:IdSlgh3RYlPHO1Hq0pCahQ==].
Young, L., and Soroka, S. 2012. Affective news: the automated coding of sentiment in political texts. Political Communication 29(2):205–231.
Supplementary material: File

van Atteveldt supplementary material

File 166.3 KB