
Clause Analysis: Using Syntactic Information to Automatically Extract Source, Subject, and Predicate from Texts with an Application to the 2008–2009 Gaza War

Published online by Cambridge University Press: 01 March 2017

Wouter van Atteveldt*, Department of Communication Science, VU University Amsterdam, The Netherlands. Email: wouter@vanatteveldt.com
Tamir Sheafer, Department of Political Science and Department of Communication, The Hebrew University of Jerusalem, Israel
Shaul R. Shenhav, Department of Political Science, The Hebrew University of Jerusalem, Israel
Yair Fogel-Dror, Department of Political Science, The Hebrew University of Jerusalem, Israel

Abstract

This article presents a new method and open source R package that uses syntactic information to automatically extract source–subject–predicate clauses. This improves on frequency-based text analysis methods by dividing text into predicates with an identified subject and optional source, extracting the statements and actions of (political) actors as mentioned in the text. The content of these predicates can be analyzed using existing frequency-based methods, allowing for the analysis of actions, issue positions and framing by different actors within a single text. We show that a small set of syntactic patterns can extract clauses and identify quotes with good accuracy, significantly outperforming a baseline system based on word order. Taking the 2008–2009 Gaza war as an example, we further show how corpus comparison and semantic network analysis applied to the results of the clause analysis can show differences in citation and framing patterns between U.S. and English-language Chinese coverage of this war.
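The core idea of the abstract, that a small set of syntactic patterns over a dependency parse can recover who is quoted (source), who acts (subject), and what is said or done (predicate), can be illustrated with a toy sketch. This is not the authors' R package or their actual rule set: the `Token` structure, the hypothetical `SPEECH_VERBS` list, the two patterns, and the Stanford-style relation labels (`nsubj`, `ccomp`, `dobj`) are assumptions made for illustration, and the parse is written by hand rather than produced by a parser.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    id: int     # 1-based position in the sentence
    word: str
    lemma: str
    head: int   # id of the governing token; 0 for the root
    rel: str    # dependency label (Stanford/UD-style names assumed)

# Hypothetical, deliberately tiny speech-verb lexicon; a real system needs far more.
SPEECH_VERBS = {"say", "tell", "claim", "warn", "accuse"}

def children(tokens, head_id, rel=None):
    """Direct dependents of head_id, optionally filtered by relation label."""
    return [t for t in tokens if t.head == head_id and (rel is None or t.rel == rel)]

def subtree(tokens, root_id):
    """Ids of root_id and all of its descendants (a dependency tree has no cycles)."""
    ids, stack = {root_id}, [root_id]
    while stack:
        for child in children(tokens, stack.pop()):
            ids.add(child.id)
            stack.append(child.id)
    return ids

def words(tokens, ids):
    """Surface text of the given token ids, in sentence order."""
    return " ".join(t.word for t in tokens if t.id in ids)

def extract_clauses(tokens):
    # Pattern 1 (quotes): a speech verb with a subject and a clausal complement
    # makes its subject the source of the clause headed by that complement.
    source_of, speech_heads = {}, set()
    for t in tokens:
        if t.lemma in SPEECH_VERBS:
            subj = children(tokens, t.id, "nsubj")
            comp = children(tokens, t.id, "ccomp")
            if subj and comp:
                speech_heads.add(t.id)
                source_of[comp[0].id] = words(tokens, subtree(tokens, subj[0].id))
    # Pattern 2 (clauses): every other head with a nominal subject yields a
    # subject and a predicate; nested clausal complements are left out of the predicate.
    clauses = []
    for t in tokens:
        subj = children(tokens, t.id, "nsubj")
        if not subj or t.id in speech_heads:
            continue
        subj_ids = subtree(tokens, subj[0].id)
        pred_ids = subtree(tokens, t.id) - subj_ids
        for comp in children(tokens, t.id, "ccomp"):
            pred_ids -= subtree(tokens, comp.id)
        clauses.append({
            "source": source_of.get(t.id),
            "subject": words(tokens, subj_ids),
            "predicate": words(tokens, pred_ids),
        })
    return clauses

# "Hamas said Israel attacked Gaza", parsed by hand (no parser is called here).
sentence = [
    Token(1, "Hamas", "Hamas", 2, "nsubj"),
    Token(2, "said", "say", 0, "root"),
    Token(3, "Israel", "Israel", 4, "nsubj"),
    Token(4, "attacked", "attack", 2, "ccomp"),
    Token(5, "Gaza", "Gaza", 4, "dobj"),
]
print(extract_clauses(sentence))
# [{'source': 'Hamas', 'subject': 'Israel', 'predicate': 'attacked Gaza'}]
```

Because the patterns match on dependency relations rather than on word positions, the same rules should still apply when the quote precedes the speech verb ("Israel attacked Gaza, Hamas said"), provided the parser attaches the quoted clause as a complement of the speech verb.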

Type: Articles
Copyright © The Author(s) 2017. Published by Cambridge University Press on behalf of the Society for Political Methodology.


Footnotes

Authors’ note: The research was partly supported by the Israel Science Foundation and the Ministry of Science, Technology, and Space, Israel. The data and R scripts for replicating the validation and substantive analyses are published in the Harvard Dataverse (Van Atteveldt, Sheafer, Shenhav, and Fogel-Dror 2016).

Contributing Editor: R. Michael Alvarez

References

Baker, C., Fillmore, C., and Cronin, B. 2003. The structure of the FrameNet database. International Journal of Lexicography 16(3):281–296.
Blei, D. M., Ng, A. Y., and Jordan, M. I. 2003. Latent Dirichlet allocation. The Journal of Machine Learning Research 3:993–1022.
Carreras, X., and Màrquez, L. 2005. Introduction to the CoNLL-2005 shared task: semantic role labeling. In Proceedings of the ninth conference on computational natural language learning. Stroudsburg, PA: Association for Computational Linguistics, pp. 152–164.
Chen, D., Schneider, N., Das, D., and Smith, N. A. 2010. SEMAFOR: frame argument resolution with log-linear models. In Proceedings of the 5th international workshop on semantic evaluation. Stroudsburg, PA: Association for Computational Linguistics, pp. 264–267.
Collingwood, L., and Wilkerson, J. 2012. Tradeoffs in accuracy and efficiency in supervised learning methods. Journal of Information Technology & Politics 9(3):298–318.
De Marneffe, M., MacCartney, B., and Manning, C. 2006. Generating typed dependency parses from phrase structure parses. In Proceedings of LREC, vol. 6, pp. 449–454.
D’Orazio, V., Landis, S. T., Palmer, G., and Schrodt, P. 2014. Separating the wheat from the chaff: applications of automated document classification using support vector machines. Political Analysis 22(2):224–242.
Entman, R. M. 2008. Theorizing mediated public diplomacy: the U.S. case. International Journal of Press/Politics 13:87–102.
Fellbaum, C., ed. 1998. WordNet: an electronic lexical database. Cambridge, MA: MIT Press.
Fogel-Dror, Y., Sheafer, T., Shenhav, S. R., and Van Atteveldt, W. 2015. Real-time sentiment analysis in the context of a political conflict. Presented at the Annual Meeting of the American Political Science Association, San Francisco, CA.
Grimmer, J. 2010. A Bayesian hierarchical topic model for political texts: measuring expressed agendas in Senate press releases. Political Analysis 18(1):1–35.
Grimmer, J., and Stewart, B. M. 2013. Text as data: the promise and pitfalls of automatic content analysis methods for political texts. Political Analysis 21:267–297.
Grossman, D. A., and Frieder, O. 2012. Information retrieval: algorithms and heuristics, vol. 15. Springer Science & Business Media.
Hillard, D., Purpura, S., and Wilkerson, J. 2008. Computer assisted topic classification for mixed methods social science research. Journal of Information Technology and Politics 4(4):31–64.
Kellstedt, P. M. 2003. The mass media and the dynamics of American racial attitudes. New York: Cambridge University Press.
Laver, M., Benoit, K., and Garry, J. 2003. Extracting policy positions from political texts using words as data. American Political Science Review 97(2):311–331.
Lowe, W., and Benoit, K. 2013. Validating estimates of latent traits from textual data using human judgment as a benchmark. Political Analysis 21(3):298–313.
Miller, G. 1995. WordNet: a lexical database for English. New York: ACM Press.
Monroe, B. L., Colaresi, M. P., and Quinn, K. M. 2008. Fightin’ words: lexical feature selection and evaluation for identifying the content of political conflict. Political Analysis 16(4):372–403.
Quinn, K. M., Monroe, B. L., Colaresi, M., Crespin, M. H., and Radev, D. R. 2010. How to analyze political attention with minimal assumptions and costs. American Journal of Political Science 54(1):209–228.
Roberts, M. E. 2015. Introduction to the virtual issue: recent innovations in text analysis for social science. Political Analysis 23:254–277.
Ruigrok, N., and Van Atteveldt, W. 2007. Global angling with a local angle: how U.S., British, and Dutch newspapers frame global and local terrorist attacks. The Harvard International Journal of Press/Politics 12:68–90.
Schrodt, P. A. 2014. TABARI: textual analysis by augmented replacement instructions, version 0.8.4b3. http://eventdata.parusanalytics.com/software.dir/tabari.html.
Schrodt, P. A., and Gerner, D. J. 1994. Validity assessment of a machine-coded event data set for the Middle East, 1982–1992. American Journal of Political Science 38(3):825–854.
Schrodt, P. A., and Gerner, D. J. 2000. Cluster-based early warning indicators for political change in the contemporary Levant. American Political Science Review 94(4):803–818.
Schrodt, P. A., Gerner, D. J., and Yilmaz, O. 2005. Using event data to monitor contemporary conflict in the Israel-Palestine dyad. International Studies Perspectives 6(2):235–251.
Sebastiani, F. 2002. Machine learning in automated text categorization. ACM Computing Surveys 34(1):1–47.
Sheafer, T., and Shenhav, S. R. 2010. Mediated public diplomacy in a new era of warfare. The Communication Review 12:272–283.
Sheafer, T., Shenhav, S. R., Takens, J., and Van Atteveldt, W. 2014. Relative political and value proximity in mediated public diplomacy: the effect of state-level homophily on international frame building. Political Communication 31(1):149–167.
Slapin, J. B., and Proksch, S.-O. 2008. A scaling model for estimating time-series party positions from texts. American Journal of Political Science 52(3):705–722.
Stone, P. J., Dunphy, D. C., Smith, M. S., Ogilvie, D. M., et al. 1966. The General Inquirer: a computer approach to content analysis. Cambridge, MA: MIT Press.
Van Atteveldt, W. 2008. Semantic network analysis: techniques for extracting, representing, and querying media content (dissertation). Charleston, SC: BookSurge.
Van Atteveldt, W. 2013. News media: platform or power broker? A study of political quotes in newspaper content using syntactic analysis. Presented at the New Directions in Analyzing Text as Data workshop, LSE, 27–28 September.
Van Atteveldt, W., Kleinnijenhuis, J., and Ruigrok, N. 2008. Parsing, semantic networks, and political authority: using syntactic analysis to extract semantic relations from Dutch newspaper articles. Political Analysis 16(4):428–446.
Van Atteveldt, W., Sheafer, T., Shenhav, S. R., and Fogel-Dror, Y. 2016. Replication data for: clause analysis: using syntactic information to automatically extract source, subject, and predicate from texts with an application to the 2008–2009 Gaza War. Harvard Dataverse, V1. doi:10.7910/DVN/DZZXAD [UNF:6:IdSlgh3RYlPHO1Hq0pCahQ==].
Young, L., and Soroka, S. 2012. Affective news: the automated coding of sentiment in political texts. Political Communication 29(2):205–231.
Supplementary material: van Atteveldt supplementary material (file, 166 KB)