Natural Sentences as Valid Units for Coded Political Texts

Thomas Däubler; Kenneth Benoit; Slava Mikhaylov; Michael Laver

doi:10.1017/S0007123412000105


Published online by Cambridge University Press: 03 May 2012


Information

Type: Notes and Comments
Information: British Journal of Political Science, Volume 42, Issue 4, October 2012, pp. 937–951

DOI: https://doi.org/10.1017/S0007123412000105
Copyright: Copyright © Cambridge University Press 2012


Footnotes

Däubler (email: thomas.daeubler@mzes.uni-mannheim.de) is at the Mannheim Centre for European Social Research (MZES); Benoit is at the London School of Economics (Methodology Institute) and Trinity College Dublin (Political Science); Mikhaylov is at University College London (Department of Political Science), and Laver is at New York University (Department of Politics). The authors would like to thank Daniela Braun and Hermann Schmitt for valuable comments and kind permission to reproduce some of the experimental results discussed in Braun, Mikhaylov and Schmitt (2010). An earlier version of this Research Note was presented at the ECPR Joint Sessions of Workshops in St. Gallen, 2011, and the authors would also like to thank the participants in the workshop entitled ‘The how and why of party manifestos in new and established democracies’, for helpful feedback.

References

¹ Laver, Michael, Benoit, Kenneth and Garry, John, ‘Extracting Policy Positions from Texts Using Words as Data’, American Political Science Review, 97 (2003), 311–331.

Slapin, Jonathan and Proksch, Sven-Oliver, ‘A Scaling Model for Estimating Time-Series Party Positions from Texts’, American Journal of Political Science, 52 (2008), 705–722.

² Budge, Ian, Klingemann, Hans-Dieter, Volkens, Andrea, Bara, Judith, Tanenbaum, Eric, Fording, Richard C., Hearl, Derek J., Kim, Hee Min, McDonald, Michael D. and Mendes, Silvia M., Mapping Policy Preferences: Estimates for Parties, Electors, and Governments, 1945–1998 (Oxford: Oxford University Press, 2001).

Klingemann, Hans-Dieter, Volkens, Andrea, Bara, Judith, Budge, Ian and McDonald, Michael, Mapping Policy Preferences II: Estimates for Parties, Electors, and Governments in Eastern Europe, European Union, and OECD 1990–2003 (Oxford: Oxford University Press, 2006).

Baumgartner, Frank R., Green-Pedersen, Christoffer and Jones, Bryan D., Comparative Studies of Policy Agendas (London: Routledge, 2007).

³ Krippendorff, Klaus, Content Analysis: An Introduction to its Methodology (Thousand Oaks, Calif.: Sage, 2004).

⁴ By contrast, the second basic data-generating step, in which each text unit is coded by assigning to it a category from the coding scheme, is always endogenous to the text, and indeed forms the core part of the content analysis exercise.

⁵ Monroe, Burt L. and Maeda, Ko, Rhetorical Ideal Point Estimation: Mapping Legislative Speech (Palo Alto, Calif.: Stanford University: Society for Political Methodology, 2004).

Monroe, Burt, Colaresi, Michael and Quinn, Kevin M., ‘Fightin’ Words: Lexical Feature Selection and Evaluation for Identifying the Content of Political Conflict’, Political Analysis, 16 (2008), 372–403.

Hopkins, Daniel J. and King, Gary, ‘A Method of Automated Nonparametric Content Analysis for Social Science’, American Journal of Political Science, 54 (2010), 229–247.

⁶ Details for those unfamiliar with the Jeopardy game show format, or with Watson and DeepQA, can be found at http://www-03.ibm.com/innovation/us/watson/.

⁷ Krippendorff, Content Analysis: An Introduction to its Methodology, p. 212.

⁸ Laver, Michael and Garry, John, ‘Estimating Policy Positions from Political Texts’, American Journal of Political Science, 44 (2000), 619–634.

⁹ See, for instance, Klingemann et al., Mapping Policy Preferences II, chaps 4–6.

¹⁰ Budge et al., Mapping Policy Preferences; Baumgartner, Green-Pedersen and Jones, Comparative Studies of Policy Agendas.

¹¹ Andrea Volkens, Manifesto Coding Instructions (2nd revised edn), Wissenschaftszentrum Berlin, Discussion Paper FS III 02-201 (2001), p. 96.

¹² Krippendorff, Content Analysis.

¹³ To return to the Australian 2001 National Party example cited earlier, we also observe the following natural sentence: ‘There is no argument about the need for production sustainability and its matching twin, environmental sustainability.’ In this case, the coder deemed this a single quasi-sentence (QS) and coded it as 501 (Environmental Protection: Positive), even though it could plausibly have been seen as comprising two QSs, divided by the ‘and’, with the first coded 410 (Productivity: Positive) and the second 501.

¹⁴ In these test results, coders with especially poor first-round results had their results corrected and were asked to repeat the exercise. For these coders, we report only their second-round unitization results. While these results do not constitute a decisive experiment, since they form part of the process of training new coders, they are the single largest available test of multiple unitizations of a manifesto text. We thank Andrea Volkens for sharing these data with us.

¹⁵ Krippendorff, Content Analysis.

¹⁶ Laver, Michael, ed., Estimating the Policy Positions of Political Actors (New York: Routledge, 2001), pp. 149–161.

¹⁷ We are assuming here that differences at the unit level are not the quantity of interest, and that the objective of any unit-based coding exercise is to yield aggregate measures of political content.

¹⁸ To make the identification of natural sentences as unambiguous as possible, with a view to eventually automating this stage completely, we developed a very explicit set of guidelines for identifying a natural sentence. A natural sentence delimiter was defined as any of the following characters: ‘.’, ‘?’, ‘!’ and ‘;’. Bullet-pointed sentence fragments were also defined as ‘natural’ sentences, even if they did not end in one of the declared delimiters. A full set of the coding instructions we issued to coders (ourselves) is available upon request.
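The unitization rule in note 18 is explicit enough to be sketched in code. The Python function below is a minimal illustration under stated assumptions, not the authors' procedure: it cuts prose wherever ‘.’, ‘?’, ‘!’ or ‘;’ is followed by whitespace, joins lines that wrap within a paragraph, and treats each bullet-pointed line as a unit of its own. The function name `natural_sentences` and the set of bullet markers it recognizes (hyphens, asterisks, ‘•’ and numbered items) are hypothetical choices made for this example.

```python
import re
from typing import List

# Delimiters as printed in note 18.
DELIMITERS = ".?!;"

# Cut wherever a delimiter is immediately followed by whitespace.
SPLIT_PATTERN = re.compile(r"(?<=[" + re.escape(DELIMITERS) + r"])\s+")

# Hypothetical bullet markers: "-", "*", "•", or "1." / "1)" at line start.
BULLET_PATTERN = re.compile(r"^\s*(?:[-*\u2022]|\d+[.)])\s+")


def natural_sentences(text: str) -> List[str]:
    """Return the natural sentences of `text` under the assumed rules above."""
    units: List[str] = []
    buffer: List[str] = []  # prose lines of the current paragraph

    def flush() -> None:
        # Join wrapped lines, then cut the paragraph at each delimiter.
        prose = " ".join(buffer).strip()
        buffer.clear()
        if prose:
            units.extend(s.strip() for s in SPLIT_PATTERN.split(prose) if s.strip())

    for line in text.splitlines():
        bullet = BULLET_PATTERN.match(line)
        if bullet:
            # A bullet closes any running paragraph and is unitized on its
            # own, so an undelimited fragment still becomes one unit.
            flush()
            buffer.append(line[bullet.end():].strip())
            flush()
        elif line.strip():
            buffer.append(line.strip())
        else:
            flush()  # a blank line ends the paragraph
    flush()
    return units
```

Applied to the National Party sentence quoted in note 13, the sketch returns a single unit, because the only delimiter it contains is the final full stop; the ‘and’ that a coder might use to split it into two QSs plays no role. A rule this mechanical will also cut after abbreviations such as ‘e.g.’ when they are followed by a space, which any production version would need to handle.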

¹⁹ Mikhaylov, Slava, Laver, Michael and Benoit, Kenneth, ‘Coder Reliability and Misclassification in the Human Coding of Party Manifestos’, Political Analysis, 20 (2012), 78–91.

²⁰ To be precise, the number of categories is 57 since it includes ‘uncoded’ as a further category, as is the case in the published CMP data. We did not use the four-digit codes that apply to post-communist countries but aggregated them into their respective three-digit categories; this affected only 7 of the total of 8,481 QSs (0.08%). In addition, 24 QS codes (0.28%) could not be identified from the documents because they were illegible.
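The category count and the two percentages in note 20 follow directly from the counts given there. The snippet below is only a worked arithmetic check; the helper name `share_pct` is illustrative, and the figure of 56 substantive categories is simply 57 minus the ‘uncoded’ category described in the note.

```python
TOTAL_QS = 8481  # QSs in the reconstructed data (note 20)
N_CATEGORIES = 56 + 1  # substantive CMP categories plus 'uncoded'


def share_pct(count: int, total: int = TOTAL_QS) -> float:
    """Percentage of all QSs accounted for by `count`, rounded to 2 d.p."""
    return round(100 * count / total, 2)


print(N_CATEGORIES)  # 57
print(share_pct(7))  # four-digit codes aggregated: 0.08
print(share_pct(24))  # illegible codes: 0.28
```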

²¹ The emphasis here is on ‘reconstructed’: we did not ensure that every category percentage from the QSs we recorded perfectly matched those reported in the CMP dataset. An exact replication is not possible, for instance because, not infrequently, the number of codes in the margins does not correspond to the number of units separated by tick marks (where tick marks are used at all). While we did check that we matched the published figures to a very high degree, a perfect match is unnecessary since our comparison focuses on units within texts.

²² Lowe, Will, Benoit, Kenneth, Mikhaylov, Slava and Laver, Michael, ‘Scaling Policy Preferences from Coded Political Texts’, Legislative Studies Quarterly, 36 (2011), 123–155.

²³ Daniela Braun, Slava Mikhaylov and Hermann Schmitt, ‘Computer-Assisted Human Coding: Experimental Tests of Reliability’ (paper presented at ‘Political Parties and Comparative Policy Agendas’, an ESF Workshop on Political Parties and their Positions, and Policy Agendas, University of Manchester, 2010).

²⁴ Mikhaylov, Laver and Benoit, ‘Coder Reliability and Misclassification in the Human Coding of Party Manifestos’.

²⁵ The excerpt of the 1999 British Liberal Democrats Euromanifesto used in the experiment consists of 83 natural sentences. The Euromanifestos Project previously used this excerpt as the training document and declared it to consist of 112 QSs.

²⁶ Budge et al., Mapping Policy Preferences.

²⁷ As part of the coding procedure, coders coded policy domains (the seven categories defined by the first digit of the CMP code) and coding categories sequentially.

²⁸ Braun, Mikhaylov and Schmitt, ‘Computer-Assisted Human Coding’.

²⁹ Fleiss, Joseph L., ‘Measuring Nominal Scale Agreement among Many Raters’, Psychological Bulletin, 76 (1971), 378–383.

Fleiss, Joseph L., Levin, Bruce A. and Paik, Myunghee Cho, Statistical Methods for Rates and Proportions (Hoboken, N.J.: Wiley, 2003).
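Note 29 cites Fleiss's generalization of kappa to many raters, the natural statistic when several coders each assign the same units to one of a fixed set of categories. The Python function below is a minimal sketch of the standard Fleiss (1971) kappa, assuming every unit is coded by the same number of coders. It is not the code used in the study, and the name `fleiss_kappa` and its count-matrix input (one row per unit, one column per category) are assumptions made for illustration.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for units that are each coded by the same n coders.

    counts[i][j] is the number of coders who put unit i in category j
    (a hypothetical input layout chosen for this sketch).
    """
    if not counts:
        raise ValueError("need at least one unit")
    n_units = len(counts)
    n_cats = len(counts[0])
    n_raters = sum(counts[0])
    if n_raters < 2 or any(len(row) != n_cats or sum(row) != n_raters for row in counts):
        raise ValueError("every unit needs the same categories and the same n >= 2 coders")

    total = n_units * n_raters
    # p_j: share of all assignments that fall in category j.
    p = [sum(row[j] for row in counts) / total for j in range(n_cats)]
    # P_i: share of coder pairs that agree on unit i.
    agreement = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(agreement) / n_units  # mean observed agreement
    p_e = sum(pj * pj for pj in p)  # agreement expected by chance
    if p_e == 1:
        raise ValueError("kappa is undefined when all codes fall in one category")
    return (p_bar - p_e) / (1 - p_e)
```

Kappa is 1 when all coders agree on every unit and 0 when observed agreement equals the agreement expected from the overall category shares. With ten units, five categories and fourteen coders per unit, the counts used in the test give a kappa of about 0.21.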

³⁰ Roberts, Chris, ‘Modelling Patterns of Agreement for Nominal Scales’, Statistics in Medicine, 27 (2008), 810–830.

³¹ Cohen, Jacob, ‘A Coefficient of Agreement for Nominal Scales’, Educational and Psychological Measurement, 20 (1960), 37–46.

Hayes, Andrew F. and Krippendorff, Klaus, ‘Answering the Call for a Standard Reliability Measure for Coding Data’, Communication Methods and Measures, 1 (2007), 77–89.

³² Landis, J. Richard and Koch, Gary G., ‘The Measurement of Observer Agreement for Categorical Data’, Biometrics, 33 (1977), 159–174.
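Note 32 cites Landis and Koch, whose descriptive labels are widely used to interpret kappa values. A minimal Python sketch of that mapping follows. The bands are the conventional ones (poor below 0, slight up to 0.20, fair up to 0.40, moderate up to 0.60, substantial up to 0.80, almost perfect above); treating each upper bound as inclusive is an assumption, since the published bands are stated to two decimal places, and the function name is hypothetical.

```python
def landis_koch_label(kappa: float) -> str:
    """Descriptive Landis-Koch (1977) label for a kappa value."""
    if kappa > 1:
        raise ValueError("kappa cannot exceed 1")
    if kappa < 0:
        return "poor"
    # Upper bound of each band; bounds are treated as inclusive (an assumption).
    for upper, label in ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"
```

For example, a kappa of 0.21 falls in the ‘fair’ band and a kappa of 0.85 in the ‘almost perfect’ band. The labels are conventions rather than statistical tests, so they are best reported alongside the kappa value itself.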

³³ Mikhaylov, Laver and Benoit, ‘Coder Reliability and Misclassification in the Human Coding of Party Manifestos’.
