Validating Estimates of Latent Traits from Textual Data Using Human Judgment as a Benchmark

Will Lowe and Kenneth Benoit
Abstract

Automated and statistical methods for estimating latent political traits and classes from textual data hold great promise, because virtually every political act involves the production of text. Statistical models of natural language features, however, are heavily laden with unrealistic assumptions about the process that generates these data, including the stochastic process of text generation, the functional link between political variables and observed text, and the nature of the variables (and dimensions) on which observed text should be conditioned. While acknowledging statistical models of latent traits to be “wrong,” political scientists nonetheless treat their results as sufficiently valid to be useful. In this article, we address the issue of substantive validity in the face of potential model failure, in the context of unsupervised scaling methods of latent traits. We critically examine one popular parametric measurement model of latent traits for text and then compare its results to systematic human judgments of the texts as a benchmark for validity.

Corresponding author
e-mail: kbenoit@lse.ac.uk (corresponding author)
Footnotes

Authors' note: Replication materials for this article are available from the Political Analysis dataverse at http://hdl.handle.net/1902.1/20387. Supplementary materials for this article are available on the Political Analysis Web site.


Political Analysis
  • ISSN: 1047-1987
  • EISSN: 1476-4989