
Understanding Wordscores

  • Will Lowe

Wordscores is a widely used procedure for inferring policy positions, or scores, for new documents on the basis of scores for words derived from documents with known scores. It is computationally straightforward and requires no distributional assumptions, but it has unresolved practical and theoretical problems. In applications, estimated document scores are on the wrong scale, and because the theoretical development does not specify a statistical model, it is unclear what assumptions the method makes about political text and how to tell whether they fit particular text analysis applications. The first part of the paper demonstrates that badly scaled document score estimates reflect deeper problems with the method. The second part shows how to understand Wordscores as an approximation to correspondence analysis, which itself approximates a statistical ideal point model for words. Problems with the method are identified with the conditions under which these layers of approximation fail to ensure consistent and unbiased estimation of the parameters of the ideal point model.
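As a rough illustration of the procedure summarized above, the following Python sketch scores words from reference documents with known positions and then scores a document as the average of its word scores. It follows the usual description of Wordscores rather than any code from the paper; the function names, the toy texts, and the reference positions of +1 and -1 are all assumptions made for the example.

```python
from collections import Counter


def word_scores(reference_texts, reference_scores):
    """Score each word as the P(reference | word)-weighted mean of reference scores."""
    # Relative frequency of each word within each reference text.
    rel_freqs = []
    for tokens in reference_texts:
        counts = Counter(tokens)
        total = sum(counts.values())
        rel_freqs.append({w: c / total for w, c in counts.items()})

    vocab = set().union(*rel_freqs)
    scores = {}
    for w in vocab:
        f = [rf.get(w, 0.0) for rf in rel_freqs]
        z = sum(f)  # normalizer, so the weights over reference texts sum to 1
        scores[w] = sum(fi / z * a for fi, a in zip(f, reference_scores))
    return scores


def document_score(tokens, scores):
    """Average the word scores over the tokens that were seen in the references."""
    scored = [w for w in tokens if w in scores]
    if not scored:
        raise ValueError("document shares no words with the reference texts")
    return sum(scores[w] for w in scored) / len(scored)


# Hypothetical toy data: two reference texts with assumed positions +1 and -1.
ref_a = "tax cut tax market the".split()
ref_b = "welfare spend welfare state the".split()
scores = word_scores([ref_a, ref_b], [1.0, -1.0])

# Rescoring a reference text with its own word scores.
rescored_a = document_score(ref_a, scores)
print(rescored_a)
```

In this toy case the word "the" occurs equally often in both references and receives a score of 0, so rescoring the first reference text gives 0.8 rather than its assigned position of 1.0. This is a small-scale instance of the scaling problem: words shared across reference texts pull document scores toward the middle of the scale.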


Author's note: I would like to thank Ken Benoit, Mik Laver, Cees van der Eijk, and Wijbrandt van Schuur for useful comments and discussion. The remaining errors are my own.


Political Analysis
  • ISSN: 1047-1987
  • EISSN: 1476-4989
  • URL: /core/journals/political-analysis