Published online by Cambridge University Press: 05 September 2017
Scholars are increasingly using online workforces to encode latent political concepts embedded in written or spoken records. In this letter, we build on past efforts by developing and validating a crowdsourced pairwise comparison framework for encoding political texts. The framework combines the human ability to understand natural language with the ability of computers to aggregate data into reliable measures, while ameliorating concerns about the biases and unreliability of non-expert human coders. We validate the method with advertisements for U.S. Senate candidates and with State Department reports on human rights. The framework we present is very general, and we provide free software to help applied researchers interact easily with online workforces to extract meaningful measures from texts.
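The core aggregation step — turning many noisy pairwise judgments ("text A is more negative than text B") into a continuous scale — can be illustrated with a Bradley-Terry-style model, in which the probability that document i is chosen over document j is a logistic function of the difference in their latent scores. The sketch below is illustrative only (plain gradient ascent in standard-library Python); it is not the authors' software, and the function name and parameters are hypothetical.

```python
import math

def fit_pairwise_scores(n_docs, comparisons, lr=0.1, n_iter=2000):
    """Estimate latent document scores from pairwise comparisons.

    Bradley-Terry-style model: P(i chosen over j) = sigmoid(s_i - s_j).
    `comparisons` is a list of (winner, loser) document-index pairs,
    e.g. from crowd workers asked which of two texts is more negative.
    """
    s = [0.0] * n_docs
    for _ in range(n_iter):
        grad = [0.0] * n_docs
        for winner, loser in comparisons:
            # Probability the model currently assigns to the observed choice.
            p = 1.0 / (1.0 + math.exp(-(s[winner] - s[loser])))
            # Log-likelihood gradient: push winner up, loser down.
            grad[winner] += 1.0 - p
            grad[loser] -= 1.0 - p
        s = [si + lr * gi for si, gi in zip(s, grad)]
        # Scores are identified only up to a constant; center at zero.
        mean = sum(s) / n_docs
        s = [si - mean for si in s]
    return s

# Toy example: worker judgments imply doc 0 > doc 1 > doc 2 on the latent trait.
comps = [(0, 1), (0, 2), (1, 2)] * 3
scores = fit_pairwise_scores(3, comps)
```

Here the recovered scores order the three documents consistently with the raw judgments; with real crowd data, repeated and partially contradictory comparisons are averaged into a single reliable scale in the same way.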
We thank Burt Monroe, John Freeman, and Brandon Stewart for providing comments on a previous version of this paper. We are indebted to Ryden Butler, Dominic Jarkey, Jon Rogowski, Erin Rossiter, and Michelle Torres for their assistance with this project. We particularly wish to thank Matt Dickenson for his programming assistance. We also appreciate the assistance in the R package development from David Flasterstein, Joseph Ludmir, and Taishi Muraoka. We are grateful for the financial support provided by the Weidenbaum Center on the Economy, Government, and Public Policy. Finally, we wish to thank the partner-workers at Amazon’s Mechanical Turk who make this research possible.