
Using Word Order in Political Text Classification with Long Short-term Memory Models

Charles Chang and Michael Masterson


Political scientists often wish to classify documents based on their content to measure variables, such as the ideology of political speeches or whether documents describe a Militarized Interstate Dispute. Simple classifiers often serve well in these tasks. However, if words occurring early in a document alter the meaning of words occurring later in the document, using a more complicated model that can incorporate these time-dependent relationships can increase classification accuracy. Long short-term memory (LSTM) models are a type of neural network model designed to work with data that contains time dependencies. We investigate the conditions under which these models are useful for political science text classification tasks with applications to Chinese social media posts as well as US newspaper articles. We also provide guidance for the use of LSTM models.
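To make the word-order point concrete, the sketch below compares two invented sentences that use exactly the same words in a different order. It is an illustration only, not the authors' method: the sentences, the `bag_of_words` helper, and the whitespace tokenization are all assumptions chosen for brevity. A classifier that sees only word counts cannot tell the two sentences apart, whereas a sequence model such as an LSTM receives the tokens in order and at least has the information needed to distinguish them.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count whitespace-separated tokens, discarding their order."""
    return Counter(text.lower().split())


# Two hypothetical sentences: same words, opposite meanings.
doc_a = "the state did not attack but was attacked"
doc_b = "the state was not attacked but did attack"

# An order-free representation treats the two documents as identical.
same_counts = bag_of_words(doc_a) == bag_of_words(doc_b)

# The token sequences, which a sequence model would consume, differ.
same_sequence = doc_a.split() == doc_b.split()

print("identical word counts:", same_counts)
print("identical token sequences:", same_sequence)
```

When the distinction that matters for coding a document looks like the one between `doc_a` and `doc_b`, order-aware models have something to gain; when word counts alone separate the classes, a simple classifier may serve just as well.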





Contributing Editor: Daniel Hopkins





Supplementary materials: Chang and Masterson supplementary material (1.2 MB)


