
Machine Learning Human Rights and Wrongs: How the Successes and Failures of Supervised Learning Algorithms Can Inform the Debate About Information Effects

Published online by Cambridge University Press:  26 October 2018

Kevin T. Greene
Affiliation:
Department of Political Science, University of Pittsburgh, 4600 Wesley W. Posvar Hall, Pittsburgh, Pennsylvania 15260, United States. Email: mcolaresi@pitt.edu
Baekkwan Park
Affiliation:
Department of Political Science, University of Pittsburgh, 4600 Wesley W. Posvar Hall, Pittsburgh, Pennsylvania 15260, United States. Email: mcolaresi@pitt.edu
Michael Colaresi*
Affiliation:
Department of Political Science, University of Pittsburgh, 4600 Wesley W. Posvar Hall, Pittsburgh, Pennsylvania 15260, United States. Email: mcolaresi@pitt.edu

Abstract

There is an ongoing debate about whether human rights standards have changed over the last 30 years. The evidence for or against this shift relies upon indicators created by human coders reading the texts of human rights reports. To help resolve this debate, we suggest translating the question of changing standards into a supervised learning problem. From this perspective, the application of consistent standards over time implies a time-constant mapping from the textual features in reports to the human-coded scores. Alternatively, if the meanings of abuses have evolved over time, then the same textual features will be labeled with different numerical scores at distinct times. While the mapping from natural language to numerical human rights scores is a highly complicated function, we show that these two distinct data generation processes imply divergent overall patterns of accuracy when we train a wide variety of algorithms on older versus newer sets of observations to learn how to automatically label texts with scores. Our results are consistent with the expectation that standards of human rights have changed over time.
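The core design can be sketched in a few lines. The example below uses hypothetical toy snippets and labels rather than the actual PTS reports, and a generic bag-of-words classifier in place of the paper's full battery of algorithms: fit on an "older" window of labeled texts, then score on a "newer" window. Under time-constant standards, out-of-window accuracy should stay roughly flat as the training window advances; under drifting standards, models trained on older windows should do systematically worse on recent labels.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def out_of_window_accuracy(train_texts, train_labels, test_texts, test_labels):
    """Train a text classifier on one time window, evaluate on a later one."""
    model = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(train_texts, train_labels)
    return model.score(test_texts, test_labels)

# Hypothetical report snippets labeled with two score levels (not real PTS data).
older_texts = [
    "widespread torture and political imprisonment reported",
    "systematic disappearances and extrajudicial killings",
    "few restrictions and isolated detentions noted",
    "generally free with occasional short detentions",
]
older_labels = [5, 5, 2, 2]
newer_texts = [
    "torture and imprisonment of political opponents",
    "occasional detentions but broadly free conditions",
]
newer_labels = [5, 2]

acc = out_of_window_accuracy(older_texts, older_labels, newer_texts, newer_labels)
```

Repeating this for each rolling training window and plotting accuracy against the window's end year yields the kind of trend shown in Figure 1.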

Information

Type
Letter
Creative Commons License: CC BY-NC-ND
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits noncommercial re-use, distribution, and reproduction in any medium, provided the original work is unaltered and is properly cited. The written permission of Cambridge University Press must be obtained for commercial re-use or in order to create a derivative work.
Copyright
Copyright © The Author(s) 2018. Published by Cambridge University Press on behalf of the Society for Political Methodology.

Figure 1. The out-of-window accuracy for algorithms trained on overlapping/rolling 10-year windows of the lexical report features (top plot) illustrates an upward trend, consistent with a dynamic human rights score generation process. The in-window accuracy, calculated on a held-out evaluation set from within the time period used to train the models, is plotted in the middle panel and, by comparison, is flat, suggesting that the PTS data have not grown more difficult to learn over time. The lower plot illustrates the average bias in the out-of-window period (predicted value minus observed value). There is not a consistently negative bias for models using older data. The black dashed line in each plot represents an ensemble majority-vote classifier that takes the predictions of each of the other algorithms as input and predicts the class that receives the plurality of votes. The keys for the algorithms are defined in the supplemental appendix.
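The ensemble line in Figure 1 is a plain plurality vote over the individual algorithms' predictions. A minimal sketch, with hypothetical prediction lists standing in for the actual model outputs:

```python
from collections import Counter

def majority_vote(predictions_per_model):
    # predictions_per_model: one prediction list per algorithm, aligned by report.
    # For each report, return the class receiving the plurality of votes.
    # Counter.most_common breaks ties by first-seen order, an arbitrary but
    # deterministic rule.
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*predictions_per_model)]

# Three hypothetical algorithms voting on two reports:
ensemble = majority_vote([[4, 5], [4, 4], [5, 4]])
```

Here the first report gets two votes for 4 and one for 5, so the ensemble predicts 4; likewise for the second.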


Figure 2. Each subplot presents the descending rank (logged) of select terms based on their feature importance for predicting PTS labels 4 (red) and 5 (blue) using the unigram random forests model and Gini impurity. The last year of each 10-year rolling training window is presented on the $x$-axis. The $y$-axis is reversed so that greater importance values (lower ranks) appear higher on the plot. Feature importance is computed on decision trees using unigram features within a random forest. The trees in the forest are trained to classify a report as either a 4 or not, and then as either a 5 or not. We use the change in Gini impurity from parent to child nodes, weighted by the number of observations that reach each node, as our feature importance metric. These values are averaged across the trees. A flat line represents consistent importance in predicting recent scores over the past training windows. The first six rows of plots are all within the top 25 ranked features for either the earliest or the latest windows. The last two rows include words that we pulled from the PTS codebook and Richards' discussion.

Supplementary material: Greene et al. supplementary material (File, 423.4 KB)