Measuring Backsliding with Observables: Observable-to-Subjective Score Mapping

ABSTRACT Multiple well-known democracy-rating projects—including Freedom House, Polity, and Varieties of Democracy (V-Dem)—have identified apparent global regression in recent years. These measures rely on partly subjective indicators, which—in principle—could suffer from rater bias. For instance, Little and Meng (2023) argue that shared beliefs driven by the current zeitgeist could lead to shared biases that produce the appearance of democratic backsliding in subjectively coded measures. To assess this argument and the strength of the evidence for global democratic backsliding, we propose an observable-to-subjective score mapping (OSM) methodology that uses only easily observable features of democracy to predict existing indices of democracy. Applying this methodology to three prominent democracy indices, we find evidence of backsliding—but beginning later and not as pronounced as suggested by some of the original indices. Our approach suggests that the Freedom House measure particularly does not track with the recent patterns in observable indicators and that there has been a stasis or—at most—a modest decline in the average level of democracy.

Almost all extant measures of democracy involve some degree of subjective coding. As is widely recognized, coder judgments may be affected by many factors that introduce error into the coding of a country. These factors include personal preferences, political preferences, lack of information, biased sources, varying ideas about how to conceptualize democracy, and data-entry mistakes.¹
Little and Meng (2023) identify one specific bias that could have devastating consequences for our understanding of democracy in the twenty-first century. Backsliding, they surmise, is part of the zeitgeist, seemingly confirmed by the rise of Trump in the United States, Modi in India, Orbán in Hungary, and other populists around the world. This vision of doom is trumpeted by major media outlets, which adopt the backsliding frame to explain unfolding events in a readily comprehensible manner. This vision has been adopted by leaders in the West, who see global forces arrayed on either side of a growing divide between democracies (the good guys) and autocracies (the bad guys). It is catnip to a growing industry of democracy scholars and activists whose business is to be concerned about the fate of democracy. Arguably, doom-saying enhances the importance and funding available to democracy scholars and activists, thereby serving their interests.
© The Author(s), 2024. Published by Cambridge University Press on behalf of American Political Science Association. This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Corresponding author: Daniel Weitzel is assistant professor at Colorado State University. He can be reached at daniel.weitzel@colostate.edu. John Gerring is professor of government at the University of Texas at Austin. He can be reached at jgerring@austin.utexas.edu.
Daniel Pemstein is professor of political science and public policy at North Dakota State University. He can be reached at daniel.pemstein@ndsu.edu. Svend-Erik Skaaning is professor of political science at Aarhus University. He can be reached at skaaning@ps.au.dk.
A more benign interpretation is that democracy indices have become more demanding in their standards. Expert coders, primed to find evidence of backsliding, critically examine facts on the ground, sensing a fundamental threat to democracy in every populist outcry. A high score for democracy thus is more difficult to achieve in 2020 than it was in 2000 because coders are more attentive to democratic deficits. Whichever mechanism(s) is at work, it is easy to envision backsliding as a shibboleth of the twenty-first century with particular resonance in the West, where all widely used democracy indices are headquartered and produced. The hypothetical result is a systematically biased coding of democracy during the past few decades, an apparent downturn that is the product of a collective miasma or changes in the way democracy is understood.
To illustrate this problem, Little and Meng (2023) provide an index of democracy based solely on observable features relevant to democracy and therefore resistant to errant subjective judgments. Because this index demonstrates little change in the past few decades, it seems to corroborate the hypothesis that backsliding is more illusion than reality. Yet, Little and Meng do not regard the index as an adequate annual measure of democracy, a point highlighted in this symposium, in which Knutsen et al. (2024) demonstrate that the index is prone to problems of conceptualization, indicator selection, aggregation, and coverage. This does not mean that Little and Meng are wrong about biases toward backsliding in current democracy indices; however, it does suggest that there may be better ways to address the question.
This study adopts an approach that, although also based on observables, may be less susceptible to the problems identified by Knutsen et al. (2024). Briefly, we train a random forest model to predict existing indices of democracy using only easily observable features of democracy and a sample limited to the years before the generally recognized onset of backsliding. We then apply the model to predict scores for the original indices across the entire period, comparing those scores to the scores recorded by each index. This approach suggests that backsliding is real, although it may begin later and may not be as pronounced as the trajectory registered in other democracy indices.
We first define our methodology, called observable-to-subjective score mapping (OSM) and based on Weitzel et al. (2023a), which includes further details. The second section presents the results of our analyses applied to three prominent democracy indices. The third section discusses various robustness tests. The fourth section considers the missing-data problem posed by measuring a latent concept with observables. The conclusion reflects on what can be learned from this exercise about purported democratic downturns in the twenty-first century.

METHODOLOGY
It is helpful that many features relevant to democracy are observable, or relatively so (observability being a matter of degree). However, it is no mean feat to measure these features in a comprehensive manner, to select which ones to include in an index, and to arrive at a method of aggregation that has credibility while preserving the nuance required to discern backsliding.
Nuance is an important consideration because most countries regarded as recent backsliders have not abolished elections, outlawed all opposition parties, or dissolved the legislature. Rather, incumbents have figured out clever ways to undermine the independence of institutions and tilt the electoral playing field in their favor. To capture backsliding, it is essential to capture these nuances. Binary indices such as democracy-dictatorship (Cheibub, Gandhi, and Vreeland 2010) and the Boix, Miller, and Rosato (2013) index are not sufficient.
Our approach enlists OSM, an approach to measurement adapted for situations in which both subjectively coded and directly observable ("objective") indicators of a concept are available (Weitzel et al. 2023a). We begin with extant indices of democracy based largely on subjective coding, with the assumption that these measurement instruments have some prima facie validity, or at least did have it before the era of alleged backsliding.
We focus on three of the most widely used non-dichotomous measures of democracy: the Polyarchy index from Varieties of Democracy (V-Dem) (Teorell et al. 2019); the Polity2 index from the Polity IV project (Marshall, Gurr, and Jaggers 2015); and the political rights and civil liberties indicators (combined into a single index by addition) from Freedom House (2021). To ensure comparability, we restricted the sample for the following exercise to a common set of 167 polities (see table SI 5 in the online appendix).
Each of the chosen indices forms a target, which we attempted to predict with a wide variety of observable indicators. It is important to be as inclusive as possible in the collection of these indicators to avoid arbitrary ("subjective") exclusions that might bias the results. As long as a feature was observable for a broad set of cases and potentially relevant to democracy, it was included in our canvass. A total of 26 indicators drawn from Weitzel et al. (2023a) was reenlisted for our study (see table SI 1.1 in the online appendix).
These 26 indicators were treated as predictors in a random forest model in which an existing index of democracy is the outcome to be explained. In this instance, the training set was restricted to the pre-backsliding period, when subjective coding was not affected by current expectations of backsliding.
It is an open question when the concept of backsliding, or democratic downturn, first took hold. A Google Ngram, drawing on the Google Books database, shows an uptick in references to "democratic backsliding" around 2010 (see figure SI 6.1 in the online appendix). To avoid any possible overlap, we restricted the training set to the years before 2000. This training period extends back to 1900 for Polyarchy and Polity2 and to 1972 for Freedom House, the first year of coding for that index (Weitzel et al. 2023b).
We regard this training set as free from the potential bias identified by Little and Meng (2023). Other biases may exist, but these features presumably remain constant through the twentieth and twenty-first centuries; therefore, they either do not affect observed trends over time or they are specific to pre-2000 periods.
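The training-window restriction can be illustrated as a simple filter on country-year records. This is a minimal sketch rather than the procedure used in the study: the record layout, the field names, and the helper `split_by_year` are hypothetical, and the toy values are invented. Only the cutoff (years before 2000) and the start years (1900 and 1972) come from the text.

```python
from typing import NamedTuple

class CountryYear(NamedTuple):
    country: str
    year: int
    index_score: float  # subjectively coded index value (invented toy number)

# From the text: training uses years before 2000. Start years are 1900
# (Polyarchy, Polity2) or 1972 (Freedom House).
CUTOFF_YEAR = 2000

def split_by_year(rows, start_year, cutoff_year=CUTOFF_YEAR):
    """Return (training rows, out-of-sample rows) for one index."""
    train = [r for r in rows if start_year <= r.year < cutoff_year]
    test = [r for r in rows if r.year >= cutoff_year]
    return train, test

rows = [
    CountryYear("A", 1971, 0.30),
    CountryYear("A", 1985, 0.42),
    CountryYear("A", 1999, 0.55),
    CountryYear("A", 2000, 0.57),
    CountryYear("A", 2022, 0.51),
]

# Freedom House-style window: training starts in 1972.
train_fh, test_fh = split_by_year(rows, start_year=1972)
print([r.year for r in train_fh])  # years used to fit the model
print([r.year for r in test_fh])   # years predicted out of sample
```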
The random forest model assigns weights to each of the 26 variables based on their predictive value. As shown in figure 1, the "importance scores" assigned to each variable differ marginally across the three democracy indices. However, the importance scores are highly correlated, much like the indices themselves.
Finally, we used the OSMs from the twentieth century to predict values for the twenty-first century (2000-2022).² That is, we used the pre-2000 period to learn how to translate the conceptualizations of each index into an aggregation of observable indicators. This translation, which links measures to concepts, was applied to the out-of-sample period (post-2000) using only observable indicators as input. This protocol purges the post-2000 predictions of any direct human influence, including zeitgeist-driven bias. If the democratic backsliding reported in subjective indices is due to coder expectations about backsliding or changing standards for democracy, rather than some reality "out there," these predictions should not show any decline.
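The logic of this protocol (fit on pre-2000 data, then predict later years from observables alone) can be sketched with a deliberately simplified stand-in. The code below is not a random forest and is not the study's implementation: it bags one-split regression trees ("stumps") using only the Python standard library, whereas a real random forest grows deeper trees and samples features at each split. The two observables, all data values, and the function names are hypothetical.

```python
import random
from statistics import mean

def fit_stump(X, y):
    """Fit a one-split regression tree by minimizing squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            ml, mr = mean(left), mean(right)
            sse = (sum((v - ml) ** 2 for v in left)
                   + sum((v - mr) ** 2 for v in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    if best is None:  # no valid split: predict the sample mean everywhere
        m = mean(y)
        return (0, float("inf"), m, m)
    return best[1:]

def fit_bagged_stumps(X, y, n_trees=50, seed=0):
    """Fit stumps on bootstrap resamples of the training rows."""
    rng = random.Random(seed)
    n = len(X)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps

def predict(stumps, row):
    """Average the stump predictions for one set of observables."""
    return mean(ml if row[j] <= t else mr for j, t, ml, mr in stumps)

# Hypothetical observables: [multiparty_elections, executive_turnover].
X_pre2000 = [[0, 0], [0, 0], [1, 0], [1, 0], [1, 1], [1, 1]]
y_pre2000 = [0.1, 0.1, 0.4, 0.4, 0.9, 0.9]  # coded index, toy values

model = fit_bagged_stumps(X_pre2000, y_pre2000)

# Post-2000 predictions use observables only; coded scores are never read.
observables_post2000 = {2005: [1, 1], 2020: [1, 1]}
osm = {yr: predict(model, row) for yr, row in observables_post2000.items()}
print(osm)
```

Because the post-2000 predictions depend only on the observables, two years with identical observables receive identical predictions, whatever the coders recorded for those years. In this sketch, that is what it means to say the protocol removes direct human influence from the out-of-sample predictions.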

RESULTS
We first discuss the Polyarchy index. Panel (a) in figure 2 plots the original index and OSM predictions across the entire period of observation, averaging across all 167 countries in our sample (equally weighted).
OSM predictions closely track Polyarchy, with a small divergence at the very end of the period, around 2015, when they increase slightly above Polyarchy. Numeric values, recorded in table 1, show that differences across the two time-series are minute. For example, between 2001 and 2022, the largest difference between Polyarchy and OSM predictions of Polyarchy is 0.048 on a 0-to-1 scale. Only a few points at the very end of the time-series fall outside of the 95% confidence interval of the mean for the OSM prediction (see the shaded region in figure 2).³
Evidence of a downturn in global democracy is found in Polyarchy (beginning in 2013) and OSM estimates of Polyarchy (beginning in 2018). However, we again emphasize the minuscule nature of these changes, especially for the OSM predictions, which do not surpass the confidence interval of prior point estimates.
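The equally weighted global mean and an interval around it can be computed as in the sketch below. The passage does not spell out how the 95% confidence interval is constructed, so the normal approximation used here (mean plus or minus 1.96 standard errors) is an assumption; the function name and the toy scores are also invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def mean_with_ci(values, z=1.96):
    """Equally weighted mean with a normal-approximation 95% interval.

    Assumption: interval = mean +/- z * (sample sd / sqrt(n)).
    """
    m = mean(values)
    half_width = z * stdev(values) / sqrt(len(values))
    return m, m - half_width, m + half_width

# Toy OSM predictions, one value per country per year (invented numbers).
scores_by_year = {
    2000: [0.2, 0.4, 0.6, 0.8],
    2022: [0.1, 0.4, 0.6, 0.7],
}

for year, scores in sorted(scores_by_year.items()):
    m, lower, upper = mean_with_ci(scores)
    print(year, round(m, 3), round(lower, 3), round(upper, 3))
```

Under this construction, a yearly mean of the original index counts as diverging when it falls outside the interval computed for the OSM predictions in the same year.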
Leaving global averages, we also can observe how particular cases performed during the backsliding period. Panel (b) in figure 2 focuses on changes registered for specific countries from 2000 to 2022. The Y axis shows the change in Polyarchy scores. A score higher than zero means that the country's democracy score improved; a score less than zero means that it deteriorated. The X axis records the same information for OSM predictions.

Figure 1
Variable Importance Plot for Three Democracy Indices

Most countries are near the zero point, as shown by the density curves overlaid along the X and Y axes. Those countries that change scores are situated mostly along the diagonal, which demonstrates agreement between Polyarchy and OSM predictions. Several countries fall significantly below the diagonal, indicating that the OSM has a more optimistic view of their trajectory than Polyarchy. This includes Albania, Egypt, Fiji, Hungary, India, Malaysia, Pakistan, the Philippines, Thailand, and Turkey. These cases presumably account for the small divergence between the two indicators that is visible at the end of the time-series in panel (a) of figure 2 and in table 1. These case-rating divergences may reflect excess pessimism in recent Polyarchy ratings but also may reflect the tendency of changes in observable indicators of democracy to miss difficult-to-observe features available to experts.
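The comparison in panel (b) amounts to computing two change scores per country and checking which side of the diagonal each point falls on. The sketch below uses three hypothetical countries with invented values; the 0.05 tolerance for "significantly below the diagonal" is an arbitrary illustration, not a threshold taken from the study.

```python
def change_scores(start, end):
    """Per-country change between two years (end minus start)."""
    return {c: end[c] - start[c] for c in start if c in end}

# Invented toy values for three hypothetical countries.
index_2000 = {"A": 0.60, "B": 0.50, "C": 0.30}
index_2022 = {"A": 0.40, "B": 0.55, "C": 0.30}
osm_2000 = {"A": 0.58, "B": 0.50, "C": 0.32}
osm_2022 = {"A": 0.55, "B": 0.54, "C": 0.31}

index_change = change_scores(index_2000, index_2022)  # Y axis
osm_change = change_scores(osm_2000, osm_2022)        # X axis

TOLERANCE = 0.05  # hypothetical cutoff for a notable gap

# Below the diagonal: the index change is lower than the OSM change,
# so the OSM is the more optimistic of the two.
below_diagonal = sorted(
    c for c in index_change
    if index_change[c] < osm_change[c] - TOLERANCE
)
print(below_diagonal)  # ['A']
```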
Figure 3 repeats this exercise for Polity2. Panel (a) reveals an even closer alignment between the original index and OSM predictions than was observed for Polyarchy. Both curves show minor evidence of backsliding, beginning in 2016 for Polity2 and in 2017 for OSM predictions, as shown in table 1.
Panel (b) in figure 3 displays change scores from 2000 to 2018 for all 167 countries as assigned by Polity2 and OSM estimates. Again, most points are close to the center. In a few instances, the OSM provides different assessments. Polity2 is more pessimistic than the OSM about the twenty-first-century trajectories of Comoros, Democratic Republic of the Congo, Fiji, and Iran, for example.
Figure 4 completes the exercise, focusing on the Freedom House index. Panel (a) shows that Freedom House registers a fairly sharp downturn beginning in 2006. Meanwhile, the OSM predictions continue ascending through 2017, after which there is a modest downturn. Unlike for Polyarchy and Polity2, out-of-sample OSM predictions for Freedom House diverge dramatically, with recent observations falling well outside of the confidence intervals.
These divergences also are notable in panel (b) of figure 4, in which the OSM has a decidedly more optimistic view of regime changes in Turkey, the Republic of Congo, and Burundi and a more pessimistic view of developments in Costa Rica and Namibia.
We might regard the divergence between Freedom House and the OSM model as a failing of our modeling approach, especially because two features of the Freedom House index appear to complicate the task of making out-of-sample predictions beyond the observed time-series. First, the index is sluggish, registering few changes through time relative to Polyarchy and Polity2 (see table SI 10.3 in the online appendix). Second, because the Freedom House index begins in 1972, we do not have an extended sample on which to train the random forest model.
However, when the same sample restriction is imposed on Polyarchy and Polity2, we observe only a modest attenuation in alignment; therefore, the shortness of the sample cannot be the entire explanation. Moreover, results displayed in panel (a) of figure 4 show that the OSM is quite proficient in predicting the first several years of Freedom House, out of sample. Large differences appear only after 2005.
One explanation for the divergence between Freedom House and the OSM is that coding principles changed around the 2006 edition of Freedom House, leading to a fundamentally different data-generating process that the OSM model could not (and, by design, should not) replicate. Notably, 2006 is the first year that Freedom House publicly released subcategory scores for its extensive questionnaire. At the same time, the number of coders (i.e., analysts) increased steeply (from 14 to 23), after which their number continued to grow, reaching a total of 128 for the 2023 report. Finally, the 2006 Freedom House edition introduced a rewording of several survey questions and the coding guidelines.

Figure 3
Polity2
Panel (a): Polity2 (dark blue) and OSM predictions for Polity2 (orange) flanked by 95% confidence intervals from 1900 to 2018 (Polity2) and 2022 (OSM). Panel (b): the change in Polity2 scores (Y axis) against the change in OSM predictions (X axis) from 2000 to 2018, for which a positive value indicates an improved democracy score.
Comment and Controversy: Special Issue on Democratic Backsliding
Global Means of Democracy Indices and OSM Predictions