Detecting anomalies in data on government violence

Abstract Can data on government coercion and violence be trusted when the data are generated by the state itself? In this paper, we investigate the extent to which data from the California Department of Corrections and Rehabilitation (CDCR) regarding the use of force by corrections officers against prison inmates between 2008 and 2017 conform to Benford's Law. Following a growing data forensics literature, we expect misreporting of the use of force in California state prisons to cause the observed data to deviate from Benford's distribution. Statistical hypothesis tests and further investigation of CDCR data, which show both temporal and cross-sectional variance in conformity with Benford's Law, are consistent with misreporting of the use of force by the CDCR. Our results suggest that data on government coercion generated by the state should be inspected carefully before being used to test hypotheses or make policy.

1938), applying it to monthly count data from the California Department of Corrections and Rehabilitation (CDCR) on officers' uses of force against inmates under its jurisdiction in 37 CDCR institutions from 2008 to 2017. Specifically, we compare the distribution of the numerals that occupy the first digit of these counts to the distribution that Benford's Law predicts. Our results point to persistent irregularities. We find agreement among four tests of conformity (the χ² goodness-of-fit test and three alternatives that are less influenced by sample size) that the data do not conform to Benford's Law. Statistical hypothesis tests and further investigation of the data show temporal and cross-sectional variance in conformity with Benford's Law and are consistent with misreporting of the use of force by the CDCR. Importantly, these findings hold when accounting for the multiple comparisons problem (Benjamini and Hochberg, 1995) and when using techniques informed by Bayesian analysis (Sellke et al., 2001; Pericchi and Torres, 2011). Our results suggest that data on government coercion generated by the state should be inspected carefully before being used to test hypotheses or make policy.

Testing Benford's Law on CDCR uses of force
Benford's Law theorizes that the numerals occupying the first digit in a number, including counts of CDCR uses of force, are distributed P(d) = log10(1 + 1/d) for all d ∈ {1, ..., 9}.1 A wide array of phenomena follow the distribution: stock market prices (Ley, 1996), winning bids in auctions (Giles, 2007), and the populations of US counties (Hill, 1995, p. 355).2 Given widespread adherence to Benford's distribution, deviations from it are often understood as evidence of irregular or fraudulent data (Varian, 1972). The intuition behind forensics investigations is that when humans manipulate or falsify data, they are typically unable to do so in a way that adheres to the Benford distribution, because of ignorance of the law's existence, psychological biases, strategic considerations, and inconsistent record keeping. Thus, violations of Benford's Law are understood as indicating human interference in the collection of data. Although legitimate data generating processes can naturally produce some deviations from Benford's Law,3 deviations indicate, at a minimum, that data should be inspected closely before being used for hypothesis testing.
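As a concrete illustration, the expected proportions are easy to compute directly from the formula above. The short Python sketch below is our own; the helper names (`benford_expected`, `first_digit`) are illustrative, not taken from any CDCR material.

```python
import math

# Expected first-digit proportions under Benford's Law: P(d) = log10(1 + 1/d).
def benford_expected():
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Leading numeral of a positive count (e.g., a monthly use-of-force total).
def first_digit(count):
    return int(str(count)[0])

probs = benford_expected()
# The numeral 1 should lead roughly 30.1% of counts; 9, only about 4.6%.
print(round(probs[1], 3), round(probs[9], 3))  # 0.301 0.046
```

Note that the nine expected proportions sum to one, so an excess of observations at one digit must be offset by a deficit elsewhere.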
Since 2008, the 35 institutions overseen by the California Department of Corrections and Rehabilitation have been mandated to collect and publish use-of-force data. Pursuant to California Code of Regulations Title 15, §3268.1 (California Code of Regulations, Title 15, 3268.1, 1999), "any employee who uses force or observes a staff use of force" is mandated to report details of the incident. Published monthly by the CDCR's Office of Research, each report includes counts of the use of force in each institution, disaggregated by type of force across ten categories. Force can be applied against a single inmate or a group of inmates, and more than one type of force can be used per incident.4

We test statistical hypotheses about the extent to which CDCR data conform to Benford's Law and probe subsamples of our results to shed light on practical hypotheses about the extent to which CDCR data are fraudulent.5

We use four statistical tests to evaluate the statistical conformity of CDCR use-of-force data with Benford's Law. The first is the χ² goodness-of-fit test, the most commonly used test in existing scholarship. The test statistic is calculated

χ² = Σ_{i=1}^{9} (O_i − E_i)² / E_i, (1)

where O_i is the observed frequency of digit i, and E_i is the frequency expected by the Benford distribution. The statistic is compared to critical values in standard χ² tables at 8 degrees of freedom, with a 95 percent critical value of 15.5 and a 99 percent critical value of 20.1. A drawback of the χ² test is that its considerable power in large samples means that even slight deviations or just a handful of outliers can result in large test statistics and correspondingly small p-values (Cho and Gaines, 2007). Accordingly, our second test draws from Cho and Gaines (2007) and calculates the Euclidean distance of the observed data from the Benford distribution:

d = ( Σ_{i=1}^{9} (p_i − b_i)² )^{1/2}. (2)

In Equation 2, p_i is the proportion of observations with numeral i in the first digit, and b_i is the proportion expected by Benford's Law. As Cho and Gaines (2007) note, d is not derived from the hypothesis testing framework, so there are no critical values to compare it against. Instead, they recommend calculating d* = d / 1.036, where 1.036 is the maximum possible Euclidean distance between observed data and the Benford distribution, that is, the value that occurs when 9 is the only numeral in the first digit. Thus, d* ranges between 0 and 1, where larger values indicate greater deviation.

Our third test, from Morrow (2014), rescales the Euclidean distance by the sample size: d*_N = √N × d. The advantage of d*_N over d* is that it has critical values that allow for hypothesis testing. Specifically, Morrow (2014, pp. 4-5) shows that 1.33 is the critical value to reject the null hypothesis that the data are distributed per Benford's Law with 95 percent confidence, and 1.57 is sufficient to do so at the 99 percent confidence level.

2 For explanations of Benford's Law, please see Miller (2015).
3 Mebane (2013) shows that strategic voting can make unproblematic election returns data violate Benford's Law.
4 For descriptives, please refer to our Supplemental Appendix.
5 For more on statistical and practical hypotheses, see Cleary and Thibodeau (2015, p. 203).
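A minimal Python sketch of these first three statistics may help fix ideas. The `counts` argument stands in for a hypothetical vector of monthly use-of-force totals (it is our own illustration, not CDCR data), and we take d*_N to scale the Euclidean distance by the square root of the sample size, consistent with the Morrow (2014) critical values quoted above.

```python
import math
from collections import Counter

BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def conformity_stats(counts):
    """Chi-squared, d* (Cho and Gaines), and d*_N (Morrow) for positive counts."""
    n = len(counts)
    observed = Counter(int(str(c)[0]) for c in counts)  # first-digit frequencies
    # Pearson chi-squared against Benford expectations (8 degrees of freedom).
    chi2 = sum((observed[d] - n * BENFORD[d]) ** 2 / (n * BENFORD[d])
               for d in range(1, 10))
    # Euclidean distance between observed and expected digit proportions.
    dist = math.sqrt(sum((observed[d] / n - BENFORD[d]) ** 2
                         for d in range(1, 10)))
    d_star = dist / 1.036               # bounded on [0, 1]
    d_star_n = math.sqrt(n) * dist      # critical values: 1.33 (95%), 1.57 (99%)
    return chi2, d_star, d_star_n
```

As a sanity check, a sample in which every count begins with 9 yields d* of roughly 1, the maximum, and all three statistics far beyond their critical values.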
Our fourth test is the simplest. If a set of numbers follows Benford's Law, then the numerals occupying the first digits will have a mean value of 3.441. Hicken and Mebane (2015) recommend calculating the mean from the observed data and using nonparametric bootstrapping to assess whether the 95 percent confidence interval contains that value.
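A sketch of this fourth test, using only the Python standard library; the simulated digits are our own illustration (drawn from the Benford distribution itself), not CDCR values.

```python
import math
import random

BENFORD_MEAN = 3.441  # expected mean of the first digit under Benford's Law

def bootstrap_mean_ci(first_digits, reps=2000, seed=0):
    """Percentile-bootstrap 95% confidence interval for the mean first digit."""
    rng = random.Random(seed)
    n = len(first_digits)
    means = sorted(sum(rng.choices(first_digits, k=n)) / n
                   for _ in range(reps))
    return means[int(0.025 * reps)], means[int(0.975 * reps) - 1]

# Illustrative digits sampled from the Benford distribution itself:
weights = [math.log10(1 + 1 / d) for d in range(1, 10)]
digits = random.Random(1).choices(range(1, 10), weights=weights, k=500)
low, high = bootstrap_mean_ci(digits)
# With conforming data, the interval will typically contain 3.441.
```

The interval always brackets the sample mean; the test asks whether it also brackets the Benford value of 3.441.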

Empirical results
In testing whether the CDCR data conform to Benford's Law, we limit our analyses to evaluating the aggregated total uses of force, because several of the force types have counts that are too small for reliable inferences about their digit distributions. The dashed line in Figure 1 plots the proportions with which each numeral appears in the first digit of the total data, across all 35 institutions for the period from 2008 to 2017. The solid line plots the proportions expected by Benford's Law. It is evident that the observed data exhibit some inconsistencies with the expected distribution.
Aggregating CDCR data across all institutions over the entire decade conceals important nuance; further interrogation of the data is warranted to determine potential explanations for these anomalies. We rely on theoretical expectations about subsets of the data to further interrogate our results (Cleary and Thibodeau, 2015, p. 212; Rauch et al., 2015, p. 262). In addition to the aggregated data being inconsistent with Benford's Law, we might also expect the data to exhibit particular temporal and cross-sectional patterns if the CDCR consistently manipulated its use-of-force data.
Consider the effect of changes in California's legal or political context on the behavior of prison guards. During the time period covered by our data, two major policy interventions were implemented that reformed when corrections officers could use force and how uses of force were to be reported. In August 2010, the CDCR filed notice that it had adopted and implemented statewide required use-of-force reforms associated with a 1995 US District Court ruling that correctional staff at Pelican Bay State Prison routinely used unusual and excessive force against inmates and were negligent in reporting it (Madrid v. Gomez, 889 F. Supp. 1146 (N.D. Cal. 1995)). A second set of policy reforms went into effect in 2016, when the state further clarified staff reporting responsibilities, updated official reporting procedures, and revised the definitions of acceptable use-of-force options in the CDCR Department of Operations Manual.
If these reforms had the intended effects of clarifying the conditions under which force should be used and of improving how uses of force are reported, we might expect the data to conform more closely to Benford's Law at two discrete moments. The first is in 2010, when the CDCR implemented use-of-force reforms. If these reforms worked, data from 2010 onward might adhere more closely than the data from 2008 and 2009. The second moment is in 2016, when the second set of reforms came into effect. These additional reforms might further improve the data's conformity to the Benford distribution, even relative to the 2010 reforms.

Figure 2 graphs the observed and expected distributions by year and shows significant temporal variation. Compared to 2008 and 2009, data from 2010 onward appear to conform more closely with the Benford distribution; results from 2016 and 2017 appear to conform still more closely to Benford's Law.

Table 1 shows tests of conformity to Benford's Law by year. The first row in the second column reports the χ² test statistic when the data are aggregated across all years and all institution types. The test statistic is very large, allowing us to reject the null hypothesis that the observed data follow Benford's distribution. The Cho and Gaines (2007) Euclidean distance proportion (d*) is more conservative than the χ² test, but it too indicates nonconformity. Like the χ², Morrow's Euclidean distance test (d*_N) yields a large test statistic that rejects the null at the 99 percent level of confidence. The last two columns show similar results: if the observed data were consistent with Benford's Law, the mean of the first digit would equal 3.441; in these data, the observed mean is 2.96, with a 95 percent confidence interval that does not include the Benford value. Table 1 also allows assessment of how conformity changes over time. First, though the χ² test statistics are very large early in the period, they decline sharply toward the end of it.
According to the χ² test, in 2014, 2016, and 2017, we cannot reject the null hypothesis that the data conform to Benford's Law. The two Euclidean distance tests and the digit mean test also identify 2016 and 2017 as years where the data deviate little from Benford's expectations. A second feature of Table 1 is the notable improvement in conformity that occurred in 2010. Compared to 2009, the χ² statistic in 2010 is about half the size, while the two distance tests also yield considerably smaller values. These results suggest that the 2010 reforms may have had some effect in improving how force is reported. Conformity improves even more in 2016 and 2017, which coincides with the second set of reforms implemented in 2016.

In addition to temporal variance, we investigate nonconformance with Benford's Law across institution type as indicative of purposeful misreporting of CDCR use-of-force data.6 The CDCR divides its institutions into four types. New inmates arrive in the CDCR system at reception centers, where their placement needs are recorded and they are assigned to one of 34 institutions. Inmates can remain in reception centers for up to 120 days; while in reception, inmates are assigned classification scores that facilitate placement at an appropriate institution. High security institutions house the CDCR's violent male offenders; general population facilities house minimum- to medium-custody male inmates while providing them with opportunities to participate in vocational and academic programs. Female institutions house women offenders across classification scores. Across institution types, we might expect heterogeneity in guards' incentives to misreport (or fail to report) violence.

Figure 3 graphs the digit proportions for each institution type and plots them against the Benford distribution. The most glaring result is the poor fit of high security institutions, where digits 1-4 are almost uniformly distributed.
General population and female institutions also fail to conform to Benford's Law; reception institutions adhere more closely to expectations.

Note: Cells display test statistics in the χ² and d*_N columns, proportions in the d* column, and the first-digit mean and bootstrapped confidence interval in the Digit Mean Test columns.
We apply the four statistical tests described above to determine whether there is heterogeneity with regard to institution type. In every case, across every test and every institution type, our tests show nonconformity with the expectations of Benford's Law. In the second column, which shows the results of the χ² test, and the fourth column, which shows the results of Morrow's Euclidean distance test (d*_N), we are universally able to reject the null hypothesis that the observed data follow Benford's distribution. The Cho and Gaines (2007) Euclidean distance proportion (d*) and the test of the mean of the first digit similarly indicate reason for concern about potential irregularities in CDCR-reported data across institution types. Although the data do not conform to expectations for any institution type, we see the lowest conformance with Benford's Law in high security male prisons, presumably locations where guards face high incentives to engage in violence and to misreport their behavior. We see the highest conformance with Benford's Law in reception centers, locations where inmates are held for up to 120 days before being assigned to more permanent locations.7
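Because many subsamples are tested (years and institution types), the Benjamini-Hochberg correction mentioned in the introduction controls the false discovery rate across the family of tests. A minimal sketch of the procedure, with hypothetical p-values standing in for the subsample results:

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Boolean rejection decisions after Benjamini-Hochberg FDR control."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k with p_(k) <= (k/m) * alpha ...
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * alpha:
            cutoff = rank
    # ... and reject the hypotheses with the `cutoff` smallest p-values.
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        reject[idx] = rank <= cutoff
    return reject

# Hypothetical chi-squared p-values for six subsamples:
print(benjamini_hochberg([0.001, 0.002, 0.008, 0.04, 0.20, 0.64]))
# -> [True, True, True, False, False, False]
```

Note that 0.04 survives a naive per-test threshold of 0.05 but not the stepped-up threshold, which is the point of the correction.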

Conclusion
In 2018, Dr. Michael Golding, chief psychiatrist for the CDCR, alleged that prison officials "played around with how they counted things and they played around with time frames" in their reports of information regarding inmate medical and psychiatric care (Stanton, 2018). This sort of "playing around" with counts and timing, if an endemic problem across CDCR institutions and reporting areas, could account for the violations of Benford's Law we have observed. Our results suggest that there are irregularities in the CDCR's reporting of use-of-force data. The irregularities are less persistent following reforms intended to limit violence and streamline the reporting of violence, and they are most persistent in high security male prisons. What does this variance tell us about the likely origin of the misreporting of CDCR data?
From an organizational behavior perspective, our analyses highlight systematic differences over time and space in how officials seem to understand the expected risks, costs, and benefits of misreporting. Prison officials, both guards and their superiors, face incentives to misreport instances of the use of force (Seidman and Couzens, 1974; Maltz, 2007). Misreporting occurs for several reasons: because interactions with inmates are private, because what constitutes force is sometimes unclear, and because there is inadequate infrastructure to catch violators. That conformance with Benford's Law increased in 2010 and 2016 suggests that the reforms worked as intended: they made clearer what constitutes a violation and institutionalized a higher standard of reporting by increasing oversight (e.g., Cook and Fortunato, 2019). That reported data get "better" with oversight suggests purposive CDCR misreporting, although the data do not allow us to determine whether it originates with guards' reports or with those compiling aggregate prison data.
The contributions of this work are not limited to scholars of criminology and criminal justice. Social scientists across a variety of substantive fields are often forced to rely on state-supplied data, yet few explore the content of those data for signs of manipulation by the actors responsible for their generation and release. Although scholars have adduced that governments, particularly nondemocratic ones, release data on state-citizen interactions strategically (Hollyer et al., 2018), the content of such data is rarely scrutinized directly. How can researchers be confident of the veracity of their data? What constitutes proper use of data pertaining to coercion that is obtained from the coercive organization itself? The forensics approach employed here can serve as a template for researchers to engage these important questions more deeply and directly.