Synergies and Safeguards

doi:10.1017/9781009334297.013

9 Law and Empathy in the Automated State

Cary Coglianese

9.1 Introduction

Because the future knows no bounds, the future of administrative law is vast. In the near term, administrative law in the United States will undoubtedly center around how the US Supreme Court decides cases raising core administrative law issues such as the nondelegation doctrine and judicial deference to agencies’ statutory interpretation. But over the longer term, new issues will confront the field of administrative law as new changes occur in government and in society. One major change on the horizon will be an increasingly automated administrative state in which many governmental tasks will be carried out by digital systems, especially those powered by AI and ADM tools.

Administrative agencies today undertake a range of activities – granting licenses, issuing payments, adjudicating claims, and setting rules – each of which traditionally has been executed by government officials. But it is neither difficult nor unrealistic to imagine a future in which members of the public, when they interact with the government, increasingly find themselves interacting predominantly with digital systems rather than human officials. Even today, the traditional administrative tasks for which human beings have long been responsible are increasingly augmented by computer systems. Few people today think twice about using government websites to apply for unemployment benefits, register complaints, or file paperwork, rather than visiting or making phone calls to government offices. The federal government in the United States has even created an online portal – USA.gov – that provides its users with easy access to the panoply of resources and digital application processes now available to the public via an extensive network of state and federal government websites.

The transition to this online interaction with government over the last quarter-century foreshadows what will likely be a deeper and wider technological transformation of governmental processes over the next quarter-century. Moving beyond the digitization of front-end communication with government, the future will likely feature the more extensive automation of back-end decision-making, which today still often remains firmly in the discretion of human officials. But we are perhaps only a few decades away from an administrative state that will operate on the basis of automated systems built with ADM and AI tools, much like important aspects of the private sector increasingly will. This will lead to an administrative state characterized by what I have elsewhere called algorithmic adjudication and robotic rulemaking.Footnote ¹ Instead of having human officials make discretionary decisions, such as judgments about whether individual claimants qualify for disability benefits, agencies will be able to rely on automated systems to make these decisions. Claims-processing systems could be designed, for example, to import automatically a vast array of data from electronic medical records and then use an AI system to process these data and determine whether claimants meet a specified probability threshold to qualify for benefits.Footnote ²

If many of the tasks that government currently completes through decision-making by human officials come to be performed entirely by ADM tools and computer systems, how will administrative law respond to this transformation to an automated state? How should it?

Most existing administrative law principles can already accommodate the widespread adoption of automation throughout the administrative state. Not only have agencies already long relied on a variety of physical machines that exhibit automaticity, but an automated state – or at least a responsible automated state – could be thought of as the culmination of administrative law’s basic vision of government that relies on neutral public administration of legislatively delegated authority. Administrative law will not need to be transformed entirely to operate in an era of increasing automation because that automation, when responsibly implemented, will advance the democratic principles and good governance values that have provided the foundation for administrative law.

Nevertheless, even within an otherwise responsible automated state, an important ingredient of good governance could increasingly turn out to be missing: human empathy. Even bureaucracies comprising human officials can be cold and sterile, but an era of extreme automation could present a state of crisis in human care – or, more precisely, a crisis in the lack of such care. In an increasingly automated state, administrative law will need to find ways to encourage agencies to ensure that members of the public will continue to have opportunities to engage with humans, express their voices, and receive acknowledgment of their predicaments. The automated state will, in short, also need to be an empathic state.

9.2 Implementation of the Automated State

The information technology revolution that launched several decades ago shows few signs of abating. Technologists today are both revealing and reaching new frontiers with the use of advanced AI technologies, also referred to as machine learning or predictive analytics. These terms – sometimes used interchangeably – encompass a broad range of tools that permit the rapid processing of large volumes of data that can yield highly accurate forecasts and thereby facilitate the automation of many distinct tasks. In the private sector, AI innovations are allowing the automation of a wide range of functions previously handled by trained humans, such as the reading of chest X-rays, the operation of automobiles, and the granting of loans by financial institutions.

Public administrators have taken notice of these AI advances in the private sector. Some advances in the business world even have direct parallels to governmental tasks. Companies such as eBay and PayPal, for example, have developed their own highly successful automated online dispute resolution tools to resolve complaints without the direct involvement of human employees.Footnote ³ Overall, government officials see in modern data analytics the possibility of building systems that could automate a variety of governmental tasks, all with the potential to deliver increased administrative efficiency, speed, consistency, and accuracy.

The vision of an automated administrative state might best be exemplified today by developments in the Republic of Estonia, a small Baltic country that has thoroughly embraced digital government as a mark of distinction. The country’s e-Estonia project has transformed the nation’s administration by digitizing and securely storing vast amounts of information about individuals, from their medical records to their employment information to their financial statements.Footnote ⁴ That information is cross-linked through a digital infrastructure called X-Road, so that a person’s records can be accessed instantly by any entity that needs them, subject to limits intended to prevent wrongdoing. This widespread digitization has facilitated the automation of a range of government services: Individuals can easily vote, apply for a loan, file their taxes, and complete other administrative tasks without ever needing to interact with a human official, simply by transferring their digital information to complete forms and submit requests. By automating many of its bureaucratic processes, Estonia has saved an estimated 2 percent of its GDP each year. The country is even exploring the use of an automated “judge” to resolve small claims disputes.Footnote ⁵

Other countries such as Denmark and South Korea are also leading the world in the adoption of so-called e-government tools.Footnote ⁶ The United States may not yet have achieved quite the same level of implementation of automated government, but it is certainly not far behind. Federal, state, and local agencies throughout the United States have not only embraced web-based applications – such as those compiled on the USA.gov website – but have also begun to deploy AI tools to automate a range of administrative decision-making processes. In most of these cases, human officials remain involved to some extent, but a significant amount of administrative work in the United States is increasingly conducted through digital systems.

Automation helps federal, state, and local governments navigate challenging resource-allocation decisions in the management of public programs. Several state governments in the United States have implemented AI and ADM tools to help make decisions about the award of Medicaid and other social benefits, seeking to speed up and improve the consistency of claims processing.Footnote ⁷ Similarly, the federal Social Security Administration uses automated tools to help support human appeals judges’ efforts to provide quality oversight of an agency adjudicatory process that handles as many as 2.5 million disability benefits claims each year.Footnote ⁸

Municipalities rely on automated systems when deciding where to send health and building inspectors.Footnote ⁹ Some local authorities use such systems when making choices about where and when to deploy social workers to follow up on allegations of child abuse and neglect.Footnote ¹⁰ Federal agencies, meanwhile, have used AI and ADM systems to analyze consumer complaints, process reports of workplace injuries, and evaluate public comments on proposed rules.Footnote ¹¹

Criminal law enforcement agencies throughout the United States also rely on various automated tools. They have embraced tools that automate deployment of officer patrols based on predictions of locations in cities where crime is most likely to occur.Footnote ¹² Many law enforcement agencies have also widely used automated facial recognition tools for suspect identification or security screenings.Footnote ¹³

Regulatory agencies similarly have deployed automated tools for targeting auditing and enforcement resources. States have employed data analytics to detect fraud and errors in their unemployment insurance programs.Footnote ¹⁴ The federal Securities and Exchange Commission and the Internal Revenue Service have adopted AI tools to help detect fraudulent behavior and other wrongdoing.Footnote ¹⁵

In these and other ways, public authorities across the United States have already made considerable strides toward an increasingly automated state. Over the next several decades, governmental use of automation driven by AI tools will surely spread still further and is likely to lead to the transformation or phasing out of many jobs currently performed by government employees.Footnote ¹⁶ The future state that administrative law will govern will be one of increasingly automated administration.

9.3 US Administrative Law and the Automated State

Can administrative law accommodate an automated state? At first glance, the prospect of an automated state might seem to demand a fundamental rewriting of administrative law. After all, administrative law developed to constrain the discretion of human officials, to keep their work within the bounds of the law, and to prevent the kinds of principal-agent problems that can arise in the relationships between human decision-makers. Moreover, one of administrative law’s primary tenets – that governmental processes should be transparent and susceptible to reason-giving – would seem to stand as a barrier to the deployment of the very AI tools that are driving the emerging trends in automation.Footnote ¹⁷ That is because the algorithms that commonly drive AI and ADM tools – sometimes referred to as “black box” algorithms – have properties that can make them opaque and hard to explain. Unlike traditional statistical algorithms, in which variables are selected by humans and resulting coefficients can be pointed to as explaining specified amounts of variation in a dependent variable, the algorithms that drive AI systems effectively discover their own patterns in the data and do not generate results that attribute explanatory power to specific variables. Data scientists can certainly understand and explain the goals and general properties of these “machine learning” algorithms, but overall these algorithms have a degree of autonomy – hence their “learning” moniker – that can make it more difficult to explain precisely why they reach any specific forecast that they do. They do not usually provide any basis for the kind of causal statements often used to justify administrative decisions (such as “X is justified because it causes Y”).

As a result, transparency concerns are reasonable when considering a future of an automated state based on AI systems. But on even a modest degree of additional reflection, these concerns would appear neither to act as any intrinsic barrier in the United States to the reliance on AI automation nor necessarily to demand any fundamental transformation of US administrative law to accommodate an automated state. Administrative law has never demanded anything close to absolute transparency nor required meticulous or exhaustively detailed reasoning, even under the arbitrary and capricious standard of Section 706 of the Administrative Procedure Act.Footnote ¹⁸ Administrative agencies that rely on AI systems should be able to satisfy any reason-giving obligations under existing legal principles by explaining in general terms how the algorithm underlying the AI system was designed to work and demonstrating that it has been validated to work as designed by comparing its results to those generated by the status quo process. An adequate explanation could involve merely describing the type of AI algorithm used, disclosing the objective it was established to meet, and showing how the algorithm processed a certain type of data to produce results that were shown to meet its defined objective as well as or better than current processes.

Such an explanation would, in effect, mirror the kinds of explanations that administrators currently offer when they rely on physical rather than digital machines. For example, in justifying the imposition of an administrative penalty on a food processor for failing to store perishable food at a cool temperature, an administrator need not be able to explain exactly how a thermometer works, just that it reports temperatures accurately. Courts have long treated instrument validation for physical machines as a sufficient basis for agency actions grounded on such instruments. Moreover, they have typically deferred to administrators’ expertise in cases in which government officials have relied on complex instruments or mathematical analyses. In fact, the US Supreme Court in Baltimore Gas & Electric Co. v Natural Resources Defense Council called upon courts to be at their “most deferential” when an administrative agency is “making predictions, within its area of special expertise, at the frontiers of science.”Footnote ¹⁹ More recently, the Supreme Court noted in Marsh v Oregon Natural Resources Council that whenever an agency decision “requires a high degree of technical expertise,” courts must defer to “the informed discretion of the responsible agencies.”Footnote ²⁰ Lower courts have followed these instructions and in various contexts have upheld agencies’ reliance on complex algorithms and statistical tools (even if not truly AI ones).

It is difficult to see the US Supreme Court gaining any more confidence in judges’ ability to provide independent technological assessments when technologies and statistical techniques grow still more complex in an era of AI. Unless the Court should gain a new source of such confidence and abandon the postures it took in Baltimore Gas & Electric and Marsh, nothing in administrative law’s reason-giving requirements would seem to serve as any insuperable barrier to administrative agencies’ more extensive reliance on systems based on AI tools, such as machine learning or other advanced predictive techniques, even if they are properly characterized today as black box models. That portrayal of AI tools as a black box also appears likely to grow less apt in the coming decades, as data scientists are currently working extensively to develop advanced techniques that can better explain the outputs such complex systems generate.Footnote ²¹ Advances in “explainable” AI techniques likely will only make automation even more compatible with long-standing administrative law values.

Of course, all of this is not to say that agencies will or should always receive deference for how they design or operate their systems. Under the standard articulated in Motor Vehicle Manufacturers Association v State Farm Insurance Co., agencies will still need to provide basic information about the purposes behind their automated systems and how they generally operate.Footnote ²² They will need to show that they have carefully considered key design options. And they will likely need to demonstrate through accepted auditing and validation efforts that these systems do operate to produce results as intended.Footnote ²³ But all this is to say that it will almost certainly be possible for agencies to provide the necessary information to justify the outcomes that their systems produce. In other words, long-standing administrative law principles seem ready and fit for an automated age.

9.4 AI and Good Governance in an Automated State

In important respects, a shift to automated administration could even be said to represent something of an apotheosis of the principles behind administrative law. Much of administrative law has been focused on the potential problems created by the discretion that human officials exercise under delegated authority. By automating administration, these problems can be mitigated, and the control of human discretion may be enhanced by the literal hardwiring of certain governmental tasks.Footnote ²⁴

Automation can advance two major themes that have long characterized much of US administrative law: One theme centers on keeping the exercise of administrative authority democratically accountable, while the other seeks to ensure that such authority is based on sound expert judgment. The reason-giving thrust behind the Administrative Procedure Act’s arbitrary and capricious standard, for example, reflects both of these themes. Reasoned decision-making provides a basis for helping ensure that agencies both remain faithful to their democratic mandates and base their decisions on sound evidence and analysis. Likewise, the institutionalized regimen of White House review of prospective regulations both facilitates greater accountability to a democratically elected president and promotes expert agency decision-making through the benefit-cost analysis that it calls on agencies to conduct.Footnote ²⁵

In the same vein, in approving judicial deference to agencies’ statutory interpretations, it is hardly a coincidence that the US Supreme Court’s widely cited decision in Chevron v Natural Resources Defense Council stressed reasons of both democratic accountability and substantive expertise.Footnote ²⁶ It highlighted how agencies are situated within a “political branch of the Government” as well as how they simultaneously possess “great expertise” – and thus are better suited than courts to make judgments about the meaning of ambiguous statutory terms.Footnote ²⁷ Although the future of the Chevron doctrine itself appears uncertain at best, the Court’s underlying emphasis on accountability and expertise is unlikely to disappear, as both are inherent qualities of administrative governance.

Both qualities can be enhanced by AI and ADM. It is perhaps most obvious that automation can contribute to the goal of expert administration. When automated systems improve the accuracy of agency decision-making – which is what makes AI and other data analytic techniques look so promising – this will necessarily promote administrative law’s goal of enhancing agency expertise. AI promises to deliver the state of the art when it comes to expert governing. When the Veterans Administration (VA), for example, recently opted to rely on an AI system to predict which veterans were at a higher risk of suicide (and thus in need of more urgent care), it did so because this analytic system was smarter than even experienced psychiatrists.Footnote ²⁸ “The fact is, we cannot rely on trained medical experts to identify people who are truly at high risk [because they are] no good at it,” noted one VA psychiatrist.Footnote ²⁹

Likewise, when it comes to administrative law’s other main goal – democratic accountability – ADM systems can also advance the ball. The democratic advantages of automation may seem counterintuitive at first: Machine-based governance would hardly seem consistent with a Lincolnesque notion of government by “the people.” But the reality is that automated systems themselves still demand people who can design, test, and audit such systems. As long as these human designers and overseers operate systems in a manner consistent with the parameters set out for an agency in its governing statute, AI and ADM systems themselves can prevent the kind of slippage and shirking that can occur when agencies must rely on thousands of human officials to carry out major national programs and policies. Even when it comes to making new rules under authority delegated to them by Congress, agencies could very well find that automation promotes democratic accountability rather than impedes it. Some level of accountability will be demanded by the properties of AI tools themselves. To function, the algorithms that drive these tools depend not merely on an “intelligible principle” to guide them; they need a principle that can be precisely specified in mathematical terms.Footnote ³⁰ In this way, automation could very well drive the demand for still greater specification and clarity in statutes about the goals of administration, more than even any potential judicial reinvigoration of the nondelegation doctrine might produce.

Although oversight of the design and development of automated systems will remain important to ensure that they are created in accord with democratically affirmed values, once operating, they should pose far fewer possibilities for the kinds of problems, such as capture and corruption, that administrative law has long sought to prevent. Unlike human beings, who might pursue their own narrow interests instead of those of the broader public, AI and ADM tools will be programmed to optimize the objectives defined by their designers. As long as these designers are accountable to the public, and as long as the system objectives are defined in non-self-interested ways that comport with relevant legislation, then the AI tools themselves pose no risk of capture and corruption. In an important sense, they will be more accountable in their execution than even human officials can be when it comes to implementing law.

This is not to suggest that automated systems will amount to a panacea nor that their responsible development and use will be easy. They can certainly be used in legally and morally problematic ways. Furthermore, their use by agencies will still be subject to constraints beyond administrative law – for instance, legal constraints under the First Amendment or the Equal Protection Clause that apply to all governmental actions. In fact, equality concerns raised by the potential for AI bias may well become the most salient legal issue that automated systems will confront in the coming years. Bias obviously exists with human decision-making, but it also is a concern with AI tools, especially when the underlying data used to train the algorithms driving these tools already contain human-created biases. Nevertheless, absent an independent showing of animus, automated systems based on AI may well withstand scrutiny under equal protection doctrine, at least if that doctrine does not change much over time.Footnote ³¹

Governmental reliance on AI tools would be able to avoid actionable conduct under equal protection analysis even if an administrator elected to use data that included variables on race, gender, or other protected classifications. As long as the objective the AI tool is programmed to achieve is not stated in terms of such protected classifications, it will be hard, if not impossible, to show that the tool has used any class-based variables as a determinative basis for any particular outcome. The outcomes these AI tools generate derive from effectively autonomous mathematical processes that discern patterns among variables and relationships between different variables. Presumably, AI tools will seldom if ever support the kind of clear and categorical determinations based on class-related variables that the US Supreme Court has rejected, where race or other protected classes have been given an explicit and even dispositive weight in governmental decisions.Footnote ³² Even when processing data on class variables, the use of AI tools might well lead to better outcomes for members of a protected class overall.Footnote ³³

Moreover, with greater reliance on AI systems, governments will have a new ability to reduce undesired biases by making mathematical adjustments to their models, sometimes without much loss in accuracy.Footnote ³⁴ Such an ability will surely make it easier to stamp out biases than it is to eliminate humans’ implicit biases. In an automated state of the future, government may find itself less prone to charges of undue discrimination.

For these reasons, it would appear that long-standing principles of administrative law, and even constitutional law, will likely continue to operate in an automated state, encouraging agencies to act responsibly by both preserving democratic accountability and making smarter, fairer decisions. This is not to say that existing principles will remain unchanged. No one should expect that any area of the law will stay static over the long term. Given that some scholars and observers have already come to look critically upon governmental uses of AI and ADM tools, perhaps shifting public attitudes will lead to new, potentially more demanding administrative law principles specifically targeting the automated features of the future administrative state.Footnote ³⁵

While we should have little doubt that norms and best practices will indeed solidify around how government officials ought to use automated systems – much as they have developed over the years for the use of other analytic tools, such as benefit-cost analysis – it is far from clear that the fundamentals of administrative law will change dramatically in an era of automated governance.Footnote ³⁶ Judges, after all, will confront many of the same difficulties scrutinizing AI tools as they have confronted in the past with respect to other statistical and technical aspects of administration, which may lead to continued judicial deference as exemplified in Baltimore Gas & Electric.Footnote ³⁷ In addition, rather than public attitudes turning against governmental use of AI and ADM tools, it may just as easily be expected that public expectations will be shaped by widespread acceptance of AI in other facets of life, perhaps even leading to affirmative demands that governments use ADM tools rather than continuing to rely on slower or less reliable processes.Footnote ³⁸ Cautious about ossifying automated governance, judges and administrative law scholars might well resist the urge to impose new doctrinal hurdles on automation.Footnote ³⁹ They may also conclude, as would be reasonable, that existing doctrine contains what is needed to ensure that government agencies use automated systems responsibly.

As a result, if government agencies wish to expand the responsible use of properly trained, audited, and validated automated systems that are sufficiently aligned with legislative mandates and improve agencies’ ability to perform key tasks, it seems they will hardly need any transformation of traditional administrative law principles to accommodate these innovations. Nor will administrative law need to adapt much, if at all, to ensure that kind of responsible use of automated governance. Overall, an automated state could conceivably do a better job than ever before of fulfilling the vision of good governance that has long animated administrative law.

9.5 Conclusion: The Need for Human Empathy

Still, even if the prevailing principles of administrative law can deal adequately with public sector use of AI tools, something important could easily end up getting lost in an automated state. Such an administrative government might be smarter, more democratically accountable, and even more fair. But it could also lack feeling, even more than sterile bureaucratic processes do today. Interactions with government through smartphones and automated chats may be fine for making campground reservations at national parks or even for filing taxes. But they run the risk of leaving out an important ingredient of good governance – namely, empathy – in those circumstances in which government must make highly consequential decisions affecting the well-being of individuals. In such circumstances, empathy demands that administrative agencies provide opportunities for human interaction and for listening and expressions of concern. An important challenge for administrative law in the decades to come will be to find ways to encourage an automated state that is also an empathic state.

A desire for empathy, of course, need not impede the development of automation.Footnote ⁴⁰ If government manages the transition to an automated state well, it is possible that automation can enhance the government’s ability to provide empathy to members of the public, but only if government officials are sufficiently attentive to the need to do so. This need will become even greater as the overall economy moves toward greater reliance on AI and ADM systems. Society will need to value and find new ways to fulfill those tasks involving empathy that humans are good at providing. The goal should be, as technologist Kai-Fu Lee has noted, to ensure that “while AI handles the routine optimization tasks, human beings … bring the personal, creative, and compassionate touch.”Footnote ⁴¹

Already, public administration experts recognize that this is one of the great potential advantages of moving to an automated state. It can free up government workers from drudgery and backlogs of files to process, while leaving them more time and opportunities to connect with those affected by agency decisions.Footnote ⁴² A recent report jointly issued by the Partnership for Public Service and the IBM Center for The Business of Government explains the importance of this shift in what government employees do:

Many observers who envision greater use of AI in government picture more face-to-face interactions between agency employees and customers, and additional opportunities for more personalized customer services. The shift toward employees engaging more with agency customers is expected to be one of several possible effects of automating administrative tasks. Relieved of burdensome paperwork, immigration officers could spend more time interacting with visa applicants or following up on individual immigration cases. Scientists could allot more of their day to working with research study participants. And grants managers could take more time to learn about and support individual grantees. On average, federal employees now spend only 2 percent of their time communicating with customers and other people outside their agencies, or less than one hour in a workweek, according to one study. At the same time, citizens want government to do better. The experiences customers have with companies is driving demand for personalized government services. In a survey of more than 6,000 people from six countries, including the United States, 44 percent of respondents identified personalized government services as a priority.Footnote ⁴³

Not only does a substantial portion of the public already recognize the need for empathic, personalized engagement opportunities with government, but as private sector organizations invest more in personalized services, this will only heighten and broaden expectations for similar empathy from government. We already know from extensive research on procedural justice that the way the government treats members of the public affects their sense of legitimacy in the outcomes they receive.Footnote ⁴⁴ To build public trust in an automated state, government authorities will need to ensure that members of the public still feel a human connection. As political philosopher Amanda Greene has put it, “government must be seen to be sincerely caring about each person’s welfare.”Footnote ⁴⁵

Can administrative law help encourage empathic administrative processes? Some might say that this is already a purpose underlying the procedural due process principles that make up administrative law. Goldberg v Kelly, after all, guarantees certain recipients of government benefits the right to an oral hearing before a neutral decision-maker prior to the termination of their benefits, a right that does afford at least an opportunity for affected individuals to engage with a theoretically empathic administrative judge.Footnote ⁴⁶ But the now-canonical test of procedural due process reflected in Mathews v Eldridge is almost entirely devoid of attention to the role of listening, caring, and concern in government’s interactions with members of the public.Footnote ⁴⁷ Mathews defines procedural due process in terms of a balance of three factors: (1) the affected private interests; (2) the potential for reducing decision-making error; and (3) the government’s interests concerning fiscal and administrative burdens. AI automation would seem to pass muster quite easily under the Mathews balancing test. The first factor – the private interests at stake – will be external to AI, but AI systems would seem always to fare well under the second and third factors. Their great promise is that they can reduce errors and lower administrative costs.

This is where existing principles of administrative law will fall short in an automated state and where greater vision will be needed. Hearing rights and the need for reasons are about more than just achieving accurate outcomes, which is what the Mathews framework implies. On the contrary, hearings and reason-giving might not be all that good at achieving accurate outcomes, at least not as consistently as automated systems. A 2011 study showed that, among the fifteen most active administrative judges in one office of the Social Security Administration, “the judge grant rates … ranged … from less than 10 percent being granted to over 90 percent.”Footnote ⁴⁸ The study revealed, for example, that three judges in this same office awarded benefits to no more than 30 percent of their applicants, while three other judges awarded to more than 70 percent.Footnote ⁴⁹ Other studies have suggested that racial disparities may exist in Social Security disability awards, with certain Black applicants tending to receive less favorable outcomes than white applicants.Footnote ⁵⁰ Against this kind of track record, automated systems promise distinct advantages when they can be shown to deliver fairer, more consistent, and even speedier decisions.

But humans will still be good at listening and empathizing with the predicaments of those who are seeking assistance or other decisions from government, or who otherwise find themselves subjected to its constraints.Footnote ⁵¹ It is that human quality of empathy that should lead the administrative law of procedural due process to move beyond just its current emphasis on reducing errors and lowering costs.

The need for an administrative law of empathy may lead some judges to ask whether members of the public have a “right to a human decision” within an automated state.Footnote ⁵² But not all human decisions are necessarily empathic ones. Moreover, a right to a human decision would bring with it the possibility that the law would accept all the flaws in human decision-making simply to retain one of the virtues of human engagement. If automated decisions turn out increasingly to be more accurate and less biased than human ones, a right to a decision by humans would seem to deny the public the desirable improvements in governmental performance that AI and ADM tools can deliver.

Administrative law need not stand in the way of these improvements. It can accept the use of AI and ADM tools while nevertheless pushing government forward toward additional opportunities for listening and compassionate responses.Footnote ⁵³ Much as the US Supreme Court in Goldberg v Kelly insisted on a pretermination hearing for welfare recipients, courts in the future can ask whether certain interests are of a sufficient quality and importance to demand that agencies provide supplemental engagement with and assistance to individuals subjected to automated processes. Courts could in this way seek to reinforce best practices in agency efforts to provide empathic outreach and assistance.

In the end, if administrative law in an automated state is to adopt any new rights, society might be better served if courts avoid the recognition of a right to a human decision. Instead, courts could consider and seek to define a right to human empathy.

10 Sorting Teachers Out: Automated Performance Scoring and the Limit of Algorithmic Governance in the Education Sector

Ching-Fu Lin

10.1 Introduction

Big data is increasingly mined to train ADM tools, with consequential reverberations. Governments are among the primary users of such tools to sort, rank, and rate their citizens, creating a data-driven infrastructure of preferences that condition people’s behaviours and opinions. China’s social credit system, Australia’s robo-debt program,Footnote ¹ and the United States’ welfare distribution platform are prime examples of how governments resort to ADM to allocate resources and provide public services.Footnote ² Some commentators point to the rule of law deficits in the automation of government functions;Footnote ³ others emphasize how such technologies systematically exacerbate inequalities;Footnote ⁴ and still others argue that a society constantly being scored, profiled, and predicted threatens due process and justice generally.Footnote ⁵ In contemporary workplaces, algorithmically powered tools have also been widely adopted in business practices for efficiency, productivity, and management purposes.Footnote ⁶ Camera surveillance, data analysis, and ranking and scoring systems are algorithmic tools that have given employers enormous power over the employed, yet their use also triggers serious controversies over privacy, ethical concerns, labour rights, and due process protection.Footnote ⁷

Houston Federation of Teachers v Houston Independent School District presents yet another controversial example of government ‘algorithmization’ and the power and perils of automated ranking and rating, targeting a specific profession – teachers. The case concerns the implementation of value-added models (VAMs) that algorithmically link a teacher’s contributions to students’ growth on standardized tests and hold teachers accountable through incentives such as termination, tenure, or contract nonrenewal. The Houston Independent School District refused to renew more than 200 teachers’ contracts in 2011 based on low value-added scores. The VAM is proprietary and is not disclosed to those affected, precluding them from gaining an understanding of the internal logic and decision-making processes at work, thereby causing serious harm to due process rights. Similar practices prevail across the United States following the enactment of the 2002 No Child Left Behind Act and the 2011 Race to the Top Act, in conjunction with other federal policy actions. Interestingly, until the 2017 summary judgment rendered by the Court in Houston Federation of Teachers v Houston Independent School District, which ruled in favour of the affected teachers, federal constitutional challenges against the use of VAMs for termination or nonrenewal of teachers’ contracts were generally rejected. Yet, the case has received little attention, as it was subsequently settled.

The growing algorithmization of worker performance evaluation and workplace surveillance in the name of efficiency and productivity is not limited to specific industry sectors or incomes, and it has been implemented so rapidly that regulators struggle to catch up and employees suffer in an ever-widening power asymmetry. Algorithmically powered workplace surveillance and worker performance evaluation effectively expand employers’ capacity of control by shaping expectations and conditioning the behaviours of employees, which may further distort the nature of the relationship between the employer and the employed. Furthermore, such algorithmic tools have been widely criticized as neither reliable nor transparent and also prone to bias and discrimination.Footnote ⁸ Hence, the prevalent use of algorithmic worker productivity and performance evaluation systems carries serious economic, social, legal, and political ramifications.

This chapter therefore asks critical questions that remain unanswered. What are the normative ramifications of this case? How can due process protection – procedural or substantive – be ensured under the maze of crude algorithmic worker productivity and performance evaluation systems such as the VAM, especially in light of the black box problems?Footnote ⁹ Can judicial review provide a viable form of algorithmic governance? How are such ADM tools reshaping professions like education? Does the increasingly blurred line between public and private authority in designing and applying these algorithmic tools pose new threats? Premised upon these scholarly and practical inquiries, this chapter seeks to examine closely the case of Houston Federation of Teachers v Houston Independent School District, analyze its ramifications, and provide critical reflections on ways to harness the power of automated governments.

10.2 The Contested Algorithmization of Worker Performance Evaluation

Recently, organizations have increased their use of algorithmically powered tools for worker productivity monitoring and performance evaluation. With the help of camera surveillance, data analysis, and ranking and scoring systems,Footnote ¹⁰ such tools have given employers significant power over their employees. This growing power asymmetry disrupts the labour market and redefines the way people work. Amazon notoriously uses a combination of AI tools to recruit, monitor, track, score, and even automatically fire its employees and contractors, and these second-by-second measurements have raised serious concerns regarding systematic bias, discrimination, and human rights abuse.Footnote ¹¹ Specifically, Amazon uses AI automated tracking systems to monitor and evaluate its delivery drivers, who are categorized as ‘lazy’ if their movements are too slow and receive warning notifications if they fail to meet the required workloads.Footnote ¹² The system can even generate an automated order to lay off an employee without the intervention of a human supervisor.Footnote ¹³ Despite the associated physical and psychological suffering, if an employee does not agree to be algorithmically monitored and controlled, the individual will lose his or her job.Footnote ¹⁴

Employers of cashiers, truck drivers, nursing home workers, and workers in many other lower-paying jobs across various sectors have followed suit in adopting Amazon’s algorithmization of workers’ performance evaluation, aimed at maximizing productivity per capita per second and automating constant micromanagement. Employees who are under such performance evaluation programs can feel pressured to skip scheduled breaks, including bathroom or coffee breaks, to avoid adverse consequences.Footnote ¹⁵ According to a recent in-depth study published in The New York Times, eight of the ten largest corporations in the United States have deployed systems to track, often in real time, individual workers’ productivity metrics under varied frameworks of data-driven control.Footnote ¹⁶ The global COVID-19 pandemic has further prompted corporations under profit pressures to keep tighter tabs on employees by means of online and real-time AI evaluation, thus accelerating a paradigm shift of workplace power that was already well underway.Footnote ¹⁷ Many of the practices adopted during COVID-19 will likely continue and become normalized in the post-pandemic era.

White-collar jobs are not immune from the growing algorithmization of worker performance evaluation. Architects, financial advisors, lawyers, pharmaceutical assistants, academic administrators, and even doctors and chaplains can be placed under extensive monitoring software that constantly accumulates records, and they are paid ‘only for the minutes when the system detected active work’, or are subject to a ‘productivity points’ management system that calibrates pay based on individual scores.Footnote ¹⁸ For example, some law firms are increasingly subjecting their contract lawyers to smart surveillance systems that constantly monitor their performance during work days in the name of efficiency facilitation and quality control.Footnote ¹⁹ It appears evident that the growing automation of worker performance evaluation is not limited to specific industry sectors or incomes, and such practices are spreading at such a rapid rate that regulators struggle to catch up and employees suffer from widening power asymmetry.

As Ifeoma Ajunwa, Kate Crawford, and Jason Schultz observe, due to recent technological innovations, data-driven worker performance evaluation in the United States is on the rise through tools including employee ratings, productivity apps, worker wellness programs, activity reports, and color-coded charts.Footnote ²⁰ They further argue that such ‘limitless worker surveillance’ has left millions of employees at the mercy of minute-by-minute monitoring by their employers that undermines fair labour rights, yet the existing legal framework offers few meaningful constraints.Footnote ²¹

Indeed, algorithmically powered workplace surveillance and worker performance evaluation are often adopted by enterprises to increase efficiency and improve productivity, expand corporate capacity by shaping expectations, and condition the behaviours of employees.Footnote ²² However, the adoption of such systems not only intrudes upon the privacy and labour rights of employees,Footnote ²³ but also harms their physical and mental well-being under a lasting framework of suppression.Footnote ²⁴ In a larger context, the dominance of ADM tools for workplace surveillance and worker performance evaluation may distort the nature of the relationship between the employer and the employed and weaken psychological contracts, job engagement, and employee trust.Footnote ²⁵ The gap in power asymmetry is institutionally widened by the systematic use of ADM tools that are neither reliable nor transparent and are also prone to bias and discrimination.Footnote ²⁶

Automated worker productivity monitoring and performance evaluation represents a system of mechanical enforcement without empathy or moral responsibility, which potentially dehumanizes the inherently person-to-person process of work management, reward and punishment allocation, and contractual interactions. These tools, cloaked in the promise of technologically supported management and data-driven efficiency, focus not on process but on results, which are observed and calculated based on arbitrary parameters or existing unfair and discriminatory practices. Given the black box nature of these tools, human supervisors, if any, cannot easily detect and address the mistakes and biases that arise in the ADM process. As a result, the use of algorithmic worker productivity monitoring and performance evaluation systems is increasingly contested and criticized for its controversial economic, social, legal, and political ramifications.

10.3 Sorting Teachers Out? Unpacking Houston Federation of Teachers v Houston Independent School District

Concerns over algorithmic worker productivity monitoring and performance evaluation systems came to light in the recent lawsuit over the use of VAMs in the United States – Houston Federation of Teachers v Houston Independent School District.Footnote ²⁷ This case presents yet another controversial dimension of algorithmic worker productivity monitoring and performance evaluation in the education sector. Houston Federation of Teachers v Houston Independent School District involves the implementation of VAMs by the Houston Independent School District that algorithmically link a teacher’s contributions to students’ growth on standardized tests, the results of which inform decisions on teachers’ tenure or contract (non)renewal. In 2011, the Houston Independent School District, citing low value-added scores, refused to renew its contracts with more than 200 teachers. The VAM is proprietary and is not disclosed to those affected, precluding them from gaining an understanding of the internal logic and decision-making processes at work and causing serious harm to due process rights. Similar practices prevail across the United States following the enactment of the 2002 No Child Left Behind Act and the 2011 Race to the Top Act, in conjunction with other federal policy actions. Before the 2017 summary judgment rendered by the Court in Houston Federation of Teachers v Houston Independent School District, which ruled in favour of the affected teachers, federal constitutional challenges against the use of VAMs for termination or nonrenewal of teachers’ contracts were generally rejected. Nevertheless, the case was subsequently settled and, interestingly, has received little attention. This chapter unpacks the case and endeavours to offer a critical analysis of its legal and policy ramifications.

Since 2010, the Houston Independent School District has applied a data-driven approach to monitor and evaluate teachers’ performance with the aim of enhancing the effectiveness of teaching from an outcome-based perspective. The algorithmically powered evaluation system implemented by the Houston Independent School District has three appraisal criteria – instructional practice, professional expectations, and student performance.Footnote ²⁸ To narrow down the parameters for discussion, it should be noted that the primary focus of the case, Houston Federation of Teachers v Houston Independent School District, resides in the third component – student performance. Under the algorithmic work performance evaluation system, it is assumed that student growth and improvement in standardized test scores could appropriately reflect a specific teacher’s impact on (or added value to) individual student performance, which is known as the VAM for teaching evaluations.Footnote ²⁹ Under this system, student growth is calculated using the Educational Value-Added Assessment System (EVAAS), a proprietary statistical model developed by a private software company, SAS, and licensed for use by the Houston Independent School District.Footnote ³⁰ This automated teacher evaluation system works by comparing the average test score growth of students taught by the teacher being evaluated with the statewide average for students in the same grade or course. The score is then processed by SAS’s proprietary algorithmic program and subsequently sorted into an effectiveness rating system.Footnote ³¹

In essence, under the VAM, a teacher’s algorithmically generated score was based on comparing the average growth in test scores of that teacher’s students with the statewide average, and the score was then converted to a test statistic called the Teacher Gain Index.Footnote ³² This measure was used to classify teachers into five levels of performance, ranging from ‘well above’ to ‘well below’ average.Footnote ³³ It should be noted that the automated teacher evaluation system was initially used to inform and determine teacher bonuses, but as later implemented by the Houston Independent School District, the algorithmic system was used to automate sanctions on employed teachers for low student performance on standardized tests.Footnote ³⁴ The Houston Independent School District declared in 2012 its management goal of ensuring that ‘no more than 15% of teachers with ratings of ineffective are retained’, and around 25 per cent of the ‘ineffective teachers’ were ‘exited’.Footnote ³⁵
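The general shape of such a calculation can be illustrated with a minimal sketch. It is emphatically not the EVAAS algorithm, which is proprietary and undisclosed. The sketch assumes, purely for illustration, that the index is the difference between a teacher’s mean student gain and the statewide mean gain, divided by a standard error, and that the five performance levels are separated by hypothetical cutoffs at ±1 and ±2. The function names, the cutoffs, and the band labels other than ‘well above’ and ‘well below’ are assumptions introduced here.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical cutoffs on the standardized index (assumption for illustration;
# the real EVAAS formula and thresholds are not public).
BANDS = [
    (2.0, "well above average"),
    (1.0, "above average"),
    (-1.0, "average"),
    (-2.0, "below average"),
]


def gain_index(student_gains, state_mean_gain):
    """Teacher's mean student gain minus the state mean, in standard-error units."""
    n = len(student_gains)
    if n < 2:
        raise ValueError("need at least two students to estimate a standard error")
    standard_error = stdev(student_gains) / sqrt(n)
    if standard_error == 0:
        raise ValueError("identical gains give a zero standard error")
    return (mean(student_gains) - state_mean_gain) / standard_error


def classify(index):
    """Map an index value to one of five performance levels."""
    for cutoff, label in BANDS:
        if index >= cutoff:
            return label
    return "well below average"


# Usage: a class whose mean gain equals the state mean gets an index of 0.
index = gain_index([4, 6, 5, 7, 3], state_mean_gain=5)
print(index, classify(index))
```

Even in this simplified form, the teacher’s rating depends on inputs (each student’s gain, the statewide mean, the standard error) that the teacher would need to see in order to check the result.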

The plaintiff in this case, Houston Federation of Teachers, argued that the use of EVAAS violated the following elements of the Fourteenth Amendment.Footnote ³⁶ First, the use of EVAAS violates the procedural due process right of the plaintiff because of the lack of sufficient information needed to meaningfully challenge terminations of contracts based on low EVAAS scores. Second, the substantive due process right is also violated, as there is no rational relationship between EVAAS scores and the Houston Independent School District’s goal of employing effective teachers. Furthermore, since the EVAAS system is too vague to provide notice to teachers regarding how to achieve higher ratings and avoid adverse employment consequences, the use of EVAAS again violates the plaintiff’s substantive due process right. Third, the plaintiff’s right to equal protection is harmed by the Houston Independent School District’s policy of aligning teachers’ instructional performance ratings with their EVAAS scores.

The court began its analysis with the plaintiff’s protected property interests.Footnote ³⁷ Referring to past jurisprudence, the court noted that, regardless of their employment status under probationary, term, or continuing contract, teachers generally have a protected property interest under their respective employment contracts (either during the term of the contract or under continued employment, according to the type of contract).Footnote ³⁸ In this sense, the teachers who were adversely impacted by the use of EVAAS in the present case have a constitutionally protected property interest derived from the contractual relationship. The court rejected the Houston Independent School District’s argument that ‘a due process plaintiff must show actual deprivation of a constitutional right’.Footnote ³⁹ Importantly, the plaintiff in the present case sought ‘a declaratory judgment and permanent injunction’ barring the use of EVAAS in determining the renewal or termination of teacher contracts rather than monetary compensation, and thus sought an institutional and systematic outcome. According to past jurisprudence relevant to this case, ‘[o]ne does not have to await the consummation of threatened injury to obtain preventive relief’. Such a statement suggests that a demonstration of ‘realistic danger’ is sufficient.Footnote ⁴⁰ As the facts of the case demonstrate a relationship between EVAAS scores and teacher employment termination, the court found that the VAM evaluation system ‘poses a realistic threat to protected property interests’ for those teachers.Footnote ⁴¹

The court then turned to the procedural due process issue, which consists of the core value of ‘the opportunity to be heard at a meaningful time and in a meaningful manner’ to ensure that governmental decisions are fair and accurate.Footnote ⁴² The Houston Federation of Teachers argued that the Houston Independent School District failed to meet the minimum procedural due process standard of providing ‘the cause for [the teacher’s] termination in sufficient detail so as to enable [the teacher] to show any error that may exist’. The algorithms and data used for the EVAAS evaluation system were proprietary and remained unavailable and inaccessible to the teachers who were affected, and the accuracy of scores could not be verified.Footnote ⁴³ To address this issue, the court first acknowledged that, as the Houston Independent School District had admitted, the algorithms were retained by SAS as a trade secret, prohibiting access by the teachers as well as the Houston Independent School District, and any efforts to replicate the scores would fail. Furthermore, the calculation of EVAAS scores may be erroneous due to mistakes in the data or the algorithm code itself. Such mistakes could not be promptly corrected, and any reanalysis would potentially affect all other teachers’ scores.Footnote ⁴⁴

The court then agreed with the plaintiff’s application of the following standard from Banks v Federal Aviation Admin., 687 F.2d 92 (5th Cir. 1982), that ‘due process required an opportunity by the controllers to test on their own behalf to evaluate the accuracy of the government-sponsored tests’.Footnote ⁴⁵ When a potential violation of constitutional rights arises from a policy that concerns trade secrets, ‘the proper remedy is to overturn the policy, while leaving the trade secrets intact’.Footnote ⁴⁶ Even if the Houston Independent School District had provided the teachers some basic information (e.g., a general explanation of the EVAAS test methods) under the standard adopted in Banks v Federal Aviation Admin., the measure still falls short of due process, since it does not change the fact that the teachers are unable to verify or replicate the EVAAS scores.Footnote ⁴⁷ Since it is nearly impossible for the teachers to obtain or ensure accurate EVAAS scores and they are therefore ‘unfairly subject to mistaken deprivation of constitutionally protected property interests in their jobs’, the Houston Independent School District was denied summary judgment on this procedural due process claim.Footnote ⁴⁸

The substantive due process issues are twofold. The first issue relates to whether the challenged measure had a rational basis.Footnote ⁴⁹ The Houston Federation of Teachers argued that EVAAS went against the protection of substantive due process, since there was no rational relationship between EVAAS scores and the Houston Independent School District’s goal of ‘having an effective teacher in every [Houston Independent School District] classroom so that every [Houston Independent School District] student is set up for success’.Footnote ⁵⁰ However, the court cited several examples of case law which supported the argument that a rational relationship existed in the present case and that ‘the loose constitutional standard of rationality allows governments to use blunt tools which may produce only marginal results’.Footnote ⁵¹ The second issue surrounding substantive due process concerned vagueness. The general standard for unconstitutional vagueness is whether a measure ‘fail[s] to provide the kind of notice that will enable ordinary people to understand what conduct it prohibits’ or ‘authorize[s] and even encourage[s] arbitrary and discriminatory enforcement’.Footnote ⁵² On the other hand, the court also acknowledged that a lesser degree of specificity is required in civil cases and that ‘broad and general regulations are not necessarily vague’.Footnote ⁵³ The court determined that the disputed measure in the present case was not vague, as the teachers who were impacted had been notified or advised by their institutions of the general information about, and possible effect of, the use of the EVAAS evaluation system.Footnote ⁵⁴

Finally, the court reviewed the plaintiff’s equal protection claim. If a measure lacks a rational basis for the difference in treatment, that is, if the classification system used to justify the different treatment fails to rationally relate to a legitimate governmental objective, it may violate the Equal Protection Clause.Footnote ⁵⁵ However, in the present case, the court rejected the plaintiff’s claim that the EVAAS rating scores represented a classification system. Even if they had, the court deemed that a rational basis existed, as explored with regard to the substantive due process claims.Footnote ⁵⁶ In summary, the Houston Independent School District’s motion for summary judgment on the procedural due process claim was denied, but summary judgment on all other claims was granted.Footnote ⁵⁷

10.4 Judicial Review as Algorithmic Governance? Controversies, Ramifications, and Critical Reflections

It should be noted that, before the summary judgment ruling was reached in Houston Federation of Teachers v Houston Independent School District, some existing literature mentioned the issue of policy failures within the Houston Independent School District’s algorithmic work performance evaluation systems and the subsequent measures imposed on the teachers who were adversely affected. Some commentators have noted that, while high-quality teachers can greatly benefit students, the ‘effectiveness’ of teachers may be difficult to assess because it correlates with non-observable characteristics.Footnote ⁵⁸ To address the challenges of teacher evaluation and management, better information on real-world quality contributes to the productiveness of personnel policies and management decisions, but the accuracy of such information and its correlation with student performance cannot be easily observed.Footnote ⁵⁹

Julie Cullen and others conducted an empirical study that compared the patterns of attrition before and after the implementation of the Houston Independent School District’s automated work performance evaluation system as well as the relationship between these patterns and student achievement. These researchers found that, although the algorithmic work performance evaluation system seemingly improves the quality of the teacher workforce, as it increases the exit rate of low-performing teachers, the statistics that imply this relationship are evident mainly in low-achieving schools, as opposed to middle- and high-achieving schools.Footnote ⁶⁰ More importantly, Cullen et al. also found that the exits resulting from the automated work performance evaluation system were too poorly targeted to induce any meaningful gains in student achievement and net policy effects.Footnote ⁶¹ They further suggested that the Houston Independent School District’s algorithmic work performance measures were ineffective and proposed other substitutive measures via recruitment of new teachers or improvements in existing teaching employees.Footnote ⁶²

Bruce Baker and colleagues discussed legal controversies over unfair treatment and inadequate due process mechanisms, since such automated teacher evaluation models are embedded with problematic features and parameters, such as non-negotiable final decisions, inaccessible information, and the use of imprecise data.Footnote ⁶³ Algorithmic teacher evaluation models like EVAAS are prone to structural problems. First, such systems require that all ‘objective measures of student achievement growth’ be considered, which may lead to inaccurate outcomes, since the model disregards the fact that the validity and reliability of these measures can vary and that random errors or biases may occur, with no opportunity to question and reassess the validity of any measure.Footnote ⁶⁴ Second, the standards for placing teachers into effectiveness score bands and categories are unjustifiable, as the numerical cutoffs are rigid and temporally static. A difference of one point or percentile does not necessarily indicate any actual difference in the performance of the evaluated teachers. However, it can lead to a distinctly different effectiveness category and consequently endanger a teacher’s employment rights.Footnote ⁶⁵ While models that are based on VAMs theoretically attempt to reflect student achievement growth that can be attributed (directly) to a specific instructor’s teaching quality and performance, they can hardly succeed in making a fair connection in reality, since it is nearly impossible to discern whether the evaluation estimates have been contaminated by uncontrollable or biased factors, and the variation in ratings is quite broad.Footnote ⁶⁶ Dismissing teachers under such an arbitrary evaluation system may well violate due process rights under the Fourteenth Amendment, in the form of harm to liberty interests by adversely affecting teachers’ employment or harm to property interests in continued employment, as shown in Houston Federation of Teachers v Houston Independent School District. Likewise, VAMs may be challenged through procedural or substantive due process claims surrounding the technical flaws of value-added testing policies, including the unstable reliability of those measures along with their questionable interpretations, the doubtful validity of the measure and the extent to which it proves a specific teacher’s influence over student achievement, and the accessibility and understandability of the measures to an evaluated teacher as well as the teacher’s ability to control relevant factors.Footnote ⁶⁷ VAMs are limited measures in terms of properly assessing teacher ‘effectiveness’, and ‘it would be foolish to impose on these measures, rigid, overly precise high stakes decision frameworks’.Footnote ⁶⁸
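The concern about rigid cutoffs can be made concrete with a small sketch. It assumes, hypothetically, five bands separated by fixed cutoffs at ±1 and ±2 on a standardized index whose standard error is about 1. Neither the cutoffs nor the error size is taken from EVAAS, and the function names are introduced here only for illustration.

```python
def band(index, cutoffs=(-2.0, -1.0, 1.0, 2.0)):
    """Return a band from 1 (lowest) to 5 (highest) under fixed, hypothetical cutoffs."""
    return 1 + sum(index >= c for c in cutoffs)


def bands_within_error(index, standard_error=1.0):
    """List every band touched by the interval index +/- one standard error."""
    low, high = index - standard_error, index + standard_error
    return list(range(band(low), band(high) + 1))


# Two teachers whose estimates differ by 0.02 fall into different bands.
print(band(-2.01), band(-1.99))

# One teacher's estimate, once its uncertainty is considered, spans several bands.
print(bands_within_error(-1.99))
```

Under these assumptions, a trivial difference in the estimate changes the category, and a single estimate is statistically compatible with several categories at once, which is the kind of false precision the quoted warning describes.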

In Houston Federation of Teachers v Houston Independent School District, the court found a procedural due process violation mainly because those teachers had no way to replicate and challenge their scores. The court also indicated concern over the accuracy of the algorithmic tool, which had never been verified or audited.Footnote ⁶⁹ In a way, the case marks ‘an unprecedented development in VAM litigation’, and as a result, VAMs used in other states and elsewhere in education management policies should garner greater interest and concern.Footnote ⁷⁰ As per the judge in Houston Federation of Teachers v Houston Independent School District, when a government agency adopts a management policy of making highly consequential decisions with regard to employment renewal and termination based on opaque algorithms incompatible with minimum due process, the court is poised to offer a proper remedy to overturn the use of this algorithmic tool.Footnote ⁷¹ After Houston Federation of Teachers v Houston Independent School District, other states and districts in similar situations have been strongly incentivized to reconsider their use of the EVAAS algorithmic teacher evaluation system or other VAMs by separating consequential personnel decisions from evaluation estimates to avoid potential claims of due process violations.Footnote ⁷² On the other hand, the use of EVAAS (or other VAMs) for low-stakes purposes should also be reconsidered, as the court in Houston Federation of Teachers v Houston Independent School District expressed its concern over the actual extent to which ‘teachers might understand their EVAAS estimates so as to use them to improve upon their practice’.Footnote ⁷³

As a number of states have adopted automated teacher performance evaluation systems that allow VAM data to be the sole or primary consideration in the decision-making process with regard to review, renewal, or termination of employment contracts, the outcome of Houston Federation of Teachers v Houston Independent School District and its legal and policy ramifications might have a broad reach.Footnote ⁷⁴ Indeed, the lawsuit itself has opened up the possibility for teachers (at least those employed in public schools) to seek remedies for the controversial use of VAMs and other algorithmic teacher performance evaluation systems, especially given that teachers who had previously challenged such systems had generally been unsuccessful. Houston Federation of Teachers v Houston Independent School District, despite being ultimately settled, paves a viable litigation path to challenge the increasingly automated worker performance evaluation in the education sector.

Now it seems possible that due process challenges (at least procedural due process) will persist, as the court drew attention to ‘the fact that procedural due process requires a hearing to determine if a district’s decision to terminate employment is both fair and accurate’.Footnote ⁷⁵ As noted by Mark Paige and Audrey Amrein-Beardsley, Houston Federation of Teachers v Houston Independent School District raised awareness about concerns over government transparency and ‘control of private, for-profit corporations engaged in providing a public good’,Footnote ⁷⁶ especially with regard to the use of black box algorithmic decision-making tools in the education sector. The case strongly questions the reliability of the EVAAS system in assessing and improving teacher quality, especially since undetectable errors can lead to significant consequences, including calls for public scrutiny, and seems to offer the potential to compel policymakers and practitioners to both re-examine and reflect on the role (if any) that VAM estimations should play in personnel decisions. An independent study comparing automated decision-making based on personal data in the European Union and the United States, submitted to the European Commission’s Directorate-General for Justice and Consumers, also underlines that the court’s decision in Houston Federation of Teachers v Houston Independent School District ‘demonstrates that the Due Process Clause can serve as an important safeguard when automated decisions have a legal effect’.Footnote ⁷⁷

Nevertheless, regrettably, the controversial characteristics of such worker performance evaluation algorithms – the proprietary, black box, inaccessible, and unexplainable decision-making routesFootnote ⁷⁸ – have not been a central concern of legal challenges. The lawsuit in no way means that VAMs and other algorithmic worker evaluation systems should be systematically examined, fixed, or abandoned. As noted, the dominance of automated tools for workplace surveillance and worker performance evaluation may distort the nature of the relationship between the employer and the employed and weaken psychological contracts, job engagement, and employee trust. The power asymmetry has been institutionally widened by the systematic use of algorithmic tools that are neither reliable nor transparent and are also prone to bias and discrimination. All of these issues remain outside the scope of judicial review. In line with this argument, Ryan Calo and Danielle Citron point out the problems of this increasingly Automated State, noting a number of controversial cases, including Houston Federation of Teachers v Houston Independent School District. The researchers cite the ‘looming legitimacy crisis’ and call for a reconceptualization and new vision of the modern administrative state in the algorithmic society.Footnote ⁷⁹ They argue that, while scholars have been asking how we might ensure that these automated tools can align with the existing legal contours such as due process, broader and structural questions on the legitimacy of automating public power remain unanswered.Footnote ⁸⁰ Indeed, without proper gatekeeping or accountability mechanisms, the growing algorithmization of worker performance evaluation can go unchecked, especially when such practices are spreading at such a rapid rate that regulators struggle to catch up and employees face widening power asymmetry.

10.5 Conclusion

Automated worker productivity monitoring and performance evaluation indicate a system of mechanical enforcement, if not suppression, which practically dehumanizes the inherent person-to-person process of work management without empathyFootnote ⁸¹ or moral responsibility. The algorithmic tool, as implemented widely in Houston Federation of Teachers v Houston Independent School District, focuses not on process but on results, which are observed and calculated based on arbitrary parameters or the existing unfair and discriminatory practices. Cloaked in technologically supported management and data-driven efficiency, algorithmic worker productivity monitoring and performance evaluation systems create and likely perpetuate a way to rationalize automatic layoffs without meaningful human supervision. Given the black box characteristics of these automated systems, human supervisors cannot easily detect and address mistakes and biases in practice.

The court in Houston Federation of Teachers v Houston Independent School District provides a baseline for future challenges in the use of these algorithmic worker productivity monitoring and performance evaluation systems by public authority (not the private sector). Here, judicial review appears necessary and to some extent effective to ensure a basic level of due process protection. However, the ruling arguably only scratches the surface of the growing automation of workplace management and control and the resulting power asymmetry. Indeed, it merely touches on procedural due process and leaves intact critical questions such as algorithmic transparency, explainability, and accountability. In this sense, judicial review, with the conventional understanding of due process and rule of law, cannot readily serve as an adequate form of algorithmic governance that can harness data-driven worker evaluation systems.

Again, as was salient in Houston Federation of Teachers v Houston Independent School District, the affected teachers encountered formidable challenges in examining proprietary algorithms developed by a private company to assess public school teacher performance and make consequential employment decisions. The teachers who were ‘exited’ had no access to the algorithmic systems and received little explanation or context for their termination. Experts who were offered limited access to the source code of the EVAAS also concluded that the teachers had no way to meaningfully verify the scores assigned to them by the system. The algorithmization of worker performance evaluation and surveillance is not and will not be limited to specific industry sectors or incomes. Individuals in other professions may not enjoy social and economic support systems comparable to those of the teachers in Houston Federation of Teachers v Houston Independent School District to pursue judicial review and remedies, and the algorithmic injustice they face may never be addressed.

Finally, the increasingly blurred line between public and private authorities and their intertwined collaboration in designing and applying these algorithmic tools pose new threats to the already weak effectiveness of rule of law and due process protection under the existing legal framework.Footnote ⁸² Any due process examination falls short at the interface of public and private collaboration, since the proprietary algorithms held by the private company constitute a black box barrier. The court in Houston Federation of Teachers v Houston Independent School District expressed significant concerns over the accuracy of the algorithmic system, noting that the entire algorithmic system was flawed with inaccuracies and was like a house of cards – the ‘wrong score of a single teacher could alter the scores of every other teacher in the district’ and ‘the accuracy of one score hinges upon the accuracy of all’.Footnote ⁸³ However, the black box process and automation itself were not considered problematic at all. Due process is needed in the context of the growing algorithmization of worker monitoring and evaluation so that affected employees may be able to partially ascertain the rationale behind data-driven decisions and control programs,Footnote ⁸⁴ but it must be reconceptualized and retooled to protect against the abovementioned threats to the new power dynamics.

11 Supervising Automated Decisions

Tatiana Cutts

11.1 Introduction

AI and ADM tools can help us to make predictions in situations of uncertainty, such as how a patient will respond to treatment, and what will happen if they do not receive it; how an employee or would-be employee will perform; or whether a defendant is likely to commit another crime. These predictions are used to inform a range of significant decisions about who should bear some burden for the sake of some broader social good, such as the relative priority of organ transplant amongst patients; whether to hire a candidate or fire an existing employee; or how a defendant should be sentenced.

Humans play a critical role in setting parameters, designing, and testing these tools. And if the final decision is not purely predictive, a human decision-maker must use the algorithmic output to reach a conclusion. But courts have concluded that humans also play a corrective roleFootnote ¹ – that, even if there are concerns about the predictive assessment, applying human discretion to the predictive task is both a necessary and sufficient safeguard against unjust ADM.Footnote ² Thus, the focus in academic, judicial, and legislative spheres has been on making sure that humans are equipped and willing to wield this ultimate decision-making power.Footnote ³

I argue that this focus is misplaced. Human supervision can help to ensure that AI and ADM tools are fit for purpose, but it cannot make up for the use of AI and ADM tools that are not. Safeguarding requires gatekeeping – using these tools just when we can show that they take the right considerations into account in the right way. In this chapter, I make some concrete recommendations about how to determine whether AI and ADM tools meet this threshold, and what we should do once we know.

11.2 The Determinative Factor

In 2013, Eric Loomis was convicted of two charges relating to a drive-by shooting in La Crosse, Wisconsin: ‘attempting to flee a traffic officer and operating a motor vehicle without the owner’s consent’.Footnote ⁴ The pre-sentence investigation (PSI) included COMPAS risk and needs assessments.Footnote ⁵ COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) is a suite of ADM/AI tools developed and owned by Equivant.Footnote ⁶ These tools are designed to predict recidivism risk for individual offenders and patterns across wider populations, by relying upon inferences drawn from representative pools of data. The sentencing judge explicitly invoked each COMPAS assessment to justify a sentence of six years in prison and five years of extended supervision.Footnote ⁷

Though the literature often refers to ‘the COMPAS algorithm’,Footnote ⁸ COMPAS is not a single algorithm that produces a single risk-score; rather, the COMPAS software includes a range of ADM tools that use algorithms to predict risk, which are described by Equivant as ‘configurable for the user’.Footnote ⁹ The tools available include: Pre-Trial Services,Footnote ¹⁰ which principally concern the risk that the accused will flee the jurisdiction; and three assessments (the General Recidivism Risk scale (GRR), the Violent Recidivism Risk scale (VRR), and the ‘full assessment’) which involve predictions about recidivism. The GRR, VRR, and full assessment are designed to inform public safety considerations that feed into decisions about resource-allocation across populations,Footnote ¹¹ and are used in several jurisdictions to decide how to treat individual offenders.

As the COMPAS software is a trade secret, only the score is revealed to the defendant and court. Nevertheless, Equivant’s public materials explain that the GRR includes factors such as: ‘criminal associates’;Footnote ¹² ‘early indicators of juvenile delinquency problems’;Footnote ¹³ ‘vocational/educational problems’;Footnote ¹⁴ history of drug use;Footnote ¹⁵ and age.Footnote ¹⁶ The enquiry into ‘vocational/educational problems’ in turn includes data points that are identified by defendants’ responses to questions such as: ‘how hard is it for you to find a job above minimum wage’; ‘what were your usual grades in school’; and ‘do you currently have a skill, trade, or profession at which you usually find work’.Footnote ¹⁷ Equivant notes that these data points are strongly correlated to ‘unstable residence and poverty’, as part of a pattern of ‘social marginalisation’.Footnote ¹⁸

The ‘full assessment’Footnote ¹⁹ is designed to assess a much wider set of ‘criminogenic need’Footnote ²⁰ factors, which are identified by the literature as ‘predictors of adult offender recidivism’.Footnote ²¹ These include ‘anti-social friends and associates’;Footnote ²² poor family and/or marital relationships (including whether the defendant was raised by their biological parents, parental divorce or separation, and family involvement in criminal activity, drugs, or alcohol abuse);Footnote ²³ employment status and prospects;Footnote ²⁴ school performance;Footnote ²⁵ and ‘poor use of leisure and/or recreational time’.Footnote ²⁶

Some of these factors are assessed according to the defendant’s own input to a pre-trial questionnaire, some are subjective observations made by the assessing agent, and some are objective data (such as criminal record). Scores are then incorporated by the agent into an overall narrative, which forms the basis of a sentencing recommendation by the district attorney. COMPAS is used by corrections departments, lawyers, and courts across the United States to inform many elements of the criminal process, including decisions about pre-trial plea negotiations; ‘jail programming’ requirements; community referrals; bail applications; sentencing, supervision, and probation recommendations; and the frequency and nature of post-release contact.Footnote ²⁷

Loomis’ PSI included both risk scores and a full criminogenic assessment, and each assessment informed the trial court’s conclusion that the ‘high risk and the high needs of the defendant’ warranted a six-year prison sentence with extended supervision.Footnote ²⁸ Loomis filed a motion for post-conviction relief, arguing that the court’s reliance on COMPAS violated his ‘due process’ rights in three ways: first, Loomis argued that ‘the proprietary nature of COMPAS’ prevented him from assessing the accuracy of predictive determinations;Footnote ²⁹ second, Loomis argued that use of COMPAS denied him the right to an ‘individualized’ sentence;Footnote ³⁰ finally, he argued that COMPAS ‘improperly uses gendered assessments’.Footnote ³¹ The trial court denied the post-conviction motion, and the Wisconsin Court of Appeals certified the appeal to the Supreme Court of Wisconsin (SCW).

Giving the majority judgment, Ann Walsh Bradley J. rejected the claim that Loomis had a right to see the internal workings of the COMPAS algorithms; it was, she said, enough that the statistical accuracy of the COMPAS risk scales had been verified by external studies,Footnote ³² and that Loomis had access to his own survey responses and COMPAS output.Footnote ³³ She noted that ‘some studies of COMPAS risk assessment have raised questions about whether they disproportionately classify minority offenders as having a higher risk of recidivism’.Footnote ³⁴ Nevertheless, the judge felt that this risk could be mitigated by requiring that the sentencing court be provided with an explanatory statement outlining possible shortcomings in overall risk prediction and the distribution of error.Footnote ³⁵

Addressing Loomis’ argument that use of the COMPAS scores infringed his right to an ‘individualized’ sentence, the judge considered that ‘[i]f a COMPAS risk assessment were the determinative factor considered at sentencing this would raise due process challenges regarding whether a defendant received an individualized sentence’.Footnote ³⁶ By contrast, ‘a COMPAS risk assessment may be used to enhance a judge’s evaluation, weighing, and application of the other sentencing evidence in the formulation of an individualized sentencing program appropriate for each defendant’,Footnote ³⁷ as ‘one tool available to a court at the time of sentencing’.Footnote ³⁸ The judge emphasised that the court, like probation officers, should feel empowered to disagree with algorithmic predictions as and where necessary.Footnote ³⁹

Finally, the judge rejected Loomis’ arguments about the ‘inappropriate’ use of gendered assessments, noting that ‘both parties appear to agree that there is statistical evidence that men, on average, have higher recidivism and violent crime rates compared to women’.Footnote ⁴⁰ Indeed, the judge concluded that ‘any risk assessment which fails to differentiate between men and women will misclassify both genders’.Footnote ⁴¹

Applying these considerations to the instant case, the judge concluded that there had been no failure of due process, because the COMPAS score had been ‘used properly’.Footnote ⁴² Specifically, ‘the circuit court explained that its consideration of the COMPAS risk scores was supported by other independent factors, its use was not determinative in deciding whether Loomis could be supervised safely and effectively in the community’.Footnote ⁴³

Human reasoning clearly feeds into processes of AI design and development, and humans are often needed to use the predictive outputs of algorithmic processes to make decisions. The question is whether the SCW was correct to conclude that, even if there are doubts about the quality of the algorithmic assessment (overall accuracy, distribution of the risk of error, or some other concern), human supervision at the time of decision-making is a sufficient safeguard against unjust decisions.

11.3 Individualism and Relevance

Justice is sometimes described as an ‘individualistic’ exercise, concerned with the ‘assessment of individual outcomes by individualized criteria’.Footnote ⁴⁴ Prima facie, this seems to be a poor fit for the use of statistics to make decisions about how to treat others. As a science, ‘statistics’ is the practice of amassing numerical data about a subset of some wider population or group, for the purpose of inferring conclusions from the former about the latter. And in Scanlon’s words, ‘statistical facts about the group to which a person belongs do not always have the relevant justificatory force’.Footnote ⁴⁵

But we often make just decisions by reference to the characteristics of a group to which the decision-subject belongs. During the COVID-19 pandemic, decisions about how to prioritise vaccination and treatment were made by governments and doctors across the world on the basis of facts about individuals that were shared with a representative sample of the wider population. There being statistical evidence to demonstrate that those with respiratory or auto-immune conditions were at an aggravated risk of serious harm, patients with these conditions were often prioritised for vaccination, whilst mechanical ventilation was reserved for seriously ill patients who were likely to survive treatment.Footnote ⁴⁶ Making ‘individualised’ decisions does not require us to ignore relevant information about other people; it simply requires us not to ignore relevant information about the decision-subject.

In this context, ‘relevant’ means rationally related to the social goal of improving health outcomes. A doctor ought to consider features of particular patients’ circumstances that shape their needs and likely treatment outcomes. She might, for instance, decide to ventilate an older but healthy patient – taking into account the patient’s age and an assessment of their overall well-being to conclude that treatment survival is highly likely. This is an ‘individualised’ assessment, in that it takes into account relevant facts, which are characteristics that this patient shares with others. By contrast, her decision should be unaffected by facts that do not bear on treatment success, such as whether the patient is a family member.

So, to justify a policy that imposes a burden on some people for the sake of a social goal, the policy must aim at some justified social goal, to which our selection criteria must be rationally related. The next question is whether ADM and AI tools can help us to make decisions on the basis of (all and only) relevant criteria.

11.4 Statistical Rules and Relevance

In 1943, Sarbin published the results of a study comparing the success of ‘actuarial’ (statistical) and ‘clinical’ (discretionary) methods of making predictions.Footnote ⁴⁷ The goal of the exercise was to determine which method would predict academic achievement more accurately. To conduct the experiment, Sarbin chose a sample of 162 college freshmen, and recorded honor-point ratios at the end of the first quarter of their freshman year.Footnote ⁴⁸

Actuarial assessments were limited and basic: they were made by entering two variables (high school percentile rank and score on college aptitude test) into a two-variable regression equation. Individual assessments were made by the university’s clinical counsellors and included a far broader range of variables: an interviewer’s form and impressions; test scores for aptitude, achievement, vocation, and personality; and the counsellor’s own impressions.
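To illustrate what a ‘two-variable regression equation’ of this kind involves, the following is a minimal sketch of an ordinary least-squares fit with two predictors. It is not Sarbin’s equation: the function names, the sample values, and the coefficients used to generate them are all hypothetical. The two inputs simply stand in for high school percentile rank and college aptitude test score.

```python
from statistics import mean


def fit_two_variable(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 via the normal equations."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    d1 = [a - m1 for a in x1]
    d2 = [b - m2 for b in x2]
    dy = [c - my for c in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


def predict(coefs, rank, aptitude):
    """Apply the fitted equation to one student's two scores."""
    b0, b1, b2 = coefs
    return b0 + b1 * rank + b2 * aptitude


# Made-up sample: percentile rank, aptitude score, honor-point ratio.
ranks = [50, 60, 70, 80, 90]
aptitudes = [40, 70, 55, 90, 65]
ratios = [0.5 + 0.02 * r + 0.01 * a for r, a in zip(ranks, aptitudes)]

coefs = fit_two_variable(ranks, aptitudes, ratios)
print(predict(coefs, rank=75, aptitude=60))
```

Once the three coefficients are fixed, every student is scored by the same arithmetic, which is what distinguishes the actuarial method from the counsellors’ case-by-case judgment.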

Sarbin found that the actuarial method was more successful by a small margin than the individual method at predicting academic achievement, concluding that ‘any jury sitting in judgment on the case of the clinical versus the actuarial methods must on the basis of efficiency and economy declare overwhelmingly in favour of the statistical method for predicting academic achievement’.Footnote ⁴⁹

Many other studies have produced similar results across a range of different areas of decision-making, including healthcare, employee performance, and recidivism.Footnote ⁵⁰ Conrad and Satter compared statistical and discretionary predictions about the success of naval trainees in an electrician’s mate school.Footnote ⁵¹ They pitted the output of a two-factor regression equation (electrical knowledge and arithmetic reasoning test scores) against the predictions of interviewers on the basis of test scores, personal history data, and interview impressions. Their conclusions favoured the statistical method.

In principle, human reasoning that is unconstrained by (statistical or other) rules can be sensitive to a limitless range of relevant facts. But there are several caveats to this promising start. First, humans are easily influenced by irrelevant factors, or over-influenced by relevant factors, and extremely poor at recognising when we have been influenced in this way. There is now a great deal of literature detailing the many ‘cognitive biases’ that affect our decision-making, such as: ‘illusory correlation’ (hallucinating patterns from a paucity of available data) and ‘causal thinking’ (attributing causal explanations to those events).Footnote ⁵²

Second, the availability of more information does not necessarily translate into a broader decision process. Indeed, Sarbin found that the high-school rank and college aptitude test accounted for 31 per cent of the variance in honour-point ratio and for 49 per cent in the clinical predictions in his experimentFootnote ⁵³ – which is to say, the counsellors overweighted these two factors, and did not take into account any other measures available to them in a systematic way.
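Figures such as ‘31 per cent of the variance’ are shares of variance explained, commonly written as R². As a minimal sketch, and assuming the standard definition rather than Sarbin’s own computation, the share can be calculated from a set of target values and the values fitted from the predictors. The function name and the sample numbers are hypothetical.

```python
def r_squared(target, fitted):
    """Share of the variance in `target` accounted for by `fitted` values."""
    m = sum(target) / len(target)
    ss_total = sum((t - m) ** 2 for t in target)
    ss_residual = sum((t - f) ** 2 for t, f in zip(target, fitted))
    return 1 - ss_residual / ss_total


# A perfect fit explains all of the variance; predicting the mean explains none.
print(r_squared([1, 2, 3], [1, 2, 3]))
print(r_squared([1, 2, 3], [2, 2, 2]))
```

The comparison in the text uses the same two predictors against two different targets: the students’ actual honour-point ratios and the counsellors’ predictions. A higher share for the second target indicates that the counsellors’ judgments tracked the two test-based factors more closely than actual achievement did.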

Thus, this theoretical advantage often fails to translate into better decision-making. Yet, AI and ADM tools are no panacea for decision-making under conditions of uncertainty. Predictive success depends on many factors, one of which is the relationship between the chosen proxy and the social goal in question. Sarbin himself noted the limitations of using honour-point ratio as a proxy for academic achievement,Footnote ⁵⁴ and the same concerns arise in many other areas of decision-making. For instance, predictions about recidivism are hampered by the fact that crime reports, arrest, and conviction data poorly mirror the actual incidence of crime.

Predictive success also depends upon the quality of the data, including whether that data is representative of the wider target population. The anti-coagulant medication warfarin is regularly prescribed to patients on the basis of dosing algorithms, which incorporate race as a predictor along with clinical and genetic factors.Footnote ⁵⁵ Yet, most of the studies used to develop these algorithms were conducted in cohorts with >95 per cent white European ancestry, and there is now robust evidence that these algorithms assign a ‘lower-than-needed dose’ to black patients, putting them at serious risk of heart attack, stroke, and pulmonary embolism.Footnote ⁵⁶

The Model for End-Stage Liver Disease (MELD) is used to calculate pre-treatment survival rates in liver transplant patients, on the basis of factors such as levels of bilirubin and creatinine in the blood. MELD scores are used to make decisions about which patients to prioritise for transplant. Yet, the MELD was developed on the basis of several studies that either did not report sex data, or which reported a statistical makeup of 70 per cent men (without disaggregating data in either case),Footnote ⁵⁷ and a recent study has found that women have a 19 per cent increased risk of wait-list mortality compared to men with the same MELD scores.Footnote ⁵⁸

So, AI and ADM tools can sometimes help us to make decisions on the basis of criteria that are rationally related to our social goal. Whether they do have this effect depends (inter alia) upon the quality of the data and the relationship between the chosen proxy and social goal in question. Yet, there may be countervailing reasons to exclude certain relevant factors from the decision-making process. I turn to these considerations now.

11.5 Choice

Overdose deaths from opioids across the United States increased to 75,673 in the twelve-month period ending in April 2021, up from 56,064 the year before.Footnote ⁵⁹ In 2020, more people in San Francisco died of opioid overdoses than of COVID-19.Footnote ⁶⁰ A significant portion of that uptick has been attributed to a pattern of aggressive and successful marketing of the prescription opioid OxyContin between 1996 and 2010. When OxyContin was reformulated in 2010 to make it more difficult to abuse, many of those who were addicted to prescription opioids switched to heroin and, eventually, fentanyl. One study found that 77 per cent of individuals who used both heroin and nonmedical pain relievers between 2011 and 2014 had initiated their drug use with prescription opioids,Footnote ⁶¹ and there is now a broad consensus that the introduction of OxyContin can ‘explain a substantial share of overdose deaths’ over twenty years.Footnote ⁶²

Many different measures have been taken to prevent addiction and abuse, and to support those who are suffering from addiction. One preventative measure is the Opioid Risk Tool (ORT), which was published in 2005 on the basis of several studies that identified correlations between certain facts and opioid misuse.Footnote ⁶³ This questionnaire, which is used in several jurisdictions across the world, consists of ten scorable components, including family or personal history of substance abuse or psychological disorder; patient age; and (if the patient is female) a history of preadolescent sexual abuse.

According to Webster, author of the ORT, his goal was ‘to help doctors identify patients who might require more careful observation during treatment, not to deny the person access to opioids’.Footnote ⁶⁴ Yet, the ORT is in fact used in clinical practice to decide whether to deny or withdraw medical treatment from patients,Footnote ⁶⁵ which has had a severe impact on patients, particularly women, who suffer from severe and chronic pain.Footnote ⁶⁶ High ORT scores have resulted in the termination of doctor–patient relationships, as well as attracting negative interpersonal treatment by members of medical staff, adding emotional distress to physical pain.Footnote ⁶⁷

Many authors have objected to use of the ORT to make prescribing decisions on the basis that this practice discriminates against women.Footnote ⁶⁸ Yet, ‘discrimination’ is an umbrella term. The wrongfulness of discrimination lies in the fact that the characteristics upon which we make decisions that disadvantage certain groups do not justify that treatment,Footnote ⁶⁹ and there are different reasons to object to policies that have this effect.

The first reason that we might invoke to object to decision-making policies or practices that rely upon the ORT is that our decisions are based on criteria (such as the preadolescent sexual abuse of women) that are not rationally related to the social goal of preventing and reducing opioid addiction.Footnote ⁷⁰ The second reason concerns the broader significance of this failure to develop and implement sound medical policy. It might, for instance, indicate that policymakers have taken insufficient care to investigate the connection between the sexual abuse of women and opioid abuse. When the consequence is placing the risk of predictive error solely upon women, the result is a failure to show equal concern for the interests of all citizens.Footnote ⁷¹ Finally, we might object to use of the ORT on the basis that the policy reflects a system in which women are treated as having a lower status than men – a system in which practices of exclusion are stable, so that women are generally denied opportunities for no good reason.Footnote ⁷²

But there is also an objection to policies that rely upon the ORT that has nothing to do with inequality. The argument is that, when we impose burdens on some people for the sake of some benefit to others, we should (wherever possible) give those people the opportunity to avoid those burdens by choosing appropriately. Policies that impose burdens upon individuals on the basis of facts about the actions of others, such as sexual abuse and patterns of family drug abuse, deny those opportunities.

Take the following hypothetical, which I adapt from Scanlon’s What We Owe to Each Other:Footnote ⁷³

Hazardous Waste: hazardous waste has been identified within a city’s most populous residential district. Moving the waste will put residents at risk by releasing some chemicals into the air. However, leaving the waste in place, where it will seep into the water supply, creates a much greater risk of harm. So, city officials decide to take the necessary steps to move and dispose of the waste as safely as possible.

City officials have an important social goal, of keeping people safe. That goal involves the creation of a ‘zone of danger’ – a sphere of activity that residents cannot perform without serious risk of harm. Accordingly, to justify such a policy, officials need to take precautions that put people in a sufficiently good position to take actions to avoid suffering the harm. They should fence the sites and warn people to stay indoors and away from the excavation site – perhaps by using posters, mainstream media, or text message alerts.

Scanlon uses this hypothetical to explore the justification for the substantive burdens imposed by criminal punishment.Footnote ⁷⁴ There is an important social goal – keeping us safe. The strategy for attaining this goal entails imposing a burden – denying a person some privilege, perhaps even their liberty. Thus, there is now a zone into which people cannot go (certain activities that they cannot perform) without risk of danger. To justify a policy of deliberately inflicting harm on some people, we should give those people a meaningful opportunity to avoid incurring that burden, which includes communicating the rules and consequences of transgression, and providing opportunities for people to live a meaningful life without transgression.

We can apply this logic to the ORT. The ORT was created with an important social goal in mind: preventing opioid misuse and addiction. A zone of danger is created to further that goal: certain patients are denied opioids, which includes withdrawing treatment from those already receiving pain medication, and may include terminating doctor–patient relationships. Patients may also suffer the burden of negative attitudes by medical staff, which may cause emotional suffering and/or negative self-perception. Yet, this time, the patient has no opportunity to avoid the burden of treatment withdrawal: that decision is made on the basis of facts about the actions of others, such as the decision-subject’s experience of sexual abuse and/or a family history of drug abuse.

The question, then, is how human oversight bears on these goals: first, making sure that decisions about how to impose burdens on certain individuals for the sake of some social good take into account all and only relevant facts about those individuals; second, making sure that our decisions do not rely upon factors that (even if relevant) we have reason to exclude. In the rest of this chapter, I will look at the knowledge that we need to assess algorithmic predictions, and the threshold against which we make that assessment. I argue that those elements differ markedly according to whether the prediction in question is used to supply information about what a particular decision-subject will do in the future.

11.6 Group One: Predictions about Facts Other Than What the Decision-Subject Will Do

The first set of cases are those in which the predictive question is about the (current or future) presence of something other than the actions of the decision-subject, such as: the success of a particular course of medical treatment, or the patient’s chances of survival without it; social need and the effectiveness of public resourcing; and forensic assessments (e.g., serology or DNA matching). To know whether we are justified in relying upon the predictive outputs of AI and ADM tools in this category, we need to determine whether the algorithmic prediction is more or less accurate than unaided human assessment, and how the risk of error is distributed amongst members of the population.

There are three modes of assessing AI and ADM tools that we might usefully distinguish. The first we can call ‘technical’, which involves understanding the mechanics of the AI/ADM tool, or ‘opening the black box’. The second is a statistical assessment: we apply the algorithm to a predictive task across a range of data, and record overall success and distribution of error. The final mode of assessment is normative: it involves identifying reasons for predictive outputs, by exploring different counterfactuals to determine which facts informed the prediction.

To perform the second and third modes of assessment, we do not need to ‘open the black box’: the second can be performed by applying the algorithm to data and recording its performance; the third can be performed by applying the algorithm to data and incrementally adjusting the inputs to identify whether and how that change affects the prediction.Footnote ⁷⁵
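The second and third modes of assessment can be sketched in a few lines of Python. This is a minimal illustration only: the function names, the `toy_model`, its input fields, and the sample cases are all hypothetical and are not drawn from any real tool discussed in this chapter. The point is that both assessments treat the predictor as an opaque function, using nothing but its inputs and outputs.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, float]
Model = Callable[[Record], int]  # opaque predictor: returns 0 or 1


def statistical_assessment(
    model: Model, cases: List[Tuple[Record, int]], group_key: str
) -> Tuple[float, Dict[float, float]]:
    """Second mode: overall success rate and error rate per group."""
    tallies: Dict[float, Tuple[int, int]] = {}  # group -> (errors, total)
    correct = 0
    for features, outcome in cases:
        wrong = model(features) != outcome
        correct += not wrong
        errors, total = tallies.get(features[group_key], (0, 0))
        tallies[features[group_key]] = (errors + wrong, total + 1)
    accuracy = correct / len(cases)
    error_rates = {g: e / n for g, (e, n) in tallies.items()}
    return accuracy, error_rates


def counterfactual_probe(
    model: Model, features: Record, alternatives: Record
) -> List[str]:
    """Third mode: which single-input changes flip the prediction?"""
    baseline = model(features)
    influential = []
    for name, alternative in alternatives.items():
        changed = dict(features, **{name: alternative})
        if model(changed) != baseline:
            influential.append(name)
    return influential


# Hypothetical stand-in for a proprietary tool; we never inspect its body.
def toy_model(r: Record) -> int:
    return int(r["prior_misuse"] + r["family_history"] >= 1)


# Hypothetical (inputs, observed outcome) pairs, invented for illustration.
cases = [
    ({"prior_misuse": 1, "family_history": 0, "group": 0}, 1),
    ({"prior_misuse": 0, "family_history": 1, "group": 1}, 0),
    ({"prior_misuse": 0, "family_history": 0, "group": 1}, 0),
    ({"prior_misuse": 0, "family_history": 0, "group": 0}, 0),
]

accuracy, error_rates = statistical_assessment(toy_model, cases, "group")
reasons = counterfactual_probe(
    toy_model, cases[1][0], {"prior_misuse": 1, "family_history": 0}
)
```

In this toy setting, the statistical assessment shows that all of the error falls on one group, and the counterfactual probe shows that the second subject's score turns on family history rather than on anything the subject has done. Real assessments would need far larger samples and more careful perturbation of correlated inputs.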

To know whether the AI/ADM tool performs better than unaided human discretion, we must perform a statistical assessment. We need not perform either the first or third mode of assessment: we do not need to know the internal workings of the algorithm,Footnote ⁷⁶ and we do not need to know the reasons for the prediction.

TrueAllele, developed by Cybergenetics and launched in 1994, is an ADM tool that can process complex mixtures of DNA (DNA from multiple sources, in unknown proportions). Prior to the development of sophisticated AI/ADM tools, human discretion was required to process mixtures of DNA (unlike single-source samples), with poor predictive accuracy.Footnote ⁷⁷ Probabilistic genotyping is the next step in forensic DNA, replacing human reasoning with algorithmic processing.

Like COMPAS, the TrueAllele software is proprietary.Footnote ⁷⁸ In Commonwealth v Foley,Footnote ⁷⁹ which concerned the defendant’s appeal against a murder conviction, one question amongst others was whether this obstacle to accessing the code itself rendered TrueAllele evidence inadmissible in court. On appeal, the defendant argued that the trial court had erred in admitting the testimony of one Dr Mark Perlin, an expert witness for the prosecution, who had communicated the results of a TrueAllele assessment to the Court.

In Foley, a sample containing DNA from the victim and another unknown person was found underneath the fingernail of the victim. The mixed sample was tested in a lab, and Perlin testified that the probability that this unknown person was someone other than the defendant was 1 in 189 billion.Footnote ⁸⁰ The defendant argued that the testimony should be excluded because ‘no outside scientist can replicate or validate Dr Perlin’s methodology because his computer software is proprietary’.Footnote ⁸¹ On appeal, the Court concluded that this argument ‘is misleading because scientists can validate the reliability of a computerized process even if the “source code” underlying that process is not available to the public’.Footnote ⁸²

The TrueAllele prediction is not about what the defendant has done; assessments of guilt or innocence are assessments that the Court (official or jury) must make. Rather, it is about the likelihood of a DNA match – specifically, that the unknown contributor to the DNA sample was someone other than the defendant. In this category of case, I have argued that the Court was correct to indicate that a statistical assessment is sufficient – if such an assessment is sufficiently robust.Footnote ⁸³

If the statistical assessment reveals a rate and distribution of predictive success that is equal to or better than unaided human decision-making, we can justify using the prediction to make decisions. And if it is, we should do so consistently, resisting the urge to apply our own discretion to predictions. Of course, we will often take into account the margin of error when applying our judgement to the algorithmic output. For instance, if the TrueAllele assessment is only 97 per cent accurate, this ought to affect the weight that we assign to that output in drawing a conclusion about guilt or innocence. But that is a very different exercise from using human judgement to determine the probability of a DNA match in the first place.

If, by contrast, the statistical assessment reveals a rate and distribution of predictive success that is worse than unaided human decision-making, we cannot justify using the prediction to make decisions; there is no meaningful sense in which individual decision-makers can compensate for predictive flaws on an ad hoc basis, and no reason to try, given the availability of a better alternative.

In Loomis, the SCW concluded that wrinkles in the COMPAS assessment process and output could be remedied by the application of discretion: ‘[j]ust as corrections staff should disregard risk scores that are inconsistent with other factors, we expect that circuit courts will exercise discretion when assessing a COMPAS risk score with respect to each individual defendant’.Footnote ⁸⁴ This, I have argued, is an unhappy compromise: either the AI/ADM tool has a better rate and distribution of error, in which case we should not be tempted to override the prediction by applying a clinical assessment, or the AI/ADM tool has a worse rate and distribution of error, in which case unaided human decision-making should prevail unless and until a comprehensive and systematic effort can be made to revise the relevant algorithm.

11.7 Group Two: Predictions about What the Decision-Subject Will Do

The second type of case involves the use of AI and ADM tools to make predictive assessments about what the decision-subject will do. This includes, for instance, whether they will misuse drugs or commit a crime, how they will perform on an assessment, or whether they will be a good employee or adoptive parent. To assess whether we are justified in using the predictive outputs of this category of AI and ADM tool, we need to know the facts upon which the prediction is based. This requires us to conduct a counterfactual assessment.

If the prediction is based only on facts that relate to the past actions of the decision-subject, and if the decision-subject has been given a meaningful opportunity to avoid incurring the burden, we may be justified in using the outputs to inform decisions. Whether we are justified will also turn on the same assessment that we made above: statistical accuracy and the distribution of error. But if the algorithmic output is not based only upon facts that relate to the past actions of the decision-subject, we cannot justify using it to make decisions. If we do so, we deny the decision-subject the opportunity to avoid the burden by choosing appropriately.

Those who have evaluated COMPAS have challenged both its overall predictive success, and its distribution of the risk of error.Footnote ⁸⁵ But there is an additional problem: each of the COMPAS assessments, most notably the wider ‘criminogenic need’ assessment, takes into account a range of facts that either have nothing to do with the defendant’s actions (such as family background), or which are linked to actions that the defendant could never reasonably have suspected would result in criminal punishment (such as choice of friends or ‘associates’). Thus, they deny the defendant a meaningful opportunity to choose to act in a manner that will avoid the risk of criminal punishment. And if the prediction takes into account facts that we have good reason to exclude from the decision, the solution is not to give the predictive output less weight (by applying human discretion). It is to give it no weight at all.

11.8 Safeguards

We cannot safeguard effectively against unjust decisions by applying human discretion to a predictive output at the time of decision-making. Appropriate ‘safeguarding’ means ensuring that the decision-making tools that we use take into account the right information in the right way, long before they enter our decision-making fora. I have made some concrete recommendations about how to determine whether the ADM/AI tool meets that threshold, which I summarise here.

The first question we should ask is this: is the prediction about what the decision-subject will do? If the answer to that question is no, we can in principle justify using the ADM/AI tool. Whether we can in practice turns on its predictive success – its overall success rate, and how the risk of error is distributed. We can assess these things statistically – without ‘opening the black box’, and without identifying reasons for any given prediction. If the ADM/AI tool fares as well as or better than humans, we can use it, and we can offer explanations to the decision-subject that are based on how we use it. If it does not fare as well as or better than humans, we cannot.

If the prediction is about what the decision-subject will do, we need to know the reasons for the prediction, which we can determine by using the counterfactual technique. We can only justify using the ADM/AI tool if three conditions are satisfied: (i) as above, the prediction is accurate and the risk of error is distributed evenly; (ii) the prediction is based solely on what the decision-subject has done; and (iii) the decision-subject has had sufficient opportunity to discover that those actions could result in these consequences.
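The two-branch test summarised above can be written out as a short decision procedure. The following Python sketch is only an illustration: the `ToolAssessment` record, its field names, and the example values are hypothetical, and each boolean stands in for a judgement that would in practice require the statistical and counterfactual assessments described earlier. In particular, folding overall accuracy and the distribution of error into a single flag is a simplification.

```python
from dataclasses import dataclass


@dataclass
class ToolAssessment:
    # Is the prediction about what the decision-subject will do?
    predicts_subject_action: bool
    # Simplification: overall success rate and distribution of error,
    # compared with unaided human decision-making, as one flag.
    passes_statistical_assessment: bool
    # The next two only matter when predicts_subject_action is True.
    based_solely_on_subject_actions: bool = False
    subject_had_fair_notice: bool = False


def may_rely_on_tool(a: ToolAssessment) -> bool:
    """Return True only if the chapter's conditions for reliance are met."""
    if not a.passes_statistical_assessment:
        return False  # required in both groups
    if not a.predicts_subject_action:
        return True  # Group One: statistical assessment suffices
    # Group Two: conditions (ii) and (iii) must also hold.
    return a.based_solely_on_subject_actions and a.subject_had_fair_notice


# Hypothetical inputs, not findings about any real tool.
forensic_match_tool = ToolAssessment(
    predicts_subject_action=False,
    passes_statistical_assessment=True,
)
risk_score_using_family_background = ToolAssessment(
    predicts_subject_action=True,
    passes_statistical_assessment=True,
    based_solely_on_subject_actions=False,
    subject_had_fair_notice=True,
)

forensic_ok = may_rely_on_tool(forensic_match_tool)
risk_ok = may_rely_on_tool(risk_score_using_family_background)
```

Note that the function returns a yes-or-no answer rather than a weight: on the argument of this chapter, a tool that fails any applicable condition should be given no weight at all, not a discounted one.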

It bears emphasis that the concern about policies that deny individuals a meaningful opportunity to avoid incurring certain burdens is not confined to the sphere of ADM. Courts in Wisconsin are permitted to take into account educational background and PSI results in sentencing decisions,Footnote ⁸⁶ and the Wisconsin DOC directs agents completing the PSI to take into account a range of factors that include: intelligence; physical health and appearance; hygiene and nutrition; use of social security benefits or other public financial assistance; the nature of their peer group; and common interests with gang-affiliated members.Footnote ⁸⁷ Thus, safeguarding efforts should not merely be directed towards ADM; they should take into account the broader law and policy landscape, of which ADM forms one part.

When we impose burdens on some people for the sake of some benefit to others, we should (wherever possible) present these people with valuable opportunities to avoid those burdens by choosing appropriately. And when the burdens that we impose are as exceptional as criminal incarceration, this requirement is all the more urgent: we cannot justify sending people to prison because they received poor grades in school, because their parents separated when they were young, or because of choices that their friends or family have made; we must base our decision on the choices that they have made, given a range of meaningful alternatives.

12 Against Procedural Fetishism in the Automated State

Monika Zalnieriute

12.1 Introduction

The infamous Australian Robodebt scheme and the application of the COMPAS tool in the United States are just two examples of abuse of power in the Automated State. However, our efforts to tackle these abuses have largely failed: corporations and states have used AI to influence many crucial aspects of our public and private lives, from our elections to our personalities and emotions, and from environmental degradation through the extraction of global resources to labour exploitation. And we do not know how to tame them. In this chapter I suggest that our efforts have failed because they are grounded in what I call procedural fetishism – an overemphasis on procedural safeguards and an assumption that transparency and due process can temper power and protect the interests of people in the Automated State.

Procedural safeguards, rules and frameworks play a valuable role in regulating AI decision-making and directing it towards accuracy, consistency, reliability, and fairness. However, procedures alone can be dangerous: they can legitimize excessive power and obfuscate the largest substantive problems we are facing today. In this chapter, I show how procedural fetishism acts as an obfuscation and redirection of the public from more substantive and fundamental questions about the concentration and limits of power to procedural micro-issues and safeguards in the Automated State. Such redirection merely reinforces the status quo. Procedural fetishism detracts from the questions of substantial accountability and obligations by diverting attention to ‘fixing’ procedural micro-issues that have little chance of changing the political or legal status quo. The regulatory efforts and scholarly debate, plagued by procedural fetishism, have been blind to colonial AI extraction practices, labour exploitation, and the dominance of US tech companies, as if they did not exist. Procedural fetishism – whether corporate or state – is dangerous. Not only does it defer social and political change, it also legitimizes corporate and state influence and power under an illusion of control and neutrality.

To rectify the imbalance of power between people, corporations, and states, we must shift the focus from soft law initiatives to substantive accountability and tangible legal obligations for AI companies. Imposing data privacy obligations directly upon AI companies with an international treaty is one (but not the only) option. The viability of such an instrument has been doubted: human rights law and international law, so it goes, are state-centric. Yet, as data protection law illustrates, we already apply (even if poorly) certain human rights obligations to private actors. Similarly, the origins of international law date back to powerful corporations that were the ‘Googles’ and ‘Facebooks’ of their time. In parallel to such a global instrument on data privacy, we must also redistribute wealth and power by breaking up and taxing AI companies, increase public scrutiny by adopting prohibitive laws, and democratize AI technologies by making them public utilities. Crucially, we must recognize colonial AI practices of extraction and exploitation and pay attention to the voices of Indigenous peoples and communities of the so-called Global South. With all these mutually reinforcing efforts, a new AI regulation will resist procedural fetishism and establish a new social contract for the age of AI.

12.2 Existing Efforts to Tame AI Power

Regulatory AI efforts cover a wide range of policies, laws, and voluntary initiatives: at the national level, domestic constitutions, laws, and judicial decisions; regional and international instruments and jurisprudence; self-regulatory initiatives; and transnational non-binding guidelines developed by private actors and NGOs.

Many recent AI regulatory efforts aim to tackle private tech power with national laws. For example, in the United States, five bipartisan bills collectively referred to as ‘A Stronger Online Economy: Opportunity, Innovation and Choice’ have been proposed and seek to restrain tech companies’ power and monopolies.Footnote ¹ In China, AI companies once seen as untouchable (particularly Alibaba and Tencent) faced a tough year in 2021.Footnote ² For example, the State Administration for Market Regulation (SAMR) took aggressive steps to rein in monopolistic behaviour, levying a record US$2.8 billion fine on Alibaba.Footnote ³ AI companies are also facing regulatory pressure in Australia targeting anti-competitive behaviour.Footnote ⁴

At a regional level, perhaps the strongest example of AI regulation is in the European Union, where several prominent legislative proposals have been tabled in recent years. The Artificial Intelligence ActFootnote ⁵ and the Data ActFootnote ⁶ aim to limit the use of AI and ADM systems. These proposals build on the EU’s strong track record in the area: for example, the EU General Data Protection Regulation (GDPR)Footnote ⁷ has regulated the processing of personal data. The EU has been leading AI regulatory efforts on a global scale, with its binding laws and regulations.

On an international level, many initiatives have attempted to draw the boundaries of appropriate AI use, often resorting to the language of human rights. For example, the Organisation for Economic Co-operation and Development (OECD) adopted AI Principles in 2019,Footnote ⁸ which draw inspiration from international human rights instruments. However, despite the popularity of the human rights discourse in AI regulation, international human rights instruments, such as the International Covenant on Civil and Political RightsFootnote ⁹ or the International Covenant on Economic, Social and Cultural Rights,Footnote ¹⁰ are not directly binding on private companies.Footnote ¹¹ Instead, various networks and organizations try to promote human rights values among AI companies.

However, these efforts to date have been of limited success in taming the power of AI, and dealing with global AI inequalities and harms. This weakness stems from the proceduralist focus of AI regulatory discourse: proponents have assumed that procedural safeguards, transparency and due process can temper power and protect the interests of people against the power wielded by AI companies (and the State) in the Automated State. Such assumptions stem from the liberal framework, focused on individual rights, transparency, due process, and procedural constraints, which, to date, AI scholarship and regulation have embraced without questioning their capacity to tackle power in the Automated State.

The assumptions are closely related to the normative foundations of AI and automated decision-making systems (ADMS) governance, which stem, in large part, from a popular analogy between tech companies and states: how AI companies exert quasi-sovereign influence over commerce, speech and expression, elections, and other areas of life.Footnote ¹² It is also this analogy, and the power of the state as the starting point, that leads to the proceduralist focus and emphasis in AI governance discourse: just as due process and safeguards constrain the state, they must now also apply to powerful private actors, like AI companies. Danielle Keats Citron’s and Frank Pasquale’s early groundbreaking calls for technological due process have been influential: they showed how constitutional principles could be applied to technology and automated decision-making – by administrative agencies and private actors.Footnote ¹³ Construction of various procedural safeguards and solutions, such as testing, audits, algorithmic impact assessments, and documentation requirements, has dominated AI decision-making and ADMS literature.Footnote ¹⁴

Yet, by placing all our energy on these procedural fixes, we miss the larger picture and are blind to our own coloniality: we rarely (if at all) discuss US dominance in the AI economy, and we seldom mention environmental exploitation and environmental degradation caused by AI and ADMS technologies. We rarely ask how AI technologies reinforce existing power disparities globally between the so-called Global South and Imperialist West/North, how they contribute to climate disaster and exploitation of people and extraction of resources in the so-called Global South. These substantive issues matter, and arguably matter more than the design of a particular AI auditing tool. Yet, we are too busy designing the procedural fixes.

To be successful, AI regulation must resist what I call procedural fetishism – a strategy, employed by AI companies and state actors, to redirect the public from more substantive and fundamental questions about the concentration and limits of power in the age of AI to procedural safeguards and micro-issues. This diversion reinforces the status quo, reinforces Western dominance, and accelerates environmental degradation and the exploitation of postcolonial peoples and resources.

12.3 Procedural Fetishism

Proceduralism, in its broadest sense, refers to ‘a belief in the value of explicit, formalized procedures that need to be followed closely’,Footnote ¹⁵ or ‘the tendency to believe that procedure is centrally important’.Footnote ¹⁶ The term is often used to describe the legitimization of rules, decisions, or institutions through the process used to create them, rather than by their substantive moral value.Footnote ¹⁷ Such a trend towards proceduralism – or what I call procedural fetishism – also dominates our thinking about AI: we believe that having certain ‘safeguards’ for AI systems is inherently valuable, that those safeguards tame power and provide sufficient grounds to trust the Automated State. However, procedural fetishism undermines our efforts for justice for several reasons.

First, procedural fetishism offers an appearance of political and normative neutrality, which is convenient to both AI companies and policymakers, judges, and regulators. Proceduralism allows various actors to ‘remain agnostic towards substantive political and moral values’ when ‘faced with the pluralism of contemporary societies’.Footnote ¹⁸ At the ‘heart’ of all proceduralist accounts of justice, therefore, is the idea that, as individual members of a pluralist system, we may agree on what amounts to a just procedure (if not a just outcome), and ‘if we manage to do so, just procedures will yield just outcomes’.Footnote ¹⁹ However, procedural fetishism enables various actors not only to remain agnostic, but to avoid confrontation with hard political questions. For example, the courts engage in procedural fetishism to appear neutral and avoid tackling the politically difficult questions of necessity, proportionality, legitimacy of corporate and state surveillance practices, and have instead come up with procedural band-aids.Footnote ²⁰ The focus on procedural safeguards provides a convenient way to make an appearance of effort to regulate without actually prohibiting any practices or conduct.

A good example of such a neutralizing appearance of procedural fetishism is found in AI governance’s blind eye to very important policy issues impacted by AI, such as climate change, environmental degradation, and continued exploitation of resources from so-called Third World countries. The EU and US-dominated AI debate has focused on inequalities reinforced through AI in organizational settings in business and public administration, but it has largely been blind to the inequalities of AI on a global scale,Footnote ²¹ including global outsourcing of labour,Footnote ²² and the flow of capital through colonial and extractive processes.Footnote ²³ While it is the industrial nations in North America, Europe, and East Asia who compete in the ‘race for AI’,Footnote ²⁴ AI and ADM systems depend on global resources, most often extracted from the so-called Global South.Footnote ²⁵ Critical AI scholars have analyzed how the production of capitalist surplus for a handful of big tech companies draws on large-scale exploitation of the soil, minerals, and other resources.Footnote ²⁶ Other critical scholars have described the processes of extraction and exchange of personal data itself as a form of dispossession and data colonialism.Footnote ²⁷ Moreover, AI and ADM systems have also been promoted as indispensable tools in international developmentFootnote ²⁸ but many have pointed out how those efforts often reinforce further colonization and extraction.Footnote ²⁹ Procedural fetishism also downplays the human labour involved in AI technologies, which draws on the underpaid, racialized, and not at all ‘artificial’ human labour primarily from the so-called Global South. The AI economy is one in which highly precarious working conditions for gig economy ‘click’ workers are necessary for the business models of AI companies.

12.3.1 Legitimizing Effect of Procedural Fetishism

Moreover, procedural fetishism is used strategically not only to distract from power disparities but also to legitimize unjust and harmful AI policies and actions by exploiting people’s perceptions of legitimacy and justice. As early as the 1980s, psychological research undermined the traditional view that substantive outcomes drove people’s perception of justice by showing that it was more about the procedure for reaching the substantive outcome.Footnote ³⁰ Many of the ongoing proceduralist reforms, such as Facebook’s Oversight Board, are primarily conceived for this very purpose – to make it look as though Facebook is doing the ‘right thing’ and delivering justice, irrespective of whether substantive policy issues change or not. Importantly, such corporate initiatives divert attention from the problems caused by the global dominance of the AI companies.Footnote ³¹

The language of ‘lawfulness’ and constitutional values, prevalent in AI governance debates, is working as a particularly strong legitimizing catalyst both in public and policy debates. As critical scholars have pointed out, using terminology that is typically employed in the context of elected democratic governments misleads, for it infuses AI companies with democratic legitimacy, and conflates corporate interests with public objectives.Footnote ³²

In the following sections, I suggest that this language is prevalent not by accident, but through sustained corporate efforts to legitimize their power and business models, to avoid regulation, and enhance their reputation for commercial gain. AI companies often come up with private solutions to develop apparent safeguards against their own abuse of power and increase their transparency to the public. Yet, as I have argued earlier, many such corporate initiatives are designed to obfuscate and misdirect policymakers, researchers, and the public in the bid to strengthen their brand and avoid regulation and binding laws.Footnote ³³ AI companies have also successfully corporatized and attenuated the laws and regulations that bind them. Through many procedures, checklists, and frameworks, corporate compliance with existing binding laws has often been a strategic performance, devoid of substantial change in business practices. Such compliance has worked to legitimize business policy and corporate power to the public, regulators, and the courts. In establishing global dominance, AI companies have also been aided by governments.

12.3.2 Procedural Washing through Self-Regulation

First, corporate self-regulatory AI initiatives are often cynical marketing and social branding strategies to increase public confidence in their operations and create a better public image.Footnote ³⁴ AI companies often self-regulate selectively by disclosing and addressing only that which is commercially desirable for them. For example, Google, when creating an Advanced Technology External Advisory Council (Council) in 2019 to implement Google’s AI Principles,Footnote ³⁵ refused to reveal the internal processes that led to the selection of a controversial member, anti-LGBTI advocate and climate change denial sponsor Kay Coles James.Footnote ³⁶ While employees’ activism forced Google to rescind the Council, ironically, this showed Google’s unwillingness to publicly share the selection criteria of their AI governance boards.

Second, AI companies self-regulate only if it pays off for them in the long run, so profit is the main concern.Footnote ³⁷ For example, in 2012 IBM provided police forces in the Philippines with video surveillance technology which was used to perpetuate President Duterte’s war on drugs through extrajudicial killings.Footnote ³⁸ At the time, IBM defended the deal with the Philippines, saying it ‘was intended for legitimate public safety activities’.Footnote ³⁹ The company’s practice of providing authoritarian regimes with technological infrastructure is not new and dates back to the 1930s when IBM supplied the Nazi Party with unique punch-card technology that was used to run the regime’s censuses and surveys to identify and target Jewish people.Footnote ⁴⁰

Third, corporate initiatives also allow AI companies to prevent any regulation of their activities. A good example of pro-active self-regulation is Facebook’s Oversight Board, which reviews individual decisions, and not overarching policies. Thus, the attention is still diverted away from critiquing the legitimacy or appropriateness of Facebook’s AI business practices themselves and is instead focused on Facebook’s ‘transparency’ about them. The appropriateness of the substantive AI policies themselves is obfuscated, or even legitimated, through micro procedural initiatives that have little power to change the status quo. In setting up the board, Facebook has attempted not only to stave off regulation, but also to position itself as an industry regulator by inviting competitors to use the Oversight Board as well.Footnote ⁴¹ AI companies can then depict themselves as their own regulators.

12.3.3 Procedural Washing through Law and Help of State

Moreover, AI companies (and public administrations) have also exploited the ambiguity of laws regulating their behaviour through performative compliance with the laws. Often, policymakers have compounded this problem by creating legal provisions to advance the proceduralist agenda of corporations, including via international organizations and international law, and regulators and courts have enabled corporatized compliance in applying these provisions by focusing on the quality of procedural safeguards.

For instance, Ezra Waldman has shown how the regulatory regime of data privacy, even under the GDPR – the piece of legislation which has gained the reputation as the strongest and most ambitious law in the age of AI – has been ‘managerialized’: interpreted by compliance professionals, human resource experts, marketing officers, outside auditors, and in-house and firm lawyers, as well as systems engineers, technologists, and salespeople to prioritize values of efficiency and innovation in the implementation of data privacy law.Footnote ⁴² As Waldman has argued, many symbolic structures of compliance are created; yet, apart from an exhaustive suite of checklists, toolkits, privacy roles, and professional training, there are hardly substantial actions to enhance consumer protection or minimize online data breaches.Footnote ⁴³ These structures comply with the law in name but not in spirit, which is treated in turn by lawmakers and judges as best practice.Footnote ⁴⁴ The law thus fails to achieve its intended goals as the compliance metric developed by corporations becomes dominant,Footnote ⁴⁵ and ‘mere presence of compliance structures’ is assumed to be ‘evidence of substantive adherence with the law’.Footnote ⁴⁶ Twenty-six recent studies analyzed the impact of the GDPR and US data privacy laws and none have found any meaningful influence of these laws on data privacy protection of the people.Footnote ⁴⁷

Many other laws have themselves been designed in the spirit of procedural fetishism, enabling corporations to avoid liability without changing their substantive policies. Known as ‘safe harbours’, such laws enable companies to avoid liability by simply following a prescribed procedure. For example, under the traditional notice-and-consent regime in the United States, companies avoid liability as long as they post their data use practices in a privacy policy.Footnote ⁴⁸

Regulators and the courts, by emphasizing procedural safeguards, also engage in performative regulation, grounded in procedural fetishism, that limits pressure for stricter laws by convincing citizens and institutions that their interests are sufficiently protected without inquiring into the substantive legality of corporate practices. A good example is the Federal Trade Commission’s (FTC) audits and ‘assessment’ requirements, which require corporations to demonstrate compliance through checklists.Footnote ⁴⁹ Similar procedural fetishism is also prevalent in jurisprudence, which does not assess specific state practices by reference to their effectiveness in advancing the proclaimed goals, but rather purely by reference to the stringency of the procedures governing that practice.Footnote ⁵⁰

12.3.4 Procedural Washing through State Rhetoric and International Law

Procedural washing by AI companies has also been aided by executive governments – both through large amounts of public funding and subsidization to these companies, and through the development of laws, including international laws, that suit corporate and national agendas. Such support is not one-sided, of course: the state expands its economic and geopolitical power through technology companies. All major powers, including the United States, European Union, and China, have been active in promoting their AI companies. For example, the mutually beneficial and interdependent relationship between the US government and information technology giants has been described as the information-industrial-complex, data industrial complex, and so on.Footnote ⁵¹ These insights build on the work of Herbert Schiller, who described the continuous subsidization by the US government of private communications companies back in the 1960s and 1970s.Footnote ⁵² For example, grounding their work on classical insights, Powers and Jablonski describe how the dynamics of the information-industrial-complex have catalyzed the rapid growth of information and communication technologies within the global economy while firmly embedding US strategic interests and companies at the heart of the current neoliberal regime.Footnote ⁵³ Such a central strategic position necessitates continuous action and support from the US government.

To maintain the dominance of US AI companies internationally, the US government aggressively promotes the global free trade regime, intellectual property enforcement, and other policies that suit US interests. For example, the dominance of US cultural and AI products and services worldwide is secured via the free flow of information doctrine at the World Trade Organization, which the US State Department pushed with the GATT, GATS, and TRIPS.Footnote ⁵⁴ The free flow of information doctrine allows US corporations to collect and monetize the personal data of individuals from around the world. This way, data protection and privacy are not part of the ‘universal’ values of the Internet, whereas strong intellectual property protection is not only viable and doable, but also strictly enforced globally.

Many other governments have also been complicit in this process. For example, the EU AI Act, despite its declared mission of ‘human centred AI’, is silent about the environmental degradation and social harms that occur in other parts of the world because of the large-scale mineral and resource extraction and energy consumption necessary to produce and power AI and digital technologies.Footnote ⁵⁵ The EU AI Act is also silent on the conditions under which AI is produced and the coloniality of the AI political economy: it does not address precarious working conditions and global labour flows. The EU AI Act is thus also plagued by procedural fetishism: it does not seek to improve the global conditions for environmentally sustainable AI production. At least the United States and the EU have therefore prioritized inaction, self-regulation over regulation, no enforcement over enforcement, and judicial acceptance over substantial resistance. While stressing the differences in US and EU regulatory approaches has been popular,Footnote ⁵⁶ the end result has been very similar both in the EU and the United States: the tech companies collect and exploit personal data not only for profit, but for political and social power.

In sum, procedural fetishism in AI discourse is dangerous because it creates an illusion that it is normatively neutral. Our efforts at constraining AI companies are replaced with a corporate vision of the division of power and wealth between the corporations and the people, masked under the veil of neutrality.

12.4 The New Social Contract for the Age of AI

The new social contract for the age of AI must try something different: it must shift its focus from soft law initiatives and performative corporate compliance to substantive accountability and tangible legal obligations for AI companies. Imposing directly binding data privacy obligations on AI companies with an international treaty is one (but not the only!) option. Other parallel actions include breaking up and taxing tech companies, increasing competition and public scrutiny, and democratizing AI companies by involving people in their governance.

12.4.1 International Legally Binding Instrument Regulating Personal Data

One of the best ways to tame AI companies is via the ‘currency’ which people often ‘pay’ for their services – personal data. And the new social contract should not only be concerned with the procedures that AI companies should follow in continuing to exploit personal data. Instead, it should impose substantive limits on corporate AI action: for example, specifying the circumstances in which data cannot be collected and used, regulating how and when it can be exchanged, and banning manipulative technologies and biometrics to ensure mental welfare and social justice.

Surely, domestic legislators should develop such laws (and I discuss that below too). However, given that tech companies exploit our data across the globe, we need a global instrument to lead our regulatory AI efforts. Imposing directly binding obligations on AI companies with an international treaty should be one (but not the only!) option. While the exact parameters of such a treaty are beyond the scope of this chapter, I would like to rebut one misleading argument, often used by AI companies, that private companies cannot have direct obligations under international law.

The relationship between private actors and international law has been a subject of intense political and scholarly debate for over four decades,Footnote ⁵⁷ since the first attempts to develop a binding international code of conduct for multinational corporations in the 1970s.Footnote ⁵⁸ The most recent efforts, in a process that started with the so-called Ecuador Resolution in 2014, have led to the ‘Third Revised Draft’ of the UN Treaty on Business and Human Rights, released in 2021.Footnote ⁵⁹ The attempts to impose binding obligations on corporations have not yet been successful because of enormous political resistance from private actors, for whom such developments would be costly. Corporate resistance has many fronts; here I can only focus on debunking the corporate myth that such constitutional reform is not viable, and even legally impossible, because of the state-centric nature of human rights law. Yet, as data protection law, discussed above, illustrates, we already apply (even if poorly) certain human rights obligations to private actors. We can and should demand more from corporations in other policy areas.

Importantly, we must understand the role of private actors under international law. Contrary to the popular myth that international law was created by and for nation-states, ‘[s]ince its very inception, modern international law has regulated the dealings between states, empires and companies’.Footnote ⁶⁰ The origins of international law itself date back to powerful corporations that were the Googles and Facebooks of their time. Hugo Grotius, often regarded as the father of modern international law, was himself counsel to the Dutch East India Company – the largest and most powerful corporation in history. In this role, Grotius’ promotion of the principle of the freedom of the high seas and his views on the status of corporations were shaped by the interests of the Dutch East India Company to ensure the security and efficacy of the company’s trading routes.Footnote ⁶¹ As Peter Borschberg explains, Grotius crafted his arguments to legitimize the rights of the Dutch to engage in the East Indies trade and justify the Dutch Company’s violence against the Portuguese, who claimed exclusive rights to the Eastern Hemisphere.Footnote ⁶² In particular, Grotius aimed to justify the seizure by the Dutch of the Portuguese carrack Santa Catarina in 1603:

[E]ven though people grouped as a whole and people as private individuals do not differ in the natural order, a distinction has arisen from a man-made fiction and from the consent of citizens. The law of nations, however, does not recognize such distinctions; it places public bodies and private companies in the same category.Footnote ⁶³

Grotius argued that the moral personality of individuals and that of collections of individuals do not differ, including in what was, for Grotius, their ‘natural right to wage war’. Grotius concluded that ‘private trading companies were as entitled to make war as were the traditional sovereigns of Europe’.Footnote ⁶⁴

Therefore, contrary to the popular myth, convenient to AI companies, the ‘law of nations’ has always been able to accommodate private actors, whose greed and search for power gave rise to many concepts of modern international law. We must therefore recognize this relationship and impose hard legal obligations related to AI on companies under international law precisely to prevent tech companies’ greed and predatory actions which have global consequences.

12.4.2 Increased Political Scrutiny and Novel Ambitious Laws

We must also abolish the legislative regimes that have in the past established safe harbours for AI companies, such as the EU-US Transatlantic Privacy Framework,Footnote ⁶⁵ previously known as Safe Harbour and Privacy Shield. Similarly, regimes based on procedural avoidance of liability, such as the one under Section 230 of the US Communications Decency Act 1996, should be reconsidered. This provision provides that websites should not be treated as the publisher of third-party (i.e., user-submitted) content, and it is particularly useful for platforms like Facebook.

Some of the more recent AI regulatory efforts might be displaying the first seeds of substantive-focused regulation. For example, many moratoriums have been issued on the use of facial recognition technologies across many municipalities and cities in the United States, including the state of Oregon, and NYC.Footnote ⁶⁶ In the EU too, some of the latest proposals display an ambition to ban certain uses and abuses of technology. For example, the Artificial Intelligence Act provides a list of ‘unacceptable’ AI systems and prohibits their use. The Artificial Intelligence Act has been subject to criticism about its effectiveness,Footnote ⁶⁷ yet its prohibitive approach can be contrasted with earlier EU regulations, such as the GDPR, which did not proclaim that certain areas should not be automated, or that some data should not be processed at all or fall into the hands of tech companies. On an international level, the OECD has recently announced a landmark international tax deal, where 136 countries and jurisdictions representing more than 90 per cent of global GDP agreed to a minimum corporate tax rate of 15 per cent on the biggest international corporations, which will be effective in 2023.Footnote ⁶⁸ While this does not tackle tech companies’ business practices, it is aimed at a fairer redistribution of wealth, which too must be the focus of the new social contract, if we wish to restrain the power of AI.

12.4.3 Breaking AI Companies and Public Utilities Approach

We must also break up AI companies, many of which have grown so large that they are effectively gatekeepers in their markets. Many scholars have recently proposed ways to employ antitrust and competition law to deal with and break up big tech companies,Footnote ⁶⁹ and such efforts are also visible at the political level. For example, in December 2020, the EU Commission published a proposal for two new pieces of legislation: the Digital Markets Act (DMA) and the Digital Services Act (DSA).Footnote ⁷⁰ The proposal aims to ensure that platform giants, such as Google, Amazon, Apple, and Facebook, operate fairly, and to increase competition in digital markets.

We already have legal instruments for breaking the concentration of power in the AI sector: for example, the US Sherman Act 1890 makes monopolization unlawful.Footnote ⁷¹ And we must use the tools of competition and antitrust law (but not only them!) to redistribute wealth and power. While sceptics argue that a Sherman Act case against Amazon, Facebook, or Google would not improve economic welfare in the long run,Footnote ⁷² we must start somewhere. For instance, as Kieron O’Hara suggested, we could prevent anticompetitive mergers and require tech giants to divest companies they acquired to stifle competition, such as Facebook’s acquisition of WhatsApp and Instagram.Footnote ⁷³ We could also ring-fence giants into particular sectors. For example, Amazon’s purchase of Whole Foods Market (a supermarket chain) would likely be prevented by that strategy. We could also force tech giants to split their businesses into separate corporations.Footnote ⁷⁴ For instance, Amazon would be split into its e-commerce platform, physical stores, web services, and advertising business.

However, antitrust reforms should not obscure more radical solutions suggested by critical scholars. For example, digital services could be conceived as public utilities: either as closely regulated private companies or as government-run organizations, administered at municipal, state, national, or regional levels.Footnote ⁷⁵ While the exact proposals of the ‘public utility’ approach vary, they aim at placing big AI companies (and other big enterprises) under public control.Footnote ⁷⁶ This provides a strong alternative to market-driven solutions to restore competition in the technology sector, and has more potential to address the structural problems of exploitation, manipulation, and surveillance.Footnote ⁷⁷

12.4.4 Decolonizing Technology Infrastructure

We should also pay attention to the asymmetries in economic and political power on a global scale. These include US dominance in digital technologies and AI, US influence in shaping international free trade and intellectual property regimes, the rising influence of China, and the EU’s ambitions to set global regulatory standards in many policy areas, which leave both businesses and public bodies in the so-called Global South on the receiving end of Brussels’ demands as to what ‘ethical’ AI is and how ‘data protection’ must be understood and implemented.Footnote ⁷⁸

We should also incorporate Indigenous epistemologies – they provide strong conceptual alternatives to dominant AI discourse. Decolonial ways to theorize, analyze, and critique AI and ADMS systems must be part of our new social contract for the age of AI,Footnote ⁷⁹ because people in the so-called Global South relate very differently to major AI platforms than those who live and work where these companies are headquartered.Footnote ⁸⁰ A good example in this regard is the ‘Technologies for Liberation’ project which studies how queer, trans, two-spirit, black, Indigenous, and people of colour communities are disproportionately impacted by surveillance technologies and criminalization.Footnote ⁸¹ Legal scholars must reach beyond our comfortable Western, often Anglo-Saxon position, and bring forward perspectives of those who have been excluded and marginalized in the development of AI and ADMS tools.

Decolonization, however, must also happen in law. For example, the EU’s focus on regulating AI and ADMS as a consumer ‘product-in-use’ requiring individual protection is hypocritical, and undermines its claims to regulate ‘ethical’ AI, for it completely ignores the exploitative practices and global implications of AI production and use. These power disparities and exploitation must be recognized and officially acknowledged in the new laws.

Finally, we need novel spaces for thinking about, creating, and developing new AI regulation: spaces that are not dominated by procedural fetishism. A good example of possible resistance, promoted by decolonial data scholars, is a Non-Aligned Technologies Movement (NATM) – a worldwide alliance of civil society organizations which aims to create ‘techno-social spaces beyond the profit-motivated model of Silicon Valley and the control-motivated model of the Chinese Communist Party. NATM does not presume to offer a single solution to the problem of data colonialism; instead it seeks to promote a collection of models and platforms that allow communities to articulate their own approaches to decolonization’.Footnote ⁸²

12.5 Conclusion

The new social contract for the age of AI must incorporate all these different strategies – we need a new framework, and not just quick, procedural fixes. These strategies might not achieve substantive policy change alone. However, together, acting in parallel, the proposed changes will enable us to start resisting the corporate and state agenda of procedural fetishism. In the digital environment dominated by AI companies, procedural fetishism is an intentional strategy to obfuscate the implications of concentrated corporate power. AI behemoths legitimize their practices through procedural washing and performative compliance to divert the focus onto the procedures they follow, both for commercial gain and to avoid their operations being tempered by regulation. They are also assisted by states, which enable corporate dominance via laws and legal frameworks.

Countering corporate procedural fetishism requires, first of all, returning the focus to the substantive problems in the digital environment. In other words, it requires paying attention to the substance of tech companies’ policies and practices, and to their power, not only to the procedures. This requires a new social contract for the age of AI. Rather than buying into procedural washing as companies intend for us to do, we need new binding, legally enforceable mechanisms to hold AI companies to account. We have many options, and we need to act on all fronts. Imposing data privacy obligations directly on AI companies with an international treaty is one way. In parallel, we must also redistribute wealth and power by breaking up and taxing tech companies, increase public scrutiny by adopting prohibitive laws, and democratize and decolonize big tech by giving people power to determine the way in which these companies should be governed. We must recognize that AI companies exercise global dominance with significant international and environmental implications. This aspect of technology is related to the global economic structure, and therefore cannot be solved in isolation: it requires systemic changes to our economy. A crucial step in this direction is developing and maintaining AI platforms as public utilities, which operate for the public good rather than profit. The new social contract for the age of AI should de-commodify data relations, rethink behavioural advertising as the foundation of the Internet, and reshape social media and internet search as public utilities. With all these mutually reinforcing efforts, we must debunk the corporate and state agenda of procedural fetishism and demand basic tangible constraints for the new social contract in the Automated State.
