1. Introduction
Artificial Intelligence (AI) is increasingly supplementing decision-making at the highest levels of government in the context of national security and defense (Blanchard & Taddeo, 2023; Adam & Carter, 2023). It is likely to be part of highly contentious and morally weighty decisions made by governments. This article focuses on one of those: the decision to resort to force at the state level (see Deeks, Lubell & Murray, 2019; Erskine, 2024a). As Erskine (2024a, p. 181) notes in this context, the constitutive mechanisms of AI likely to be engaged to support the decision to go to war are “frequently opaque and unpredictable,” meaning that “recommendations and predictions by AI-enabled decision-support systems can often be neither audited nor explained by those who are guided by them.” The prospect of an unauditable, opaque AI in the context of resort-to-force decision-making is troubling because it engages the problem of accountable AI in the most pressing terms. Scholars and practitioners alike argue that AI systems problematize accountability in public life because of the “black box” problem of algorithmic and technical opacity implicit in AI’s use in decision-making. Those seeking to remedy this “accountability gap” have proposed a range of responses, from moratoria to increased transparency and explainability and the allocation of an AI-specific duty of care (Lechterman, 2022; Busuioc, 2021; Pasquale, 2015).
This article focuses on the second of these proposed solutions: transparency as a response to the problem of AI accountability. It suggests that the push for AI accountability via these mechanisms is, at its core, a search for epistemic certainty: certainty about what we know about the world and events in it. In making this suggestion, the article draws links between certainty, transparency and accountability. It frames the search for AI accountability, including in resort-to-force decision-making, as a search for certainty about the processes of decision-making. This is because, in the context of accountable AI, we expect that accountability is only possible if we are certain about what decision-making events took place, and that certainty about what took place is only possible if those processes are transparent – i.e., if accurate information about them is available to those who seek to hold the decision-makers to account. This search for certainty is evident in the role explainable AI (xAI) and interpretable AI play in the search for accountable AI (Samek et al., 2019). These terms are often used interchangeably, but they share an emphasis on transparency of the machine learning (ML) processes that drive AI as a way to build trust in the system.
Key to this article’s arguments is the claim that the relationship between certainty, transparency and accountability is socially and historically constructed rather than inevitable. This relationship, and attempts to implement it in accountable AI, draws on ideas about the relationship between transparency and accountability in democratic domestic polities familiar to Western scholars and policymakers. But the article argues that this is not the only way to arrive at certainty. It proposes instead that the pursuit of accountable AI in the context of resort-to-force decision-making can learn much from other high-stakes environments that manage to establish certainty and enforce accountability amid epistemic uncertainty about events and in the absence of transparency about those events. In particular, the article argues that the rules of evidence in international criminal law offer a useful model for enforcing accountability without full certainty or transparency.
In criminal law, decisions about accountability must be made based on “certainty” as adduced from evidence, despite the epistemic uncertainty inevitable in both the human mind and the fog of history. The article explores this claim by interrogating the International Criminal Tribunal for the former Yugoslavia (ICTY) and its battle to make decisions about accountability for war crimes. These decisions took place in the face of epistemic uncertainty about what had actually happened – the events in question were not, of course, transparent. The ICTY Chambers sought to reduce uncertainty by introducing new rules of evidence, including rules facilitating the admission of new types of evidence made possible by new technology. The article focuses on two new forms of technology-borne evidence: forensic DNA and digital satellite imagery. Like AI, the use of these technologies relied on Chambers trusting the technology rather than having access to its inner workings, which were not transparent to nonexperts such as the Tribunal’s judges. The article argues that the process by which the outputs of these technologies were interrogated – delivering a level of certainty about what occurred sufficient to support accountability in the absence of transparency – offers useful lessons for those seeking to enforce accountability over AI used in resort-to-force decision-making.
The article proceeds in four parts. The first outlines the relationship between certainty, transparency and accountability, suggesting that the problem of accountable AI is at its core a problem of a lack of certainty about what has occurred in the decision-making process. It summarizes the literature on accountability and discusses the historical role of transparency as a way to reduce uncertainty and deliver accountability. It briefly touches on the role documents can play in delivering transparency and accountability, introducing concepts from documentation theory. The second part of the article argues that international criminal law grapples with the problem of epistemic uncertainty and the absence of transparency by substituting evidence for transparency and adopting evidence as sufficient grounds for apportioning responsibility. In this context, evidence is used where transparency about the nature of events is unavailable, and thus serves as a structural foundation for certainty about what has occurred and who should and can be held accountable. The article argues that the rules of evidence in international criminal law offer useful examples for addressing the problem of accountability in the absence of transparency, which faces those seeking to develop systems of accountable AI. The third part of the article examines two areas of evidence in international criminal law that mirror the epistemic uncertainty induced by AI, demonstrating how law, as a system of accountability, also grapples with the probative value of new technologies. This section focuses on two new technologies used by the ICTY, which was the first international criminal tribunal to admit them as evidence: (i) forensic DNA evidence and (ii) digital satellite imagery.
The fourth and concluding part of the article draws three lessons from this discussion and applies them to resort-to-force decision-making. It argues that the experiences of the ICTY in adopting new forms of evidence suggest that accountability in the face of uncertainty can be fostered by attention to three processes: corroboration, contestability and authenticity. Judgments at the ICTY adopted these processes to weigh the extent to which new technologies, too complex for laypeople to understand, could nonetheless add to (and detract from) the certainty necessary for accountability in the absence of transparency. The paper thus proposes a thick, qualitative approach to accountable AI in the context of resort-to-force decision-making, which recognizes its sociotechnical rather than simply technical complexity. It contextualizes this complexity against another complex sociotechnical problem – the evolution of the rules of evidence in international criminal law in the context of new technologies – to offer optimism rather than despair in the face of technological complexity and opacity.
2. Certainty, transparency and accountability
The key problem underlying the search for accountable AI is that AI is opaque and thus unauditable. Its decision-making processes are not transparent, and so apportioning accountability for that decision-making is impossible because we simply do not know how and why those decisions are made. Despite its core role in the pursuit of accountable AI, the concept of accountability is ill-defined. Novelli et al. (2023) argue that this lack of definition arises because accountability is the product of subjective, socially and politically defined contexts (see also Olsen, 2017). Lechterman (2022) interrogates the meaning of accountability specifically in the context of AI governance. He identifies two types of accountability at play in the broader literature on accountable AI: forensic accountability and agent accountability. Forensic accountability is “backward looking and relates closely to responsibility” (Lechterman, 2022, p. 65). It seeks to link individuals to their actions and the consequences of their actions. This logic of consequences links forensic accountability to the demand for justification – to an actor being expected to provide answers or render an account of events, which may in turn give rise to sanctions. Agent accountability, by contrast, is more socially determined and relates to the relationship between principals who delegate tasks to agents and then monitor their performance (Waldron, 2014). These two types of accountability are often connected, with forensic accountability generally nested within agent accountability. An agent’s conduct might be classed as a crime, for example, such that the principal can bring the agent before a court. In the courtroom, rules of evidence shape the practice of forensic accountability at play in establishing the nature of events and the actors involved and in determining sanctions – agent accountability (Waldron, 2014, p. 1).
Some scholars working in political theory argue that agent accountability is the essence of democracy, as it provides a way to characterize the relationship between citizens and public officials (Binns, 2018). Modern political democracy is a system of governance “… in which rulers are held accountable for their actions in the public realm by citizens […]” (Schmitter & Karl, 1991, p. 76). But the agent accountability at play in democracies has a forensic underbelly in the form of often assumed constitutive links between transparency and accountability. As Baume and Papadopoulos (2018) argue, since the end of the 18th century, transparency has been a normatively oriented project increasingly employed in opposition to opacity and capricious decision-making and in the uncritical service of “good governance” (see Hood & Heald, 2006; Ball, 2009). In this reading, transparency provides forensic accountability by “showing” the machinery of government and decision-making to the citizens who hold government accountable via agent accountability, linking individuals to their actions. It is the implicit link between transparency and these two types of accountability that underpins the assumption that accountable AI requires transparency as a precondition for explainability, interpretability and ultimately accountability (see Deeks, 2019; Lechterman, 2022).
But the link between transparency and accountability, as applied to arguments for xAI and interpretable (and thus accountable) AI, exhibits little definitive discussion of what transparency looks like and how it is to be achieved. As Ananny and Crawford (2018, p. 973) note in the context of AI, “being able to see a system is sometimes equated with being able to know how it works and govern it.” Adopting a similar timeline to Baume and Papadopoulos’s work on transparency in a political context, they show that the literature on the history and philosophy of science reveals an emphasis on transparency as an empirical ideal appearing in the epistemological foundations of 18th-century ideas about the knowability of the social world, which adopted the tools and empirical foundations of natural science (see Daston, 1992; Daston & Galison, 2021).
This lack of clarity travels freely between political theory and work on AI (Lechterman, 2022). Finel and Lord (1999, p. 319), for example, define transparency by hitting familiar notes of institutions and bureaucratic responsibility:
Transparency comprises the legal, political, and institutional structures that make information about the internal characteristics of a government and society available to actors both inside and outside the domestic political system. Transparency is increased by any mechanism that leads to the public disclosure of information, whether a free press, open government, hearings, or the existence of nongovernmental organizations with an incentive to release objective information about the government.
But this definition, like others, does not specify the instruments and practices via which transparency and thus accountability are achieved beyond the broadest of strokes. A lack of specificity about the nature of transparency and its link to accountability limits our ability to transfer this thinking to the problem of accountable AI and problematizes approaches which do so uncritically.
Novelli et al.’s (2023) concept of report in the context of AI accountability arrives at a clearer understanding of the relationship between accountability and transparency. They identify four implicit goals of accountability: compliance, report, oversight and enforcement. Of report, they note: “The goal is to ensure that the agent’s conduct is properly recorded to explain and justify it to the forum (or the principal). The reporting of relevant information enables the forum (or the principal) to challenge and disapprove the agent’s conduct. Determining which information is relevant is not always easy but can be based on the requirements of the associated oversight” (2023, p. 1874). Echoing Briet (2006), Dubnick (2005, p. 83) notes that “in many instances [reporting] is a mirror of (and surrogate for) the act of direct monitoring by a principal of the behavior and act,” offering transparency and thus certainty about what has occurred in the face of unyielding epistemic uncertainty about the past.
In this reading, reporting is a mechanism of transparency that travels through time and space, linking noncontiguous events and citizens. If we cannot assume that accountability – whether forensic or agent – is instantaneous, then we must have a mechanism via which accountability can be enforced over space and time. Reporting fulfills this function in the sense that it makes things transparent. It provides a level of certainty about what has occurred in the past (forensic accountability) and facilitates accountability in the present and the future (agent accountability). In the context of resort-to-force decision-making, we can see the link between “reporting” and the role of bureaucratic archives in providing forensic and agent accountability for decisions made in the past. As Novelli et al. (2023) argue, information that is “properly recorded” is foundational to oversight. In the context of democratic oversight of decisions to resort to force, “properly recording” means adhering to norms of bureaucratic record-keeping. The decision to bomb Hiroshima is well documented in the records of the Harry S. Truman Presidential Library, for example. The collection holds over 500 documents relating to the decision, mainly comprising memoranda produced by bureaucrats according to standard formats and norms, and collected by the presidential library according to standard archiving procedures and rules. The collection provides transparency over what occurred and thus facilitates accountability for the decisions that were made.
Theorists of documents describe the important role documents play in delivering transparency through time and space, and thus adequate certainty about what has occurred, so that forensic and agent accountability may be enforced. These concepts implicitly underpin the bureaucratic relationship between documents, transparency and accountability. Anthropologist Matthew Hull writes that documents “represent or engage with autonomous entities, realities ‘in the world’ independent from the processes through which they are produced” (Hull, 2012, p. 25). For Suzanne Briet, documents have a temporal and probative function: they are “preserved or recorded toward the ends of representing, of reconstituting, or of proving a physical or intellectual phenomenon” (2006, p. 10). It is here that the minutes, records and archives of decisions find themselves: reporting decision-making so that the decision may be interrogated in the future and decision-makers held accountable. The function of documents in the context of accountability is to provide transparency and thus a level of certainty about what has occurred in the otherwise uncertain past. Documents operationalize reports in the service of accountability.
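To make the notion of report concrete in an AI context, the sketch below imagines a minimal, machine-readable decision record for an AI-enabled decision-support system. It is illustrative only: the pipeline and every field name (model_version, reviewed_by and so on) are hypothetical assumptions rather than features of any existing system. The point is that a record preserved “toward the ends of … proving” (Briet, 2006) can be produced around a model even when the model itself remains opaque.

```python
# A minimal sketch of a machine-readable "report" (in Novelli et al.'s
# sense) for an AI-assisted recommendation. All names are hypothetical.
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import hashlib
import json

@dataclass
class DecisionRecord:
    model_version: str   # which system produced the recommendation
    input_digest: str    # fingerprint of the inputs actually consumed
    recommendation: str  # the output presented to decision-makers
    confidence: float    # the system's own reported confidence
    reviewed_by: str     # the human accountable for acting on it
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def digest(self) -> str:
        """Fingerprint the whole record so later tampering is detectable."""
        payload = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

record = DecisionRecord(
    model_version="advisor-2.1",
    input_digest=hashlib.sha256(b"intelligence inputs").hexdigest(),
    recommendation="escalation risk: high",
    confidence=0.72,
    reviewed_by="analyst-041",
)
print(record.digest())  # preserved "toward the ends of ... proving"
```

Such a record documents the circumstances of a decision, not its inner logic; the next paragraph explains why the latter resists documentation.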
But AI does not produce documents recording its decision-making processes, or at least not documents that can be read by humans. This opacity can stem from technological illiteracy, from size and complexity beyond human capacity (see Søgaard, 2023), and from AI’s lack of decomposability, meaning that it is sensitive to highly complex inputs that cannot be broken down and understood by humans (Fleisher, 2022). It is this lack of documentation, and the transparency it would afford, that lies at the core of the search for xAI and thus accountable AI. Following Novelli et al. (2023), there is no report, and thus no oversight, without documentation. AI decision-making is epistemically opaque in that it is impossible to identify its epistemic elements and thus to document them and make it transparent, and thus accountable, in the sense of report (Novelli et al., 2023). The apparent absence of transparency is at the heart of disquiet around accountable AI in public policymaking, especially in the context of the high stakes of resort-to-force decision-making. It means there is a lack of certainty available to those who wish to hold decision-makers accountable in the context of the inherent epistemic uncertainty of past events (Lechterman, 2022, pp. 169, 173). It is not only a matter of assigning moral agency to the actors involved (Erskine, 2024b) but also one of understanding, communicating and adjudicating what the actors involved have actually done. xAI attempts to address this by offering a number of options to replicate or explain the algorithm’s workings in the context of its outputs: to explain rather than document its epistemic elements in the service of broader, overarching values of transparency and accountability (Deeks, 2019). However, the nature of this explanation and problems of complexity, trade secrets and expertise weigh heavily on the project of xAI, which is yet to be realized (Deeks, 2019, p. 1834; Lechterman, 2022, p. 173).
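The kind of explanation xAI typically offers can be illustrated with a toy sketch of a post-hoc, perturbation-based attribution – one simple instance of the “exogenous” family of methods discussed further below in relation to Deeks (2019). The explainer treats the model strictly as a black box, correlating random input perturbations with changes in the output rather than documenting the model’s inner workings. The “opaque model” below is an invented stand-in, not any deployed system.

```python
# A toy sketch of post-hoc, perturbation-based explanation. The model
# is probed from outside; its internals are never inspected.
import numpy as np

rng = np.random.default_rng(0)

def opaque_model(x: np.ndarray) -> float:
    """Stand-in for an unreadable system: callers see outputs only."""
    w = np.array([0.9, -0.4, 0.1])
    return float(1.0 / (1.0 + np.exp(-(x @ w))))

def perturbation_attribution(x: np.ndarray, n: int = 2000,
                             scale: float = 0.05) -> np.ndarray:
    """Estimate each input's local influence by correlating random
    perturbations with the resulting change in the model's output."""
    base = opaque_model(x)
    noise = rng.normal(0.0, scale, size=(n, x.size))
    deltas = np.array([opaque_model(x + eps) - base for eps in noise])
    # E[eps_i * delta] / scale^2 approximates the local gradient.
    return (noise * deltas[:, None]).mean(axis=0) / scale**2

x = np.array([1.0, 2.0, -0.5])
print(perturbation_attribution(x))  # largest magnitude = most influential
```

Note what such an explanation is: a statistical gloss on outputs, not a document of the process. This is precisely the gap the following sections address.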
3. International criminal law, evidence and certainty
Perhaps there is another way to address accountability without relying on transparency in the face of AI’s transparency deficit. This section of the article argues that polities already deal with uncertainty in many different forms and already navigate accountability in the face of uncertainty in ways that are relevant to our thinking about accountability and AI. Epistemic uncertainty and the absence of transparency are features of many international and domestic policy environments. The approaches developed in those environments have rather less absolute relationships with certainty, transparency and documentation, and may offer guidance on how to approach the issue of accountable AI, especially in the context of resort-to-force decision-making.
In international politics, transparency operates rather differently than it does in domestic politics, where – at least in familiar Weberian bureaucracies – transparency is available via documentation. Outside the container of domestic politics, there are no shared practices of transparency: no minutes, no cabinets, no archives that operate across national boundaries in a transnational rather than multilateral space. And in international politics, transparency is also clouded by the uncertainty induced by wildly varying worldviews and practices (Katzenstein, 2022) and by powerful yet epistemically inchoate actors such as states, leaders, private actors and communities. For example, there is no “archive” of the South China Sea in the context of the recent disputes engaging Vietnam, the Philippines, China, the US, local fishing communities and international courts and tribunals. There are no minutes taken that cut across the container of domestic politics or institutional practices. Without Weberian structures of documentary certainty and transparency shared across states, actors in international politics look for certainty and accountability in other ways – in formal treaties, for example, in espionage, or in the rituals and practices of international institutions (Grant & Keohane, 2005).
Given these features, this paper argues that rather than taking the model of accountability from domestic models of bureaucratic decision-making, as xAI seeks to do in transparency-oriented models of accountability, we should look to other decision-making environments. In particular, a similar environment of international epistemic uncertainty, high stakes and closed, elite decision-making exists in international criminal law – specifically, decision-making in the context of apportioning criminal responsibility. Indeed, concerns about the problem of accountability for international atrocities committed in so-called “autonomous wars” engineered by AI agents have been debated at the highest levels of international politics since at least 2010 (Garcia, 2024, pp. 195–197).
International courts and tribunals search for evidence about the black box of human interiority and the black box of the past. They do so in an environment where documents in the Weberian sense may not be readily available. States may not share documentary traditions or languages, and where they are not willing participants, they may refuse to share documents. International tribunals have also historically operated under innovative evidentiary procedures, bringing together a range of legal traditions and practices to devise ways to establish certainty. Given these unstable and developing evidentiary practices, international criminal tribunals operate under conditions of uncertainty about what has actually occurred and why, much as in AI decision-making. And yet they reach decisions about the certainty of what has occurred and who should be responsible, and apportion accountability accordingly. In doing so, they offer lessons for how to hold AI accountable and how to proceed in the absence of transparency and certainty about what has actually occurred in AI decision-making.
The problem of establishing certainty is key to the practice of law. To decide how responsibility is to be apportioned, the rule of law demands that sufficient certainty be established as to the facts of the case, which are by definition uncertain. Evidence admitted to the court offers a way for decisions to be made with sufficient, if not absolute, certainty. Given the structural role evidence plays in fashioning certainty and thus accountability, I argue that approaches to evidence in international criminal law offer lessons in how black boxes and inescapable epistemic gaps are already navigated to resolve uncertainty, apportion responsibility and – in sentencing – deliver accountability in the way demanded by the search for accountable AI in resort-to-force decision-making.
In particular, the rules, use and practices of evidence in international criminal law offer an insight into an existing model of accountability in the face of documentary gaps and epistemic uncertainty, especially when applied to evidence derived from new technologies. This model operates in an environment – international, uncertain and high stakes – which reflects the conditions under which resort-to-force decision-making takes place. In this context, evidence in international law constructs a scaffold of certainty about the nature of past events for presiding judges. It plays a role similar to that of documents in Weberian bureaucracy in delivering transparency. Like documents, evidence “represent[s] or engage[s] with autonomous entities, realities ‘in the world’ independent from the processes through which they are produced” (Hull, 2012). In criminal judgments, the aim is not only to judge whether an individual is guilty of a crime but, in the process of doing so, to provide a level of certainty as to whether a fact of the matter is proven or disproven based on the evidence provided. Rules of evidence, then, shape the “documents” which are permitted to represent “realities in the world.” Accountability for crimes is based on the evidence that judges are permitted to evaluate through court procedures. But a court judgment is not a demonstration of absolute certainty. Instead, it is a demonstration of certainty within a socially defined form of accountability through law, just as administrative accountability is enacted through bureaucracy. The process of evaluating evidence is itself a recognition of the existence of uncertainty.
Adopting lessons in this vein from attempts to reduce uncertainty about the events which drive international criminal law builds on Deeks (2019), who argues for the role of common law tools in shaping the “nature and form” of xAI (Deeks, 2019, p. 1829). She describes an exogenous approach to xAI (and thus accountable AI) which does not attempt to “explain the inner workings of the algorithm itself but rather provides relevant information to the algorithm’s user or subject about how the model works using extrinsic, orthogonal methods” (Deeks, 2019, p. 1835). Within this broader class of approaches, Deeks identifies “model-centric” approaches (also referred to as global interpretability), which are essentially “thick descriptions of the parts of the model that are knowable” provided to users or subjects. This could mean, for example, explaining the creator’s intentions in building a model, describing the training data in a qualitative sense, and describing the testing process. It is these thick, extrinsic and orthogonal approaches to reducing uncertainty and thus improving accountability that this article seeks to draw from the example of international criminal law.
4. Lessons in certainty from evidence in international criminal law
Rules of evidence in international criminal law are evolving, and the process by which evidence is admitted, contested and adjudicated can differ between judges and between trials (Freeman, 2022). As above, the rules of evidence establish the way in which legal actors arrive at certainty and thus forensic accountability. This article argues that examining the ways in which international criminal trials have approached the problem of uncertainty in the context of new technologies and emerging standards of evidence offers useful lessons for developing mechanisms of accountability in the use of AI in resort-to-force decision-making. This section briefly outlines the adoption of two types of evidence that have become increasingly important in international criminal law since its resurgence in the 1990s, and which have been subject to debate about their admissibility. These debates are at their core about the extent to which items can be classed as “documents” in the sense that they engage with autonomous entities in the world, providing an approximation of certainty as to events in time and space (Hull, 2012).
Ideas about evidence in international criminal law began with a clear emphasis on documentary evidence at Nuremberg, in the sense of Weberian, bureaucratic records of the Nazi state. In their earliest iterations, international criminal tribunals were able to simply adapt the documentary certainty provided by domestic bureaucratic practices. At both the Nuremberg and Tokyo tribunals, prosecutors were aided by victors’ access to the vast bureaucratic records of Nazi Germany and Imperial Japan and by a victor’s prerogative to assume the authenticity of those documents and admit them as evidence (Jackson, 1947; May & Wierda, 1998; Tanaka et al., 2011; Roling & Cassese, 1993). But today, international criminal law operates without even these minimal constructs of documentary certainty. The post-World War 2 (WW2) trials, which form the basis for international criminal law today, took place after a clear peace had been established, and were designed and implemented by the victors. By comparison, the resurgence of international criminal law in the 1990s, beginning with the ICTY, faces a more clouded evidentiary landscape, with consequences for how certainty and accountability are apportioned.
Today, evidence of international war crimes is rarely documentary (Combs, 2010; May & Wierda, 1998). Debates focus on the role of expert witnesses, the admissibility of open-source evidence and the admissibility of new types of technologically mediated evidence, such as forensic DNA evidence and digital evidence. This section focuses on the last of these, arguing that the analogy to AI is clearest here, and examines the use of technologically mediated evidence in the form of forensic DNA and digital evidence at the ICTY. Both offer important examples of how courts have arrived at certainty despite judges lacking the requisite technical knowledge to understand and interrogate the technology itself, recalling the opacity of AI decision-making.
The ICTY was the first war crimes court created by the United Nations (UN) and the first international war crimes tribunal since the Nuremberg and Tokyo trials at the end of WW2. During its mandate, which lasted from 1993 to 2017, the ICTY helped reinvigorate the enforcement of international humanitarian law. It is the source of important jurisprudence on war crimes, genocide and crimes against humanity. It blended inquisitorial civil law and adversarial common law systems, with rules frequently amended by judges over time (Boas, 2003). This resulted in a fairly fluid process of development and responses to perceived problems of practice as they arose. This fluidity makes the ICTY a useful site to examine ways in which decisions about accountability were made in the context of not only imperfect evidence but also fluid and evolving rules about evidence and the necessary and sufficient conditions for decision-making and apportioning accountability in the face of uncertainty.
For example, the ICTY statute, unlike the Statute of the International Criminal Court, contains no directions that directly consider the issue of precedent, and the common law principle of stare decisis does not explicitly apply. In addition, reference to national jurisdictions was adopted at the ICTY in only minor ways and was not mandated (Harris, 2001). Sources of customary law for the ICTY were thin and sometimes explicitly denied by the relevant statutes (sources of customary law in international criminal law remain controversial – see Tan, 2018). Despite the considerable influence of the post-WW2 tribunals in the broader context of international criminal law, for example, the ICTY was reluctant to accord them too much weight given those tribunals’ status as “victors’ justice” (Harris, 2001).
Within this broader fluidity, judges at the ICTY chose to adopt a principle of the “free evaluation of evidence” (Klamberg, 2020, p. 1). The ICTY rules of evidence represent the first attempt to create a coherent and credible code of procedure and evidence for the prosecution of international criminal conduct and, particularly, the prosecution of violations of international humanitarian law (Boas, 2003). The ICTY thus offers an important view of the development of new ideas about accountability in the face of epistemic uncertainty and a lack of transparency about what has actually occurred, and an example of accountability implemented despite these factors. Importantly, this occurred in the context of significant technological developments, especially forensic DNA evidence and digital evidence in the form of satellite imagery.
The ICTY experience thus offers useful lessons in how decisions about certainty and accountability are made in the context of new and complex technologies like AI, in which decision-makers lack expertise, for which clear and well-established rules have yet to be developed, and in conditions where documentary evidence is lacking. Discussions at the ICTY show how the Tribunal resolved the problem of accountability in the face of a transparency and documentary gap, establishing the minimum transparency sufficient for certainty. In adopting the example of the admission of new types of evidence at the ICTY, the paper follows Peters (2024) in arguing that absolute certainty regarding the processes of AI decision-making need not be necessary for accountability. Rather, in drawing lessons from the ICTY, the article shows that minimal necessary truth (NT) conditions in the face of epistemic uncertainty already operate in international legal tribunals and suggests a similar approach for xAI and accountable AI.
4.1 Forensic DNA evidence
The ICTY saw a shift to the use of new forensic technologies by the prosecution to establish epistemic certainty as to the matter at hand: primarily how many people had died, who they were, and the circumstances in which they died. Because judges were not experts in these technologies, this meant allowing expert testimony to facilitate judicial certainty as to what had occurred. Assessment of the certainty afforded by such technologies, including DNA analysis, was contested in cross-examination of expert witnesses and incorporated into judgments for the first time. The process of nonexperts (judges) weighing and assessing scientific and technological evidence shows a procedural approach to certainty, where, in the process of judicial assessment, science “enters the courtroom not in the form of bare facts or claimed truths about the world, but as evidence” (Jasanoff, 2006, p. 329), which is then debated, weighed and assessed.
For example, during the Krstić trial, forensic evidence from mass graves corroborated the testimonies of witnesses concerning the mass executions and burial of thousands of Bosnian Muslim men. The trial marked the first time forensic evidence was used in international criminal law and established the foundations for ongoing debates about its admissibility and use. The forensic evidence in Krstić included reports on exhumations, autopsies and laboratory analysis as well as photographic evidence, material artifacts and DNA evidence. It helped to establish, first, that genocide had occurred and, second, the genocidal intent the Trial Chamber accorded to Krstić, by showing that the condition of the bodies in the graves and the nature of the graves themselves indicated that Krstić must have been responsible for ordering the relevant murders in a systematic manner. Expert witnesses were called upon to interpret the scientific data for the court, and played a neutral role rather than appearing as witnesses for either side – as “… a witness of truth before the Tribunal and, inasmuch as he or she is required to contribute to the establishment of truth, not strictly a witness for either party.” Forensic expertise used by prosecutors encompassed various disciplines such as forensic anthropology, archaeology and pathology. Forensic experts were cross-examined extensively and were called upon to describe and justify the analytic and scientific methods employed in the field.
The judgment in Krstić was clear on the role and value of forensic evidence in reducing uncertainty via corroboration across domains of knowledge production: “The accounts given by the survivors of the execution sites are corroborated by forensic evidence (such as shell casings and explosive and tissue residues) at some of the execution sites, expert analysis of the contents of mass graves and aerial reconnaissance photographs taken in 1995” (Krstić, 2001, pp. 2, 21, 25, 71). Forensic experts gathered and interpreted evidence to provide “unequivocal corroboration to what could otherwise be suspect or dubious evidence” (Blewitt, 1997, p. 284). In this, the judgment emphasized the probative weight of forensic DNA evidence in corroborating evidence from other sources, such as documentary evidence or witness statements. That is, the judgment located the epistemic value of forensic evidence in its corroborative role rather than in the “truth” of the forensic science itself. In this, scientists were framed not as “disintegrated agents but rather … immersed in a web of relations that play an important role in determining the character of truths that emerge from their interaction” (Browne et al., 1998, p. 50).
The contestability of forensic evidence itself has also, somewhat paradoxically, contributed to the development of judgments as to certainty. Familiar common law adversarial elements of domestic criminal trials in a range of jurisdictions are designed to arrive at a type of certainty. The probative value of the evidence that emerges from contestation is a cipher for certainty as to the individual facts themselves. Elements of the adversarial system adopted by the ICTY included examination-in-chief by the party calling the witness, cross-examination by opposing parties, and, if necessary, re-examination to deal with matters brought out under cross-examination. This adversarial mode was employed to arrive at certainty in the face not only of human-oriented epistemic gaps, such as the reliability of witness statements, but also, for the first time internationally, of forensic evidence. It applied these models to a technologically and scientifically oriented “black box” – the workings of forensic technologies – without an established standard for the admissibility and weighting of such evidence, especially given the principle of free evaluation of evidence which the Tribunal had adopted. In Popović, for example, the defense challenged the prosecution’s forensic evidence, specifically the use of DNA matching to identify victims, by calling witnesses who challenged both the scientific methods employed and the professional credentials of the forensic experts called by the prosecution who had collected and analyzed the original evidence. In the final judgment, Chambers noted that the contestability provided by opposing witness testimonies had strengthened the probative weight of the DNA-focused testimony provided by the prosecution, noting that the analysis provided by the defense witnesses “serves only to strengthen the conclusion that the DNA analysis conducted by the ICMP [prosecution witnesses] is reliable.” The value of contestability in providing certainty is key to adversarial systems in common law practice (Tuzet, 2012), but in this instance the contestability was engaged by expert witnesses describing scientific models, methods and practices with which judges were not conversant.
4.2 Satellite imagery
Satellite imagery provided by the U.S. government was also used in an international criminal tribunal for the first time at the ICTY. Aerial images offered by the prosecution were admitted to show areas of disturbed earth that represented mass graves in the cases of Krstić, Popović et al. and Tolimir (Freeman, 2018, p. 301). The images also showed other items of interest, such as buildings, vehicles and groups of prisoners. The admission of satellite imagery of mass graves as documentary evidence was controversial, largely because the images’ provenance was unclear: the US refused to release information about how they were acquired, citing its national security interests. The defense in Tolimir objected to the admissibility of the images as evidence, arguing that their veracity could not be proven, largely because the US stipulated as a condition of providing the images that the sources and methods used to create them could not be discussed in the courtroom (see generally Tolimir).
The images posed an interesting problem of authentication and verification, one that continues to resonate in international criminal law. It also resonates with the problem of AI accountability because it shows the Tribunal grappling with how to achieve certainty via transparency (the images showing disturbed earth) when the mechanism by which that transparency was achieved (the acquisition of the satellite imagery) cannot itself be verified, potentially destabilizing the certainty it provides. This usefully recalls the problem of “opaque and unpredictable” AI decision-making and the difficulty of divining accountability under such circumstances.
The prosecution used the concept of documents to incorporate the satellite imagery as evidence and thus as an aid to transparency, certainty and accountability. Incorporating the images as documentary evidence meant they could be evaluated as evidence on the same basis as paper documents, as a record of events (though still subject to contestation as to the accuracy of that record, with probative weight accorded by corroboration with other evidence). In addition, the imagery was incorporated into evidence as “self-authenticating” documents, much as a certified paper document, or a paper document bearing a logo, could be considered self-authenticating as to its source. Admitting the images as authenticated documents meant that certainty as to their provenance could be assumed because of the credibility of the source of the imagery (US government satellites), removing that question from debate. Treating satellite images as documents meant that they provided a route to transparency as to what had occurred and, as a result, to certainty and accountability.
The ruling that the satellite images were self-authenticating documents meant that to challenge their admission and their reliability, the defense would have to show that they lacked reliability or were not authentic. This suggests that the value placed on contestability in the adversarial element of the trial was contingent on the possibility of meaningful debate about the epistemic elements of the evidence presented. Because the epistemic elements of the US satellite imagery were not available, and because their provenance – in the US national security apparatus – was considered self-verifying, the Chamber implicitly acknowledged that meaningful debate as to their provenance was not possible. Instead, the images were simply admitted as documents, defining them as adequate engines of transparency and certainty without debate, and thus as adequate evidence through which to begin to enforce accountability via the mechanism of report.
This process, in turn, suggests that assigning authenticity to epistemic elements is possible without further interrogation of those elements, although only if the item is treated as a document in the context of the trial. By contrast, in the Ayyash et al. case before the Special Tribunal for Lebanon (STL), the defense attempted to submit WikiLeaks documents as evidence, but these were ultimately deemed unverifiable by the STL Trial Chamber and thus inadmissible. This shows that the existence of documents, and the certainty, transparency and accountability they provide, can be determined in relatively fluid ways by the organ of accountability itself rather than conforming to an external standard of transparency and truthfulness.
The concept of authenticity, and the possibility of documents being considered self-authenticating, appear to be defined not only on the basis of trusted sourcing but also on the basis of the extent to which digital evidence has been mediated. In her work on the use of digital open-source technologies in international human rights law, Lindsay Freeman shows that digital evidence in international criminal procedures is generally considered either documentary (meaning self-authenticating in the common law system) or forensic, depending on whether analysis or scientific procedure has been applied in order to validate or verify the digital item. Satellite images are thus generally considered documentary evidence, as are emails. Audio enhancement of a voice file, or an expert report using raw digital data that may involve an expert witness, would require additional conditions to be met because they involve mediation, and as such would be treated as forensic evidence (Freeman, 2018, 2022; Freeman & Vazquez Llorente, 2021).
In the context of xAI and accountable AI, this suggests two outcomes, especially in light of the manner in which US satellite imagery was incorporated as self-authenticating documentary evidence. The first is that even in the high-stakes and highly uncertain arena of international criminal law, self-authenticating evidence can be admitted: trusted sources can exist without interrogation of the methods by which the information is created. The second, an emerging argument about the nature of a self-authenticating document (not explicit at the ICTY but implicit in the different treatment of DNA evidence and satellite imagery), is that authentication is not only a matter of identifying the source but also of identifying and evaluating the extent to which the information has been mediated.
5. Conclusion: Lessons for AI in resort-to-force decision-making
As in all courts, adjudicating evidence in international criminal law requires neutral adjudicators to make judgments ascertaining the minimum NT conditions needed to reduce the epistemic uncertainty of international events. The treatment of evidence in international criminal law shows the results of a fluid, highly contested environment of legal decision-making without the benefit of clear rules of precedent, especially in the context of evidence which is used, like documents in the Weberian sense, to reduce epistemic uncertainty and apportion accountability. Within the broader fluid approach of international criminal law, this article has examined the ICTY’s approach to the problem of new forms of epistemic uncertainty in forensic evidence and expert witness statements, and in the use of digital evidence. This approach is relevant for developing systems of accountability for AI in the context of resort-to-force decision-making.
The limited analysis advanced in this article suggests at least three epistemic practices by which judgments are made under conditions of irresolvable, technologically mediated uncertainty in international criminal law: corroboration, contestability and authentication. Each offers lessons for actors who are searching for accountability in internationally oriented AI decision-making systems, especially in high-stakes contexts such as resort-to-force decision-making. The concept of corroboration suggests that developing expertise and information from a range of sources can sufficiently reduce epistemic uncertainty in the court’s view. This is reminiscent of current approaches to attributing cyber-attacks, for example, where epistemic certainty is approached via corroboration and contestability rather than documentary certainty, which is unachievable (Canfil, 2022; Crootof, 2017; Rid & Buchanan, 2015). In the context of resort-to-force decision-making, this suggests a role for multisource corroboration of decision-making by technological systems.
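As a sketch of what multisource corroboration might look like in software terms, the toy function below treats a claim as corroborated only when several independent assessors (imagined here as separate models or analytic teams; the names, thresholds and tolerances are hypothetical) both support it and broadly agree, echoing the way the Krstić Chamber weighed forensic evidence against witness testimony and aerial imagery.

```python
# A toy sketch of multisource corroboration. Sources, names and
# thresholds are hypothetical; real systems would need richer logic.
def corroborated(assessments: dict, threshold: float = 0.7,
                 min_sources: int = 2, tolerance: float = 0.15) -> bool:
    """assessments maps a source name to its estimated probability
    that a claim holds. A claim counts as corroborated only when
    enough independent sources support it *and* broadly agree."""
    supporting = [p for p in assessments.values() if p >= threshold]
    if len(supporting) < min_sources:
        return False
    # Require convergence across sources, not one confident voice.
    return max(supporting) - min(supporting) <= tolerance

sources = {"model_a": 0.82, "model_b": 0.76, "analyst_desk": 0.74}
print(corroborated(sources))  # True: independent sources converge
```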
Contestability suggests that uncertainty can be sufficiently mitigated by debate about the relationship between evidence and fact, including debate about the production of knowledge via contestation of working methods and professional standing. The experience of the ICTY suggests that facilitating contestability is a valuable and established way to reduce uncertainty, even if debate is not necessarily conducted under the auspices of entirely knowledgeable oversight regarding technology or scientific method. The judgments at the ICTY regarding forensic evidence were made by nonexperts, for example, but certainty and thus accountability were still satisfactorily achieved in the eyes of the Tribunal. Approaching epistemic uncertainty through a lens that emphasizes contestation even in the absence of absolute expertise reduces the temptation of technoscientism apparent in a “black box” approach to AI (Jasanoff, 2006). In practical terms, for xAI and accountable AI, facilitating contestability – even within a single state’s decision-making process – could mean producing counterfactuals with competing systems. Reflecting ICTY debates about professional standing and expertise, emphasizing certainty and accountability in accountable AI could mean ensuring that information relevant to the relationship between the AI agent, the forum in which decisions are made, and accountability is generated and assessed by human decision-makers (Miller, 2019).
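One hedged sketch of machine-generated contestability: a second process searches for the smallest nearby input that flips a first system’s recommendation, giving human decision-makers a concrete counterfactual to debate, much as opposing expert witnesses gave the Popović Chamber something to weigh. Both functions, the weights and all parameters below are invented for illustration.

```python
# A toy sketch of contestability via counterfactuals: search for a
# nearby input that flips the advisor's recommendation. The advisor
# and its weights are hypothetical stand-ins.
import numpy as np

def recommend(x: np.ndarray) -> bool:
    """Stand-in advisor: flags high risk when a score passes 0.5."""
    w = np.array([0.8, 0.5, -0.6])
    return bool(1.0 / (1.0 + np.exp(-(x @ w))) > 0.5)

def minimal_counterfactual(x: np.ndarray, rng: np.random.Generator,
                           step: float = 0.2, n: int = 2000):
    """Random local search for the closest input with opposite advice."""
    original = recommend(x)
    best = None
    for _ in range(n):
        candidate = x + rng.normal(0.0, step, size=x.size)
        if recommend(candidate) != original:
            if best is None or (np.linalg.norm(candidate - x)
                                < np.linalg.norm(best - x)):
                best = candidate
    return best  # None if no flip was found within the budget

rng = np.random.default_rng(1)
x = np.array([0.1, 0.1, 0.1])
print(recommend(x), minimal_counterfactual(x, rng))
```

The counterfactual does not open the black box; like cross-examination, it generates material for the forum to contest.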
Contestability also raises the issue of private actors and private interests in that process of debate. The example of US satellite data being admitted as evidence and automatically authenticated despite defense objections showed that actors outside the court’s jurisdiction could limit the court’s access to contestability. This reflects work by critical scholars of AI who argue that the political economy of AI limits our capacity to hold it accountable, because the corporate origin of AI means trade secrets can be invoked even if the technical barriers currently limiting xAI are overcome (Wexler, 2018). The ICTY cases discussed in this article suggest that this is problematic in the context of accountable AI because of the capacity private (or nonjurisdictional) actors have to limit contestability. This is particularly important when applying this thinking to resort-to-force decision-making, where state actors are currently largely reliant on private actors to develop AI tools.
Finally, the treatment of evidence in international criminal law suggests a role for authentication in reducing epistemic uncertainty. Authentication is the process whereby information is admitted as documentary evidence based on decisions about the reliability of its sourcing. In the context of AI, this could mean special attention paid to the custodial chain of training data. This issue is a live debate in international criminal law in the context of evidence lockers and social media data, as in The Gambia’s efforts to obtain Facebook data in support of its genocide case against Myanmar before the International Court of Justice. In that case, scholars and practitioners are working to find ways to authenticate social media data so that it may be admitted as evidence (Mooney et al., 2021). The treatment of evidence in the ICTY trials also suggests that, conceptually, the differentiation between documentary and forensic evidence and its relationship to authentication bears epistemic fruit. Deciding on a “point zero” beyond which decisions, or training data, may be considered forensic evidence of decision-making because they have been “altered” – much as an audio recording can be edited – might offer conceptual clarity. It could add to the certainty provided by AI outputs further from the point at which a decision may be considered a (self-authenticating) “document.” Deciding whether to treat the outputs of resort-to-force decision-making as self-authenticating documents produced by authentic sources, or as forensic evidence of a decision or series of decisions, helps to decide how to treat the (un)certainty offered by AI decision-making tools. Importantly, any move towards authentication as a mode of addressing the quest for certainty will demand that adjudicators pay attention to the custodial chains of training data as well as to the machinations of AI models themselves.
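To illustrate what attention to the custodial chain of training data could mean in practice, the sketch below hash-chains each handling step of a hypothetical dataset so that any later alteration is detectable, a rough software analogue of a chain of custody. The actors, actions and record format are assumptions for illustration, not a description of any existing standard.

```python
# A minimal sketch of a hash-chained custodial record for training data.
# Actors, actions and the record format are hypothetical.
import hashlib
import json

def _digest(body: dict) -> str:
    return hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()).hexdigest()

def custody_entry(prev_hash: str, actor: str, action: str,
                  data_digest: str) -> dict:
    body = {"prev": prev_hash, "actor": actor,
            "action": action, "data": data_digest}
    return {**body, "hash": _digest(body)}

def verify_chain(chain: list) -> bool:
    """A break anywhere invalidates everything downstream."""
    prev = "genesis"
    for entry in chain:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body["prev"] != prev or entry["hash"] != _digest(body):
            return False
        prev = entry["hash"]
    return True

raw = hashlib.sha256(b"collected corpus").hexdigest()
chain = [custody_entry("genesis", "collector", "ingest", raw)]
chain.append(custody_entry(chain[-1]["hash"], "curator", "dedupe", raw))
print(verify_chain(chain))  # True until any entry is altered
```

As with self-authenticating documents at the ICTY, such a chain certifies provenance and handling, not the truth of the underlying content; that remains a matter for corroboration and contestation.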
Adapting principles of corroboration, contestability and authentication from international criminal law to the search for accountable AI is a thick, qualitative approach that adopts model-centric approaches to problems of explainability and thus accountability. This is in contrast to a decompositional model, which searches for solutions within the model itself and which has dominated certain strands of scholarship and policymaking on AI governance (Deeks, 2019, p. 1835). The lessons from evidentiary approaches at the ICTY help to build a case for a “socially situated” xAI and thus accountable AI, rather than the “document”-driven, algorithmically focused approach to accountability offered by a decompositional model (see Ehsan et al., 2021). This usefully distinguishes between the approach of the US National Institute of Standards and Technology and that of the EU Artificial Intelligence Act, for example, and arguably between xAI and interpretable AI. Peters (2024) notes the value of this distinction when he distinguishes between the moral value of epistemic opacity and the moral value of epistemic uncertainty (see Koskinen, 2024). He argues that approaching AI through the lens of inevitable epistemic opacity – allowing simply that AI cannot be understood – affords AI developers moral irresponsibility. Accepting uncertainty instead allows for incomplete approaches and forces the hand of developers who might otherwise remove themselves from responsibility.
International criminal law shows us that accountability can proceed in the face of uncertainty, regardless of complexity and opacity. We can develop, and already have developed, ways to implement accountability in high-stakes contexts such as international criminal law despite the epistemic uncertainty inherent in the thick fog of war, the past and technology. Returning to the question of AI and resort-to-force decision-making, adapting the lessons of the ICTY’s approach to new technologies in the absence of clear precedent and inherent uncertainty has several implications for developing approaches to AI accountability in this context. Primarily, it means re-examining attachments to Weberian, bureaucratic notions of accountability and moving beyond them to other possibilities. “Thicker” approaches to the problem of xAI in the context of resort-to-force decision-making can move the discussion beyond simple concepts of documentation. But perhaps most of all, they can suggest a robust, practical approach to the problem of uncertainty, which moves the problem of accountability and responsibility forward.
Competing interest
The author has no competing interest or relevant funding to declare.
List of cases cited
Ayyash: Prosecutor v. Salim Jamil Ayyash et al., case no. STL-11-01/I/TC
Bemba: Prosecutor v. Jean-Pierre Bemba Gombo, case no. ICC-01/05-01/08
Krstić: Prosecutor v. Radislav Krstić, case no. IT-98-33-T
Kupreskić: Prosecutor v. Zoran Kupreskić, Mirjan Kupreskić, Vlatko Kupreskić, Drago Josipović, Dragan Papić, Vladimir Šantić, case no. IT-95-16-T
Popović: Prosecutor v. Vujadin Popović, Ljubiša Beara, Drago Nikolić, Ljubomir Borovćanin, Radivoje Miletić, Milan Gvero, Vinko Pandurević, case no. IT-05-88-T
Tolimir: Prosecutor v. Zdravko Tolimir, case no. IT-05-88/2