A. Introduction
The use of Artificial Intelligence (AI) has caused a disruption in the legal sector, increasingly affecting courts. The discussion of AI potentially taking over the task of adjudication in the not-too-distant future has become somewhat of a “scholarship favorite” in the last few years.Footnote 1 If one listened to AI enthusiasts only, one might get the impression that the advent of fully-fledged “Robo-judges” was already imminent. However, this idea has at least as many opponents as proponents. In legal scholarship, in particular, the concept of an AI judge tends to be met primarily with skepticism. While the approaches may vary quite drastically from one another, the overall conclusion seems to be rather uniform: AI cannot replace the human judge for both technological and legal reasons.Footnote 2
So far, so good. Interestingly enough, though, many scholars do not leave it at that. Rather, they oftentimes extrapolate from negating the feasibility of a complete replacement of the human judge by AI to proclaiming that AI may “only” be used to support the human judge by assisting them in the course of the decision-making process.Footnote 3 This latter claim is mostly based on some version of the notorious “human in the loop” argument.
The mask of this cure-all remedy, however, quickly starts to crack if one zooms in on this framing of the judge as the human in the loop. The widespread underlying intuition that having AI tools assist a human judge is categorically less problematic than fully delegating judicial decision-making to AI turns out to have a rather shaky foundation. To challenge this assumption, it is essential to carve out the core arguments in favor of keeping a human in the loop and trace them back to their roots.Footnote 4 Doing so establishes the baseline for identifying the risks of opting for human in the loop constructs.Footnote 5 To illuminate the technological and legal challenges that “merely supporting” AI tools may pose for the judicial decision-making process specifically, it is not sufficient to stop at AI as an abstract concept. Rather, the focus must shift to concrete AI tools, categorized against the backdrop of the different stages and features of court proceedings.Footnote 6 These AI use cases may then serve as representative examples of graduated degrees of AI involvement in the judicial decision-making process, with each of them having a rather distinct potential to impact the judicial decision. The outcome of the analysis urges a rethinking of the current understanding of AI “co-working” with judges. It proposes a move towards a clearer division of labor, where AI takes on specific, well-defined tasks that do not intrinsically require human judicial expertise, rather than an intertwined assistance model. For those scenarios where human and AI contributions are deeply interwoven, a conscious decision whether AI or a human judge should perform the underlying tasks in the future is required.Footnote 7
B. AI Assistance Over AI Delegation: Roots and Core Arguments of the Human in the Loop-Intuition
As we all know by now, AI is everywhere and here to stay. The manner in which AI is incorporated into our lives remains, however, a moving target, and so does the scholarly debate analyzing it. For many years, the discussion has mostly been focused on whether and in which domains AI will replace humans. As is the case with most technological advancements, one of the first questions widely discussed with the rise of AI has been which jobs will no longer exist for humans. The legal profession, let alone the judiciary, was not, however, at the heart of this debate. Rather, the debate centered around whether we will still require human drivers, human pilots, human servers, or human nurses, to name a few.Footnote 8 Recently, the discussion has been shifting more and more towards the question of AI assisting humans. This may seem counter-intuitive at first glance, as one may consider moving from AI as a mere assisting tool with limited capabilities to AI as an alternative to a human to be the more obvious scenario in the course of technological evolution. This paradox can, however, be explained by taking into consideration which tasks are at the heart of the “AI assisting humans” versus the “AI replacing humans” debate. With the advances in AI, the areas in which AI is, in fact, capable of replacing the human, or has done so already, are increasingly not worth debating because AI outperforms the human more and more visibly without creating additional or greater risks in comparison to sticking with humans to perform the task in question. For example, it is no longer up for debate that AI greatly outperforms any human counterpart when it comes to collecting data about users’ online behavior and interpreting this data to recommend them a new book to read or a new video to watch.
Such use of AI nevertheless, of course, raises numerous questions, including whether we actually want AI to take on these tasks – for example, due to concerns about subliminally influencing the decision-making process of the users. Such objections are, however, not rooted in doubts that AI is able to perform the task in question. In fact, the opposite is the case: AI is so good at performing this task that humans may question whether the task itself requires reevaluation. With regard to tasks in areas in which humans being replaced by AI has been considered rather unlikely from the start – for example, due to the tasks’ reliance on core human skills or the lack of a reliable and objective way to measure the accuracy of the outcome produced by the AI tool – our perception has not substantially changed with technological advancements. On the contrary: The more we understand how AI functions, the less we think it is likely for it to fully replace humans with regard to these tasks. This, in turn, results in scholarship moving beyond AI versus human-scenarios and instead putting AI and human-scenarios, characterized by an AI assisting a human to fulfill tasks, increasingly at the heart of their assessment.Footnote 9
What characterizes these “AI assisting humans” scenarios? They concern areas in which AI has not proven its capabilities in a way allowing us to fully rely on it, but has proven enough of them to make it seem a plausible tool to support the human in charge. More specifically, the tasks in question usually have features which require abilities AI may have become surprisingly good at mimicking but, in fact, cannot tackle in the manner we, as humans, consider appropriate. Therefore, we rule out replacing the human with AI by fully delegating the corresponding tasks to it. However, due to the oftentimes astonishing mimicking abilities of AI, the temptation increasingly grows to make use of AI despite the dissonance between the abilities required to conquer these tasks, on the one hand, and how AI actually functions beyond merely mimicking human outcomes, on the other hand. The same effect can be observed when the involvement of AI is credited with certain advantages, such as lowering the cost of fulfilling a certain task or increasing the effectiveness of the underlying decision-making process, but equally attributed risks or drawbacks, such as discriminatory effects.
One of the most popular coping mechanisms – if not the most popular one – for AI lacking some abilities required to fulfill a certain task, or at least for reducing the risks and disadvantages resulting from using AI, has been constant reassurances towards those affected by it that AI, figuratively speaking, will not be let off the leash. Rather, so the argument goes, the human remains in charge.Footnote 10 Depending on the specific AI application and the tasks, the terms used may differ; all of them, however, collectively uphold the importance of a human monitoring AI when in use.Footnote 11
Judicial decision making is oftentimes referred to as a task for which keeping a human judge in the loop is of particular importance.Footnote 12 Nevertheless, the underlying reasoning of human oversight is, of course, neither unique nor limited to AI involvement in court proceedings. Rather, as elaborated, the intuition “AI assistance over AI delegation” spreads across various areas and disciplines in an equal manner. Examples are seemingly endless, with some of the most notorious ones being practicing medicine,Footnote 13 driving a car,Footnote 14 and using weapons.Footnote 15 In the course of this Article, I will draw on empirical evidence concerning human in the loop-scenarios not only in judicial decision making but rather also in some of these other areas just mentioned, provided that the reference points of the studies are comparable to the ones of court proceedings.
I. Technological Reasoning
So, what is the foundation of favoring having a human in the loop whenever AI involvement is discussed? What are the reasons given across the fields? Broadly speaking, one may identify three types of reasoning. The first one is what I call “technological reasoning”: Like any other technology, AI is more or less prone to glitches which could be exploited by ill-intended people. In addition, AI systems may be directly manipulated or hacked. These technological “mishaps” are, in turn, likely to result in wrong output, undeservedly favoring the ill-intended or harming others. Human oversight can, of course, never fully shield innocent bystanders from technological errors of an AI system. However, some scholars claim that if there is human oversight, manipulation, hacking, and glitch exploitation are at least not as dangerous as they would be in case of delegating the task fully to AI.Footnote 16
Of course, technology-based reasoning does not only apply to these worst case scenarios of ill-intended people actively trying to “break” or “trick” the AI system to act in their favor.Footnote 17 Rather, reservations towards fully delegating tasks to AI due to technological concerns oftentimes reach further, including also any scenarios in which AI may “merely” act unpredictably on its own from the perspective of a human observer.Footnote 18 In such cases, AI is not manipulated. In fact, it may actually perform as it was set out to do. Due to a lack of understanding of AI’s decision-making process, however, the humans involved do not feel confident to let AI impact the course of action in an unfiltered manner.Footnote 19 The human in the loop is meant to perform a type of “rationality check.”
II. Legal Reasoning
The second type of reasoning in favor of having a human in the loop when using AI is one founded in law, therefore legal reasoning. In many scenarios, scholarship frames the necessity of a human remaining in the loop primarily as a result of current liability regimes.Footnote 20 This does not come as a surprise, as AI systems are, at least for now, not considered legal subjects or actors who can be held directly liable for the damages they cause.Footnote 21 Therefore, a human is required to whom the AI system in question may be attributable.Footnote 22 Placing a human in the loop for liability reasons is not, however, primarily aimed at making sure that the person damaged by an AI system will, in fact, get compensated. Rather, it is a mechanism for those developing an AI system to protect themselves by, at least partly, shifting the burden of any mistakes their AI system may make to another human, namely the human in the loop responsible for overseeing the AI system in use.Footnote 23 The underlying reasoning corresponds with the overarching concern when it comes to fully automated decision-making: the awareness that the AI tool is not reliable enough to perform the task in question by itself, without any human oversight – in this case with the addition: at a tolerably low liability risk for the product developer.Footnote 24
When it comes to the state using AI in order to fulfill its sovereign tasks, the reasoning of keeping a human in the loop has some similarities with the one based on liability concerns. It is, however, oftentimes additionally embedded in more general concerns of accountability and democratic legitimacy:Footnote 25 Whenever the state uses AI in the course of fulfilling its sovereign tasks, keeping a state representative in the loop is not merely a mechanism to shield those developing the AI tool in use from liability. Rather, through the lens of the law, decisions of the state are unique insofar as who decides, and also who is the human in the loop in case of “mere” AI assistance, matters for them to qualify as sovereign actions. Generally speaking, in order for the decision to be sovereign, the state needs to be the one to make the decision, to apply the law, to enforce it. The state does so by appointing certain individuals out of the pool of “the people.” In the context of judicial decision-making, which is, of course, one of the core cases of the state using AI in order to fulfill its tasks, it is the individual serving as the judge. Because the idea of appointing an AI system as a judge is widely rejected on various grounds,Footnote 26 only human judges may decide “in the name of the state.” Having an AI system make judicial decisions without the appointed human judge staying in the loop would result in a decision not imputable to the state. The state would “outsource” some of its core tasks to an entity, in this case an AI system, which is not considered a representative of the state.Footnote 27 The outcome, therefore, would not qualify as a “judicial decision.”Footnote 28
With regard to judicial decision-making specifically, further legal reasons in favor of keeping a human in the loop concern the specific function a judge plays within the legal system and the core legal principles guiding judicial decision-making. These are, first and foremost, judicial independence, transparency of the judicial decision-making process, equality before the law, and the overall right to a fair trial.Footnote 29 This type of legal reasoning is, however, not only invoked when justifying the need for a human judge to remain in the loop. Rather, these legal principles determining the process of judicial decision-making are also the relevant benchmarks when it comes to evaluating the admissibility of specific AI tools used in the course of judicial decision-making altogether, thus also in an assisting capacity.
III. Psychological Reasoning
Thirdly, there seem to be, though not always made explicit, psychological reasons for favoring human-in-the-loop solutions over fully delegating a task to AI. The most obvious ones are based on both the person using AI and the person affected by the use of AI feeling more at ease with having human oversight. While this is, of course, not a universal truth and, on top of that, something that may change over the course of time, quite a few studies have been conducted confirming that humans do in fact prefer to keep a human in the loop whenever AI is involved. This seems to be the case not only with regard to decisions directly aimed at and affecting the individual in question but also when reflecting upon designing decision-making processes in the abstract. One reason for this intuition is that there is an “identifiable entity,” an aspect relevant not only through the lens of legal reasoning, but also in the psychological dimension.Footnote 30 Interestingly enough, it seems to be the case that whenever anything goes wrong, we as a society would by and large prefer to attribute the mistake to a specific human than to have “no one to blame.” It may even go so far as accepting a higher error rate with a human in the loop to shift the blame to, rather than a statistically lower error rate without a human in the loop.Footnote 31
There is, however, another more subtle psychological dimension which could explain why keeping a human in the loop in the context of AI has been a widely popular approach: The human in the loop-scenario could be coined as a case of “making the choice for the option in the middle.” The underlying reasoning is best known from customer behavior when choosing between three versions of the same product with varying levels of functions or sophistication, on the one hand, and prices on the other hand: Studies have shown that most people do not want to buy the cheapest version of the product as it entails having the one with the lowest level of sophistication. At the same time, they are also not willing to purchase the most advanced version of the product as it is the most expensive one, requiring them to spend the maximum amount of money for the product in question. Therefore, they settle on the one in the middle: The price they pay is not as high as the one of the most expensive version of the product. The level of sophistication of the product is also not as low as the one of the least expensive version. The one in the middle is perceived to be the best value, the best product one can get for the least amount of money, the happy medium, so to speak.Footnote 32
This psychological reasoning appears to fit the scenario discussed here as well: Sticking with tasks being solely performed by humans, having no AI involvement at all, can be understood as the least sophisticated yet safest option from the perspective of the individual confronted with the choice of using AI to fulfill a task or not. In contrast, fully replacing the human with AI, resulting in AI performing the task without any human involvement, may be perceived as the most sophisticated yet riskiest option in terms of the “price” one pays. It is often considered, figuratively speaking, as coming at too high a cost to delegate tasks to AI without any human oversight. When applying this logic, keeping the human in the loop while still having AI support them may very well be considered the “best of both worlds,” the middle ground.Footnote 33
C. The Dangers of Having a Human (Judge) in the Loop: Potential Conceptual Misunderstandings
Having outlined the foundations of favoring having a human in the loop whenever AI is involved as well as the main types of reasons given across the fields, I now shift the focus to the risks of opting for human in the loop constructs. The aim of this part of the Article is to carve out not only general concerns regarding human in the loop scenarios but also showcase why having AI supporting human judges can be particularly harmful.
I. The Good Outweighing the Bad and the Ugly?
When analyzing the scholarship on using AI in court proceedings, be it instead of a human judge or to assist them, it is rarely the case that the involvement of AI is considered to be fully positive without any reservations, risks, or drawbacks. Quite the opposite: Scholars have been pointing out various potentially negative implications of using AI in the course of judicial decision-making, flagging risks such as opaque decisions, a lack of accountability, a violation of judicial independence, and concerns of unfair and discriminatory treatment of those subject to the judicial decisions.Footnote 34 Alongside scholars flagging AI-related hazards, the world has witnessed quite a few incidents of “using AI in court gone wrong” which seemingly confirmed many of the concerns; in fact, so much so that these AI use cases have, in turn, been serving as notorious examples to showcase just how risky it is to use AI in the course of law enforcement. Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) is potentially the most famous of them.Footnote 35
As outlined above, though, limiting the involvement of AI to a “merely” assisting tool in support of the human—here, the human judge, who is still in charge of making the decision—is often framed to be “as good as it gets.”Footnote 36 Quite a few scholars claim that while involving AI in judicial decision-making goes hand in hand with potential risks and downsides, using—at least some types of—AI tools while human judges remain in the loop does overall more good than harm. All things considered, AI supporting human judges is therefore considered to be an improvement over the alternatives: sticking with human judges without any AI involvement in their decision-making process, on the one hand, or fully delegating judicial decision-making to AI, on the other hand.Footnote 37
Some scholars, in fact, seem to consider AI involvement in judicial decision-making as mainly positive, so long as the human judges are the ones making the final decision.Footnote 38 In support of this position, scholars point to weaknesses human judges notoriously display. First and foremost, human judges are susceptible to a variety of cognitive biases as well as personal prejudices. Humans—and consequently human judges—are also known to be inconsistent in their decision-making.Footnote 39 Against this backdrop, AI is often considered to be a tool which could potentially help to “de-bias” judges,Footnote 40 and increase their consistency.
II. AI Versus Human (Abilities): Not a Matter of Degree but of Kind
At first glance, approaching the question of AI-assistance in the course of judicial decision-making based on the “Do Benefits Outweigh Drawbacks?” criterion appears intuitive. However, upon closer examination, this framing raises fundamental problems that cast doubt on its plausibility.
A core flaw of this “Do Benefits Outweigh Drawbacks?” approach is that it downplays the fact that humans and AI operate on the basis of fundamentally different inner logics, which in turn directly affects how the quality of their output, as well as the path by which each arrives at it, is to be determined.Footnote 41 With AI’s mode of operation being fundamentally different from that of humans, it seems rather questionable from the outset to what extent one may speak of “better or worse” or “more or less.”
This fundamentally different mode of operation of humans and AI manifests itself in numerous ways. For the use of AI in the course of judicial decision-making, the differences in handling natural language are at the center. The law is, as is well known, expressed in natural language, with natural language serving as the medium, as is always the case when humans use it. AI is also capable of processing natural language. In the case of generative AI, such systems can even independently generate natural language. However, unlike humans, AI does not use natural language merely as the medium for expressing meaning or communicating information and content. Rather, for AI, natural language represents the end goal.
This distinction is a direct result of how AI processes natural language: Even in the case of Large Language Models (LLMs), their processing is a purely statistical analysis based solely on historical data and patterns recognized therein. With the release of newer models like Claude 3.7 Sonnet and DeepSeek R1, the reasoning process of LLMs has become more similar to how the human brain works. This is because these newer models integrate reasoning as a core capability within a single model, thereby using a hybrid reasoning approach which combines quick answers with deliberate, step-by-step analysis for complex problems, similar to how humans might approach different tasks. In contrast, the models prior to them used separate models, namely one for quick answers and another one for solving complex problems. However, these technological advances do not change the fact that the way these LLMs “think”—even when in so-called “extended thinking” mode—is fundamentally a different type of intelligence, as they still lack the human elements of consciousness, emotions, and embodied experience. They merely show their work steps, which enables the user to read how the model got to the answer it provided (“Chain-of-Thought”), and even this aspect is not fully reliable.Footnote 42 AI thus does not have an understanding of the subject matter expressed through natural language and, consequently, lacks language comprehension in the human sense.Footnote 43 With the rise of ChatGPT, Noam Chomsky, Ian Roberts, and Jeffrey Watumull poignantly emphasized this aspect by stressing that any equating of AI-based and human language processing is based on a “fundamentally flawed conception of language and knowledge.”Footnote 44 Though one might get a different impression given their human-like output, AI-based systems are incapable of distinguishing “the possible from the impossible” as they are detached from the real physical world.
They cannot, by nature, develop an understanding of the “physical and social situations” expressed through natural language.Footnote 45
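The purely statistical, pattern-based mode of text generation described above can be made tangible with a deliberately minimal sketch: a bigram model that predicts the next word solely from word-frequency counts in its training data. The tiny corpus and the function `predict_next` are invented for illustration and bear no resemblance to the scale or architecture of any production LLM; the point is only that such a model emits plausible continuations without any representation of what the words mean.

```python
from collections import Counter, defaultdict

# Toy "language model": predict the next word purely from
# co-occurrence statistics, with no notion of meaning.
corpus = (
    "the court dismissed the claim "
    "the court upheld the claim "
    "the court dismissed the appeal"
).split()

# Count how often each word follows each other word.
successors = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    successors[current_word][next_word] += 1

def predict_next(word: str) -> str:
    """Return the statistically most frequent successor of `word`."""
    return successors[word].most_common(1)[0][0]

# "dismissed" is followed by "the" in every training example, so the
# model predicts it without any notion of what a dismissal is.
print(predict_next("dismissed"))  # -> the
```

Scaled up by many orders of magnitude and refined with learned weights rather than raw counts, this is still prediction over patterns in historical text, which is why fluent output alone does not evidence comprehension.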
This lack of genuine language comprehension is already problematic when using natural language in everyday conversational settings. In the context of law and its application, the negative implications of AI’s deficient language comprehension are particularly severe. Judicial legal application is notably not—merely—language processing.Footnote 46 Rather, it follows a specific methodology that has evolved within a legal community and is passed on to the next generation through legal education.Footnote 47 Moreover, the law and its application are inherently dynamic. Structurally, the process of judicial legal application, at least in the case of legal systems which follow the civil law tradition, exhibits a top-down approach: The judge derives from a general norm how the specific case should be adjudicated. In contrast, AI systems, at least in the case of Machine Learning applications such as LLMs, do not, by their operational nature, start out with a general rule but rather with numerous individual cases. Consequently, AI would require the capability of deriving the general rule bottom-up from all these individual cases. However, this in itself currently presents a nearly insurmountable challenge for AI.Footnote 48 To replicate the process of judicial legal application, AI would furthermore need to evaluate the specific case at hand based on this general rule derived. As of now, AI fails to do so as well. Instead, AI evaluates the case at hand simply based on patterns it has derived from numerous individual cases and transferred to the specific case, without following legal methodology.Footnote 49 While the output generated by AI may be identical to that of the judge, the path to this output could hardly be more different.
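The structural contrast just drawn between top-down subsumption under a general norm and bottom-up pattern extraction from individual cases can be sketched in miniature. The liability “norm,” the case features, and both functions below are invented toy examples; real adjudication and real machine learning models are vastly more complex. The sketch only illustrates the point made above: both paths can produce the identical output while following entirely different logics.

```python
# Top-down, as in civil law adjudication: a general norm is stated
# first, and the concrete case is subsumed under it.
def apply_norm(damage_caused: bool, acted_negligently: bool) -> str:
    # General rule: whoever negligently causes damage is liable.
    if damage_caused and acted_negligently:
        return "liable"
    return "not liable"

# Bottom-up, as in pattern-based machine learning: the outcome is
# inferred from the most similar previously decided cases, without
# the general rule ever being stated.
past_cases = [
    ({"damage_caused": True, "acted_negligently": True}, "liable"),
    ({"damage_caused": True, "acted_negligently": False}, "not liable"),
    ({"damage_caused": False, "acted_negligently": True}, "not liable"),
]

def predict_from_cases(new_case: dict) -> str:
    def similarity(case_features: dict) -> int:
        # Count how many features match the new case.
        return sum(case_features[k] == new_case[k] for k in new_case)
    _, outcome = max(past_cases, key=lambda c: similarity(c[0]))
    return outcome

case = {"damage_caused": True, "acted_negligently": True}
# Both paths yield the same output here ...
print(apply_norm(**case), predict_from_cases(case))  # -> liable liable
# ... yet only the first follows a stated general rule.
```

The bottom-up predictor never derives or applies the norm; it merely finds the nearest precedent pattern, which is precisely the methodological gap the text describes.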
Against this backdrop, it is apparent that judicial legal assessment is far from being a simple algorithmic procedure. Rather, it represents a complex, multi-faceted endeavor that resists comprehensive replication by AI systems. While narrow, standardized, legal procedures may be amenable to automation, the vast majority of judicial legal assessment involves a degree of nuance and complexity that current AI technology is ill-equipped to handle.
A similar picture can be painted with regard to judicial tasks aimed at establishing the facts of the case. AI may demonstrate potential in isolated aspects, particularly in specific tasks such as document authentication or organizational functions. However, fundamental limitations persist, especially in areas requiring deep language understanding, emotional intelligence, and holistic reasoning. Even advanced AI applications, such as deep learning-based language models, fall short of the judicial competencies required for comprehensive fact-finding. The inability to fully grasp communicated content or to interpret nonverbal cues precludes AI from wholly supplanting human judges in this domain.
Given that these limitations of AI are, as elaborated, the result of fundamental differences in cognitive architecture between human and machine intelligence, they are not merely quantitative, that is, a matter of processing power or data volume, but qualitative.Footnote 50 The fact that AI is “thinking” in ways that are fundamentally alien to human reasoning leads to what we might term insurmountable structural incompatibilities with the judicial process.Footnote 51
To conclude, the process of judicial decision-making is inextricably linked to law as a human construct, expressed through natural language and interpreted through culturally-informed hermeneutics. AI can, at best, imitate aspects of this process but falls short of replicating this level of contextual understanding and reflective, context-sensitive, adaptive reasoning. Even in cases where AI systems demonstrate high accuracy rates in predicting judicial outcomes, as seen in some assessments of “predictive justice” tools, this superficial success belies a deeper failure to grasp the underlying legal norms and methodological requirements that guide judicial legal assessment. The inability of AI to comprehend these elements, even at a rudimentary level, underscores the qualitative gap between statistical correlation and true legal reasoning.Footnote 52
III. Focus on the Output?
Some scholars are, however, not convinced that the fundamentally different functioning of AI and humans as such is reason enough to reject the “Do Benefits Outweigh Drawbacks?” criterion when approaching the question of AI-assistance in the course of judicial decision-making. After all, human judges are “black boxes” too. What happens inside a human judge, how they actually make decisions, and for which reasons are not accessible to anyone besides themselves.Footnote 53 Therefore, so the argument goes, the output is the only relevant criterion for evaluating whether AI shall assist a human judge; the path by which AI reaches an output shall, in contrast, be completely disregarded.Footnote 54
By solely focusing on the output and evaluating it from an external perspective, this approach reduces judicial reasoning to a simple performative test: what occurs within the decision-making system is irrelevant. Only the final product matters, a position that directly mirrors the conceptual framework of the Turing Test. In the judicial context, this approach would pose a singular evaluative question to determine whether an AI tool meets the threshold set by a human judge: Can the AI-generated output—more specifically, AI-generated judicial reasoning—be distinguished from a human-authored text? Focusing on the output seems to be the overarching approach of quite a few experiments centered on LLMs solving legal questions and comparing their output to that of judges or law students. Most of these studies do so while simultaneously acknowledging that judicial decision-making is more than its output.Footnote 55
This approach, however, fundamentally misapprehends the nature of judicial reasoning. Judicial decision-justification is not merely an exercise in generating text that superficially resembles legal reasoning. It is a dynamic process inherently linked to the decisional moment itself. A judicial explanation is not a post-hoc narrative imposed upon an arbitrary decision, but a critical component of the judicial process that reveals the internal logic of the decision.Footnote 56
Even from a legal realist perspective which acknowledges that judicial reasoning may not perfectly align with the idealized narrative of purely rational decision-making,Footnote 57 the justification process remains constitutive of judicial decision-making; even if it is only through ex post rationalization by the human judge. The requirement of reasoned explanation fundamentally constrains judicial discretion. A legal outcome that cannot be articulated through legally requisite forms of reasoning cannot be legitimately rendered.Footnote 58
The Turing Test, originally conceived as an “imitation game,” captures only the external perspective of reasoning. Judicial explanation, by contrast, demands an internal perspective that current AI systems categorically fail to reproduce. The machine can mimic, but it cannot truly reason.
IV. Human Judges as “Rubber Stampers”: The Inability of Humans to Properly Monitor AI and the Decline of Human Professionals
However, the mere fact that AI approaches legal reasoning in a fundamentally different way from human judges, and that this internal perspective is inherently linked to our current understanding of judicial decision-making, does not, as such, explain why having AI support human judges can be particularly harmful, even when compared to fully delegating judicial tasks to AI. In fact, quite the opposite seems intuitive: Because AI tools operate under a set of rules fundamentally different from those governing human decision-makers, having a human judge in the loop to ensure that the outcome provided by an AI system matches what the rules of judicial decision-making intend appears to be a promising approach. Why is it, then, that having a human in the loop seems not to have the desired effect of a monitoring mechanism but rather leads to an even more opaque, harmful outcome?
The root of the evil lies in the assumption “that human-machine systems represent the best of both worlds and don’t introduce new issues of their own.” Though intuitive, it is wrong and can become dangerous whenever adopted as the premise for a human in the loop construct.Footnote 59 Even if, based on the MABA–MABA (Men Are Better At–Machines Are Better At) approach, one correctly identifies weaknesses of an AI system and corresponding strengths of a human, it does not necessarily follow that these will even each other out when combined. Instead, the enterprise as a whole becomes even more likely to fail, with the dubious difference that the human in the loop is now available as a scapegoat.Footnote 60
1. Humans Falsely Project their Way of Seeing the World onto AI
The first facet of the explanation for why a hybrid system may foster the worst of both worlds rather than combining the best of both concerns how humans perceive AI tools and how they assess their output. Empirical studies show that humans are prone to project their way of seeing the world onto AI. This, in turn, makes it hard for humans to detect mistakes made by an AI system.Footnote 61 It becomes particularly difficult for humans to identify mistakes of AI tools when the AI takes over an aspect of a task that humans are not good at performing themselves.Footnote 62 Ironically, these are precisely the types of tasks often deemed most suitable for having an AI system support the less competent human.Footnote 63 The human placed in the loop turns into a “rubber-stamper” who does not genuinely oversee the decisions made by the AI system.Footnote 64
2. Degradation of Vigilance
The second negative consequence of having humans monitor AI-generated results concerns the ability of humans to be sensitive to unpredictable events and to detect them whenever they may occur over a period of time.Footnote 65 This ability is called “vigilance.”Footnote 66
Empirical evidence suggests that this ability to detect unpredictable events has the tendency to degrade over time particularly due to automation complacency.Footnote 67 As Clark put it: “The more powerful and capable an automation system appears, the greater the vigilance decrement per unit time.” Interestingly enough, “[t]his vigilance decrement effect of automation is most pronounced in environments where automation support is present for only a sub-set of tasks, that is, the subject must also perform other manual tasks in addition to ‘backing up’ or monitoring the automation.”Footnote 68 The reason is that “subjects developed selective ignorance of conflicting information, a bias towards trusting the automation system even in instances where other conflicting data was clearly visible, a ‘looking-but-not-seeing’ phenomenon.”Footnote 69
3. Degradation of the Skill(set)
This degradation in the sphere of the human placed in the loop to monitor an AI performing a task is not limited to the general ability to detect unpredictable events. Rather, the specific skill or skillset of the human is affected in a similar manner; a rather obvious result, considering that any human in the loop scenario is characterized by the human no longer using their skills to complete the task in question and instead merely observing an AI attempting to do so.Footnote 70 Empirical studies, once more, seem to confirm this consequence. While these studies mostly focus on “practical skills” like those required to fly a plane, drive a car, or diagnose a patient, there is no reason to assume that the skillset necessary to serve as a judge would not be equally affected when judges are reduced to monitoring AI in the course of judicial decision-making.Footnote 71
4. No Good, only the Bad and the Ugly? Supporting AI Gradually Replacing Human Judges
The result of using such supposedly merely supporting AI tools in the course of judicial decision-making is a gradual replacement of human judges under the guise of human oversight. The human judge may be perceived as the one making the decision. But: “Humans tend to become reliant on automated decision-making systems. They trust statistical data and begin to give up on their own independent judgment, and become blind to systems errors.”Footnote 72 This overreliance on technology turns human in the loop scenarios into de facto delegations of the decision to AI. The negative consequences of delegating the decision-making process as a whole to an AI system, which keeping the human judge in the loop was meant to avoid, therefore manifest all the same.
To make matters worse, there is, as elaborated above, plenty of reason to assume that such de facto delegations to AI are prone to cause even more harm than conscious delegations to AI with a corresponding legal basis. Keeping a human in the loop is oftentimes motivated by the need to create some sense of safety for those affected by the use of AI. In addition, the safety mechanisms for the AI tool itself are likely to be less rigid in the de facto scenario, as the human oversight is a construct created precisely to compensate for any shortcomings or unpredictable behaviors displayed by the AI tool. Because human oversight does not actually live up to any of these expectations, however, the sense of security derived from it is not merely false but opens the floodgates to using AI tools which would never pass as safe, accurate, or trustworthy enough to delegate decisions to.Footnote 73
D. AI Supporting Human Judges: Systematic Overview of AI Use Cases in Court Proceedings
So, what does all of this mean for AI potentially assisting a judge and thereby creating a scenario in which the judge becomes the human in the loop? Given the unique features of judicial decision-making and the various types of AI tools, it seems worthwhile to pause and assess these tools and their impact on judicial decision-making before recommending how to move forward.
When referring to AI as a supporting tool, assisting the human judge in the course of the decision-making process, the range of AI tools forming this category can hardly be overstated. AI may be used for as little as taking on minor organizational aspects of the judicial decision-making process. It may, however, equally be used for as much as preparing a full draft of the judicial decision, enabling the human judge to simply “sign on” to it to make it their decision.
The scholarship on using AI in support of human judges is similarly heterogeneous to the AI tools themselves, as most scholars simply “pick and choose” some use cases of how AI could assist in the course of judicial decision-making instead of providing a holistic analysis. In addition, the normative framework against which these AI use cases are measured varies. Some scholars consult rather vague legal principles, partly obscuring the assessment further by blurring the line between these legal principles and ethical guidelines.Footnote 74 Others, in contrast, conduct narrower analyses in light of concrete procedural legal norms, such as—parts of—fundamental procedural rights like the right to a fair trial, or specific provisions of ordinary statutory law.
In the following part of the Article, I attempt to categorize the variety of AI tools which may be used in different stages of court proceedings or which aim to recreate certain features of judges and their decision-making process. The purpose of this categorization is not to comprehensively list all AI tools potentially suitable for use in court proceedings; such an undertaking would be neither possible nor useful in light of the focus of this Article. Rather, the AI use cases chosen here shall be understood as representative examples of graduated degrees of AI involvement in the judicial decision-making process, with each of them having a rather distinct potential to affect judicial decision-making. Within the different degrees of AI involvement, the specific judicial task the AI aims to support is the determining factor for categorizing and analyzing AI use cases in court proceedings.
I. Pre-Trial and Post-Decision-Making
The first category of AI tools concerns aspects of the judicial decision-making process which are usually not considered core tasks of a judge, or even tasks to be personally carried out by a judge at all. They are, nevertheless, adjacent to judicial decision-making. This concerns tasks carried out in preparation for the actual trial and tasks required after the judge has made a decision. I refer to them as “pre-trial” and “post-decision-making” use cases of supporting AI tools. Although the spectrum of tools qualifying as AI is notoriously broad, some of these workflow-optimizing tools, applied at a stage of court proceedings before the judge in charge actually assesses the assigned case, may be mere digitization of analog tasks. As will be elaborated below, these types of assisting tools are the least problematic from the perspective of judicial decision-making. Therefore, the blurry line between mere digitization and the use of less sophisticated AI tools in the course of court proceedings is innocuous for the purposes of this Article.
With regard to the pre-trial stage, the main tasks involve receiving information from potential future parties, organizing it, and drawing certain conclusions from it, which are mostly of a formal nature at this stage of a court proceeding. AI-based tools of this kind are often described under the collective term “Case Management Systems.”Footnote 75
One specific way of using AI at this stage of court proceedings concerns how the individual seeking a judicial decision communicates with the court as an institution and, subsequently, the competent judge. With the rise of LLMs such as ChatGPT, AI-based chatbots are becoming an increasingly popular tool to provide information to prospective parties and communicate with them directly. The advantages of such chatbots are self-evident: they are not bound by the opening hours of courts, and they offer an overall low threshold for a first interaction with the judicial system, usually at no cost to the individual making use of them. In addition, if the chatbot is provided by the courts, the information fed into the system can be controlled more easily, resulting in generally more trustworthy and tailored outputs for the users, at least when compared to general search engines such as Google or non-specific LLM-based chatbots like ChatGPT.Footnote 76
When it comes to the post-decision-making phase of court proceedings, the ways of using AI to support the judge and courtroom staff greatly depend on how the legal system in question is designed. Broadly speaking, AI can assist in how a decision is communicated to those affected by it as well as to the legal community as a whole. AI could, for instance, format plain text provided by the judge and turn it into a judgment by inserting it into a template. AI may also adjust the elements of the template to fit the features of the decision in the specific case, such as the date of the decision, the mailing address of the parties, and the name or identification number of the competent judge. Furthermore, AI could once again be utilized in the form of an LLM-based chatbot, providing additional information to the parties upon request or answering their questions about the content of the decision in an easier-to-digest question-and-answer format.Footnote 77 Finally, in legal systems which provide public access to judicial decisions only after anonymizing them, AI tools offer a promising way to support this anonymization process.Footnote 78
II. Organizational Tools Supporting the Decision-Making Process as Such
Organizational tasks are not, however, restricted to the periods before and after the judge makes the decision; rather, they are features of the judicial decision-making process as a whole. Therefore, AI tools aimed at tackling these organizational aspects of judicial work may equally be incorporated into the process of judicial decision-making itself.
One may think of AI tools retrieving data from documents in the course of a court proceeding, for instance by using Optical Character Recognition, or extracting information from databases. Furthermore, AI tools could filter this information and organize it according to criteria defined by the deciding judge. In these cases, the AI tools merely provide information that already exists and is accessible to the human in charge of making the judicial decision; AI may, however, be able to collect it more efficiently and provide the relevant information in a more intuitive format for the judge.Footnote 79
A rather specific, yet highly relevant AI-based tool potentially assisting judges is speech-to-text software. As its name suggests, such AI tools convert spoken language into written text. The main field of application is, of course, the interaction between the judge and the parties in the course of oral proceedings. Nevertheless, such AI tools may also be used to transcribe any other spoken language relevant to the judge in the course of judicial decision-making. The main advantage of such an AI tool, especially when compared to transcription by a third person, is that anyone present in the courtroom can simultaneously review the transcript of the hearing and fix potential errors made by the AI system with the approval of the rest of the people involved. This makes the process of recording court hearings and turning them into a written protocol not only significantly faster but potentially also more accurate.Footnote 80
The common feature of all these AI-based support mechanisms is that they “merely” aim at simplifying the internal and external processes constituting judicial decision-making, thereby making court proceedings overall more efficient, which, in turn, benefits the judge, the parties, and the legal community as a whole. AI tools of the kind just outlined do so, however, without autonomously contributing to the court proceedings and their outcomes in any substantive manner. Therefore, they may, at most, indirectly impact how the judge decides a case.
III. Summarization of Information
The next degree of AI involvement in the judicial decision-making process is AI-based summarization of information which is either requested by or provided to the judge.Footnote 81 In the context of judicial decision-making, such AI-based summarization may assist the judge in the process of establishing the facts of the case as well as in legally assessing these facts. With regard to the process of establishing the facts, AI could, for example, summarize documents submitted by the parties as evidence or expert opinions, sparing the judge from reading them in full. With regard to the process of legally assessing the established facts, AI may be used to summarize case law or relevant scholarly contributions like law review articles.Footnote 82
By moving from AI-based organizational tools to AI-based summarization tools, we are entering an area of activity in which AI increasingly takes on tasks with a potentially substantive impact on the judicial decision. Contrary to AI-based organization of information, AI-based summarization is not merely a case of rearranging information in the way it is presented, such as the format of the output provided to the deciding judge. Rather, AI-based summarization processes information in an altering fashion. This is a necessary consequence of the fact that summaries, by definition, do not include everything in the original document or text but only an aggregated version of it.Footnote 83 Therefore, when it comes to AI-generated summarization, the AI system decides—at least to some degree—which parts of the information provided are of relevance for the judge and their decision-making process. The degree of information alteration by the AI tool is, nevertheless, comparatively low in the case of AI-based summarization. It is limited to selecting parts of the available information and potentially rephrasing them where required to provide a coherent summary. The summarizing AI tool does not, however, add any new information which the initial text or document did not contain.
IV. Detection of Certain Features or Conditions
This last aspect sets apart the types of AI assistance outlined so far from AI systems aimed at detecting certain features or conditions within data to provide new insights. In contrast to AI tools merely organizing or summarizing existing information, AI-based feature detection is not only targeted at making judicial tasks less time-consuming for the deciding judge. Instead, it is first and foremost a way of generating information previously unknown to the judge. AI assistance of this kind directs the focus of the human judge to aspects of data not easily detectable for humans without AI support.
In the course of establishing the facts, such AI tools may be used to detect whether a witness or a party is lying during their testimony. Humans are notoriously weak at determining whether someone else is telling the truth. With AI being able to pick up on involuntary external clues to internal emotions, such as micro-expressions occurring within a fraction of a second, it may have the potential to compensate for some of the deficits human judges display when it comes to emotion recognition.Footnote 84
AI tools aimed at detecting certain features or conditions within data to provide new insights may further be consulted to generate predictions in light of the file of the specific case.Footnote 85 Such predictions could play a similar role to the statements of expert witnesses. In both scenarios, the human judge requires additional non-legal knowledge they do not possess in order to establish the facts.Footnote 86 Alternatively, a human judge may consult an AI tool to receive a second opinion or to contrast it with the result of their own assessment.
With regard to legally assessing the established facts, AI tools may assist the judge in a feature-detection manner by identifying cases similar to the one at hand.Footnote 87 A structurally identical approach may be chosen to have AI assist the judge in tracking down pertinent case law or pieces of legal scholarship.
V. Answering Case-Specific Questions
These AI-based substantive assessments with regard to the specific case lay the groundwork for seamlessly moving to the next degree of AI involvement in the judicial decision-making process, namely AI tools assisting the judge by answering concrete questions arising from the case at hand.
When deciding a case, judges do not merely add one excerpt of case law or legal scholarship after the other. Rather, they are required to provide a legal assessment of the specific pending case. Therefore, judges may not only pose abstract legal questions to an AI tool but also tailor them to fit the characteristics of the specific case. Instead of requesting an overview of the current legal situation regarding, for example, strict liability, the judge deciding a case of strict product liability might ask the AI system: “Assuming the conditions x, y, and z, is producer A liable for the harm to person B which was caused when using product C?”
When asked such a specific, case-related question, the AI tool would conduct a search in a similar manner as outlined above. Instead of merely outputting the results of its search, though, it could additionally draft a response to the question on the basis of these results and communicate it to the human judge in natural language. LLMs are, of course, particularly fitting for this type of AI assistance.Footnote 88
VI. Making Drafts and Recommendations
The highest degree of involving AI in the course of judicial decision making without fully delegating the decision-making process to the AI tool, resulting in what is usually called a “fully automated decision,” is having an AI tool generate a draft of a decision.Footnote 89 The human in charge of making the judicial decision may use such a draft as a basis for their own decision. Alternatively, the judge may also fully adopt the draft and declare it as their own decision.
Instead of requesting only a single draft from the AI tool in use, the judge could also have it produce multiple drafts it considers suitable for deciding the case at hand, leaving it to the judge to choose the draft they deem best fitting in light of the specific circumstances of the case. The judge may also decide that none of the drafts provided by the AI system are, in fact, in line with their take on the case, and reject all of them. An AI tool providing multiple drafts for one case may further support the judge by recommending one draft out of the proposed ones as, for example, particularly unlikely to be overturned by an appeals court.Footnote 90 It also seems feasible that AI could add explanatory notes to each of the drafts, which are not part of the draft as such but make it potentially easier for the judge to choose one draft over the others.
The transition from AI “merely” answering questions posed by the judge in light of a specific case to AI drafting a decision which is then proposed to the deciding judge is, of course, once again fluid. An AI system may, for instance, merely assist the judge in drafting an opinion without providing all of its parts on its own; such a scenario could qualify both as answering a case-specific question and as providing a—part of a—draft of a judicial decision.Footnote 91 The same holds true for the line between such an AI-generated draft of a judicial decision on the one hand and an AI-generated judicial decision on the other.
E. Protecting the Good against the Bad and the Ugly? Concluding Thoughts
Having outlined different degrees of potential AI-involvement in the judicial decision-making processes and some specific use cases of AI tools exemplifying them, I conclude by turning to the implications of the findings for how to approach AI tools assisting a judge.
It seems neither required nor useful to collectively declare AI tools dreadful for the process of judicial decision-making. Instead, it is worth digging deeper and reevaluating what we actually mean when we speak of an AI tool supporting the human judge. Under which circumstances does the human judge remain in the loop in a manner that accounts for the concerns outlined in this Article?
I. Division of Labor Instead of “Co-working” with AI as Glimmers of Hope
The concept of the human in the loop scenario as it currently prevails in scholarship is, at least partly, misleading. Why? Usually, it also includes scenarios in which the focus lies on a broad task, some aspects or subtasks of which may be covered by AI. In the case of the judge, the task, understood in this broad sense, is judicial decision-making as a whole. Any involvement of AI in this process in an assisting manner would therefore qualify as “keeping the human judge in the loop.”Footnote 92
However, not all of these scenarios are necessarily cases in which the use of AI may be understood as “only” supporting the human judge by assisting them in the course of the decision-making process, as discussed in this Article. With regard to the concerns raised here, it makes a big difference whether a general, broad task like judicial decision-making is divided into various smaller tasks, some of which remain for the human judge to complete whilst others are delegated to AI, or whether a more specific task is completed by the human judge and an AI system co-working on it.
Only the latter is an actual human in the loop scenario as discussed here. The former, in contrast, is a case of merely assessing a bundle of smaller tasks, previously assigned as a whole to the human judge, in light of a new entity entering the scene as a potential agent for fulfilling these tasks. Against the backdrop of the abilities of humans on the one hand and the abilities of AI on the other, some tasks out of the aforementioned bundle may simply be reassigned, namely to AI instead of the human. Particularly fitting are, of course, those judicial tasks that do not intrinsically require a human judge to begin with, given the non-specific skill set necessary to perform them. This concerns especially the judicial tasks discussed as the first and second categories of AI tools in Section D, namely pre-trial and post-decision-making judicial tasks as well as organizational tasks throughout the court proceeding. It goes without saying that in order to reassign some of these tasks to AI, a thorough assessment of whether the specific AI tool in question is, in fact, able to perform a certain task in the manner required is always needed. However, once this is confirmed, there are no additional hazards resulting from a human and an AI system working side by side and dividing the labor of a broader task between them.
II. What about the Actual Human in the Loop-Scenarios?
Reevaluating AI-based tools used in the course of judicial decision-making through the lens of division of labor, rather than of a human and an AI co-working with the human overseeing the AI, answers some of the challenges outlined above. However, there are certain judicial tasks, as well as types of AI assistance, leading to an allocation of tasks between the human and the AI tool which can hardly be characterized as “division of labor.” This is because the contributions of the human judge on the one hand and those of the AI on the other are too deeply intertwined.
How should we approach these scenarios where human and AI activities are so intricately interwoven that individual contributions cannot be meaningfully disaggregated? One compelling suggestion in scholarship is a “shift from human oversight to institutional oversight for regulating government algorithms.” Such institutional oversight would require a “written justification of its decision to adopt an algorithm in high-stakes decisions,” including “evidence that any proposed forms of human oversight are supported by empirical evidence.” Additionally, these written justifications must be made publicly available in order to receive public review and approval.Footnote 93
This mechanism, proposed by Green, is certainly a conceptually promising way to challenge any type of “blanket rule” allowing state actors to make use of algorithms as long as humans remain in the loop, providing oversight. And yet, some scenarios will nevertheless remain in which this approach leads to odd and somewhat artificial attempts at justification in the hope of bypassing restrictions on incorporating AI into the judicial decision-making process. Therefore, while Green’s approach of requiring institutional oversight undoubtedly makes it much harder and less likely to sneak in the use of AI support through the backdoor, the backdoor nevertheless remains.
Against this backdrop, we should take these remaining human in the loop scenarios for what they are, even if this results in prohibiting AI in certain aspects of the judicial decision-making process altogether. We should acknowledge the fundamentally different approaches of humans on the one hand and AI tools on the other when it comes to certain aspects of judicial decision-making. And if we are not content with the outcome, we should invest our effort in proposing changes to how we conceptualize judges and their decision-making processes. We should not, however, risk blurring the lines in the hope of short-sighted relief in workload and a presumed increase in efficiency. Because once lines become blurry, we cannot expect anyone to see a boundary when we cross it, least of all a human—judge—placed “in the loop.”
Acknowledgements
The author declares none.
Competing Interests
The author declares none.
Funding Statement
No specific funding has been declared in relation to this Article.