Hostname: page-component-74d7c59bfc-56bg9 Total loading time: 0 Render date: 2026-02-10T10:22:49.979Z Has data issue: false hasContentIssue false

Upskilling human actors against AI automation bias in strategic decision making on the resort to force

Published online by Cambridge University Press:  27 January 2026

Yee-Kuang Heng*
Affiliation:
Graduate School of Public Policy and Director of Security Studies Unit, Institute for Future Initiatives, The University of Tokyo, Tokyo, Japan
Rights & Permissions [Opens in a new window]

Abstract

The use of artificial intelligence-driven decision-support systems (AI DSS) to assist human calculations on the resort to military force has raised concerns that automation bias may displace human judgments. Such fears are compounded by the complexities and pathologies of organisational decision making. Discussions of AI often revolve around better training AI models with more copious amounts of technical data, but this article poses research questions that shift the focus to a human-centric and institutional approach. How can governments better train human decision makers and restructure institutional settings within which humans operate to minimise the risks of automation bias and deskilling? This article begins by exploring how governments have invested in AI literacy education and capacity-building. Second, it demonstrates how the need to question groupthink and challenge assumptions in decision making becomes even more relevant as the use of AI DSS become more prevalent. Third, human decision makers operate within institutional structures with internal audit trails and organisational cultures, inter-agency networks and intelligence-sharing partnerships that may mitigate the risks of human deskilling. Bolstering these three inter-locking, mutually reinforcing elements of education, challenge functions and institutions offers some avenues for managing automation bias in decisions on the resort to force.

Information

Type
Research Article
Creative Commons
Creative Common License - CCCreative Common License - BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2026. Published by Cambridge University Press.

1. IntroductionFootnote 1

The hitherto “neglected prospect of AI-driven systems influencing state-level decision making on the resort to force” (Erskine & Miller, Reference Erskine and Miller2024, p. 135) has received much-overdue attention. Yet, more research needs to be done on such a broad and wide-ranging topic. Decision makers who lack literacy in the artificial intelligence (AI) tools that assist them could make momentous choices on the initiation of war without fully grasping the AI model’s limitations. The institutional structures that hitherto have supported human decision makers may also succumb to an AI version of “groupthink,” whereby misplaced faith is put in AI-generated decision recommendations. This article therefore seeks to engage with two complications that arise. The first concerns automation bias and de-skilling of human actors (Skitka, Mosier & Burdick, Reference Skitka, Mosier and Burdick1999). The second regards the impacts of AI upon pathologies of organisational decision making and implementation in complex social settings, such as groupthink (Lawrence, Reference Lawrence1991; Younis et al., Reference Younis, Marwa and Azzam2024). Given how widely AI technologies are being developed and deployed, this article focuses its analysis on specific contexts where AI decision-support systems (DSS) are used in human–machine teaming to support human decision making on intelligence analysis and the resort to force.

Discussions of AI often revolve around training AI models with larger and more copious amounts of data (Ananthaswamy, Reference Ananthaswamy2023), yet more is not always better. This article argues that a recurrent long-standing issue of training human actors and understanding how their institutional settings affect decision making now requires renewed and urgent attention. Indeed, empirical studies demonstrate that the public over-trusts AI recommendations about whether to kill (Holbrook et al., Reference Holbrook, Holman, Clingo and Wagner2024). Other studies suggest that humans with lower levels of competence or familiarity with AI are more likely to have over-confidence in machines (Horowitz & Kahn, Reference Horowitz and Kahn2024). To counter this slippery downward slope of de-skilling human actors, this article argues that up-skilling may help fend off some – but obviously not all – concerns over automation bias and diminishing human agency. Clearly, this is no silver bullet or panacea to a complex problem that manifests itself in unpredictable ways in real-world strategic decision making that often involves psychological failings of human actors (such as availability bias), not to mention broader institutional constraints that are well known (e.g., stubborn bureaucratic silos). However, before human–machine teams utilise AI DSS in real-world crisis situations that are stressful and extremely time-constrained, more resilience and safeguards (for both individual human actors and institutions) must be embedded at the pre-decision stage.

As with all complex contested concepts, there is no singular definition of AI. Generally speaking, “artificial intelligence refers to computer systems that can perform complex tasks normally done by human reasoning, decision making, creating, etc.” (NASA, 2024). Myriad varieties of AI technologies have been developed. Large language models (LLMs) refer to “a type of foundation model that is trained on vast amounts of text to carry out natural language processing tasks” (Parliamentary Office of Science and Technology, 2024). Meanwhile, generative AI is defined as an “AI model that generates text, images, audio, video or other media in response to user prompts” (Parliamentary Office of Science and Technology, 2024). It uses machine learning (ML) to create new data that has characteristics similar to the data it was trained on. ML refers to “a type of AI that allows a system to learn and improve from examples without all its instructions being explicitly programmed” (Parliamentary Office of Science and Technology, 2024).

In the field of strategic studies, attention has predominantly focused on ML employed in lethal autonomous weapons systems (LAWS) that are programmed to engage targets on their own on the battlefield at a tactical level (Christie et al., Reference Christie, Ertan, Adomaitis and Klaus2024; Ferl, Reference Ferl2023). However, this article is primarily concerned with ML algorithms deployed in AI DSS that are used in human–machine teaming to support intelligence analysts and strategic decision makers at the political level. AI DSS are typically used to recommend a decision. Walker and Smeddle (Reference Walker and Smeddle2024, p. 7) observe that “human–machine decision making involves a combination of human input and machine analysis. Humans provide the context, goals, and constraints, while machines process data, identify patterns, and generate recommendations.” AI DSS are defined as “computerised tools that use AI software to display, synthesise and/or analyse data and in some cases make recommendations – even predictions – in order to aid human decision making in war” (Stewart & Hinds, Reference Stewart and Hinds2023). Faster decision cycles and increased tempo are seen as benefits that can give an edge over an adversary, despite the dangers of automation bias. Defence analysts may prioritise the speed of AI DSS to filter massive amounts of data, especially in high-stress conditions, even at the expense of accuracy. AI DSS may help to fill in gaps and assist the human analyst in avoiding a sense of being overwhelmed in crisis situations. Although AI DSS are sometimes argued to risk escalation in the resort to force (Zala, Reference Zala2024), paradoxically, such assistance can also help to calm things down and potentially de-escalate the situation.

AI DSS are usually deployed for targeting decisions in tactical contexts, such as in the facial recognition of a target person or suggesting where and when to place a precision-guided munition. Well-known examples include Israel’s alleged use of the Lavender system in Gaza and Palantir’s MetaConstellation system in Ukraine (Ferey & de Roucy-rochegonde, Reference Ferey and de Roucy-rochegonde2024). From a tactical warfighting perspective, the US Army’s 18th Airborne Corps use of the AI-driven Maven Smart System to enhance its intelligence support to operations in Ukraine is a good example of “user/warfighter-driven innovation” in a military context (Probasco et al., Reference Probasco2024, p. 1). Even pacifist states such as Japan that have not fought a war in decades are actively considering the impact of AI on their military operations. Anduril Industries and Japan’s Sumisho Aero-Systems are now cooperating on Anduril’s Lattice AI-enabled software platform to integrate third-party assets and data sources to enhance situational awareness and decision advantage for Japan’s Maritime Self-Defense Force (MSDF) at operational levels (Anduril, 2024). Japan’s ongoing experimentation with Anduril and the US experience with Maven Smart demonstrates how militaries are already working to better integrate AI DSS into their tactical operations. Another example is the People’s Liberation Army’s (PLA) Strategic Support Force until it was replaced in 2024. How exactly China seeks to integrate AI into its military remains opaque, however (Nelson & Epstein, Reference Nelson and Epstein2022). Russia too has prioritised “improving command, control, communication and decision-making with AI” (Zysk, Reference Zysk2023), although its efforts to provide literacy for intelligence analysts working with AI DSS also remain unclear.

Compared to its neighbours, Japan’s Self-Defense Forces are more forthcoming in explaining a focus on literacy education for service personnel. The Air Self-Defense Force (ASDF) noted the importance of “cultivating cyber human resources” and “literacy education” (Ministry of Defense, 2024b, p. 69). Researchers at the MSDF Command and Staff College’s Future Warfare Research Center also called for an increase in personnel well versed in AI analysis as machine-interfacing in warfare accelerates (Fukuyama, Reference Fukuyama2022). Japan’s Self-Defense Forces clearly understand the necessity for better AI literacy as personnel operate in human–machine teams using AI DSS. In July 2024, Japan’s Ministry of Defense issued its first ever Basic Policy on AI where the technology will be deployed in “assisting commanders in making decisions” (Ministry of Defense, 2024a, p.8). The document stresses that because AI is accompanied by risks such as errors and biases, Japan will adopt a “human-centred” approach, and that “there is a need to ensure human involvement, as what AI does is assist human judgment” (Ministry of Defense, 2024a, pp. 7–8). Compared to other military powers such as the US and the UK, Japan’s document was issued relatively late in the game. However, Japan’s document, like those issued by other governments, remains relatively silent regarding the political decision-making level on resort to force that this article addresses.

AI DSS can be deployed at multiple command levels, from the higher political level down to the strategic, operational and tactical levels (ICRC &Geneva Academy, 2024). This article considers the implications of AI DSS at the political level, rather than at the level of commanders deployed in operational theatres. The focal point of analysis is scenarios where strategic policymakers make decisions on the resort to force by relying on recommendations generated by intelligence analysts working in human–machine teams with AI DSS tools.

With the growing prominence of AI in many levels of society, war and government, more attention is being paid to education in AI literacy for policymakers (Hughes et al., Reference Hughes, Carter, Harland and Babuta2024). Governments have also invested in building futures literacy (Washington, Reference Washington2022) throughout their civil services. The preceding discussion prompts the first research question: In terms of upskilling, can education campaigns to enhance AI literacy in human–machine teams help combat automation bias arising from the use of AI DSS? The organisational pathologies of decision making also demand attention. Individual human decision makers operate within “military decision-making eco-systems” that are “sociotechnical systems” comprising “interactions among people applying technologies to enact roles within mission-oriented collectives” (Osoba, Reference Osoba2024, pp. 237–38). On the one hand, there are fears that AI might affect the dynamics of groupthink and cognitive bias of decision makers in a crisis context (Chivvis & Kavanagh, Reference Chivvis and Kavanagh2024). On the other hand, Holmes and Wheeler (Reference Holmes and Wheeler2024) adopt a more positive approach that AI can enhance decision making by mitigating misperceptions and by building empathy with an adversary’s fears and anxieties. The social and institutional settings of decision making also shapes the degree to which groupthink takes hold. For instance, leadership styles and psychological dispositions (directive or facilitative) as well as organisational norms may favour compliance (Grube & Killick, Reference Grube and Killick2023).

The need to guard against groupthink also extends to inter-agency collaboration. An Australian Government (2021, p. 4) handbook on inter-agency leadership advises that to be conscious of groupthink, “you need to create a team environment centred on constructive consultation and contesting ideas and assumptions to ensure the team has looked at an issue from a range of angles.” These initiatives amount to what this article terms a “challenge function” within an organisational culture that routinely questions assumptions. As human–machine teams are likely to become increasingly prevalent in national security planning, this prompts the second question: How might institutions put in place challenge mechanisms to minimise the risk that humans working with AI DSS fall victim to groupthink and automation bias? To address these questions, the three sections below evaluate how education, challenge functions, and institutions can perform inter-locking and mutually reinforcing functions to mitigate the risks of automation bias and human deskilling.

2. Education

Researchers remind us that “building an AI-powered society that benefits all requires each of us to become literate about AI: to know when AI is being used and evaluate the benefits and limitations of it in a particular use case that might impact us” (Firth-Butterfield, Toplic, Anthony & Reid, Reference Firth-Butterfield, Toplic, Anthony and Reid2022). Although multiple definitions exist of AI literacy as an “emerging concept” (Ng, Leung, Chu & Qiao, Reference Ng, Leung, Chu and Qiao2021), it has been defined as “human proficiency in different subject areas of AI that enable purposeful, efficient, and ethical usage of AI technologies” (Pinski & Benlian, Reference Pinski and Benlian2024, p. 1). AI literacy, as used in this article, is defined simply as the ability to understand the basic techniques and concepts behind AI in different products and service (Ng et al., Reference Ng, Leung, Chu and Qiao2021).

Existing literature has highlighted the importance of training and education for overlooked segments of the AI supply chain in military operations, for instance AI developers (Chiodo, Müller & Sienknecht, Reference Chiodo, Müller and Sienknecht2024). Hands-on training to enhance the familiarity of end-users (such as policymakers) with AI models and their flaws and biases is also being delivered. One good example is the Congressional Boot Camp on AI targeted at staffers from both the United States (US) House and the Senate, run by Stanford University’s Institute for Human-Centred AI (HAI).Footnote 2 Indeed, given the high stakes in the use of AI DSS, human expertise should certainly be elevated, not relegated (Davis, Reference Davis2024).

Different components of AI literacy will be relevant to different AI user groups (experts versus non-experts, developers versus lay/personal users). For instance, the needs of a military or civilian intelligence analyst tasked with parsing through AI-enabled raw intelligence will be different from the needs of a strategic decision maker. It is therefore imperative to understand the types of AI literacy and specific buckets of education that are to be supplied to different types of audiences. The US Department of Defense (DOD) AI Education Strategy has six archetypes of AI learning needs: those who lead AI, drive AI, create AI, facilitate AI, embed AI, or employ AI as end users (Department of Defense (US), 2020). DOD AI training involves a mix of open online courses, virtual live small group sessions, virtual classrooms and a capstone on-the-job project.

A broad survey of academic literature shows that AI user groups that have received AI literacy training includes journalists, construction workers, medical doctors, librarians and mechanical engineers (Pinski & Benlian, Reference Pinski and Benlian2024). There is scarce mention in the academic literature so far of training intelligence analysts and decision makers working in human–machine teams with AI DSS. What sort of AI literacy might decision makers at the political level require and do they actually know what they are still lacking and where they require more training?Footnote 3 Many senior leaders lack an understanding of AI simply because of a shortage of time and bandwidth to undertake training courses that appear less pressing than other daily operational tasks. Surveys suggest that up to “58% of executives have never participated in AI training or taken an AI course, despite acknowledging its growing importance” (John, Reference John2024). Leaders may also experience discomfort with the need for continual learning about AI and its rapid advances (Pinski et al., Reference Pinski and Benlian2024).

This specific group of AI users is particularly relevant to our purposes, yet their needs and skill requirements as part of AI literacy education remain less well understood. For instance, where issues of judgment are informed by affect and morality, human moral judgment becomes imperative in human-teams working with AI DSS: “Machine intelligence is very adept at prediction but not judgment” (Basuchoudhary, Reference Basuchoudhary2025, p. 75). Recognising the dangers of “false confidence” (Renic, Reference Renic2024, p. 247) in recommendations generated by AI DSS should also be part and parcel of any training programme. Empirical studies show worryingly that the public over-trusts machines (Holbrook et al., Reference Holbrook, Holman, Clingo and Wagner2024). Given that the pitfalls accompanying human reliance on machines are already identified in the literature, educating decision makers on these potential downsides is imperative.

To instil healthier levels of scepticism towards machine-generated recommendations, desirable AI proficiency components should therefore include knowledge, awareness, hands-on experience of AI DSS, and skills such as the capacity to choose the most suitable action from a range of options. Participants should also be taught that AI DSS can analyse patterns in intelligence data to predict future scenarios and recommend courses of action, but it may overlook unique under-represented cases in its training data sets. This means understanding the centrality of training data in ML algorithms and recognising that a biased or incomplete dataset can affect the recommendations generated by AI DSS. For instance, at the tactical level, the US Maven Smart system and Ukrainian Kropyva can analyse massive volumes of data and video footage from drones using ML algorithms to generate targeting suggestions (Nadibaidze, Bode & Zhang, Reference Nadibaidze, Bode and Zhang2024). At the strategic decision making level, policymakers depending on AI DSS need to be cognisant that impressive-looking intelligence analysis and datasets may be incomplete or miss out on vital clues regarding an enemy’s ultimate intention.

To preserve meaningful human agency in the process of human–machine teaming with AI DSS, any literacy programme should therefore focus on understanding the technical properties of an AI model, such as its precision, recall, and accuracy (Knack et al., Reference Knack, Carter and Babuta2022). Training and early exposure to the AI DSS that human analysts will be interacting with should also be a priority area: “Analysts do not respond and commit effort to understand an output from an ML model alone but also take into consideration other factors such as their experience of the model’s prior performance” (Knack et al., Reference Knack, Carter and Babuta2022). Already, many countries, including the US, have adopted such training approaches.

There is also a need to educate human actors in human–machine teams about how AI DSS may affect groupthink dynamics (Chivvis & Kavanagh, Reference Chivvis and Kavanagh2024). As Nadibaidze et al. (Reference Nadibaidze, Bode and Zhang2024, p. 43) suggest in more concrete terms, this means “providing human operators with protocols, training, and guidance that allow for them to exercise (1) critical assessments of systems’ outputs and (2) assessments of a decision’s legal, strategic, and humanitarian impacts.” A recent analysis of eSports athletes experienced in machine–human interaction revealed that another important dimension of any education programme is “cross-training, which is training in other teammate roles, to improve perspective taking and coordination” (Lancaster et al., Reference Lancaster, Duan, Mallick and McNeese2025, p. 1). The preceding literature has already identified desirable contents that should be contained in any training programme focused on AI DSS and human–machine teaming. How such contents are best delivered further deserves attention.

Interactive methods of content delivery such as games and experimental role-play exercises can teach policymakers to think about potential impacts of AI (Avin et al., Reference Avin, Gruetzemacher and Fox2020). In the field of strategy and war, role playing is routinely utilised to test scenarios and performance of military forces through wargaming. Role-playing games are especially applicable for individuals engaged in high-stakes situations, such as resort-to-force decision making. Role-playing games such as Intelligence Rising involve participants assuming the roles of strategic decision makers in China and US (such as the president and secretary of defence) to explore how states decide on using AI capabilities in domains such as military and social control (Avin et al., Reference Avin, Gruetzemacher and Fox2020). Since such game templates already exist, designers could tweak specific scenarios to focus on resort-to-force decision making. The exercise content and outcomes must above all flag to participants the dangers of automation bias in human–machine teams working with recommendations generated by AI DSS.

Creative writing and fiction are another promising pedagogical tool utilized in training programmes targeted at policymakers (Frauenfelder, Reference Frauenfelder2023; Liveley et al., Reference Liveley, Slocombe and Spiers2021). In fact, FicInt, also known as “fictional intelligence” or “useful fiction,” has become widely used in military organisations. For instance, the US Marine Corps Special Operations Command document titled MARSOF 2030: A strategic vision for the future contained several vignettes of “imagining the concepts in action” (US Marine Corps, 2018, p. 23). These included future fictional scenarios such as Marine special forces deploying in 2030 to the Middle East to corroborate digital intelligence collected on a high-value target with other inter-agency representatives using an array of human and other signal intercepts (US Marine Corps, 2018). Such fictional stories could depict the use of AI DSS to support strategic decision making on the resort to force.

The UK’s Defence Science and Technology Laboratory (Dstl) has even commissioned sci-fi writers PW Singer and August Cole to create stories about future threat scenarios because of the pedagogical advantage that stories have in conveying content: “lessons about potential future threats are more impactful when woven into stories as opposed to more traditional ways of learning” (Defence Science and Technology Laboratory (Dstl), 2023a). These stories encourage participants to rebut assumptions and “spark discussion and creative insight which might challenge established thought” (Dstl, 2023b). One story titled “The AI of Beresford Bridge” demonstrates how over-eager military commanders experimenting with AI systems broke the rules of warfare.

Military educational institutions already utilize creative writing formats to transmit lessons on decision making. The Future Warfare Writing Program organised by the US Army University Press notes that “fiction allows us to imagine the details of reality-as-it-might-happen in order to understand potential consequences of decisions that we need, or might need, to make” (Army Press, Reference Army Press2016). Interactive pedagogical tools such as role play or creative writing are utilised by military and defence institutions to emphasise a need to question established concepts and assumptions. Writers could thus be commissioned to curate specific stories or design role-playing games about human–machine teaming with AI DSS, highlighting the dangers of automation bias and human deskilling in decisions on the resort to force.

3. Challenging machine-generated recommendations

Educating human actors about AI literacy is a necessary but not sufficient condition for mitigating automation bias and groupthink. Human decision makers must also internalise the obligation to challenge assumptions and recommendations generated by AI DSS. Rather than simply viewing ML as a tool for decision making, proper human–machine teaming could treat AI DSS output (in an ideal context) as equivalent to another analyst. Ideally such a system “could be a bit feisty or argumentative” to prompt human analysts to consider their own bias or misplaced assumptions (Knack et al., Reference Knack, Carter and Babuta2022, p. 33). At the same time, however, the downside is that anthropomorphising machines can also foster higher degrees of misplaced confidence in AI DSS outputs, even to the point where humans accept judgements they would otherwise think are wrong (Erskine, Reference Erskine and Miller2024, p. 180–81). This is why an Alan Turing Institute report on AI-enabled intelligence and strategic decision making recommended that “short, optional expert briefings should be offered immediately prior to high-stakes national security decision-making sessions where AI-enriched intelligence underpins load-bearing decisions” (Hughes et al., Reference Hughes, Carter, Harland and Babuta2024).

The American and British experiences of questioning assumptions behind intelligence analysis contain important lessons for AI DSS and decision making on the resort to force. Both are Five Eyes intelligence powers with world-leading intelligence collection and analysis capabilities. In the US, the Pentagon’s Office of Net Assessment (ONA) not only specialises in “the comparative analysis of military, technological, political, economic, and other factors governing the relative military capability of nations” (Department of Defense (US), 2009): crucially, it is also tasked to question assumptions.

A well-known historical example is how ONA reached conclusions that differed from those provided by the Central Intelligence Agency (CIA), which had inaccurately assessed Soviet defence spending as a proportion of the overall Soviet economy during the Cold War (Desch, Reference Desch2014). Competition over the reliability of intelligence estimates and analyses was in fact commonplace (Central Intelligence Agency (CIA), 2007). Another famous case is the Team B exercise in 1976, which involved calling in outside experts to question in-house CIA assessments of Soviet military capabilities. The Team B report led by Harvard academic Richard Pipes eventually resulted in the CIA revising its own estimates. Yet, while intelligence organisations focus on the other side’s military capabilities, individual decision makers tend to pay only “selective attention” (Yarhi-Milo, Reference Yarhi-Milo2014, pp. 2–5) to the adversaries’ intentions on the basis of their own pre-existing beliefs, theories and personal impressions. Historically, long-standing tussles over intelligence reports suggest that promoting competition over the reliability of estimates can provide some safeguards against automation bias when recommendations are generated using AI DSS.

The need to question prevailing assumptions also prompted the writing of the UK Ministry of Defence’s (MOD) 2018 handbook, The Good Operation. This was in direct response to the Iraq Inquiry led by Lord Chilcot, which criticised the flawed intelligence dossiers the UK government had presented to the public and the “propensity for groupthink” (UK Ministry of Defence, 2018, p. 7) in the government’s case for invading Iraq in 2002–2003. Although the document’s starting point was Iraq, many of the principles it highlights are also appropriate for combating AI DSS-induced automation bias. This included reminders that “everyone can challenge in the interest of “good decisions” (UK Ministry of Defence, 2018, p. 13). Although operational planning inevitably involves time pressure and decision making under stress, the handbook emphasises “building in sufficient challenge, diversity of thought and critical thinking to head off groupthink.” (UK Ministry of Defence, 2018, p. 11). This includes red teaming (where an independent group offers challenges); inviting diverse thinking (including independent or external viewpoints) into the process; and wargaming. The document additionally provides a how-to guide to pose a “reasonable challenge.” These principles – written with lessons from Iraq in mind – should also apply to combating automation bias and groupthink in human–machine teams working with AI DSS.

The MOD Secretary of State’s Office of Net Assessment and Challenge (SONAC), established in 2022, also includes external commissions, red teaming and wargaming to test assumptions and safeguard against the dangers of groupthink. SONAC has developed a challenge training pack with the Foreign, Commonwealth and Development Office (FCDO). The British government’s Cosmic Bazaar tournament, in which 1300 civil servants were asked to answer strategic questions such as whether China would invade Taiwan, is another attempt to improve predictive intelligence (Dahl & Strachan-Morris, Reference Dahl and Strachan-Morris2024). These pre-existing initiatives can be re-tasked to question AI DSS-generated recommendations. In 2024, the UK Joint Intelligence Organisation (JIO) and GCHQ jointly commissioned a report by the Alan Turing Institute on how AI-enabled intelligence may affect strategic decision making. The report highlighted that robustness and source validation are indispensable for “AI-enriched insights to be used effectively and wisely in the assessments which inform National Security decisions” (Hughes et al., Reference Hughes, Carter, Harland and Babuta2024). The report reaffirms that analysts require training on how to interpret and challenge AI-enriched intelligence and, by doing so, further reinforces the importance of the kind of education that the preceding section has outlined.

The need to refute assumptions and foster contrarian thinking mirrors the rise of red teaming since the 1960s. Pioneered by the RAND Corporation, red teaming involved simulations of how a blue team (the US military) might perform against a red team (the Soviet military). Red teaming involves robustly challenging plans and assumptions through an adversarial approach. In the field of cybersecurity, the US DOD has issued guidelines for DOD Cyber Red Teams to uncover vulnerabilities and cyber risks in a congested and contested cyberspace (Department of Defense (US), 2024). Red teaming is also applicable to governmental decision making, including decisions on the resort to force.

The UK MOD’s Development, Concepts and Doctrine Centre (DCDC) updated its Red Teaming Handbook with a third edition in 2021. Targeted at “individuals and teams faced with making decisions across all levels of an organization” (UK Ministry of Defence, 2021, p. V), the Handbook stresses that red-teaming techniques can address cognitive biases in decision making. To address concerns specifically to do with automation bias, governments should also consider red teaming. In the context of challenging AI systems, red teaming is defined as “a structured process for probing AI systems and products for the identification of harmful capabilities, outputs, or infrastructural threats” (Zhang et al., Reference Zhang, Shaw, Anthis, Milton, Tseng, Suh and Gray2024). Governments can choose from several red-teaming methods which can be manual (involving external human red teaming), automated, or a mixture of both. It is also important to consider design factors such as composition of a red team, access levels, and the guidance required to conduct red teaming (OpenAI, 2024). When designing red teams to mitigate automation bias in decision making, governments can draw on considerable experience in relevant industries on how to red team AI models.

The technical processes involved in training AI DSS can also interrogate AI-generated outputs. After all, an adversarial process is already deployed in the training of ML models.Footnote 4 Adversarial examples are inputs designed to fool a machine learning model into making incorrect predictions, sometimes with very high confidence. To mitigate this vulnerability, adversarial training exposes the model to examples of manipulated inputs and forces the model to learn to become more robust and resistant to such attacks. The adversarial learning that is already present in the training of ML models could push AI DSS to more robustly interrogate their own recommendations.

Other AI systems may also be used to challenge human–machine teams working with AI DSS. One example is the Intelligence Advanced Research Projects Agency (IARPA), which runs the Rapid Explanation, Analysis, and Sourcing Online (REASON) programme. This programme is developing a Chat GPT-type generative AI system to evaluate and challenge the written reports of intelligence analysts to improve their findings and arguments; suggest additional overlooked evidence; and identify strengths and weaknesses in their reasoning (Probasco Reference Probasco2024, August). Such a generative AI system could assist decision makers to challenge the recommendations of human analysts – including those that may have been derived from another ML-driven AI DSS. In other words, this means pitting different types of AI systems against each other.

How human–machine teams cope with automation bias depends ultimately on the scenario and context in which AI DSS generate recommendations and the time constraints that humans face to rebut those recommendations. As researchers from the Alan Turing Institute concluded, “how an analyst treats an output from a Machine Learning (ML) model is highly context-specific,” depending on urgency and prioritisation (Knack et al., Reference Knack, Carter and Babuta2022, p. 4). Here, behavioural science can help to better understand how humans react to the spectrum of human–machine teaming arrangements, which can range from seeing AI DSS as back-up assistants to treating them as equal to human analysts. Gaining such understanding includes asking, for instance, what levels of explainability influence how humans trust or distrust AI-generated recommendations; how quickly humans can react to a potentially wrong AI recommendation; and which AI recommendations deserve human prioritisation (Ozkaya, Reference Ozkaya2020). The preceding discussion suggests that building up a better picture of how humans perform in human–machine teaming with AI DSS in different behavioural contexts can help to reinforce the impact of red teaming. Behavioural science may be just as important as the technical specifications of any AI DSS in determining how human–machine teams can mitigate automation bias.

Another level of challenging assumptions involves questioning the overall strategy and war aims. War initiators who make decisions to wage wars have their own (often unstated and sometimes misplaced) assumptions and theories of victory.Footnote 5 Can AI DSS be programmed to make those human assumptions explicit and to challenge such notions of victory if the underlying assumptions are invalid or unrealistic? Can AI DSS conduct sophisticated auditing and assessments of an adversary’s relative military power capabilities to challenge unfounded or inflated expectations of what one’s own military forces can do on the battlefield? This means assessing the relative merits of one’s own war strategy and capabilities versus an opponent’s. AI DSS may, in this rather optimistic reading, act as a restraint against the mistaken resort to force based on unrealistic expectations of victory.

On the other hand, however, empirical studies through wargame simulations point worryingly towards the contrary: actions recommended by AI agents escalate rapidly during crisis decision-making scenarios, even to the point of AI agents recommending the use of nuclear weapons (Rivera et al., Reference Rivera2024). Bearing in mind such demonstrable risks of escalation, software developers should not only train AI DSS to beware of escalation ladders, but also to bring into the open the underlying assumptions of victory that human decision makers often leave unstated. This is especially pertinent if AI DSS are to be treated as equivalent to human analysts in human–machine teams.

4. Institutions

The preceding two sections have argued that educating human actors in AI literacy goes hand in hand with the need to contest recommendations proposed through human–machine teaming with AI DSS. For education campaigns and challenge mechanisms to operate optimally, it is necessary to consider a third related dimension of institutional structures that can either entrench or undermine the process of questioning AI DSS-generated recommendations (Zenko, Reference Zenko2015).

If human–machine teams are to mitigate automation bias, it matters how institutions are set up. Given the focus of this paper on decision making on the resort to force, the relevant institutions range from intelligence agencies and national security councils to defence and foreign ministries. To guard against groupthink and automation bias, an organisational culture of leadership and a mindset that accepts and even encourages challenging AI DSS recommendations are the elusive Holy Grail of organisational change. As the UK’s then-Cabinet Secretary noted, “one of the most important lessons of all from Chilcot… is not so much what meetings you fix up or do not fix up, it is what culture and spirit of challenge you have within those meetings” (House of Commons, 2017, p. 22). Several guiding principles of leadership remain relevant: “The boss must buy in” and “be willing to listen to the bad news and do something with it” (Zenko, Reference Zenko2015). Supportive top-level senior leadership that endorses challenging recommendations suggested by AI DSS is therefore indispensable.

For a challenge to be effective within an institution, location matters. For instance, a “red team has to be situated correctly to the target institution that it’s red teaming” (Zenko, Reference Zenko2015). If it is placed in an obscure location, its effectiveness will be undermined. But if, on the other hand, the red team is too integrated and ingrained within the institution, there is a danger of institutional capture. Senior officials therefore need to carefully consider where best to situate red teams to challenge human–machine teams using AI DSS. Networking and inter-agency collaborations can also help to bolster this challenge function. A useful example can be seen in current UK practice where different platforms play devil’s advocate. The FCDO’s Research Analysis department shares papers with SONAC and the MOD’s in-house think tank, the Development, Concepts and Doctrine Centre (DCDC). Defence Intelligence within the MOD also provides information to rebut reports published by net assessment teams at SONAC, whereas SONAC itself runs red teaming exercises with other relevant agencies such as Dstl and DCDC. Such inter-agency mechanisms for questioning assumptions can help to reduce the risks of automation bias and human deskilling.

Procedures and processes within institutions can further entrench the need to challenge AI DSS recommendations. For instance, establishing an audit trail to identify the various users involved at different stages in the intelligence analysis process can add layers of accountability and transparency. Institutions can also make changes to management processes and structures to ensure diversity of thought and input (from system developers and behavioural scientists to end users and programme managers) to help understand the failures and limitations of AI DSS (Knack et al., Reference Knack, Carter and Babuta2022). To ensure that human decision making is augmented rather than automated, institutions must establish protocols and standing orders that enable analysts to require better explainability and interpretability of AI DSS before accepting their decisions. The Alan Turing Institute’s Project ExplAIn already provides best practice guidance including checklists on tasks to undertake and people to identify who hold key roles across decision making processes (Alan Turing Institute, 2025). Such check lists could be incorporated into audit trails in intelligence agencies and national security councils to require explainability of AI-generated suggestions.

Furthermore, decision making increasingly tends to be cross-departmental rather than purely military. Existing professional military education courses targeted at DOD and other federal civilian decision makers (such as the US Defense Senior Leader Development Program [DSLDP] run by the Defense Civilian Personnel Advisory Service) could include specific modules on combating automation bias in strategic decision making. The US Army Management Staff College offers another potential cross-government federal training platform through its Continuing Education for Senior Leaders – Strategic Leadership (CESL-SL) course.

Finally, a multilateral institutional setting where there is close intelligence sharing among trusted allies and like-minded partners can also help to mitigate automation bias. Since AI DSS compress the time that policymakers have for deliberation, allies may hesitate to use these tools in decisions on the resort to force, if decision makers from different allied states have divergent levels of technological competency or trust in AI DSS (Lin-Greenberg, Reference Lin-Greenberg2020). While this may complicate the coordination required for alliance decision making in time-compressed situations, such hesitation can also provide an extra layer of insulation against hasty choices based on AI DSS recommendations.

Trusted relationships exist among North Atlantic Treaty Organisation (NATO) allies, the long-standing Five Eyes intelligence network, not to mention newer partnerships such as the Australia–UK–US (AUKUS) submarine programme and the UK–Japan–Italy Global Combat Air Programme (GCAP). These offer potential avenues for allies to challenge recommendations generated by human–machine teams using AI DSS. The UK’s “AI for Acoustics” project for instance processes large volumes of underwater acoustic data, as part of AUKUS collaboration with Australian and US partners (House of Commons, 2025, p.2). Even though it is not a part of AUKUS, Japan has already sent observers to the AUKUS Autonomous Warrior 2024 exercises in Australia to test interoperability and remote tactical control of each other’s autonomous systems (Greene & Fernandez, Reference Greene and Fernandez2024). Japan’s participation suggests that AUKUS members and associated partners may eventually face a practical need to understand how intelligence collected from these autonomous platforms is analysed and used. If such intelligence is analysed by AI DSS to generate recommendations, AUKUS members and partners will have to safeguard strategic decision-making processes against automation bias and groupthink. The Five Eyes network has also issued guidance for protecting their AI systems, focused on improving confidentiality and integrity; assuring that known cybersecurity vulnerabilities are secured; and implementing a robust series of safeguards to detect and prevent malicious activity (Cybersecurity and Infrastructure Security Agency, 2024). Such guidance should also elaborate on the need to challenge automation bias and groupthink when using AI DSS.

5. Conclusion

Human–machine teams working with AI DSS are often attracted by the ability of AI to analyse massive data streams and recommend solutions faster than human analysts can. However, the perils of automation bias and the displacement of human judgement must be mitigated. This article has proposed three inter-locking themes of education, challenge functions and institutions, which, working together in tandem, might help to manage the risks that arise from human–machine teaming with AI DSS at the level of strategic decision making.

A central component of any education programme on AI literacy must remind human decision makers that the resort to force is all about contexts and the limitations of AI DSS to internalise those contexts. After all, how a nation “chooses, defines and perceives its enemies, estimates their intentions and plans to counter them necessarily comes from its unique expression, arising out of its systems and organisations” (Bathurst, Reference Bathurst1993, p. 125). Such subjective, human-centric contexts are often too complicated for AI DSS to copy and understand. AI DSS can recommend a potential next move, but they are ultimately constrained by their prior learning models. Simply put, AI DSS cannot understand and copy the exact decision-making processes of political or military leaders with their very human cognitive flaws and shortcomings. Crucially, AI DSS also cannot weigh the future consequences of actions, nor can they feel dread or (im)patience. This means that how humans are equipped and trained to function in any human–machine team operating AI DSS remains of utmost importance: “the simple rational choice model suggests that rational decision-making would increase the demand for human judgment, assuming that judgment is a normal good” (Basuchoudhary, Reference Basuchoudhary2025, p. 83).

Much of the existing literature assumes that human upskilling through better education and improved AI literacy may lead to better use of AI DSS in human–machine teams. However, such improvements should not be taken for granted. Analysts who have undergone AI literacy training might also be excessively confident in their own capabilities or simply complacent.Footnote 6 Such complacency may arise from human actors neglecting to monitor automated systems, although recent experiments show that system- and person-related variables such as alleviating workloads also interact in influencing complacent behaviour (Harbath et al., Reference Harbath, Gößwein, Bodemer and Schnaubert2025). This suggests that institutional design is an important factor in addressing the risks of over-confidence as well as encouraging challenges to be posed when AI DSS are used. Furthermore, training programmes must somehow keep up with the dizzying pace at which AI DSS technologies develop. Walker and Smeddle (Reference Walker and Smeddle2024, p. 4) stress that “utilising emerging and uncertain ML techniques in human–machine teaming will require continuous testing and oversight.” This has implications for educating policy makers on AI literacy. For instance, how often might refresher courses be needed? Will institutions and managers be willing to allow their analysts to be away regularly for training, and how will this be balanced with operational day-to-day demands in resource-constrained environments?

Education on AI literacy only provides the basic building blocks for a second, perhaps more important and inter-linked point: that human decision makers must be trained and even empowered to routinely and robustly challenge automation bias and AI DSS-enabled groupthink. Yet, there are long-standing flaws in how humans decide, with or without AI DSS. It bears remembering that human decision making (even before AI DSS emerged) has historically suffered from cognitive biases. Automation bias is only the latest affliction to impair human decision making. Furthermore, humans make decisions often not out of the “best” choices but, rather, out of informed albeit biased calculations. In the words of Jervis (Reference Jervis2006, p. 3), “confirmation bias was rampant” in flawed intelligence estimates leading up to war on Iraq in 2003. Explaining why the Pentagon’s so-called Office of Special Plans was set up to supply alternative raw intelligence on Iraq, the office’s head, Douglas Feith argued: “It’s healthy to criticize the CIA’s intelligence” (Fox News, Reference Fox News2007). Others, such as Senator Carl Levin, however, claimed that intelligence on the Iraq-Al Qaeda relationship presented by the office was “manipulated” with “alternative intelligence” generated to support the administration’s preferred policy (Washington Post, 2007). Analysts must therefore not only beware the politicisation of intelligence, they must now also be alert to automation bias that can accompany AI DSS recommendations in human–machine teams. Multiple levels of challenge are needed at different stages of the analysis and decision-making process, from the proverbial coalface of raw intelligence analysis to the intermediate interface between intelligence and decision, to the final strategic decision maker.

Finally, it behoves strategic decision makers and intelligence analysts who work in human–machine teaming with AI DSS to ensure that institutional structures are in place to encourage questioning of AI DSS-generated recommendations. Leadership plays a critical role in effective institutional design to embed and promote challenges of assumptions throughout any institution. In this regard, visible top-level endorsement and active buy-in from senior officials sends important signals for the institutional culture and mindset to take challenging AI DSS recommendations seriously. Procedures that institutionalise a need to challenge assumptions can then become part and parcel of routine day-to-day operational activities. These include audit trails to enhance the accountability, traceability and transparency of how decisions were arrived at through machine-teaming with AI DSS. Information sharing and testing assumptions within inter-agency structures and with close international allies such as the Five Eyes network can inject additional layers of analysis to minimise automation bias.

Regardless of how far AI DSS are relied upon in human–machine teaming, decisions on the resort to force entail the most human of tragic consequences that any strategic decision maker and intelligence analyst must face. It should never be a decision that human–machine teams make without the proper integration of education, challenge mechanisms and institutional support structures.

Funding statement

Research and presentation of this work was supported by Japan Society for Promotion of Science project grant 18H03620 (PI: Hideaki Shiroyama) and Australian Department of Defence Strategic Policy Grant for “Anticipating the Future of War: AI, Automated Systems, and Resort-to-Force Decision Making” (2023-2025) (PI: Toni Erskine).

Competing interests

The author declares none.

Yee-Kuang HENG is Professor at the Graduate School of Public Policy and Director of the Security Studies Unit, Institute for Future Initiatives, The University of Tokyo, Japan. After completing his BSc (First Class Hons) and PhD in International Relations at the London School of Economics and Political Science, he held faculty posts lecturing at Trinity College Dublin (Ireland); the University of St Andrews (Scotland), and the National University of Singapore. He was also Senior Academic Visitor at the University of Cambridge’s Centre for the Study of Existential Risk. Current research interests include strategic studies; futures literacy and existential risks; Britain’s defence cooperation with Japan as part of the Indo-Pacific tilt. Recent publications include “Building futures literacy: Nudging civil servants to cope with uncertainties and threats,” European Journal of International Security (2024).

Footnotes

1 This is one of fourteen articles published as part of the Cambridge Forum on AI: Law and Governance Special Issue, AI and the Decision to Go to War, guest edited by Toni Erskine and Steven E. Miller. The author would like to thank two anonymous reviewers, Lieutenant Colonel Paul Lushenko, Ashley Deeks, Toni Erskine, Steven E. Miller and Tuukka Kaikkonen for their invaluable support and advice.

2 I am grateful to Ashley Deeks for this example.

3 I thank Ashley Deeks for this excellent feedback.

4 I thank an anonymous reviewer for reminding me of this critical point.

5 I thank Steven Miller for this excellent point.

6 I thank an anonymous reviewer for this point.

References

Alan Turing Institute. (2025). Project ExplAIn. Retrieved from https://www.turing.ac.uk/research/research-projects/project-explain, Accessed 9 August, 2025.Google Scholar
Ananthaswamy, A. (2023, March 8 ). In AI, is bigger always better?, Nature. Retrieved from https://www.nature.com/articles/d41586-023-00641-w, Accessed 10 August, 2025.Google Scholar
Anduril. (2024 July 31 ). Anduril industries, Sumisho Aero-systems to demonstrate diverse command and control for the Japan Maritime Self-Defense Force. Retrieved from https://www.anduril.com/article/anduril-industries-sumisho-aero-systems-to-demonstrate-diverse-command-and-control-for-the-japan/, Accessed 1 September, 2025.Google Scholar
Army Press, (2016, November 23 ). The Army Press’s Future Warfare Writing Program. Retrieved from https://mwi.westpoint.edu/army-presss-future-warfare-writing-program/, Accessed 09 June, 2024.Google Scholar
Australian Government. (2021). Reflections of Interagency Leadership, Australian Civil-Military centre. Retrieved from https://www.acmc.gov.au/sites/default/files/2021-03/Taskforce%20Reflections%20of%20Interagency%20Leadership%20e-Publication.pdf, Accessed 24 November, 2024.Google Scholar
Avin, S., Gruetzemacher, R., & Fox, J. (2020, February 7 ). Exploring AI futures through role play. AIES ‘20: Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society. Retrieved from https://doi.org/10.1145/3375627.3375817CrossRefGoogle Scholar
Basuchoudhary, A. (2025). AI and warfare: A rational choice approach. Eastern Economic Journal, 51, 7486. https://doi.org/10.1057/s41302-024-00280-7CrossRefGoogle Scholar
Bathurst, R. (1993). Intelligence and the Mirror: On Creating an Enemy. London: Sage Publications.Google Scholar
Central Intelligence Agency (CIA). (2007, April 16 ). Net Assessments in National Intelligence. Retrieved from https://www.cia.gov/readingroom/docs/CIA-RDP84B00049R001403580003-3.pdf, Accessed 05 May, 2024.Google Scholar
Chiodo, M., Müller, D., & Sienknecht, M. (2024). Educating AI developers to prevent harmful path dependency in AI resort-to-force decision making. Australian Journal of International Affairs, 78(2), 210219. https://doi.org/10.1080/10357718.2024.2327366CrossRefGoogle Scholar
Chivvis, C. S., & Kavanagh, J. (2024, June 17 ). How AI might affect decisionmaking in a national security crisis, Carnegie Endowment for International Peace. Retrieved from https://carnegieendowment.org/research/2024/06/artificial-intelligence-national-security-crisis?lang=en, Accessed 25 May, 2025.Google Scholar
Christie, E. H., Ertan, A., Adomaitis, L., & Klaus, M. (2024). Regulating lethal autonomous weapon systems: Exploring the challenges of explainability and traceability. AI Ethics, 4, 229245. https://doi.org/10.1007/s43681-023-00261-0CrossRefGoogle Scholar
Cybersecurity and Infrastructure Security Agency. (2024, April 15 ). Joint Guidance on Deploying AI Systems Securely. Retrieved from https://media.defense.gov/2024/Apr/15/2003439257/-1/-1/0/CSI-DEPLOYING-AI-SYSTEMS-SECURELY.PDF, Accessed 06 July, 2025.Google Scholar
Dahl, E. J., & Strachan-Morris, D. (2024). Predictive intelligence for tomorrow’s threats: Is predictive intelligence possible? Journal of Policing, Intelligence and Counter Terrorism, 19(4), 423435. https://doi.org/10.1080/18335330.2024.2404834Google Scholar
Davis, J. L. (2024). Elevating humanism in high-stakes automation: Experts-in-the-loop and resort-to-force decision making. Australian Journal of International Affairs, 78(2), 200209. https://doi.org/10.1080/10357718.2024.2328293CrossRefGoogle Scholar
Defence Science and Technology Laboratory (Dstl) (2023a, February 28 ). Futuristic visions from sci-fi writers offer insights for defence. Retrieved from https://www.gov.uk/government/news/futuristic-visions-from-sci-fi-writers-offer-insights-for-defence, Accessed 07 May, 2025.Google Scholar
Department of Defense (US) (2009, December 23 ). Director of Net Assessment, Directive Number 5111.11. Retrieved from http://www.dtic.mil/whs/directives/corres/pdf/511111p.pdf, Accessed 09 February, 2024.Google Scholar
Department of Defense (US) (2024, January 11 ). DOD CYBER RED TEAMS, DOD INSTRUCTION 8585.01. Retrieved from https://www.esd.whs.mil/Portals/54/Documents/DD/issuances/dodi/858501p.pdf?ver=0J4GT4-ji4H0Dd7mOa173w%3D%3D, Accessed 25 May, 2025.Google Scholar
Department of Defense. (US) (2020), AI Education Strategy. Retrieved from https://www.ai.mil/docs/2020_DoD_AI_Training_and_Education_Strategy_and_Infographic_10_27_20.pdf, Accessed 30 May, 2024.Google Scholar
Desch, M. (2014, December 17 ). “Don’t Worship at the Altar of Andrew Marshall”, The National Interest. Retrieved from https://nationalinterest.org/feature/the-church-st-andy-11867, Accessed 04 April, 2024.Google Scholar
Dstl. (2023b, February 28 ). Stories from the Future: exploring new technology through useful fiction. Retrieved from https://www.gov.uk/government/publications/stories-from-the-future-exploring-new-technology-through-useful-fiction, Accessed 12 January, 2025.Google Scholar
Erskine, T. (2024). Before algorithmic Armageddon: Anticipating immediate risks to restraint when AI infiltrates decisions to wage war. Australian Journal of International Affairs, 78(2), 175190. https://doi.org/10.1080/10357718.2024.2345636CrossRefGoogle Scholar
Erskine, T., & Miller, S. E. (2024). AI and the decision to go to war: Future risks and opportunities. Australian Journal of International Affairs, 78(2), 135147. https://doi.org/10.1080/10357718.2024.2349598CrossRefGoogle Scholar
Ferey, A., & de Roucy-rochegonde, L. (2024). From Ukraine to Gaza: Artificial intelligence in war. Politique Étrangère, 89(3), 3950. https://doi.org/10.3917/pe.243.0039Google Scholar
Ferl, A. K. (2023). Imagining meaningful human control: Autonomous weapons and the (De-) legitimisation of future warfare. Global Society, 38(1), 139155. https://doi.org/10.1080/13600826.2023.2233004CrossRefGoogle Scholar
Firth-Butterfield, K., Toplic, L., Anthony, A., & Reid, E. (2022). Without universal AI literacy, AI will fail us. https://www.weforum.org/stories/2022/03/without-universal-ai-literacy-ai-will-fail-us/, Accessed 23 May, 2024.Google Scholar
Fox News, (2007, February 11 ). Transcript: Former Defense Undersecretary Douglas Feith on ‘FNS.’ Retrieved from https://www.foxnews.com/story/transcript-former-defense-undersecretary-douglas-feith-on-fns, Accessed 10 June, 2025.Google Scholar
Frauenfelder, M. (2023, February 19 ) Five Actions to Jump-Start Creativity, Institute for the Future. Retrieved from https://www.iftf.org/insights/five-actions-to-jump-start-creativity/, Accessed 4 May, 2025.Google Scholar
Fukuyama, T. (2022, July 11 ). Digital Literary 3: AI and Deep Learning Fields (デジタルリテラシー③AI・ディープラーニング領域–), Japan Maritime Self-Defense Force Command and Staff College Institute for Future Warfare Studies. Retrieved from https://www.mod.go.jp/msdf/navcol/assets/pdf/2022_0711_01.pdf, Accessed 06 June, 2024.Google Scholar
Greene, A., & Fernandez, T. (2024, October 24 ). Japanese officials observe secretive Jervis Bay exercises ahead of likely AUKUS invitation. Australian Broadcasting Corporation. Retrieved from https://www.abc.net.au/news/2024-10-24/japan-observes-aukus-exercises-jervis-bay/104514578, Accessed 09 May, 2025.Google Scholar
Grube, D., & Killick, A. (2023). Groupthink, polythink and the challenges of decision-making in cabinet government. Parliamentary Affairs, 76(1), 211231. https://doi.org/10.1093/pa/gsab047CrossRefGoogle Scholar
Harbath, L., Gößwein, E., Bodemer, D., & Schnaubert, L. (2025). (Over)Trusting AI recommendations: How system and person variables affect dimensions of complacency. International Journal of Human–Computer Interaction, 41(1), 391410. https://doi.org/10.1080/10447318.2023.2301250CrossRefGoogle Scholar
Holbrook, C., Holman, D., Clingo, J., & Wagner, A. R. (2024). Overtrust in AI recommendations about whether or not to kill: Evidence from two human-robot interaction studies. Scientific Reports, 14, 19751. https://doi.org/10.1038/s41598-024-69771-zCrossRefGoogle ScholarPubMed
Holmes, M., & Wheeler, N. J. (2024). The role of artificial intelligence in nuclear crisis decision making: A complement, not a substitute. Australian Journal of International Affairs, 78(2), 164174. https://doi.org/10.1080/10357718.2024.2333814CrossRefGoogle Scholar
Horowitz, M., & Kahn, L. (2024). Bending the automation bias curve: A Study of human and AI-based decision making in national security contexts. International Studies Quarterly, 68(2), 115. https://doi.org/10.1093/isq/sqae020CrossRefGoogle Scholar
House of Commons. (2017, February 27 ). Lessons still to be learned from Chilcot Inquiry. Retrieved from https://publications.parliament.uk/pa/cm201617/cmselect/cmpubadm/656/656.pdf, Accessed 29 May, 2024.Google Scholar
House of Commons. (2025, April 4 ). Government response to Developing AI capacity and expertise in UK Defence, Defence Committee, Third Special Report of Session 2024–25, HC812. Retrieved from https://committees.parliament.uk/publications/47384/documents/245593/default/, Accessed 1 September, 2025.Google Scholar
Hughes, M., Carter, R., Harland, A., & Babuta, A. (2024, April 22 ). AI and Strategic Decision-Making, Alan Turing Institute. Retrieved from https://cetas.turing.ac.uk/publications/ai-and-strategic-decision-makingGoogle Scholar
ICRC & Geneva Academy. (2024, March). Expert Consultation Report on AI and Related Technologies in Military Decision-Making on the Use of Force in Armed Conflicts. Retrieved from https://www.geneva-academy.ch/joomlatools-files/docman-files/Artificial%20Intelligence%20And%20Related%20Technologies%20In%20Military%20Decision-Making.pdf, Accessed 06 June, 2025.Google Scholar
Jervis, R. (2006). Reports, politics, and intelligence failures: The case of Iraq. Journal of Strategic Studies, 29(1), 352.CrossRefGoogle Scholar
John, S. M. (2024, December 16 ). Survey reveals 58% of executives lack AI training, highlighting knowledge gap in leadership. Retrieved from https://wire19.com/survey-reveals-58-of-executives-lack-ai-training/, Accessed 05 May, 2025.Google Scholar
Knack, A., Carter, R. J., & Babuta, A. (2022, December). Human-Machine Teaming in Intelligence Analysis, Alan Turing Institute. Retrieved from https://cetas.turing.ac.uk/sites/default/files/2022-12/cetas_research_report_-_hmt_and_intelligence_analysis_vfinal.pdf, Accessed 15 June, 2024.Google Scholar
Lancaster, C. M., Duan, W., Mallick, R., & McNeese, N. J. (2025). Human-centered team training for human-AI teams: From training with AI tools to training for AI teammates. Proceedings of the ACM on Human-Computer Interaction, 9(2), 138. https://dl.acm.org/doi/10.1145/3710998, Accessed 01 September, 2025.CrossRefGoogle Scholar
Lawrence, T. (1991). Impacts of Artificial intelligence on organisational decision making. Journal of Behavioural Decision Making, 4(3), 195214. https://doi.org/10.1002/bdm.3960040306Google Scholar
Lin-Greenberg, E. (2020). Allies and Artificial Intelligence: Obstacles to operations and decision-making. Texas National Security Review, 3(2), 5676. https://doi.org/10.26153/tsw/8866Google Scholar
Liveley, G., Slocombe, W., & Spiers, E. (2021, January). Futures literacy through narrative. Futures, 125(Article 102663), 102663. https://doi.org/10.1016/j.futures.2020.102663CrossRefGoogle Scholar
Ministry of Defense. (2024a, July). Basic Policy on AI (防衛省AI活用推進基本方針). Retrieved from https://www.mod.go.jp/j/press/news/2024/07/02a_03.pdf, Accessed 08 January, 2025.Google Scholar
Ministry of Defense. (2024b). Overview of the Air Self-Defense Force. Retrieved from https://www.mod.go.jp/asdf/doc/special/download/booklet/gaiyou2024.pdf, Accessed 23 June, 2025.Google Scholar
Nadibaidze, A., Bode, I., & Zhang, Q. (2024, November 4 ). AI in Military Decision Support Systems: A Review of Developments and Debates. Retrieved from https://www.autonorms.eu/ai-in-military-decision-support-systems-a-review-of-developments-and-debates/, Accessed 09 September, 2025.Google Scholar
NASA. (2024). What is Artificial Intelligence? Retrieved from https://www.nasa.gov/what-is-artificial-intelligence/, Accessed 09 August, 2025.Google Scholar
Nelson, A., & Epstein, G. (2022, December 23 ). The PLA’s Strategic Support Force and AI Innovation, Brookings. Retrieved from https://www.brookings.edu/articles/the-plas-strategic-support-force-and-ai-innovation-china-military-tech/, Accessed 8 June, 2024.Google Scholar
Ng, D. T. K., Leung, J. K. L., Chu, S. K. W., & Qiao, M. S. (2021). Conceptualizing AI literacy: An exploratory review. Computers and Education: Artificial Intelligence, 2, 100041. https://doi.org/10.1016/j.caeai.2021.100041Google Scholar
OpenAI. ( 2024, November 21 ). Advancing red teaming with people and AI. Retrieved from https://openai.com/index/advancing-red-teaming-with-people-and-ai/Google Scholar
Osoba, O. A. (2024). A complex-systems view on military decision making. Australian Journal of International Affairs, 78(2), 237246. https://doi.org/10.1080/10357718.2024.2333817CrossRefGoogle Scholar
Ozkaya, I. (2020). The Behavioral Science of Software Engineering and Human–Machine Teaming. IEEE Software, 37(6), 36. https://doi.org/10.1109/MS.2020.3019190Google Scholar
Parliamentary Office of Science and Technology. (2024, January 23 ). Artificial intelligence (AI) glossary. Retrieved from https://post.parliament.uk/artificial-intelligence-ai-glossary/, Accessed 09 June, 2025.Google Scholar
Pinski, M., & Benlian, A. (2024). AI literacy for users – A comprehensive review and future research directions of learning methods, components, and effects. Computers in Human Behavior: Artificial Humans, 2(1), 100062. https://doi.org/10.1016/j.chbah.2024.100062CrossRefGoogle Scholar
Pinski, M. et al. (2024, April 9 ). Why Executives Can’t Get Comfortable With AI, MIT-Sloan Management Review. Retrieved from https://sloanreview.mit.edu/article/why-executives-cant-get-comfortable-with-ai/, Accessed 15 June, 2025.Google Scholar
Probasco, E. (2024, August). Building the Tech Coalition, Centre for Security and Emerging Technology Policy Brief. Retrieved from https://cset.georgetown.edu/publication/building-the-tech-coalition/, Accessed 9 July, 2025.Google Scholar
Probasco, E. et al. (2024). Not Oracles of the Battlefield: Safety Considerations for AI-Based Military Decision Support Systems, Proceedings of the Seventh AAAI/ACM Conference on AI, Ethics, and Society. Retrieved from https://ojs.aaai.org/index.php/AIES/article/view/31712/33879, Accessed 14 August, 2025.Google Scholar
Renic, N. (2024). Tragic reflection, political wisdom, and the future of algorithmic war. Australian Journal of International Affairs, 78(2), 247256. https://doi.org/10.1080/10357718.2024.2328299CrossRefGoogle Scholar
Rivera, J. P. et al. (2024, May). Escalation Risks from LLMs in Military and Diplomatic Contexts, Stanford University Policy Brief, HAI Policy & Society. Retrieved from https://hai.stanford.edu/sites/default/files/2024-05/Escalation-Risks-Policy-Brief-LLMs-Military-Diplomatic-Contexts.pdf, Accessed 10 May, 2025.CrossRefGoogle Scholar
Skitka, L. J., Mosier, K. L., & Burdick, M. (1999). Does Automation Bias Decision-Making? International Journal of Human-Computer Studies, 51(5), 9911006. https://doi.org/10.1006/ijhc.1999.0252CrossRefGoogle Scholar
Stewart, R., & Hinds, G. (2023, October 24 ). Algorithms of war: The use of artificial intelligence in decision making in armed conflict, ICRC Blogs. Retrieved from https://blogs.icrc.org/law-and-policy/category/topics/technology-in-humanitarian-action/, Accessed 15 August, 2025.Google Scholar
UK Ministry of Defence. (2018). The Good Operation. Retrieved from https://assets.publishing.service.gov.uk/media/5a81f19440f0b62305b91a48/TheGoodOperation_WEB.PDF, Accessed 09 May, 2024.Google Scholar
UK Ministry of Defence. (2021). Red Teaming Handbook, 3rd edition. Retrieved from https://www.gov.uk/government/publications/a-guide-to-red-teaming, Accessed 09 May, 2024.Google Scholar
US Marine Corps. (2018). MARSOC 2030: A Strategic Vision for the Future. Retrieved from https://www.marsoc.marines.mil/About/Initiatives/MARSOF-2030/, Accessed 23 May, 2024.Google Scholar
Walker, J., & Smeddle, L. (2024, May) Decision-making: How do human-machine teamed decision makers, make decisions? DCDC Concept Information Note 4. Retrieved from https://assets.publishing.service.gov.uk/media/635931b18fa8f557d066c1b1/A_Brief_Guide_to_Futures_Thinking_and_Foresight_-_2022.pdf, Accessed 18 June, 2025.Google Scholar
Washington, S. (2022, September 29 ). Building Foresight Capability – A Curated Conversation Between Jurisdictions, ANZSOG Research Insights No. 25. Australia and New Zealand School of Government. Retrieved from https://anzsog.edu.au/research-insights-and-resources/research/building-foresight-capability/, Accessed 28 May, 2024.CrossRefGoogle Scholar
Yarhi-Milo, K. (2014). Knowing the Adversary: Leaders, Intelligence, and Assessment of Intentions in International Relations. Princeton, New Jersey: Princeton University Press.Google Scholar
Younis, Z., Marwa, I., & Azzam, H. (2024). The impact of artificial intelligence on organisational behavior: A risky tale between myth and reality for sustaining workforce. European Journal of Sustainable Development, 13(1), 109. https://doi.org/10.14207/ejsd.2024.v13n1p109CrossRefGoogle Scholar
Zala, B. (2024). Should AI stay or should AI go? First strike incentives & deterrence stability. Australian Journal of International Affairs, 78(2), 154163. https://doi.org/10.1080/10357718.2024.2328805CrossRefGoogle Scholar
Zenko, M. (2015, November 10 ). ‘Red Team: How to Succeed By Thinking Like the Enemy,’ CFR Fellows’ Book Launch. Retrieved from https://www.cfr.org/event/red-team-how-succeed-thinking-enemy-0, Accessed 23 June, 2024.Google Scholar
Zhang, A., Shaw, R., Anthis, J. R., Milton, A., Tseng, E., Suh, J., & Gray, M. L. (2024). The human factor in AI red teaming: Perspectives from social and collaborative Computing, companion of the 2024 computer-supported cooperative work and social computing (CSCW Companion ‘24). Retrieved from https://doi.org/10.48550/arXiv.2407.07786CrossRefGoogle Scholar
Zysk, K. (2023, November 20 ). Struggling, Not Crumbling: Russian Defence AI in a Time of War. RUSI Commentary. Retrieved from https://rusi.org/explore-our-research/publications/commentary/struggling-not-crumbling-russian-defence-ai-time-war, Accessed 14 August, 2025.Google Scholar