1. Introduction
Design thinking (DT) is widely described as a human-centered approach to innovation that foregrounds empathy with users, creative ideation, iterative prototyping and holistic problem solving (Verganti, Vendraminelli & Iansiti 2020; Hsu et al. 2024). Developed through collaborations at Stanford’s d.school and IDEO, it was intended to translate designers’ professional methods for non-design audiences and to connect creativity with organizational goals (Mayseless et al. 2018; Zemke, Stahmann & Janiesch 2025). Artificial intelligence (AI), by contrast, designates a family of computational techniques that approximate cognitive functions such as learning, reasoning and pattern recognition through data-driven models (Li et al. 2024b; Akada et al. 2025). As organizations seek faster and more targeted innovation, these two approaches increasingly meet: the empathic, exploratory logic of DT intersects with AI’s analytical power to automate complex tasks, reveal regularities in large datasets and augment problem solving with machine-generated insight (Verganti et al. 2020). Recent discussions suggest that carefully combining them may extend the breadth of exploration and the depth of evidence available to teams, while critics warn that such combinations raise questions of ethics, trust and procedural integrity (Lin et al. 2023; Lin & Chang 2024). The debate has sharpened as generative models matured in the 2020s and as DT spread into new domains (Sauder & Jin 2016; Efeoğlu & Møller 2023).
Practice, however, shows persistent bottlenecks. Novice teams sometimes apply DT as a fixed sequence rather than an iterative stance (Chen et al. 2023; Lau-Min et al. 2024). Empathy work is frequently constrained by small, unrepresentative samples; idea generation can succumb to fixation or fatigue, narrowing the option space; and prototyping or testing often proceeds with low-fidelity artifacts or limited trials that miss real-world complexity (Verganti et al. 2020; Ahern 2025). Human limitations, including biases, cognitive load, time constraints and budgetary restrictions, contribute to these outcomes (Eftekhari 2019; Berglund 2024). AI capabilities map onto these pain points in recognizable ways. Natural language processing (NLP) and analytics can synthesize large volumes of user narratives to surface latent needs that manual research would miss (Lau-Min et al. 2024). Generative systems such as GPT-style models and domain-specific design tools can expand early ideation by producing numerous alternatives that a team then curates for relevance, novelty and fit (Ezzatian & Aminzade 2024). Simulation and predictive modeling can evaluate options virtually, from interface behavior at scale to service interactions across thousands of hypothetical users, thereby accelerating learning without displacing final human judgment (Lin et al. 2023; Zhang et al. 2024).
Used in this manner, AI does not replace empathy or iteration; rather, it may widen their evidentiary base and tempo, provided that designers retain responsibility for selection, interpretation and the quality of engagement with stakeholders (Gero & Milovanovic 2020).
Against this backdrop, our study investigates how AI is reshaping the logic and practice of DT across models and fields. We conduct a qualitative review of research and cases that integrate AI into the stages of established DT models. Applications from Art and Design, Education, Industry and Business, Healthcare and Engineering Design are examined to compare where and how AI augments particular stages and to note when it complicates them. The objective is a comparative synthesis that aligns specific AI capabilities with specific steps in these models, identifying stage-level enhancements as well as points of potential friction. Attention is given to model-specific adaptations (for example, how empathy-building practices differ between consumer product contexts and public healthcare services) as well as to recurrent concerns that cut across settings. Finally, we consider implications for the human-centered ethos that anchors DT and outline ethical and operational safeguards recommended to keep processes accountable and responsive as AI is introduced.
This article is organized as follows: the second section reviews core concepts in DT and AI; the third section details our SPAR-4-SLR methodology and search strategy; the fourth section synthesizes AI applications across canonical DT models and compares findings across the five domains; the fifth section examines the impact of AI across DT fields; the sixth section addresses cross-cutting challenges and limitations; and the last section concludes with recommendations.
2. Background: intersection between DT and AI
2.1. DT
Modern DT reframes design as a generalizable approach to problem solving rather than a domain-specific craft. Early theorists characterized design as a science of moving from existing to preferred situations; Rittel & Webber (1973) introduced “wicked problems” demanding iterative reframing, and Schön’s (1987) reflection-in-action legitimized problem framing and practitioner judgment. By the 2000s, these ideas were distilled into accessible process models: the Stanford d.school’s five-stage sequence (Empathize→Define→Ideate→Prototype→Test) and the Double Diamond (Discover→Define→Develop→Deliver) supplied a shared vocabulary for divergent–convergent innovation (Design Council 2007; Hasso Plattner Institute of Design 2010; Mccarthy 2020; Hawryszkiewycz & Alqahtani 2020). The popularization of the topic intensified debates, as scholars identified multiple discourses, advocated for critiques informed by science and technology studies and emphasized the need for stronger evidence while cautioning against the risk of “construct collapse” (Micheli et al. 2019). Tensions between rigor and flexibility persist: teachable stages risk rigid performance, prompting practitioner use of IDEO’s flexible 3I model (Inspiration–Ideation–Implementation) (Pestana, Neves & Daly 2019). A broader dispute concerns DT’s status as methodology versus mindset and its adequacy for multi-stakeholder, systemic problems that may require approaches “more systemic (rather than user-centered) and reflective (rather than purely ideated)” (Verganti et al. 2020; Lai & Chen 2021). In sum, DT has evolved through theory and toolkits while facing unresolved issues of definition, theoretical rigor and scope.
Cross-sector studies identify persistent constraints across Empathize, Define, Ideate, Prototype and Test. Empathize relies on small samples and note synthesis, producing partial views and under-representing marginalized or dispersed users (Wang et al. 2019; Lin & Chang 2024). The Define phase frequently omits root-cause analysis, particularly in complex systems, leading to poorly defined briefs and misaligned solutions (Li et al. 2024b; Maxim & Arnedo-Moreno 2025). Ideation shows early convergence, dominance effects, fixation and “ideation comfort zones”; even productive sessions typically generate only tens of ideas, limiting search and fostering incrementalism (Kannengiesser & Gero 2019; Eftekhari, Jahanbakht & Sharbafi 2021). Prototyping faces fidelity trade-offs: low-fidelity artifacts are fast but miss behavioral or feasibility detail, whereas high-fidelity prototypes are costly and reduce iteration; intangible services and policies are hard to simulate. Testing is constrained by ethical requirements, relies on small pilots and subjective feedback and can obscure edge cases; long-term outcomes may remain unseen. Scaling empathy, ideation and evaluation is difficult, and evidence demands can clash with DT’s small-scale experimentation (Iandoli 2023; Hsu et al. 2024). Framed as abductive reasoning and sensemaking, DT faces these limitations, motivating inquiry into AI: data processing, pattern finding and generative variation might expand empathy scale, ideation diversity, prototyping speed and testing scope, while introducing new risks (Lee, Tan & Wong 2020; Campbell-Salome et al. 2021). Despite separate literatures, DT–AI intersections remain underexplored, with limited analysis of how specific AI technologies align with established DT processes. The present study synthesizes Stanford’s five-stage model, IDEO’s 3I, Double Diamond, IBM, Google Design Sprint, HPI and Stingray, spanning Art and Design, Education, Industry, Business and Management, Healthcare and Engineering Design, to clarify where AI can amplify or disrupt DT and how to navigate a hybrid human-centered, data-centric paradigm.
2.2. The evolution of DT – and its turn in the age of AI
Recent studies suggest that AI can substantially enhance each phase of DT by extending designers’ capabilities beyond previous limitations. Four primary affordances in DT workshops, namely enhancing creativity, supporting analytical tasks, facilitating task initiation and accelerating process flow, align with the core stages of the DT process (Magistretti et al. 2024; Hsu et al. 2024; Freitag & Hämmerle 2020; Zarattini Chebabi & von Atzingen Amaral 2020). In Empathize and early analysis, NLP and related algorithms synthesize thousands of interviews, comments, reviews or support chats into themes and sentiment, enabling evidence from millions of data points rather than a dozen ethnographic observations; case studies show AI-based clustering that sharpens problem definition (Jiang & Pang 2023; Houssaini et al. 2024; Jiang, Huang & Shen 2025). In Ideation, AI acts as a generative partner: generative adversarial networks and large language models produce written concepts, “How might we…?” prompts and visual sketches at scale, which teams then curate and refine. In engineering design contexts, evolutionary and other optimization algorithms instead explore large parametric design spaces, generating families of structural or geometric alternatives that human designers evaluate against functional and contextual constraints (Huang 2024; Hsu et al. 2024; Jiang et al. 2025).
Empirical work reports that consultants equipped with GPT-4 at Boston Consulting Group produced solutions faster and of higher quality than peers without AI, and another study found generative systems could surpass average human performance in idea generation tasks, offering more diverse options at lower cost (Staub et al. 2023). During Prototyping, generative design algorithms output viable configurations from parameter sets in engineering, letting teams parallelize prototyping; AI tools auto-generate mockups or front-end code from sketches, speeding the build–measure–learn cycle; digital twins test ideas virtually (Iandoli 2023; Poleac 2024). In Testing, AI analyzes usability videos and interaction logs to detect struggle points, stands in as an artificial user via chatbots and processes survey or analytics data to flag pain points or unexpected patterns, augmenting validation beyond anecdote (Rowan 2024). AI thus undertakes data crunching, option generation and routine testing so designers focus on interpretation, creative direction and ethical or strategic decisions; interactive tools support sketch-and-refine co-creativity, a “double act” that keeps human judgment central while widening research, ideation and iteration (Schleith et al. 2022; Hsu et al. 2024).
Integration also brings challenges that sustain the human-in-the-loop imperative (Swaid & Suid 2021). Over-reliance risks creative atrophy and homogenization if teams accept average, data-shaped outputs; human-only ideation remains advisable to sustain divergent thinking (Lin & Chang 2024). Systems trained on skewed data can reinforce stereotypes or exclude underrepresented groups (e.g., persona generation or concept ranking mirroring majority traces); mitigation relies on diverse training data and algorithmic transparency (Magistretti et al. 2024). Privacy obligations are integral to the Empathize-stage mining of social media or sensor data, which demands consent, anonymization and data sovereignty; healthcare workflows require secure enclaves and explicit permissions (Lau-Min et al. 2024). Black-box recommendations face skepticism and compliance barriers, motivating explainable AI and audit trails (Nishant, Kennedy & Corbett 2020). Poor role definition can dampen collaboration and ownership; new skills in prompt engineering and data interpretation add learning curves; misaligned toolchains or poor data quality revive garbage-in, garbage-out risks (Almeida et al. 2022). Heavy dependence can reduce resilience, so teams need override mechanisms and context-grounded rationales. Scholars therefore advocate hybrid intelligence: deploy AI chiefly in divergent phases, rely on human judgment in convergent phases, maintain veto power and bias audits and keep transparent records of the AI’s role.
Early research suggests teams that treat AI as a collaborator with well-scoped sub-tasks and retained accountability outperform both AI-averse and automation-first approaches. Yet many studies remain stage- or domain-specific, leaving a gap that motivates systematic synthesis across DT models and sectors such as healthcare and education (Lau-Min et al. 2024; Radic et al. 2024).
3. Methodology
3.1. Search strategy
We developed a structured search protocol to investigate how AI intersects with DT. Following an exploratory scan of seminal work to stabilize terminology, we refined Boolean strings and field tags to fit Web of Science (WoS) syntax and to maximize recall with manageable precision. In a preliminary piloting phase, we experimented with alternative Boolean combinations and field tags in both WoS and Google Scholar to check whether a set of benchmark AI–DT articles that we had identified a priori was consistently retrieved while off-topic records remained manageable. We inspected the first pages of results for each trial query and iteratively removed overly broad terms and added AI-related synonyms until the balance between recall and precision was satisfactory. Once the final search strings had been agreed upon by the author team, we implemented the formal, reproducible search in the Web of Science Core Collection only. The temporal window was set to 2005–August 2025 to capture contemporary AI advances and the diffusion of DT across fields and models. AI terms were queried as TS = (“artificial intelligence” OR “deep learning” OR “machine learning” OR “neural networks” OR “natural language processing” OR “reinforcement learning” OR “supervised learning” OR “unsupervised learning” OR “semi-supervised learning” OR “transfer learning”). DT terms were queried as TS = (“design thinking”). The records identified through these queries were then filtered on titles, abstracts and keywords to isolate those that specifically addressed the intersection of AI and DT. A full-text examination was then performed to verify that the selected articles were relevant to our research topics. We focused on research that examined the utilization, challenges or outcomes of integrating AI with DT.
As summarized in Figure 1, the overall review process followed the SPAR-4-SLR protocol to support careful planning, consistent execution and transparency suitable for replication.

Figure 1. The SPAR-4-SLR protocol.
3.2. Inclusion and exclusion criteria
Studies were eligible if they explicitly examined the intersection or integration of DT with AI; both empirical and conceptual contributions were permitted. We included peer-reviewed journal articles, review papers and selected peer-reviewed conference proceedings indexed in the Web of Science Core Collection, all published in English between 2005 and August 2025. Conference papers were retained only when they reported empirical findings or design-relevant frameworks at a level of detail comparable to journal articles. Exclusions comprised items not directly related to DT–AI integration, non-English publications, duplicates and records lacking scholarly rigor or substantive content. These criteria were applied consistently at title/abstract screening and again at full-text assessment.
3.3. Data sources and selection process
Running the final search strings in the Web of Science Core Collection retrieved 216 records. After deduplication (which removed 7 records), we screened titles and abstracts for relevance to DT–AI integration and then assessed full texts against the inclusion criteria (which excluded a further 4 records). As a result, 205 documents were ultimately retained for the systematic literature review, comprising mainly journal articles and review papers, alongside a smaller number of eligible conference proceedings. For studies meeting the criteria, we extracted bibliographic data and substantive variables, including context/sector, DT stages addressed, AI techniques employed, study design, key findings and stated implications. The objective was to assemble a corpus that enables the synthesis of synergies and barriers across this multidisciplinary space.
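The record flow reported above can be expressed as a simple running tally (the counts are those reported in this section; the helper function below is purely illustrative and not part of the review tooling):

```python
# Illustrative tally of the selection flow reported above:
# 216 retrieved, 7 duplicates removed, 4 excluded at full-text assessment.

def selection_tally(retrieved, removed_at_each_step):
    """Return the running record counts after each screening step."""
    counts = [retrieved]
    for removed in removed_at_each_step:
        counts.append(counts[-1] - removed)
    return counts

flow = selection_tally(216, [7, 4])
print(flow)  # [216, 209, 205] -> 205 documents retained
```

The final element of the tally matches the 205 documents included in the review.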
3.4. Review protocol
We conducted the systematic review in line with the SPAR-4-SLR protocol to ensure careful planning, consistent execution and transparent reporting (Paul et al. 2021). SPAR-4-SLR structures systematic reviews into three stages and six sub-stages: “assembling” (identification and acquisition of the literature), “arranging” (organization and purification of the corpus) and “assessing” (evaluation and reporting of the synthesized evidence). In our review, the Assembling stage corresponds to the identification of the AI–DT domain, the formulation and piloting of search strings and the acquisition of records from WoS; Arranging covers the organization of bibliographic information, the coding of DT stages and AI techniques and the purification of records against inclusion/exclusion criteria; Assessing encompasses the descriptive mapping, narrative synthesis, agenda building and reporting shown in Figure 1. This study prioritized the Web of Science Core Collection given its strong field categorization and broad journal coverage in management, innovation and design research (Vieira & Gomes 2009). The validity and reliability of WoS for bibliometric and structured reviews have been demonstrated in prior scholarship, making it suitable for our study domain (Qiu & Lv 2014). The retrieved records underwent a two-stage screening process: title/abstract review followed by full-text assessment against the inclusion and exclusion criteria. Eligible publications were then subjected to systematic data extraction covering bibliographic details, research aims, methodological approaches, AI techniques employed, DT stages addressed, major findings and contributions. To safeguard rigor, each included study was evaluated for methodological adequacy and topical relevance, and only those meeting a minimum threshold of quality were retained.
Extracted information was organized into a comparative dataset that enabled mapping of synergies, barriers and knowledge gaps. The subsequent synthesis was structured as a logical narrative that integrates current insights, identifies recurring mechanisms and highlights areas requiring further investigation. This systematic approach ensured that the review met standards of thoroughness, reproducibility and dependability, thereby allowing us to draw robust conclusions about the evolving relationship between AI and DT.
4. Results (qualitative analysis)
4.1. Cross-cutting mechanisms of AI across DT
Across settings, design teams increasingly ask how AI can augment rather than replace DT by tracing where AI touches the process and how those touches reshape teamwork. AI functions as a family of capabilities that expand attention, temper bias and accelerate feedback while leaving judgment with people. Synthesizing evidence across the five fields, four cross-cutting mechanisms emerge: data-grounded empathy and problem framing; AI-augmented ideation; simulation-driven prototyping and testing; and decision support with post-launch learning and interdisciplinary integration.
4.2. Data-grounded empathy and problem framing
AI grounds empathy and problem framing in richer data. Upstream, NLP and analytics sift reviews, support tickets, surveys and social media at scale, surface unmet needs that small studies miss and translate signals across disciplines so that designers, engineers and business stakeholders converge on a shared brief (Gutiérrez et al. 2025). Proposals for generative DT extend this by positioning AI at macro, meso and micro cycles to broaden evidence and strengthen evaluation criteria (Wang, Li & Lei 2025a). These mechanisms allow teams to frame problems with wider, more representative input while maintaining human responsibility for interpretation.
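As a deliberately minimal caricature of the NLP-assisted sensemaking described above (not a tool drawn from any reviewed study), recurring content words can be counted across a corpus of user comments to surface candidate need themes; the stopword list and feedback strings below are invented for illustration, and real studies use far richer language models:

```python
# Hypothetical sketch: surface candidate need themes from raw user feedback
# by counting recurring content words (illustrative stand-in for NLP tooling).
import re
from collections import Counter

STOPWORDS = {"the", "a", "is", "to", "and", "i", "it", "my", "of", "on", "out"}

def surface_themes(comments, top_n=3):
    """Return the top_n most frequent non-stopword tokens across comments."""
    tokens = []
    for comment in comments:
        tokens += [t for t in re.findall(r"[a-z]+", comment.lower())
                   if t not in STOPWORDS]
    return [word for word, _ in Counter(tokens).most_common(top_n)]

feedback = [
    "The login flow is confusing and slow",
    "Login keeps failing on my phone",
    "Checkout is slow, and the login page times out",
]
print(surface_themes(feedback))  # 'login' and 'slow' dominate this toy corpus
```

In practice the surfaced tokens would only seed human interpretation, consistent with the stance above that responsibility for framing remains with the team.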
4.3. AI-augmented ideation
AI augments ideation by widening the creative search space. In idea generation, large language models and generative image systems deliver prompts, alternatives and sketches rapidly, countering anchoring and framing effects and enabling abductive reframings by juxtaposing uncommon elements (Lin & Chang 2024). Teams gain most when they treat outputs as raw material and refine deliberately; targeted prompting for assumption-breaking options preserves variety (Hsiao & Zhang 2023). AI-assisted ideation thus widens creative search while still relying on human judgment to select and shape options.
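The juxtaposition mechanism can be illustrated with a toy combinational sketch: crossing user needs with metaphors from unrelated domains to generate provocative “How might we…?” prompts that a team then curates. All needs and metaphors below are invented for illustration and carry no claim about any reviewed tool:

```python
# Toy sketch of combinational ideation: juxtapose user needs with metaphors
# from unrelated domains to provoke reframing; humans curate the output.
import itertools

user_needs = ["track medication", "share progress", "stay motivated"]
foreign_metaphors = ["board game", "garden", "radio station"]

def provoke(needs, metaphors):
    """Cross every need with every metaphor into 'How might we' prompts."""
    return [f"How might we help users {need} as if it were a {metaphor}?"
            for need, metaphor in itertools.product(needs, metaphors)]

prompts = provoke(user_needs, foreign_metaphors)
print(len(prompts))   # 3 x 3 = 9 candidate prompts to curate
print(prompts[0])
```

Even this trivial cross-product widens the option space combinatorially, which is why the text above stresses deliberate human curation of machine-generated raw material.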
4.4. Simulation-driven prototyping and testing
AI reshapes prototyping and testing through simulation and rapid iteration. Downstream, prototyping shifts as generative tools produce layouts, part designs and workflows for rapid iteration; in software, code generated from descriptions enables earlier functional testing (Liu et al. 2024). Simulation becomes central – from finite-element and user-behavior models to user-in-the-loop tests for AI products – while automation of repetitive build tasks and optimized machine settings move effort from production overhead to learning (Siwiec & Pacana 2025). Testing draws on analytics at scale: platforms analyze click, gaze and path data; predictive models extrapolate longer-term outcomes from short sessions; and adaptive protocols, simulated participants and sentiment analysis or conversational agents elicit richer feedback, although final validation still rests with real users.
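The role of simulated participants can be caricatured as a Monte Carlo comparison of two design variants, where synthetic “users” must complete every step of a flow at an assumed per-step success rate. The step counts and probabilities below are invented for illustration, not drawn from any reviewed study:

```python
# Hypothetical Monte Carlo sketch: compare two design variants by simulating
# synthetic users completing a multi-step flow; per-step success probabilities
# are invented assumptions, and real validation still requires real users.
import random

def simulate_completion(step_success_probs, n_users=10_000, seed=42):
    """Fraction of simulated users who complete every step of a flow."""
    rng = random.Random(seed)
    completed = 0
    for _ in range(n_users):
        if all(rng.random() < p for p in step_success_probs):
            completed += 1
    return completed / n_users

variant_a = simulate_completion([0.95, 0.90, 0.85])  # current three-step flow
variant_b = simulate_completion([0.97, 0.97, 0.97])  # hypothetical redesign
print(f"A: {variant_a:.3f}  B: {variant_b:.3f}")
```

Such simulation can rank options cheaply before a small real-user test, matching the pattern above in which extrapolation accelerates learning but does not replace final human validation.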
4.5. How AI impacts models of DT
DT is widely understood as a human-centered approach to innovation that privileges empathy with users, creative ideation, iterative prototyping and holistic problem solving. Rather than a single recipe, leading frameworks such as Stanford’s five-step sequence, IDEO’s Inspiration–Ideation–Implementation (3I) model and the Double Diamond provide structured yet adaptable pathways for addressing complex challenges (Banerjee & Gibbs 2016; Wang et al. 2023; Zemke et al. 2025). Historically, these approaches have relied on human intuition, creativity and qualitative insights derived from relatively small user samples (Lin & Chang 2024). Read as theory, they function as process architectures that organize inquiry while leaving room for situated judgment; read as practice, they guide teams through cycles of discovery and definition, concept generation and evaluation (Lin et al. 2022; Colombari & Manresa 2024).
Against this background, the rapid incorporation of AI technologies, including machine learning, NLP, generative design algorithms and predictive analytics, is altering how individual stages are executed (Tyrväinen et al. 2018; Grange et al. 2026). Capabilities are extended beyond manual effort and may at times reshape key activities in the DT cycle. Our qualitative analysis of six prominent DT models suggests that AI does not replace human-centered design; rather, it tends to amplify and accelerate it by augmenting human creativity and decision-making with data-driven intelligence. The following sections outline each model and consider stage-specific effects, using comparative tables that contrast pre-AI with AI-enabled practices. These tables (Tables 1–6) synthesize shared patterns alongside model-specific adaptations.
Table 1. Stanford d.school model: AI integration across stages

Table 2. IDEO 3I model: AI integration across stages

Table 3. Double diamond model: AI integration across stages

Table 4. IBM design thinking loop: AI integration across stages

Table 5. Google design sprint: AI integration across stages

Table 6. Hasso Plattner Institute (HPI) model: AI integration across stages

4.6. Stanford d.school model
The Stanford d.school’s five-stage model of Empathize, Define, Ideate, Prototype and Test originated at the Hasso Plattner Institute of Design at Stanford (Mayseless et al. 2018). Rather than prescribing a single recipe, the sequence orients teams toward human-centered inquiry: empathizing with users’ experiences, articulating a core problem, generating candidate solutions, building tangible artifacts and testing with users to elicit feedback (Mayseless et al. 2018). Its intuitive and teachable structure has been credited with democratizing design practice beyond expert communities. Yet discussion in the literature also notes limits: novice teams can treat the stages as a rigid checklist or perform them superficially, and the Empathize stage may be difficult to scale under resource constraints (Fritzsche et al. 2021). In AI-augmented enactments of the d.school model, AI tends to be positioned as an “iteration accelerator” that compresses the cycle time between early field inputs, sensemaking and rapid reframing (Fritzsche et al. 2021; Akada et al. 2025). Its integration is structurally lightweight and deliberately subordinate to the model’s bias toward action: AI is used to increase the frequency and diversity of testable moves rather than to stabilize a single “optimal” framing. Distinctively, AI support is distributed across repeated loops (not concentrated in one gate), reinforcing the model’s emphasis on fast learning-through-making under persistent ambiguity (Hsiao & Zhang 2023; Lin & Chang 2024).
4.7. IDEO’s 3I model
The 3I model developed by IDEO, which consists of Inspiration, Ideation and Implementation, provides a concise framework for DT that closely reflects IDEO’s established practices (Miron et al. 2018). Emerging in the 2000s, it moves from immersion in user contexts to concept generation and development, then to prototyping and implementation (Pestana et al. 2019). AI’s most model-specific manifestation in 3I is its continuity across the explicit Implementation phase, where AI is treated not merely as a design aid but as an operational component whose behavior must be maintained, monitored and adapted after release (Swaid & Suid 2021). This makes AI integration structurally more end-to-end than in phase-neutral depictions of DT: the handoff from concept to delivery is reframed as an ongoing design space rather than closure. The 3I framing also encourages earlier coupling of desirability-driven exploration with the downstream feasibility and governance constraints that become salient once AI is deployed in real contexts.
4.8. Double Diamond model
The Double Diamond model, introduced by the UK Design Council in 2005, structures design into two successive cycles of divergence and convergence – Discover and Define, followed by Develop and Deliver – and has been widely adopted in public- and private-sector practice and aligned with ISO guidance. Within the Double Diamond’s explicit diverge–converge symmetry, AI integration is typically differentiated by mode: expansive support during divergence and discriminative support during convergence, with different expectations for breadth versus defensibility (Chiu, Makany & Silva 2025). The model’s strength for AI is its explicit gating logic – AI outputs are repeatedly forced through convergence moments that privilege selection, prioritization and justification rather than accumulation (Wang et al. 2023). Structurally, this produces a clearer audit trail of how AI-influenced possibilities are narrowed into commitments across both problem and solution spaces (Clune & Lockrey 2014).
4.9. IBM model (loop model)
IBM’s design framework, formulated as the continuous Observe-Reflect-Make loop, emphasizes the tight integration of design work with agile product teams and enterprise systems (Lucena et al. Reference Lucena, Braz, Chicoria and Tizzei2016). IBM’s framework tends to embed AI through its formal artifacts and governance cadence – AI is integrated into “Hills” as a mechanism for maintaining traceability from user outcomes to design choices, and into “Playbacks” as a way to structure critique and decision-making at scale (Dell’era et al. Reference Dell’era, Magistretti, Cautela, Verganti and Zurlo2020; Böckle & Kouris Reference Böckle and Kouris2023). The distinctive feature is not speed but organizational reproducibility: AI support is designed to be legible across teams, stakeholders and time, aligning with the model’s enterprise orientation (Zarattini Chebabi & Von Atzingen Amaral Reference Zarattini Chebabi and Von Atzingen Amaral2020). AI is therefore framed less as an ad hoc creative partner and more as infrastructure that strengthens consistency, accountability and post-release learning across portfolios.
4.10. Google design sprint model
Google’s Design Sprint, popularized by Jake Knapp, condenses DT into a 5-day sequence: understand and map the problem, sketch solutions, decide, prototype and test with users (Koivumaa Reference Koivumaa2017).
AI integration in the Sprint is structurally episodic and bottleneck-focused, concentrated where the process is most constrained: rapid option generation under time pressure, high-fidelity prototyping within days and accelerated synthesis from a small set of user tests (Poliakova Reference Poliakova2017). What is distinctive is the model’s strict temporal compression, which forces AI use to privilege speed, coherence and decision closure over exhaustive exploration (Böckle & Kouris Reference Böckle and Kouris2023). AI’s role is therefore shaped by the Sprint’s sequencing – front-loading momentum and producing a single testable artifact quickly – rather than by broader notions of ongoing iteration (Poulter, Wang & Delrio Gayo Reference Poulter, Wang and Delrio Gayo2022).
4.11. Hasso Plattner Institute (HPI) model
The Hasso Plattner Institute model, taught at the HPI School of Design Thinking in Potsdam and closely connected to engineering and software programs, adapts the five-stage Stanford framing but places stronger emphasis on technical feasibility, iterative experimentation and integration with digital infrastructures (Meinel & Thienen Reference Meinel and Thienen2022). In HPI-style DT, AI often manifests as a team-level alignment and externalization device, supporting how interdisciplinary groups maintain a shared problem space, negotiate interpretations and preserve rationale across iterations (Almeida et al. Reference Almeida, Canedo, Albuquerque, De Deus, Orozco and Villalba2022; Klopfenstein et al. Reference Klopfenstein, Flint, Heeren, Prendke, Chaoui, Ocker, Chromik, Arnrich, Balzer and Poncette2025). The structurally distinctive element is the model’s emphasis on pedagogy, facilitation and reflective mindsets: AI is integrated in a way that is explicitly subject to critique practices and learning goals rather than treated as a neutral productivity layer (Thienen et al. Reference Thienen, Szymanski, Weinstein, Rahman and Meinel2022). As a result, AI use is framed as part of capability-building in collaborative reasoning, not only as a means of accelerating outputs.
4.12. Stingray: an AI-native DT model
Contemporary DT often treats AI as a tool embedded within human-led stages; the Stingray Model, instead, invites AI from the first moment as a co-designer (Song & Bai Reference Song and Bai2025). It frames design as a continuing human–machine conversation and replaces stepwise progression with overlapping activities informed by large-scale data and situated judgment, diverging from Stanford’s and the Double Diamond’s primarily human-directed sequences (Song & Bai Reference Song and Bai2025). Stingray comprises three entwined phases: Train, Develop and Iterate. Train begins with data immersion rather than interview-led empathy: teams assemble heterogeneous inputs (behavioral telemetry, usage logs, market studies, trend reports) and use AI to detect patterns, cluster signals and surface opportunity spaces that small teams might miss. For example, correlating forum complaints with sensor anomalies can reveal latent needs absent from small-N qualitative work. These inferences extend discovery, provided data quality is high and safeguards against drift and bias are explicit. Unlike the canonical DT models above, the Stingray framework does not simply map AI into existing stages but explicitly frames AI as a co-designer whose role alternates with humans across clarify–develop–decide cycles.
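The Train-phase example of correlating forum complaints with sensor anomalies can be sketched minimally. The datasets, device identifiers and keyword below are illustrative assumptions, standing in for real telemetry and a full NLP pipeline:

```python
from collections import Counter

# Hypothetical inputs: forum complaints (free text) keyed by device, and
# per-device sensor anomaly labels. Both datasets are invented for illustration.
complaints = [
    ("dev1", "battery drains overnight"),
    ("dev2", "screen flickers after update"),
    ("dev3", "battery gets hot while charging"),
    ("dev4", "battery dies in cold weather"),
]
anomalies = {"dev1": "power", "dev3": "power", "dev2": "display"}

def correlate(complaints, anomalies, keyword):
    """Count how often a complaint keyword co-occurs with each anomaly type."""
    counts = Counter()
    for device, text in complaints:
        if keyword in text and device in anomalies:
            counts[anomalies[device]] += 1
    return counts

# A spike of "battery" complaints among power-flagged devices would mark a
# candidate opportunity space for human review in the Train phase.
print(correlate(complaints, anomalies, "battery"))
```

In practice the keyword match would be replaced by clustering or topic modeling over much larger corpora; the point is only that the machine surfaces a correlation for designers to interrogate, not a conclusion.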
In Develop, humans and machines brainstorm together: the system proposes concepts or partial prototypes, and the team assesses, redirects or discards them; framing and search proceed in tandem, compressing cycles and widening the solution space under human review (Song & Bai Reference Song and Bai2025). Iterate is a continuous evaluation loop: simulation, predictive analytics and agent-based proxies estimate responses, flag constraints and preview business outcomes, though synthetic tests still require validation with users. Five contrasts follow: foundations shift to an AI-centered conception grounded in computation and large, diverse datasets; structure becomes concurrent rather than sequential; AI’s role moves from assistant to partner; inputs begin with large-N analytics while reserving targeted qualitative inquiry; outputs expand from a single prototype to portfolios with simulated evidence (predicted segment reactions, feasibility diagnostics) (Gavriilidis et al. Reference Gavriilidis, Dimitriadis, Jaulent and Natsiavas2021; Nahar & Lopez-Jimenez Reference Nahar and Lopez-Jimenez2022). The approach scales what teams can see and try but raises interpretability and auditability concerns, calling for fluency with model rationales, checks for spurious correlations and governance via curation, bias audits and explicit assumptions. Ultimately, Stingray sketches AI-collaborative design contingent on disciplined methods, real-world validation, and human judgment, ethics and context. Table 7 summarizes the stage-by-stage differences between the Stingray model and traditional models.
Table 7. Comparative overview: traditional AI-enhanced design models versus the Stingray model

The Stingray Model proposes a hybrid intelligence workflow in which human creativity and machine computation are interwoven from the outset, reframing design by treating AI as a co-designer. Teams can iterate at “the speed of thought” and explore a wider solution space, yet acceleration may invite shortcuts around empathy. Algorithmic biases can steer patterns or recommendations so outcomes drift from equitable or contextually appropriate solutions; the risk hinges on data curation and interpretation. Proponents present Stingray as disciplined co-creation rather than automation: systems surface patterns, propose directions and test ideas while humans retain judgment.
Operating this way requires data literacy, ethical judgment and the ability to interrogate AI outputs. Early adopters report steep learning curves but gains: with human-centered criteria in view, the model sparks novel combinations, speeds screening and refinement and supports rigorous checks on short timelines. Compared with conventional DT, Stingray shifts where sense-making occurs and how evaluation is staged – inviting machines into the creative core while humans arbitrate appropriateness. It may yield more imaginative and more rigorously validated designs, but only if teams keep a critical stance and acknowledge the limits of synthetic testing. The agenda is to determine when hybrid intelligence advances human-centered goals, identify failure conditions and refine competencies across products, services and systems. This explicit treatment of role-sharing and hand-offs makes Stingray a compact summary of the hybrid-intelligence patterns we synthesize across models in the cross-cutting analysis. Taken together, these models show that AI does not replace DT but refracts through it – amplifying different capacities depending on whether a framework prioritizes pedagogy, temporal delivery, organizational integration or hybrid human–machine cognition.
4.13. Impact of AI on fields practicing DT
Beyond the design process itself, AI’s integration is reverberating across multiple fields that practice DT, altering how problems are approached in those domains. Five illustrative domains are: (1) Art and Design, (2) Education, (3) Industry, Business and Management, (4) Healthcare and (5) Engineering Design. In each of these areas, DT provides a workflow for innovation – and in each, AI is now being applied to augment creativity, insight and effectiveness. This section describes each domain’s DT workflow and then discusses AI’s applications and impacts, enriched by examples and current literature. The intersection between applications of AI and DT in these five fields is shown in Tables 8–12.
Table 8. Impact of AI on art and design related to design thinking

Table 9. Impact of AI on education related to design thinking

Table 10. Impact of AI on industry, business and management related to design thinking

Table 11. Impact of AI on healthcare related to design thinking

Table 12. Impact of AI on engineering design related to design thinking

4.14. Art and design
DT in graphic design, product design, UX/UI and environmental or architectural work typically follows a cycle that begins with understanding audiences and contexts, proceeds through concept generation and moves into prototyping and iterative refinement (Li et al. Reference Li, Li, He, Wang, Zhong, Jiang, He, Qiao, Chen, Yin, Lc, Han, Yang and Shidujaman2024a). This exploratory orientation has become a testbed for computational support: evidence from studio settings indicates that generative systems such as DALL-E, Midjourney and GPT-4 can widen the solution space by producing many alternatives in a short time (Nahar & Lopez-Jimenez Reference Nahar and Lopez-Jimenez2022). An interior designer might prompt dozens of room layouts in different styles and then synthesize a hybrid unlikely to surface through manual sketching alone. Such outputs help break fixation and expose unfamiliar directions, yet overreliance may yield homogenization; studies report that while quantity increases, the diversity of truly distinct concepts can diminish without deliberate human intervention (Latto Reference Latto2023; Rahimi & Sevilla-Pavón Reference Rahimi and Sevilla-Pavón2025). Hence, emerging practice treats machine suggestions as raw material to be pushed beyond, preserving authorship and leaving room for judgment about fit and meaning.
A second shift concerns research and reference gathering. By mining extensive repositories such as art collections, stock libraries and Pinterest boards, AI retrieves patterns and examples that stimulate creativity and reveal connections that individuals might overlook. For example, an abstract pattern in nature identified through deep learning retrieval can inform the development of a logo that enhances a mood board (Eftekhari & Gill Reference Eftekhari and Gill2018; Matthews et al. Reference Matthews, Doherty, Worthy and Reid2023). This assistance accelerates research and supports cross-pollination across genres and media, contingent on careful curation and on designers’ capacity to situate references within project constraints (Wang et al. Reference Wang, Zhu, Xie, Martin-Payo, Xu and Zhang2025b). Downstream, computational design and simulation inform prototyping and decision making: architects test structural performance of parametrically generated forms with AI-driven models; UX teams employ predictive models or wizard-of-oz simulations to anticipate engagement effects; ceramic designers estimate kiln behavior or glaze appearance to limit waste and tighten iteration. In such cases, evidence supplements professional judgment. For example, an AI system may analyze hundreds of posters for legibility and determine that the current font choice has lower contrast than 90% of successful posters. However, the designer may override this recommendation if stylistic considerations better align with the project brief.
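The poster-legibility example can be made concrete with the WCAG 2.x contrast-ratio formula, which such a system could apply at scale. The corpus of “successful” poster ratios and the candidate colour pair below are hypothetical:

```python
def srgb_to_linear(c):
    # WCAG 2.x channel linearization, c in [0, 1]
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def luminance(rgb):
    # Relative luminance from 8-bit sRGB values
    r, g, b = (srgb_to_linear(v / 255) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg):
    # Ratio of lighter to darker luminance, offset per the WCAG definition
    l1, l2 = sorted((luminance(fg), luminance(bg)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)

# Hypothetical contrast ratios measured from a corpus of "successful" posters
corpus = [4.8, 5.2, 6.1, 7.0, 7.5, 8.3, 9.0, 10.2, 12.0, 14.5]

ratio = contrast_ratio((90, 90, 90), (120, 120, 120))  # candidate text/background
percentile = sum(r < ratio for r in corpus) / len(corpus)
print(round(ratio, 2), percentile)  # a low percentile flags the pairing for review
```

The system output is a ranking signal, not a verdict: as the text notes, the designer can still override it when stylistic considerations dominate.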
4.15. Education
Educational design applies DT to build learner-centered curricula, services and tools through empathizing with students and teachers, defining challenges, ideating, prototyping and classroom testing (Wang & Wang Reference Wang and Wang2024; Wei, Li & You Reference Wei, Li and You2024). The field is data-rich yet human-sensitive: assessments and learning analytics must be interpreted alongside motivation and socio-emotional context. Personalization illustrates both promise and risk (Tu, Liu & Wu Reference Tu, Liu and Wu2018; Hsueh et al. Reference Hsueh, Zhou, Chen and Yan2022). Intelligent Tutoring Systems and adaptive platforms adjust content in real time; an AI algebra tutor can observe problem solving, offer timely hints or skip ahead when mastery appears (Hsiao & Zhang Reference Hsiao and Zhang2023; Eftekhari, Mannan & Torres Reference Eftekhari, Mannan and Torres2025). Empathize extends to the agent, which continuously infers needs from data. Studies suggest improved engagement and outcomes when cognitive support is paired with motivational cues such as adaptive feedback tone (Huang Reference Huang2024; Zhang et al. Reference Zhang, Zhang, Wu, Wu, Cai and Shen2025). Yet automated delivery can crowd out teacher judgment if adopted uncritically, so designers reserve human attention for higher-order goals while delegating routine delivery to AI.
Assessment and feedback accelerate: automated essay scoring and critique provide formative guidance so students can revise within a single session; instructor dashboards aggregate error patterns and misconceptions, enabling targeted adjustments and A/B tests across a semester (Xu & Jiang Reference Xu and Jiang2022). Evidence that NLP can appraise soft skills supports this direction; reports indicate that critical thinking in engineering education can be evaluated with some success, though human oversight remains necessary (Chongwatpol Reference Chongwatpol2024). Collaboration is changing as AI-moderated workshops increase participation in DT activities, and AR/VR with AI create remote labs for prototyping. Universities run AI-assisted hackathons and design sprints, with data informing facilitation (Lin et al. Reference Lin, Chen, Chan, Peng, Chen, Xie, Liu and Hu2022). Creative and critical capacities develop through co-creation and critique: teachers use AI painting tools to discuss style and technique; engineering students design with AI and interrogate its choices; nursing educators report GPT-based patient simulations improving clinical reasoning and empathy via immediate, scenario-specific feedback. Overall, AI advances learner-centered experiences by adapting to individuals, shortening iteration cycles, and automating tedious tasks, while human judgment remains central to meaning, fairness and context.
4.16. Industry, business and management
In corporate innovation and organizational problem solving, DT enables customer-centric change: teams understand user and stakeholder needs, translate findings into business-framed opportunities, generate concepts, prototype, test and scale (Huber et al. Reference Huber, Niesel, Oberländer, Stahl and Übelhör2021). Because implementation spans marketing, engineering and operations, alignment with feasibility and return on investment is expected, and rapid iteration overlaps with lean startup and agile (Flores et al. Reference Flores, Golob, Maklin, Tucci, West and Stoll2020). AI now appears not only as a feature of offerings but as an aid to the design process, potentially reshaping judgment and risk assessment. Yet practice often stalls on partial information and organizational bias; the task is to widen the solution space without losing constraints, and to do so at market pace (Freitag & Hämmerle Reference Freitag and Hämmerle2020).
AI addresses these pressure points by augmenting analysis and decision-making at unprecedented scale and speed (Almeida et al. Reference Almeida, Canedo, Albuquerque, De Deus, Orozco and Villalba2022). Generative tools produce variations of shapes, features and service flows with simulation: automotive teams can generate dozens of virtual car designs and run crash tests within minutes, surfacing prototypes that balance safety and weight. Human–AI hybrid work encourages bolder, viable concepts by exposing constraints early and proposing cross-silo combinations (Rowan Reference Rowan2024). Efficiency gains follow as automated analysis and reporting free designers and managers for synthesis and strategy; operations explore route optimization and predictive maintenance, while customer service personalizes via learning chatbots (Melles, De Vere & Misic Reference Melles, De Vere and Misic2011; Clemmensen, Ranjan & Bødker Reference Clemmensen, Ranjan and Bødker2018). Data-driven insight deepens Define and Decide: in retail banking, models analyze transactions and feedback to explain churn, reframed as a “How might we” challenge; predictive models estimate the impact of alternatives, prioritizing experiments without replacing user contact (Palomino-Flores, Cristi-López & Paul Reference Palomino-Flores, Cristi-López and Paul2024). AI also enables systemic and sustainability modeling, such as digital-twin traffic simulations to test bike lanes or congestion pricing and assess ripple effects on emissions and commute times. Overall, AI acts as a force multiplier for analytical rigor and creative breadth, shortening cycles and improving alignment with needs, while capability dependence demands data-science skills, AI ethics, cautious interpretation and continued human oversight.
4.17. Healthcare
DT in healthcare is used to improve patient experiences, services and clinical processes through empathizing with patients and providers, defining problems, ideating solutions, prototyping and evaluating via pilots or simulations; strict regulation, patient safety and multistakeholder involvement make co-design essential (Tyrväinen et al. Reference Tyrväinen, Silvennoinen, Talvitie-Lamberg, Ala-Kitula and Kuoremäki2018; Wang et al. Reference Wang, Zhu, Xie, Martin-Payo, Xu and Zhang2025b). Given the sector’s complexity, data richness and high stakes, AI is a promising adjunct that helps refocus innovation on genuine needs, a “need-driven innovation” stance. In the Empathize/Discover work, AI sifts clinical notes, public health records, workflow logs and patient surveys to validate and quantify needs so teams address the right problems; for example, NLP can analyze thousands of patient comments and reveal discharge communication as a top frustration, or uncover a hidden bottleneck in post-visit outreach that becomes the target of a design project (Steffny et al. Reference Steffny, Dahlem, Reichl, Gisa, Greff and Werth2023). AI also supports patient-centered care by bringing rich patient data into the design process: analyzing wearable sensors and preferences to identify a subset of diabetes patients who would benefit from early warnings, or predicting readmission risk so designers can create preventative, personalized pathways (Rowan Reference Rowan2024). One case used an AI model to flag heart failure patients at risk and then designed targeted check-ins and AI-driven alerts, turning multimodal data into actionable empathy insights and testable prototypes (Félix Reference Félix2025). AI further facilitates human-in-the-loop refinement in clinical settings. 
During prototyping of diagnostic support or triage tools, clinicians and AI iterate through simulated cases: the system proposes recommendations, clinicians critique usability and trust and designers adjust models or interfaces before live use; a hospital refined an AI triage system by running it in the background, having nurses review decisions in simulation and incorporating feedback, which produced alignment and buy-in at launch (Sumner et al. Reference Sumner, Bundele, Lim, Phan, Motani and Mukhopadhyay2023). At the system level, AI enables digital twins of care ecosystems such as emergency departments so teams can test interventions in silico; designers concerned with pandemic management modeled hospital processes and found that staff reallocation plus an AI symptom-checker could reduce wait times, then piloted with confidence (Novak et al. Reference Novak, Harris, Koonce and Johnson2021; Steffny et al. Reference Steffny, Dahlem, Reichl, Gisa, Greff and Werth2023).
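The comment-mining step described in this subsection can be sketched with a simple theme lexicon. The comments, themes and keywords are invented for illustration; a production system would use topic models or embeddings rather than keyword matching:

```python
from collections import Counter
import re

# Hypothetical theme lexicon mapping frustration themes to indicator words
themes = {
    "discharge": {"discharge", "instructions", "follow-up"},
    "waiting": {"wait", "waiting", "delay"},
    "billing": {"bill", "billing", "charge"},
}

def theme_counts(comments):
    """Count comments touching each theme (a comment counts once per theme)."""
    counts = Counter()
    for comment in comments:
        words = set(re.findall(r"[a-z\-]+", comment.lower()))
        for theme, lexicon in themes.items():
            if words & lexicon:
                counts[theme] += 1
    return counts

comments = [
    "No one explained the discharge instructions",
    "Long wait before triage",
    "Discharge paperwork arrived without follow-up details",
    "Confusing bill after the visit",
    "The wait was fine but discharge was rushed",
]
print(theme_counts(comments).most_common(1))  # surfaces the top frustration theme
```

Even this toy version shows the design payoff: aggregation turns scattered comments into a ranked problem statement that a team can then validate with targeted qualitative work.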
4.18. Engineering design
Engineering design applies DT to the conception, analysis and realization of physical products, structures and systems, often under tight performance, safety and cost constraints (Polster, Bilgram & Görtz Reference Polster, Bilgram and Görtz2024). Within this context, AI extends both the creative and the analytic sides of the process (Table 12). Generative AI and conversational agents are used as “thinking assistants” to expand concept exploration, produce early sketches or configurations and help students and engineers overcome fixation and psychological inertia in the ideation phase (Jiang & Pang Reference Jiang and Pang2023). Digital twins, simulation models and optimization algorithms enable virtual crash tests, stress analyses and multi-criteria trade-off studies on large families of alternatives before any physical prototype is built, thereby compressing the iterate-and-test cycle and improving confidence in selected designs (Freitag & Hämmerle Reference Freitag and Hämmerle2020; Jiang et al. Reference Jiang, Huang and Shen2025). Data-driven models such as neural networks support the detailed design of components (e.g., thin-walled sections, robotic elements) and the tuning of manufacturing processes, while AI-enabled inspection systems detect defects and feed information back into redesign (Polster et al. Reference Polster, Bilgram and Görtz2024). In parallel, engineering curricula integrate AI literacy, no-code machine learning tools and project-based learning so that future engineers can critically combine analytical models with human-centered DT, developing competencies in responsible AI use, problem framing and systems-level reasoning.
4.19. Decision support, post-launch learning and interdisciplinary integration
AI supports decision-making, post-launch learning and interdisciplinary integration. Decision-making benefits as models reduce cognitive load, rank concepts by predicted success, expose trade-offs, guide choices in complex systems with prescriptive analytics and monitor post-launch, so iteration continues (Nahar & Lopez-Jimenez Reference Nahar and Lopez-Jimenez2022). A cross-cutting effect is interdisciplinary integration: AI tools translate and visualize constraints for non-specialists, maintain knowledge bases that surface analogies across fields and support multilingual collaboration, while education fosters hybrid professionals. These gains heighten governance demands; ethical guardrails and domain-sensitive use are required so technically elegant results remain socially acceptable. Overall, data-driven insight grounds empathy and definition, AI-assisted ideation widens creative search and simulation and prediction makes prototyping and evaluation more rigorous; applied with care, these mechanisms produce designs that are more innovative, well tested and aligned with user needs, while sustained attention to empathy, ethics and context keeps the collaboration human-centered. The integration of AI into DT can therefore be summarized as a set of recurring roles across stages and domains. Table 13 maps key AI–DT references by canonical DT models, and the subsequent three-phase timeline (Table 14 and Figure 2) shows how these roles have evolved over time.
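The concept-ranking role described above can be illustrated with a transparent weighted score standing in for a learned success predictor. The criteria, weights and concept scores are all assumptions:

```python
# Hypothetical decision-support sketch: weights and per-concept criterion
# scores (0-1) are invented; a deployed system might learn them from data.
weights = {"desirability": 0.4, "feasibility": 0.35, "viability": 0.25}

concepts = {
    "chatbot triage": {"desirability": 0.8, "feasibility": 0.6, "viability": 0.7},
    "redesigned form": {"desirability": 0.6, "feasibility": 0.9, "viability": 0.8},
    "AR walkthrough": {"desirability": 0.9, "feasibility": 0.3, "viability": 0.4},
}

def score(criteria):
    # Weighted sum across criteria; exposes trade-offs explicitly
    return sum(weights[k] * v for k, v in criteria.items())

ranked = sorted(concepts, key=lambda c: score(concepts[c]), reverse=True)
print(ranked)  # a ranking to seed, not replace, the team's convergence debate
```

Keeping the weights explicit is one way to meet the governance demands noted above: the ranking can be interrogated, re-weighted and overridden rather than accepted as an opaque verdict.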
Table 13. Mapping key AI–design thinking references by canonical DT models

Table 14. Three-phase timeline of AI’s evolving roles across design thinking stages and domains


Figure 2. Three-phase timeline of AI.
4.20. Timeline of AI integration in DT
AI’s role in DT has not emerged as a sudden rupture but as a gradual realignment of existing practices around increasingly capable computational tools. To contextualize our stage-by-stage synthesis, we map a three-phase timeline that traces how specific AI technologies have entered different points in the DT cycle and across the five focal fields (Art & Design, Education, Industry, Business & Management, Healthcare and Engineering Design). The earlier sections showed that DT models such as the Double Diamond, Stanford’s five-step sequence and IDEO’s 3I framework were originally conceived as human-centered, largely qualitative processes. At best, AI was a background technology used in isolated analytical tasks. Over time, however, advances in machine learning, NLP, simulation and generative models have progressively moved AI from a back-end decision engine to a visible collaborator in empathy work, ideation, prototyping and evaluation. The timeline makes this shift explicit by aligning characteristic AI techniques with the DT stages they most strongly affect and the sectors in which these patterns are most evident.
Table 14 presents this evolution as three overlapping phases: Pre-Integration and Foundational AI; Scaling Augmentation and Data-Driven Insight; and Generative AI Transformation and Hybrid Paradigm. Each phase is defined by a distinct configuration of human and machine roles. In Phase 1, traditional AI appears mainly as decision-support at the Test/Evaluate end of the cycle, while foundational design theory on “wicked problems” establishes the problem-framing ethos into which later data-driven tools will be inserted. Phase 2 records the mainstreaming of machine learning, digital twins and fuzzy multi-criteria models that scale Discover/Empathize and accelerate Prototype/Test across engineering, healthcare and service contexts. Phase 3 captures the rise of generative AI and AI-centric frameworks such as the Stingray and 4S models, in which AI participates in ideation and even systemic orchestration, and ethics-oriented curricula re-anchor Empathize and Define around critical reflection. Taken together, the phases illustrate a conceptual shift from AI as a specialized optimizer of predefined options toward AI as a co-creator embedded throughout the DT process, with human designers increasingly oriented toward sense-making, governance and strategic judgment.
4.21. Challenges and limitations of AI across all fields
AI augments DT while introducing limitations that require explicit management to keep outcomes effective, equitable and trusted. Constraints are domain-specific and cross-cutting, clustering around five themes: risks to human creativity and judgment, bias and other ethical harms (including privacy), opacity that weakens explainability and trust, regulatory ambiguity and practical hurdles of integration, skills and over-reliance (Prasad et al. Reference Prasad, Bhaumik, Jamadagni and Narasimha2023; Lin & Chang Reference Lin and Chang2024; Gutiérrez et al. Reference Gutiérrez, Aguilar, Ortega and Montoya2025). Across fields, these challenges shape when and how AI should be used, and what safeguards are needed to keep design practice human-centered.
The first theme concerns creativity and human judgment. Creativity can erode when algorithmic idea generation induces stylistic convergence on average training patterns, narrowing exploration, reducing divergent thinking and undermining confidence in human invention (Colombari & Manresa Reference Colombari and Manresa2024). Over time, teams may become dependent on machine suggestions and less willing to pursue unconventional directions. To counter this, teams keep people in control, use automation chiefly in divergent phases and avoid grading individuals by opaque metrics (Lin et al. Reference Lin, Chen, Chan, Peng, Chen, Xie, Liu and Hu2022). Alternating human-only and AI-assisted sessions and treating machine proposals as hypotheses to critique helps preserve human agency and critical reflection.
A second theme centers on bias, ethics and privacy. Models inherit patterns from their corpora, so skewed data can entrench under-representation and overlook accessibility or cultural nuance; if persona generation or concept ranking draws on narrow traces, proposals may favor dominant groups while appearing objective (Dennehy, Schmarzo & Sidaoui Reference Dennehy, Schmarzo and Sidaoui2022). Data privacy is integral: research using social media or sensors must honor consent, anonymization and data sovereignty; health projects need secure enclaves and explicit permissions, with privacy-by-design strategies such as synthetic or aggregated records (Zemke et al. Reference Zemke, Stahmann and Janiesch2025). Mitigation involves diverse teams, curated datasets and algorithmic transparency so recommendations can be interrogated and corrected.
The third theme involves opacity, explainability and regulatory ambiguity. Many systems are black boxes that cannot justify suggestions, which undermines trust and complicates accountability. Opacity and unsettled law undermine trust and compliance in finance, transport and health, and can slow or block deployment even when technical performance is strong. Design responses include explanation interfaces and audit trails, as well as involving ethicists and legal experts in co-design to plan accountability and anticipate regulatory requirements. Regulatory immaturity in high-stakes sectors compounds hesitation, prompting partnerships with policymakers, self-governance frameworks and sandboxes where novel solutions can be explored safely.
The fourth theme relates to skills gaps, data quality and over-reliance. Data quality and availability are persistent bottlenecks, and “garbage in, garbage out” can misdirect effort; practice therefore begins with data design – cleaning, governance and careful definition of decision variables. Skill gaps impede collaboration – many designers lack machine-learning training and many data scientists have limited exposure to user research – while machine outputs are often incomplete drafts, creating a centaur problem that demands revised division of labor and budget for validation and polishing (Gavriilidis et al. Reference Gavriilidis, Dimitriadis, Jaulent and Natsiavas2021). Over-reliance reduces resilience: if systems fail or generate confident errors, organizations that have surrendered craft capacity will struggle to improvise; even when analytics are narrowly correct, contextual meaning may be missing, arguing for fallback plans, human overrides and rationales tied to local values.
Finally, organizational and cultural conditions shape whether AI strengthens or weakens DT. Projects falter when leadership is solution-driven or fear-driven, pre-commits to one model, tolerates weak coordination or moves too slowly in multi-stakeholder settings. Trust-building requires human overrides and transparency about machine roles (Andone et al. Reference Andone, Vasiu, Bogdan, Mihaescu, Vert, Iovanovici, Ciupe and Dragan2022). The through line is hybrid intelligence that keeps empathy and context at the center, alternating human-only and AI-assisted sessions and treating machine proposals as hypotheses to critique (Polster et al. Reference Polster, Bilgram and Görtz2024). Safeguards include bias checks, curated datasets, privacy-by-design architectures and documentation tracing how inputs shaped decisions. Technical integration calls for data readiness, fit-for-purpose metrics and realistic budgets for validation and rework, while organizational capability depends on cross-training, aligned incentives and sustained stakeholder dialogue. When teams acknowledge these limits and mitigate them through education, ethical review, iterative calibration and continuous user engagement, AI can amplify rather than erode DT; with prudent governance and a sustained commitment to human-centered practice, risks become manageable and innovation gains substantial.
5. Conclusion
This qualitative review set out to synthesize how AI is being integrated into DT across canonical process models and application domains. Drawing on a corpus of peer-reviewed studies from 2005 to August 2025, we mapped AI interventions across six established DT frameworks (Stanford’s five-stage model, IDEO’s 3I process, the Double Diamond, IBM’s Loop, Google’s Design Sprint and the more recent Stingray model) and across five practice fields (Art and Design; Education; Industry, Business and Management; Healthcare; and Engineering Design). Read across these models and domains, AI appears less as a substitute for DT than as a catalyst within it. Computational tools increasingly scale empathy work by synthesizing large, heterogeneous traces of user experience, widening ideation spaces through generative alternatives, compressing the path from concept to prototype via automation and simulation, and strengthening testing by capturing and analyzing richer behavioral signals. These shifts redistribute effort rather than remove the human core: designers retain responsibility for contextual sense-making, creative direction and ethical discernment, while AI contributes speed, breadth and analytic acuity.
Our synthesis yields three distinct contributions to the emerging literature on AI and DT. First, we offer a cross-model, stage-by-stage map of how AI currently intervenes in canonical DT frameworks. Instead of treating DT as a single, generic process, the review demonstrates how Empathize, Define, Ideate, Prototype and Test are being reconfigured in practice and where they remain comparatively under-supported by AI. This integrative view moves beyond prior reviews of AI in innovation management and organizations, which typically do not differentiate between specific design stages or models.
Second, we introduce a three-phase temporal narrative of AI integration into DT, spanning pre-integration decision-support, data-driven augmentation and the current generative, hybrid paradigm. This timeline links technical developments in AI to characteristic patterns of use at different moments in the design process – for example, the shift from back-end optimization tools in Phase 1 to front-stage idea generation and prototyping support in Phase 3. By situating current generative AI tools within this longer trajectory, the review clarifies which observed patterns are likely to be transient (e.g., specific platforms or interfaces) and which stem from more structural properties of design work.
Third, the review provides a comparative, field-sensitive account of how AI-enabled DT unfolds in five domains. We show that AI is increasingly entering empathic research and assessment work in education; decision-support and simulation in healthcare and engineering design; data-driven experimentation and personalization in industry and business; and style exploration and artefact generation in art and design. This field-by-field mapping reveals both cross-cutting concerns, such as governance, accountability and equity, and domain-specific blind spots. As a result, it provides a more nuanced foundation for future empirical and normative research than single-domain studies can offer.
Taken together, these contributions support a more precise understanding of hybrid intelligence in DT. Rather than framing AI simply as a threat to human-centered design, our findings indicate that the quality of outcomes depends on how division of labor, governance mechanisms and human oversight are configured. When AI is deployed to widen exploration, reveal patterns in complex evidence and stress-test options, while designers adjudicate meaning, values and consequences, teams can expand what they accomplish under real constraints without relinquishing responsibility or empathy. Conversely, poorly governed deployments risk opaque automation, homogenized solutions and diminished trust. The implications for practice and policy, therefore, hinge less on whether AI is used in DT than on how it is integrated into the stages, models and fields documented in this review.
Although specific tools, interfaces and model architectures will continue to change rapidly, the patterns we identify around division of labor, governance and hybrid human–AI collaboration follow from structural features of design work and are therefore likely to remain valid beyond current systems. By contrast, some fine-grained observations in this review, such as specific software platforms, interface affordances or regulatory thresholds, may prove transient as capabilities and policies evolve. Taken together, these contributions position this review as a field-level reference for locating, governing and extending AI-enabled DT across canonical models and practice domains.
6. Implications
The stage-by-stage, field-sensitive synthesis in this review carries implications for how designers, managers, educators and policymakers should position AI within DT practice. Across models and domains, the central design choice is not whether to use AI, but where in the process it should augment human work and under which governance conditions. We highlight implications for four major arenas of practice, followed by three cross-cutting requirements.
In art and design, generative systems can dramatically expand the search space and accelerate artefact development, but they also create pressures toward stylistic convergence and a muted authorial voice. Our review suggests treating AI outputs as raw material rather than finished solutions. In practical terms, this implies workflows in which designers iteratively refine prompts based on explicit aesthetic intent, systematically compare AI variants with hand-drawn or low-fidelity sketches, and document how algorithmic suggestions are accepted, transformed or rejected. Intellectual property and data governance practices should acknowledge the provenance of training data, so that originality is recognized as a curatorial and interpretive achievement, not an algorithmic default.
In education, AI-enabled tutors, feedback systems, and collaboration tools can tailor pace, scaffolding and assessment to individual learners. At the same time, our synthesis highlights risks of over-delegating judgment and amplifying inequities in access and digital literacy. Institutions should therefore specify when automation is appropriate (e.g., formative, low-stakes feedback), retain human oversight for high-consequence decisions (e.g., grading, progression, credentialing) and adopt privacy-preserving data policies. To prevent benefits from concentrating among already advantaged students, design teams need to attend to language support, accessibility and infrastructure in parallel with tool selection.
In commercial contexts, AI-enabled customer analytics and simulation technologies, such as digital twins and scenario models, can accelerate development cycles and enable highly personalized offerings. The studies reviewed indicate that these gains materialize when organizations invest in hybrid skill profiles that bridge design, data science and ethics, and when data-driven exploration is explicitly aligned with brand promises and regulatory constraints. Guardrails are needed against automation bias, opaque “shadow” datasets and metric fixation; without them, AI-supported DT can unintentionally erode accountability and stakeholder trust even as it improves local optimization.
In clinical, public health and engineering contexts, large-scale pattern discovery and workflow simulation can surface unmet needs, reveal system-level bottlenecks and anticipate the downstream effects of interventions. However, the reviewed studies underline that meaningful gains occur only when AI-enabled insights are embedded in participatory co-design with clinicians, engineers, patients and other stakeholders. Practical implications include treating co-design as a default rather than an add-on, making consent, transparency and responsibility explicit at each stage, and evaluating AI-supported solutions using mixed-methods approaches that combine behavioral and clinical outcomes with qualitative accounts of experience. Otherwise, there is a risk that optimization for proxy metrics will crowd out care, safety and professional judgment.
Three recurrent implications cut across fields and models. First, domain experts – designers, clinicians, educators, engineers and policymakers – should lead on goals, constraints and evaluation criteria, with AI framed as an instrument that supports, rather than replaces, their judgment. Second, change management and targeted training are prerequisites for adoption: teams need support to understand both the capabilities and the limits of AI, including failure modes, bias risks and appropriate escalation paths. Third, effective and legitimate use of AI in DT requires sustained multidisciplinary collaboration spanning design, engineering, data governance, law and ethics. Only when these communities co-shape workflows, documentation standards and accountability mechanisms can organizations realize the productivity and learning benefits of AI while preserving, and in some cases strengthening, the human-centered commitments at the heart of DT.
7. Limitations
Several limitations qualify these claims. First, the corpus was bounded by language and indexation: we included English-language publications indexed in the Web of Science Core Collection (2005–August 2025) as the sole formal source of records, with Google Scholar used only in an exploratory piloting phase to refine search strings. This risks coverage bias, as contributions in other languages, regional outlets or grey literature may be absent, muting culturally specific practices and policy contexts. Second, the topic is in flux: AI capabilities and their design applications evolve quickly while publication pipelines lag, so recent advances or early cases may lie outside the search window, and conclusions may attenuate as tools, governance and routines change. Third, the analysis relied on secondary sources rather than primary fieldwork, creating dependencies on others’ operationalization, measurement quality and reporting, and limiting access to tacit, practice-level nuance; heterogeneity in designs and outcome metrics, together with the mix of journal articles, review papers and a limited number of conference proceedings, further constrains comparability. These limits are scope conditions: findings apply to contexts resembling the sampled literature, and recommendations are provisional. Future research should expand multilingual coverage, triangulate databases, gather primary data and adopt living-review updates.
These boundaries temper generalizability and should be considered when interpreting findings, even though adherence to the SPAR-4-SLR protocol helped us reduce avoidable procedural biases.
8. Future research directions
Future research should replace proofs-of-concept with rigorous tests of hybrid human–AI workflows across Empathize, Define, Ideate, Prototype and Test. Priorities include multi-site field experiments comparing human-only, AI-assisted and AI-led variants; ablation studies on prompt strategies and data quality; and longitudinal tracking of learning, creativity plateaus and AI-induced fixation. Metrics must extend beyond usability to novelty, feasibility, equity and impact, with transparent reporting of AI’s role, datasets and safeguards. External validity calls for research programs in education (higher-order design competencies, self-efficacy and anxiety, real-time LLM assessment that does not erode collaboration), healthcare (need-driven co-design with digital twins and human-in-the-loop evaluation for trust, safety and workflow fit) and industry (portfolio tests of AI-augmented sprints’ effects on time-to-evidence and dominant designs, with auditing of personalization for bias and exclusion). On governance and method, the field needs benchmark tasks (standardized empathic corpora, ideation diversity suites, prototyping simulators), open replication packages and checklists covering data lineage, audit trails and explainability, alongside comparative work on intellectual property, data sovereignty and accountability supported by independent evaluators and audits. Finally, research should specify hybrid skill profiles (data literacy, prompt design, bias auditing and interpretability) and evaluate curricula and no-code tools that build these capacities without crowding out empathy and judgment.
Acknowledgments
The authors have no acknowledgments to declare.
This research received no external funding, and no individuals beyond the author team contributed to the methods, analysis or writing of this manuscript.