A systematic literature review (SLR) on the adoption of artificial intelligence-assisted SLRs: implications for health technology assessments

Seye Abogunrin; Yifei Liu; Clarissa Higuchi Zerbini

doi:10.1017/S0266462326103535

Published online by Cambridge University Press: 16 February 2026

Seye Abogunrin*: Affiliation:
F Hoffmann-La Roche Ltd, Switzerland
Yifei Liu: Affiliation:
F Hoffmann-La Roche Ltd, Switzerland; Department of Health Policy, London School of Economics and Political Science, UK
Clarissa Higuchi Zerbini: Affiliation:
F Hoffmann-La Roche Ltd, Switzerland
*: Corresponding author: Seye Abogunrin; Email: seye.abogunrin@roche.com

Abstract

Objectives

Systematic literature reviews (SLRs) are essential for evidence synthesis in healthcare decision making, including health technology assessment (HTA), but their time and resource demands are substantial. Artificial intelligence (AI) may enhance the efficiency of conducting SLRs, but its acceptance by HTA bodies remains underexplored. This SLR quantifies published health-related SLRs reporting AI use, identifies AI tools used at each SLR stage, and evaluates HTA guidance on AI in evidence synthesis.

Methods

We searched Embase, Medline, and the Cochrane Library (up to 9 September 2025), supplemented by hand searches and reviews of HTA agency websites. Titles and abstracts were screened in Rayyan by a single reviewer, with full-text review confirming eligibility. Data were extracted and synthesized narratively along key themes.

Results

In total, 112 studies covering 111 unique SLRs were identified, reporting 134 implementations of 45 unique AI tools (29 publicly available; 16 custom-built). AI use has risen since 2013 and was most frequently applied during title and abstract screening (88 of the 134 implementations). Human oversight remained essential, with no fully autonomous AI reported. Three HTA agencies (CDA-AMC, IQWiG, NICE), EUnetHTA, JBI and Cochrane have provided guidance, indicating the formal integration of AI into HTA processes.

Conclusions

This SLR provides a quantitative overview of AI use in health-related SLRs and current HTA guidance. These findings may inform development of clearer methodological recommendations and support integration of AI-assisted evidence synthesis in HTA submissions. Further research and policy development are needed to optimize its role in evidence synthesis and healthcare decision making.

Keywords

artificial intelligence; machine learning; systematic review; health technology assessment; healthcare decision making

Information

Type: Assessment
Information: International Journal of Technology Assessment in Health Care, Volume 42, Issue 1, 2026, e29

DOI: https://doi.org/10.1017/S0266462326103535
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright: © The Author(s), 2026. Published by Cambridge University Press

Introduction

Health technology assessment (HTA) is a multidisciplinary process that evaluates the safety, effectiveness, and cost-effectiveness of health technologies (1). By considering clinical, economic, social, ethical, and organizational aspects, HTAs inform decisions on adopting, reimbursing, or implementing new and existing technologies (1). Systematic literature reviews (SLRs) are the gold standard for evidence synthesis in HTA and inform dossiers prepared by health technology developers for review by HTA bodies as part of their decision-making process (2). Their transparent, reproducible methodology ensures comprehensive evaluation of all relevant literature for a given indication (3), and they serve as a precursor to meta-analyses and indirect comparisons (4).

SLRs involve several complex stages, including defining research questions and search strategies, developing a protocol, screening studies, extracting data, assessing risk of bias, and synthesizing findings (3). They require substantial time, budget, and human resources, and typically take over 67 weeks to complete (5). The lengthy process can render findings outdated by the time of publication (6) and delay decision-making and patient access to treatments. HTA bodies require that SLR searches are performed within 3–6 months of dossier submission, adding further logistical challenges for researchers (2;7). SLRs also demand significant financial resources, requiring expert teams, access to specialized databases and software, and periodic updates to maintain relevance (5;6;8). Consequently, a single SLR can cost up to US$141,194.80 (9). Despite these challenges, methodological rigor remains critical for minimizing bias and supporting evidence-based HTA decisions (3).

Artificial intelligence (AI), the simulation of human intelligence for tasks that involve learning, reasoning, and prediction (10), has the potential to alleviate the burden of repetitive tasks while preserving human oversight (11). Researchers have explored the use of AI to streamline and expedite various steps of the SLR process (12). Advances in natural language processing (NLP) and machine learning (ML) have enhanced the feasibility of integrating AI into SLRs and evidence synthesis (13). NLP, including large language models (LLMs) such as ChatGPT, enables machines to interpret and generate human language, and ML allows systems to learn from data (14). These technologies could enable faster, more efficient evidence synthesis to support timely decision making (13).

Understanding how HTA bodies view AI-assisted evidence synthesis for HTA submissions is crucial for developing recommendations for its use. Although some agencies have issued recommendations (15–18), many have not defined their position on the adoption of AI technologies in the evidence-synthesis process. Slow adoption of AI by HTA bodies could hinder the exploration and implementation of more dynamic HTA approaches, such as living HTAs and living reviews, which could also be assisted by AI. Furthermore, the increasing workload due to the new EU HTA Regulation (Regulation [EU] 2021/2282), which introduces joint clinical assessments across EU member states and requires consideration of multiple national PICOs within a single assessment, could be better managed using AI (7;19;20).

This study aimed to (i) quantify published health-related SLRs reporting AI use; (ii) identify AI tools used at each SLR stage and their functionalities; and (iii) determine whether HTA agencies accept or recommend the use of AI tools. By addressing these questions, the study seeks to clarify the extent of AI adoption in evidence synthesis and its implications for HTA decision making; it does not examine specific AI methodologies used in SLRs (e.g., ML, deep learning, neural networks) because these have been covered in previous research (21).

Methods

To identify AI-supported SLRs within the field of health-related research, an SLR was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines (22). This included structured searches across multiple databases, narrative synthesis, and risk-of-bias assessment using the Risk Of Bias In Systematic Reviews (ROBIS) tool. The protocol for this SLR was preregistered with the Open Science Framework (protocol identifier: osf.io/6tavf).

Information sources and search strategy

Three databases were searched to identify relevant AI-supported SLRs: Embase (via Ovid), Medline (via Ovid), and the Cochrane Library (Cochrane Reviews), covering all records from database inception to 25 June 2024, with updated searches conducted on 9 September 2025. To ensure comprehensive coverage, the bibliography lists of included studies were manually reviewed between 17 June and 5 July 2024, and between 3 October and 7 October 2025, to identify potentially eligible articles that may have been missed by the database searches.

The search strategy incorporated key terms related to AI and systematic reviews. The primary keyword combination included (“artificial intelligence” OR “machine learning” OR “deep learning” OR “automation” OR “text mining” OR “natural language processing” OR “large language model” OR “generative AI”) AND “systematic review.” Additionally, the “artificial intelligence” query string was combined with the names of AI tools specifically developed to assist with SLRs, identified through prior knowledge and online searches (Supplementary Tables 1 and 2).
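For illustration, the primary keyword combination described above can be assembled programmatically. The following is a minimal Python sketch: the helper name `build_query` is hypothetical, and the output is a generic Boolean string, not the database-specific syntax used in the actual searches (which is reported in Supplementary Tables 1 and 2).

```python
# Illustrative sketch: assemble the primary Boolean keyword combination.
# The term list comes from the text; build_query is a hypothetical helper.
AI_TERMS = [
    "artificial intelligence",
    "machine learning",
    "deep learning",
    "automation",
    "text mining",
    "natural language processing",
    "large language model",
    "generative AI",
]


def build_query(ai_terms, review_term="systematic review"):
    """Return a generic Boolean string: ("t1" OR "t2" ...) AND "review_term"."""
    ai_block = " OR ".join(f'"{term}"' for term in ai_terms)
    return f'({ai_block}) AND "{review_term}"'


query = build_query(AI_TERMS)
print(query)
```

Keeping the terms in a list makes it straightforward to append tool names for the secondary, tool-specific searches without retyping the full string.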

The search was limited to review articles, to studies involving human participants, and to English-language publications.

Supplementary searches

To identify guidelines from HTA agencies on the use of AI tools for SLRs, the methodological guidelines or guidance sections of HTA agency websites were systematically searched between 17 June and 5 July 2024, and between 3 October and 8 October 2025. This included all fifty-three members of the International Network of Agencies for Health Technology Assessment (INAHTA), with the full member list available at https://www.inahta.org/members/members_list/. The website of the European Network for Health Technology Assessment (EUnetHTA), which is not a member of INAHTA, was also reviewed to capture any relevant guidance.

Broader methodological frameworks commonly referenced by HTA bodies were also considered. These included the Cochrane Handbook (23), Centre for Reviews and Dissemination guidance (24), and the Joanna Briggs Institute Manual for Evidence Synthesis (25).

Eligibility criteria

Owing to the nature of the research questions, eligibility criteria based on the PICO framework (Population, Intervention, Comparator, Outcomes) were not suitable for this analysis. Instead, a set of concept-driven eligibility criteria aligned with the study objectives was applied to ensure a comprehensive yet focused selection of relevant studies.

Eligible articles included SLRs or reviews that employed a systematic approach, focused on human health and medical research, and reported the use of AI tools to assist in the literature review process. Studies were excluded if they only mentioned the use of reference management or screening platforms, such as Covidence or Rayyan, without confirming the use of their AI or ML functionalities. Conversely, studies using software inherently designed for AI-based active learning (e.g., ASReview) were included because the core screening functionality of such software relies on ML algorithms. Conference abstracts, review protocols, and preprint articles were excluded to maintain a focus on fully peer-reviewed published studies.

Overview of the screening process

Articles from database searches were first deduplicated in EndNote, then further deduplicated in Rayyan. Screening was conducted in Rayyan, beginning with title and abstract screening based on the prespecified eligibility criteria. Full texts of the studies that met the inclusion criteria were reviewed to confirm eligibility before data extraction. Screening, full-text review, and data extraction were conducted collaboratively by multiple reviewers.
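The deduplication step described above can be illustrated with a short sketch. This is not the algorithm used by EndNote or Rayyan, whose matching rules are not described in this article; it is a minimal Python example that assumes a record is a duplicate when it shares a DOI, or (when no DOI is present) a normalized title and year, with an earlier record. The field names and sample records are hypothetical.

```python
import re


def record_key(record):
    """Return a comparison key: the DOI if present, else normalized title plus year."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    # Lowercase the title and collapse punctuation and whitespace.
    title = re.sub(r"[^a-z0-9]+", " ", record.get("title", "").lower()).strip()
    return ("title", title, record.get("year"))


def deduplicate(records):
    """Keep the first record seen for each key, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        key = record_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical records for illustration only.
records = [
    {"title": "AI tools for screening: a review", "year": 2021, "doi": "10.1000/example.1"},
    {"title": "AI Tools for Screening - A Review", "year": 2021, "doi": "10.1000/EXAMPLE.1"},
    {"title": "Text mining in evidence synthesis", "year": 2019, "doi": None},
    {"title": "Text Mining in Evidence Synthesis.", "year": 2019, "doi": None},
]
unique_records = deduplicate(records)
print(len(unique_records))
```

A known limitation of this simple keying is that a record exported with a DOI will not match the same record exported without one, which is one reason a second deduplication pass in a different tool can still find additional duplicates.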

Data extraction

The following data were extracted into an Excel sheet using Rayyan: authors, article title, year of publication, type of author affiliations, type of review, topic of review, name of the AI tools used, functionalities of the AI tools, stages of review for which the AI tool was employed, whether human reviewers were involved, and the advantages and disadvantages of using the tool.

Tools designed for automating meta-analyses were beyond the scope of this review and, therefore, were not included. In cases when the functionalities and features of an AI tool were unclear, the tool’s official website was consulted for clarification. A narrative (descriptive) synthesis was conducted along key themes identified from extracted information such as reported AI tools, functionalities, and stages of use.

Risk-of-bias assessment

The ROBIS tool was used to assess the risk of bias in the included reviews (Supplement A). This tool assesses bias across four domains: study eligibility criteria, identification and selection of studies, data collection and study appraisal, and synthesis and findings. Additionally, ROBIS provides an overall assessment of the risk of bias and limitations based on these domains (26). A high risk of bias indicates lower quality, which could be due to how the review was designed or reported.

Results

Overview of SLR results

In total, 14,226 records were retrieved from electronic literature database searches. After removing 2,354 duplicates, 11,872 records remained for title and abstract screening. The first round of screening excluded 11,178 records, leaving 694 full texts for review. Of these, 103 articles met the eligibility criteria, and an additional 9 eligible articles were identified through the bibliography searches, resulting in a total of 112 articles (Supplement B). These 112 articles represent 111 unique SLRs, as 2 articles reported different outcomes from the same SLR; information on AI tool use was extracted from only one of these. The PRISMA flowchart is shown in Figure 1.

Figure 1.

PRISMA flow diagram showing the results of the literature search.
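The record counts reported in the overview above follow from simple arithmetic, which can be verified as below. This Python sketch uses only the figures stated in the text; the number of full texts excluded is derived here and is not reported in the text itself.

```python
# Consistency check of the PRISMA record counts reported in the text.
retrieved = 14_226
duplicates = 2_354
excluded_title_abstract = 11_178
eligible_from_full_text = 103
eligible_from_bibliographies = 9

screened = retrieved - duplicates                    # records screened by title/abstract
full_texts = screened - excluded_title_abstract      # full texts reviewed
included_articles = eligible_from_full_text + eligible_from_bibliographies
unique_slrs = included_articles - 1                  # two articles report the same SLR

# Derived value, not stated in the text: full texts excluded at review.
excluded_full_text = full_texts - eligible_from_full_text

print(screened, full_texts, included_articles, unique_slrs, excluded_full_text)
```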

Health-related SLRs reporting on the use of AI tools

Among the 111 included studies (112 articles; Table 1), the majority were SLRs with narrative synthesis only (58 articles), followed by SLRs with meta-analysis (32 articles referring to 31 studies), scoping reviews (11 articles), rapid reviews (2 articles), umbrella reviews (2 articles), evidence gap maps (2 articles), an integrative review, a living review, a living review with meta-analysis, a rapid evidence mapping, and a systematic evidence mapping. The most common research areas covered by these reviews were neurology (15 articles), mental health (14 articles), health services research (14 articles referring to 13 studies), public health (10 articles), oncology (8 articles), cardiovascular health (8 articles), immunology (4 articles), and lifestyle research (4 articles).

Table 1.

Characteristics of included studies

^a The articles identified in the systematic literature review are listed in Supplement B.

^b Custom tools.

AI, artificial intelligence; ML, machine learning; NLP, natural language processing; RCT, randomized controlled trial.

The earliest reviews reporting the use of AI tools were published in 2013, with an increasing number of AI-assisted reviews identified over time (Figure 2). Among the 112 included articles (111 unique SLRs), 91.1 percent were published since 2020. For 2025, 38 AI-assisted SLRs had been published as of September 2025. Of the 112 articles, 103 were authored solely by researchers from academic institutions, 7 involved collaborations between academia and government organizations, and 2 were collaborations between academia and the pharmaceutical industry (Table 1).

Figure 2.

Number of health-related reviews reporting on the use of AI over the years.

Although the findings from the included reviews were not synthesized in the current study, the ROBIS tool was used to assess the quality of AI-assisted reviews. Among the 111 included studies, 51 (45.9 percent) had a high risk of bias, 51 studies (45.9 percent) had a low risk of bias, and 9 studies (8.1 percent) had an unclear risk of bias (Supplementary Table 3).

Adoption of AI in SLRs and usage trends

In total, 134 implementations of AI were identified across the 111 included reviews. Within these, forty-five distinct AI tools were used, comprising twenty-nine publicly available software-based tools and sixteen custom tools developed by the article authors.

AI tools were used most frequently for title and abstract screening (eighty-eight instances, including two studies in which the screening stage was unspecified), followed by data extraction (twenty-one instances), search strategy development (ten instances), risk-of-bias assessment (five instances), full-text review (four instances), data synthesis (three instances), and supplementary searches (three instances) (Figure 3).

Figure 3.

Use of AI software across the different stages of an SLR.
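As a quick check, the stage-level counts listed above sum to the 134 implementations reported. The Python sketch below tallies them and derives each stage's share; the percentages are computed here for illustration and are not reported in the text.

```python
# Stage counts as reported in the text (AI implementations per SLR stage).
stage_counts = {
    "title and abstract screening": 88,
    "data extraction": 21,
    "search strategy development": 10,
    "risk-of-bias assessment": 5,
    "full-text review": 4,
    "data synthesis": 3,
    "supplementary searches": 3,
}

total = sum(stage_counts.values())

# Derived shares (percent of all implementations), rounded to one decimal place.
shares = {stage: round(100 * n / total, 1) for stage, n in stage_counts.items()}

for stage, n in stage_counts.items():
    print(f"{stage}: {n} ({shares[stage]} percent)")
```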

The most frequently used AI software was ASReview (twenty-four reviews), followed by the ML functionality in Rayyan (eighteen reviews), AutoLit (Nested Knowledge; eight reviews) and EPPI-Reviewer (five reviews), all of which were used for title and abstract screening (EPPI-Reviewer was also used for data extraction and AutoLit was also used for data extraction and search strategy development). In addition to tools specifically designed for SLRs, OpenAI’s generative AI models (such as ChatGPT and other GPT-based systems) were used for multiple tasks in eight separate reviews, including search strategy generation, title and abstract screening, data extraction, risk-of-bias assessment, data synthesis, and report writing. The remaining AI software tools were each used in five or fewer studies (Figure 4). Although some authors who used custom tools shared their code, many of these tools are likely to be inaccessible to the public.

Figure 4.

AI software used in published health-related reviews.

Search strategy development

AI tools were used for search strategy development in ten instances. Three studies applied LLMs (two using ChatGPT and one using BioMedGPT-LM-7B) and five studies employed platform-based tools such as Elicit, Nested Knowledge, MySLR, and scite.ai/consensus.ai to identify relevant concepts and expand terminology. The remaining three studies used custom NLP- or ML-based tools, including an automated search strategy developed as part of the Human Behaviour Change Project.

Title/abstract screening

Title and abstract screening was the most common stage for use of AI tools, with eighty-eight instances, including two studies in which the screening stage (title/abstract or full text) was unspecified. Various software-based tools were used for screening, with ASReview being the most frequently used (twenty-four instances). The primary AI feature applied during title and abstract screening was active learning–based relevance prediction, in which the algorithm continuously reprioritized records based on reviewer feedback to optimize efficiency and accuracy. Similarly, most custom tools were developed for screening. The main features of these tools included relevance prediction and topic modeling. Some custom tools were also designed to calculate scores based on metrics, such as the presence and frequency of keywords, to determine study relevance.
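The active learning loop described above can be sketched in a few lines. The example below is a deliberately simplified illustration and not the algorithm used by ASReview or any other tool named in this review, which rely on trained ML classifiers. It assumes a toy relevance model based on word weights, and the function names, records, and labels are all hypothetical; the `oracle` callable stands in for the human reviewer's include or exclude decision.

```python
from collections import Counter


def tokenize(text):
    return text.lower().split()


def score(text, weights):
    """Sum of learned word weights over the distinct words in a record."""
    return sum(weights[word] for word in set(tokenize(text)))


def update(weights, text, relevant):
    """Raise weights of words seen in included records, lower them otherwise."""
    for word in set(tokenize(text)):
        weights[word] += 1 if relevant else -1


def screen(records, oracle, seed_index=0):
    """Screen every record, always presenting the current top-ranked one next."""
    weights = Counter()
    remaining = list(range(len(records)))
    order = []
    next_index = seed_index
    while remaining:
        remaining.remove(next_index)
        order.append(next_index)
        # Reviewer feedback updates the model before the next record is chosen.
        update(weights, records[next_index], oracle(next_index))
        if remaining:
            next_index = max(remaining, key=lambda i: score(records[i], weights))
    return order


# Hypothetical titles and reviewer decisions, for illustration only.
records = [
    "machine learning screening for systematic review of stroke",
    "surgical outcomes after hip replacement",
    "active learning to prioritise abstracts in systematic review",
    "dietary intake and blood pressure cohort",
    "text mining and machine learning for review screening",
]
labels = [True, False, True, False, True]
order = screen(records, oracle=lambda i: labels[i])
print(order)
```

In this toy run the three relevant records are presented before the two irrelevant ones, which is the efficiency gain that reprioritization aims for: reviewers reach most relevant records early, while still making every inclusion decision themselves.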

Full-text screening

AI-assisted full-text screening was identified in four instances. In total, three software-based text analysis tools – NVivo, WordStat, and QDA Miner – were used to search for keywords and synonyms indicating relevance to the research question. In one review, the authors developed a custom article segmentation tool that extracted the methods and results sections from full-text articles. A syntactic parsing tool was then applied to identify studies containing predefined keywords, enabling full automation of citation screening in that study.
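The segment-then-match approach described above can be illustrated as follows. This Python sketch is not the custom tool from the cited review, and it substitutes plain whole-word keyword matching for syntactic parsing. It assumes plain-text articles in which each section heading sits alone on its own line; the heading list, function names, and sample article are hypothetical.

```python
import re

# Assumed heading vocabulary; real articles vary considerably.
SECTION_HEADINGS = ("introduction", "methods", "results", "discussion", "conclusions")


def split_sections(full_text):
    """Split plain text into sections keyed by lowercase heading."""
    sections = {}
    current = None
    for line in full_text.splitlines():
        heading = line.strip().lower()
        if heading in SECTION_HEADINGS:
            current = heading
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(lines) for name, lines in sections.items()}


def matches_keywords(full_text, keywords, sections=("methods", "results")):
    """Return True if any keyword occurs as a whole word in the chosen sections."""
    parts = split_sections(full_text)
    text = " ".join(parts.get(name, "") for name in sections).lower()
    return any(re.search(rf"\b{re.escape(k.lower())}\b", text) for k in keywords)


# Hypothetical article text for illustration only.
article = (
    "Introduction\nRandomized trials are common.\n"
    "Methods\nWe conducted a cohort study.\n"
    "Results\nMortality fell.\n"
    "Discussion\nA randomized trial is needed."
)
print(matches_keywords(article, ["cohort"]))
print(matches_keywords(article, ["randomized"]))
```

Restricting the match to the methods and results sections avoids flagging an article merely because a keyword appears in its background or discussion, as the second call shows.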