
Generative artificial intelligence use in evidence synthesis: A systematic review

Published online by Cambridge University Press:  24 April 2025

Justin Clark*
Affiliation:
Institute for Evidence-Based Healthcare, Bond University, Gold Coast, QLD, Australia
Belinda Barton
Affiliation:
Bond Business School, Bond University, Gold Coast, QLD, Australia
Loai Albarqouni
Affiliation:
Institute for Evidence-Based Healthcare, Bond University, Gold Coast, QLD, Australia
Oyungerel Byambasuren
Affiliation:
Institute for Evidence-Based Healthcare, Bond University, Gold Coast, QLD, Australia
Tanisha Jowsey
Affiliation:
Faculty of Health Sciences and Medicine, Bond University, Gold Coast, QLD, Australia
Justin Keogh
Affiliation:
Faculty of Health Sciences and Medicine, Bond University, Gold Coast, QLD, Australia
Tian Liang
Affiliation:
Institute for Evidence-Based Healthcare, Bond University, Gold Coast, QLD, Australia
Christian Moro
Affiliation:
Faculty of Health Sciences and Medicine, Bond University, Gold Coast, QLD, Australia
Hayley O’Neill
Affiliation:
Faculty of Health Sciences and Medicine, Bond University, Gold Coast, QLD, Australia
Mark Jones
Affiliation:
Institute for Evidence-Based Healthcare, Bond University, Gold Coast, QLD, Australia
*
Corresponding author: Justin Clark; Email: jclark@bond.edu.au

Abstract

Introduction

With the increasing accessibility of tools such as ChatGPT, Copilot, DeepSeek, DALL-E, and Gemini, generative artificial intelligence (GenAI) has been positioned as a potential time-saving tool for research, especially for synthesising evidence. Our objective was to determine whether GenAI can assist with evidence synthesis by assessing its performance in terms of accuracy, error rates, and time savings compared to the traditional expert-driven approach.

Methods

To systematically review the evidence, we searched five databases on 17 January 2025, synthesised outcomes reporting on the accuracy, error rates, or time taken, and appraised the risk-of-bias using a modified version of QUADAS-2.

Results

We identified 3,071 unique records, 19 of which were included in our review. Most studies had a high or unclear risk-of-bias in Domain 1A (review selection), Domain 2A (GenAI conduct), and Domain 1B (applicability of results). When used for (1) searching, GenAI missed 68% to 96% (median = 91%) of studies; (2) screening, it made incorrect inclusion decisions ranging from 0% to 29% (median = 10%) and incorrect exclusion decisions ranging from 1% to 83% (median = 28%); (3) data extraction, it made incorrect extractions ranging from 4% to 31% (median = 14%); and (4) risk-of-bias assessment, it made incorrect assessments ranging from 10% to 56% (median = 27%).

Conclusion

Our review shows that the current evidence does not support GenAI use in evidence synthesis without human involvement or oversight. However, for most tasks other than searching, GenAI may have a role in assisting humans with evidence synthesis.

Information

Type
Review
Creative Commons
Creative Commons License: CC-BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Open Practices
Open data
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of The Society for Research Synthesis Methodology
Figure 1 PRISMA 2020 flow diagram of study inclusion.

Table 1 Characteristics of included studies.

Figure 2 Individual study risk of bias.

Figure 3 Domain summary of the risk of bias.

Table 2 Outcomes stratified by evidence synthesis task.

Supplementary material: Clark et al. supplementary material (File, 56.2 KB).