
Automated citation searching in systematic review production: A simulation study

Published online by Cambridge University Press:  07 March 2025

Darren Rajit
Affiliation:
Monash Centre for Health Research and Implementation, Faculty of Medicine, Nursing, and Health Sciences, Monash University, Clayton, Victoria, Australia
Lan Du
Affiliation:
Department of Data Science and AI, Faculty of Information Technology, Monash University, Clayton, Victoria, Australia
Helena Teede
Affiliation:
Monash Centre for Health Research and Implementation, Faculty of Medicine, Nursing, and Health Sciences, Monash University, Clayton, Victoria, Australia Monash Partners Academic Health Sciences Centre, Clayton, Victoria, Australia
Joanne Enticott*
Affiliation:
Monash Centre for Health Research and Implementation, Faculty of Medicine, Nursing, and Health Sciences, Monash University, Clayton, Victoria, Australia Monash Partners Academic Health Sciences Centre, Clayton, Victoria, Australia
*
Corresponding author: Joanne Enticott; Email: joanne.enticott@monash.edu

Abstract

Bibliographic aggregators like OpenAlex and Semantic Scholar offer scope for automated citation searching within systematic review production, promising increased efficiency. This study aimed to evaluate the performance of automated citation searching compared to standard search strategies and to examine factors that influence performance. Automated citation searching was simulated on 27 systematic reviews across the OpenAlex and Semantic Scholar databases, spanning three study areas (health, environmental management, and social policy). Performance, measured by recall (proportion of relevant articles retrieved), precision (proportion of retrieved articles that are relevant), and F1–F3 scores (weighted harmonic means of recall and precision), was compared to the performance of the search strategies originally employed by each systematic review. The associations between systematic review study area, number of included articles, number of seed articles, seed article type, study type inclusion criteria, API choice, and performance were analyzed. Automated citation searching outperformed the reference standard in terms of precision (p < 0.05) and F1 score (p < 0.05) but failed to outperform it in terms of recall (p < 0.05) and F3 score (p < 0.05). Study area influenced the performance of automated citation searching, with performance higher in environmental management than in social policy. Given its inferior recall and F3 score, automated citation searching is best used as a supplementary search strategy in systematic review production, where recall is more important than precision. However, the observed outperformance in F1 score and precision suggests that automated citation searching could be helpful in contexts where precision is as important as recall.
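The performance measures named above (recall, precision, and the F-beta family, where F2 and F3 weight recall progressively more heavily than precision) follow standard definitions. As an illustrative sketch only, not the authors' code, and with hypothetical article identifiers, they can be computed over sets of retrieved and relevant articles like so:

```python
def precision_recall(retrieved: set, relevant: set) -> tuple:
    """Precision: fraction of retrieved articles that are relevant.
    Recall: fraction of relevant articles that were retrieved."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


def f_beta(precision: float, recall: float, beta: float) -> float:
    """Weighted harmonic mean of precision and recall.
    beta = 1 weights them equally; beta = 3 weights recall much more."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)


if __name__ == "__main__":
    # Hypothetical OpenAlex-style work IDs for illustration.
    retrieved = {"W1", "W2", "W3", "W4"}   # articles the search returned
    relevant = {"W2", "W3", "W5"}          # articles the review actually included
    p, r = precision_recall(retrieved, relevant)
    print(p, r)                 # 0.5, ~0.667
    print(f_beta(p, r, 1))      # F1 ≈ 0.571
    print(f_beta(p, r, 3))      # F3 ≈ 0.645 (recall-weighted)
```

A recall-weighted score like F3 reflects the systematic review setting, where missing a relevant article is costlier than screening an irrelevant one; this is why the study reports F1 through F3 rather than F1 alone.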

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of The Society for Research Synthesis Methodology

Figure 1 Framework depicting high level methodology of the simulation study. Adapted from protocol (11).


Table 1 Inclusion and exclusion criteria for sample systematic reviews included in study (11)


Table 2 Performance measures employed in study (recall, precision, and F score)


Figure 2 Schematic depicting the automated citation searching simulation process.


Table 3 Median number of included articles (IQR) and average intracluster semantic similarity (±SD) for systematic reviews in each source database, and all reviews in the dataset


Table 4 Summary baseline characteristics of seed articles successfully retrieved from the OpenAlex and Semantic Scholar APIs


Table 5 Median (IQR) precision, F1 score, F2 score, and F3 score for all search strategies employed by the systematic reviews in the dataset


Figure 3 (A–D) Comparison of automated citation searching performance (best performing run) vs search strategies employed by the sample systematic reviews, by precision, F1 score, F2 score, and F3 score. Observations above the dotted line indicate out-performance of the automated method vs the reference standard.


Table 6 Performance (precision, F1 score, F2 score, and F3 score) of automated citation searching vs reference systematic review search strategies


Table 7 Median (IQR) recall, precision, F1 score, F2 score, and F3 score of the best performing automated citation searching runs, by systematic review subsets


Table 8 Median % (IQR) of included articles with valid IDs extracted from systematic reviews in the dataset, and baseline retrievability rate of included articles across both APIs (OpenAlex, Semantic Scholar)


Figure 4 Recall of automated citation searching for each systematic review against various levels of recall (A), and against the baseline retrievability rate of included articles of each systematic review (B).

Supplementary material: File

Rajit et al. supplementary material (File, 306.8 KB)