
Beyond automation: toward a hybrid human-AI architecture for scalable, context-aware, and sustainable global data benchmarking

Published online by Cambridge University Press:  29 May 2026

Silvana Fumega* (Global Data Barometer)
Feng Gao (Global Data Barometer)
*Corresponding author: Silvana Fumega; Email: silfumega@gmail.com; silvana@globaldatabarometer.org

Abstract

This article explores the integration of large language models (LLMs) and AI research agents into global benchmarking frameworks, with a focus on data for the public good. Against a backdrop of shrinking funding and rising demand for scalable and reproducible assessments, we ask whether AI can assume core roles in indicator development, evidence discovery, and policy evaluation without compromising contextual nuance or democratic legitimacy. Building on pilot experiments conducted within the Global Data Barometer (GDB), we employed a phased, adaptive methodology that tested workflow-based platforms and deep research agents across tasks ranging from legal interpretation to multisource policy analysis. The preliminary findings suggest that while AI systems show strong potential for automating structured assessments, they falter on complex, fragmented, or normatively loaded indicators, raising concerns about opacity, overinterpretation, and inclusivity. To navigate these tensions, we propose a hybrid human-AI architecture that combines standardized workflows, adaptive agent capabilities, and critical human oversight. Central to this model is the concept of a dynamic evidence infrastructure, designed to embed participatory validation and enhance transparency. By reframing automation as augmentation, the study contributes both an empirical, domain-specific assessment of the opportunities and limits of AI-assisted benchmarking and a theoretical framework for sustainable, context-aware evaluation in the age of AI. We argue that the success of AI-assisted benchmarking should be measured not only in efficiency gains but also in its ability to strengthen legitimacy, accountability, and inclusiveness in data ecosystems worldwide.

Information

Type
Commentary
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Open Practices
Open data
Copyright
© The Author(s), 2026. Published by Cambridge University Press
Table 1. Selection of indicators for experimental testing

Table 2. Tools and configuration by research phase

Table 3. Example prompts used during Phase 3

Table 4. Summary of findings on AI feasibility, quality, and implications for democratic legitimacy

Table 5. AI versus human answer alignment for DPL indicator by question type

Table 6. Case studies of AI performance in DPL analysis
