
Governing synthetic data in the financial sector

Published online by Cambridge University Press: 10 October 2025

Taylor Spears, University of Edinburgh Business School, Edinburgh, UK
Kristian Bondo Hansen*, Copenhagen Business School, Copenhagen, Denmark
Ruowen Xu, Warwick Business School, University of Warwick, Coventry, UK
Yuval Millo, Warwick Business School, University of Warwick, Coventry, UK
*Corresponding author: Kristian Bondo Hansen; Email: kbh.msc@cbs.dk

Abstract

Synthetic datasets, artificially generated to mimic real-world data while preserving anonymity, have emerged as a promising technology in the financial sector. Regulators and market participants have supported them as a solution to the data privacy and data scarcity challenges that limit machine learning (ML) deployment. This article argues that the effects of synthetic data on financial markets depend critically on how these technologies are embedded within existing ML infrastructural ‘stacks’, rather than on their intrinsic properties. We identify three key tensions that will determine whether adoption proves beneficial or harmful: (1) data circulability versus opacity, particularly the ‘double opacity’ problem arising from stacked ML systems; (2) model-induced scattering versus model-induced herding in the behavior of market participants; and (3) flattening versus deepening of data platform power. These tensions correspond directly to core regulatory priorities: model risk management, systemic risk, and competition policy. Using financial audit as a case study, we demonstrate how these tensions interact in practice and propose governance frameworks, including a synthetic data labeling regime that preserves contextual information when datasets cross organizational boundaries.

Information

Type
Policy focus
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of the Finance and Society Network

Figure 1. Illustration of the machine learning stack.


Figure 2. Correspondence between synthetic data generation tensions and key regulatory priorities.