The challenges of openness, transparency and impact in a data-driven world
Data science is a broad, interdisciplinary field that, in the UK, is being shaped by the activities of the Alan Turing Institute. It encompasses: the technologies that enable the capture of an ever-greater volume of data; the techniques (including machine learning and other artificial intelligence applications) that can be used to analyse data and put the resulting insights to useful purposes; the challenges of working with data at unprecedented scale and from heterogeneous sources; and the ethical considerations of a more data-driven world.
Two newly launched open access (OA) journals from Cambridge University Press, Data & Policy and Data-Centric Engineering, are exploring the transformative impact of data science in different domains. Both journals are open for submissions (see the Call for Papers for DCE and D&P). As publishing experiments, they are trying to address a similar set of challenges, outlined below.
Enabling OA for all authors
We’ve engaged key partner organisations for both journals. These partnerships are important both for helping to shape these emerging areas and for enabling authors to publish on an OA basis.
Data & Policy is a collaboration with the Data for Policy conference (dataforpolicy.org), which has successfully brought together researchers, policy-makers and others interested in the implications of data-driven technology for policy, governance and the public sector. The conference and journal are partnered by UCL, the Alan Turing Institute and the UK Office for National Statistics (ONS). All three organisations share an interest in how data science can serve the public good: the ONS, for example, has a Data Science Campus in Newport that investigates the use of new data sources for public benefit.
Data-Centric Engineering is supported by the Lloyd’s Register Foundation (LRF), a global charity that has been at the forefront of exploring how new data-driven technology can make engineering systems safer, more secure and more resilient (see the LRF’s Foresight Review of Big Data). The LRF is also a partner of the Alan Turing Institute’s data-centric engineering programme.
As a result of these partnerships, we are able to remove the financial barriers to publishing as well as the financial barriers to reading. Authors with access to grant or institutional funds are asked to pay an article processing charge (APC) if their article is accepted for publication, but those without access to such funds are not required to pay.
Nuances of data access and transparency
We’re interested in exploring the potential of other trends towards openness and collaboration in these new areas of research, including open data and open code. But this also means recognising that, where research collects data at scale and from diverse sources, there can be valid ethical, security and commercial reasons why that data cannot be made openly available.
The Open Data Institute and the Lloyd’s Register Foundation recently launched a manifesto for sharing engineering data, which has been signed by a number of organisations including Cambridge University Press. This advances the notion of data “as infrastructure that powers our societies and economies” and calls for engineering data to be as open as possible. But it also recognises that data needs to be stewarded – managed as an asset in society’s interest – and that legal and ethical frameworks are needed to minimise any harmful impacts that data use can bring.
A number of initiatives have sought to power research collaborations through improved access to public and private sector data, while recognising the need for safeguards. Networks such as Administrative Data UK have been important for making public sector data available for research purposes, while putting in place measures to de-identify data where needed. The concept of data trusts has also been an important development for improving access to public and private sector data for new technologies, with clear governance structures and oversight intended to retain public trust.
For the new journals we’ve launched, transparency is perhaps the most important concept. Below are three ways in which we are asking authors to think about, and disclose, essential information about their submitted papers.
Data availability statements
In line with the Transparency and Openness Promotion (TOP) Guidelines, we’re asking all authors submitting articles to Data & Policy and Data-Centric Engineering to provide a Data Availability Statement on submission. As a general principle, authors should make available the resources (data, code and other materials) that allow others to understand, verify and replicate their findings. These resources should be deposited in a trusted repository and cited in the statement. Where data cannot be made available for valid ethical, security or privacy reasons, this should be explained in the statement.
Contributor roles
On submission, authors are required to outline the contributions that authors (and non-authors) have made to the study, using the CRediT taxonomy as a framework. The goal of the taxonomy is to bring greater transparency to the contributions that individuals have made to a work, including giving credit for roles that are not traditionally recognised through authorship, such as funding acquisition, data visualisation and validation, and software programming and development.
Impact statements
One of the goals of open research is, of course, to reach new audiences. Both Data & Policy and Data-Centric Engineering are actively trying to cultivate readers and authors outside academic institutions: in policy, civil society and industry. To help ‘translate’ the significance of their research for this wider audience, we’re asking authors to provide an impact statement (DCE) or a policy significance statement (D&P) that succinctly summarises the article’s findings.
Visit the information pages of Data-Centric Engineering and Data & Policy for more details about the journals, or contact dce@cambridge.org or dataandpolicy@cambridge.org respectively.